Step 1: (Learn dynamics from experience)
Observation → encoder → latent representation → predict the next latent state and the reward
Last latent state + previous action + new observation → new latent state
Step 2: (Learn behavior in imagination)
Latent state → action model picks an action → transition model predicts the next latent state (no new observations)
Predicted rewards and values along these imagined trajectories train the action and value models
Step 3: (Act in the environment using the learned behavior)
History of observations → encoder/representation model → current latent state → action model predicts the next action (see the sketch below)
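Below is a minimal PyTorch sketch of this acting step: the last latent state, the last action, and the new observation are combined into a new latent state, from which the action model produces the next action. The plain linear layers, dimensions, and variable names are illustrative assumptions, not the authors' code (the actual agent uses a recurrent state-space model and a stochastic policy).

```python
import torch
import torch.nn as nn

obs_dim, act_dim, latent_dim = 64, 4, 30

encoder = nn.Linear(obs_dim, latent_dim)                                    # o_t -> embedding
representation = nn.Linear(latent_dim + act_dim + latent_dim, latent_dim)   # (s_{t-1}, a_{t-1}, embed) -> s_t
policy = nn.Linear(latent_dim, act_dim)                                     # q_phi(a_t | s_t), mean only

s_prev = torch.zeros(1, latent_dim)   # last latent state
a_prev = torch.zeros(1, act_dim)      # last action taken
o_t = torch.randn(1, obs_dim)         # new observation from the environment

embed = encoder(o_t)
s_t = torch.tanh(representation(torch.cat([s_prev, a_prev, embed], dim=-1)))  # new latent state
a_t = torch.tanh(policy(s_t))                                                 # action sent to the environment
```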
Notation: p is used for distributions that generate samples in the real environment, and q for their approximations that enable latent imagination
First model: Dynamics
Representation: $p_\theta(s_t|s_{t-1}, a_{t-1}, o_t)$
Learns the current latent state from the previous state, the previous action, and the current observation
Transition: $q_\theta(s_t|s_{t-1}, a_{t-1})$
Learns to predict the next latent state from the previous state and action alone, without access to the observation (this is what enables imagination)
Reward: $q_\theta(r_t|s_{t})$
Learns to predict the reward from the latent state
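A minimal PyTorch sketch of these three dynamics components, each as a small network that outputs a diagonal Gaussian. The class name, layer sizes, and the use of plain MLPs are my simplifications of the paper's recurrent state-space model; only the conditioning of each distribution follows the definitions above.

```python
import torch
import torch.nn as nn
from torch.distributions import Normal

latent_dim, act_dim, obs_dim, hidden = 30, 4, 64, 200

class GaussianHead(nn.Module):
    """MLP that parameterizes a diagonal Gaussian over `out_dim` values."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ELU(),
                                 nn.Linear(hidden, 2 * out_dim))

    def forward(self, x):
        mean, log_std = self.net(x).chunk(2, dim=-1)
        return Normal(mean, log_std.exp())

# Representation p_theta(s_t | s_{t-1}, a_{t-1}, o_t): conditions on the observation.
representation = GaussianHead(latent_dim + act_dim + obs_dim, latent_dim)
# Transition q_theta(s_t | s_{t-1}, a_{t-1}): no observation, so it can run in imagination.
transition = GaussianHead(latent_dim + act_dim, latent_dim)
# Reward q_theta(r_t | s_t): predicts the reward from the latent state alone.
reward = GaussianHead(latent_dim, 1)

s_prev = torch.zeros(1, latent_dim)
a_prev = torch.zeros(1, act_dim)
o_t = torch.randn(1, obs_dim)        # observation embedding from an encoder

posterior = representation(torch.cat([s_prev, a_prev, o_t], dim=-1))   # uses o_t
prior = transition(torch.cat([s_prev, a_prev], dim=-1))                # does not
s_t = posterior.rsample()
r_t = reward(s_t).mean
# Training would reconstruct observations, predict rewards, and add a KL term
# that keeps the prior (transition) close to the posterior (representation).
```

The key design point is that the transition model sees only the previous state and action, which is exactly what allows trajectories to be rolled out without any observations.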
Second model: Behavior
Action: $q_\phi(a_t|s_t)$
Learns which action to take in each latent state; trained on imagined trajectories to maximize the value estimates
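A minimal sketch of the action model as a tanh-transformed Gaussian policy, together with an imagined rollout that alternates between the action model and a stand-in transition model. The horizon, sizes, and the linear stand-in transition are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torch.distributions import Normal

latent_dim, act_dim, hidden, horizon = 30, 4, 200, 15

class ActionModel(nn.Module):
    """q_phi(a_t | s_t) as a tanh-squashed Gaussian policy."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ELU(),
                                 nn.Linear(hidden, 2 * act_dim))

    def forward(self, s):
        mean, log_std = self.net(s).chunk(2, dim=-1)
        # Reparameterized sample so gradients can flow back through
        # imagined trajectories into the policy parameters.
        return torch.tanh(Normal(mean, log_std.exp()).rsample())

action_model = ActionModel()
transition = nn.Linear(latent_dim + act_dim, latent_dim)  # stand-in for q_theta(s_t | s_{t-1}, a_{t-1})

# Imagined rollout: starting from a latent state, alternate action and
# transition predictions without ever querying the real environment.
s = torch.zeros(1, latent_dim)
imagined_states = []
for _ in range(horizon):
    a = action_model(s)
    s = torch.tanh(transition(torch.cat([s, a], dim=-1)))
    imagined_states.append(s)
# The action model is then updated to increase the value estimates of these states.
```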
Third model: Value
Value: $v_\psi(s_t)$
Estimates the expected imagined rewards that the action model achieves from each state
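A minimal sketch of the value model and of regressing it onto returns computed along an imagined trajectory. The plain discounted sum used as the target here is a simplification; the paper uses an exponentially weighted λ-return built from the reward and value predictions. Network sizes and the dummy trajectory data are assumptions.

```python
import torch
import torch.nn as nn

latent_dim, hidden, gamma = 30, 200, 0.99

# v_psi(s_t): a small MLP that maps a latent state to a scalar value.
value_model = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ELU(),
                            nn.Linear(hidden, 1))

# Imagined latent states and predicted rewards for one trajectory (dummy data here).
imagined_states = torch.randn(15, latent_dim)
imagined_rewards = torch.randn(15, 1)

# Discounted return from each state to the end of the imagined horizon.
returns = torch.zeros_like(imagined_rewards)
running = torch.zeros(1)
for t in reversed(range(imagined_rewards.shape[0])):
    running = imagined_rewards[t] + gamma * running
    returns[t] = running

# Regress the value model onto the imagined returns.
value_loss = ((value_model(imagined_states) - returns.detach()) ** 2).mean()
value_loss.backward()   # an optimizer step on psi would follow
```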