Step 1: (Learn dynamics from experience)

Observation → encoder → latent state → predict the reward (and reconstruct the observation)

Last latent state + last action + new observation → new latent state
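
A minimal sketch of one dynamics-learning update, assuming plain linear layers, made-up sizes, and an MSE consistency term standing in for the paper's reconstruction/KL objectives:

```python
# Sketch only: plain linear layers stand in for Dreamer's conv encoder and RSSM.
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, act_dim, latent_dim = 16, 4, 8            # illustrative sizes

encoder    = nn.Linear(latent_dim + act_dim + obs_dim, latent_dim)  # representation model
transition = nn.Linear(latent_dim + act_dim, latent_dim)            # prior, no observation
reward     = nn.Linear(latent_dim, 1)                               # reward model

params = [*encoder.parameters(), *transition.parameters(), *reward.parameters()]
opt = torch.optim.Adam(params, lr=3e-4)

# A batch of stored experience: previous state, previous action, observation, reward.
prev_s = torch.zeros(32, latent_dim)
prev_a = torch.randn(32, act_dim)
obs    = torch.randn(32, obs_dim)
rew    = torch.randn(32, 1)

post  = encoder(torch.cat([prev_s, prev_a, obs], dim=-1))  # s_t from (s_{t-1}, a_{t-1}, o_t)
prior = transition(torch.cat([prev_s, prev_a], dim=-1))    # s_t from (s_{t-1}, a_{t-1}) only

# Reward prediction + prior/posterior consistency (MSE here instead of the paper's KL term).
loss = F.mse_loss(reward(post), rew) + F.mse_loss(prior, post.detach())
opt.zero_grad(); loss.backward(); opt.step()
```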

Step 2: (Learn behavior in imagination)

Latent state → action model → imagined action → transition model → next latent state (no observations needed)

Action and value models are learned by backpropagating value estimates through these imagined trajectories
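
A minimal sketch of an imagined rollout and an actor update under the same assumptions; the transition and reward networks stand in for the models learned in step 1, and an undiscounted reward sum replaces the paper's value-based objective:

```python
# Sketch only: imagine latent trajectories with the world model and update the actor.
import torch
import torch.nn as nn

latent_dim, act_dim, horizon = 8, 4, 15

transition = nn.Linear(latent_dim + act_dim, latent_dim)  # learned in step 1
reward     = nn.Linear(latent_dim, 1)                     # learned in step 1
actor      = nn.Linear(latent_dim, act_dim)               # q_phi(a_t | s_t)

actor_opt = torch.optim.Adam(actor.parameters(), lr=8e-5)  # only the actor is updated here

s = torch.randn(32, latent_dim)       # start states come from encoded real experience
imagined_rewards = []
for _ in range(horizon):              # no observations are needed inside the rollout
    a = torch.tanh(actor(s))
    s = transition(torch.cat([s, a], dim=-1))
    imagined_rewards.append(reward(s))

# Gradient ascent on the (here undiscounted) sum of imagined rewards,
# backpropagated through the learned dynamics.
actor_loss = -torch.stack(imagined_rewards).sum(dim=0).mean()
actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
```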

Step 3: (Use the learned behavior to act in the environment)

Observation → encoder → latent state → action model predicts the next action
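
A minimal sketch of the acting loop, with a hypothetical dummy environment standing in for the real one:

```python
# Sketch only: act in the environment by folding each new observation into the latent state.
import torch
import torch.nn as nn

obs_dim, act_dim, latent_dim = 16, 4, 8
encoder = nn.Linear(latent_dim + act_dim + obs_dim, latent_dim)  # representation model from step 1
actor   = nn.Linear(latent_dim, act_dim)                         # action model from step 2

class DummyEnv:                        # hypothetical stand-in for a real environment
    def reset(self):   return torch.randn(obs_dim)
    def step(self, a): return torch.randn(obs_dim), 0.0, False

env = DummyEnv()
obs = env.reset()
s, a = torch.zeros(latent_dim), torch.zeros(act_dim)
for _ in range(3):                                 # a few steps, just to show the data flow
    s = encoder(torch.cat([s, a, obs]))            # last latent + last action + new observation
    a = torch.tanh(actor(s))                       # action model predicts the next action
    obs, r, done = env.step(a.detach())
```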


p is for distributions that generate samples in the real environment

q is for their approximations that enable latent imagination

First model: Dynamics

Representation: $p_\theta(s_t|s_{t-1}, a_{t-1}, o_t)$

Infers the current latent state from the previous state, the previous action, and the current observation

Transition: $q_\theta(s_t|s_{t-1}, a_{t-1})$

Predicts the current latent state from the previous state and action alone, without the observation; this is what enables imagination in latent space

Reward: $q_\theta(r_t|s_{t})$

Predicts the reward from the latent state
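
A sketch of these three components as diagonal-Gaussian heads; the single linear layers and dimensions are illustrative assumptions, not the paper's architecture:

```python
# Sketch only: each component outputs a diagonal Gaussian over its target.
import torch
import torch.nn as nn
from torch.distributions import Normal

obs_dim, act_dim, latent_dim = 16, 4, 8

def gaussian_head(in_dim, out_dim):
    net = nn.Linear(in_dim, 2 * out_dim)               # predicts mean and log-std
    def head(x):
        mean, log_std = net(x).chunk(2, dim=-1)
        return Normal(mean, log_std.exp())
    return head

representation = gaussian_head(latent_dim + act_dim + obs_dim, latent_dim)  # p_theta(s_t | s_{t-1}, a_{t-1}, o_t)
transition     = gaussian_head(latent_dim + act_dim, latent_dim)            # q_theta(s_t | s_{t-1}, a_{t-1})
reward         = gaussian_head(latent_dim, 1)                               # q_theta(r_t | s_t)

prev_s, prev_a, obs = torch.zeros(latent_dim), torch.zeros(act_dim), torch.zeros(obs_dim)
s_post  = representation(torch.cat([prev_s, prev_a, obs])).rsample()  # uses the observation
s_prior = transition(torch.cat([prev_s, prev_a])).rsample()           # no observation needed
r_pred  = reward(s_post).mean                                         # expected reward for this state
```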

Second model: Behavior

Action: $q_\phi(a_t|s_t)$

Predicts actions from latent states; trained purely on imagined trajectories to maximize the value estimates
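
A sketch of the action model as a tanh-squashed Gaussian over latent states; the squashing, layer sizes, and network shape are assumptions for illustration:

```python
# Sketch only: a tanh-squashed Gaussian policy over latent states.
import torch
import torch.nn as nn
from torch.distributions import Normal

latent_dim, act_dim = 8, 4
policy_net = nn.Sequential(nn.Linear(latent_dim, 64), nn.ELU(), nn.Linear(64, 2 * act_dim))

def action_model(s):                                   # q_phi(a_t | s_t)
    mean, log_std = policy_net(s).chunk(2, dim=-1)
    dist = Normal(mean, log_std.exp())
    return torch.tanh(dist.rsample())                  # reparameterised sample, so gradients flow

s = torch.zeros(latent_dim)                            # an imagined latent state
a = action_model(s)
```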

Third model: Value

$v_\psi(s_t)$

Estimates the expected imagined rewards that the action model achieves from each state
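
A sketch of the value model and a value target built from imagined rewards; a plain discounted sum is used here instead of the paper's λ-returns, and all sizes are illustrative:

```python
# Sketch only: value network plus a discounted target built from imagined rewards.
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim, horizon, gamma = 8, 15, 0.99

value = nn.Sequential(nn.Linear(latent_dim, 64), nn.ELU(), nn.Linear(64, 1))  # v_psi(s_t)
value_opt = torch.optim.Adam(value.parameters(), lr=8e-5)

# Imagined latent states and rewards, e.g. produced by the rollout in step 2.
states  = torch.randn(horizon, 32, latent_dim)
rewards = torch.randn(horizon, 32, 1)

discounts = torch.tensor([gamma ** t for t in range(horizon)]).view(horizon, 1, 1)
targets = (discounts * rewards).sum(dim=0)       # discounted return from the first imagined state

value_loss = F.mse_loss(value(states[0]), targets.detach())
value_opt.zero_grad(); value_loss.backward(); value_opt.step()
```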
