In model-free RL, the optimal action is based on a learned policy.

We do not attempt to learn the system dynamics; we simply take in the observations from the sensors. We only learn $\pi_\theta$, the policy that selects the best actions, and assume that the system dynamics are captured implicitly.

In model-based learning, the optimal action is based on the system dynamics, which are either known or unknown (and then learned).
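To make the contrast concrete, here is a minimal sketch rather than a definitive implementation. The linear policy `theta @ obs`, the toy dynamics `x + u`, the cost function, and the candidate action set are all illustrative assumptions, not taken from these notes.

```python
import numpy as np

def model_free_action(theta, obs):
    """Model-free: map the observation straight to an action.

    The policy pi_theta is assumed (for illustration) to be a linear map.
    No dynamics model is consulted.
    """
    return theta @ obs

def model_based_action(dynamics, cost, state, candidates):
    """Model-based: predict the outcome of each candidate action with the
    dynamics model and pick the one with the lowest predicted cost."""
    costs = [cost(dynamics(state, u), u) for u in candidates]
    return candidates[int(np.argmin(costs))]

# Hypothetical 1-D system: the next state is the current state plus the action.
dynamics = lambda x, u: x + u
# Hypothetical cost: squared distance to the origin plus a small action penalty.
cost = lambda x_next, u: x_next ** 2 + 0.1 * u ** 2

print(model_free_action(np.array([[-0.5]]), np.array([2.0])))
print(model_based_action(dynamics, cost, 2.0, [-2.0, -1.0, 0.0, 1.0, 2.0]))
```

The model-free function never sees `dynamics`; everything it knows about the system is baked into `theta`. The model-based function needs no learned policy but relies on the model being accurate.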

Generic Monte Carlo Tree Search

$s_{next}$ = TreePolicy($s_1$), where $s_{next}$ is the next leaf

Evaluate value of $s_{next}$ with DefaultPolicy($s_{next}$)

Update all values in tree between $s_1$ and $s_{next}$
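The three steps above can be sketched as follows. This is one possible instantiation under stated assumptions: the tree policy uses UCB1, the default policy is a uniform random rollout, and the toy problem (three binary actions with a weighted-sum reward) is made up for illustration. Names such as `Node`, `WEIGHTS`, and `mcts` are hypothetical.

```python
import math
import random

ACTIONS = (0, 1)
HORIZON = 3                     # toy episode length (assumption)
WEIGHTS = (0.6, 0.2, 0.2)       # toy reward weights (assumption)

class Node:
    def __init__(self, state, parent=None, action=None):
        self.state = state      # tuple of actions taken so far
        self.parent = parent
        self.action = action    # action that led from parent to this node
        self.children = []
        self.visits = 0
        self.total = 0.0        # sum of backed-up values

def is_terminal(state):
    return len(state) == HORIZON

def step(state, action):
    return state + (action,)

def reward(state):
    return sum(w * a for w, a in zip(WEIGHTS, state))

def ucb1(child, c):
    mean = child.total / child.visits
    return mean + c * math.sqrt(math.log(child.parent.visits) / child.visits)

def tree_policy(node, c):
    """Descend from the root; expand and return the next leaf."""
    while not is_terminal(node.state):
        tried = {ch.action for ch in node.children}
        untried = [a for a in ACTIONS if a not in tried]
        if untried:
            child = Node(step(node.state, untried[0]), node, untried[0])
            node.children.append(child)
            return child
        node = max(node.children, key=lambda ch: ucb1(ch, c))
    return node

def default_policy(state, rng):
    """Evaluate a leaf with a uniform random rollout to the end of the episode."""
    while not is_terminal(state):
        state = step(state, rng.choice(ACTIONS))
    return reward(state)

def backup(node, value):
    """Update every node on the path from the leaf back to the root."""
    while node is not None:
        node.visits += 1
        node.total += value
        node = node.parent

def mcts(root_state, n_iter=500, c=1.0, seed=0):
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(n_iter):
        leaf = tree_policy(root, c)
        value = default_policy(leaf.state, rng)
        backup(leaf, value)
    best = max(root.children, key=lambda ch: ch.visits)
    return best.action, root

action, root = mcts(())
print(action, root.visits)
```

Returning the most-visited root child, rather than the one with the highest mean value, is a common design choice because visit counts are less noisy than value estimates.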

LQR

RL — LQR & iLQR Linear Quadratic Regulator

LQR assumes linear dynamics and a quadratic cost, so it works with the second derivatives (the quadratic terms) of the cost function.

[Figure: LQR backward and forward passes]

  1. Backward pass: minimize the cost (Q) at each timestep, starting from the final timestep and working backward until we reach the initial state

    We find the optimal action as a linear function of the state, $u_t = K_t x_t + k_t$

  2. Forward pass: starting from the initial state, we take the optimal action given by the calculated matrices and propagate forward until the target state (see the figure above)

    We find the next state from the dynamics, $x_{t+1} = f(x_t, u_t)$
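The two passes can be sketched in NumPy as below, under several assumptions that are not stated in these notes: time-invariant linear dynamics $x_{t+1} = A x_t + B u_t$, a cost of $x^T Q_{cost} x + u^T R_{cost} u$ per step with the same state weight on the final state, and a target state at the origin (so the optimal action reduces to pure linear feedback $u_t = K_t x_t$). `Q_cost` and `R_cost` are the quadratic cost weights, not the cost (Q) referred to in step 1, and the double-integrator system is a hypothetical example.

```python
import numpy as np

def lqr_backward(A, B, Q_cost, R_cost, T):
    """Backward pass: Riccati recursion from the final timestep to t = 0.

    Returns the feedback gains K_0..K_{T-1} and the cost-to-go matrix at t = 0.
    """
    P = Q_cost.copy()                      # cost-to-go at the final timestep
    gains = []
    for _ in range(T):
        # Minimizing the quadratic cost-to-go over u gives u = K x.
        K = -np.linalg.solve(R_cost + B.T @ P @ B, B.T @ P @ A)
        P = Q_cost + A.T @ P @ A + A.T @ P @ B @ K
        gains.append(K)
    gains.reverse()                        # gains were computed last-to-first
    return gains, P

def lqr_forward(A, B, gains, x0):
    """Forward pass: apply u_t = K_t x_t and roll the dynamics forward."""
    xs, us = [x0], []
    for K in gains:
        u = K @ xs[-1]
        us.append(u)
        xs.append(A @ xs[-1] + B @ u)
    return np.array(xs), np.array(us)

# Hypothetical double integrator (position, velocity) with time step 0.1.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q_cost = np.eye(2)
R_cost = 0.1 * np.eye(1)
x0 = np.array([1.0, 0.0])

gains, P0 = lqr_backward(A, B, Q_cost, R_cost, T=50)
xs, us = lqr_forward(A, B, gains, x0)
print(xs[-1])
```

A useful sanity check is that the total cost accumulated along the forward rollout matches the cost-to-go predicted by the backward pass, $x_0^T P_0 x_0$.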