https://openreview.net/pdf?id=HkglHcSj2N

Goal-conditioned policy (Kaelbling, 1993; Schaul et al., 2015)


Method

  1. Relabelling trajectories
    1. Each transition $(s_t, a_t, s_{t+1}, g)$ in a trajectory can be relabelled with a goal taken from a later state of the same trajectory, giving the transition $(s_t, a_t, s_{t+1}, g = s_{t+k})$
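
The relabelling step above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function name `relabel_trajectory`, the list-based trajectory format, and the choice to sample $k$ uniformly from the remaining steps are all assumptions made for the example.

```python
import random
from typing import Any, List, Optional, Tuple

# (s_t, a_t, s_{t+1}, g): a transition together with its relabelled goal.
RelabelledTransition = Tuple[Any, Any, Any, Any]


def relabel_trajectory(
    states: List[Any],
    actions: List[Any],
    rng: Optional[random.Random] = None,
) -> List[RelabelledTransition]:
    """Relabel every transition with a goal g = s_{t+k} from the same trajectory.

    `states` holds s_0 .. s_T (length T + 1) and `actions` holds a_0 .. a_{T-1}
    (length T). Sampling k uniformly is an assumption of this sketch.
    """
    if len(states) != len(actions) + 1:
        raise ValueError("expected one more state than actions")
    rng = rng or random.Random()
    horizon = len(actions)
    relabelled = []
    for t in range(horizon):
        # randint is inclusive on both ends, so 1 <= k and t + k <= T.
        k = rng.randint(1, horizon - t)
        relabelled.append((states[t], actions[t], states[t + 1], states[t + k]))
    return relabelled


# Usage: a toy trajectory whose states are just integers.
toy_states = [0, 1, 2, 3]
toy_actions = ["right", "right", "right"]
for transition in relabel_trajectory(toy_states, toy_actions, random.Random(0)):
    print(transition)
```

Every state reached after $s_t$ is a valid goal for the transition at step $t$, so one trajectory of length $T$ can yield many goal-labelled transitions; the sketch draws just one goal per transition to stay short.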