auto-improvement.github.io


  1. VLM generates tasks
  2. Image-editing diffusion model generates images of subgoals
  3. Goal-conditioned robot policy
    1. obs: 256x256 RGB images
    2. action space: delta EEF control at 5 Hz
  4. VLM for success detection
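The four-stage loop above can be sketched roughly as below. All class and method names here are my own illustrative assumptions about the interfaces, not the paper's actual code:

```python
import numpy as np

# Hedged sketch of the autonomous improvement loop described above.
# Component interfaces are assumed for illustration only.

class VLM:
    def propose_task(self, obs):
        # Stage 1: VLM looks at the scene and proposes a language task
        return "move the object to the bin"

    def detect_success(self, obs, task):
        # Stage 4: a VLM judges whether the task was completed
        return True

class SubgoalDiffusion:
    def generate_subgoal(self, obs, task):
        # Stage 2: image-editing diffusion model renders an image of the
        # next subgoal, conditioned on the current image and the task
        return np.zeros((256, 256, 3), dtype=np.uint8)

class GoalConditionedPolicy:
    def act(self, obs, goal_image):
        # Stage 3: policy maps (current image, subgoal image) -> delta
        # end-effector action, executed at 5 Hz
        return np.zeros(7)  # e.g. delta xyz + rotation + gripper

def improvement_episode(env, vlm, diffusion, policy, max_steps=50):
    obs = env.reset()                     # 256x256 RGB observation
    task = vlm.propose_task(obs)          # stage 1
    for _ in range(max_steps):
        goal = diffusion.generate_subgoal(obs, task)  # stage 2
        action = policy.act(obs, goal)                # stage 3
        obs = env.step(action)
    return task, vlm.detect_success(obs, task)        # stage 4
```

Each episode's (task, trajectory, success label) can then be fed back as training data, which is what makes the system self-improving.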

Inspo for making the amzn task autonomous: perhaps eliminate the whole move-and-pick process and just use the wrist camera to feel around in the bin?

It would be much more impressive as a fully autonomous industrial system.

Foundation model takeaway