Related works:
CLIPort: What and Where Pathways for Robotic Manipulation
Main Source:
Grounding: Grounding an instruction means understanding what to manipulate based on language
Scene semantics: Which object
Object semantics: What part of the object
6-DoF reasoning + precision
CLIPort: keypoint-based grounding, but limited to simple planar manipulation
RobotMoo: VLM (vision-language model) grounding; limited to pick-and-place because it captures scene semantics but little object semantics
BC-Z/SayCan: require excessive data collection
PerAct: voxelized representation, so subject to discretization
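Sketch for the PerAct note above: a minimal illustration of why a voxelized representation costs precision. This is not PerAct's code; the function names, the 1 m workspace, and the 100-voxel grid are assumptions chosen for illustration only.

```python
import numpy as np

def voxelize(points, bounds_min, bounds_max, grid_size):
    """Map 3D points to integer voxel indices on a grid_size^3 grid (hypothetical helper)."""
    points = np.asarray(points, dtype=float)
    bounds_min = np.asarray(bounds_min, dtype=float)
    bounds_max = np.asarray(bounds_max, dtype=float)
    voxel_size = (bounds_max - bounds_min) / grid_size
    idx = np.floor((points - bounds_min) / voxel_size).astype(int)
    # Points on the upper boundary would fall one cell outside the grid, so clip.
    return np.clip(idx, 0, grid_size - 1), voxel_size

def voxel_centers(idx, bounds_min, voxel_size):
    """Recover the continuous position a voxel index stands for (its center)."""
    return np.asarray(bounds_min, dtype=float) + (idx + 0.5) * voxel_size

# Assumed setup: 1 m cubic workspace, 100 voxels per axis (1 cm voxels).
target = np.array([0.123, 0.456, 0.789])
idx, voxel_size = voxelize(target, [0.0, 0.0, 0.0], [1.0, 1.0, 1.0], 100)
recovered = voxel_centers(idx, [0.0, 0.0, 0.0], voxel_size)
error = np.abs(recovered - target)  # discretization error per axis
```

Whatever the target, the per-axis error is bounded by half a voxel, so precision can only improve by using a finer (and more expensive) grid.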
Words
Language-Based Manipulation:
Skill-based Manipulation: