Scaling Up and Distilling Down

Language-Guided Robot Skill Acquisition

Huy Ha1, Pete Florence2, Shuran Song1,3
1Columbia University, 2Google DeepMind, 3Stanford University

www.cs.columbia.edu/~huy/scalingup/

Skill Learning

a large set of reusable and robust skills

plan

novel scenarios and new tasks

How can we scalably acquire robot skills?

Behavior Cloning

BC-Z, Jang et al., CoRL 2021.

Diffusion Policy, Chi et al., RSS 2023.

✅ reliably produces robust robot skills

❌ reliance on human demonstration collection

Scalable Skill Learning

Reinforcement Learning

✅ automatic data collection and policy learning

❌ exploration is hard in sparse-reward, long-horizon tasks

How can we get a Unified Framework for Scalable Skill Learning?

Language-guided Robot Data Generation
&
Language-conditioned Robot Policy Learning

Scale up Language-labelled Robot Data Generation

✅ rich manipulation skills

✅ flexibility to novel tasks & domains

Sampling-based planners generate diverse robot behavior

(succeeds some of the time)

Verify-and-Retry

# Success condition for the task; `is_on_top_of` is a helper the slides do not define.
def bus_is_balanced_on_the_block(state) -> bool:
    return is_on_top_of("bus", "block")

✅ Increases success rate

✅ Demonstrates retrying behavior
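
The verify-and-retry idea can be sketched roughly as below. This is a minimal illustration and not the authors' implementation: the dictionary `State`, the `sample_and_execute` stand-in for a sampling-based planner, the `p_success` and `max_attempts` parameters, and the body of `is_on_top_of` are all assumptions made for the example. Unlike the slide's snippet, `is_on_top_of` here takes the state explicitly so the sketch is self-contained.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical state representation: maps an object name to what it rests on.
State = Dict[str, str]


def is_on_top_of(state: State, top: str, bottom: str) -> bool:
    """Hypothetical helper: True if `top` currently rests on `bottom`."""
    return state.get(top) == bottom


def bus_is_balanced_on_the_block(state: State) -> bool:
    """Success condition, mirroring the function shown on the slide."""
    return is_on_top_of(state, "bus", "block")


def sample_and_execute(state: State, rng: random.Random,
                       p_success: float = 0.4) -> State:
    """Stand-in for a sampling-based planner plus execution.

    It succeeds only some of the time, which is what makes retrying useful.
    """
    new_state = dict(state)
    new_state["bus"] = "block" if rng.random() < p_success else "table"
    return new_state


def verify_and_retry(state: State,
                     success_fn: Callable[[State], bool],
                     rng: random.Random,
                     max_attempts: int = 5) -> Tuple[State, List[bool]]:
    """Attempt the task, verify the outcome, and retry on failure.

    Returns the final state and a per-attempt log of verification results.
    """
    log: List[bool] = []
    for _ in range(max_attempts):
        state = sample_and_execute(state, rng)
        succeeded = success_fn(state)
        log.append(succeeded)
        if succeeded:
            break  # verified success: stop retrying
    return state, log


if __name__ == "__main__":
    final_state, attempts = verify_and_retry(
        {"bus": "table"}, bus_is_balanced_on_the_block, random.Random(0)
    )
    print(attempts, final_state)
```

Because each attempt is verified, the log contains failed attempts followed by at most one success, so trajectories recorded this way include examples of recovering after a failure.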

Distill Down to a Language-conditioned Visuo-motor Policy

Evaluation

High Entropy, Precise Actions


Distilled Common Sense


Data Generation Results

Distilled Retrying Behavior

Distilled Policy Results

High Distilled Performance from Diverse Attempts

Verify & Retry
🚀 High data generation success rate
🔄 Retry after failure
+
Diverse Attempts
🎲 Sampling-based planners
🧠 Diffusion Policy

Real-world Deployment With No Fine-tuning

A Framework for Scalable Skill Learning

The Language-guided Automation Recipe

Language Models
✅ high-level flexibility

guide

External Tools
✅ low-level heavy-lifting

Google Bard, Google Blog

OpenAI GPT-4, Unite.AI

A Robot Skill Learning Workflow

Language-guided Robot Learning

Language to Rewards for Robotic Skill Synthesis, Yu et al.

TidyBot, Wu et al., IROS 2023

Infinigen, Raistrick et al., CVPR 2023

How can we put robotics on the same scaling trends as large language models without compromising on robust manipulation and control?

Future Work

Limitations

  1. Sim2Real. Online adaptation to novel visual & physical domains.
  2. Trajectory Generation. More embodiments and tasks.
  3. Asset/Environment Design. Procedural & learned approaches.

Opportunities

  1. Policy Scalability. A trade-off between inductive biases and expressivity.
  2. Data Investigation. Fix the algorithm, control data generation.

Domain Randomization

Full Sim Results

Policy Ablations

Language Model Ablations

Error Analysis