Below are the abstracts and links to the papers:
Best System Paper Award
FlingBot: The Unreasonable Effectiveness of Dynamic Manipulation for Cloth Unfolding
Huy Ha (Columbia University) and Shuran Song (Columbia University)
High-velocity dynamic actions (e.g., fling or throw) play a crucial role in our everyday interaction with deformable objects by improving our efficiency and effectively expanding our physical reach range. Yet, most prior works have tackled cloth manipulation using exclusively single-arm quasi-static actions, which require a large number of interactions for challenging initial cloth configurations and strictly limit the maximum cloth size to the robot’s reach range. In this work, we demonstrate the effectiveness of dynamic flinging actions for cloth unfolding with our proposed self-supervised learning framework, FlingBot. Our approach learns how to unfold a piece of fabric from arbitrary initial configurations using a pick, stretch, and fling primitive for a dual-arm setup from visual observations. The final system achieves over 80% coverage within 3 actions on novel cloths, can unfold cloths larger than the system’s reach range, and generalizes to T-shirts despite being trained only on rectangular cloths. We also fine-tuned FlingBot on a real-world dual-arm robot platform, where it increased cloth coverage more than 4 times as much as the quasi-static baseline did. The simplicity of FlingBot, combined with its superior performance over quasi-static baselines, demonstrates the effectiveness of dynamic actions for deformable object manipulation.
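The coverage figure quoted in the abstract can be made concrete with a small sketch. It assumes that coverage means the cloth area visible in a top-down binary mask divided by the area of the fully flattened cloth. That definition, the function name `coverage`, and the toy mask are illustrative assumptions, not taken from the FlingBot code.

```python
def coverage(mask, flattened_area):
    """Fraction of the flattened cloth area visible in a top-down binary mask.

    mask: list of rows, each a list of 0/1 values (1 = cloth pixel).
    flattened_area: pixel area of the same cloth when fully unfolded
                    (assumed to be known in advance).
    """
    if flattened_area <= 0:
        raise ValueError("flattened_area must be positive")
    # Count the pixels currently occupied by cloth.
    visible = sum(1 for row in mask for px in row if px)
    return visible / flattened_area


# Hypothetical example: a crumpled cloth occupies 6 pixels,
# while its flat footprint would occupy 12 pixels.
crumpled = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
print(coverage(crumpled, flattened_area=12))  # prints 0.5
```

Under this assumed definition, "over 80% coverage" would correspond to a return value above 0.8.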
Toward Robots That Learn To Summarize Their Actions In Natural Language: A Set Of Tasks
Chad DeChant (Columbia University) and Daniel Bauer (Columbia University)
Robots should be able to report in natural language what they have done. They should provide concise summaries, respond to questions about them, and be able to learn from the natural language responses they receive to their summaries. We propose that developing the capabilities for robots to summarize their actions is a new and necessary challenge that should be taken up by the robotic learning community. We propose an initial framework for robot action summarization, presented as a set of tasks that can serve as a target for research and a measure of progress.
The Boombox: Visual Reconstruction from Acoustic Vibrations
Boyuan Chen (Columbia University), Mia Chiquier (Columbia University), Hod Lipson (Columbia University), and Carl Vondrick (Columbia University)
Interacting with bins and containers is a fundamental task in robotics, making state estimation of the objects inside the bin critical. While robots often use cameras for state estimation, the visual modality is not always ideal due to occlusions and poor illumination. We introduce The Boombox, a container that uses sound to estimate the state of the contents inside a box. Based on the observation that the collision between objects and their containers will cause an acoustic vibration, we present a convolutional network for learning to reconstruct visual scenes. Although we use low-cost and low-power contact microphones to detect the vibrations, our results show that learning from multimodal data enables state estimation from affordable audio sensors. Due to the many ways that robots use containers, we believe the box will have a number of applications in robotics.