BridgeData V2

Unverified
Last updated
Unknown
Release date
January 17, 2024
Size
60,096 samples | 441.0 GB
License
CC BY 4.0
Tags
robot learning
imitation learning
reinforcement learning
multi-task
multi-environment
object manipulation
environment manipulation
generalizable learning

BridgeData V2 is a diverse, large-scale robotic manipulation dataset containing 60,096 trajectories collected across 24 environments on a publicly available low-cost robot. Of these trajectories, 50,365 are teleoperated demonstrations across 13 skills and 9,731 are rollouts from a scripted, heavily randomized pick-and-place policy (intended to boost the robustness of the object repositioning skill). The dataset is compatible with open-vocabulary, multi-task learning methods conditioned on goal images or natural language instructions.

To support broad generalization, data was collected for a wide range of tasks in many environments with varying objects, camera poses, and workspace positions. Each trajectory is labeled with a natural language instruction corresponding to the task the robot is performing.
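Because every trajectory ends in a completed task and carries a language label, goal-image conditioning can be set up with simple hindsight relabeling. The sketch below shows one way a training pair might be assembled; the record keys (observations, actions, language_instruction) are hypothetical placeholders, not the dataset's actual schema.

import random

def make_goal_conditioned_pair(trajectory):
    """Assemble one (observation, goal, action) example from a trajectory.

    Assumes a hypothetical record layout:
      trajectory["observations"]: list of per-timestep camera images
      trajectory["actions"]:      list of per-timestep action vectors
    The final frame serves as the goal image (hindsight relabeling).
    """
    t = random.randrange(len(trajectory["actions"]))
    return {
        "observation": trajectory["observations"][t],
        "goal_image": trajectory["observations"][-1],
        "language_instruction": trajectory.get("language_instruction"),
        "action": trajectory["actions"][t],
    }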

The authors provide the teleoperated demonstration data and the data from the scripted pick-and-place policy as separate zip files. They also provide example model training code and pre-trained weights on their website.

All of the data was collected on a WidowX 250 6-DOF robot arm, with demonstrations gathered by teleoperating the robot with a VR controller. The control frequency is 5 Hz, and the average trajectory length is 38 timesteps. For sensing, the setup uses an RGBD camera fixed in an over-the-shoulder view, two RGB cameras whose poses are randomized during data collection, and an RGB camera attached to the robot's wrist. Images are saved at 640×480 resolution.
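For concreteness, a single 5 Hz timestep from this sensor suite might be represented as below. This is a minimal sketch with hypothetical field names, not the dataset's on-disk schema.

from dataclasses import dataclass
import numpy as np

@dataclass
class Timestep:
    """One 5 Hz control step (hypothetical field names)."""
    shoulder_rgb: np.ndarray    # (480, 640, 3), fixed over-the-shoulder RGBD camera
    shoulder_depth: np.ndarray  # (480, 640), depth channel of the same camera
    side_rgb: list[np.ndarray]  # two (480, 640, 3) views, poses randomized per scene
    wrist_rgb: np.ndarray       # (480, 640, 3), wrist-mounted camera
    action: np.ndarray          # commanded end-effector motion for this step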

Data collection is credited to Abraham Lee, Mia Galatis, Caroline Johnson, Christian Aviña, Samantha Huang, and Nicholas Lofrese. Microsoft Research assisted in labeling parts of the data with language. Research was supported by the TPU Research Cloud and partly supported by ONR N00014-20-1-2383 and NSF IIS-2150826.

BridgeData V2

Modality
trajectory
Format
JPEG
Annotation
Content
Task description
Type
Natural language
Language
English
Annotators
Crowdsourced
Quality control
None
Source
Author
Homer Walke
Kevin Black
Frederik Ebert
Aviral Kumar
Anikait Singh
Yanlai Yang
Patrick Yin
Gengchen Yan
Kuan Fang
Ashvin Nair
Tony Zhao
Quan Vuong
Chongyi Zheng
Philippe Hansen-Estruch
Andre He
Vivek Myers
Moo Jin Kim
Max Du
Karl Schmeckpeper
Bernadette Bucher
Georgios Georgakis
Kostas Daniilidis
Chelsea Finn
Sergey Levine
Institution
University of California, Berkeley
Stanford University
Google DeepMind
Carnegie Mellon University
Contact
homer_walke@berkeley.edu

Citation

@inproceedings{walke2023bridgedata,
  author = {Walke, Homer and Black, Kevin and Lee, Abraham and Kim, Moo Jin and Du, Max and Zheng, Chongyi and Zhao, Tony and Hansen-Estruch, Philippe and Vuong, Quan and He, Andre and Myers, Vivek and Fang, Kuan and Finn, Chelsea and Levine, Sergey},
  booktitle = {Conference on Robot Learning (CoRL)},
  title = {BridgeData V2: A Dataset for Robot Learning at Scale},
  year = {2023}
}

Example usage
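A minimal sketch of reading one unzipped trajectory with PIL, assuming a layout of images0/im_<t>.jpg frames plus a lang.txt instruction file per trajectory folder; this layout is an assumption, so check the authors' example code for the authoritative loaders.

from pathlib import Path
from PIL import Image

def load_trajectory(traj_dir):
    """Load all frames and the language label from one trajectory folder.

    Assumes traj_dir contains images0/im_<t>.jpg and, optionally, lang.txt;
    verify this layout against the actual download.
    """
    traj_dir = Path(traj_dir)
    frames = [
        Image.open(path)
        for path in sorted(
            traj_dir.glob("images0/im_*.jpg"),
            key=lambda p: int(p.stem.split("_")[-1]),
        )
    ]
    lang_file = traj_dir / "lang.txt"
    instruction = lang_file.read_text().strip() if lang_file.exists() else None
    return frames, instruction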

Clione is an open repository for transparent dataset sourcing, supporting responsible research in robotics and machine learning.
Our mission is to make finding and understanding datasets easy and intuitive.
