RoboSet is a large-scale multi-task dataset collected across a range of everyday household activities in kitchen scenes. It comprises multiple datasets, including human-collected data and expert data acquired via trained policies. RoboSet was designed to facilitate pre-training and offline-learning research with RoboHive, a unified framework for robot learning introduced by the authors.
The real-world human-collected data contains a mix of kinesthetic and teleoperated demonstrations. The data is organized into semantic activities, each defined by a natural language instruction and consisting of 4-6 tasks; every demonstration includes four camera views per frame, and the scene is varied between demonstrations. Each task involves executing a distinct skill on an object in the scene, with 12 skills in total represented across 38 tasks.

The dataset is structured as trajectories that capture the essential information at each time step: observations, actions, rewards, RGB frames from multiple camera views, and other relevant environment information.

Kinesthetic data was collected by playing back a demonstration trajectory in a new scene, obtained by rearranging objects before every rollout. The teleoperated data was collected using an Oculus Quest 2 controller: the teleoperator guided the robot through the task, which ensured that each rollout was unique. At least some of the human data was collected through teleoperation with PUPPET, in which a human operator uses an HTC Vive headset and controller to drive the robot in end-effector space. The authors subsequently replay and parse the trajectories in each target environment to collect task-relevant information. The human trajectories in RoboSet are mostly successful. Overall, the human datasets contain 28,500 trajectories: 9,500 collected through teleoperation and 19,000 through kinesthetic playback.
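The per-timestep trajectory layout described above can be sketched as a small data structure. This is a minimal, hypothetical illustration: the field names ("observations", "actions", "rewards"), the camera names, and the validation logic are assumptions for exposition, not RoboSet's actual on-disk schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Trajectory:
    """Hypothetical sketch of a RoboSet-style trajectory (names are illustrative)."""
    observations: List[List[float]]   # proprioceptive state per time step
    actions: List[List[float]]        # commanded action per time step
    rewards: List[float]              # scalar reward per time step
    rgb: Dict[str, List[bytes]]       # camera name -> per-step RGB frames
    env_info: Dict[str, str] = field(default_factory=dict)  # other env information

    def __len__(self) -> int:
        return len(self.actions)

    def validate(self) -> None:
        """Check that every stream has exactly one entry per time step."""
        T = len(self)
        assert len(self.observations) == T and len(self.rewards) == T
        for cam, frames in self.rgb.items():
            assert len(frames) == T, f"camera {cam}: {len(frames)} frames, expected {T}"

# Toy example: a 3-step trajectory with four camera views.
traj = Trajectory(
    observations=[[0.0]] * 3,
    actions=[[0.1]] * 3,
    rewards=[0.0, 0.0, 1.0],
    rgb={cam: [b""] * 3 for cam in ("left", "right", "top", "wrist")},
)
traj.validate()
print(len(traj))  # 3
```

The point of the sketch is simply that observations, actions, rewards, and all camera streams are time-aligned, one entry per step.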
For the expert datasets, the authors train a task-specific NPG policy for the target task and roll out 25 trajectories in the environment for each of three different trained agents. The expert datasets also contain failure trajectories.
Some of the data is adapted from other datasets, such as the human trajectories collected in the DAPG project. The authors replay the original trajectories in RoboHive’s corresponding environments, which lets them reuse datasets from prior work while adding information that wasn’t contained in the original dataset, such as RGB observations.
To use RoboSet, users download the dataset for their task of interest and then train policies on it with their preferred method. To evaluate a trained policy, users are encouraged to perform 25 rollouts in the environment; results can then be compared against the authors' baseline experiments.
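The 25-rollout evaluation protocol above can be sketched as a simple harness. The `env`/`policy` interfaces and the `"solved"` info key here are illustrative assumptions (not RoboHive's actual API); the dummy environment exists only to exercise the harness.

```python
def evaluate(policy, env, n_rollouts: int = 25, horizon: int = 100) -> float:
    """Return the fraction of rollouts reported as successful (hypothetical interface)."""
    successes = 0
    for _ in range(n_rollouts):
        obs = env.reset()
        solved = False
        for _ in range(horizon):
            obs, reward, done, info = env.step(policy(obs))
            solved = solved or info.get("solved", False)  # assumed success signal
            if done:
                break
        successes += int(solved)
    return successes / n_rollouts

# Dummy environment/policy used only to demonstrate the harness.
class DummyEnv:
    def reset(self):
        self.t = 0
        return 0.0
    def step(self, action):
        self.t += 1
        done = self.t >= 5
        return 0.0, 0.0, done, {"solved": done}

rate = evaluate(lambda obs: 0.0, DummyEnv())
print(rate)  # 1.0
```

Averaging success over a fixed number of rollouts (25 here, matching the authors' recommendation) keeps reported numbers comparable across methods.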
@inproceedings{RoboHive,
title = {RoboHive -- A Unified Framework for Robot Learning},
author = {Vikash Kumar and Rutav Shah and Gaoyue Zhou and Vincent Moens and Vittorio Caggiano and Jay Vakil and Abhishek Gupta and Aravind Rajeswaran},
booktitle = {NeurIPS: Conference on Neural Information Processing Systems},
year = {2023},
url = {https://sites.google.com/view/robohive},
eprint = {https://arxiv.org/abs/2310.06828}
}