We're building a robot data engine that turns real-world experiences into structured training data for our foundation models. You will architect and maintain the data platform powering our robot learning stack, ensuring high-quality fleet data is captured, synchronized, labeled, and available for large-scale training.
Responsibilities
- Design and maintain ETL pipelines to collect, synchronize, and process data from distributed robot fleets.
- Implement intelligent triggers to capture the most informative episodes for learning.
- Develop multi-modal data storage and query systems for video, audio, proprioception, and action data.
- Automate annotation and labeling pipelines using AI-assisted tools.
- Integrate on-device logging with cloud pipelines for seamless dataset creation.
- Provide training-ready datasets to autonomy teams and monitor data quality at scale.