We are looking for a Data Engineer to help create large-scale datasets that power the next generation of generative models.
Requirements
- Develop and maintain scalable infrastructure for large-scale image and video data acquisition
- Manage and coordinate data transfers from various licensing partners
- Implement and deploy state-of-the-art ML models for data cleaning, processing, and preparation
- Implement scalable and efficient tools to visualize, cluster, and deeply understand the data
- Optimize and parallelize data processing workflows to handle billion-scale datasets efficiently
- Ensure data quality, diversity, and proper annotation (including captioning) for training readiness
- Convert training data from alternative sources, such as user preferences, into a trainable format
- Work closely within the model development loop, updating data as the training trajectory requires
Benefits
- Generous Paid Time Off
- 401(k) Matching
- Retirement Plan