We are building Protege to solve the biggest unmet need in AI: getting access to the right training data. We're seeking a Senior Member of the Core Data Team / Principal Scientist to lead the evaluation and optimization of large-scale datasets used to train state-of-the-art AI models.
Requirements
- Design and apply statistical and machine learning methods to curate, filter, and enrich large-scale unstructured datasets
- Develop frameworks to assess data diversity, duplication, and informativeness
- Collaborate with model training teams to identify data bottlenecks and optimize dataset performance
- Provide leadership on data quality strategy and shape internal best practices
- Evaluate external datasets for integration, focusing on scalability, quality, and relevance to model performance
- Contribute to research and development of tools that automate data preprocessing and validation
Benefits
- Generous Paid Time Off
- 401(k) Matching
- Retirement Plan
- Visa Sponsorship
- Four-Day Work Week
- Generous Parental Leave
- Tuition Reimbursement
- Relocation Assistance