As a Senior Software Engineer on the ML Training Platform team, you will take ownership of major projects within the platform, creating reliable, extensible solutions for data transformations, distributed model training, and rapid experimentation in production. You will collaborate closely with ML Engineers, Platform & Infra engineers, and partner teams to ensure our platform supports high-volume, GPU-accelerated training in a fast-evolving environment.
Requirements
- 6+ years of industry experience in software engineering, with a deep understanding of distributed systems and data-intensive ML pipelines in production.
- Hands-On ML Platform/Infra Experience – Familiar with modern machine learning stacks (e.g., PyTorch, LightGBM, TensorFlow), with experience building or maintaining large-scale training environments.
- Strong CS Fundamentals – Excel at crafting solutions that handle scale, complexity, and reliability challenges.
- Proven Project Ownership – Can break down complex initiatives, estimate accurately, and deliver major projects with minimal oversight.
- Collaboration & Communication – Adept at partnering across functions, setting expectations, and ensuring alignment among diverse stakeholders.
- Thrive on Continuous Improvement – Proactively identify gaps, reduce technical debt, and optimize resource usage, balancing cost and performance.
Benefits
- 401(k) plan with employer matching
- 16 weeks of paid parental leave
- Wellness benefits
- Commuter benefits match
- Paid time off and paid sick leave in compliance with applicable laws
- Medical, dental, and vision benefits
- 11 paid holidays
- Disability and basic life insurance
- Family-forming assistance
- Mental health program