Anthropic is seeking a Staff Infrastructure Engineer to join our Pre-training team, responsible for developing the next generation of large language models. In this role, you will work at the intersection of cutting-edge research and practical engineering, contributing to the development of safe, steerable, and trustworthy AI systems.
Requirements
- 7+ years of professional experience, excluding internships
- Strong software engineering skills with experience in building distributed systems
- Expertise in Python
- Hands-on experience with distributed computing frameworks, particularly Apache Spark
- Deep understanding of cloud computing platforms and distributed systems architecture
- Experience with high-throughput, fault-tolerant system design
- Strong background in performance optimization and system scaling
- Excellent problem-solving skills and attention to detail
- Strong communication skills and ability to work in a collaborative environment
- Advanced degree in Computer Science or related field
- Experience with language model training infrastructure
- Strong background in distributed systems and parallel computing
- Expertise in tokenization algorithms and techniques
- Experience building high-throughput, fault-tolerant systems
- Deep knowledge of monitoring and observability practices
- Experience with infrastructure-as-code and configuration management
- Background in MLOps or ML infrastructure
Benefits
- Competitive compensation and benefits
- Optional equity donation matching
- Generous vacation and parental leave
- Flexible working hours
- Lovely office space