Altimate AI is a fast-growing AI startup revolutionizing enterprise data operations. We're looking for a Site Reliability Engineer to join our founding team and architect our infrastructure strategy from the ground up. In this role, you'll collaborate with our software, AI, and data engineering teams to build robust, secure, high-performance infrastructure.
Requirements
- 8+ years of hands-on experience deploying, managing, and scaling applications on Kubernetes in production environments
- Strong experience with major cloud providers (AWS, Google Cloud, and/or Azure)
- Proficiency in setting up and maintaining CI/CD pipelines using modern tools (e.g., Jenkins, GitHub Actions, GitLab CI/CD, Argo CD)
- Extensive experience with Infrastructure as Code tools such as Terraform, Ansible, or Pulumi
- Strong scripting and automation skills in Bash, Python, and/or Go
- Deep knowledge of observability, logging, and monitoring tools (Prometheus, Grafana, ELK/EFK stack, Datadog, etc.)
- Proven experience leading DevOps initiatives and building out scalable, reliable infrastructure (especially in a startup or greenfield environment)
- Excellent troubleshooting skills with experience in performance tuning, failure analysis, and incident management for high-availability, mission-critical systems
- Strong verbal and written communication skills
Benefits
- Competitive salary
- Meaningful equity stake
- Access to cutting-edge AI infrastructure and resources
- Regular team off-sites and opportunities to attend and present at leading industry conferences
- Learning budget for AI courses, books, and computing resources
- Dynamic and intellectually stimulating work environment with a team of talented engineers
- Opportunities to shape the direction of the company and leave a lasting impact