Altimate AI is a fast-growing AI startup revolutionizing enterprise data operations. We're looking for a Site Reliability Engineer to join our founding team and architect our infrastructure strategy from the ground up. As a key member of our team, you'll collaborate with software, AI, and data engineering teams to build a robust, secure, and high-performance infrastructure.

Requirements

8+ years of hands-on experience deploying, managing, and scaling applications on Kubernetes in production environments
Strong experience with major cloud providers (AWS, Google Cloud, and/or Azure)
Proficiency in setting up and maintaining CI/CD pipelines using modern tools (e.g. Jenkins, GitHub Actions, GitLab CI/CD, Argo CD)
Extensive experience with Infrastructure as Code tools such as Terraform, Ansible, or Pulumi
Strong scripting and automation skills in Bash, Python, and/or Go
Deep knowledge of observability and logging/monitoring frameworks (Prometheus, Grafana, ELK/EFK stack, Datadog, etc.)
Proven experience leading DevOps initiatives and building out scalable, reliable infrastructure (especially in a startup or greenfield environment)
Excellent troubleshooting skills with experience in performance tuning, failure analysis, and incident management for high-availability, mission-critical systems
Strong verbal and written communication skills

Benefits

Competitive salary
Meaningful equity stake
Access to cutting-edge AI infrastructure and resources
Regular team off-sites and opportunities to attend and present at leading industry conferences
Learning budget for AI courses, books, and computing resources
Dynamic and intellectually stimulating work environment with a team of talented engineers
Opportunities to shape the direction of the company and leave a lasting impact




Job Details

About Altimate.ai