We are seeking a Staff Site Reliability Engineer to play a critical role in building and scaling the infrastructure behind ServiceTitan’s new AI platform. The ideal candidate combines technical depth with strategic thinking and brings expertise in Azure, Terraform, Kubernetes, and modern infrastructure-as-code (IaC) and container orchestration best practices.

Requirements

Lead the design, implementation, and optimization of scalable, resilient infrastructure for cloud-native AI services on Azure.
Establish true continuous delivery (CD) pipelines supporting blue-green deployments, automatic rollbacks, and progressive delivery patterns.
Champion observability excellence: define best practices for metrics, tracing, and logging, and help product teams design meaningful SLIs, SLOs, and error budgets.
Drive automation across the entire lifecycle: infrastructure provisioning, testing, deployment, and recovery.
Partner with the engineering team to design reliable, fault-tolerant services and perform resilience and capacity reviews.
Mentor engineers and foster a reliability culture across teams — enabling others to build self-healing, observable systems.

Benefits

Flextime, recognition, and support for autonomous work
Holistic health and wellness benefits
Support for Titans at all stages of life

Job Details

About ServiceTitan