As a System Reliability Engineer (SRE), you’ll own the reliability, scalability, and security posture of the platforms that power our agentic workflows. You’ll build the guardrails and operational foundations that let product and AI teams ship quickly without sacrificing uptime, observability, or customer trust.

Requirements

4+ years in SRE/DevOps/Infrastructure roles supporting production systems with meaningful uptime requirements
AWS Expertise: Strong hands-on experience operating workloads in AWS (IAM, VPC/networking, compute, storage, monitoring, and security controls)
Solid understanding of distributed systems failure modes (timeouts, retries, cascading failures), and how to design for resilience
Strong incident leadership instincts; comfortable being the calm, methodical driver during outages
Automation Mindset: You automate first—repeatable environments, scripted operations, and minimal manual toil
Clear Communicator: Can write crisp runbooks, postmortems, and technical proposals; able to align engineering, product, and ops on priorities
Proven ability to improve security posture and reliability without blocking delivery

Benefits

Equity & Ownership: Competitive equity so you grow alongside the company
Impact & Visibility: Direct access to co-founders; your work directly improves customer trust and operational outcomes
Collaborative Culture: Tight-knit team of seasoned operators and AI experts
Flexible Work: Hybrid with core Bay Area presence and remote flexibility

System Reliability Engineer

Job Details

About BackOps.ai