We are seeking a Site Reliability Engineer (SRE) to own the availability, resilience, and operational readiness of our cloud-native platform running on AWS. In this role, you will ensure that our systems are designed to tolerate failure, recover quickly, and support safe, continuous delivery.
Requirements
- Strong production experience operating systems on AWS
- Hands-on experience with containerized workloads on ECS Fargate
- Proven experience owning system reliability, availability, and recovery
- Experience designing and executing disaster recovery tests and failover simulations
- Experience participating in or leading incident response
- Strong understanding of CI/CD, release engineering, and deployment strategies
- Hands-on experience with CloudFormation or equivalent infrastructure-as-code tools
- Experience working with Bitbucket or similar source control systems
- Familiarity with managed databases, Kafka, and OpenSearch
- Strong scripting and automation skills (e.g., Python, Bash)
Benefits
- Competitive salary and stock option plan (subject to approval)
- 5 weeks of PTO
- 5 days of sick leave
- Multisport card
- Flexible work hours and a hybrid work setup
- Professional growth and development opportunities
- Global, collaborative, and inclusive company culture