We are looking for a Staff Site Reliability Engineer focused on Developer Experience to design, build, and maintain high-performance, scalable, and reliable services. We believe in a DevOps philosophy in which every engineering team is responsible for the software it builds and deploys.
Responsibilities
- Ensure high availability, performance, and scalability of mission-critical systems and services.
- Lead the design and implementation of resilient and fault-tolerant infrastructure.
- Drive incident response, root cause analysis, and postmortem culture.
- Mentor others in incident practices.
- Write and maintain operational documentation, runbooks, and architecture diagrams.
- Drive and promote protocols on production readiness and operational excellence.
- Own and evolve infrastructure automation using Terraform or similar tools to minimize manual intervention.
- Help automate infrastructure provisioning and other engineering processes by developing automations on top of an engineering platform built on GitHub Actions.
- Build internal platforms, tools, and frameworks to improve developer productivity and service reliability.
- Work closely with software engineers, platform teams, and product managers to align on company goals.
- Coach and up-skill other engineering team members.
- Plan for growth of Talkdesk's infrastructure.
Benefits
- Competitive salary and benefits package
- Opportunity to work with a leading cloud contact center provider
- Chance to be part of a dynamic and innovative team
- Professional growth and development opportunities
- Flexible work arrangements
- Employee recognition and reward programs