We are seeking a highly experienced Site Reliability Engineer to lead technical initiatives, architect and implement advanced solutions, and provide leadership in incident management. The ideal candidate has extensive experience in SRE, platform engineering, or software development with a strong operational focus.
Requirements
- At least 5 years of experience in SRE, platform engineering, or software development with a strong operational focus.
- Demonstrated experience in providing technical leadership, guidance, or mentorship to engineering teams.
- Expert-level practical knowledge of cloud platforms, especially GCP.
- Deep hands-on experience with container orchestration (Kubernetes) and with infrastructure-as-code and deployment tooling (Terraform, Helm, ArgoCD).
- Strong command of multiple scripting and programming languages (Python, Go, Bash).
- Proven expertise in building and leveraging advanced monitoring and observability tools (Prometheus, Grafana, ELK stack).
- Exceptional analytical, problem-solving, and debugging skills at a senior level.
- Excellent communication, collaboration, and influencing skills.
- Strong organizational and leadership skills are a big plus.
Benefits