We are seeking a skilled and motivated Site Reliability Engineer to join our team in Trimble’s Core Cloud Platform. The ideal candidate will have a strong background in cloud platforms, infrastructure as code, and automation via programming/scripting languages.
Responsibilities
- Develop and maintain infrastructure as code (IaC) using Terraform to ensure reliable and scalable cloud environments
- Implement and enhance observability solutions using tools like New Relic, Datadog, Sumo Logic, and Splunk for monitoring, logging, and alerting
- Perform code deployments and manage CI/CD pipelines using Jenkins, GitHub, and related tooling to ensure smooth and efficient delivery processes
- Automate routine tasks and workflows to increase operational efficiency and reduce manual intervention
- Evaluate system designs and architectures for reliability, performance, security, and efficiency, ensuring best practices are followed
- Lead incident response efforts, conduct root cause analysis, and implement long-term solutions for complex issues
- Develop and maintain comprehensive runbooks and procedures for incident response and operational tasks
- Collaborate with cross-functional teams to review and provide feedback on technical designs, ensuring alignment with SRE principles
- Participate in on-call rotations and handle critical incidents with confidence and expertise
- Continuously improve documentation for systems and services, contributing to a knowledge-sharing culture within the team
Benefits
- 401k Matching
- Retirement Plan
- Visa Sponsorship