We're seeking a Site Reliability Engineer (SRE) to join one of our Scrum teams and help ensure the reliability, scalability, and performance of the Florence platform. As an SRE, you'll work closely with product engineers while actively leveraging AI to improve observability, incident response, automation, and overall platform reliability.

Requirements

Be an embedded member of a Scrum team, participating in planning, refinement, reviews, and retrospectives
Use AI-powered tools to enhance system reliability, operational efficiency, and developer productivity
Design, build, and operate reliable, scalable cloud infrastructure supporting platform and product services
Apply AI-assisted analysis to monitoring, alerting, and observability data to detect, predict, and prevent incidents
Define and maintain SLOs, SLIs, and error budgets to guide reliability decisions
Collaborate with software engineers to embed reliability and AI-driven automation into the software development lifecycle
Lead and participate in incident response, root cause analysis, and postmortems, leveraging AI insights where appropriate
Automate operational tasks and reduce toil through AI-enabled and traditional automation approaches
Contribute to disaster recovery planning, testing, and operational readiness
Produce and maintain documentation such as runbooks, operational guides, and system diagrams
Contribute code as a secondary responsibility, with coding assignments focused on building reliability tooling, automation, and integrations using AI-assisted development practices

Benefits

Competitive compensation package
Medical and dental insurance
Office space in the heart of the city

Job Details

About Florence Healthcare - US