Associate Site Reliability Engineer

Remote · USA · Full-time

Optum is a global organization that delivers care, aided by technology, to help millions of people live healthier lives. The Associate Site Reliability Engineer will design, develop, and deploy AI-powered solutions, maintain Kubernetes-based infrastructure, and collaborate with developers to automate workflows. This role is essential for ensuring high availability and reliability of applications.

Responsibilities

Design, develop, and deploy AI-powered solutions using no-code, low-code, and advanced platforms, translating business needs into scalable applications that enhance products, workflows, and decision-making
Design, deploy, and maintain Kubernetes-based infrastructure to ensure high availability and scalability of applications
Build and manage CI/CD pipelines using GitHub Actions to enable fast and reliable deployments
Use Terraform to provision and manage infrastructure in Google Cloud Platform (GCP)
Manage and optimize Apache Kafka-based systems to ensure reliable message streaming and data processing
Monitor and improve system performance and reliability using Prometheus and Grafana
Collaborate with developers to automate workflows and implement best practices for infrastructure-as-code (IaC)
Write Python scripts for automation and tooling to enhance operational efficiency
Troubleshoot and resolve system issues to minimize downtime and impact on users
Participate in on-call rotations and incident response to ensure high service reliability

Skills

1+ years of experience with Google Cloud Platform (GCP) services such as Compute Engine, Kubernetes Engine, and Cloud Storage
1+ years of hands-on experience with Kubernetes for deploying and managing containerized applications
1+ years of experience with GitHub Actions for creating and maintaining CI/CD pipelines
1+ years of experience with Python for scripting, automation, and tooling
1+ years of experience with Apache Kafka for building, maintaining, and troubleshooting message-driven systems
1+ years of experience using Prometheus and Grafana for monitoring and observability
Basic knowledge of Terraform for infrastructure provisioning and management
Familiarity with other cloud providers (e.g., AWS or Azure)
Knowledge of Helm for Kubernetes package management
Experience with debugging and optimizing distributed systems
Exposure to security best practices for cloud infrastructure
Knowledge of Java for developing and troubleshooting backend systems
Familiarity with DataHub or similar data cataloging and metadata management platforms
Understanding of Artificial Intelligence (AI) concepts and tools, such as building or managing machine learning pipelines, integrating AI models, or working with ML platforms like TensorFlow, PyTorch, or Vertex AI
Experience with Golang for developing infrastructure tools or cloud-native applications

Benefits

A comprehensive benefits package
Incentive and recognition programs
Equity stock purchase
401k contribution (all benefits are subject to eligibility requirements)

Company Overview

Optum is a healthcare company that provides pharmacy services, health care operations, and population health management. It is part of UnitedHealth Group. It was founded in 2011 and is headquartered in Eden Prairie, Minnesota, USA, with a workforce of more than 10,000 employees. Its website is https://www.optum.com/.
