Senior DevOps Engineer
Title: Senior DevOps Engineer
Reporting to: Senior Director, Product Development
Location: Bengaluru (Bangalore)
Responsibilities
Infrastructure Development & Integration
- Design, implement, and manage cloud-native infrastructure (AWS, Azure, GCP) to support healthcare platforms, AI agents, and clinical applications.
- Build and maintain scalable CI/CD pipelines to enable rapid and reliable delivery of software, data pipelines, and AI/ML models.
- Design and manage Kubernetes (K8s) clusters for container orchestration, workload scaling, and high availability, with integrated monitoring to ensure cluster health and performance.
- Implement Kubernetes-native tools (Helm, Kustomize, ArgoCD) for deployment automation and environment management, ensuring observability through monitoring dashboards and alerts.
- Collaborate with Staff Engineers/Architects to align infrastructure with enterprise goals for scalability, reliability, and performance, leveraging monitoring insights to inform architectural decisions.
System Optimization & Reliability
- Implement and maintain comprehensive monitoring, logging, and alerting mechanisms (Prometheus, Grafana, ELK, AWS CloudWatch, AWS CloudTrail) to ensure real-time visibility into system performance, resource utilization, and potential incidents, supporting system reliability and proactive incident response.
- Ensure data pipeline workflows (ETL/ELT, real-time streaming, batch processing) are observable, reliable, and auditable.
- Support observability and monitoring of GenAI pipelines, embeddings, vector databases, and agentic AI workflows.
- Proactively analyze monitoring data to identify bottlenecks, predict failures, and drive continuous improvement in system reliability.
Compliance & Security
- Support audit trails and compliance reporting through automated DevSecOps practices.
- Implement security controls for LLM-based applications, AI agents, and healthcare data pipelines, including prompt injection prevention, API rate limiting, and data governance.
Collaboration & Agile Practices
- Partner closely with software engineers, data engineers, AI/ML engineers, and product managers to deliver integrated, secure, and scalable solutions.
- Contribute to agile development processes, including sprint planning, stand-ups, and retrospectives.
- Mentor junior engineers and share best practices in cloud-native infrastructure, CI/CD, Kubernetes, and automation.
Innovation & Technical Expertise
- Stay informed about emerging DevOps practices, cloud-native architectures, MLOps/LLMOps, and data engineering tools.
- Prototype and evaluate new frameworks and tools to enhance infrastructure for data pipelines, GenAI, and Agentic AI applications.
- Advocate for best practices in infrastructure design, focusing on modularity, maintainability, and scalability.
Education & Experience
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical discipline.
- 6+ years of experience in DevOps, Site Reliability Engineering, or related roles, with at least 5 years building cloud-native infrastructure.
- Proven track record of managing production-grade Kubernetes clusters and cloud infrastructure in regulated environments.
- Experience supporting GenAI/LLM applications and vector databases (e.g., Weaviate, FAISS).
- Hands-on experience supporting data pipeline products using ETL/ELT frameworks (Apache Airflow, dbt, Prefect) and streaming systems (Kafka, Spark, Flink).
- Experience deploying AI agents and orchestrating agent workflows in production environments.
Technical Proficiency
- Expertise in Kubernetes (K8s) for orchestration, scaling, and managing containerized applications.
- Strong proficiency in containerization (Docker) and Kubernetes ecosystem tools (Helm, ArgoCD, Istio/Linkerd for service mesh).
- Hands-on experience with Infrastructure as Code (Terraform, CloudFormation, or a comparable tool).
- Proficiency with CI/CD tools (Jenkins, GitHub Actions, GitLab CI, ArgoCD, Spinnaker).
- Familiarity with monitoring and observability tools (Prometheus, Grafana, ELK, AWS CloudWatch, and AWS CloudTrail), including setting up dashboards, alerts, and custom metrics for cloud-native and AI systems.
- Good to have: knowledge of healthcare data standards (FHIR, HL7) and secure deployment practices for AI/ML and data pipelines.
Professional Skills
- Strong problem-solving skills with a focus on reliability, scalability, and security.
- Excellent collaboration and communication skills across cross-functional teams.
- Proactive, detail-oriented, and committed to technical excellence in a fast-paced healthcare environment.
About reputed company
Now part of the SAI Group family, reputed company is redefining digital patient engagement by putting patients in control of their personalized healthcare journeys, both inside and outside the hospital. reputed company combines high-tech AI navigation with high-touch care experiences, driving patient activation, loyalty, and outcomes while reducing the cost of care. For almost 25 years, reputed company has served more than 10 million patients per year across over 1,000 hospitals and clinical partner sites, working to use longitudinal data analytics to better serve patients and clinicians. AI innovator SAI Group, led by Chairman Romesh Wadhwani, is the lead growth investor in reputed company. reputed company’s award-winning solutions were recognized again in 2024 by KLAS Research and AVIA Marketplace. Learn more at reputed company and follow us on LinkedIn and Twitter.
reputed company is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status.
About SAI Group
- SAIGroup commits $1 billion in capital, an advanced AI platform that currently processes 300M+ patients, and a global employee base of 4,000+ to solve enterprise AI and high-priority healthcare problems. SAIGroup - Growing companies with advanced AI; https://www.cnbc.com/2023/12/08/75-year-old-tech-mogul-betting-1-billion-of-his-fortune-on-ai-future.html
- Bio of our Chairman Dr. Romesh Wadhwani: Team - SAIGroup (Informal at Romesh Wadhwani - Wikipedia)
- TIME Magazine recently recognized Chairman Romesh Wadhwani as one of the Top 100 AI leaders in the world - Romesh and Sunil Wadhwani: The 100 Most Influential People in AI 2023 | TIME