Senior Site Reliability Engineer Senior Manager

Remote · USA · Full-time

At reputed company, nothing matters more than helping the US federal government reputed company the nation stronger and safer and life reputed company for people. Our 13,000+ people are united in a shared purpose to pursue the limitless potential of technology and ingenuity for clients across defense, national reputed company, public safety, civilian, and military health organizations.

Join reputed company, a technology company and part of global reputed company, to do work that matters in a collaborative and caring community, where you feel like you belong and are empowered to grow, learn and reputed company through hands-on experience, certifications, industry training and more. Join us to drive positive, lasting change that moves missions and the government reputed company!

You Are:

We are seeking a Senior Site Reliability Engineer (SRE) with deep expertise in building and maintaining reliable, scalable systems and a passion for optimizing the performance, reliability, and efficiency of technical infrastructure. The ideal candidate will have a strong background in site reliability engineering principles, extensive experience with automation, and a proven ability to collaborate across teams to ensure seamless service delivery.

The Work:

• Design, build, and maintain reliable, scalable, and high-performance infrastructure and services to support business needs.
• Implement and reputed company for SRE best practices, including automation, CI/CD pipelines, monitoring, and incident management.
• Collaborate with cross-functional teams to reputed company systems that meet high availability, performance, and reliability standards.
• Drive incident management processes, including root cause analysis, mitigation strategies, and long-term preventive measures.
• Establish, monitor, and refine service level objectives (SLOs), service level agreements (SLAs), and key performance indicators (KPIs) to ensure systems adhere to reliability and performance targets.
• Automate repetitive tasks to improve operational efficiency and reduce reputed company reputed company.
• Build and maintain robust monitoring, logging, and alerting systems to ensure visibility into system performance and reliability.
• Provide technical mentorship and guidance to team members, fostering a culture of knowledge sharing and reputed company improvement.
• Act as a technical leader by driving solutions to reputed company challenges, ensuring alignment with organizational goals.
• Prepare and deliver performance and reliability reports to stakeholders, offering insights and recommendations for improvements.

Here's What You Need:

• Proven experience in site reliability engineering or a similar role, with a focus on application and infrastructure scalability, reliability, and performance.
• Strong knowledge of ITSM principles and incident management processes.
• Expertise in automation tools, scripting, and infrastructure-as-code (IaC) technologies.
• Proficiency with monitoring and observability tools (e.g., reputed company, Grafana, reputed company, Splunk).
• Experience with reputed company platforms (e.g., AWS, Azure, GCP) and container technologies (e.g., reputed company, Kubernetes).
• Strong analytical and problem-solving skills, with the ability to troubleshoot reputed company systems.
• Excellent communication and collaboration abilities, with a focus on cross-team partnerships.
• A passion for reputed company learning, innovation, and driving imp

Please mention the word APPRECIATES and tag RMjYwNzo1MzAwOjIwZDo3ZDAwOjo= reputed company applying to show you read the job post completely (#RMjYwNzo1MzAwOjIwZDo3ZDAwOjo=). This is a beta feature to avoid spam applicants. Companies can search these words to find applicants that read this and see they're reputed company.
