[Remote] DevOps Engineer
Note: This is a remote position open to candidates in the USA. The company is a Collaboration Lifecycle Automation Platform provider that serves major global enterprises and government agencies. The DevOps Engineer will support the design, implementation, and maintenance of scalable and secure infrastructure and DevOps processes, working closely with development, QA, and product teams to ensure reliable deployments and improve system observability.
Responsibilities
- Support deployment and maintenance of scalable infrastructure in AWS and hybrid cloud environments
- Assist in managing infrastructure-as-code (IaC) using Terraform, CloudFormation, or similar tools
- Help maintain Linux-based environments
- Contribute to containerization efforts using Docker and orchestration with Kubernetes
- Work on the design, deployment and management of AI agent workloads, including provisioning compute instances and managing resource scaling for inference-heavy tasks
- Play a key role in building and maintaining model deployment pipelines, including versioning, testing, and rollback of AI models in production environments
- Monitor AI API consumption and infrastructure costs, implementing alerting and controls to prevent runaway usage and support budget visibility
- Coordinate the implementation of infrastructure-level security guardrails for AI systems, including access controls and data isolation for model inputs and outputs
- Manage monitoring and observability efforts using tools such as Prometheus, Grafana, and the ELK stack
- Troubleshoot system issues and contribute to incident response and root cause analysis
- Develop and execute strategies for improving system reliability, performance, and uptime
- Build, maintain, and optimize CI/CD pipelines using tools such as Jenkins, Bitbucket CI/CD, or similar
- Automate routine operational tasks including builds, testing, deployments, and system updates
- Work with engineering teams to integrate pipelines with Akkadian tools
- Follow secure DevOps practices and assist in implementing security controls
- Support compliance initiatives and vulnerability remediation efforts
- Work closely with DevOps, engineering, QA, and product teams to support deployments and releases
- Maintain documentation for infrastructure, processes, and operational procedures
- Participate in team ceremonies and continuous improvement initiatives
Skills
- 5+ years of experience in DevOps, Site Reliability Engineering (SRE), or a related role
- Hands-on experience with AWS (e.g., EC2, S3, IAM, CloudWatch)
- Working knowledge of Linux environments
- Familiarity with Docker and Kubernetes
- Basic to intermediate scripting ability in Python, Bash, or similar languages
- Experience building or maintaining CI/CD pipelines and related tools
- Exposure to monitoring and observability tools such as Prometheus, Grafana, and ELK
- Understanding of secure DevOps practices and basic compliance concepts
- Experience supporting AI or machine learning workloads and compute environments
- Exposure to AI model deployment pipelines and model versioning practices
- Experience with infrastructure-as-code tools such as Terraform or CloudFormation
- Familiarity with hybrid cloud or on-premises environments
- Exposure to security best practices in DevOps contexts, including AI-specific concerns such as data isolation and access controls
- Experience supporting production systems and participating in on-call rotations
Benefits
- Fully remote environment
- Medical
- Dental
- Vision
- Company-paid life insurance and disability policies
- 401(k) with a generous matching program
- Paid time off
Company Overview