What You'll Do
Design and implement solutions that enhance application reliability, performance, scalability, and resilience.
Build and maintain monitoring, alerting, observability, and telemetry to drive proactive detection and incident analysis.
Lead incident management efforts, perform root cause analysis, and implement corrective improvements based on the findings.
Implement operational workflows using scripting, Infrastructure as Code (IaC), and configuration management tools.
Manage capacity planning, performance, and scaling to forecast demand and optimize infrastructure.
Collaborate with engineering teams to embed operability, resilience, and security into application architectures.
Build and automate reliable deployments through CI/CD pipelines, release governance, and version control systems.
Maintain clear runbooks, architecture diagrams, and operational documentation that enable efficient production support.
Experience Required
Managing Kubernetes and containerized workloads (EKS, AKS, GKE), including scaling, networking, upgrades, and monitoring.
Experience with cloud platforms (AWS, Azure, or Google Cloud Platform) across compute, storage, networking, IAM, and cost governance.
Using observability and APM tools such as Dynatrace, Splunk, Prometheus, Grafana, Datadog, Elastic/ELK.
Strengthening security and compliance controls in regulated environments (e.g., PCI DSS, SOC 2), including secure management of workloads.
Infrastructure automation experience using Terraform, CloudFormation, Ansible, or similar tools.
Designing and maintaining CI/CD pipelines using Jenkins, GitLab CI, GitHub Actions, or Azure DevOps.
Scripting and automation using Bash, PowerShell, or Python.
Background in electric utility, engineering, or military-related environments (preferred).
Good to Have
Certifications such as AWS SysOps Administrator, AWS DevOps Engineer, Google Cloud DevOps Engineer, or CKA.
Experience with legacy applications, IBM iSeries, and/or library systems.
Hands-on database operations and performance tuning (Oracle, SQL Server, PostgreSQL).
Prior experience as a major incident commander, stakeholder communicator, or ops lead/coordinator.
Experience with ITIL and ServiceNow (change, incident, and configuration management).