What You'll Do
Design and implement solutions that enhance application reliability, performance, scalability, and resilience.
Build and maintain monitoring, alerting, observability, and telemetry to drive proactive detection and incident analysis.
Lead incident management efforts, perform root cause analysis, and implement action-based improvements.
Implement operational workflows using scripting, IaC, and configuration management tools.
Manage capacity, performance, and scaling solutions to forecast demand and optimize infrastructure.
Collaborate with engineering teams to embed operability, resilience, and security into application architectures.
Build and automate reliable deployments through CI/CD pipelines, release governance, and version control systems.
Maintain clear runbooks, architecture diagrams, and operational documentation that enable efficient production support.
Experience Required
Managing Kubernetes and containerized workloads (EKS, AKS, GKE), including scaling, networking, upgrades, and monitoring.
Experience with cloud platforms (AWS, Azure, or Google Cloud Platform) across compute, storage, networking, IAM, and cost governance.
Using observability and APM tools such as Dynatrace, Splunk, Prometheus, Grafana, Datadog, Elastic/ELK.
Strengthening security and compliance controls in regulated environments (e.g., PCI DSS, SOC 2), including secure management of workloads.
Infrastructure automation experience using Terraform, CloudFormation, Ansible, or similar tools.
Designing and maintaining CI/CD pipelines using Jenkins, GitLab CI, GitHub Actions, or Azure DevOps.
Scripting and automation using Bash, PowerShell, or Python.
Background in electricity, engineering, or military-related environments (preferred).
Good to Have
Certifications such as AWS SysOps Administrator, AWS DevOps Engineer, Google Cloud DevOps Engineer, or CKA.
Experience with legacy applications, IBM iSeries, and/or library systems.
Hands-on database operations and performance tuning (Oracle, SQL Server, PostgreSQL).
Prior experience as a major incident commander, stakeholder communicator, or ops lead/coordinator.
Experience with ITIL and ServiceNow (change, incident, and configuration management).