Responsibilities
Own reliability, availability, scalability, and security of production systems
Design and operate highly available, fault‑tolerant, multi‑region cloud architectures
Define and manage SLOs, SLIs, SLAs, and error budgets for critical services
Lead high‑severity incidents and drive effective post‑incident reviews
Improve MTTD and MTTR through automation, tooling, and runbooks
Operate and evolve Kubernetes (EKS) platforms and multi‑tenant deployments
Work with Infrastructure‑as‑Code (Terraform, CloudFormation, Pulumi) at scale
Build and improve CI/CD pipelines and deployment safeguards
Design and maintain observability (metrics, logs, traces, alerting)
Drive capacity planning, performance optimisation, and cloud cost efficiency
Partner with Security & Compliance on SOC 2, ISO 27001, GDPR, and DORA controls
Mentor SREs and influence reliability‑first engineering practices across teams

Qualifications
6+ years in SRE, DevOps, or cloud infrastructure roles (2+ years in a senior/lead capacity)
Strong AWS experience (EKS, RDS/Aurora, S3, MSK, VPC, IAM, ALB/NLB)
Deep Kubernetes operational expertise
Proven incident management and post‑mortem leadership
Solid experience with IaC, CI/CD, and automation
Strong scripting or programming skills (Python, Go, Bash)
Hands‑on observability experience (Prometheus, Grafana, Datadog, ELK, OpenTelemetry)
Excellent communication and cross‑team collaboration skills

Senior Site Reliability Engineer

YELLOSA
