Our client is looking for a Senior-Level DevOps Engineer to join their engineering team. This role is suited to a highly experienced, hands‑on, and technically strong DevOps professional with deep cloud infrastructure expertise and a passion for building and maintaining scalable, high‑availability production environments.

The successful candidate will take ownership of complex multi‑cloud infrastructure, lead deployment and monitoring strategies, support mission‑critical production systems, and collaborate closely with development, QA, and engineering teams to ensure reliable, secure, and efficient platform operations across global environments.

Key Responsibilities
- Design, implement, maintain, and optimise highly available multi‑cloud infrastructure environments across AWS and supporting cloud platforms
- Manage and scale production workloads across multiple AWS regions with a strong focus on uptime, reliability, and security
- Build, maintain, and improve Infrastructure‑as‑Code using Terraform across Development, Testing, and Production environments
- Design and maintain CI/CD pipelines using Jenkins and deployment orchestration tools such as Spinnaker, ArgoCD, or Harness
- Implement and manage Blue/Green and Red/Black deployment strategies, including rollback and artifact promotion processes
- Administer and optimise AWS RDS/Aurora MySQL environments, including upgrades, migrations, backups, restores, and performance tuning
- Manage and monitor messaging systems such as RabbitMQ, including scaling consumers and load balancing using HAProxy and Nginx
- Monitor infrastructure health using Prometheus, Grafana, ELK Stack, and related monitoring tools
- Troubleshoot complex production issues, conduct root cause analysis, and lead post‑mortem investigations to reduce MTTR
- Perform advanced Linux administration, Bash scripting, networking troubleshooting, and performance optimisation
- Collaborate with software engineers and product teams to ensure smooth deployments and operational excellence
- Contribute to infrastructure architecture, technical strategy, scalability planning, and cost optimisation initiatives
- Participate in on‑call rotations and act as an escalation point during production incidents
- Maintain and improve AI/ML infrastructure pipelines, GPU workloads, and distributed processing environments where applicable

Requirements
- 5+ years’ hands‑on DevOps and cloud infrastructure experience
- Advanced AWS experience managing production systems across multiple regions
- Strong expertise with:
  - EC2
  - VPC design, routing, peering, and ACLs
  - IAM roles and policies
  - S3 and CloudFront
  - Security Groups
- Extensive Terraform experience, including:
  - Remote state management
  - Environment separation
  - Infrastructure code reviews and refactoring
- Strong Jenkins pipeline creation and CI/CD automation experience
- Experience with deployment orchestration tools such as Spinnaker, ArgoCD, or Harness
- Experience implementing Blue/Green or Red/Black deployment methodologies
- Production upgrades and migrations
- Backup and restore procedures
- Performance tuning
- Experience with HAProxy and Nginx load balancing
- Strong monitoring and logging experience using Prometheus, Grafana, ELK Stack, or equivalent
- Proven production incident response and on‑call support experience
- Strong Bash scripting and troubleshooting capabilities
- Comfortable supporting and debugging:
  - PHP applications
  - Python automation and AI integrations
- Experience with Docker or containerised environments
- Exposure to multi‑cloud infrastructure environments (AWS and GCP preferred)
- Experience operating high‑availability systems with 24/7 uptime requirements
- Exposure to AI/ML infrastructure, GPU workloads, or video/media processing systems (advantageous)
- AWS certifications (advantageous)

Technical & Professional Skills
- Advanced AWS cloud infrastructure management
- Strong Terraform and Infrastructure‑as‑Code expertise
- CI/CD pipeline architecture and deployment automation
- Database administration and performance optimisation
- Monitoring, observability, and incident response management
- Linux systems administration and troubleshooting
- Messaging systems and distributed architecture support
- Infrastructure scalability and cost optimisation
- Strong networking and load balancing knowledge
- Experience supporting AI/ML and high‑throughput environments
- Multi‑cloud platform exposure and operational support

Preferred Qualifications
- Tertiary qualification in Computer Science, Information Technology, Engineering, or a related field
- Relevant AWS, DevOps, or cloud certifications
- Experience working in fast‑paced Agile or product‑based environments
- Experience operating large‑scale, customer‑facing production systems
- Strong analytical and troubleshooting abilities
- High attention to detail and operational excellence
- Strong communication and collaboration skills
- Ability to work effectively under pressure in high‑availability environments
- Strong ownership mentality and accountability
- Ability to lead during critical production incidents
- Passion for automation, scalability, and continuous improvement
- Strong mentoring and knowledge‑sharing approach
Senior DevOps Engineer
NETWORK RECRUITMENT
City of Johannesburg Metropolitan Municipality
Published 4 days ago