The successful candidate will take ownership of complex multi-cloud infrastructure, lead deployment and monitoring strategies, support mission-critical production systems, and collaborate closely with development, QA, and engineering teams to ensure reliable, secure, and efficient platform operations across global environments.

Key Responsibilities:
- Design, implement, maintain, and optimise highly available multi-cloud infrastructure environments across AWS and supporting cloud platforms
- Manage and scale production workloads across multiple AWS regions with a strong focus on uptime, reliability, and security
- Build, maintain, and improve Infrastructure-as-Code using Terraform across Development, Testing, and Production environments
- Design and maintain CI/CD pipelines using Jenkins and deployment orchestration tools, such as Spinnaker, ArgoCD, or Harness
- Implement and manage Blue/Green and Red/Black deployment strategies, including rollback and artifact promotion processes
- Administer and optimise AWS RDS/Aurora MySQL environments, including upgrades, migrations, backups, restores, and performance tuning
- Manage and monitor messaging systems, such as RabbitMQ, including scaling consumers and load balancing using HAProxy and Nginx
- Monitor infrastructure health using Prometheus, Grafana, ELK Stack, and related monitoring tools
- Troubleshoot complex production issues, conduct root cause analysis, and lead post-mortem investigations to reduce MTTR
- Perform advanced Linux administration, Bash scripting, networking troubleshooting, and performance optimisation
- Support platform deployments and debugging across PHP, Python, and JavaScript-based services
- Collaborate with software engineers and product teams to ensure smooth deployments and operational excellence
- Contribute to infrastructure architecture, technical strategy, scalability planning, and cost optimisation initiatives
- Participate in on-call rotations and act as an escalation point during production incidents
- Maintain and improve AI/ML infrastructure pipelines, GPU workloads, and distributed processing environments where applicable

Requirements:
- 5+ years hands-on DevOps and cloud infrastructure experience
- Advanced AWS experience managing production systems across multiple regions
- Strong expertise with:
  - EC2
  - RDS/Aurora (MySQL)
  - VPC design, routing, peering, and ACLs
  - IAM roles and policies
  - S3 and CloudFront
  - Security Groups
- Extensive Terraform experience, including:
  - Modular infrastructure design
  - Remote state management
  - Environment separation
  - Infrastructure code reviews and refactoring
- Strong Jenkins pipeline creation and CI/CD automation experience
- Experience with deployment orchestration tools, such as Spinnaker, ArgoCD, or Harness
- Experience implementing Blue/Green or Red/Black deployment methodologies
- Strong MySQL database administration experience, including:
  - Production upgrades and migrations
  - Backup and restore procedures
  - Performance tuning
- Proven RabbitMQ production support experience
- Experience with HAProxy and Nginx load balancing
- Strong monitoring and logging experience using Prometheus, Grafana, ELK Stack, or equivalent
- Proven production incident response and on-call support experience
- Advanced Linux administration skills (Ubuntu CLI)
- Strong Bash scripting and troubleshooting capabilities
- Solid networking fundamentals
- Comfortable supporting and debugging:
  - PHP applications
  - Python automation and AI integrations
  - JavaScript-based deployment environments
- Experience with Docker or containerised environments
- Exposure to multi-cloud infrastructure environments (AWS and GCP preferred)
- Experience operating high-availability systems with 24/7 uptime requirements
- Exposure to AI/ML infrastructure, GPU workloads, or video/media processing systems (advantageous)
- AWS certifications (advantageous)

Technical & Professional Skills:
- Advanced AWS cloud infrastructure management
- Strong Terraform and Infrastructure-as-Code expertise
- CI/CD pipeline architecture and deployment automation
- Database administration and performance optimisation
- Monitoring, observability, and incident response management
- Linux systems administration and troubleshooting
- Messaging systems and distributed architecture support
- Infrastructure scalability and cost optimisation
- Strong networking and load balancing knowledge
- Experience supporting AI/ML and high-throughput environments
- Multi-cloud platform exposure and operational support

Preferred Qualifications:
- Tertiary qualification in Computer Science, Information Technology, Engineering, or a related field
- Relevant AWS, DevOps, or cloud certifications
- Experience working in fast-paced Agile or product-based environments
- Experience operating large-scale, customer-facing production systems

Key Competencies:
- Strong analytical and troubleshooting abilities
- High attention to detail and operational excellence
- Strong communication and collaboration skills
- Ability to work effectively under pressure in high-availability environments
- Strong ownership mentality and accountability
- Proactive, solution-driven mindset
- Ability to lead during critical production incidents
- Passion for automation, scalability, and continuous improvement
- Strong mentoring and knowledge-sharing approach

For more exciting IT vacancies, visit:
Senior DevOps Engineer
NETWORK RECRUITMENT
Johannesburg, Johannesburg
Published 11 days ago