Position Specifications

Attribute               Details
Position Title          Senior Site Reliability Engineer (RHEL Specialist)
Primary Location        Remote
Minimum Experience      5+ Years in Systems Engineering, DevOps, or SRE roles
Reporting Structure     Reports to the Head of Infrastructure & Platform Engineering
Language Requirements   Portuguese and English

1.3 Role Summary

The Senior Site Reliability Engineer (RHEL Specialist) is a critical technical leadership role responsible for ensuring that our production environments are resilient, performant, and highly automated. Unlike traditional systems administration, this role treats infrastructure as a software problem. You will be the primary custodian of our Red Hat Enterprise Linux (RHEL) ecosystem, applying advanced engineering practices to manage thousands of nodes across on-premise virtualization and public cloud platforms.

Your mission is to bridge the gap between software development and systems operations by designing self-healing systems and robust Ansible-based automation frameworks. You will be expected to proactively identify system inefficiencies, optimize kernel performance, and architect CI/CD pipelines that empower development teams while maintaining strict production stability.

Core Mission Statement: To engineer a world-class RHEL environment where manual intervention is the exception, not the rule. Through advanced automation and deep observability, you will ensure our services achieve 99.99% availability while enabling rapid, low-risk software delivery.

1.4 Ideal Candidate Profile

The ideal candidate is a proactive problem-solver with a "software-first" approach to infrastructure. We are looking for an individual who:

- Possesses deep expertise in the RHEL kernel, system internals, and performance tuning.
- Views Ansible and Python as their primary tools for managing complexity at scale.
- Demonstrates a proven track record of managing Docker and Kubernetes workloads in high-traffic production settings.
- Is naturally curious and proactive, often identifying and resolving system bottlenecks before they trigger an alert.
- Thrives in a collaborative DevOps culture and is comfortable navigating the complexities of hybrid-cloud environments (AWS, Azure, or GCP).

2. Key Responsibilities

The Senior Site Reliability Engineer (RHEL Specialist) is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of our enterprise Linux services. This role demands a unique blend of systems engineering expertise and software development skills to build and run large-scale, distributed, fault-tolerant systems.

2.1 Automation & Infrastructure Orchestration

- Ansible Framework Design: Architect, implement, and maintain enterprise-grade automation solutions using Ansible for our Red Hat Enterprise Linux (RHEL) fleet. This includes developing custom Ansible roles, modules, and playbooks to automate system provisioning, configuration management, and patching.
- Standard Operating Environment (SOE): Maintain and evolve the RHEL SOE across hybrid-cloud environments, ensuring consistency between on-premise virtualization (VMware/KVM) and public cloud instances.
- Infrastructure as Code (IaC): Transform manual infrastructure workflows into automated code-based processes, ensuring that every component of the RHEL environment is version-controlled and reproducible.

2.2 Development & Toil Reduction

- Scripting & Tooling: Develop advanced scripts in Python and Bash to automate repetitive operational tasks (toil). You will be expected to build internal tools that enhance the productivity of the entire engineering organization.
- System Integration: Write code to integrate infrastructure components with internal APIs, monitoring tools, and
Senior Site Reliability Engineer
FUTURE FIT
Published 14 days ago