Key Responsibilities
- Contribute to the global design and implementation of scalable and fault-tolerant infrastructure systems that support engineering and operational needs.
- Contribute to the deployment, configuration, and maintenance of distributed storage and database systems.
- Analyse system failures, performance issues, and misconfigurations across hardware, software, and network layers.
- Lead and mentor the computer systems engineers and contribute to strategic technical planning.

Qualification
- BTech in Computer Science, Software Engineering, Information Systems, Electronic Engineering or equivalent qualifications coupled with 13 years' experience.
- BEng/MTech in Computer Science, Software Engineering, Information Systems, Electronic Engineering or equivalent qualifications coupled with 9 years' experience.
- MEng in Computer Science, Software Engineering, Information Systems, Electronic Engineering or equivalent qualifications coupled with 7 years' experience.
- PhD in Computer Science, Software Engineering, Information Systems, Electronic Engineering or equivalent qualifications coupled with 5 years' experience.
- 3+ years in a technical leadership or software/system architectural role with direct responsibility for large-/platform-scale distributed systems.
- Demonstrated hands-on experience in infrastructure design and automation, distributed systems, observability, CI/CD, container orchestration (e.g. Kubernetes), DevOps/SRE practices and cloud-native technologies.
- Experience leading teams or initiatives that intersect with data platforms, storage, networking, and systems engineering domains.

Knowledge
- In-depth understanding of systems engineering principles, including performance optimisation, fault tolerance, and resource scheduling in Linux-based environments.
- Strong knowledge of containerised environments (Docker, Podman), orchestration platforms (Kubernetes, Helm), and runtime architectures (containerd, CRI).
- Expertise in infrastructure-as-code, continuous integration/deployment (CI/CD), and configuration management tools (e.g. GitLab CI, Ansible, Terraform, ArgoCD).
- Advanced understanding of distributed computing and storage architectures, including Ceph, S3, NFS, and local/clustered file systems.
- Operational and architectural fluency in relational and NoSQL database systems (e.g. PostgreSQL, MySQL, MongoDB), including replication, backups, and performance tuning.
- Working knowledge of networking fundamentals, security protocols, and systems-level observability (e.g. Prometheus, Grafana, ELK/EFK stack).
- Familiarity with the HPC ecosystem (e.g. SLURM, job schedulers) is beneficial for environments supporting scientific or research computing.

Please call us on
NB: Should you not hear from us within 6 weeks, please consider your application unsuccessful.
Senior Compute Systems Engineer
THE HIRING HOUSE
Remote
Published 10 days ago