Essential Skills
Strong hands-on experience with Python (3.x) and PySpark for large-scale data processing
At least 3 years working with AWS services such as Athena, Glue, Lambda, S3, and ECS
Experience with NoSQL (DynamoDB) and relational databases (Oracle/PostgreSQL), including advanced Oracle SQL
Proven experience with Oracle Cloud Infrastructure (OCI) services
Expertise in data formats and schema design, including Parquet, Avro, JSON, XML, and CSV
Solid experience building ETL pipelines using AWS Glue or similar tools
Experience with Docker and containerisation (Kubernetes/OpenShift advantageous)
Strong scripting skills (Bash, PowerShell) and familiarity with Linux/Unix environments
Hands-on experience with data quality frameworks and validation techniques
Familiarity with DevOps practices, including Terraform/CloudFormation, CI/CD pipelines, Git, and Jenkins

Role & Responsibilities
Design, build, and maintain scalable data pipelines and ETL workflows
Develop Python and PySpark applications for data transformation at scale
Implement and optimise data lakes and data warehouses on cloud platforms
Ensure data quality, integrity, and consistency through testing and validation
Translate business requirements into technical data models and specifications
Review and propose solution architectures and design alternatives
Manage and support cloud infrastructure and CI/CD pipelines
Produce and maintain technical documentation, runbooks, and artefacts
Support production environments, including monitoring, troubleshooting, and incident management
Collaborate with BI teams to optimise data for tools such as Tableau and Business Objects
Mentor and support team members, contributing to knowledge sharing and capability building
Actively participate in Agile ceremonies and continuous improvement initiatives

Must Haves (Non-Negotiable)
Minimum 3-5 years of experience as a Data Engineer
Proven expertise in Python, PySpark, and cloud-based data engineering (AWS and/or OCI)
Strong experience in ETL development and data pipeline design
Solid understanding of data modelling and schema design (non-low-code approaches)
Hands-on experience with both relational and NoSQL databases
Experience with cloud infrastructure, CI/CD pipelines, and DevOps practices
Ability to work in cross-functional teams and deliver in Agile environments

Advantageous Skills
Experience with Kafka, AWS Kinesis, or other streaming data platforms
Knowledge of AWS Redshift, EMR, and other analytical/warehouse solutions
Familiarity with enterprise cloud data frameworks (e.g., BMW Cloud Data Hub or similar)
Experience with Java/JEE and application servers
Exposure to monitoring tools such as CloudWatch and Grafana
AWS certifications (e.g., AWS Certified Cloud Practitioner)
Experience building and integrating REST APIs
Experience with MongoDB or other NoSQL technologies
Understanding of BI schema design and reporting optimisation

Qualifications
Relevant degree in IT, Computer Science, Engineering, or equivalent practical experience
3-5 years of hands-on experience in data engineering roles
Cloud certifications such as AWS or Oracle Cloud certifications are highly desirable
Advanced degrees or specialised data engineering certifications are advantageous

Data Engineer (Senior) 1053
OPEN SOURCE (PTY) LTD
Menlyn, Menlyn
Published 1 day ago