What you will be doing:
- Design, build, and maintain scalable data pipelines and lakehouse structures
- Deliver data solutions supporting analytics, BI, machine learning, and Generative AI applications
- Apply enterprise data lake and lakehouse principles to ensure solutions are reliable, secure, governed, and fit for downstream consumption
- Translate business and analytical requirements into production-ready data solutions
- Build and operate solutions using Databricks, including Delta Lake, Databricks Jobs & Workflows, Unity Catalog, Databricks Bundles, notebooks, and shared libraries
- Enable data consumption for GenAI use cases such as RAG, AI services, and agent workflows
- Support analytics platforms, reporting tools, and downstream operational systems
- Build data pipelines for Generative AI applications, including curated knowledge datasets, structured and semi-structured data, metadata, and lineage management
- Enable GenAI data patterns including Retrieval Augmented Generation (RAG), prompt/context preparation, and AI model input/output flows
- Work closely with AI Engineers and Product Owners to align engineering deliverables to AI and GenAI use cases
- Develop production-grade pipelines using Python, PySpark, SQL, and Apache Spark
- Implement automated testing and CI/CD practices for data engineering workloads
- Ensure data solutions are observable, resilient, performant, and cost-efficient
- Support operational stability, incident resolution, and root cause analysis
- Collaborate within Agile, cross-functional product squads alongside AI/ML engineers, analytics teams, platform teams, and security stakeholders
- Contribute to engineering reviews, standards, and design discussions
- Maintain documentation, operational runbooks, and governance compliance

What we are looking for:
- Relevant degree or diploma in Computer Science, Information Technology, Data Engineering, or a related field
- 6+ years' experience as a Senior / Lead Data Engineer
- 2+ years' hands-on experience working in Databricks environments
- Strong understanding of enterprise data lake and lakehouse architecture
- Strong proficiency in Python, SQL, and Apache Spark
- Experience building and operating production-grade data platforms
- Experience working in enterprise or regulated environments
- Strong understanding of data governance, security, and operational best practices
- Experience working in Agile, product-aligned squads
- Strong analytical and problem-solving skills
- Excellent collaboration and communication skills

Advantageous:
- Experience supporting AI, ML, or Generative AI workloads from a data engineering perspective
- Familiarity with RAG data patterns and AI-serving datasets
- Exposure to vector or embedding-ready data workflows
- Cloud-native data platform experience (AWS or Azure)
- Experience supporting analytics and AI operational workloads at scale

Please note: if you do not hear from us within 3 weeks, please consider your application unsuccessful.

Follow for the latest vacancies. Join the Psybergate Careers Channel here:
Senior Data Engineer
PSYBERGATE (PTY) LTD
Johannesburg, Johannesburg
Published 22 days ago