General Responsibilities
This role is responsible for designing, developing, maintaining, and optimizing ETL (Extract, Transform, Load) processes in Databricks for data warehousing, data lakes, and analytics.
The developer will work closely with data architects and business teams to ensure the efficient transformation and movement of data to meet business needs, including handling Change Data Capture (CDC) and streaming data.
Tools used are:
Azure Databricks, Delta Lake, Delta Live Tables, and Spark to process structured and unstructured data.
Azure Databricks / PySpark (good Python / PySpark knowledge required) to build transformations of raw data into the curated zone in the data lake (see the sketch after this list).
Azure Databricks / PySpark / SQL (good SQL knowledge required) to develop and/or troubleshoot transformations of curated data into FHIR.
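A minimal sketch of the raw-to-curated transformation described above, as it might look in PySpark on Azure Databricks. The storage paths, container names, and the claim_id business key are hypothetical placeholders, and spark refers to the SparkSession that Databricks notebooks provide.

    from pyspark.sql import functions as F

    # Hypothetical landing and curated locations in the data lake (placeholders only).
    raw_path = "abfss://raw@<storage-account>.dfs.core.windows.net/claims/"
    curated_path = "abfss://curated@<storage-account>.dfs.core.windows.net/claims/"

    # Read raw JSON files; 'spark' is the session provided by Databricks notebooks.
    raw_df = spark.read.format("json").load(raw_path)

    # Basic cleansing: deduplicate on an assumed business key, add an audit column,
    # and drop rows with no key before landing the data in the curated zone as Delta.
    curated_df = (
        raw_df
        .dropDuplicates(["claim_id"])
        .withColumn("ingested_at", F.current_timestamp())
        .filter(F.col("claim_id").isNotNull())
    )

    curated_df.write.format("delta").mode("overwrite").save(curated_path)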
Data design
o Understand the requirements. Recommend changes to models to support ETL design.
o Define primary keys, indexing strategies, and relationships that enhance data integrity and performance across layers.
o Define the initial schemas for each data layer.
o Assist with data modelling and updates of source-to-target mapping documentation.
o Document and implement schema validation rules to ensure incoming data conforms to expected formats and standards.
o Design data quality checks within the pipeline to catch inconsistencies, missing values, or errors early in the process (a sketch follows this list).
o Proactively communicate with business and IT experts on any changes required to conceptual, logical, and physical models; communicate and review timelines, dependencies, and risks.
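A minimal sketch of the schema validation and data quality checks described above, assuming a CSV source and an invented patient_id / service_date / amount schema; the paths, columns, and failure conditions are illustrative only.

    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DateType, DecimalType

    # Assumed schema for incoming files (hypothetical columns).
    expected_schema = StructType([
        StructField("patient_id", StringType(), nullable=False),
        StructField("service_date", DateType(), nullable=True),
        StructField("amount", DecimalType(18, 2), nullable=True),
    ])

    # Apply the expected schema at read time; values that do not parse become null
    # under PERMISSIVE mode and are caught by the checks below.
    df = (
        spark.read.format("csv")
        .option("header", "true")
        .schema(expected_schema)
        .option("mode", "PERMISSIVE")
        .load("/mnt/raw/claims/")  # assumed source location
    )

    # Simple data quality checks: null keys or negative amounts fail the pipeline early.
    null_keys = df.filter(F.col("patient_id").isNull()).count()
    negative_amounts = df.filter(F.col("amount") < 0).count()

    if null_keys > 0 or negative_amounts > 0:
        raise ValueError(
            f"Data quality check failed: {null_keys} null keys, {negative_amounts} negative amounts"
        )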
Development of ETL strategy and solution for different sets of data modules
o Understand the tables and relationships in the data model.
o Create low-level design documents and test cases for ETL development.
o Implement error catching, logging, and retry mechanisms, and handle data anomalies (see the sketch after this list).
o Create the workflows and pipeline design.
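A minimal sketch of the error catching, logging, and retry handling mentioned above; the step function, retry count, and delay are assumptions for illustration only.

    import logging
    import time

    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger("etl_pipeline")

    def run_with_retry(step, retries=3, delay_seconds=60):
        """Run one pipeline step, logging failures and retrying transient errors."""
        for attempt in range(1, retries + 1):
            try:
                logger.info("Starting %s (attempt %d of %d)", step.__name__, attempt, retries)
                step()
                logger.info("%s succeeded", step.__name__)
                return
            except Exception as exc:
                logger.error("%s failed on attempt %d: %s", step.__name__, attempt, exc)
                if attempt == retries:
                    raise  # surface the failure to the workflow after the final attempt
                time.sleep(delay_seconds)

    # Usage with a hypothetical load step defined elsewhere:
    # run_with_retry(load_curated_claims)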
Development and testing of data pipelines with Incremental and Full Load (a sketch follows the items below).
o Develop high-quality ETL mappings / scripts / notebooks.
o Develop and maintain pipelines from the Oracle data source to Azure Delta Lake and FHIR.
o Perform unit testing.
o Ensure performance monitoring and improvement.
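An illustrative sketch of the incremental (upsert) and full load patterns referenced above, using the Delta Lake Python API available on Databricks; the table paths and the claim_id key are assumptions.

    from delta.tables import DeltaTable

    # Incremental load: upsert staged change records into the curated Delta table.
    changes_df = spark.read.format("delta").load("/mnt/staging/claims_changes/")  # assumed CDC staging area
    target = DeltaTable.forPath(spark, "/mnt/curated/claims/")                    # assumed curated table

    (
        target.alias("t")
        .merge(changes_df.alias("s"), "t.claim_id = s.claim_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )

    # Full load: rebuild the curated table from a complete extract.
    full_df = spark.read.format("delta").load("/mnt/staging/claims_full/")
    full_df.write.format("delta").mode("overwrite").save("/mnt/curated/claims/")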
Performance review and data consistency checks
o Troubleshoot performance and ETL issues; log activity for each pipeline and transformation.
o Review and optimize overall ETL performance (see the maintenance sketch below).
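As a sketch of the kind of tuning this review typically involves, two Delta Lake maintenance commands often run when ETL performance degrades; the table name and Z-ORDER column are assumptions.

    # Compact small files and co-locate frequently filtered data (assumed table and column).
    spark.sql("OPTIMIZE curated.claims ZORDER BY (service_date)")

    # Remove files no longer referenced by the table, keeping 7 days of history.
    spark.sql("VACUUM curated.claims RETAIN 168 HOURS")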
End-to-end integrated testing for Full Load and Incremental Load
Plan for Go Live and Production Deployment.
o Create production deployment steps.
o Configure parameters and scripts for go live. Test and review the instructions.
o Create release documents and help build and deploy code across servers.
Go Live Support and Review after Go Live.
o Review the existing ETL process and tools and provide recommendations on improving performance and reducing ETL timelines.
o Review infrastructure and remediate issues for overall process improvement.
Knowledge Transfer to Ministry staff and development of documentation on the work completed.
o Document the work and share the end-to-end ETL design, troubleshooting steps, configuration, and scripts for review.
o Transfer documents and scripts to the Ministry and review them with Ministry staff.
Skills, Experience and Skill Set Requirements
Please note this role is part of a Hybrid Work Arrangement and resource(s) will be required to work a minimum of 3 days per week at the Office Location.
Must Have Skills
- 7 years using ETL tools such as Microsoft SSIS, stored procedures, and T-SQL
- 2 years with Delta Lake, Databricks, and Azure Databricks pipelines
- Strong knowledge of Delta Lake for data management and optimization.
- Familiarity with Databricks Workflows for scheduling and orchestrating tasks.
- 2 years of Python and PySpark
- Solid understanding of the Medallion Architecture (Bronze, Silver, Gold) and experience implementing it in production environments.
- Hands-on experience with CDC tools (e.g., GoldenGate) for managing real-time data.
- SQL Server, Oracle
Experience:
- 7 years of experience working with SQL Server, T-SQL, Oracle PL/SQL development, or similar relational databases
- 2 years of experience working with Azure Data Factory, Databricks, and Python development
- Experience building data ingestion and change data capture using Oracle GoldenGate
- Experience in designing, developing, and implementing ETL pipelines using Databricks and related tools to ingest, transform, and store large-scale datasets
- Experience in leveraging Databricks, Delta Lake, Delta Live Tables, and Spark to process structured and unstructured data.
- Experience building databases and data warehouses, and working with delta and full loads
- Experience with data modeling and tools, e.g., SAP PowerDesigner, Visio, or similar
- Experience working with SQL Server SSIS or other ETL tools; solid knowledge of and experience with SQL scripting
- Experience developing in an Agile environment
- Understanding of data warehouse architecture with a delta lake
- Ability to analyze, design, develop, test, and document ETL pipelines from detailed and high-level specifications, and assist in troubleshooting.
- Ability to use SQL to perform DDL tasks and complex queries
- Good knowledge of database performance optimization techniques
- Ability to assist in requirements analysis and subsequent development
- Ability to conduct unit testing and assist in test preparation to ensure data integrity
- Work closely with Designers, Business Analysts, and other Developers
- Liaise with Project Managers, Quality Assurance Analysts, and Business Intelligence Consultants
- Design and implement technical enhancements of the Data Warehouse as required.
Development, Database and ETL Experience (60 points)
- Experience in developing and managing ETL pipelines, jobs, and workflows in Databricks.
- Deep understanding of Delta Lake for building data lakes and managing ACID transactions, schema evolution, and data versioning.
- Experience automating ETL pipelines using Delta Live Tables, including handling Change Data Capture (CDC) for incremental data loads.
- Proficient in structuring data pipelines with the Medallion Architecture to scale data pipelines and ensure data quality.
- Hands-on experience developing streaming tables in Databricks using Structured Streaming and readStream to handle real-time data (a sketch follows this list).
- Expertise in integrating CDC tools like GoldenGate or Debezium for processing incremental updates and managing real-time data ingestion.
- Experience using Unity Catalog to manage data governance and access control and to ensure compliance.
- Skilled in managing clusters, jobs, autoscaling, monitoring, and performance optimization in Databricks environments.
- Knowledge of using Databricks Auto Loader for efficient batch and real-time data ingestion.
- Experience with data governance best practices, including implementing security policies, access control, and auditing with Unity Catalog.
- Proficient in creating and managing Databricks Workflows to orchestrate job dependencies and schedule tasks.
- Strong knowledge of Python, PySpark, and SQL for data manipulation and transformation.
- Experience integrating Databricks with cloud storage solutions such as Azure Blob Storage, AWS S3, or Google Cloud Storage.
- Familiarity with external orchestration tools like Azure Data Factory
- Implementing logical and physical data models
- Knowledge of FHIR is an asset
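A minimal sketch of the streaming ingestion items above (Structured Streaming, readStream, and Auto Loader), writing into an assumed Bronze Delta path; all paths and options are placeholders, and the availableNow trigger assumes a recent Databricks runtime.

    # Incrementally ingest new files with Auto Loader (cloudFiles) into a Bronze Delta table.
    stream_df = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", "/mnt/bronze/_schemas/claims")  # assumed schema tracking path
        .load("/mnt/landing/claims/")                                        # assumed landing folder
    )

    (
        stream_df.writeStream
        .format("delta")
        .option("checkpointLocation", "/mnt/bronze/_checkpoints/claims")     # assumed checkpoint path
        .outputMode("append")
        .trigger(availableNow=True)  # process available files, then stop (batch-style incremental run)
        .start("/mnt/bronze/claims/")
    )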
Design Documentation and Analysis Skills (20 points)
- Demonstrated experience in creating design documentation such as:
- Schema definitions
- Error Handling and Logging
- ETL Process Documentation
- Job Scheduling and Dependency Management
- Data Quality and Validation Checks
- Performance Optimization and Scalability Plans
- Troubleshooting Guides
- Data Lineage
- Security and Access Control Policies applied within ETL
- Experience in Fit-Gap analysis, system use case reviews, requirements reviews, coding exercises, and reviews.
- Participate in defect fixing, testing support, and development activities for ETL
- Analyze and document solution complexity and interdependencies, including providing support for data validation.
- Strong analytical skills for troubleshooting, problem-solving, and ensuring data quality.
Certifications (10 points)
Certified in one or more of the following certifications:
- Databricks Certified Data Engineer Associate
- Databricks Certified Data Engineer Professional
- Microsoft Certified: Azure Data Engineer Associate
- AWS Certified Data Analytics - Specialty
- Google Cloud Professional Data Engineer
Communication, Leadership Skills and Knowledge Transfer (10 points)
- Ability to collaborate effectively with cross-functional teams and communicate complex technical concepts to non-technical stakeholders.
- Strong problem-solving skills and experience working in an Agile or Scrum environment.
- Ability to provide technical guidance and support to other team members on Databricks best practices.
- Must have previous work experience conducting knowledge transfer sessions, ensuring that resources receive the required knowledge to support the system.
- Must develop documentation and materials as part of a review and knowledge transfer to other team members.