General Responsibilities
This role is responsible for designing, developing, maintaining, and optimizing ETL (Extract, Transform, Load) processes in Databricks for data warehousing, data lakes, and analytics.
The developer will work closely with data architects and business teams to ensure the efficient transformation and movement of data to meet business needs, including handling Change Data Capture (CDC) and streaming data.
Tools used are:
Azure Databricks, Delta Lake, Delta Live Tables, and Spark to process structured and unstructured data.
Azure Databricks / PySpark (good Python / PySpark knowledge required) to build transformations of raw data into the curated zone in the data lake (see the sketch after this list).
Azure Databricks / PySpark / SQL (good SQL knowledge required) to develop and/or troubleshoot transformations of curated data into FHIR.
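A minimal sketch of the raw-to-curated transformation described above, as it might look in PySpark on Azure Databricks. The storage paths, container names, and the claim_id business key are hypothetical placeholders, and spark refers to the SparkSession that Databricks notebooks provide.

    from pyspark.sql import functions as F

    # Hypothetical landing and curated locations in the data lake (placeholders only).
    raw_path = "abfss://raw@<storage-account>.dfs.core.windows.net/claims/"
    curated_path = "abfss://curated@<storage-account>.dfs.core.windows.net/claims/"

    # Read raw JSON files; 'spark' is the session provided by Databricks notebooks.
    raw_df = spark.read.format("json").load(raw_path)

    # Basic cleansing: deduplicate on an assumed business key, add an audit column,
    # and drop rows with no key before landing the data in the curated zone as Delta.
    curated_df = (
        raw_df
        .dropDuplicates(["claim_id"])
        .withColumn("ingested_at", F.current_timestamp())
        .filter(F.col("claim_id").isNotNull())
    )

    curated_df.write.format("delta").mode("overwrite").save(curated_path)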
Data design
o Understand the requirements. Recommend changes to models to support ETL design.
o Define primary keys, indexing strategies, and relationships that enhance data integrity and performance across layers.
o Define the initial schemas for each data layer.
o Assist with data modelling and updates of source-to-target mapping documentation.
o Document and implement schema validation rules to ensure incoming data conforms to expected formats and standards.
o Design data quality checks within the pipeline to catch inconsistencies, missing values, or errors early in the process (a sketch follows this list).
o Proactively communicate with business and IT experts on any changes required to conceptual, logical, and physical models; communicate and review timelines, dependencies, and risks.
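A minimal sketch of the schema validation and data quality checks described above, assuming a CSV source and an invented patient_id / service_date / amount schema; the paths, columns, and failure conditions are illustrative only.

    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DateType, DecimalType

    # Assumed schema for incoming files (hypothetical columns).
    expected_schema = StructType([
        StructField("patient_id", StringType(), nullable=False),
        StructField("service_date", DateType(), nullable=True),
        StructField("amount", DecimalType(18, 2), nullable=True),
    ])

    # Apply the expected schema at read time; values that do not parse become null
    # under PERMISSIVE mode and are caught by the checks below.
    df = (
        spark.read.format("csv")
        .option("header", "true")
        .schema(expected_schema)
        .option("mode", "PERMISSIVE")
        .load("/mnt/raw/claims/")  # assumed source location
    )

    # Simple data quality checks: null keys or negative amounts fail the pipeline early.
    null_keys = df.filter(F.col("patient_id").isNull()).count()
    negative_amounts = df.filter(F.col("amount") < 0).count()

    if null_keys > 0 or negative_amounts > 0:
        raise ValueError(
            f"Data quality check failed: {null_keys} null keys, {negative_amounts} negative amounts"
        )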
Development of ETL strategy and solution for different sets of data modules
o Understand the tables and relationships in the data model.
o Create low-level design documents and test cases for ETL development.
o Implement error catching, logging, and retry mechanisms, and handle data anomalies (see the sketch after this list).
o Create the workflows and pipeline design.
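A minimal sketch of the error catching, logging, and retry handling mentioned above; the step function, retry count, and delay are assumptions for illustration only.

    import logging
    import time

    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger("etl_pipeline")

    def run_with_retry(step, retries=3, delay_seconds=60):
        """Run one pipeline step, logging failures and retrying transient errors."""
        for attempt in range(1, retries + 1):
            try:
                logger.info("Starting %s (attempt %d of %d)", step.__name__, attempt, retries)
                step()
                logger.info("%s succeeded", step.__name__)
                return
            except Exception as exc:
                logger.error("%s failed on attempt %d: %s", step.__name__, attempt, exc)
                if attempt == retries:
                    raise  # surface the failure to the workflow after the final attempt
                time.sleep(delay_seconds)

    # Usage with a hypothetical load step defined elsewhere:
    # run_with_retry(load_curated_claims)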
Development and testing of data pipelines with Incremental and Full Load (a sketch follows the items below).
o Develop high-quality ETL mappings / scripts / notebooks.
o Develop and maintain pipelines from the Oracle data source to Azure Delta Lake and FHIR.
o Perform unit testing.
o Ensure performance monitoring and improvement.
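An illustrative sketch of the incremental (upsert) and full load patterns referenced above, using the Delta Lake Python API available on Databricks; the table paths and the claim_id key are assumptions.

    from delta.tables import DeltaTable

    # Incremental load: upsert staged change records into the curated Delta table.
    changes_df = spark.read.format("delta").load("/mnt/staging/claims_changes/")  # assumed CDC staging area
    target = DeltaTable.forPath(spark, "/mnt/curated/claims/")                    # assumed curated table

    (
        target.alias("t")
        .merge(changes_df.alias("s"), "t.claim_id = s.claim_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )

    # Full load: rebuild the curated table from a complete extract.
    full_df = spark.read.format("delta").load("/mnt/staging/claims_full/")
    full_df.write.format("delta").mode("overwrite").save("/mnt/curated/claims/")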
Performance review and data consistency checks
o Troubleshoot performance and ETL issues; log activity for each pipeline and transformation.
o Review and optimize overall ETL performance (see the maintenance sketch below).
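As a sketch of the kind of tuning this review typically involves, two Delta Lake maintenance commands often run when ETL performance degrades; the table name and Z-ORDER column are assumptions.

    # Compact small files and co-locate frequently filtered data (assumed table and column).
    spark.sql("OPTIMIZE curated.claims ZORDER BY (service_date)")

    # Remove files no longer referenced by the table, keeping 7 days of history.
    spark.sql("VACUUM curated.claims RETAIN 168 HOURS")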
End-to-end integrated testing for Full Load and Incremental Load
Plan for Go Live and Production Deployment.
o Create production deployment steps.
o Configure parameters and scripts for go live. Test and review the instructions.
o Create release documents and help build and deploy code across servers.
Go Live Support and Review after Go Live.
o Review the existing ETL process and tools and provide recommendations on improving performance and reducing ETL timelines.
o Review infrastructure and remediate issues for overall process improvement.
Knowledge Transfer to Ministry staff and development of documentation on the work completed.
o Document the work and share the end-to-end ETL design, troubleshooting steps, configuration, and scripts for review.
o Transfer documents and scripts to the Ministry and review them with Ministry staff.
Skills, Experience and Skill Set Requirements
Please note this role is part of a Hybrid Work Arrangement and resource(s) will be required to work a minimum of 3 days per week at the Office Location.
Must Have Skills
- 7 years using ETL tools such as Microsoft SSIS, stored procedures, and T-SQL
- 2 years with Delta Lake, Databricks, and Azure Databricks pipelines
- Strong knowledge of Delta Lake for data management and optimization.
- Familiarity with Databricks Workflows for scheduling and orchestrating tasks.
- 2 years of Python and PySpark
- Solid understanding of the Medallion Architecture (Bronze, Silver, Gold) and experience implementing it in production environments.
- Hands-on experience with CDC tools (e.g., GoldenGate) for managing real-time data.
- SQL Server, Oracle
Experience:
- 7 years of experience working with SQL Server, T-SQL, Oracle PL/SQL development, or similar relational databases
- 2 years of experience working with Azure Data Factory, Databricks, and Python development
- Experience building data ingestion and change data capture using Oracle GoldenGate
- Experience in designing, developing, and implementing ETL pipelines using Databricks and related tools to ingest, transform, and store large-scale datasets
- Experience in leveraging Databricks, Delta Lake, Delta Live Tables, and Spark to process structured and unstructured data.
- Experience building databases and data warehouses, and working with delta and full loads
- Experience with data modeling and tools, e.g., SAP PowerDesigner, Visio, or similar
- Experience working with SQL Server SSIS or other ETL tools; solid knowledge of and experience with SQL scripting
- Experience developing in an Agile environment
- Understanding of data warehouse architecture with a delta lake
- Ability to analyze, design, develop, test, and document ETL pipelines from detailed and high-level specifications, and assist in troubleshooting.
- Ability to use SQL to perform DDL tasks and complex queries
- Good knowledge of database performance optimization techniques
- Ability to assist in requirements analysis and subsequent development
- Ability to conduct unit testing and assist in test preparation to ensure data integrity
- Work closely with Designers, Business Analysts, and other Developers
- Liaise with Project Managers, Quality Assurance Analysts, and Business Intelligence Consultants
- Design and implement technical enhancements of the Data Warehouse as required.
Development, Database and ETL Experience (60 points)
- Experience in developing and managing ETL pipelines, jobs, and workflows in Databricks.
- Deep understanding of Delta Lake for building data lakes and managing ACID transactions, schema evolution, and data versioning.
- Experience automating ETL pipelines using Delta Live Tables, including handling Change Data Capture (CDC) for incremental data loads.
- Proficient in structuring data pipelines with the Medallion Architecture to scale data pipelines and ensure data quality.
- Hands-on experience developing streaming tables in Databricks using Structured Streaming and readStream to handle real-time data (a sketch follows this list).
- Expertise in integrating CDC tools like GoldenGate or Debezium for processing incremental updates and managing real-time data ingestion.
- Experience using Unity Catalog to manage data governance and access control and to ensure compliance.
- Skilled in managing clusters, jobs, autoscaling, monitoring, and performance optimization in Databricks environments.
- Knowledge of using Databricks Auto Loader for efficient batch and real-time data ingestion.
- Experience with data governance best practices, including implementing security policies, access control, and auditing with Unity Catalog.
- Proficient in creating and managing Databricks Workflows to orchestrate job dependencies and schedule tasks.
- Strong knowledge of Python, PySpark, and SQL for data manipulation and transformation.
- Experience integrating Databricks with cloud storage solutions such as Azure Blob Storage, AWS S3, or Google Cloud Storage.
- Familiarity with external orchestration tools like Azure Data Factory
- Implementing logical and physical data models
- Knowledge of FHIR is an asset
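A minimal sketch of the streaming ingestion items above (Structured Streaming, readStream, and Auto Loader), writing into an assumed Bronze Delta path; all paths and options are placeholders, and the availableNow trigger assumes a recent Databricks runtime.

    # Incrementally ingest new files with Auto Loader (cloudFiles) into a Bronze Delta table.
    stream_df = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", "/mnt/bronze/_schemas/claims")  # assumed schema tracking path
        .load("/mnt/landing/claims/")                                        # assumed landing folder
    )

    (
        stream_df.writeStream
        .format("delta")
        .option("checkpointLocation", "/mnt/bronze/_checkpoints/claims")     # assumed checkpoint path
        .outputMode("append")
        .trigger(availableNow=True)  # process available files, then stop (batch-style incremental run)
        .start("/mnt/bronze/claims/")
    )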
Design Documentation and Analysis Skills (20 points)
- Demonstrated experience in creating design documentation such as:
- Schema definitions
- Error Handling and Logging
- ETL Process Documentation
- Job Scheduling and Dependency Management
- Data Quality and Validation Checks
- Performance Optimization and Scalability Plans
- Troubleshooting Guides
- Data Lineage
- Security and Access Control Policies applied within ETL
- Experience in Fit-Gap analysis, system use case reviews, requirements reviews, coding exercises, and reviews.
- Participate in defect fixing, testing support, and development activities for ETL
- Analyze and document solution complexity and interdependencies, including providing support for data validation.
- Strong analytical skills for troubleshooting, problem-solving, and ensuring data quality.
Certifications (10 points)
Certified in one or more of the following certifications:
- Databricks Certified Data Engineer Associate
- Databricks Certified Data Engineer Professional
- Microsoft Certified: Azure Data Engineer Associate
- AWS Certified Data Analytics - Specialty
- Google Cloud Professional Data Engineer
Communication, Leadership Skills and Knowledge Transfer (10 points)
- Ability to collaborate effectively with cross-functional teams and communicate complex technical concepts to non-technical stakeholders.
- Strong problem-solving skills and experience working in an Agile or Scrum environment.
- Ability to provide technical guidance and support to other team members on Databricks best practices.
- Must have previous work experience conducting knowledge transfer sessions, ensuring that resources receive the required knowledge to support the system.
- Must develop documentation and materials as part of a review and knowledge transfer to other team members.