Job Overview
- Lead the architecture and design, and oversee the implementation, of modular and scalable data ELT / ETL pipelines and data infrastructure, leveraging the wide range of data sources across the organization.
- Design curated common data models that offer an integrated, business-centric single source of truth for business intelligence, reporting, and downstream system use.
- Work closely with infrastructure and cyber teams to ensure data is secure in transit and at rest.
- Create, guide, and enforce code templates for delivery of data pipelines and transformations for structured, semi-structured and unstructured data sets.
- Develop modeling guidelines that ensure model extensibility and reuse by employing industry-standard disciplines for building facts, dimensions, bridges, aggregates, slowly changing dimensions, and other dimensional and fact optimizations.
- Establish standards for database system fields, including primary and natural key combinations that optimize join performance in a multi-domain, multiple-subject-area physical model (structured zone) and semantic model (curated zone).
- Transform data and map it to more valuable and understandable semantic layer sets for consumption, transitioning from system-centric language to business-centric language.
- Collaborate with business analysts, data scientists, data engineers, data analysts and solution architects to develop data pipelines to feed our data marketplace.
- Introduce new technologies to the environment through research and POCs and prepare POC code designs that can be implemented and productionized by developers.
- Work with tools in the Microsoft stack: Azure Data Factory, Azure Data Lake, Azure SQL Databases, Azure Data Warehouse, Azure Synapse Analytics Services, Azure Databricks, Microsoft Purview, and Power BI.
- Work within the agile SCRUM work management framework in delivery of products and services, including contributing to feature & user story backlog item development, and utilizing related Kanban / SCRUM toolsets.
- Document as-built architecture and designs within the product description.
- Design data solutions that enable batch, near-real-time, event-driven, and / or streaming approaches, depending on business requirements.
- Design & advise on orchestration of data pipeline execution to ensure data products meet customer latency expectations, dependencies are managed, and datasets are as up to date as possible, with minimal disruption to end-customer use.
- Ensure that designs are implemented with proper attention to data security, access management, and data cataloging requirements.
- Approve pull requests related to production deployments.
- Demonstrate solutions to business customers to ensure customer acceptance and solicit feedback to drive iterative improvements.
- Assist in troubleshooting issues for datasets produced by the team (Tier 3 support), on an as-required basis.
- Guide data modelers, business analysts and data scientists in the build of models optimized for KPI delivery, actionable feedback / writeback to operational systems and enhancing the predictability of machine learning models and experiments.
Qualifications
- 4-year university education in computer science, software engineering, or another relevant program within data engineering, data analysis, artificial intelligence, or machine learning.
- 6 to 8 years of experience in data modeling, data warehouse design, and data solution architecture in a Big Data environment is required.
- Experience guiding data lake ingestion and data modeling projects in a cloud environment, and experience modeling relational and in-memory models with star / snowflake schemas.
- Experience with designing and implementing event-driven (pub / sub), near-real-time, or streaming data solutions, involving structured, semi-structured and unstructured data across various platforms.
- Extensive knowledge of designing a data model to solve a business problem, specifying a data pipeline design pattern to bring data into a data warehouse, optimizing data structures to achieve required performance, designing low-latency and / or event-driven patterns of data processing, and creating a common data model to support current and future business needs.
Posted more than 30 days ago