
Explore how data flows from transactional systems to analytical processing using OLTP and OLAP, and learn how Azure Data Factory moves data into a data lake.
Download and install the Azure Storage Explorer, then sign in and connect your storage accounts. Navigate blob containers to upload, download, copy, clone, and delete files.
Create your first Azure Data Factory pipeline to ingest web store online sales data from Azure Blob Storage into the data lake using JSON datasets and copy activities.
Explore the advantages and disadvantages of Azure Data Factory for data ingestion, including ETL and ELT approaches, CI/CD integration, and a wide connector ecosystem.
Explore metadata-driven ingestion in Azure Data Factory to process multiple files with one pipeline, reducing maintenance. A metadata database guides the orchestration and dynamic ingestion into the data lake.
Submit a metadata-driven one-off batch ingestion, initialize the batch, and run ingestion from Azure Blob storage into the Data Engineering Data Lake using an Azure Data Factory pipeline.
Create an Azure SQL database to store metadata for batch pipeline management, configure a basic 2 gigabyte database in the data engineering resource group, and enable access via firewall rules.
Explore how the email recipients table links to the system info table using a system id foreign key, enabling targeted notifications by system and dataset.
Enable dynamic subject and message parameters in the email notification pipeline, and use a switch to send status-based notifications (success, failed, default).
Make logging dynamic with a standalone utility pipeline to record both success and failure metadata in the pipeline log table, using parameters like snapshot date, status, and run.
Learn to extend the Azure Data Factory ingestion pipeline by logging pipeline metadata on success or failure, using variables and time-zone-aware timestamp formatting.
Make datasets metadata-driven by replacing hardcoded JSON sources with runtime parameters for container, directory, and file name, and use cloning to read from folders with dynamic content.
Observe source-to-target details and upload missing files to Azure Blob Storage. Clone and version the ingestion pipeline, then configure a metadata-driven lookup with a stored procedure for the copy activity.
Explore the pipeline log and tables to track run IDs and batches, measure latency and duration, and audit data ingestion into the data lake via the source to target view.
Set up an Azure DevOps Git repository to store code and metadata SQL scripts for your metadata-driven pipelines, with a readme, folders, and commits to the main branch.
Configure reading from Azure Blob Storage by creating the finance container, uploading finance CSV and accounting JSON files, and building a Data Factory dataset to list container contents as metadata.
Plan a Data Factory pipeline that reads from Azure Blob Storage using source container and file type parameters, filters by metadata, and ingests CSV files into the data lake.
Create an event-based trigger in Azure Data Factory by selecting storage events, monitoring the finance container for blob created events, and attaching it to a pipeline.
Publish code to the main branch by creating a pull request, merging changes, publishing the pipeline, and starting the trigger, then monitor the event grid for at least ten minutes.
Verify event-driven ingestion by triggering pipelines in Azure Data Factory as you upload files to the storage account, and confirm the raw finance data lands in the data lake.
Create an Azure DevOps organization, set up a project named daybreak with a Git repository to host code and plan work, and create a pdb-provision-resource-groups repository for infrastructure as code.
Install the Azure CLI across platforms, learn to run commands in Windows PowerShell, cmd, or Git Bash, and verify installation by checking the version and signing into your Azure tenant.
Merge the pr test branch into master to demonstrate branch isolation, create and commit a new file, and verify the merge using git commands and Visual Studio Code.
Download a 64-bit Windows self-hosted runtime agent and configure it in the default Azure DevOps pool using PowerShell and the config.cmd, with a personal access token.
Add a final task to an Azure DevOps YAML pipeline to run a PowerShell script that provisions an Azure resource group, configuring the Azure PowerShell task and script path.
Learn to build a reusable Log Analytics Bicep module as infrastructure as code and modify the main Bicep file to deploy Log Analytics across multiple resource groups using modular programming.
Define a Log Analytics Bicep module template by specifying required and optional inputs, including name, location, tags, SKU, log retention, and public network access for ingestion and query.
Declare and annotate input parameters in the Log Analytics Bicep template, including tags, location, workspace name, retention in days, SKU, and network access for ingestion and query.
Invoke the Log Analytics Bicep module template using a module declaration with a unique deployment name and parameters including Log Analytics workspace name, retention days, and public network access.
Build a PowerShell script to compile Bicep to ARM JSON, with parameters for file path, Bicep template, and output file, plus try-catch error handling and debugging output.
The main objective of this course is to help you learn data engineering techniques for building Metadata-Driven frameworks with Azure Data Engineering tools such as Data Factory, Azure SQL, and others.
Building frameworks is now an industry norm, and knowing how to visualize, design, plan, and implement data frameworks has become an important skill.
The framework that we are going to build together is referred to as the Metadata-Driven Ingestion Framework.
Ingesting data into the data lake from disparate source systems is a key requirement for a company that aspires to be data-driven, and finding a common way to ingest that data is both desirable and necessary.
Metadata-Driven Frameworks allow a company to develop the system just once; various business clusters can then adopt and reuse it without additional development, saving the business time and costs. Think of it as a plug-and-play system.
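The plug-and-play idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the course's Data Factory implementation: the metadata rows, their field names (source_container, file_name, target_directory), and the plan_ingestion helper are all hypothetical. The point it tries to show is that onboarding a new source means adding a metadata row rather than writing a new pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetMetadata:
    """One row of a hypothetical metadata table describing a file to ingest."""
    source_container: str
    file_name: str
    target_directory: str


def plan_ingestion(rows):
    """Turn metadata rows into (source, target) copy instructions.

    The same generic logic serves every row, which is what makes the
    approach metadata-driven.
    """
    plan = []
    for row in rows:
        source = f"{row.source_container}/{row.file_name}"
        target = f"{row.target_directory}/{row.file_name}"
        plan.append((source, target))
    return plan


# Hypothetical metadata: two datasets handled by one generic routine.
metadata = [
    DatasetMetadata("finance", "invoices.csv", "raw/finance"),
    DatasetMetadata("sales", "orders.json", "raw/sales"),
]

for source, target in plan_ingestion(metadata):
    print(f"copy {source} -> {target}")
```

In a real framework the rows would come from the metadata database and the copy step would be a parameterized pipeline activity; here both are stand-ins.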
The first objective of the course is to onboard you onto the Azure Data Factory platform and help you assemble your first Azure Data Factory Pipeline. Once you have a good grip on the Azure Data Factory development pattern, it becomes easier to apply the same pattern to onboard other sources and data sinks.
Once you are comfortable with building a basic Azure Data Factory pipeline, the second objective is to build a fully-fledged, working metadata-driven framework that makes the ingestion more dynamic. We will build the framework in such a way that you can audit every batch orchestration and individual pipeline run for business intelligence and operational monitoring.
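To make the auditing idea concrete, here is a minimal Python sketch of the kind of record a pipeline log could hold. The schema (run_id, batch_id, status, timestamps, duration) and the build_log_entry helper are assumptions made for illustration; the course stores its log in Azure SQL tables whose exact columns are not given here.

```python
from datetime import datetime, timezone


def build_log_entry(run_id, batch_id, status, started_at, finished_at):
    """Build one audit record for a pipeline run (hypothetical schema)."""
    if finished_at < started_at:
        raise ValueError("finished_at must not be earlier than started_at")
    return {
        "run_id": run_id,
        "batch_id": batch_id,
        "status": status,
        # Time-zone-aware timestamps keep records comparable across regions.
        "started_at": started_at.isoformat(),
        "finished_at": finished_at.isoformat(),
        "duration_seconds": (finished_at - started_at).total_seconds(),
    }


# Hypothetical run: started 08:00:00 UTC, finished 08:02:30 UTC.
start = datetime(2024, 1, 15, 8, 0, 0, tzinfo=timezone.utc)
end = datetime(2024, 1, 15, 8, 2, 30, tzinfo=timezone.utc)
entry = build_log_entry("run-001", "batch-001", "Success", start, end)
print(entry["duration_seconds"])  # prints 150.0
```

Recording the batch identifier alongside each run is what would let individual runs be rolled up into a per-batch view for monitoring.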
Creating your first Pipeline
The following topics are covered:
1. Introduction to Azure Data Factory
2. Unpack the requirements and technical architecture
3. Create an Azure Data Factory Resource
4. Create an Azure Blob Storage account
5. Create an Azure Data Lake Gen 2 Storage account
6. Learn how to use the Storage Explorer
7. Create Your First Azure Pipeline.
Metadata Driven Ingestion
1. Unpack the theory on Metadata Driven Ingestion
2. Describing the High-Level Plan for building the User
3. Creation of a dedicated Active Directory User and assigning appropriate permissions
4. Using Azure Data Studio
5. Creation of the Metadata Driven Database (Tables and T-SQL Stored Procedure)
6. Applying business naming conventions
7. Creating an email notifications strategy
8. Creation of Reusable utility pipelines
9. Develop a mechanism to log data for every data ingestion pipeline run and also the batch itself
10. Creation of a dynamic data ingestion pipeline
11. Apply the orchestration pipeline
12. Explanation of T-SQL Stored Procedures for the Ingestion Engine
13. Creating an Azure DevOps Repository for the Data Factory Pipelines
Event-Driven Ingestion
1. Enabling the Event Grid Provider
2. Use the Get Metadata Activity
3. Use the Filter Activity
4. Create Event-Based Triggers
5. Create and Merge new DevOps Branches
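The listing and filtering steps in this section can be pictured with a small Python sketch. It is not Data Factory code: the child_items list is a hand-written stand-in for a container listing, and filter_by_extension is a hypothetical helper that mimics keeping only files of one type before ingestion.

```python
def filter_by_extension(items, extension):
    """Keep only file entries whose name ends with the given extension."""
    suffix = extension.lower()
    return [
        item["name"]
        for item in items
        if item["type"] == "File" and item["name"].lower().endswith(suffix)
    ]


# Hypothetical container listing with mixed file types and one folder.
child_items = [
    {"name": "finance_2024.csv", "type": "File"},
    {"name": "accounting.json", "type": "File"},
    {"name": "archive", "type": "Folder"},
]

csv_files = filter_by_extension(child_items, ".csv")
print(csv_files)  # prints ['finance_2024.csv']
```

Only the names that pass the filter would then be handed to the copy step, so files of other types in the same container are ignored.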
Bonus Course: Provision Infra with Azure BICEP
The goal of this course is to help students learn how to professionally write and develop Azure DevOps Infrastructure as Code with BICEP, YAML, Git, and PowerShell.
Azure DevOps is a leading automation and DevOps platform, and students will be taken through the following:
An in-depth introduction to Infrastructure as Code with the Azure DevOps platform
A definition of DevOps and how Azure as a SaaS (Software as a Service) platform facilitates the practice of the DevOps methodology
An Introduction to YAML pipelines on the Azure DevOps platform
An Introduction to BICEP and ARM templates for developing Infrastructure as Code (IaC) on the Azure DevOps Platform
An overview of Industry leading DevOps tools
Git is an industry-leading distributed version control system and a critical component of Azure DevOps. Students will therefore be taken through a Git Crash Course that covers the following basics:
The creation of a local Git Repository
Learn how to stage and commit single and multiple files
Branching management with Git including Merging
Git with Bash and Visual Studio Code
Learn how to time travel and undo changes
Students may need to set up self-hosted Azure DevOps agents to run CI/CD pipelines, for example to save costs in a work environment or to keep a personal environment cost-effective. Students will therefore learn the following:
Set up Billing for Microsoft-Hosted and Self-Hosted pipeline agents
Installation and Setup of a Self-Hosted pipeline agent
Setting up of a Personal Access Token
Configuration of a Self-Hosted Agent
YAML is widely used to define CI/CD pipelines, and perhaps the best way to learn how to write YAML pipelines is to provision infrastructure with YAML, PowerShell, and BICEP. The initial focus will be the provisioning of a resource group, and therefore the students will learn the following:
How to Create an Azure Service Connection
Cloning an Azure DevOps Repository
Writing PowerShell Script to Provision a Resource Group
How to Add Stages, Jobs, and Steps in a YAML pipeline template
Running the YAML pipeline on Azure DevOps
How to create Azure DevOps Variable Groups and pass them into YAML templates
How to override BICEP parameters using YAML
One aspect of professionalism in coding is how projects are structured for coding efficiency and ease of management; the other is the naming convention used for resources. The course will take students through the following:
Creating Project Structures for a DevOps and BICEP project using Bash and Git
Establish a standard naming convention for resources using BICEP and PowerShell
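As a rough illustration of what a standard naming convention buys you, the Python sketch below assembles a resource name from fixed parts. The course does this with BICEP and PowerShell; the pattern shown here (type, workload, environment, region joined by hyphens) and the resource_name helper are assumptions chosen for the example, not the convention the course prescribes.

```python
def resource_name(resource_type, workload, environment, region):
    """Join the parts into a lowercase, hyphen-separated resource name."""
    parts = [p.strip().lower() for p in (resource_type, workload, environment, region)]
    if not all(parts):
        raise ValueError("every part of the name must be non-empty")
    return "-".join(parts)


# Hypothetical values: a resource group for a data engineering workload.
name = resource_name("rg", "DataEng", "dev", "weu")
print(name)  # prints rg-dataeng-dev-weu
```

Generating names from one function, rather than typing them by hand, keeps every resource consistent and makes the environment and region visible at a glance.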
The heart of provisioning and deploying infrastructure in Azure is the adoption of BICEP, and students will learn the following about developing BICEP in a professional manner:
Development of a BICEP template to provision Log Analytics and Data Factory
How to add Input Parameters to a BICEP template
How to create BICEP Modules for Log Analytics and Data Factory
How to add Tagging Information to BICEP modules
How to structure a naming convention with BICEP
How to use run time and compile time variables and parameters
How to write a PowerShell Script to Transpile BICEP to an ARM template
How to Manage Dependencies between Resources with BICEP
How to manage BICEP template errors