Azure Data Factory for Beginners - Build Data Ingestion

Name: Azure Data Factory for Beginners - Build Data Ingestion
Rating: 4.6 (1781 reviews)

Learn Azure Data Factory by building an industry-standard Metadata-driven Ingestion Framework

Created by David Charles Academy

Last updated 1/2023

English

What you'll learn

Azure Data Factory
Azure Blob Storage
Azure Gen 2 Data Lake Storage
Azure Data Factory Pipelines
Data Engineering Concepts
Data Lake Concepts
Metadata Driven Frameworks Concepts
Industry Example on How to build Ingestion Frameworks
Dynamic Azure Data Factory Pipelines
Email Notifications with Logic Apps
Tracking of Pipelines and Batch Runs
Version Management with Azure DevOps
An in-depth introduction to Infrastructure as Code with the Azure DevOps platform
A definition of DevOps and how Azure as a SaaS (Software as a Service) platform facilitates the practice of the DevOps methodology
An Introduction to YAML pipelines on the Azure DevOps platform
An Introduction to BICEP and ARM templates for developing Infrastructure as Code (IaC) on the Azure DevOps Platform
An overview of Industry leading DevOps tools
The creation of a local Git Repository
Learn how to stage and commit single and multiple files
Branching management with Git including Merging
Git with Bash and Visual Studio Code
Learn how to time travel and undo changes
Set up Billing for Microsoft and Self Hosted pipeline agents
Installation and Set Up for Self Hosted pipeline agents
Setting up of a Personal Access Token
Configuration of a Self-Hosted Agent
How to Create an Azure Service Connection
Cloning an Azure DevOps Repository
Writing PowerShell Script to Provision a Resource Group
How to Add Stages, Jobs and Steps in a YAML pipeline template
Running the YAML pipeline on Azure DevOps
How to develop Azure Variable Groups and pass them into YAML templates
How to override BICEP parameters using YAML
Creating Project Structures for a DevOps and BICEP project using Bash and Git
Establish a standard naming convention for resources using BICEP and PowerShell
Development of a BICEP template to provision Log Analytics and Data Factory
How to add Input Parameters to a BICEP template
How to create BICEP Modules for Log Analytics and Data Factory
How to add Tagging Information to BICEP modules
How to structure a naming convention with BICEP
How to use run time and compile time variables and parameters
How to write a PowerShell Script to Transpile BICEP to an ARM template
How to Manage Dependencies between Resources with BICEP
How to manage BICEP template errors

Course content

14 sections • 134 lectures • 12h 37m total length

Introduction to ADF • 5:05
Explore how data flows from transactional systems to analytical processing using OLTP and OLAP, and learn how Azure Data Factory moves data into a data lake.
Requirements Discussion and Technical Architecture • 2:09
Register a Free Azure Account • 4:14
Create A Data Factory Resource • 8:39
Create A Storage Account and Upload Data • 7:32
Create Data Lake Gen 2 Storage Account • 5:51
Download Storage Explorer • 4:28
Download and install the Azure Storage Explorer, then sign in and connect your storage accounts. Navigate blob containers to upload, download, copy, clone, and delete files.
Create Your First Azure Pipeline • 16:28
Create your first Azure Data Factory pipeline to ingest web store online sales data from Azure Blob Storage into the data lake using JSON datasets and copy activities.
Closing Remarks • 6:20
Explore the advantages and disadvantages of Azure Data Factory for data ingestion, including ETL and ELT approaches, CI/CD integration, and a wide connector ecosystem.

Introduction - Metadata Driven Ingestion • 3:18
Explore metadata driven ingestion in Azure Data Factory to process multiple files with one pipeline, reducing maintenance. A metadata database guides the orchestration and dynamic ingestion into the data lake.
High Level Plan • 6:55
Submit a metadata-driven one-off batch ingestion, initialize the batch, and run ingestion from Azure Blob Storage into the Data Engineering Data Lake using an Azure Data Factory pipeline.
Create Active Directory User • 2:32
Assign the Contributor Role to the User • 3:35
Disable Security Defaults • 1:49
Creating the Metadata Database • 9:30
Create an Azure SQL database to store metadata for batch pipeline management, configure a basic 2 gigabyte database in the data engineering resource group, and enable access via firewall rules.
Install Azure Data Studio • 5:54
Create Metadata Tables and Stored Procedures • 8:14
Reconfigure Existing Data Factory Artifacts • 7:09
Set Up Logic App for Handling Email Notifications • 9:24
Modify the Data Factory Pipeline to Send an Email Notification • 10:16
Create Linked Service for Metadata Database and Email Dataset • 4:07
Create Utility Pipeline to Send Email Notifications • 14:43
Explaining the Email Recipients Table • 5:22
Explore how the email recipients table links to the system info table using a system id foreign key, enabling targeted notifications by system and dataset.
Explaining the Get Email Addresses Stored Procedure • 2:30
Modify Ingestion Pipeline to use the Email Utility Pipeline • 4:40
Tracking the Triggered Pipeline • 12:29
Making the Email Notifications Dynamic • 16:52
Enable dynamic subject and message parameters in the email notification pipeline, and use a switch to send status-based notifications (success, failed, default).
Making Logging of Pipeline Information Dynamic • 10:52
Make logging dynamic with a standalone utility pipeline to record both success and failure metadata in the pipeline log table, using parameters like snapshot date, status, and run.
Add a new way to log the main ingestion pipeline • 13:28
Learn to extend the Azure Data Factory ingestion pipeline by logging pipeline metadata on success or failure, using variables and time zone aware timestamp formatting.
Change the Logging of Pipelines to send Fail message only • 8:06
Creating Dynamic Datasets • 11:25
Make datasets metadata driven by replacing hardcoded JSON sources with runtime parameters for container, directory, and file name, and use cloning to read from folders with dynamic content.
Reading from Source To Target Part 1 • 8:09
Observe source-to-target details and upload missing files to Azure Blob Storage. Clone and version the ingestion pipeline, then configure a metadata-driven lookup with a stored procedure for the copy activity.
Reading from Source To Target Part 2 • 12:53
Explaining the Source To Target Stored Procedure • 4:48
Add Orchestration Pipeline Part 1 • 7:10
Add Orchestration Pipeline Part 2 • 9:02
Fixing the Duplicating Batch Ingestions • 8:19
Understanding the Pipeline Log and Related Tables • 9:33
Explore the pipeline log and tables to track run IDs and batches, measure latency and duration, and audit data ingestion into the data lake via the source to target view.
Understanding the GetBatch Stored Procedure • 4:59
Understanding the Set Batch Status and GetRunID • 3:33
Setting Up an Azure DevOps Git Repository • 6:39
Set up an Azure DevOps Git repository to store code and metadata SQL scripts for your metadata-driven pipelines, with a readme, folders, and commits to the main branch.
Publishing the Data Factory to Azure DevOps • 8:20
Closing Remarks • 2:42

Introduction • 3:05
Read From Azure Storage Plan • 1:20
Configure reading from Azure Blob Storage by creating the finance container, uploading finance CSV and accounting JSON files, and building a Data Factory dataset to list container contents as metadata.
Create Finance Container and Upload Files • 2:14
Create Source Dataset • 6:14
Write To Data Lake - Raw Plan • 2:33
Create Finance Container and Directories • 2:39
Create Sink Dataset • 4:35
Data Factory Pipeline Plan • 1:45
Plan a Data Factory pipeline that reads from Azure Blob Storage using source container and file type parameters, filters by metadata, and ingests CSV files into the data lake.
Create Data Factory and Read Metadata • 6:19
Add Filter By CSV • 5:03
Add Dataset to Read Files • 2:50
Add the For Each CSV File Activity and Test Ingestion • 8:37
Adding the Event Based Trigger Plan • 1:18
Enable the Event Grid Provider • 2:00
Delete File and Add Event Based Trigger • 1:47
Create Event Based Trigger • 4:05
Create an event based trigger in Azure Data Factory by selecting storage events, monitoring the finance container for blob created events, and attaching it to a pipeline.
Publish Code to Main Branch and Start Trigger • 3:09
Publish code to the main branch by creating a pull request and merging the changes, then publish the pipeline, start the trigger, and monitor the event grid for at least ten minutes.
Trigger Event Based Ingestion • 3:26
Verify event driven ingestion by triggering pipelines in Azure Data Factory as you upload files to the storage account, and confirm the raw finance data lands in the data lake.
Closing Remarks • 2:14

Introduction • 14:27
Register a Free Account • 4:14
Create DevOps Organisation and Project • 7:15
Create an Azure DevOps organization, set up a project named daybreak with a Git repository to host code and plan work, and create a pdb-provision-resource-groups repository for infrastructure as code.
Install Azure CLI • 5:51
Install the Azure CLI across platforms, learn to run commands in Windows PowerShell, CMD, or Git Bash, and verify installation by checking the version and signing into your Azure tenant.
Install BICEP • 1:45
Install Git Bash • 4:32
Install Visual Studio Code • 2:42
Install Visual Studio Code Extensions • 2:54
Set Git Bash as Default Terminal • 1:49

Introduction to Git • 6:24
Create Local Git Repository • 2:42
Staging and Committing one File • 6:55
Staging and Committing Multiple Files • 6:06
Create a New Branch in Local Repo • 2:37
Modify and Commit Code in the New Branch • 2:50
Merging Branches • 4:23
Merge the pr test branch into master to demonstrate branch isolation, create and commit a new file, and verify the merge using git commands and Visual Studio Code.
Git with Visual Studio Code • 6:24
Viewing a State in the Past • 4:28
Undoing Changes • 7:45
Reset Changes • 5:57

Set up Billing and Microsoft Hosted Runtime Agent • 10:12
Install Dot Net SDK and Core Framework • 4:09
Install PowerShell 7 • 2:35
Install Visual Studio 2022 Community Edition • 3:14
Set Up Personal Access Token • 2:53
Download and Configure Agent • 8:34
Download a 64-bit Windows self-hosted runtime agent and configure it in the default Azure DevOps pool using PowerShell and the config.cmd, with a personal access token.
Test Self Hosted Agent • 2:52

Create a Service Connection • 2:33
Clone Repository • 4:45
Create The Project Structure • 3:28
Create the Resource Group PowerShell Script • 12:22
Add Stage and Job to Resource Group YAML File • 6:28
Add Tasks to Resource Group YAML and Run Pipeline • 13:44
Add a final task to an Azure DevOps YAML pipeline to run a PowerShell script that provisions an Azure resource group, configuring the Azure PowerShell task and script path.
Validating the Provisioned Resource Group • 4:01

Introduction - Log Analytics Module • 6:14
Learn to build a reusable Log Analytics Bicep module as infrastructure as code and modify the main Bicep file to deploy Log Analytics across multiple resource groups using modular programming.
Log Analytics BICEP Module Template • 4:30
Define a Log Analytics Bicep module template by specifying required and optional inputs, including name, location, tags, SKU, log retention, and public network access for ingestion and query.
Add Input Parameters to Log Analytics Template • 9:08
Declare and annotate input parameters in the Log Analytics Bicep template, including tags, location, workspace name, retention in days, SKU, and network access for ingestion and query.
Develop Log Analytics BICEP Module Template • 6:21
Plan for Developing the Main BICEP file for Log Analytics • 1:37
Add Tagging and Resource Naming Variables • 4:30
Add Parameter Variables for BICEP Log Analytics Module • 5:43
Invoke the BICEP Log Analytics Module Template • 6:11
Invoke the Log Analytics Bicep module template using a module declaration command with a unique deployment name and parameters including Log Analytics workspace name, retention days, and public network access.

Requirements

Basic PC / Laptop

Description

The main objective of this course is to help you to learn Data Engineering techniques of building Metadata-Driven frameworks with Azure Data Engineering tools such as Data Factory, Azure SQL, and others.

Building frameworks is now an industry norm, and knowing how to visualize, design, plan, and implement data frameworks has become an important skill.

The framework that we are going to build together is referred to as the Metadata-Driven Ingestion Framework.

Data ingestion into the data lake from the disparate source systems is a key requirement for a company that aspires to be data-driven, and finding a common way to ingest the data is a desirable and necessary requirement.

Metadata-Driven Frameworks allow a company to develop the system just once and it can be adopted and reused by various business clusters without the need for additional development, thus saving the business time and costs. Think of it as a plug-and-play system.
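
The plug-and-play idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the course's implementation: the `SourceToTarget` fields, the sample rows, and the in-memory `blob_store` and `data_lake` dictionaries are hypothetical stand-ins for the metadata database, Azure Blob Storage, and the data lake.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceToTarget:
    """One hypothetical metadata row describing a single file to ingest."""
    source_container: str
    source_file: str
    target_directory: str


def ingest(rows, blob_store, data_lake):
    """Copy every file named in the metadata rows; return the target paths written."""
    written = []
    for row in rows:
        source_path = f"{row.source_container}/{row.source_file}"
        target_path = f"{row.target_directory}/{row.source_file}"
        data_lake[target_path] = blob_store[source_path]
        written.append(target_path)
    return written


# Hypothetical sample data: in-memory dictionaries stand in for real storage.
blob_store = {
    "sales/orders.json": '{"order_id": 1}',
    "finance/ledger.csv": "id,amount\n1,10",
}
metadata = [
    SourceToTarget("sales", "orders.json", "raw/sales"),
    SourceToTarget("finance", "ledger.csv", "raw/finance"),
]
data_lake = {}
written_paths = ingest(metadata, blob_store, data_lake)
```

In this sketch, onboarding another source means adding a row to `metadata`; the `ingest` function itself does not change, which is the reuse the paragraph above describes.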

The first objective of the course is to onboard you onto the Azure Data Factory platform to help you assemble your first Azure Data Factory Pipeline. Once you get a good grip of the Azure Data Factory development pattern, then it becomes easier to adopt the same pattern to onboard other sources and data sinks.

Once you are comfortable with building a basic Azure Data Factory pipeline, the second objective is to build a fully-fledged, working metadata-driven framework that makes the ingestion more dynamic. We will build the framework in such a way that you can audit every batch orchestration and individual pipeline run for business intelligence and operational monitoring.

Creating your first Pipeline

What will be covered is as follows:

1. Introduction to Azure Data Factory

2. Unpack the requirements and technical architecture

3. Create an Azure Data Factory Resource

4. Create an Azure Blob Storage account

5. Create an Azure Data Lake Gen 2 Storage account

6. Learn how to use the Storage Explorer

7. Create Your First Azure Pipeline.

Metadata Driven Ingestion

1. Unpack the theory on Metadata Driven Ingestion

2. Describing the High-Level Plan for building the User

3. Creation of a dedicated Active Directory User and assigning appropriate permissions

4. Using Azure Data Studio

5. Creation of the Metadata Driven Database (Tables and T-SQL Stored Procedure)

6. Applying business naming conventions

7. Creating an email notifications strategy

8. Creation of Reusable utility pipelines

9. Develop a mechanism to log data for every data ingestion pipeline run and also the batch itself

10. Creation of a dynamic data ingestion pipeline

11. Apply the orchestration pipeline

12. Explanation of T-SQL Stored Procedures for the Ingestion Engine

13. Creating an Azure DevOps Repository for the Data Factory Pipelines
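
The logging and notification steps in this list (items 7 to 9) can be pictured with a small Python sketch. It is a simplified illustration, not the course's Data Factory or Logic App configuration: the column names, message templates, and the `run-001` and `ingest_finance` values are hypothetical, and a plain list stands in for the pipeline log table.

```python
from datetime import datetime, timezone

# Hypothetical message templates keyed by run status.
TEMPLATES = {
    "Success": "Pipeline {name} completed successfully.",
    "Failed": "Pipeline {name} failed and needs attention.",
}
DEFAULT_TEMPLATE = "Pipeline {name} finished with status {status}."


def build_notification(name, status):
    """Choose the message by status, falling back to a default branch."""
    template = TEMPLATES.get(status, DEFAULT_TEMPLATE)
    return {
        "subject": f"[{status}] {name}",
        "message": template.format(name=name, status=status),
    }


def log_run(pipeline_log, run_id, name, status, snapshot_date):
    """Append one audit row per pipeline run, whether it succeeded or failed."""
    row = {
        "run_id": run_id,
        "pipeline_name": name,
        "status": status,
        "snapshot_date": snapshot_date,
        "logged_at_utc": datetime.now(timezone.utc).isoformat(),
    }
    pipeline_log.append(row)
    return row


pipeline_log = []  # stand-in for the pipeline log table
log_run(pipeline_log, "run-001", "ingest_finance", "Failed", "2023-01-31")
notification = build_notification("ingest_finance", "Failed")
```

Keeping both helpers independent of any one ingestion pipeline mirrors the idea of reusable utility pipelines: each caller passes in its own name, status, and run details.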

Event-Driven Ingestion

1. Enabling the Event Grid Provider

2. Use the Get Metadata Activity

3. Use the Filter Activity

4. Create Event-Based Triggers

5. Create and Merge new DevOps Branches
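
The read-metadata, filter, and for-each pattern used in this section can be sketched in Python. This is a rough illustration rather than the pipeline itself: the `child_items` list imitates the kind of name and type listing a metadata step returns for a container, and the file names and the `raw/finance` target directory are hypothetical.

```python
def filter_by_extension(child_items, file_type):
    """Keep only file entries whose name ends with the requested extension."""
    suffix = "." + file_type.lower().lstrip(".")
    return [
        item["name"]
        for item in child_items
        if item["type"] == "File" and item["name"].lower().endswith(suffix)
    ]


# Hypothetical listing of a finance container.
child_items = [
    {"name": "finance.csv", "type": "File"},
    {"name": "accounting.json", "type": "File"},
    {"name": "archive", "type": "Folder"},
]

csv_files = filter_by_extension(child_items, "csv")

# The for-each step: every CSV file that passed the filter gets a target path.
target_paths = [f"raw/finance/{name}" for name in csv_files]
```

Passing the file type as a parameter keeps the same logic usable for other formats, for example `filter_by_extension(child_items, "json")`.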

Bonus Course: Provision Infra with Azure BICEP

The goal of this course is to help students learn how to professionally write and develop Azure DevOps Infrastructure as Code with BICEP, YAML, Git, and PowerShell.

Azure DevOps is a leading automation and DevOps platform, and students will be taken through the following:

An in-depth introduction to Infrastructure as Code with the Azure DevOps platform
A definition of DevOps and how Azure as a SaaS (Software as a Service) platform facilitates the practice of the DevOps methodology
An Introduction to YAML pipelines on the Azure DevOps platform
An Introduction to BICEP and ARM templates for developing Infrastructure as Code (IaC) on the Azure DevOps Platform
An overview of Industry leading DevOps tools

Git is an industry-leading distributed version control system and a critical component of Azure DevOps. Students will therefore be taken through a Git Crash Course that covers the following basic aspects:

The creation of a local Git Repository
Learn how to stage and commit single and multiple files
Branching management with Git including Merging
Git with Bash and Visual Studio Code
Learn how to time travel and undo changes

Students may find it necessary to set up self-hosted Azure DevOps Pipeline Agents for running CI / CD pipelines, for example to save costs in a work environment or to keep a personal environment cost-effective. Students will therefore learn the following:

Set up Billing for Microsoft and Self Hosted pipeline agents
Installation and Set Up for Self Hosted pipeline agents
Setting up of a Personal Access Token
Configuration of a Self-Hosted Agent

YAML is a leading configuration management technology for developing CI / CD pipelines. Perhaps the best way to learn how to write YAML pipelines is to be taken through how to provision infrastructure with YAML, PowerShell, and BICEP. The initial focus will be the provisioning of the resource group, and therefore the students will learn the following:

How to Create an Azure Service Connection
Cloning an Azure DevOps Repository
Writing PowerShell Script to Provision a Resource Group
How to Add Stages, Jobs, and Steps in a YAML pipeline template
Running the YAML pipeline on Azure DevOps
How to develop Azure Variable Groups and pass them into YAML templates
How to override BICEP parameters using YAML

One aspect of professionalism in coding is how projects are structured for coding efficiency and ease of management; the other aspect is the naming convention of resources. The course will take students through the following:

Creating Project Structures for a DevOps and BICEP project using Bash and Git
Establish a standard naming convention for resources using BICEP and PowerShell

The heart of provisioning and deploying infrastructure in Azure is the adoption of BICEP, and students will learn the following in terms of developing BICEP in a professional manner;

Development of a BICEP template to provision Log Analytics and Data Factory
How to add Input Parameters to a BICEP template
How to create BICEP Modules for Log Analytics and Data Factory
How to add Tagging Information to BICEP modules
How to structure a naming convention with BICEP
How to use run time and compile time variables and parameters
How to write a PowerShell Script to Transpile BICEP to an ARM template
How to Manage Dependencies between Resources with BICEP
How to manage BICEP template errors

Who this course is for:

Aspiring Data Engineers
Developers who are curious about Azure Data Factory as an ETL alternative

Course content

Introduction - Build your first Azure Data Pipeline • 9 lectures • 1hr 1min

Metadata Driven Ingestion • 34 lectures • 4hr 19min

Event Driven Ingestion • 19 lectures • 1hr 5min

Introduction and Installation (Bonus Course - Provision Infra with Azure BICEP) • 9 lectures • 45min

Git Crash Course (Bonus Course - Provision Infra with Azure BICEP) • 11 lectures • 57min

Azure DevOps Pipeline Agents (Bonus Course - Provision Infra with Azure BICEP) • 7 lectures • 34min

Introduction to YAML Pipelines (Bonus Course - Provision Infra with Azure BICEP) • 7 lectures • 47min

Create Project Structure for Log Analytics • 4 lectures • 12min

Log Analytics BICEP Module (Bonus Course - Provision Infra with Azure BICEP) • 8 lectures • 44min

Compile BICEP to ARM (Bonus Course - Provision Infra with Azure BICEP) • 3 lectures • 16min