Data Engineering on Microsoft Azure: The Definitive Guide

Name: Data Engineering on Microsoft Azure: The Definitive Guide
Rating: 4.3 (619 reviews)

Hands-On Introduction to Azure Data Services. Learn Data Factory, Synapse Analytics, SQL Database, and more

Created by Wadson Guimatsa

Last updated 3/2022

English

English [Auto], Japanese [Auto]

What you'll learn

Provision Azure SQL Databases. Use query tools such as Azure Data Studio or SQL Server Management Studio (SSMS) to connect to Azure SQL databases
Create storage accounts to store unstructured and semi-structured data. Use Azure Data Studio to upload data to Azure
Manage identities, keys, and secrets across different data platform technologies using Azure Key Vault
Use the Azure CLI to generate a Shared Access Signature (SAS). Learn how to access storage resources using SAS
Build an ETL pipeline using Azure Data Factory. Create advanced transformation logic using Data Flows
Trigger a Data Pipeline based on storage events or at a specific time
Use T-SQL to query Relational Databases and data in Azure Storage
Transform data using Azure Synapse Analytics. Create external tables to read or write data to files in Azure Storage
Build a modern data warehouse using Azure Synapse Analytics
Analyze data using serverless Apache Spark pool in Synapse Analytics

Course content

8 sections • 96 lectures • 11h 56m total length

Introduction • 2:26
Discover how to use Microsoft Azure for data engineering, from Azure SQL databases and Azure Blob Storage to Azure Data Factory pipelines and Azure Synapse Analytics.
What is Data - A simple definition • 5:15
Define data and categorize its sources and structure, from devices to observations. Examine categorical, ordinal, and quantitative data, and identify Microsoft Azure services to store and process this data.
What is Structured Data • 3:47
Learn how structured, relational data is stored in rows and columns, with primary and foreign keys, using SQL in relational databases such as Azure SQL and other systems.
What is Non Relational Data • 2:52
Explain non-relational data by comparing semi-structured JSON-based records with unstructured data, noting that fields can vary across records, and outline formats such as Apache Parquet, ORC, and Avro, along with Azure storage.
What is Data Ingestion • 3:59
Data ingestion moves data from its origin into storage, processing, and analysis systems, in real time or in batches, using Azure IoT Hub, Apache Kafka, Azure Data Factory, and Synapse pipelines.
What is Data Processing • 6:40
Understand data processing from raw data to meaningful derived data, including data cleansing and transformations. Compare batch processing and stream processing, with workflows and Azure or open-source tools.
Batch Processing vs Stream Processing • 3:00
What is Data Analytics • 8:33

Create a Single Instance Database • 8:30
Create a single-instance Azure SQL Database to store relational data, configure a resource group, server, and firewall, and explore the Adventure Works sample database with Query Editor.
What is a Resource Group • 4:01
Learn how a resource in Azure encompasses running cloud components like databases, servers, and networks, and how resource groups organize these resources by life cycle and security.
Azure Data Studio - Introduction • 8:30
Install Azure Data Studio, connect to an Azure SQL Server, configure firewall to allow your IP address, and explore databases with autocompletion.
Create a Virtual Machine • 7:03
Create and connect to a Linux virtual machine in the Azure portal, set up a resource group and network, and access an Azure SQL database from the VM using SSH.
Connect to Azure Database from an Azure Virtual Machine • 10:07
Install the mssql CLI on the Ubuntu 18.04 VM and connect to Azure SQL using server, database, and credentials. Enable Azure services access to permit the connection.
Authentication and Authorization • 11:02
Understand authentication and authorization in Azure SQL Database and SQL Server, using payroll, log files, and settings databases as examples.
Create a SQL Login and User • 10:03
Create a SQL login and a database user in Azure SQL, map the login to the AdventureWorks user, and grant read-only access.
SQL Server Management Studio - Introduction • 5:07
Install and use SQL Server Management Studio to administer Azure SQL databases, compare it with Azure Data Studio, connect via object explorer, and launch Azure Data Studio from SSMS.
Understanding Tables and Views • 7:45
Explore tables and views in a relational database, showing how tables store data in rows and columns with primary keys and constraints, and how views store SQL queries for reusable results.
How to Create a Database Diagram in SSMS • 4:15
Create a database diagram in SSMS to reveal relationships between tables, use foreign keys to reference related data, and visualize table connections in a relational database.
Azure Cost Management - How to Create a Budget in Azure • 11:21

Introduction to Azure Storage • 2:57
Create an Azure Storage Account • 7:38
Create an Azure storage account in the portal by selecting a subscription, resource group, and a unique name, then configure region, standard performance, redundancy, soft delete, and access tier.
Upload Data using the Azure Portal • 8:37
Upload data to an Azure storage account via the Azure portal by creating a private blob container, uploading files, and exploring blob management, hot access tier, and soft delete recovery.
Upload Data using Azure Storage Explorer • 11:15
Learn to use Azure Storage Explorer to manage Azure cloud storage from a desktop, connect to storage accounts and containers, and upload data using access keys.
Connect a Storage Account with a Shared Access Signature • 12:11
Learn how to grant restricted, time-limited access to Azure storage resources using shared access signatures, including account-level and service-level SAS, with guidance on creation and rotation in the portal.
Azure CLI - Generate a Shared Access Signature • 13:09
Generate a shared access signature with the Azure CLI by specifying account name, container, permissions, expiry, and authentication mode, then build the valid URL.
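As a rough illustration of that last step, the Python sketch below joins a blob endpoint and a SAS token into one URL. The account, container, and blob names and the token string are placeholders invented for this example, not values from the course; a real token would come from the Azure CLI.

```python
def build_sas_url(account: str, container: str, blob: str, sas_token: str) -> str:
    """Join a blob endpoint and a SAS token into a single URL."""
    # A token may or may not arrive with a leading "?"; normalise it first.
    token = sas_token.lstrip("?")
    return f"https://{account}.blob.core.windows.net/{container}/{blob}?{token}"


# Placeholder values for illustration only (this is not a real token).
url = build_sas_url(
    "mystorageacct", "input", "cities.json", "?sp=r&se=2030-01-01&sig=PLACEHOLDER"
)
print(url)
```

Anyone holding the resulting URL can use the permissions encoded in the token until it expires, so it is best treated like a secret.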
Understanding Data Redundancy • 5:47
Explore how Azure organizes data centers into regions and availability zones to ensure data redundancy, region-based resource deployment, and geography-driven privacy and compliance for storage and applications.
Azure Storage Account - Redundancy Options • 10:09
Compare locally redundant storage, zone redundant storage, geo redundant storage, and geo zone redundant storage to balance cost and high availability through primary-region replication and optional secondary-region access.

Section Intro • 2:23
Explore Azure Data Factory to build pipelines. Extract from Azure SQL and CSV, transform with mapping data flows, and load into storage using linked services, datasets, activities, and integration runtimes.
Create Data Store and Target • 12:00
Create a new SQL Server and database in a resource group. Import the Wide World Importers BACPAC file with SQL Server Management Studio and configure the firewall.
Create Data Factory and Linked Services • 13:04
Create an Azure Data Factory v2 and two linked services for the SQL database and Azure Blob Storage, showing how to connect to external resources and validate connections.
Create Datasets • 6:33
Create two datasets in Data Factory: a SQL Server table dataset using the SQL linked service, and a Blob Storage CSV dataset with a pipe delimiter, then publish both.
Create Pipeline and Activities • 8:33
Create an Azure Data Factory pipeline to move data from SQL Server to Azure Blob Storage using a copy data operation, datasets, and a CSV sink, with validation and debugging.
Create Mapping Data Flow and Adding Sources • 7:17
Mapping Data Flow - Joining Sources • 3:17
Join a sales order CSV with the sales order line SQL table using an inner join on order ID, then aggregate by customer ID and order ID to count items.
Mapping Data Flow - Aggregate Data • 5:55
Add an aggregate transformation to group by customer ID and stock item ID, sum the quantity to create a total, and write the results to an inline dataset sink.
Mapping Data Flow Execution • 4:10
Execute a data flow by first copying data from a SQL table to CSV, then reading it with the data flow to join the order data and output JSON.
Mapping Data Flow and Apache Spark Execution • 6:58
Explore how data flows transform data without code and execute on a scaled Apache Spark cluster within an Azure Data Factory pipeline, producing many small part files from parallel partitions.

Introduction • 2:34
Build a data pipeline in Azure Data Factory, reading JSON and copying from Azure SQL to Parquet. Use a mapping data flow to derive columns, aggregate, and produce CSV outputs.
Cost Warning - Data Pipeline Pricing • 3:35
IMPORTANT Download Resources Before Starting • 0:05
Azure SQL - Contained Users • 9:45
Learn how contained database users store authentication inside the database, making databases portable across environments. Create Azure Data Factory SQL user, set a strong password, and grant data reader access.
Azure Key Vault - Store SQL Server Secrets • 8:28
Explain why hardcoding SQL server credentials in Azure Data Factory is risky, and show creating an Azure Key Vault to store SQL server secrets and link them to Data Factory.
Azure Key Vault - Linked Service • 7:33
Create Azure Storage Account • 2:47
Create a storage account in Azure with a resource group and a unique name, then create an input container and a transaction-optimized output file share for data factory access.
Azure Managed Identity - Create a Linked Service To Azure Blob Storage • 10:45
Create a dataset from the delivery info JSON file, extracting invoice ID and date. Set up a linked service to Azure Blob Storage using a managed identity and Azure Active Directory authentication.
Azure Role Based Access Control - Grant Access To Managed Identity • 11:48
Grant a managed identity read-only access to a specific blob container using Azure RBAC, assigning the Storage Blob Data Reader role to a data factory.
Create a Dataset for the Lookup Activity • 3:41
Create a container named metadata-container-dev, upload cities-to-process.json, and grant the data factory Storage Blob Data Contributor access. Then define a managed-identity dataset, test the path, and preview city_id and city_name.
Azure Data Factory - Lookup Activity • 5:43
Explore Azure Data Factory's lookup activity to process data by city using a metadata JSON file, avoiding hard-coded city IDs, and enable dynamic per-city pipelines.
Azure Data Factory - ForEach Activity & Pipeline Expressions • 9:08
Learn to implement ForEach loops in Azure Data Factory, parameterize items with dynamic content and expressions, and pass lookup outputs to copy data into Parquet files per city.
Azure Data Factory - ForEach Activity - Part II • 7:40
Build a Data Factory pipeline using the ForEach activity to process three cities, with managed identity access, datasets, and linked services, and validate the Parquet outputs.
Parameterize a Dataset Part I - Container Name • 9:01
Parameterize the sink container name and include city names in file names, creating date-based folders and passing pipeline parameters to datasets in Azure Data Factory.
Parameterize a Dataset Part II - Directory Name • 3:41
Parameterize the storage directory with the expression language and the current date. Format the current UTC date to create a year-month-day folder, publish the pipeline, and debug.
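The naming scheme described in these parameterization lectures can be sketched in plain Python. This is only an analogy for the idea, not Data Factory expression syntax; the function name, container name, and city names are hypothetical.

```python
from datetime import datetime, timezone


def dated_blob_path(container: str, city_name: str, now: datetime) -> str:
    """Build <container>/<yyyy-MM-dd>/<city>.parquet for one city."""
    # Folder name comes from the UTC date of the run.
    folder = now.astimezone(timezone.utc).strftime("%Y-%m-%d")
    # File name carries the city, lower-cased with spaces replaced.
    file_name = city_name.lower().replace(" ", "_") + ".parquet"
    return f"{container}/{folder}/{file_name}"


# Fixed timestamp so the result is reproducible; a real run would use the current UTC time.
run_time = datetime(2022, 3, 15, 23, 30, tzinfo=timezone.utc)
paths = [dated_blob_path("output", city, run_time) for city in ["Abbottsburg", "San Jose"]]
print(paths)
```

Deriving the folder from UTC rather than local time keeps runs from different regions in the same daily folder.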
Parameterize a Dataset Part III - File Name • 7:54
Mapping Data Flow - JSON Source • 8:44
Create a mapping data flow that merges the delivery info JSON with a Parquet file by invoice ID, while handling ZipDeflate compression and disabling soft delete, to generate delivery reports.
Mapping Data Flow - Parquet Source • 6:53
Read parquet source files organized by dynamic current date folders, use mapping data flow to create year-month-day paths, and infer schemas from parquet data.
Mapping Data Flow - JOIN & Derived Column Transformations • 9:35
Join delivery JSON and city Parquet data with an inner join on invoice ID and fix data types. Use a derived column to extract latitude and longitude, then drop unused fields.
Mapping Data Flow - Aggregate Transformation • 4:53
Master aggregate transformations in a data flow by grouping by city name, city ID, and delivery status, and count invoices per group to produce total invoices.
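For readers who think in code, the join and aggregate steps above resemble the following Python sketch. Mapping data flows are built visually and run on Spark, so this is only a conceptual equivalent; all sample rows and field names are invented for illustration.

```python
from collections import Counter

# Hypothetical sample rows standing in for the two sources.
deliveries = [
    {"invoice_id": 1, "status": "delivered"},
    {"invoice_id": 2, "status": "delivered"},
    {"invoice_id": 3, "status": "pending"},
    {"invoice_id": 9, "status": "pending"},  # no matching invoice, dropped by the inner join
]
invoices = [
    {"invoice_id": 1, "city_id": 10, "city_name": "Alpha"},
    {"invoice_id": 2, "city_id": 10, "city_name": "Alpha"},
    {"invoice_id": 3, "city_id": 20, "city_name": "Beta"},
]

# Inner join on invoice_id: keep only deliveries whose invoice exists.
by_invoice = {row["invoice_id"]: row for row in invoices}
joined = [
    {**by_invoice[d["invoice_id"]], **d}
    for d in deliveries
    if d["invoice_id"] in by_invoice
]

# Aggregate: count invoices per (city_name, city_id, status) group.
total_invoices = Counter(
    (row["city_name"], row["city_id"], row["status"]) for row in joined
)
print(dict(total_invoices))
```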
Mapping Data Flow - Parameterized CSV File Sink • 8:11
Add a parameterized CSV sink in a data flow, defining a dataset and Blob Storage linked service with container and directory parameters. Validate, publish, and generate part files via date-based paths.
Azure Data Factory - Store SAS In Azure Key Vault • 6:36
In Azure Data Factory, merge all CSV part files into a single file and store it in an Azure file share with a SAS secured by Azure Key Vault.
Azure Data Factory - Copy Activity Merge Behaviour • 7:33
Configure an Azure Data Factory pipeline to copy and merge multiple CSV files from an Azure file share using parameterized datasets and a merge files copy behavior.
Azure Data Factory - End To End Pipeline Execution • 3:43
Execute an end-to-end Azure Data Factory pipeline to process JSON inputs and Parquet files, producing CSV outputs, and monitor the run as the data flow merges into a single CSV.
Azure Data Factory - Storage Event Triggers • 11:27
Learn to automate Azure Data Factory pipelines with a storage event trigger that fires when cities-to-process.json uploads to storage, using Event Grid and blob created events.

IMPORTANT Download Resources Before Starting • 0:05
Data Processing - OLAP vs OLTP • 9:10
Compare online transaction processing (OLTP) and online analytical processing (OLAP) using Azure Synapse Analytics, highlighting ETL, data warehouses, data lakes, and serverless versus dedicated query options.
Azure Synapse Analytics - Create a Synapse workspace • 7:37
Create an Azure Synapse workspace in the portal, configure resource groups, data lake storage, and access roles, then explore Synapse Studio hubs for data, develop, integration, monitor, and manage.
Azure Synapse Analytics - Serverless SQL Pool Introduction • 8:24
Learn to query data lake with the serverless SQL pool in Azure Synapse, paying for data processed. Use parquet and delta formats and connect with Azure Data Studio or SSMS.
Serverless SQL pool - Connect with Azure AD User & Azure Data Studio • 9:17
Connect to a serverless SQL pool endpoint with Azure Data Studio using Azure AD authentication, configure access to storage accounts, and run queries against Parquet data in Azure Data Lake.
Serverless SQL pool - Server Level Credential • 9:55
Understand server-level credentials for serverless SQL pool to access Azure Storage with OPENROWSET, and compare server-level vs database-scoped credentials for shared access signatures.
Openrowset - Read Parquet Files • 11:59
Learn to use OPENROWSET to read remote data with Transact-SQL. Use the built-in bulk provider in Azure Synapse Analytics to query Parquet files in Azure Storage.
Openrowset - Read CSV Files • 17:09
Openrowset - Read JSON - Line Delimited JSON • 10:54
Query JSON documents with serverless SQL pool in Azure Synapse Analytics. Read line-delimited JSON and arrays, and extract customer ID and demographic data with JSON_VALUE and JSON_QUERY.
Openrowset - Read JSON - Array of Objects • 7:30
Learn to read a line-delimited JSON file containing an array of objects by using OPENROWSET with a row terminator and the OPENJSON function, returning documents as rows and columns.
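The two JSON layouts named in these lectures differ only in how documents are separated. The Python sketch below shows that difference with the standard json module; it does not reproduce the T-SQL taught in the course, and the field names and values are made up for the example.

```python
import json

# Line-delimited JSON: one complete document per line.
line_delimited = '{"customer_id": 1, "city": "Alpha"}\n{"customer_id": 2, "city": "Beta"}\n'
# A single document holding an array of objects.
array_of_objects = '[{"customer_id": 1, "city": "Alpha"}, {"customer_id": 2, "city": "Beta"}]'

# Each non-empty line is parsed on its own.
rows_from_lines = [json.loads(line) for line in line_delimited.splitlines() if line.strip()]
# The whole text is parsed once, yielding a list of objects.
rows_from_array = json.loads(array_of_objects)

customer_ids = [row["customer_id"] for row in rows_from_lines]
print(customer_ids, rows_from_lines == rows_from_array)
```

Both layouts end up as the same rows; what changes is whether the reader splits on line breaks first or expands one array.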
Serverless SQL pool - Introduction to External Tables • 5:18
Learn to use external tables in Azure Synapse Analytics to query or export data stored in Hadoop, Azure Blob Storage, or Azure Data Lake Storage with serverless SQL pool.
Serverless SQL pool - Create External Table - Part I • 6:42
Create an external table in serverless SQL pool by configuring a database, a master key, and database-scoped credentials; the table stores metadata while data stays in the storage account.
Serverless SQL pool - Create External Table - Part II • 9:00
Create an external table in Azure Synapse serverless pool by specifying database, schema, location, data source, and file format, then query CSV or parquet data as a native SQL table.
Serverless SQL pool - Create External Table III - How to Handle Dirty Records • 7:48
Explore how PolyBase external tables in serverless SQL pool handle dirty records with reject options, and learn to resolve data type mismatches by using string columns or dedicated pools.
Serverless SQL pool - CETAS - Create External Table As Select • 12:55
Explore CETAS in serverless SQL pool to create external tables from a SELECT and export results to Azure Storage, using Parquet and CSV formats.

Apache Spark - Architecture • 11:45
Learn how Apache Spark runs on a cluster in Azure Synapse Analytics, including the driver and executors, cluster manager, and node manager, enabling scalable data processing.
Create a Serverless Apache Spark Pool • 4:10
Create a serverless Apache Spark pool in Synapse Analytics with a three-node cluster, each node with 4 CPU cores and 32 GB RAM, then note that costs start only when a job runs.
Create and Run a Spark Notebook • 8:52
Create a Spark notebook from the Spark pool, load address.parquet into a Spark DataFrame, attach to the pool, and observe the driver and executors forming the cluster.
Scaling a Serverless Apache Spark Pool • 7:32
Learn how to scale a serverless Apache Spark pool in Synapse Analytics by adjusting node counts, executors, and driver, enabling autoscale options, and monitoring with the Spark History Server.
Azure Synapse Analytics - Workspace Quotas • 4:54
Learn how Azure Synapse Analytics workspace quotas cap CPU cores for data flows and Apache Spark pools, manage executors and drivers, and request capacity increases via the Azure portal.
Working with Azure Data Lake Storage • 9:06
Learn to read data from Azure Data Lake Storage in Synapse Analytics using a managed identity and linked services. Enable hierarchical namespaces for big data analytics.
Working with Azure Blob Storage • 12:33
Learn to read CSV data from Azure blob storage with Apache Spark in Synapse by creating a blob storage linked service and configuring authentication for multiple CSV files.
Working with Azure SQL • 11:51
Learn to connect Apache Spark to Azure SQL from Synapse using the Azure SQL Connector, configure the JDBC URL and Key Vault credentials, and read, query, or write data with Spark.
Practice - Configure your favorite IDE tool • 0:08

Dedicated SQL pool - Introduction • 7:27
Understand the dedicated SQL pool architecture, including the control node, compute nodes, data movement service, distributed tables, and distribution methods like round robin, hash, and replicate.
Synapse - Create a dedicated SQL pool • 7:10
Learn how to create a dedicated SQL pool in Synapse, name it, choose a performance level, estimate costs by data warehouse units, and manage compute with pause and scale options.
Load data - Copy Statement • 15:09
Learn to load data into a dedicated SQL pool with the COPY statement, including creating a schema and a data engineer user, granting permissions, and handling headers.
Load data - CREATE TABLE AS SELECT (CTAS) • 11:57
Learn to load data into a dedicated SQL pool by creating an external table, configuring data sources and file formats, and using CTAS to populate a read table from storage.
Star Schema - Architecture of a Data Warehouse • 7:08
Learn how a data warehouse uses a star schema to organize data into fact and dimension tables, using integration and staging tables, to relate sales to time, product, and customers.
Hash-distributed table • 8:31
Create a hash distributed table to improve performance of large fact tables by assigning rows to distributions with a hash function, and choose a distribution column.
Hash-distributed table - Choose the Distribution Column • 7:26
Learn how to check data distribution in a hash-distributed table, spot skew beyond 10%, and select a distribution column that balances rows to prevent performance issues.
Round-robin distributed table • 6:08
Learn to create a round-robin distributed table that evenly distributes rows across distributions, reducing data skew and enabling fast loading for staging tables, though joins may trigger data shuffles.
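The trade-off between the two distribution methods can be simulated in a few lines of Python. This is a toy model under stated assumptions: CRC32 stands in for the pool's internal hash function, the skew measure (gap between the fullest and emptiest distribution, relative to the fullest) is one simple choice rather than an official metric, and the sample rows are invented.

```python
import zlib
from collections import Counter

DISTRIBUTIONS = 60  # a dedicated SQL pool spreads each table over 60 distributions


def round_robin(rows):
    """Assign rows to distributions in turn, ignoring their values."""
    return Counter(i % DISTRIBUTIONS for i, _ in enumerate(rows))


def hash_distribute(rows, key):
    """Assign each row by a hash of its distribution column (CRC32 as a stand-in)."""
    return Counter(zlib.crc32(str(row[key]).encode()) % DISTRIBUTIONS for row in rows)


def skew(counts):
    """Relative gap between the fullest and emptiest distribution (0.0 is perfectly even)."""
    sizes = [counts.get(d, 0) for d in range(DISTRIBUTIONS)]
    return (max(sizes) - min(sizes)) / max(sizes)


# Hypothetical fact rows: only three distinct regions, so "region" is a poor distribution column.
rows = [{"order_id": i, "region": ["north", "south", "east"][i % 3]} for i in range(600)]

print(skew(round_robin(rows)))                # rows spread evenly over all distributions
print(skew(hash_distribute(rows, "region")))  # at most three distributions receive rows
```

A column with few distinct values leaves most distributions empty, which is the kind of skew the lecture warns about; a high-cardinality column spreads rows far more evenly while still keeping equal keys together for joins.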
Practice: Create and Load Data into a Dedicated SQL Pool • 1:19
Create and load a data warehouse with a dimension table and a fact table for the TPC dataset. Practice copying data into tables or using the Azure Data Factory copy activity.
Workload Management - How to manage query performance • 1:26

Requirements

Microsoft Azure Subscription
Basic SQL knowledge or programming skills
No Data Engineering experience is needed. You will learn everything you need to know

Description

Do you want to jump-start your career as an Azure Data Engineer?

Welcome to the Definitive Guide to Data Engineering on Microsoft Azure.
In this course, you will learn how to use the extensive family of Azure Data Services to build a modern data and analytics platform.

I'll take you step-by-step through engaging video tutorials and teach you everything you need to know to succeed as an Azure Data Engineer.

By the time you complete this course, you will be able to:

Architect a Data Solution on Azure
Integrate relational data and unstructured data
Build data processing pipelines using Azure Data Factory
Integrate and transform data from various data systems
Securely access data stores with Azure Key Vault and Azure role-based access control (Azure RBAC)
Perform exploratory data analysis with Azure Synapse Analytics
Manage your costs

The lectures in this course are hands-on and come with plenty of explanation.
I also use animations to break down complex topics.

There are SQL scripts and Jupyter notebooks that you can use to follow along easily.

You will be working closely with the documentation of Microsoft Azure as it is essential to know how to find the most up-to-date information about any service.

You will learn how to combine a range of Azure services to ingest, store and process data of all types and sizes from any data source.

Throughout this course, we will learn how to use and combine multiple Azure Services and tools, including:

Azure SQL Databases
Azure Storage
Azure Role-Based Access Control - RBAC
Azure Data Studio
Azure Storage Explorer
Azure Cost Management
Azure Data Factory
Azure Key Vault
Azure Managed Identity
Azure Synapse Analytics
- Synapse Workspace
- Azure Synapse Studio
- Serverless SQL Pool
- Serverless Apache Spark Pool
- Synapse Analytics Dedicated SQL Pool

At the beginning of each Data Service, you will be introduced to the Service, learn what it is for, and then learn how to use it.

Enroll now, and learn

Microsoft, Windows, Microsoft Azure, and all Azure Data Services are either registered trademarks or trademarks of the Microsoft group of companies.
This course is not certified, accredited, affiliated with, or endorsed by Microsoft Corporation.

Who this course is for:

Anyone who wants to start using Azure in their career & get paid for their cloud and Data Engineering Skills
Software developers curious about Data Engineering
Database Developers or Database Administrators (DBAs)

Course content

Introduction - Understanding Core Data Concepts • 8 lectures • 37min

Azure SQL - Introduction • 11 lectures • 1hr 28min

Azure Blob Storage - Introduction • 8 lectures • 1hr 12min

Azure Data Factory - Core Concepts • 10 lectures • 1hr 10min

Practice Section: Build an ETL Pipeline with Azure Data Factory • 25 lectures • 2hr 52min

Azure Synapse Analytics - Serverless SQL pool • 15 lectures • 2hr 14min

Azure Synapse Analytics - Serverless Apache Spark pool • 9 lectures • 1hr 11min

Azure Synapse Analytics - Dedicated SQL Pool • 10 lectures • 1hr 14min