
Discover Azure Synapse Analytics for data engineers, covering data integration, enterprise data warehousing, and big data analytics with serverless SQL pool, Spark pool, Synapse Link, and Power BI integration.
Create a free Azure account and explore 12 months of popular free services plus £150 or $200 free credit for 30 days, with student options available.
Explore the Azure portal and sign in at portal.azure.com to access resources. Navigate the menu, Home, Dashboard, All services, Favorites, and Recent resources, and find Azure Monitor and Microsoft Learn.
Explore Azure Synapse Analytics as a limitless analytics service that unifies data integration, enterprise data warehousing, and big data analytics, with scalable compute and storage and serverless or dedicated options.
Explore the evolution from traditional data warehouses to data lakes and the modern data warehouse, highlighting ETL, governance, unstructured data, and Azure Synapse Analytics implementation.
Trace the emergence of Azure Synapse Analytics as a unified platform for data integration, data lake, big data analytics, and reporting, featuring Synapse Pipelines, Spark Pool, and Serverless SQL Pool.
Create an Azure Synapse Analytics workspace using the guided wizard. Attach an Azure Data Lake Storage Gen2 account and container, and configure the serverless SQL pool.
Explore the Azure Synapse Analytics workspace to access built-in serverless SQL pool and manage dedicated SQL, Spark, and Data Explorer pools, plus configure access controls and the SQL endpoint.
Learn to access Azure Synapse Studio via the Azure portal or web.azuresynapse.net, and navigate its main areas (Home, Data, Develop, Integrate, Monitor, Manage) for unified development and monitoring.
Navigate the Data Hub in Synapse to manage workspace assets and linked storage, create serverless SQL databases, connect external data, and link datasets like the Bing COVID-19 dataset.
Develop scripts in the Develop hub by creating SQL scripts for serverless and dedicated SQL pools, KQL scripts for Data Explorer pools, notebooks for Spark, data flows, and samples from the gallery.
Learn how to create and manage data pipelines in Azure Synapse Analytics, using the Integrate hub to copy data, orchestrate transformations, and invoke notebooks, Spark jobs, or SQL procedures.
Explore the Monitor hub to track serverless SQL pool activity, pool status, and query execution for SQL and KQL workloads, while understanding billing based on data processed and reviewing pipeline runs.
Manage an Azure Synapse workspace by configuring pools (serverless, dedicated SQL; Spark; Data Explorer), creating linked services, pipelines and triggers, integration runtimes, security roles, and git integration.
Explore Azure Synapse Analytics capabilities through a hands-on project using New York taxi trip data, including data lake upload, functional dashboards and nonfunctional monitoring, and a detailed solution architecture.
Explore the NYC taxi data ecosystem, comparing yellow, green, for-hire, and high-volume vehicles, using data dictionaries, lookup tables, and the factbook to analyze trips from 2009 to 2021.
Overview of NYC taxi data files and formats used in the project, including trip data, taxi zones, calendar, and mapping files for rate codes and payments.
Upload the NYC taxi data to the data lake by creating a blob container in Azure Storage Explorer, then upload the raw folder organized by year and month.
Outline data discovery, ingestion, and transformation requirements for a data lake: ensure data quality, apply a schema, enable T-SQL access with pay-per-query billing, store data in Parquet, and support BI and IoT reporting.
Explore the solution architecture for Azure Synapse Analytics, covering the four compute options with serverless SQL pool chosen as the compute engine, bronze, silver, and gold data layers, external tables, Parquet, and Power BI integration.
Explore the Serverless SQL pool in Azure Synapse Analytics, its architecture, features, and cost model; learn to connect from Azure Data Studio and work with T-SQL statements and limitations.
Explore serverless SQL pool in Azure Synapse, a pay-per-query engine that reads data from the data lake using T-SQL, with the Polaris engine's control node and compute node architecture, and external tables.
Explore serverless SQL pool cost control by analyzing what counts as data processed, including the data read, partitioning, and Parquet metadata, and by implementing daily, weekly, and monthly limits through the UI or T-SQL.
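To make the billing model concrete, here is a minimal Python sketch of how per-query data processed adds up. It assumes the documented rounding rule (each query is rounded up to the nearest 1 MB, with a 10 MB minimum) and treats the price per TB as a parameter because the real rate varies by region; the 5.0 default, the binary units, and the helper names are illustrative assumptions, not Synapse APIs.

```python
import math

MB = 1024 ** 2  # assumption: binary units, for illustration only
TB = 1024 ** 4


def billed_bytes(bytes_processed: int) -> int:
    """Round one query's data processed up to the nearest MB, minimum 10 MB."""
    megabytes = max(10, math.ceil(bytes_processed / MB))
    return megabytes * MB


def estimate_cost(bytes_per_query: list[int], price_per_tb: float = 5.0) -> float:
    """Estimated cost of a batch of queries; price_per_tb is an assumed rate."""
    total = sum(billed_bytes(b) for b in bytes_per_query)
    return total / TB * price_per_tb


def exceeds_limit(bytes_per_query: list[int], limit_tb: float) -> bool:
    """True if the billed total would go over a daily/weekly/monthly TB limit."""
    total = sum(billed_bytes(b) for b in bytes_per_query)
    return total / TB > limit_tb
```

The 10 MB minimum is why many tiny queries can cost more than expected, and why reading fewer columns from Parquet reduces the data processed.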
Connect to Azure Synapse Serverless SQL pool from Azure Data Studio using SQL login or Azure Active Directory, then run queries and explore notebooks, IntelliSense, and source control.
Explore reading delimited files with the OPENROWSET function in Azure Synapse, handling headers, delimiters, and escaping. Learn to specify data types and query subsets of columns.
Use the OPENROWSET function to read files in remote Azure storage and return their contents as rows, from CSV, Parquet, or Delta formats. It requires the BULK and FORMAT parameters, with optional reject options.
Learn to read the taxi zone CSV file from the data lake with the OPENROWSET function, set the header row, and specify the field and row terminator options, using parser version 2.0.
Explore inferring data types and defining explicit ones for data in CSV files, using sp_describe_first_result_set and the maximum column length to optimize performance and reduce costs in a serverless SQL pool.
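Finding the longest value in each column is what lets you size explicit varchar columns instead of accepting inferred defaults. The Python sketch below does that check locally on a small CSV sample; it measures UTF-8 bytes because varchar(n) lengths are in bytes. The function name and sample rows are illustrative assumptions, and the first row is assumed to be a header.

```python
import csv
import io


def max_column_bytes(csv_text: str, delimiter: str = ",") -> dict[str, int]:
    """Return the longest value, in UTF-8 bytes, seen in each column.

    Assumes the first row is a header (like HEADER_ROW = TRUE).
    """
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    lengths = {name: 0 for name in reader.fieldnames or []}
    for row in reader:
        for name, value in row.items():
            # varchar(n) is sized in bytes, so count encoded bytes, not characters
            lengths[name] = max(lengths[name], len((value or "").encode("utf-8")))
    return lengths


sample = (
    "LocationID,Borough,Zone,service_zone\n"
    "1,EWR,Newark Airport,EWR\n"
    "2,Queens,Jamaica Bay,Boro Zone\n"
)
print(max_column_bytes(sample))
```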
Apply a UTF-8 collation to avoid implicit conversions by specifying it at the column level or database level, and verify the default collation in Azure Synapse Analytics.
Select a subset of columns in Azure Synapse from files with or without headers, using ordinal positions and the FIRSTROW option to improve performance and control column naming.
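Selecting columns by ordinal position means the header row is skipped and the output names come from you, not from the file. This Python sketch mimics that locally: `columns` maps an output name to a 1-based ordinal, in the spirit of a WITH clause such as `(borough VARCHAR(15) 2, zone VARCHAR(50) 3)`, and `first_row=2` plays the role of FIRSTROW = 2. The function name and sample are hypothetical.

```python
import csv
import io


def select_by_ordinal(csv_text, columns, first_row=2, delimiter=","):
    """Pick columns by 1-based ordinal position, skipping rows before first_row.

    columns: dict mapping an output column name to a 1-based ordinal.
    """
    rows = []
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    for row_number, fields in enumerate(reader, start=1):
        if row_number < first_row:
            continue  # first_row=2 skips the header line
        rows.append({name: fields[pos - 1] for name, pos in columns.items()})
    return rows


sample = (
    "LocationID,Borough,Zone,service_zone\n"
    "1,EWR,Newark Airport,EWR\n"
    "2,Queens,Jamaica Bay,Boro Zone\n"
)
print(select_by_ordinal(sample, {"borough": 2, "zone": 3}))
```

Because names are supplied by the caller, the same call works for a file with no header by passing `first_row=1`.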
Debug data type mismatches and truncation errors in Azure Synapse Analytics by using the clearer error messages from parser version 1.0, then restore the zone column length to 50.
Create external data sources in Azure Synapse that point to storage containers so URLs are not hard-coded, then use them in SELECT queries to access the raw, bronze, silver, and gold data.
Craft a hand-written SELECT on the calendar CSV file using OPENROWSET, define a WITH clause, and join with trip data to report weekday versus weekend statistics.
Learn to handle delimiter conflicts in vendor data files in Azure Synapse Analytics by using an escape character or a field quote, so the CSV parser preserves commas within the data.
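The delimiter conflict and its two fixes can be reproduced with Python's `csv` module, whose `quotechar` and `escapechar` settings are rough counterparts of OPENROWSET's FIELDQUOTE and ESCAPECHAR options. The function name and the vendor row are illustrative; this is a local analogy, not Synapse code.

```python
import csv


def parse_vendor_lines(lines, escape_char=None):
    """Parse comma-separated vendor rows.

    With escape_char=None, a comma inside a value must be protected by
    double quotes (the field quote). With escape_char set (e.g. a backslash),
    the character before a comma makes it part of the value instead.
    """
    if escape_char is None:
        reader = csv.reader(lines, quotechar='"')
    else:
        reader = csv.reader(lines, escapechar=escape_char)
    return [row for row in reader]


# Unprotected comma: the vendor name is split into two columns
print(parse_vendor_lines(["1,Creative Mobile Technologies, LLC"]))
# Field quote and escape character both keep the name in one column
print(parse_vendor_lines(['1,"Creative Mobile Technologies, LLC"']))
print(parse_vendor_lines(["1,Creative Mobile Technologies\\, LLC"], escape_char="\\"))
```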
Demonstrate reading a tab-separated values file in Azure Synapse by explicitly setting the field terminator to a tab, aliasing the dataset in Synapse Studio, and publishing the changes to the Synapse repository.
Explore processing line-delimited, standard, and classic multi-line JSON with the CSV parser and OPENROWSET, then extract data using the JSON_VALUE and OPENJSON functions.
Learn to parse line-delimited JSON with the CSV parser, extract the payment type and payment type description using JSON_VALUE, and cast the results to smallint and varchar in Azure Synapse Analytics.
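Line-delimited JSON holds one complete document per line, so each line can be read as a single string and then parsed. The Python sketch below shows that idea locally: `json.loads` per line stands in for JSON_VALUE, and `int()` and `str()` stand in for the casts. The key names `payment_type` and `payment_type_desc` are assumptions about the file's layout.

```python
import json


def parse_line_delimited(text):
    """Parse one JSON document per line and project typed columns."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        doc = json.loads(line)
        # int() stands in for CAST(... AS SMALLINT), str() for VARCHAR
        rows.append((int(doc["payment_type"]), str(doc["payment_type_desc"])))
    return rows


sample = (
    '{"payment_type": 1, "payment_type_desc": "Credit card"}\n'
    '{"payment_type": 2, "payment_type_desc": "Cash"}\n'
)
print(parse_line_delimited(sample))
```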
OPENJSON converts JSON into rows and columns, supports explicit data types and arrays, and makes column naming easier, as demonstrated on a payment type dataset.
Discover how to query and explode JSON arrays in Azure Synapse Analytics using JSON_VALUE and OPENJSON with CROSS APPLY, extracting payment type descriptions from nested arrays.
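Exploding an array means each element becomes its own row, repeating the parent's columns, which is what CROSS APPLY with OPENJSON produces. This Python sketch shows the same row-multiplication locally. The nested layout (`payment_type_desc` as an array of objects with `sub_type` and `value`) is a hypothetical example, not the course's exact file.

```python
import json


def explode_descriptions(text):
    """Emit one row per nested array element, repeating the parent key.

    A document whose array has n elements yields n rows; a document with an
    empty array yields none, matching CROSS APPLY rather than OUTER APPLY.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        doc = json.loads(line)
        for item in doc["payment_type_desc"]:
            rows.append((doc["payment_type"], item["sub_type"], item["value"]))
    return rows


sample = (
    '{"payment_type": 1, "payment_type_desc": '
    '[{"sub_type": 1, "value": "Credit card"}, {"sub_type": 2, "value": "Debit card"}]}\n'
    '{"payment_type": 2, "payment_type_desc": [{"sub_type": 1, "value": "Cash"}]}\n'
    '{"payment_type": 3, "payment_type_desc": []}\n'
)
print(explode_descriptions(sample))
```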
Explore processing standard JSON files in Azure Synapse by reading the entire JSON string from the file with a vertical tab as the terminator, then using OPENJSON to extract six elements into two columns.
Learn to process multi-line JSON by reading it into a single JSON string, overriding the row terminator to a vertical tab, and using the OPENJSON function.
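Multi-line JSON cannot be parsed line by line because no single line is a complete document. Overriding the terminators to a character that never appears (the vertical tab) makes the whole file one value, which is the same idea as the Python sketch below: parse the entire file content as one string. The function name and key names are assumptions for illustration.

```python
import json


def parse_whole_file(text):
    """Parse a JSON array that spans many lines by treating the file as one string."""
    documents = json.loads(text)  # the whole file is a single JSON array
    return [(d["payment_type"], d["payment_type_desc"]) for d in documents]


sample = """[
  {
    "payment_type": 1,
    "payment_type_desc": "Credit card"
  },
  {
    "payment_type": 2,
    "payment_type_desc": "Cash"
  }
]
"""
print(parse_whole_file(sample))
```

Trying `json.loads` on any single line of `sample`, such as `"  {"`, raises `json.JSONDecodeError`, which is the failure the terminator override avoids.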
Explore querying large, partitioned data across year and month folders using recursive folder access, wildcard patterns, and file-level or folder-level selections in Azure Synapse Analytics.
Learn to use the filename() and filepath() functions to attach file metadata, count records by file, and extract the year and month for partitioned queries.
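In a wildcard path, filepath(1), filepath(2), and so on return the text matched by the first, second, and later wildcards, which is how year and month can be pulled out of folder names. The Python sketch below imitates that matching with a regular expression and counts records per (year, month). The `year=*/month=*` folder layout, the paths, and the function names are assumptions for illustration.

```python
import re
from collections import Counter


def filepath_parts(pattern, path):
    """Return the text matched by each '*' in pattern, or None if no match.

    Each wildcard matches within a single folder level (no '/').
    """
    pieces = [re.escape(part) for part in pattern.split("*")]
    regex = "^" + "([^/]*)".join(pieces) + "$"
    match = re.match(regex, path)
    return match.groups() if match else None


def count_records_by_partition(pattern, record_paths):
    """Count records per (year, month), given each record's source file path.

    Assumes the first two wildcards in pattern are the year and month.
    """
    counts = Counter()
    for path in record_paths:
        parts = filepath_parts(pattern, path)
        if parts:
            counts[(parts[0], parts[1])] += 1
    return counts


pattern = "trip_data_green_csv/year=*/month=*/*.csv"
jan = "trip_data_green_csv/year=2020/month=01/green_tripdata_2020-01.csv"
feb = "trip_data_green_csv/year=2020/month=02/green_tripdata_2020-02.csv"
print(filepath_parts(pattern, jan))
print(count_records_by_partition(pattern, [jan, jan, feb]))
```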
Learn to query Parquet files in Azure Synapse by reading folders with automatic schema inference, refining the data types, and saving cost while boosting performance by selecting only the needed columns.
Query folders and subfolders in Azure Synapse Analytics to read partitioned Parquet data using wildcard characters, the filename() and filepath() functions, and recursive folder access.
Discover how Delta Lake stores data as Parquet files with a delta log that enables transactions and time travel, and why pointing queries at the main folder only and partitioning by year and month matter.
Explore data discovery by querying files directly without loading into databases, identify records and counts per day, week, and month, and join datasets with simple transformations to drive business value.
Identify duplicates in a file by counting records per primary key (location ID) and filtering with HAVING COUNT(*) > 1 in Azure Synapse Analytics.
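The duplicate check is a GROUP BY on the key followed by a filter on the group count. Here is the same logic as a short Python sketch, with `location_id` as an assumed key name and made-up rows.

```python
from collections import Counter


def duplicate_keys(rows, key="location_id"):
    """Keys that occur more than once: GROUP BY key HAVING COUNT(*) > 1."""
    counts = Counter(row[key] for row in rows)
    return {k: n for k, n in counts.items() if n > 1}


rows = [{"location_id": 1}, {"location_id": 2}, {"location_id": 1}]
print(duplicate_keys(rows))
```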
Identify data quality issues in the total amount field using basic checks (min, max, average) and null counts, revealing negative values and null payment types to inform data cleaning and reporting.
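A quick profile of a numeric column is often enough to surface problems such as negative fares. This Python sketch computes the same basic checks, following SQL's rule that MIN, MAX, and AVG ignore NULLs (represented here as `None`). The function name and sample amounts are illustrative.

```python
def profile_amounts(values):
    """MIN, MAX, AVG over non-null values, plus null and negative counts."""
    present = [v for v in values if v is not None]
    return {
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "avg": sum(present) / len(present) if present else None,
        "nulls": len(values) - len(present),
        "negatives": sum(1 for v in present if v < 0),
    }


print(profile_amounts([10.0, -5.0, None, 25.0]))
```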
Join files to compute trips per borough by combining trip data with taxi zone data using OPENROWSET joins. Ensure location_id is not null, then group by borough and chart the results.
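The join-then-group pattern can be sketched in Python with a lookup dictionary standing in for the taxi zone file. Trips with a null or unmatched location drop out, as they would in an inner join. The key name `pu_location_id` and the sample data are assumptions.

```python
from collections import Counter


def trips_per_borough(trips, zones):
    """Inner-join trips to zones on pickup location and count trips per borough.

    trips: iterable of dicts with a "pu_location_id" key (assumed name).
    zones: dict mapping location_id -> borough.
    """
    counts = Counter()
    for trip in trips:
        location_id = trip.get("pu_location_id")
        if location_id is None or location_id not in zones:
            continue  # null or unmatched keys are excluded, as in an inner join
        counts[zones[location_id]] += 1
    return counts


zones = {1: "EWR", 2: "Queens", 3: "Queens"}
trips = [
    {"pu_location_id": 2},
    {"pu_location_id": 3},
    {"pu_location_id": 1},
    {"pu_location_id": None},
    {"pu_location_id": 99},
]
print(trips_per_borough(trips, zones))
```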
Compute trip duration in hours by taking the difference between pickup time and drop-off time using DATEDIFF, then group by hourly ranges to count trips while filtering out invalid records.
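Bucketing by whole hours is easiest from elapsed minutes: minutes divided by 60 gives the starting hour of each range. The Python sketch below does this locally and drops trips whose drop-off precedes pickup. It is only close to a DATEDIFF(minute, ...) / 60 approach, because DATEDIFF counts boundaries crossed rather than elapsed time, so results can differ slightly near minute boundaries. The function name and data are illustrative.

```python
from collections import Counter
from datetime import datetime


def trips_by_duration_hours(trips):
    """Count trips per whole-hour range [h, h + 1), keyed by h.

    trips: iterable of (pickup, dropoff) datetime pairs.
    Trips with a negative duration are treated as invalid and skipped.
    """
    counts = Counter()
    for pickup, dropoff in trips:
        minutes = int((dropoff - pickup).total_seconds() // 60)
        if minutes < 0:
            continue  # drop-off before pickup: invalid record
        counts[minutes // 60] += 1
    return counts


start = datetime(2020, 1, 1, 10, 0)
trips = [
    (start, datetime(2020, 1, 1, 10, 30)),  # 30 minutes -> range 0-1
    (start, datetime(2020, 1, 1, 11, 30)),  # 90 minutes -> range 1-2
    (start, datetime(2020, 1, 1, 10, 59)),  # 59 minutes -> range 0-1
    (start, datetime(2020, 1, 1, 9, 55)),   # negative -> filtered out
]
print(trips_by_duration_hours(trips))
```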
Identify the percentage of cash and credit card transactions by borough by joining trip data, taxi zones, and payment types, then computing totals and percentages to guide campaigns.
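The percentage calculation joins two lookups onto the trips, counts trips per borough, and divides each payment type's count by that total. A minimal Python sketch follows; the `"Cash"` and `"Credit card"` labels, the tuple layout, and the sample data are assumptions, and rows that fail either lookup are dropped as in an inner join.

```python
from collections import defaultdict


def payment_share_by_borough(trips, zones, payment_types):
    """Percentage of cash and credit card trips per borough.

    trips: iterable of (pu_location_id, payment_type) pairs.
    zones: location_id -> borough; payment_types: code -> description.
    """
    totals = defaultdict(int)
    by_type = defaultdict(lambda: defaultdict(int))
    for location_id, payment_type in trips:
        borough = zones.get(location_id)
        description = payment_types.get(payment_type)
        if borough is None or description is None:
            continue  # unmatched rows drop out, as in an inner join
        totals[borough] += 1
        by_type[borough][description] += 1
    return {
        borough: {
            "cash_pct": round(100 * by_type[borough]["Cash"] / total, 2),
            "card_pct": round(100 * by_type[borough]["Credit card"] / total, 2),
        }
        for borough, total in totals.items()
    }


zones = {1: "EWR", 2: "Queens"}
payment_types = {1: "Credit card", 2: "Cash", 3: "No charge"}
trips = [(2, 1), (2, 2), (2, 2), (2, 3), (1, 1), (None, 1)]
print(payment_share_by_borough(trips, zones, payment_types))
```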
Welcome!
I am looking forward to helping you learn one of the most in-demand data engineering tools in the cloud, Azure Synapse Analytics! This course is taught by implementing a data engineering solution using Azure Synapse Analytics for a real-world project that analyses and reports on NYC Taxi trips data.
This is like no other course on Udemy for Azure Synapse Analytics. Once you have completed the course, including all the assignments, I strongly believe that you will be in a position to start a real-world data engineering project on your own and will also be proficient in Azure Synapse Analytics. The primary focus of the course is Azure Synapse Analytics, but it also covers the relevant concepts and connectivity to the other technologies mentioned.
The course follows the logical progression of a real-world project implementation, with technical concepts explained and the scripts and notebooks built at the same time. Even though this course is not specifically designed to teach you the skills required to pass the exams Azure Data Engineer Associate Certification [DP-203] or Designing and Implementing Enterprise-Scale Analytics Solutions Using Microsoft Azure and Microsoft Power BI [DP-500], it can greatly help you gain most of the skills required for them.
I value your time as much as I do mine, so I have designed this course to be fast-paced and to the point. The course is also taught in simple English with no jargon. I start from the basics, and by the end of the course you will be proficient in the technologies used.
Currently the course teaches you the following:
Azure Synapse Analytics Architecture
Serverless SQL Pool
Spark Pool
Dedicated SQL Pool
Synapse Pipelines
Synapse Link for Cosmos DB / Hybrid Transactional and Analytical Processing (HTAP) capability
Power BI Integration with Azure Synapse Analytics
Azure Data Lake Storage Gen2 integration with Azure Synapse Analytics
A project on NYC Taxi Trips data using the above technologies
Please note that the following are not currently covered:
Data Flows
Advanced concepts around Dedicated SQL Pool
Spark Programming
SQL Fundamentals