AWS Glue - The Complete Masterclass

Rating: 4.3 (3731 reviews)

Master building complete AWS Glue ETL Pipelines, Glue Data Quality, Glue Data Brew along with other AWS resources

Bestseller

Role Play

Created by Data Soup

Last updated 2/2026

English

German [Auto], English [Auto]

What you'll learn

Understand the AWS Glue Data Catalog and create Glue databases, tables, and crawlers
Use AWS Glue Studio to create ETL pipelines with scheduled triggers, conditional triggers, and Glue workflows
Understand and create the AWS resources associated with Glue, including KMS keys, IAM roles, SNS topics, and S3 buckets
Understand AWS Glue Data Quality and create the associated Glue ETL pipeline
Understand AWS Glue Data Brew and create the recipe, project, and job to curate a dataset
Understand AWS Glue streaming: generate a stream with a Python shell job and load it with Spark streaming
Learn the different ways an AWS Glue job can fail, and how to debug and fix each failure
Create the AWS resources for a Glue pipeline using the AWS console and CloudFormation

Course content

11 sections • 78 lectures • 4h 11m total length

Introduction • 3:52
Course Overview • 4:59
Design and deploy end-to-end AWS Glue pipelines, covering data quality, streaming jobs, debugging workflows, CloudFormation templates, IAM, SNS, and Data Catalog crawlers.
Glue Pipeline Resources (Sections 2, 3 and 5) Overview • 3:01
Explore AWS Glue pipeline resources, including crawlers, the catalog, and workflows. Configure them via the AWS console and CloudFormation; use S3 as source and target with KMS encryption and IAM policies for ETL.

Section Overview • 0:58
Explore identity and access management concepts, create user groups and policies, and implement encryption with KMS keys, plus set up server-side and client-side encryption and an SNS topic for notifications.
IAM 101 - Authentication, Authorization and Identities • 2:38
Explore identity and access management in AWS, detailing authentication, authorization, and identities (users, groups, roles). Learn how policies define allowed actions on resources like Glue jobs.
IAM Lab - Setting Up Users and User Group • 2:59
Create IAM users in the AWS console, enable programmatic access with an auto-generated password, then form a user group with an administrative policy and assign the user for CLI access.
IAM Lab - Setting Up IAM Role • 3:13
IAM 101 - Policies • 4:45
KMS 101 And KMS Lab - Setting Up KMS Key • 3:03
Explore the Key Management Service (KMS) by enabling encryption at rest and in transit, compare client-side and server-side encryption, and create a symmetric customer-managed key with a resource policy.
AWS SNS 101 • 2:38
Create and configure an AWS SNS topic and email subscription to notify on AWS Glue job failures, using standard messaging and subscription confirmation.
Recap • 1:13
Recap of Section 2: creating an IAM user and group with administrator access, policies and keys, an S3 encryption key, an SNS topic, and a Glue job role.
Create GlueJobRole
Section 2 Quiz

Section Overview • 0:47
Learn S3 101 concepts, store data in a demo bucket encrypted with a KMS key, adjust bucket policies, and explore CLI 101 and CloudFormation 101 basics.
AWS S3 101 • 3:43
Learn AWS S3 basics by creating a demo bucket with a unique name, enabling encryption with KMS, blocking public access, and applying a deny policy that allows only a specific role.
AWS CLI 101 • 3:15
Configuring AWS CLI using IAM User Credentials • 3:22
Configure the AWS CLI using IAM user credentials, generate and manage access keys, and verify the connection before exploring commands for IAM, CloudFormation, and S3 services.
AWS CloudFormation 101 • 4:41
Explore how CloudFormation uses templates as infrastructure as code to create, update, and delete AWS resources with JSON templates, enabling consistent, parameterized deployments across environments via stacks.
Create S3 Bucket - awsglueudemycourse-datasoup-gluejob2-source
Optional Assignment - Create S3 Bucket for GlueJob1 target -
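
The "deny policy that allows only a specific role" from the S3 101 lecture can be sketched as a bucket policy document. This is a minimal illustration, not the course's actual policy: the account ID is a placeholder, the use of the aws:PrincipalArn condition key is an assumption about how the restriction is expressed, and the bucket and role names are simply reused from the lecture titles above.

```python
import json


def build_bucket_policy(bucket_name: str, allowed_role_arn: str) -> dict:
    """Return a bucket policy that denies S3 access to every principal except one role."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAllExceptAllowedRole",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object inside it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                # The deny applies whenever the caller is NOT the allowed role.
                "Condition": {
                    "ArnNotEquals": {"aws:PrincipalArn": allowed_role_arn}
                },
            }
        ],
    }


policy = build_bucket_policy(
    "awsglueudemycourse-datasoup-gluejob2-source",
    "arn:aws:iam::123456789012:role/GlueJobRole",  # placeholder account ID
)
print(json.dumps(policy, indent=2))
```

A policy shaped like this also locks out administrators, including whoever applies it, so it is safer to try on a throwaway bucket first.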

Section Overview • 0:39
Download, unzip, and review the source data files containing ETL artifacts, scripts, CloudFormation files, and template sources, then create and populate the ETL, artifact, and CloudFormation buckets for deployment.
Course Materials • 1:09
Creating S3 Buckets • 1:58
Create and configure three S3 buckets for AWS Glue workflows: a template bucket for Glue source data, an ETL artifacts bucket, and a crawler data bucket for lab data sources.
Uploading Data to S3 Buckets • 2:38
Upload data to S3 buckets by selecting the cloud data bucket, uploading a folder of cloud files, and managing artifacts in the artifacts bucket for the Glue job templates.
Upload city_temperature.csv file to bucket

Section Overview • 1:38
Create your first Glue database and explore the Glue Data Catalog through hands-on labs. Schedule and orchestrate Glue jobs with triggers and workflows to visualize the end-to-end pipeline.
AWS Glue Catalog 101 • 3:28
Explore the AWS Glue Data Catalog essentials, including databases, tables, partitions, and connections, and learn how metadata supports ETL jobs and Redshift workflows.
AWS Glue Crawler 101 • 6:03
Learn how an AWS Glue crawler accesses data sources, extracts metadata, creates and updates table definitions in the Data Catalog, and manages schemas and partitions with classifiers as needed.
AWS Glue Crawler Classifier 101 • 3:16
Explore how AWS Glue crawler classifiers infer schemas and create tables in the Glue catalog, and build custom classifiers to handle CSV, JSON, and XML, along with grok patterns.
Crawler Lab - First Glue Crawler Creation • 4:17
First Glue Crawler Running • 4:19
Run the first AWS Glue crawler, monitor the CloudWatch logs, and view the created CSV and JSON tables in the Data Catalog, confirming the schema and metadata.
Crawler Lab - Second Glue Crawler Creation • 5:24
Run a second AWS Glue crawler lab that creates and updates a table in the AWS Glue catalog, expanding it from three to six columns as JSON data is uploaded.
Crawler Lab - Third Glue Crawler Creation • 2:26
Create and run a third AWS Glue crawler for an S3 CSV data source, creating a prefixed table in database 101.
Crawler Lab - Fourth Glue Crawler Creation • 7:04
Create an AWS Glue crawler to ingest historical data while excluding current data, generate historical and historical year tables with partitions by extracted date and year, enabling partition-aware queries.
Crawler Lab - Fifth Glue Crawler Creation And Running • 2:54
Create and run an AWS Glue crawler to catalog current data from an S3 bucket while excluding historical folders, resulting in a new current table without partitions.
AWS Glue Job 101 • 7:21
Create and configure AWS Glue jobs for ETL with script location, artifact buckets, CloudFormation, and optional libraries; choose Python shell or Spark jobs, set bookmarks, parameters, and autoscaling.
AWS Glue Trigger 101 • 4:18
AWS Glue Workflow 101 • 2:52
Discover how AWS Glue workflows orchestrate triggers and jobs to manage complex ETLs, monitor each step, and diagnose failures, with the ability to stop and resume as needed.
Recap • 2:13
Glue Catalog Deep Dive: Build a Crawler, Classify Data, and Verify Tables

Section Overview • 1:20
Review CloudFormation template 101, update and upload the template, adjust account-specific paths, and use the AWS CLI to push updated templates to the bucket while debugging stack failures.
CloudFormation Templates 101 • 0:59
Explore CloudFormation templates 101 to create and update AWS Glue resources, including jobs, roles, and workflows, update the S3 bucket path, and work with JSON and YAML templates in the console.
First Glue Pipeline - CFN Templates • 3:34
Learn to build the first Glue pipeline with CloudFormation templates, provisioning a Glue job resource (Spark or Python) and an IAM role, with script location and bucket permissions.
Second Glue Pipeline - CFN Templates • 3:16
Build a second Glue pipeline with CFN templates to create a crawler, Glue job, and a workflow with scheduled and conditional triggers for automated data processing.
Glue Job 345 - CFN Template • 2:05
Recap CFN Template Update • 1:05
Upload CFN Templates to S3 • 2:13
Upload CloudFormation templates to S3 using the AWS CLI, syncing local templates to the target bucket and organizing them in a templates folder for future Glue jobs.
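
To make the second pipeline's shape concrete (a crawler, a Glue job, and a workflow with a scheduled and a conditional trigger), here is a rough sketch that builds a CloudFormation template as a Python dictionary and prints it as JSON. The resource names, parameter names, cron schedule, and the ordering (the job runs on a schedule, then the crawler runs once the job succeeds) are all assumptions for illustration. The course's own templates may differ, and this sketch has not been deployed.

```python
import json


def build_pipeline_template() -> dict:
    """Build a CloudFormation template for a Glue job, crawler, workflow, and two triggers."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {  # hypothetical parameter names
            "GlueRoleArn": {"Type": "String"},
            "ScriptLocation": {"Type": "String"},
            "TargetPath": {"Type": "String"},
            "DatabaseName": {"Type": "String"},
        },
        "Resources": {
            "DemoWorkflow": {
                "Type": "AWS::Glue::Workflow",
                "Properties": {"Name": "demo-workflow"},
            },
            "DemoJob": {
                "Type": "AWS::Glue::Job",
                "Properties": {
                    "Name": "demo-job",
                    "Role": {"Ref": "GlueRoleArn"},
                    # "glueetl" selects a Spark ETL job.
                    "Command": {
                        "Name": "glueetl",
                        "ScriptLocation": {"Ref": "ScriptLocation"},
                    },
                },
            },
            "DemoCrawler": {
                "Type": "AWS::Glue::Crawler",
                "Properties": {
                    "Name": "demo-crawler",
                    "Role": {"Ref": "GlueRoleArn"},
                    "DatabaseName": {"Ref": "DatabaseName"},
                    "Targets": {"S3Targets": [{"Path": {"Ref": "TargetPath"}}]},
                },
            },
            # Scheduled trigger: starts the job every day at 06:00 UTC.
            "ScheduledTrigger": {
                "Type": "AWS::Glue::Trigger",
                "Properties": {
                    "Name": "demo-scheduled-trigger",
                    "Type": "SCHEDULED",
                    "Schedule": "cron(0 6 * * ? *)",
                    "StartOnCreation": True,
                    "WorkflowName": {"Ref": "DemoWorkflow"},
                    "Actions": [{"JobName": {"Ref": "DemoJob"}}],
                },
            },
            # Conditional trigger: starts the crawler only after the job succeeds.
            "ConditionalTrigger": {
                "Type": "AWS::Glue::Trigger",
                "Properties": {
                    "Name": "demo-conditional-trigger",
                    "Type": "CONDITIONAL",
                    "StartOnCreation": True,
                    "WorkflowName": {"Ref": "DemoWorkflow"},
                    "Predicate": {
                        "Conditions": [
                            {
                                "LogicalOperator": "EQUALS",
                                "JobName": {"Ref": "DemoJob"},
                                "State": "SUCCEEDED",
                            }
                        ]
                    },
                    "Actions": [{"CrawlerName": {"Ref": "DemoCrawler"}}],
                },
            },
        },
    }


template = build_pipeline_template()
print(json.dumps(template, indent=2))
```

Keeping the role, script location, and paths as parameters is what lets the same template be reused across accounts, which matches the "adjust account-specific paths" step in this section.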

Section Overview • 0:57
Deploy your first AWS Glue job by preparing source and target, using CloudFormation templates, debugging deployments, and validating results via job output and CloudWatch logs.
Getting Ready For Glue Pipeline Creation • 2:41
We will make sure that the required buckets, files, CloudFormation templates and IAM roles are in place before we create our first Glue job.
Deploying Glue Pipeline Stack Using CloudFormation • 5:46
CloudFormation Template Deployment Debugging • 6:41
Analyzing Glue Job Script And Running The Job • 5:17
Examine how to run a Glue job script using the Glue and Spark contexts, convert between dynamic frames and data frames, apply transformations, and write output to the target bucket.
Going Through The Log And Verifying Job Output • 4:06
Inspect the job output by reviewing output logs, verifying printed data frames, and using logger.info to confirm logs, while understanding Glue log groups and CloudWatch access.

Section Overview • 1:21
Prepare for deployment, deploy the AWS Glue job, work through the script failures, and ensure the AWS Glue workflow runs the job and crawler to create the Data Catalog table.
Section Prerequisite • 1:59
Fix Error Retrieving The Script • 5:40
Fix Launch Error And Glue Argument Error • 5:40
Debug a Glue job by reading script parameters, processing a CSV with a header using country, city, and year filters, and writing partitioned results to the destination bucket; fix launch and access errors.
Fix Resource Policy Error - Error Reading From Source Bucket • 3:33
Fix Identity Policy Error - Error Reading The Key • 3:00
Workflow Running GlueJob2 • 1:34
Recap • 4:26
Diagnose Glue Job Failure: Role/Trust Misconfig (Glue Can’t Assume Role)
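
One way to picture the role/trust misconfiguration in the exercise above: a Glue job can only use a role whose trust policy allows the glue.amazonaws.com service principal to call sts:AssumeRole. The sketch below checks a trust policy document for that statement. The helper names are hypothetical, and the check is deliberately simplified (it ignores Condition blocks and wildcards).

```python
GLUE_SERVICE = "glue.amazonaws.com"


def as_list(value):
    """IAM policy fields may hold a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def glue_can_assume(trust_policy: dict) -> bool:
    """Return True if some Allow statement lets the Glue service assume the role."""
    for statement in as_list(trust_policy.get("Statement")):
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal")
        if not isinstance(principal, dict):
            continue  # e.g. Principal "*" is not treated as a match in this sketch
        services = as_list(principal.get("Service"))
        actions = as_list(statement.get("Action"))
        if GLUE_SERVICE in services and "sts:AssumeRole" in actions:
            return True
    return False


good_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# A common mistake: the role trusts a different service, so Glue cannot assume it.
broken_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

print(glue_can_assume(good_policy))    # True
print(glue_can_assume(broken_policy))  # False
```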

Section Overview • 2:22
Getting Ready For Glue Streaming Pipeline • 2:05
Prepare the required S3 buckets and artifacts for the three Glue streaming pipelines, ensure the transformation input and lookup files exist, and verify the CloudFormation template to provision resources.
Deploying Glue Streaming Job Infrastructure • 4:04
Deploy a CloudFormation stack to provision AWS Glue job resources, roles, an AWS Glue catalog database, and a JSON streaming table with a Kinesis stream for real-time data.
Lab - Creating Python Shell Glue Job For Stream Generation • 2:13
Create a Python shell Glue job to generate Kinesis real-time streaming records from a CSV in S3, then publish to a Kinesis stream for downstream Glue jobs.
Lab - Creating Glue Streaming Loading Job • 3:14
Design a Glue streaming loader job to read from a Kinesis stream and write to an S3 bucket, using a visual canvas and a shared script for subsequent jobs.
Lab - Creating Glue Streaming Transforming Job • 3:30
Recap Before Running All Three Glue Streaming Jobs • 2:25
Recap of the three AWS Glue streaming jobs: a Python shell job publishes CSV records from S3 to Kinesis, a loader writes the stream to S3, and a transformer builds a final frame with country codes.
Running Glue Streaming Generator Job • 1:35
Running Glue Streaming Transformation Job • 3:49
Launch a Spark streaming Glue transformation to read streaming data, update country names from codes, and write CSV output to the transformation output location with a Kinesis checkpoint.
Section Recap • 2:25
Discover three AWS Glue streaming jobs: a Kinesis-based CSV reader, a streaming loader loading data to S3, and a transformer that maps country names to codes.

Section Overview • 1:43
Explore data quality fundamentals, configure a data quality rule set on the existing Glue catalog table, run the Glue job, evaluate outcomes, and enable CloudWatch alerts for failures.
Data Quality 101 • 2:43
Setting Up Data Quality Rule Set • 4:02
Evaluate the quality of a CSV table in AWS Glue by applying a recommended data quality rule set, performing data quality checks, editing rules, and auditing results with CloudWatch metrics.
Glue Job With Data Quality Check • 3:38
Create a Glue job with built-in data quality checks using the Evaluate Data Quality transform on a CSV dataset, compare results, and halt on failures with a notification.
Running the Glue Job • 2:56
Setting Up Glue Data Quality CloudWatch Metrics • 4:08
Set up a CloudWatch alarm for data quality checks in AWS Glue, using a custom metric, evaluation context, and failure notifications via the SNS topic email, triggering on one-minute failures.
Receiving Alerts for Data Quality Issues • 1:49
AWS Glue Data Quality: Explain Rulesets, Integrate into ETL, Win Manager Confidence
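
As a rough picture of what a data quality rule set looks like, the sketch below assembles a small ruleset string in DQDL (Data Quality Definition Language), the rule syntax AWS Glue Data Quality uses. The column names and thresholds are assumptions chosen to match the country, city, and year fields mentioned earlier, not the rules recommended in the course, and the helper function is hypothetical.

```python
def build_ruleset(required_columns, min_year):
    """Assemble a DQDL ruleset string from a few simple checks."""
    rules = ["RowCount > 0"]  # the table must not be empty
    # Every required column must have no missing values.
    rules += [f'IsComplete "{column}"' for column in required_columns]
    # Assumed sanity check on a hypothetical numeric "Year" column.
    rules.append(f'ColumnValues "Year" > {min_year}')
    body = ",\n    ".join(rules)
    return f"Rules = [\n    {body}\n]"


ruleset = build_ruleset(["Country", "City", "Year"], min_year=1900)
print(ruleset)
```

The resulting text can be pasted into a rule set editor and adjusted there; generating it from a list of columns just keeps the rules consistent when several tables share the same checks.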

Requirements

Understanding of ETL concepts
AWS account to perform all the labs
No cloud experience is required

Description

Learn the latest in AWS Glue, and learn to use it with other AWS resources.

In this growing world of data and cloud computing, core competency in a cloud ETL tool is a necessity. AWS Glue comes with built-in Spark support, Data Quality, and data curation using Data Brew. Top technology, finance and insurance companies such as JPMC, Vanguard, BCBS, Amazon, Capital One, Capgemini, FINRA and more are all using AWS Glue to run their ETL on petabyte-scale data every day.

AWS Glue provides a serverless and scalable ETL solution where scripts can be written in Python and Spark, and now also using Ray. It also provides visual drag-and-drop options to create ETL pipelines. As more and more companies migrate to the cloud, demand for this skill has exploded! With mastery of AWS Glue, you can quickly become one of the most knowledgeable people in the job market!

This course teaches the basics of the AWS Glue Data Catalog, AWS Glue Studio, and AWS resources such as IAM, SNS, KMS, CloudFormation and CloudWatch, then continues on to using AWS Glue to build an ETL solution for your organization! Once we've done that, we'll go through the Glue Data Quality, Glue Streaming and Glue Data Brew ETL pipelines. All along the way you'll have multiple labs to create all the resources and ETL pipelines using the AWS console and CloudFormation templates, putting you right into a real-world situation where you need to use your new skills to solve a real problem!

This course now also includes role play to solidify your understanding of the concepts learned.

If you're ready to jump into the data engineering world of AWS Glue, this is the course for you!

Who this course is for:

Data Engineers, ETL Developers, Data Warehouse Developers or BI Developers who are moving from on-premises to the AWS cloud for ETL
Data Scientists who want to understand the Glue ETL concepts and curate the data
Software Development Engineers who want to do ETL in the AWS cloud

Course content

Introduction • 3 lectures • 12min

Glue Resources Setup Part 1 - IAM, KMS, SNS • 8 lectures • 21min

Glue Resources Setup Part 2 - S3, AWS CLI, CloudFormation and CloudWatch • 5 lectures • 16min

Creating Bucket And Uploading Data For the Course • 4 lectures • 6min

Glue Resources Setup Part 3 - Glue Catalog, Crawler • 15 lectures • 58min

CloudFormation Templates • 7 lectures • 15min

First AWS Glue Pipeline Creation • 6 lectures • 25min

AWS Glue Job Debugging • 9 lectures • 27min

Glue Streaming Job • 10 lectures • 28min

Glue Data Quality • 8 lectures • 21min
