
Design and deploy end-to-end AWS Glue pipelines that incorporate data quality, streaming jobs, and debugging workflows, using CloudFormation templates, IAM, SNS, and Data Catalog crawlers.
Explore identity and access management concepts, create user groups and policies, implement server-side and client-side encryption with KMS keys, and set up an SNS topic for notifications.
Explore identity and access management in AWS, covering authentication, authorization, and identities (users, groups, roles). Learn how policies define the actions allowed on resources such as Glue jobs.
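As a rough illustration of how a policy defines the actions allowed on a resource such as a Glue job, the sketch below builds an identity-based policy document in Python. The account ID, region, and job name are placeholders invented for this example, not values from the course.

```python
import json

# Hypothetical values for illustration only; substitute your own.
ACCOUNT_ID = "111122223333"
REGION = "us-east-1"
JOB_NAME = "example-glue-job"


def glue_job_policy(account_id: str, region: str, job_name: str) -> dict:
    """Return an IAM policy document that allows running and inspecting one Glue job."""
    job_arn = f"arn:aws:glue:{region}:{account_id}:job/{job_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowRunOneGlueJob",
                "Effect": "Allow",
                "Action": ["glue:StartJobRun", "glue:GetJobRun", "glue:GetJob"],
                "Resource": job_arn,
            }
        ],
    }


policy = glue_job_policy(ACCOUNT_ID, REGION, JOB_NAME)
print(json.dumps(policy, indent=2))
```

Attaching a document like this to a user group grants every user in the group the listed actions on that one job and nothing else.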
Explore AWS Key Management Service (KMS) by enabling encryption at rest and in transit, compare client-side and server-side encryption, and create a symmetric customer-managed key with a resource policy.
Configure the AWS CLI using IAM user credentials, generate and manage access keys, and verify the connection before exploring commands for the IAM, CloudFormation, and S3 services.
Explore how CloudFormation uses JSON templates as infrastructure as code to create, update, and delete AWS resources, enabling consistent, parameterized deployments across environments via stacks.
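To make the idea of a parameterized JSON template concrete, here is a minimal sketch that generates one from Python. The resource name, database name, and the Environment parameter are assumptions made up for this example; the course's own templates may look different.

```python
import json


def glue_database_template() -> dict:
    """Build a minimal CloudFormation template that declares one Glue database."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Sketch: a parameterized AWS Glue database",
        "Parameters": {
            # Hypothetical parameter; lets one template serve dev, test, and prod stacks.
            "Environment": {
                "Type": "String",
                "AllowedValues": ["dev", "test", "prod"],
                "Default": "dev",
            }
        },
        "Resources": {
            # Hypothetical logical ID and database name.
            "ExampleGlueDatabase": {
                "Type": "AWS::Glue::Database",
                "Properties": {
                    "CatalogId": {"Ref": "AWS::AccountId"},
                    "DatabaseInput": {
                        "Name": {"Fn::Sub": "example_db_${Environment}"},
                    },
                },
            }
        },
    }


template = glue_database_template()
template_json = json.dumps(template, indent=2)
print(template_json)
```

Saved to a file, a template like this could be deployed as a stack with a command along the lines of `aws cloudformation deploy --template-file template.json --stack-name example-stack --parameter-overrides Environment=dev`, where the file and stack names are again placeholders.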
Create your first Glue database and explore the Glue Data Catalog through hands-on labs. Schedule and orchestrate Glue jobs with triggers and workflows to visualize the end-to-end pipeline.
Explore the essentials of the AWS Glue Data Catalog, including databases, tables, partitions, and connections, and learn how metadata supports ETL jobs and Redshift workflows.
Learn how an AWS Glue crawler accesses data sources, extracts metadata, creates and updates table definitions in the Glue Data Catalog, and manages schemas and partitions, using classifiers as needed.
Explore how AWS Glue crawler classifiers infer schemas and create tables in the Glue Data Catalog, and build custom classifiers for CSV, JSON, XML, and grok patterns.
Run a second AWS Glue crawler lab to create and update a table in the AWS Glue Data Catalog, expanding it from three to six columns as new JSON data is uploaded.
Create an AWS Glue crawler to ingest historical data while excluding current data, and generate historical and historical-year tables partitioned by extracted date and year to enable partition-aware queries.
Create and configure AWS Glue jobs for ETL with a script location, artifact buckets, CloudFormation, and optional libraries; choose between Python shell and Spark jobs, and set bookmarks, parameters, and auto scaling.
Upload CloudFormation templates to S3 using the AWS CLI, syncing local templates to the target bucket and organizing them in a templates folder for future Glue jobs.
Deploy your first AWS Glue job by preparing the source and target, using CloudFormation templates, debugging deployments, and validating results via job output and CloudWatch logs.
We will make sure that the required buckets, files, CloudFormation templates, and IAM roles are in place before we create our first Glue job.
Prepare for deployment, deploy the AWS Glue job, fix the script when runs fail, and ensure the AWS Glue workflow runs the job and crawler to create the Data Catalog table.
Design a Glue streaming loader job to read from a Kinesis stream and write to an S3 bucket, using a visual canvas and a shared script for subsequent jobs.
Launch a Spark streaming Glue transformation to read streaming data, update country names from codes, and write CSV output to the transformation output location with a Kinesis checkpoint.
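The code-to-name lookup at the heart of such a transformation can be sketched in plain Python. This is not Spark or Glue code and not the course's script; the field names, lookup table, and sample records are all invented for illustration, with a short list standing in for one micro-batch from the stream.

```python
import csv
import io

# Hypothetical lookup table; a real job would likely load this from a reference dataset.
COUNTRY_NAMES = {"US": "United States", "IN": "India", "DE": "Germany"}


def add_country_names(records, lookup):
    """Yield a copy of each record with country_name resolved from country_code."""
    for record in records:
        enriched = dict(record)
        # Fall back to "Unknown" when the code is missing from the lookup table.
        enriched["country_name"] = lookup.get(record["country_code"], "Unknown")
        yield enriched


def to_csv(records, fieldnames):
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Stand-in for one micro-batch read from the stream (made-up records).
batch = [
    {"order_id": "1", "country_code": "US"},
    {"order_id": "2", "country_code": "ZZ"},
]
csv_text = to_csv(
    add_country_names(batch, COUNTRY_NAMES),
    ["order_id", "country_code", "country_name"],
)
print(csv_text)
```

In a streaming job the same lookup would typically be applied to each micro-batch before the CSV is written to the output location.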
Explore data quality fundamentals, configure a data quality ruleset on the existing Glue catalog table, run the Glue job, evaluate outcomes, and enable CloudWatch alerts for failures.
Learn the latest in AWS Glue, and learn to use it with other AWS resources.
In this growing world of data and cloud computing, a core competency in a cloud ETL tool is essential. AWS Glue comes with built-in Spark support, Data Quality, and data curation using DataBrew. Top technology, finance, and insurance companies like JPMC, Vanguard, BCBS, Amazon, Capital One, Capgemini, FINRA, and more are all using AWS Glue to run their ETL on petabyte-scale data every day.
AWS Glue provides a serverless, scalable ETL solution where scripts can be written in Python, Spark, and now Ray. It also provides visual drag-and-drop options for creating ETL pipelines. As more and more companies migrate to the cloud, demand for this skill has exploded! With mastery of AWS Glue, you can quickly become one of the most knowledgeable people in the job market!
This course teaches the basics of the AWS Glue Data Catalog, AWS Glue Studio, and AWS resources such as IAM, SNS, KMS, CloudFormation, and CloudWatch, then continues on to using AWS Glue to build an ETL solution for your organization! Once we've done that, we'll go through how to use Glue Data Quality, Glue Streaming, and Glue DataBrew ETL pipelines. All along the way you'll have multiple labs to create all the resources and ETL pipelines using the AWS console and CloudFormation templates, which put you right into a real-world situation where you need to use your new skills to solve a real problem!
This course now also includes role-playing exercises to solidify your understanding of the concepts learned.
If you're ready to jump into the data engineering world of AWS Glue, this is the course for you!