
Join this introductory course to explore data engineering with AWS analytics tools, including AWS Glue, Kinesis, Athena, and EMR, and practice with labs requiring an AWS account and internet.
Explore the course curriculum with hands-on labs, building Windows and Linux machines, mastering S3 and IAM, and applying Kinesis, AWS Glue crawlers, and catalogs for real-time analytics.
Develop an AWS Glue project that builds an end-to-end ETL pipeline moving data from a source S3 bucket (JSON) to a target bucket (CSV) using a crawler and data catalog.
Learn to launch a Windows machine in the AWS console, configure key pairs and security groups, then connect using an RDP client with the generated password.
Learn to create a Linux machine on AWS by launching an instance, selecting a Linux image, generating a key pair, configuring security, and connecting via SSH to run commands.
Explore Identity and Access Management (IAM) concepts, including users, groups, roles, and policies, and learn how to create users, assign access, and attach multiple policies.
Learn to create IAM users, assign limited policies, and grant programmatic and web access, demonstrating read-only and admin permissions while avoiding root account use.
Create a development group, add a user to the group, and attach an access policy to grant S3 permissions, illustrating how group permissions propagate to the user account.
Learn how to create an IAM role and attach multiple policies to enable AWS Glue to access S3 and trigger Lambda, enabling automated data ETL workflows.
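As a rough illustration of what such a role involves, the sketch below builds the trust policy that lets the Glue service assume a role, plus a list of AWS managed policies to attach. The role name is hypothetical, and the choice of managed policies is an assumption, not necessarily what the lecture uses.

```python
import json

# Hypothetical role name, used only for illustration.
ROLE_NAME = "demo-glue-etl-role"


def glue_trust_policy() -> dict:
    """Return a trust policy that allows the AWS Glue service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "glue.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# AWS managed policies (an assumed selection; narrow these for real workloads).
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole",
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
]

# The JSON string is what an IAM "create role" call expects as the trust document.
trust_document = json.dumps(glue_trust_policy(), indent=2)
```

With boto3 installed and credentials configured, `trust_document` could be passed to `create_role` as `AssumeRolePolicyDocument`, followed by one `attach_role_policy` call per ARN.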
Explore cloud storage, how to upload and access files over a network, with examples like Google Drive and Amazon S3, and note upcoming bucket creation and file identification.
Master the basics of Amazon S3 as secure, durable, scalable object storage for storing any file type, accessible from anywhere on the web, designed for 99.99% availability and 99.999999999% (11 nines) durability.
Explore Amazon S3 use cases for data lakes, big data analytics, AI/ML, and scalable cloud-native apps with backup, 99.99% availability, and lifecycle archiving using Glacier tiers.
Create and manage an Amazon S3 bucket by configuring a unique name, region, and access settings, then upload and delete files, and finally delete the bucket.
Explore how S3 lifecycle transitions data between storage classes—standard, intelligent tiering, and Glacier—balancing cost and access speed, and learn how to configure and view storage class changes in the console.
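A lifecycle rule can be pictured as a small configuration document. The sketch below builds one in the shape S3 accepts and simulates which storage class an object would be in at a given age. The rule ID and the 30/90-day thresholds are assumptions chosen for illustration, and real transitions are applied asynchronously by S3 rather than at the exact day boundary.

```python
from typing import Dict


def lifecycle_configuration(prefix: str = "") -> Dict:
    """Build a lifecycle configuration with two transitions (assumed thresholds)."""
    return {
        "Rules": [
            {
                "ID": "demo-tiering-rule",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": 30, "StorageClass": "INTELLIGENT_TIERING"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


def storage_class_at(age_days: int, config: Dict) -> str:
    """Simplified model: return the class an object of this age would have reached."""
    current = "STANDARD"
    transitions = sorted(config["Rules"][0]["Transitions"], key=lambda t: t["Days"])
    for transition in transitions:
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


config = lifecycle_configuration()
```

The same dictionary is the kind of value boto3's `put_bucket_lifecycle_configuration` takes as `LifecycleConfiguration`, though the console walkthrough in the lecture needs no code.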
Learn how to enable versioning on an S3 bucket, upload files, and manage multiple object versions to restore or download previous data.
Learn how ETL works with AWS Glue by extracting data from web apps and IoT sensors, transforming it through cleaning, filtering, and joining, and loading it into a target location.
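The extract, transform, and load stages can be sketched in plain Python before moving to Glue. This is a stand-in using only the standard library, not Glue code; the field names (`name`, `dept_id`, `amount`) and the department lookup are made up for illustration.

```python
import csv
import io
import json
from typing import Dict, List


def extract(json_lines: str) -> List[Dict]:
    """Extract: parse one JSON record per non-empty line."""
    return [json.loads(line) for line in json_lines.splitlines() if line.strip()]


def transform(records: List[Dict], departments: Dict[int, str]) -> List[Dict]:
    """Transform: clean names, filter out bad rows, join to the department lookup."""
    rows = []
    for rec in records:
        if rec.get("amount") is None:  # filter: drop records with no amount
            continue
        dept = departments.get(rec["dept_id"])
        if dept is None:  # inner join: drop records with an unknown department
            continue
        rows.append(
            {
                "name": rec["name"].strip().title(),  # cleaning
                "department": dept,
                "amount": float(rec["amount"]),
            }
        )
    return rows


def load(rows: List[Dict]) -> str:
    """Load: write the rows as CSV text (a real job would write to a target bucket)."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["name", "department", "amount"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


RAW = "\n".join(
    [
        '{"name": " alice ", "dept_id": 1, "amount": "120.5"}',
        '{"name": "bob", "dept_id": 2, "amount": null}',
        '{"name": "carol", "dept_id": 9, "amount": 80}',
        '{"name": "dave", "dept_id": 2, "amount": 40}',
    ]
)
DEPARTMENTS = {1: "Sales", 2: "Ops"}

output_csv = load(transform(extract(RAW), DEPARTMENTS))
```

Here two of the four sample records survive: one is filtered out for a missing amount and one fails the join.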
Explore AWS Glue, a serverless data integration service that combines and prepares data for analytics and machine learning using extract, transform, load, and data crawlers, with visual and coding interfaces.
Explore the benefits of AWS Glue, a serverless data integration service that enables faster, collaborative data preparation with automated ETL scripts and crawlers, scaling automatically.
Explore use cases of data processing with Lambda-triggered events that automatically run jobs when new data arrives in S3, enable automated scripts, and use a data catalog to discover datasets.
Understand Glue terminology such as data catalog, crawler, and classifier, and see how crawlers scan S3 data to populate metadata and create tables and databases with triggers.
Explore the Glue architecture, including data stores, crawlers, data catalogs, and ETL jobs that extract, transform, and load data from sources using automated or scripted workflows.
Build an AWS Glue workflow by creating an S3 data store and bucket, crawling data to populate the catalog, and generating ETL scripts and JSON outputs in Glue Studio.
Master Glue transformations in this lab. Set up S3 input and output buckets, crawl data to create a catalog, and run a Glue Studio job with rename and aggregate operations.
Build a multi-source ETL workflow in AWS Glue by joining two inputs, applying transformations, and saving the result as CSV in an S3 output bucket.
Execute an end-to-end AWS analytics project using S3, Glue, and Lambda to crawl CSV data, build an ETL pipeline, transform data, and run jobs.
Learn to implement an end-to-end real-time partitioning workflow using AWS Glue crawler to create a single partitioned table from multi-folder S3 data, and load transformed CSV output via Glue Studio.
Explore real-time streaming data and how online games, ecommerce actions, and IoT events are processed in real time with Kinesis to drive instant insights.
Explore real-time streaming with the Kinesis family: Video Streams for live video analytics, Data Streams for real-time data, Data Firehose for near-real-time storage, and analytics with Apache Flink.
Learn how Kinesis enables real-time streaming, analytics, and storage via Kinesis Data Streams and Kinesis Data Firehose, processing IoT, clickstream, and video streams with Amazon S3 and a data warehouse.
Generate real-time demo data and stream it into Amazon S3 with Kinesis Data Firehose, using input and output buckets and a delivery stream for testing.
This end-to-end project demonstrates real-time streaming with Kinesis Firehose, routing data to S3 buckets and analytics results to S3 via Kinesis Data Analytics, using two pipelines.
Learn how Amazon Athena provides a serverless, interactive SQL query service to analyze data directly on S3 without moving it. It is not a database or data warehouse.
Learn how Athena analyzes data stored in an S3 bucket without moving it, using an external table to query and analyze data directly in S3.
Identify who can use Athena to analyze logs stored in S3, including cloud, flow, app, and IoT data, using basic SQL knowledge to run queries.
Compare SQL Server and Athena, contrasting SQL Server's DML operations and database management with Athena's serverless, external-table analytics, and note Athena's lack of user-defined functions and DDL support.
Demonstrate how to use an AWS Glue crawler to create a table from S3 data, build a data catalog, and run queries in the Athena editor.
Create a table in Athena directly from an S3 bucket without a crawler, configure the database and S3 path, define CSV columns, and query the data.
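Creating a table without a crawler comes down to a `CREATE EXTERNAL TABLE` statement that names the columns and points at an S3 folder. The helper below assembles such a statement as a string; the database, table, column, and bucket names are hypothetical, and the simple comma-delimited row format is an assumption that does not handle quoted fields containing commas.

```python
from typing import Dict


def create_table_ddl(
    database: str,
    table: str,
    columns: Dict[str, str],
    s3_location: str,
    skip_header: bool = True,
) -> str:
    """Build an Athena CREATE EXTERNAL TABLE statement for comma-delimited files."""
    if not s3_location.startswith("s3://") or not s3_location.endswith("/"):
        raise ValueError("s3_location must be an S3 folder such as s3://bucket/path/")
    column_lines = ",\n".join(f"  `{name}` {dtype}" for name, dtype in columns.items())
    ddl = (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n"
        f"{column_lines}\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{s3_location}'"
    )
    if skip_header:
        # Tell Athena to ignore the header row in each CSV file.
        ddl += "\nTBLPROPERTIES ('skip.header.line.count'='1')"
    return ddl


# Hypothetical names, for illustration only.
ddl = create_table_ddl(
    database="demo_db",
    table="orders",
    columns={"order_id": "string", "state": "string", "sales": "double"},
    s3_location="s3://demo-bucket/orders/",
)
```

The resulting text can be pasted into the Athena query editor; the data itself stays in S3 and only the table definition is stored in the catalog.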
Execute an end-to-end AWS Athena project on S3 superstore data, building a data catalog with a crawler, then query total sales, total profit, and top locations by state or city.
Set up a Glue crawler to catalog daily S3 data and create partitions in a table. Query the partitioned data in Athena by date to analyze daily files.
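Partitions are inferred from the folder layout under the table's S3 prefix. The sketch below mimics that idea: folders named `key=value` become named partition columns, while plain folders get positional names such as `partition_0`. The keys and prefixes are made up, and the function is a simplified model of crawler behavior, not Glue's actual implementation.

```python
from typing import Dict


def partitions_from_key(key: str, table_prefix: str) -> Dict[str, str]:
    """Derive partition columns from the folders between the table prefix and the file."""
    if not key.startswith(table_prefix):
        raise ValueError("key is not under the table prefix")
    relative = key[len(table_prefix):]
    folders = relative.strip("/").split("/")[:-1]  # drop the file name itself
    partitions = {}
    for index, folder in enumerate(folders):
        if "=" in folder:
            name, value = folder.split("=", 1)  # Hive-style folder: date=2024-01-15
        else:
            name, value = f"partition_{index}", folder  # unnamed folder
        partitions[name] = value
    return partitions


# Hypothetical object keys, for illustration only.
named = partitions_from_key("daily/date=2024-01-15/sales.csv", "daily/")
unnamed = partitions_from_key("daily/2024/01/sales.csv", "daily/")
```

A query that filters on the partition column, for example `WHERE "date" = '2024-01-15'`, lets Athena read only the matching folder instead of scanning every daily file.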
Understand Amazon EMR, Elastic MapReduce, for big data analytics with Hadoop frameworks. See when to use EMR for computing and real-time data processing with S3 and Kinesis.
Learn how EMR simplifies big data deployment by creating master and slave nodes, launching multi-node clusters, and one-click deployment of Spark and other frameworks.
Explore cost-efficient pay-as-you-go pricing and integration with other services via IAM policies, then deploy, scale, and monitor EMR clusters with CloudWatch and log center, ensuring 99.99% availability and security.
Understand EMR architecture with storage options like S3, cluster resource management with MapReduce, and installable applications; plan storage, framework, and cluster deployment and monitoring.
Identify real-time streaming, interactive analysis, and genomics use cases in AWS analytics, and learn pay-per-second pricing with a one-minute minimum, plus terminate resources to avoid charges.
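The per-second pricing rule with a one-minute minimum can be written as a short calculation. The hourly rate and node count below are hypothetical numbers chosen for the arithmetic, not actual AWS prices.

```python
def billed_seconds(runtime_seconds: int) -> int:
    """Apply per-second billing with a one-minute minimum."""
    if runtime_seconds <= 0:
        return 0
    return max(60, runtime_seconds)


def cluster_cost(runtime_seconds: int, node_count: int, hourly_rate_per_node: float) -> float:
    """Estimate cost as billed seconds x nodes x hourly rate, converted to seconds."""
    return billed_seconds(runtime_seconds) * node_count * hourly_rate_per_node / 3600


# Hypothetical example: 3 nodes at 0.12 per node-hour, running for 30 minutes.
example_cost = cluster_cost(runtime_seconds=1800, node_count=3, hourly_rate_per_node=0.12)
```

A cluster that runs for only 10 seconds is still billed for 60, which is one reason to terminate idle clusters rather than repeatedly start and stop them for tiny jobs.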
Learn to launch and configure an EMR cluster in the AWS console, selecting software, hardware, and security settings for quick, scalable analytics workloads.
Explore EMR cluster configuration via the advanced option, customize software, select multiple master nodes, configure data catalog, and tune hardware and networking for optimized cluster utilization.
Build an end-to-end real-time data pipeline on AWS, transforming JSON from S3 to CSV with Glue, then query with Athena to derive business metrics.
Learn end-to-end data integration using AWS Glue: ingest employee and department data from S3, run crawlers and transformations to produce a final output, and query with Athena for analysis.
In this course, we will learn the following:
1) We will start with the basics of Serverless Computing.
2) We will learn Schema Discovery, ETL, Scheduling, and Tools integration using the Serverless AWS Glue Engine built on a Spark environment.
3) We will learn to develop a centralized Data Catalog using the Serverless AWS Glue Engine.
4) We will learn to query a data lake using the Serverless Athena Engine built on top of Presto and Hive.
5) We will learn about the Kinesis family and how to handle real-time data and run analytics on it.
Businesses have always wanted to manage less infrastructure and deliver more solutions. Big data workloads continuously push infrastructure boundaries. Having Serverless Storage, Serverless ETL, Serverless Analytics, and Serverless Reporting all on one cloud platform sounded too good to be true for a very long time. But now it's a reality on the AWS platform. AWS is the only cloud provider that has all the native serverless components for a true Serverless Data Lake Analytics solution.
This course understands your time is important, and so the course is designed to be laser-sharp on lecture timings, where all the trivial details are kept at a minimum and focus is kept on core content for experienced AWS Developers / Architects / Administrators. By the end of this course, you can feel assured and confident that you are future-proof for the next change and disruption sweeping the cloud industry.
I am very passionate about AWS Serverless computing on the Data and Analytics platform, and I cover all the topics discussed in this course from A to Z.
So if you are excited and ready to get trained on the AWS Serverless Analytics platform, I am ready to welcome you to my class!