
In this lab, we'll focus on creating a data catalog using AWS Glue and leveraging Amazon Athena to query and analyze the cataloged data.
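Once a table is in the Glue Data Catalog, querying it from code might look roughly like the sketch below. It assumes a boto3 Athena client is passed in, and the database name and S3 results location are placeholders, not values from the lab.

```python
import time


def run_athena_query(athena, sql, database, output_location,
                     poll_seconds=1.0, max_polls=60):
    """Start an Athena query and poll until it reaches a terminal state.

    `athena` is expected to behave like boto3.client("athena").
    `database` and `output_location` are placeholders for your own values.
    """
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    for _ in range(max_polls):
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)

    raise TimeoutError(f"Query {query_id} did not finish in time")


# Example call (requires AWS credentials, so it is left commented out):
# import boto3
# run_athena_query(boto3.client("athena"), "SELECT * FROM my_table LIMIT 10",
#                  "my_database", "s3://my-athena-results/")
```

Passing the client in as an argument keeps the function easy to exercise with a stub before pointing it at a real account.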
In this lab, we'll focus on executing an ETL (Extract, Transform, Load) job using AWS Glue. Participants will learn how to create and run ETL jobs to process data from various sources, transform it according to defined business logic, and load it into target destinations.
In this lab, we'll focus on setting up a mechanism to trigger SNS (Simple Notification Service) notifications for Amazon S3 upload events using Amazon EventBridge. Participants will learn how to configure EventBridge rules to detect S3 upload events and seamlessly forward them to SNS topics for notification delivery. Through practical exercises, we'll explore the process of setting up event rules, defining event patterns, and configuring SNS topics to receive notifications, enabling real-time alerts for S3 upload activities. By the end of the lab, participants will gain hands-on experience in orchestrating event-driven architectures on AWS, facilitating timely notifications and automated responses to changes in S3 bucket contents.
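As a minimal sketch of the event pattern such a rule might use: the bucket name, rule name, and topic ARN below are hypothetical, and the sketch assumes EventBridge notifications are turned on for the bucket. The `matches` helper is a simplified local stand-in for EventBridge's matching, useful only for checking a pattern against a sample event.

```python
import json


def build_s3_upload_pattern(bucket_name):
    """Event pattern matching 'Object Created' events for one bucket."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {"bucket": {"name": [bucket_name]}},
    }


def matches(pattern, event):
    """Simplified matcher: lists mean 'any of these values', dicts nest.

    This covers only exact-value matching, not the full EventBridge syntax.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


def create_upload_notification_rule(events, rule_name, bucket_name, topic_arn):
    """Create the rule and point it at an SNS topic.

    `events` is expected to behave like boto3.client("events").
    The SNS topic policy must also allow events.amazonaws.com to publish.
    """
    events.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(build_s3_upload_pattern(bucket_name)),
        State="ENABLED",
    )
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "sns-target", "Arn": topic_arn}],
    )
```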
In this lab, we'll delve into orchestrating Lambda functions using AWS Step Functions State Machine. Participants will learn how to design and implement robust and scalable workflows by coordinating the execution of multiple Lambda functions. Through hands-on exercises, we'll explore the capabilities of Step Functions to manage state transitions, error handling, and parallel execution, enabling us to create resilient and efficient serverless architectures. By the end of the lab, participants will gain practical experience in building complex workflows that seamlessly integrate Lambda functions, facilitating the automation of various business processes and tasks on AWS.
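To make state transitions, error handling, and parallel execution concrete, here is a sketch of a state machine definition in Amazon States Language, built as a Python dictionary. The state names and the three Lambda ARNs are hypothetical; the retry settings are illustrative values, not recommendations from the lab.

```python
import json


def build_workflow_definition(validate_arn, process_a_arn, process_b_arn):
    """Return an Amazon States Language definition as a dict.

    Flow: Validate -> FanOut (two Lambda tasks in parallel) -> Done.
    Any unhandled error in Validate or FanOut routes to HandleFailure.
    """
    retry = [{
        "ErrorEquals": ["States.TaskFailed"],
        "IntervalSeconds": 2,
        "MaxAttempts": 3,
        "BackoffRate": 2.0,
    }]
    catch = [{"ErrorEquals": ["States.ALL"], "Next": "HandleFailure"}]

    def branch(name, arn):
        # Each parallel branch is a small state machine of its own.
        return {
            "StartAt": name,
            "States": {name: {"Type": "Task", "Resource": arn,
                              "Retry": retry, "End": True}},
        }

    return {
        "Comment": "Sketch: orchestrating Lambda functions",
        "StartAt": "Validate",
        "States": {
            "Validate": {"Type": "Task", "Resource": validate_arn,
                         "Retry": retry, "Catch": catch, "Next": "FanOut"},
            "FanOut": {"Type": "Parallel",
                       "Branches": [branch("ProcessA", process_a_arn),
                                    branch("ProcessB", process_b_arn)],
                       "Catch": catch, "Next": "Done"},
            "Done": {"Type": "Succeed"},
            "HandleFailure": {"Type": "Fail", "Error": "WorkflowFailed",
                              "Cause": "A task failed after retries"},
        },
    }


# json.dumps(build_workflow_definition(...)) gives the JSON text that can be
# pasted into the Step Functions console or passed to create_state_machine.
```

The catch rule sits on the Parallel state rather than inside the branches because a state within a branch can only transition to other states in that same branch.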
In this lab, we'll focus on orchestrating ETL (Extract, Transform, Load) workflows using a combination of AWS Glue, AWS Lambda, Amazon EventBridge, and AWS Step Functions. Participants will gain hands-on experience in designing and implementing end-to-end ETL processes, starting from data extraction, through transformation and enrichment, to loading into target destinations. Through practical exercises, we'll explore how to leverage these AWS services to automate and orchestrate complex data workflows, ensuring reliability, scalability, and flexibility in data processing pipelines.
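One way these services can fit together is a Step Functions state machine that runs a Glue job and waits for it to finish, then calls a Lambda function. The sketch below assumes a hypothetical Glue job name and Lambda ARN, and it leaves out the EventBridge rule that would start the execution.

```python
import json


def build_etl_definition(glue_job_name, notify_function_arn):
    """Amazon States Language definition: run a Glue job, then notify.

    The ".sync" suffix on the Glue integration makes Step Functions wait
    for the job run to complete before moving to the next state.
    """
    return {
        "Comment": "Sketch: Glue ETL job followed by a Lambda notification",
        "StartAt": "RunGlueJob",
        "States": {
            "RunGlueJob": {
                "Type": "Task",
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job_name},
                "Catch": [{"ErrorEquals": ["States.ALL"],
                           "Next": "EtlFailed"}],
                "Next": "NotifySuccess",
            },
            "NotifySuccess": {
                "Type": "Task",
                "Resource": notify_function_arn,
                "End": True,
            },
            "EtlFailed": {
                "Type": "Fail",
                "Error": "EtlFailed",
                "Cause": "The Glue job run did not succeed",
            },
        },
    }


# The JSON form, ready for the Step Functions console:
# print(json.dumps(build_etl_definition("my-etl-job", "<lambda-arn>"), indent=2))
```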
In this lab, we'll explore storing and retrieving data from an Amazon Kinesis Data Stream using the AWS Command Line Interface (CLI). Participants will gain practical experience in using the AWS CLI to create a Kinesis Data Stream, publish data to the stream, and retrieve data from it. Through hands-on exercises, we'll learn how to interact with Kinesis Data Streams via the CLI, understand key concepts such as shards and records, and develop proficiency in managing streaming data on AWS.
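To build intuition for shards and records: Kinesis hashes each record's partition key with MD5 into a 128-bit integer, and each shard owns a range of those hash values. The sketch below reproduces that idea locally. It assumes the hash space is split evenly across shards, which may not match the exact boundaries of a real stream.

```python
import hashlib

MAX_HASH = 2**128 - 1  # largest 128-bit hash key


def hash_key(partition_key):
    """MD5 of the partition key, read as a 128-bit integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16)


def shard_ranges(shard_count):
    """Evenly split the hash space into (start, end) ranges (an assumption)."""
    size = 2**128 // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last shard absorbs any remainder so the space is fully covered.
        end = MAX_HASH if i == shard_count - 1 else (i + 1) * size - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key, shard_count):
    """Index of the shard whose range contains the key's hash."""
    h = hash_key(partition_key)
    for index, (start, end) in enumerate(shard_ranges(shard_count)):
        if start <= h <= end:
            return index
    raise ValueError("hash fell outside all ranges")


for key in ("sensor-1", "sensor-2", "sensor-3"):
    print(key, "-> shard", shard_for(key, 4))
```

Records with the same partition key always land on the same shard, which is why ordering is preserved per key rather than across the whole stream.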
In this lab, we'll dive into building a Python-based producer and consumer application for Amazon Kinesis Data Streams using the Boto3 SDK. We'll gain hands-on experience in developing a Python script to publish data to a Kinesis Data Stream as a producer, and another script to consume and process the data from the stream as a consumer. Through practical exercises, we'll learn how to interact with Kinesis Data Streams programmatically, handle data ingestion and processing tasks efficiently, and implement scalable streaming data solutions on AWS using Python and Boto3.
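A minimal sketch of the producer and consumer sides is shown below. It assumes a boto3 Kinesis client is passed in, a stream that already exists, and JSON-encoded records; the function names and the example stream and shard names are illustrative, not taken from the lab.

```python
import json


def put_reading(kinesis, stream_name, reading, partition_key):
    """Producer: publish one JSON record to the stream.

    `kinesis` is expected to behave like boto3.client("kinesis").
    """
    return kinesis.put_record(
        StreamName=stream_name,
        Data=json.dumps(reading).encode("utf-8"),
        PartitionKey=partition_key,
    )


def read_shard(kinesis, stream_name, shard_id, limit=100):
    """Consumer: read one batch from the oldest available record in a shard.

    A long-running consumer would keep calling get_records with the
    NextShardIterator from each response; this sketch reads a single batch.
    """
    iterator = kinesis.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]
    response = kinesis.get_records(ShardIterator=iterator, Limit=limit)
    return [json.loads(record["Data"]) for record in response["Records"]]


# Example calls (require AWS credentials, so they are left commented out):
# import boto3
# client = boto3.client("kinesis")
# put_reading(client, "my-stream", {"temp_c": 21.5}, partition_key="sensor-1")
# print(read_shard(client, "my-stream", "shardId-000000000000"))
```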
In this lab, we will explore the process of generating and writing simulated weather data from an Amazon Kinesis stream to Amazon S3 using AWS Lambda functions. We'll gain hands-on experience in setting up the Kinesis stream, creating Lambda functions to process incoming data, and configuring the integration to store the processed data in S3. Through practical exercises, we'll explore real-time data ingestion techniques, serverless computing concepts, and best practices for building scalable data pipelines on AWS, enabling us to effectively handle streaming data for various use cases such as analytics, monitoring, and forecasting.
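The Lambda side of such a pipeline can be sketched as follows. Kinesis delivers record payloads to Lambda base64-encoded, so the handler decodes them before writing. The `TARGET_BUCKET` environment variable, the object key layout, and the JSON-lines output format are assumptions made for illustration.

```python
import base64
import json
import os


def decode_kinesis_records(event):
    """Decode the base64 payload of every Kinesis record in a Lambda event."""
    readings = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        readings.append(json.loads(payload))
    return readings


def write_batch(s3, bucket, key, readings):
    """Write the readings to S3 as one JSON document per line.

    `s3` is expected to behave like boto3.client("s3").
    """
    body = "\n".join(json.dumps(reading) for reading in readings)
    s3.put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))
    return key


def lambda_handler(event, context):
    """Entry point invoked by the Kinesis event source mapping."""
    import boto3  # imported here so the helpers above work without boto3

    readings = decode_kinesis_records(event)
    if not readings:
        return {"written": 0}

    bucket = os.environ["TARGET_BUCKET"]  # hypothetical variable name
    key = f"weather/{context.aws_request_id}.jsonl"  # hypothetical layout
    write_batch(boto3.client("s3"), bucket, key, readings)
    return {"written": len(readings), "key": key}
```

Using the request ID in the key gives each invocation its own object, so retried batches do not silently overwrite earlier ones.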
Let's understand AWS EC2 hands-on before running Big Data Labs on Amazon EMR. Understanding EC2 is essential before learning EMR because EC2 provides the foundational infrastructure, such as compute instances and storage, on which EMR operates for big data processing. If you are already familiar with EC2, you can skip this lecture and move to the next one on Amazon EMR.
In this lab, we will delve into the process of running Spark transformation jobs utilizing Amazon EMR on EC2 instances. We will gain hands-on experience in setting up an EMR cluster, configuring Spark applications, and executing transformation tasks on datasets.
In this lab, we will learn how to create a data warehouse using Amazon Redshift and data stored in Amazon S3. We will explore the process of setting up Amazon Redshift, loading data from Amazon S3 into Redshift tables, and performing basic data analysis queries. Through hands-on exercises, we will gain practical experience in leveraging Redshift's capabilities to build a scalable and efficient data warehouse solution for an organization's analytical needs.
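Loading S3 data into a Redshift table is typically done with the COPY command. The helper below is a sketch that assembles such a statement for CSV files; the table name, S3 path, and IAM role ARN in the example are placeholders, and the values are interpolated directly, so it is only suitable for trusted inputs.

```python
def build_copy_statement(table, s3_uri, iam_role_arn, ignore_header_rows=1):
    """Assemble a Redshift COPY statement for CSV files stored in S3.

    The IAM role must be attached to the cluster and allowed to read
    from the given S3 location.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS CSV\n"
        f"IGNOREHEADER {ignore_header_rows};"
    )


print(build_copy_statement(
    "sales",                                          # placeholder table
    "s3://my-bucket/sales/",                          # placeholder path
    "arn:aws:iam::123456789012:role/MyRedshiftRole",  # placeholder role
))
```

The resulting text can be run in the Redshift query editor once the target table has been created.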
Explore AWS Glue DataBrew, a no-code solution for transforming and preparing data. This session highlights how you can clean, standardize, and enrich datasets with over 250 built-in transformations, all through a visual interface with no programming required.
Prompts
"Give me a Lambda function to invoke an AWS Bedrock model. I will integrate Lambda with API Gateway, so ensure the function can be tested directly and also using API Gateway."
Deployment Prompt (CLI)
"Give me the AWS CLI command to deploy this Lambda function."
Role Creation Prompt
"I don’t have a Lambda execution role."
Testing Prompt
"Function created. How to test from Lambda UI?"
Model Access Prompt
"I have access to this model Llama 3.1 70B Instruct, so change the Lambda code."
Test Event Prompt
"Give JSON to test."
API Gateway Setup Prompt
"Now help me create an API Gateway endpoint and give instructions to test also using CLI."
Validation Prompt
"Is API Gateway pointing to this function?"
Debug Prompt
"Give me new Lambda function with debug statements."
Check Connectivity Prompt
"How do I know if API Gateway is able to reach Lambda?"
Final API Setup Prompt
"Give me code to create API Gateway for Lambda."
This hands-on course is designed for individuals familiar with AWS to enhance their skills in data engineering. Students should have a basic understanding of Python, SQL, and database concepts. However, even beginners to data engineering can follow along and learn. The course is minimal on theory, focusing instead on practical aspects of data engineering on AWS. Participants will gain practical experience through a series of labs covering essential AWS services such as Glue, Lambda, Kinesis, S3, Redshift, EventBridge, and more. While the labs provide practical exercises, participants are encouraged to refer to AWS documentation for a full understanding of concepts. This course will also give you practical experience to aid in your preparation for the AWS Certified Data Engineer - Associate certification (DEA-C01).
AWS Data Engineering Labs:
Creating a data catalog in Glue and viewing data in Athena
Running an ETL job using Glue
Triggering SNS Notification for S3 Upload Event using EventBridge
Orchestrating Lambda functions with Step Functions State Machine
ETL Workflow Orchestration with AWS Glue, Lambda, EventBridge, and Step Functions
Storing and Retrieving Data from a Kinesis Data Stream Using AWS CLI
Kinesis Data Stream Python Boto3 Producer & Consumer
Writing simulated weather data from a Kinesis Stream to S3 with AWS Lambda
Running Spark transformation jobs using Amazon EMR on EC2
Creating a Data Warehouse on S3 data using Amazon Redshift
Understanding PySpark Basics with Databricks
Setting up Databricks on AWS
Vibe coding with GitHub Copilot to build data pipelines through natural-language conversation
Prerequisites:
Basic understanding of AWS
Basic knowledge of Python, SQL, Spark and database concepts
Note: Even if you are a beginner to AWS and Data Engineering, you can still follow and learn from this course.
This course uses high-quality AI-generated text-to-speech narration to complement the powerful visuals and enhance your learning experience.