Master AWS Lambda Functions for Data Engineers using Python

Rating: 4.3 (268 reviews)

Build Lambda Functions using Python, Lambda Triggers, Deploy using layers and Docker, Validate using Glue and Athena

Created by Durga Viswanatha Raju Gadiraju

Last updated 12/2024

English

What you'll learn

Set up required tools on Windows to develop the code for ETL Data Pipelines using Python and AWS Services
Set up Project or Development Environment to develop applications using Python and AWS Services
Get started with AWS by creating an AWS account, configuring the AWS CLI, and reviewing the Data Sets used for the project
Develop Core Logic to Ingest Data from source to AWS S3 using Python boto3
Get started with AWS Lambda Functions using the Python 3 Run-time Environment
Refactor the application and build a zip file to deploy as an AWS Lambda Function
Create an AWS Lambda Function using the Zip file and validate it
Troubleshoot issues related to AWS Lambda Functions using AWS CloudWatch
Build a custom Docker image for the application and push it to AWS ECR
Create an AWS Lambda Function using the custom Docker image in AWS ECR
Develop Applications using AWS Lambda Functions by adding Python Modules as Layers

Course content

13 sections • 154 lectures • 13h 19m total length

Introduction to Mastering AWS Lambda Functions for Data Engineers • 3:49
Deploy two Lambda functions, the ghactivity ingestor and the ghactivity transformer, to ingest gharchive.org data into S3 as JSON, transform it to Parquet, create a Glue Catalog table, analyze the data with Athena, and monitor with CloudWatch.
Resources used for Mastering AWS Lambda Functions for Data Engineers • 2:19

Introduction to Getting Started on Windows with Required Tools • 2:03
Learn to set up a Windows development environment for building end-to-end ETL pipelines with Python and AWS, including a WSL Ubuntu VM, Docker Desktop, and Visual Studio Code remote development.
Overview of Powershell on Windows 10 or Windows 11 • 4:25
Setup Ubuntu VM on Windows 10 or 11 using wsl • 6:07
Setup Ubuntu VM on Windows 10 or 11 using wsl • 5:17
Install Ubuntu 20.04 on Windows via WSL, set up a user, and connect with wsl to explore the Linux home directory and run commands like ls, cd, and pwd.
Setup Docker Desktop on Windows • 4:43
Validate Docker on Windows using Command Line leveraging Power Shell • 2:46
Review Docker Desktop Resource Configurations • 3:44
Install Visual Studio Code on Windows • 2:44
Install Remote Development Extension Kit for Visual Studio Code • 1:40
Install Python 3.9 and Distutils on Windows using wsl Ubuntu • 8:07
Install Python 3.9 on Ubuntu under WSL, verify availability, set up a Python 3.9 virtual environment with distutils, and activate it for AWS Lambda development.
Review Tools Installed for Application Development using Python and AWS Service • 2:50

Setup Project Folder using Visual Studio Code • 3:01
Ensure Python 3.9 for the Project • 4:03
Create Python Virtual Environment using Python 3 • 2:43
Install Required Dependencies for the Project using AWS Services • 6:55
Ensure AWS CLI to interact with AWS Services using AWS CLI Commands • 2:10
Recommendation to use Personal AWS Account for the course • 2:22
Use a personal AWS account for this course to avoid permission issues; costs stay low, and you can close the account cleanly afterward.

Setup and Login into AWS Account • 3:39
Setup AWS IAM User with Administrator Permissions • 4:20
Configure and Validate AWS CLI • 5:49
Configure AWS CLI with custom profile as default • 4:47
Configure the AWS CLI to use a default profile by exporting the AWS_PROFILE environment variable, so that commands such as aws s3 ls and aws s3 mb work without --profile and reach the correct account.
Recap of Date Arithmetic using Python • 8:02
Validate Python boto3 to interact with AWS Services • 7:50
Validate the boto3 setup in Python to interact with AWS services and read S3 buckets using a profile. Demonstrate switching between the ghactivity and default accounts by toggling the AWS_PROFILE environment variable.
Setup and Validate Jupyter based Interactive Environment • 3:17
Review GHActivity Data Details • 2:56
Download GHActivity Data using requests • 6:55
Review GHActivity Data using Pandas • 4:25
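
The date arithmetic recap and the download lecture in this section both revolve around hourly gharchive.org files. The following is a minimal sketch of that idea, not the course's own code. It assumes the publicly documented gharchive naming scheme of one file per hour with an unpadded hour, and the names BASE_URL, file_name_for, hourly_file_names, and url_for are illustrative.

```python
from datetime import datetime, timedelta

# Assumption: gharchive.org publishes one file per hour under this base URL,
# named like 2021-01-13-0.json.gz (the hour is not zero-padded).
BASE_URL = "https://data.gharchive.org"


def file_name_for(dt):
    """Return the archive file name covering the hour that contains dt."""
    return f"{dt:%Y-%m-%d}-{dt.hour}.json.gz"


def hourly_file_names(start, end):
    """List file names for every hour from start up to, but excluding, end."""
    names = []
    current = start.replace(minute=0, second=0, microsecond=0)
    while current < end:
        names.append(file_name_for(current))
        current += timedelta(hours=1)  # date arithmetic handles day rollover
    return names


def url_for(file_name):
    """Build the download URL for one hourly file."""
    return f"{BASE_URL}/{file_name}"
```

With the requests package installed, requests.get(url_for(name)).content would return the compressed bytes of one file; that call is left out of the sketch so it runs without network access.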

Managing s3 using Python boto3 • 8:22
Learn to manage S3 with Python boto3 in a Jupyter environment, including setting AWS_PROFILE, listing buckets, and creating new buckets with a unique prefix.
Overview of AWS Dynamodb • 3:59
Create DynamoDB Table for Job Details • 11:20
Create DynamoDB Table for Job Run Details • 6:34
Recap of Date Arithmetic using Python • 8:02
Get First Run Details to Copy GHActivity Data to AWS s3 • 8:47
Get Incremental Load Logic for next file • 7:18
Understand AWS s3 concepts such as buckets and objects • 9:19
Copying or Uploading Files to AWS s3 as objects using Python boto3 • 10:27
Explore how to upload files to S3 as objects using Python boto3, including configuring AWS_PROFILE, creating an S3 client, and using upload_file with bucket and key.
Writing Python Objects or Data as AWS s3 Objects using boto3 • 8:35
Save GHActivity Data to AWS s3 • 7:30
Learn to save ghactivity data to AWS S3 by uploading byte content with a boto3 S3 client using put_object, then validate with pandas read_json from S3.
Convert Date Time to Integer Unix Epoch using Python • 3:27
Save Job Run Details to DynamoDB Table • 6:26
Validate Data Copied to AWS s3 and job run details • 6:54
Run and Validate End to End Process • 11:41
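
The incremental load, epoch conversion, and put_object lectures in this section fit together roughly as sketched below. This is a hedged illustration under stated assumptions rather than the course's implementation: the function names are hypothetical, file names are assumed to look like 2021-01-13-23.json.gz, and s3_client is expected to be an object such as boto3.client("s3") that exposes put_object.

```python
from datetime import datetime, timedelta, timezone


def next_file_name(last_file_name):
    """Given the last processed file, return the file for the following hour."""
    stamp = last_file_name.split(".")[0]           # e.g. "2021-01-13-23"
    last_dt = datetime.strptime(stamp, "%Y-%m-%d-%H")
    next_dt = last_dt + timedelta(hours=1)
    return f"{next_dt:%Y-%m-%d}-{next_dt.hour}.json.gz"


def to_epoch(dt):
    """Convert a timezone-aware datetime to an integer Unix epoch in seconds."""
    return int(dt.timestamp())


def save_to_s3(s3_client, content, bucket, folder, file_name):
    """Write bytes as an S3 object and return the key that was used."""
    key = f"{folder}/{file_name}"
    # put_object takes the payload directly, so no local file is needed.
    s3_client.put_object(Bucket=bucket, Key=key, Body=content)
    return key
```

Passing the client in as an argument keeps the logic easy to check locally with a stand-in object before pointing it at a real bucket. In the course the bookmark for the last processed file is kept in DynamoDB; here it is simply a string argument.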

Introduction to Getting Started with AWS Lambda Functions • 2:42
Learn to start with AWS Lambda and deploy your first Python 3.9 function to ingest data from gharchive to S3. Understand passing arguments, custom handlers, and resource settings.
Overview of AWS Lambda and Getting Started using Python 3 • 7:24
Passing Arguments to AWS Lambda and Processing using Python • 4:25
Using Custom Handlers for AWS Lambda Functions using Python 3 • 3:03
Apply custom module and handler names in AWS Lambda by editing the runtime settings: rename lambda_function to app and lambda_handler to lambda_main, then deploy and test to confirm the handler is imported correctly.
Using AWS Services such as s3 in AWS Lambda Functions • 10:00
Learn how to interact with AWS services from Lambda by creating an S3 client with boto3, listing buckets, and configuring IAM permissions for read-only S3 access.
Recap of handling permissions using AWS IAM Roles and User Groups • 3:21
Explore how AWS IAM roles grant service-to-service permissions, and how users and user groups manage external access with least-privilege policies, illustrated by Lambda and S3 access.
Develop AWS Lambda Function to list objects from AWS S3 Bucket • 9:02
Passing Environment Variables to AWS Lambda Functions • 3:28
Customizing Resources such as memory used for AWS Lambda Function • 7:05
Setup Local Development Environment for AWS Lambda Functions • 5:39
Develop logic for AWS Lambda Function using external packages • 6:11
Build Zip file to deploy as AWS Lambda Function • 2:26
Deploy Application with External Dependencies as AWS Lambda Function • 8:55
Understand Problem Statement for Python Application for AWS • 2:05
Develop a Python-based Lambda function that downloads gharchive data using wget and requests, accesses response.content, and uploads it to S3, while setting up and validating a Windows development environment for AWS deployment.
Setup Python Project for AWS Lambda using Visual Studio Code • 3:06
Core Logic to upload files to AWS S3 using Python boto3 • 5:54
Develop Python Application to upload files to AWS s3 using Python boto3 • 4:55
Build Zip File for Python Application to deploy as AWS Lambda Function • 3:54
Deploy Python Application as AWS Lambda Function using Zip File • 6:26
Conclusion and request for rating and feedback • 2:13
Review how to set up a Python development environment for AWS on Windows with WSL and Visual Studio Code, deploy to Lambda, validate functionality, and rate with feedback on Itversity.
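
The custom handler, argument passing, and environment variable lectures in this section can be pictured with a small module like the one below. It is a sketch under assumptions: the module is saved as app.py and the function's handler setting is changed to app.lambda_main, as the lecture description says, while the "name" argument, the BUCKET_NAME variable, and the response layout are hypothetical.

```python
# Contents of app.py; the handler setting would be "app.lambda_main"
# instead of the default "lambda_function.lambda_handler".
import json
import os


def lambda_main(event, context):
    """Entry point that Lambda calls with the invocation payload as event."""
    # Arguments passed at invocation time arrive as keys of the event dict.
    name = event.get("name", "World")        # "name" is a hypothetical argument
    # Settings configured on the function arrive as environment variables.
    bucket = os.environ.get("BUCKET_NAME")   # "BUCKET_NAME" is a hypothetical variable
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello {name}", "bucket": bucket}),
    }
```

Because the handler is an ordinary function, it can be called locally as lambda_main({"name": "Lambda"}, None) before it is deployed.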

Introduction to Build and Deploy AWS Lambda Function using Zip File • 3:41
Learn to build, test, and deploy a Python Lambda function that ingests data from gharchive to AWS S3, with modular code, a custom Lambda handler, zip deployment, permissions, and monitoring.
Update Application Code with Core logic for Ingestion • 7:13
Overview of Validating User Defined Functions using Python CLI • 5:45
Validate Application using Core Logic to ingest data • 6:27
Add Lambda Handler to ingest data to AWS s3 • 3:56
Build Zip File for Python Application to deploy as AWS Lambda Function • 6:12
Build and bundle a Python Lambda deployment by creating a dependencies folder, installing the packages listed in requirements.txt, and packaging the app and dependencies into ghactivity-aws.zip for AWS Lambda deployment.
Upload Python Application Zip File to s3 and deploy as AWS Lambda Function • 5:22
Set Custom Handler and required Environment Variables for AWS Lambda Function • 4:40
Granting Permissions on AWS s3 and Dynamodb to AWS Lambda Function via Role • 2:30
Change Memory and Timeout for AWS Lambda Function and Test • 2:36
Recap and Overview of Monitoring Lambda Functions using Cloudwatch • 7:22
Limitations of Deploying AWS Lambda Function using Zip file • 0:58
Automate Build of AWS Lambda Function using Shell Scripts • 7:09
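
The bundling step described in this section (a dependencies folder plus the application code packaged into ghactivity-aws.zip) is automated in the course with shell scripts. As a swapped-in illustration only, the same layout can be produced with Python's zipfile module. The sketch assumes the dependencies folder has already been filled, for example with pip install -r requirements.txt -t dependencies, and build_bundle is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def build_bundle(zip_path, dependencies_dir, app_files):
    """Package installed dependencies and application modules into one zip."""
    dependencies_dir = Path(dependencies_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as bundle:
        # Dependencies go at the root of the archive so Lambda can import them.
        for path in sorted(dependencies_dir.rglob("*")):
            if path.is_file():
                bundle.write(path, path.relative_to(dependencies_dir).as_posix())
        # Application modules sit at the root too, next to the dependencies.
        for app_file in app_files:
            bundle.write(app_file, Path(app_file).name)
    return zip_path
```

A call such as build_bundle("ghactivity-aws.zip", "dependencies", ["app.py"]) would produce an archive whose top level contains app.py alongside the installed packages.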

Introduction to Build and Deploy AWS Lambda Function using Custom Docker Image • 2:44
Deploy AWS Lambda functions as custom Docker images to get past the size limits of zip deployments: build a Python 3 runtime image, validate it locally with Docker and curl, then deploy via ECR.
Create Dockerfile for Custom Docker Image for AWS Lambda Function • 5:28
Create a custom Docker image for AWS Lambda by writing a Dockerfile, copying requirements.txt, installing dependencies with Python 3.9, copying the app, and setting the Lambda handler app.lambda_ingest.
Create Custom Docker Image for AWS Lambda Function using Python 3 Run-time • 3:59
Validate Custom Docker Image by creating Docker Container • 4:27
Run the application using Python CLI in the Docker Container • 6:10
Run the Docker Container with the Credentials and Environment Variables • 10:38
Run a custom Docker image to validate a Python AWS Lambda workflow by mounting the host .aws credentials and configuring AWS_PROFILE, bucket name, and folder, then verify S3 operations.
Validate AWS Lambda Function Locally using Docker and Curl • 5:24
Validate AWS Lambda functions locally with Docker and curl by port forwarding 8080 to 9080 and posting to the Lambda invocation endpoint.
Create AWS ECR Repository for Custom Docker Image • 1:51
Push Custom Docker Image for AWS Lambda Function to AWS ECR • 6:25
Create AWS Lambda Function using Custom Docker Image in AWS ECR • 4:48
Run and Validate AWS Lambda Function created using Custom Docker Image • 4:23
Create Shell Script to Build and Push Docker Image to AWS ECR • 4:38
Add Command to build script to reconfigure AWS Lambda Function to latest docker • 8:46
Automate updating a Lambda function to the latest custom Docker image in ECR by listing images, retrieving function details, and running a script to rebuild, push, and reconfigure.
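
The local validation step in this section posts an event to the container with curl. The sketch below shows the same request built with Python's standard library instead, purely as an illustration. It assumes the container was started with host port 9080 mapped to container port 8080 and that the image includes the Lambda Runtime Interface Emulator, whose invocation path is used in INVOKE_URL; build_invocation and invoke_local are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: host port 9080 is mapped to container port 8080, and the image
# exposes the Runtime Interface Emulator invocation endpoint at this path.
INVOKE_URL = "http://localhost:9080/2015-03-31/functions/function/invocations"


def build_invocation(event, url=INVOKE_URL):
    """Prepare the POST request that carries event as the JSON payload."""
    payload = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def invoke_local(event, url=INVOKE_URL, timeout=60):
    """Send the event to the running container and return the decoded response."""
    request = build_invocation(event, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling invoke_local({}) only succeeds while the container is running; build_invocation can be inspected on its own without Docker.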

Introduction to AWS s3 Event Notifications with Lambda or s3 Triggers on Lambda • 1:49
Switching to Different Profile for the demos on AWS s3 Event Notifications • 3:03
Setup Project to explore AWS Lambda Triggers or s3 Event Notifications • 3:49
Setup Required Datasets for AWS Lambda Triggers or s3 Event Triggers • 6:27
Listing AWS s3 Buckets and Objects using Python Boto3 • 6:00
Learn to list AWS S3 buckets and objects with Python using boto3, creating an S3 client, calling list_buckets and list_objects, and extracting bucket names and object keys.
Listing AWS s3 Objects based on Prefix using Python boto3 • 4:43
Learn to filter S3 objects with boto3 list_objects using a prefix, iterate through contents and object keys, and understand bucket, object, and prefix concepts for S3 events and event triggers.
Overview of AWS s3 Events and Event Notification • 2:40
Create Simple AWS Lambda Function for s3 Event Notifications or Triggers • 3:29
Create Trigger for AWS Lambda Function for s3 Put Event • 7:38
Get AWS s3 Object Details from AWS Lambda Arguments • 3:53
Learn how to retrieve the S3 object key from a Lambda event triggered by S3, using Python to navigate the event records and S3 object details, and verify the results in CloudWatch logs.
Develop Read Logic to read from s3 • 8:07
Learn to read CSV data from S3, convert records to JSON using the json module and boto3, and write the results as JSON files within a Lambda function.
Convert CSV to JSON Strings using AWS Lambda Function • 5:35
Trigger an AWS Lambda function from S3 to read CSV content, convert each record into a JSON string, and write the results back to S3.
Add Logic to Upload JSON File to s3 and Validate AWS Lambda Trigger • 7:36
Configure AWS Lambda as AWS s3 Event Notification • 5:23
Configure S3 event notifications to invoke a Lambda function that converts CSV to JSON when files arrive in the airetail retail_db prefix. Validate the results via CloudWatch logs and S3 triggers.
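
The trigger flow covered in this section (read the object named in the S3 event, convert CSV records to JSON strings, write the result back to S3) might look like the sketch below. It is an illustration under assumptions, not the course's code: the CSV files are assumed to have no header row, the column names and the retail_db_json target prefix are hypothetical, and s3_client stands for an object like boto3.client("s3").

```python
import csv
import io
import json
from urllib.parse import unquote_plus


def object_from_event(event):
    """Return (bucket, key) for the first record of an S3 event notification."""
    s3_info = event["Records"][0]["s3"]
    # Keys are URL-encoded in the event, so a space arrives as "+".
    return s3_info["bucket"]["name"], unquote_plus(s3_info["object"]["key"])


def csv_to_json_lines(csv_text, columns):
    """Convert header-less CSV text into newline-delimited JSON strings."""
    reader = csv.reader(io.StringIO(csv_text))
    return "\n".join(json.dumps(dict(zip(columns, row))) for row in reader if row)


def lambda_handler(event, context, s3_client=None,
                   columns=("id", "name"), target_prefix="retail_db_json"):
    """Read the uploaded CSV, convert it, and write JSON under another prefix."""
    if s3_client is None:
        import boto3  # imported lazily so the helpers above work without boto3
        s3_client = boto3.client("s3")
    bucket, key = object_from_event(event)
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
    json_lines = csv_to_json_lines(body.decode("utf-8"), columns)
    # A separate target prefix keeps the output from re-triggering this function.
    stem = key.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    target_key = f"{target_prefix}/{stem}.json"
    s3_client.put_object(Bucket=bucket, Key=target_key, Body=json_lines.encode("utf-8"))
    return {"statusCode": 200, "source": key, "target": target_key}
```

Lambda itself only passes event and context; the extra keyword arguments exist so the handler can be exercised locally with a stand-in client.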

Introduction to Develop and Deploy AWS Lambda to transform the data • 0:59
Basic Approach to convert JSON to Parquet using Python Pandas • 10:40
Convert JSON data from S3 to Parquet with pandas by reading JSON lines and dropping payload, then write snappy-compressed Parquet back to S3; plan chunking for Lambda deployment.
Convert JSON Files to Parquet using Pandas in Chunks • 8:32
Learn to convert JSON to Parquet in chunks using pandas, processing 10,000-record batches to overcome memory limits in AWS Lambda, and partitioning outputs by year, month, day, and hour.
Setup Job Metadata for JSON to Parquet • 2:39
Add Core Logic to transform data to the Application Code • 4:31
Develop Wrapper to transform the data from JSON to Parquet • 6:58
Create a ghactivity_transform_to_parquet wrapper to deploy the JSON-to-Parquet transformation as a Lambda function, wiring file_name, bucket_name, and tgt_folder, and saving job_run_details.
Validate Function to Convert JSON to Parquet • 5:55
Validate a wrapper function that converts JSON to Parquet by setting environment variables, authenticating with AWS, invoking ghactivity_transform_to_parquet, and preparing a Lambda handler with logging and a 200 status.
Develop Lambda Handler to transform JSON Files to Parquet • 3:04
Define lambda_transformer to extract last_run_file_name from the event, invoke ghactivity_transform_to_parquet to transform JSON to Parquet, return status 200 with job_run_details, and deploy to AWS Lambda using a Docker image for curl validation.
Validate Lambda Handler to transform the data locally using curl • 10:30
Push Docker Image to ECR and Update Existing Lambda Function • 4:54
Deploy and Run Transformer as AWS Lambda Function • 6:40
Deploy the ghactivity transformer Lambda function using a container image, set S3 and DynamoDB permissions, and configure 4 GB memory with a 3-minute timeout for processing JSON to Parquet with snappy compression.
Perform Validations of AWS Lambda Function to Transform the data • 6:01
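
The chunked conversion and the lambda_transformer handler described in this section can be outlined as below. This is a standard-library sketch of the control flow only: the Parquet write that the course does with pandas is left out, ghactivity_transform_to_parquet is a stand-in with the same name as the course's wrapper, and the event layout, the environment variable names, the partition folder format, and the response keys are all assumptions.

```python
import json
import os
from itertools import islice


def read_in_chunks(lines, chunk_size=10000):
    """Yield lists of parsed records, at most chunk_size at a time."""
    iterator = (line for line in lines if line.strip())  # skip blank lines
    while True:
        chunk = [json.loads(line) for line in islice(iterator, chunk_size)]
        if not chunk:
            return
        # Drop the nested payload field before the records are written out.
        for record in chunk:
            record.pop("payload", None)
        yield chunk


def partition_path(file_name, tgt_folder):
    """Derive a year/month/day/hour folder from a name like 2021-01-13-0.json.gz."""
    year, month, day, hour = file_name.split(".")[0].split("-")
    return f"{tgt_folder}/year={year}/month={month}/day={day}/hour={hour}"


def ghactivity_transform_to_parquet(file_name, bucket_name, tgt_folder):
    """Stand-in for the real wrapper, which reads JSON and writes Parquet."""
    return {
        "last_run_file_name": file_name,
        "bucket_name": bucket_name,
        "target": partition_path(file_name, tgt_folder),
    }


def lambda_transformer(event, context):
    """Handler: take the file name from the event and report job run details."""
    file_name = event["last_run_file_name"]          # assumed event layout
    job_run_details = ghactivity_transform_to_parquet(
        file_name,
        os.environ.get("BUCKET_NAME"),               # hypothetical variable
        os.environ.get("TARGET_FOLDER", "cleansed"), # hypothetical variable
    )
    return {"status": 200, "body": job_run_details}
```

With pandas, the equivalent of read_in_chunks would typically be pandas.read_json(..., lines=True, chunksize=10000), with each chunk written through DataFrame.to_parquet using snappy compression. Processing one chunk at a time is what keeps memory use bounded inside the Lambda function.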

Requirements

A Computer Science or IT degree, or 1 to 2 years of IT experience
Basic Linux skills, with the ability to run commands using the Terminal
Python programming skills are required
A valid AWS account to use the AWS services while learning how to build Data Pipelines using AWS Lambda Functions

Description

Do you want to learn AWS Lambda Functions by building an end-to-end data pipeline using Python as the programming language, Boto3, and key AWS services such as S3, DynamoDB, ECR, CloudWatch, Glue Catalog, and Athena? In this course, you will learn AWS Lambda Functions by implementing an end-to-end pipeline that uses all of them.

As part of this course, you will learn how to develop and deploy Lambda functions using zip files, custom Docker images, and layers. You will also understand how to trigger Lambda functions from EventBridge as well as Step Functions.

Set up required tools on Windows to develop the code for ETL Data Pipelines using Python and AWS Services. You will set up Ubuntu using WSL, Docker Desktop, and Visual Studio Code along with the Remote Development Extension Kit so that you can develop Python-based applications using AWS Services.
Set up a Project or Development Environment to develop applications using Python and AWS Services on Windows and Mac.
Get started with AWS by creating an AWS account, configuring the AWS CLI, and reviewing the Data Sets used for the project.
Develop Core Logic to Ingest Data from source to AWS S3 using Python boto3. The application will be built using Boto3 to interact with AWS Services, Pandas for date arithmetic, and requests to get the files from the source via REST API.
Get started with AWS Lambda Functions using the Python 3.9 Run-time Environment.
Refactor the application, and build a zip file to deploy as an AWS Lambda Function. The application logic includes capturing bookmarks as well as Job Run details in DynamoDB. You will also get an overview of DynamoDB and how to interact with it to manage Bookmark as well as Job Run details.
Create an AWS Lambda Function using a Zip file, deploy it using the AWS Console, and validate.
Troubleshoot issues related to AWS Lambda Functions using AWS CloudWatch.
Build a custom Docker image for the application and push it to AWS ECR.
Create an AWS Lambda Function using the custom Docker image in AWS ECR and then validate.
Get an understanding of AWS S3 Event Notifications or S3-based triggers on Lambda Functions.
Develop another Python application to transform the data and write it to S3 in the form of Parquet. The application will be built using Pandas by converting 10,000 records at a time to Parquet.
Build an orchestrated pipeline using AWS S3 Event Notifications between the two Lambda Functions.
Schedule the first Lambda function using AWS EventBridge and then validate.
Finally, create an AWS Glue Catalog table on the S3 location which has the Parquet files, and validate by running SQL Queries using AWS Athena.
After going through the complete life cycle of Deploying and Scheduling Lambda Functions and validating the data by using Glue Catalog and AWS Athena, you will also understand how to use Layers for Lambda Functions.

Here are the key takeaways from this training:

Develop Python Applications and Deploy as Lambda Functions by using a Zip-based bundle as well as a custom Docker image.
Monitor and troubleshoot issues by going through CloudWatch logs.
Access to the entire application code used for the demo, along with the notebook used to come up with the core logic.
Ability to build solutions using Boto3 and multiple AWS Services such as S3, DynamoDB, ECR, CloudWatch, Glue Catalog, and Athena.

Who this course is for:

University Students who want to learn AWS Lambda Functions with hands-on and real time examples
Aspiring Data Engineers and Data Scientists who want to master AWS Lambda Functions for Data Processing with real time examples
Entry Level IT Professionals with prior basic level programming experience who would like to build complete projects using AWS and Python
Experienced Application Developers who would like to explore how to build end-to-end applications using Python and AWS Services
Experienced Data Engineers who want to build end-to-end data pipelines using Python and AWS Services
Any IT Professional who is keen to get started with AWS using some of the key services such as S3, DynamoDB, Lambda, ECR, and IAM

Course content

Introduction to Mastering AWS Lambda Functions for Data Engineers • 2 lectures • 6min

Getting Started on Windows with Required Tools • 11 lectures • 44min

Setup Development Environment to build Data Pipelines using AWS • 6 lectures • 21min

Getting Started with AWS and Review Data Sets • 10 lectures • 52min

Core Logic to Ingest Data from Web Service to AWS s3 • 15 lectures • 1hr 59min

Getting Started with AWS Lambda Functions • 20 lectures • 1hr 42min

Build and Deploy AWS Lambda Function using Zip File • 13 lectures • 1hr 4min

Build and Deploy AWS Lambda Function using Custom Docker Image • 13 lectures • 1hr 10min

Overview of AWS s3 Event Notifications with Lambda or s3 Triggers on Lambda • 14 lectures • 1hr 10min

Develop and Deploy AWS Lambda Function to transform the data • 12 lectures • 1hr 11min
