
Deploy two Lambda functions, the ghactivity ingestor and the ghactivity transformer, to ingest gharchive.org data into S3 as JSON. Transform the data to Parquet, create a Glue Catalog table, analyze it with Athena, and monitor the pipeline with CloudWatch.
Learn to set up a Windows development environment for building end-to-end ETL pipelines with Python and AWS, including WSL Ubuntu VM, Docker Desktop, and Visual Studio Code remote development.
Install Ubuntu 20.04 on Windows via WSL, set up a user, and connect with the wsl command to explore the Linux home directory and run commands like ls, cd, and pwd.
Install Python 3.9 on Ubuntu under WSL, verify availability, set up a Python 3.9 virtual environment with distutils, and activate it for AWS Lambda development.
Use a personal AWS account for this course to avoid permission issues. Costs stay low, and you can clean up and close the account afterward.
Configure the AWS CLI to use a default profile by exporting AWS_PROFILE, so that commands like aws s3 ls and aws s3 mb run without --profile and against the correct account.
Validate the boto3 setup in Python to interact with AWS services and read S3 buckets using a profile. Demonstrate switching between the ghactivity and default accounts by toggling the AWS_PROFILE environment variable.
Learn to manage S3 with Python boto3 in a Jupyter environment, including setting AWS_PROFILE, listing buckets, and creating new buckets with a unique prefix.
Explore how to upload files to S3 as objects using Python boto3, including configuring AWS_PROFILE, creating an S3 client, and using upload_file with a bucket and key.
Learn to save ghactivity data to AWS S3 by uploading byte content with a boto3 S3 client using put_object, then validate with pandas read_json from S3.
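The put_object step above can be sketched as follows. This is a minimal illustration, not the course's code: the function name upload_bytes, the bucket my-ghactivity-bucket, and the landing/ghactivity/ key prefix are hypothetical. The client is passed in as an argument, so it can be created with boto3.client("s3") once AWS_PROFILE is set.

```python
def upload_bytes(s3_client, content: bytes, bucket: str, key: str) -> int:
    """Upload raw bytes to s3://bucket/key and return the HTTP status code."""
    response = s3_client.put_object(Bucket=bucket, Key=key, Body=content)
    # boto3 responses carry the HTTP status under ResponseMetadata.
    return response["ResponseMetadata"]["HTTPStatusCode"]


# Possible usage (bucket and key are placeholders, not names from the course):
#   import boto3, requests
#   s3_client = boto3.client("s3")
#   res = requests.get("https://data.gharchive.org/2021-01-29-0.json.gz")
#   upload_bytes(s3_client, res.content, "my-ghactivity-bucket",
#                "landing/ghactivity/2021-01-29-0.json.gz")
```

Passing the client in, rather than creating it inside the function, keeps the function easy to exercise with a stand-in object before any AWS credentials are involved.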
Learn to start with AWS Lambda and deploy your first Python 3.9 function to ingest data from gharchive to S3. Understand passing arguments, custom handlers, and resource settings.
Apply custom module and handler names in AWS Lambda by editing runtime settings; rename lambda_function to app and lambda_handler to lambda_main, deploy, and test to ensure proper import.
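As a rough sketch of the renaming described above: once the module is app.py and the function is lambda_main, the Handler value in the runtime settings would be app.lambda_main. The body of the function here is only an illustration that echoes the event back; it is not the course's implementation.

```python
# app.py  (the Handler runtime setting would then be "app.lambda_main")
import json


def lambda_main(event, context):
    """Entry point that Lambda calls once the handler is set to app.lambda_main."""
    # Echo the event back so a console test shows which arguments were passed in.
    return {
        "statusCode": 200,
        "body": json.dumps({"received": event}),
    }
```

If the module or function name in the runtime settings does not match the file and function that were deployed, the test invocation fails with an import error, which is why the summary stresses testing after the rename.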
Learn how to interact with AWS services from Lambda by creating an S3 client with boto3, listing buckets, and configuring IAM permissions for read-only S3 access.
Explore how AWS IAM roles grant service-to-service permissions, and how users and user groups manage external access with least-privilege policies, illustrated by Lambda and S3 access.
Develop a Python-based Lambda function for downloading gharchive data using wget and requests, access response.content, and upload it to S3, while setting up and validating a Windows development environment for AWS deployment.
Review how to set up a Python development environment for AWS on Windows with WSL and Visual Studio Code, deploy to Lambda, validate functionality, and rate with feedback on Itversity.
Learn to build, test, and deploy a Python Lambda function that ingests data from gharchive to AWS S3, with modular code, a custom Lambda handler, zip deployment, permissions, and monitoring.
Build and bundle a Python Lambda deployment by creating a dependencies folder, installing the packages in requirements.txt, and packaging the app and dependencies into ghactivity-aws.zip for AWS Lambda deployment.
Master AWS Lambda functions with custom Docker images, which get around zip size limits, by building a Python 3 runtime image, validating locally with Docker and curl, then deploying via ECR.
Create a custom Docker image for AWS Lambda by writing a Dockerfile, copying requirements.txt, installing dependencies with Python 3.9, copying the app, and setting the Lambda handler to app.lambda_ingest.
Run a custom Docker image to validate a Python AWS Lambda workflow by mounting the host .aws credentials and configuring AWS_PROFILE, the bucket name, and the folder, then verify S3 operations.
Validate AWS Lambda functions locally with Docker and curl by mapping container port 8080 to host port 9080 and posting to the Lambda invocation endpoint.
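The curl call described above can also be reproduced from Python. The sketch below assumes the image is built on an AWS-provided Lambda base image, whose Runtime Interface Emulator serves the invocation path shown, and that the container was started with host port 9080 mapped to container port 8080. The helper names build_invoke_request and invoke_local are hypothetical.

```python
import json
import urllib.request

# Path served by the AWS Lambda Runtime Interface Emulator inside the container.
INVOKE_PATH = "/2015-03-31/functions/function/invocations"


def build_invoke_request(event: dict, host: str = "localhost", port: int = 9080):
    """Build the POST request that curl would send to the local Lambda endpoint."""
    url = f"http://{host}:{port}{INVOKE_PATH}"
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def invoke_local(event: dict, **kwargs) -> dict:
    """Send the event to the running container and decode the JSON response."""
    with urllib.request.urlopen(build_invoke_request(event, **kwargs)) as response:
        return json.loads(response.read())
```

With the container running, invoke_local({}) would post an empty event, which is roughly what curl does with -d '{}' against the same URL.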
Automate updating a Lambda function to the latest custom Docker image in ECR by listing images, retrieving function details, and running a script to rebuild, push, and reconfigure.
Learn to list AWS S3 buckets and objects with Python using boto3, creating an S3 client, calling list_buckets and list_objects, and extracting bucket names and object keys.
Learn to filter S3 objects with boto3 list_objects using a prefix, iterate through contents and object keys, and understand bucket, object, and prefix concepts for S3 events and event triggers.
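Filtering by prefix might look like the sketch below. The helper name list_keys is hypothetical, and the client is passed in so that it can be a boto3 S3 client created elsewhere. A single list_objects call returns at most 1,000 keys, so larger prefixes would need pagination, which this sketch leaves out.

```python
def list_keys(s3_client, bucket: str, prefix: str = "") -> list:
    """Return the keys of objects in `bucket` whose key starts with `prefix`."""
    response = s3_client.list_objects(Bucket=bucket, Prefix=prefix)
    # "Contents" is absent from the response when nothing matches the prefix.
    return [obj["Key"] for obj in response.get("Contents", [])]


# Possible usage (bucket and prefix are placeholders):
#   import boto3
#   list_keys(boto3.client("s3"), "my-ghactivity-bucket", "landing/ghactivity/")
```

The same prefix idea carries over to S3 event notifications, where a prefix restricts which uploaded objects trigger an event.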
Master how to retrieve S3 object key from a lambda event triggered by S3, using Python to navigate event records and S3 object details, and verify results in CloudWatch logs.
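Navigating the event records could be sketched as below. The Records, s3, bucket, and object fields follow the S3 event notification structure; the helper name get_s3_objects is hypothetical. Object keys arrive URL-encoded in these events, so the sketch decodes them before use.

```python
from urllib.parse import unquote_plus


def get_s3_objects(event: dict) -> list:
    """Return (bucket, key) pairs for every record in an S3 event notification."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded (spaces become "+"), so decode before use.
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def lambda_handler(event, context):
    objects = get_s3_objects(event)
    # Output from print ends up in the function's CloudWatch log stream.
    print(f"Received objects: {objects}")
    return {"statusCode": 200, "objects": objects}
```

Printing the extracted pairs is what makes them visible in CloudWatch logs for verification after an upload triggers the function.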
Learn to read CSV data from S3, convert records to JSON using the json module and boto3, and write the results as JSON files within a Lambda function.
Trigger an AWS Lambda function from S3 to read CSV content, convert each record into a JSON string, and write the results back to S3.
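The conversion itself can be sketched with the standard csv and json modules. This assumes the CSV files have no header row, so column names are supplied by the caller; the function names and the columns used in any call are hypothetical, not taken from the course.

```python
import csv
import io
import json


def csv_to_json_lines(csv_text: str, columns: list) -> str:
    """Convert header-less CSV text into newline-delimited JSON strings."""
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=columns)
    return "\n".join(json.dumps(row) for row in reader)


def convert_object(s3_client, bucket: str, key: str, columns: list, target_key: str):
    """Read a CSV object from S3, convert it, and write the JSON back to S3."""
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    json_lines = csv_to_json_lines(body, columns)
    s3_client.put_object(Bucket=bucket, Key=target_key, Body=json_lines.encode("utf-8"))
```

Every value comes out as a string because the csv module does not infer types; casting would be an extra step if numeric fields are needed downstream.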
Configure S3 event notifications to invoke a Lambda function that converts CSV to JSON when files arrive in the airetail retail_db prefix. Validate results via CloudWatch logs and S3 triggers.
Convert JSON data from S3 to Parquet with pandas by reading JSON lines and dropping the payload field, then write Snappy-compressed Parquet back to S3; plan chunking for Lambda deployment.
Learn to convert JSON to Parquet in chunks using pandas, processing 10,000-record batches to stay within memory limits in AWS Lambda, and partitioning outputs by year, month, day, and hour.
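The batching and partitioning ideas above can be illustrated without pandas. The sketch below uses the standard library only, as a stand-in for the pandas approach (pandas read_json accepts lines=True together with chunksize to iterate over batches). It assumes gharchive-style file names such as 2021-01-29-0.json.gz and a Hive-style year=/month=/day=/hour= folder layout; that layout and both function names are assumptions.

```python
from itertools import islice


def chunked(records, size=10_000):
    """Yield successive lists of at most `size` records from any iterable."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def partition_prefix(file_name: str, tgt_folder: str) -> str:
    """Map a name like 2021-01-29-0.json.gz to a year/month/day/hour prefix."""
    year, month, day, hour = file_name.split(".")[0].split("-")
    return f"{tgt_folder}/year={year}/month={month}/day={day}/hour={hour}"
```

Each batch would then be written as its own Parquet file under the prefix, so only one batch of records is held in memory at a time.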
Create a ghactivity_transform_to_parquet wrapper to deploy the JSON-to-Parquet transformation as a Lambda function, wiring file_name, bucket_name, and tgt_folder, and saving job_run_details.
Validate a wrapper function that converts JSON to Parquet by setting environment variables, authenticating with AWS, invoking ghactivity_transform_to_parquet, and preparing a Lambda handler with logging and a 200 status.
Define lambda_transformer to extract last_run_file_name from the event, invoke ghactivity_transform_to_parquet to transform JSON to Parquet, return status 200 with job_run_details, and deploy to AWS Lambda using a Docker image after validating with curl.
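A rough sketch of the handler described above follows. The names lambda_transformer, ghactivity_transform_to_parquet, last_run_file_name, and job_run_details come from the summaries; everything else is an assumption, including the event shape, the BUCKET_NAME and TARGET_FOLDER environment variables, their default values, and the stand-in transformation, which does no real work.

```python
import os


def ghactivity_transform_to_parquet(file_name, bucket_name, tgt_folder):
    """Stand-in for the real transformation; returns made-up job run details."""
    return {"last_run_file_name": file_name, "status": "success"}


def lambda_transformer(event, context):
    # Assumption: the upstream event carries the file name under this key.
    file_name = event["last_run_file_name"]
    # Assumption: bucket and target folder come from environment variables.
    bucket_name = os.environ.get("BUCKET_NAME", "my-ghactivity-bucket")
    tgt_folder = os.environ.get("TARGET_FOLDER", "cleansed/ghactivity")
    job_run_details = ghactivity_transform_to_parquet(file_name, bucket_name, tgt_folder)
    return {"statusCode": 200, "job_run_details": job_run_details}
```

Keeping the handler thin, with the work delegated to the wrapper, lets the wrapper be validated on its own before the container image is built.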
Deploy the ghactivity transformer Lambda function using a container image, set S3 and DynamoDB permissions, and configure 4 GB of memory with a 3-minute timeout for converting JSON to Snappy-compressed Parquet.
Do you want to learn AWS Lambda Functions by building an end-to-end data pipeline using Python as the programming language, with Boto3 and key AWS Services such as S3, DynamoDB, ECR, CloudWatch, Glue Catalog, and Athena? In this course, you will learn AWS Lambda Functions by implementing an end-to-end pipeline that uses all of these services.
As part of this course, you will learn how to develop and deploy Lambda Functions using zip files, custom Docker images, and layers. You will also understand how to trigger Lambda Functions from EventBridge as well as Step Functions.
Set up the required tools on Windows to develop the code for ETL Data Pipelines using Python and AWS Services. You will set up Ubuntu using WSL, Docker Desktop, and Visual Studio Code along with the Remote Development extension pack so that you can develop Python-based applications using AWS Services.
Set up the project or development environment to develop applications using Python and AWS Services on Windows and Mac.
Get started with AWS by creating an AWS account, configuring the AWS CLI, and reviewing the data sets used for the project.
Develop the core logic to ingest data from the source to AWS S3 using Python boto3. The application will be built using Boto3 to interact with AWS Services, Pandas for date arithmetic, and requests to get the files from the source via REST API.
Get started with AWS Lambda Functions using the Python 3.9 runtime environment.
Refactor the application and build a zip file to deploy as an AWS Lambda Function. The application logic includes capturing bookmarks as well as job run details in DynamoDB. You will also get an overview of DynamoDB and how to interact with it to manage bookmarks and job run details.
Create an AWS Lambda Function using a zip file, deploy it using the AWS Console, and validate.
Troubleshoot issues related to AWS Lambda Functions using AWS CloudWatch.
Build a custom Docker image for the application and push it to AWS ECR.
Create an AWS Lambda Function using the custom Docker image in AWS ECR and then validate.
Get an understanding of AWS S3 Event Notifications, or S3-based triggers, on Lambda Functions.
Develop another Python application to transform the data and write it to S3 in Parquet format. The application will be built using Pandas, converting 10,000 records at a time to Parquet.
Build an orchestrated pipeline using AWS S3 Event Notifications between the two Lambda Functions.
Schedule the first Lambda Function using AWS EventBridge and then validate.
Finally, create an AWS Glue Catalog table on the S3 location that holds the Parquet files, and validate by running SQL queries using AWS Athena.
After going through the complete life cycle of deploying and scheduling Lambda Functions and validating the data by using Glue Catalog and AWS Athena, you will also understand how to use layers for Lambda Functions.
Here are the key takeaways from this training:
Develop Python applications and deploy them as Lambda Functions by using a zip-based bundle as well as a custom Docker image.
Monitor and troubleshoot issues by going through CloudWatch logs.
Get the entire application code used for the demo, along with the notebook used to develop the core logic.
Gain the ability to build solutions using Boto3 and multiple AWS Services such as S3, DynamoDB, ECR, CloudWatch, Glue Catalog, and Athena.