
Provision a single-node Redshift cluster and enable public access. Then test the connection from a local SQL client (DBeaver) using the cluster endpoint and port 5439.
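DBeaver talks to Redshift over JDBC, so the endpoint and port end up in a JDBC URL. The sketch below assembles that URL and adds a plain TCP reachability check on port 5439. The helper names are hypothetical, and `dev` is assumed as the database name (it is the usual default for a new cluster).

```python
import socket


def redshift_jdbc_url(endpoint: str, database: str = "dev", port: int = 5439) -> str:
    """Build a Redshift JDBC URL of the form jdbc:redshift://host:port/database."""
    # The console may show the endpoint with ":5439/dev" appended; keep only the host name.
    host = endpoint.split("/", 1)[0].split(":", 1)[0]
    return f"jdbc:redshift://{host}:{port}/{database}"


def port_is_reachable(host: str, port: int = 5439, timeout: float = 5.0) -> bool:
    """Plain TCP check of the cluster port (no login is attempted)."""
    # A False result often means public access is off or the security group
    # does not allow inbound traffic on the port from your IP address.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `port_is_reachable` returns True but DBeaver still fails, the problem is more likely the credentials or database name than the network.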
Learn to ingest data into a Redshift cluster with the COPY command, creating a transactional-layer schema whose orders, order items, reviews, and products tables are loaded from CSV and Parquet files in S3.
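The two input formats need slightly different COPY options. Here is a minimal sketch that renders the statement for either one. The schema name `transactional`, the bucket, and the IAM role ARN used in the example are placeholders, not values from the course.

```python
def copy_statement(table: str, s3_uri: str, iam_role_arn: str, file_format: str) -> str:
    """Render a Redshift COPY statement for CSV or Parquet files in S3."""
    fmt = file_format.lower()
    if fmt == "csv":
        # CSV exports usually start with a header row that must be skipped.
        options = "FORMAT AS CSV IGNOREHEADER 1"
    elif fmt == "parquet":
        # Parquet carries its own schema, so there is no header row to skip.
        options = "FORMAT AS PARQUET"
    else:
        raise ValueError(f"unsupported format: {file_format}")
    return f"COPY {table} FROM '{s3_uri}' IAM_ROLE '{iam_role_arn}' {options};"


# Placeholder names: replace the table, bucket prefix, and role with your own.
print(copy_statement(
    "transactional.orders",
    "s3://my-bucket/orders/",
    "arn:aws:iam::123456789012:role/RedshiftCopyRole",
    "csv",
))
```

The generated string can be pasted into DBeaver or any other SQL client connected to the cluster.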
Enrich the transactional data by stitching it with third-party user behavior data using AWS Glue crawlers, Athena, and PySpark, then centralize the result in Redshift for analytics.
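The core of this stitching step is to aggregate the behavior events per customer and left-join the result onto the orders. The sketch below shows that logic in plain Python so it runs without a Spark cluster. It is a stand-in for the PySpark job, not the course's code, and the field names `customer_id` and `event_count` are assumptions.

```python
from collections import Counter


def stitch_orders_with_behavior(orders, events):
    """Left-join each order with the number of behavior events of its customer.

    Plain-Python stand-in for the PySpark pattern
    orders_df.join(events_agg_df, on="customer_id", how="left").
    """
    # Aggregate the third-party events first: one count per customer.
    events_per_customer = Counter(e["customer_id"] for e in events)
    enriched = []
    for order in orders:
        row = dict(order)
        # Left-join semantics: every order is kept. Customers without events get 0
        # here, where PySpark would produce null until a fillna(0).
        row["event_count"] = events_per_customer.get(order["customer_id"], 0)
        enriched.append(row)
    return enriched
```

Aggregating before joining keeps the join one-to-one per order. Joining raw events directly would multiply each order row by its customer's event count.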
Create an external schema in Redshift Spectrum linked to the Data Catalog so that S3 data can be queried as if it resided in Redshift, then create an external table over CSV files.
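The two DDL statements involved can be sketched as string templates. The schema name `spectrum`, the catalog database `ecommerce_db`, the role ARN, and the `reviews` columns in the example are placeholders. The table definition assumes simple comma-delimited files with one header line.

```python
def external_schema_ddl(schema: str, catalog_db: str, iam_role_arn: str) -> str:
    """CREATE EXTERNAL SCHEMA pointing at a Data Catalog database."""
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema} "
        f"FROM DATA CATALOG DATABASE '{catalog_db}' "
        f"IAM_ROLE '{iam_role_arn}' "
        # Also creates the catalog database if it does not exist yet.
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


def external_csv_table_ddl(schema, table, columns, s3_location, skip_header=True):
    """CREATE EXTERNAL TABLE over comma-delimited text files under an S3 prefix."""
    col_list = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    ddl = (
        f"CREATE EXTERNAL TABLE {schema}.{table} ({col_list}) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
        "STORED AS TEXTFILE "
        # LOCATION is a folder (prefix); every file under it becomes table data.
        f"LOCATION '{s3_location}'"
    )
    if skip_header:
        ddl += " TABLE PROPERTIES ('skip.header.line.count'='1')"
    return ddl + ";"


print(external_schema_ddl("spectrum", "ecommerce_db", "arn:aws:iam::123456789012:role/SpectrumRole"))
print(external_csv_table_ddl(
    "spectrum", "reviews",
    [("review_id", "VARCHAR(64)"), ("review_score", "INT")],
    "s3://my-bucket/reviews/",
))
```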
Learn to perform cross-database joins with Redshift Spectrum by joining the parquet_output table with an RDBMS table to fetch the English category names, then grouping the results by year.
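The shape of this query (join for the English name, then group by year) can be tried locally with SQLite as a stand-in for Redshift. Apart from `parquet_output`, the table and column names below are assumptions, and the rows are made-up sample data. In Redshift, `parquet_output` would be schema-qualified with the external schema, and the year would come from `EXTRACT(YEAR FROM ...)` rather than SQLite's `strftime`.

```python
import sqlite3


def orders_per_category_per_year(conn):
    """Join orders to English category names and count orders per year and category."""
    query = """
        SELECT strftime('%Y', p.purchase_ts) AS order_year,
               t.category_name_english,
               COUNT(*) AS order_count
        FROM parquet_output AS p
        JOIN category_translation AS t
          ON p.category_name = t.category_name
        GROUP BY order_year, t.category_name_english
        ORDER BY order_year, t.category_name_english
    """
    return conn.execute(query).fetchall()


# Made-up sample data standing in for the S3 (external) and local tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parquet_output (order_id TEXT, category_name TEXT, purchase_ts TEXT);
    CREATE TABLE category_translation (category_name TEXT, category_name_english TEXT);
    INSERT INTO parquet_output VALUES
        ('o1', 'brinquedos', '2017-03-01 10:00:00'),
        ('o2', 'brinquedos', '2018-05-02 11:00:00'),
        ('o3', 'livros', '2018-06-03 12:00:00');
    INSERT INTO category_translation VALUES
        ('brinquedos', 'toys'),
        ('livros', 'books');
""")
rows = orders_per_category_per_year(conn)
```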
Create and query a Redshift view defined by a custom SQL query, import it into QuickSight, join the category translations, and build filtered dashboards while managing memory.
Analyze how Redshift sort keys, especially compound sort keys, order data by the listed columns to speed up queries, and learn how the first column of the key determines which data blocks must be read and therefore overall performance.
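Redshift keeps min/max metadata for each block (zone maps) and skips blocks whose range cannot match a filter. The toy simulation below shows why a filter on the leading column of a compound sort key prunes more blocks than a filter on the second column. It is deliberately simplified: rows instead of columnar 1 MB blocks, 10 rows per block, and made-up `region`/`day` columns.

```python
from dataclasses import dataclass


@dataclass
class Block:
    rows: list
    zone_map: dict  # column name -> (min, max) over the rows in this block


def build_blocks(rows, sort_key, block_size):
    """Order rows by a compound sort key (column order matters) and cut them into blocks."""
    ordered = sorted(rows, key=lambda r: tuple(r[c] for c in sort_key))
    blocks = []
    for start in range(0, len(ordered), block_size):
        chunk = ordered[start:start + block_size]
        zone_map = {c: (min(r[c] for r in chunk), max(r[c] for r in chunk)) for c in chunk[0]}
        blocks.append(Block(chunk, zone_map))
    return blocks


def blocks_scanned(blocks, column, value):
    """Count blocks whose min/max range may contain `value`; all others are skipped."""
    return sum(1 for b in blocks if b.zone_map[column][0] <= value <= b.zone_map[column][1])


# 4 regions x 25 days = 100 rows, compound sort key (region, day), 10 blocks.
rows = [{"region": r, "day": d} for r in range(4) for d in range(25)]
blocks = build_blocks(rows, sort_key=("region", "day"), block_size=10)
print(blocks_scanned(blocks, "region", 1))  # leading column -> 3
print(blocks_scanned(blocks, "day", 7))     # second column  -> 6
```

Rows for one `region` value sit next to each other, so only the blocks holding that region are read. Values of `day` repeat inside every region, so their ranges overlap across many blocks and far fewer can be skipped.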
Learn how VACUUM reclaims the space left by deletes and updates in Redshift, why it also re-sorts data, and how to plan around the main vacuum types: FULL, DELETE ONLY, and REINDEX.
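The toy model below illustrates the difference between a full vacuum and a delete-only vacuum. It is a simplification built on assumptions: deleted rows are only flagged, an update is modeled as delete plus insert, and new rows land in an unsorted tail. REINDEX, which applies to interleaved sort keys, is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy Redshift-like table: a sorted region plus an unsorted tail of new rows.

    Deleted rows are only flagged, so they keep occupying storage until vacuum runs.
    """
    sort_column: str
    sorted_rows: list = field(default_factory=list)
    unsorted_rows: list = field(default_factory=list)

    def insert(self, row):
        # New rows are appended to the unsorted region.
        self.unsorted_rows.append({**row, "_deleted": False})

    def delete(self, key_column, key):
        # A delete only flags matching rows; no space is reclaimed yet.
        for r in self.sorted_rows + self.unsorted_rows:
            if r[key_column] == key:
                r["_deleted"] = True

    def update(self, key_column, key, new_row):
        # An update behaves like a delete of the old version plus an insert of the new one.
        self.delete(key_column, key)
        self.insert(new_row)

    def stored_rows(self):
        return len(self.sorted_rows) + len(self.unsorted_rows)

    def vacuum(self, mode="FULL"):
        if mode not in ("FULL", "DELETE ONLY"):
            raise ValueError("this toy models only FULL and DELETE ONLY")
        live_sorted = [r for r in self.sorted_rows if not r["_deleted"]]
        live_unsorted = [r for r in self.unsorted_rows if not r["_deleted"]]
        if mode == "FULL":
            # Reclaim space and merge everything back into sort-key order.
            merged = live_sorted + live_unsorted
            self.sorted_rows = sorted(merged, key=lambda r: r[self.sort_column])
            self.unsorted_rows = []
        else:
            # DELETE ONLY reclaims space but leaves the unsorted tail unsorted.
            self.sorted_rows, self.unsorted_rows = live_sorted, live_unsorted
```

In this model, `stored_rows()` grows with every update even though the number of live rows stays the same. That growth is the space a vacuum gives back, and the FULL mode also pays the cost of re-sorting.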
Create a new parameter group for the Redshift cluster, add the schema to its search path, apply the group to the cluster, and reboot so that the sort keys on the orders table can be inspected.
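The same steps can be scripted with boto3 instead of the console. The sketch below is a hedged outline with several assumptions: the cluster ID, the group name, and the schema name `transactional` are placeholders, AWS credentials are configured, and the default `search_path` is `$user, public`. Only the pure helper runs without AWS access. The sort-key query relies on `PG_TABLE_DEF` listing only tables in schemas on the search path, which is why the schema is added there.

```python
def search_path_value(*schemas: str) -> str:
    """Default Redshift search_path ('$user, public') extended with extra schemas."""
    entries = ["$user", "public"]
    for schema in schemas:
        if schema not in entries:
            entries.append(schema)
    return ", ".join(entries)


# Run in a SQL client after the reboot; PG_TABLE_DEF only shows search_path schemas.
SORTKEY_QUERY = (
    'SELECT "column", sortkey FROM pg_table_def '
    "WHERE tablename = 'orders' AND sortkey <> 0;"
)


def apply_search_path(cluster_id: str, group_name: str, *schemas: str) -> None:
    """Create a parameter group with the new search_path, attach it, and reboot."""
    import boto3  # imported lazily so search_path_value works without boto3

    redshift = boto3.client("redshift")
    redshift.create_cluster_parameter_group(
        ParameterGroupName=group_name,
        ParameterGroupFamily="redshift-1.0",
        Description="Custom search_path",
    )
    redshift.modify_cluster_parameter_group(
        ParameterGroupName=group_name,
        Parameters=[{"ParameterName": "search_path", "ParameterValue": search_path_value(*schemas)}],
    )
    redshift.modify_cluster(ClusterIdentifier=cluster_id, ClusterParameterGroupName=group_name)
    # Wait until the modification finishes, then reboot so the group is fully applied.
    redshift.get_waiter("cluster_available").wait(ClusterIdentifier=cluster_id)
    redshift.reboot_cluster(ClusterIdentifier=cluster_id)
```

For a single session, `SET search_path TO ...` has the same effect without a reboot. The parameter group makes the setting the default for every connection.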
Install Docker Desktop, sign up for a free Docker account, and verify that the Docker daemon is running so that Docker commands work.
Build a Docker image, push it to Amazon ECR, and deploy a Lambda function that runs the container to read a CSV file, apply pandas transformations, and write the results back.
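Here is one possible shape for the function that runs inside the container. The course applies pandas. To keep the sketch dependency-free, it swaps in the standard-library `csv` module for the transformation (the pandas route would be `pd.read_csv`, `dropna(subset=["price"])`, and `to_csv`). Several things are hypothetical: the event keys `bucket`/`key`, the `processed/` output prefix, and the `quantity`/`price` columns.

```python
import csv
import io


def transform_rows(rows):
    """Stand-in for the pandas step: drop rows without a price, add a total column."""
    out = []
    for row in rows:
        if not row.get("price"):
            continue
        total = int(row["quantity"]) * float(row["price"])
        out.append({**row, "total": f"{total:.2f}"})
    return out


def transform_csv_text(text: str) -> str:
    """Parse CSV text, transform it, and serialize it back to CSV text."""
    rows = transform_rows(csv.DictReader(io.StringIO(text)))
    if not rows:
        return ""  # nothing survived the filter
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def handler(event, context):
    """Lambda entry point: read s3://bucket/key, transform it, write under processed/."""
    import boto3  # available in the AWS Lambda Python base images

    s3 = boto3.client("s3")
    bucket, key = event["bucket"], event["key"]
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    out_key = f"processed/{key}"
    s3.put_object(Bucket=bucket, Key=out_key, Body=transform_csv_text(body).encode("utf-8"))
    return {"output_key": out_key}
```

If the function is triggered by S3 events, writing to a different prefix (or bucket) from the one that triggers it avoids the function re-invoking itself on its own output.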
Learn how to design a real-time, serverless e-commerce transaction processing solution on AWS using Lambda, DynamoDB, and API Gateway, with scalable, fault-tolerant, and cost-effective ingestion and retrieval via API endpoints.
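A minimal sketch of the API-facing Lambda follows. POST stores a transaction and GET fetches one by ID. It assumes Lambda proxy integration events (`httpMethod`, `body`, `pathParameters`), a hypothetical DynamoDB table named `transactions` with partition key `transaction_id`, and made-up status codes and messages. The table is passed in as a parameter, so the logic can be exercised with an in-memory fake.

```python
import json
from decimal import Decimal


def _to_json(obj):
    # DynamoDB returns numbers as Decimal, which json.dumps cannot serialize directly.
    if isinstance(obj, Decimal):
        return int(obj) if obj == obj.to_integral_value() else float(obj)
    raise TypeError(f"not JSON serializable: {type(obj)}")


def _response(status, payload):
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload, default=_to_json),
    }


def handle(event, table):
    """Route an API Gateway proxy event to DynamoDB put_item / get_item calls."""
    method = event.get("httpMethod")
    if method == "POST":
        # parse_float=Decimal because the boto3 DynamoDB resource rejects Python floats.
        item = json.loads(event.get("body") or "{}", parse_float=Decimal)
        if "transaction_id" not in item:
            return _response(400, {"error": "transaction_id is required"})
        table.put_item(Item=item)
        return _response(201, {"transaction_id": item["transaction_id"]})
    if method == "GET":
        tx_id = (event.get("pathParameters") or {}).get("transaction_id")
        if not tx_id:
            return _response(400, {"error": "transaction_id path parameter is required"})
        found = table.get_item(Key={"transaction_id": tx_id}).get("Item")
        if found is None:
            return _response(404, {"error": "not found"})
        return _response(200, found)
    return _response(405, {"error": f"method {method} not allowed"})


def lambda_handler(event, context):
    import boto3

    table = boto3.resource("dynamodb").Table("transactions")  # hypothetical table name
    return handle(event, table)
```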
Learn to build serverless data processing workflows with AWS Step Functions, Lambda, and Glue that extract data from a MySQL database on Amazon RDS, perform ETL with PySpark in Glue, and write the output to S3.
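A Step Functions workflow is defined in Amazon States Language (JSON). The sketch below builds a two-step definition as a Python dict: a Lambda task followed by a Glue job run that the workflow waits on (the `.sync` integration). Some of this is assumption: the division of work (Lambda for extraction, Glue for the PySpark transform and the S3 write), the state names, and the retry settings.

```python
import json


def build_etl_state_machine(extract_lambda_arn: str, glue_job_name: str) -> dict:
    """Amazon States Language definition: Lambda extract step, then a Glue ETL job."""
    return {
        "Comment": "Extract from MySQL on RDS, transform with Glue (PySpark), write to S3",
        "StartAt": "ExtractFromRDS",
        "States": {
            "ExtractFromRDS": {
                "Type": "Task",
                # Lambda service integration; the whole state input is passed as the payload.
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": extract_lambda_arn, "Payload.$": "$"},
                "ResultPath": "$.extract",
                "Next": "RunGlueETL",
            },
            "RunGlueETL": {
                "Type": "Task",
                # ".sync" makes the workflow wait until the Glue job run finishes.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job_name},
                "Retry": [{
                    "ErrorEquals": ["States.ALL"],
                    "IntervalSeconds": 60,
                    "MaxAttempts": 2,
                    "BackoffRate": 2.0,
                }],
                "End": True,
            },
        },
    }


definition = json.dumps(build_etl_state_machine(
    "arn:aws:lambda:us-east-1:123456789012:function:extract-orders",  # placeholder ARN
    "orders-etl",  # placeholder Glue job name
), indent=2)
```

The resulting JSON can be pasted into the Step Functions console or passed as the `definition` argument when the state machine is created through the API.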
The AWS Cloud can seem intimidating and overwhelming to many people because of its vast ecosystem, but this course makes it easier for anyone who wants hands-on expertise in setting up a data warehouse in Redshift or a BI infrastructure from scratch.
Data scientists, data analysts, and business analysts are, or soon will be, expected to be all-rounders who can handle the technical side of data ingestion, engineering, and warehousing.
Anyone with a basic understanding of how the cloud works can benefit from this course because:
- It is designed around the end-to-end life cycle of a typical data engineering project
- It provides practical solutions to real-world use cases
This course covers:
- Setting up a data warehouse in AWS Redshift from scratch
- Basic data warehousing concepts
- Writing serverless AWS Glue jobs (PySpark and Python shell) for ETL and batch processing
- AWS Athena for ad hoc analysis (and when to use Athena)
- AWS Data Pipeline to sync incremental data
- Lambda functions to trigger and automate ETL and data syncing processes
- QuickSight setup, analyses, and dashboards
Prerequisites for this course:
- Python / SQL (an absolute must)
- PySpark (you should be able to write some basic PySpark scripts)
- Willingness to explore, learn, and put in the extra effort to succeed
- An active AWS account
Important note: This course uses the free tiers for Redshift and RDS, so you will not be billed for them unless you exceed the free-tier usage, which should be more than enough to practice everything in this course.
Also, this course uses the AWS Management Console in the browser to create clusters and set up jobs. No bash scripting is involved, so you can use any operating system for the lab sessions.
This course is not code-heavy: only 35% of it involves coding, and the rest is execution, understanding, and chaining different components together. The goal is to make everyone aware of, and comfortable with, all the tools and features used in the course.
Some tips:
- Try watching the videos at 1.2x speed.
- Every time you work on a new component or feature, research other tools meant for the same purpose and see how and in which aspects they differ, for example Redshift/Athena vs. Snowflake or BigQuery, and QuickSight vs. Power BI vs. MicroStrategy.