Data Engineering on AWS Vol 1 - OLAP & Data Warehouse

Name: Data Engineering on AWS Vol 1 - OLAP & Data Warehouse
Rating: 4.7 (64 reviews)

Detailed training (Level 350) on AWS Data Engineering Services: Redshift, S3, Athena, Hive, Glue Catalog, Lake Formation

Created by Soumyadeep Dey

Last updated 3/2025

English

What you'll learn

Understand Data Engineering (Volume 1) on AWS using S3, Redshift, Athena and Hive
Know Redshift, S3 and Athena up to Level 350+ with HANDS-ON
Production-level projects and hands-on exercises that give candidates on-the-job-like training
Get access to datasets of size 100 GB - 200 GB and practice using the same
Learn Python for Data Engineering with HANDS-ON (Functions, Arguments, OOP (class, object, self), Modules, Packages, Multithreading, file handling etc.)
Learn SQL for Data Engineering with HANDS-ON (Database objects, CASE, Window Functions, CTE, CTAS, MERGE, Materialized View etc.)

Course content

26 sections • 216 lectures • 46h 26m total length

Course Introduction and Resources • 24:28
2. Course Introduction and Course Contents • 12:11
Course Details - Projects, About Me • 6:20

AWS Cloud and EC2 Introduction • 10:15
EC2 Console & HandsOn • 13:43
Explore EC2 components (AMI, instance type, and EBS root and data volumes) and follow a hands-on exercise to launch an Amazon Linux 2 instance and configure a security group, key pair, and Elastic IP.
EBS Theory • 11:29
Explore Elastic Block Store basics for EC2, including block storage, IOPS, and gp2/gp3 volumes. Learn about EBS snapshots, Multi-Attach, and S3-backed backups.
EBS Hands On • 8:34

1. SQL Introduction • 14:04
Learn SQL fundamentals (databases, schemas, tables, CRUD operations, joins, and views) along with analytics SQL concepts and ANSI standards on OLTP and data warehousing with PostgreSQL.
2. SQL Client & Server Setup • 12:17
3. SQL Database Objects Theory • 17:18
4. Database Objects Hands On • 29:19
Create customers, sellers, and orders tables in an Aurora PostgreSQL cluster, load datasets from S3, and apply referential integrity and basic schema changes.
5. CRUD Operations • 14:43
Learn to perform CRUD with SQL, including read, update, and delete, plus inserts and load commands; master selecting columns, filtering with WHERE and LIKE, and grouping for analytics.
6. SELECT Operators • 16:56
7. CASE COALESCE Functions • 9:28
8. DATE Functions • 5:45
9. CTAS Cast Concat • 14:10
Demonstrates string and data type transformations with concat, cast, and substring, and covers CREATE TABLE AS and INSERT INTO SELECT for invoice-focused reporting.
10. Update Delete Truncate • 10:06
11. HAVING Clause • 7:42
12. Inner Join, Left Join, Right Join, Outer Join • 19:27
13. Union Intersect View • 12:58
14. Materialized View • 8:18
15. Common Table Expression (CTE) • 10:48
16. SQL Window Functions • 22:40
17. MERGE statement & Summary • 10:52
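
The "CTAS Cast Concat" lecture in this section describes building an invoice table with CREATE TABLE AS, cast, concat, and substring. A minimal sketch of that idea follows. It assumes an in-memory SQLite database as a stand-in for the PostgreSQL setup the course uses, and the table names, column names, and rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the course's PostgreSQL setup.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical source table; names and values are illustrative only.
cur.execute(
    "CREATE TABLE orders (order_id INTEGER, customer TEXT, amount TEXT, order_date TEXT)"
)
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "alice", "120.50", "2025-01-15"),
        (2, "bob", "80.00", "2025-02-03"),
    ],
)

# CTAS: build an invoice table from a query, using concat (||), CAST and substr.
cur.execute(
    """
    CREATE TABLE invoices AS
    SELECT
        'INV-' || order_id       AS invoice_no,
        CAST(amount AS REAL)     AS amount,
        substr(order_date, 1, 7) AS invoice_month
    FROM orders
    """
)

# INSERT INTO ... SELECT: append rows from a query to the existing table.
cur.execute("INSERT INTO orders VALUES (3, 'carol', '42.25', '2025-02-20')")
cur.execute(
    """
    INSERT INTO invoices
    SELECT 'INV-' || order_id, CAST(amount AS REAL), substr(order_date, 1, 7)
    FROM orders
    WHERE order_id = 3
    """
)

invoices = cur.execute("SELECT * FROM invoices ORDER BY invoice_no").fetchall()
print(invoices)
conn.close()
```

PostgreSQL accepts the same `||`, `CAST`, and `CREATE TABLE ... AS SELECT` forms, though function names such as `substr` versus `substring` can differ between engines.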

1. Python Intro - Architecture, PyCharm, Virtual Env • 26:28
2. PyCharm & CLI Walkthrough • 8:51
Master Python basics, from interpreted vs compiled languages to classes, objects, data types, and type conversions, then follow a PyCharm and CLI walkthrough.
3. Compiled vs Interpreted • 7:41
4. Everything in Python is an Object • 7:38
5. String Data Type • 6:58
6. Number Data Type • 2:57
Explore the number data type with integers and decimals, using zip code, price, and quantity examples to show initialization, type checks, and basic operations like power, divmod, and rounding.
7. List Data Type • 9:43
8. Tuple Data Type • 5:11
Explore the tuple data type, which is immutable; learn how to initialize it, access its values, and count occurrences and find offsets of elements.
9. Set & Dict Data Type, Type Conversion • 12:31
10. Python Operators • 3:20
11. Set up Python interpreter in PyCharm • 8:31
12. Print & Input Functions • 11:20
13. IF Statement • 5:41
14. For & While loops • 10:33
15. Functions Intro • 9:53
16. Function Scoping • 8:21
17. Functions RETURN • 7:53
18. Function Arguments • 4:58
19. Modify Arguments • 7:09
20. Positional & Keyword Arguments • 5:37
Explore how positional arguments map to function parameters and how keyword arguments allow explicit mapping, including type handling and common errors from mismatched names.
21. args & kwargs • 10:59
22. Class Object Self • 27:43
23. Class-Instance Variables, __init__ • 14:55
24. Class Object Exercise 1 • 14:48
25. Class Object Exercise 2 • 13:47
Create a hike generator class to compute salary hikes. Use a lookup table for years of experience to determine hikes; include an order app with discounts.
26. Inheritance • 4:36
27. Python Memory Management • 8:05
28. Modules & Packages • 23:14
29. HandsOn Exercise • 1:31
Implement a modular bank data system by creating modules and packages to add customers, accounts, and loans, then query customer, account, and loan details by IDs.
30. Module Pre-compilation • 3:27
31. Namespace & __name__ • 7:55
32. Error Handling in Python • 11:32
33. File Handling • 13:39
Explore Python file handling in data engineering, covering open() and the with statement, reading and writing CSV and JSON, and hands-on examples with read, readlines, and write.
34. CSV & JSON module • 10:56
Learn to read CSV files with csv.reader, map rows to dictionaries with csv.DictReader, and write with csv.writer; then use the json module to read and deserialize JSON data from files.
35. Python Multi-threading concept • 17:37
36. Multi-threading hands-on and exercise • 22:01
37. Debugging & Profiling • 18:26
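
The "CSV & JSON module" lecture in this section mentions reading rows with the csv module and deserializing JSON. A small sketch of those standard-library calls follows. The sample data is hypothetical, and in-memory `io.StringIO` buffers stand in for real files so the snippet runs without touching disk.

```python
import csv
import io
import json

# Hypothetical CSV content; an in-memory buffer stands in for a file on disk.
raw = "id,name,city\n1,Asha,Pune\n2,Ben,Leeds\n"

# csv.reader yields each row as a list of strings, header included.
rows = list(csv.reader(io.StringIO(raw)))

# csv.DictReader uses the header row as keys and yields one dict per data row.
records = list(csv.DictReader(io.StringIO(raw)))

# csv.writer writes lists back out as delimited text.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["id", "name"])
for rec in records:
    writer.writerow([rec["id"], rec["name"]])

# json.dumps serializes to a string; json.load deserializes from a file-like object.
payload = json.dumps(records)
restored = json.load(io.StringIO(payload))

print(rows[0])
print(records[0])
print(out.getvalue())
```

With real files, the same calls take the handle returned by `open(path, newline="")` inside a `with` block in place of the `StringIO` buffers.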

1. S3 Introduction 1 • 14:38
2. S3 Introduction 2 • 22:49
Explore how AWS S3 fits as a data lake, lakehouse, and distributed storage in the data engineering pipeline, enabling raw and processed data storage, analytics, and archival.
3. S3 Basics • 5:43
4. S3 Basics Hands-on • 18:53
5. S3 Versioning • 13:06
6. S3 Encryption • 5:51
7. Storage Class • 20:18
8. S3 Multipart Upload • 12:51
9. Lifecycle Policies • 15:03
Learn how to use S3 lifecycle policies and rules to automatically move objects between storage classes and expire or delete older versions, reducing storage costs.
10. Cross Region Replication • 10:12
11. S3 Mountpoint • 9:21
Learn how Mountpoint for Amazon S3 mounts an S3 bucket as a local file system on EC2, translating Unix commands to S3 API calls with caching for read-heavy workloads.
12. Security - S3 Identity Based Policy • 19:02
Explore how identity-based policies and bucket policies govern S3 access, using IAM for authentication and authorization, and apply actions such as list, get, put, and delete objects.
13. Security - S3 Bucket Policy • 8:29
Learn to use S3 bucket policies as resource-based policies to grant a specific user permission to list buckets, list objects, put objects, and delete objects.
14. Bucket Policy with VPC, IP address, VPCE • 3:49
15. S3 Access Point • 16:26
16. S3 Object Lambda • 18:54
17. Pre-signed URL • 4:33
Use pre-signed URLs to grant temporary, object-level access for external users without IAM, via console or CLI, for download or upload, with expirations of up to 12 hours from the console.
18. S3 Performance Considerations • 5:31
19. S3 Pricing • 13:00
20. Architectural Patterns using S3 • 7:24
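
The "Lifecycle Policies" lecture in this section describes rules that move objects between storage classes and expire older versions. A sketch of what such a rule could look like as a Python dictionary follows. The rule ID, prefix, and day counts are hypothetical, and the snippet only builds and prints the configuration rather than calling AWS.

```python
import json

# Hypothetical rule: the prefix, day counts and rule ID are illustrative only.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-raw-data",
            "Status": "Enabled",
            "Filter": {"Prefix": "raw/"},
            # Move current objects to cheaper storage classes as they age.
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            # Delete current objects after a year.
            "Expiration": {"Days": 365},
            # On versioned buckets, remove versions 60 days after they become noncurrent.
            "NoncurrentVersionExpiration": {"NoncurrentDays": 60},
        }
    ]
}

print(json.dumps(lifecycle_configuration, indent=2))
```

A dictionary of this shape is what boto3's S3 client accepts as the `LifecycleConfiguration` argument of `put_bucket_lifecycle_configuration`, alongside a `Bucket` name of your own; applying it needs AWS credentials and is left out here.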

1. Data Modelling Introduction • 17:14
2. Normal Forms 1NF 2NF 3NF • 28:00
3. Relations: one-to-one, one-to-many, many-to-one, many-to-many • 8:50
4. Dimensional modelling - Facts, Dimensions & Grains • 24:39
Explore dimensional modeling concepts, including facts and dimensions, star and snowflake schemas, and the grain of fact tables, to design OLAP data warehouses.
5. Grains Exercise • 9:18
Identify grains for two OLAP use cases: the vehicle sale details per customer as the fact granularity, and the record of each employee on every opportunity for month-wise workforce analysis.
6. Dimensional Modelling Technique • 14:59
7. Types of Fact & Dimension Tables • 10:09
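
The dimensional modelling lectures in this section talk about facts, dimensions, and the grain of a fact table, using vehicle sales per customer as one use case. A toy star schema along those lines is sketched below. SQLite is assumed here purely for illustration, and every table name, column name, and row is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables hold descriptive attributes (all names here are hypothetical).
cur.execute("CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE dim_vehicle (vehicle_key INTEGER PRIMARY KEY, model TEXT)")

# Fact table. Grain: one row per vehicle sold to one customer.
cur.execute(
    """
    CREATE TABLE fact_vehicle_sale (
        customer_key INTEGER REFERENCES dim_customer (customer_key),
        vehicle_key  INTEGER REFERENCES dim_vehicle (vehicle_key),
        sale_date    TEXT,
        sale_amount  REAL
    )
    """
)

cur.executemany("INSERT INTO dim_customer VALUES (?, ?)", [(1, "Asha"), (2, "Ben")])
cur.executemany("INSERT INTO dim_vehicle VALUES (?, ?)", [(10, "Hatchback"), (20, "SUV")])
cur.executemany(
    "INSERT INTO fact_vehicle_sale VALUES (?, ?, ?, ?)",
    [
        (1, 10, "2025-01-05", 15000.0),
        (2, 20, "2025-01-09", 32000.0),
        (1, 20, "2025-02-11", 31000.0),
    ],
)

# A typical OLAP query: join the fact to a dimension and aggregate a measure.
sales_by_model = cur.execute(
    """
    SELECT v.model, COUNT(*) AS vehicles_sold, SUM(f.sale_amount) AS revenue
    FROM fact_vehicle_sale AS f
    JOIN dim_vehicle AS v ON v.vehicle_key = f.vehicle_key
    GROUP BY v.model
    ORDER BY v.model
    """
).fetchall()
print(sales_by_model)
conn.close()
```

Stating the grain first (here, one sale of one vehicle to one customer) fixes which dimensions the fact row can reference and which measures can be summed safely.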

1. Redshift Infra • 19:36
2. Redshift Infra HandsOn • 21:41
3. Redshift Architecture - Zone Map, Columnar Storage • 15:02
Understand Redshift architecture with leader and compute nodes, including node slices and MPP parallelism on RA3 and DC2, plus columnar storage and zone maps for selective I/O.
4. Cluster Resize - Elastic & Classic • 8:31
5. Cluster Resize - HandsOn • 5:03
6. Cluster Pause & Rename • 4:30
Rename an Amazon Redshift cluster and monitor its status changes from modifying to unavailable to available. Pause the cluster by enabling automated snapshot to avoid compute charges.
7. Snapshot & Backup • 7:42
Resume paused clusters to enable snapshots, create and differentiate manual and automated snapshots with retention settings, and delete snapshots while configuring cross-region snapshot copy in Redshift.
8. Redshift Infra Conclusion • 3:04
Conclude the Redshift infrastructure section by detailing clusters with leader and compute nodes, RA3 and DC2 node types, Redshift managed storage on RA3, node slices, columnar storage, and zone maps for queries.
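
The Redshift architecture lectures in this section mention zone maps for selective I/O. The general idea is that each storage block records the minimum and maximum value it holds, so a scan can skip blocks whose range cannot contain the filter value. The toy simulation below illustrates that idea only. It is not Redshift's implementation, and the block size, class, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One storage block of a single column, with its zone-map metadata."""
    values: list
    min_value: int
    max_value: int


def build_blocks(column, block_size):
    """Split a column into fixed-size blocks and record min/max per block."""
    blocks = []
    for start in range(0, len(column), block_size):
        chunk = column[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def scan_equal(blocks, target):
    """Return matching values and how many blocks actually had to be read."""
    matches = []
    blocks_read = 0
    for block in blocks:
        # Zone-map check: skip blocks whose [min, max] range excludes the target.
        if target < block.min_value or target > block.max_value:
            continue
        blocks_read += 1
        matches.extend(v for v in block.values if v == target)
    return matches, blocks_read


# A sorted column (for example, dates stored as integers) prunes well.
order_dates = list(range(20250101, 20250113))  # 12 values
blocks = build_blocks(order_dates, block_size=4)
matches, blocks_read = scan_equal(blocks, 20250106)
print(len(blocks), blocks_read, matches)
```

Pruning works best when the column is sorted, because each block then covers a narrow, non-overlapping range.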

Requirements

Good to have AWS and SQL knowledge

Description

This is Volume 1 of the Data Engineering course on AWS. It gives detailed explanations of AWS Data Engineering services such as S3 (Simple Storage Service), Redshift, Athena, Hive, Glue Data Catalog, and Lake Formation. It delves into the data warehouse, or consumption and storage, layer of the data engineering pipeline. In Volume 2, I will showcase Data Processing (Batch and Streaming) Services.

You will get opportunities to do hands-on work using large datasets (100 GB - 300 GB or more of data). The course also provides hands-on exercises that match real-world scenarios, such as Redshift query performance tuning, streaming ingestion, window functions, ACID transactions, the COPY command, distribution and sort keys, WLM, row-level and column-level security, Athena partitioning, Athena WLM etc.

Some other highlights:

Includes training on data modelling: Normalization & ER Diagram for OLTP systems, and Dimensional modelling for OLAP/DWH systems.
Data modelling hands-on.
Other technologies covered - EC2, EBS, VPC and IAM.

This is Part 1 (Volume 1) of the full data engineering course. In Part 2 (Volume 2), I will cover the following topics:

Spark (Batch and Stream processing using AWS EMR, AWS Glue ETL, GCP Dataproc)
Kafka (on AWS & GCP)
Flink
Apache Airflow
Apache Pinot
AWS Kinesis and more.

Who this course is for:

Data Engineers, Data Scientists, Data Analysts
Python developers, Application Developers, Big Data Developers
Database Administrators (DBA), Big Data Administrators
Solutions Architect, Cloud Architect, Big Data Architect
Technical Managers, Engineering Managers, Project Managers

Course content

Introduction - Data Engineering Volume 1 on AWS • 3 lectures • 43min

(Optional) AWS Pre-requisites - EC2 & EBS • 4 lectures • 44min

(Optional) AWS Pre-requisites - VPC • 6 lectures • 1hr 17min

(Optional) AWS Pre-requisites - IAM • 2 lectures • 27min

(Optional) Non AWS Pre-requisites - SQL Basics • 17 lectures • 3hr 57min

(Optional) Non AWS Pre-requisites - Python Basics • 37 lectures • 6hr 36min

Data Engineering Introduction • 3 lectures • 1hr 1min

AWS Distributed Storage - S3 (Simple Storage Service) for Data Engineers • 20 lectures • 4hr 6min

Data Modelling - Normalization, ER Diagram, Dimensional Modelling • 7 lectures • 1hr 53min

Data Warehouse on AWS - Redshift Infra • 8 lectures • 1hr 25min