
Explore EC2 components (AMI, instance type, and EBS root and data volumes) and follow a hands-on exercise to launch an Amazon Linux 2 instance and configure a security group, key pair, and Elastic IP.
Explore Elastic Block Store (EBS) basics for EC2, including block storage, IOPS, and gp2/gp3 volumes. Learn about EBS snapshots, Multi-Attach, and S3-backed backups.
Explore AWS IAM authentication and authorization, including users, policies, groups, roles, and ARN usage for EC2, VPC, S3, and Redshift, with a hands-on admin user setup.
Learn SQL fundamentals: databases, schemas, tables, CRUD operations, joins, and views, along with analytics SQL concepts and ANSI standards for OLTP and data warehousing with PostgreSQL.
Create customers, sellers, and orders tables in an Aurora PostgreSQL cluster, load datasets from S3, and apply referential integrity and basic schema changes.
Learn to perform CRUD with SQL, including inserts and load commands, reads, updates, and deletes; master selecting columns, filtering with WHERE and LIKE, and grouping for analytics.
Demonstrate string and data type transformations with CONCAT, CAST, and SUBSTRING, and learn CREATE TABLE AS and INSERT INTO ... SELECT for invoice-focused reporting.
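The transformations named above can be tried without a database server. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for PostgreSQL; the orders and invoices tables and their values are hypothetical. SQLite spells concatenation as || and substring as SUBSTR, so the PostgreSQL syntax used in the course may differ slightly.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical orders table; names and values are illustrative only.
cur.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount TEXT)")
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Alice Smith", "120.50"), (2, "Bob Jones", "80.00")],
)

# CREATE TABLE AS: build an invoice table directly from a query.
# '||' concatenates strings, CAST converts types, SUBSTR takes the
# first five characters of the customer name.
cur.execute(
    """
    CREATE TABLE invoices AS
    SELECT 'INV-' || CAST(order_id AS TEXT) AS invoice_no,
           SUBSTR(customer, 1, 5)           AS customer_prefix,
           CAST(amount AS REAL)             AS amount
    FROM orders
    WHERE order_id = 1
    """
)

# INSERT INTO ... SELECT: append the remaining order to the same table.
cur.execute(
    """
    INSERT INTO invoices
    SELECT 'INV-' || CAST(order_id AS TEXT),
           SUBSTR(customer, 1, 5),
           CAST(amount AS REAL)
    FROM orders
    WHERE order_id = 2
    """
)

invoices = cur.execute(
    "SELECT invoice_no, customer_prefix, amount FROM invoices ORDER BY invoice_no"
).fetchall()
print(invoices)
```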
Master Python basics, from interpreted versus compiled languages to classes, objects, data types, and type conversions, then walk through PyCharm and the CLI.
Explore the number data type with integers and decimals, using zip code, price, and quantity examples to show initialization, type checks, and basic operations such as power, divmod, and rounding.
Explore the tuple data type, which is immutable: its elements cannot be modified after creation. Learn how to initialize a tuple, access its values, count occurrences of an element, and find an element's index.
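As a rough illustration of the number operations listed above, here is a short sketch. The variable names follow the zip code, price, and quantity examples mentioned in the summary, but the values are made up.

```python
# Illustrative values only; not taken from the course material.
zip_code = 94105   # integer
price = 19.99      # decimal (float)
quantity = 7       # integer

# Type checks
print(type(zip_code), type(price))
is_whole_number = isinstance(quantity, int)

# Basic operations
squared = quantity ** 2                    # power
quotient, remainder = divmod(quantity, 2)  # integer division and modulo together
total = round(price * quantity, 2)         # round to two decimal places

print(squared, quotient, remainder, total)
```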
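A small sketch of the tuple behaviour described above, using a made-up tuple of sizes: access by position, counting, finding an index, and the error raised on an attempted modification.

```python
# Illustrative tuple; the values are hypothetical.
sizes = ("S", "M", "L", "M")

first = sizes[0]             # access by position
last = sizes[-1]             # negative indexes count from the end
m_count = sizes.count("M")   # number of occurrences of "M"
m_index = sizes.index("M")   # position of the first "M"

# Tuples are immutable, so item assignment raises TypeError.
try:
    sizes[0] = "XL"
except TypeError as exc:
    error_message = str(exc)

print(first, last, m_count, m_index, error_message)
```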
Explore how positional arguments map to function parameters and how keyword arguments allow explicit mapping, including type handling and common errors from mismatched names.
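The mapping rules above can be seen in a few lines. This is a minimal sketch with a hypothetical create_order function: the first two calls are equivalent, the third fails because the keyword name does not match a parameter, and the fourth fails because positional values landed in the wrong parameters.

```python
def create_order(item, quantity, price):
    """Return an order summary; the function and its parameters are illustrative."""
    return {"item": item, "quantity": quantity, "total": quantity * price}

# Positional arguments map to parameters left to right.
by_position = create_order("pen", 3, 2.5)

# Keyword arguments map by name, so their order does not matter.
by_keyword = create_order(price=2.5, item="pen", quantity=3)

# A mismatched keyword name raises TypeError.
try:
    create_order("pen", qty=3, price=2.5)
except TypeError as exc:
    name_error = str(exc)

# Swapped positional arguments put a string where a number is expected.
try:
    create_order(3, "pen", 2.5)
except TypeError as exc:
    type_error = str(exc)

print(by_position, name_error, type_error, sep="\n")
```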
Create a hike generator class to compute salary hikes. Use a lookup table for years of experience to determine hikes; include an order app with discounts.
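One way the hike generator could look is sketched below. The class name, the tiers in the lookup table, and the percentages are all assumptions made for illustration; the course's own implementation may differ, and the order app with discounts is not covered here.

```python
class HikeGenerator:
    """Compute salary hikes from a years-of-experience lookup table (hypothetical design)."""

    # Hypothetical lookup table: (minimum years of experience, hike percent),
    # ordered from the highest tier to the lowest.
    HIKE_TABLE = [(10, 15), (5, 10), (2, 7), (0, 5)]

    def hike_percent(self, years):
        # Return the percent of the first tier whose minimum the employee meets.
        for min_years, percent in self.HIKE_TABLE:
            if years >= min_years:
                return percent
        raise ValueError("years of experience must be non-negative")

    def new_salary(self, salary, years):
        # Apply the looked-up hike percent to the current salary.
        return salary + salary * self.hike_percent(years) / 100


generator = HikeGenerator()
print(generator.hike_percent(6))
print(generator.new_salary(50000, 6))
```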
Implement a modular bank data system by creating modules and packages to add customers, accounts, and loans, then query customer, account, and loan details by IDs.
Explore Python file handling in data engineering, covering file operations with open() and the with statement, reading and writing CSV and JSON, and hands-on examples with read(), readlines(), and write().
Learn to read CSV files with csv.reader, map rows to dictionaries with csv.DictReader, and write with csv.writer; then use the json module to read and deserialize JSON data from files.
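The csv and json calls mentioned above fit together roughly as follows. This is a small sketch that writes to a temporary directory; the file names and the customer rows are made up for illustration.

```python
import csv
import json
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "customers.csv"    # hypothetical file names
    json_path = Path(tmp) / "customers.json"

    # Write a header and two rows with csv.writer.
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "name"])
        writer.writerows([[1, "Asha"], [2, "Ben"]])

    # csv.reader yields each row as a list of strings.
    with open(csv_path, newline="") as f:
        rows = list(csv.reader(f))

    # csv.DictReader uses the header row as dictionary keys.
    with open(csv_path, newline="") as f:
        records = list(csv.DictReader(f))

    # Serialize the records to JSON, then read and deserialize them again.
    with open(json_path, "w") as f:
        json.dump(records, f)
    with open(json_path) as f:
        loaded = json.load(f)

print(rows)
print(loaded)
```

Note that values read back from CSV are strings, so a numeric id needs an explicit int() conversion before arithmetic.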
Explore data mart and data mesh concepts within the data engineering pipeline, showing how department-specific data warehouses and decentralized pipelines enable targeted analytics and reporting.
Explore how AWS S3 fits as a data lake, lakehouse, and distributed storage in the data engineering pipeline, enabling raw and processed data storage, analytics, and archival.
Learn how to use S3 lifecycle policies and rules to automatically move objects between storage classes and expire or delete older versions, reducing storage costs.
Learn how Mountpoint for Amazon S3 mounts an S3 bucket as a local file system on EC2, translating Unix file commands into S3 API calls, with caching for read-heavy workloads.
Explore how identity-based policies and bucket policies govern S3 access, using IAM for authentication and authorization, and apply actions such as list, get, put, and delete objects.
Learn to use S3 bucket policies as resource-based policies that grant a specific user permission to list buckets, list objects, put objects, and delete objects.
Use pre-signed URLs to grant temporary, object-level download or upload access to external users who have no IAM identity, generated via the console or CLI, with expirations of up to 12 hours in the console.
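A resource-based policy of this kind is a JSON document attached to the bucket. The sketch below builds one in Python; the bucket name, account ID, and user name are placeholders, and it covers only the list-objects, put-object, and delete-object actions.

```python
import json

# Placeholder identifiers; replace with real values before use.
BUCKET = "example-bucket"
USER_ARN = "arn:aws:iam::111122223333:user/example-user"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Listing objects is a bucket-level action, so the resource is the bucket itself.
            "Sid": "AllowListObjects",
            "Effect": "Allow",
            "Principal": {"AWS": USER_ARN},
            "Action": "s3:ListBucket",
            "Resource": f"arn:aws:s3:::{BUCKET}",
        },
        {
            # Put and delete are object-level actions, so the resource is bucket/*.
            "Sid": "AllowPutAndDeleteObjects",
            "Effect": "Allow",
            "Principal": {"AWS": USER_ARN},
            "Action": ["s3:PutObject", "s3:DeleteObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        },
    ],
}

policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

The printed JSON is what would be pasted into the bucket's policy editor or passed to a put-bucket-policy call.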
Explore dimensional modeling concepts, including facts and dimensions, star and snowflake schemas, and the grain of fact tables, to design OLAP data warehouses.
Identify grains for two OLAP use cases: the vehicle sale details per customer as the fact granularity, and the record of each employee on every opportunity for month-wise workforce analysis.
Understand Redshift architecture with leader and compute nodes, including node slices and MPP parallelism on RA3 and DC2 nodes, plus columnar storage and zone maps for selective I/O.
Rename an AWS Redshift cluster and monitor its status as it changes from modifying to unavailable to available. Pause the cluster by enabling automated snapshots to avoid compute charges.
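The idea behind zone maps, keeping the minimum and maximum value of each storage block so that blocks that cannot contain a match are skipped, can be shown with a toy model. This is a conceptual sketch only, not Redshift's actual implementation; the function names, block size, and data are invented for illustration.

```python
def build_zone_map(values, block_size):
    """Split a column into blocks and record (min, max, block) for each one."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return [(min(block), max(block), block) for block in blocks]


def scan_equal(zone_map, target):
    """Return matching values and the number of blocks that had to be read."""
    matches, blocks_read = [], 0
    for low, high, block in zone_map:
        # Skip any block whose min/max range cannot contain the target.
        if low <= target <= high:
            blocks_read += 1
            matches.extend(v for v in block if v == target)
    return matches, blocks_read


# A sorted column of order years (hypothetical data), two values per block.
order_years = [2019, 2019, 2020, 2020, 2021, 2021, 2022, 2022]
zone_map = build_zone_map(order_years, block_size=2)

matches, blocks_read = scan_equal(zone_map, 2021)
print(matches, blocks_read, len(zone_map))
```

Because the column is sorted, the ranges of the blocks do not overlap and only one of the four blocks is read; on unsorted data more blocks would pass the range check.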
Resume paused clusters to enable snapshots, create and differentiate manual and automated snapshots with retention settings, and delete snapshots while configuring cross-region snapshot copy in Redshift.
Conclude the Redshift infrastructure section by reviewing clusters with leader and compute nodes, RA3 and DC2 node types, Redshift Managed Storage on RA3, node slices, columnar storage, and zone maps for queries.
This is Volume 1 of the Data Engineering course on AWS. It gives you detailed explanations of AWS data engineering services such as S3 (Simple Storage Service), Redshift, Athena, Hive, Glue Data Catalog, and Lake Formation. The course delves into the data warehouse, or consumption and storage, layer of the data engineering pipeline. In Volume 2, I will showcase data processing (batch and streaming) services.
You will get opportunities for hands-on practice with large datasets (100 GB to 300 GB or more of data). Moreover, this course provides hands-on exercises that match real-world scenarios such as Redshift query performance tuning, streaming ingestion, window functions, ACID transactions, the COPY command, distribution and sort keys, WLM, row-level and column-level security, Athena partitioning, Athena WLM, etc.
Some other highlights:
Training in data modelling: normalization and ER diagrams for OLTP systems, and dimensional modelling for OLAP/DWH systems.
Data modelling hands-on.
Other technologies covered - EC2, EBS, VPC and IAM.
This is Part 1 (Volume 1) of the full data engineering course. In Part 2 (Volume 2), I will cover the following topics:
Spark (Batch and Stream processing using AWS EMR, AWS Glue ETL, GCP Dataproc)
Kafka (on AWS & GCP)
Flink
Apache Airflow
Apache Pinot
AWS Kinesis and more.