Mastering Amazon Redshift and Serverless for Data Engineers

Name: Mastering Amazon Redshift and Serverless for Data Engineers
Rating: 4.6 (535 reviews)

In-Depth Course on Amazon Redshift, Redshift Serverless, Integration with EMR, AWS Step Functions, AWS Lambda and more

Highest Rated

Created by Durga Viswanatha Raju Gadiraju

Last updated 8/2022

English

What you'll learn

Getting Started with Amazon Redshift using AWS Web Console
Copy Data from s3 into AWS Redshift Tables using Redshift Queries or Commands
Develop Applications using Redshift Cluster using Python as Programming Language
Copy Data from s3 into AWS Redshift Tables using Python as Programming Language
Create Tables in Databases set up on AWS Redshift Database Server using Distribution Keys and Sort Keys
Run AWS Redshift Federated Queries connecting to traditional RDBMS Databases such as Postgres
Perform ETL using AWS Redshift Federated Queries using Redshift Capacity
Integration of AWS Redshift and AWS Glue Catalog to run queries using Redshift Spectrum
Run AWS Redshift Spectrum Queries using Glue Catalog Tables on Datalake setup using AWS s3
Getting Started with Amazon Redshift Serverless by creating Workgroup and Namespace
Integration of AWS EMR Cluster with Amazon Redshift using Serverless Workgroup
Develop and Deploy Spark Application on AWS EMR Cluster where the processed data will be loaded into Amazon Redshift Serverless Workgroup

Course content

14 sections • 208 lectures • 15h 51m total length

Introduction to Mastering Amazon Redshift and Serverless for Data Engineers (0:15)

Getting Started with Amazon Redshift - Introduction (0:56)
Create Redshift Cluster using Free Trial (3:34)
Connecting to Database using Redshift Query Editor (3:33)
Get list of tables querying information schema (3:34)
Run Queries against Redshift Tables using Query Editor (3:37)
Create Redshift Table using Primary Key (3:35)
Insert Data into Redshift Tables (7:17)
Update Data in Redshift Tables (5:13)
Delete data from Redshift tables (4:17)
Redshift Saved Queries using Query Editor (3:40)
Deleting Redshift Cluster (2:37)
Restore Redshift Cluster from Snapshot (4:48)

Copy Data from s3 to Redshift - Introduction (1:27)
Setup Data in s3 for Redshift Copy (4:55)
Create Database and Table for Redshift Copy Command (3:33)
Create IAM User with full access on s3 for Redshift Copy (3:37)
Run Copy Command to copy data from s3 to Redshift Table (3:15)
Discover how to copy data from s3 into the Redshift orders table using the copy command, configure credentials, and troubleshoot with stl_load_errors.
Troubleshoot Errors related to Redshift Copy Command (2:17)
Run Copy Command to copy from s3 to Redshift table (2:12)
Validate using queries against Redshift Table (2:44)
Overview of Redshift Copy Command (5:26)
Create IAM Role for Redshift to access s3 (4:49)
Copy Data from s3 to Redshift table using IAM Role (6:05)
Setup JSON Dataset in s3 for Redshift Copy Command (3:59)
Copy JSON Data from s3 to Redshift table using IAM Role (3:57)
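
As a rough illustration of what the lectures in this section build toward, the sketch below assembles a COPY statement that authenticates with an IAM role, for either CSV or JSON input, plus a query against stl_load_errors for troubleshooting. The helper name build_copy_sql, the bucket path, and the role ARN are hypothetical placeholders, not values taken from the course.

```python
def build_copy_sql(table, s3_uri, iam_role_arn, data_format="CSV"):
    """Return a Redshift COPY statement that loads `table` from `s3_uri`.

    Hypothetical helper for illustration; identifiers are not validated,
    so only pass trusted table names and paths.
    """
    fmt = data_format.upper()
    if fmt == "CSV":
        format_clause = "FORMAT AS CSV"
    elif fmt == "JSON":
        # 'auto' asks COPY to match JSON keys to column names.
        format_clause = "FORMAT AS JSON 'auto'"
    else:
        raise ValueError(f"unsupported format: {data_format}")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"{format_clause};"
    )


# Query to inspect the most recent load failures after a COPY error.
LOAD_ERRORS_SQL = (
    "SELECT starttime, filename, line_number, colname, err_reason\n"
    "FROM stl_load_errors\n"
    "ORDER BY starttime DESC\n"
    "LIMIT 10;"
)

if __name__ == "__main__":
    # Placeholder bucket and role ARN, assumed for this example only.
    print(build_copy_sql(
        "orders",
        "s3://example-bucket/retail_db/orders/",
        "arn:aws:iam::123456789012:role/ExampleRedshiftS3Role",
    ))
```

The generated statement can be pasted into the Redshift Query Editor or passed to a Python database cursor.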

Develop application using Redshift Cluster - Introduction (0:59)
Allocate Elastic Ip for Redshift Cluster (3:46)
Enable Public Accessibility for Redshift Cluster (4:01)
Update Inbound Rules in Security Group to access Redshift Cluster (5:16)
Create Database and User in Redshift Cluster (4:57)
Connect to database in Redshift using psql (3:47)
Change Owner on Redshift Tables (3:06)
Download Redshift JDBC Jar file (1:51)
Connect to Redshift Databases using IDEs such as SQL Workbench (4:30)
Setup Python Virtual Environment for Redshift (4:45)
Run Simple Query against Redshift Database Table using Python (6:30)
Truncate Redshift Table using Python (3:56)
Create IAM User to copy from s3 to Redshift Tables (2:23)
Validate Access of IAM User using Boto3 (4:51)
Run Redshift Copy Command using Python (6:31)
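
The truncate-then-copy flow covered by the last few lectures might look roughly like the sketch below. It is written against the generic Python DB-API (cursor, execute, commit) rather than a specific driver, so the function name truncate_and_copy and its arguments are assumptions made for illustration.

```python
def truncate_and_copy(conn, table, copy_sql):
    """Empty `table`, reload it with `copy_sql`, and return the new row count.

    `conn` is any DB-API style connection (hypothetical here); `table` is
    interpolated directly, so it must be a trusted identifier.
    """
    cur = conn.cursor()
    try:
        cur.execute(f"TRUNCATE TABLE {table}")
        cur.execute(copy_sql)
        # Validate the load by counting the rows that landed in the table.
        cur.execute(f"SELECT count(*) FROM {table}")
        row_count = cur.fetchone()[0]
    finally:
        cur.close()
    conn.commit()
    return row_count
```

With psycopg2, a connection would typically come from psycopg2.connect(host=..., port=5439, dbname=..., user=..., password=...), where 5439 is the default Redshift port and the remaining values depend on your cluster.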

Redshift Tables with Distkeys and Sortkeys - Introduction (3:58)
Quick Review of Redshift Architecture (3:34)
Create multi-node Redshift Cluster (4:34)
Connect to Redshift Cluster using Query Editor (2:47)
Create Redshift Database (1:34)
Create Redshift Database User (3:46)
Create Redshift Database Schema (5:37)
Default Distribution Style of Redshift Table (4:14)
Grant Select Permissions on Catalog to Redshift Database User (3:22)
Update Search Path to query Redshift system tables (7:09)
Validate table with DISTSTYLE AUTO (6:27)
Create Cluster from Snapshot to the original state (6:59)
Overview of Node Slices in Redshift Cluster (3:39)
Overview of Distribution Styles (3:48)
Distribution Strategies for retail tables in Redshift (2:17)
Create Redshift tables with distribution style all (5:50)
Troubleshoot and Fix Load or Copy Errors (4:03)
Create Redshift Table with Distribution Style Auto (3:49)
Create Redshift Tables using Distribution Style Key (7:50)
Delete Cluster with manual snapshot (1:27)
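
To make the distribution styles named in this section concrete, here is a minimal sketch that generates CREATE TABLE statements with DISTSTYLE, DISTKEY, and SORTKEY clauses. The helper build_create_table_sql and the retail column names are assumptions for illustration; the rule that a distkey must accompany DISTSTYLE KEY is a simplification chosen for this example.

```python
def build_create_table_sql(table, columns, diststyle="AUTO", distkey=None, sortkeys=()):
    """Return a Redshift CREATE TABLE statement with distribution and sort clauses.

    `columns` is a list of (name, type) pairs. Hypothetical helper; identifiers
    are not validated.
    """
    diststyle = diststyle.upper()
    if diststyle not in {"AUTO", "EVEN", "KEY", "ALL"}:
        raise ValueError(f"unknown distribution style: {diststyle}")
    # In this sketch a distkey is required with, and only allowed with, KEY.
    if (diststyle == "KEY") != (distkey is not None):
        raise ValueError("distkey must be given exactly when diststyle is KEY")
    column_sql = ",\n".join(f"    {name} {col_type}" for name, col_type in columns)
    parts = [f"CREATE TABLE {table} (\n{column_sql}\n)", f"DISTSTYLE {diststyle}"]
    if distkey:
        parts.append(f"DISTKEY ({distkey})")
    if sortkeys:
        parts.append(f"SORTKEY ({', '.join(sortkeys)})")
    return "\n".join(parts) + ";"


if __name__ == "__main__":
    # A large fact table distributed on its join column, sorted by date.
    print(build_create_table_sql(
        "orders",
        [("order_id", "INT"), ("order_date", "DATE"), ("order_customer_id", "INT")],
        diststyle="KEY",
        distkey="order_customer_id",
        sortkeys=["order_date"],
    ))
    # A small lookup table copied to every node.
    print(build_create_table_sql("departments", [("department_id", "INT")], diststyle="ALL"))
```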

Redshift Federated Queries and Spectrum - Introduction (1:28)
Overview of integrating RDS and Redshift for Federated Queries (5:30)
Create IAM Role for Redshift Cluster (2:26)
Setup Postgres Database Server for Redshift Federated Queries (7:27)
Create tables in Postgres Database for Redshift Federated Queries (6:02)
Creating Secret using Secrets Manager for Postgres Database (4:05)
Accessing Secret Details using Python Boto3 (6:47)
Reading Json Data to Dataframe using Pandas (8:51)
Write JSON Data to Database Tables using Pandas (10:43)
Create IAM Policy for Secret and associate with Redshift Role (4:45)
Create Redshift Cluster using IAM Role with permissions on secret (5:01)
Create Redshift External Schema to Postgres Database (6:00)
Update Redshift Cluster Network Settings for Federated Queries (9:43)
Performing ETL using Redshift Federated Queries (4:46)
Clean up resources added for Redshift Federated Queries (3:09)
Grant Access on Glue Data Catalog to Redshift Cluster for Spectrum (3:51)
Setup Redshift Clusters to run queries using Spectrum (2:33)
Quick Recap of Glue Catalog Database and Tables for Redshift Spectrum (2:25)
Review how to access Redshift Spectrum via Glue Data Catalog, expose databases and tables as external schema, and query JSON data stored in S3 using Redshift.
Create External Schema using Redshift Spectrum (3:21)
Run Queries using Redshift Spectrum (3:37)
Cleanup the Redshift Cluster (1:10)
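
Both halves of this section hinge on a CREATE EXTERNAL SCHEMA statement: one pointing at a Postgres database (federated queries) and one pointing at a Glue Data Catalog database (Spectrum). The sketch below generates both forms. The function names, schema names, host, and ARNs are hypothetical placeholders chosen for the example.

```python
def federated_schema_sql(schema, pg_database, pg_schema, host,
                         iam_role_arn, secret_arn, port=5432):
    """External schema over a remote Postgres database (federated query)."""
    return (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        f"FROM POSTGRES\n"
        f"DATABASE '{pg_database}' SCHEMA '{pg_schema}'\n"
        f"URI '{host}' PORT {port}\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"SECRET_ARN '{secret_arn}';"
    )


def spectrum_schema_sql(schema, glue_database, iam_role_arn):
    """External schema over a Glue Data Catalog database (Redshift Spectrum)."""
    return (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        f"FROM DATA CATALOG\n"
        f"DATABASE '{glue_database}'\n"
        f"IAM_ROLE '{iam_role_arn}';"
    )


if __name__ == "__main__":
    # All names and ARNs below are placeholders, not values from the course.
    role = "arn:aws:iam::123456789012:role/ExampleRedshiftRole"
    print(federated_schema_sql(
        "retail_pg", "retail_db", "public",
        "example-db.abc123.us-east-1.rds.amazonaws.com",
        role,
        "arn:aws:secretsmanager:us-east-1:123456789012:secret:example",
    ))
    print(spectrum_schema_sql("retail_spectrum", "retail_glue_db", role))
```

Once either schema exists, its tables can be queried like local ones, for example to insert selected remote rows into a Redshift table as an ETL step.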

Introduction to Setup Redshift Spectrum Database using Redshift Serverless (1:31)
Setup Files in S3 for Glue Catalog and Redshift Spectrum Database Tables (7:26)
Cleanup Glue Catalog Database and Crawler using AWS Glue Console (3:29)
Create Glue Crawler to Setup Glue Catalog Database and Tables for Redshift Spectrum (3:38)
Run Glue Crawler to Create Glue Catalog Database and Tables for Redshift Spectrum (4:19)
Create Redshift Serverless Workgroup and Namespace for Redshift Spectrum (4:37)
Accessing Redshift using Jupyter Based Environment of VS Code (4:43)
Create Database and User for Data Mart using AWS Redshift Query Editor (4:22)
Create Database and User for Data Mart using Jupyter Notebooks (5:54)
Create External Schema in Redshift Database using Glue Catalog Database (6:57)
Validate External Schema Setup using Redshift Query Editor (3:19)

Introduction to Basic SQL Queries using AWS Redshift SQL (5:59)
Overview of Using WITH Clause in Redshift SQL Queries (5:21)
Overview of Using Views in Redshift SQL Queries (4:08)
Filtering Data using AWS Redshift SQL (6:18)
Filtering Data using Boolean AND in Redshift SQL (6:00)
Filtering Data using LIKE Operator in Redshift SQL (10:03)
Filtering Data using Boolean OR and IN Operators in Redshift SQL (8:19)
Overview of Count and Sum using Redshift SQL (5:41)
Getting Total Average using Redshift SQL (3:02)
Perform Total Aggregations based on Condition using Redshift SQL (3:40)
Get Count and Distinct Count using Redshift SQL (3:41)
Get Sum and Average on Order Item Measures using Redshift SQL (7:56)
Perform Grouped Aggregations using Redshift SQL (4:44)
Filtering on Aggregate Results using HAVING on GROUP BY (3:23)
Overview of Order Of Execution of SQL using Group By and Having (9:10)
Overview of Joins using Redshift Tables (1:50)
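
The grouped aggregation and HAVING lectures can be tried without a cluster. The sketch below uses Python's built-in sqlite3 module as a stand-in for Redshift, which is an assumption made purely so the example runs locally; the order_items table and its column names are hypothetical. The query itself is standard SQL: rows are grouped first, and HAVING then filters on the aggregate.

```python
import sqlite3


def revenue_by_order(conn, min_revenue):
    """Return (order_id, item_count, revenue) for orders at or above `min_revenue`."""
    sql = """
        SELECT order_item_order_id,
               count(*) AS item_count,
               round(sum(order_item_subtotal), 2) AS revenue
        FROM order_items
        GROUP BY order_item_order_id
        HAVING sum(order_item_subtotal) >= ?
        ORDER BY order_item_order_id
    """
    return conn.execute(sql, (min_revenue,)).fetchall()


def demo():
    """Build a tiny in-memory table and run the grouped query against it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE order_items (order_item_order_id INTEGER, order_item_subtotal REAL)"
    )
    conn.executemany(
        "INSERT INTO order_items VALUES (?, ?)",
        [(1, 100.0), (1, 50.0), (2, 20.0), (3, 200.0)],
    )
    try:
        return revenue_by_order(conn, 100)
    finally:
        conn.close()


if __name__ == "__main__":
    print(demo())
```

Order 2 is dropped because its total of 20.0 falls below the threshold. Note that sqlite3 uses ? for parameters, whereas psycopg2 uses %s.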

Create AWS EC2 Elastic IP and Key Pair for AWS EMR Cluster (3:05)
Create Shell Script for AWS EMR Bootstrap Action to install boto3 (5:29)
Create AWS EMR Cluster to integrate with Amazon Redshift (4:56)
Attach Elastic IP to the AWS EMR Master Node and Validate SSH Connectivity (6:56)
Setup Project for AWS EMR and Redshift Integration using VS Code Remote Development (6:32)
Setup Amazon Redshift Serverless Workgroup and Validate Connectivity (4:57)
Connect to Redshift Serverless Workgroup from AWS EMR Master using psql (4:37)
Setup Required Database and User in Amazon Redshift Serverless Workgroup (2:48)
Install Python Library psycopg2 to connect to Redshift Databases using Python (1:35)
Validate Redshift Connectivity using Python from AWS EMR Master Node (3:17)
Create and Validate Redshift Database Tables (6:13)
Create Secret for Redshift Database using AWS Secrets Manager (7:37)
Validate Python Boto3 on Master Node of AWS EMR Cluster (3:24)
Read Secret from AWS Secrets Manager using Python Boto3 (3:42)
Validate Redshift Connectivity from Master Node of AWS EMR Cluster (6:34)
Launch Pyspark CLI with Redshift Dependencies on AWS EMR Master Node (4:43)
Validate Redshift Connectivity using Spark on AWS EMR Cluster (7:11)
Develop Code to Validate Spark and Redshift Integration using EMR (4:43)
Setup GHActivity Data in AWS s3 (6:47)
Read and Process Data using Pyspark to write into Redshift Table (2:50)
Develop Write Logic to load Spark Dataframe into Redshift Table (6:40)
Validate Spark Load Process to Amazon Redshift Table (5:18)
Validate the Spark load of JSON data into Amazon Redshift by checking row counts and data integrity, and fix column name mismatches through aliases or renaming.
Understanding AWS s3 Temp Location specified in Spark Applications (3:40)
Conclusion on Integration of AWS EMR with Amazon Redshift (2:57)
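
The write step at the end of this section could be sketched as follows. The course listing does not name the connector it uses, so this example assumes the community spark-redshift connector and its url, dbtable, tempdir, and aws_iam_role options. The helper names, JDBC URL, table, bucket, and role ARN are all hypothetical. The tempdir option is the s3 temp location the lectures refer to: the connector stages data there before Redshift loads it.

```python
# Data source name of the community spark-redshift connector (assumed here).
REDSHIFT_FORMAT = "io.github.spark_redshift_community.spark.redshift"


def redshift_write_options(jdbc_url, table, temp_dir, iam_role_arn):
    """Collect the connector options needed to load a DataFrame into Redshift."""
    return {
        "url": jdbc_url,               # JDBC endpoint of the cluster or workgroup
        "dbtable": table,              # target Redshift table
        "tempdir": temp_dir,           # s3 staging location used during the load
        "aws_iam_role": iam_role_arn,  # role Redshift assumes to read the staging files
    }


def write_to_redshift(df, options, mode="append"):
    """Write a Spark DataFrame to Redshift using the options above."""
    writer = df.write.format(REDSHIFT_FORMAT)
    for key, value in options.items():
        writer = writer.option(key, value)
    writer.mode(mode).save()
```

In a PySpark session on EMR, df would be the processed DataFrame, and the connector and Redshift JDBC driver would need to be on the classpath when the session is launched.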

Requirements

A computer science or IT Degree or 1 or 2 years of IT Experience
Ability to write SQL Queries using any Relational or Data Warehouse or MPP Database
Basic Linux Skills with ability to run commands using Terminal
Basic Programming using Python is desired even though it is not mandatory for most of the course

Description

Amazon Redshift is one of the key AWS Services used to build Data Warehouses or Data Marts that serve reports and dashboards to business users. In this course, you will learn Amazon Redshift by working through the important features it offers for building Data Warehouses or Data Marts.

We have covered features such as Federated Queries, Redshift Spectrum, Integration with Python, AWS Lambda Functions, Integration of Redshift with EMR, and End-to-End Pipeline using AWS Step Functions.

Here is the detailed outline of the course.

First, we will Get Started with Amazon Redshift using the AWS Web Console. We will see how to create a cluster, connect to it, and run queries using the web-based query editor. We will also create a database and tables in the Redshift Cluster, and then go through CRUD operations against those tables.
Once we have databases and tables in the Redshift Cluster, the next step is getting data into them. One of the common approaches is Copying Data from s3 into Redshift Tables. We will go through the step-by-step process of copying data from s3 into Redshift tables using the COPY command.
Python is one of the prominent programming languages for building Data Engineering or ETL Applications, and it is extensively used to build ETL jobs that load data into tables in a Redshift Cluster. Once we understand how to get data from s3 into Redshift tables using the COPY command, we will learn how to Develop Python-based Data Engineering or ETL Applications using Redshift Cluster. We will learn how to perform CRUD operations and how to run COPY commands from Python programs.
Once we understand how to build applications using Redshift Cluster, we will go through some of the key concepts used while creating Redshift Tables with Distkeys and Sortkeys.
We can also connect to remote databases such as Postgres and query their tables directly using Redshift Federated Queries, and we can run queries on top of the Glue or Athena Catalog using Redshift Spectrum. You will learn how to leverage Redshift Federated Queries and Spectrum to process data in remote database tables or s3 without copying the data.
You will also get an overview of Amazon Redshift Serverless as part of Getting Started with Amazon Redshift Serverless.
Once you learn Amazon Redshift Serverless, you will deploy a Pipeline in which a Spark Application running on an AWS EMR Cluster loads the data it processes into Redshift.

Who this course is for:

University Students who want to learn AWS Redshift for Data Warehousing
Aspiring Data Engineers and Data Scientists who want to learn about AWS Redshift for Data Warehousing
Experienced Application Developers who would like to explore AWS Redshift for Data Warehousing
Experienced Data Engineers who want to build end-to-end data pipelines using Python around Data Marts created using AWS Redshift
Any IT Professional who is keen to deep dive into AWS Redshift for Data Warehousing on AWS

Course content

Introduction to Mastering Amazon Redshift and Serverless for Data Engineers (1 lecture • 1min)

Getting Started with Amazon Redshift (12 lectures • 47min)

Copy Data from s3 into Redshift Tables (13 lectures • 48min)

Develop Applications using Redshift Cluster (15 lectures • 1hr 1min)

Redshift Tables with Distkeys and Sortkeys (20 lectures • 1hr 27min)

Redshift Federated Queries and Spectrum (21 lectures • 1hr 44min)

Getting Started with Amazon Serverless Redshift (6 lectures • 19min)

Setup Redshift Spectrum Schema using Redshift Serverless (11 lectures • 50min)

Basic SQL Queries using AWS Redshift SQL (16 lectures • 1hr 29min)

Integration of AWS EMR with Amazon Redshift (24 lectures • 1hr 57min)