Mastering Amazon Redshift and Serverless for Data Engineers
What you'll learn
- Getting started with Amazon Redshift using the AWS Web Console
- Copy data from S3 into AWS Redshift tables using Redshift queries or commands
- Develop applications against a Redshift cluster using Python
- Copy data from S3 into AWS Redshift tables using Python
- Create tables in databases on AWS Redshift using distribution keys and sort keys
- Run AWS Redshift Federated Queries against traditional RDBMS databases such as Postgres
- Perform ETL using AWS Redshift Federated Queries and Redshift capacity
- Integrate AWS Redshift with the AWS Glue Catalog to run queries using Redshift Spectrum
- Run AWS Redshift Spectrum queries against Glue Catalog tables on a data lake built on AWS S3
- Get started with Amazon Redshift Serverless by creating a workgroup and namespace
- Integrate an AWS EMR cluster with an Amazon Redshift Serverless workgroup
- Develop and deploy a Spark application on an AWS EMR cluster that loads the processed data into an Amazon Redshift Serverless workgroup
Requirements
- A computer science or IT degree, or 1-2 years of IT experience
- Ability to write SQL queries using any relational, data warehouse, or MPP database
- Basic Linux skills, with the ability to run commands in a terminal
- Basic Python programming skills are desirable, as Python is used for a large part of the course
Amazon Redshift is one of the key AWS services used to build data warehouses and data marts that serve reports and dashboards for business users. As part of this course, you will learn Amazon Redshift by working through all of its important features for building data warehouses and data marts.
We have covered features such as Federated Queries, Redshift Spectrum, Integration with Python, AWS Lambda Functions, Integration of Redshift with EMR, and End-to-End Pipeline using AWS Step Functions.
Here is the detailed outline of the course.
First, we will understand how to get started with Amazon Redshift using the AWS Web Console. We will see how to create a cluster, how to connect to it, and how to run queries using the web-based query editor. We will also create a database and tables in the Redshift cluster, and then go through the details of CRUD operations against those tables.
Once we have databases and tables in the Redshift cluster, it is time to understand how to get data into them. One of the most common approaches is copying data from S3 into Redshift tables. We will go through the step-by-step process of copying data into Redshift tables from S3 using the COPY command.
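The COPY command described above can be sketched as follows. This is a minimal illustration; the table name, S3 bucket, and IAM role ARN are all placeholders, not values from the course.

```python
def build_copy_command(table, s3_path, iam_role, fmt="CSV", ignore_header=1):
    """Build a Redshift COPY command for bulk-loading a table from S3.

    All names passed in (table, bucket, IAM role) are placeholders.
    """
    return (
        f"COPY {table} "
        f"FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' "
        f"FORMAT AS {fmt} "
        f"IGNOREHEADER {ignore_header};"
    )

# Example with a hypothetical bucket and role:
sql = build_copy_command(
    "orders",
    "s3://my-bucket/retail_db/orders/",
    "arn:aws:iam::123456789012:role/RedshiftS3Role",
)
```

The generated statement would then be run against the cluster (for example, from the query editor), with Redshift reading the files in parallel directly from S3.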
Python is one of the prominent programming languages for building Data Engineering or ETL applications. It is extensively used to build ETL jobs that get data into database tables in a Redshift cluster. Once we understand how to get data from S3 into Redshift tables using the COPY command, we will learn how to develop Python-based Data Engineering or ETL applications against a Redshift cluster. We will learn how to perform CRUD operations and how to run COPY commands from Python-based programs.
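As a sketch of the kind of Python ETL helpers described above, the functions below assume a DB-API 2.0 connection (such as one from `redshift_connector` or `psycopg2`); the table and column names are illustrative, not from the course.

```python
def insert_order(conn, order_id, status):
    """CRUD 'create': insert a single row into a hypothetical orders table."""
    cur = conn.cursor()
    cur.execute(
        "INSERT INTO orders (order_id, order_status) VALUES (%s, %s)",
        (order_id, status),
    )
    conn.commit()

def copy_from_s3(conn, table, s3_path, iam_role):
    """Bulk load: run a COPY command against the cluster from Python."""
    cur = conn.cursor()
    cur.execute(
        f"COPY {table} FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' FORMAT AS CSV IGNOREHEADER 1"
    )
    conn.commit()
```

In a real job, `conn` would come from `redshift_connector.connect(...)` with the cluster endpoint and credentials.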
Once we understand how to build applications against a Redshift cluster, we will go through some of the key concepts used while creating Redshift tables with distribution keys (DISTKEY) and sort keys (SORTKEY).
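To make the idea concrete, here is an illustrative CREATE TABLE statement showing where DISTKEY and SORTKEY go; the table and column names are hypothetical, not taken from the course material.

```python
# Illustrative Redshift DDL held as a Python string; the comments note
# the intent of each key choice under this hypothetical schema.
create_orders = """
CREATE TABLE orders (
    order_id INT,
    order_date DATE,
    order_customer_id INT,
    order_status VARCHAR(30)
)
DISTSTYLE KEY
DISTKEY (order_customer_id)   -- co-locate rows for joins on customer id
SORTKEY (order_date);         -- speed up range-restricted scans by date
"""
```

Choosing the distribution key around the most frequent join column and the sort key around the most frequent filter column is the general pattern these clauses support.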
We can also connect to remote databases such as Postgres and run queries directly against their tables using Redshift Federated Queries, and we can run queries on top of the Glue or Athena Catalog using Redshift Spectrum. You will learn how to leverage Federated Queries and Spectrum to process data in remote database tables or in S3 without copying it.
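Both features are wired up through external schemas. The sketch below shows the two DDL shapes; the schema names, host, database names, IAM role ARNs, and Secrets Manager ARN are all placeholders.

```python
# Federated Queries: expose a remote Postgres database as an external
# schema, with credentials supplied via AWS Secrets Manager.
federated_schema = """
CREATE EXTERNAL SCHEMA pg_federated
FROM POSTGRES
DATABASE 'retail_db' SCHEMA 'public'
URI 'pg-host.example.com' PORT 5432
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftFederatedRole'
SECRET_ARN 'arn:aws:secretsmanager:us-east-1:123456789012:secret:pg-creds';
"""

# Spectrum: expose Glue Catalog tables over S3 data as an external schema.
spectrum_schema = """
CREATE EXTERNAL SCHEMA spectrum_retail
FROM DATA CATALOG
DATABASE 'retail_glue_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftSpectrumRole';
"""
```

After either statement runs, the remote tables can be queried as `schema_name.table_name` from regular Redshift SQL, with no data copied into the cluster.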
You will also get an overview of Amazon Redshift Serverless as part of Getting Started with Amazon Redshift Serverless.
Once you learn Amazon Redshift Serverless, you will deploy a pipeline in which a Spark application running on an AWS EMR cluster loads its processed data into a Redshift Serverless workgroup.
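A minimal sketch of the write step in such a pipeline, assuming the spark-redshift connector bundled with recent EMR releases; the JDBC URL, workgroup endpoint, temp-dir bucket, and IAM role below are placeholders, not values from the course.

```python
def redshift_writer_options(jdbc_url, table, tempdir, iam_role):
    """Connector options for writing a Spark DataFrame to Redshift.

    All arguments are placeholders to be filled in per environment.
    """
    return {
        "url": jdbc_url,          # JDBC endpoint of the Serverless workgroup
        "dbtable": table,         # target table in Redshift
        "tempdir": tempdir,       # S3 staging area used for the load
        "aws_iam_role": iam_role, # role Redshift assumes to read the staged data
    }

opts = redshift_writer_options(
    "jdbc:redshift://my-workgroup.123456789012.us-east-1"
    ".redshift-serverless.amazonaws.com:5439/dev",
    "public.orders",
    "s3://my-bucket/redshift-temp/",
    "arn:aws:iam::123456789012:role/EMRRedshiftRole",
)

# Inside the Spark application on EMR (not runnable outside the cluster):
# (df.write
#    .format("io.github.spark_redshift_community.spark.redshift")
#    .options(**opts)
#    .mode("append")
#    .save())
```

The connector stages the DataFrame in the S3 `tempdir` and then issues a COPY into the target table, which is why the IAM role needs read access to that bucket.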
Who this course is for:
- University Students who want to learn AWS Redshift for Data Warehousing
- Aspiring Data Engineers and Data Scientists who want to learn about AWS Redshift for Data Warehousing
- Experienced Application Developers who would like to explore AWS Redshift for Data Warehousing
- Experienced Data Engineers who want to build end-to-end data pipelines using Python around data marts created with AWS Redshift
- Any IT Professional who is keen to deep dive into AWS Redshift for Data Warehousing on AWS
20+ years of experience in executing complex projects using a vast array of technologies including Big Data and the Cloud.
ITVersity, Inc. is a US-based organization that provides quality training for IT professionals, with a track record of training hundreds of thousands of professionals globally.
Helping people build an IT career by providing the required tools, such as high-quality material, labs, and live support, to upskill and cross-skill is paramount for our organization.
At this time our training offerings are focused on the following areas:
* Application Development using Python and SQL
* Big Data and Business Intelligence
* Data Warehousing and Databases