Note: The vast majority of the first section is free. Please view these free videos. They will give you an idea of how the rest of the course is structured. Thank you.
Although this is a gross oversimplification, Amazon Redshift can be thought of as a traditional data warehouse platform hosted in the cloud.
Data warehousing has been around for quite a number of years now. There have been many evolutions in data modeling, storage, and ultimately the vast variety of tools that the business user now has available to help utilize their quickly growing stores of data.
As the industry moves more towards self-service business intelligence solutions for business users, there are also changes in how data is being stored. Amazon Redshift is one of those "game-changing" platforms that is not only driving down the total cost, but also driving up the ability to store even more data so that even better business decisions can be made.
One of the greatest features of all of Amazon’s services is that many of the mundane administration tasks have been removed. Hardware, software patching, and disk management (none of which are small tasks) have been taken on by Amazon. Automated recovery from disk failure and the ability to begin querying a cluster while it is still being restored are powerful and compelling things Amazon has done to reduce your workload and increase uptime.
In this course we will create a group of nodes, called a Redshift cluster. Once we have spun up a cluster we can upload our data sets and perform data analysis. We will walk through all the steps necessary to begin using a Redshift cluster in the real world.
One of the greatest benefits of Redshift is blazing fast query performance. Two core items are responsible for this: columnar storage technology, which improves I/O efficiency, and the parallelizing of queries across multiple nodes. Parallelizing queries across many nodes is known as MPP, or Massively Parallel Processing.
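To make the I/O point concrete, here is a toy Python sketch contrasting a row layout with a column layout. It is not how Redshift's storage engine is implemented; the table, column names, and values are invented for illustration. The only idea it shows is that a query summing one column touches far fewer values when each column is stored on its own.

```python
# Toy illustration only: the table and its values are hypothetical.
rows = [
    {"order_id": 1, "customer": "acme", "region": "east", "amount": 120.0},
    {"order_id": 2, "customer": "globex", "region": "west", "amount": 75.5},
    {"order_id": 3, "customer": "initech", "region": "east", "amount": 210.0},
]

# Row layout: every field of every row is read to answer SUM(amount).
row_values_read = sum(len(row) for row in rows)
row_total = sum(row["amount"] for row in rows)

# Column layout: each column is stored contiguously, so SUM(amount)
# only needs to read the "amount" column.
columns = {name: [row[name] for row in rows] for name in rows[0]}
column_values_read = len(columns["amount"])
column_total = sum(columns["amount"])

print(row_total, row_values_read)        # 405.5 12
print(column_total, column_values_read)  # 405.5 3
```

Both layouts give the same answer; the column layout simply reads a quarter of the values in this small example, and the gap widens as tables get wider.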
The underlying hardware is designed for high-performance data processing, using locally attached storage to maximize throughput between the CPUs and drives, and a 10GigE mesh network to maximize throughput between nodes.
The last nail in the coffin for the traditional brick-and-mortar data warehouse is cost. Redshift accomplishes all this at a fraction of the cost of the traditional data warehouse.
If you are looking to expand your knowledge of Amazon’s data platform, and specifically of their Redshift service, then this course is for you.
Thank you and welcome to Redshift.
I want to make sure you are in the right place.
This course is about Redshift.
Redshift is Amazon's data warehouse inside AWS (their cloud).
Let's define what a data warehouse is.
It's not the same as an OLTP system.
You'll need an AWS account for this course.
The account is free and if you stick to the free tier you'll only incur small usage fees.
Don't leave a cluster up.
While you are learning, spin up the cluster and delete it in the same session.
This is one of the largest benefits to Redshift.
Let's learn what it is.
MPP breaks data sets down into bite-size segments.
Let's learn more about MPP in this short video.
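The "bite-size segments" idea can be sketched in a few lines of Python. This is a simplified model, not Redshift's actual execution engine: the function names, the round-robin split, and the slice count are all assumptions made for illustration. Each segment is summed independently, and the partial results are then combined into the final answer.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_slices(values, slice_count):
    """Distribute values round-robin across slice_count segments."""
    return [values[i::slice_count] for i in range(slice_count)]


def partial_sum(segment):
    """Work done independently on one segment."""
    return sum(segment)


def parallel_sum(values, slice_count=4):
    """Sum each segment concurrently, then combine the partial results."""
    slices = split_into_slices(values, slice_count)
    with ThreadPoolExecutor(max_workers=slice_count) as pool:
        partials = list(pool.map(partial_sum, slices))
    return sum(partials)


amounts = list(range(1, 101))  # hypothetical data: 1 through 100
print(parallel_sum(amounts))   # 5050
```

Python threads only model the division of work here. In a real cluster the segments live on separate compute nodes, which is where the speedup comes from.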
The great thing about the cloud is that much of the mundane is offloaded to our cloud provider.
Let's learn what is offloaded in this short video.
Download the course content here.
The download button is on the top right-hand side of this lecture.
Let's wrap up what we've learned in our first section.
An Amazon Redshift data warehouse is a collection of computing resources called nodes, which are organized into a group called a cluster.
Let's learn more about nodes in this short lesson.
Before you create your real-world cluster you have some things to think about.
Let's take a look at a few of the more important ones.
Let's provision our first cluster.
"Provision" is the swanky word; "spin up" is what the nerds use.
You can't connect to your cluster without an inbound rule.
Let's set one up.
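As a rough sketch of what such a rule contains, the Python below builds the description of an inbound rule that lets a single client address reach Redshift's default port, 5439. It assumes the cluster sits behind a VPC security group; the helper name and the IP address are hypothetical. The dictionary follows the shape of the `IpPermissions` entries accepted by the EC2 `authorize_security_group_ingress` API, but nothing here calls AWS.

```python
import ipaddress

REDSHIFT_DEFAULT_PORT = 5439  # the port Redshift listens on by default


def build_inbound_rule(client_ip, port=REDSHIFT_DEFAULT_PORT):
    """Return an ingress permission allowing one IPv4 address on one TCP port."""
    # IPv4Address raises a ValueError subclass if client_ip is not valid IPv4.
    address = ipaddress.IPv4Address(client_ip)
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": f"{address}/32"}],  # /32 means exactly one host
    }


# 203.0.113.10 is a documentation-only address used as a placeholder.
rule = build_inbound_rule("203.0.113.10")
print(rule)
```

Restricting the rule to your own address with `/32`, rather than opening the port to everyone, keeps the cluster reachable only from where you are working.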
Deleting a cluster is very straightforward.
Let's see how it's done in this short lesson.
There is no Amazon-provided tool to interact with your data when you are using Redshift.
So, we need a third-party one.
This one is free.
The client tool will need to connect to the cluster.
Drivers do this.
Let's connect via a JDBC driver in this lesson.
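A JDBC connection mostly comes down to getting the URL right. The sketch below assembles one in the `jdbc:redshift://endpoint:port/database` form used by Amazon's Redshift JDBC driver. The helper name, the endpoint, and the database name are placeholders, so substitute the endpoint shown for your own cluster.

```python
def redshift_jdbc_url(endpoint, database, port=5439):
    """Build a JDBC URL for a Redshift cluster (5439 is the default port)."""
    return f"jdbc:redshift://{endpoint}:{port}/{database}"


# Hypothetical endpoint and database name, for illustration only.
url = redshift_jdbc_url(
    "examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com",
    "dev",
)
print(url)
```

If you connect with a PostgreSQL JDBC driver instead, the URL prefix is `jdbc:postgresql://` rather than `jdbc:redshift://`.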
There are a lot of moving parts to setting this up.
Let's go over all the steps one more time before we dig into our data.
In this lecture we are going to look at the recommended approach to copying data into our Redshift tables.
Let's load some data into an S3 bucket, then use the COPY command to move it into our cluster.
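For a feel of what the COPY step looks like, the Python below assembles a COPY statement for a CSV file in S3. It is a sketch under assumptions: the table name, bucket path, and IAM role ARN are hypothetical, and it authorizes with `IAM_ROLE`, which may differ from the credentials approach used in the lecture. The resulting text is what you would run from your SQL client.

```python
def build_copy_statement(table, s3_path, iam_role_arn, ignore_header=1):
    """Build a Redshift COPY statement for a CSV file stored in S3."""
    if not s3_path.startswith("s3://"):
        raise ValueError("s3_path must start with s3://")
    # Plain string formatting: only use with values you control.
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"CSV\n"
        f"IGNOREHEADER {int(ignore_header)};"
    )


# All three arguments are placeholders for illustration.
statement = build_copy_statement(
    "sales",
    "s3://example-bucket/sales.csv",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
print(statement)
```

`IGNOREHEADER 1` tells COPY to skip the first line of the file, which is useful when the CSV starts with column names.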
Sort keys are similar in nature to clustered indexes (often the primary key) in OLTP databases: they determine the order in which rows are stored.
Let's learn the basics of sort keys for Redshift.
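One reason sort order matters is that Redshift keeps the minimum and maximum value of each storage block, so a range filter can skip blocks that cannot contain matching rows. The toy model below illustrates that idea only. The block size, the data, and the function names are invented, and real blocks are far larger.

```python
def build_blocks(values, block_size):
    """Split values into fixed-size blocks and record each block's min and max."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append({"min": min(chunk), "max": max(chunk), "rows": chunk})
    return blocks


def blocks_scanned(blocks, low, high):
    """Count blocks whose [min, max] range overlaps the query range."""
    return sum(1 for b in blocks if b["max"] >= low and b["min"] <= high)


# Hypothetical "day of month" column, first in load order, then sorted.
unsorted_days = [17, 3, 29, 8, 22, 1, 14, 30, 6, 25, 11, 19]
sorted_days = sorted(unsorted_days)

# Query: rows where day is between 1 and 8.
print(blocks_scanned(build_blocks(unsorted_days, 4), 1, 8))  # 3
print(blocks_scanned(build_blocks(sorted_days, 4), 1, 8))    # 1
```

With unsorted data every block might hold a matching row, so all three are read. Once the data is sorted, only the first block overlaps the requested range.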
Snapshots are backups.
Let's look at some snapshot considerations in Redshift.
Let's take a moment to learn how to create backups and learn some of the nuances of restoring.
In this lecture let's vertically scale our cluster. This simply means adding more resources to it.
Let's take a quick look at CloudWatch. This tool gives us key metrics for measuring our performance.
I've been a production SQL Server DBA most of my career.
I've worked with databases for over two decades. I've worked for or consulted with over 50 different companies as a full-time employee or consultant, including Fortune 500 firms as well as several small to mid-size companies. Some include: Georgia-Pacific, SunTrust, Reed Construction Data, Building Systems Design, NetCertainty, The Home Shopping Network, SwingVote, Atlanta Gas and Light and Northrop Grumman.
Experience, education and passion
I learn something almost every day. I work with insanely smart people. I'm a voracious learner of all things SQL Server and I'm passionate about sharing what I've learned. My area of concentration is performance tuning. SQL Server is like an exotic sports car: it will run just fine in anyone's hands, but put it in the hands of a skilled tuner and it will perform like a race car.
Certifications are like college degrees: they are a great starting point for learning. I'm a Microsoft Certified Database Administrator (MCDBA), Microsoft Certified Systems Engineer (MCSE) and Microsoft Certified Trainer (MCT).
Born in Ohio, raised and educated in Pennsylvania, I currently reside in Atlanta with my wife and two children.