AWS Certified Machine Learning Specialty 2020 - Hands On!

Learn SageMaker, feature engineering, model tuning, and the AWS machine learning ecosystem. Be prepared for the exam!
Bestseller
4.5 (1,816 ratings)
13,229 students enrolled
Current price: $139.99 Original price: $199.99 Discount: 30% off
30-Day Money-Back Guarantee
This course includes
  • 9.5 hours on-demand video
  • 2 articles
  • 1 Practice Test
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What you'll learn
  • What to expect on the AWS Certified Machine Learning Specialty exam
  • Amazon SageMaker's built-in machine learning algorithms (XGBoost, BlazingText, Object Detection, etc.)
  • Feature engineering techniques, including imputation, outliers, binning, and normalization
  • High-level ML services: Comprehend, Translate, Polly, Transcribe, Lex, Rekognition, and more
  • Data engineering with S3, Glue, Kinesis, and DynamoDB
  • Exploratory data analysis with scikit-learn, Athena, Apache Spark, and EMR
  • Deep learning and hyperparameter tuning of deep neural networks
  • Automatic model tuning and operations with SageMaker
  • L1 and L2 regularization
  • Applying security best practices to machine learning pipelines
Requirements
  • Associate-level knowledge of AWS services such as EC2
  • Some existing familiarity with machine learning
  • An AWS account is needed to perform the hands-on lab exercises
Description

[ Updated for 2020's latest SageMaker features and new AWS ML Services. Happy learning! ]

Nervous about passing the AWS Certified Machine Learning - Specialty exam (MLS-C01)? You should be! There's no doubt it's one of the most difficult and coveted AWS certifications. A deep knowledge of AWS and SageMaker isn't enough to pass this one - you also need deep knowledge of machine learning, and the nuances of feature engineering and model tuning that generally aren't taught in books or classrooms. You just can't prepare enough for this one.

This certification prep course is taught by Frank Kane, who spent nine years working at Amazon itself in the field of machine learning. Frank took and passed this exam on the first try, and knows exactly what it takes for you to pass it yourself. Joining Frank in this course is Stephane Maarek, an AWS expert and popular AWS certification instructor on Udemy.

In addition to the 9.5 hours of on-demand video, a 30-minute quick-assessment practice exam is included that covers the same topics, in the same style, as the real exam. You'll also get four hands-on labs that allow you to practice what you've learned, and gain valuable experience in model tuning, feature engineering, and data engineering.

This course is structured into the four domains tested by this exam: data engineering, exploratory data analysis, modeling, and machine learning implementation and operations. Just some of the topics we'll cover include:

  • S3 data lakes

  • AWS Glue and Glue ETL

  • Kinesis data streams, firehose, and video streams

  • DynamoDB

  • Data Pipelines, AWS Batch, and Step Functions

  • Using scikit-learn

  • Data science basics

  • Athena and Quicksight

  • Elastic MapReduce (EMR)

  • Apache Spark and MLLib

  • Feature engineering (imputation, outliers, binning, transforms, encoding, and normalization)

  • Ground Truth

  • Deep Learning basics

  • Tuning neural networks and avoiding overfitting

  • Amazon SageMaker, in depth

  • Regularization techniques

  • Evaluating machine learning models (precision, recall, F1, confusion matrix, etc.)

  • High-level ML services: Comprehend, Translate, Polly, Transcribe, Lex, Rekognition, and more

  • Security best practices with machine learning on AWS

The Machine Learning Specialty is an advanced certification, and it's best tackled by students who have already obtained associate-level certification in AWS and have some real-world industry experience. This exam is not intended for AWS beginners.

If there's a more comprehensive prep course for the AWS Certified Machine Learning - Specialty exam, we haven't seen it. Enroll now, and gain confidence as you walk into that testing center.

Who this course is for:
  • Individuals performing a development or data science role seeking certification in machine learning and AWS.
Course content
114 lectures 09:37:36
+ Introduction
3 lectures 10:01

Get the most from this course - learn how to adjust the video playback speed, enable closed captions, and ensure good video streaming.

Preview 02:10

Download the notebooks you'll need throughout the hands-on labs in this course.

Get the Course Materials
01:42
+ Data Engineering
23 lectures 01:25:59
Section Intro: Data Engineering
01:04
Amazon S3 - Overview
05:04
Amazon S3 - Storage Tiers & Lifecycle Rules
04:29
Amazon S3 Security
08:05
Kinesis Data Streams & Kinesis Data Firehose
08:38
Kinesis Data Analytics
04:25
Lab 1.2 - Kinesis Data Analytics
07:22
Kinesis Video Streams
02:55
Kinesis ML Summary
01:12
Glue Data Catalog & Crawlers
02:32
Lab 1.3 - Glue Data Catalog
04:23
Glue ETL
02:10
Lab 1.4 - Glue ETL
06:20
Lab 1.5 - Athena
01:26
Lab 1 - Cleanup
01:32
AWS Data Stores in Machine Learning
03:09
AWS Data Pipelines
02:39
AWS Batch
01:51
AWS DMS - Database Migration Services
01:58
AWS Step Functions
02:44
Full Data Engineering Pipelines
05:09
Data Engineering Summary
00:48

Reinforce your knowledge of some key points in this section.

Quiz: Data Engineering
5 questions
+ Exploratory Data Analysis
21 lectures 02:26:11
Section Intro: Data Analysis
01:12

High-level overview of how Jupyter notebooks, Pandas, Numpy, Matplotlib, Seaborn, and scikit-learn play a role in exploratory data analysis and preparing your training data for machine learning.

Python in Data Science and Machine Learning
12:08

We'll walk through a Jupyter notebook that explores, cleans, and normalizes training data to build a real machine learning model to predict if mammogram results are benign or malignant.

Example: Preparing Data for Machine Learning in a Jupyter Notebook.
10:21
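
For a concrete feel of this workflow, here is a minimal scikit-learn sketch of the same kind of pipeline. It is not the course's own notebook (which uses a mammographic-masses dataset); it substitutes scikit-learn's built-in breast cancer data so it runs on its own:

    # Load data, normalize features, train a simple classifier, and evaluate it.
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression

    X, y = load_breast_cancer(return_X_y=True)         # features, plus benign/malignant labels
    X = StandardScaler().fit_transform(X)               # normalize each feature
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

    model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
    print("Test accuracy:", model.score(X_test, y_test))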

We'll cover the differences between numerical, categorical, and ordinal data.

Types of Data
04:31

Topics covered include normal distributions, Poisson distributions, binomial distributions, Bernoulli distributions, and the difference between probability density functions and probability mass functions.

Preview 06:05
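
As a quick illustration (not course code), scipy.stats makes the PDF/PMF distinction concrete - continuous distributions expose pdf(), discrete ones expose pmf():

    from scipy import stats

    print(stats.norm(loc=0, scale=1).pdf(0.5))    # normal: probability *density* at 0.5
    print(stats.poisson(mu=3).pmf(2))             # Poisson: P(exactly 2 events)
    print(stats.binom(n=10, p=0.5).pmf(7))        # binomial: P(7 successes in 10 trials)
    print(stats.bernoulli(p=0.3).pmf(1))          # Bernoulli: P(success), with p = 0.3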

We'll talk about how time series data consists of separate signals from trends, seasonality, and noise.

Time Series: Trends and Seasonality
03:57

A quick overview of Amazon Athena, and how it can be used to query your unstructured, structured, or semi-structured data in S3 in a serverless setting.

Introduction to Amazon Athena
05:06

High-level features of QuickSight, Amazon's data visualization product, including its new machine learning capabilities.

Overview of Amazon Quicksight
05:59

There are lots of visualization choices: bar and line graphs, heat maps, tree maps, pivot tables, and much more - all of which are offered by QuickSight. Let's talk about how to decide which kind of graph is most appropriate for illustrating different aspects of your data.

Types of Visualizations, and When to Use Them.
04:46

How Amazon EMR works, including the architecture of a Hadoop cluster. What are HDFS and EMRFS? What are the different usage modes for EMR? How does it scale? What can it do?

Elastic MapReduce (EMR) and Hadoop Overview
07:14

How Apache Spark has supplanted MapReduce; the architecture of Spark, and its capabilities, including Spark Streaming, MLLib, GraphX, and Spark SQL. How Spark integrates with AWS and Kinesis.

Apache Spark on EMR
09:59

Zeppelin notebooks run on your EMR cluster to control Spark, but EMR notebooks can run outside of your cluster and control the provisioning of the cluster itself, too. We'll also discuss the security features available with EMR, and how to choose an instance type for the master, core, and task nodes of your cluster.

EMR Notebooks, Security, and Instance Types
04:10

We'll introduce what the world of feature engineering is all about, and why it is so important to getting good results from your machine learning models. And, we'll dive into the "curse of dimensionality," and why more features usually isn't better.

Feature Engineering and the Curse of Dimensionality
06:34

A big part of feature engineering is dealing with missing data. We'll discuss various approaches, including mean imputation, simply dropping rows, and machine-learning-based imputation using KNN, deep learning, and regression methods such as MICE.

Preview 08:04
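
Here's a small scikit-learn sketch (with illustrative values only) of two of these approaches; MICE-style imputation is also available in scikit-learn as the still-experimental IterativeImputer:

    import numpy as np
    from sklearn.impute import SimpleImputer, KNNImputer

    X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, np.nan], [4.0, 5.0]])

    print(SimpleImputer(strategy="mean").fit_transform(X))   # mean imputation, column by column
    print(KNNImputer(n_neighbors=2).fit_transform(X))        # impute from the 2 most similar rows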

Training models on highly unbalanced data sets - such as fraud detection, where very few observations are actual fraud - is a big problem. We'll talk about ways to address this from a feature engineering standpoint, including oversampling, undersampling, and SMOTE.

Dealing with Unbalanced Data
05:35
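
As a hedged sketch of the oversampling idea, here is SMOTE applied to a synthetic, fraud-like data set using the third-party imbalanced-learn package (one of several ways to do this; not course code):

    from collections import Counter
    from sklearn.datasets import make_classification
    from imblearn.over_sampling import SMOTE

    # Synthetic data where only ~5% of observations are the positive class.
    X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=42)
    print("Before:", Counter(y))

    X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)   # synthesize new minority samples
    print("After: ", Counter(y_res))                           # classes are now balanced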

We'll introduce how to compute variance and standard deviation, and how to identify outliers as a function of standard deviation and in box-and-whisker plots. We'll also give a shout-out to Amazon's Random Cut Forest algorithm.

Handling Outliers
08:30
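
A minimal NumPy sketch of the standard-deviation approach described above (the data and threshold are made up for illustration):

    import numpy as np

    data = np.append(np.random.normal(50.0, 10.0, 1000), [250.0])   # inject one obvious outlier
    mean, std = data.mean(), data.std()

    outliers = data[np.abs(data - mean) > 3 * std]   # flag anything beyond 3 standard deviations
    print(outliers)                                   # should include the 250.0 we injected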

We'll round out our tour of feature engineering with a discussion of binning numerical data, transforming data to create new features to discover sub-linear and super-linear patterns, one-hot encoding, scaling and normalization, and the importance of shuffling your training data.

Binning, Transforming, Encoding, Scaling, and Shuffling
07:59
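
Here's a compact pandas / scikit-learn sketch of several of these techniques on a toy data frame (illustrative only):

    import pandas as pd
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.utils import shuffle

    df = pd.DataFrame({"age": [22, 35, 58, 71], "color": ["red", "blue", "red", "green"]})

    df["age_bin"] = pd.cut(df["age"], bins=[0, 30, 60, 100],
                           labels=["young", "middle", "senior"])             # binning
    df = pd.get_dummies(df, columns=["color"])                               # one-hot encoding
    df["age_scaled"] = MinMaxScaler().fit_transform(df[["age"]]).ravel()     # scaling / normalization
    df = shuffle(df, random_state=42)                                        # shuffle rows before training
    print(df)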

Humans can be the most important tool for filling in missing data, especially labels. We'll talk about how Amazon SageMaker Ground Truth manages and optimizes human labeling tasks, as well as how services such as Rekognition and Comprehend can fabricate features and labels from existing data.

Amazon SageMaker Ground Truth and Label Generation
04:28

As TF-IDF (Term Frequency - Inverse Document Frequency) may be new to you, we'll start by reviewing how TF-IDF works and how it fits into a search engine solution.

Preview 06:18
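
To make the idea concrete before the Spark/EMR labs (which use Spark's own tooling), here is a tiny scikit-learn TF-IDF sketch with made-up documents:

    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = ["the quick brown fox", "the lazy dog", "the quick dog jumps"]
    vectorizer = TfidfVectorizer()
    tfidf = vectorizer.fit_transform(docs)            # rows = documents, columns = terms

    # Score each document against a simple query by dot-product relevance.
    query = vectorizer.transform(["quick dog"])
    scores = (tfidf @ query.T).toarray().ravel()
    print(sorted(zip(scores, docs), reverse=True))    # most relevant document first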

As we begin our hands-on lab, we'll start by spinning up an EMR cluster configured to run Apache Spark and Zeppelin, and walk through the process for securely connecting to it from your desktop at home.

Lab: Preparing Data for TF-IDF with Spark and EMR, Part 2
09:46

In part 3, we'll use Zeppelin and Spark to explore our Wikipedia data subset, organize it into the format we need, tokenize and transform it, clean it, and finally use it to actually perform a search on our data.

Lab: Preparing Data for TF-IDF with Spark and EMR, Part 3
13:29

Let's do a quick knowledge check of what you've learned in the exploratory data analysis domain. These aren't the sorts of questions you'll get on the exam; it's just for review purposes.

Quiz: Exploratory Data Analysis
5 questions
+ Modeling
48 lectures 04:10:00

We'll cover the biological inspiration of deep learning, and how this translates to artificial neural networks.

Introduction to Deep Learning
09:23

We'll dive deep into activation functions, including linear, step, logistic / sigmoid, hyperbolic tangent, ReLU, Leaky ReLU, PReLU, Swish, and more - and how to choose between them.

Activation Functions
10:50
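
A few of these, sketched in NumPy purely for reference (not course code):

    import numpy as np

    def sigmoid(x):                 return 1.0 / (1.0 + np.exp(-x))      # logistic / sigmoid
    def tanh(x):                    return np.tanh(x)                     # hyperbolic tangent
    def relu(x):                    return np.maximum(0.0, x)             # ReLU
    def leaky_relu(x, alpha=0.01):  return np.where(x > 0, x, alpha * x)  # Leaky ReLU

    x = np.array([-2.0, -0.5, 0.0, 1.5])
    print(sigmoid(x), tanh(x), relu(x), leaky_relu(x))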

Convolutional Neural Networks, or CNN's, are inspired by the human visual cortex and are useful for object recognition and other tasks. We'll cover how they work, some popular CNN architectures such as ResNet, and how CNN's are built in Keras and Tensorflow.

Convolutional Neural Networks
12:09
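
For reference, a hedged, minimal Keras CNN in the general shape discussed here (not the course's exact model; it assumes 28x28 grayscale inputs and 10 classes):

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

    model = Sequential([
        Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),  # learn local image features
        MaxPooling2D((2, 2)),                                            # downsample
        Conv2D(64, (3, 3), activation="relu"),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(128, activation="relu"),
        Dense(10, activation="softmax"),                                 # one output per class
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    model.summary()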

Recurrent Neural Networks, or RNN's, are well suited for problems involving sequences of data, such as predicting markets or machine translation. We'll cover how RNN's work, some popular variants including LSTM and GRU, and different applications of them.

Recurrent Neural Networks
10:48

A brief discussion of EMR's built-in support for deep learning with Apache MXNet, deep learning AMI's for EC2, and EC2 instance types appropriate for deep learning.

Deep Learning on EC2 and EMR
01:32

Hyperparameter tuning of deep neural networks is a complex subject. We'll talk about how deep neural nets are trained with gradient descent, and how your choice of learning rate and batch size affects your training. Sometimes it's counter-intuitive!

Preview 04:48
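
Here's a toy mini-batch gradient descent loop in NumPy that makes the roles of learning rate and batch size concrete (the values are illustrative, not recommendations):

    import numpy as np

    rng = np.random.default_rng(42)
    X = rng.normal(size=(1000, 1))
    y = 3.0 * X[:, 0] + 2.0 + rng.normal(scale=0.1, size=1000)   # true slope 3, intercept 2

    w, b = 0.0, 0.0
    learning_rate, batch_size = 0.1, 32

    for epoch in range(20):
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            batch = order[start:start + batch_size]
            err = (w * X[batch, 0] + b) - y[batch]                # prediction error on this batch
            w -= learning_rate * (err * X[batch, 0]).mean()       # gradient step on the slope
            b -= learning_rate * err.mean()                       # gradient step on the intercept

    print(w, b)   # should approach 3 and 2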

Deep neural networks are prone to overfitting. We'll cover some simple regularization techniques to combat this, including dropout, early stopping, and simply using a smaller network.

Regularization Techniques for Neural Networks (Dropout, Early Stopping)
06:41
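
A small Keras sketch of both ideas - a Dropout layer, plus an EarlyStopping callback that halts training when validation loss stops improving (shapes and values are illustrative):

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, Dropout
    from tensorflow.keras.callbacks import EarlyStopping

    model = Sequential([
        Dense(128, activation="relu", input_shape=(20,)),
        Dropout(0.5),                          # randomly drop half the units on each training step
        Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")

    early_stop = EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True)
    # model.fit(X_train, y_train, validation_split=0.2, epochs=100, callbacks=[early_stop])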

What is the vanishing gradient problem, and what can be done to combat it? Also, what's gradient checking?

Grief with Gradients: The Vanishing Gradient problem
04:28

What is L1 and L2 regularization, how do they differ, and how do you choose between them?

L1 and L2 Regularization
03:04
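
In short: L1 penalizes the sum of absolute weight values (driving some weights to exactly zero, which acts as feature selection), while L2 penalizes the sum of squared weights (shrinking them smoothly). A quick scikit-learn illustration on synthetic data, where these correspond to Lasso and Ridge regression:

    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    y = 5.0 * X[:, 0] + rng.normal(scale=0.5, size=200)   # only feature 0 actually matters

    print(Lasso(alpha=0.1).fit(X, y).coef_)   # L1: most coefficients become exactly 0
    print(Ridge(alpha=0.1).fit(X, y).coef_)   # L2: coefficients shrink but stay nonzero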

How to read and interpret various kinds of confusion matrices, allowing you to distinguish true and false positives and negatives from an overall accuracy metric.

The Confusion Matrix
05:30

We'll cover various ways to measure classifiers, including precision, recall, ROC curves, F1, RMSE, and AUC. We'll discuss how to interpret these metrics, and how to decide which one is relevant to the problem you're trying to solve.

Precision, Recall, F1, AUC, and more
06:59
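
Here's a short scikit-learn sketch covering this lecture and the confusion matrix above, using made-up labels and scores:

    from sklearn.metrics import (confusion_matrix, precision_score, recall_score,
                                 f1_score, roc_auc_score)

    y_true   = [0, 0, 1, 1, 1, 0, 1, 0]
    y_pred   = [0, 1, 1, 1, 0, 0, 1, 0]                     # hard predictions
    y_scores = [0.1, 0.6, 0.9, 0.8, 0.4, 0.2, 0.7, 0.3]     # predicted probabilities

    print(confusion_matrix(y_true, y_pred))    # rows = actual class, columns = predicted class
    print(precision_score(y_true, y_pred))     # TP / (TP + FP)
    print(recall_score(y_true, y_pred))        # TP / (TP + FN)
    print(f1_score(y_true, y_pred))            # harmonic mean of precision and recall
    print(roc_auc_score(y_true, y_scores))     # area under the ROC curve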

Bagging and boosting are two ensemble methods, and they solve very different problems.

Ensemble Methods: Bagging and Boosting
03:43

The heart of AWS's machine learning offering is SageMaker. We'll cover what it does and its architecture at a high level, and how it's used together with ECR and S3.

Preview 08:06

The Linear Learner algorithm in SageMaker is a robust means of regression or classification in systems that can be described in a linear manner.

Linear Learner in SageMaker
04:59

The XGBoost algorithm has been winning a lot of Kaggle competitions lately; if you care about accuracy, it's a great choice. SageMaker includes the open-source XGBoost algorithm; we'll cover what it does, how to use it, and how to tune it.

XGBoost in SageMaker
02:55

The Seq2Seq algorithm is commonly used for machine translation tasks. It is implemented as an RNN or CNN with attention under the hood.

Seq2Seq in SageMaker
04:47

DeepAR is a powerful RNN-based model for extrapolating time series, and sets of related time series, into the future.

DeepAR in SageMaker
04:06

BlazingText can operate in supervised mode to assign labels to sentences, or in Word2Vec mode to build an embedding layer of related words.

BlazingText in SageMaker
04:55

Object2Vec is a general mechanism for building embeddings of objects based on arbitrary pairs of data.

Object2Vec in SageMaker
04:44

The Object Detection algorithm identifies objects in images, together with their bounding boxes.

Object Detection in SageMaker
04:02

Image Classification is used to identify what objects are in an image, but without data on where those objects are within the image.

Image Classification in SageMaker
04:08

Semantic Segmentation identifies objects within images at a per-pixel level, using segmentation masks.

Semantic Segmentation in SageMaker
03:48

Random Cut Forest is Amazon's algorithm for anomaly detection in a series of data.

Random Cut Forest in SageMaker
03:01

Neural Topic Modeling is a neural network-based technique for clustering documents into a specific number of topics, in an unsupervised manner.

Neural Topic Model in SageMaker
03:25

LDA is another topic modeling technique in SageMaker that does not rely on neural networks, but just looks at commonalities in the terms contained in documents.

Latent Dirichlet Allocation (LDA) in SageMaker
03:09

KNN is a simple method for classification or regression by just analyzing the K observations most similar to a new observation.

K-Nearest-Neighbors (KNN) in SageMaker
02:59
K-Means Clustering in SageMaker
05:00
Principal Component Analysis (PCA) in SageMaker
03:20

Factorization Machines are generally used for classification or regression of sparse data, for example in recommender systems.

Factorization Machines in SageMaker
04:11

IP Insights uses deep learning to identify anomalous behavior from IP addresses in your web log data.

IP Insights in SageMaker
02:58

We'll review how reinforcement learning (specifically Q-Learning and Markov Decision Processes) works with an example of an AI-driven video game, and cover how reinforcement learning works within SageMaker.

Reinforcement Learning in SageMaker
12:23

SageMaker has the ability to spin up multiple training jobs to automatically explore different hyperparameter settings, and settle on the best values to use for your deployed model. There are some important best practices to follow that we'll cover.

Automatic Model Tuning
05:55
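
A hedged sketch using the SageMaker Python SDK's HyperparameterTuner; the estimator, metric name, ranges, and S3 inputs below are placeholders, assuming an Estimator for the built-in XGBoost algorithm has already been configured:

    from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter

    tuner = HyperparameterTuner(
        estimator=xgb_estimator,                    # a pre-configured SageMaker Estimator (placeholder)
        objective_metric_name="validation:auc",     # the metric the tuning jobs try to optimize
        objective_type="Maximize",
        hyperparameter_ranges={
            "eta": ContinuousParameter(0.01, 0.5),
            "max_depth": IntegerParameter(3, 10),
        },
        max_jobs=10,                                # total training jobs to launch
        max_parallel_jobs=2,                        # modest parallelism lets later jobs learn from earlier ones
    )
    # tuner.fit({"train": s3_train_input, "validation": s3_validation_input})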

SageMaker integrates with Apache Spark, so you can use Spark to pre-process massive data sets, and hand off your data to SageMaker for training and deployment.

Apache Spark with SageMaker
03:17

We'll cover what's new in SageMaker for 2020 - mainly SageMaker Studio, a new ML IDE on top of SageMaker Notebooks, Experiments, Debugger, Autopilot, and Model Monitor.

SageMaker Studio, and new SageMaker features for 2020
06:06

Amazon Comprehend is a high-level NLP (natural language processing) service, capable of identifying entities, key phrases, languages, sentiment, and syntax in arbitrary text.

Amazon Comprehend
05:49

Translate is AWS's high-level service for machine translation.

Amazon Translate
01:54

Transcribe is AWS's high-level service for speech-to-text.

Amazon Transcribe
04:16

Polly is the AWS service for text-to-speech. There are many ways to control it that we'll talk about.

Preview 05:38

Rekognition is the AWS service for computer vision. It's capable of object detection, facial recognition and analysis, celebrity detection, text extraction, and more.

Amazon Rekognition
07:45

Forecast is an AWS service for time-series analysis. It can select from multiple time series prediction models to find the best one for your particular data sets.

Amazon Forecast
01:45

Lex is billed as the heart of Alexa; it's really a chatbot-building service.

Amazon Lex
03:07

We'll briefly mention Amazon Personalize, Amazon Textract, DeepRacer, and DeepLens.

The Best of the Rest: Other High-Level AWS Machine Learning Services
02:50

We'll cover the newest high-level ML services for 2020: AWS DeepComposer, Amazon Fraud Detector, Amazon CodeGuru, Contact Lens for Amazon Connect, Amazon Kendra, and Amazon Augmented AI (A2I)

New ML Services for 2020
06:18

Some examples of assembling AWS's high-level machine learning services into complete applications.

Putting them All Together
02:08

We'll set up a deep learning AMI on EC2, connect to Jupyter Notebook from our desktop, and import our deep learning CNN model to experiment with.

Lab: Tuning a Convolutional Neural Network on EC2, Part 1
08:59

We'll walk through preparing the input data for our CNN and building our initial model for it in Tensorflow and Keras.

Lab: Tuning a Convolutional Neural Network on EC2, Part 2
09:06

Next, we'll improve our model by applying dropout layers to avoid overfitting, and we'll explore the effect changing the batch size and learning rate has on our results, and why.

Lab: Tuning a Convolutional Neural Network on EC2, Part 3
06:29

Let's do a quick knowledge check of what you've learned in the modeling domain. These aren't the sorts of questions you'll get on the exam; it's just for review purposes.

Quiz: Modeling
5 questions
+ ML Implementation and Operations
11 lectures 01:02:57
Section Intro: Machine Learning Implementation and Operations
01:10

We'll go in depth on how SageMaker containers work and their expected format, and how production variants can be used to divide traffic between different versions of a model.

Preview 10:55
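
As a hedged boto3 sketch of the production-variant idea, here is an endpoint configuration that splits traffic 90/10 between two model versions (names, instance types, and weights are illustrative):

    import boto3

    sm = boto3.client("sagemaker")
    sm.create_endpoint_config(
        EndpointConfigName="my-endpoint-config",
        ProductionVariants=[
            {"VariantName": "variant-a", "ModelName": "my-model-v1",
             "InstanceType": "ml.m5.large", "InitialInstanceCount": 1,
             "InitialVariantWeight": 0.9},    # 90% of traffic
            {"VariantName": "variant-b", "ModelName": "my-model-v2",
             "InstanceType": "ml.m5.large", "InitialInstanceCount": 1,
             "InitialVariantWeight": 0.1},    # 10% of traffic (e.g., a canary)
        ],
    )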

SageMaker Neo can compile trained models into optimized code that runs on edge and embedded devices - useful when latency matters a lot. IoT Greengrass is what gets that code where it needs to be.

SageMaker On the Edge: SageMaker Neo and IoT Greengrass
04:18

We'll review some general AWS security best practices, and the specifics of how SageMaker encrypts your data at rest and in transit using KMS.

SageMaker Security: Encryption at Rest and In Transit
04:31

There are some special cases when using VPC's to keep your SageMaker environment secure. We'll also cover the IAM policies relevant to SageMaker, and how to log and monitor SageMaker with CloudTrail and CloudWatch.

SageMaker Security: VPC's, IAM, Logging, and Monitoring
04:02

Some general guidelines on choosing an instance type for SageMaker training and inference, and how to use Spot instances to reduce your training costs.

SageMaker Resource Management: Instance Types and Spot Training
03:35

Elastic Inference can accelerate deep learning inference deployments at a lower cost than deploying dedicated GPU instances. Automatic scaling can automatically add and remove inference nodes in response to load, as measured by CloudWatch. We'll also talk about ensuring your SageMaker resources are spread across multiple availability zones.

SageMaker Resource Management: Elastic Inference, Automatic Scaling, AZ's
04:34

Inference Pipelines allow you to chain together multiple containers for inference.

SageMaker Inference Pipelines
01:39

In part one, we'll spin up a SageMaker notebook and import our CNN model developed with Keras and Tensorflow.

Lab: Tuning, Deploying, and Predicting with Tensorflow on SageMaker - Part 1
05:20

In part 2, we'll test our model locally on the notebook instance, and kick off a training job using SageMaker on a P3 instance.

Lab: Tuning, Deploying, and Predicting with Tensorflow on SageMaker - Part 2
10:33

Finally, we'll deploy our model and use it to make inferences. And, we'll use SageMaker's automatic model tuning to explore the space of hyperparameters to find the best values for our model, and deploy a new, tuned model.

Lab: Tuning, Deploying, and Predicting with Tensorflow on SageMaker - Part 3
12:20

Let's do a quick knowledge check of what you've learned in the ML implementation and operations domain. These aren't the sorts of questions you'll get on the exam; it's just for review purposes.

Quiz: ML Implementation and Operations
5 questions
+ Wrapping Up
6 lectures 19:56
Section Intro: Wrapping Up
00:24

There are additional practice exams from third parties and from Amazon that are worth taking. Amazon also offers an exam guide, free sample questions, and free online courses of their own. The SageMaker developer guide is also an important resource.

More Preparation Resources
05:52

What to expect on your test day, and how to make sure you're in top form for it. We'll also cover some strategies on how to manage your time during the exam, and find the best answers.

Preview 10:04
You Made It!
00:46
Save 50% on your AWS Exam Cost!
01:41
Get an Extra 30 Minutes on your AWS Exam - Non Native English Speakers only
01:09
+ Practice Exams
2 lectures 02:31

This 10-question warmup test should give you a good idea of how prepared you really are for the full practice exam, and for the real one - without investing 3 hours in the process. We chose these questions to be representative of the domains covered by the real exam, and some of the more difficult topics you'll be expected to know on it. If you're surprised by the topics and level of detail you encounter, you know you have more preparation and studying to do.

The AWS Certified Machine Learning Specialty exam goes beyond AWS topics, and tests your knowledge of feature engineering, model tuning, and modeling, as well as how deep neural networks work. You need both expert-level knowledge of AWS's machine learning services (especially SageMaker) and expert-level knowledge of machine learning and AI in general. Many of the questions seem specifically designed to confound people who have only learned the theory of AI but have not applied it in practice.

Warmup Test: Quick Assessment
10 questions
THANK YOU!
01:32
Bonus Lecture: Get the Full 3-Hour Practice Exam
00:59