AWS Certified Machine Learning Specialty 2020 - Hands On!
4.5 (1,293 ratings)
Course Ratings are calculated from individual students’ ratings and a variety of other signals, like age of rating and reliability, to ensure that they reflect course quality fairly and accurately.
10,369 students enrolled

Learn SageMaker, feature engineering, model tuning, and the AWS machine learning ecosystem. Be prepared for the exam!
This course includes
  • 9.5 hours on-demand video
  • 2 articles
  • 1 Practice Test
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What you'll learn
  • What to expect on the AWS Certified Machine Learning Specialty exam
  • Amazon SageMaker's built-in machine learning algorithms (XGBoost, BlazingText, Object Detection, etc.)
  • Feature engineering techniques, including imputation, outliers, binning, and normalization
  • High-level ML services: Comprehend, Translate, Polly, Transcribe, Lex, Rekognition, and more
  • Data engineering with S3, Glue, Kinesis, and DynamoDB
  • Exploratory data analysis with scikit-learn, Athena, Apache Spark, and EMR
  • Deep learning and hyperparameter tuning of deep neural networks
  • Automatic model tuning and operations with SageMaker
  • L1 and L2 regularization
  • Applying security best practices to machine learning pipelines
Requirements
  • Associate-level knowledge of AWS services such as EC2
  • Some existing familiarity with machine learning
  • An AWS account is needed to perform the hands-on lab exercises

[ Updated for 2020's latest SageMaker features and new AWS ML Services. Happy learning! ]

Nervous about passing the AWS Certified Machine Learning - Specialty exam (MLS-C01)? You should be! There's no doubt it's one of the most difficult and coveted AWS certifications. A deep knowledge of AWS and SageMaker isn't enough to pass this one - you also need deep knowledge of machine learning, and the nuances of feature engineering and model tuning that generally aren't taught in books or classrooms. You just can't prepare enough for this one.

This certification prep course is taught by Frank Kane, who spent nine years working at Amazon itself in the field of machine learning. Frank took and passed this exam on the first try, and knows exactly what it takes for you to pass it yourself. Joining Frank in this course is Stephane Maarek, an AWS expert and popular AWS certification instructor on Udemy.

In addition to the 9-hour video course, a 30-minute quick assessment practice exam is included that consists of the same topics and style as the real exam. You'll also get four hands-on labs that allow you to practice what you've learned, and gain valuable experience in model tuning, feature engineering, and data engineering.

This course is structured into the four domains tested by this exam: data engineering, exploratory data analysis, modeling, and machine learning implementation and operations. Just some of the topics we'll cover include:

  • S3 data lakes

  • AWS Glue and Glue ETL

  • Kinesis Data Streams, Kinesis Data Firehose, and Kinesis Video Streams

  • DynamoDB

  • Data Pipelines, AWS Batch, and Step Functions

  • Using scikit-learn

  • Data science basics

  • Athena and QuickSight

  • Elastic MapReduce (EMR)

  • Apache Spark and MLlib

  • Feature engineering (imputation, outliers, binning, transforms, encoding, and normalization)

  • Ground Truth

  • Deep Learning basics

  • Tuning neural networks and avoiding overfitting

  • Amazon SageMaker, in depth

  • Regularization techniques

  • Evaluating machine learning models (precision, recall, F1, confusion matrix, etc.)

  • High-level ML services: Comprehend, Translate, Polly, Transcribe, Lex, Rekognition, and more

  • Security best practices with machine learning on AWS

The Machine Learning Specialty is an advanced certification, and it's best tackled by students who have already obtained associate-level certification in AWS and have some real-world industry experience. This exam is not intended for AWS beginners.

If there's a more comprehensive prep course for the AWS Certified Machine Learning - Specialty exam, we haven't seen it. Enroll now, and gain confidence as you walk into that testing center.

Who this course is for:
  • Individuals performing a development or data science role seeking certification in machine learning and AWS.
Course content
114 lectures 09:37:35
+ Introduction
3 lectures 10:01

Get the most from this course - learn how to adjust the video playback speed, enable closed captions, and ensure good video streaming.

Preview 02:10

Download the notebooks you'll need throughout the hands-on labs in this course.

Get the Course Materials
+ Data Engineering
23 lectures 01:25:59
Section Intro: Data Engineering
Amazon S3 - Overview
Amazon S3 - Storage Tiers & Lifecycle Rules
Amazon S3 Security
Kinesis Data Streams & Kinesis Data Firehose
Kinesis Data Analytics
Lab 1.2 - Kinesis Data Analytics
Kinesis Video Streams
Kinesis ML Summary
Glue Data Catalog & Crawlers
Lab 1.3 - Glue Data Catalog
Glue ETL
Lab 1.4 - Glue ETL
Lab 1.5 - Athena
Lab 1 - Cleanup
AWS Data Stores in Machine Learning
AWS Data Pipelines
AWS Batch
AWS DMS - Database Migration Services
AWS Step Functions
Full Data Engineering Pipelines
Data Engineering Summary

Reinforce your knowledge of some key points in this section.

Quiz: Data Engineering
5 questions
+ Exploratory Data Analysis
21 lectures 02:26:11
Section Intro: Data Analysis

High-level overview of how Jupyter notebooks, Pandas, Numpy, Matplotlib, Seaborn, and scikit-learn play a role in exploratory data analysis and preparing your training data for machine learning.

Python in Data Science and Machine Learning

We'll walk through a Jupyter notebook that explores, cleans, and normalizes training data to build a real machine learning model to predict if mammogram results are benign or malignant.

Example: Preparing Data for Machine Learning in a Jupyter Notebook.

We'll cover the differences between numerical, categorical, and ordinal data.

Types of Data

Topics covered include normal distributions, Poisson distributions, binomial distributions, Bernoulli distributions, and the difference between probability density functions and probability mass functions.

Preview 06:05

We'll talk about how time series data consists of separate signals from trends, seasonality, and noise.

Time Series: Trends and Seasonality

A quick overview of Amazon Athena, and how it can be used to query your unstructured, structured, or semi-structured data in S3 in a serverless setting.

Introduction to Amazon Athena

High-level features of QuickSight, Amazon's data visualization product, including its new machine learning capabilities.

Overview of Amazon QuickSight

There are lots of visualization choices: bar and line graphs, heat maps, tree maps, pivot tables, and much more, all of which are offered by QuickSight. Let's talk about how to decide which kind of graph is most appropriate for illustrating different aspects of your data.

Types of Visualizations, and When to Use Them.

How Amazon EMR works, including how a Hadoop cluster's architecture works. What is HDFS and EMRFS? What are different usage modes for EMR? How does it scale? What can it do?

Elastic MapReduce (EMR) and Hadoop Overview

How Apache Spark has supplanted MapReduce; the architecture of Spark, and its capabilities, including Spark Streaming, MLlib, GraphX, and Spark SQL. How Spark integrates with AWS and Kinesis.

Apache Spark on EMR

Zeppelin notebooks run on your EMR cluster to control Spark, but EMR notebooks can run outside of your cluster and control the provisioning of the cluster itself, too. We'll also discuss the security features available with EMR, and how to choose an instance type for the master, core, and task nodes of your cluster.

EMR Notebooks, Security, and Instance Types

We'll introduce what the world of feature engineering is all about, and why it is so important to getting good results from your machine learning models. We'll also dive into the "curse of dimensionality," and why more features usually aren't better.

Feature Engineering and the Curse of Dimensionality

A big part of feature engineering is dealing with missing data. We'll discuss various approaches, including mean imputation, dropping, and using machine learning for imputation including KNN, deep learning, and regression methods such as MICE.

Preview 08:04
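As a rough illustration of the simplest approach mentioned above, mean imputation, here is a minimal sketch using only the Python standard library. The function name `impute_mean` and the sample `ages` column are made up for illustration; they are not taken from the course notebooks.

```python
from statistics import mean

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical column with two missing entries.
ages = [34.0, None, 52.0, 46.0, None]
print(impute_mean(ages))
```

Mean imputation is quick, but it ignores relationships between features, which is one reason model-based methods such as KNN or MICE are often preferred.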

Training models with highly unbalanced data sets, such as in fraud detection where very few observations are actual fraud, is a big problem. We'll talk about ways to address this from a feature engineering standpoint, including oversampling, undersampling, and SMOTE.

Dealing with Unbalanced Data
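To make the idea of oversampling concrete, the sketch below duplicates minority-class rows at random until every class is as large as the biggest one. This is plain random oversampling, not SMOTE (which synthesizes new points rather than copying existing ones). The function name, seed, and toy fraud data are assumptions for illustration only.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Copy minority-class rows at random until all classes match the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

# Hypothetical transactions: five legitimate, one fraudulent.
rows = [[120.0], [35.5], [9.99], [410.0], [18.2], [77.0]]
labels = ["ok", "ok", "ok", "ok", "ok", "fraud"]
_, balanced = random_oversample(rows, labels)
print(Counter(balanced))
```

Oversampling should normally be applied to the training split only, so that duplicated rows do not leak into the evaluation data.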

We'll introduce how to compute variance and standard deviation, and how to identify outliers as a function of standard deviation and in box-and-whisker plots. We'll also give a shout-out to Amazon's Random Cut Forest algorithm.

Handling Outliers
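One way to picture "outliers as a function of standard deviation" is the small sketch below, which flags any value more than a chosen number of standard deviations from the mean. The threshold of 2.0, the function name, and the sample data are illustrative assumptions, not values from the course.

```python
from statistics import mean, pstdev

def find_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mu) > threshold * sigma]

# Hypothetical measurements with one obviously extreme value.
data = [10, 12, 11, 13, 12, 11, 95]
print(find_outliers(data))
```

Note that a single extreme value inflates both the mean and the standard deviation, so this rule can miss outliers in small samples; quartile-based rules like those behind box-and-whisker plots are less sensitive to that.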

We'll round out our tour of feature engineering with a discussion of binning numerical data, transforming data to create new features to discover sub-linear and super-linear patterns, one-hot encoding, scaling and normalization, and the importance of shuffling your training data.

Binning, Transforming, Encoding, Scaling, and Shuffling
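Three of the techniques named above (binning, one-hot encoding, and scaling) can be sketched in a few lines of plain Python. The helper names and the fixed bin width are assumptions for illustration; in practice a library such as scikit-learn would usually handle this.

```python
def bin_value(value, width):
    """Map a number to the lower edge of its fixed-width bin."""
    return int(value // width) * width

def one_hot(values):
    """Encode each category as a 0/1 vector, with columns in sorted category order."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

def min_max_scale(values):
    """Rescale numbers linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

print(bin_value(37, 10))
print(one_hot(["red", "blue", "red"]))
print(min_max_scale([10, 20, 30]))
```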

Humans can be the most important tool for filling in missing data, especially labels. We'll talk about how Amazon SageMaker Ground Truth manages and optimizes human labeling tasks, as well as using pre-trained services such as Rekognition and Comprehend to generate features and labels from existing data.

Amazon SageMaker Ground Truth and Label Generation

As TF-IDF (Term Frequency - Inverse Document Frequency) may be new to you, we'll start by reviewing how TF-IDF works and how it fits into a search engine solution.

Preview 06:18
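For readers new to TF-IDF, the following is a minimal sketch of one common formulation: term frequency (count divided by document length) multiplied by the natural log of (number of documents divided by the number of documents containing the term). Many variants exist (smoothing, different log bases), and the lab itself uses Spark rather than this hand-rolled function; the toy documents are invented for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per tokenized, non-empty document."""
    n_docs = len(docs)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

docs = [
    "the cat sat".split(),
    "the dog barked".split(),
    "the cat purred".split(),
]
scores = tf_idf(docs)
print(scores[1])
```

A word that appears in every document, like "the" here, scores zero, while a word unique to one document scores highest. That is what makes TF-IDF useful for ranking search results.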

As we begin our hands-on lab, we'll start by spinning up an EMR cluster configured to run Apache Spark and Zeppelin, and walk through the process for securely connecting to it from your desktop at home.

Lab: Preparing Data for TF-IDF with Spark and EMR, Part 2

In part 3, we'll use Zeppelin and Spark to explore our Wikipedia data subset, organize it into the format we need, tokenize and transform it, clean it, and finally use it to actually perform a search on our data.

Lab: Preparing Data for TF-IDF with Spark and EMR, Part 3

Let's do a quick knowledge check of what you've learned in the exploratory data analysis domain. These aren't the sorts of questions you'll get on the exam; it's just for review purposes.

Quiz: Exploratory Data Analysis
5 questions
+ Modeling
48 lectures 04:10:00

We'll cover the biological inspiration of deep learning, and how this translates to artificial neural networks.

Introduction to Deep Learning

We'll dive deep into activation functions, including linear, step, logistic / sigmoid, hyperbolic tangent, ReLU, Leaky ReLU, PReLU, Swish, and more - and how to choose between them.

Activation Functions
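A few of the activation functions listed above are simple enough to write out directly. The sketch below uses their standard textbook definitions; the default leak factor of 0.01 for Leaky ReLU is a common convention assumed here, not a value taken from the course.

```python
import math

def sigmoid(x):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    """Rectified linear unit: passes positives through, zeroes out negatives."""
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Like ReLU, but keeps a small slope `alpha` for negative inputs."""
    return x if x > 0 else alpha * x

for x in (-2.0, 0.0, 2.0):
    print(x, sigmoid(x), math.tanh(x), relu(x), leaky_relu(x))
```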

Convolutional Neural Networks, or CNNs, are inspired by the human visual cortex and are useful for object recognition and other tasks. We'll cover how they work, some popular CNN architectures such as ResNet, and how CNNs are built in Keras and TensorFlow.

Convolutional Neural Networks

Recurrent Neural Networks, or RNNs, are well suited for problems involving sequences of data, such as predicting markets or machine translation. We'll cover how RNNs work, some popular variants including LSTM and GRU, and different applications of them.

Recurrent Neural Networks

A brief discussion of EMR's built-in support for deep learning with Apache MXNet, deep learning AMIs for EC2, and EC2 instance types appropriate for deep learning.

Deep Learning on EC2 and EMR

Hyperparameter tuning of deep neural networks is a complex subject. We'll talk about how deep neural nets are trained with gradient descent, and how your choice of learning rate and batch size affects your training. Sometimes it's counter-intuitive!

Preview 04:48

Deep neural networks are prone to overfitting. We'll cover some simple regularization techniques to combat this, including dropout, early stopping, and simply using a smaller network.

Regularization Techniques for Neural Networks (Dropout, Early Stopping)

What is the vanishing gradient problem, and what can be done to combat it? Also, what's gradient checking?

Grief with Gradients: The Vanishing Gradient problem

What are L1 and L2 regularization, how do they differ, and how do you choose between them?

L1 and L2 Regularization
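As a quick reference, the sketch below computes the two penalty terms in their usual form: L1 adds the sum of absolute weights to the loss, and L2 adds the sum of squared weights, each scaled by a strength factor. The names `lam` and `weights` and the sample values are illustrative assumptions.

```python
def l1_penalty(weights, lam):
    """L1 term: lam times the sum of absolute weights (tends to drive weights to zero)."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """L2 term: lam times the sum of squared weights (shrinks all weights smoothly)."""
    return lam * sum(w * w for w in weights)

# Hypothetical weight vector and regularization strength.
weights = [0.5, -2.0, 0.0]
lam = 0.1
print(l1_penalty(weights, lam), l2_penalty(weights, lam))
```

Because the L2 term squares each weight, it punishes the large weight (-2.0) far more heavily than the small one, while the L1 term treats every unit of magnitude the same.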

How to read and interpret various kinds of confusion matrices, allowing you to distinguish true and false positives and negatives from an overall accuracy metric.

The Confusion Matrix

We'll cover various ways to evaluate models, including precision, recall, ROC curves, F1, RMSE, and AUC. We'll discuss how to interpret these metrics, and how to decide which one is relevant to the problem you're trying to solve.

Precision, Recall, F1, AUC, and more
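Precision, recall, and F1 all fall out of the confusion matrix counts. The sketch below derives them from two label lists for a binary problem; the function name and the example labels are made up for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical ground truth and predictions: 3 TP, 1 FP, 1 FN, 3 TN.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(classification_metrics(y_true, y_pred))
```

Precision matters most when false positives are costly, and recall matters most when false negatives are costly (as in fraud detection); F1 is their harmonic mean.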

Two ensemble methods are bagging and boosting, and they solve very different problems.

Ensemble Methods: Bagging and Boosting

The heart of AWS's machine learning offering is SageMaker. We'll cover what it does and its architecture at a high level, and how it's used together with ECR and S3.

Preview 08:06

The Linear Learner algorithm in SageMaker is a robust means of regression or classification in systems that can be described in a linear manner.

Linear Learner in SageMaker

The XGBoost algorithm is winning a lot of Kaggle competitions lately; if you care about accuracy, it's a great choice. SageMaker includes the open source XGBoost algorithm; we'll cover what it does, how to use it, and how to tune it.

XGBoost in SageMaker

The Seq2Seq algorithm is commonly used for machine translation tasks. It is implemented as an RNN or CNN with attention under the hood.

Seq2Seq in SageMaker

DeepAR is a powerful RNN-based model for extrapolating time series, and sets of related time series, into the future.

DeepAR in SageMaker

BlazingText can operate in supervised mode to assign labels to sentences, or in Word2Vec mode to build an embedding layer of related words.

BlazingText in SageMaker

Object2Vec is a general mechanism for building embeddings of objects based on arbitrary pairs of data.

Object2Vec in SageMaker

The Object Detection algorithm identifies objects in images, together with their bounding boxes.

Object Detection in SageMaker

Image Classification is used to identify what objects are in an image, but without data on where those objects are within the image.

Image Classification in SageMaker

Semantic Segmentation identifies objects within images at a per-pixel level, using segmentation masks.

Semantic Segmentation in SageMaker

Random Cut Forest is Amazon's algorithm for anomaly detection in a series of data.

Random Cut Forest in SageMaker

Neural Topic Modeling is a neural network-based technique for clustering documents into a specific number of topics, in an unsupervised manner.

Neural Topic Model in SageMaker

LDA is another topic modeling technique in SageMaker that does not rely on neural networks, but just looks at commonalities in the terms contained by documents.

Latent Dirichlet Allocation (LDA) in SageMaker

KNN is a simple method for classification or regression by just analyzing the K observations most similar to a new observation.

K-Nearest-Neighbors (KNN) in SageMaker
K-Means Clustering in SageMaker
Principal Component Analysis (PCA) in SageMaker

Factorization Machines are generally used for classification or regression of sparse data, for example in recommender systems.

Factorization Machines in SageMaker

IP Insights uses deep learning to identify anomalous behavior from IP addresses in your web log data.

IP Insights in SageMaker

We'll review how reinforcement learning (specifically Q-Learning and Markov Decision Processes) works with an example of an AI-driven video game, and cover how reinforcement learning works within SageMaker.

Reinforcement Learning in SageMaker

SageMaker has the ability to spin up multiple training jobs to automatically explore different hyperparameter settings, and settle on the best values to use for your deployed model. There are some important best practices to follow that we'll cover.

Automatic Model Tuning

SageMaker integrates with Apache Spark, so you can use Spark to pre-process massive data sets, and hand off your data to SageMaker for training and deployment.

Apache Spark with SageMaker

We'll cover what's new in SageMaker for 2020 - mainly SageMaker Studio, a new ML IDE on top of SageMaker Notebooks, Experiments, Debugger, Autopilot, and Model Monitor.

SageMaker Studio, and new SageMaker features for 2020

Amazon Comprehend is a high-level NLP (natural language processing) service, capable of identifying entities, key phrases, languages, sentiments, and syntax in arbitrary text.

Amazon Comprehend

Translate is AWS's high-level service for machine translation.

Amazon Translate

Transcribe is AWS's high-level service for speech-to-text.

Amazon Transcribe

Polly is the AWS service for text-to-speech. There are many ways to control it that we'll talk about.

Preview 05:38

Rekognition is the AWS service for computer vision. It's capable of object detection, facial recognition and analysis, celebrity detection, text extraction, and more.

Amazon Rekognition

Forecast is an AWS service for time-series analysis. It can select from multiple time series prediction models to find the best one for your particular data sets.

Amazon Forecast

Lex is billed as the heart of Alexa; it's really a chatbot-building service.

Amazon Lex

We'll briefly mention Amazon Personalize, Amazon Textract, DeepRacer, and DeepLens.

The Best of the Rest: Other High-Level AWS Machine Learning Services

We'll cover the newest high-level ML services for 2020: AWS DeepComposer, Amazon Fraud Detector, Amazon CodeGuru, Contact Lens for Amazon Connect, Amazon Kendra, and Amazon Augmented AI (A2I)

New ML Services for 2020

Some examples of assembling AWS's high-level machine learning services into complete applications.

Putting them All Together

We'll set up a deep learning AMI on EC2, connect to Jupyter Notebook from our desktop, and import our deep learning CNN model to experiment with.

Lab: Tuning a Convolutional Neural Network on EC2, Part 1

We'll walk through preparing the input data for our CNN and building our initial model for it in TensorFlow and Keras.

Lab: Tuning a Convolutional Neural Network on EC2, Part 2

Next, we'll improve our model by applying dropout layers to avoid overfitting, and we'll explore the effect changing the batch size and learning rate has on our results, and why.

Lab: Tuning a Convolutional Neural Network on EC2, Part 3

Let's do a quick knowledge check of what you've learned in the modeling domain. These aren't the sorts of questions you'll get on the exam; it's just for review purposes.

Quiz: Modeling
5 questions
+ ML Implementation and Operations
11 lectures 01:02:57
Section Intro: Machine Learning Implementation and Operations

We'll go in depth on how SageMaker containers work and their expected format, and how production variants can be used to divide traffic between different versions of a model.

Preview 10:55

SageMaker Neo can compile some SageMaker inference images into code that may be run on embedded devices, when latency matters a lot. IoT Greengrass is what gets the code where it needs to be.

SageMaker On the Edge: SageMaker Neo and IoT Greengrass

We'll review some general AWS security best practices, and the specifics of how SageMaker encrypts your data at rest and in transit using KMS.

SageMaker Security: Encryption at Rest and In Transit

There are some special cases when using VPCs to keep your SageMaker environment secure. We'll also cover the IAM policies relevant to SageMaker, and how to log and monitor SageMaker with CloudTrail and CloudWatch.

SageMaker Security: VPC's, IAM, Logging, and Monitoring

Some general guidelines on choosing an instance type for SageMaker training and inference, and how to use Spot instances to reduce your training costs.

SageMaker Resource Management: Instance Types and Spot Training

Elastic Inference can accelerate deep learning inference deployments at a lower cost than deploying dedicated GPU instances. Automatic scaling can automatically add and remove inference nodes in response to load, as measured by CloudWatch. We'll also talk about ensuring your SageMaker resources are spread across multiple availability zones.

SageMaker Resource Management: Elastic Inference, Automatic Scaling, AZ's

Inference Pipelines allow you to chain together multiple containers for inference.

SageMaker Inference Pipelines

In part 1, we'll spin up a SageMaker notebook and import our CNN model developed with Keras and TensorFlow.

Lab: Tuning, Deploying, and Predicting with Tensorflow on SageMaker - Part 1

In part 2, we'll test our model locally on the notebook instance, and kick off a training job using SageMaker on a P3 instance.

Lab: Tuning, Deploying, and Predicting with Tensorflow on SageMaker - Part 2

Finally, we'll deploy our model and use it to make inferences. And, we'll use SageMaker's automatic model tuning to explore the space of hyperparameters to find the best values for our model, and deploy a new, tuned model.

Lab: Tuning, Deploying, and Predicting with Tensorflow on SageMaker - Part 3

Let's do a quick knowledge check of what you've learned in the ML implementation and operations domain. These aren't the sorts of questions you'll get on the exam; it's just for review purposes.

Quiz: ML Implementation and Operations
5 questions
+ Wrapping Up
6 lectures 19:56
Section Intro: Wrapping Up

There are additional practice exams from third parties and from Amazon that are worth taking. Amazon also offers an exam guide, free sample questions, and free online courses of their own. The SageMaker developer guide is also an important resource.

More Preparation Resources

What to expect on your test day, and how to make sure you're in top form for it. We'll also cover some strategies on how to manage your time during the exam, and find the best answers.

Preview 10:04
You Made It!
Save 50% on your AWS Exam Cost!
Get an Extra 30 Minutes on your AWS Exam - Non Native English Speakers only
+ Practice Exams
2 lectures 02:30

This 10-question warmup test should give you a good idea of how prepared you really are for the full practice exam, and for the real one - without investing 3 hours in the process. We chose these questions to be representative of the domains covered by the real exam, and some of the more difficult topics you'll be expected to know on it. If you're surprised by the topics and level of detail you encounter, you know you have more preparation and studying to do.

The AWS Certified Machine Learning Specialty exam goes beyond AWS topics, and tests your knowledge in feature engineering, model tuning, and modeling as well as how deep neural networks work. You need both expert-level knowledge of AWS's machine learning services (especially SageMaker) and expert-level knowledge of machine learning and AI in general. Many of the questions seem specifically designed to confound people who have only learned the theory of AI but have not applied it in practice.

Warmup Test: Quick Assessment
10 questions
Bonus Lecture: Get the Full 3-Hour Practice Exam