
Get the most from this course - learn how to adjust the video playback speed, enable closed captions, and ensure good video streaming.
Download the notebooks you'll need throughout the hands-on labs in this course.
High-level overview of how Jupyter notebooks, Pandas, NumPy, Matplotlib, Seaborn, and scikit-learn play a role in exploratory data analysis and preparing your training data for machine learning.
We'll walk through a Jupyter notebook that explores, cleans, and normalizes training data to build a real machine learning model to predict if mammogram results are benign or malignant.
We'll cover the differences between numerical, categorical, and ordinal data.
Topics covered include normal distributions, Poisson distributions, binomial distributions, Bernoulli distributions, and the difference between probability density functions and probability mass functions.
We'll talk about how time series data consists of separate signals from trends, seasonality, and noise.
A quick overview of Amazon Athena, and how it can be used to query your unstructured, structured, or semi-structured data in S3 in a serverless setting.
High-level features of QuickSight, Amazon's data visualization product, including its new machine learning capabilities.
There are lots of visualization choices: bar and line graphs, heat maps, tree maps, pivot tables, and much more, all of which are offered by QuickSight. Let's talk about how to decide which kind of graph is most appropriate for illustrating different aspects of your data.
How Amazon EMR works, including how a Hadoop cluster's architecture works. What are HDFS and EMRFS? What are the different usage modes for EMR? How does it scale? What can it do?
How Apache Spark has supplanted MapReduce; the architecture of Spark, and its capabilities, including Spark Streaming, MLlib, GraphX, and Spark SQL. How Spark integrates with AWS and Kinesis.
Zeppelin notebooks run on your EMR cluster to control Spark, but EMR notebooks can run outside of your cluster and control the provisioning of the cluster itself, too. We'll also discuss the security features available with EMR, and how to choose an instance type for the master, core, and task nodes of your cluster.
We'll introduce what feature engineering is all about, and why it is so important to getting good results from your machine learning models. We'll also dive into the "curse of dimensionality," and why more features usually aren't better.
A big part of feature engineering is dealing with missing data. We'll discuss various approaches, including mean imputation, dropping, and using machine learning for imputation including KNN, deep learning, and regression methods such as MICE.
Training models with highly unbalanced data sets, such as in fraud detection where very few observations are actual fraud, is a big problem. We'll talk about ways to address this from a feature engineering standpoint, including oversampling, undersampling, and SMOTE.
We'll introduce how to compute variance and standard deviation, and how to identify outliers as a function of standard deviation and in box-and-whisker plots. We'll also give a shout-out to Amazon's Random Cut Forest algorithm.
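To make the outlier idea concrete, here is a minimal Python sketch using only the standard library. It is not taken from the course notebooks: the `incomes` values and the two-standard-deviation threshold are assumptions chosen for illustration, and it uses the population variance (dividing by N).

```python
import math

def variance(values):
    """Population variance: the mean of the squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

def std_dev(values):
    """Standard deviation is the square root of the variance."""
    return math.sqrt(variance(values))

def find_outliers(values, num_std=2.0):
    """Return the values lying more than num_std standard deviations from the mean."""
    mean = sum(values) / len(values)
    sigma = std_dev(values)
    return [x for x in values if abs(x - mean) > num_std * sigma]

# Hypothetical data: seven ordinary values and one extreme one.
incomes = [30, 32, 35, 31, 29, 33, 34, 500]
print(find_outliers(incomes))
```

Note that a single extreme value inflates the standard deviation itself, which is one reason the choice of threshold is a judgment call.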
We'll round out our tour of feature engineering with a discussion of binning numerical data, transforming data to create new features to discover sub-linear and super-linear patterns, one-hot encoding, scaling and normalization, and the importance of shuffling your training data.
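Two of these transformations, one-hot encoding and min-max scaling, are small enough to sketch in plain Python. This is an illustrative sketch only; the function names and sample values are assumptions, and in practice a library such as scikit-learn would normally do this work.

```python
def one_hot(values):
    """Encode each categorical value as a 0/1 vector, one slot per category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return encoded, categories

def min_max_scale(values):
    """Rescale numeric values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: there is no spread to scale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical feature columns.
encoded, categories = one_hot(["red", "green", "red"])
print(categories, encoded)
print(min_max_scale([10, 20, 30]))
```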
Humans can be the most important tool for creating missing data, especially labels. We'll talk about how Amazon SageMaker Ground Truth manages human labeling tasks and optimizes them, as well as using unsupervised techniques such as Rekognition and Comprehend to fabricate features and labels from existing data.
As TF-IDF (Term Frequency - Inverse Document Frequency) may be new to you, we'll start by reviewing how TF-IDF works and how it fits into a search engine solution.
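As a rough illustration of the idea, the sketch below scores terms with one common TF-IDF variant: relative term frequency multiplied by log(N / document frequency). It is not the PySpark approach used in the lab; the tiny `docs` corpus, the whitespace tokenizer, and the `search` helper are all assumptions made for the example, and documents are assumed to be non-empty.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: score} dict per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

def search(query, documents):
    """Rank document indices by the summed TF-IDF score of the query terms."""
    scores = tf_idf(documents)
    terms = query.lower().split()
    return sorted(
        range(len(documents)),
        key=lambda i: sum(scores[i].get(t, 0.0) for t in terms),
        reverse=True,
    )

docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "spark processes the data",
]
print(search("dog", docs))
```

A term such as "the" that appears in every document gets an IDF of log(1) = 0, so it contributes nothing to the ranking.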
We'll walk through the process of creating a JupyterLab PySpark Notebook within an EMR Workspace, backed by an EMR EC2 cluster, within an EMR Studio environment. We'll use this notebook to pre-process real Wikipedia data, build a TF-IDF model around it, and use it for actual search.
We'll cover the biological inspiration of deep learning, and how this translates to artificial neural networks.
We'll dive deep into activation functions, including linear, step, logistic / sigmoid, hyperbolic tangent, ReLU, Leaky ReLU, PReLU, Swish, and more - and how to choose between them.
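For reference, several of these activation functions can be written in a few lines of plain Python. This is a scalar sketch for intuition only; deep learning frameworks apply the same formulas to whole tensors. The default `alpha=0.01` is an assumption, and PReLU has the same form as Leaky ReLU except that `alpha` is learned during training.

```python
import math

def sigmoid(x):
    """Logistic function, squashing any input into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # Rearranged form avoids overflow for large negative inputs.
    z = math.exp(x)
    return z / (1.0 + z)

def tanh(x):
    """Hyperbolic tangent, squashing any input into (-1, 1)."""
    return math.tanh(x)

def relu(x):
    """Rectified linear unit: passes positives through, zeroes out negatives."""
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Like ReLU, but negatives keep a small slope alpha instead of going to zero."""
    return x if x > 0 else alpha * x

def swish(x):
    """Swish: the input multiplied by its own sigmoid."""
    return x * sigmoid(x)

for fn in (sigmoid, tanh, relu, leaky_relu, swish):
    print(fn.__name__, [round(fn(x), 4) for x in (-2.0, 0.0, 2.0)])
```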
Convolutional Neural Networks, or CNNs, are inspired by the human visual cortex and are useful for object recognition and other tasks. We'll cover how they work, some popular CNN architectures such as ResNet, and how CNNs are built in Keras and TensorFlow.
Recurrent Neural Networks, or RNNs, are well suited for problems involving sequences of data, such as predicting markets or machine translation. We'll cover how RNNs work, some popular variants including LSTM and GRU, and different applications of them.
A brief discussion of EMR's built-in support for deep learning with Apache MXNet, deep learning AMIs for EC2, and EC2 instance types appropriate for deep learning.
Hyperparameter tuning of deep neural networks is a complex subject. We'll talk about how deep neural nets are trained with gradient descent, and how your choice of learning rate and batch size affects your training. Sometimes it's counter-intuitive!
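The effect of the learning rate can be seen even on a toy problem. The sketch below runs plain gradient descent on the one-dimensional function f(w) = (w - 3)^2, which is an assumption chosen for illustration and not a neural network; it does not model batch size at all.

```python
def gradient_descent(learning_rate, steps=50, start=0.0):
    """Minimize f(w) = (w - 3)^2 and return the final value of w."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * (w - 3.0)   # derivative of (w - 3)^2
        w -= learning_rate * gradient
    return w

# A modest learning rate converges toward the minimum at w = 3.
print(gradient_descent(0.1))
# Too large a learning rate overshoots further on every step and diverges.
print(gradient_descent(1.1))
```

Each step multiplies the distance from the minimum by (1 - 2 * learning_rate), so any rate above 1.0 makes that distance grow instead of shrink on this particular function.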
Deep neural networks are prone to overfitting. We'll cover some simple regularization techniques to combat this, including dropout, early stopping, and simply using a smaller network.
What are L1 and L2 regularization, how do they differ, and how do you choose between them?
What is the vanishing gradient problem, and what can be done to combat it? Also, what's gradient checking?
How to read and interpret various kinds of confusion matrices, allowing you to distinguish true and false positives and negatives from an overall accuracy metric.
We'll cover various ways to measure classifiers, including precision, recall, ROC curves, F1, RMSE, and AUC. We'll discuss how to interpret these metrics, and how to decide which one is relevant to the problem you're trying to solve.
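As a small worked example, the sketch below derives precision, recall, F1, and accuracy from the four cells of a binary confusion matrix. The labels in `y_true` and `y_pred` are made up for illustration, with 1 as the positive class.

```python
def classification_metrics(y_true, y_pred):
    """Compute confusion-matrix counts and common metrics for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}

# Hypothetical labels: 4 actual positives, 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(classification_metrics(y_true, y_pred))
```

Here accuracy is 0.7 even though the model misses half of the positives (recall 0.5), which is why accuracy alone can mislead on unbalanced data.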
Two ensemble methods are bagging and boosting, and they solve very different problems.
The heart of AWS's machine learning offering is SageMaker. We'll cover what it does and its architecture at a high level, and how it's used together with ECR and S3.
The Linear Learner algorithm in SageMaker is a robust means of regression or classification in systems that can be described in a linear manner.
The XGBoost algorithm is winning a lot of Kaggle competitions lately; if you care about accuracy, it's a great choice. SageMaker includes the open source XGBoost algorithm; we'll cover what it does, how to use it, and how to tune it.
The Seq2Seq algorithm is commonly used for machine translation tasks. It is implemented as an RNN or CNN with attention under the hood.
DeepAR is a powerful RNN-based model for extrapolating time series, and sets of related time series, into the future.
BlazingText can operate in supervised mode to assign labels to sentences, or in Word2Vec mode to build an embedding layer of related words.
Object2Vec is a general mechanism for building embeddings of objects based on arbitrary pairs of data.
The Object Detection algorithm identifies objects in images, together with their bounding boxes.
Image Classification is used to identify what objects are in an image, but without data on where those objects are within the image.
Semantic Segmentation identifies objects within images at a per-pixel level, using segmentation masks.
Random Cut Forest is Amazon's algorithm for anomaly detection in a series of data.
Neural Topic Modeling is a neural network-based technique for clustering documents into a specific number of topics, in an unsupervised manner.
LDA is another topic modeling technique in SageMaker that does not rely on neural networks, but just looks at commonalities in the terms contained by documents.
KNN is a simple method for classification or regression by just analyzing the K observations most similar to a new observation.
Factorization Machines are generally used for classification or regression of sparse data, for example in recommender systems.
IP Insights uses deep learning to identify anomalous behavior from IP addresses in your web log data.
We'll review how reinforcement learning (specifically Q-Learning and Markov Decision Processes) works with an example of an AI-driven video game, and cover how reinforcement learning works within SageMaker.
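The core of Q-Learning is a single update rule, sketched below on a toy problem. The five-cell corridor environment, the reward of 1 at the goal, and the hyperparameter values are all assumptions made for this illustration; it is not the video game example from the lecture and does not use SageMaker.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)  # action 0 moves left, action 1 moves right

def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table of Q-values with epsilon-greedy tabular Q-Learning."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            # Explore with probability epsilon (or when the values are tied);
            # otherwise exploit the action with the higher Q-value.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
            reward = 1.0 if next_state == N_STATES - 1 else 0.0
            # Q-Learning update: nudge Q(s, a) toward the reward plus the
            # discounted value of the best action in the next state.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = train()
policy = ["left" if left > right else "right" for left, right in q_table[:-1]]
print(policy)
```

After training, the greedy policy moves right in every non-goal cell, and the learned values fall off by the discount factor `gamma` with each extra step from the goal.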
SageMaker has the ability to spin up multiple training jobs to automatically explore different hyperparameter settings, and settle on the best values to use for your deployed model. There are some important best practices to follow that we'll cover.
SageMaker integrates with Apache Spark, so you can use Spark to pre-process massive data sets, and hand off your data to SageMaker for training and deployment.
We'll cover what's new in SageMaker for 2020 - mainly SageMaker Studio, a new ML IDE on top of SageMaker Notebooks, Experiments, Debugger, Autopilot, and Model Monitor.
Amazon Comprehend is a high-level NLP (natural language processing) service, capable of identifying entities, key phrases, languages, sentiments, and syntax in arbitrary text.
Translate is AWS's high-level service for machine translation.
Transcribe is AWS's high-level service for speech-to-text.
Polly is the AWS service for text-to-speech. There are many ways to control it that we'll talk about.
Rekognition is the AWS service for computer vision. It's capable of object detection, facial recognition and analysis, celebrity detection, text extraction, and more.
Forecast is an AWS service for time-series analysis. It can select from multiple time series prediction models to find the best one for your particular data sets.
Lex is billed as the heart of Alexa; it's really a chatbot-building service.
We'll briefly mention Amazon Personalize, Amazon Textract, DeepRacer, and DeepLens.
We'll cover the newest high-level ML services for 2020: AWS DeepComposer, Amazon Fraud Detector, Amazon CodeGuru, Contact Lens for Amazon Connect, Amazon Kendra, and Amazon Augmented AI (A2I).
Some examples of assembling AWS's high-level machine learning services into complete applications.
We'll set up a deep learning AMI on EC2, connect to Jupyter Notebook from our desktop, and import our deep learning CNN model to experiment with.
We'll walk through preparing the input data for our CNN and building our initial model for it in TensorFlow and Keras.
Next, we'll improve our model by applying dropout layers to avoid overfitting, and we'll explore the effect changing the batch size and learning rate has on our results, and why.
We'll go in depth on how SageMaker containers work and their expected format, and how production variants can be used to divide traffic between different versions of a model.
SageMaker Neo can compile some SageMaker inference images into code that may be run on embedded devices, when latency matters a lot. IoT Greengrass is what gets the code where it needs to be.
We'll review some general AWS security best practices, and the specifics of how SageMaker encrypts your data at rest and in transit using KMS.
There are some special cases when using VPCs to keep your SageMaker environment secure. We'll also cover the IAM policies relevant to SageMaker, and how to log and monitor SageMaker with CloudTrail and CloudWatch.
Some general guidelines on choosing an instance type for SageMaker training and inference, and how to use Spot instances to reduce your training costs.
Elastic Inference can accelerate deep learning inference deployments at a lower cost than deploying dedicated GPU instances. Automatic scaling can automatically add and remove inference nodes in response to load, as measured by CloudWatch. We'll also talk about ensuring your SageMaker resources are spread across multiple availability zones.
Inference Pipelines allow you to chain together multiple containers for inference.
In part one, we'll spin up a SageMaker notebook and import our CNN model developed with Keras and TensorFlow.
In part two, we'll test our model locally on the notebook instance, and kick off a training job using SageMaker on a P3 instance.
Finally, we'll deploy our model and use it to make inferences. And, we'll use SageMaker's automatic model tuning to explore the space of hyperparameters to find the best values for our model, and deploy a new, tuned model.
Understand how the Transformer architecture evolved from earlier work with Recurrent Neural Networks, and how self-attention and attention-based neural networks allowed massive parallel training of large language models (LLMs).
Understand exactly how self-attention, masked self-attention, and multi-headed self-attention neural networks are trained and used to establish the meaning of words within their context. Also, how self-attention is a part of Transformers (such as GPT) and large language models.
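The sketch below shows the arithmetic of single-head masked self-attention (scaled dot-product attention with a causal mask) in plain Python. As a simplifying assumption, the same toy vectors are used directly as queries, keys, and values; a real Transformer first passes its input through learned projection matrices and runs several such heads in parallel.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def masked_self_attention(queries, keys, values):
    """Each position attends only to itself and to earlier positions."""
    d_k = len(keys[0])
    outputs = []
    for i, query in enumerate(queries):
        # Causal mask: only keys at positions 0..i are scored at all.
        scores = [
            sum(a * b for a, b in zip(query, keys[j])) / math.sqrt(d_k)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # The output is the attention-weighted average of the visible values.
        outputs.append([
            sum(w * values[j][c] for j, w in enumerate(weights))
            for c in range(len(values[0]))
        ])
    return outputs

# Three hypothetical 2-dimensional token embeddings.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in masked_self_attention(x, x, x):
    print([round(v, 3) for v in row])
```

The first position can only see itself, so its output equals its own value vector; later positions blend in information from everything before them.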
Understand how the Transformer architecture for deep learning can be used for chat, Q&A, classification, named entity recognition, summarization, translation, code generation, and text generation.
Learn how GPT (GPT-2, GPT-3.5, GPT-4) works under the hood, including tokenization, token embedding, positional encoding, and stacks of Decoders consisting of masked self-attention and feed-forward neural networks (FFNNs).
To conclude our look at how GPT works, we'll focus on the output processing that happens after the final Decoder block, and how it produces a new token embedding and logits that can be used to predict the next token (word) in a sequence.
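The last step, going from logits to a predicted token, can be sketched as a softmax followed by picking the most probable entry. The three-word `vocab` and the logit values are invented for the example, and greedy selection is only one possible decoding strategy; real models use vocabularies of tens of thousands of tokens and often sample from the distribution instead.

```python
import math

def softmax(logits):
    """Convert logits into a probability distribution over the vocabulary."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token(logits, vocab):
    """Greedy decoding: return the most probable token and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return vocab[best], probs[best]

# Hypothetical vocabulary and logits for the next position.
vocab = ["cat", "dog", "mat"]
logits = [1.0, 0.5, 3.0]
token, probability = next_token(logits, vocab)
print(token, round(probability, 3))
```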
Learn how fine-tuning can be applied to transformers (such as GPT) to adapt them to specific tasks, through transfer learning.
Launch a SageMaker Notebook integrated with Hugging Face to explore tokenization and positional encoding components of the Transformer architecture.
See and visualize multi-headed self-attention using a SageMaker Notebook and Hugging Face.
We'll illustrate importing the GPT-2 transformer from Hugging Face into a SageMaker Notebook, and using it to generate text.
Learn how AWS is starting to incorporate generative AI with AWS Foundation Models (Jurassic-2, Claude, Stable Diffusion, Amazon Titan), and how to quickly deploy and use them with SageMaker JumpStart.
We'll load a GPT-2 Foundation Model using SageMaker JumpStart, use it, and discuss how to fine-tune it within a SageMaker Notebook.
Understand what the upcoming Amazon Bedrock service promises to be: a serverless interface to AWS foundation models for generative AI within SageMaker.
Learn the features and pricing of Amazon CodeWhisperer, a coding assistant powered by generative AI and AWS.
There are additional practice exams from third parties and from Amazon that are worth taking. Amazon also offers an exam guide, free sample questions, and free online courses of their own. The SageMaker developer guide is also an important resource.
What to expect on your test day, and how to make sure you're in top form for it. We'll also cover some strategies on how to manage your time during the exam, and find the best answers.
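[Please read] What is a Korean AI-subtitled course?
This course is offered with Udemy's Korean [auto-generated] AI subtitle service.
Please post any questions about the course in English on the Q&A board so that the instructors, Frank and Stephane, can see them.
Course update: content on the latest SageMaker features, generative AI (GPT), and new AWS ML services has been added. Enjoy your learning!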
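Worried about whether you can pass the AWS Certified Machine Learning - Specialty (MLS-C01) certification exam? You should be! There is no doubt that it is one of the most difficult and most valuable AWS certifications. Deep knowledge of AWS and SageMaker is not enough to pass this exam; you also need deep knowledge of machine learning, and an understanding of the nuances of feature engineering and model tuning that are not taught in books or classrooms. No matter how much you study, it is not easy to walk into this exam fully prepared.
This certification prep course is taught by Frank Kane, who spent nine years at Amazon working in machine learning. Frank passed this exam on his first attempt, and knows exactly what it takes to pass. Joining Frank as co-instructor is Stephane Maarek, an AWS expert and a popular AWS certification instructor on Udemy.
In addition to 14 hours of video lectures, the course includes a quick 30-minute practice test built around the same topics and style as the real exam. You also get four hands-on labs where you can practice what you have learned and gain valuable experience in model tuning, feature engineering, and data engineering.
This course is organized around the four domains tested by the exam: data engineering, exploratory data analysis, modeling, and machine learning implementation and operations. Topics covered on the exam include: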
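How generative AI and large language models (LLMs) work, including the Transformer architecture (GPT) and attention-based neural networks (masked self-attention)
Amazon's latest generative AI services: Bedrock, SageMaker JumpStart for generative AI, CodeWhisperer, and SageMaker foundation models
S3 data lakes
AWS Glue and Glue ETL
Kinesis Data Streams, Firehose, and Video Streams
DynamoDB
Data Pipeline, AWS Batch, and Step Functions
Using scikit-learn
Data science basics
Athena and QuickSight
Elastic MapReduce (EMR)
Apache Spark and MLlib
Feature engineering (imputation, outliers, binning, transforms, encoding, and normalization)
Ground Truth
Deep learning basics
Tuning neural networks and avoiding overfitting
Amazon SageMaker, including SageMaker Studio, SageMaker Model Monitor, SageMaker Autopilot, and SageMaker Debugger
Regularization techniques
Evaluating machine learning models (precision, recall, F1, confusion matrix, etc.)
High-level ML services: Comprehend, Translate, Polly, Transcribe, Lex, Rekognition, and more
Building recommender systems with Amazon Personalize
Monitoring industrial equipment with Lookout and Monitron
Security best practices with machine learning on AWS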
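Machine learning is an advanced certification, and it is best tackled by those who have already earned an associate-level AWS certification and have real-world industry experience. This exam is not intended for AWS beginners.
We have yet to see a more comprehensive course for AWS Certified Machine Learning - Specialty exam preparation than this one. Enroll now, and walk into the exam with confidence!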
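About the instructor - Stéphane Maarek
I'm Stéphane Maarek, an instructor for this course with a passion for cloud computing. I teach AWS certifications, and I focus on helping students improve their professional proficiency in AWS.
I have already taught more than 1,500,000 students and received more than 500,000 reviews in the course of designing and delivering these certifications and courses!
With AWS becoming the center of modern IT architecture today, I decided it was time for students to learn how to become AWS machine learning experts. So, let's get started! You are in good hands!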
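About the instructor - Frank Kane
Hi, I'm Frank Kane, an instructor for this course. I spent nine years at Amazon as a senior engineer and senior manager, specializing in recommender systems and machine learning. As an instructor, I'm best known for my best-selling courses on "big data", data analytics, machine learning, Apache Spark, system design, technical management and career growth, and Elasticsearch.
I have been teaching on Udemy since 2015, and have reached more than 700,00 students around the world!
I've worked hard to keep this course up to date with the latest developments in AWS machine learning, and to prepare you for the latest version of the exam. Enroll now and get ready for the exam!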
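This course also comes with:
Lifetime access to all future updates
A responsive instructor in the Q&A section
A downloadable Udemy certificate of completion
A 30-day "no questions asked" money-back guarantee!
If you want to prepare for the AWS machine learning certification and master the AWS platform, enroll in this course now!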