
After a brief introduction to the course, we'll dive right in and install what you need: Anaconda (your Python development environment), the course materials, and the MovieLens data set of 100,000 real movie ratings from real people. We'll then run a quick example to generate movie recommendations using the SVD algorithm, to make sure it all works!
We'll just lay out the structure of the course so you know what to expect later on (and when you'll start writing some code of your own!) Also, we'll provide advice on how to navigate this course depending on your prior experience.
The phrase "recommender system" sounds more general than it really is. Let's briefly clarify what a recommender system is - and more importantly, what it is not.
There are many different flavors of recommender systems, and you encounter them every day. Let's review some of the applications of recommender systems in the real world.
How do recommender systems learn about your individual tastes and preferences? We'll explain how both explicit ratings and implicit ratings work, and the strengths and weaknesses of both.
Most real-world recommender systems are "Top-N" systems that produce a list of top results for each individual. There are a couple of main architectural approaches to building them, which we'll review here.
We'll review what we've covered in this section with a quick 4-question quiz, and discuss the answers.
After installing Jupyter Notebook, we'll cover the basics of what's different about Python, including its use of whitespace. We'll dissect a simple function to get a feel for what Python code looks like.
We'll look at using lists, tuples, and dictionaries in Python.
We'll see how to define a function in Python, and how Python lets you pass functions to other functions. We'll also look at a simple example of a lambda function.
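As a rough illustration of passing functions to other functions, here is a minimal sketch. The names `square` and `apply_twice` are made up for this example and are not taken from the course materials.

```python
def square(x):
    """Return x multiplied by itself."""
    return x * x


def apply_twice(func, value):
    """Call func on value, then call func again on the result."""
    return func(func(value))


# Pass a named function as an argument: square(square(3)).
squared_twice = apply_twice(square, 3)

# Pass an anonymous lambda function instead: (1 + 5) + 5.
plus_ten = apply_twice(lambda x: x + 5, 1)

print(squared_twice, plus_ten)
```

A lambda is handy when the function is short and only needed once, as in the second call.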
We'll look at how Boolean expressions work in Python as well as loops. Then, we'll give you a challenge to write a simple Python function on your own!
Learn about different testing methodologies for evaluating recommender systems offline, including train/test splits, k-fold cross-validation, and leave-one-out cross-validation.
Learn about Root Mean Squared Error, Mean Absolute Error, and why we use these measures of recommendation prediction accuracy.
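To make the two measures concrete, here is a small sketch in plain Python. The rating lists are invented for illustration, and the function names `mae` and `rmse` are our own, not those of the course's modules.

```python
import math


def mae(predicted, actual):
    """Mean Absolute Error: the average size of the prediction errors."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root Mean Squared Error: like MAE, but large errors are penalized more."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(actual))


# Made-up predicted and actual star ratings for four movies.
predicted = [4.0, 3.0, 5.0, 1.0]
actual = [3.0, 3.0, 4.0, 4.0]

print("MAE: ", mae(predicted, actual))   # errors 1, 0, 1, 3 average to 1.25
print("RMSE:", rmse(predicted, actual))  # sqrt(2.75), about 1.66
```

The single three-star miss pulls RMSE well above MAE, which is the reason to prefer RMSE when large errors are especially costly.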
Learn about several ways to measure the accuracy of top-N recommenders, including hit rate, cumulative hit rate, average reciprocal hit rank, rating hit rate, and more.
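As a sketch of how two of these metrics might be computed, assume each user has one rating held out (leave-one-out) and a top-N list generated from the remaining data. The users, movie IDs, and function names below are hypothetical.

```python
def hit_rate(top_n, left_out):
    """Fraction of users whose held-out movie appears in their top-N list."""
    hits = sum(1 for user, movie in left_out.items()
               if movie in top_n.get(user, []))
    return hits / len(left_out)


def average_reciprocal_hit_rank(top_n, left_out):
    """Like hit rate, but a hit at rank r only counts as 1/r."""
    total = 0.0
    for user, movie in left_out.items():
        recommendations = top_n.get(user, [])
        if movie in recommendations:
            rank = recommendations.index(movie) + 1  # ranks start at 1
            total += 1.0 / rank
    return total / len(left_out)


# Hypothetical top-3 lists and the single held-out movie for each user.
top_n = {
    "alice": [10, 20, 30],
    "bob": [40, 50, 60],
    "carol": [70, 80, 90],
    "dave": [11, 12, 13],
}
left_out = {"alice": 20, "bob": 99, "carol": 70, "dave": 13}

print(hit_rate(top_n, left_out))                     # 3 of 4 users are hits
print(average_reciprocal_hit_rank(top_n, left_out))  # rewards hits near the top
```

Hit rate treats every hit equally, while the reciprocal-rank variant gives more credit when the held-out movie lands near the top of the list.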
Learn how to measure the coverage of your recommender system, how diverse its results are, and how novel its results are.
Measure how often your recommendations change (churn), how quickly they respond to new data (responsiveness), and why no metric matters more than the results of real, online A/B tests. We'll also talk about perceived quality, where you explicitly ask your users to rate your recommendations.
In this short quiz, we'll review what we've learned about different ways to measure the qualities and accuracy of your recommender system.
Let's walk through this course's Python module for implementing the metrics we've discussed in this section on real recommender systems.
We'll walk through our sample code to apply our RecommenderMetrics module to a real SVD recommender using real MovieLens rating data, and measure its performance in many different ways.
After running TestMetrics.py, we'll look at the results for our SVD recommender, and discuss how to interpret them.
Let's review the architecture of our recommender engine framework, which will let us easily implement, test, and compare different algorithms throughout the rest of this course.
In part one of the code walkthrough of our recommender engine, we'll see how it's used, and dive into the Evaluator class.
In part two of the walkthrough, we'll dive into the EvaluationData class, and kick off a test with the SVD recommender.
Wrapping up our review of our recommender system architecture, we'll look at the results of using our framework to evaluate the SVD algorithm, and interpret them.
We'll talk about how content-based recommendations work, and introduce the cosine similarity metric. Cosine scores will be used throughout the course, and understanding their mathematical basis is important.
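Here is a minimal sketch of the cosine similarity metric applied to movie genres. It assumes each movie is encoded as a 0/1 vector over a fixed list of genres; the movies and the encoding are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical genre vectors in the order [action, comedy, sci-fi, romance].
movie_a = [1, 0, 1, 0]  # action, sci-fi
movie_b = [1, 0, 1, 0]  # same genres as movie_a
movie_c = [0, 1, 0, 1]  # no genres in common with movie_a
movie_d = [1, 1, 0, 0]  # shares one of its two genres with movie_a

sim_ab = cosine_similarity(movie_a, movie_b)  # identical genres: close to 1
sim_ac = cosine_similarity(movie_a, movie_c)  # nothing shared: 0
sim_ad = cosine_similarity(movie_a, movie_d)  # partial overlap: about 0.5
print(sim_ab, sim_ac, sim_ad)
```

With non-negative vectors like these, the score always falls between 0 (nothing in common) and 1 (same direction).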
We'll cover how to factor time into our content-based recs, and how the concept of KNN will allow us to make rating predictions using only similarity scores computed from genres and release dates.
We'll look at some code for producing movie recommendations based on their genres and years, and evaluate the results using the MovieLens data set.
A common point of confusion is how to use implicit ratings, such as purchase or click data, with the algorithms we're talking about. It's pretty simple, but let's cover it here.
In our first "bleeding edge alert," we'll examine the use of Mise en Scene data for providing additional content-based information to our recommendations. And, we'll turn the idea into code, and evaluate the results.
In two different hands-on exercises, dive into which content attributes provide the best recommendations - and try augmenting our content-based recommendations using popularity data.
Similarity between users or items is at the heart of all neighborhood-based approaches; we'll discuss how similarity measures fit into our architecture, and the effect data sparsity has on it.
We'll cover different ways of measuring similarity, including cosine, adjusted cosine, Pearson, Spearman, Jaccard, and more - and how to know when to use each one.
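As one example from that list, the Jaccard measure can be sketched in a few lines. It looks only at which items two users have interacted with and ignores the rating values, which makes it a natural fit for implicit data such as clicks or purchases. The users and item IDs here are hypothetical.

```python
def jaccard_similarity(items_a, items_b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(items_a), set(items_b)
    union = a | b
    if not union:
        return 0.0  # two empty sets: define the similarity as 0
    return len(a & b) / len(union)


# Hypothetical sets of movie IDs each user has watched.
alice_watched = {1, 2, 3, 4}
bob_watched = {3, 4, 5}

# They share 2 movies out of 5 distinct movies overall.
similarity = jaccard_similarity(alice_watched, bob_watched)
print(similarity)
```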
We'll illustrate how user-based collaborative filtering works, where we recommend stuff that people similar to you liked.
Let's write some code to apply user-based collaborative filtering to the MovieLens data set, run it, and evaluate the results.
We'll talk about the advantages of flipping user-based collaborative filtering on its head, to give us item-based collaborative filtering - and how it works.
Let's write, run, and evaluate some code to apply item-based collaborative filtering to generate recommendations from the MovieLens data set, and compare it to user-based CF.
In this exercise, you're challenged to improve upon the user-based and item-based collaborative filtering algorithms we presented, by tweaking the way candidate generation works.
Since collaborative filtering does not make rating predictions, evaluating it offline is challenging - but we can test it with hit rate metrics and leave-one-out cross-validation, which we'll do in this activity.
In the previous activity, we measured the hit rate of a user-based collaborative filtering system. Your challenge is to do the same for an item-based system.
Learn how the ideas of neighborhood-based collaborative filtering can be applied to frameworks based on rating predictions, with K-Nearest-Neighbor recommenders.
Let's use SurpriseLib to quickly run user-based and item-based KNN on our MovieLens data, and evaluate the results.
Try different similarity measures to see if you can improve on the results of KNN - and we'll talk about why this is so challenging.
In our next "bleeding edge alert," we'll discuss Translation-Based Recommendations - an idea unveiled at the 2017 RecSys conference for recommending sequences of events, based on vectors in item similarity space.
Let's learn how PCA allows us to reduce higher-dimensional data into lower dimensions, which is the first step toward understanding SVD.
We'll extend PCA to the problem of making movie recommendations, and learn how SVD is just a specific implementation of PCA.
Let's run SVD and SVD++ on our MovieLens movie ratings data set, and evaluate the results. They're really good!
We'll talk about some variants and extensions to SVD that have emerged, and the importance of hyperparameter tuning on SVD, as well as how to tune parameters in SurpriseLib using the GridSearchCV class.
Have a go at modifying our SVD bake-off code to find the optimal values of the various hyperparameters for SVD, and see if it makes a difference in the results.
We'll cover some exciting research from the University of Minnesota based on matrix factorization.
A quick introduction on what to expect from this section, and who can skip it.
We'll cover the concepts of Gradient Descent, Reverse Mode AutoDiff, and Softmax, which you'll need to build deep neural networks.
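Of these three concepts, softmax is the easiest to show in a few lines. The sketch below is plain Python rather than TensorFlow, and the input scores are arbitrary. It turns a list of raw scores into probabilities that sum to 1.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing on large scores;
    # it does not change the result.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Arbitrary raw output scores for three classes.
probabilities = softmax([2.0, 1.0, 0.1])
print(probabilities)  # the largest score gets the largest probability
```

In a classifier, the class with the highest probability is typically taken as the prediction.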
We'll cover the evolution of neural networks from their origin in the 1940s, all the way up to the architecture of modern deep neural networks.
We'll use the TensorFlow Playground to get a hands-on feel for how deep neural networks operate, and the effects of different topologies.
We'll cover the mechanics of different activation functions and optimization functions for neural networks, including ReLU, Adam, RMSProp, and Gradient Descent.
We'll talk about how to prevent overfitting using techniques such as dropout layers, and how to tune your topology for the best results.
We'll walk through an example of using TensorFlow's low-level API to distribute the processing of neural networks using Python.
In this hands-on activity, we'll implement handwriting recognition on real data using TensorFlow's low-level API. Part 1 of 3.
In this hands-on activity, we'll implement handwriting recognition on real data using TensorFlow's low-level API. Part 2 of 3.
Keras is a higher-level API that makes developing deep neural networks with TensorFlow a lot easier. We'll explain how it works and how to use it.
We'll tackle the same handwriting recognition problem as before, but this time using Keras with much simpler code, and better results.
There are different patterns to use in Keras for multi-class or binary classification problems; we'll talk about how to tackle each.
As an exercise challenge, develop your own neural network using Keras to predict the political parties of politicians, based just on their votes on 16 different issues.
We'll talk about how your brain's visual cortex recognizes images seen by your eyes, and how the same approach inspires artificial convolutional neural networks.
The topology of CNNs can get complicated, and there are several variations of them you can choose from for certain problems, including LeNet, GoogLeNet, and ResNet.
We'll tackle handwriting recognition again, this time using Keras and CNNs for our best results yet. Can you improve upon them?
Recurrent Neural Networks are appropriate for sequences of information, such as time series data, natural language, or music. We'll dive into how they work and some variations of them.
Training RNNs involves back-propagating through time, which makes them extra-challenging to work with.
We'll wrap up our intro to deep learning by applying RNNs to the problem of sentiment analysis, which can be modeled as a sequence-to-vector learning problem.
We'll introduce the idea of using neural networks to produce recommendations, and explore whether this concept is overkill or not.
We'll cover a very simple neural network called the Restricted Boltzmann Machine, and show how it can be used to produce recommendations given sparse rating data.
We'll walk through our implementation of Restricted Boltzmann Machines integrated into our recommender framework. Part 1 of 2.
We'll walk through our implementation of Restricted Boltzmann Machines integrated into our recommender framework. Part 2 of 2.
We'll run our RBM recommender, and study its results.
You're challenged to tune the RBM using GridSearchCV to see if you can improve its results.
We'll review my results from the previous exercise, so you can compare them against your own.
We'll learn how to apply modern deep neural networks to recommender systems, and the challenges sparse data creates.
We'll walk through our code for producing recommendations with deep learning, and evaluate the results.
We'll introduce "GRU4Rec," a technique that applies recurrent neural networks to the problem of clickstream recommendations.
As a more challenging exercise that mimics what you might do in the real world, try to port some older research code into a modern Python and TensorFlow environment, and get it running.
We'll review my results from the previous exercise.
We'll explore DeepFM, which combines the strengths of Factorization Machines and of Deep Neural Networks to produce a hybrid solution that out-performs either technique.
We'll cover a few more "bleeding edge" topics, including Word2Vec, 3D CNNs for session-based recommendations, and feature extraction with CNNs.
We'll introduce Apache Spark as our first means of "scaling it up," and get it installed on your system if you want to experiment with it.
We'll explain just enough about how Spark works to let you understand how it distributes its work across a cluster, and the main objects our sample code will use: RDDs and DataFrames.
We'll start by using Spark's MLlib to generate recommendations with ALS for our ml-100k data set.
We'll scale things up, and use all of the cores on our local PC to process 20 million ratings and produce top-N recommendations with Apache Spark.
Amazon open-sourced its recommender engine called DSSTNE, which makes it easy to apply deep neural networks to massive, sparse data sets and produce great recommendations at large scale.
Watch as we use Amazon DSSTNE on an EC2 Ubuntu instance to produce movie recommendations using a deep neural network.
Let's explore how Amazon scaled DSSTNE up, paired with Apache Spark, to process their massive data and produce recommendations for millions of customers.
Amazon's SageMaker service offers some machine learning algorithms that can be used for recommendations, including factorization machines.
Watch as I use SageMaker from a cloud-hosted Notebook to pre-process the MovieLens 1-million-rating data set, train and save a Factorization Machine model, and deploy the model for making real-time predictions for movie recommendations.
A huge number of commercial SaaS offerings have emerged to offer easy-to-use recommender systems out of the box, and there are many open-source offerings that allow you to develop recommender systems at scale at as low a level as you want. We'll cover some of the more popular ones, and enumerate the rest.
The specifics of how you deploy a recommender system into production will depend on the environment you're working within, but we'll cover some high-level architectures to consider and some of the technologies you might employ.
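Learn to build recommender systems from an Amazon engineer!
A hands-on course that stays close to real-world practice!
Develop your own framework and combine recommendation algorithms!
Why choose Building Recommender Systems with Machine Learning & AI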
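In this course, we'll cover tried-and-true recommendation algorithms based on neighborhood-based collaborative filtering, and work our way up to more modern techniques, including matrix factorization and deep learning with artificial neural networks. Along the way, Frank's extensive industry experience will make it easier and more fun to understand the real-world problems you'll face when applying these algorithms at large scale with real data.
But recommender systems are very complex. If you think this course is only about learning code, think again. Rather than an exact recipe for building a recommender system, you need to learn which algorithm to apply, and when. Of course, we trust you already know how to code. :)
This course is very hands-on: you'll develop your own framework for combining and evaluating many recommendation algorithms, and you'll build your own neural networks using TensorFlow. You'll also build a real movie recommendation system using real movie ratings from real users.
This comprehensive course takes you all the way from the early days of collaborative filtering to cutting-edge applications of deep neural networks and the latest machine learning techniques for recommending the best items to every individual user.
The coding exercises in this course use the Python programming language. An introduction to Python is included for those who are new to it, but you'll need prior programming experience to work through the full course successfully. A short introduction to deep learning is also included for those who are new to the field of artificial intelligence, but you'll need to be able to understand new computer algorithms.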
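What Building Recommender Systems with Machine Learning & AI covers
Building a recommendation engine
Evaluating recommender systems
Content-based filtering using item attributes
Neighborhood-based collaborative filtering with user-based, item-based, and KNN CF
Model-based methods including matrix factorization and SVD
Applying deep learning, AI, and artificial neural networks to recommender systems
Using the latest frameworks from TensorFlow (TFRS) and Amazon Personalize
Session-based recommendations with recurrent neural networks
Scaling to massive data sets with Apache Spark machine learning, Amazon DSSTNE deep learning, and AWS SageMaker with Factorization Machines
Real-world challenges and solutions with recommender systems
Case studies from YouTube and Netflix
Building hybrid, ensemble recommenders
"Bleeding edge alerts" covering the latest research in the field of recommender systems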
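A word from Frank Kane, a top instructor with 810,000 students worldwide!
Here's some news for our students in Korea!
This course has been updated with TensorFlow Recommenders (TFRS) and Generative Adversarial Networks (GANs) for recommendations!
Learn how to build machine learning recommender systems from a pioneer in the field and one of Amazon's engineers. Frank Kane spent over nine years at Amazon, where he managed and developed Amazon's personalized product recommendation technology.
You see automated recommendations everywhere, including the Netflix home page, YouTube, and Amazon, because these machine learning algorithms learn each user's unique interests and then show the best products or content for each individual. This technology is a central focus for employers at the largest and most prestigious tech companies, and understanding it will make you very valuable to them.
Carefully translated Korean subtitles are included to help you follow along.
You can leave any questions about the course in the Q&A, but please be sure to write them in English. That way we can answer them. :)
We hope to see you in the course soon!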
-Frank