Artificial Intelligence: Reinforcement Learning in Python
4.5 (3,991 ratings)
Course Ratings are calculated from individual students’ ratings and a variety of other signals, like age of rating and reliability, to ensure that they reflect course quality fairly and accurately.
23,563 students enrolled

Complete guide to artificial intelligence and machine learning, prep for deep reinforcement learning
Last updated 1/2019
English
English [Auto-generated], Portuguese [Auto-generated], Spanish [Auto-generated]
This course includes
  • 8.5 hours on-demand video
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What you'll learn
  • Apply gradient-based supervised machine learning methods to reinforcement learning
  • Understand reinforcement learning on a technical level
  • Understand the relationship between reinforcement learning and psychology
  • Implement 17 different reinforcement learning algorithms
Requirements
  • Calculus
  • Probability
  • Markov Models
  • The Numpy Stack
  • Have experience with at least a few supervised machine learning methods
  • Gradient descent
  • Good object-oriented programming skills
Description

When people talk about artificial intelligence, they usually don’t mean supervised and unsupervised machine learning.

These tasks are pretty trivial compared to what we think of AIs doing - playing chess and Go, driving cars, and beating video games at a superhuman level.

Reinforcement learning has recently become popular for doing all of that and more.

Much like deep learning, much of the theory was developed in the 70s and 80s, but only recently have we been able to see first-hand the amazing results that are possible.

In 2016 we saw Google’s AlphaGo beat a world champion at Go.

We saw AIs playing video games like Doom and Super Mario.

Self-driving cars have started driving on real roads with other drivers and even carrying passengers (Uber), all without human assistance.

If that sounds amazing, brace yourself for the future, because the law of accelerating returns suggests that this progress will only keep speeding up.

Learning about supervised and unsupervised machine learning is no small feat. To date I have over SIXTEEN (16!) courses just on those topics alone.

And yet reinforcement learning opens up a whole new world. As you’ll learn in this course, the reinforcement learning paradigm differs more from supervised and unsupervised learning than those two differ from each other.

It’s led to new and amazing insights both in behavioral psychology and neuroscience. As you’ll learn in this course, there are many analogous processes when it comes to teaching an agent and teaching an animal or even a human. It’s the closest thing we have so far to a true general artificial intelligence.

What’s covered in this course?

  • The multi-armed bandit problem and the explore-exploit dilemma

  • Ways to calculate means and moving averages and their relationship to stochastic gradient descent

  • Markov Decision Processes (MDPs)

  • Dynamic Programming

  • Monte Carlo

  • Temporal Difference (TD) Learning

  • Approximation Methods (i.e. how to plug in a deep neural network or other differentiable model into your RL algorithm)
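
To give a feel for the first two topics in the list above, here is a minimal sketch (not the course's own code). It shows that a sample mean can be updated one sample at a time, and that this update has the same form as a gradient descent step on squared error with learning rate 1/n. The function names `update_mean` and `run_epsilon_greedy`, the Gaussian rewards, and all parameter values are assumptions chosen purely for illustration.

```python
import numpy as np


def update_mean(mean, x, n):
    """Return the sample mean after seeing the n-th sample x (n starts at 1)."""
    # Same form as a gradient step on 0.5 * (x - mean)**2 with learning rate 1/n.
    return mean + (x - mean) / n


def run_epsilon_greedy(true_means, eps=0.1, steps=5000, seed=0):
    """Play a Gaussian bandit with epsilon-greedy; return estimates and pull counts."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    estimates = np.zeros(k)           # current estimate of each arm's mean reward
    counts = np.zeros(k, dtype=int)   # number of times each arm has been pulled
    for _ in range(steps):
        if rng.random() < eps:
            arm = int(rng.integers(k))        # explore: pick a random arm
        else:
            arm = int(np.argmax(estimates))   # exploit: pick the best-looking arm
        reward = rng.normal(true_means[arm], 1.0)  # hypothetical unit-variance reward
        counts[arm] += 1
        estimates[arm] = update_mean(estimates[arm], reward, counts[arm])
    return estimates, counts


if __name__ == "__main__":
    estimates, counts = run_epsilon_greedy([1.0, 2.0, 3.0])
    print("estimated means:", np.round(estimates, 2))
    print("pulls per arm:  ", counts)
```

Replacing the 1/n step in `update_mean` with a small constant would turn the sample mean into an exponentially weighted moving average, which is one way to handle rewards whose distribution drifts over time.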

If you’re ready to take on a brand new challenge, and learn about AI techniques that you’ve never seen before in traditional supervised machine learning, unsupervised machine learning, or even deep learning, then this course is for you.

See you in class!


HARD PREREQUISITES / KNOWLEDGE YOU ARE ASSUMED TO HAVE:

  • Calculus

  • Probability

  • Object-oriented programming

  • Python coding: if/else, loops, lists, dicts, sets

  • Numpy coding: matrix and vector operations

  • Linear regression

  • Gradient descent


TIPS (for getting through the course):

  • Watch it at 2x.

  • Take handwritten notes. This will drastically increase your ability to retain the information.

  • Write down the equations. If you don't, I guarantee it will just look like gibberish.

  • Ask lots of questions on the discussion board. The more the better!

  • Realize that most exercises will take you days or weeks to complete.

  • Write code yourself, don't just sit there and look at my code.


WHAT ORDER SHOULD I TAKE YOUR COURSES IN?:

  • Check out the lecture "What order should I take your courses in?" (available in the Appendix of any of my courses, including the free Numpy course)



Who this course is for:
  • Anyone who wants to learn about artificial intelligence, data science, machine learning, and deep learning
  • Both students and professionals
Course content
87 lectures 08:28:33
+ Welcome
3 lectures 11:51
Where to get the Code
02:41
Strategy for Passing the Course
05:56
+ High Level Overview of Reinforcement Learning and Course Outline
4 lectures 26:01
On Unusual or Unexpected Strategies of RL
06:10
Course Outline
04:42
Defining Some Terms
07:01
+ Return of the Multi-Armed Bandit
11 lectures 51:06
Problem Setup and The Explore-Exploit Dilemma
03:55
Epsilon-Greedy
01:48
Updating a Sample Mean
01:22
Designing Your Bandit Program
04:09
Comparing Different Epsilons
04:06
Optimistic Initial Values
02:56
UCB1
04:56
Bayesian / Thompson Sampling
09:52
Thompson Sampling vs. Epsilon-Greedy vs. Optimistic Initial Values vs. UCB1
05:11
Nonstationary Bandits
04:51
+ Build an Intelligent Tic-Tac-Toe Agent
11 lectures 01:07:21
Naive Solution to Tic-Tac-Toe
03:50
Components of a Reinforcement Learning System
08:00
Notes on Assigning Rewards
02:41
The Value Function and Your First Reinforcement Learning Algorithm
16:33
Tic Tac Toe Code: Outline
03:16
Tic Tac Toe Code: Representing States
02:56
Tic Tac Toe Code: Enumerating States Recursively
06:14
Tic Tac Toe Code: The Environment
06:36
Tic Tac Toe Code: The Agent
05:48
Tic Tac Toe Code: Main Loop and Demo
06:02
Tic Tac Toe Summary
05:25
+ Markov Decision Processes
9 lectures 01:03:41
Gridworld
02:13
The Markov Property
04:36
Defining and Formalizing the MDP
04:10
Future Rewards
03:16
Value Function Introduction
12:03
Value Functions
09:15
Bellman Examples
22:24
Optimal Policy and Optimal Value Function
04:09
MDP Summary
01:35
+ Dynamic Programming
11 lectures 45:17
Intro to Dynamic Programming and Iterative Policy Evaluation
03:06
Gridworld in Code
05:47
Designing Your RL Program
05:00
Iterative Policy Evaluation in Code
06:24
Policy Improvement
02:51
Policy Iteration
02:00
Policy Iteration in Code
03:46
Policy Iteration in Windy Gridworld
04:57
Value Iteration
03:58
Value Iteration in Code
02:14
Dynamic Programming Summary
05:14
+ Monte Carlo
9 lectures 35:42
Monte Carlo Intro
03:10
Monte Carlo Policy Evaluation
05:45
Monte Carlo Policy Evaluation in Code
03:35
Policy Evaluation in Windy Gridworld
03:38
Monte Carlo Control
05:59
Monte Carlo Control in Code
04:04
Monte Carlo Control without Exploring Starts
02:58
Monte Carlo Control without Exploring Starts in Code
02:51
Monte Carlo Summary
03:42
+ Temporal Difference Learning
8 lectures 24:40
Temporal Difference Intro
01:42
TD(0) Prediction
03:46
TD(0) Prediction in Code
02:27
SARSA
05:15
SARSA in Code
03:38
Q Learning
03:05
Q Learning in Code
02:13
TD Summary
02:34
+ Approximation Methods
9 lectures 37:37
Approximation Intro
04:11
Linear Models for Reinforcement Learning
04:16
Features
04:02
Monte Carlo Prediction with Approximation
01:54
Monte Carlo Prediction with Approximation in Code
02:58
TD(0) Semi-Gradient Prediction
04:22
Semi-Gradient SARSA
03:08
Semi-Gradient SARSA in Code
04:08
Course Summary and Next Steps
08:38
+ Appendix
12 lectures 02:25:17
What is the Appendix?
02:48
Windows-Focused Environment Setup 2018
20:20
How to install Numpy, Scipy, Matplotlib, Pandas, IPython, Theano, and TensorFlow
17:32
How to Code by Yourself (part 1)
15:54
How to Code by Yourself (part 2)
09:23
How to Succeed in this Course (Long Version)
10:24
Is this for Beginners or Experts? Academic or Practical? Fast or slow-paced?
22:04
Proof that using Jupyter Notebook is the same as not using it
12:29
Python 2 vs Python 3
04:38
What order should I take your courses in? (part 1)
11:18
What order should I take your courses in? (part 2)
16:07
Where to get discount coupons and FREE deep learning material
02:20