Contextual Multi-Armed Bandit Problems in Python

Rating: 3.6 (12 reviews)

All you need to master multi-armed bandit problems and apply them to real-world problems

Created by Hadi Aghazadeh

Last updated 2/2024

English

What you'll learn

Master all essential Bandit Algorithms
Learn How to Apply Bandit Problems to Real-world Applications with a Focus on Product Recommendation
Learn How to Implement All Essential Aspects of Bandit Algorithms in Python
Build Different Deterministic and Stochastic Environments for Bandit Problems to Simulate Different Scenarios
Learn and Apply Bayesian Inference for Bandit Problems and Beyond as a Byproduct of This Course
Understand Essential Concepts in Contextual Bandit Problems
Apply Contextual Bandit Problems to a Real-World Product Recommendation Dataset and Scenario

Course content

5 sections • 70 lectures • 9h 0m total length

Course Overview (11:30)
An overview of the course that shows the big picture of what we will cover.
Casino and Statistics (5:45)
A (mostly) fun look at how casinos shaped the course of statistics!
Story: A Gambler in a Casino (2:24)
A story as an introduction to Multi-armed Bandit Problems!
Multi-armed Bandit Problems and Their Applications (7:55)
Applications of Multi-armed Bandit Problems!
Multi-armed Bandit Problems for Startup Founders (3:13)
MAB has many applications in the online digital sector. This video shows how startups take advantage of MAB to build customized products for their customers!
Similarities and Differences between Bandit Problems and Reinforcement Learning (6:21)
An important video on the similarities and differences between RL and MAB.
Slides (0:03)
Slides for the introduction section!
Resources (0:07)
The resources this course is based on.
Quiz: The most important difference between RL and MAB

Environment Design Logic (10:04)
How the environment will work for us!
Deterministic Environment (20:11)
As a first scenario, we will implement some agents in a deterministic environment!
Proof for Incremental Averaging (10:02)
A simple mathematical proof of how incremental averaging makes life easier for us!
Random Agent Class Implementation (9:50)
Let's define our first agent: the Random Agent.
Incremental Average Implementation (10:20)
Now it's time to implement the incremental average logic!
Results for Random Agent (10:41)
Let's see how the random agent earns money for us!
Plotting Function Part 1 (12:01)
Let's build a function for plotting the agents' results!
Plotting Function Part 2 (12:02)
We're almost done with the plotting function, I promise :)
Plot Results for Random Agent (6:15)
OK, let's see visually how the random agent works!
Greedy Agent (9:03)
Let's be greedy and go for more and more!
Epsilon Greedy Agent (12:02)
Be greedy, but not all the time!
Epsilon Greedy Parameter Tuning Part 1 (10:50)
But how often should we be greedy, and how often shouldn't we?
Epsilon Greedy Parameter Tuning Part 2 (9:06)
A little bit more on parameter tuning for the e-greedy agent!
Difference Between Stochasticity, Uncertainty, and Non-Stationarity (5:02)
This video is essential for anyone who wants to work with simulation models!
Create a Stochastic Environment (12:57)
Let's make the environment a little more practical!
Create an Instance of Stochastic Environment (4:35)
And let's create an instance of the stochastic environment!
Agents Performance with Stochastic Environment (4:01)
How the already implemented agents perform in a stochastic environment!
Softmax Agent Implementation (7:32)
SoftMax here, SoftMax everywhere in machine learning!
Softmax Agent Results (2:22)
Let's see how the SoftMax agent functions!
Upper Confidence Bound (UCB) Algorithm Theory (5:41)
Everybody knows MAB through the UCB algorithm. Let's see how it works!
UCB Algorithm Implementation (6:18)
And let's implement UCB!
UCB Algorithm Results (4:20)
And next, let's see how it performs!
Comparisons of All Agent Performance and a Life Lesson (8:34)
I love the kind of life lessons we can take from agent performance in AI.
Regret Concept and Implementation (11:32)
Reward is not the only useful signal: let's minimize regret rather than maximize reward!
Regret Function Visualization (7:59)
Let's build a function to visualize the concept of regret and understand it better!
Epsilon Greedy with Regret Concept (6:55)
E-greedy will help us understand the regret concept better in a deterministic environment!
Regret Curves Results for Deterministic Environment (3:38)
Let's see what regret looks like!
Regret Curves Results for Stochastic Environment (3:26)
And let's see what it looks like in a stochastic environment!
Code for Basic Agents (0:01)
All the code for the basic agents is here!
Quiz: Regret Concept
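
The incremental-averaging lecture above derives a running-mean update that needs no reward history. The standard form of that derivation is shown below, in notation assumed here rather than taken from the course: Q_n is an arm's estimate after its first n-1 rewards, and R_n is its n-th reward.

```latex
\begin{aligned}
Q_{n+1} &= \frac{1}{n}\sum_{i=1}^{n} R_i
         = \frac{1}{n}\Big(R_n + \sum_{i=1}^{n-1} R_i\Big) \\
        &= \frac{1}{n}\big(R_n + (n-1)\,Q_n\big)
         = Q_n + \frac{1}{n}\big(R_n - Q_n\big)
\end{aligned}
```

Only the current estimate and a pull count per arm need to be stored, which is what makes the update cheap.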

Why and How We Can Use Thompson Sampling (9:53)
How Thompson Sampling changes the way we select the best action!
Design of Thompson Sampling Class Part 1 (8:59)
Let's design the class that will be the cornerstone for the rest of the Thompson Sampling code!
Design of Thompson Sampling Class Part 2 (14:57)
OK, let's finish what we started!
Results for Thompson Sampling with Binary Reward (5:53)
Animated results showing how the agent learns the probability distribution of each arm from binary rewards!
Thompson Sampling For Binary Reward with Stochastic Environment (3:35)
Let's repeat what we did, this time in the stochastic environment!
Theory for Gaussian Thompson Sampling (8:48)
What if the reward isn't binary but a real value? No worries, Gaussian Thompson Sampling can help!
Environment for Gaussian Thompson Sampling (3:46)
I think we need a whole new environment for Gaussian Thompson Sampling!
Select Arm Module for Gaussian Thompson Sampling Class (5:14)
Let's build the action selection mechanism first!
Parameter Update Module for Gaussian Thompson Sampling Agent (5:37)
And continue with the heart of the algorithm: updating the parameters!
Visualization Function for Gaussian Thompson Sampling (7:36)
We need to see it to believe it!
Results for Gaussian Thompson Sampling (6:54)
Results of Gaussian Thompson Sampling!
Code for Thompson Sampling (0:03)
And the complete code for the Thompson Sampling section!
Quiz: Questions
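
The Gaussian Thompson Sampling lectures above split the agent into an arm-selection step and a parameter-update step. A minimal sketch of both is below, assuming the conjugate Normal-Normal model: each arm's rewards are Normal with a known noise standard deviation, and each unknown arm mean has a Normal prior. The class name, parameter names, and defaults (`prior_std=10.0`, `noise_std=1.0`) are hypothetical, and the course may parameterize the model differently.

```python
import math
import random


class GaussianThompson:
    """Thompson sampling for real-valued rewards (hypothetical sketch).

    Assumes each arm's rewards are Normal(mu_k, noise_std**2) with a *known*
    noise_std, and puts a Normal(prior_mean, prior_std**2) prior on each mu_k.
    """

    def __init__(self, n_arms, prior_mean=0.0, prior_std=10.0, noise_std=1.0, seed=0):
        self.mean = [prior_mean] * n_arms               # posterior means
        self.precision = [1.0 / prior_std**2] * n_arms  # posterior 1 / variance
        self.noise_precision = 1.0 / noise_std**2
        self.rng = random.Random(seed)

    def select_arm(self):
        # Draw a plausible mean for each arm from its posterior; play the largest draw.
        samples = [self.rng.gauss(m, 1.0 / math.sqrt(p)) for m, p in zip(self.mean, self.precision)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # Normal-Normal conjugate update: precisions add, the mean is precision-weighted.
        new_precision = self.precision[arm] + self.noise_precision
        self.mean[arm] = (
            self.precision[arm] * self.mean[arm] + self.noise_precision * reward
        ) / new_precision
        self.precision[arm] = new_precision


if __name__ == "__main__":
    true_means = [1.0, 2.0, 3.0]  # hypothetical arm means
    env = random.Random(1)
    agent = GaussianThompson(len(true_means))
    for _ in range(1000):
        arm = agent.select_arm()
        agent.update(arm, env.gauss(true_means[arm], 1.0))
    print([round(m, 2) for m in agent.mean])
```

With a wide prior, each posterior mean quickly approaches the sample mean of the rewards seen for that arm, and the posterior variance shrinks as 1 / (prior precision + n / noise variance).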

Contextual Bandit Problems vs Supervised Learning (11:56)
What are the similarities and differences between Contextual Bandit Problems and Supervised Learning in machine learning?
LinUCB Math Notations (12:09)
Knowing the notation is the first step toward understanding any algorithm!
LinUCB Algorithm Theory (14:27)
Let's see how LinUCB works!
LinUCB Implementation Part 1 (24:20)
The start of a long series of videos on implementing LinUCB!
LinUCB Implementation Part 2 (8:06)
LinUCB is fairly hard to implement, so let's be patient!
LinUCB Implementation Part 3 (7:55)
We're almost done with LinUCB!
Test LinUCB Algorithm (11:44)
Let's make sure that LinUCB works!
Epsilon Greedy Algorithm Implementation (6:27)
We need a baseline to compare the results against, and what is better than Epsilon Greedy?
Simulation Functions (10:31)
Let's build some functions to facilitate simulation and comparison!
Comparison of Epsilon Greedy and LinUCB with Toy Data (6:15)
First, let's look at performance on random data. The results should be similar!
Real-world Case Dataset Explanation (5:45)
Exciting news: we have a real-world problem to solve!
Split Data into Train and Test (5:04)
An old but still effective way of evaluating performance!
Test Agents with Accuracy Metric (9:54)
Let's first compare with accuracy, although it is not the right metric for multi-armed bandit problems!
Evaluate Agent Performances based on Accumulated Rewards (11:47)
Accumulated reward is a better basis for comparison!
Datasets and Data Preparation Code (0:06)
OK, let's make sure you can reproduce the results. Here are the datasets and the data preparation code!
Code for Contextual Bandit Problems (0:02)
And of course, the code for Contextual Bandit Problems!
Quiz: Concept of the LinUCB Algorithm

Requirements

No mandatory prerequisites

Description

Welcome to our course where we'll guide you through Multi-armed Bandit Problems and Contextual Bandit Problems, step by step. No prior experience needed - we'll start from scratch and build up your skills so you can use these algorithms for your own projects.

We'll cover the basics, such as random, greedy, e-greedy, and softmax agents, as well as more advanced methods like Upper Confidence Bound (UCB). Along the way, we'll explain concepts such as regret, which shifts the focus away from raw reward values in Reinforcement Learning and Multi-armed Bandit Problems. Through practical examples in different types of environments (deterministic, stochastic, and non-stationary), you'll see how these algorithms perform in action.
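
To make the basic agents concrete, here is a minimal, self-contained sketch of epsilon-greedy, softmax, and UCB agents that share an incremental-average update, plus a loop that tracks cumulative regret against the best arm. The Bernoulli environment, the class and function names, and the default parameters (`epsilon=0.1`, `tau=0.1`) are illustrative assumptions, not the course's own code. UCB1 is used here as the standard UCB variant.

```python
import math
import random


class BernoulliEnv:
    """Hypothetical stochastic environment: arm k pays 1 with probability probs[k], else 0."""

    def __init__(self, probs, seed=0):
        self.probs = list(probs)
        self.rng = random.Random(seed)

    def pull(self, arm):
        return 1.0 if self.rng.random() < self.probs[arm] else 0.0


class Agent:
    """Base agent: keeps a pull count and an incremental average reward per arm."""

    def __init__(self, n_arms, seed=0):
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms
        self.rng = random.Random(seed)

    def update(self, arm, reward):
        self.counts[arm] += 1
        # Incremental average: Q <- Q + (R - Q) / n, no reward history needed.
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


class EpsilonGreedy(Agent):
    def __init__(self, n_arms, epsilon=0.1, seed=0):
        super().__init__(n_arms, seed)
        self.epsilon = epsilon

    def select_arm(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))  # explore: uniform random arm
        return max(range(len(self.values)), key=self.values.__getitem__)  # exploit


class Softmax(Agent):
    def __init__(self, n_arms, tau=0.1, seed=0):
        super().__init__(n_arms, seed)
        self.tau = tau  # temperature: lower means greedier

    def select_arm(self):
        top = max(self.values)  # subtract the max for numerical stability
        weights = [math.exp((v - top) / self.tau) for v in self.values]
        return self.rng.choices(range(len(weights)), weights=weights)[0]


class UCB1(Agent):
    def select_arm(self):
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm  # try every arm once before using the bound
        t = sum(self.counts)
        return max(
            range(len(self.values)),
            key=lambda a: self.values[a] + math.sqrt(2 * math.log(t) / self.counts[a]),
        )


def run(agent, env, steps=2000):
    """Return (total reward, cumulative expected regret vs. the best arm)."""
    best = max(env.probs)
    total = regret = 0.0
    for _ in range(steps):
        arm = agent.select_arm()
        reward = env.pull(arm)
        agent.update(arm, reward)
        total += reward
        regret += best - env.probs[arm]  # expected loss of this choice
    return total, regret


if __name__ == "__main__":
    probs = [0.2, 0.5, 0.75]  # hypothetical true payout rates
    for agent in (EpsilonGreedy(3), Softmax(3), UCB1(3)):
        total, regret = run(agent, BernoulliEnv(probs))
        print(f"{type(agent).__name__:>13}: reward={total:.0f}, regret={regret:.1f}")
```

Regret here is the expected (pseudo-)regret, computed from the true payout rates, so it does not depend on the luck of individual pulls.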

Ever wondered how Multi-armed Bandit problems relate to Reinforcement Learning? We'll break it down for you, highlighting what's similar and what's different.

We'll also dive into Bayesian inference, introducing Thompson sampling in simple terms for both binary and real-valued rewards. We'll use Beta and Gaussian distributions to estimate the probability distributions, with clear examples to help you understand the theory and how to put it into practice.
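
For the binary-reward case, a Beta prior on each arm's success rate can be updated by simple counting, because the Beta distribution is conjugate to the Bernoulli. A minimal sketch, assuming a uniform Beta(1, 1) prior; `BetaThompson` and its methods are hypothetical names, not the course's class.

```python
import random


class BetaThompson:
    """Thompson sampling for 0/1 rewards with a Beta(1, 1) (uniform) prior per arm."""

    def __init__(self, n_arms, seed=0):
        self.alpha = [1.0] * n_arms  # 1 + number of successes
        self.beta = [1.0] * n_arms   # 1 + number of failures
        self.rng = random.Random(seed)

    def select_arm(self):
        # Draw one plausible success rate per arm from its posterior; play the best draw.
        samples = [self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # Conjugate update: a success bumps alpha, a failure bumps beta.
        if reward:
            self.alpha[arm] += 1
        else:
            self.beta[arm] += 1

    def posterior_mean(self, arm):
        return self.alpha[arm] / (self.alpha[arm] + self.beta[arm])


if __name__ == "__main__":
    probs = [0.2, 0.5, 0.75]  # hypothetical true success rates
    env = random.Random(1)
    agent = BetaThompson(len(probs))
    for _ in range(2000):
        arm = agent.select_arm()
        agent.update(arm, env.random() < probs[arm])
    print([round(agent.posterior_mean(a), 2) for a in range(len(probs))])
```

Because each arm is played roughly in proportion to the probability that it is the best arm given the data so far, exploration fades on its own as the posteriors narrow.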

Then, we'll explore Contextual Bandit problems, using the LinUCB algorithm as our guide. From basic toy examples to real-world data, you'll see how it works and how it compares to simpler methods like e-greedy.
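
The description does not spell out how LinUCB chooses an arm, so here is a minimal NumPy sketch of the common "disjoint" formulation, in which each arm keeps its own ridge-regression model and adds a confidence bonus in the direction of the current context. The class, its method names, `alpha=1.0`, and the synthetic `simulate` helper are hypothetical and not the course's code. The sketch requires NumPy.

```python
import numpy as np


class LinUCB:
    """Disjoint LinUCB: an independent ridge-regression model per arm.

    Score of arm a for context x:  theta_a . x + alpha * sqrt(x^T A_a^{-1} x)
    """

    def __init__(self, n_arms, n_features, alpha=1.0):
        self.alpha = alpha  # width of the confidence bonus (exploration strength)
        self.A = [np.eye(n_features) for _ in range(n_arms)]    # I + sum of x x^T
        self.b = [np.zeros(n_features) for _ in range(n_arms)]  # sum of reward * x

    def select_arm(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                            # ridge estimate of arm weights
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)  # uncertainty in direction x
            scores.append(theta @ x + bonus)
        return int(np.argmax(scores))  # ties go to the lowest index

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x


def simulate(n_arms=3, n_features=5, rounds=1500, seed=0):
    """Toy run on synthetic linear rewards; returns the fraction of the
    last 500 rounds in which the truly best arm was chosen."""
    rng = np.random.default_rng(seed)
    true_theta = rng.normal(size=(n_arms, n_features))  # hypothetical ground truth
    agent = LinUCB(n_arms, n_features)
    hits = 0
    for t in range(rounds):
        x = rng.normal(size=n_features)
        arm = agent.select_arm(x)
        reward = true_theta[arm] @ x + rng.normal(scale=0.1)
        agent.update(arm, x, reward)
        if t >= rounds - 500:
            hits += arm == int(np.argmax(true_theta @ x))
    return hits / 500


if __name__ == "__main__":
    print(f"best-arm rate over the last 500 rounds: {simulate():.2f}")
```

Inverting each A_a on every step keeps the sketch short; for many features, a Sherman-Morrison rank-one update of the stored inverse is the usual cheaper alternative.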

Don't worry if you're new to Python - we've got you covered with a section to help you get started. And to make sure you're really getting it, we'll throw in some quizzes to test your understanding along the way.

Our explanations are clear, our code is clean, and we've added fun visualizations to help everything make sense. So join us on this journey and become a master of Multi-armed and Contextual Bandit Problems!

Who this course is for:

Web Application Developers
Researchers working on Action optimization
Machine Learning Developers and Data Scientists
Startup Enthusiasts Driven to Develop Customized Recommendation Apps.

Course sections

Introduction: 8 lectures • 37min

Introduction to Python: 5 lectures • 38min

Fundamental Algorithms in Multi-Armed Bandit Problems: 29 lectures • 3hr 57min

Thompson Sampling for Multi-Armed Bandits: 12 lectures • 1hr 21min

Contextual Bandit Problems: 16 lectures • 2hr 26min
