
An overview of the course, to see the big picture of what we will cover.
An almost fun look at how casinos shaped the course of statistics!
A story as an introduction to Multi-armed Bandit Problems!
Applications of Multi-armed Bandit Problems!
MAB has many applications in the online digital sector. This video shows how startups take advantage of MAB to build customized products for their customers!
An important video on the similarities and differences between RL and MAB.
Slides for the introduction section!
The resources this course is based on.
Let's get to know where we will code!
Basic operations in Python!
For loops, ifs and many more operations!
Some practical, and perhaps more advanced, Python that is useful for this course!
The code for the introduction to Python!
How the environment will work for us!
As a first scenario, we will implement some agents in a deterministic environment!
A simple mathematical proof of how the incremental average makes life easier for us!
Let's define our first agent: Random Agent.
Now it is time to implement the incremental average logic!
Let's see how the random agent earns money for us!
Let's build a function for plotting the results for the agents!
We are almost done with the plotting function, I promise :)
OK, let's see visually how the random agent works!
Let's be greedy and always go for more!
Be greedy but not all the time!
But how often should we be greedy, and how often shouldn't we?
Just a little bit more on parameter tuning for the e-greedy agent!
This video is essential for all who want to work with simulation models!
Let's make the environment a little more practical!
And let's create an offspring of the stochastic environment!
How the agents we have already implemented perform in the stochastic environment!
SoftMax here, SoftMax everywhere in machine learning!
How the SoftMax function works!
Everybody knows MAB for its UCB algorithm. Let's see how it works!
And let's implement UCB!
What's more, let's see how it performs!
I love the kind of life lessons that we can draw from agent performance in AI.
Reward is not the only signal that helps. Let's minimize regret rather than maximize reward!
Let's build a function to visualize and understand the concept of regret better!
E-greedy will help us understand the concept of regret better in a deterministic environment!
Let's see what regret looks like!
And let's see what it looks like in a stochastic environment!
All the code for the basic agents is here!
How Thompson Sampling changes the way we select the best action!
Let's design the class that will be the cornerstone for the rest of the Thompson Sampling code!
OK, let's finish what we started!
Animated results showing how the agent learns the probability distribution of each arm from binary rewards!
Let's repeat what we have done, this time for the stochastic environment!
What if the reward is not binary but a real value? No worries, Gaussian Thompson Sampling can help!
I think we need a whole new environment for Gaussian Thompson Sampling!
Let's build the action selection mechanism first!
And let's continue with the heart of the algorithm: updating the parameters!
We need to see to believe!
Results of Gaussian Thompson Sampling!
And the whole code for Thompson Sampling section!
The similarities and differences between Contextual Bandit Problems and Supervised Learning in machine learning!
Knowing the notation is the first step toward understanding any algorithm!
Let's see how LinUCB works!
The start of a long series of videos on implementing LinUCB!
LinUCB is fairly hard to implement, so let's be patient!
We are almost done with LinUCB!
Let's make sure that LinUCB works!
We need a baseline to compare the results against, and what could be better than Epsilon Greedy?
Let's build some functions to facilitate the process of simulation and comparison!
First, let's see the performance on random data. The results should be similar!
Exciting news: We have a real-world problem to solve!
An old but still effective way of evaluating performance!
Let's first compare using accuracy, although it is not the right metric for Multi-armed Bandit Problems!
Accumulated reward is a better way of comparison!
OK, let's make sure you can reproduce the results. Here are the data and the data preparation code!
And of course, the code for Contextual Bandit Problems!
Welcome to our course where we'll guide you through Multi-armed Bandit Problems and Contextual Bandit Problems, step by step. No prior experience needed - we'll start from scratch and build up your skills so you can use these algorithms for your own projects.
We'll cover the basics like random, greedy, e-greedy, and softmax agents, and more advanced methods like Upper Confidence Bound (UCB). Along the way, we'll explain concepts like regret, instead of focusing only on reward values, in Reinforcement Learning and Multi-armed Bandit Problems. Through practical examples in different types of environments, such as deterministic, stochastic, and non-stationary environments, you'll see how these algorithms perform in action.
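To give a taste of what that looks like, here is a minimal sketch of an e-greedy agent on a stochastic environment with binary rewards. It is not the course's own code: the function name `run_epsilon_greedy`, the arm probabilities, and the default parameters are all made up for illustration. It tracks each arm's value with an incremental average and accumulates the expected regret, which is the gap between the best arm's mean and the mean of the arm actually chosen.

```python
import random


def run_epsilon_greedy(true_means, epsilon=0.1, steps=5000, seed=0):
    """Run a hypothetical e-greedy agent on Bernoulli arms.

    Returns the estimated mean of each arm and the cumulative expected regret.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how many times each arm was pulled
    estimates = [0.0] * n_arms   # running average reward of each arm
    best_mean = max(true_means)
    regret = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                           # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit

        # Stochastic environment: reward is 1 with probability true_means[arm].
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        counts[arm] += 1
        # Incremental average: Q_new = Q_old + (reward - Q_old) / n
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

        # Expected regret of this step: best mean minus the chosen arm's mean.
        regret += best_mean - true_means[arm]

    return estimates, regret


estimates, regret = run_epsilon_greedy([0.2, 0.5, 0.8])
print("estimated means:", [round(q, 2) for q in estimates])
print("cumulative regret:", round(regret, 1))
```

Setting `epsilon=0` gives a purely greedy agent and `epsilon=1` gives a purely random one, so the same sketch can be used to compare all three.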
Ever wondered how Multi-armed Bandit problems relate to Reinforcement Learning? We'll break it down for you, highlighting what's similar and what's different.
We'll also dive into Bayesian inference and introduce Thompson sampling in simple terms, for both binary rewards and real-valued rewards. We'll use Beta and Gaussian distributions to estimate the underlying probability distributions, with clear examples to help you understand the theory and how to put it into practice.
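As a rough illustration of the binary-reward case, the sketch below keeps a Beta distribution per arm, samples from each one, and plays the arm with the highest sample. The function name `thompson_sampling_bernoulli`, the uniform Beta(1, 1) prior, and the arm probabilities are assumptions chosen for this example, not taken from the course material.

```python
import random


def thompson_sampling_bernoulli(true_probs, steps=2000, seed=0):
    """Hypothetical Beta-Bernoulli Thompson Sampling on arms with binary rewards."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    alpha = [1] * n_arms  # 1 + number of successes (Beta(1, 1) is a uniform prior)
    beta = [1] * n_arms   # 1 + number of failures

    for _ in range(steps):
        # Draw one plausible success probability per arm from its posterior.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Binary reward from the (hidden) true probability of the chosen arm.
        reward = 1 if rng.random() < true_probs[arm] else 0

        # Bayesian update: a success raises alpha, a failure raises beta.
        alpha[arm] += reward
        beta[arm] += 1 - reward

    return alpha, beta


alpha, beta = thompson_sampling_bernoulli([0.2, 0.5, 0.8])
for arm, (a, b) in enumerate(zip(alpha, beta)):
    print(f"arm {arm}: pulls={a + b - 2}, posterior mean={a / (a + b):.2f}")
```

Arms that look bad get sampled less and less often, so their posteriors stay wide while the posterior of the best arm sharpens.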
Then, we'll explore Contextual Bandit problems, using the LinUCB algorithm as our guide. From basic toy examples to real-world data, you'll see how it works and compare it to simpler methods like e-greedy.
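For a feel of the idea behind LinUCB, here is a small sketch of the disjoint variant on a toy problem, where each arm fits its own ridge regression on the context and adds a confidence bonus. Everything named here (`LinUCBArm`, `run_linucb`, the hidden weights, the noise level) is hypothetical, and the sketch is written in plain Python to stay dependency-free. It keeps the inverse matrix up to date with the Sherman-Morrison formula instead of inverting a matrix at every step; a NumPy version would be shorter.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(m, v):
    return [dot(row, v) for row in m]


class LinUCBArm:
    """One arm of a disjoint LinUCB sketch: ridge regression plus a bonus."""

    def __init__(self, dim, alpha):
        self.alpha = alpha  # exploration strength
        # Inverse of A = I + sum(x x^T); starts as the identity matrix.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # sum of reward * context

    def ucb(self, x):
        theta = mat_vec(self.a_inv, self.b)  # ridge estimate of the weights
        variance = max(0.0, dot(x, mat_vec(self.a_inv, x)))
        return dot(theta, x) + self.alpha * math.sqrt(variance)

    def update(self, x, reward):
        # Sherman-Morrison: inverse of (A + x x^T) from the inverse of A.
        a_inv_x = mat_vec(self.a_inv, x)
        denom = 1.0 + dot(x, a_inv_x)
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
            self.b[i] += reward * x[i]


def run_linucb(steps=1000, alpha=1.0, seed=0):
    rng = random.Random(seed)
    true_weights = [[1.0, 0.0], [0.0, 1.0]]  # hidden; made up for this toy
    arms = [LinUCBArm(dim=2, alpha=alpha) for _ in true_weights]
    total_reward = 0.0
    for _ in range(steps):
        x = [rng.random(), rng.random()]  # random 2-d context
        chosen = max(range(len(arms)), key=lambda a: arms[a].ucb(x))
        reward = dot(true_weights[chosen], x) + rng.gauss(0.0, 0.1)
        arms[chosen].update(x, reward)
        total_reward += reward
    return arms, total_reward


arms, total_reward = run_linucb()
for i, arm in enumerate(arms):
    theta = mat_vec(arm.a_inv, arm.b)
    print(f"arm {i}: learned weights = {[round(w, 2) for w in theta]}")
print("total reward:", round(total_reward, 1))
```

In this toy setup the better arm depends on the context, which is exactly what a context-free agent like plain e-greedy cannot exploit.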
Don't worry if you're new to Python - we've got you covered with a section to help you get started. And to make sure you're really getting it, we'll throw in some quizzes to test your understanding along the way.
Our explanations are clear, our code is clean, and we've added fun visualizations to help everything make sense. So join us on this journey and become a master of Multi-armed and Contextual Bandit Problems!