Artificial Intelligence 2.0: AI, Python, Deep RL + LLM Prize

Name: Artificial Intelligence 2.0: AI, Python, Deep RL + LLM Prize
Rating: 4.8 (1370 reviews)

Build an AI powerful enough to train an ant/spider and a half-humanoid to walk and run across a field with the TD3 model

Created by Hadelin de Ponteves, SuperDataScience Team, Ligency

Last updated 5/2026

English

What you'll learn

Q-Learning
Deep Q-Learning
Policy Gradient
Actor Critic
Deep Deterministic Policy Gradient (DDPG)
Twin-Delayed DDPG (TD3)
The Foundation Techniques of Deep Reinforcement Learning
How to implement a state-of-the-art AI model that excels at the most challenging virtual applications

Course content

8 sections • 61 lectures • 9h 37m total length

Welcome • 15:12
Get the materials • 0:05
Prizes for Learning • 0:07
Some resources before we start • 0:41
Q-Learning • 10:25
Explore reinforcement learning fundamentals, where an agent learns from states, actions, rewards, and return. Discover Q-learning and TD3, including Q-values, Bellman updates, and the move toward policy gradient.
Deep Q-Learning • 6:54
Policy Gradient • 6:35
Actor-Critic • 4:05
Taxonomy of AI models • 7:48
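
As a rough illustration of the Q-values and Bellman updates mentioned in the Q-Learning lecture summary, here is a minimal tabular sketch. It is not taken from the course materials: the function name `q_update`, the learning rate `alpha`, the discount factor `gamma`, and the toy states are all assumptions chosen for illustration.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9, done=False):
    """Apply one tabular Q-learning (temporal-difference) update in place."""
    # Best value reachable from the next state; zero if the episode ended.
    best_next = 0.0 if done else max(q[(next_state, a)] for a in next_actions)
    # Bellman target: immediate reward plus discounted future value.
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical toy data: unseen (state, action) pairs default to 0.
q_table = defaultdict(float)
q_table[("s1", "right")] = 2.0  # pretend this value was learned earlier
new_value = q_update(q_table, "s0", "right", reward=1.0,
                     next_state="s1", next_actions=["left", "right"])
```

Here the estimate for `("s0", "right")` moves from 0 toward the target `1.0 + 0.9 * 2.0`, scaled by `alpha`.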

The whole code folder of the course with all the implementations • 0:18
Beginning • 5:36
Implementation - Step 1 • 15:46
Implementation - Step 2 • 15:12
Implementation - Step 3 • 13:55
Implementation - Step 4 • 14:09
Implementation - Step 5 • 11:03
Implementation - Step 6 • 9:43
Step four of the TD3 training tutorial samples transitions from the replay buffer to create five batches (states, next states, actions, rewards, and dones) and converts them to PyTorch tensors.
Implementation - Step 7 • 4:26
Implementation - Step 8 • 7:44
Implementation - Step 9 • 3:55
Implementation - Step 10 • 4:08
Implementation - Step 11 • 7:33
Implementation - Step 12 • 4:06
Implement step ten by feeding the current state and action into two critic models to yield Q1 and Q2 and prepare their mean squared error loss against the target.
Implementation - Step 13 • 5:31
Implementation - Step 14 • 6:54
Implementation - Step 15 • 14:20
Implementation - Step 16 • 8:54
Implementation - Step 17 • 6:11
Implementation - Step 18 • 13:30
Implementation - Step 19 • 11:46
Train and evaluate a TD3 agent over 500,000 time steps using replay memory and off-policy learning to improve average rewards across episodes. Start with 10,000 random actions before policy-driven exploration.
Implementation - Step 20 • 5:11
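
The lecture summaries in this section mention three mechanics: sampling batches from a replay buffer, scoring two critics with a mean squared error loss against a shared target, and taking random actions for the first 10,000 steps. The sketch below shows one way these pieces could look. It is not the course's code. It uses plain Python lists instead of PyTorch tensors to stay dependency-free, and the names `sample_batch`, `twin_critic_loss`, `choose_action`, the transition layout, and sampling with replacement are all assumptions.

```python
import random


def sample_batch(buffer, batch_size, rng=random):
    """Draw random transitions and regroup them into five parallel batches."""
    # Assumed layout per transition: (state, next_state, action, reward, done).
    picks = [rng.choice(buffer) for _ in range(batch_size)]  # with replacement
    states, next_states, actions, rewards, dones = (list(c) for c in zip(*picks))
    return states, next_states, actions, rewards, dones


def mse(predictions, targets):
    """Mean squared error between two equal-length lists of numbers."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def twin_critic_loss(q1, q2, targets):
    """Sum of both critics' MSE losses against the same target values."""
    return mse(q1, targets) + mse(q2, targets)


def choose_action(step, policy, state, action_dim, max_action,
                  start_steps=10_000, rng=random):
    """Uniform random actions during warm-up, the policy's action afterwards."""
    if step < start_steps:
        return [rng.uniform(-max_action, max_action) for _ in range(action_dim)]
    return policy(state)


# Hypothetical toy data: two stored transitions with 1-D states and actions.
buffer = [([0.0], [0.1], [1.0], 0.5, 0.0),
          ([0.1], [0.2], [-1.0], 1.0, 1.0)]
states, next_states, actions, rewards, dones = sample_batch(buffer, batch_size=4)
loss = twin_critic_loss(q1=[1.0, 2.0], q2=[1.5, 2.5], targets=[1.0, 3.0])
```

In a PyTorch version, each of the five lists would typically be wrapped in a tensor before being passed to the actor and critic networks.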

Plan of Attack • 4:04
What is Reinforcement Learning? • 11:26
The Bellman Equation • 18:25
The Plan • 2:12
Markov Decision Process • 16:27
Policy vs Plan • 12:55
Explore policy versus plan in a stochastic Markov decision process, showing how randomness and the Bellman equation reshape state values and drive learned policies over preplanned paths.
Living Penalty • 9:47
Q-Learning Intuition • 14:45
Temporal Difference • 19:27
Q-Learning Visualization • 13:31
Explore Q-learning in a gridworld maze in the Artificial Intelligence 2.0 course, visualize Q-values and the learned policy, and see how exploration, randomness, and discounting shape reinforcement learning outcomes.

Requirements

Some basic maths, such as knowing what differentiation or a gradient is
A bit of programming knowledge (classes and objects)

Description

Welcome to Artificial Intelligence 2.0!

In this course, we will learn and implement a new, incredibly smart AI model called the Twin-Delayed DDPG, or TD3, which combines state-of-the-art techniques in Artificial Intelligence, including continuous Double Deep Q-Learning, Policy Gradient, and Actor-Critic. The model is so strong that, for the first time in our courses, we are able to solve the most challenging virtual AI applications (training an ant/spider and a half-humanoid to walk and run across a field).

To approach this model the right way, we structured the course in three parts:

Part 1: Fundamentals
In this part we will study all the fundamentals of Artificial Intelligence which will allow you to understand and master the AI of this course. These include Q-Learning, Deep Q-Learning, Policy Gradient, Actor-Critic and more.
Part 2: The Twin-Delayed DDPG Theory
We will study in depth the whole theory behind the model. You will see the whole construction and training process of the AI through a series of clear visualization slides. Not only will you learn the theory in detail, but you will also build a strong intuition of how the AI learns and works. The fundamentals in Part 1, combined with the very detailed theory of Part 2, will make this highly advanced model accessible to you, and you will eventually be one of the very few people who can master this model.
Part 3: The Twin-Delayed DDPG Implementation
We will implement the model from scratch, step by step, through interactive sessions, a new feature of this course that will have you practice on many coding exercises while we implement the model. By doing them, you will follow the course actively rather than passively, which will allow you to improve your skills effectively. Last but not least, we will do the whole implementation on Colaboratory, or Google Colab, a free hosted notebook platform that lets you code and train AIs without having to install any packages on your machine. In other words, you can be 100% confident that when you press the execute button, the AI will start to train, and you will get the videos of the spider and humanoid running in the end.

So are you ready to embrace AI at full power?

Come join us, never stop learning, and enjoy AI!

Who this course is for:

Data Scientists who want to take their AI Skills to the next level
AI experts who want to expand their field of applications
Engineers who work in technology and automation
Businessmen and companies who want to get ahead of the game
Students in tech-related programs who want to pursue a career in Data Science, Machine Learning, or Artificial Intelligence
Anyone passionate about Artificial Intelligence

Course content

Part 1 - Fundamentals • 9 lectures • 52min
Part 2 - Twin Delayed DDPG Theory • 4 lectures • 51min
Part 3 - Twin Delayed DDPG Implementation • 22 lectures • 3hr 10min
The Final Demo! • 2 lectures • 28min
Annex 1 - Artificial Neural Networks • 8 lectures • 1hr 18min
Annex 2 - Q-Learning • 10 lectures • 2hr 3min
Annex 3 - Deep Q-Learning • 5 lectures • 56min
Congratulations!! Don't forget your Prize :) • 1 lecture • 1min
