Name: Evolutionary AI: Deep Reinforcement Learning in Python (v2)
Rating: 4.9 (148 reviews)

Created by Lazy Programmer Inc., Lazy Programmer Team

Last updated 2/2026

English

What you'll learn

Understand and implement Evolution Strategies (ES) from scratch
Understand and implement Augmented Random Search (ARS) from scratch
Apply evolutionary methods to MuJoCo (physics simulation environment)
Apply evolutionary methods to classic control reinforcement learning environments
Apply evolutionary methods to stock trading and portfolio optimization

Course content

12 sections • 65 lectures • 12h 8m total length

Introduction (3:23)
Explore deep reinforcement learning through evolution strategies and augmented random search, applying them to MuJoCo, CartPole, MountainCar, and finance, while building everything from scratch and boosting your resume.
Outline (2:30)
Introduces two evolutionary algorithms for deep reinforcement learning, evolution strategies and augmented random search, covers online standardization, and explores finance applications including downside-risk portfolio optimization over multiple periods.
Where to get the code (2:18)
Open the Resources tab and click the code link to access the Python notebooks. Use the GitHub link for extra resources and the extra reading .txt file; note that the notebooks are not on GitHub.
How to succeed in this course (3:04)
Discover three guidelines to succeed: ask questions via the Q&A, meet the prerequisites, and get hands-on with handwritten notes or coding.

Reinforcement Learning Terminology (24:48)
Reinforcement Learning Methods and Objectives (13:08)
Explore the bare-minimum reinforcement learning framework: agent and environment, the policy mapping states to actions via a neural network, and optimizing the expected return across multiple episodes using evolution strategies.
Random Search in Python (13:06)
Explore a simple random search reinforcement learning method in Python, using a binary policy on CartPole in Gymnasium and evaluating each candidate by averaging rewards over multiple episodes.
Suggestion Box (3:10)
Share feedback via the suggestion box on Lazy Programmer, detailing your background, course, and perceived difficulty, and request missing explanations or topics like CNNs or transformers.

Online Standardization Section Introduction (8:03)
Introduces online standardization in reinforcement learning, explains why it helps neural networks, and presents online mean and variance updates and data whitening for evolution strategies and augmented random search.
Online Mean Update (17:25)
Learn how to perform online mean updates for standardizing inputs in reinforcement learning, deriving a constant-time update formula and preparing for online variance calculations.
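The constant-time mean update described above can be sketched as follows. This is a minimal illustration rather than the course's code; the class name `RunningMean` is hypothetical.

```python
class RunningMean:
    """Tracks the mean of a stream in O(1) time and memory per sample."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def update(self, x):
        self.n += 1
        # mean_n = mean_{n-1} + (x_n - mean_{n-1}) / n
        self.mean += (x - self.mean) / self.n
        return self.mean


rm = RunningMean()
for x in [2.0, 4.0, 6.0, 8.0]:
    rm.update(x)
print(rm.mean)  # 5.0
```

Only the count and the current mean are stored, so the cost per new observation does not grow with the number of samples seen.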
Online Variance Update (Welford's Algorithm) (19:27)
Explore online variance updates with Welford's algorithm, deriving s_n^2 from s_{n-1}^2, the new sample x_n, and the current and previous means x-bar_n and x-bar_{n-1}.
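A minimal sketch of Welford's update, assuming scalar inputs and the unbiased (n - 1) sample variance. The class name `RunningStats` and the attribute `m2` are illustrative choices, not names taken from the course.

```python
import statistics


class RunningStats:
    """Welford's algorithm: running mean and variance in one pass."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean                 # x_n - mean_{n-1}
        self.mean += delta / self.n           # mean_n
        self.m2 += delta * (x - self.mean)    # (x_n - mean_{n-1}) * (x_n - mean_n)

    @property
    def variance(self):
        # Unbiased sample variance; defined as 0 until two samples are seen.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


data = [1.0, 4.0, 2.0, 8.0, 5.0]
stats = RunningStats()
for x in data:
    stats.update(x)
print(stats.mean, stats.variance, statistics.variance(data))
```

The final line compares the streaming result with the batch value from the standard library's `statistics.variance`.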
Full Covariance Update (Data Whitening) (13:03)

ES Section Introduction (1:50)
Learn how evolution strategies train agents to solve MDPs, and how they compare with hill climbing and covariance matrix adaptation, covering approximate gradient ascent, the Adam optimizer, and MuJoCo.
ES Algorithm (17:16)
Explore evolution strategies in reinforcement learning. Generate multiple random perturbations of the parameters, evaluate the fitness of each, and update the parameter vector with a gradient-like rule, without relying on explicit gradients.
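The loop described above (perturb, evaluate, update) can be sketched on a toy problem. Everything here is an assumption made for illustration: the quadratic `fitness` function stands in for an episode return, rewards are standardized before the update, and the hyperparameter values are arbitrary. It uses only the standard library, whereas the course works with NumPy.

```python
import random


def fitness(theta):
    # Toy objective (an assumption for illustration): maximum at (3, -2).
    return -((theta[0] - 3.0) ** 2 + (theta[1] + 2.0) ** 2)


def evolution_strategy(f, theta, pop_size=50, sigma=0.1, lr=0.01, iters=500, seed=0):
    rng = random.Random(seed)
    theta = list(theta)
    dim = len(theta)
    for _ in range(iters):
        # 1. Sample one Gaussian noise vector per offspring.
        noise = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
        # 2. Evaluate the fitness of each perturbed parameter vector.
        rewards = [f([t + sigma * e for t, e in zip(theta, eps)]) for eps in noise]
        # 3. Standardize rewards so the step does not depend on reward scale.
        mean_r = sum(rewards) / pop_size
        std_r = (sum((r - mean_r) ** 2 for r in rewards) / pop_size) ** 0.5
        if std_r == 0.0:
            std_r = 1.0
        scores = [(r - mean_r) / std_r for r in rewards]
        # 4. Move theta along the score-weighted average of the noise.
        for j in range(dim):
            grad_j = sum(s * eps[j] for s, eps in zip(scores, noise)) / (pop_size * sigma)
            theta[j] += lr * grad_j
    return theta


best = evolution_strategy(fitness, [0.0, 0.0])
print(best)  # should end up close to [3, -2]
```

No derivative of `fitness` is ever computed; the update direction comes entirely from which random perturbations scored well.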
Visualization of Hill Climbing and ES (5:57)
Explore visualizations of evolution strategies and hill climbing, showing how offspring samples converge toward the optimal point and why hill climbing can be slower on simple quadratic functions.
ES Gradient Approximation (16:53)
Learn how evolution strategies approximate gradients to update parameters using a finite population and noise, bridging gradient ascent with ES gradient approximation and enabling optimization with Adam.
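For reference, the standard form of this approximation is written out below. The notation is an assumption rather than the course's: F is the fitness (expected return), theta the parameter vector, sigma the noise scale, N the population size, and alpha the learning rate.

```latex
\nabla_\theta \, \mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)}\!\left[ F(\theta + \sigma \epsilon) \right]
  = \frac{1}{\sigma} \, \mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)}\!\left[ F(\theta + \sigma \epsilon)\, \epsilon \right]
  \approx \frac{1}{N \sigma} \sum_{i=1}^{N} F(\theta + \sigma \epsilon_i)\, \epsilon_i ,
\qquad
\theta \leftarrow \theta + \alpha \, \frac{1}{N \sigma} \sum_{i=1}^{N} F(\theta + \sigma \epsilon_i)\, \epsilon_i .
```

The sum on the right is the finite-population estimate; because it has the shape of a gradient, it can be handed to any gradient-based optimizer such as Adam in place of a plain ascent step.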
Adam Optimizer (7:43)
ES for MuJoCo in Python (pt 1) (24:33)
Implement evolution strategies for MuJoCo in Python, parallelizing environment evaluations with multiprocessing and using Adam for optimization, while building a two-layer network and the parameter handling needed in later lectures.
ES for MuJoCo in Python (pt 2) (27:07)
Continue implementing evolution strategies for MuJoCo in Python using Adam, with live coding insights, an online standard scaler, and practical debugging of params, rewards, and parallel evaluation.
ES for MuJoCo in Python (pt 3) (6:18)
Explore evolution strategies on a MuJoCo environment in Python, with Adam as the optimizer, by defining stub functions, validating shapes, and observing rewards and non-smooth movements.
ES for MuJoCo in Python (pt 4) (5:48)
ES for MuJoCo in Python (pt 5) (8:06)
Finalize the MuJoCo evolution strategies implementation with the Adam optimizer, introducing moving averages (m, v), bias correction, online standardization, and play-mode testing to boost rewards and performance.
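The moving averages (m, v) and bias correction mentioned above can be sketched as a small optimizer class. This is a generic Adam step written for gradient ascent, not the course's implementation. The default hyperparameters are the commonly used ones, and the gradient passed in stands for an ES gradient estimate.

```python
import math


class Adam:
    """Adam for gradient ascent on a flat list of parameters."""

    def __init__(self, dim, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * dim  # moving average of the gradient
        self.v = [0.0] * dim  # moving average of the squared gradient
        self.t = 0            # number of updates so far

    def step(self, params, grad):
        self.t += 1
        updated = []
        for i, (p, g) in enumerate(zip(params, grad)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            # Bias correction: m and v start at zero, so early averages are too small.
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            # "+" because we maximize reward; use "-" to minimize a loss.
            updated.append(p + self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated


opt = Adam(dim=2)
params = opt.step([0.0, 0.0], grad=[2.0, -0.5])
print(params)
```

On the very first step the bias-corrected ratio is approximately the sign of the gradient, so each parameter moves by about `lr` regardless of the gradient's magnitude.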
CMA-ES Theory (8:33)
CMA-ES extends evolution strategies by learning the covariance matrix and step size to adapt sampling, updating the mean and covariance from the top offspring with rank-based weights.
CMA-ES Code Preparation (2:31)
Install the CMA library, initialize a CMA evolution strategy with sigma and adaptation options, and run the loop to evaluate offspring with a fitness function and obtain the best parameters.
CMA-ES Code (21:28)
Implement CMA-ES with the CMA library to optimize neural network parameters. Compare full versus diagonal covariance, enable mirroring, and experiment with population size and sigma to study stability and performance.

ARS Section Introduction (1:11)
A short section on augmented random search (ARS): its relation to evolution strategies, online standardization, gradient estimation via small deltas, and symmetric improvements, with code and performance insights.
ARS Algorithm (25:24)
Explore augmented random search and its ties to evolution strategies, detailing basic random search updates with plus/minus noise, horizon, and gradient-like estimation, plus top-k selection and online standardization.
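The plus/minus perturbations and top-k selection described above can be sketched on a toy objective. This is a simplified illustration in the spirit of ARS, not the course's code: the quadratic `fitness` stands in for an episode return, the update is scaled by the standard deviation of the rewards that were kept, state standardization is left out, and all hyperparameter values are arbitrary assumptions.

```python
import random
import statistics


def fitness(theta):
    # Toy objective (an assumption for illustration): maximum at (3, -2).
    return -((theta[0] - 3.0) ** 2 + (theta[1] + 2.0) ** 2)


def ars(f, theta, n_dirs=16, top_k=8, nu=0.05, lr=0.02, iters=300, seed=0):
    rng = random.Random(seed)
    theta = list(theta)
    dim = len(theta)
    for _ in range(iters):
        rollouts = []
        for _ in range(n_dirs):
            delta = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            # Evaluate the same direction with a plus and a minus perturbation.
            r_plus = f([t + nu * d for t, d in zip(theta, delta)])
            r_minus = f([t - nu * d for t, d in zip(theta, delta)])
            rollouts.append((r_plus, r_minus, delta))
        # Keep the top-k directions, ranked by the better of their two rewards.
        rollouts.sort(key=lambda r: max(r[0], r[1]), reverse=True)
        best = rollouts[:top_k]
        # Scale the step by the spread of the rewards actually used.
        used = [r for r_plus, r_minus, _ in best for r in (r_plus, r_minus)]
        sigma_r = statistics.pstdev(used) or 1.0
        for j in range(dim):
            step = sum((r_plus - r_minus) * d[j] for r_plus, r_minus, d in best)
            theta[j] += lr / (top_k * sigma_r) * step
    return theta


solution = ars(fitness, [0.0, 0.0])
print(solution)  # should end up close to [3, -2]
```

The difference `r_plus - r_minus` acts like a finite-difference slope along each random direction, which is what makes the update gradient-like.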
ARS Gradient Approximation (14:36)
Analyze the ARS gradient approximation and how its update closely estimates gradient descent or ascent, using plus and minus evaluations and a second-order Taylor expansion, and see why this can outperform evolution strategies.
ARS for MuJoCo in Python (11:47)
Exercise Prompt (4:36)
Tackle CartPole-v1, with its four-dimensional state and two discrete actions, keeping the pole upright for a reward of plus one per step, and explore MountainCar and its continuous version.
ARS for CartPole (17:51)
ARS for MountainCar (9:30)
Learn to implement ARS for MountainCar by adapting the CartPole script, tuning exploration via sigma and the learning rate, and observing faster convergence toward solving the environment.
ARS for MountainCarContinuous (5:03)
Apply ARS, an evolution-strategy-based method, to the MountainCarContinuous task in Python, implementing a neural network with continuous action outputs and proper scaling to optimize rewards.

Motivation and Outline (7:31)
Motivate multi-period portfolio optimization with reinforcement learning, where the goal is to make dynamic trading decisions rather than predict prices, and outline static and dynamic portfolio projects using an MDP framework, states, and rewards.
Portfolio Math (17:25)
Compute portfolio returns from closing prices in discrete time, using asset weights that sum to one to obtain the portfolio return, and understand cumulative growth via the product of one plus returns.
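The arithmetic above can be made concrete with a small worked example. The prices and weights are made up for illustration, the function names are hypothetical, and the sketch assumes the portfolio is rebalanced to the same weights every period.

```python
def simple_returns(prices):
    """Per-period simple returns from a list of closing prices."""
    return [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]


def portfolio_returns(asset_returns, weights):
    """Weighted sum of asset returns for each period (weights must sum to one)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    periods = zip(*asset_returns)  # regroup per-asset series into per-period tuples
    return [sum(w * r for w, r in zip(weights, period)) for period in periods]


def cumulative_growth(returns):
    """Growth of one unit of capital: the product of (1 + r) over all periods."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth


prices_a = [100.0, 110.0, 121.0]   # +10%, +10%
prices_b = [50.0, 50.0, 55.0]      #   0%, +10%
rets = portfolio_returns(
    [simple_returns(prices_a), simple_returns(prices_b)], weights=[0.5, 0.5]
)
growth = cumulative_growth(rets)
print(rets, growth)  # approximately [0.05, 0.10] and 1.155
```

Returns compound multiplicatively, which is why cumulative growth uses the product of one plus each return rather than the sum of the returns.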
Sharpe Ratio and Sortino Ratio (20:05)
How Do Actions Work? (1:24)
Translate model actions into real-world trades by adjusting your portfolio to match the model's weights, buying or selling assets to reach the target allocations.
Static Portfolio Optimization: Concepts (11:26)
Explore static portfolio concepts by applying an evolutionary method to function optimization, explaining mean returns, covariance, and portfolio weights, and comparing single-period and multi-period setups with Sharpe or Sortino variants.
Static Portfolio Optimization: Code (31:40)
Explore how evolution strategies optimize static portfolio weights with monthly rebalancing, using a non-differentiable objective like the Sortino ratio and a softmax to derive the weights.
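A rough sketch of the two ingredients named above: a softmax that turns unconstrained parameters into long-only weights summing to one, and a Sortino ratio that can serve as the fitness. The per-period, non-annualized form of the ratio and the zero target return are assumptions for illustration, not necessarily what the course uses.

```python
import math


def softmax(logits):
    """Map unconstrained numbers to positive weights that sum to one."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sortino_ratio(returns, target=0.0):
    """Mean excess return divided by the downside deviation below `target`."""
    n = len(returns)
    mean_excess = sum(r - target for r in returns) / n
    # Only returns below the target contribute to the risk term.
    downside_sq = [min(0.0, r - target) ** 2 for r in returns]
    downside_dev = math.sqrt(sum(downside_sq) / n)
    return mean_excess / downside_dev if downside_dev > 0 else float("inf")


weights = softmax([0.0, 1.0, -1.0])
ratio = sortino_ratio([0.02, -0.01, 0.03, -0.02])
print(weights, ratio)
```

An evolutionary optimizer would perturb the logits, compute the resulting portfolio returns with `softmax(logits)` as weights, and use `sortino_ratio` as the fitness, so the objective never needs to be differentiated.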
How to get the VIP Content (0:55)
Discover how to access the full VIP content by upgrading through Deeplearning courses and requesting access via email with your course title, Udemy name, and sign-up date.

Background Review Section Introduction (6:34)
Compare reinforcement learning to supervised learning as a time-aware, goal-directed loop versus a static function, emphasizing planning for the future. Use self-driving car and maze examples to illustrate data labeling and goals.
Elements of a Reinforcement Learning Problem (20:18)
Define agent and environment with practical examples, then introduce episodes, states, actions, rewards, and state and action spaces, illustrated by tic-tac-toe, Breakout, and grid world.
States, Actions, Rewards, Policies (9:24)
Explore how to encode states and actions in code, incorporate rewards, define stochastic policies, and use epsilon-greedy and softmax to balance exploration and learning in reinforcement learning.
Markov Decision Processes (MDPs) (10:07)
Apply the Markov assumption to define Markov decision processes with states, actions, and rewards. Use state transition probability p(s'|s,a) and environment dynamics to model the agent-environment interaction and enable Q-learning.
The Return (4:56)
Value Functions and the Bellman Equation (9:53)
What does it mean to “learn”? (7:18)
Solving the Bellman Equation with Reinforcement Learning (pt 1) (9:49)
Solving the Bellman Equation with Reinforcement Learning (pt 2) (12:04)
Explore how policy evaluation and improvement drive generalized policy iteration, using Monte Carlo to update Q-values and derive the optimal policy through an argmax over actions.
Epsilon-Greedy (6:09)
Learn how epsilon-greedy balances exploration and exploitation to improve Q-value estimates in reinforcement learning, using random action selection with probability epsilon and greedy actions otherwise.
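The selection rule described above fits in a few lines. This is a minimal sketch that assumes the Q-values for the current state are given as a plain list indexed by action; the function name is hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    # Exploit: index of the largest Q-value (ties go to the lowest index).
    return max(range(len(q_values)), key=q_values.__getitem__)


q = [0.1, 0.9, 0.3]
greedy_action = epsilon_greedy(q, epsilon=0.0)
random_action = epsilon_greedy(q, epsilon=1.0, rng=random.Random(0))
print(greedy_action, random_action)
```

With `epsilon=0.0` the choice is always greedy, and with `epsilon=1.0` it is always uniformly random; practical values sit in between and are often decayed over training.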
Q-Learning (14:15)
Explore Q-learning and temporal difference methods to update Q-values with bootstrapped returns, using epsilon-greedy action selection and off-policy updates to learn the optimal policy.
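The bootstrapped update can be sketched for a tabular problem as follows. The table layout (`Q[state][action]` as nested lists), the function name, and the values of `alpha` and `gamma` are assumptions chosen for illustration.

```python
def q_learning_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step on transition (s, a, r, s_next)."""
    # Off-policy target: bootstrap from the best next action,
    # regardless of which action the behavior policy takes next.
    target = r if done else r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]


Q = [[0.0, 0.0],
     [1.0, 2.0]]
new_value = q_learning_update(Q, s=0, a=1, r=1.0, s_next=1, done=False)
print(new_value)  # approximately 0.28, i.e. 0.1 * (1.0 + 0.9 * 2.0)
```

The `max` over next-state values is what makes the update off-policy: exploration can follow epsilon-greedy while the target still reflects the greedy policy.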
How to Learn Reinforcement Learning (5:56)

Pre-Installation Check (4:12)
Review the pre-installation guidelines: installation lectures are generic, principle-driven, and scalable; learn pip usage, and when to install CNTK, Theano, and OpenAI Gym for reinforcement learning.
Anaconda Environment Setup (20:20)
Learn to set up a Windows data science environment with Anaconda, isolating Python versions and installing essential libraries like TensorFlow, Keras, PyTorch, and OpenAI Gym, plus CUDA-ready tools.
How to install Numpy, Scipy, Matplotlib, Pandas, PyTorch, and TensorFlow (17:22)
Learn to set up a cross-platform data science environment by installing numpy, scipy, matplotlib, pandas, PyTorch, and TensorFlow, using virtual machines or direct installs on Windows, Mac, or Linux.

How to use Github & Extra Coding Tips (Optional) (11:12)
How to Code Yourself (part 1) (15:54)
Code by yourself to implement algorithms and build muscle memory through practice, using x and y data. In supervised learning, use the fit and predict methods to train and evaluate different models.
How to Code Yourself (part 2) (9:23)
Practice test-driven development by writing tests first to shape API design and guide implementation. Alternate theory and code, implement yourself, and build intuition through hands-on coding and testing.
Proof that using Jupyter Notebook is the same as not using it (12:29)
Learn why Jupyter Notebook offers no real advantage: Python code runs identically in a notebook, the console, or IPython, and you should rely on print statements to verify behavior.

Requirements

Python programming with numerical computing libraries (e.g. Numpy)
Building neural networks (backpropagation not required)
Calculus, linear algebra, and probability are useful

Description

Discover the cutting edge of reinforcement learning with a fresh, evolutionary approach. In this course, you’ll master Evolution Strategies (ES) and Augmented Random Search (ARS) - two powerful algorithms that bypass many of the challenges of traditional deep RL, while still achieving state-of-the-art results.

Unlike gradient-heavy methods, these algorithms are simple, scalable, and surprisingly effective. You’ll implement them from scratch in Python and apply them to exciting real-world problems:

MuJoCo Environments: Train agents to walk, run, and jump in a physics-based simulation that’s widely used in robotics research. Watching your neural network–powered agent learn to control a simulated robot is one of the most rewarding experiences in reinforcement learning.
Algorithmic Trading: Apply evolutionary RL to trading strategies, where direct gradients are difficult to define. You’ll see how these algorithms adapt naturally to noisy, complex environments like financial markets.

By the end of this course, you’ll have:

A deep understanding of ES and ARS, and how they compare to policy gradients and Q-learning.
Working Python implementations you can extend to your own projects.
The skills to leverage evolutionary AI in domains ranging from robotics to finance.

If you’re ready to move beyond the usual deep RL algorithms and explore approaches that are elegant, efficient, and highly practical, this course is for you.

Tools and Libraries

Python (with full code walkthroughs)
Gymnasium (formerly OpenAI Gym)
NumPy, Matplotlib

Why This Course?

Version 2 updates: Streamlined content, clearer explanations, and updated libraries.
Real implementations: Go beyond theory by building working agents — no black boxes.
For all levels: Includes a dedicated review section for beginners and deep dives for advanced learners.
Proven structure: Designed by an experienced instructor who has taught thousands of students to success in AI and machine learning.

Who Should Take This Course?

Data Scientists and ML Engineers who want to break into Reinforcement Learning
Students and Researchers looking to apply RL in academic or practical projects
Developers who want to build intelligent agents or AI-powered games
Anyone fascinated by how machines can learn through interaction

Join thousands of learners and start mastering Reinforcement Learning today — from theory to full implementations of agents that think, learn, and play.

Enroll now and take your AI skills to the next level!

Who this course is for:

Machine Learning & AI enthusiasts who want to explore one of the most exciting fields in AI: reinforcement learning
Software developers and engineers looking to build intelligent agents that learn from experience
Quantitative finance professionals interested in applying RL to portfolio optimization and algorithmic trading
Students and researchers studying AI, computer science, or data science who want hands-on experience with real RL implementations
Game developers curious about using RL to train AI for complex behaviors and adaptive gameplay
Robotics practitioners who want to learn how agents can make sequential decisions in physical environments
Data scientists aiming to expand their toolkit beyond supervised learning / unsupervised learning
Traders and investors looking to apply cutting-edge AI methods to automated trading strategies
Entrepreneurs and hobbyists eager to experiment with advanced AI models and build projects that learn and adapt over time
Professionals switching careers into AI/ML and looking for portfolio-ready, real-world projects

Course content

Welcome: 4 lectures • 11min

Reinforcement Learning Basics: 4 lectures • 54min

Online Standardization: 4 lectures • 58min

Evolution Strategies (ES): 13 lectures • 2hr 34min

Augmented Random Search (ARS): 8 lectures • 1hr 30min

Evolutionary Portfolio Optimization (VIP Preview): 7 lectures • 1hr 30min

Background Review: 12 lectures • 1hr 57min

Appendix / FAQ Intro: 1 lecture • 4min

Setting Up Your Environment (FAQ): 3 lectures • 42min

Extra Help With Python Coding for Beginners (FAQ): 4 lectures • 49min
