
Explore deep reinforcement learning through evolution strategies and augmented random search, applying them to MuJoCo, CartPole, Mountain Car, and finance, while building everything from scratch and boosting your resume.
The outline introduces two evolutionary algorithms, evolution strategies and augmented random search, within deep reinforcement learning, covers online standardization, and explores finance applications including downside risk portfolio optimization over multiple periods.
Open the resources tab and click the code link to access the Python notebooks. Use the GitHub link for extra resources and the extra reading .txt file; note that the notebooks themselves are not on GitHub.
Discover three guidelines to succeed: ask questions via the Q&A, meet the prerequisites, and get hands-on with handwritten notes or coding.
Explore the bare-minimum reinforcement learning framework: agent and environment, the policy mapping states to actions via a neural network, and optimizing the expected return across multiple episodes using evolution strategies.
Explore a simple random search reinforcement learning method in Python, using a binary policy on CartPole in Gymnasium and evaluating each candidate by averaging rewards over multiple episodes.
Share feedback via the suggestion box on Lazy Programmer, detailing your background, course, and perceived difficulty, and request missing explanations or topics like CNNs or transformers.
Introduces online standardization in reinforcement learning, explains why it helps neural networks, and presents online mean and variance updates and data whitening for evolution strategies and augmented random search.
Learn how to perform online mean updates for standardizing inputs in reinforcement learning, deriving a constant-time update formula and preparing for online variance calculations.
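As a rough illustration of the constant-time mean update described above, the sketch below folds one sample at a time into a running mean. The function name `update_mean` and the sample values are illustrative assumptions, not taken from the course notebooks.

```python
def update_mean(prev_mean, x, n):
    """Return the mean of n samples, given the mean of the first n - 1 and the n-th sample x."""
    # mean_n = mean_{n-1} + (x_n - mean_{n-1}) / n, so no past samples need to be stored
    return prev_mean + (x - prev_mean) / n


samples = [2.0, 4.0, 6.0, 8.0]  # made-up data for the demo
mean = 0.0
for n, x in enumerate(samples, start=1):
    mean = update_mean(mean, x, n)

print(mean)  # matches sum(samples) / len(samples)
```

Each update costs O(1) time and memory, which is what makes it practical to standardize observations as they stream in from an environment.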
Explore online variance updates with Welford's algorithm, deriving s_n^2 from s_{n-1}^2, the new sample x_n, and the current and previous means x-bar_n and x-bar_{n-1}.
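A minimal sketch of a Welford-style online variance update follows, assuming scalar inputs and the unbiased (n - 1) denominator. The class name `RunningStats` and its methods are hypothetical and may differ from the course's own scaler.

```python
import math


class RunningStats:
    """Tracks mean and variance one sample at a time (Welford-style update)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        prev_mean = self.mean
        self.mean = prev_mean + (x - prev_mean) / self.n
        # Uses both the previous and the updated mean, as in Welford's recurrence
        self.m2 += (x - prev_mean) * (x - self.mean)

    def variance(self):
        # Sample variance; defined as 0.0 here until at least two samples arrive
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

    def standardize(self, x, eps=1e-8):
        # eps is an assumed guard against division by zero
        return (x - self.mean) / (math.sqrt(self.variance()) + eps)


stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(value)

print(stats.mean, stats.variance())
```

For vector observations the same recurrence can be applied independently to each component.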
Learn how evolution strategies train agents to solve MDPs, compare hill climbing with covariance matrix adaptation, and preview approximate gradient ascent, the Adam optimizer, and MuJoCo.
Explore evolution strategies in reinforcement learning. Generate multiple random inputs, evaluate fitness, and update the parameter vector with a gradient-like rule, without relying on explicit gradients.
Explore visualizations of evolution strategies and hill climbing, showing how offspring samples converge toward the optimal point and why hill climbing can be slower on simple quadratic functions.
Learn how evolution strategies approximate gradients to update parameters using a finite population and noise, bridging gradient ascent with ES gradient approximation and enabling optimization with Adam.
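The ES gradient approximation can be sketched on a toy problem. The example below assumes a simple quadratic fitness with a known optimum, isotropic Gaussian noise, and a mean-fitness baseline for variance reduction. All names (`fitness`, `es_gradient`) and hyperparameter values are illustrative assumptions, and plain gradient ascent stands in for Adam to keep the sketch short.

```python
import random


def fitness(theta, target=(3.0, -2.0)):
    """Toy objective (assumed for the demo): highest when theta equals target."""
    return -sum((t - g) ** 2 for t, g in zip(theta, target))


def es_gradient(theta, sigma, population, rng):
    """Estimate the gradient of the fitness from noisy evaluations only."""
    noises = [[rng.gauss(0.0, 1.0) for _ in theta] for _ in range(population)]
    scores = [fitness([t + sigma * e for t, e in zip(theta, eps)]) for eps in noises]
    baseline = sum(scores) / population  # assumed variance-reduction baseline
    grad = [0.0] * len(theta)
    for eps, score in zip(noises, scores):
        for j, e in enumerate(eps):
            grad[j] += (score - baseline) * e
    # grad is approximately 1 / (N * sigma) * sum_i F(theta + sigma * eps_i) * eps_i
    return [g / (population * sigma) for g in grad]


rng = random.Random(0)
theta = [0.0, 0.0]
for _ in range(200):
    grad = es_gradient(theta, sigma=0.1, population=50, rng=rng)
    theta = [t + 0.05 * g for t, g in zip(theta, grad)]  # plain gradient ascent

print(theta)  # should end up close to the target (3, -2)
```

Because only fitness values are used, the same loop works when the objective is an episode return that cannot be differentiated.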
Implement evolution strategies for MuJoCo in Python, parallelizing environment evaluations with multiprocessing, optimizing with Adam, and building a two-layer network with parameter handling for later episodes.
Continue implementing evolution strategies for MuJoCo in Python using Adam, with live coding insights, an online standard scaler, and practical debugging of params, rewards, and parallel evaluation.
Explore evolution strategies on a MuJoCo environment in Python, with Adam as an optimizer, by defining stub functions, validating shapes, and observing rewards and non-smooth movements.
The lecture finalizes MuJoCo evolution strategies with the Adam optimizer, introducing moving averages (m, v), bias correction, online standardization, and play-mode testing to boost rewards and performance.
CMA-ES extends evolution strategies by learning the covariance matrix and step size to adapt sampling, updating the mean and covariance from top offspring with weighted ranks.
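The moving averages (m, v) and bias correction mentioned above can be sketched as a small Adam class written for gradient ascent. The class layout and default hyperparameters are assumptions for illustration and are not the lecture's code.

```python
import math


class Adam:
    """Minimal Adam optimizer for gradient ascent on a flat parameter list."""

    def __init__(self, size, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * size  # moving average of gradients
        self.v = [0.0] * size  # moving average of squared gradients
        self.t = 0             # step counter used for bias correction

    def step(self, params, grad):
        self.t += 1
        updated = []
        for i, (p, g) in enumerate(zip(params, grad)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            # Bias correction compensates for m and v starting at zero
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            # Plus sign: ascent, because ES maximizes reward
            updated.append(p + self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated


optimizer = Adam(size=2, lr=0.1)
first = optimizer.step([0.0, 0.0], [2.0, -4.0])
print(first)
```

On the first step the bias-corrected ratio reduces to roughly the sign of the gradient, so each parameter moves by about `lr` regardless of the gradient's scale.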
Install the CMA library, initialize a CMA evolution strategy with sigma and adaptation options, and run the loop to evaluate offspring with a fitness function and obtain the best parameters.
Implement CMA-ES with the CMA library to optimize neural network parameters. Compare full covariance versus diagonal, enable mirroring, and experiment with population size and sigma to study stability and performance.
Explore augmented random search (ARS) in a short section that builds on evolution strategies, covering online standardization, gradient estimation via small deltas, and symmetric plus/minus perturbations, with code and performance insights.
Explore augmented random search and its ties to evolution strategies, detailing basic random search updates with plus/minus noise, horizon, and gradient-like estimation, plus top-k selection and online standardization.
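The plus/minus update with top-k selection can be sketched on a toy objective. The example below assumes a quadratic reward in place of an episode return, and scales the step by the standard deviation of the selected rewards. Function names, hyperparameters, and the small epsilon guard are illustrative assumptions; online standardization of states is omitted because the toy problem has no states.

```python
import random
import statistics

TARGET = (3.0, -2.0)  # assumed optimum of the toy reward


def reward(theta):
    """Stand-in for an episode return: highest when theta equals TARGET."""
    return -sum((t - g) ** 2 for t, g in zip(theta, TARGET))


def ars_step(theta, rng, n_directions=8, top_b=4, nu=0.2, alpha=0.02):
    rollouts = []
    for _ in range(n_directions):
        delta = [rng.gauss(0.0, 1.0) for _ in theta]
        r_plus = reward([t + nu * d for t, d in zip(theta, delta)])
        r_minus = reward([t - nu * d for t, d in zip(theta, delta)])
        rollouts.append((r_plus, r_minus, delta))
    # Keep the directions whose better side scored highest
    rollouts.sort(key=lambda r: max(r[0], r[1]), reverse=True)
    best = rollouts[:top_b]
    # Standard deviation of the 2 * top_b rewards used in the update
    sigma_r = statistics.pstdev([r for rp, rm, _ in best for r in (rp, rm)])
    step = [0.0] * len(theta)
    for r_plus, r_minus, delta in best:
        for j, d in enumerate(delta):
            step[j] += (r_plus - r_minus) * d
    scale = alpha / (top_b * (sigma_r + 1e-8))  # epsilon guard is an assumption
    return [t + scale * s for t, s in zip(theta, step)]


rng = random.Random(0)
theta = [0.0, 0.0]
for _ in range(300):
    theta = ars_step(theta, rng)

print(theta)  # should end up close to TARGET
```

Dividing by the reward standard deviation keeps the step size roughly independent of the reward scale, which reduces the need to retune the learning rate per environment.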
Analyze the ARS gradient approximation and how its update closely estimates gradient descent or ascent: plus and minus evaluations combined with a second-order Taylor expansion show why the estimate is accurate and how ARS can outperform evolution strategies.
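One way to see why the plus/minus difference works is the following worked expansion. The symbols are assumptions for illustration: F is the expected return, theta the parameter vector, delta a random direction with identity covariance, nu the perturbation size, and H the Hessian of F at theta.

```latex
\begin{aligned}
F(\theta + \nu\delta) &= F(\theta) + \nu\,\delta^\top \nabla F(\theta)
    + \tfrac{\nu^2}{2}\,\delta^\top H\,\delta + O(\nu^3) \\
F(\theta - \nu\delta) &= F(\theta) - \nu\,\delta^\top \nabla F(\theta)
    + \tfrac{\nu^2}{2}\,\delta^\top H\,\delta + O(\nu^3) \\
F(\theta + \nu\delta) - F(\theta - \nu\delta) &= 2\nu\,\delta^\top \nabla F(\theta) + O(\nu^3) \\
\mathbb{E}_\delta\!\left[\frac{F(\theta + \nu\delta) - F(\theta - \nu\delta)}{2\nu}\,\delta\right]
    &\approx \mathbb{E}_\delta\!\left[\delta\,\delta^\top\right]\nabla F(\theta) = \nabla F(\theta)
\end{aligned}
```

The constant and second-order terms cancel in the difference, so the symmetric estimate is accurate up to third-order terms in nu.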
Tackle CartPole-v1, with its four-dimensional state and two discrete actions, keeping the pole upright for a reward of +1 per step, and explore Mountain Car and its continuous version.
Learn to implement ARS for Mountain Car by adapting a CartPole script, tuning exploration via sigma and learning rate, and observing faster convergence toward solving the environment.
Apply evolution-strategy-based ARS to the continuous Mountain Car task in Python, implementing a neural network with continuous action outputs and proper scaling to optimize rewards.
Motivate multi-period portfolio optimization with reinforcement learning to make dynamic trading decisions, not predict prices, and outline static and dynamic portfolio projects using an MDP framework, states, and rewards.
Compute portfolio returns from close prices in discrete time, using asset weights that sum to one to obtain the portfolio return and understand cumulative growth via one plus returns.
Translate model actions into real-world trades by adjusting your portfolio to match the model's weights, buying or selling assets to reach the target allocations.
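A small worked example of the return calculation follows. The tickers, prices, and weights are made up for illustration, and the sketch assumes the portfolio is rebalanced to the same weights every period.

```python
def simple_returns(prices):
    """Per-period simple returns: p_t / p_{t-1} - 1."""
    return [later / earlier - 1.0 for earlier, later in zip(prices, prices[1:])]


close = {"A": [100.0, 110.0, 99.0], "B": [50.0, 50.0, 55.0]}  # hypothetical close prices
weights = {"A": 0.6, "B": 0.4}                                # weights sum to one

asset_returns = {name: simple_returns(series) for name, series in close.items()}
n_periods = len(next(iter(asset_returns.values())))

# Portfolio return per period is the weighted sum of asset returns
portfolio_returns = [
    sum(weights[name] * asset_returns[name][t] for name in weights)
    for t in range(n_periods)
]

# Cumulative growth multiplies (1 + return) across periods
growth = 1.0
for r in portfolio_returns:
    growth *= 1.0 + r

print(portfolio_returns, growth)
```

Here the portfolio earns about 6% and then loses about 2%, so one unit of capital grows to roughly 1.06 × 0.98 = 1.0388.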
Explore static portfolio concepts by applying an evolutionary method to function optimization, explaining mean returns, covariance, and portfolio weights, and comparing single-period and multi-period setups with Sharpe or Sortino variants.
Explore how evolution strategies optimize static portfolio weights with monthly rebalancing, using a non-differentiable objective like the Sortino ratio and softmax to derive weights.
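The objective described above can be sketched as a fitness function that maps unconstrained parameters to weights with softmax and scores them with a Sortino-style ratio. The downside deviation here is computed against a target return of zero over all periods, which is one common convention and an assumption on my part; the names and the sample returns are hypothetical.

```python
import math


def softmax(params):
    """Map unconstrained parameters to positive weights that sum to one."""
    peak = max(params)
    exps = [math.exp(p - peak) for p in params]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def sortino_ratio(returns, target=0.0):
    """Mean excess return divided by downside deviation (assumed convention)."""
    excess = [r - target for r in returns]
    mean_excess = sum(excess) / len(excess)
    downside = math.sqrt(sum(min(e, 0.0) ** 2 for e in excess) / len(excess))
    return float("inf") if downside == 0.0 else mean_excess / downside


def portfolio_fitness(params, asset_returns):
    """Fitness an evolutionary optimizer could maximize; no gradients required."""
    weights = softmax(params)
    portfolio = [sum(w * r for w, r in zip(weights, period)) for period in asset_returns]
    return sortino_ratio(portfolio)


# Hypothetical monthly returns: one row per month, one column per asset
monthly = [[0.02, -0.01], [-0.03, 0.01], [0.04, 0.00]]
score = portfolio_fitness([0.0, 0.0], monthly)
print(softmax([0.0, 0.0]), score)
```

Because the `min(e, 0.0)` term is not smooth, this kind of objective is awkward for gradient-based training but poses no problem for a method that only needs fitness values.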
Discover how to access the full VIP content by upgrading through Deeplearning courses and request access via email with your course title, Udemy name, and sign up date.
Compare reinforcement learning to supervised learning as a time-aware, goal-directed loop versus a static function, emphasizing planning for the future. Use self-driving car and maze examples to illustrate data labeling and goals.
Define agent and environment with practical examples, then introduce episodes, states, actions, rewards, and state and action spaces, illustrated by tic tac toe, breakout, and grid world.
Explore how to encode states and actions in code, incorporate rewards, define stochastic policies, and use epsilon-greedy and softmax to balance exploration and learning in reinforcement learning.
Apply the Markov assumption to define Markov decision processes with states, actions, and rewards. Use state transition probability p(s'|s,a) and environment dynamics to model the agent-environment interaction and enable Q-learning.
Explore how policy evaluation and improvement drive generalized policy iteration, using Monte Carlo to update q values and derive the optimal policy through arg max over actions.
Learn how epsilon-greedy balances exploration and exploitation to improve Q-value estimates in reinforcement learning, using random action selection with probability epsilon and greedy actions otherwise.
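A minimal sketch of epsilon-greedy action selection follows, assuming Q-values for one state are stored in a list indexed by action. The function name and signature are illustrative.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    # Exploit: index of the largest Q-value (first one wins on ties)
    return max(range(len(q_values)), key=lambda a: q_values[a])


q_for_state = [0.1, 0.7, 0.3]  # made-up Q-values for three actions
greedy_action = epsilon_greedy(q_for_state, epsilon=0.0)
random_action = epsilon_greedy(q_for_state, epsilon=1.0, rng=random.Random(0))
print(greedy_action, random_action)
```

Setting epsilon to 0 recovers the purely greedy policy, and setting it to 1 gives uniformly random actions.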
Explore Q-learning and temporal difference methods to update Q-values with bootstrapped returns, using epsilon-greedy action selection and off-policy updates to learn the optimal policy.
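The bootstrapped update can be sketched for a tabular Q-function, assuming Q is a dictionary mapping each state to a list of action values. The function name, the `done` flag, and the sample numbers are illustrative assumptions.

```python
def q_learning_update(q, state, action, reward, next_state, done, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update in place and return the TD error."""
    # Off-policy target: bootstrap from the best next action, whatever was actually taken
    bootstrap = 0.0 if done else gamma * max(q[next_state])
    td_error = reward + bootstrap - q[state][action]
    q[state][action] += alpha * td_error
    return td_error


q_table = {0: [0.0, 0.0], 1: [1.0, 2.0]}  # made-up values: two states, two actions
error = q_learning_update(q_table, state=0, action=1, reward=1.0, next_state=1, done=False)
print(q_table[0], error)
```

With these numbers the target is 1 + 0.9 × 2 = 2.8, so the updated value is 0 + 0.5 × 2.8 = 1.4.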
Learn the purpose of the appendix and FAQ, why this section is optional, and how to use the Q&A to get answers, ensuring you have zero questions by course end.
Review the pre-installation guidelines: installation lectures are generic, principle-driven, and scalable; learn pip usage, and when to install CNTK, Theano, and OpenAI Gym for reinforcement learning.
Learn to set up a Windows data science environment with Anaconda, isolating Python versions and installing essential libraries like TensorFlow, Keras, PyTorch, and OpenAI Gym, plus CUDA-ready tools.
Learn to set up a cross-platform data science environment by installing numpy, scipy, matplotlib, pandas, PyTorch, and TensorFlow, using virtual machines or direct installs on Windows, Mac, or Linux.
Code by yourself to implement algorithms and build muscle memory through practice, using x and y data. In supervised learning, use the fit and predict methods to train and apply models consistently.
Practice test-driven development by writing tests first to shape API design and guide implementation. Alternate theory and code, implement yourself, and build intuition through hands-on coding and testing.
Learn why Jupyter notebook offers no real advantage; Python code runs identically in notebook, console, or IPython, and you should rely on print statements to verify behavior.
Discover the cutting edge of reinforcement learning with a fresh, evolutionary approach. In this course, you’ll master Evolution Strategies (ES) and Augmented Random Search (ARS) - two powerful algorithms that bypass many of the challenges of traditional deep RL, while still achieving state-of-the-art results.
Unlike gradient-heavy methods, these algorithms are simple, scalable, and surprisingly effective. You’ll implement them from scratch in Python and apply them to exciting real-world problems:
MuJoCo Environments: Train agents to walk, run, and jump in a physics-based simulation that’s widely used in robotics research. Watching your neural network–powered agent learn to control a simulated robot is one of the most rewarding experiences in reinforcement learning.
Algorithmic Trading: Apply evolutionary RL to trading strategies, where direct gradients are difficult to define. You’ll see how these algorithms adapt naturally to noisy, complex environments like financial markets.
By the end of this course, you’ll have:
A deep understanding of ES and ARS, and how they compare to policy gradients and Q-learning.
Working Python implementations you can extend to your own projects.
The skills to leverage evolutionary AI in domains ranging from robotics to finance.
If you’re ready to move beyond the usual deep RL algorithms and explore approaches that are elegant, efficient, and highly practical, this course is for you.
Tools and Libraries
Python (with full code walkthroughs)
Gymnasium (formerly OpenAI Gym)
NumPy, Matplotlib
Why This Course?
Version 2 updates: Streamlined content, clearer explanations, and updated libraries.
Real implementations: Go beyond theory by building working agents — no black boxes.
For all levels: Includes a dedicated review section for beginners and deep dives for advanced learners.
Proven structure: Designed by an experienced instructor who has taught thousands of students to success in AI and machine learning.
Who Should Take This Course?
Data Scientists and ML Engineers who want to break into Reinforcement Learning
Students and Researchers looking to apply RL in academic or practical projects
Developers who want to build intelligent agents or AI-powered games
Anyone fascinated by how machines can learn through interaction
Join thousands of learners and start mastering Reinforcement Learning today — from theory to full implementations of agents that think, learn, and play.
Enroll now and take your AI skills to the next level!