Mathematics Behind Backpropagation | Theory and Python Code

Rating: 4.7 (85 reviews)

Work through backpropagation and gradient descent by hand on your own neural network, then code it in Python without any libraries

Created by Patrik Szepesi

Last updated 1/2025

English

What you'll learn

Understand and Implement Backpropagation by Hand and Code
Understand the Mathematical Foundations of Neural Networks
Build and Train Your Own Feedforward Neural Network in Python without any Libraries
Explore Common Pitfalls in Backpropagation
Numerically Calculate Derivatives, Partial Derivatives and Gradients through Examples
Find the Derivatives of Loss Functions and Activation Functions
Understand What Derivatives Are
Visualize Gradient Descent in Action
Implement Gradient Descent by Hand
Use Python to code Multiple Neural Networks
Understand How Partial Derivatives Work in Backpropagation
Understand Gradients and How they guide Machines to Learn
Learn Why we Use Activation Functions
Understand the Role of Learning Rates in Gradient Descent

Course content

4 sections • 40 lectures • 4h 37m total length

What is this Course • 2:58
Master backpropagation by building a simple neural network and implementing gradient descent in Python, working through the learning rate, partial derivatives, gradients, the chain rule, and mean squared error.

Introduction to Our Simple Neural Network • 6:52
Explore a simple neural network with one input, two weights, and a true label. Trace backpropagation and gradient descent toward the target output, using a hidden layer and computation graph.
Why We Use Computational Graphs • 6:18
Conducting the Forward Pass • 6:54
Perform the forward pass of a neural network before updating weights, calculating y_hat from x, w1, and w2, and use mean squared error to guide updates.
Roadmap to Understanding Backpropagation • 2:46
Explore the roadmap to understanding backpropagation by linking derivatives, partial derivatives, gradients, and gradient descent to reduce neural network loss and update weights.
Derivatives Theory • 4:27
Explore derivatives as measures of a function’s rate of change, illustrated by slopes of tangent lines and numerical examples, laying foundations before partial derivatives and gradients in backpropagation.
Numerical Example of Derivatives • 13:43
Explore a numerical derivative example using f(x)=x^3-3x^2+2x+1 to illustrate the power rule, tangent slope, rate of change, and backpropagation intuition with small delta x.
Understanding Partial Derivatives • 8:02
Learn how partial derivatives isolate each input in multivariable functions, revealing how x, y, or z changes the output, and relate these rates to weight updates in backpropagation.
Understanding Gradients • 3:52
Learn that a gradient collects all partial derivatives and forms the gradient vector. It points to the steepest increase, guiding gradient descent toward the minimum.
Understanding What Partial Derivatives Do (Example) • 10:18
Learn how partial derivatives reveal the rate of change for each input on f(x,y)=x^2+y^2, with partial derivatives 6 and 8 at x=3, y=4, giving the gradient (6, 8).
Introduction to Backpropagation • 5:00
Explore backpropagation by deriving the gradients of the loss with respect to w1 and w2 via the chain rule, after the forward pass with mean squared error.
Understanding the Chain Rule (Optional) • 7:35
Explore the chain rule for composite functions and how multiplying partial derivatives reveals how weights influence the loss in neural networks.
Gradient Derivation of the Mean Squared Error Loss Function • 7:35
Compute the gradient of the mean squared error with respect to y_hat using the chain rule, yielding dL/dy_hat = y_hat - y to guide backpropagation and adjust W1 and W2.
Visualizing the Loss Function + Gradients • 11:42
Visualize the loss function with respect to y_hat, compute its gradient and partial derivative, then use backpropagation and chain rule to adjust weights via gradient descent.
Using the Chain Rule to Calculate the Gradient of w2 • 19:02
Using the Chain Rule to Calculate the Gradient of w1 • 4:38
Use the chain rule to compute the gradient of w1 from the partial derivative of y hat, link it to the loss, and prepare backpropagation and gradient descent.
Visualizing Gradient Descent • 10:14
Visualize gradient descent on a 3D loss surface for w1 and w2, tracking how initial weights update iteratively via backpropagation to minimize loss and approach the ideal solution.
Introduction to Gradient Descent • 6:14
Learn gradient descent: minimize the loss function like mean squared error by updating weights w1 and w2 opposite the gradient, scaled by the learning rate alpha.
Understanding the Learning Rate (Alpha) • 8:19
Understand how the learning rate alpha sets the step size in gradient descent, preventing overshoot and slow progress, and how adaptive optimizers like Adagrad, RMSprop, and Adam adjust alpha.
Moving in the Opposite Direction of the Gradient • 5:35
Explore how moving in the opposite direction of the gradient updates weights and predictions via gradient descent, using positive and negative gradients to reduce loss.
Calculating Gradient Descent by Hand • 8:47
Compute the gradients of the loss with respect to w1 and w2 via the chain rule, then update with gradient descent (alpha 0.01) to reduce the mean squared error.
Coding our Simple Neural Network Part 1 • 4:34
Implement gradient descent and backpropagation from scratch in Google Colab using numpy, performing forward pass and mean squared error for a net with x=2, y=20, w1=2, w2=0.5, learning rate of 0.01.
Coding our Simple Neural Network Part 2 • 7:20
Implement backpropagation by computing the gradients (partial derivatives) of the loss with respect to w1 and w2 using the chain rule, based on the forward pass and y hat.
Coding our Simple Neural Network Part 3 • 6:49
Implement a training loop for a simple neural network, initializing weights, performing forward passes, calculating mean squared error, and computing gradients for backpropagation before updating weights in gradient descent.
Coding our Simple Neural Network Part 4 • 5:09
Implement gradient descent to update neural network weights w1 and w2 using gradients dL/dW1 and dL/dW2, perform forward passes to observe loss reduction across ten training epochs.
Coding our Simple Neural Network Part 5 • 5:33
Demonstrate backpropagation by running a simple neural network, update weights w1 and w2 through gradient descent, and confirm loss decreases toward zero as y hat approaches 20.
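
The simple-network lectures above can be condensed into a short sketch. It is not the course's own code. It assumes the network computes h = w1 * x and y_hat = w2 * h, which the listing does not state explicitly, and it uses the loss L = 0.5 * (y_hat - y)^2 so that dL/dy_hat = y_hat - y, as quoted in the loss-derivation lecture. The function names are illustrative. The starting values (x=2, y=20, w1=2, w2=0.5, learning rate 0.01, ten epochs) come from the lecture descriptions.

```python
# Minimal sketch in plain Python, no libraries.
# Assumed structure (not confirmed by the listing): h = w1 * x, y_hat = w2 * h.
# Assumed loss: L = 0.5 * (y_hat - y)**2, which gives dL/dy_hat = y_hat - y.

def forward(x, w1, w2):
    """Forward pass: return the hidden value h and the prediction y_hat."""
    h = w1 * x
    y_hat = w2 * h
    return h, y_hat


def loss_fn(y_hat, y):
    """Squared error with a 1/2 factor."""
    return 0.5 * (y_hat - y) ** 2


def gradients(x, y, w1, w2):
    """Backward pass: chain rule from the loss back to each weight."""
    h, y_hat = forward(x, w1, w2)
    dL_dy_hat = y_hat - y
    dL_dw2 = dL_dy_hat * h        # dy_hat/dw2 = h
    dL_dw1 = dL_dy_hat * w2 * x   # dy_hat/dh = w2 and dh/dw1 = x
    return dL_dw1, dL_dw2


x, y = 2.0, 20.0     # input and true label
w1, w2 = 2.0, 0.5    # initial weights
alpha = 0.01         # learning rate

losses = []
for epoch in range(10):
    _, y_hat = forward(x, w1, w2)
    losses.append(loss_fn(y_hat, y))
    dL_dw1, dL_dw2 = gradients(x, y, w1, w2)
    # Step against the gradient, scaled by the learning rate.
    w1 -= alpha * dL_dw1
    w2 -= alpha * dL_dw2
    print(epoch + 1, round(y_hat, 4), round(losses[-1], 4))

_, final_y_hat = forward(x, w1, w2)
print("y_hat after training:", final_y_hat)
```

Under these assumptions the first forward pass gives y_hat = 2 and a loss of 162, and the first gradients are -18 for w1 and -72 for w2, so both weights increase on the first update.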

Introduction to Our Advanced Neural Network • 5:35
Demonstrate backpropagation and gradient descent on a network with two inputs and sigmoid activation. Trace the forward pass, composite functions, and mean squared error to show activation and non-linearity.
Conducting the Forward Pass • 4:31
Conduct the forward pass to compute h1 and y hat using sigmoid activations with weights w1, w2, and w3. Then assess the mean squared error loss before moving to backpropagation.
Getting Started with Backpropagation • 4:54
Learn to compute backpropagation gradients for weights W1, W2, W3, propagate the error through the sigmoid to the MSE loss, and update with gradient descent at learning rate 0.1.
Getting the Derivative of the Sigmoid Activation Function (Optional) • 7:42
Derive the sigmoid derivative using the chain rule, showing dyhat/dz2 equals yhat times (1 minus yhat) and explaining how small changes in z2 affect yhat.
Implementing Backpropagation with the Chain Rule • 4:55
Apply the chain rule to relate z2 to the loss by combining the gradient of y hat with respect to z2 and the gradient of the loss with respect to y hat.
Understanding How w3 Affects the Final Loss • 6:10
Apply the chain rule: dL/dW3 = dL/dZ2 × dZ2/dW3 = -0.11222 × 0.3775 ≈ -0.04236, with h1 = 0.3775, so a negative gradient suggests increasing W3 to minimize the loss.
Calculating Gradients For Z1 • 7:42
The lecture demonstrates how to compute the gradient dL/dz1 by chaining partial derivatives: dz2/dh1 = w3 and dh1/dz1 = h1(1−h1). Using dL/dz2 = -0.11222, w3 = 0.5, and h1 = 0.3775, it yields dL/dz1 ≈ -0.01318 and shows how backpropagation proceeds toward W1 and W2.
Understanding How w1 & w2 Affect the Loss • 5:06
Implementing Gradient Descent By Hand • 8:34
Coding our Advanced Neural Network Part 1 (Implementing Forward Pass + Loss) • 7:05
Demonstrate a forward pass and loss calculation for an advanced neural network in a Colab notebook, implementing sigmoid, its derivative, and gradient descent.
Coding our Advanced Neural Network Part 2 (Implement Backpropagation) • 10:41
Coding our Advanced Neural Network Part 3 (Implement Gradient Descent) • 5:51
Update w1, w2, and w3 via gradient descent using old values minus learning rate times their loss derivatives. Print epoch, y hat, error, and updated weights.
Coding our Advanced Neural Network Part 4 (Training our Neural Network) • 8:21
Run the neural network through ten epochs, compare forward pass predictions y hat with targets, and observe decreasing loss as weights w1, w2, w3 update via backprop and gradient descent.
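
The chain-rule steps quoted in the advanced-network lectures can be reproduced with a few lines of Python. This is a sketch, not the course notebook. It takes h1, w3 and dL/dz2 directly from the lecture descriptions and assumes z2 = w3 * h1, which is what the stated derivatives dz2/dw3 = h1 and dz2/dh1 = w3 imply. The helper names and the test point z = 0.3 are illustrative only.

```python
import math


def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime_from_output(a):
    """Sigmoid derivative expressed through its output a = sigmoid(z)."""
    return a * (1.0 - a)


# Values quoted in the lecture descriptions.
h1 = 0.3775        # hidden activation
w3 = 0.5           # weight between the hidden unit and the output
dL_dz2 = -0.11222  # gradient of the loss with respect to z2

# Assumed relation: z2 = w3 * h1, so dz2/dw3 = h1 and dz2/dh1 = w3.
dL_dw3 = dL_dz2 * h1
dL_dz1 = dL_dz2 * w3 * sigmoid_prime_from_output(h1)  # dh1/dz1 = h1 * (1 - h1)

print("dL/dw3:", dL_dw3)
print("dL/dz1:", dL_dz1)

# Sanity check of the sigmoid derivative with a central finite difference
# at an arbitrary, illustrative point z.
z = 0.3
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
analytic = sigmoid_prime_from_output(sigmoid(z))
print("sigmoid derivative (numeric, analytic):", numeric, analytic)
```

Both gradients come out negative, so a gradient descent step increases w3 and pushes z1 upward, in line with the lecture descriptions.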

Requirements

Basic Python knowledge
High school mathematics

Description

Unlock the secrets behind the algorithm that powers modern AI: backpropagation. This essential concept drives the learning process in neural networks, powering technologies like self-driving cars, large language models (LLMs), medical imaging breakthroughs, and much more.

In Mathematics Behind Backpropagation | Theory and Python Code, we take you on a journey from zero to mastery, exploring backpropagation through both theory and hands-on implementation. Starting with the fundamentals, you'll learn the mathematics behind backpropagation, including derivatives, partial derivatives, and gradients. We’ll demystify gradient descent, showing you how machines optimize themselves to improve performance efficiently.
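
As a taste of the derivative and gradient material, the two functions named in the course content, f(x) = x^3 - 3x^2 + 2x + 1 and f(x, y) = x^2 + y^2, can be checked numerically with a small delta x. The sketch below is illustrative and not taken from the course. The helper names, the step size, and the evaluation point x = 2 are assumptions. The point (3, 4) comes from the lecture description.

```python
def f(x):
    """Polynomial used in the numerical derivative lecture."""
    return x**3 - 3 * x**2 + 2 * x + 1


def f_prime(x):
    """Analytic derivative from the power rule: 3x^2 - 6x + 2."""
    return 3 * x**2 - 6 * x + 2


def numerical_derivative(func, x, dx=1e-5):
    """Central difference: slope between two points a small dx either side of x."""
    return (func(x + dx) - func(x - dx)) / (2 * dx)


def g(x, y):
    """Two-variable function used in the partial derivatives example."""
    return x**2 + y**2


def numerical_gradient(func, point, dx=1e-5):
    """Approximate each partial derivative by nudging one input at a time."""
    grad = []
    for i in range(len(point)):
        up = list(point)
        down = list(point)
        up[i] += dx
        down[i] -= dx
        grad.append((func(*up) - func(*down)) / (2 * dx))
    return grad


x0 = 2.0  # illustrative evaluation point, not taken from the course
print(numerical_derivative(f, x0), f_prime(x0))

# Partial derivatives at x=3, y=4; analytically 2x = 6 and 2y = 8.
print(numerical_gradient(g, [3.0, 4.0]))
```

The numerical values agree with the analytic ones up to floating-point error, which is the same check that underlies gradient checking for backpropagation.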

But this isn’t just about theory—you’ll roll up your sleeves and implement backpropagation from scratch, first calculating everything by hand to ensure you understand every step. Then, you’ll move to Python coding, building your own neural network without relying on any libraries or pre-built tools. By the end, you’ll know exactly how backpropagation works, from the math to the code and beyond.

Whether you're an aspiring machine learning engineer, a developer transitioning into AI, or a data scientist seeking deeper understanding, this course equips you with rare skills most professionals don’t have. Master backpropagation, stand out in AI, and gain the confidence to build neural networks with foundational knowledge that sets you apart in this competitive field.

Who this course is for:

Data Scientists who want to deepen their understanding of the mathematical underpinnings of neural networks.
Aspiring Machine Learning Engineers who want to build a strong foundation in the algorithms that power AI.
Software Developers looking to transition into the exciting world of machine learning and AI.
Students and Enthusiasts eager to learn how machine learning really works under the hood.
Professionals aiming to stay competitive in the era of LLMs and advanced AI by mastering skills beyond basic frameworks.

Course content

What We're Going to Learn • 1 lecture • 3min

Course Resources • 1 lecture • 1min

Neural Networks, Derivatives, Gradients, Chain Rule, Gradient Descent and more • 25 lectures • 3hr 7min

Implementing Our Advanced Neural Network By Hand + Python • 13 lectures • 1hr 27min
