Deep Learning: Advanced Computer Vision (GANs, SSD, +More!)

Rating: 4.8 (7116 reviews)

VGG, ResNet, Inception, SSD, RetinaNet, Neural Style Transfer, GANs +More in Tensorflow, Keras, and Python

Created by Lazy Programmer Inc., Lazy Programmer Team

Last updated 3/2026

English

English [Auto], Italian [Auto]

What you'll learn

Understand and apply transfer learning
Understand and use state-of-the-art convolutional neural nets such as VGG, ResNet and Inception
Understand and use object detection algorithms like SSD
Understand and apply neural style transfer
Understand state-of-the-art computer vision topics
Class Activation Maps
GANs (Generative Adversarial Networks)
Object Localization Implementation Project
Understand important foundations for OpenAI ChatGPT, GPT-4, DALL-E, Midjourney, and Stable Diffusion

Course content

19 sections • 115 lectures • 17h 10m total length

Introduction • 2:35
Bridge basic CNN architectures to modern models such as VGG, ResNet, and Inception, apply them to blood cell image analysis to build a medical expert system, and preview SSD object detection and neural style transfer.
Outline and Perspective • 6:49
Explore advanced convolutional neural networks, including VGG, ResNet, and Inception, and master transfer learning for faster training on tasks like SSD object detection and style transfer.
How to Succeed in this Course • 3:04
Learn how to succeed in this course by asking questions in the Q&A, meeting prerequisites, taking handwritten notes for conceptual lectures, and coding what you see in coding lectures.

Where to get the code, notebooks, and data • 4:29
Discover where to access the course code and data: Colab notebooks are the main format, while plain text Python files live on GitHub, and notebooks are not on GitHub.
Intro to Google Colab, how to use a GPU or TPU for free • 12:32
Explore Google Colab, a cloud-based notebook platform that runs Python with GPU or TPU support, sharing via Google Drive, and preinstalled libraries for deep learning.
Uploading your own data to Google Colab • 11:41
Learn how to upload your own data to Colab using wget, tf.keras.utils.get_file, or direct uploads; load datasets with pandas, adjust headers and delimiters, and access files via Google Drive.
Where can I learn about Numpy, Scipy, Matplotlib, Pandas, and Scikit-Learn? • 11:00
Explore the NumPy stack basics for deep learning, including NumPy arrays, tensors, Matplotlib charts, pandas for CSV files, SciPy tools, and scikit-learn basics for classification and regression.
Temporary 403 Errors • 2:57
Learn to handle 403 download errors from lazyprogrammer.me by downloading in a browser. Upload the file into Colab via the file explorer by drag-and-drop, and understand public IP blocks.

What is Machine Learning? • 14:26
Learn how machine learning reduces to geometry, with regression and classification as core supervised tasks; visualize data points, features, and decision boundaries to separate categories.
Code Preparation (Classification Theory) • 15:59
Explore binary classification theory and implement logistic regression in TensorFlow 2.0 using Keras, building a Dense layer with sigmoid activation and training via binary cross-entropy with validation.
Beginner's Code Preamble • 4:38
Beginner-friendly guidance on how to approach coding lectures, explaining why coding along hinders learning, emphasizing thinking over typing, and pointing to self-guided study resources and the course appendix.
Classification Notebook • 22:21
Perform linear classification on the breast cancer dataset to predict malignant versus benign tumors using TensorFlow 2 with a sigmoid output. Train and evaluate with a train-test split and accuracy.
Code Preparation (Regression Theory) • 7:18
Load and normalize data, build a no-activation linear regression model, train with SGD using mean squared error, and evaluate; apply log-transformed regression to Moore's Law.
Regression Notebook • 27:28
Learn regression in TensorFlow by building a two-layer model and visualizing loss with Matplotlib. Apply a log transform to analyze exponential growth and estimate doubling time.
The Neuron • 9:58
Explore how linear and logistic regression form the foundations of neural computation, showing how weighted inputs, bias, and the sigmoid produce neuron-like action potentials.
How does a model "learn"? • 10:53
Explore how a model learns from linear regression to gradient descent, using mean squared error and learning rate eta to update weights w and b with automatic differentiation in TensorFlow.
Making Predictions • 6:45
Explore making predictions with TensorFlow 2 using the Keras API, converting predicted probabilities to class labels, flattening outputs, and validating accuracy via manual checks and model.evaluate for classification and regression.
Saving and Loading a Model • 4:27
Save and load a TensorFlow model with Keras: load it via tf.keras.models.load_model, verify its accuracy, and avoid a bug by specifying the input shape in the Dense layer instead of using an explicit Input layer.
Suggestion Box • 3:10
Share your background, course, and difficulty through the suggestion box at lazyprogrammer.me/suggestions to help tailor advanced computer vision content, address missing explanations, and request future topics.
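
As a rough illustration of the update rule summarized under "How does a model "learn"?", the sketch below fits y = w*x + b by gradient descent on mean squared error. It uses plain Python rather than TensorFlow's automatic differentiation, so the gradients are written out by hand. The function name `fit_line`, the learning rate, the epoch count, and the toy data are assumptions made for illustration and are not taken from the course notebooks.

```python
def fit_line(xs, ys, eta=0.05, epochs=2000):
    """Fit y = w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Residuals of the current predictions.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(error^2) with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient, scaled by the learning rate eta.
        w -= eta * grad_w
        b -= eta * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (hypothetical, noise-free).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

With a learning rate that is too large the updates diverge instead of converging, which is why eta is exposed as a parameter here.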

Artificial Neural Networks Section Introduction • 6:00
Introduce artificial neural networks, focusing on feedforward networks, their architecture, activation functions, and multi-class classification, with applications to image data.
Forward Propagation • 9:40
Explain forward propagation through a deep neural network, showing how inputs pass through layers and neurons, compute Z with weight matrices and biases, then apply a sigmoid for binary classification.
The Geometrical Picture • 9:43
Neural networks learn non-linear boundaries with multiple neurons and hidden layers, removing the need for manual feature engineering. Gradient descent optimizes the weights to fit complex data.
Activation Functions • 17:18
Explore activation functions from sigmoid and tanh to ReLU, Leaky ReLU, ELU, and Softplus, and learn how they address vanishing gradients and dead neurons in deep networks.
Multiclass Classification • 8:41
Learn how multi-class classification maps final-layer activations to a probability distribution over K outcomes using the softmax function, with examples like ImageNet's 1000 categories.
How to Represent Images • 12:36
Explore how images are represented in computers, with height, width, and RGB channels, covering quantization, grayscale, 3D tensors, and how flattening yields N x D inputs for neural networks.
Color Mixing Clarification • 0:54
Explore why mixing primary colors in paint doesn't yield white, and how RGB color representation in computers differs, illustrating basic color mixing with practical examples.
Code Preparation (ANN) • 12:42
Load the MNIST handwritten digits data from tf.keras.datasets, build a feedforward Keras model with flatten, dense, and dropout layers, train with sparse_categorical_crossentropy, and evaluate its predictions.
ANN for Image Classification • 8:36
Demonstrates a feedforward neural network for MNIST image classification using TensorFlow 2.0 in Colab, covering data normalization, model layers, training, evaluation, and confusion matrix analysis.
ANN for Regression • 11:05
Demonstrate a Colab notebook that trains a two-layer neural network on synthetic two-dimensional data to fit a non-linear cosine function, visualizing the 3D surface and loss convergence.
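
The softmax mapping mentioned under "Multiclass Classification" can be sketched in a few lines of plain Python. This is only a minimal illustration: the function name `softmax` and the example activations are assumptions, and subtracting the maximum is a common numerical-stability step added here, not something the lecture summary states.

```python
import math


def softmax(activations):
    """Map K real-valued activations to a probability distribution over K classes."""
    # Subtracting the maximum leaves the result unchanged but avoids overflow in exp.
    shift = max(activations)
    exps = [math.exp(a - shift) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
predicted_class = probs.index(max(probs))  # index of the most probable class
print(probs, predicted_class)
```

The predicted label is simply the index of the largest probability, which is also the index of the largest activation.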

What is Convolution? (part 1) • 16:38
Explore convolution as an image modifier: slide a filter over an input image to produce a transformed output via addition and multiplication; deep learning treats this as cross-correlation.
What is Convolution? (part 2) • 5:56
See convolution as a sliding pattern finder that uses dot products and cosine similarity to detect patterns, linking cross-correlation and Pearson correlation in feature extraction.
What is Convolution? (part 3) • 6:41
Explore the equivalence between convolution and matrix multiplication, illustrate with a one-dimensional example, and show how weight sharing enables efficient, translationally invariant pattern finding across image locations.
Convolution on Color Images • 15:58
Extend convolution to color images by using a 3D input and 3D filter, producing multiple feature maps via stacking outputs and applying bias and activation in a neural network layer.
CNN Architecture • 20:58
Learn the standard CNN architecture: a convolutional stage with pooling followed by dense layers, with guidance on pool sizes, strides, feature maps, and techniques like global max pooling and flattening.
CNN Code Preparation • 15:13
Explore CNN code preparation for image classification using TensorFlow's Keras functional API: loading Fashion MNIST and CIFAR-10 data, building Conv2D networks, training, evaluating, and making predictions.
CNN for Fashion MNIST • 6:46
Explore constructing a convolutional neural network with TensorFlow 2.0 on the Fashion MNIST dataset in Colab, including data shaping for 3D input, the Keras functional API, dropout, and evaluation insights.
CNN for CIFAR-10 • 4:28
Explore image classification on CIFAR-10, a 10-class dataset, using a TensorFlow 2.0 CNN in Colab, detailing data loading, architecture, GPU training, and diagnosing overfitting via plots.
Data Augmentation • 8:51
Explore data augmentation for image models, using on-the-fly generation with Keras image data generators to expand training data through rotations, flips, brightness, and other transforms, boosting generalization.
Batch Normalization • 5:14
Apply batch normalization in convolutional networks: standardize each batch of activations using the batch mean and standard deviation, then rescale and shift with the learned parameters gamma and beta.
Improving CIFAR-10 Results • 19:11
Build a CNN classifier for CIFAR-10 with data augmentation using the TensorFlow Keras functional API, featuring conv layers, batch norm, max pooling, dropout, and a softmax output.
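
The sliding-filter operation described under "What is Convolution? (part 1)" can be sketched without any library. The code below implements the cross-correlation form (no kernel flip) that the summary says deep learning uses, in "valid" mode with stride 1. The function name `cross_correlate2d`, the toy image, and the edge filter are illustrative assumptions; a Keras Conv2D layer would additionally handle channels, bias, and activation.

```python
def cross_correlate2d(image, kernel):
    """Slide kernel over image ('valid' mode, stride 1) and sum elementwise products."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            # Dot product between the kernel and the image patch at (i, j).
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out


image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A hypothetical vertical-edge filter: responds where intensity rises left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]
print(cross_correlate2d(image, kernel))  # [[0, 2, 0], [0, 2, 0]]
```

The output is largest exactly where the patch matches the filter's pattern, which is the "pattern finder" view of convolution.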

VGG Section Intro • 3:04
Introduce the VGG network, its history and architecture, including VGG16’s layered convolution, pooling, and fully connected blocks, and its role in image classification and localization.
What's so special about VGG? • 7:00
Explore how VGG differs from earlier networks, why pre-trained weights accelerate development, and how Keras and open-source models like ResNet and Inception enable fast prediction without retraining.
Transfer Learning • 8:22
Use transfer learning with VGG as a feature transformer, remove the last layer, and train a new classifier.
Relationship to Greedy Layer-Wise Pretraining • 2:19
Explore greedy layer-wise pre-training and its relationship to transfer learning: stack autoencoders to build a deep network, then add a logistic regression and compare training options.
2 Approaches to Transfer Learning • 5:01
Compare two transfer learning approaches for large CNNs: pre-compute feature vectors z and train a logistic regression, or train with data augmentation inside the loop.
Transfer Learning Code (pt 1) • 23:42
Explore transfer learning with data augmentation in TensorFlow using a VGG16 pretrained model, building a small classifier with augmented images and train/validation pipelines.
Transfer Learning Code (pt 2) • 24:48
Apply transfer learning without data augmentation by extracting feature vectors from a pre-trained model and training a top classifier; explore vector representations for vision, NLP, and cross-modal matching.
VGG Section Summary • 1:47
Summarize the VGG section by highlighting transfer learning: reuse the network body and attach a new head to train faster, a pattern that carries over to SSD and style transfer.

ResNet Section Intro • 2:49
Explore the ResNet architecture, its origins, and how it overcomes LeNet-like limitations, with Keras integration and an optional TensorFlow implementation, plus application to fruits and blood cells.
ResNet Architecture • 12:45
Explore how residual blocks with shortcut paths enable deep networks, detailing identity and conv blocks, and show a 50-layer ResNet with no extra parameters.
Transfer Learning with ResNet in Code • 22:07
Apply transfer learning with ResNet in code using TensorFlow and Keras, fine-tuning a pre-trained model on a 60-class fruits dataset. Assess performance with confusion matrices, accuracy, and comparison with VGG16.
Blood Cell Images Dataset • 3:02
Explore the blood cell images dataset from Kaggle and learn how to train a neural network to classify blood cells, augmenting human expertise with automated vision.
How to Build ResNet in Code • 11:16
Build and train a ResNet from scratch in Keras on blood cell images, without transfer learning. Implement identity and conv blocks, and evaluate with confusion matrices.
1x1 Convolutions • 4:03
1x1 convolutions act as a dense layer shared across pixels: at each pixel, a dot product maps the input feature maps (channels) to C2 output channels, and all pixels are processed in parallel.
Optional: Inception • 6:47
Explore the Inception module, a multi-branch CNN block with 1x1, 3x3, and 5x5 convolutions and pooling, plus side branches for multi-scale training, including GoogLeNet, Inception V3/V4, and Inception-ResNet.
Different sized images using the same network • 4:12
Use global pooling, either global max or global average pooling, to convert variable-sized feature maps into a fixed 1 by 1 by 1024 representation, enabling the same network to process inputs of different sizes.
ResNet Section Summary • 2:27
Recap residual blocks with shortcut connections, which ease identity learning and let ResNet reach deeper architectures, along with Inception concepts such as 1x1 convolutions and side branches.
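
To make the idea from "Different sized images using the same network" concrete, the sketch below applies global pooling to two feature maps with different heights and widths and shows that both reduce to a vector with one entry per channel. Feature maps are plain nested lists here; the names `global_pool` and `constant_map` and the tiny shapes are assumptions for illustration (in Keras the corresponding layers are GlobalAveragePooling2D and GlobalMaxPooling2D).

```python
def global_pool(feature_map, mode="avg"):
    """Reduce an H x W x C feature map (nested lists) to a length-C vector."""
    height, width = len(feature_map), len(feature_map[0])
    channels = len(feature_map[0][0])
    pooled = []
    for c in range(channels):
        # Collect every spatial position of channel c, then reduce to one number.
        values = [feature_map[i][j][c] for i in range(height) for j in range(width)]
        pooled.append(max(values) if mode == "max" else sum(values) / len(values))
    return pooled


def constant_map(height, width, channels):
    """Hypothetical helper: a feature map whose channel c is filled with the value c."""
    return [[[float(c) for c in range(channels)] for _ in range(width)]
            for _ in range(height)]


small = global_pool(constant_map(2, 3, 4))   # 2 x 3 x 4 input
large = global_pool(constant_map(5, 7, 4))   # 5 x 7 x 4 input
print(len(small), len(large))  # both vectors have one entry per channel
```

Because the output length depends only on the number of channels, any dense layers placed after the pooling step see the same input size regardless of image size.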

SSD Section Intro • 5:04
Explore how CNNs move from classification to object localization and detection with SSD, a real-time single-shot detector, and contrast it with YOLO and RCNN, including sliding windows.
Object Localization • 6:36
Learn how to extend convolutional neural networks from classification to object localization by predicting bounding boxes and presence, using normalized coordinates and a combined loss of cross-entropy and regression.
What is Object Detection? • 2:53
Explore the fundamentals of object detection, from locating multiple objects and their bounding boxes to real-time accuracy in self-driving systems, and examine how SSD networks handle variable object counts.
How would you find an object in an image? • 8:40
Transform a sliding window search into a convolutional network to locate objects in an image, turning dense layers into convolution and enabling single shot detection (SSD) without region proposals.
The Problem of Scale • 3:47
Address the problem of scale in object detection by leveraging SSD's multi-scale feature maps and attached mini networks, using transfer learning with backbones like VGG, ResNet, and Inception.
The Problem of Shape • 3:52
Addressing shape and aspect ratio in object detection, SSD applies default boxes at every scale (small, big, tall, wide) to decide if objects fit, combining boxes with regression.
SSD Tensorflow Object Detection API (pt 1) • 12:04
Explore how to set up TensorFlow's separate object detection API, install protobuf, clone the models repo, and select a pre-trained SSD model from the zoo with MobileNet as the backbone.
SSD Tensorflow Object Detection API (pt 2) • 12:15
Learn to use the SSD TensorFlow object detection API to load COCO labels, load a saved model, run detections, and visualize bounding boxes and class names on images.
SSD for Video Object Detection • 11:59
Learn to run object detection on video by looping through each frame, using a pre-trained TensorFlow object detector, and compare performance when using the GPU versus not using the GPU.
Optional: Intersection over Union & Non-max Suppression • 5:06
Apply the Jaccard index, or intersection over union, to measure overlap between ground-truth and predicted bounding boxes. Use non-max suppression to keep a single, high-scoring box per object.
SSD Section Summary • 2:52
Learn how SSD enables fast, accurate object detection by unifying localization and classification, addressing scale, shape, and sliding window challenges, with IOU and non-max suppression, for applications like self-driving cars.
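
The two post-processing ideas from "Optional: Intersection over Union & Non-max Suppression" can be sketched as follows. This is a minimal, class-agnostic version under stated assumptions: boxes are (x1, y1, x2, y2) tuples, the 0.5 threshold is an arbitrary choice, and the function names `iou` and `non_max_suppression` are illustrative rather than the TensorFlow Object Detection API's own.

```python
def iou(box_a, box_b):
    """Intersection over union (Jaccard index) of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of 81/119, above the threshold, so only the higher-scoring of the two survives; the distant third box is kept.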

Style Transfer Section Intro • 2:52
Explore neural style transfer by applying an artwork's style to a photo with two inputs, learn theory and coding steps to optimize and balance content and style with a CNN.
Style Transfer Theory • 11:23
Explore the theory of neural style transfer, balancing content and style by optimizing the input image via content loss and style loss using a pre-trained VGG CNN and Gram matrices.
Optimizing the Loss • 8:02
Optimize the input image by minimizing the combined content and style loss with L-BFGS in SciPy, using gradients from the Keras/Theano backend and wrapping the interface for vector inputs.
Code pt 1 • 7:46
Recreate content in code as the first half of style transfer by building a VGG16 average-pool network, defining a content loss, and optimizing an input image.
Code pt 2 • 7:13
Learn to recreate style in code using Gram matrices and multi-output VGG features, and optimize style loss to transfer painting style, such as Starry Night.
Code pt 3 • 3:50
Combine prior style transfer code to generate a final output by loading content and style images, building VGG-based models, and optimizing losses with gradients.
Style Transfer Section Summary • 2:21
Learn neural style transfer in advanced convolutional networks by balancing content loss and style loss to produce images with content structure and style from a reference image, using Gram matrices.
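
The Gram matrices referred to throughout this section can be illustrated with a small plain-Python sketch. Features are represented as a list of C channels, each flattened over the H*W spatial positions. The names `gram_matrix` and `style_loss`, the toy values, and the normalization by the number of positions are assumptions; normalization conventions differ between implementations, and the course code may use another.

```python
def gram_matrix(features):
    """Gram matrix of features given as C channels, each a flat list of H*W values."""
    n = len(features[0])
    # Entry (i, j) is the average product of channel i and channel j over all positions.
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in features]
            for ci in features]


def style_loss(features_style, features_generated):
    """Mean squared difference between the two Gram matrices."""
    gs, gg = gram_matrix(features_style), gram_matrix(features_generated)
    c = len(gs)
    return sum((gs[i][j] - gg[i][j]) ** 2 for i in range(c) for j in range(c)) / (c * c)


# Two channels over four spatial positions (hypothetical values).
style = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
# The same channels with the spatial positions reversed.
shuffled = [[0.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0]]
print(gram_matrix(style))           # [[0.5, 0.0], [0.0, 0.5]]
print(style_loss(style, shuffled))  # 0.0
```

The zero loss for the spatially rearranged features shows why the Gram matrix suits style: it records which channels fire together, not where they fire.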

Class Activation Maps (Theory) • 7:09
Learn how class activation maps turn image classification into object localization by overlaying a heat map generated from class-specific weights in a pre-trained ResNet.
Class Activation Maps (Code) • 9:54
Demonstrate drawing a class activation map with ResNet using class_activation_maps.py. Load images, extract 7x7x2048 features, weight them, upsample, decode predictions, and overlay CAM on the image.
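
The weighting step at the heart of a class activation map can be sketched on a toy example. The code below is not the course's class_activation_maps.py; it uses a 2 x 2 x 3 stand-in for the 7x7x2048 ResNet features, made-up class weights, and a hypothetical function name `class_activation_map`. Upsampling to the image size and overlaying the heat map are omitted.

```python
def class_activation_map(features, class_weights):
    """Weighted sum over channels of an H x W x C feature map (nested lists)."""
    return [[sum(w * v for w, v in zip(class_weights, pixel)) for pixel in row]
            for row in features]


# Tiny 2 x 2 x 3 stand-in for the 7 x 7 x 2048 ResNet features (hypothetical values).
features = [
    [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [3.0, 1.0, 0.0]],
]
# Weights connecting each channel to one class in the final dense layer (hypothetical).
class_weights = [0.5, -1.0, 2.0]
cam = class_activation_map(features, class_weights)
print(cam)  # [[4.5, -1.0], [2.0, 0.5]]
```

Each output cell scores how strongly that spatial location supports the chosen class; the top-left cell scores highest here, so it would appear hottest in the overlay.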

Requirements

Know how to build, train, and use a CNN using some library (preferably in Python)
Understand basic theoretical concepts behind convolution and neural networks
Decent Python coding skills, preferably in data science and the Numpy Stack

Description

Ever wondered how AI technologies like OpenAI ChatGPT, GPT-4, DALL-E, Midjourney, and Stable Diffusion really work? In this course, you will learn the foundations of these groundbreaking applications.

This is one of the most exciting courses I’ve done and it really shows how fast and how far deep learning has come over the years.

When I first started my deep learning series, I didn’t ever consider that I’d make two courses on convolutional neural networks.

I think what you’ll find is that this course is so entirely different from the previous one that you will be impressed at just how much material we have to cover.

Let me give you a quick rundown of what this course is all about:

We’re going to bridge the gap between the basic CNN architecture you already know and love and modern, novel architectures such as VGG, ResNet, and Inception (named after the movie, which, by the way, is also great!).

We’re going to apply these to images of blood cells, and create a system that is a better medical expert than either you or I. This brings up a fascinating idea: that the doctors of the future are not humans, but robots.

In this course, you’ll see how we can turn a CNN into an object detection system that not only classifies images but can also locate each object in an image and predict its label.

You can imagine that such a task is a basic prerequisite for self-driving vehicles. (They must be able to detect cars, pedestrians, bicycles, traffic lights, etc. in real time.)

We’ll be looking at a state-of-the-art algorithm called SSD which is both faster and more accurate than its predecessors.

Another very popular computer vision task that makes use of CNNs is called neural style transfer.

This is where you take one image called the content image and another image called the style image, and combine them to make an entirely new image, as if you had hired a painter to paint the content of the first image in the style of the other. Unlike a human painter, this can be done in a matter of seconds.

I will also introduce you to the now-famous GAN architecture (Generative Adversarial Networks), where you will learn some of the technology behind how neural networks are used to generate state-of-the-art, photo-realistic images.

Currently, we also implement object localization, which is an essential first step toward implementing a full object detection system.

I hope you’re excited to learn about these advanced applications of CNNs. I’ll see you in class!

AWESOME FACTS:

One of the major themes of this course is that we’re moving away from the CNN itself to systems involving CNNs.
Instead of focusing on the detailed inner workings of CNNs (which we've already done), we'll focus on high-level building blocks. The result? Almost zero math.
Another result? No complicated low-level code such as that written in Tensorflow, Theano, or PyTorch (although some optional exercises may contain them for the very advanced students). Most of the course will be in Keras which means a lot of the tedious, repetitive stuff is written for you.

"If you can't implement it, you don't understand it"

Or as the great physicist Richard Feynman said: "What I cannot create, I do not understand".
My courses are the ONLY courses where you will learn how to implement machine learning algorithms from scratch
Other courses will teach you how to plug in your data into a library, but do you really need help with 3 lines of code?
After doing the same thing with 10 datasets, you realize you didn't learn 10 things. You learned 1 thing, and just repeated the same 3 lines of code 10 times...

Suggested Prerequisites:

Know how to build, train, and use a CNN using some library (preferably in Python)
Understand basic theoretical concepts behind convolution and neural networks
Decent Python coding skills, preferably in data science and the Numpy Stack

WHAT ORDER SHOULD I TAKE YOUR COURSES IN?:

Check out the lecture "Machine Learning and AI Prerequisite Roadmap" (available in the FAQ of any of my courses, including the free Numpy course)

UNIQUE FEATURES

Every line of code explained in detail - email me any time if you disagree
No wasted time "typing" on the keyboard like other courses - let's be honest, nobody can really write code worth learning about in just 20 minutes from scratch
Not afraid of university-level math - get important details about algorithms that other courses leave out

Who this course is for:

Students and professionals who want to take their knowledge of computer vision and deep learning to the next level
Anyone who wants to learn about object detection algorithms like SSD and YOLO
Anyone who wants to learn how to write code for neural style transfer
Anyone who wants to use transfer learning
Anyone who wants to shorten training time and build state-of-the-art computer vision nets fast

Course content

Welcome • 3 lectures • 12min

Google Colab & Getting Setup • 5 lectures • 43min

Machine Learning Basics Review • 11 lectures • 2hr 7min

Artificial Neural Networks (ANN) Review • 10 lectures • 1hr 37min

Convolutional Neural Networks (CNN) Review • 11 lectures • 2hr 6min

VGG and Transfer Learning • 8 lectures • 1hr 16min

ResNet (and Inception) • 9 lectures • 1hr 9min

Object Detection (SSD / RetinaNet) • 11 lectures • 1hr 15min

Neural Style Transfer • 7 lectures • 43min

Class Activation Maps • 2 lectures • 17min
