Generative Adversarial Networks (GANs): Complete Guide

Name: Generative Adversarial Networks (GANs): Complete Guide
Rating: 4.7 (342 reviews)

Deep Learning and Computer Vision to implement projects using one of the most revolutionary technologies in the world!

Highest Rated

Created by Jones Granatyr, Gabriel Alves, AI Expert Academy

Last updated 11/2023

English

What you'll learn

Understand the basic intuition about GANs
Generate images of digits (0 - 9) using DCGAN and WGAN
Transform satellite images into maps using Pix2Pix architecture
Transform zebras into horses using CycleGAN architecture
Transfer styles between images
Apply super resolution to improve image quality using ESRGAN architecture
Create new faces of people with high quality and definition using StyleGAN
Generate images through textual descriptions
Restore old photos using GFP-GAN
Complete missing parts of images using Boundless architecture
Generate deepfakes to swap faces with SimSwap

Course content

10 sections • 112 lectures • 16h 48m total length

Course content • 15:09
Explore practical GAN architectures, from DCGAN and WGAN to Pix2Pix and CycleGAN, applying style transfer, super-resolution, and face generation with Google Colab GPUs.
Introduction to GANs • 18:21
Explore generative adversarial networks, a 2014 concept from Ian Goodfellow, where a generator and a discriminator create new content across images and media.
How GANs work • 13:37
Explore how generative adversarial networks pair a generator and a discriminator that compete to produce realistic images from random noise, with the discriminator evaluating real versus fake images during training.
Course materials • 0:07
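
The generator and discriminator competition described under "How GANs work" is commonly written as two binary cross-entropy losses on the discriminator's raw scores. The following is a minimal NumPy sketch of that idea, not the course's own code; the function names are hypothetical, and real images are labeled 1 and fake images 0.

```python
import numpy as np

def bce_from_logits(labels, logits):
    """Numerically stable binary cross-entropy on raw scores, averaged over the batch."""
    labels = np.asarray(labels, dtype=np.float64)
    logits = np.asarray(logits, dtype=np.float64)
    per_sample = np.maximum(logits, 0.0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))
    return per_sample.mean()

def discriminator_loss(real_logits, fake_logits):
    """The discriminator wants real images scored as 1 and fake images scored as 0."""
    real_loss = bce_from_logits(np.ones_like(real_logits), real_logits)
    fake_loss = bce_from_logits(np.zeros_like(fake_logits), fake_logits)
    return real_loss + fake_loss

def generator_loss(fake_logits):
    """The generator wants its fake images to be scored as 1 (real)."""
    return bce_from_logits(np.ones_like(fake_logits), fake_logits)
```

When the discriminator is undecided (all logits equal to zero), each term equals log 2, which is a handy sanity check during training.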

DCGAN - intuition • 8:13
Explore the intuition behind DCGAN, the deep convolutional generative adversarial network, using a two-network architecture with generator and discriminator to produce realistic images from random noise.
MNIST dataset • 17:05
Learn to implement MNIST handwritten digit generation by loading and preprocessing the MNIST dataset in TensorFlow, normalizing images, and applying mini-batch gradient descent in Colab.
Building the generator • 19:53
Build the generator that converts 100-dim random noise into a 28x28 grayscale image using dense, reshape, batch normalization, leaky relu, and conv-transpose layers.
Building the discriminator • 12:03
Build a convolutional discriminator that accepts 28 by 28 images and classifies them as real or fake, using 14 by 14 and 7 by 7 feature maps, leaky ReLU, dropout, and a dense output layer.
Loss (error) calculation • 10:27
We define discriminator and generator losses using binary cross-entropy with from_logits=True, labeling real images as ones and fake as zeros, guiding training with Adam optimizers.
A quick note about the code • 0:26
Training • 12:32
Learn to train a generative adversarial network with TensorFlow, using a 256-image batch, 100 epochs, and a 100-d noise vector, updating generator and discriminator with gradient tape.
Visualizing the results • 11:07
Visualize GAN results by training a generator with random noise and examining 16 grayscale digits across 100 epochs and 234 batches, with visuals showing progressive improvement.
HOMEWORK and solution • 0:18
WGAN - intuition 1 • 16:53
Explore Wasserstein GANs and the Wasserstein loss to improve training stability, reduce mode collapse, and alleviate vanishing gradients through the critic and the earth mover's distance.
WGAN - intuition 2 • 12:29
Explore the intuition of the WGAN, comparing real and fake image distributions using the Wasserstein distance, and address gradient issues, weight clipping, and the gradient penalty for better quality.
WGAN-GP - intuition • 6:16
Explore how WGAN-GP replaces weight clipping with a gradient penalty to enforce Lipschitz constraints, stabilizing training by penalizing the gradient norm on real, generated, and interpolated samples.
Preparing the environment • 5:52
Prepare the environment to implement WGAN-GP in Colab, load MNIST data with TensorFlow 2.12, normalize images to -1 to 1, and initialize generator and discriminator.
Wasserstein loss • 9:15
Define the generator and discriminator losses for the Wasserstein loss, comparing discriminator outputs on real and fake images, and apply the gradient penalty weighted by lambda as in WGAN-GP.
Gradient penalty • 16:01
Implement the gradient penalty for a GAN by interpolating real and fake images with epsilon, then computing the squared difference between the gradient norm and one via tf.GradientTape.
Training 1 • 12:34
Train a GAN with 30 epochs and a 100 dimensional noise vector, generating 16 images per seed, with the discriminator updated three times per generator update under a WGAN-GP penalty.
Training 2 and visualization • 13:05
Finish the WGAN-GP implementation and compare the results with DCGAN, training for 30 epochs and saving and visualizing generated images.
HOMEWORK and solution • 0:19
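
The WGAN-GP lectures above describe interpolating real and fake images and penalizing gradient norms that differ from one. The sketch below shows that arithmetic in NumPy under loud assumptions: the critic is a hypothetical linear function so its input gradient is known analytically (the lectures obtain it with tf.GradientTape instead), and the default weight of 10 for lambda is an assumed value, not one stated in this outline.

```python
import numpy as np

def critic(x, w):
    """Hypothetical linear critic: one unbounded score per flattened image."""
    return x @ w

def critic_input_gradient(x, w):
    """Gradient of the linear critic with respect to its input (identical for every sample)."""
    return np.tile(w, (x.shape[0], 1))

def gradient_penalty(real, fake, w, rng):
    """Mean of (gradient norm - 1)^2, evaluated on random mixes of real and fake samples."""
    eps = rng.uniform(size=(real.shape[0], 1))
    interpolated = eps * real + (1.0 - eps) * fake
    grads = critic_input_gradient(interpolated, w)
    norms = np.linalg.norm(grads, axis=1)
    return np.mean((norms - 1.0) ** 2)

def critic_loss(real, fake, w, rng, lam=10.0):
    """Wasserstein critic loss plus the weighted gradient penalty (lam is an assumed default)."""
    wasserstein = critic(fake, w).mean() - critic(real, w).mean()
    return wasserstein + lam * gradient_penalty(real, fake, w, rng)

def generator_loss(fake, w):
    """The generator tries to raise the critic's score on fake samples."""
    return -critic(fake, w).mean()
```

With a linear critic the penalty depends only on the norm of `w`, so a unit-norm `w` gives a penalty of zero. A neural-network critic has a different gradient at every interpolated point, which is why automatic differentiation is needed in practice.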

cGAN - intuition • 13:32
Explore conditional generative adversarial networks (cGANs) that pair latent vectors with class information to guide the generator and discriminator, enabling targeted images and concepts like Pix2Pix and CycleGAN.
Pix2Pix - intuition • 13:54
Explore the Pix2Pix architecture for image-to-image translation with a conditional GAN, using paired datasets, a generator and discriminator, and training with sigmoid cross-entropy and L1 loss.
Map dataset • 9:13
Explore Pix2Pix-style image-to-image translation by converting satellite images to map-like images using the maps dataset in Google Colab, and train the generator to produce maps.
Preprocessing the images 1 • 8:59
Implement preprocessing to split each image into left and right halves, decode and resize to 256x512, convert to float32, normalize by 255, and test visualization for the Pix2Pix architecture.
Preprocessing the images 2 • 16:56
Continue building the GAN data pipeline by counting training images, filtering JPG files, resizing to 256 by 256, normalizing to -1 to 1, and applying random crop, jitter, and optional flips.
Loading the data • 9:03
Load and preprocess training and testing image datasets in TensorFlow, applying resizing, normalization, and random jitter, then shuffle and batch them for GAN training.
Building the generator 1 • 19:36
Learn to build the generator for GANs with a modified U-Net design, featuring encoder and decoder blocks, skip connections, convolutional layers, batch normalization, leaky ReLU, and upsampling for Pix2Pix.
Building the generator 2 • 21:39
Build a generator by stacking eight encoder blocks and eight decoder blocks into a U-Net-like structure, using 256×256×3 inputs and skip connections.
Building the generator 3 • 8:37
Continue building the generator, refining the encoder–decoder and U-Net, and implement the generator loss combining GAN loss with an L1 loss, using a fixed learning rate and lambda.
Building the discriminator 1 • 16:00
Learn how to build a PatchGAN discriminator that classifies each image patch as real or fake, using two inputs (original and transformed), concatenation, and a 30 by 30 output.
Building the discriminator 2 • 6:15
Finish implementing the discriminator to process original and transformed images as patch outputs, compute real and fake losses via sigmoid cross-entropy, and configure Adam optimizers and checkpoints for GAN training.
Generating the images • 6:35
Implement a function to generate images using the generator in a generative adversarial network, feeding test inputs and the real image as ground truth, and visualize results during training.
Training 1 • 13:49
Train a generative adversarial network end-to-end by generating fake images with the generator, evaluating them with the discriminator against real images, and applying GAN and L1 losses.
Training 2 and results • 22:23
Finish the Pix2Pix training pipeline by implementing the training loop, visualization with TensorBoard, and checkpointing, then evaluate generator and discriminator losses across steps.
Pretrained Pix2Pix with PyTorch • 11:58
Load and use a pre-trained Pix2Pix model with PyTorch in Google Colab, avoiding training time while exploring CycleGAN and Pix2Pix architectures for image translation.
Facades dataset • 4:54
Download and extract the facades dataset from the provided URL, then explore train, test, and validation folders and pair edges with real images to train a generator.
Visualizing the results • 8:39
Test a pretrained Pix2Pix GAN on the facades dataset in the B to A direction, generating real A, real B, and fake B, and visualize results with the latest net G.
Drawing to photo 1 • 5:14
Learn how to translate drawings to photos using the edges2shoes model in a PyTorch workflow, including setting up directories, downloading the pre-trained model, and loading test images.
Drawing to photo 2 • 11:37
Visualize GAN results by running python test.py with the dataroot, the pretrained model name, test mode, single images of edges, a unet_256 architecture, and norm set to batch.
Night to day • 3:23
Learn to convert day images to night using the Pix2Pix architecture with a pretrained day-to-night model and test it in a Google Colab workflow for GAN-generated night images.
HOMEWORK and solution • 0:19
CycleGAN - intuition • 14:53
Learn how CycleGAN uses unpaired data and two generators with two discriminators to translate between domains, guided by cycle-consistency loss to produce realistic, consistent results.
Change in the dataset URL • 0:35
Apples and orange dataset • 7:13
Explore implementing the CycleGAN architecture to convert apples to oranges using a simple dataset, first from scratch and then with PyTorch, on Google Colab with GPU acceleration.
Preprocessing • 3:35
Learn to implement image preprocessing functions for GAN training, including random crop, jitter, resize, flip, and normalization from 0 to 255 to -1 to 1 for training and testing images.
Loading the images • 7:53
Load the dataset and convert to TensorFlow format, recreate training variables, apply random jitter and normalization, shuffle, and visualize apples and oranges to verify preprocessing.
Generator and discriminator • 15:47
Learn to build generator and discriminator networks for a CycleGAN using Pix2Pix with instance normalization and a U-Net-style generator, including apples-to-oranges experiments and training basics.
Loss function • 11:20
Implement discriminator and generator losses using binary cross entropy, targeting real and fake images. Apply cycle consistency loss and identity loss with mean absolute error to enforce correct mappings.
Optimizers and checkpoint • 4:16
Set Adam optimizers with learning rate 0.0002, beta1 0.5, beta2 0.999. Create optimizers for two generators and two discriminators, then set up a checkpoint manager to restore the latest checkpoint.
Training 1 • 20:54
Implement the CycleGAN training step by constructing generators G and F, generating fake images, enforcing cycle consistency and identity loss, and updating discriminators through gradient-based optimization.
Training 2 and results • 10:51
Train a CycleGAN to translate apples to oranges by iterating ten epochs, calling the training step, generating images with generator G, and saving checkpoints and model weights.
Pretrained CycleGAN with PyTorch • 4:48
Demonstrate using a pretrained CycleGAN in PyTorch on Google Colab, downloading the horse to zebra model, setting up folders and test images to explore CycleGAN translations.
Horse to zebra • 4:22
Test the model by running the test script to transform a horse into a zebra and view results in the pre-trained horse to zebra folder; outcomes vary with image complexity.
Style transfer • 5:38
Apply style transfer with generative adversarial networks to convert images into Monet-like styles using a pretrained model and Colab workflow. Load the model, organize images, run tests, and review results.
Van Gogh, Cezanne and Ukiyo-e styles • 3:09
Apply style transfer with GANs to transform photos into Van Gogh style artworks. Load the model, process all images, and visualize the results to compare against other artists' styles.
HOMEWORK and solution • 0:23
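
The CycleGAN "Loss function" lecture above mentions cycle-consistency and identity losses based on mean absolute error. A minimal NumPy sketch of those two terms follows. The generators `g` and `f` are passed in as plain callables, and both the weight `LAMBDA = 10.0` and the 0.5 factor on the identity term are assumptions taken from common CycleGAN setups, not values stated in this outline.

```python
import numpy as np

LAMBDA = 10.0  # assumed weighting; the course's value is not given in this outline

def mean_absolute_error(a, b):
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return np.mean(np.abs(a - b))

def cycle_consistency_loss(real_x, real_y, g, f, lam=LAMBDA):
    """g maps domain X to Y and f maps Y back to X; a round trip should return the input."""
    forward = mean_absolute_error(f(g(real_x)), real_x)
    backward = mean_absolute_error(g(f(real_y)), real_y)
    return lam * (forward + backward)

def identity_loss(real_y, g, lam=LAMBDA):
    """g should leave an image that is already in domain Y unchanged."""
    return 0.5 * lam * mean_absolute_error(g(real_y), real_y)
```

If `g` and `f` are exact inverses of each other, the cycle term is zero; any drift introduced by a round trip is penalized in proportion to its average pixel error.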

SRGAN - intuition • 11:33
Explore how SRGAN uses adversarial and content (perceptual) losses to convert low-resolution images into high-resolution outputs, leveraging generator and discriminator dynamics and residual skip connections.
ESRGAN - intuition • 10:29
Explore ESRGAN, an enhanced SRGAN that boosts image resolution with a residual-in-residual dense block, relativistic average GAN, and perceptual loss before activation for clearer, non-blurred results.
Pretrained model • 13:57
Explore how to implement ESRGAN for image super resolution in Google Colab, loading the pretrained RRDB_ESRGAN_x4 model with PyTorch and GPU support.
Testing images • 2:51
Load and test low-resolution images in a Google Colab session, move them to the lr folder, and prepare for super-resolution in the next lecture.
Super resolution • 11:47
Explore implementing a super resolution function for a GAN workflow, including image loading, preprocessing, model inference on GPU, and post-processing to visualize and save high-resolution results.
Evaluating the results - PSNR • 10:10
Compare super resolution methods using PSNR to quantify artifact reduction, and choose between ESRGAN-based approaches and Bicubic interpolation based on image quality.
Improving the results • 8:03
Test and compare Real-ESRGAN architectures for super-resolution GANs, install dependencies, run inference on images and videos, and observe artifact reduction and state-of-the-art results.
HOMEWORK and solution • 0:05
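
The PSNR lecture above compares super-resolution outputs numerically. As a reference point, peak signal-to-noise ratio is 10 · log10(MAX² / MSE); the sketch below assumes 8-bit images (MAX = 255) stored as NumPy arrays, and the function name is hypothetical.

```python
import numpy as np

def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in decibels; higher means closer to the reference."""
    reference = np.asarray(reference, dtype=np.float64)  # cast first to avoid uint8 overflow
    restored = np.asarray(restored, dtype=np.float64)
    mse = np.mean((reference - restored) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)
```

PSNR only measures pixel-wise error, so it is best read alongside a visual comparison of the images.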

ProGAN - intuition • 9:35
Discover how ProGAN progressively grows generator and discriminator from 4x4 to 1024x1024 to produce higher quality face images. Explore how latent space maps random numbers to controllable face generation.
StyleGAN - intuition • 8:34
Explore StyleGAN, a style-based generator for GANs that blends style transfer with adversarial networks, mapping latent z to w and using adaptive instance normalization to control image features.
Pretrained model • 6:20
This lecture guides setting up a StyleGAN3 pretrained model in Google Colab, cloning the GitHub repository, configuring the GPU accelerator, installing libraries, and loading an FFHQ-trained model for face generation.
Generating images 1 • 7:57
Generate images with a GAN by setting seeds and truncation to vary outputs, using a neural network architecture, and saving results to a folder for visualization in Google Colab.
Generating images 2 • 11:45
Load and run a GAN neural network on a CUDA device, generate images from a latent Z vector with truncation, and convert PyTorch outputs to numpy and PIL for viewing.
Generating images 3 • 6:48
Generate and visualize images from seeds by creating latent vectors, feeding them to the generator, and displaying results; explore how truncation affects image diversity and similarity.
Interpolation • 11:15
Learn to interpolate between latent vectors in GANs to morph one image into another. Generate intermediate images, visualize them in a grid, and create a gif of the transitions.
Other pretrained models • 2:45
Explore other pretrained models, including StyleGAN3, on MetFaces and animal faces to generate portrait-like images, visualize interpolations with GIFs, and compare results.
HOMEWORK and solution • 0:34
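
The StyleGAN lectures above cover truncation and interpolation between latent vectors. The arithmetic behind both can be sketched without a generator. In this NumPy example the 512-dimensional latent size, the seed, and the helper names are assumptions for illustration, and no pretrained model is loaded.

```python
import numpy as np

def interpolate_latents(z_start, z_end, steps):
    """Linearly blend two latent vectors; returns an array of shape (steps, dim)."""
    weights = np.linspace(0.0, 1.0, steps).reshape(-1, 1)
    return (1.0 - weights) * z_start + weights * z_end

def truncate(w, w_avg, psi):
    """Truncation trick: pull a latent toward the average latent. psi=1 leaves it unchanged."""
    return w_avg + psi * (w - w_avg)

rng = np.random.default_rng(42)
z_a, z_b = rng.standard_normal(512), rng.standard_normal(512)
frames = interpolate_latents(z_a, z_b, steps=8)
print(frames.shape)  # (8, 512)
```

Each row of `frames` would be fed to the generator to produce one frame of the morphing sequence. Lower values of `psi` trade diversity for more typical-looking outputs.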

VQGAN + CLIP - intuition • 13:16
Explore how VQGAN and CLIP fuse convolutional networks with transformers to convert text prompts into images, guided by vector quantization and how CLIP evaluates text-to-image matches.
Warning after lib update • 0:33
Pretrained model • 6:29
Begin implementing VQGAN and CLIP to generate images from text prompts, set up Google Colab, install torch and CLIP, download pretrained models, and load VQ helper functions.
GAN settings • 9:58
Configure neural network settings for GAN image generation, detailing text prompts, a 300 by 300 image size, the vqgan_imagenet_f16_16384 model, and seed-based reproducibility.
Visualizing the results • 8:15
Explore how a GAN-based VQGAN and CLIP integration generates evolving images from prompts, showing training progress from initial outputs to refined visuals after 700–1000 iterations.
Results in videos • 3:28
Generate a video from a sequence of frames by loading each image and saving with im.save, then visualize it in Google Colab; the lecture emphasizes prompts and testing different models.
HOMEWORK and solution • 0:12
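
The intuition lecture above mentions how CLIP evaluates text-to-image matches. That scoring is usually described as cosine similarity between a text embedding and image embeddings. The sketch below illustrates only the similarity step; the small vectors are stand-ins chosen for illustration, the function names are hypothetical, and no real CLIP model is loaded.

```python
import numpy as np

def cosine_similarity(text_embedding, image_embeddings):
    """Cosine similarity between one text vector and each row of image vectors."""
    text = text_embedding / np.linalg.norm(text_embedding)
    images = image_embeddings / np.linalg.norm(image_embeddings, axis=1, keepdims=True)
    return images @ text

def best_match(text_embedding, image_embeddings):
    """Index of the image whose embedding points most nearly in the text's direction."""
    return int(np.argmax(cosine_similarity(text_embedding, image_embeddings)))
```

In a text-to-image loop, a score like this is what gets pushed upward as the generated image is updated to better match the prompt.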

BigGAN - intuition • 2:54
Explore BigGAN, a class-conditioned image generator capable of producing images from over 1000 classes, with hinge loss and key architecture like z noise, upsampling, batch normalization, and residual blocks.
Pretrained model • 6:59
Learn to use a pretrained BigGAN-deep-256 model in Google Colab, clone the repository, install dependencies, load the pretrained ImageNet model, and generate images across 1000 ImageNet classes.
GAN settings • 13:20
Explore how to use a pre-trained BigGAN model to generate images from 1000 object categories, fetch category lists from JSON, and apply truncation and one-hot encoding.
Generating new images 1 • 6:35
Generate new images with a GAN using a noise vector, a class vector, and truncation, visualize 256 by 256 RGB outputs, and save the results as images.
Generating new images 2 • 16:25
Learn to generate new images with a GAN by providing a category, seed, and truncation, and study how truncation controls variety across castles, burgers, and flowers, on GPU or CPU.
GFP-GAN to restore old photos • 2:58
GFP-GAN restores old photos by focusing on face restoration, increasing face resolution without altering facial features. It compares favorably with other methods, and the next lecture starts the implementation.
Pretrained model • 9:43
Learn to restore old photos with GFP-GAN in Google Colab, clone the GitHub repo, install dependencies, and use a pretrained GFPGAN v1.3 model for face restoration.
Photo restoration • 16:24
Explore photo restoration with GFP-GAN and Real-ESRGAN by running a Python script, restoring faces and backgrounds, and visualizing results in a dedicated results folder.
Boundless for image extension • 3:25
Explore the Boundless GAN for image extension, a method that reconstructs missing image regions by adding new pixels and extending cropped areas, with tests at 25/75, 50/50, and 75/25.
Processing the image • 7:27
Load and preprocess an image for the boundless algorithm in Google Colab, resizing to 257 by 257, converting to numpy, expanding the batch dimension, and normalizing to 0–1 in RGB.
Visualizing the results • 10:26
Learn to run a GAN-based image extension by loading models and masking pixels. Visualize the original, masked, and generated results to understand how the model guesses missing pixels.
SimSwap for deepfake • 1:32
Explore the SimSwap architecture in GANs to exchange faces between images, maintaining attributes like hair color while transferring a source face to a target image.
Pretrained model • 11:06
Develop a hands-on guide to performing face swapping with a pretrained model in a deepfakes context, using Google Colab to load checkpoints, install dependencies, and test with two images.
Face swap • 9:42
Perform a face swap between two images using a prebuilt model, save results to a dedicated directory, adjust crop size and mask options, and compare original and swapped faces.
Additional: GANs in videos • 0:05
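
The BigGAN lectures above combine a noise vector, a one-hot class vector, and a truncation value. The sketch below shows how such inputs could be prepared in NumPy. It is an illustration under assumptions: the 128-dimensional noise size, the clipping limit of 2, the class index, and the helper names are all hypothetical, and no pretrained model is called.

```python
import numpy as np

def one_hot_class(class_index, num_classes=1000):
    """Vector of zeros with a single 1 at the chosen class position."""
    vector = np.zeros(num_classes, dtype=np.float32)
    vector[class_index] = 1.0
    return vector

def truncated_noise(dim, truncation, rng, limit=2.0):
    """Sample standard normal values, redraw any outside [-limit, limit], then scale."""
    noise = rng.standard_normal(dim)
    outside = np.abs(noise) > limit
    while outside.any():
        noise[outside] = rng.standard_normal(outside.sum())
        outside = np.abs(noise) > limit
    return (truncation * noise).astype(np.float32)

rng = np.random.default_rng(0)
class_vector = one_hot_class(5)          # arbitrary class index for illustration
noise_vector = truncated_noise(128, 0.4, rng)
```

A smaller truncation value keeps the noise closer to zero, which tends to give less varied but cleaner images.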

Biological fundamentals • 5:42
Explore the biological fundamentals of human neural networks, including more than 100 billion interconnected neurons, synapses and electrical signals, and how learning creates new connections that enable behavior and cognition.
Single layer perceptron • 19:23
Learn how a perceptron uses inputs, weights, a sum function, and a step activation to produce binary outputs, and why it struggles with XOR, prompting multi-layer networks.
Multilayer perceptron – sum and activation functions • 14:20
Explore how a multilayer perceptron uses a hidden layer, weights, and sigmoid activation to perform feed-forward computations, solving non-linearly separable problems like the XOR operator.
Multilayer perceptron – error calculation • 5:19
Calculate the error, also called the loss function, by subtracting predictions from the expected outputs in a multilayer perceptron, then learn to reduce this error by adjusting weights.
Gradient descent • 9:49
Learn how gradient descent adjusts neural network weights to minimize error through partial derivatives. Explore local and global minima, epochs, and using the sigmoid function to compute weight updates.
Delta parameter • 8:09
Explore delta parameter calculation in gradient-based weight updates, applying sigmoid activation and partial derivatives to update weights from the output layer to the hidden layer, guiding toward the global minimum.
Updating weights with backpropagation • 14:03
Master backpropagation to update weights in multilayer neural networks, using gradient descent, momentum, and learning rate across hidden and output layers to minimize error.
Bias, error, stochastic gradient descent, and more parameters • 17:56
Explore bias units, error calculations, and gradient descent variants, then compare activation functions such as sigmoid, tanh, ReLU, softmax, and linear for classification and regression tasks.
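
The single layer perceptron lecture above describes inputs, weights, a sum function, and a step activation, and notes the difficulty with XOR. A minimal NumPy sketch of that behavior follows; the learning rate, epoch count, and function names are illustrative assumptions rather than the course's settings.

```python
import numpy as np

def step(value):
    """Step activation: fire (1) only when the weighted sum is positive."""
    return 1 if value > 0 else 0

def train_perceptron(inputs, targets, epochs=20, learning_rate=1.0):
    weights = np.zeros(inputs.shape[1])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(inputs, targets):
            error = target - step(x @ weights + bias)
            weights += learning_rate * error * x  # no change when the prediction is right
            bias += learning_rate * error
    return weights, bias

def predict(inputs, weights, bias):
    return np.array([step(x @ weights + bias) for x in inputs])

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float64)
and_targets = np.array([0, 0, 0, 1])
xor_targets = np.array([0, 1, 1, 0])

print(predict(inputs, *train_perceptron(inputs, and_targets)))  # [0 0 0 1]
```

Training the same perceptron on `xor_targets` never classifies all four rows correctly, because no single straight line separates the XOR classes. That limitation is what motivates adding a hidden layer.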

Introduction to convolutional neural networks • 7:18
Explore how convolutional neural networks drive computer vision tasks such as object detection, object tracking, and facial recognition, enabling autonomous cars to detect pedestrians and road signals by learning features.
Convolutional operator • 10:04
Explore the convolution operation in a neural network, using a 3x3 kernel to extract features from a 7x7 image, building feature maps via a sliding window and applying ReLU.
Pooling • 5:28
Pooling in a convolutional neural network highlights main features by applying max pooling over 2x2 windows on feature maps created by feature detectors, reducing to a smaller matrix.
Flattening • 6:31
Flattening converts the max-pooled feature map into a vector to feed the dense neural network, enabling pre-processing via convolutional layers, feature detectors, kernels, ReLU, and a sigmoid output.
Dense neural network • 5:10
Explore the dense neural network, the step in convolutional networks that receives the matrix flattened into a vector as its input layer and classifies digits one, three, and nine with learned weights.
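
The lectures above walk through a 3x3 kernel sliding over a 7x7 image, ReLU, 2x2 max pooling, and flattening. Those steps can be sketched in plain NumPy as shown below. The image values and the edge-detecting kernel are made up for illustration, and the pooling helper drops any row or column that does not fill a complete window, which is one of several possible conventions.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for row in range(out_h):
        for col in range(out_w):
            feature_map[row, col] = np.sum(image[row:row + kh, col:col + kw] * kernel)
    return feature_map

def relu(values):
    return np.maximum(values, 0.0)

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows and columns that do not fill a window are dropped."""
    out_h, out_w = feature_map.shape[0] // size, feature_map.shape[1] // size
    trimmed = feature_map[:out_h * size, :out_w * size]
    return trimmed.reshape(out_h, size, out_w, size).max(axis=(1, 3))

image = np.arange(49, dtype=np.float64).reshape(7, 7)
kernel = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=np.float64)
feature_map = relu(convolve2d_valid(image, kernel))
pooled = max_pool2d(feature_map)
flat = pooled.flatten()  # vector that would feed the dense layers
print(feature_map.shape, pooled.shape, flat.shape)  # (5, 5) (2, 2) (4,)
```

The shapes show the shrinking pipeline: a 7x7 image becomes a 5x5 feature map, then a 2x2 pooled map, then a vector of four values.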

Final remarks • 1:43
Recap the GANs complete guide by revisiting basic architectures and applications, from handwritten digits and WGAN to Pix2Pix, CycleGAN, super resolution, StyleGAN, text-to-image, image restoration, inpainting, and face swapping.
BONUS • 1:32
Join AI Expert Academy to access all courses with monthly membership, earning certificates while exploring topics like machine learning, deep learning, computer vision, natural language processing, artificial intelligence, and algorithms.

Requirements

Programming logic
Basic Python programming
Knowledge about neural networks is desirable, but not mandatory

Description

GANs (Generative Adversarial Networks) are considered one of the most modern and fascinating technologies within the field of Deep Learning and Computer Vision. They have gained a lot of attention because they can create fake content. One of the most classic examples is the creation of people who do not exist in the real world to be used to broadcast television programs. This technology is considered a revolution in the field of Artificial Intelligence for producing high quality results, remaining one of the most popular and relevant topics.

In this course you will learn the basic intuition and mainly the practical implementation of the most modern architectures of Generative Adversarial Networks! This course is considered a complete guide because it presents everything from the most basic concepts to the most modern and advanced techniques, so that in the end you will have all the necessary tools to build your own projects! See below some of the projects that you are going to implement step by step:

Create digits from 0 to 9
Transform satellite images into map images in the style of Google Maps
Convert drawings into high-quality photos
Create zebras using horse images
Transfer styles between images using paintings by famous artists such as Van Gogh, Cezanne and Ukiyo-e
Increase the resolution of low quality images (super resolution)
Generate deepfakes (fake faces) with high quality
Create images through textual descriptions
Restore old photos
Complete missing parts of images
Swap the faces of people who are in different environments

To implement the projects, you will learn several different architectures of GANs, such as: DCGAN (Deep Convolutional Generative Adversarial Network), WGAN (Wasserstein GAN), WGAN-GP (Wasserstein GAN-Gradient Penalty), cGAN (conditional GAN), Pix2Pix (Image-to-Image), CycleGAN (Cycle-Consistent Adversarial Network), SRGAN (Super Resolution GAN), ESRGAN (Enhanced Super Resolution GAN), StyleGAN (Style-Based Generator Architecture for GANs), VQ-GAN (Vector Quantized Generative Adversarial Network), CLIP (Contrastive Language–Image Pre-training), BigGAN, GFP-GAN (Generative Facial Prior GAN), Unlimited GAN (Boundless) and SimSwap (Simple Swap).

During the course, we will use the Python programming language and Google Colab online, so you do not have to worry about installing and configuring libraries on your own machine! More than 100 lectures and 16 hours of videos!

Who this course is for:

People interested in creating complex applications using GANs
Undergraduate and graduate students who are taking courses on Computer Vision, Artificial Intelligence, or Digital Image Processing
People who want to implement their own projects using Computer Vision techniques
Data Scientists who want to increase their project portfolio

Course content

Introduction • 4 lectures • 47min

DCGAN and WGAN • 18 lectures • 3hr 5min

cGAN - Pix2Pix and CycleGAN • 36 lectures • 5hr 48min

SRGAN and ESRGAN • 8 lectures • 1hr 9min

StyleGAN • 9 lectures • 1hr 6min

VQGAN + CLIP - text to image • 7 lectures • 42min

Other types of GANs • 15 lectures • 1hr 59min

Additional content 1: Artificial neural networks • 8 lectures • 1hr 35min

Additional content 2: Convolutional neural networks • 5 lectures • 35min

Final remarks • 2 lectures • 3min
