
Explore practical GAN architectures, from DCGAN and WGAN to Pix2Pix and CycleGAN, applying style transfer, super-resolution, and face generation with Google Colab GPUs.
Explore generative adversarial networks, a 2014 concept from Ian Goodfellow, where a generator and a discriminator create new content across images and media.
Explore how generative adversarial networks pair a generator and a discriminator that compete to produce realistic images from random noise, with the discriminator evaluating real versus fake images during training.
Explore the intuition behind DCGAN, the deep convolutional generative adversarial network, using a two-network architecture with generator and discriminator to produce realistic images from random noise.
Learn to implement MNIST handwritten digit generation by loading and preprocessing the MNIST dataset in TensorFlow, normalizing images, and applying mini-batch gradient descent in Colab.
Build the generator that converts 100-dim random noise into a 28x28 grayscale image using dense, reshape, batch normalization, leaky ReLU, and conv-transpose layers.
Build a convolutional discriminator that accepts 28 by 28 images and outputs real or fake, using 14 by 14 and 7 by 7 feature maps, leaky ReLU and dropout, and a dense output layer.
We define discriminator and generator losses using binary cross-entropy with from_logits=True, labeling real images as ones and fake as zeros, guiding training with Adam optimizers.
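The two losses described above can be sketched in plain NumPy. This is a minimal illustration, not the TensorFlow code used in the lecture: the function names are hypothetical, and the stable logit formula stands in for binary cross-entropy with from_logits=True.

```python
import numpy as np

def bce_from_logits(labels, logits):
    """Binary cross-entropy computed on raw logits in a numerically stable form."""
    labels = np.asarray(labels, dtype=np.float64)
    logits = np.asarray(logits, dtype=np.float64)
    # max(x, 0) - x*z + log(1 + exp(-|x|)) equals the usual cross-entropy
    # on sigmoid(x), but never takes the log of a value rounded to zero.
    per_sample = (np.maximum(logits, 0) - logits * labels
                  + np.log1p(np.exp(-np.abs(logits))))
    return float(per_sample.mean())

def discriminator_loss(real_logits, fake_logits):
    # Real images are labeled as ones, generated images as zeros.
    real_loss = bce_from_logits(np.ones_like(real_logits), real_logits)
    fake_loss = bce_from_logits(np.zeros_like(fake_logits), fake_logits)
    return real_loss + fake_loss

def generator_loss(fake_logits):
    # The generator wants its fakes to be classified as real (ones).
    return bce_from_logits(np.ones_like(fake_logits), fake_logits)
```

An undecided discriminator (all logits at zero) pays log 2 for each term, while a generator whose fakes receive large positive logits pays almost nothing.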
Learn to train a generative adversarial network with TensorFlow, using a 256-image batch, 100 epochs, and a 100-d noise vector, updating generator and discriminator with gradient tape.
Visualize GAN results by training a generator with random noise and examining 16 grayscale digits across 100 epochs and 234 batches, with visuals showing progressive improvement.
Explore Wasserstein GANs and the Wasserstein loss to improve training stability, reduce mode collapse, and alleviate vanishing gradients through the critic and the earth mover's distance.
Explore the intuition of the WGAN, comparing real and fake image distributions using the Wasserstein distance, and address gradient issues, weight clipping, and the gradient penalty for better quality.
Explore how WGAN-GP replaces weight clipping with a gradient penalty to enforce Lipschitz constraints, stabilizing training by penalizing the gradient norm on real, generated, and interpolated samples.
Prepare the environment to implement WGAN-GP in Colab, load MNIST data with TensorFlow 2.12, normalize images to -1 to 1, and initialize generator and discriminator.
Define the generator and discriminator losses for the Wasserstein loss, comparing discriminator outputs on real and fake images, and apply the gradient penalty with lambda as in WGAN-GP.
Implement the gradient penalty for a GAN by interpolating real and fake images with epsilon, computing the gradient norm minus one squared via tf.GradientTape.
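The gradient penalty can be illustrated without TensorFlow by passing in the critic's gradient as a function. In this sketch the function names, the toy critic, and lam=10 are assumptions for illustration; in the lecture the gradient comes from a gradient tape over the real discriminator.

```python
import numpy as np

def gradient_penalty(real, fake, critic_grad, rng):
    """Mean of (||grad of critic at x_hat|| - 1)^2 over interpolated samples."""
    batch = real.shape[0]
    # One epsilon per sample, broadcast over the remaining dimensions.
    eps = rng.uniform(0.0, 1.0, size=(batch,) + (1,) * (real.ndim - 1))
    x_hat = eps * real + (1.0 - eps) * fake
    grads = critic_grad(x_hat)  # expected to have the same shape as x_hat
    norms = np.sqrt(np.sum(grads.reshape(batch, -1) ** 2, axis=1))
    return float(np.mean((norms - 1.0) ** 2))

def critic_loss(real_scores, fake_scores, penalty, lam=10.0):
    # Wasserstein critic loss plus the weighted penalty (lam is assumed).
    return float(np.mean(fake_scores) - np.mean(real_scores) + lam * penalty)

def toy_critic_grad(x):
    # Hypothetical critic f(x) = 0.5 * sum(x**2); its gradient w.r.t. x is x.
    return x

rng = np.random.default_rng(0)
real = rng.normal(size=(8, 28, 28, 1))
fake = rng.normal(size=(8, 28, 28, 1))
penalty = gradient_penalty(real, fake, toy_critic_grad, rng)
```

The penalty is zero only when the gradient norm at every interpolated sample equals one, which is how the Lipschitz constraint is encouraged without weight clipping.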
Train a GAN with 30 epochs and a 100 dimensional noise vector, generating 16 images per seed, with the discriminator updated three times per generator update under a WGAN-GP penalty.
Finish the WGAN-GP implementation and compare the results with DCGAN, training for 30 epochs and saving and visualizing generated images.
Explore conditional generative adversarial networks (cGANs) that pair latent vectors with class information to guide the generator and discriminator, enabling targeted images and concepts like Pix2Pix and CycleGAN.
Explore the Pix2Pix architecture for image-to-image translation with a conditional GAN, using paired datasets, a generator and discriminator, and training with sigmoid cross-entropy and L1 loss.
Explore Pix2Pix-style image-to-image translation by converting satellite images to map-like images using the maps dataset in Google Colab, and train the generator to produce maps.
Implement preprocessing to split each image into left and right halves, decode and resize to 256x512, convert to float32, normalize by 255, and test visualization for the Pix2Pix architecture.
Continue building the GAN data pipeline by counting training images, filtering JPG files, resizing to 256 by 256, normalizing to -1 to 1, and applying random crop, jitter, and optional flips.
Load and preprocess training and testing image datasets in TensorFlow, applying resizing, normalization, and random jitter, then shuffle and batch them for GAN training.
Learn to build the generator for GANs with a modified U-Net design, featuring encoder and decoder blocks, skip connections, convolutional layers, batch normalization, leaky ReLU, and upsampling for Pix2Pix.
Build a generator by stacking eight encoder blocks and eight decoder blocks into a U-Net-like structure, using 256×256×3 inputs and skip connections.
Continue building the generator, refining the encoder–decoder and U-Net, and implement the generator loss combining GAN loss with an L1 loss and fixed learning rate and lambda.
Learn how to build a PatchGAN discriminator that classifies each image patch as real or fake, using two inputs (original and transformed), concatenation, and a 30 by 30 output.
Finish implementing the discriminator to process original and transformed images as patch outputs, compute real and fake losses via sigmoid cross-entropy, and configure Adam optimizers and checkpoints for GAN training.
Implement a function to generate images using the generator in a generative adversarial network, feeding test inputs and the real image as ground truth, and visualize results during training.
Train a generative adversarial network end-to-end by generating fake images with the generator, evaluating them with the discriminator against real images, and applying GAN and L1 losses.
Finish the Pix2Pix training pipeline by implementing the training loop, visualization with TensorBoard, and checkpointing, then evaluate generator and discriminator losses across steps.
Load and use a pre-trained Pix2Pix model with PyTorch in Google Colab, avoiding training time while exploring CycleGAN and Pix2Pix architectures for image translation.
Download and extract the facades dataset from the provided url, then explore train, test, and validation folders and pair edges with real images to train a generator.
Test a pretrained Pix2Pix GAN on the facades dataset in the B to A direction, generating real A, real B, and fake B, and visualize results with the latest net G.
Learn how to translate drawings to photos using the edges-to-shoes model in a PyTorch workflow, including setting up directories, downloading the pre-trained model, and loading test images.
Visualize GAN results by running python test.py with the data root, the pretrained model name, test mode, single-image input of edge drawings, the U-Net 256 generator architecture, and batch normalization.
Learn to convert day images to night using the Pix2Pix architecture with a pretrained day-to-night model and test it in a Google Colab workflow for GAN-generated night images.
Learn how CycleGAN uses unpaired data and two generators with two discriminators to translate between domains, guided by cycle-consistency loss to produce realistic, consistent results.
Explore implementing the CycleGAN architecture to convert apples to oranges using a simple dataset, first from scratch and then with PyTorch, on Google Colab with GPU acceleration.
Learn to implement image preprocessing functions for GAN training, including random crop, jitter, resize, flip, and normalization from 0 to 255 to -1 to 1 for training and testing images.
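A minimal NumPy sketch of the crop, flip, and normalization steps follows. The function names are hypothetical, and the input is assumed to be a uint8 image already resized slightly larger than the crop (286 by 286 for a 256 crop is an assumed choice, not taken from the lecture); the resize step itself is left out.

```python
import numpy as np

def normalize(image):
    """Map uint8 pixel values in [0, 255] to floats in [-1, 1]."""
    return image.astype(np.float32) / 127.5 - 1.0

def random_crop(image, height, width, rng):
    # Pick a random top-left corner so the crop stays inside the image.
    top = rng.integers(0, image.shape[0] - height + 1)
    left = rng.integers(0, image.shape[1] - width + 1)
    return image[top:top + height, left:left + width]

def random_flip(image, rng):
    # Mirror the image horizontally half of the time.
    return image[:, ::-1] if rng.random() < 0.5 else image

def preprocess_train(image, rng, crop=256):
    # Random jitter: crop a slightly oversized image, then maybe flip it.
    image = random_crop(image, crop, crop, rng)
    image = random_flip(image, rng)
    return normalize(image)

rng = np.random.default_rng(0)
oversized = rng.integers(0, 256, size=(286, 286, 3), dtype=np.uint8)
sample = preprocess_train(oversized, rng)
```

Test images would normally skip the random crop and flip and only be resized and normalized, so evaluation stays deterministic.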
Load the dataset and convert to TensorFlow format, recreate training variables, apply random jitter and normalization, shuffle, and visualize apples and oranges to verify preprocessing.
Learn to build generator and discriminator networks for a CycleGAN using Pix2Pix with instance normalization and a U-Net style generator, including apples-to-oranges experiments and training basics.
Implement discriminator and generator losses using binary cross entropy, targeting real and fake images. Apply cycle consistency loss and identity loss with mean absolute error to enforce correct mappings.
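The cycle-consistency and identity terms reduce to weighted mean absolute errors, as this NumPy sketch shows. The weight of 10 and the halved weight on the identity term are assumptions for illustration, and the function names are hypothetical.

```python
import numpy as np

LAMBDA = 10.0  # assumed loss weight, not taken from the lecture

def cycle_consistency_loss(real, cycled, lam=LAMBDA):
    """MAE between an image and its round trip through both generators."""
    return float(lam * np.mean(np.abs(real - cycled)))

def identity_loss(real, same, lam=LAMBDA):
    """MAE between an image and the output of the generator that should leave it unchanged."""
    return float(0.5 * lam * np.mean(np.abs(real - same)))

rng = np.random.default_rng(0)
apple = rng.uniform(-1.0, 1.0, size=(1, 256, 256, 3))
# Stand-ins for generator outputs; a real run would call the two generators.
cycled_apple = apple + 0.05
same_apple = apple + 0.01
total_extra_loss = (cycle_consistency_loss(apple, cycled_apple)
                    + identity_loss(apple, same_apple))
```

Both terms are added to the adversarial loss of each generator, so a generator is penalized when a translated image cannot be mapped back to its source.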
Set Adam optimizers with learning rate 0.0002, beta1 0.5, beta2 0.999. Create optimizers for two generators and two discriminators, then set up a checkpoint manager to restore the latest checkpoint.
Implement the CycleGAN training step by constructing generators G and F, generating fake images, enforcing cycle consistency and identity loss, and updating discriminators through gradient-based optimization.
Train a CycleGAN to translate apples to oranges by iterating ten epochs, calling the training step, generating images with generator G, and saving checkpoints and model weights.
Demonstrate using a pretrained CycleGAN in PyTorch on Google Colab, downloading the horse to zebra model, setting up folders and test images to explore CycleGAN translations.
Test the model by running the test script to transform a horse into a zebra and view results in the pre-trained horse to zebra folder; outcomes vary with image complexity.
Apply style transfer with generative adversarial networks to convert images into Monet-like styles using a pretrained model and Colab workflow. Load the model, organize images, run tests, and review results.
Apply style transfer with GANs to transform photos into Van Gogh style artworks. Load the model, process all images, and visualize the results to compare against other artists' styles.
Explore how SRGAN uses adversarial and content (perceptual) losses to convert low-resolution images into high-resolution outputs, leveraging generator and discriminator dynamics and residual skip connections.
Explore ESRGAN, an enhanced SRGAN that boosts image resolution with a residual-in-residual dense block, relativistic average GAN, and perceptual loss before activation for clearer, non-blurred results.
Explore how to implement ESRGAN for image super-resolution in Google Colab, loading the pretrained RRDB_ESRGAN_x4 model with PyTorch and GPU support.
Load and test low-resolution images in a Google Colab session, move them to the lr folder, and prepare for super-resolution in the next lecture.
Explore implementing a super resolution function for a GANs workflow, including image loading, preprocessing, model inference on GPU, and post-processing to visualize and save high-resolution results.
Compare super resolution methods using PSNR to quantify artifact reduction, and choose between ESRGAN-based approaches and Bicubic interpolation based on image quality.
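PSNR follows directly from the mean squared error between a reference image and a reconstruction. The sketch below assumes 8-bit images with a peak value of 255; the function name is hypothetical.

```python
import numpy as np

def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in decibels; higher means closer to the reference."""
    reference = np.asarray(reference, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))

rng = np.random.default_rng(0)
ground_truth = rng.integers(0, 256, size=(64, 64, 3)).astype(np.float64)
# Stand-ins for two upscaling results with different amounts of error.
close_result = np.clip(ground_truth + rng.normal(0, 2, ground_truth.shape), 0, 255)
rough_result = np.clip(ground_truth + rng.normal(0, 20, ground_truth.shape), 0, 255)
scores = {"close": psnr(ground_truth, close_result),
          "rough": psnr(ground_truth, rough_result)}
```

PSNR measures pixel-level error only, so the score is best read alongside a visual comparison of the outputs.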
Test and compare Real-ESRGAN architectures for super-resolution GANs, install dependencies, run inference on images and videos, and observe artifact reduction and state-of-the-art results.
Discover how ProGAN progressively grows generator and discriminator from 4x4 to 1024x1024 to produce higher quality face images. Explore how latent space maps random numbers to controllable face generation.
Explore StyleGAN, a style-based generator for GANs that blends style transfer with adversarial networks, mapping latent z to w and using adaptive instance normalization to control image features.
Set up a pretrained StyleGAN3 model in Google Colab by cloning the GitHub repository, configuring the GPU accelerator, installing libraries, and loading an FFHQ-trained model for face generation.
Generate images with a GAN by setting seeds and truncation to vary outputs, using a neural network architecture, and saving results to a folder for visualization in Google Colab.
Load and run a GAN neural network on a CUDA device, generate images from a latent Z vector with truncation, and convert PyTorch outputs to numpy and PIL for viewing.
Generate and visualize images from seeds by creating latent vectors, feeding them to the generator, and displaying results; explore how truncation affects image diversity and similarity.
Learn to interpolate between latent vectors in GANs to morph one image into another. Generate intermediate images, visualize them in a grid, and create a gif of the transitions.
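Latent interpolation can be sketched as a straight line between two seed vectors. The 512-dimensional latent size and the function name are assumptions for illustration, and the step that feeds each vector to the generator is omitted.

```python
import numpy as np

def interpolate_latents(z_start, z_end, steps):
    """Return `steps` latent vectors moving linearly from z_start to z_end."""
    alphas = np.linspace(0.0, 1.0, steps).reshape(-1, 1)
    return (1.0 - alphas) * z_start + alphas * z_end

# Two latent vectors drawn from different seeds (latent size is assumed).
z_a = np.random.default_rng(1).standard_normal(512)
z_b = np.random.default_rng(2).standard_normal(512)
frames = interpolate_latents(z_a, z_b, steps=10)  # shape (10, 512)
```

Each row of `frames` would be passed to the generator in turn, and the resulting images arranged in a grid or saved as the frames of a GIF.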
Explore other pretrained models, including StyleGAN3, on Met Faces and animal faces to generate portrait-like images, visualize interpolations with GIFs, and compare results.
Explore how VQGAN and CLIP fuse convolutional networks with transformers to convert text prompts into images, guided by vector quantization and how CLIP evaluates text-to-image matches.
Begin implementing VQGAN and CLIP to generate images from text prompts, set up Google Colab, install torch and CLIP, download pretrained models, and load VQ helper functions.
Configure the settings for VQGAN image generation, including text prompts, a 300 by 300 image size, the vqgan_imagenet_f16_16384 model, and seed-based reproducibility.
Explore how the VQGAN and CLIP integration generates evolving images from prompts, showing training progress from initial outputs to refined visuals after 700–1000 iterations.
Generate a video from a sequence of frames by loading each image and saving with im.save, then visualize it in Google Colab; the lecture emphasizes prompts and testing different models.
Explore BigGAN, a class-conditioned image generator capable of producing images from over 1000 classes, with hinge loss and key architecture like z noise, upsampling, batch normalization, and residual blocks.
Learn to use a pretrained BigGAN-deep-256 model in Google Colab, clone the repository, install dependencies, load the pretrained ImageNet model, and generate images across 1000 ImageNet classes.
Explore how to use a pre-trained BigGAN model to generate images from 1000 object categories, fetch category lists from JSON, and apply truncation and one-hot encoding.
Generate new images with a GAN using a noise vector, a class vector, and truncation, visualize 256 by 256 rgb outputs, and save the results as images.
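The noise vector, class vector, and truncation can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the helper functions of the pretrained BigGAN package: the noise size of 128, the cutoff of 2, the class index, and the function names are all assumed.

```python
import numpy as np

def truncated_noise(batch, dim, truncation, rng):
    """Sample normal noise, resample values beyond +/-2, then scale by truncation."""
    z = rng.standard_normal((batch, dim))
    outside = np.abs(z) > 2.0
    while outside.any():
        z[outside] = rng.standard_normal(int(outside.sum()))
        outside = np.abs(z) > 2.0
    return truncation * z

def one_hot(class_index, num_classes=1000):
    """Class vector with a single 1 at the chosen category."""
    vec = np.zeros((1, num_classes), dtype=np.float32)
    vec[0, class_index] = 1.0
    return vec

rng = np.random.default_rng(0)
noise_vector = truncated_noise(batch=1, dim=128, truncation=0.4, rng=rng)
class_vector = one_hot(class_index=100)  # hypothetical category index
```

Lower truncation values keep the noise closer to zero, which tends to give more typical but less varied images for the chosen class.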
Learn to generate new images with a GAN by providing a category, seed, and truncation, and study how truncation controls variety across castles, burgers, and flowers, on GPU or CPU.
GFP-GAN restores old photos by focusing on face restoration, increasing face resolution without altering facial features. It compares favorably with other methods, and the next lecture starts the implementation.
Learn to restore old photos with GFP-GAN in Google Colab, clone the GitHub repo, install dependencies, and use a pretrained GFP-GAN v1.3 model for face restoration.
Explore photo restoration with GFP-GAN and Real-ESRGAN by running a Python script, restoring faces and backgrounds, and visualizing results in a dedicated results folder.
Explore the Boundless GAN for image extension, a method that reconstructs missing image regions by adding new pixels and extending cropped areas, with tests at 25/75, 50/50, and 75/25.
Load and preprocess an image for the boundless algorithm in Google Colab, resizing to 257 by 257, converting to numpy, expanding the batch dimension, and normalizing to 0–1 in RGB.
Learn to run a GAN-based image extension by loading models and masking pixels. Visualize the original, masked, and generated results to understand how the model guesses missing pixels.
Explore the SimSwap architecture in GANs to exchange faces between images, maintaining attributes like hair color while transferring a source face to a target image.
Develop a hands-on guide to performing face swapping with a pretrained model in a deepfakes context, using Google Colab to load checkpoints, install dependencies, and test with two images.
Perform a face swap between two images using a prebuilt model, save results to a dedicated directory, adjust crop size and mask options, and compare original and swapped faces.
Explore the biological fundamentals of human neural networks, including more than 100 billion interconnected neurons, synapses and electrical signals, and how learning creates new connections that enable behavior and cognition.
Learn how a perceptron uses inputs, weights, a sum function, and a step activation to produce binary outputs, and why it struggles with XOR, prompting multi-layer networks.
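A small pure-Python sketch makes the XOR limitation concrete. The training rule, the added bias term, and the function names are assumptions for illustration rather than the lecture's exact formulation.

```python
def step(value):
    # Step activation: fire only when the weighted sum is positive.
    return 1 if value > 0 else 0

def train_perceptron(samples, targets, epochs=20, learning_rate=1):
    weights = [0, 0]
    bias = 0  # bias term added for illustration
    for _ in range(epochs):
        for (x1, x2), target in zip(samples, targets):
            output = step(weights[0] * x1 + weights[1] * x2 + bias)
            error = target - output
            # Nudge each weight in the direction that reduces the error.
            weights[0] += learning_rate * error * x1
            weights[1] += learning_rate * error * x2
            bias += learning_rate * error
    return weights, bias

def predict(samples, weights, bias):
    return [step(weights[0] * x1 + weights[1] * x2 + bias) for x1, x2 in samples]

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_weights, and_bias = train_perceptron(inputs, [0, 0, 0, 1])
xor_weights, xor_bias = train_perceptron(inputs, [0, 1, 1, 0])
and_predictions = predict(inputs, and_weights, and_bias)
xor_predictions = predict(inputs, xor_weights, xor_bias)
```

The AND targets can be separated by a single line, so training settles on correct weights. No single line separates the XOR targets, so at least one XOR prediction stays wrong however long the perceptron trains.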
Explore how a multilayer perceptron uses a hidden layer, weights, and sigmoid activation to perform feed-forward computations, solving non-linearly separable problems like the XOR operator.
Calculate the error, also called the loss function, by subtracting predictions from the expected outputs in a multilayer perceptron, then learn to reduce this error by adjusting weights.
Learn how gradient descent adjusts neural network weights to minimize error through partial derivatives. Explore local and global minima, epochs, and using the sigmoid function to compute weight updates.
Explore delta parameter calculation in gradient-based weight updates, applying sigmoid activation and partial derivatives to update weights from the output layer to the hidden layer, guiding toward the global minimum.
Master backpropagation to update weights in multilayer neural networks, using gradient descent, momentum, and learning rate across hidden and output layers to minimize error.
Explore bias units, error calculations, and gradient descent variants, then compare activation functions such as sigmoid, tanh, ReLU, softmax, and linear for classification and regression tasks.
Explore how convolutional neural networks drive computer vision tasks such as object detection, object tracking, and facial recognition, enabling autonomous cars to detect pedestrians and road signals by learning features.
Explore the convolution operation in a neural network, using a 3x3 kernel to extract features from a 7x7 image, building feature maps via a sliding window and applying ReLU.
Pooling in a convolutional neural network highlights main features by applying max pooling over 2x2 windows on feature maps created by feature detectors, reducing to a smaller matrix.
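The convolution, ReLU, and max pooling steps can be traced on a small array with NumPy. The kernel values and function names below are illustrative assumptions, and the pooling stride is assumed to equal the window size.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the kernel over the image without padding (no kernel flip, as in CNNs)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            feature_map[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return feature_map

def relu(values):
    # Replace negative responses with zero.
    return np.maximum(values, 0)

def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    h = feature_map.shape[0] // size
    w = feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]  # drop rows/columns that do not fit
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

image = np.random.default_rng(0).integers(0, 2, size=(7, 7)).astype(float)
kernel = np.array([[1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])  # hypothetical vertical-edge detector
feature_map = relu(convolve2d_valid(image, kernel))  # 7x7 image -> 5x5 map
pooled = max_pool(feature_map)                       # 5x5 map -> 2x2 matrix
```

A 3x3 kernel on a 7x7 image gives a 5x5 feature map, and 2x2 pooling then shrinks it to 2x2, dropping the last row and column because they do not fill a complete window.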
Flattening converts the max-pooled feature map into a vector to feed the dense neural network, enabling pre-processing via convolutional layers, feature detectors, kernels, ReLU, and a sigmoid output.
Explore the dense neural network step in convolutional networks, where the feature matrix is flattened into a vector for the input layer to classify the digits one, three, and nine with learned weights.
Recap the GANs complete guide by revisiting basic architectures and applications, from handwritten digits and WGAN to Pix2Pix, CycleGAN, super-resolution, StyleGAN, text-to-image, image restoration, inpainting, and face swapping.
Join AI Expert Academy to access all courses with monthly membership, earning certificates while exploring topics like machine learning, deep learning, computer vision, natural language processing, artificial intelligence, and algorithms.
GANs (Generative Adversarial Networks) are considered one of the most modern and fascinating technologies within the field of Deep Learning and Computer Vision. They have gained a lot of attention because they can create fake content. One of the most classic examples is the creation of people who do not exist in the real world to be used to broadcast television programs. This technology is considered a revolution in the field of Artificial Intelligence for producing high quality results, remaining one of the most popular and relevant topics.
In this course you will learn the basic intuition and mainly the practical implementation of the most modern architectures of Generative Adversarial Networks! This course is considered a complete guide because it presents everything from the most basic concepts to the most modern and advanced techniques, so that in the end you will have all the necessary tools to build your own projects! See below some of the projects that you are going to implement step by step:
Create digits from 0 to 9
Transform satellite images into map images, in the style of Google Maps
Convert drawings into high-quality photos
Create zebras using horse images
Transfer styles between images using paintings by famous artists such as Van Gogh, Cezanne and Ukiyo-e
Increase the resolution of low quality images (super resolution)
Generate deepfakes (fake faces) with high quality
Create images through textual descriptions
Restore old photos
Complete missing parts of images
Swap the faces of people who are in different environments
To implement the projects, you will learn several different architectures of GANs, such as: DCGAN (Deep Convolutional Generative Adversarial Network), WGAN (Wasserstein GAN), WGAN-GP (Wasserstein GAN-Gradient Penalty), cGAN (conditional GAN), Pix2Pix (Image-to-Image), CycleGAN (Cycle-Consistent Adversarial Network), SRGAN (Super Resolution GAN), ESRGAN (Enhanced Super Resolution GAN), StyleGAN (Style-Based Generator Architecture for GANs), VQ-GAN (Vector Quantized Generative Adversarial Network), CLIP (Contrastive Language–Image Pre-training), BigGAN, GFP-GAN (Generative Facial Prior GAN), Unlimited GAN (Boundless) and SimSwap (Simple Swap).
During the course, we will use the Python programming language and Google Colab online, so you do not have to worry about installing and configuring libraries on your own machine! More than 100 lectures and 16 hours of videos!