
In the video, we introduce ComfyUI as a powerful open-source, node-based application for generative AI, capable of creating images, videos, 3D models and music. I strongly recommend installing the portable version from GitHub because generative AI is very resource-intensive. While a Windows installer is easier to set up, the portable version makes reconfigurations and understanding how it works in the file system much simpler in the long run. These are skills that you will need.
Essential System Requirements for ComfyUI
Generating AI assets on your computer is very resource-intensive. So, you should have a good quality minimum hardware setup:
Graphics Card (GPU): A minimum of an Nvidia 30 series is recommended, with a 40 series (like the Nvidia GeForce RTX 4060 Ti used in the tutorial) or preferably a 50 series for a more enjoyable experience. Do not use a 10 or 20 series as your experience will not be enjoyable at all. High-quality graphics cards with a lot of VRAM are crucial.
Storage (Disk Space): You will likely need hundreds of gigabytes of storage, potentially using up to 500 GB very quickly for AI models and generated assets. A dedicated drive, like a 2 TB disk, is highly recommended to avoid issues with your C drive filling up and affecting the performance of your operating system. The pre-built ComfyUI zip file alone is nearly 2 GB, and extracted it's 6.14 GB, with subsequent models also being multiple gigabytes.
RAM (Memory): A minimum of 32 GB of RAM is suggested, with 64 GB being ideal. All downloaded files and models will be loaded into memory, making sufficient RAM important. If you lack significant RAM, disk space, and a good graphics card, using AI on your own computer might not be an enjoyable experience.
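Before installing, it can help to confirm that the target drive really has the space described above. The following is a minimal sketch using only the Python standard library. The helper name `has_free_space` and the path `"."` are placeholders chosen for illustration, and the 500 GB figure is simply the rough estimate quoted above.

```python
import shutil

GB = 1000 ** 3  # decimal gigabytes, as drive sizes are usually quoted


def has_free_space(path: str, required_gb: float) -> bool:
    """Return True if the drive holding `path` has at least `required_gb` GB free."""
    usage = shutil.disk_usage(path)
    return usage.free >= required_gb * GB


if __name__ == "__main__":
    # "." is a placeholder; point this at the drive you plan to install on.
    free_gb = shutil.disk_usage(".").free / GB
    print(f"Free space: {free_gb:.1f} GB")
    print("Room for roughly 500 GB of models and outputs:", has_free_space(".", 500))
```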
Video Timings
00:00 Install ComfyUI from GitHub, avoiding installers.
00:30 Check hardware specs: 8GB VRAM minimum, 32GB RAM recommended.
01:15 Install on dedicated disk due to large storage requirements.
02:45 Download portable version matching your specific graphics card.
03:30 Extract the 2 GB download onto the dedicated installation drive.
04:15 Start ComfyUI using the Run Nvidia GPU batch file.
05:00 Default workflow displayed upon successful start in the browser.
07:00 Verify VRAM and CUDA version in the server console window.
We will set up a very basic image generation workflow to familiarize ourselves with the UI and the process of installing a model checkpoint.
Load Checkpoint : Loads a checkpoint model (e.g., SD 1.5).
KSampler : The denoising engine. Uses the prompt, noise, and model to iteratively generate an image in latent space.
VAE Decode : Variational Autoencoder. Converts the latent image into a visible RGB image.
Save Image : Saves the final generated image to disk.
CLIP Text Encode (Positive Prompt) : Encodes your main text prompt into a format the model can use.
CLIP Text Encode (Negative Prompt) : Encodes undesired elements (e.g., "blurry, distorted") to help the model avoid them.
Empty Latent Image : Creates an empty latent image of the desired resolution, which the KSampler then fills with noise and denoises.
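The node list above can also be written down as data. ComfyUI can export a workflow in an API-style JSON format in which every node has a `class_type` and `inputs`, and a connection is a `[node_id, output_index]` pair. The sketch below assumes that format. The node ids, checkpoint filename, prompts, seed and sampler settings are placeholders chosen for illustration, not values taken from the video.

```python
import json

# Each key is a node id. Links are [source_node_id, output_index].
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",  # Load Checkpoint
          "inputs": {"ckpt_name": "v1-5-pruned-emaonly-fp16.safetensors"}},  # assumed filename
    "2": {"class_type": "CLIPTextEncode",  # positive prompt
          "inputs": {"text": "a lighthouse at sunset", "clip": ["1", 1]}},
    "3": {"class_type": "CLIPTextEncode",  # negative prompt
          "inputs": {"text": "blurry, distorted", "clip": ["1", 1]}},
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 512, "height": 512, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
                     "latent_image": ["4", 0], "seed": 42, "steps": 20, "cfg": 7.0,
                     "sampler_name": "euler", "scheduler": "normal", "denoise": 1.0}},
    "6": {"class_type": "VAEDecode",
          "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
    "7": {"class_type": "SaveImage",
          "inputs": {"images": ["6", 0], "filename_prefix": "basic_sd15"}},
}


def dangling_links(graph):
    """Return (node_id, input_name) pairs whose link points at a missing node."""
    missing = []
    for node_id, node in graph.items():
        for name, value in node["inputs"].items():
            if isinstance(value, list) and len(value) == 2 and value[0] not in graph:
                missing.append((node_id, name))
    return missing


if __name__ == "__main__":
    print("Dangling links:", dangling_links(workflow))
    print(json.dumps(workflow, indent=2)[:200], "...")
```

Reading the links shows the same data flow as the canvas: the checkpoint feeds the model to the KSampler, its CLIP to both text encoders, and its VAE to the decoder.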
Video Timings
00:00 Begin basic Stable Diffusion 1.5 workflow by clearing the workspace.
01:30 Download, install, and load the SD 1.5 Checkpoint model.
03:00 Review checkpoint components and low VRAM data types.
05:00 Connect model to K Sampler, VAE Decode, and Preview Image.
06:30 Add CLIP Text Encode nodes for positive and negative prompts.
08:00 Resolve latent image input error using Empty Latent Image.
10:30 Use Save Image node for file output and sharing workflows.
13:00 Implement LCM LoRA to speed up generation for low VRAM.
We will experiment with a text to image (T2I) workflow and learn about the KSampler.
We will use Stable Diffusion 1.5 since it is fast and will work on most GPUs.
It works best with short, clear prompts and simple concepts, and it has a natural, realistic visual style.
We will use an image-to-image technique that is easy to set up and good for reducing randomness when creating new images based on existing images.
Inpainting in AI image generation refers to the process of filling in or modifying specific parts of an image using a generative model.
You provide a base image with a mask that indicates the area to change, and a prompt describing the desired content to replace it with.
Inpainting is useful for more precision when removing objects, changing details, or even restoring damaged images.
Outpainting in AI image generation refers to the process of extending an image beyond its original borders using a generative model.
Given an existing image and a prompt, the model predicts and fills in new visual content that seamlessly matches the original style, lighting, and context.
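Outpainting usually starts by enlarging the canvas and marking the new border area as the region to fill. The sketch below illustrates only that preparation step in plain Python. The helper `pad_for_outpaint` and its parameter names are hypothetical, and it ignores edge feathering, which a real padding node may also offer.

```python
def pad_for_outpaint(width, height, left=0, top=0, right=0, bottom=0):
    """Return the enlarged canvas size and a mask (1 = new area to generate)."""
    new_w = width + left + right
    new_h = height + top + bottom
    mask = [
        [0 if (left <= x < left + width and top <= y < top + height) else 1
         for x in range(new_w)]
        for y in range(new_h)
    ]
    return new_w, new_h, mask


if __name__ == "__main__":
    # Extend a tiny 4x2 image by 2 pixels on the right.
    w, h, mask = pad_for_outpaint(4, 2, right=2)
    print(w, h)
    for row in mask:
        print(row)
```

The model is then asked to generate content only where the mask is 1, using the original pixels as context.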
In this lesson, we will experiment with the ImageCompositeMasked node.
This will allow us to overlay an image over another.
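Conceptually, a masked composite pastes a source image onto a destination at an offset and uses a mask to decide how much of each pixel comes from the source. The following is a simplified sketch of that idea on small grayscale grids. The function `composite_masked` is an illustration written for this text, not the node's actual implementation, and it leaves out options such as resizing the source.

```python
def composite_masked(destination, source, x, y, mask=None):
    """Paste `source` onto a copy of `destination` at (x, y).

    Images are lists of rows of floats. Mask values run from 0.0 (keep the
    destination pixel) to 1.0 (use the source pixel). No mask means a full paste.
    """
    out = [row[:] for row in destination]  # leave the original untouched
    for sy, src_row in enumerate(source):
        for sx, src_px in enumerate(src_row):
            dy, dx = y + sy, x + sx
            if 0 <= dy < len(out) and 0 <= dx < len(out[0]):
                m = 1.0 if mask is None else mask[sy][sx]
                out[dy][dx] = m * src_px + (1.0 - m) * out[dy][dx]
    return out


if __name__ == "__main__":
    background = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    overlay = [[1.0, 1.0]]
    soft_mask = [[1.0, 0.5]]
    print(composite_masked(background, overlay, 1, 0, soft_mask))
```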
We will neaten the appearance of a workflow using subgraphs, nested subgraphs and customised slots.
ComfyUI-Manager is an extension designed to enhance the usability of ComfyUI.
The ComfyUI-Manager is optional. It comes preinstalled with ComfyUI when using the main installer package from the ComfyUI website. However, if you've installed the portable version, as recommended in this course, it will require some manual configuration.
We will use the ComfyUI-Manager to install, update, delete third party custom nodes and models more easily.
It also has a search option which will help in discovering new nodes that we might want to try out.
ESRGAN is an abbreviation for Enhanced Super-Resolution Generative Adversarial Network.
Enhanced – It's an improvement over earlier SRGAN (Super-Resolution GAN) models.
Super-Resolution – Refers to the task of increasing image resolution (e.g., 256×256 → 1024×1024) while preserving or enhancing detail.
Generative Adversarial Network (GAN) – A type of machine learning model where two networks (a generator and a discriminator) compete to produce high-quality, realistic outputs.
ESRGAN uses deep learning to sharpen, upscale, and restore images, often adding plausible detail during the process.
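For contrast, here is what upscaling looks like without a learned model. This sketch does naive nearest-neighbour upscaling in plain Python: every pixel is simply repeated, so the image gets larger but no detail is added. It is a baseline for comparison only and is not how ESRGAN works internally.

```python
def upscale_nearest(image, factor):
    """Repeat every pixel `factor` times horizontally and vertically."""
    return [
        [px for px in row for _ in range(factor)]
        for row in image
        for _ in range(factor)
    ]


if __name__ == "__main__":
    # The example in the text, 256x256 -> 1024x1024, is a 4x scale factor.
    print("Scale factor:", 1024 // 256)
    tiny = [[1, 2], [3, 4]]
    for row in upscale_nearest(tiny, 2):
        print(row)
```

A super-resolution model produces the same output size, but predicts plausible texture and edges instead of duplicating pixels.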
In this video, we will do a quick overview of popular image generation models such as SD 1.5, SD 2.1, SDXL, SD 3.5, Flux Dev, Flux Schnell, DreamShaper 8, Dreamshaper XL and AbsoluteReality.
No single model is best at everything. They are all trained on different data sets, and there are many models available, but you can experiment to find a compromise between acceptable speed and quality.
We can get much better results by upscaling the latent output of a KSampler and running it through a second KSampler.
This technique is an alternative to using ESRGAN, but where it stands out the most is bringing out the best in SD1.5 based models.
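Written as an API-style workflow fragment, the two-pass idea could look like the sketch below. It assumes an existing text-to-image graph in which node "1" is the checkpoint loader, "2" and "3" are the positive and negative prompts, and "5" is the first KSampler. Those ids, the target size, the step count and the denoise value are all illustrative assumptions. The key point is that the second sampler uses a denoise below 1.0, so it refines the upscaled latent rather than starting again from pure noise.

```python
# Fragment only: nodes "1", "2", "3" and "5" are assumed to exist elsewhere.
hires_nodes = {
    "10": {"class_type": "LatentUpscale",
           "inputs": {"samples": ["5", 0],          # latent from the first KSampler
                      "upscale_method": "nearest-exact",
                      "width": 1024, "height": 1024,
                      "crop": "disabled"}},
    "11": {"class_type": "KSampler",                 # second pass
           "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
                      "latent_image": ["10", 0],
                      "seed": 42, "steps": 20, "cfg": 7.0,
                      "sampler_name": "euler", "scheduler": "normal",
                      "denoise": 0.5}},             # below 1.0 keeps the composition
}

if __name__ == "__main__":
    second_pass = hires_nodes["11"]["inputs"]
    print("Second pass reads from node", second_pass["latent_image"][0],
          "with denoise", second_pass["denoise"])
```

Lower denoise values stay closer to the first image, while higher values allow more new detail at the risk of changing the composition.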
Stable Video Diffusion (SVD) Image-to-Video is a diffusion model that takes in a still image as a conditioning frame, and generates a video from it.
The base `svd.safetensors` model was trained to generate 14 frames at 1024x576.
`svd_xt.safetensors` was trained to generate 25 frames at 1024x576.
`svd_xt_1_1.safetensors` is a more finely tuned version of img2vid-xt.
We will smooth the framerate appearance of these animated images.
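Frame interpolation inserts new frames between the existing ones, so the same motion is spread over more frames. The sketch below uses a simple average of neighbouring frames as a stand-in. Interpolation models estimate motion rather than cross-fading, so this only illustrates how the frame count changes. Frames are represented here as flat lists of pixel values, which is an assumption made for brevity.

```python
def interpolate_frames(frames):
    """Insert one blended frame between each consecutive pair of frames."""
    if len(frames) < 2:
        return list(frames)
    out = []
    for a, b in zip(frames, frames[1:]):
        out.append(a)
        out.append([(pa + pb) / 2 for pa, pb in zip(a, b)])  # midpoint frame
    out.append(frames[-1])
    return out


if __name__ == "__main__":
    clip = [[float(i)] for i in range(14)]  # 14 one-pixel frames, like base SVD
    smoother = interpolate_frames(clip)
    print(len(clip), "frames ->", len(smoother), "frames")
```

Doubling in this way turns n frames into 2n - 1 frames, so playing the result at twice the frame rate keeps roughly the same duration with smoother motion.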
We will use the "Canny" node to create an outline of an image and then generate a new image based on that outline.
The Canny edge detection algorithm was developed by John F. Canny in 1986. It's a widely used technique in computer vision to detect edges (outlines) in images.
The Canny edge detector is best suited for images with clear structure, strong contrasts, and well-defined shapes.
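To see what the low and high thresholds control, here is a deliberately simplified sketch of two of the Canny stages: a Sobel gradient and the double threshold. The full algorithm also includes Gaussian smoothing, non-maximum suppression and hysteresis edge tracking, which are omitted here. The function names are made up for this illustration.

```python
import math


def gradient_magnitude(img):
    """Sobel gradient magnitude for a grayscale image (list of rows). Borders stay 0."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            mag[y][x] = math.hypot(gx, gy)
    return mag


def classify_edges(mag, low, high):
    """Label each pixel: 2 = strong edge, 1 = weak edge, 0 = not an edge."""
    return [[2 if m >= high else 1 if m >= low else 0 for m in row] for row in mag]


if __name__ == "__main__":
    # Dark left side, bright right side: one clear vertical edge.
    image = [[0, 0, 1, 1, 1] for _ in range(5)]
    for row in classify_edges(gradient_magnitude(image), low=1, high=3):
        print(row)
```

Raising the thresholds keeps only the strongest outlines, while lowering them lets fainter surface detail through, which is why images with clear structure and strong contrast work best.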
00:00 Canny ControlNet creates outlines for new images.
00:30 Canny edge algorithm detects outlines, best for clear structures.
01:00 In ComfyUI, connect Canny output to Apply ControlNet.
01:30 Use SD1.5 scribble ControlNet for initial image generation.
02:00 Generate reliable 1024x1024 images using ControlNet.
02:30 Experiment with various checkpoints for different output styles.
03:00 Canny on house image; adjust thresholds for outline detail.
03:30 SDXL checkpoints require specific LoRA Canny ControlNet.
04:00 Flux needs Flux Canny ControlNet and strength adjustment.
04:30 Pre-generated Canny images can be reused for new creations.
The Depth ControlNet will prioritise 3D depth information in the image more than shapes and outlines detected when using Canny.
It can produce much more creative styles since it is only interested in the 3D shape, and not any patterns or lines detected on the surfaces.
Since it is 3D aware, it can also produce better lighting, shadows and perspective.
The Depth ControlNet is most often used in architectural and interior design image generation.
We will extract the poses from some existing images and use them to guide the generation of new images.
The name "IP-Adapter" stands for Image Prompt Adapter.
IP-Adapters enable the use of images to influence the style, composition and other specific details of the generated output.
We will use the IP-Adapter to keep the styling of the original image when applying a pose.
Using the OpenPose ControlNet with Flux Dev & Schnell.
We need to use a Flux compatible ControlNet model.
00:00 Learn to use OpenPose ControlNet with Flux for image generation.
00:30 Download the large Flux Union ControlNet model.
00:55 Rename the model and copy it into the ControlNet folder.
01:20 Drag the prepared workflow into ComfyUI.
01:45 Crucially set end percent very low to avoid OpenPose remnants.
02:30 Adjust strength to balance prompt versus image influence.
03:10 Use Flux Schnell or Flux Dev (with 20+ steps) for generation.
03:40 Demonstrate the workflow using multiple poses simultaneously.
04:15 Create multi-pose images from photos or existing images.
04:45 Successfully generate consistent models from complex multi-pose inputs.
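The strength and end percent settings mentioned in the timings can be pictured as inputs on the node that applies the ControlNet. The sketch below assumes the built-in Apply ControlNet node in API-style format. The node ids, link targets and numeric values are placeholders, not the settings used in the video, and the helper `check_controlnet_range` is hypothetical.

```python
# Fragment only: nodes "2", "3", "20" and "21" are assumed to exist elsewhere.
apply_controlnet = {
    "class_type": "ControlNetApplyAdvanced",
    "inputs": {
        "positive": ["2", 0],
        "negative": ["3", 0],
        "control_net": ["20", 0],   # loaded ControlNet model
        "image": ["21", 0],         # the OpenPose image
        "strength": 0.7,            # balance of pose image versus prompt
        "start_percent": 0.0,       # apply from the start of sampling
        "end_percent": 0.2,         # stop early so pose lines do not show through
    },
}


def check_controlnet_range(inputs):
    """Return True if the percent range is ordered and inside 0.0 to 1.0."""
    start, end = inputs["start_percent"], inputs["end_percent"]
    return 0.0 <= start <= end <= 1.0


if __name__ == "__main__":
    print("Range valid:", check_controlnet_range(apply_controlnet["inputs"]))
```

A low end percent means the pose only guides the early part of sampling, where the overall layout is decided, and the remaining steps are left to the prompt.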
Using the IP-Adapter with Flux Dev & Schnell.
Video Outline
Introduces IP Adapter with Flux for image-guided generation.
Explains installing the ComfyUI IP adapter Flux custom node.
Guides downloading and placing the `ip_adapter.bin` model file.
Shows improving text prompts for better initial results.
Demonstrates integrating the IP adapter into the workflow.
Mentions first-time download of additional 3.5GB SIGLIP files.
Notes GPU/CPU options for Flux model loading.
Shows how IP adapter improves image styling significantly.
Advises using Flux-generated images as input for best results.
Explores generating images with or without an OpenPose image.
We can create ControlNets from Videos.
In this lesson we will create a few ControlNets from videos so that we can use them in later lessons to help guide the video generation process.
AnimateDiff will allow us to turn our static images and text prompts into animated videos by generating a sequence of images that transition smoothly.
We will use the ComfyUI-AnimateDiff-Evolved custom node. This is an improved version of the original ComfyUI-AnimateDiff.
AnimateDiff-Evolved will allow us more control over the motion in our videos, and we can create longer videos than what we saw with just Stable Video Diffusion alone.
We will start learning about AnimateDiff by using SD1.5 models.
In this lesson we will improve our AnimateDiff workflow with the ComfyUI-VideoHelperSuite, and then merge three of our videos into one.
While ComfyUI has nodes to load videos, extract images, audio and FPS, apply further operations such as ESRGAN and frame interpolation, and save videos in multiple formats, the VideoHelperSuite is a toolset that can do many of these things and much more.
We can,
convert between the MP4, WEBM, MKV, GIF and MOV formats,
force frame rates, skip frames, cap frames, select every nth frame, and resize maintaining aspect ratios,
load audio and seek,
combine videos and image sequences, reverse, loop, ping-pong, amend metadata, change pixel formats,
convert frames to latents, apply prompt and decode back into another video,
preview videos without saving them each time,
and more
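Several of the frame options in the list above amount to simple selection over a sequence of frames. The sketch below shows skipping, taking every nth frame and capping in plain Python. The function and its parameter names are hypothetical and only loosely modelled on the options described, so check the node itself for the exact behaviour and ordering.

```python
def select_frames(frames, skip_first=0, every_nth=1, cap=0):
    """Skip leading frames, keep every nth of the rest, then cap the count.

    A cap of 0 means no limit.
    """
    selected = frames[skip_first::every_nth]
    return selected[:cap] if cap > 0 else selected


if __name__ == "__main__":
    frames = list(range(10))  # stand-ins for 10 decoded frames
    print(select_frames(frames, skip_first=2, every_nth=3))
    print(select_frames(frames, skip_first=2, every_nth=3, cap=2))
```

Taking every second frame of a 30 FPS clip, for example, leaves half the frames, which halves the work for later nodes at the cost of choppier motion.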
Enhancing AnimateDiff to work with Stable Video Diffusion.
LTX-Video is the first DiT-based video generation model capable of generating high-quality videos in real-time. It produces 30 FPS videos at a 1216×704 resolution faster than they can be watched.
We will create a LTXV image to video workflow and discuss some of its options.
We will use the LTXV In Context LoRA Pose Control model to add pose context while generating a video.
We will experiment with Flux Kontext.
FLUX.1 Kontext is an advanced image-editing model which allows you to edit or generate images using both text and image inputs, while preserving character/object consistency.
We will experiment with using multiple guiding images with Flux Kontext.
There are a few tricks to make it work better and we will learn them.
We will use the GGUF quantised Wan2.2 models.
We will create a short 20s looping video using I2V and FLF2V techniques.
Qwen Edit is another great image editing model.
We will experiment with changing camera positions and angles.
A ComfyUI custom node used to get one line of string from a multiline string.
Useful for writing lots of different prompts and having ComfyUI run all of them.
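To give a feel for how small such a node can be, here is a sketch of a custom node that returns one line from a multiline string. It follows the usual custom node layout of an `INPUT_TYPES` classmethod, `RETURN_TYPES`, `FUNCTION` and a `NODE_CLASS_MAPPINGS` dictionary. The class name, category and wrap-around behaviour are assumptions made for this example. It is not the node used in the lesson.

```python
class MultilineStringLine:
    """Hypothetical node: pick one non-empty line from a multiline string."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "text": ("STRING", {"multiline": True, "default": ""}),
                "index": ("INT", {"default": 0, "min": 0}),
            }
        }

    RETURN_TYPES = ("STRING",)
    FUNCTION = "pick"
    CATEGORY = "utils"

    def pick(self, text, index):
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        if not lines:
            return ("",)
        # Wrap around so any index maps to a valid line.
        return (lines[index % len(lines)],)


# ComfyUI discovers nodes through this mapping in a custom node package.
NODE_CLASS_MAPPINGS = {"MultilineStringLine": MultilineStringLine}

if __name__ == "__main__":
    prompts = "a red fox in snow\n\na castle at dawn\na robot chef"
    node = MultilineStringLine()
    for i in range(3):
        print(i, node.pick(prompts, i)[0])
```

Driving the index from an incrementing value lets each queued run pick up the next prompt in the list.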
For S2V workflow use the WanSoundImageToVideo and WanSoundImageToVideoExtend nodes.
We will experiment with ref_image, control_video and ref_motion.
VACE : All-in-One Video Creation and Editing
Think of it like inpainting, but for video.
We can also use Wan Animate as an alternative to VACE for character replacement.
There are different considerations and behaviours, but Wan Animate can often produce better character consistency.
This lesson carries on from the VACE lesson, and will focus on where there are differences in setup and usage.
FLUX.2 Klein is a great model for both T2I and I2I editing.
It is very fast with speeds comparable to FLUX.1 Schnell.
Use invitation code "SBCODE" instead of "qwenday".
We will create our own character LoRA using Qwen models.
There are LoRAs for many purposes. We will experiment with a LoRA that can copy the lighting style from one image and apply it to another.
A good model to use for a virtual try-on workflow is Qwen Edit 2511.
We will discuss LTXV T2VA, I2VA, FL2VA & I2VCA workflows.
Running some workflows on your personal computer can be very resource intensive.
If your computer is not able to run some of the workflows in this course, then one option you have is to use a cloud hosted ComfyUI.
I show how to set up a ComfyUI instance with persistent storage and an RTX 5090 GPU using Runpod.
If using a 10 series Nvidia GPU, modern generative AI will not be an enjoyable experience.
Many of the earlier lessons in this course use the Stable Diffusion 1.5 Pruned EMAOnly FP16 model.
If you have a 10 series Nvidia card, then you will be very limited in choices, since many AI models are released in FP16 format.
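FP16 simply means each model weight is stored as a 16-bit floating point number instead of a 32-bit one. The sketch below uses Python's `struct` module to show the two practical consequences: half the storage per weight, and slightly less precision. The one billion parameter count is a round number chosen for the arithmetic, not the size of any particular model.

```python
import struct

BYTES_PER_FP32 = struct.calcsize("<f")  # single precision
BYTES_PER_FP16 = struct.calcsize("<e")  # half precision


def weights_size_gb(num_params, bytes_per_param):
    """Approximate storage for the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1000 ** 3


def to_fp16(value):
    """Round-trip a Python float through half precision."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


if __name__ == "__main__":
    params = 1_000_000_000  # hypothetical model size
    print("FP32:", weights_size_gb(params, BYTES_PER_FP32), "GB")
    print("FP16:", weights_size_gb(params, BYTES_PER_FP16), "GB")
    print("0.1 stored as FP16:", to_fp16(0.1))
```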
Learn to generate high-quality images and videos using ComfyUI, a powerful visual interface built around Stable Diffusion and many other popular AI models. Whether you're a digital artist, content creator, creative developer, or AI enthusiast, this course will show you how to turn your ideas into stunning visuals - with no coding required.
This hands-on course walks you through the essentials of ComfyUI, a node-based system that gives you full control over the generative process.
You'll start with the basics of,
text-to-image generation,
then move into more advanced workflows like image-to-image,
Inpainting,
Outpainting,
compositing image layers together,
installing the ComfyUI manager,
improving resolution and quality using ESRGAN,
comparing various model checkpoints,
Flux Schnell, Dev & Kontext,
Stable Video Diffusion,
frame interpolation for improving videos and motion sequences,
Canny, Depth and OpenPose ControlNets
IP-Adapters for SD1.5, SDXL and Flux
ControlNets from Videos
AnimateDiff
Video Helper Suite
ControlNeXt SVD
LTXV Text to Video, Image 2 Video & IC Pose
Wan 2.2 T2V, I2V, FLF2V, VACE Video Editing, Animate Character Replacement and S2V Lip Syncing
Multiple Camera Angles with Qwen Edit
Character LoRA Creation
Editing Lighting
LTXV2.3 T2VA, I2VA, FL2VA, I2VCA
You'll gain a solid understanding of how different nodes interact - including samplers, models, prompts, and schedulers - and how to combine them for powerful creative outputs. Along the way, we'll cover best practices for exporting assets for use in creative or commercial projects.
By the end of the course, you'll be able to confidently design and execute complete image and video workflows in ComfyUI.
This course is perfect for learners who want creative control without writing code, and who are ready to move beyond "prompt-only" AI tools into building custom visual workflows that are fast, flexible, and future-ready.