Practical Computer Vision: From Zero to Hero over 10 Topics

Rating: 5.0 (7 reviews)

Build AI Systems: Sports Analytics with YOLOv8, AI Interior Design with Stable Diffusion & Invoice Parsing with Llama 3

Created by Neuralearn Dot AI

Last updated 1/2026

English

What you'll learn

Build an image classifier with Hugging Face Transformers and deploy it as a FastAPI service.
Train a YOLOv8 model for real-time ball detection in images and videos.
Use Grounding DINO for zero-shot player detection without specific training.
Implement DeepSORT to track multiple players consistently throughout a video.
Project player and ball positions onto a 2D map using homography with OpenCV.
Train a YOLOv8-Pose model to detect custom key points on a tennis court.
Generate new interior designs from empty rooms using Stable Diffusion img2img.
Create realistic depth maps from single images with the Depth Anything model.
Produce precise object masks using Grounding DINO and the Segment Anything Model.
Control AI image generation with ControlNet using depth and segmentation maps.
Generate photorealistic influencer images with the advanced FLUX model.
Insert products seamlessly into AI-generated images using the InsertAnything framework.
Extract text from invoices and receipts automatically using PaddleOCR.
Train a Llama 3 model with Unsloth to parse and structure extracted OCR data.
Optimize model performance by converting models to ONNX for faster inference.
Fine-tune large language and vision models efficiently using LoRA adapters.
Deploy ONNX models with FastAPI.

Course content

14 sections • 50 lectures • 7h 36m total length

Welcome • 1:01
Explore practical computer vision across industries such as agriculture, automotive, healthcare, manufacturing, real estate, advertising, and retail. Master image classification, segmentation, detection, tracking, OCR with LLMs, and depth estimation.
What you'll learn • 7:53
Learn image classification and model deployment, build a sports analytics pipeline with YOLOv8 and DeepSORT, explore advanced generative AI projects, and implement OCR-driven intelligent document processing with LLMs.

Code • 0:01
Introduction to Image Classification and Data Preparation • 17:17
Master image classification with Hugging Face Transformers on a plant dataset labeled healthy, powdery, and rusty. Load, split, and transform data from Kaggle, then preprocess it with image processors for training.
Modeling and Training • 10:40
Build and train an image classification model from a pre-trained Auto model, define the checkpoint and id2label mappings, and tune the training arguments for accuracy.
Evaluation and Testing with Gradio • 10:28
Push your trained plant-disease model to the Hugging Face Hub, evaluate it with a confusion matrix, and test a Gradio-based interface that classifies uploaded leaf images as healthy, powdery, or rusty.
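
The id2label mappings and the confusion matrix mentioned in the lectures above can be sketched in plain Python. This is a minimal illustration, not the course's code: the class order and the sample predictions are made up, and the helper name confusion_matrix is hypothetical. In Hugging Face Transformers, mappings like these are typically passed as the id2label and label2id arguments when loading a pre-trained classification model.

```python
# Assumed class order; the course's plant dataset uses these three labels,
# but the integer ids chosen here are an illustration only.
labels = ["healthy", "powdery", "rusty"]
id2label = {index: name for index, name in enumerate(labels)}
label2id = {name: index for index, name in id2label.items()}


def confusion_matrix(true_ids, predicted_ids, num_classes):
    """Count predictions: rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_id, predicted_id in zip(true_ids, predicted_ids):
        matrix[true_id][predicted_id] += 1
    return matrix


# Made-up evaluation data for five leaf images.
true_ids = [label2id[name] for name in ["healthy", "healthy", "powdery", "rusty", "rusty"]]
predicted_ids = [label2id[name] for name in ["healthy", "powdery", "powdery", "rusty", "healthy"]]

matrix = confusion_matrix(true_ids, predicted_ids, len(labels))
for index, row in enumerate(matrix):
    print(id2label[index], row)
```

The diagonal of the matrix holds the correct predictions, so off-diagonal counts show which classes the model confuses with each other.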

Code • 0:01
Tennis Video • 0:09
Explore practical computer vision concepts through a tennis video, applying foundational ideas from the course to analyze motion, scenes, and player actions.
Understanding YOLOv8 Format • 6:12
Train a YOLOv8 model to detect a tennis ball using a Kaggle dataset, and learn the YOLO label format (class, x center, y center, width, height) with the Ultralytics library.
Loading Dataset and Training YOLOv8 Model • 6:20
Install the kaggle and ultralytics packages, download and unzip the dataset, and configure the dataset YAML file. Train the YOLOv8x model for 100 epochs at an image size of 800, with Albumentations augmentations and automatic mixed precision.
Running Inference on a single Image • 9:58
Run inference on a single image by loading a test frame, defining a YOLO model with the best weights, and thresholding detections by confidence to locate the ball.
Running Inference on a full Video • 12:14
Apply YOLO-based inference to a full video by processing each frame, detecting the ball with bounding boxes, and writing an annotated output video using OpenCV.
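
The YOLO label format described above (class, x center, y center, width, height) stores box coordinates as fractions of the image size. The sketch below shows the conversion in both directions in plain Python. The function names and the sample box are hypothetical, and the six-decimal formatting is an assumption rather than a requirement of the format.

```python
def to_yolo_label(class_id, box, image_width, image_height):
    """Convert a pixel box (x_min, y_min, x_max, y_max) into one YOLO label line."""
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / image_width
    y_center = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def from_yolo_label(line, image_width, image_height):
    """Convert one YOLO label line back into a class id and a pixel box."""
    class_id, x_center, y_center, width, height = line.split()
    x_center = float(x_center) * image_width
    y_center = float(y_center) * image_height
    width = float(width) * image_width
    height = float(height) * image_height
    box = (x_center - width / 2, y_center - height / 2,
           x_center + width / 2, y_center + height / 2)
    return int(class_id), box


# Made-up example: a 20x20 pixel ball centered in an 800x600 frame, class 0.
label_line = to_yolo_label(0, (390, 290, 410, 310), 800, 600)
print(label_line)
print(from_yolo_label(label_line, 800, 600))
```

Because the values are normalized, the same label file stays valid when the image is resized, which is why the training image size can be changed without rewriting the labels.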

Code • 0:01
Understanding Grounding DINO Model and Zero-Shot Object Detection • 9:02
Explore zero-shot object detection with Grounding DINO, which enables open-set recognition through prompt-driven localization, multimodal reasoning, and contrastive and localization losses to detect novel categories.
Prompting the DINO Model to Get Bounding Boxes • 11:29
Learn how to prompt the DINO model for zero-shot object detection to obtain bounding boxes: use the Transformers AutoProcessor and an Auto model, process an image, post-process the outputs, and visualize the results.
Carrying Out Inference on a Full Video • 9:39
Run tennis player detection on a full video: process each frame, generate predictions with bounding boxes and scores, and prepare the detections for tracking.

Code • 0:01
Drawing the plane • 20:27
Project a tennis match onto a reference plane using homography, build the projection matrix from four points with OpenCV, and overlay players, ball, and court lines.
Understanding Homography and using it to Project the Pitch on a Plane • 7:00
Learn how to compute a homography to project a pitch onto a plane and project the player and ball positions using OpenCV's perspective transform to annotate video frames.
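
The idea of building a projection matrix from four point pairs can be sketched without any libraries. The code below is an illustration in plain Python, not the course's implementation: it assumes the bottom-right entry of the matrix can be fixed to 1, the function names are hypothetical, and the pixel coordinates of the court corners are made up. In OpenCV, cv2.getPerspectiveTransform and cv2.perspectiveTransform cover the same two steps.

```python
def solve_linear_system(coefficients, constants):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    size = len(constants)
    augmented = [list(map(float, row)) + [float(rhs)]
                 for row, rhs in zip(coefficients, constants)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(augmented[r][col]))
        if abs(augmented[pivot][col]) < 1e-12:
            raise ValueError("points are degenerate (for example, three on one line)")
        augmented[col], augmented[pivot] = augmented[pivot], augmented[col]
        for row in range(size):
            if row != col:
                factor = augmented[row][col] / augmented[col][col]
                augmented[row] = [value - factor * pivot_value
                                  for value, pivot_value
                                  in zip(augmented[row], augmented[col])]
    return [augmented[i][size] / augmented[i][i] for i in range(size)]


def find_homography(source_points, target_points):
    """Build the 3x3 homography that maps four source points onto four targets."""
    coefficients, constants = [], []
    for (x, y), (u, v) in zip(source_points, target_points):
        # Two equations per point pair, with the last matrix entry fixed to 1.
        coefficients.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        constants.append(u)
        coefficients.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        constants.append(v)
    h = solve_linear_system(coefficients, constants)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def project_point(homography, point):
    """Apply the homography to one (x, y) point and divide by the scale term."""
    x, y = point
    u, v, w = (row[0] * x + row[1] * y + row[2] for row in homography)
    return u / w, v / w


# Made-up pixel positions of the four court corners in a video frame,
# mapped to a doubles court of 10.97 m by 23.77 m.
image_corners = [(420, 250), (860, 250), (1130, 650), (150, 650)]
court_corners = [(0, 0), (10.97, 0), (10.97, 23.77), (0, 23.77)]

homography = find_homography(image_corners, court_corners)
# Hypothetical player foot position in the frame, projected onto the court map.
print(project_point(homography, (640, 500)))
```

Projecting the bottom center of a player's bounding box, rather than the box center, usually gives a more sensible position on the map, since the feet are the part of the player that touches the court plane.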

Requirements

Basic Python Programming Skills
Fundamentals of Machine Learning
Familiarity with Jupyter Notebooks or Google Colab
Introduction to Deep Learning & Neural Networks
Basic knowledge of computer vision concepts

Description

Are you ready to go beyond basic tutorials and build sophisticated, real-world AI systems? While many courses teach you how to classify an image, the real power of AI lies in creating complex, multi-model pipelines that solve challenging problems. This is where the industry is heading, and where the top-tier AI engineers operate.

In fields like sports analytics, AI is revolutionizing how we understand the game by tracking players and projecting plays. In creative industries, generative AI is transforming empty rooms into stunning interior designs and creating photorealistic virtual influencers. In business, AI is automating the tedious task of parsing data from invoices and receipts.

The demand for engineers who can build these advanced, integrated AI solutions is higher than ever. However, learning how to combine state-of-the-art models like YOLOv8, Grounding DINO, Stable Diffusion, and Llama 3 into a single, cohesive application is a skill that few courses teach.

This is that course.

In this comprehensive, project-based journey, we will take you from individual AI concepts to building complete, portfolio-worthy systems. You will learn not just the "how" but the "why," using cutting-edge libraries like Hugging Face, Ultralytics, and PyTorch. We won't just train a model; we will optimize it with ONNX and deploy it with FastAPI. We will also learn how to fine-tune LLMs efficiently with LoRA and Unsloth.

By the end of this course, you won't just be an AI practitioner; you will be an AI architect, capable of designing and implementing complex solutions that are at the forefront of technology.

You will build incredible, real-world projects, including:

An Automated Sports Analytics System that detects and tracks tennis players and the ball, projecting their movements onto a 2D court map.
An AI Interior Designer that uses Stable Diffusion, ControlNet, segmentation and depth estimation to realistically furnish images of empty rooms.
A Virtual Influencer Generator using the powerful FLUX model to create lifelike people and seamlessly place products in their hands with InsertAnything.
An Intelligent Document Processor that uses OCR to extract text from invoices and a custom-trained Llama 3 model to parse it into structured data.

If you are ready to elevate your career and build the kind of AI applications that define the future, this course is your definitive guide. We are incredibly excited to help you achieve your goals!

This course is offered to you by Neuralearn. We are committed to your success. Your feedback and questions in the forum are vital, and we will be there to support you every step of the way.

Let's start building!

Who this course is for:

Software Developers & Engineers
Aspiring AI & Machine Learning Engineers
Tech Enthusiasts & Hobbyists
Entrepreneurs & Product Managers
Graduate Students & Researchers

Course content

Introduction • 2 lectures • 9min

Image Classification with Hugging Face Transformers • 4 lectures • 38min

Model Deployment • 3 lectures • 32min

Ball Object Detection with Ultralytics YOLOv8 • 6 lectures • 35min

Player Detection with Grounding DINO on Hugging Face • 4 lectures • 30min

Player Tracking Throughout the Video with DeepSORT • 3 lectures • 38min

Field Projection with Homography • 3 lectures • 27min

Key Point Detection with Ultralytics YOLO Pose Estimation Model • 3 lectures • 22min

Conclusion • 1 lecture • 8min

Interior Designer (Empty House Filling) with Stable Diffusion (Img2Img) • 3 lectures • 25min
