Computer Vision : OCR using Python - GenAI with LLM & RAG

Rating: 4.2 (293 reviews)

Become a Computer Vision Expert & Learn OCR with Tesseract, OpenCV, Deep Learning, GenAI, LLMs, & RAG

Created by Vineeta Vashistha

Last updated 3/2025

English

What you'll learn

A quick starter on OCR Architecture, Commercial Solutions and Use Cases in Industry
Learn to implement OCR - Text Detection with OpenCV and Deep Learning Models
Use Tesseract and EasyOCR to implement OCR - Text Recognition
Work with OCR - Text Labelling using spaCy and Regular Expressions
Discover the concepts of RAG, its architecture and extract deeper insights from text
Integrating OCR outputs into RAG pipelines for advanced document understanding and information extraction
Build OCR Solutions for Invoice Processing with Text Labelling and XML output & Vehicle Nameplate Recognition
Executable Code of CTPN and EAST Model implementation for Text Detection and Text Recognition
Learn to train Deep Learning Models of CTPN and EAST on ICDAR dataset
Understand the Image Basics and apply it for Image Processing
Use OpenCV and Tesseract to apply Noise Removal Techniques including Thresholding, Rescaling, Dilation, Erosion and Deskewing
Learn to develop web-based applications - Business Card Recognition and KYC Digitization for OCR using Flask

Course content

14 sections • 121 lectures • 8h 39m total length

Learning Path to Become Computer Vision Expert • 2:35
Course Starter - How to approach the course • 6:18
Udemy Review • 1:51

Objectives • 1:13
Discover the objectives of optical character recognition (OCR), including its architecture, industry solutions, accuracy and pricing, benefits, and momentum across finance, legal, healthcare, and general business.
OCR Overview • 2:41
OCR Architecture • 5:03
OCR Solutions • 8:58
Compare industry OCR solutions, including Tesseract, Abbyy, Google Cloud Vision, and Microsoft Computer Vision, examining accuracy on ordinary invoices versus identity documents, key challenges, and pricing.
OCR Benefits • 4:52
Discover how OCR reduces costs and boosts productivity by digitizing data, improving accuracy, and speeding document processing while enhancing data security, accessibility, and compliance.
OCR Use Case Across Industry • 6:43
OCR Starter Quiz

Objectives • 0:58
Tool Setup - Ubuntu • 0:28
Tool Setup - Windows • 2:15
Setup Issues Resolution • 1:17
Using Google Colab • 11:17
Create, upload, and run notebooks in Google Colab. Mount Google Drive by GUI or code in Colab and select CPU, GPU, or TPU runtimes with 12-hour limits.
Using PyCharm for Coding • 6:26
Learn how to set up and use PyCharm for Python coding, including creating projects, configuring Python interpreters and virtual environments, installing packages, and running and debugging code with breakpoints.
Using Jupyter Notebook and Shortcuts • 1:26
Setting up Environment Quiz

Objectives • 1:12
Pixels and Images • 3:24
Understand how a digital image is formed from pixels, the smallest picture elements in a 2D grid, with black-and-white, grayscale, and color images using 0–255 RGB channel values.
Image Properties using OpenCV and PIL • 12:19
Learn image basics by reading images with PIL and OpenCV, convert to arrays, and inspect shape, height, width, and color channels, including grayscale conversion and RGB, HSV, and LAB spaces.
Feature Mapping using Kernel • 2:27
Feature Map • 2:21
Image Basics Quiz
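
As a rough illustration of the pixel-grid idea described in this section, the sketch below models a tiny color image as nested Python lists and converts it to grayscale. It is not taken from the course materials: the names `image`, `shape`, and `to_grayscale` are hypothetical, and the 0.299/0.587/0.114 weights are the standard BT.601 luma coefficients, assumed here as the conversion rule.

```python
# A tiny 2x2 color image: image[row][col] = (R, G, B), each channel in 0-255.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]


def shape(img):
    """Return (height, width, channels) for a nested-list image."""
    return (len(img), len(img[0]), len(img[0][0]))


def to_grayscale(img):
    """Collapse each RGB pixel to one 0-255 value using BT.601 luma weights."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in img
    ]


gray = to_grayscale(image)
print(shape(image))  # height, width, channels
print(gray)          # one brightness value per pixel
```

Libraries such as PIL and OpenCV hold the same information in array objects rather than nested lists, so the height, width, and channel count are read from the array's shape instead of being computed by hand.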

Objectives • 2:20
Text Detection Workflow • 2:15
Identify and localize text in images through a text detection workflow that pre-processes images to remove noise, then segments content by characters, words, or lines for reliable detection.
Preprocessing for Accuracy Improvement • 5:52
Noise Removal Techniques (Morphology, Image Blurring, Dilation, Erosion, Deskew) • 5:50
Master noise removal for OCR preprocessing, covering morphology with kernels, small contour noise removal, image blurring, dilation, erosion, deskew, and border handling to boost accuracy.
Implement Preprocessing Techniques (Adaptive, Otsu Binarisation, Gaussian Blur) • 24:36
Explore image preprocessing techniques for OCR, including binarisation, adaptive and Otsu thresholding, Gaussian blur, rescaling, noise removal, morphology, deskewing, border removal, and padding using OpenCV.
Segmentation of Image Text • 1:12
Implement Segmentation (Line, Word and Character Level Segmentation) • 3:33
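
To make the Otsu binarisation step named above more concrete, here is a minimal pure-Python sketch of the idea: pick the threshold that maximises the between-class variance of dark and bright pixels, then map every pixel to 0 or 255. The course itself uses OpenCV for this; the function names `otsu_threshold` and `binarise` and the sample pixel values are hypothetical and chosen only for illustration.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0-255) that maximises between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_low, sum_low = 0, 0
    for t in range(256):
        w_low += hist[t]
        sum_low += t * hist[t]
        if w_low == 0:
            continue  # no pixels in the lower class yet
        w_high = total - w_low
        if w_high == 0:
            break  # every pixel is already in the lower class
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        between_var = w_low * w_high * (mean_low - mean_high) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarise(pixels, t):
    """Map each pixel to 255 if it is brighter than t, otherwise 0."""
    return [255 if p > t else 0 for p in pixels]


# Hypothetical grayscale values: three dark "ink" pixels, three bright "paper" pixels.
sample = [10, 12, 11, 200, 205, 198]
t = otsu_threshold(sample)
print(t, binarise(sample, t))
```

Adaptive thresholding differs in that it computes a separate threshold for each local neighbourhood instead of one global value, which tends to help on unevenly lit scans.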

Objectives • 2:10
The Need for OCR • 3:28
Identify why OCR matters in a data-rich world by bridging paper documents and digital systems. Extract text from images to enable document scanning, data extraction, indexing, search, and accessibility.
Benefits of Free and Open Source OCR • 3:12
Tesseract - The Robust Open-Source OCR Engine • 6:31
Calamari - A Deep Learning Based OCR Tool • 3:40
OCRopus - A Deep Dive into Open Source OCR • 5:46
Comparison of Open-Source OCR Tools • 2:23
Resources • 0:05

Objectives • 1:55
Explore cloud-based computer vision APIs and OCR capabilities to extract insights from images and videos, and compare cloud vision services to select the right tool for your project.
The Rise of Cloud-Based Computer Vision • 4:27
Discover how cloud-based computer vision removes upfront hardware costs, scales with demand, and delivers pre-trained models for detection and classification. See impacts in healthcare, retail, manufacturing, and security.
Introducing Abbyy Cloud • 2:01
Key Features of Abbyy Cloud • 3:07
Unveiling Google Cloud Vision • 6:56
Explore Google Cloud Vision's text detection and document text detection OCR, landmark recognition, and image analysis to extract text, identify landmarks, and derive insights within the GCP ecosystem.
Exploring Azure Computer Vision • 4:20
The Power of Azure Computer Vision • 5:20
Choosing the Right Cloud Vision Tool • 5:44
Cloud Vision Use Cases and Applications • 5:17
Explore cloud vision use cases across healthcare, retail, manufacturing, and security, highlighting automation, insights, and real-world applications like medical image analysis, product recognition, and facial recognition.
The Future of Cloud Vision • 3:54
Case Studies of Cloud Vision • 1:24

Objectives • 0:31
What is a Neuron? • 2:02
Neuron Architecture • 1:25
Explore how a neuron, modeled on the human brain, uses weighted inputs, a bias term, and an activation function to classify iris flowers within an artificial neural network.
Artificial Neural Network • 3:04
Convolutional Neural Network • 6:12
Activation Function • 2:44
Activation functions drive deep learning outputs and training efficiency, acting as gates for each neuron and covering binary step, linear, and non-linear types.
Deep Learning - CTPN Model • 5:54
Deep Learning - EAST Model • 5:57
Annotation for OCR • 0:34
Further Reading - Open Source Text Detection Tools • 0:18
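
The neuron described in this section (weighted inputs, a bias term, and an activation function) can be sketched in a few lines of Python. This is an illustrative sketch, not the course's own code: the weights and bias below are hand-picked, hypothetical values rather than trained ones, and the two flower measurements are single example rows of the kind found in the iris dataset.

```python
import math


def binary_step(z):
    """Binary step activation: fire (1.0) when z is non-negative."""
    return 1.0 if z >= 0 else 0.0


def linear(z):
    """Linear activation: pass the weighted sum through unchanged."""
    return z


def sigmoid(z):
    """A common non-linear activation that squashes z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, activation):
    """Compute activation(sum(x_i * w_i) + bias) for one neuron."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)


# Features: sepal length, sepal width, petal length, petal width (cm).
setosa = [5.1, 3.5, 1.4, 0.2]
versicolor = [7.0, 3.2, 4.7, 1.4]

# Hypothetical, hand-picked parameters: the neuron fires when petal length < 2.5 cm.
weights = [0.0, 0.0, -1.0, 0.0]
bias = 2.5

print(neuron(setosa, weights, bias, binary_step))      # fires for the short-petal flower
print(neuron(versicolor, weights, bias, binary_step))  # stays off for the long-petal flower
print(neuron(setosa, weights, bias, sigmoid))          # same sum, graded output
```

Swapping `binary_step` for `sigmoid` keeps the same weighted sum but yields a graded value between 0 and 1, which is what makes gradient-based training practical.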

Requirements

Basic Programming skills in Python

Description

Master OCR with Python and OpenCV: Become a Computer Vision Expert

Unlock the Power of Text Extraction with AI & Generative AI

This comprehensive course will equip you with the skills to:

Build Cutting-Edge OCR Systems: Go beyond traditional OCR with Python and OpenCV. Learn to leverage the power of Large Language Models (LLMs) and Retrieval Augmented Generation (RAG) to create intelligent and accurate text extraction systems.
Master Deep Learning Techniques: Dive into advanced deep learning models like CTPN and EAST for text detection and recognition.
Integrate GenAI for Enhanced OCR: Discover how to integrate Generative AI with LLMs and RAG to improve OCR accuracy, extract insights from unstructured text, and automate complex document processing tasks.
Apply OCR to Real-World Scenarios: Implement OCR solutions for a variety of applications, including document digitization, invoice processing, and more.
Stay Ahead of the Curve: Keep up with the latest advancements in OCR, Computer Vision, LLMs, RAG, and Generative AI.

Key Features:

Hands-On Projects: Gain practical experience with real-world projects, such as invoice processing, KYC digitization, and business card recognition.
Expert Guidance: Learn from experienced instructors who will guide you through every step of the process.
In-Depth Coverage: Explore a wide range of topics, from fundamental image processing and deep learning to advanced LLM and RAG techniques.
Dedicated Support: Get 24/7 support from our team of experts.
Flexible Learning: Learn at your own pace with self-paced video lessons and downloadable resources.

What You'll Learn:

Fundamental Image Processing: Understand the basics of image processing, including image formats, color spaces, and image manipulation techniques.
Text Detection and Recognition: Master techniques for detecting and recognizing text in images and PDFs.
Deep Learning for OCR: Explore advanced deep learning models like CTPN and EAST for accurate text detection and recognition.
LLMs and RAG for OCR: Learn to build intelligent text extraction systems by mastering LLM fine-tuning, exploring RAG architectures, and integrating OCR outputs into advanced AI pipelines.
Data Preprocessing and Augmentation: Prepare your data for training deep learning models.
Model Training and Evaluation: Train and evaluate your models using appropriate metrics.
Deployment Strategies: Deploy your OCR models to production environments.

Why Choose This Course?

Industry-Relevant Skills: Develop highly sought-after skills in OCR, Computer Vision, LLMs, RAG, and Generative AI to advance your career in AI and machine learning.
Real-World Applications: Learn how to apply OCR to solve real-world problems.
Flexible Learning: Learn at your own pace with self-paced video lessons and downloadable resources.
Expert Guidance: Benefit from expert instruction and personalized support.
Career Advancement: Gain a competitive edge in the job market with advanced OCR skills.

Enroll Now and Unlock the Power of OCR with GenAI, LLMs, and RAG!

Who this course is for:

Beginners to Computer Vision
OCR Engineer
OCR Specialist
Machine Learning Professionals
Anyone looking to become more employable as a Computer Vision Expert

Course Starter • 3 lectures • 11min

OCR Starter - OCR Architecture • 6 lectures • 30min

Setting up Environment - Ubuntu, Windows • 7 lectures • 24min

Image Basics - Pixels, Kernel, Image Properties • 5 lectures • 22min

Text Detection - Machine Learning Techniques (Noise Removal, Thresholding) • 7 lectures • 46min

Exploring Open-Source OCR Tools - Tesseract, Calamari and OCRopus • 8 lectures • 27min

Cloud Vision Tools - Abbyy Cloud, Google Cloud and Azure Computer Vision • 11 lectures • 44min

Using OCR for RAG - LLM Pipeline • 12 lectures • 52min

Introduction to Neural Networks and Text Detection Models • 10 lectures • 29min

Text Detection & Recognition - EasyOCR, Tesseract, PyTesseract • 8 lectures • 23min
