Optical Character Recognition (OCR) in Python

Name: Optical Character Recognition (OCR) in Python
Rating: 4.7 (442 reviews)

OpenCV, Tesseract, EasyOCR and EAST applied to images and videos! Create your own OCR from scratch using Deep Learning!

Highest Rated

Created by Jones Granatyr, Gabriel Alves, AI Expert Academy

Last updated 4/2023

English

What you'll learn

Use Tesseract, EAST and EasyOCR tools for text recognition in images and videos
Understand the differences between OCR in controlled and natural environments
Apply image pre-processing techniques to improve image quality, such as: thresholding, inversion, resizing, morphological operations and noise reduction
Use EAST architecture and EasyOCR library for better performance in natural scenes
Train an OCR from scratch using Deep Learning and Convolutional Neural Networks
Apply natural language processing techniques to the texts extracted by OCR (word cloud and named entity recognition)
Read license plates

Course content

13 sections • 95 lectures • 12h 58m total length

Course content • 12:57
Learn optical character recognition in Python using Google Colab. Build end-to-end OCR pipelines from image preprocessing to text detection and recognition, including thresholding, noise removal, and neural networks.
Introduction to OCR • 6:37
Explore how OCR differs between controlled scenarios and natural scenes, and learn how preprocessing, image quality, and layout affect text detection and extraction.
Course materials • 0:11

Introduction to Tesseract • 12:09
Learn about Tesseract, a widely used OCR engine, and how it converts images to text using neural networks and preprocessing, with support for many languages.
Preparing the environment • 7:56
Learn optical character recognition in Python by using Google Colab, installing the OCR library, and preparing images for text extraction with OpenCV. The next lesson covers sending the results to Tesseract.
First text recognition • 2:28
Perform the first OCR test in Python by loading an image, calling the OCR function from a library, and capturing and printing the recognized text as a string.
Support for other languages • 10:46
Learn how to enable multi-language OCR in Python by installing language packages (Portuguese), configuring paths, and validating results in Colab.
Page segmentation mode (PSM) • 11:02
Learn to configure page segmentation modes (PSM) for OCR in Python, mapping modes to single blocks, lines, or words, and test results with language settings.
Page orientation detection • 4:35
Detect page orientation for optical character recognition in Python by loading a book page image with OpenCV, and determining orientation in degrees and the page language (Portuguese).
Selection of texts 1 • 9:06
Select texts from an image using optical character recognition, draw bounding boxes around detected words, and review confidences and the block structure (page, block, paragraph, line, word) to extract results.
Selection of texts 2 • 14:28
Draw bounding boxes around detected texts by filtering high-confidence results and display each recognized text above its box, using the detection results dictionary and OpenCV in Python.
Selection of texts 3 • 10:21
Apply advanced OCR in Python to improve text extraction from larger images, handle Portuguese text, and display results by drawing bounding boxes and labeled text using a custom font.
Search using regular expressions • 11:14
Use optical character recognition (OCR) to extract text from images and identify dates with regular expressions in Python, demonstrated on a Brazilian Portuguese bank statement.
Detections in natural scenarios • 6:28
Learn how OCR detects text in natural scenes using neural networks, addressing false positives, and applying filters by confidence and text length to draw accurate bounding boxes.
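The regular-expression search over OCR output mentioned in this section can be sketched roughly as follows. This is a minimal illustration rather than the course's own code: the sample string stands in for text returned by an OCR engine, and the dd/mm/yyyy pattern and the `find_dates` helper are assumptions made for the example.

```python
import re

# Assumed date format: dd/mm/yyyy (hypothetical choice for this sketch)
DATE_PATTERN = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")


def find_dates(text):
    """Return every dd/mm/yyyy date found in the text, in order of appearance."""
    return [match.group(0) for match in DATE_PATTERN.finditer(text)]


# Made-up stand-in for the string an OCR engine would return
sample_text = "Saldo em 05/03/2021: R$ 1.250,00\nPagamento 17/03/2021 R$ 89,90"

for date in find_dates(sample_text):
    print(date)
```

The pattern only checks the shape of a date, not whether the day and month are valid, which is usually enough for a first pass over noisy OCR text.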

Grayscale • 7:22
Apply grayscale preprocessing to convert color images from BGR to a single channel, improving image quality for OCR while reducing data size and computation.
Thresholding - intuition • 12:22
Explore thresholding techniques for binarization and image segmentation to extract text, including simple global thresholds, histogram-based methods, and adaptive thresholding with mean and Gaussian calculations.
Simple thresholding • 6:36
Apply simple thresholding as a pre-processing step for OCR by converting to grayscale and applying OpenCV binary thresholding with a chosen value (e.g., 127) to produce a clear text image.
Thresholding with Otsu method • 6:36
Learn to implement Otsu's method for thresholding in Python OCR, use histograms to find thresholds on grayscale images, and compare simple thresholds with Otsu thresholds while visualizing results.
Adaptive thresholding • 6:27
Explore adaptive thresholding in Python for OCR, comparing mean and Gaussian methods with a global threshold, using grayscale conversion, block size 11, and the C parameter to handle uneven illumination.
Gaussian adaptive thresholding • 4:47
Implement Gaussian adaptive thresholding to address uneven lighting in images: preprocess to grayscale, apply adaptive thresholds, and compare the results with mean adaptive thresholding in Python.
Color inversion • 4:37
Learn color inversion for OCR in Python by converting images to grayscale, applying 255 minus gray and thresholded inversion, to produce black text on white and improve recognition.
Resizing - intuition • 5:31
Explore the intuition of image resizing for OCR, and how scale factors affect x and y dimensions while comparing interpolation methods like nearest, linear, area, and Lanczos.
Resizing - implementation • 5:37
Resize images with OpenCV by adjusting fx and fy and using cubic interpolation to enlarge or reduce images, showing 1.5 and 0.5 scales for OCR.
Morphological operations - intuition • 3:48
Explore morphological operations that remove noise, detect edges, and improve image quality in binary images. Understand erosion, dilation, and the opening and closing sequences that reshape white regions.
Morphological operations - implementation • 10:56
Implement erosion, dilation, opening, and closing to remove noise from grayscale images with OpenCV in Python. Learn kernel design (3x3 and 5x5) and how these operations affect image borders.
Noise removal - intuition • 15:58
Apply blur and kernel-based filters to remove noise and preserve text for OCR. Explore box and Gaussian blur, median, and bilateral filters and the role of convolution.
Noise removal - implementation • 8:16
Explore noise removal techniques for optical character recognition in Python, including average blur, Gaussian blur, median blur, and bilateral filter, assessing which yields the clearest text.
Text recognition with OCR • 4:07
Perform text detection with OCR, emphasizing preprocessing to handle image quality, install Tesseract with language data, test English and Portuguese, compare preprocessing techniques, and preview upcoming homework.
HOMEWORK • 0:08
Homework solution • 4:21
Load the image and implement an OCR workflow in Python by converting to grayscale, applying thresholding and inversion, and employing preprocessing techniques to extract text.
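As a rough illustration of the thresholding ideas in this section, the sketch below computes an Otsu threshold from a histogram, binarizes the pixels, and inverts the result. It is written in plain Python on a flat list of grayscale values so it runs without extra libraries; in practice OpenCV's thresholding functions would be used on real images. The function names and the sample pixel values are assumptions made for the example.

```python
def otsu_threshold(pixels):
    """Return the gray level (0-255) that maximizes between-class variance."""
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for level in range(256):
        weight_bg += histogram[level]      # pixels at or below this level
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg      # pixels above this level
        if weight_fg == 0:
            break
        sum_bg += level * histogram[level]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


def binarize(pixels, threshold):
    """Map pixels above the threshold to 255 (white) and the rest to 0 (black)."""
    return [255 if value > threshold else 0 for value in pixels]


def invert(pixels):
    """Invert an 8-bit image: 255 minus each value."""
    return [255 - value for value in pixels]


# Made-up flattened image: dark text pixels on a light background
pixels = [30, 35, 40, 32] * 5 + [200, 210, 220, 205] * 20
threshold = otsu_threshold(pixels)
binary = binarize(pixels, threshold)
inverted = invert(binary)
print(threshold, binary[:4], inverted[:4])
```

Unlike a fixed value such as 127, the Otsu threshold is derived from the image's own histogram, so it adapts when the whole image is darker or lighter. It is still a single global value, which is why adaptive thresholding is the better fit for uneven illumination.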

EAST - introduction • 9:19
Explore the EAST text detector for locating text in images, returning bounding box coordinates (x, y, width, height) and confidence, and combining with OCR to convert detected regions into strings.
Pre-processing the image • 12:25
Implement text detection with the EAST architecture in Python by loading a pre-trained model in Google Colab and resizing images to 320 by 320.
Loading the neural network • 10:29
Load and prepare the EAST architecture neural network in OpenCV, convert the image to a blob, and obtain scores and geometry for text bounding boxes.
Decoding the image 1 • 7:59
Decode the geometry and scores from OCR detections by building a geometric data function, extracting positions and angles, and computing offsets and bounding box coordinates with sine and cosine.
Decoding the image 2 • 14:38
Finish implementing the OCR decoding phase by filtering detections by confidence, applying non-max suppression to choose the best bounding box, and visualizing the region of interest.
Text recognition • 5:54
Perform text recognition on a region of interest in Python after installing and importing the OCR libraries and the English language package, adjusting margins to improve recognition and visualize results.
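The confidence filtering and non-max suppression steps described in this section can be sketched as below. This is a generic, simplified version in plain Python, not the course's implementation: the box format `(x1, y1, x2, y2)`, the function names, the threshold values, and the sample detections are all assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    intersection = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, min_confidence=0.5, iou_threshold=0.4):
    """Return indices of the boxes kept after confidence filtering and NMS."""
    # Drop low-confidence detections, then visit the rest from best to worst
    candidates = sorted(
        (i for i, score in enumerate(scores) if score >= min_confidence),
        key=lambda i: scores[i],
        reverse=True,
    )
    kept = []
    for i in candidates:
        # Keep a box only if it does not overlap too much with a better one
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


# Made-up detections: two overlapping boxes on one word, one on another word,
# and one low-confidence box
boxes = [(10, 10, 110, 50), (12, 12, 112, 52), (200, 80, 300, 120), (205, 82, 305, 118)]
scores = [0.95, 0.80, 0.90, 0.30]
print(non_max_suppression(boxes, scores))
```

Each kept index points at the region of interest that would then be cropped and passed to the recognition step.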

Importing the libraries • 4:40
Train a custom OCR with TensorFlow in Google Colab to recognize digits 0–9 and letters A–Z, using datasets and steps to connect to Google Drive and run the notebooks.
MNIST 0-9 dataset • 11:11
Load the MNIST 0-9 dataset from TensorFlow, combine train and test sets into a 70,000-image grayscale 28x28 data set with 0-255 pixel values, and explore digits zero to nine.
Kaggle A-Z dataset • 11:12
Load the Kaggle A-Z dataset for optical character recognition in Python, unzip and reshape 28×28 images, convert pixels to floats, and analyze class distribution from A to Z.
Joining the datasets • 5:33
Combine digit and alphabet datasets by offsetting alphabet labels to avoid collisions, then stack data and labels into a unified 28x28 grayscale dataset with a channel dimension for OCR models.
Pre-processing the data • 17:52
Preprocess OCR data by normalizing images to 0–1, one-hot encoding 36 classes, and balancing train-test splits; augment with rotation, zoom, and shift for neural network training using softmax.
Building the neural network • 14:03
Build a TensorFlow sequential neural network with convolutional layers, 32, 64, and 128 filters, max pooling, and dense layers for 36-class OCR using padding same and softmax output.
Training the neural network • 7:58
Train a custom OCR neural network with batch size 128 over 20 epochs, validating on a data split to save the best model by loss.
Evaluating the neural network • 12:04
Evaluate the neural network on the 88,000-image test set, showing 93% accuracy across 36 classes, with precision and recall insights guiding further training and saving to Google Drive.
Saving the neural network • 3:17
Save the trained neural network to Google Drive to preserve weights after training. Connect Google Colab to Google Drive, copy the saved weights file, and later load it for predictions.
Testing with images • 10:39
Test a saved neural network for OCR in Python by loading the model with TensorFlow and preprocessing an image (grayscale, thresholding, 28 by 28 resize, normalize) to predict the label.
Preparing the environment • 5:45
Prepare the environment for OCR in Python by loading a neural network in Google Colab, connecting to Google Drive, and applying grayscale preprocessing and contour detection.
Pre-processing the image • 7:40
Learn image pre-processing techniques for OCR in Python, including Gaussian blur, grayscale conversion, adaptive thresholding, color inversion, dilation, and edge detection to enhance text extraction.
Contour detection • 14:16
Detect contours with OpenCV to locate characters using external contours and bounding boxes, order them left to right, and apply preprocessing—blur, adaptive thresholding, inversion, and dilation—to separate letters.
Processing the detections 1 • 12:05
Develop a processing workflow for OCR by creating utility functions for ROI extraction, thresholding, and resizing detections to 28 by 28, centering digits with padding.
Processing the detections 2 • 7:37
Finish processing characters by applying a normalization function that converts the image to float32 and adds a channels axis. Threshold, resize, and normalize to prepare bounding boxes for recognition.
Character recognition • 12:37
Finish character recognition in Python by building digit and letter sets. Predict 36 classes from 28x28 images using a neural network, visualize results with bounding boxes, and convert to text.
Problems with 0 and O, 1 and l, 5 and S • 6:31
Learn how OCR models confuse 0 with O, 1 with l, and 5 with S, and apply simple pre-processing and neighborhood checks to reduce these errors in Python.
Problems with undetected texts • 5:51
Learn how to handle undetected texts in Python OCR by reprocessing images, filtering backgrounds, selecting bounding boxes, and focusing on the largest text regions to improve OCR accuracy.
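The label handling described in this section (offsetting the alphabet labels so they do not collide with the digit labels, one-hot encoding 36 classes, and turning predictions back into text) can be sketched as follows. The class order (digits first, then A to Z), the helper names, and the sample labels are assumptions for illustration, not details confirmed by the course.

```python
import string

DIGITS = string.digits                # "0123456789"
LETTERS = string.ascii_uppercase      # "A" to "Z"
CLASS_NAMES = DIGITS + LETTERS        # 36 classes; assumed order: digits, then letters


def offset_letter_labels(letter_labels, offset=len(DIGITS)):
    """Shift letter labels 0..25 to 10..35 so they do not collide with digits 0..9."""
    return [label + offset for label in letter_labels]


def one_hot(label, num_classes=len(CLASS_NAMES)):
    """Encode a class index as a list with a single 1 at that index."""
    vector = [0] * num_classes
    vector[label] = 1
    return vector


def decode(class_indices):
    """Convert predicted class indices back into a string."""
    return "".join(CLASS_NAMES[index] for index in class_indices)


# Made-up labels: two digits and two letters (A=0, Z=25 in the letter dataset)
digit_labels = [7, 0]
letter_labels = offset_letter_labels([0, 25])
combined = digit_labels + letter_labels
print(combined, decode(combined))
```

In a real pipeline the predicted index would come from taking the position of the largest softmax output for each character image.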

Preparing the environment • 5:37
Learn to use EasyOCR in a Google Colab workflow. Install dependencies like OpenCV, set language options, load an image, and perform the first text recognition with EasyOCR.
Text recognition • 2:14
Implement text recognition with EasyOCR by creating a reader object, sending the image to readtext, and returning detected text with bounding boxes and confidence, without preprocessing.
Writing the results on the image • 13:42
Learn how to write detected text on an image using EasyOCR in Python, by computing text positions, drawing bounding boxes, loading fonts, and rendering results for multilingual text.
Other languages - French and Chinese • 5:58
Learn multilingual optical character recognition in Python with EasyOCR to detect English, French, and Chinese in the same image, visualize bounding boxes and confidence scores, and test in Google Colab.
Text recognition (background) • 8:22
Apply enhanced text recognition in Python by implementing background drawing and bounding boxes to visualize OCR results, using multilingual testing (English and Portuguese) and adjustable fonts for clearer detections.

Preparing the environment • 7:51
Connect to Google Drive, install libraries, and prepare the environment to run OCR in Python using Colab, then configure preprocessing steps like grayscale, 100% resize, and thresholding for Portuguese.
Video settings • 11:39
Load and resize a video for OCR in Python using OpenCV, read frames, and save frame-by-frame OCR results to a new video.
Processing the video • 4:42
Implement video processing by reading frames in blocks of 20, resizing with a visual resize function, and displaying results in Google Colab while managing memory.
OCR with EAST and Tesseract • 13:04
Finish implementing text extraction from video frames using EAST for detection and Tesseract for recognition, including preprocessing, ROI extraction, and non-max suppression to produce readable text overlays.
OCR with EasyOCR • 5:44
Learn to perform video OCR with EasyOCR in Python, from installation to frame-by-frame processing, bounding boxes, and rendering text on frames, plus a quick quality comparison with earlier methods.

Preparing the environment • 6:16
Prepare the environment for a Python OCR project by connecting Google Colab to Google Drive, loading the image dataset, and configuring Tesseract for Portuguese text recognition.
Text recognition • 6:30
Implement text recognition on a batch of images in a folder using optical character recognition, loading each image, extracting text, concatenating results, and saving to an output file.
Searching for texts • 5:58
Search for the term "computador" in a text file and in images using OCR, leveraging regular expressions to count occurrences in Portuguese texts.
Word cloud • 12:53
Generate a word cloud from Portuguese text in Python, using spaCy to remove stop words, and visualize the most frequent terms.
Named entity recognition • 3:37
Learn to implement named entity recognition in natural language processing to extract locations, people, and organizations, visualize results in Google Colab, and filter by person entities.
Search for texts in images • 10:27
Visualize OCR results by drawing detected text on images with configured fonts in Google Colab, and build a pipeline to count search terms with confidence filtering.
Saving the results • 4:54
Save the OCR results to a processed images folder by iterating over test images, detecting terms, and copying matched images with new names.

Preparing the environment • 5:40
Prepare a Google Colab OCR environment by connecting to Google Drive and loading images, then apply grayscale, Gaussian blur, and edge detection, followed by a perspective transformation that straightens the image for OCR.
Contour detection • 8:37
Detect edges with the Canny edge detector, extract all contours with OpenCV, select the largest contour, then compute its perimeter with arc length and approximate it with approxPolyDP to visualize.
Perspective transformation • 11:37
Finish implementing the points ordering and the perspective transformation to warp the image into a straight, OCR-ready view using the transformation matrix from cv2.getPerspectiveTransform.
OCR with Tesseract • 3:41
Enable OCR with Tesseract in Python by configuring Portuguese language data, applying preprocessing and image resizing to improve text extraction, and comparing results to guide future techniques.
Improving image quality • 8:31
Improve image quality for OCR in Python by applying brightness and contrast adjustments, adaptive thresholding, grayscale conversion, and border removal.
Putting all together • 2:41
Combine image processing steps to prepare pictures for OCR by applying grayscale, blurring, edge detection, finding the largest contour, applying perspective transform, and adaptive thresholding for clean OCR input.
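The points-ordering step that precedes the perspective transformation can be sketched as below. This uses a common sum/difference heuristic and is not necessarily how the course orders the corners; the function names and the sample corner coordinates are assumptions. The ordered corners and the destination rectangle are the kind of inputs that would then be passed to `cv2.getPerspectiveTransform`.

```python
import math


def order_points(points):
    """Order four (x, y) corners as top-left, top-right, bottom-right, bottom-left."""
    top_left = min(points, key=lambda p: p[0] + p[1])       # smallest x + y
    bottom_right = max(points, key=lambda p: p[0] + p[1])   # largest x + y
    top_right = min(points, key=lambda p: p[1] - p[0])      # smallest y - x
    bottom_left = max(points, key=lambda p: p[1] - p[0])    # largest y - x
    return [top_left, top_right, bottom_right, bottom_left]


def target_size(ordered):
    """Width and height of the straightened page, from its longest opposite edges."""
    tl, tr, br, bl = ordered
    width = max(math.dist(br, bl), math.dist(tr, tl))
    height = max(math.dist(tr, br), math.dist(tl, bl))
    return int(round(width)), int(round(height))


def destination_corners(width, height):
    """Corners of the output rectangle, in the same order as order_points."""
    return [(0, 0), (width - 1, 0), (width - 1, height - 1), (0, height - 1)]


# Made-up corners of a photographed page, in arbitrary order
corners = [(310, 60), (40, 50), (55, 420), (300, 400)]
ordered = order_points(corners)
size = target_size(ordered)
print(ordered, size, destination_corners(*size))
```

The heuristic assumes the page is only moderately rotated; for a document photographed at close to 45 degrees the sums and differences can tie and a more robust ordering would be needed.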

Pre-processing the image • 9:02
Apply preprocessing to isolate license plate regions from car images, convert to grayscale, apply bilateral filter and Canny edge detection, extract text with OCR, and verify against a database.
Text recognition • 5:22
Use Python OCR to perform text recognition, preprocess images with grayscale conversion, blur, and edge detection, detect license plates, filter alphanumeric characters, and annotate results on the image.
Improving image quality • 2:17
Increase the image size by 20% and apply thresholding to improve image quality for OCR in Python. Rerun the OCR to achieve clearer and correct recognition and demonstrate preprocessing benefits.

Requirements

Programming logic
Basic Python programming

Description

Within the area of Computer Vision is the sub-area of Optical Character Recognition (OCR), which aims to transform images into text. OCR can be described as converting images containing typed, handwritten or printed text into characters that a machine can understand. It is possible to convert scanned or photographed documents into text that can be edited in any tool, such as Microsoft Word. A common application is automatic form reading, in which you can send a photo of your credit card or your driver's license, and the system can read all your data without the need to type it manually. A self-driving car can use OCR to read traffic signs, and a parking lot can grant access by reading the license plates of cars!

To introduce you to this area, in this course you will learn in practice how to use OCR libraries to recognize text in images and videos, with all the code implemented step by step in the Python programming language! We are going to use Google Colab, so you do not have to worry about installing libraries on your machine, as everything will be developed online using Google's GPUs! You will also learn how to build your own OCR from scratch using Deep Learning and Convolutional Neural Networks! Below you can check the main topics of the course:

Recognition of texts in images and videos using Tesseract, EasyOCR and EAST
Search for specific terms in images using regular expressions
Techniques for improving image quality, such as: thresholding, color inversion, grayscale, resizing, noise removal, morphological operations and perspective transformation
EAST architecture and EasyOCR library for better performance in natural scenes
Training an OCR from scratch using TensorFlow and modern Deep Learning techniques, such as Convolutional Neural Networks
Application of natural language processing techniques in the texts extracted by OCR (word cloud and named entity recognition)
License plate reading

These are just some of the main topics! By the end of the course, you will know everything you need to create your own text recognition projects using OCR!

Who this course is for:

Anyone interested in OCR (Optical Character Recognition)
Undergraduate students who are studying subjects related to Artificial Intelligence, Digital Image Processing or Computer Vision
Data Scientists who want to increase their knowledge in Computer Vision
Professionals interested in developing professional optical character recognition solutions
People interested in creating their own custom OCR

Course content

Introduction • 3 lectures • 20min

OCR with Tesseract • 11 lectures • 1hr 41min

Techniques for image pre-processing • 16 lectures • 1hr 47min

OCR with EAST for natural scenes • 6 lectures • 1hr 1min

Training a custom OCR • 18 lectures • 2hr 51min

Natural scenarios with EasyOCR • 5 lectures • 36min

OCR in videos • 5 lectures • 43min

Project 1: Searching for specific terms • 7 lectures • 51min

Project 2: Scanner + OCR • 6 lectures • 41min

Project 3: License plate reading • 3 lectures • 17min