
Learn optical character recognition in Python using Google Colab. Build end-to-end OCR pipelines from image preprocessing to text detection and recognition, including thresholding, noise removal, and neural networks.
Explore how OCR differs between controlled scenarios and natural scenes, and learn how preprocessing, image quality, and layout affect text detection and extraction.
Learn about Tesseract, a widely used OCR engine, and how it converts images to text using neural networks and preprocessing, with support for many languages.
Learn optical character recognition in Python by using Google Colab, installing the OCR library, and preparing images for text extraction with OpenCV. The next lesson covers sending the prepared images to Tesseract.
Perform the first OCR test in Python by loading an image, calling the library's OCR function, and printing the recognized text as a string.
Learn how to enable multi-language OCR in Python by installing additional language packages (Portuguese), configuring their paths, and validating the results in Colab.
Learn to configure page segmentation modes (PSM) for OCR in Python, mapping each mode to a layout such as a single block, line, or word, and test the results with different language settings.
Detect page orientation for optical character recognition in Python by loading a book page image with OpenCV, and determining orientation in degrees and the page language (Portuguese).
Select text in an image using optical character recognition, draw bounding boxes around the detected words, and review the confidence values and the block structure (page, block, paragraph, line, word) to extract results.
Draw bounding boxes around detected text by keeping only high-confidence results, and display each recognized word above its box, using the detection results dictionary and OpenCV in Python.
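Confidence filtering can be sketched in plain Python. This assumes the dictionary layout of `pytesseract.image_to_data(img, output_type=Output.DICT)`, which stores parallel lists under keys such as `text`, `conf`, `left`, `top`, `width` and `height`; the helper name `filter_detections`, the sample values and the cutoff of 40 are hypothetical.

```python
def filter_detections(results, min_conf=40):
    """Return (text, (x, y, w, h)) for words whose confidence exceeds min_conf."""
    boxes = []
    for i in range(len(results["text"])):
        text = results["text"][i].strip()
        # Structural rows (page, block, paragraph, line) carry conf -1; some
        # versions return confidences as strings, so convert before comparing.
        conf = float(results["conf"][i])
        if conf > min_conf and text:
            box = (results["left"][i], results["top"][i],
                   results["width"][i], results["height"][i])
            boxes.append((text, box))
    return boxes


# Hypothetical sample shaped like image_to_data output
sample = {
    "text": ["", "Hello", "world", "~"],
    "conf": ["-1", "96.5", "91.2", "12.0"],
    "left": [0, 10, 80, 150],
    "top": [0, 12, 12, 14],
    "width": [200, 60, 55, 8],
    "height": [40, 20, 20, 18],
}
print(filter_detections(sample))
# [('Hello', (10, 12, 60, 20)), ('world', (80, 12, 55, 20))]
```

Each kept box can then be drawn with `cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)` and labeled with `cv2.putText`.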
Apply advanced OCR in Python to improve text extraction from larger images, handle Portuguese text, and display results by drawing bounding boxes and labeled text using a custom font.
Use optical character recognition (OCR) to extract text from images and identify dates with regular expressions in Python, demonstrated on a Brazilian Portuguese bank statement.
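A minimal sketch of the date search, assuming the OCR text has already been extracted and that dates follow the Brazilian day/month/year format; the sample statement text is invented for illustration. Parsing each match with `datetime.strptime` rejects strings such as `99/99/2021` that the pattern alone would accept.

```python
import re
from datetime import datetime

# Hypothetical OCR output from a Brazilian Portuguese bank statement
ocr_text = """Extrato de conta corrente
01/03/2021 Saldo anterior 1.250,00
05/03/2021 PIX recebido 300,00
10/03/2021 Pagamento de boleto -120,50"""

# Brazilian dates are written day/month/year, e.g. 05/03/2021 is 5 March 2021
date_pattern = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")
dates = date_pattern.findall(ocr_text)
print(dates)  # ['01/03/2021', '05/03/2021', '10/03/2021']

# Keep only matches that are real calendar dates
valid_dates = []
for d in dates:
    try:
        valid_dates.append(datetime.strptime(d, "%d/%m/%Y").date())
    except ValueError:
        pass  # e.g. an OCR misread such as 35/13/2021
print(valid_dates)
```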
Learn how OCR detects text in natural scenes using neural networks, addressing false positives, and applying filters by confidence and text length to draw accurate bounding boxes.
Apply grayscale preprocessing to convert color images from BGR to a single channel, improving image quality for OCR while reducing data size and computation.
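What the BGR-to-grayscale conversion computes can be sketched with NumPy. This assumes OpenCV's channel order (blue, green, red) and uses the standard luminance weights 0.299, 0.587 and 0.114 that `cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)` applies; `bgr_to_gray` is a hypothetical helper, and OpenCV's fixed-point arithmetic may round a few pixels differently.

```python
import numpy as np

def bgr_to_gray(image):
    """Convert an H x W x 3 BGR uint8 image to a single-channel H x W uint8 image."""
    b, g, r = (image[..., i].astype(np.float64) for i in range(3))
    gray = 0.299 * r + 0.587 * g + 0.114 * b  # luminance weights used for BGR2GRAY
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)

color = np.zeros((2, 2, 3), dtype=np.uint8)
color[0, 0] = (255, 255, 255)   # white
color[0, 1] = (0, 0, 255)       # pure red (OpenCV stores channels as B, G, R)
gray = bgr_to_gray(color)
print(gray.tolist())               # [[255, 76], [0, 0]]
print(color.nbytes, gray.nbytes)   # 12 4: one third of the data
```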
Explore thresholding techniques for binarization and image segmentation to extract text, including simple global thresholds, histogram-based methods, and adaptive thresholding with mean and Gaussian calculations.
Apply simple thresholding as a pre-processing step for OCR by converting to grayscale and applying OpenCV binary thresholding with a chosen value (e.g., 127) to produce a clear text image.
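In OpenCV this step is `_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)`. The NumPy sketch below reproduces the rule that flag applies (a pixel becomes 255 only if it is strictly greater than the threshold), using a made-up 2x3 image and a hypothetical helper name.

```python
import numpy as np

def simple_threshold(gray, thresh=127, max_value=255):
    """Global binary threshold: pixels strictly above thresh become max_value, the rest 0."""
    return np.where(gray > thresh, max_value, 0).astype(np.uint8)

gray = np.array([[10, 127, 128],
                 [200, 90, 255]], dtype=np.uint8)
result = simple_threshold(gray).tolist()
print(result)  # [[0, 0, 255], [255, 0, 255]]
```

Note that a pixel equal to 127 maps to 0, because the comparison is strict.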
Learn to implement Otsu's method for thresholding in Python OCR, use histograms to find thresholds on grayscale images, and compare simple thresholding with Otsu thresholding while visualizing the results.
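Otsu's method picks the threshold that maximizes the between-class variance of the histogram. OpenCV computes it when `cv2.THRESH_OTSU` is added to the flags, e.g. `t, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`; the loop below is a plain NumPy sketch of the same idea, with a synthetic two-tone image standing in for a scanned page.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the threshold t that maximizes the between-class variance (Otsu's method)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 = prob[: t + 1].sum()   # weight of the dark class (values <= t)
        w1 = prob[t + 1:].sum()    # weight of the bright class (values > t)
        if w0 == 0 or w1 == 0:
            continue               # one class is empty, so this split is meaningless
        mu0 = (levels[: t + 1] * prob[: t + 1]).sum() / w0
        mu1 = (levels[t + 1:] * prob[t + 1:]).sum() / w1
        between_var = w0 * w1 * (mu0 - mu1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t

# Synthetic image: dark "ink" values around 50 and bright "paper" values around 200
gray = np.array([[40, 50, 60, 190, 200, 210]] * 4, dtype=np.uint8)
t = otsu_threshold(gray)
binary = np.where(gray > t, 255, 0).astype(np.uint8)
print(t, binary[0].tolist())
```

Any threshold between the two clusters separates them equally well here; the point is that the value comes from the histogram rather than being hand-picked like 127.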
Explore adaptive thresholding in Python for OCR, comparing mean and Gaussian methods with a global threshold, using grayscale conversion, block size 11, and the C parameter to handle uneven illumination.
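Adaptive mean thresholding compares each pixel with the mean of its own neighborhood instead of one global value. In OpenCV this is `cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 11, C)`; the NumPy sketch below approximates that rule (border pixels replicated, `C` = 2 chosen only for illustration) on a synthetic page whose brightness rises from left to right. It needs NumPy 1.20 or newer for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def adaptive_mean_threshold(gray, block_size=11, c=2, max_value=255):
    """Set a pixel to max_value if it exceeds (mean of its block_size x block_size neighborhood) - c."""
    if block_size % 2 == 0:
        raise ValueError("block_size must be odd")
    r = block_size // 2
    padded = np.pad(gray.astype(np.float64), r, mode="edge")  # replicate border pixels
    local_mean = sliding_window_view(padded, (block_size, block_size)).mean(axis=(-2, -1))
    return np.where(gray > local_mean - c, max_value, 0).astype(np.uint8)

# Synthetic page: background brightens from 60 (left) to 230 (right), i.e. uneven illumination
gray = np.tile(np.linspace(60, 230, 40), (40, 1)).astype(np.uint8)
gray[20, 5:35] -= 50  # a dark horizontal "text" stroke

adaptive = adaptive_mean_threshold(gray, block_size=11, c=2)
global_127 = np.where(gray > 127, 255, 0).astype(np.uint8)

# With the adaptive method the stroke is black and the nearby background white...
print(adaptive[20, 10:30].max(), adaptive[5:15, 10:30].min())  # 0 255
# ...while a global threshold of 127 also turns the dim left-hand background black
print(global_127[5, 10])  # 0
```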
Implement Gaussian adaptive thresholding to address uneven lighting in images: convert to grayscale, apply adaptive thresholds, and compare the results with adaptive mean thresholding in Python.
Learn color inversion for OCR in Python by converting images to grayscale and then either subtracting each gray value from 255 or applying an inverted threshold, producing black text on a white background to improve recognition.
Explore the intuition behind image resizing for OCR, how scale factors affect the x and y dimensions, and how interpolation methods such as nearest, linear, area, and Lanczos compare.
Resize images with OpenCV by adjusting fx and fy and using cubic interpolation to enlarge or reduce images, showing 1.5 and 0.5 scales for OCR.
Explore morphological operations that remove noise, detect edges, and improve image quality in binary images. Understand erosion, dilation, and the opening and closing sequences that reshape white regions.
Implement erosion, dilation, opening, and closing to remove noise from grayscale images with OpenCV in Python. Learn kernel design (3x3 and 5x5) and how these operations affect image borders.
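Erosion and dilation are minimum and maximum filters over the kernel window, and opening and closing chain them. OpenCV provides `cv2.erode`, `cv2.dilate` and `cv2.morphologyEx` with `cv2.MORPH_OPEN` / `cv2.MORPH_CLOSE` for this, typically with a kernel such as `np.ones((3, 3), np.uint8)`. The NumPy sketch below implements the same operations for a square kernel on a tiny synthetic image; the function names are hypothetical, and the padding is chosen so that the image border itself neither erodes nor grows.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def erode(img, k=3):
    """Erosion with a k x k square kernel: each pixel becomes the minimum of its neighborhood."""
    padded = np.pad(img, k // 2, mode="constant", constant_values=255)
    return sliding_window_view(padded, (k, k)).min(axis=(-2, -1))

def dilate(img, k=3):
    """Dilation with a k x k square kernel: each pixel becomes the maximum of its neighborhood."""
    padded = np.pad(img, k // 2, mode="constant", constant_values=0)
    return sliding_window_view(padded, (k, k)).max(axis=(-2, -1))

def opening(img, k=3):
    return dilate(erode(img, k), k)   # erosion then dilation: removes small white specks

def closing(img, k=3):
    return erode(dilate(img, k), k)   # dilation then erosion: fills small black holes

img = np.zeros((9, 9), dtype=np.uint8)
img[2:7, 2:7] = 255                 # a 5x5 white "character"

noisy = img.copy()
noisy[8, 0] = 255                   # an isolated white speck of noise
holey = img.copy()
holey[4, 4] = 0                     # a one-pixel black hole inside the character

print(np.array_equal(opening(noisy), img))   # True: the speck is removed
print(np.array_equal(closing(holey), img))   # True: the hole is filled
print(int((erode(img) == 255).sum()))        # 9: erosion shrinks 5x5 to 3x3
```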
Apply blur and kernel-based filters to remove noise and preserve text for OCR. Explore box and Gaussian blur, median, and bilateral filters and the role of convolution.
Explore noise removal techniques for optical character recognition in Python, including average blur, Gaussian blur, median blur, and bilateral filter, assessing which yields the clearest text.
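The difference between average and median blur is easiest to see on a single salt-noise pixel. The sketch below uses NumPy stand-ins for what `cv2.blur(img, (3, 3))` and `cv2.medianBlur(img, 3)` do; edge replication is used at the borders for simplicity (OpenCV's default border handling differs), and the function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def box_blur(gray, k=3):
    """Average blur: each pixel becomes the mean of its k x k neighborhood."""
    padded = np.pad(gray.astype(np.float64), k // 2, mode="edge")
    return sliding_window_view(padded, (k, k)).mean(axis=(-2, -1))

def median_blur(gray, k=3):
    """Median blur: each pixel becomes the median of its k x k neighborhood."""
    padded = np.pad(gray, k // 2, mode="edge")
    return np.median(sliding_window_view(padded, (k, k)), axis=(-2, -1))

gray = np.full((5, 5), 100, dtype=np.uint8)
gray[2, 2] = 255  # one "salt" noise pixel on a flat background

print(round(float(box_blur(gray)[2, 2]), 2))  # 117.22: the noise is smeared into its neighbors
print(float(median_blur(gray).max()))          # 100.0: the outlier is discarded entirely
```

Gaussian blur weights the neighborhood by distance instead of averaging it uniformly, and the bilateral filter additionally weights by intensity difference, which helps keep character edges sharp.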
Perform text detection with OCR with an emphasis on preprocessing to handle image quality: install Tesseract with language data, test English and Portuguese, compare preprocessing techniques, and preview the upcoming homework.
Load the image and implement an OCR workflow in Python: convert it to grayscale, apply thresholding and inversion, and use preprocessing techniques to extract text.
Explore the EAST text detector for locating text in images, returning bounding box coordinates (x, y, width, height) and confidence, and combining with OCR to convert detected regions into strings.
Implement text detection with the EAST architecture in Python by loading a pre-trained model in Google Colab and resizing images to 320 by 320.
Load and prepare the EAST neural network in OpenCV, convert the image to a blob, and obtain the scores and geometry for text bounding boxes.
Decode the geometry and scores from OCR detections by building a geometric data function, extracting positions and angles, and computing offsets and bounding box coordinates with sine and cosine.
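The decoding step can be sketched as below. It follows the geometry layout commonly used with OpenCV's EAST examples: channels 0 to 3 hold the distances from a cell to the top, right, bottom and left edges of its box, channel 4 holds the rotation angle, and the output maps are 4 times smaller than the input. The function names, the 0.5 cutoff and the synthetic maps are assumptions for illustration, and each box is reduced to an axis-aligned rectangle.

```python
import math
import numpy as np

def decode_cell(x, y, distances, angle):
    """Convert one EAST cell into an axis-aligned box (start_x, start_y, end_x, end_y).

    distances = (top, right, bottom, left) from geometry channels 0-3; angle in radians.
    """
    top, right, bottom, left = distances
    offset_x, offset_y = x * 4.0, y * 4.0  # output maps are 4x smaller than the input image
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    h = top + bottom
    w = right + left
    end_x = int(offset_x + cos_a * right + sin_a * bottom)
    end_y = int(offset_y - sin_a * right + cos_a * bottom)
    return int(end_x - w), int(end_y - h), end_x, end_y

def decode_predictions(scores, geometry, min_confidence=0.5):
    """Collect boxes and confidences from score (1, 1, H, W) and geometry (1, 5, H, W) maps."""
    boxes, confidences = [], []
    rows, cols = scores.shape[2:4]
    for y in range(rows):
        for x in range(cols):
            score = float(scores[0, 0, y, x])
            if score < min_confidence:
                continue  # discard weak detections
            distances = tuple(float(geometry[0, i, y, x]) for i in range(4))
            angle = float(geometry[0, 4, y, x])
            boxes.append(decode_cell(x, y, distances, angle))
            confidences.append(score)
    return boxes, confidences

# Synthetic 80x80 maps (a 320x320 input divided by 4) with a single confident cell
scores = np.zeros((1, 1, 80, 80), dtype=np.float32)
geometry = np.zeros((1, 5, 80, 80), dtype=np.float32)
scores[0, 0, 5, 10] = 0.9
geometry[0, :4, 5, 10] = (4, 30, 6, 10)  # top, right, bottom, left; angle stays 0
boxes, confidences = decode_predictions(scores, geometry)
print(boxes)  # [(30, 16, 70, 26)]
```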
Finish implementing the OCR decoding phase by filtering detections by confidence, applying non-maximum suppression to choose the best bounding box, and visualizing the region of interest.
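Non-maximum suppression keeps the highest-confidence box and drops any other box that overlaps it too much. The greedy sketch below uses intersection over union (IoU) with a hypothetical 0.3 cutoff and boxes in `(start_x, start_y, end_x, end_y)` form; ready-made versions exist, such as `cv2.dnn.NMSBoxes` (which expects `(x, y, w, h)` boxes) or the helper in the `imutils` package.

```python
def iou(a, b):
    """Intersection over union of two (start_x, start_y, end_x, end_y) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, confidences, overlap_thresh=0.3):
    """Greedy NMS: visit boxes from most to least confident, keep those not overlapping a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: confidences[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= overlap_thresh for j in keep):
            keep.append(i)
    return [boxes[i] for i in keep]

boxes = [(30, 16, 70, 26), (32, 17, 71, 27), (100, 40, 150, 60)]
confidences = [0.92, 0.85, 0.77]
result = non_max_suppression(boxes, confidences)
print(result)  # [(30, 16, 70, 26), (100, 40, 150, 60)]
```

The second box overlaps the first with an IoU of about 0.76, so only the more confident of the two survives.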
Perform text recognition on a region of interest in Python after installing and importing the OCR libraries and the English language package, adjusting margins to improve recognition and visualize results.
Train a custom OCR with TensorFlow in Google Colab to recognize digits 0–9 and letters A–Z, using datasets and steps to connect to Google Drive and run the notebooks.
Load the MNIST 0-9 dataset from TensorFlow, combine train and test sets into a 70,000-image grayscale 28x28 data set with 0-255 pixel values, and explore digits zero to nine.
Load the Kaggle A-Z dataset for optical character recognition in Python, unzip it and reshape the data into 28×28 images, convert the pixels to floats, and analyze the class distribution from A to Z.
Combine digit and alphabet datasets by offsetting alphabet labels to avoid collisions, then stack data and labels into a unified 28x28 grayscale dataset with a channel dimension for OCR models.
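The label offset can be sketched with NumPy on tiny stand-in arrays; the arrays and their sizes here are made up. Shifting the letter labels by 10 maps A to 10 and Z to 35, so all 36 classes get distinct indices.

```python
import string
import numpy as np

# Hypothetical stand-ins: 3 digit images (labels 0-9) and 2 letter images (labels 0-25)
digits_data = np.zeros((3, 28, 28), dtype=np.uint8)
digits_labels = np.array([0, 5, 9])
az_data = np.full((2, 28, 28), 255, dtype=np.uint8)
az_labels = np.array([0, 25])  # 'A' and 'Z' in the letter dataset

# Without the offset, letter 'A' (0) would collide with digit 0
az_labels = az_labels + 10

data = np.vstack([digits_data, az_data])         # (5, 28, 28)
labels = np.hstack([digits_labels, az_labels])   # (5,)
data = np.expand_dims(data, axis=-1)             # add the channel axis: (5, 28, 28, 1)

class_names = string.digits + string.ascii_uppercase  # index k -> character
print(data.shape, [class_names[k] for k in labels])
# (5, 28, 28, 1) ['0', '5', '9', 'A', 'Z']
```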
Preprocess OCR data by normalizing images to 0–1, one-hot encoding 36 classes, and balancing train-test splits; augment with rotation, zoom, and shift for neural network training using softmax.
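Normalization and one-hot encoding can be sketched with NumPy as below, on two made-up images. Keras offers `tf.keras.utils.to_categorical(labels, num_classes=36)` for the same encoding, and scikit-learn's `train_test_split(..., stratify=labels)` is one common way to keep class proportions equal across the split; these are alternatives, not necessarily the calls used in the lessons.

```python
import numpy as np

NUM_CLASSES = 36  # digits 0-9 plus letters A-Z

def normalize_and_encode(data, labels):
    """Scale pixels to [0, 1] as float32 and one-hot encode integer labels."""
    x = data.astype(np.float32) / 255.0
    y = np.eye(NUM_CLASSES, dtype=np.float32)[labels]  # label k -> row k of the identity matrix
    return x, y

data = np.array([[[0, 255]], [[128, 64]]], dtype=np.uint8)  # two tiny fake images
labels = np.array([3, 35])
x, y = normalize_and_encode(data, labels)
print(float(x.min()), float(x.max()), y.shape, int(y[1].argmax()))  # 0.0 1.0 (2, 36) 35
```

The one-hot rows match the 36-unit softmax output layer, so each target is a probability distribution with all its mass on the correct class.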
Build a TensorFlow Sequential neural network with convolutional layers of 32, 64, and 128 filters, max pooling, and dense layers for 36-class OCR, using "same" padding and a softmax output.
Train a custom OCR neural network with a batch size of 128 for 20 epochs, validating on a data split and saving the best model according to the loss.
Evaluate the neural network on the 88,000-image test set, showing 93% accuracy across 36 classes, with precision and recall insights guiding further training and saving to Google Drive.
Save the trained neural network to Google Drive to preserve its weights after training: connect Google Colab to Google Drive, copy the saved weights file, and later load it for predictions.
Test a saved neural network for OCR in Python by loading the model with TensorFlow and preprocessing an image (grayscale, thresholding, 28 by 28 resize, normalization) to predict its label.
Prepare the environment for OCR in Python by loading a neural network in Google Colab, connecting to Google Drive, and applying grayscale preprocessing and contour detection.
Learn image pre-processing techniques for OCR in Python, including Gaussian blur, grayscale conversion, adaptive thresholding, color inversion, dilation, and edge detection to enhance text extraction.
Detect contours with OpenCV to locate characters using external contours and bounding boxes, order them left to right, and apply preprocessing—blur, adaptive thresholding, inversion, and dilation—to separate letters.
Develop a processing workflow for OCR by creating utility functions for region of interest (ROI) extraction, thresholding, and resizing detections to 28 by 28, centering each character with padding.
Finish processing characters by applying a normalization function that converts the image to float32 and adds a channels axis. Threshold, resize, and normalize to prepare bounding boxes for recognition.
Finish character recognition in Python by building digit and letter sets. Predict 36 classes from 28x28 images using a neural network, visualize results with bounding boxes, and convert to text.
Learn how OCR models confuse 0 with O, 1 with l, and 5 with S, and apply simple preprocessing and neighborhood checks to reduce these errors in Python.
Learn how to handle undetected texts in Python OCR by reprocessing images, filtering backgrounds, selecting bounding boxes, and focusing on the largest text regions to improve OCR accuracy.
Learn to use EasyOCR in a Google Colab workflow: install dependencies such as OpenCV, set language options, load an image, and perform the first text recognition with EasyOCR.
Implement text recognition with EasyOCR by creating a Reader object, passing the image to readtext, and returning the detected text with bounding boxes and confidence scores, without preprocessing.
Learn how to write detected text on an image using EasyOCR in Python by computing text positions, drawing bounding boxes, loading fonts, and rendering results for multilingual text.
Learn multilingual optical character recognition in Python with EasyOCR to detect English, French, and Chinese in the same image, visualize bounding boxes and confidence scores, and test in Google Colab.
Apply enhanced text recognition in Python by implementing background drawing and bounding boxes to visualize OCR results, using multilingual testing (English and Portuguese) and adjustable fonts for clearer detections.
Connect to Google Drive, install libraries, and prepare the environment to run OCR in Python using Colab, then configure preprocessing steps like grayscale, 100% resize, and thresholding for Portuguese.
Load and resize a video for OCR in Python using OpenCV, read frames, and save frame-by-frame OCR results to a new video.
Implement video processing by reading frames in blocks of 20, resizing with a visual resize function, and displaying results in Google Colab while managing memory.
Finish implementing text extraction from video frames using EAST for detection and Tesseract for recognition, including preprocessing, ROI extraction, and non-max suppression to produce readable text overlays.
Learn to perform video OCR with EasyOCR in Python, from installation to frame-by-frame processing, bounding boxes, and rendering text on frames, plus a quick quality comparison with earlier methods.
Prepare the environment for a Python OCR project by connecting Google Colab to Google Drive, loading the image dataset, and configuring Tesseract for Portuguese text recognition.
Implement text recognition on a batch of images in a folder using optical character recognition, loading each image, extracting text, concatenating results, and saving to an output file.
Search for the term "computador" (Portuguese for "computer") in a text file and in images using OCR, using regular expressions to count its occurrences in Portuguese texts.
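A minimal sketch of the counting step, assuming the OCR text is already available as a string (the sample sentence is invented). The word boundaries `\b` stop the pattern from matching inside `computadores`, and `re.IGNORECASE` also catches upper-case OCR output; dropping the trailing `\b` would count plurals too.

```python
import re

# Hypothetical OCR output (Portuguese)
text = "O computador está ligado. Computadores modernos são rápidos. Meu COMPUTADOR novo chegou."

pattern = re.compile(r"\bcomputador\b", re.IGNORECASE)
occurrences = pattern.findall(text)
print(len(occurrences))  # 2

# Positions are useful for highlighting matches later
print([m.start() for m in pattern.finditer(text)])
```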
Generate a word cloud from Portuguese text in Python, using spaCy to remove stop words, and visualize the most frequent terms.
Learn to implement named entity recognition in natural language processing to extract locations, people, and organizations, visualize results in Google Colab, and filter by person entities.
Visualize OCR results by drawing detected text on images with configured fonts in Google Colab, and build a pipeline to count search terms with confidence filtering.
Save the OCR results to a processed-images folder by iterating over the test images, detecting the search terms, and copying matching images under new names.
Prepare a Google Colab OCR environment by connecting to Google Drive and loading images, then apply grayscale conversion, Gaussian blur, and edge detection, followed by a perspective transformation that brings the document upright for OCR.
Detect edges with the Canny edge detector, extract all contours with OpenCV, select the largest contour, then compute its perimeter with arcLength and approximate it with approxPolyDP for visualization.
Finish implementing the point ordering and the perspective transformation that warps the image into a straight, OCR-ready view, using the transformation matrix from cv2.getPerspectiveTransform.
Enable OCR with Tesseract in Python by configuring Portuguese language data, applying preprocessing and image resizing to improve text extraction, and comparing the results to guide future techniques.
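The point-ordering step can be sketched as follows. It assumes the four corners come from `cv2.approxPolyDP` (shape `(4, 1, 2)`) in arbitrary order and uses the common sum/difference trick: the top-left corner has the smallest x + y, the bottom-right the largest, the top-right the smallest y - x and the bottom-left the largest. The helper names and corner coordinates are hypothetical.

```python
import numpy as np

def order_points(pts):
    """Return corners ordered top-left, top-right, bottom-right, bottom-left."""
    pts = np.asarray(pts, dtype=np.float32).reshape(4, 2)
    s = pts.sum(axis=1)               # x + y
    d = np.diff(pts, axis=1).ravel()  # y - x
    return np.array([pts[np.argmin(s)],   # top-left
                     pts[np.argmin(d)],   # top-right
                     pts[np.argmax(s)],   # bottom-right
                     pts[np.argmax(d)]],  # bottom-left
                    dtype=np.float32)

def target_size(ordered):
    """Width and height of the straightened page: the longer of each pair of opposite edges."""
    tl, tr, br, bl = ordered
    width = int(max(np.linalg.norm(br - bl), np.linalg.norm(tr - tl)))
    height = int(max(np.linalg.norm(tr - br), np.linalg.norm(tl - bl)))
    return width, height

# Hypothetical approxPolyDP output for a slightly tilted page, corners in arbitrary order
corners = np.array([[[310, 40]], [[20, 30]], [[40, 420]], [[330, 400]]])
ordered = order_points(corners)
print(ordered.tolist())       # [[20.0, 30.0], [310.0, 40.0], [330.0, 400.0], [40.0, 420.0]]
print(target_size(ordered))   # (290, 390)
```

With `dst = np.array([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]], dtype=np.float32)`, the matrix comes from `M = cv2.getPerspectiveTransform(ordered, dst)` and the upright page from `cv2.warpPerspective(image, M, (w, h))`.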
Improve image quality for OCR in Python by applying brightness and contrast adjustments, adaptive thresholding, grayscale conversion, and border removal.
Combine image processing steps to prepare pictures for OCR by applying grayscale, blurring, edge detection, finding the largest contour, applying perspective transform, and adaptive thresholding for clean OCR input.
Apply preprocessing to isolate license plate regions in car images: convert to grayscale, apply a bilateral filter and Canny edge detection, extract the text with OCR, and verify it against a database.
Use Python OCR to perform text recognition, preprocess images with gray, blur, and edge detection, detect license plates, filter alphanumeric characters, and annotate results on the image.
Increase the image size by 20% and apply thresholding to improve image quality for OCR in Python. Rerun the OCR to achieve clearer and correct recognition and demonstrate preprocessing benefits.
Within the area of Computer Vision is the sub-area of Optical Character Recognition (OCR), which aims to transform images into text. OCR can be described as converting images containing typed, handwritten or printed text into characters that a machine can understand. It makes it possible to convert scanned or photographed documents into text that can be edited in any tool, such as Microsoft Word. A common application is automatic form reading, in which you send a photo of your credit card or driver's license and the system reads all your data without you having to type it manually. A self-driving car can use OCR to read traffic signs, and a parking lot can grant access by reading the license plates of cars!
To introduce you to this area, in this course you will learn in practice how to use OCR libraries to recognize text in images and videos, with all the code implemented step by step in the Python programming language! We are going to use Google Colab, so you do not have to worry about installing libraries on your machine, as everything will be developed online using Google's GPUs! You will also learn how to build your own OCR from scratch using Deep Learning and Convolutional Neural Networks! Below you can check the main topics of the course:
Recognition of texts in images and videos using Tesseract, EasyOCR and EAST
Search for specific terms in images using regular expressions
Techniques for improving image quality, such as: thresholding, color inversion, grayscale, resizing, noise removal, morphological operations and perspective transformation
EAST architecture and EasyOCR library for better performance in natural scenes
Training an OCR from scratch using TensorFlow and modern Deep Learning techniques, such as Convolutional Neural Networks
Application of natural language processing techniques in the texts extracted by OCR (word cloud and named entity recognition)
License plate reading
These are just some of the main topics! By the end of the course, you will know everything you need to create your own text recognition projects using OCR!