Intelligently Extract Text & Data from Document with OCR NER
What you'll learn
- Develop and train a Named Entity Recognition (NER) model
- Extract not only text from an image but also entities from a business card
- Develop a business card scanner like ABBYY from scratch
- High-level data preprocessing techniques for natural language problems
- Real-time NER apps
Requirements
- Should be at least a beginner in Python
- Understand aggregation techniques with Pandas DataFrames
- Read and write images with OpenCV and draw rectangles on an image
- Understand HTML and Bootstrap
Welcome to the course "Intelligently Extract Text & Data from Document with OCR NER"!
In this course you will learn how to develop a customized Named Entity Recognizer. The main idea of the course is to extract entities from scanned documents such as invoices, business cards, shipping bills and bills of lading. For the sake of data privacy, we restrict our examples to business cards, but you can apply the framework explained here to all kinds of financial documents. The curriculum we follow to develop the project is given below.
To develop this project we will use two main technologies in data science:
Computer Vision
Natural Language Processing
In the Computer Vision module, we will scan the document, identify the location of the text and finally extract the text from the image. Then, in the Natural Language Processing module, we will extract the entities from the text, do the necessary text cleaning and parse the entities from the text.
Python Libraries used in Computer Vision Module.
Python Libraries used in Natural Language Processing
As we are combining two major technologies to develop the project, we divide the course into several stages of development to make it easier to follow.
Stage -1: We will set up the project by installing the necessary requirements.
Stage -2: We will do data preparation. That is, we will extract text from images using Pytesseract and do the necessary cleaning.
Overview of Pytesseract
Extract Text from All Images
Clean and Prepare Text
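The cleaning step in Stage 2 could look roughly like the sketch below. It is only an illustration, assuming the OCR output has already been split into word strings. The helper names `clean_token` and `clean_tokens` and the set of punctuation characters kept in `KEEP` are assumptions made for this example, not the course's exact code.

```python
import string

# Assumption: these characters carry meaning on a business card
# (emails, URLs, phone numbers), so they are kept; other punctuation is dropped.
KEEP = "@.,:/-+&"
DROP_PUNCT = "".join(ch for ch in string.punctuation if ch not in KEEP)

_WS_TABLE = str.maketrans("", "", string.whitespace)
_PUNCT_TABLE = str.maketrans("", "", DROP_PUNCT)


def clean_token(token):
    """Remove whitespace and unwanted punctuation from one OCR token."""
    token = str(token)
    token = token.translate(_WS_TABLE)
    token = token.translate(_PUNCT_TABLE)
    return token


def clean_tokens(tokens):
    """Clean every token and drop the ones that end up empty."""
    cleaned = (clean_token(t) for t in tokens)
    return [t for t in cleaned if t]


raw = [" John ", "(Doe)", "||", "john.doe@example.com", "+1-555-0100"]
print(clean_tokens(raw))
```

Keeping characters such as `@` and `+` is a design choice: dropping them would make emails and phone numbers harder to recognize later.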
Stage -3: We will see how to label NER data using BIO tagging.
Manual Labeling with the BIO Technique
B - Beginning
I - Inside
O - Outside
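As a small illustration of the B / I / O scheme above, the sketch below tags a list of tokens from entity spans given as token indices. The function name `bio_tags` and the labels `NAME`, `DES` and `EMAIL` are hypothetical choices for this example. In the course the labeling is done manually.

```python
def bio_tags(tokens, entities):
    """Assign a BIO tag to every token.

    tokens   -- list of token strings, in reading order
    entities -- list of (start, end, label) token-index spans, end exclusive
    """
    tags = ["O"] * len(tokens)          # O: token is outside any entity
    for start, end, label in entities:
        tags[start] = f"B-{label}"      # B: first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"      # I: later tokens inside the entity
    return list(zip(tokens, tags))


tokens = ["John", "Doe", "Data", "Scientist", "john.doe@example.com"]
entities = [(0, 2, "NAME"), (2, 4, "DES"), (4, 5, "EMAIL")]
for token, tag in bio_tags(tokens, entities):
    print(f"{token}\t{tag}")
```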
Stage -4: We will further clean the text and preprocess the data to train the machine learning model.
Prepare Training Data for spaCy
Convert Data into spaCy Format
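One possible way to do the conversion in Stage 4 is sketched below. It turns BIO-tagged tokens into the character-offset layout `(text, {"entities": [(start, end, label)]})` that spaCy training examples are commonly built from. The function name `to_spacy_example` is hypothetical, and merging B-/I- tokens into a single span is an assumption of this sketch. Depending on the spaCy version, the result may still need to be converted to a binary `.spacy` file before training.

```python
def to_spacy_example(tagged_tokens):
    """Join tokens with single spaces and return (text, {"entities": [...]}).

    tagged_tokens -- list of (token, bio_tag) pairs
    Each entity is (start_char, end_char, label), end exclusive.
    """
    pieces, entities = [], []
    pos = 0          # character offset of the next token in the joined text
    current = None   # [start_char, end_char, label] of the open entity
    for token, tag in tagged_tokens:
        start, end = pos, pos + len(token)
        prefix, _, label = tag.partition("-")
        if prefix == "I" and current and current[2] == label:
            current[1] = end                      # extend the open entity
        else:
            if current:
                entities.append(tuple(current))   # close the previous entity
            current = [start, end, label] if prefix in ("B", "I") else None
        pieces.append(token)
        pos = end + 1                             # +1 for the joining space
    if current:
        entities.append(tuple(current))
    return " ".join(pieces), {"entities": entities}


tagged = [
    ("John", "B-NAME"), ("Doe", "I-NAME"),
    ("Data", "B-DES"), ("Scientist", "I-DES"),
    ("|", "O"),
    ("john.doe@example.com", "B-EMAIL"),
]
text, annotations = to_spacy_example(tagged)
for start, end, label in annotations["entities"]:
    print(label, repr(text[start:end]))
```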
Stage -5: With the preprocessed data, we will train the Named Entity Recognition model.
Configuring NER Model
Train the model
Stage -6: We will predict the entities using the NER model and create a data pipeline for parsing text.
Render and Serve with displaCy
Draw Bounding Box on Image
Parse Entities from Text
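The parsing and bounding-box steps in Stage 6 can be sketched in plain Python as follows. The example assumes each word comes with a predicted BIO tag and a word-level box given as `left`, `top`, `width` and `height` (the column names Pytesseract's `image_to_data` reports). The function name `group_entities` and the sample values are hypothetical.

```python
def group_entities(words):
    """Merge consecutive B-/I- tagged words into entities with one box each.

    words -- list of dicts with keys: text, tag, left, top, width, height
    Returns a list of dicts with keys: label, text, box,
    where box is (x1, y1, x2, y2) in pixel coordinates.
    """
    entities = []
    current = None                      # entity currently being extended
    for w in words:
        prefix, _, label = w["tag"].partition("-")
        if prefix not in ("B", "I"):
            current = None              # an O tag closes any open entity
            continue
        x1, y1 = w["left"], w["top"]
        x2, y2 = x1 + w["width"], y1 + w["height"]
        if prefix == "I" and current is not None and current["label"] == label:
            bx1, by1, bx2, by2 = current["box"]
            current["text"] += " " + w["text"]
            # Grow the box so it covers both the old box and the new word.
            current["box"] = (min(bx1, x1), min(by1, y1), max(bx2, x2), max(by2, y2))
        else:
            current = {"label": label, "text": w["text"], "box": (x1, y1, x2, y2)}
            entities.append(current)
    return entities


words = [
    {"text": "John", "tag": "B-NAME", "left": 40, "top": 30, "width": 60, "height": 20},
    {"text": "Doe", "tag": "I-NAME", "left": 108, "top": 28, "width": 50, "height": 24},
    {"text": "|", "tag": "O", "left": 170, "top": 30, "width": 4, "height": 20},
    {"text": "john.doe@example.com", "tag": "B-EMAIL", "left": 40, "top": 70, "width": 210, "height": 18},
]
for entity in group_entities(words):
    print(entity["label"], entity["text"], entity["box"])
```

Each resulting box can then be drawn on the image, for example with OpenCV's `cv2.rectangle(image, (x1, y1), (x2, y2), color, thickness)`.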
Finally, we will put it all together and create a document scanner app.
Are you ready?
Let's start developing the Artificial Intelligence project.
Who this course is for:
- Anyone who wants to Develop Business Card Reader App
- Data Scientists, Analysts and Python Developers who want to enhance their skills in NLP
I am Gusksra, working in Data Science with a demonstrated history of working in the information technology and services industry. I am skilled in Machine Learning, Deep Learning and statistical algorithms, and have mostly worked on Image Processing and Natural Language Processing applications. I have also successfully deployed many data science projects as services on cloud platforms such as AWS and Google Cloud.
We're a team of Machine Learning experts and AI developers working together to advance the state of the art in artificial intelligence. You will hear from us when new courses are released, when we answer Q&A, and more.
We are here to help you stay on the cutting edge of Data Science and Technology.
Data Science Anywhere Team