Natural Language Processing for Text Summarization

Name: Natural Language Processing for Text Summarization
Rating: 4.6 (409 reviews)

Understand the basic theory and implement three algorithms step by step in Python! Implementations from scratch!

Created by Jones Granatyr, AI Expert Academy

Last updated 4/2023

English

What you'll learn

Understand the theory and mathematical calculations of text summarization algorithms
Implement the following summarization algorithms step by step in Python: frequency-based, distance-based and the classic Luhn algorithm
Use the following libraries for text summarization: sumy, pysummarization and BERT summarizer
Summarize articles extracted from web pages and feeds
Use the NLTK and spaCy libraries and Google Colab for your natural language processing implementations
Create HTML visualizations for the presentation of the summaries

Course content

6 sections • 44 lectures • 4h 56m total length

Course content • 7:04
Introduction to natural language processing • 4:37
Explore the fundamentals of natural language processing and its applications, including text summarization, speech transcription, neural machine translation, chatbots, Q&A, and captioning for images and videos.
Source code and slides • 0:07

Plan of attack • 1:42
Algorithm - intuition • 11:58
Preprocessing the texts 1 • 4:56
Preprocessing the texts 2 • 11:04
Word frequency • 3:58
Implement word frequency by tokenizing preprocessed text, computing a frequency distribution, and building a dictionary of unique words and their counts.
Weighted word frequency • 3:08
Compute and normalize word frequencies to weight words in a text summarization algorithm, using the highest frequency and a for loop to derive weighted frequencies, preparing for sentence tokenization.
Sentence tokenization • 5:00
Generating the summary • 10:22
Visualizing the summary in HTML • 5:41
Extracting texts from the Internet • 5:31
Function to summarize the texts • 8:36
This lecture builds a summarize function that pre-processes text, weighs words by frequency, scores sentences, and returns the best sentences as a concise article summary.
Function to visualize the results • 3:41
Summarizing multiple texts • 7:50
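
The frequency-based lectures above describe a pipeline: count words, divide each count by the highest count, score sentences by summing the weights of their words, and keep the best sentences. A minimal sketch of that idea follows. It uses only the Python standard library rather than NLTK, and the function names, the tiny stop-word list, the regex tokenizers, and the choice to return sentences in their original order are all assumptions for illustration, not the course's actual code.

```python
import heapq
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real list (e.g. NLTK's) is much larger.
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in", "it",
              "that", "this", "for", "on", "with", "as", "by"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text):
    """Count every token that is not a stop word."""
    return Counter(w for w in tokenize(text) if w not in STOP_WORDS)


def weighted_frequencies(freq):
    """Divide each count by the highest count, so weights fall in (0, 1]."""
    if not freq:
        return {}
    highest = max(freq.values())
    return {word: count / highest for word, count in freq.items()}


def split_sentences(text):
    """Naive sentence tokenizer: split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, n_sentences=2):
    """Return the n highest-scoring sentences, kept in their original order."""
    weights = weighted_frequencies(word_frequencies(text))
    sentences = split_sentences(text)
    # A sentence's score is the sum of the weights of its words.
    scores = {i: sum(weights.get(w, 0.0) for w in tokenize(s))
              for i, s in enumerate(sentences)}
    best = heapq.nlargest(n_sentences, scores, key=scores.get)
    return [sentences[i] for i in sorted(best)]
```

Calling `summarize(article_text, n_sentences=3)` on a longer article returns its three highest-scoring sentences. Because scores are plain sums, longer sentences tend to score higher; dividing by sentence length is one common adjustment.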

Plan of attack • 3:54
Outline the plan of attack for the Luhn algorithm in text summarization with Python, a frequency-based method with more involved calculations, covering the intuition, word clouds, named-entity extraction, and the scoring equation.
Preparing the environment • 5:10
Implementation 1 • 11:00
Implementation 2 • 12:30
Implementation 3 • 15:24
Implementation 4 • 12:30
Implementation 5 • 6:11
Extracting texts from the Internet • 6:59
Reading articles from RSS feeds • 12:27
Word cloud • 7:49
Extracting named entities • 5:14
Summarizing articles from feed • 6:42
Summary in HTML files • 7:06
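
The scoring equation mentioned in the Luhn plan of attack is commonly stated as follows: within a sentence, group significant words into clusters in which neighbouring significant words are separated by at most a few insignificant words, then score each cluster as the square of its number of significant words divided by the cluster's total length in words. The sentence takes the score of its best cluster. The sketch below follows that classic formulation; the function name, the `max_gap=4` default, and the way significant words are passed in are assumptions, and the course's own implementation may differ in detail.

```python
def luhn_score(sentence_words, significant, max_gap=4):
    """Score one sentence as the best cluster's (significant words)^2 / cluster length.

    sentence_words: list of tokens in the sentence.
    significant:    set of words considered significant (e.g. frequent non-stop words).
    max_gap:        assumed limit on insignificant words between two significant
                    words that still belong to the same cluster.
    """
    positions = [i for i, w in enumerate(sentence_words) if w in significant]
    if not positions:
        return 0.0

    best = 0.0
    start = prev = positions[0]  # first and last significant word of the current cluster
    count = 1                    # significant words in the current cluster
    for pos in positions[1:]:
        if pos - prev - 1 <= max_gap:
            count += 1           # close enough: extend the current cluster
        else:
            # Close the current cluster and start a new one at this word.
            best = max(best, count ** 2 / (prev - start + 1))
            start = pos
            count = 1
        prev = pos
    # Score the final cluster.
    return max(best, count ** 2 / (prev - start + 1))
```

For example, a sentence whose only cluster holds 3 significant words spread over 7 words scores 9/7. Ranking all sentences by this score and keeping the top few yields the summary.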

Requirements

Programming logic
Basic Python programming

Description

Natural Language Processing (NLP) is a subarea of Artificial Intelligence that aims to make computers capable of understanding human language, both written and spoken. Some examples of practical applications are: translation between languages, text-to-speech and speech-to-text conversion, chatbots, automatic question and answer systems (Q&A), automatic generation of descriptions for images, generation of subtitles for videos, sentiment classification of sentences, among many others! Another important application is automatic document summarization, which consists of generating summaries of texts. Suppose you need to read a 50-page article but do not have enough time to read the full text. In that case, you can use a summarization algorithm to generate a summary of the article. The size of this summary can be adjusted: you can transform 50 pages into only 20 pages that contain only the most important parts of the text!

With this in mind, this course presents the theory and, above all, the practical implementation of three text summarization algorithms: (i) frequency-based, (ii) distance-based (cosine similarity with PageRank) and (iii) the famous and classic Luhn algorithm, which was one of the first efforts in this area. During the lectures, we will implement each of these algorithms step by step using modern technologies, such as the Python programming language, the NLTK (Natural Language Toolkit) and spaCy libraries, and Google Colab, so you will not need to install or configure any software on your local machine.
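
To give a feel for the distance-based approach (cosine similarity with PageRank), here is a minimal from-scratch sketch: each sentence becomes a word-count vector, every pair of sentences is compared with cosine similarity, and a simple PageRank power iteration ranks sentences by how similar they are to the rest of the text. All names, the damping factor of 0.85, and the fixed 50 iterations are assumptions for illustration; the course may well rely on a graph library instead, and this simplified PageRank does not redistribute the rank of sentences that are similar to nothing.

```python
import math
import re
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def pagerank(sim, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration over a similarity matrix."""
    n = len(sim)
    if n == 0:
        return []
    ranks = [1.0 / n] * n
    row_sums = [sum(row) for row in sim]
    for _ in range(iterations):
        new_ranks = []
        for j in range(n):
            # Rank flowing into sentence j, weighted by normalized similarity.
            incoming = sum(ranks[i] * sim[i][j] / row_sums[i]
                           for i in range(n) if row_sums[i])
            new_ranks.append((1 - damping) / n + damping * incoming)
        ranks = new_ranks
    return ranks


def rank_sentences(sentences):
    """Return one PageRank score per sentence; higher means more central."""
    vectors = [Counter(re.findall(r"[a-z']+", s.lower())) for s in sentences]
    n = len(sentences)
    # Pairwise similarity matrix with a zero diagonal (no self-links).
    sim = [[cosine_similarity(vectors[i], vectors[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    return pagerank(sim)
```

A summary is then built by keeping the sentences with the highest scores. In practice, stop words would be removed before building the vectors so that common words do not inflate the similarities.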

In addition to implementing the algorithms, you will also learn how to extract news from blogs and feeds, as well as generate interesting visualizations of the summaries using HTML! After implementing the algorithms from scratch, an additional module shows how to use dedicated libraries to summarize documents, such as sumy, pysummarization and BERT summarizer. At the end of the course, you will know everything you need to create your own summarization algorithms! If you have never heard about text summarization, this course is for you! On the other hand, if you are already experienced, you can use this course to review the concepts.

Who this course is for:

People interested in natural language processing and text summarization
People interested in the spaCy and NLTK libraries
Students who are studying subjects related to Artificial Intelligence
Data Scientists who want to increase their knowledge in natural language processing
Professionals interested in developing text summarization solutions
Beginners who are starting to learn natural language processing

Course content

Introduction • 3 lectures • 12min

Frequency-based algorithm • 13 lectures • 1hr 23min

Luhn algorithm • 13 lectures • 1hr 53min

Cosine similarity • 7 lectures • 57min

Libraries for text summarization • 6 lectures • 28min

Final remarks • 2 lectures • 3min
