A Comprehensive Guide to NLTK in Python: Volume 1
3.6 (4 ratings)
Instead of using a simple lifetime average, Udemy calculates a course's star rating by considering a number of different factors such as the number of ratings, the age of ratings, and the likelihood of fraudulent ratings.
28 students enrolled

Tokenizing Text in Python for Natural Language Processing
Created by Mike West
Last updated 5/2017
  • 1 hour on-demand video
  • 7 Articles
  • 1 Supplemental Resource
  • Full lifetime access
  • Access on mobile and TV
What Will I Learn?
  • You'll understand tokenization in NLTK at a very deep level.
  • You'll understand and, more importantly, be able to tokenize any part of a corpus.
  • You'll learn how to graphically represent arcane concepts like lemmas and synsets in NLTK.
  • You'll receive a completed Jupyter Notebook with the complete code and annotations for the course.
Requirements
  • Familiarity with Python will help you in this course.
  • A basic understanding of the terminology of machine learning would also be beneficial.

Recent Course Review:

"Great course! The things that Mike taught are practical and can be applied in the real world immediately."  -- Ricky Valencia

Welcome to A Comprehensive Guide to NLTK in Python: Volume 1

This is the very FIRST course in a series of courses that will focus on NLTK.

The Natural Language Toolkit (NLTK) is a comprehensive Python library for natural language processing and text analytics.

Note: This isn't a model-building course. This course is laser-focused on a very specific part of natural language processing called tokenization.

This is the first part in a series of courses crafted to help you master NLP. This course will cover the basics of tokenizing text and using WordNet.

Tokenization is a method of breaking up a piece of text into smaller pieces, such as sentences and words, and is an essential first step for recipes in the later courses. WordNet is a dictionary designed for programmatic access by natural language processing systems.
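
To make the idea of tokenization concrete, here is a minimal sketch in plain Python. It is not NLTK's own tokenizer, and the helper names split_sentences and split_words are hypothetical, chosen only for this illustration. The rules are deliberately naive (for instance, an abbreviation such as "Dr." would be split incorrectly), which is exactly the kind of case a library tokenizer is built to handle.

```python
import re

def split_sentences(text):
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    # Naive rule for illustration only: abbreviations like "Dr." will be split wrongly.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def split_words(sentence):
    """Split a sentence into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)

text = "Tokenization breaks text into pieces. Sentences come first! Then words."
sentences = split_sentences(text)
tokens = [split_words(s) for s in sentences]

print(sentences)  # three sentences
print(tokens[0])  # ['Tokenization', 'breaks', 'text', 'into', 'pieces', '.']
```

The two-level result (a list of sentences, each a list of word tokens) mirrors the sentence-then-word order described above.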

NLTK was originally created in 2001 as part of a computational linguistics course in the Department of Computer and Information Science at the University of Pennsylvania.

We will take Natural Language Processing — or NLP for short — in a wide sense to cover any kind of computer manipulation of natural language. At one extreme, it could be as simple as counting word frequencies to compare different writing styles.
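
As a rough illustration of the simple end of that range, the sketch below counts word frequencies in two short samples using only the Python standard library. The sample texts and the helper name word_frequencies are made up for this example; a real style comparison would need far more text and some normalization of the counts.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Return a Counter of lowercased words found in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

# Two invented samples standing in for texts by different writers.
sample_a = "The cat sat on the mat. The cat slept."
sample_b = "A dog ran across a field and a dog barked."

freq_a = word_frequencies(sample_a)
freq_b = word_frequencies(sample_b)

print(freq_a.most_common(2))  # [('the', 3), ('cat', 2)]
print(freq_b.most_common(2))  # [('a', 3), ('dog', 2)]
```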

At the other extreme, NLP involves "understanding" complete human utterances, at least to the extent of being able to give useful responses to them.

Technologies based on NLP are becoming increasingly widespread. For example, phones and handheld computers support predictive text and handwriting recognition; web search engines give access to information locked up in unstructured text; machine translation allows us to retrieve texts written in Chinese and read them in Spanish; text analysis enables us to detect sentiment in tweets and blogs.

Jupyter notebook is a web app that allows you to write and annotate Python code interactively. It's a great way to experiment, do research, and share what you are working on.

In this course, all of the tutorials will be created using Jupyter notebooks. In the preview lessons we install Python. Check them out. They are completely free.

By providing more natural human-machine interfaces, and more sophisticated access to stored information, language processing has come to play a central role in the multilingual information society.

Thanks for your interest in A Comprehensive Guide to NLTK in Python: Volume 1

Who is the target audience?
  • If you're interested in Natural Language Processing then this course is for you.
Curriculum For This Course
Course Introduction
9 Lectures 21:54

What's the course about? 

What is NLTK? 

Let's learn in this introductory lesson. 

Preview 01:43

Let's find out what the course is about. 

What are you going to learn?

Preview 01:22

The entire course is centered around the concept of tokenization.

Let's define what that is in this lecture. 

Preview 02:44

Let's look at predictive modeling from an everyday-life point of view.

Preview 01:21

This Q&A with me will help you decide if this course is for you. 

I want you to take my course but only if it's right for you. 

Preview 02:47

This is where you'll download the Jupyter Notebook for the course. 


In this lesson let's install Python. 

Installing Python 3.X

Our IDE of choice for learning Python and machine learning is a Jupyter Notebook.

Let's walk through the basics in this lesson.

Anatomy of a Jupyter Notebook


5 questions
12 Lectures 34:50

This diagram will help you visualize how NLTK uses tokenizers. 

Tokenization Hierarchy in NLTK

Let's demo how to tokenize our sentences. 

Sentence Tokenization

Regular expressions are a lot more complicated but will give you granular control over your text. 

Tokenization Using Regular Expressions
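
A small sketch of the kind of granular control a pattern gives you, written with Python's built-in re module rather than the course's own code. The pattern and the helper name regex_tokenize are assumptions made for this example; NLTK's RegexpTokenizer class accepts a pattern in a similar way.

```python
import re

# Prices such as "$10.50" stay whole, and so do contractions such as "Can't".
TOKEN_PATTERN = re.compile(r"\$?\d+(?:\.\d+)?|\w+(?:'\w+)?")

def regex_tokenize(text):
    """Return every substring of text that matches TOKEN_PATTERN."""
    return TOKEN_PATTERN.findall(text)

print(regex_tokenize("Can't pay $10.50 now"))
# ["Can't", 'pay', '$10.50', 'now']
```

A plain whitespace split gives the same result for this sentence, but the pattern keeps working when punctuation is attached to the words, because anything it does not match is simply dropped.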

Let's define a few stop words and learn how to filter them out in our corpus. 

Stop Words
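
Filtering with a hand-defined stop word list can be sketched as follows. The stop word set, the sample sentence, and the helper name remove_stop_words are all made up for this illustration. NLTK also ships prebuilt lists through nltk.corpus.stopwords, which require a one-time corpus download.

```python
# A tiny, hand-picked stop word set for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "of", "in", "and", "to"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens whose lowercase form is not a stop word."""
    return [t for t in tokens if t.lower() not in stop_words]

tokens = "The quick brown fox is in the garden".split()
print(remove_stop_words(tokens))
# ['quick', 'brown', 'fox', 'garden']
```

Comparing on the lowercase form means "The" and "the" are both filtered, while the surviving tokens keep their original casing.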

Let's define what Synsets are and how to use them. 

Synsets in WordNet

Let's define what a lemma is and how to use it.

Pronounced like "emma." 

Lemmas in WordNet

Let's define some antonyms.

Lemmas and Antonyms

Let's calculate similarity between synsets in WordNet.

Calculating WordNet Synset Similarity

Let's define what bigrams are and learn about word collocations. 

Word Collocations
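
A bigram is simply a pair of adjacent tokens, and counting which pairs recur is a first rough signal of collocation. The sketch below uses only the standard library; the sample sentence and the helper name make_bigrams are invented for this example. NLTK offers nltk.bigrams and BigramCollocationFinder for the same purpose, with proper association measures instead of raw counts.

```python
from collections import Counter

def make_bigrams(tokens):
    """Return the list of adjacent token pairs."""
    return list(zip(tokens, tokens[1:]))

tokens = "new york is big and new york is busy".split()
pair_counts = Counter(make_bigrams(tokens))

# Pairs that occur more than once are candidate collocations.
repeated = {pair: n for pair, n in pair_counts.items() if n > 1}
print(repeated)
```

In this toy sentence, ("new", "york") and ("york", "is") each occur twice. Raw counts alone cannot tell a genuine collocation from a pair of merely frequent words, which is why scoring measures are normally used.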


10 questions

Congratulations and Thank You.

Bonus Lecture "Deep Learning"
About the Instructor
Mike West
4.1 Average rating
2,601 Reviews
42,924 Students
40 Courses
SQL Server and Machine Learning Evangelist

I've been a production SQL Server DBA most of my career.

I've worked with databases for over two decades. I've worked for or consulted with over 50 different companies as a full-time employee or consultant, including Fortune 500 firms as well as several small to mid-size companies. Some include: Georgia Pacific, SunTrust, Reed Construction Data, Building Systems Design, NetCertainty, The Home Shopping Network, SwingVote, Atlanta Gas and Light, and Northrop Grumman.

Experience, education and passion

I learn something almost every day. I work with insanely smart people. I'm a voracious learner of all things SQL Server and I'm passionate about sharing what I've learned. My area of concentration is performance tuning. SQL Server is like an exotic sports car: it will run just fine in anyone's hands, but put it in the hands of a skilled tuner and it will perform like a race car.


Certifications are like college degrees: they are a great starting point for learning. I'm a Microsoft Certified Database Administrator (MCDBA), Microsoft Certified System Engineer (MCSE) and Microsoft Certified Trainer (MCT).


Born in Ohio, raised and educated in Pennsylvania, I currently reside in Atlanta with my wife and two children.