Recent Course Review:
"Great course! The things that Mike taught are practical and can be applied in the real world immediately." -- Ricky Valencia
Welcome to A Comprehensive Guide to NLTK in Python: Volume 1
This is the very FIRST course in a series of courses that will focus on NLTK.
Natural Language Toolkit (NLTK) is a comprehensive Python library for natural language processing and text analytics.
Note: This isn't a model-building course. This course is laser-focused on a very specific part of natural language processing called tokenization.
This is the first part in a series of courses crafted to help you master NLP. This course will cover the basics of tokenizing text and using WordNet.
Tokenization is a method of breaking a piece of text into smaller pieces, such as sentences and words, and it is an essential first step for the recipes in later courses. WordNet is a lexical database of English designed for programmatic access by natural language processing systems.
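To make tokenization concrete, here is a minimal sketch of sentence and word tokenization using only Python's re module. The helpers simple_sent_tokenize and simple_word_tokenize are hypothetical stand-ins written for this illustration; NLTK's own sent_tokenize and word_tokenize (in nltk.tokenize) rely on a pretrained Punkt sentence model that has to be downloaded with nltk.download first, and they handle cases this naive version does not.

```python
import re

# Sentence boundary: whitespace that follows ., ! or ? (a deliberately naive rule).
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")

# A word is a run of word characters, optionally with one apostrophe part ("isn't");
# every other non-space character becomes its own punctuation token.
WORD_OR_PUNCT = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def simple_sent_tokenize(text: str) -> list[str]:
    """Hypothetical helper: split text into sentences on terminal punctuation."""
    return [s for s in SENTENCE_BOUNDARY.split(text.strip()) if s]


def simple_word_tokenize(sentence: str) -> list[str]:
    """Hypothetical helper: split a sentence into word and punctuation tokens."""
    return WORD_OR_PUNCT.findall(sentence)


text = "Tokenization is the first step. It splits text into pieces! Isn't that useful?"
for sentence in simple_sent_tokenize(text):
    print(simple_word_tokenize(sentence))
# ['Tokenization', 'is', 'the', 'first', 'step', '.']
# ['It', 'splits', 'text', 'into', 'pieces', '!']
# ["Isn't", 'that', 'useful', '?']
```

Splitting on terminal punctuation is the simplest possible rule: it wrongly breaks "Mr. Smith arrived." into two sentences, which is exactly the kind of case a trained sentence tokenizer is meant to handle.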
NLTK was originally created in 2001 as part of a computational linguistics course in the Department of Computer and Information Science at the University of Pennsylvania.
We will take Natural Language Processing (NLP for short) in a wide sense, to cover any kind of computer manipulation of natural language. At one extreme, it could be as simple as counting word frequencies to compare different writing styles.
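Counting word frequencies needs nothing beyond the standard library. The sketch below assumes a crude regex tokenizer and two invented text snippets standing in for two writers; NLTK provides nltk.FreqDist, a subclass of collections.Counter, for the same job.

```python
import re
from collections import Counter


def word_frequencies(text: str) -> Counter:
    """Count lowercase word tokens (letters and apostrophes) in a text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def relative_frequency(counts: Counter, word: str) -> float:
    """Fraction of all tokens that are `word`; 0.0 for an empty text."""
    total = sum(counts.values())
    return counts[word] / total if total else 0.0


# Two invented snippets standing in for two writers' styles.
author_a = "The cat sat on the mat. The mat was red, and the cat was happy."
author_b = "A dog ran. A dog barked. Dogs run and dogs bark."

freq_a = word_frequencies(author_a)
freq_b = word_frequencies(author_b)

print(freq_a.most_common(1))  # [('the', 4)]
print(f"'the': {relative_frequency(freq_a, 'the'):.3f} vs {relative_frequency(freq_b, 'the'):.3f}")
# 'the': 0.267 vs 0.000
```

The comparison uses relative rather than raw frequencies so that texts of different lengths can be compared fairly; looking at how often each writer uses common function words such as "the" and "a" is one simple way to start comparing styles.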
At the other extreme, NLP involves "understanding" complete human utterances, at least to the extent of being able to give useful responses to them.
Technologies based on NLP are becoming increasingly widespread. For example, phones and handheld computers support predictive text and handwriting recognition; web search engines give access to information locked up in unstructured text; machine translation allows us to retrieve texts written in Chinese and read them in Spanish; text analysis enables us to detect sentiment in tweets and blogs.
A Jupyter notebook is a web app that allows you to write and annotate Python code interactively. It's a great way to experiment, do research, and share what you are working on.
All of the tutorials in this course are built in Jupyter notebooks. The preview lessons, which are completely free, walk through installing Python, so check them out.
By providing more natural human-machine interfaces, and more sophisticated access to stored information, language processing has come to play a central role in the multilingual information society.
Thanks for your interest in A Comprehensive Guide to NLTK in Python: Volume 1.
What's the course about?
What is NLTK?
Let's learn in this introductory lesson.
Let's find out what the course is about.
What are you going to learn?
The entire course is centered around the concept of tokenization.
Let's define what that is in this lecture.
Let's look at predictive modeling from an everyday point of view.
This Q&A with me will help you decide if this course is for you.
I want you to take my course, but only if it's right for you.
This is where you'll download the Jupyter Notebook for the course.
In this lesson let's install Python.
Our IDE of choice for learning Python and machine learning is a Jupyter Notebook.
Let's walk through the basics in this lesson.
This diagram will help you visualize how NLTK uses tokenizers.
Let's demo how to tokenize our sentences.
Regular expressions are a lot more complicated, but they give you granular control over how your text is split.
Let's define a few stop words and learn how to filter them out in our corpus.
Let's define what Synsets are and how to use them.
Let's define what a lemma is and how to use it.
Rhymes with "Emma."
Let's look up some antonyms.
Let's calculate word similarity using WordNet.
Let's define what bigrams are and learn about word collocations.
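The last few lessons chain together into a small pipeline: tokenize with a regular expression, filter out stop words, then pair neighboring words into bigrams and look for pairs that recur (collocations). The plain-Python sketch below assumes its own tiny stop word list and ranks collocations by raw frequency only; in NLTK, nltk.bigrams and nltk.collocations.BigramCollocationFinder cover the same steps and can also score pairs with association measures such as PMI via BigramAssocMeasures.

```python
import re
from collections import Counter

# A few hand-picked stop words (an assumption for this sketch; NLTK ships
# fuller lists in its stopwords corpus, which must be downloaded first).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "on"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Drop every token that appears in STOP_WORDS."""
    return [t for t in tokens if t not in STOP_WORDS]


def bigrams(tokens: list[str]) -> list[tuple[str, str]]:
    """Pair each token with the token that follows it."""
    return list(zip(tokens, tokens[1:]))


def top_collocations(tokens: list[str], n: int = 3, min_count: int = 2):
    """Rank bigrams by raw frequency, keeping those seen at least min_count times."""
    counts = Counter(bigrams(tokens))
    return [(pair, c) for pair, c in counts.most_common() if c >= min_count][:n]


text = (
    "Machine learning is fun. Machine learning on text is natural language processing. "
    "Natural language processing of text is machine learning too."
)
tokens = remove_stop_words(tokenize(text))
print(top_collocations(tokens))
# [(('machine', 'learning'), 3), (('natural', 'language'), 2), (('language', 'processing'), 2)]
```

Because stop words are removed before pairing, a bigram such as ("learning", "fun") bridges the dropped word "is". NLTK's collocation finder lets you filter after pairing instead, with apply_word_filter.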
I've been a production SQL Server DBA most of my career.
I've worked with databases for over two decades, for or with over 50 different companies as a full-time employee or consultant, ranging from Fortune 500 firms to several small and mid-size companies. Some of them are Georgia Pacific, SunTrust, Reed Construction Data, Building Systems Design, NetCertainty, The Home Shopping Network, SwingVote, Atlanta Gas and Light, and Northrop Grumman.
Experience, education and passion
I learn something almost every day. I work with insanely smart people. I'm a voracious learner of all things SQL Server, and I'm passionate about sharing what I've learned. My area of concentration is performance tuning. SQL Server is like an exotic sports car: it will run just fine in anyone's hands, but put it in the hands of a skilled tuner and it will perform like a race car.
Certifications are like college degrees: they are great starting points for learning. I'm a Microsoft Certified Database Administrator (MCDBA), Microsoft Certified Systems Engineer (MCSE) and Microsoft Certified Trainer (MCT).
Born in Ohio, raised and educated in Pennsylvania, I currently reside in Atlanta with my wife and two children.