Hands-on NLP with NLTK and Scikit-learn
Rating: 3.9 out of 5(18 ratings)
125 students

A complete Python guide to Natural Language Processing to build spam filters, topic classifiers, and sentiment analyzers
Last updated 10/2019
English

What you'll learn

  • Build end-to-end Natural Language Processing solutions, ranging from getting data for your model to presenting its results.
  • Core NLP concepts such as tokenization, stemming, and stop word removal.
  • Use open source libraries such as NLTK, scikit-learn, and spaCy to perform routine NLP tasks.
  • Classify emails as spam or not-spam using basic NLP techniques and simple machine learning models.
  • Put documents in their relevant topics using techniques such as TF-IDF, SVMs, and LDAs.
  • Common text data processing steps to increase the performance of your machine learning models.
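As a rough illustration of the TF-IDF weighting mentioned above, here is a minimal pure-Python sketch. It assumes the textbook form `tf * log(N / df)`; the function name `tf_idf` and the sample documents are made up for this example and are not taken from the course. scikit-learn's `TfidfVectorizer` uses a smoothed variant by default, so its numbers will differ.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumes the plain formula tf * log(N / df), not scikit-learn's
    smoothed default.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical three-document corpus, already tokenized.
docs = [
    "win a free prize now".split(),
    "meeting agenda for monday".split(),
    "free prize inside".split(),
]
scores = tf_idf(docs)
```

Terms that appear in fewer documents (such as "win") receive a higher weight than terms shared across documents (such as "free"), which is what makes the weights useful as classifier features.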

Course content

6 sections • 30 lectures • 2h 46m total length
  • The Course Overview (2:10)

    This video gives an overview of the entire course.

  • Use Python, NLTK, spaCy, and Scikit-learn to Build Your NLP Toolset (6:40)

    In this video, we will learn how to set up a stack of libraries for natural language processing.

    • Use Python for machine learning

    • Learn how NLTK and spaCy fit into natural language processing

    • Learn how scikit-learn fits into natural language processing

  • Reading a Simple Natural Language File into Memory (5:57)

    In this video, we will learn how to load textual data into Python to perform NLP.

    • Use iterators to read large text files

    • Speed up text file input-output with multiprocessing
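One way to read a large text file with an iterator is a generator that yields one line at a time, so the whole file never has to sit in memory. This is a minimal sketch, not the course's code: the helper name `read_lines` and the temporary demo file are assumptions for illustration, and the multiprocessing speed-up is not shown.

```python
import os
import tempfile


def read_lines(path, encoding="utf-8"):
    """Yield stripped, non-empty lines one at a time."""
    with open(path, encoding=encoding) as handle:
        for line in handle:  # file objects iterate lazily, line by line
            line = line.strip()
            if line:
                yield line


# Demo on a small temporary file (hypothetical content).
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("Free prize inside!\n\nMeeting at noon.\n")
    demo_path = tmp.name

lines = list(read_lines(demo_path))
os.remove(demo_path)
```

Because `read_lines` is a generator, a caller can also loop over it directly and process each line as it arrives instead of building a list.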

  • Split the Text into Individual Words with Regular Expression (6:25)

    In this video, we will be capturing each word in a corpus as a feature.

    • Split lines of text into word tokens with the split function

    • Explore a better tokenizer with regular expressions
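The difference between the two tokenizers can be sketched as follows. The sample sentence and the pattern `WORD_RE` are assumptions chosen for this example; the course may use a different expression.

```python
import re

text = "Don't miss out: win $1,000 today!"

# str.split() only breaks on whitespace, so punctuation stays attached.
split_tokens = text.split()

# Assumed pattern: runs of letters, optionally with an internal apostrophe.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")
regex_tokens = WORD_RE.findall(text)
```

With this pattern, `out:` and `today!` lose their punctuation, and the purely numeric token `$1,000` is dropped altogether. Whether numbers should be kept is a design choice that depends on the task.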

  • Converting Words into Lists of Lower Case Tokens (4:00)

    In this video, we will learn to remove effects of capitalization in our analysis.

    • Combine what we have learned to read a text file and process it

    • Split the corpus into case-insensitive tokens
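A minimal sketch of case-insensitive tokenization: lowercase each line first, then extract word tokens. The helper name `lowercase_tokens`, the pattern, and the two-line corpus are hypothetical and used only for illustration.

```python
import re

# Assumed pattern: lowercase letters, optionally with an internal apostrophe.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def lowercase_tokens(lines):
    """Lowercase each line, then collect its word tokens into one list."""
    tokens = []
    for line in lines:
        tokens.extend(WORD_RE.findall(line.lower()))
    return tokens


corpus = ["Free PRIZE inside", "free prize Inside"]
tokens = lowercase_tokens(corpus)
```

After lowercasing, "PRIZE" and "prize" map to the same token, so they count as one feature rather than two.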

  • Removing Uncommon Words and Stop Words (6:35)

    In this video, we will learn to remove noise caused by stop words and uncommon words.

    • Remove uncommon words

    • Learn about stop words

    • Remove uncommon words using the collections module
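The filtering step described in the last lecture above can be sketched with `collections.Counter`. The stop word set here is a tiny made-up list and the `min_count` threshold is an assumption; NLTK ships a fuller English list in `nltk.corpus.stopwords`, which needs a one-time download.

```python
from collections import Counter

# Tiny illustrative stop word list, not a real-world one.
STOP_WORDS = {"a", "the", "to", "is", "for"}


def filter_tokens(tokens, min_count=2, stop_words=STOP_WORDS):
    """Drop stop words and tokens seen fewer than min_count times."""
    counts = Counter(tokens)
    return [
        token for token in tokens
        if counts[token] >= min_count and token not in stop_words
    ]


tokens = "the prize is free claim the free prize today".split()
kept = filter_tokens(tokens)
```

Here "the" is removed as a stop word even though it occurs twice, while "claim" and "today" are removed because they occur only once.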

Requirements

  • Prior programming experience with Python is assumed along with being comfortable dealing with machine learning terms such as supervised learning, regression, and classification. No prior Natural Language Processing or text mining experience is needed.

Description

There is an overflow of text data online nowadays. As a Python developer, you need to create a new solution using Natural Language Processing for your next project. Your colleagues depend on you to monetize gigabytes of unstructured text data. What do you do?

Hands-on NLP with NLTK and scikit-learn is the answer. This course puts you right on the spot, starting off with building a spam classifier in our first video. At the end of the course, you are going to walk away with three NLP applications: a spam filter, a topic classifier, and a sentiment analyzer. There is no need for fancy mathematical theory, just plain English explanations of core NLP concepts and how to apply those using Python libraries.

Taking this course will help you create new applications with Python and NLP. You will be able to build working solutions backed by machine learning and NLP models with ease.

This course uses Python 3.6, TensorFlow 1.4, NLTK 2, and scikit-learn 0.19. While these are not the latest versions available, the course provides relevant and informative content for legacy users of NLP with NLTK and scikit-learn.

About the Author

Colibri Ltd is a technology consultancy company founded in 2015 by James Cross and Ingrid Funie. The company works to help its clients navigate the rapidly changing and complex world of emerging technologies, with deep expertise in areas such as big data, data science, machine learning, and cloud computing. Over the past few years, it has worked with some of the world's largest and most prestigious companies, including a tier 1 investment bank, a leading management consultancy group, and one of the world's most popular soft drinks companies, helping each of them to make better sense of its data and process it in more intelligent ways. The company lives by its motto: Data -> Intelligence -> Action.

Rudy Lai is the founder of QuantCopy, a sales acceleration startup using AI to write sales emails to prospects. By taking in leads from your pipelines, QuantCopy researches them online and generates sales emails from that data. It also has a suite of email automation tools to schedule, send, and track email performance, key analytics that all feed back into how our AI generates content.

Prior to founding QuantCopy, Rudy ran HighDimension.IO, a machine learning consultancy, where he experienced first-hand the frustrations of outbound sales and prospecting. As a founding partner, he helped startups and enterprises with HighDimension.IO's Machine-Learning-as-a-Service, allowing them to scale up data expertise in the blink of an eye.

In the first part of his career, Rudy spent 5+ years in quantitative trading at leading investment banks such as Morgan Stanley. This valuable experience allowed him to witness the power of data, but also the pitfalls of automation using data science and machine learning. Quantitative trading was also a great platform from which to learn deeply about reinforcement learning and supervised learning topics in a commercial setting.

Rudy holds a Computer Science degree from Imperial College London, where he was part of the Dean's List, and received awards such as the Deutsche Bank Artificial Intelligence prize.

Who this course is for:

  • This course is for developers, data scientists, and programmers who want to learn about practical Natural Language Processing with Python in a hands-on way. Developers who have an upcoming project that needs NLP, or a pile of unstructured text data on their hands, and don't know what to do with it, will find this course useful.