Text Mining with Machine Learning and Python

Name: Text Mining with Machine Learning and Python
Rating: 4.0 (88 reviews)

Get high-quality information from your text using Machine Learning with Tensorflow, NLTK, Scikit-Learn, and Python

Created by Packt Publishing

Last updated 5/2018

English

What you'll learn

Refine and clean your text
Extract important data from text
Classify text into types
Apply modern ML and DL techniques to text
Work with pre-trained models
Learn the important text mining processes
Analyze text in the best and most effective way

Course content

6 sections • 31 lectures • 2h 26m total length

The Course Overview • 4:45
This video gives an overview of the entire course.
Understanding Modern-Day Text Mining • 4:56
There is often a misconception about the difference between experimental data science and real-world data science, as practiced by companies building an actual product. This video clarifies the distinction and sets a baseline for the videos that follow.
        Establish that there is a difference
        Clarify on which fronts these differences exist
        Explain the path we are going to take
Exploring Your Text Mining Toolbox • 3:26
We do not yet have an overview of the tools used in this course. This video lays them out.
        Explore general tools, such as programming language and IDE
        Discover data science tools
        Find out the NLP and text tools
Setting Up Your Working Environment • 3:35
Some of the tools we use require extra downloads or installs. We walk you through the steps so that you are fully equipped to tackle the coding examples.
        Set up a virtual environment
        Install and download extra packages
        Test if everything works fine
A Short Rundown of the Topics We Will Cover • 1:56
Our system is up and running, but we have not yet said what we intend to do with it.
        Find out what to do with data science tools
        Find out what to do with machine learning tools
        Find out what to do with NLP tools

Understanding Text Data Sources • 3:39
This video highlights the types of data a text mining data scientist might encounter, and where to find text data to start working with.
        Explain how data sources differ
        Explain the uses for each type
        Point to some sources for each type
Cleaning Messy Text • 4:52
Under the motto "garbage in, garbage out", we look at some often-used pre-processing steps.
        Explain why we need to clean text
        Explain which aspects can be cleaned
        Explain why the order of the steps matters
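
A minimal sketch of such a cleaning pipeline, using only the Python standard library. The function name `clean_text`, the particular steps, and their order are illustrative assumptions, not the steps shown in the video.

```python
import html
import re
import unicodedata


def clean_text(raw: str) -> str:
    """Apply a few common cleaning steps in a fixed order (illustrative only)."""
    text = html.unescape(raw)                    # 1. decode entities such as &amp;
    text = re.sub(r"<[^>]+>", " ", text)         # 2. strip HTML tags
    text = re.sub(r"https?://\S+", " ", text)    # 3. drop URLs while they are still intact
    text = unicodedata.normalize("NFKC", text)   # 4. normalize Unicode forms
    text = text.lower()                          # 5. lowercase
    text = re.sub(r"[^a-z0-9\s']", " ", text)    # 6. replace punctuation and symbols
    return re.sub(r"\s+", " ", text).strip()     # 7. collapse whitespace


print(clean_text("<p>Check   THIS out: https://example.com &amp; enjoy!</p>"))
# prints: check this out enjoy
```

The order matters here: if punctuation were removed (step 6) before URLs were dropped (step 3), the URL would be split into fragments such as "https" and "com" that the URL pattern no longer matches.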
Tokenization, POS Tagging, and Lemmatization • 6:18
A key aspect of transforming text into numeric features is correctly building your corpus of words and word features. This video explains the appropriate techniques.
        Find out what tokenization is
        Learn what POS tagging is
        Find out what lemmatization is
Dealing with N-Grams • 8:38
N-grams often play an important part in further refining our corpus. This video explains how to approach them.
        Tackle the problem in a non-intelligent way
        Use chunking to be more precise
        Use statistical approaches to handle the problem
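
Two of these approaches can be sketched in plain Python: a naive sliding window that emits every n-gram, and a statistical filter that scores bigrams by pointwise mutual information (PMI). The toy sentence, the helper names `ngrams` and `pmi_scores`, and the choice of PMI as the statistic are assumptions made for illustration; chunking is not shown because it needs a POS tagger.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Naive approach: return every contiguous run of n tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def pmi_scores(tokens, min_count=2):
    """Statistical approach: score each frequent bigram by PMI (log base 2)."""
    unigrams = Counter(tokens)
    bigrams = Counter(ngrams(tokens, 2))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable estimates
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


tokens = ("new york is big and new york is loud and "
          "the park is big and the park is loud").split()

print(ngrams(tokens[:4], 2))
# prints: [('new', 'york'), ('york', 'is'), ('is', 'big')]

scores = pmi_scores(tokens)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
    print(pair, round(score, 2))
```

In this toy corpus, pairs whose words almost always appear together (such as "new york") score higher than pairs involving a frequent word like "is", which is the property that makes PMI useful for picking out meaningful n-grams.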

Word Search Versus Entity Extraction • 3:21
This video gives a quick intro to the information extraction problem. First up is a very important difference between an ML approach and a text search approach.
        Find out what information extraction is
        Differentiate between search-based and machine learning-based approaches
Named Entity Recognition (NER) • 4:22
A deeper dive into the world of named entity recognition, the machine learning approach to information extraction. It is important to know how this approach works.
        Explain what Named Entity Recognition is
        Explain the types of approaches and models
        Explain how to choose the correct approach
Using Pre-Trained Models • 6:19
For many common entity types, pre-trained models exist that can be used off the shelf. But when to use them, for which entities, and with which models often remains unclear. This video explains.
        Lay out the various popular entities
        Lay out the popular models to use
        Demonstrate how to use them and when they fall short
Training Your Own NER • 13:58
Often, pre-trained entity types are just not enough. This video will demonstrate what to do in such a scenario.
        Discuss why you sometimes need to train your own model
        Train using spaCy
        Train using PyCRF
Deep Learning Approach to NER • 5:18
State-of-the-art results can be obtained using more advanced deep learning models. This video gently explores that option, along with the opportunities and pitfalls.
        Explore the next step in NER – deep learning
        Find out when it does (not) make sense
        Demonstrate the approach and encourage the viewer to explore it further

Feature Representation • 6:58
This video gives an introduction to the problem of text classification, along with the first step in the process – representing your text in a mathematical vector for use in a learning algorithm.
        Find out what text classification is
        Explain why mathematical feature representation is needed
        Differentiate between various representation techniques
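
As a rough illustration of two common representation techniques, the sketch below builds bag-of-words and TF-IDF vectors by hand. The three toy documents and the plain, unsmoothed idf formula are assumptions for this example; library implementations such as Scikit-Learn's `TfidfVectorizer` smooth and normalize by default, so their numbers will differ.

```python
import math
from collections import Counter

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
tokenized = [doc.split() for doc in docs]
# The vocabulary fixes the meaning of each vector position.
vocab = sorted({tok for doc in tokenized for tok in doc})


def bow_vector(tokens):
    """Bag of words: one raw count per vocabulary term."""
    counts = Counter(tokens)
    return [counts[term] for term in vocab]


def tfidf_vector(tokens):
    """TF-IDF: term frequency weighted by how rare the term is across documents."""
    counts = Counter(tokens)
    n_docs = len(tokenized)
    vector = []
    for term in vocab:
        tf = counts[term] / len(tokens)
        df = sum(1 for doc in tokenized if term in doc)
        idf = math.log(n_docs / df)  # plain, unsmoothed idf (an assumption)
        vector.append(tf * idf)
    return vector


print(vocab)
print(bow_vector(tokenized[0]))
# prints: [0, 1, 0, 0, 0, 0, 1, 1, 1, 2]
print([round(x, 3) for x in tfidf_vector(tokenized[0])])
```

In the first document, "the" has the highest raw count, but its TF-IDF weight is lower than that of "cat" because "the" also appears in another document.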
Machine Learning Algorithms for Text Classification • 2:36
There are many algorithms and techniques available to tackle the problem. This video guides the student in selecting the right one for his or her problem.
        Explain how to divide your problem
        Lay out which algorithms work well and which don’t
        Explain model parameters you need to take into account
Setting Up a Basic Text Classifier • 8:06
A dive into a coded example that starts from a popular dataset, applies what we have learned so far at each step, and ends with a full working example.
        Demonstrate feature representation and data preparation
        Demonstrate model setup
        Demonstrate evaluation
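
The three steps above can be sketched with Scikit-Learn as follows. This is not the course's example: the tiny hand-written dataset stands in for the popular dataset mentioned in the description, and the choice of `CountVectorizer` with `MultinomialNB` is an assumption made to keep the sketch short.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy data (hypothetical), standing in for a real labelled corpus.
train_texts = [
    "the match ended with a late goal",
    "the team won the league title",
    "the striker scored a goal",
    "the bank raised interest rates",
    "stocks fell as the market closed",
    "investors bought shares in the bank",
]
train_labels = ["sports", "sports", "sports", "finance", "finance", "finance"]

test_texts = [
    "the team scored a late goal",
    "the bank bought shares as interest rates fell",
]
test_labels = ["sports", "finance"]

# Feature representation and model setup, chained into a single pipeline.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

# Evaluation on held-out examples.
predictions = model.predict(test_texts)
print(list(predictions))
print("accuracy:", accuracy_score(test_labels, predictions))
```

Wrapping the vectorizer and the classifier in one pipeline keeps the vocabulary learned during training attached to the model, so the same transformation is applied at prediction time.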
Pitfalls and Rules of Thumb • 3:51
In our previous example, there were many choices to make and hyperparameters to tune. This video gives an overview of them.
        Explain why and how to do hyperparameter tuning
        Explain other choices
        Provide neat tips and tricks
Putting Classifiers into Production • 3:54
This step is often omitted in other courses, but since this course aims to provide a real-world view of machine learning on text, we spend some time on putting classifiers into production.
        Explain why it matters to think about this
        Explain what you need to think about
        Provide other tips and tricks
Deep Learning Approach to Text Classification • 3:33
State-of-the-art results can be obtained using more advanced deep learning models. This video gently explores that option, along with the opportunities and pitfalls.
        Lay out the general thinking approach
        Explore CNNs
        Explore other options

What Are Word Embeddings? • 4:49
This video gives an introduction to the concept of word embeddings – the history and the use cases.
        Introduce word embeddings
        Explain how they became important
        Explain the use cases
Main Techniques • 3:35
Moving deeper into Word2Vec (with skip-grams and CBOW) and GloVe to better clarify and structure the main techniques used.
        Explain the differences between GloVe, Word2Vec, and FastText
        Explain the difference between skip-grams and CBOW
        Point to pre-trained word embeddings
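
The difference between skip-grams and CBOW comes down to what is predicted from what. The sketch below only builds the training examples each variant would see; it does not train a model. The function names and the window size are illustrative assumptions.

```python
def skipgram_pairs(tokens, window=2):
    """Skip-gram: predict each context word from the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """CBOW: predict the center word from all of its context words at once."""
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, center))
    return examples


tokens = "the quick brown fox".split()

print(skipgram_pairs(tokens, window=1))
# prints: [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'),
#          ('brown', 'quick'), ('brown', 'fox'), ('fox', 'brown')]  (on one line)

print(cbow_examples(tokens, window=1))
# prints: [(['quick'], 'the'), (['the', 'brown'], 'quick'),
#          (['quick', 'fox'], 'brown'), (['brown'], 'fox')]  (on one line)
```

Skip-gram produces one example per (center, context) pair, while CBOW produces one example per position with the whole context as input.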
Training a Word2Vec Model • 5:58
We now know what a word embedding is, but not yet how to use one. The first step is training a word embedding model.
        Introduce the gensim package to help us
        Demonstrate how to train a Word2Vec model with it
        Demonstrate how to train a FastText model with it
Visualizing a Trained Word Embedding Model • 4:35
To demonstrate some of the powerful aspects of word embeddings, we will try to visualize one.
        Explain the problem: dimensionality reduction
        Introduce and use t-SNE
        Demonstrate some key word embedding strengths
X2Vec • 3:43
The power of word embeddings is increasingly being applied in other domains, and to things other than just words. This video teases some of these possibilities.
        Explain the transferability of word vectors
        Explain expansion one – from word to other aspects
        Explain expansion two – embeddings in other domains

Stitching It All Together • 2:51
We saw a lot of concepts introduced in the previous sections. To consolidate, we will try to stitch everything together in one overview.
        Introduce an example to map everything to
        Map it in terms of the various sections
        Place introduced tools on overview
Topic Modelling • 2:47
One method we have not yet seen is topic modelling, which can be useful for some problems. Building on the tools and methods we already know, it is easy to introduce.
        Illustrate what topic modelling is
        Illustrate the use cases for topic modelling
        Illustrate the tools you can use
Text Generation • 4:58
From here on, the more exotic topics will be addressed, starting with text generation. Using neural network architectures, models can be trained to predict sequences in the style of the training corpora.
        Introduce the use case of text generation
        Quickly browse the neural network types used for this
        Illustrate with a fun example
Machine Translation • 4:26
One of the large areas where neural networks have revolutionized NLP is machine translation. This video quickly touches upon this topic.
        Introduce SMT and NMT
        Touch upon encoder-decoder networks
        Point to further reading and examples
Further Reading • 1:29
From here on, there are a number of resources you can use to deepen your knowledge. Finding the right path on your own can be difficult, so this video gives some pointers to get started.
        Point to recommended books
        Point to recommended blog repos
        Point to Kaggle
Closing • 2:35
Again, we saw a large number of different topics, both in this section and the previous ones. In this final video, we try to structure everything in one large overview.
        Place the basic NLP topics on the overview
        Place the advanced NLP/ML topics on the overview
        Place the exotic NLP/ML topics on the overview

Requirements

Basic knowledge of Python, Machine Learning, and Data Science is required.

Description

Text is one of the most actively researched and widespread types of data in the Data Science field today. New advances in machine learning and deep learning techniques now make it possible to build fantastic data products on text sources, and exciting new text data sources pop up all the time. You'll build your own toolbox of know-how, packages, and working code snippets so you can perform your own text mining analyses.

You'll start by understanding the fundamentals of modern text mining and move on to some exciting processes involved in it. You'll learn how machine learning is used to extract meaningful information from text and the different processes involved in it. You will learn to read and process text features. Then you'll learn how to extract information from text and work on pre-trained models, while also delving into text classification, and entity extraction and classification. You will explore the process of word embedding by working on Skip-grams, CBOW, and X2Vec with some additional and important text mining processes. By the end of the course, you will have learned and understood the various aspects of text mining with ML and the important processes involved in it, and will have begun your journey as an effective text miner.

About the Author

Thomas Dehaene is a Data Scientist at FoodPairing, a Belgium-based Food Technology scale-up that uses advanced concepts in Machine Learning, Natural Language Processing, and AI in general to capture meaning and trends from food-related media. He obtained his Master of Science degree in Industrial Engineering and Operations Research at Ghent University, before moving his career into Data Analytics and Data Science, in which he has been active for the past 5 years. In addition to his day job, Thomas is also active in numerous Data Science-related activities such as Hackathons, Kaggle competitions, Meetups, and citizen Data Science projects.

Who this course is for:

This course targets Data Scientists who need to obtain a basic set of skills in the field of text analysis, and Citizen Data Scientists who want to get up and running with text mining.


Course content

Getting Started with Text Mining • 5 lectures • 19min

Reading and Processing Text Features • 4 lectures • 23min

Extracting from Text • 5 lectures • 33min

Classification of Text • 6 lectures • 29min

Word Embeddings • 5 lectures • 23min

Other ML Topics with Text • 6 lectures • 19min
