Udemy
Natural Language Processing:Concept along with Case Study
Rating: 4.6 out of 5 (213 ratings)
8,195 students

Free Course: Natural Language Processing (NLP), Text Processing, Machine Learning, Spam Filter [Python]
Created by Rishi Bansal
Last updated 6/2020
English

What you'll learn

  • Various text processing techniques and their implementation in Python.
  • Case Study: the role of hashing in a spam filter, compared with CountVectorizer.

Course content

3 sections • 19 lectures • 1h 31m total length
  • What is Natural Language Processing (NLP) (4:49)

    •NLP: Natural Language Processing

    •is a subfield of linguistics, computer science, information engineering, and AI

    •deals with the interactions between computers and human languages

    •how to program computers to process and analyze large amounts of natural language data

    •computers can read text, hear speech, interpret it, measure sentiment and determine which parts are important

    •App: Optical Character Recognition (OCR), Speech Recognition, Machine Translation, and Chatbots

    •ML algorithms study millions of text examples written by humans

    •Algorithms gain understanding of the context

    •This helps in differentiating between meaning of various texts

  • Tokenization (2:15)

    •The task of breaking a text into pieces called tokens

    Types:

    •Word Tokenization

    •Sentence Tokenization
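
A minimal sketch of both tokenization types, using only Python's standard `re` module. The helper names `word_tokenize` and `sentence_tokenize` and the regular expressions are illustrative assumptions, not the course's own code; real tokenizers handle abbreviations, numbers, and punctuation far more carefully.

```python
import re

def word_tokenize(text):
    """Split text into word tokens (runs of letters, digits, apostrophes)."""
    return re.findall(r"[A-Za-z0-9']+", text)

def sentence_tokenize(text):
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

text = "NLP is fun. It powers chatbots! Does it read text?"
sentences = sentence_tokenize(text)   # one string per sentence
words = word_tokenize(sentences[0])   # word tokens of the first sentence

print(sentences)
print(words)
```

Sentence tokenization is usually applied first, and word tokenization is then run on each sentence.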

  • Stop Words Removal (3:28)

    •Stop words are common words that do not add much meaning to a sentence.

    •They can safely be ignored without sacrificing the meaning of the sentence.

    •A stop word is a commonly used word (such as “the”, “a”, “an”, “in”) that a search engine has been programmed to ignore.
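
Stop word removal can be sketched as a simple filter over tokens. The `STOP_WORDS` set below is a tiny hand-picked list for illustration only (an assumption); NLP libraries ship much longer lists.

```python
# Small illustrative stop word list (assumption); real lists are much longer.
STOP_WORDS = {"the", "a", "an", "in", "is", "of", "to", "and"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only tokens whose lowercase form is not a stop word."""
    return [t for t in tokens if t.lower() not in stop_words]

tokens = "The cat is in the garden".split()
filtered = remove_stop_words(tokens)
print(filtered)
```

Comparison is done on the lowercased token so that "The" and "the" are treated alike.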

  • N-Grams (3:46)

    •An n-gram is a contiguous sequence of n items from a given sample of text or speech.

    E.g.: the word suggestions that appear while typing
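
The definition above translates directly into a sliding window over a token list. The function name `ngrams` is a hypothetical helper introduced for this sketch.

```python
def ngrams(tokens, n):
    """Return all contiguous sequences of n tokens, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "I like natural language processing".split()
bigrams = ngrams(tokens, 2)    # pairs of adjacent words
trigrams = ngrams(tokens, 3)   # triples of adjacent words

print(bigrams)
print(trigrams)
```

A text of k tokens yields k - n + 1 n-grams, so the five tokens above give four bigrams and three trigrams.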

  • Stemming (1:42)

    •Stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base or root form

    E.g.: search engines, which match different forms of a query word
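
To make the idea concrete, here is a deliberately naive suffix-stripping stemmer. The suffix list and the minimum stem length are assumptions chosen for this sketch; established stemmers such as the Porter stemmer apply much more careful rules.

```python
# Suffixes are checked in this order, so "es" is tried before "s".
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

stems = [naive_stem(w) for w in ["jumping", "jumped", "jumps", "jump"]]
print(stems)
```

All four inflected forms reduce to the same stem, which is what lets a search match them against each other.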

  • Word Sense Disambiguation (2:04)

    •Word sense disambiguation (WSD) identifies which sense (i.e. meaning) of a word is used in a sentence when the word has multiple meanings.
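
One simple way to approach WSD is to pick the sense whose typical context words overlap most with the sentence, loosely in the spirit of the Lesk algorithm. The `SENSES` inventory below is a tiny hypothetical one written for this sketch; it is not taken from the course or from any real dictionary.

```python
# Hypothetical sense inventory: each sense maps to words that often accompany it.
SENSES = {
    "bank": {
        "financial institution": {"money", "deposit", "loan", "account"},
        "river side": {"river", "water", "shore", "fishing"},
    }
}

def disambiguate(word, sentence):
    """Return the sense with the largest overlap with the sentence's words."""
    context = set(sentence.lower().split())
    senses = SENSES[word]
    return max(senses, key=lambda s: len(senses[s] & context))

sense_1 = disambiguate("bank", "He sat on the bank of the river fishing")
sense_2 = disambiguate("bank", "She opened an account at the bank to deposit money")
print(sense_1)
print(sense_2)
```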

  • Count Vectorizer (5:34)

    •Provides a simple way to tokenize a collection of text documents, build a vocabulary of known words, and encode new documents using that vocabulary.

    •The same vectorizer can be used on documents that contain words not included in the vocabulary. These words are ignored and no count is given in the resulting vector.

    •Issue: very common words such as “the” dominate the counts

    •Each column represents one word; the count is the frequency of that word

    •Word order is not preserved
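
The behaviour described above can be sketched in plain Python. This is an illustration of the bag-of-words idea, not scikit-learn's `CountVectorizer` implementation; the helper names `fit_vocabulary` and `transform` are assumptions made for this example.

```python
from collections import Counter

def fit_vocabulary(docs):
    """Map each distinct word in the corpus to a column index."""
    words = sorted({w for d in docs for w in d.lower().split()})
    return {w: i for i, w in enumerate(words)}

def transform(doc, vocab):
    """Encode a document as word counts; unknown words are ignored."""
    counts = Counter(doc.lower().split())
    vec = [0] * len(vocab)
    for word, count in counts.items():
        if word in vocab:
            vec[vocab[word]] = count
    return vec

docs = ["the cat sat", "the dog sat on the mat"]
vocab = fit_vocabulary(docs)                   # columns: cat, dog, mat, on, sat, the
known = transform(docs[1], vocab)              # "the" appears twice
unseen = transform("the bird sat", vocab)      # "bird" is not in the vocabulary
print(vocab)
print(known)
print(unseen)
```

Because only counts are stored, "the cat sat" and "sat the cat" produce the same vector, which is the loss of word order noted above.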

  • TF-IDF Vectorizer (7:30)

    •TF-IDF scores are word frequency scores that try to highlight words that are more interesting, e.g. frequent in a document but not across documents.

    •Importance is expressed on a scale from 0 to 1

    Term Frequency: This summarizes how often a given word appears within a document.
    Inverse Document Frequency: This downscales words that appear a lot across documents.

    Advantages:

    •Feature vector is much more tractable in size

    •Both frequency and relevance are captured

    Disadvantages:

    •Context is still not captured
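
A minimal sketch of the two components, assuming the textbook definitions tf = (count of word in document) / (document length) and idf = log(N / number of documents containing the word). Library implementations such as scikit-learn's `TfidfVectorizer` use a smoothed idf and normalize each vector, so their numbers will differ from this sketch.

```python
import math

def tf_idf(docs):
    """Return one {word: tf-idf score} dict per document."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each word appears.
    df = {}
    for tokens in tokenized:
        for word in set(tokens):
            df[word] = df.get(word, 0) + 1

    scores = []
    for tokens in tokenized:
        doc_scores = {}
        for word in set(tokens):
            tf = tokens.count(word) / len(tokens)   # term frequency
            idf = math.log(n_docs / df[word])       # inverse document frequency
            doc_scores[word] = tf * idf
        scores.append(doc_scores)
    return scores

docs = ["the cat sat", "the dog barked", "the cat purred"]
scores = tf_idf(docs)
print(scores[0])
```

In this toy corpus "the" occurs in every document, so its idf is log(1) = 0 and its score vanishes, while "sat", which occurs in only one document, scores highest in the first document.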

  • Hashing Vectorizer (4:21)

    •Issue with counts and frequencies: the vocabulary can become very large

    •Workaround: use a one-way hash of words to convert them to integers

    •No vocabulary is required, and you can choose a fixed-length vector of arbitrary size

    •Downside - no way to convert the encoding back to a word
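
The hashing trick can be sketched with the standard library. The choice of MD5 from `hashlib` and the vector length of 16 are assumptions made for this illustration; scikit-learn's `HashingVectorizer` uses its own hash function and options, which are not reproduced here.

```python
import hashlib

def hash_vectorize(doc, n_features=16):
    """Count words into a fixed-length vector indexed by a hash of each word."""
    vec = [0] * n_features
    for word in doc.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).hexdigest()
        index = int(digest, 16) % n_features   # one-way: index cannot be mapped back
        vec[index] += 1
    return vec

vec = hash_vectorize("free money free prize now")
print(vec)
```

No vocabulary is built or stored, so previously unseen words are encoded just like any others. The trade-off is that different words can collide in the same slot, and a smaller `n_features` makes collisions more likely.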

Requirements

  • Basic Understanding of Python
  • One Laptop with Python IDE installed
  • An understanding of machine learning is helpful for the case study, but not mandatory

Description

This course provides a basic understanding of NLP. Anyone can take it; no prior understanding of NLP is required. Text processing techniques such as tokenization, stop word removal, stemming, different types of vectorizers, and WSD are explained in detail with Python code. The course also covers the difference between CountVectorizer and hashing in a spam filter.

Who this course is for:

  • People who want to learn NLP and build a career in machine learning.