
Explore how natural language processing uses corpora to teach machines with readable, trustworthy text, and learn to access popular NLTK corpora such as the Gutenberg, Brown, and Reuters corpora.
Learn how to access NLTK corpora, including the Gutenberg and Brown corpora, by retrieving raw text, word lists, and sentence lists, and explore categories and file IDs.
Learn to create and analyze conditional frequency distributions with NLTK by grouping words by category, accessing distributions like a dictionary, and tabulating results across genres.
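A conditional frequency distribution can be sketched on a handful of (category, word) pairs. The pairs below are made-up sample data, used here so the example runs without downloading a corpus.

```python
import nltk

# Hypothetical (genre, word) pairs standing in for a real corpus.
pairs = [
    ("news", "can"), ("news", "will"), ("news", "will"),
    ("romance", "could"), ("romance", "can"), ("romance", "could"),
]

# Each condition (genre) gets its own frequency distribution of words.
cfd = nltk.ConditionalFreqDist(pairs)

conditions = sorted(cfd.conditions())

# Dictionary-style access: first the condition, then the word.
news_will = cfd["news"]["will"]
romance_could = cfd["romance"]["could"]
news_could = cfd["news"]["could"]  # a word never seen under this condition counts as 0

# Print a genre-by-word table of counts.
cfd.tabulate(conditions=["news", "romance"], samples=["can", "could", "will"])
```

With the Brown corpus, the same pairs could be generated as `(genre, word) for genre in brown.categories() for word in brown.words(categories=genre)`.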
Explore how lemmatization uses a corpus and WordNet to map words to root forms, revealing synonyms, senses, and meanings.
In this video series, we will start with an introduction to the corpora available through NLTK. Once we have downloaded a corpus and learned different ways to access it, we will move on to a very useful NLP feature called the frequency distribution. In this section, we will see how to calculate, tabulate, and plot the frequency distribution of words. In the next section, we will start learning NLP-specific techniques, which include:
1. Stemming
2. Lemmatization
3. Tokenization
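
As a brief preview of two of these techniques, the sketch below tokenizes a sentence and stems each token. It uses `wordpunct_tokenize` and `PorterStemmer` because neither needs extra data downloads; the course may use different tokenizers or stemmers, and the sample sentence is invented. Lemmatization is left out here because it additionally needs the WordNet data.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

text = "The connected cats kept running."

# Tokenization: split the text into word and punctuation tokens.
tokens = wordpunct_tokenize(text)

# Stemming: strip suffixes with rule-based heuristics (no dictionary lookup).
stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens]

for token, stem in zip(tokens, stems):
    print(token, "->", stem)
```

Stems are not guaranteed to be dictionary words, which is the main practical difference from lemmatization.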