NLTK Introduction

A free video tutorial from Abhishek Kumar
Computer Scientist at Adobe

From the full course:

Natural Language Processing (NLP) with Python and NLTK

Master Natural Language with Python and NLP using Spam Filter detection

03:32:54 of on-demand video • Updated January 2020

Now that we have a basic understanding of NLP, or natural language processing, let's see what the Natural Language Toolkit, or NLTK for short, is. NLTK is a suite of open-source tools created to make NLP processes in Python easier to build. In the last video we saw that NLP has revolutionised many areas, such as part-of-speech tagging, sentence translation, text generation and many more applications.

NLTK includes many built-in functions and libraries, so you don't need to implement everything from scratch. For example, it has a stem module for stemming: given words such as 'codes', 'coded' and 'coding', it reduces them to their root word 'code'. You could write a complex library to do this yourself, but it is already present in NLTK. Similarly for word separation: if you have a sentence or a text that you want to tokenize into a list of words, you don't need to write a custom function yourself; you can just use the tokenize function. There are many more built-in features, such as stop words, and more are being added continuously thanks to its open-source nature. So it's a very useful library, and if you don't know how to use it, anything you want to design or develop will be very slow because you will need to do everything yourself.

Now let's start with the setup for NLTK. First you need to install NLTK. For more details, visit the official NLTK website, which has installation instructions for Linux, Mac and Windows.
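The transcript doesn't say which stemmer it has in mind. As a minimal sketch, the snippet below uses PorterStemmer from NLTK's nltk.stem package, one of several stemmers NLTK ships; the word list is only illustrative. Porter is a rule-based suffix stripper, so related forms don't always collapse to the same root, and no data download is needed to run it.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Illustrative word forms built on the same root.
words = ["code", "codes", "coded", "coding", "coder"]

# stem() applies the Porter suffix-stripping rules to each word.
stems = [stemmer.stem(word) for word in words]

for word, stem in zip(words, stems):
    print(f"{word:>6} -> {stem}")
```

With NLTK's default settings the first four words all become 'code', while 'coder' is left unchanged because the Porter rules only strip "-er" from longer stems.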
Once you have done that, you can import it with import nltk. If you run import nltk after installation and don't get any error, you are good to go.

Once NLTK is installed and imported in your Python code, you can download its data packages and explore them. When you run nltk.download(), a downloader UI appears with a list of packages. If you are working in a Google Colab notebook, you get a text menu instead: d for download, l for list. You can list the packages with l, and to download a package you type d followed by the package identifier, so in Google Colab you can type 'd all' to download all the packages. If you are working from your native command line, IPython or a Jupyter notebook, you get a downloader window where you can select a package and click Download. I will show that in a moment.

Once downloading is done and you want to see which functions and attributes are available in the package, you can run dir(nltk). So you import NLTK, download the packages, and dir(nltk) then prints a list of names such as 'stem' (stemming, as explained above, reduces words to their root words), 'tokenize', and 'pos_tag', which stands for part-of-speech tagging. There are many of these, and I cannot go through all of them in this video series, but you can explore as much as you want; many of them will be useful for the application you are trying to build.

Now let's look at one concrete example: the tokenize function. This is just to give you a flavour of how to use the various components of NLTK. First you import nltk, as we have already done, and then from nltk.tokenize you import word_tokenize.
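The download-and-explore steps above can also be done without the interactive UI, which helps in scripts. A minimal sketch follows; the download calls are left as comments because they need network access, and "punkt" is only an example identifier.

```python
import nltk

# nltk.download() with no arguments opens the interactive downloader
# (a GUI window locally, a text menu with d/l commands in Colab).
# Passing an identifier downloads that package without the UI, e.g.:
#   nltk.download("all")    # every package, as in the video
#   nltk.download("punkt")  # a single package
# Both are commented out here because they need network access.

# dir(nltk) lists everything the package exposes; hide private names.
public_names = [name for name in dir(nltk) if not name.startswith("_")]
print(len(public_names), "public names")

# Check for a few of the entries mentioned in the video.
for name in ("stem", "tokenize", "pos_tag"):
    print(name, name in public_names)
```

Filtering out names that start with an underscore keeps the listing focused on the public modules and functions.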
This splits a sentence into a list of words. For example, if the input text is "I am learning NLP and using NLTK" and you apply word_tokenize to it, the result stored in word_tokens is a list containing every word: 'I', 'am', and so on up to 'NLTK'. So when you print the input text you get the sentence, and when you print word_tokens you get the list.

Let's see all of this in a Jupyter notebook. First I run import nltk. I have already installed NLTK on my system; if you haven't, import nltk will give you an error, so first run pip install nltk and then import the library. I run it and get no error, which means it's installed. Next I want to download the packages and explore what's inside them. When I run nltk.download(), this is the UI I was talking about, listing all the packages. I select all packages and click Download. It takes a while, so I'll pause the video and come back when it's done. Now all the packages have been downloaded, so we can close this window, and the cell returns True.
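The tokenization example described above fits in a few lines. This sketch keeps the variable names used in the video (input_txt, word_tokens). The preserve_line=True argument is an addition not used in the video: it skips the Punkt sentence splitter, which otherwise needs a one-time data download (named "punkt" or "punkt_tab" depending on the NLTK version), and for a single sentence it gives the same tokens as the plain call.

```python
from nltk.tokenize import word_tokenize

input_txt = "I am learning NLP and using NLTK"

# The plain call word_tokenize(input_txt) first splits the text into
# sentences with the downloadable Punkt model. preserve_line=True skips
# that step, which is safe here because the input is one sentence.
word_tokens = word_tokenize(input_txt, preserve_line=True)

print(input_txt)    # the original sentence
print(word_tokens)  # ['I', 'am', 'learning', 'NLP', 'and', 'using', 'NLTK']
```

For multi-sentence text, drop preserve_line and download the Punkt data first so that sentence boundaries are handled properly.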
Let's explore it with dir(nltk), which should list the various functions we have been talking about. You can find 'pos_tag' for part-of-speech tagging, 'stem' for stemming, 'tokenize', and other ready-made functionality. Now let's write our tokenize example. I insert a heading, "Tokenize example", and write the code: from nltk.tokenize import word_tokenize, then input_txt = "I am learning NLP and using NLTK", which is the input text. To tokenize it, we call word_tokenize on the input text and save the result in word_tokens. When I run it, I get an error saying input_txt is not defined; that was a spelling mistake in the variable name, so I fix it and run again. Now let's print input_txt and also print word_tokens. The first is the input sentence, and word_tokens contains the list of words. So with just two lines of code, using NLTK's ready-made word_tokenize function, you have tokenized this entire sentence. That shows how powerful NLTK is, and here you can see the extensive list of all the functions available.

That was just a brief introduction to NLTK. In the following videos, we will see more about NLTK in Python. Thank you.