Natural Language Processing in R for Beginners

Rating: 4.6 (22 reviews)

Learn NLP in R with our easy to understand videos and free textbook!

Created by Michael Bunting, Corydon Baylor, Caitlin Kennedy, Hassan Mahmood

Last updated 2/2021

English

English [Auto]

What you'll learn

Access Text Data from APIs with jsonlite
Scrape the Web Using rvest
Import Data from Twitter and Wikipedia
Find Patterns using Regex
Manipulate and Clean Data Using tidytext and tm
Measure Emotion with Sentiment Analysis
Surface Meaning with Topic Modeling
Provide Context with Parts of Speech Tagging and Named Entity Recognition
Quantify Relationships with Word Embeddings

Course content

14 sections • 70 lectures • 3h 21m total length

What To Expect • 1:20
Learn natural language processing in R for beginners by mining web data, preparing it with tidytext, and analyzing it with sentiment analysis, topic modeling, parts of speech tagging, named entity recognition, and word embeddings.
How To Use Course Textbook • 0:42
Download the lp site zip from course resources, unzip it, open the folder, and click index.html to study the HTML textbook as a standalone alternative to video lessons.

What is an API? • 1:50
Learn how application programming interfaces let computers talk to each other and retrieve data for natural language processing, with terms like client, resource, endpoint, and GET or POST requests.
GET Requests • 1:09
Make your first GET requests with the PokeAPI (pokeapi.co) to fetch Pokemon data from the web, and learn how JSON data structures that look complex become clear.
API Call in R Studio with jsonlite • 2:48
Load the jsonlite package, transform JSON into R lists, and call an API in R Studio to retrieve Pokemon data, then loop through the results and tidy them into a data frame.
API Keys • 1:26
Discover how API keys and tokens control access to data, acting like passwords you must keep private, and learn to obtain, activate, and paste your key into your request URL.
Hands-On Practice with Pokemon API • 2:42
Practice using the Pokemon API to fetch name, height, weight, base experience, and primary type for several Pokemon, then combine the results into a data frame.
Hands-On Practice with OMDb API • 1:00
Create film vectors, populate them with film info inside a loop, and combine them into a data frame; manage a private API key variable for the OMDb API (omdbapi.com).
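
The course does this work in R with jsonlite. As a language-neutral illustration only, the "parse JSON, loop through the results, build rows" pattern described above might look like the following Python sketch. The payload is invented for the example and is not a real API response; the field names are assumptions.

```python
import json

# Invented sample payload, shaped loosely like a list of Pokemon records.
# A real API response would have more fields and a different structure.
raw = """
[
  {"name": "pikachu", "height": 4, "weight": 60},
  {"name": "bulbasaur", "height": 7, "weight": 69}
]
"""

records = json.loads(raw)  # JSON text -> Python list of dicts

# Loop through the results and keep only the fields of interest,
# producing one row per record (a minimal stand-in for a data frame).
rows = []
for rec in records:
    rows.append({
        "name": rec["name"],
        "height": rec["height"],
        "weight": rec["weight"],
    })

for row in rows:
    print(row)
```

Each dictionary in rows plays the role of one data frame row; in the course the equivalent step is done with R lists and data frames.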

Applying for Twitter Developer Account • 3:35
Learn how to apply for a Twitter developer account and access the API v2 free sandbox, including selecting account type, answering policy questions, submitting the application, and email verification.
Getting Twitter API Keys • 1:42
Learn how to obtain and store the five Twitter API credentials (Twitter app name, API key, API secret key, access token, and access token secret) through the developer portal.
Searching for Tweets and Getting Timelines • 3:53
Learn to use the rtweet package to search tweets and retrieve timelines via Twitter's API, including token creation, API keys, and basic filtering for verified or non-verified users.
Hands-On Practice with rtweet • 2:42
Practice pulling tweets with rtweet, plot retweet-to-favorite ratios for Dwayne "The Rock" Johnson in a scatterplot, and practice topic search with basic data cleaning on Twitter data.

Introduction to Web Scraping and rvest • 1:47
Learn to pull data from websites using rvest in R, inspect HTML structure, and transform scraped text, such as transcripts or news articles, into a tidy data frame for analysis.
Scraping a Table from Webpage • 2:29
Learn to fetch HTML from a web page, use XPath to target the Wikipedia table of the 2016 Summer Olympics, and convert it into a clean data frame in R.
Scraping Data from Unstructured Sources • 3:29
Learn to scrape movie titles from unstructured HTML using read_html and CSS selectors, extracting text from lister-item-header elements and saving titles as a vector or data frame.
Hands-On Practice with rvest • 2:48
Scrape a 2017 movie list by reading the HTML with rvest, extracting rank, title, runtime, genre, and primary genre, and assemble them into a data frame for analysis.

Introduction to getwiki • 1:41
Discover how the getwiki package simplifies obtaining Wikipedia text in R with tidy output, and learn how to install it from GitHub via the devtools package.
Getting Text from Wikipedia Article • 3:08
Learn to fetch Wikipedia article text with get_wiki, save it as strings or a data frame, and prepare it for tidytext in this beginner natural language processing in R course.
Searching for Articles • 2:41
Learn how to use a search function to return multiple articles based on a term, retrieve 20 results with titles and content, and fetch full articles with get_wiki.
Hands-On Practice with getwiki • 1:47
Practice using the getwiki workflow to search for the Soviet Union, retrieve 20 articles, and feed the results into get_wiki to explore data size and insights.

Introduction to Regex • 1:13
Learn to use regex (regular expressions) in R for text data cleaning and pattern matching within tidytext workflows; understand patterns and shorthand for matching multiple words.
Introduction to stringr • 4:17
Explore stringr basics for beginners, including detecting, extracting, matching, and replacing patterns. Learn when to use the replace-all variant and how the extract and match functions return lists versus vectors.
Basic Pattern Matching in Regex • 2:13
Explore how regex uses the period, the backslash, and character classes like \w, \d, and \s to match words such as walk and talk in R, and how escaping a period works.
Anchors and Optional Characters • 3:22
Explore anchors in regex with start and end patterns in R, and use optional characters and grouping to flexibly match text, including Mr. X variants.
Ranges and Repeating Patterns • 4:04
Explore using ranges and repeating patterns with brackets and braces to match specific character sets and extract data such as phone numbers.
Hands-On Practice with stringr • 4:57
Practice stringr techniques to detect phone numbers in text messages, filter fraud data, and handle optional formats like parentheses and dashes in R.
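
The course teaches these patterns with stringr in R. As a rough illustration of the same regex ideas (optional characters, character classes, and repetition with braces), here is a minimal Python sketch that detects phone numbers with optional parentheses and dashes. The pattern and the sample messages are assumptions made up for this example, not the course's own exercise data.

```python
import re

# \(? and \)?  -> optional opening/closing parenthesis
# \d{3}        -> exactly three digits (braces repeat a pattern)
# [-\s]?       -> optional dash or whitespace (brackets define a set)
# -?           -> optional dash before the last four digits
PHONE = re.compile(r"\(?\d{3}\)?[-\s]?\d{3}-?\d{4}")

# Invented sample messages for illustration.
messages = [
    "Call me at (555) 123-4567 tonight",
    "New number: 555-987-6543",
    "Or 5551234567",
    "No digits here",
]


def find_phone(text):
    """Return the first phone-like match in text, or None if there is none."""
    match = PHONE.search(text)
    return match.group(0) if match else None


found = [find_phone(m) for m in messages]
print(found)
```

The regex syntax itself carries over to R largely unchanged, although backslashes must be doubled inside ordinary R strings.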

Introduction to Tokenization • 1:01
Learn how to prepare text data for computing by performing tokenization, removing stop words, and text normalization such as stemming or lemmatization, using unigrams and bigrams.
Tokenizing n-grams • 3:05
Download the Metamorphosis text from Gutenberg, import it in R, build a text data frame, and tokenize into unigrams and bigrams to analyze word patterns.
Removing Stop Words • 2:03
Remove stop words to reveal meaningful context in text data; the lecture demonstrates filtering stop words from a data frame and shows the before-and-after impact.
Stemming • 2:39
Apply stemming with the SnowballC package to reduce words to root forms, as shown with the word "love". Note limitations such as overstemming and imperfect grouping that can misrepresent words.
Lemmatization • 3:37
Lemmatization applies language-aware text normalization to derive root words that belong to the language, using the lemmatize_words function from the textstem library.
Cleaning n-grams • 2:24
Learn to clean text in R by tokenizing, filtering stop words, and limiting to single words in a data frame, then reconstruct and tokenize into n-grams.
Hands-On Practice with Preparing Text Data • 3:13
Download Heart of Darkness from Gutenberg via the gutenberg_download function, tokenize the text into trigrams, remove stop words, and convert to root forms; group by gutenberg_id and reconstruct for visualization.
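
The course performs these steps with tidytext in R. The following Python sketch only illustrates the underlying ideas of tokenizing into unigrams, removing stop words, and rebuilding n-grams from the cleaned tokens. The tiny stop word list, the sample sentence, and the helper names tokenize and ngrams are assumptions made up for this example.

```python
import re

# Tiny illustrative stop word list; real lists are much longer.
STOP_WORDS = {"the", "on", "and", "a", "of"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (unigrams)."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Join every run of n consecutive tokens into one n-gram string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


text = "The cat sat on the mat and the dog sat too"

tokens = tokenize(text)                                  # unigrams
content = [t for t in tokens if t not in STOP_WORDS]     # stop words removed
raw_bigrams = ngrams(tokens, 2)                          # bigrams before cleaning
clean_bigrams = ngrams(content, 2)                       # bigrams after cleaning

print(content)
print(clean_bigrams)
```

Building n-grams after removing stop words mirrors the "clean, reconstruct, then tokenize into n-grams" order described in the Cleaning n-grams lecture.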

Creating Bar Charts and Word Clouds • 5:25
Create a bar chart and a word cloud to explore the most frequent words in a text dataset, using tokenization and stop-word removal to reveal key terms.
Hands-On Practice with Exploring Text Data • 4:28
Load Hamlet text from Gutenberg, remove stop words, and build a bar chart of the ten most frequent words. Then create a word cloud from the tokens to visualize terms.

Creating a Corpus • 4:37
Create and inspect corpora in R using the tm package, importing Europe Wikipedia articles with getwiki, and convert data frames or vectors into a corpus with VCorpus.
Inspecting a Corpus • 2:37
Inspect or print a corpus in R with the tm package to view metadata, access documents, and extract content with double brackets and substring.
Data Cleaning • 4:26
Master data cleaning in R with tm_map, applying a function to a corpus with arguments such as stop word removal and stemming, and using content_transformer for lowercasing.
Using a Document Term Matrix • 4:01
Explore the document term matrix, where rows are documents and columns are words with counts indicating term frequency, and examine sparsity, frequent terms, and word associations.
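
In the course the document term matrix comes from the tm package in R. To make the structure concrete, here is a minimal Python sketch that builds one by hand, with rows as documents, columns as terms, and cells as counts, and then computes sparsity as the share of zero cells. The three short documents are invented for this example.

```python
from collections import Counter

# Invented toy documents for illustration.
docs = [
    "france borders germany",
    "germany borders france and poland",
    "poland borders germany",
]

# Columns: every distinct term across all documents, in sorted order.
vocab = sorted({word for doc in docs for word in doc.split()})

# Rows: one list of term counts per document.
dtm = []
for doc in docs:
    counts = Counter(doc.split())
    dtm.append([counts[term] for term in vocab])

# Sparsity: fraction of cells in the matrix that are zero.
n_cells = len(docs) * len(vocab)
n_zero = sum(cell == 0 for row in dtm for cell in row)
sparsity = n_zero / n_cells

print(vocab)
for row in dtm:
    print(row)
print(round(sparsity, 3))
```

Real corpora produce far sparser matrices than this toy example, because most terms appear in only a few documents.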

Introduction to TF-IDF • 2:26
Learn to compute tf-idf by combining term frequency and inverse document frequency to compare word usage across chapters of Metamorphosis, tokenize the text, and count word occurrences in R.
Term Frequency • 3:08
Explore term frequency in R by building a data frame of words per chapter, calculating chapter and book counts, and comparing across the book to uncover meaningful terms.
Inverse Document Frequency • 1:40
Explore idf, the inverse document frequency, and how it discounts very common words using the natural log of total documents over those containing the word, boosting rare terms.
Applying TF-IDF • 2:51
Explore tf-idf in R by combining term frequency with inverse document frequency to identify chapter-specific, high-idf terms. Visualize top words by chapter to reveal important content beyond mere word frequency.
Hands-On Practice with TF-IDF • 3:01
Pull Wikipedia articles for France, England, Russia, and Germany with the getwiki package, compute tf-idf top words per article, and visualize the results.
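
The lectures above compute tf-idf in R. As an illustration of the arithmetic only, this Python sketch uses the idf described in the Inverse Document Frequency lecture (natural log of total documents over documents containing the word) and assumes term frequency is a word's count divided by the total words in its document. The toy documents and the function name tf_idf are invented for this example.

```python
import math
from collections import Counter

# Invented toy documents (already tokenized) for illustration.
docs = {
    "doc1": "apple apple banana cherry".split(),
    "doc2": "banana cherry cherry date".split(),
    "doc3": "cherry date date egg".split(),
}


def tf_idf(documents):
    """Return {doc name: {word: tf-idf score}} for a dict of token lists."""
    n_docs = len(documents)

    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for words in documents.values():
        doc_freq.update(set(words))

    scores = {}
    for name, words in documents.items():
        counts = Counter(words)
        total = len(words)
        scores[name] = {
            # tf (assumed: count / total words) times idf (ln(N / df))
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return scores


scores = tf_idf(docs)
top_word_doc1 = max(scores["doc1"], key=scores["doc1"].get)
print(top_word_doc1, scores["doc1"])
```

A word that appears in every document, such as "cherry" here, gets an idf of ln(1) = 0 and so a tf-idf of zero, which is how the measure discounts very common words.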

Requirements

Basic Understanding of R
Desire to Learn Natural Language Processing
Bonus: Knowledge of the tidyverse

Description

Working with text data does not need to be difficult!

Follow along as we explain complex topics for a beginner audience. By the end of this course, you will be able to read in data from websites like Twitter and Wikipedia, clean it, and perform analysis.

We keep it easy.

This course is designed for a data analyst who is familiar with the R language but has absolutely no background in natural language processing or even statistics in general.

We break our course into three main sections: text mining, preparing and exploring text data, and analyzing text data.

Text Mining

As with every other form of analytics, before any real work can be done, the data must exist (obviously) and be in a working format.

What’s Covered: APIs, Twitter Data, Web Scraping, Wikipedia Data

Preparing and Exploring Text Data

Once the data has been properly gathered and mined, it needs to be put into a usable format. The following tutorials cover how to clean and explore text data.

What’s Covered: Regex, stringr package, tidytext package, tm package

Analyzing Text Data

After exploratory data analysis has been performed, we can do further analysis of the relationships and meaning in text.

What’s Covered: TF-IDF, Sentiment Analysis, Topic Modeling, Parts of Speech Tagging, Named Entity Recognition, Word Embeddings

So dive in and see what insights are hiding in your text data!

Who this course is for:

Data scientists looking to branch out to NLP
Business analysts who need to get insight from text data
Hobbyists who want to explore the interesting world of text analysis

Course content

Introduction • 2 lectures • 2min

APIs with jsonlite • 6 lectures • 11min

Twitter Data with rtweet • 4 lectures • 12min

Web Scraping with rvest • 4 lectures • 11min

Getting Wikipedia Data with getwiki • 4 lectures • 9min

Regex and Stringr • 6 lectures • 20min

Preparing Text Data with Tidytext • 7 lectures • 18min

Visualize Text Data • 2 lectures • 10min

Working in tm • 4 lectures • 16min

Term Frequency - Inverse Document Frequency (TF-IDF) • 5 lectures • 13min
