
Explore hands-on text mining and natural language processing in R for data science, using real-world social media data to perform sentiment analysis, machine learning, and unstructured text insights.
Conclude section one by reiterating text mining goals and prerequisites, then outline reading data from diverse sources and installing R and RStudio with attached code.
Learn to read online CSV data into R, handle metadata, skip top rows, set headers, and access data frames before indexing and subsetting.
Read JSON data in R, parse World Bank JSON files, and extract country IDs and ISO codes using lapply and custom functions to access regional attributes.
Scrape IMDb pages with rvest and SelectorGadget to extract rankings, titles, runtimes, and genres, build a data frame, and gain insights from 2017 film data.
Discover another way to access and inspect elements on a dynamic New Zealand tourism page, inspecting popular cities via linked HTML pages.
Use the Guardian API in R to extract headlines and text on a topic, register for a developer key, and clean the data by removing encodings, links, and HTML tags.
Learn to extract data from Twitter in R by configuring credentials, connecting to the Twitter API, and retrieving tweets with date ranges and geolocation such as the Barcelona area.
Learn to extract tweets with the rtweet package in R by authenticating with app keys and querying English tweets on a topic for text mining.
Learn how to extract geolocated tweets with rtweet, stream London tweets for 60 seconds, convert to a data frame, and analyze text, coordinates, and trends.
Authenticate with the rtweet package, search 500 users tweeting the hashtag, and plot the top locations from the location column, notably Washington, D.C.
Register your app on Foursquare developers, obtain a client id and client secret, install the orkun package from GitHub, and authenticate to access venue tips via the API.
Learn to extract venue reviews and check-ins from the Foursquare API using R, focusing on Indian restaurants in Copenhagen, and analyze user tips and comments.
Explore tweet data from Hillary Clinton and Donald Trump using a preprocessed dataset to reveal original versus retweeted content and reply activity.
Explore tidytext basics by converting Jane Austen texts to tidy data, tokenizing into words or sentences with unnest_tokens, detecting chapters via regex, and examining Pride and Prejudice as an example.
Explore and visualize text from Pride and Prejudice using tidytext in R, including tokenization, stop-word removal, word clouds, and sentiment analysis with the bing lexicon.
Explore how word clouds visualize tweet sentiments about India's 2016 demonetization, using pre-processing, corpus construction, and frequency-based sentiment analysis in R.
Create word clouds with quanteda by building a corpus and document-feature matrix, then clean text with stemming and stop-word removal for immigration manifestos and tweets.
Learn to compute word frequency from text data in R, cleaning and converting to a corpus, then analyze Twitter data with API keys and plot the most frequent terms.
Explore text polarity by calculating negative, neutral, and positive content with the qdap library on Mugabe tweets, revealing overall negativity and extracting positive and negative keywords.
Download four public-domain novels, tokenize chapters, and build a four-topic LDA model in R to distinguish the books by themes.
Cluster tweets with quanteda by building a document-feature matrix, selecting the top 50 words via IDF, and performing hierarchical clustering to reveal word groupings like fake news and media.
Apply supervised classification in R to distinguish ham from spam emails, building a document-term matrix from cleaned text and achieving about 94 percent accuracy.
Use RTextTools to create a document-term matrix from email text and classify ham versus spam with a linear support vector machine. Split data 75/25 and predict unseen emails.
Explore multiclass classification on text data using R, including encoding removal, tokenization, TF-IDF vectorization, and training a support vector machine with tenfold cross-validation and evaluation via a confusion matrix.
Extract 250 tweets for a hashtag using SocialMediaLab and magrittr, build an actor network and semantic term network, identify communities and clusters, and visualize user-hashtag associations.
Posit lets you deploy and share data science projects from your browser, with no install, using RStudio or Jupyter, and supports R or Python apps like Shiny, Streamlit, and Dash.
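Several of the lectures listed above come down to the same basic pipeline: clean raw text, tokenize it, drop stop words, count word frequencies, and score polarity against a sentiment lexicon. The following is a minimal sketch of that pipeline in base R only, not the course's own code, which relies on packages such as tidytext and quanteda. The example posts, the stop-word list, and the positive and negative word lists are all made up for illustration.

```r
# Toy input: three made-up posts standing in for scraped or API-retrieved text.
posts <- c(
  "Loved the new film! Great cast, great story: https://example.com/review",
  "Terrible service today &amp; a long wait. Bad experience.",
  "<p>The story was good but the ending was bad</p>"
)

# 1. Clean: lower-case, then strip links, HTML tags, and HTML entities.
clean <- tolower(posts)
clean <- gsub("https?://\\S+", " ", clean, perl = TRUE)
clean <- gsub("<[^>]+>", " ", clean)
clean <- gsub("&[a-z]+;", " ", clean)

# 2. Tokenize: replace anything that is not a letter with a space, then split.
tokenize <- function(x) {
  words <- strsplit(gsub("[^a-z]+", " ", x), " ", fixed = TRUE)[[1]]
  words[nzchar(words)]  # drop empty strings left by leading/trailing spaces
}
tokens <- lapply(clean, tokenize)

# 3. Remove stop words (a tiny hypothetical list, not a real lexicon).
stop_words <- c("the", "a", "and", "but", "was", "today", "new")
tokens <- lapply(tokens, function(w) w[!w %in% stop_words])

# 4. Word frequency across all posts, most frequent first.
word_freq <- sort(table(unlist(tokens)), decreasing = TRUE)

# 5. Polarity per post: positive matches minus negative matches
#    (hypothetical word lists standing in for a lexicon such as bing).
positive <- c("loved", "great", "good")
negative <- c("terrible", "bad")
polarity <- sapply(tokens, function(w) sum(w %in% positive) - sum(w %in% negative))

print(head(word_freq, 5))
print(polarity)  # 3 -2 0
```

A frequency table like word_freq is what a word cloud or bar chart of top terms is drawn from, and a real analysis would swap the toy word lists for a published stop-word list and sentiment lexicon.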
Do You Want to Gain an Edge by Gleaning Novel Insights from Social Media?
Do You Want to Harness the Power of Unstructured Text and Social Media to Predict Trends?
Over the past decade there has been an explosion in social media sites, and sites like Facebook and Twitter are now used for everything from sharing information to distributing news. Social media both captures and sets trends. Mining unstructured text data and social media is the latest frontier of machine learning and data science.
LEARN FROM AN EXPERT DATA SCIENTIST WITH 5+ YEARS OF EXPERIENCE:
My name is Minerva Singh and I am an Oxford University MPhil (Geography and Environment) graduate. I recently finished a PhD at Cambridge University (Tropical Ecology and Conservation). I have several years of experience in analyzing real-life data from different sources using data science-related techniques and producing publications for international peer-reviewed journals. Unlike other courses out there, which focus on theory and outdated methods, this course will teach you practical techniques to harness the power of both text data and social media to build powerful predictive models. We will cover web-scraping, text mining and natural language processing along with mining social media sites like Twitter and Facebook for text data. Additionally, you will learn to apply both exploratory data analysis and machine learning techniques to gain actionable insights from text and social media data.
TAKE YOUR DATA SCIENCE CAREER TO THE NEXT LEVEL
BECOME AN EXPERT IN TEXT MINING & NATURAL LANGUAGE PROCESSING:
My course will help you implement the methods using real data obtained from different sources. Many courses use made-up data that does not empower students to implement R-based data science in real life. After taking this course, you’ll easily use packages like caret and dplyr to work with real data in R. You will also learn to use the common social media mining and natural language processing packages to extract insights from text data. I will even introduce you to some very important practical case studies - such as identifying important words in a text and predicting movie sentiments based on textual reviews. You will also extract tweets pertaining to trending topics, analyze their underlying sentiments, and identify topics with Latent Dirichlet Allocation. With this powerful course, you’ll know it all: extracting text data from websites, extracting data from social media sites, and analyzing both using visualization, stats, machine learning, and deep learning!
Start analyzing data for your own projects, whatever your skill level, and impress your potential employers with actual examples of your data science projects.
HERE IS WHAT YOU WILL GET:
Data Structures and Reading in R, including CSV, Excel, JSON, HTML data.
Web-Scraping using R
Extracting text data from Twitter and Facebook using APIs
Extract and clean data from the Foursquare app
Exploratory data analysis of textual data
Common Natural Language Processing techniques such as sentiment analysis and topic modelling
Implement machine learning techniques such as clustering, regression and classification on textual data
Network analysis
Plus you will apply your newly gained skills and complete a practical text analysis assignment
We will spend some time dealing with some of the theoretical concepts. However, the majority of the course will focus on implementing different techniques on real data and interpreting the results.
After each video, you will learn a new concept or technique which you may apply to your own projects.
All the data and code used in the course are available free of charge, and you can use them as you like. You will also have access to additional lectures that are added in the future for FREE.
JOIN THE COURSE NOW!