Data Science: Data Mining & Natural Language Processing in R

Harness the Power of Machine Learning in R for Data/Text Mining, & Natural Language Processing with Practical Examples

Created by Minerva Singh

Last updated 11/2025

English

What you'll learn

Perform the most important pre-processing tasks needed prior to machine learning in R
Carry out data visualization in R
Use machine learning for unsupervised classification in R
Carry out supervised learning by building classification and regression models in R
Evaluate the accuracy of supervised machine learning algorithms and compare their performance in R
Carry out sentiment analysis using text data in R

Course content

15 sections • 113 lectures • 13h 18m total length

Introduction • 4:58
Data and Scripts For the Course • 0:04
Introduction to R and RStudio • 6:36
Install R and RStudio on Windows, Mac, or Linux using versions 3.3 or 3.4. Learn to create HTML reports with R Markdown for reproducible analyses and manage packages with library.
Start with Rattle • 6:30
Explore data science tasks in R using the Rattle GUI, reading data from multiple sources, summarizing variables, visualizing with ggplot, and performing k-means clustering, modeling, and evaluation.
Troubleshooting For Rattle • 0:10
Conclusion to Section 1 • 1:34

Read in Data from CSV and Excel Files • 9:56
Read CSV and TXT data into R and RStudio using read.csv and read.table, set the working directory, and import Excel files with read_excel after installing the readxl package.
Read Data from a Database • 8:23
Read Data from JSON • 5:28
Read in Data from Online CSVs • 4:04
Read in Data from Online HTML Tables-Part 1 • 4:13
Read in Data from Online HTML Tables-Part 2 • 6:24
Read Data from Other Sources • 2:13
Conclusions to Section 2 • 2:20
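
As a minimal sketch of the CSV import described above, the snippet below reads the same file with read.csv and read.table. The file contents and column names are made up for illustration, and a temporary file stands in for a file in your working directory. Excel import with readxl is not shown because it needs an extra package.

```r
# Write a small CSV to a temporary file so the example is self-contained.
# The columns (id, species, height_cm) are hypothetical.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,species,height_cm",
             "1,oak,310.5",
             "2,pine,275.0",
             "3,birch,NA"), csv_path)

# read.csv() assumes a header row and comma separators by default.
trees <- read.csv(csv_path, stringsAsFactors = FALSE)

# read.table() is the general form; header and separator must be stated.
trees2 <- read.table(csv_path, header = TRUE, sep = ",",
                     stringsAsFactors = FALSE)

str(trees)              # column types inferred from the file
all.equal(trees, trees2)  # compare the results of the two calls
unlink(csv_path)        # remove the temporary file
```

The literal text NA in the file is read as a missing value, which is what the later "Remove NAs" lecture deals with.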

Remove NAs • 17:12
More Data Cleaning • 8:05
Exploratory Data Analysis (EDA): Basic Visualizations with R • 18:53
Practice exploratory data analysis in R by visualizing distributions with histograms and box plots, and relationships with scatter plots, using the iris and mtcars data; learn ggplot2 basics.
More Exploratory Data Analysis with xda • 4:16
Introduction to dplyr for Data Summarizing-Part 1 • 6:11
Introduction to dplyr for Data Summarizing-Part 2 • 4:44
Data Exploration & Visualization With dplyr & ggplot2 • 6:07
Learn to use dplyr and ggplot to explore the corruption perception index with real data, creating a 2016 bar plot that shows top and bottom countries in blue and red.
Pre-Processing Dates-Part 1 • 7:33
Pre-Processing Dates-Part 2 • 8:28
Plotting Temporal Data in R • 12:35
Twist in the (Temporal) Data • 8:56
Associations Between Quantitative Variables-Theory • 3:43
Testing for Correlation • 19:50
Evaluate the Relation Between Nominal Variables • 6:14
Explore chi-square tests for independence on nominal data, build contingency tables from survey and student datasets, interpret p-values, and measure association with Phi and Cramer's V.
Cramer's V for Examining the Strength of Association Between Nominal Variables • 3:35
Section 3 Quiz
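
The chi-square test and Cramer's V mentioned above can be sketched in base R as follows. This is an illustration only: it uses the built-in HairEyeColor table rather than the survey and student datasets from the lectures, and it computes Cramer's V by hand from the usual formula instead of relying on an add-on package.

```r
# Contingency table of hair colour by eye colour, summed over sex.
# HairEyeColor ships with R; it stands in for the course datasets.
tab <- margin.table(HairEyeColor, c(1, 2))

# Chi-square test of independence between the two nominal variables.
test <- chisq.test(tab)

# Cramer's V = sqrt(chi2 / (n * (min(rows, cols) - 1))); ranges from 0 to 1.
n <- sum(tab)
k <- min(dim(tab)) - 1
cramers_v <- sqrt(unname(test$statistic) / (n * k))

print(test$p.value)  # small p-value: evidence against independence
print(cramers_v)     # strength of the association
```

The p-value only says whether an association is likely to exist; Cramer's V says how strong it is, which is why the two are reported together.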

Dimensionality Reduction-Theory • 3:17
PCA • 13:10
Removing Highly Correlated Predictor Variables • 16:42
Identify multicollinearity and remove highly correlated predictors with a 0.7 cutoff using the caret package, then validate via variance inflation factor in regression on Boston housing data.
Variable Selection Using LASSO Regression • 3:42
Select influential predictors for the Boston housing data by applying lasso regression with 10-fold cross-validation, scaling and centering, and discarding zero-coefficient variables like indus and age.
Variable Selection With FSelector • 13:35
Boruta Analysis for Feature Selection • 4:51
Explore Boruta feature selection to identify predictors for malignant versus benign tumors using Boruta package and Random Forest, running 101 iterations to select 28 of 32 predictors in cancer_tumor dataset.
Conclusions to Section 7 • 1:39
Section 7 Quiz
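
To illustrate the idea of dropping highly correlated predictors at a 0.7 cutoff, here is a rough base R sketch. It is not the caret routine used in the lecture: the helper drop_correlated is hypothetical, the greedy rule it applies is an assumption, and the built-in mtcars data stands in for the Boston housing data.

```r
# Predictors only: drop the response (mpg) from the built-in mtcars data.
predictors <- mtcars[, setdiff(names(mtcars), "mpg")]
cutoff <- 0.7

# Hypothetical helper. Greedy filter: while any pair of columns exceeds the
# cutoff, take the most correlated pair and drop the member with the larger
# mean absolute correlation to all other columns.
drop_correlated <- function(x, cutoff) {
  repeat {
    cm <- abs(cor(x))
    diag(cm) <- 0
    if (ncol(x) < 2 || max(cm) <= cutoff) break
    pair <- which(cm == max(cm), arr.ind = TRUE)[1, ]
    worst <- pair[which.max(colMeans(cm)[pair])]
    x <- x[, -worst, drop = FALSE]
  }
  x
}

reduced <- drop_correlated(predictors, cutoff)
names(reduced)  # predictors that survive the filter
```

After filtering, no remaining pair of predictors has an absolute correlation above the cutoff, which is the property a follow-up variance inflation factor check is meant to confirm.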

Binary Classification • 0:09
What are GLMs? • 5:25
Logistic Regression Models as Binary Classifiers • 9:10
Introduces logistic regression for binary response variables using a real-life voice dataset and the caret package, covering 75/25 train-test split, 10-fold cross-validation, odds ratios, and 97% accuracy.
Linear Discriminant Analysis (LDA) • 12:55
Binary Classifier with PCA • 15:44
Obtain Binary Classification Accuracy Metrics • 8:18
Explore binary classification accuracy beyond overall accuracy by using confusion matrices, sensitivity, specificity, and ROC curves with AUC calculations to evaluate model performance.
Multi-class Classification Models • 0:08
Our Multi-class Classification Problem • 6:13
Classification Trees • 11:55
More on Classification Tree Visualization • 9:20
Decision Trees • 8:39
Random Forest (RF) classification • 8:15
Examine Individual Variable Importance for Random Forests • 3:53
Explore how random forest models reveal individual variable influence, using a partial plot to show how past due days affects loan status outcomes such as paid off or collection.
GBM Classification • 7:50
Demonstrates building a gbm classifier for loan status with tenfold cross-validation and caret tuning, using 50 trees, depth 2, shrinkage 0.1, achieving 97% unseen accuracy; past due days is dominant.
Support Vector Machines (SVM) for Classification • 3:55
Explore support vector machines for classification using the diamonds dataset to predict cut, compare linear, polynomial, and radial kernels, and assess performance with a 75/25 split and tenfold cross-validation.
More SVM for Classification • 3:42
Explore support vector machine classification with a different R package beyond caret, using ksvm with linear and RBF kernels on the diamonds data, and compare accuracy with a confusion matrix.
Conclusions to Section 9 • 1:59
Section 9 Quiz
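
The binary classification workflow described above (a 75/25 split, a logistic regression, then a confusion matrix with sensitivity and specificity) can be sketched in base R. The assumptions are loud ones: a two-class subset of the built-in iris data replaces the voice dataset, glm replaces the caret workflow, the 0.5 probability threshold is an arbitrary choice, and cross-validation is left out to keep the example short.

```r
set.seed(42)  # reproducible split

# Two-class subset of iris: versicolor vs virginica.
dat <- droplevels(subset(iris, Species != "setosa"))

# 75/25 train-test split.
train_idx <- sample(nrow(dat), size = floor(0.75 * nrow(dat)))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Logistic regression: a GLM with a binomial family and logit link.
fit <- glm(Species ~ Sepal.Length + Sepal.Width, data = train,
           family = binomial)

# Predicted probability of the second factor level ("virginica").
prob <- predict(fit, newdata = test, type = "response")
pred <- factor(ifelse(prob > 0.5, "virginica", "versicolor"),
               levels = levels(dat$Species))

# Confusion matrix and the metrics derived from it.
cm <- table(Predicted = pred, Actual = test$Species)
accuracy    <- sum(diag(cm)) / sum(cm)
sensitivity <- cm["virginica", "virginica"] / sum(cm[, "virginica"])
specificity <- cm["versicolor", "versicolor"] / sum(cm[, "versicolor"])

print(cm)
print(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity))
```

Here "virginica" is treated as the positive class, so sensitivity is the share of actual virginica flowers predicted correctly and specificity is the same share for versicolor.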

Requirements

Keen interest in learning about data science and data mining
Keen interest in mining and deriving insights from text data
Should have prior experience of using R and RStudio
Should be able to install and read in packages in R
Prior exposure to the principles of statistical data analysis, data visualization and summarizing in R will be beneficial but not necessary

Description

MASTER DATA SCIENCE, TEXT MINING AND NATURAL LANGUAGE PROCESSING IN R:

Learn to carry out pre-processing, visualization and machine learning tasks such as: clustering, classification and regression in R. You will be able to mine insights from text data and Twitter to give yourself & your company a competitive edge.

LEARN FROM AN EXPERT DATA SCIENTIST WITH 5+ YEARS OF EXPERIENCE:

My name is Minerva Singh and I am an Oxford University MPhil (Geography and Environment) graduate. I recently finished a PhD at Cambridge University (Tropical Ecology and Conservation).
I have several years of experience in analyzing real life data from different sources using data science related techniques and producing publications for international peer reviewed journals. Over the course of my research I realized almost all the R data science courses and books out there do not account for the multidimensional nature of the topic and use data science interchangeably with machine learning.
This gives students an incomplete knowledge of the subject. Unlike other courses out there, we are not going to stop at machine learning. We will also cover data mining, web-scraping, text mining and natural language processing along with mining social media sites like Twitter and Facebook for text data.

NO PRIOR R OR STATISTICS/MACHINE LEARNING KNOWLEDGE IS REQUIRED:

You’ll start by absorbing the most valuable R Data Science basics and techniques. I use easy-to-understand, hands-on methods to simplify and address even the most difficult concepts in R.
My course will help you implement the methods using real data obtained from different sources. Many courses use made-up data that does not empower students to implement R-based data science in real life. After taking this course, you’ll easily use packages like caret and dplyr to work with real data in R. You will also learn to use the common NLP packages to extract insights from text data.
I will even introduce you to some very important practical case studies, such as predicting loan repayment and detecting tumors using machine learning. You will also extract tweets pertaining to trending topics, analyze their underlying sentiments, and identify topics with Latent Dirichlet allocation. With this Powerful All-In-One R Data Science course, you’ll know it all: visualization, stats, machine learning, data mining, and neural networks!

The underlying motivation for the course is to ensure you can put R-based data science into practice on real data today. Start analyzing data for your own projects, whatever your skill level, and impress your potential employers with actual examples of your data science projects.

HERE IS WHAT YOU WILL GET:

(a) This course will take you from a basic level to performing some of the most common advanced data science techniques using the powerful R based tools.

(b) It will equip you to use R to perform the different exploratory and visualization tasks for data modelling.

(c) It will introduce you to some of the most important machine learning concepts in a practical manner such that you can apply these concepts for practical data analysis and interpretation.

(d) You will get a strong understanding of some of the most important data mining, text mining and natural language processing techniques.

(e) You will be able to decide which data science techniques are best suited to your research questions and applicable to your data, and to interpret the results.

More Specifically, here's what's covered in the course:

Getting started with R, R Studio and Rattle for implementing different data science techniques
Data structures and reading in data in R, including CSV, Excel, JSON, HTML data.
How to Pre-Process and “Wrangle” your R data by removing NAs/No data, handling conditional data, grouping by attributes, etc.
Creating data visualizations like histograms, boxplots, scatterplots, barplots, pie/line charts, and MORE
Statistical analysis, statistical inference, and the relationships between variables.
Machine Learning, Supervised Learning, & Unsupervised Learning in R
Neural Networks for Classification and Regression
Web-Scraping using R
Extracting text data from Twitter and Facebook using APIs
Text mining
Common Natural Language Processing techniques such as sentiment analysis and topic modelling
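
To give a flavour of the sentiment analysis topic in the list above, here is a toy lexicon-based scorer in base R. Everything in it is an assumption made for illustration: the word lists are a hypothetical mini-lexicon, the example sentences are invented, and the course itself may use dedicated NLP packages and a different method.

```r
# Hypothetical mini-lexicon; real analyses use a published sentiment lexicon.
positive <- c("good", "great", "love", "excellent")
negative <- c("bad", "poor", "hate", "terrible")

# Invented example sentences.
texts <- c("I love this course, great examples!",
           "Terrible audio and poor pacing.",
           "The slides cover regression.")

# Hypothetical helper: count positive words minus negative words.
score_sentiment <- function(text) {
  # Lower-case, strip punctuation, then split on whitespace.
  words <- strsplit(gsub("[[:punct:]]", "", tolower(text)), "\\s+")[[1]]
  sum(words %in% positive) - sum(words %in% negative)
}

scores <- vapply(texts, score_sentiment, integer(1), USE.NAMES = FALSE)
data.frame(text = texts, score = scores)
```

A positive score suggests positive sentiment, a negative score the opposite, and zero means no lexicon word was found. Word counting like this ignores negation and context, which is where more capable methods come in.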

We will spend some time on the theoretical concepts related to data science. However, the majority of the course will focus on implementing different techniques on real data and interpreting the results.

After each video you will learn a new concept or technique which you may apply to your own projects.

All the data and code used in the course has been made available free of charge and you can use it as you like. You will also have access to additional lectures that are added in the future for FREE.

JOIN THE COURSE NOW!

Who this course is for:

Students wishing to learn practical data science and machine learning in R
Students wishing to learn the underlying theory and application of data mining in R
Students interested in obtaining/mining data from sources such as Twitter
Students interested in pre-processing and visualizing real life data
Students wishing to analyze and derive insights from text data
Students interested in learning basic text mining and Natural Language Processing (NLP) in R

Course content

INTRODUCTION TO THE COURSE: The Key Concepts and Software Tools • 6 lectures • 20min

Reading in Data from Different Sources in R • 8 lectures • 43min

Exploratory Data Analysis and Data Visualization in R • 15 lectures • 2hr 16min

Data Mining for Patterns and Relationships • 6 lectures • 37min

Machine Learning for Data Science • 2 lectures • 11min

Unsupervised Classification- R • 7 lectures • 1hr 4min

Dimension Reduction • 7 lectures • 57min

Supervised Learning Theory • 2 lectures • 14min

Supervised Learning: Classification • 17 lectures • 1hr 58min

Supervised Learning: Regression • 10 lectures • 1hr 14min