
Install R (version 3.3 or 3.4) and RStudio on Windows, Mac, or Linux. Learn to create HTML reports with R Markdown for reproducible analyses and to load packages with library().
Explore data science tasks in R using the Rattle GUI: reading data from multiple sources, summarizing variables, visualizing with ggplot, and performing k-means clustering, modeling, and evaluation.
Read CSV and TXT data into R and RStudio using read.csv() and read.table(), set the working directory, and import Excel files with read_excel() after installing the readxl package.
Perform exploratory data analysis in R by visualizing distributions with histograms and box plots, and relationships with scatter plots, using the iris and mtcars data; learn ggplot2 basics.
Learn to use dplyr and ggplot to explore the Corruption Perceptions Index with real data, creating a bar plot for 2016 that shows the top and bottom countries in blue and red.
Explore chi-square tests for independence on nominal data, build contingency tables from survey and student datasets, interpret p-values, and measure association with Phi and Cramer's V.
Identify multicollinearity and remove highly correlated predictors with a 0.7 cutoff using the caret package, then check the result with variance inflation factors (VIF) in a regression on the Boston housing data.
Select influential predictors for the Boston housing data by applying lasso regression with 10-fold cross-validation, scaling and centering, and discarding zero-coefficient variables like indus and age.
Explore Boruta feature selection with the Boruta package and Random Forest to identify predictors of malignant versus benign tumors, running 101 iterations that select 28 of 32 predictors in the cancer_tumor dataset.
Explore basic supervised learning concepts in R, including KNN, SVM, random forests, gradient boosting, and logistic regression, with emphasis on model evaluation using AUC and cross-validation.
Introduces logistic regression for a binary response variable using a real-life voice dataset and the caret package, covering a 75/25 train-test split, 10-fold cross-validation, and odds ratios, and reaching 97% accuracy.
Explore binary classification accuracy beyond overall accuracy by using confusion matrices, sensitivity, specificity, and ROC curves with AUC calculations to evaluate model performance.
Explore how random forest models reveal the influence of individual variables, using a partial dependence plot to show how past due days affects loan status outcomes such as paid off or collection.
Demonstrates building a gbm classifier for loan status with tenfold cross-validation and caret tuning, using 50 trees, interaction depth 2, and shrinkage 0.1, and achieving 97% accuracy on unseen data; past due days is the dominant predictor.
Explore support vector machines for classification using the diamonds dataset to predict cut, compare linear, polynomial, and radial kernels, and assess performance with a 75/25 split and tenfold cross-validation.
Explore support vector machine classification with an R package other than caret, using ksvm() from kernlab with linear and RBF kernels on the diamonds data, and compare accuracy with a confusion matrix.
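The chi-square lecture above pairs a test of independence with Phi and Cramer's V. A minimal base-R sketch of that workflow follows, under stated assumptions: it uses the built-in HairEyeColor data (summed over sex) instead of the survey and student datasets from the course, and the object names are illustrative rather than the course's own code. Phi is computed as sqrt(chi2 / n) and Cramer's V as sqrt(chi2 / (n * (k - 1))), where k is the smaller of the number of rows and columns.

```r
# Built-in HairEyeColor data, summed over Sex to give a 4 x 4
# contingency table of hair colour by eye colour
tab <- margin.table(HairEyeColor, c(1, 2))
print(tab)

# Pearson's chi-square test of independence
test <- chisq.test(tab)
print(test)

n    <- sum(tab)                   # total number of observations
chi2 <- unname(test$statistic)     # chi-square statistic
k    <- min(nrow(tab), ncol(tab))  # smaller table dimension

# Phi coefficient: sqrt(chi2 / n); can exceed 1 when both dimensions exceed 2
phi <- sqrt(chi2 / n)

# Cramer's V: sqrt(chi2 / (n * (k - 1))); lies in [0, 1] for any table size
cramers_v <- sqrt(chi2 / (n * (k - 1)))

cat("p-value:", test$p.value, "\n")
cat("Phi:", round(phi, 3), " Cramer's V:", round(cramers_v, 3), "\n")
```

A small p-value says the two variables are unlikely to be independent; Cramer's V then says how strong the association is, which the p-value alone does not.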
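To make the logistic regression and evaluation summaries above concrete, here is a small base-R sketch. It assumes the built-in mtcars data rather than the voice dataset used in the course. It fits glm() with a binomial family, turns the weight coefficient into an odds ratio, builds a confusion matrix at an assumed 0.5 threshold, and computes accuracy, sensitivity, specificity, and AUC. The AUC uses the rank-sum (Mann-Whitney) identity rather than an ROC package, and the evaluation is in-sample for brevity, unlike the 75/25 split described above.

```r
# Model transmission type (am: 1 = manual, 0 = automatic) from weight (wt, 1000 lb)
fit <- glm(am ~ wt, family = binomial, data = mtcars)

# Odds ratio: multiplicative change in the odds of "manual" per extra 1000 lb
odds_ratio <- exp(coef(fit)["wt"])
print(odds_ratio)

# In-sample predicted probabilities (illustration only; use a held-out set in practice)
prob   <- predict(fit, type = "response")
pred   <- ifelse(prob >= 0.5, 1, 0)   # 0.5 threshold is an assumption
actual <- mtcars$am

# Confusion matrix with the positive class (1) listed first
cm <- table(Predicted = factor(pred, levels = c(1, 0)),
            Actual    = factor(actual, levels = c(1, 0)))
print(cm)

tp <- cm["1", "1"]; fn <- cm["0", "1"]
fp <- cm["1", "0"]; tn <- cm["0", "0"]

accuracy    <- (tp + tn) / sum(cm)
sensitivity <- tp / (tp + fn)   # true positive rate
specificity <- tn / (tn + fp)   # true negative rate

# AUC via the rank-sum identity: the probability that a randomly chosen
# positive case receives a higher score than a randomly chosen negative one
r     <- rank(prob)             # ties get average ranks
n_pos <- sum(actual == 1)
n_neg <- sum(actual == 0)
auc   <- (sum(r[actual == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

cat("Accuracy:", accuracy, " Sensitivity:", sensitivity,
    " Specificity:", specificity, " AUC:", auc, "\n")
```

Sensitivity and specificity depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why the evaluation lecture looks beyond overall accuracy.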
MASTER DATA SCIENCE, TEXT MINING AND NATURAL LANGUAGE PROCESSING IN R:
Learn to carry out pre-processing, visualization, and machine learning tasks such as clustering, classification, and regression in R. You will be able to mine insights from text data and Twitter to give yourself and your company a competitive edge.
LEARN FROM AN EXPERT DATA SCIENTIST WITH 5+ YEARS OF EXPERIENCE:
My name is Minerva Singh and I am an Oxford University MPhil (Geography and Environment) graduate. I recently finished a PhD at Cambridge University (Tropical Ecology and Conservation).
I have several years of experience in analyzing real-life data from different sources using data science techniques and producing publications for international peer-reviewed journals. Over the course of my research, I realized that almost all the R data science courses and books out there do not account for the multidimensional nature of the topic and use data science interchangeably with machine learning.
This gives students an incomplete knowledge of the subject. Unlike other courses out there, we are not going to stop at machine learning. We will also cover data mining, web-scraping, text mining and natural language processing along with mining social media sites like Twitter and Facebook for text data.
NO PRIOR R OR STATISTICS/MACHINE LEARNING KNOWLEDGE IS REQUIRED:
You’ll start by absorbing the most valuable R Data Science basics and techniques. I use easy-to-understand, hands-on methods to simplify and address even the most difficult concepts in R.
My course will help you implement the methods using real data obtained from different sources. Many courses use made-up data that does not empower students to implement R-based data science in real life. After taking this course, you’ll easily use packages like caret and dplyr to work with real data in R. You will also learn to use common NLP packages to extract insights from text data.
I will even introduce you to some very important practical case studies, such as predicting loan repayment and detecting tumors using machine learning. You will also extract tweets pertaining to trending topics, analyze their underlying sentiments, and identify topics with Latent Dirichlet Allocation (LDA). With this powerful all-in-one R data science course, you’ll know it all: visualization, stats, machine learning, data mining, and neural networks!
The underlying motivation for the course is to ensure you can put R-based data science into practice on real data today. Start analyzing data for your own projects, whatever your skill level, and impress potential employers with actual examples of your data science projects.
HERE IS WHAT YOU WILL GET:
(a) This course will take you from a basic level to performing some of the most common advanced data science techniques using powerful R-based tools.
(b) Equip you to use R to perform the different exploratory and visualization tasks for data modelling.
(c) Introduce you to some of the most important machine learning concepts in a practical manner, so that you can apply them to practical data analysis and interpretation.
(d) You will get a strong understanding of some of the most important data mining, text mining, and natural language processing techniques.
(e) You will be able to decide which data science techniques best answer your research questions and suit your data, and to interpret the results.
More specifically, here's what's covered in the course:
Getting started with R, R Studio and Rattle for implementing different data science techniques
Data Structures and Reading Data into R, including CSV, Excel, JSON, and HTML data
How to Pre-Process and “Wrangle” your data in R: removing NAs/missing data, handling conditional data, grouping by attributes, etc.
Creating data visualizations like histograms, boxplots, scatterplots, barplots, pie/line charts, and MORE
Statistical analysis, statistical inference, and the relationships between variables.
Machine Learning, Supervised Learning, & Unsupervised Learning in R
Neural Networks for Classification and Regression
Web-Scraping using R
Extracting text data from Twitter and Facebook using APIs
Text mining
Common Natural Language Processing techniques such as sentiment analysis and topic modelling
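The pre-processing topics listed above (removing NAs, handling conditional data, grouping by attributes) can be sketched in base R on the built-in airquality data, whose Ozone and Solar.R columns contain missing values. The course itself uses packages such as dplyr; this sketch sticks to base functions (na.omit(), ifelse(), aggregate()) so it runs without extra installs, and the 80 °F "hot day" threshold is an arbitrary illustrative choice.

```r
# Built-in airquality data: daily New York measurements, May to September
data(airquality)
print(colSums(is.na(airquality)))   # number of NAs in each column

# Remove NAs: keep only rows with no missing values
aq <- na.omit(airquality)

# Conditional data: label each day by temperature (80 F cutoff is arbitrary)
aq$hot <- ifelse(aq$Temp > 80, "hot", "not hot")
print(table(aq$hot))

# Group by attribute: mean ozone for each month
monthly <- aggregate(Ozone ~ Month, data = aq, FUN = mean)
print(monthly)
```

With dplyr, the grouping step would typically be written with group_by() followed by summarise().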
We will spend some time dealing with some of the theoretical concepts related to data science. However, the majority of the course will focus on implementing different techniques on real data and interpreting the results.
Each video teaches a new concept or technique that you can apply to your own projects.
All the data and code used in the course has been made available free of charge and you can use it as you like. You will also have access to additional lectures that are added in the future for FREE.
JOIN THE COURSE NOW!