R: Complete Data Analysis Solutions

Rating: 3.0 (14 reviews)

Learn by doing - solve real-world data analysis problems using the most popular R packages

Created by Packt Publishing

Last updated 7/2020

English

What you'll learn

Extract, transform, and load data from heterogeneous sources
Understand how easily R can confront probability and statistics problems
Get simple R instructions to quickly organize and manipulate large datasets
Predict user purchase behavior by adopting a classification approach
Implement data mining techniques to discover items that are frequently purchased together
Group similar text documents by using various clustering methods

Course content

13 sections • 109 lectures • 5h 21m total length

About the course • 4:44
Downloading open data • 1:50
Reading and writing CSV files • 1:10
Set the working directory, read a CSV into R with read.table or read.csv, and inspect the first six rows showing date, open, high, low, and close.
Scanning text files • 2:18
Learn to read text data in R with the scan function, define field types, skip headers, and handle fixed-width formats through practical stock data examples and data structure exploration.
Working with Excel files • 1:52
Install and load the xlsx package, read the 2014 world economy Excel data, select key fields, inspect dimensions with dim, and write the filtered data to a new file.
Reading data from databases • 4:00
Learn to connect to a database using RJDBC or MySQL, load drivers, authenticate with credentials, list tables, and run SELECT queries to retrieve data for analysis.
Scraping web data • 4:59
Accessing Facebook data • 3:06
Working with Twitter • 1:48
Test Your Knowledge

Renaming the data variable • 2:01
Learn to rename data variables in R by using download.file to fetch datasets from GitHub, loading them with read.csv, and updating column names with names, colnames, and dimnames.
Converting data types • 2:31
Working with the date format • 2:47
Learn to manipulate date data with a date package, convert data types, rename columns, and calculate employee age in years by computing the interval between birth date and hire date.
Adding new records • 2:06
Filtering data • 3:26
Learn to subset and filter data with square brackets, head and tail, and the subset function. Apply conditions to select rows and columns, including salary ranges and gender.
Dropping data • 1:39
Drop data by excluding rows or columns with negative indices and by using the within function to remove unwanted attributes, discarding bad data during pre-processing.
Merging and sorting data • 3:57
Merge datasets by a common key with the merge function and the plyr join function, which support left and right joins, then sort by salary and from date using sort and order.
Reshaping data • 2:39
Detecting missing data • 2:21
Imputing missing data • 2:29
Test Your Knowledge

Enhancing a data.frame with a data.table • 4:18
Managing data with a data.table • 4:46
Performing fast aggregation with a data.table • 1:59
Merging large datasets with a data.table • 2:34
Subsetting and slicing data with dplyr • 2:04
Learn to subset and slice data with dplyr, using filter, slice, and head on data frames and data tables, including multiple conditions with the %in% and OR (|) operators.
Sampling data with dplyr • 1:21
Use sample_n to randomly select rows and sample_frac to select a percentage of the original dataset. Learn to specify the weight parameter, whose length must match the number of rows.
Selecting columns with dplyr • 2:22
Learn to select specific columns with the dplyr select function, including quantity and price, exclude columns with the minus sign, and match patterns such as names starting with P.
Chaining operations in dplyr • 2:04
Arranging rows with dplyr • 1:10
Eliminating duplicated rows with dplyr • 1:03
Adding new columns with dplyr • 1:02
Summarizing data with dplyr • 1:09
Merging data with dplyr • 1:37
Test Your Knowledge

Generating random samples • 2:25
Generate random samples from a population using R functions, control reproducibility by setting a seed, and simulate experiments such as lotteries, coin tosses, and dice rolls to illustrate probability, distributions, and statistical inference.
Understanding uniform distributions • 1:35
Generating binomial random variates • 2:27
Generate binomial random variates by simulating independent trials, compute the probability of exactly one six and of fewer than three sixes in ten dice rolls, then visualize the results with bar plots.
Generating Poisson random variates • 2:03
Sampling from a normal distribution • 4:58
Sampling from a chi-squared distribution • 1:51
Understanding Student's t-distribution • 1:53
Sampling from a dataset • 1:48
Simulating the stochastic process • 2:04
Simulate the stochastic process by modeling stock trading with random samples, Brownian motion, and price changes; visualize returns and explore performance analytics.
Test Your Knowledge

Getting confidence intervals • 5:06
Performing Z-tests • 3:07
Performing student's T-tests • 2:11
Learn to perform one-sample and two-sample t-tests, choose the test according to whether the standard deviation is known or unknown (Student's t with n < 30), and interpret means using box plots with weight data.
Conducting exact binomial tests • 2:10
Performing Kolmogorov-Smirnov tests • 2:05
Apply nonparametric Kolmogorov-Smirnov tests to compare a sample with a reference distribution or to compare two samples, using the empirical cumulative distribution function and ks.test from the stats package.
Working with the Pearson's chi-squared tests • 1:32
Understanding the Wilcoxon Rank Sum and Signed Rank tests • 1:42
Learn to perform Wilcoxon rank sum and signed rank tests using the stats package, analyzing Facebook likes, testing a median of 30, and comparing two fan pages.
Conducting one-way ANOVA • 3:03
Performing two-way ANOVA • 2:20
Test Your Knowledge

Transforming data into transactions • 4:25
Displaying transactions and associations • 2:25
Mining associations with the Apriori rule • 3:36
Pruning redundant rules • 2:02
Identify and prune redundant association rules by comparing support and confidence thresholds, using superset and subset checks to reveal meaningful rules and remove duplicates.
Visualizing association rules • 2:11
Mining frequent itemsets with Eclat • 2:31
Creating transactions with temporal information • 2:10
Mining frequent sequential patterns with cSPADE • 2:29
Test Your Knowledge

Creating time series data • 4:03
Plotting a time series object • 2:03
Plot time series data using the plot.ts function, visualize trends and seasonal composition clearly, and compare multiple time series either in separate subfigures or in a single figure with colors.
Decomposing a time series • 2:16
Smoothing a time series • 3:53
Forecasting a time series • 3:15
Selecting an ARIMA model • 2:55
Creating an ARIMA model • 2:00
Select optimal p, d, q parameters and build an ARIMA model from a time series using the forecast and stats packages, then assess training accuracy and review the model summary.
Forecasting with an ARIMA model • 1:53
Forecast future values with the forecast package in R, generate predictions from an ARIMA model, summarize results, plot a line chart, and diagnose residuals with acf, Box.test, and tsdiag.
Predicting stock prices with an ARIMA model • 3:31
Test Your Knowledge

Scraping web pages and processing texts • 8:19
Scrape text from web pages and process it for analysis by reading text files and HTML content, extracting paragraphs with CSS selectors or XPath, and cleaning citations and spaces.
Corpus, TDM, TF-IDF, and word cloud • 8:58
Cosine similarity and Latent Semantic Analysis • 7:03
Explore cosine similarity to group documents by projecting them into a vector space, and apply latent semantic analysis using singular value decomposition to cluster themes.
Extracting topics with Latent Dirichlet Allocation • 5:01
Extract topics from a document corpus with latent Dirichlet allocation, modeling each document as a mix of topics and each topic by keywords, using a document-term matrix and Gibbs sampling.
Sentiment scoring with tidytext and Syuzhet • 4:06
Classify sentiment and mood in call transcripts using tidytext and Syuzhet, performing token extraction, stop-word removal, and per-call sentiment averages with AFINN and NRC lexicons.
Classifying texts with RTextTools • 3:08
Test Your Knowledge

Requirements

You are expected to know the basics of R programming. You should have R installed on your system, and your system should be connected to the Internet. That’s all really!

Description

If you are looking for that one course that includes everything about data analysis with R, this is it. Let’s get on this data analysis journey together.

This course is a blend of text, videos, code examples, and assessments, which together make your learning journey all the more exciting and truly rewarding. Its sections form a sequential flow of concepts, covering a focused learning path in a modular manner. This helps you learn a range of topics at your own speed and move towards your goal of solving data analysis problems with R.

The R language is a powerful open source functional programming language. R is becoming the go-to tool for data scientists and analysts. Its growing popularity is due to its open source nature and extensive development community. R is increasingly being used by experienced data science professionals instead of Python and it will remain the top choice for data scientists in 2017. Big companies continue to use R for their data science needs and this course will make you ready for when these opportunities come your way.

This course has been prepared using extensive research and curation skills. Each section adds to the skills learned and helps us to achieve mastery of data analysis. Every section is modular and can be used as a standalone resource.

This course has been designed to cover every topic a data scientist is likely to need, and it does so in a step-by-step, practical manner, with hands-on solutions to data analysis problems in R. It also adds an introduction to machine learning.

We will start off with learning how to prepare, process, and perform sophisticated ETL for heterogeneous data sources with R packages. An example of data manipulation will be provided, illustrating how to use the “dplyr” and “data.table” packages to efficiently process larger data structures. We will then understand how easily R can confront probability and statistics problems and look at R instructions to quickly organize and manipulate large datasets. We will then learn to predict user purchase behavior by adopting a classification approach and implement data mining techniques to discover items that are frequently purchased together. Finally, we will offer insight into time series analysis on financial data, after which there will be detailed information on the hot topic of machine learning, including data classification, regression, clustering, association rule mining, and dimension reduction.

This course has been authored by some of the best in their fields:

Yu-Wei, Chiu (David Chiu)

Yu-Wei, Chiu (David Chiu) is the founder of LargitData, a start-up company that mainly focuses on providing big data and machine learning products. He specializes in using Spark and Hadoop to process big data and apply data mining techniques for data analysis. Yu-Wei is also a professional lecturer and has delivered lectures on big data and machine learning in R and Python, and given tech talks at a variety of conferences.

Selva Prabhakaran

Selva Prabhakaran is a data scientist with a large e-commerce organization. In his 7 years of experience in data science, he has tackled complex real-world data science problems and delivered production-grade solutions for top multinational companies.

Tony Fischetti

Tony Fischetti is a data scientist at College Factual, where he gets to use R every day to build personalized rankings and recommender systems.

Viswa Viswanathan

Viswa Viswanathan is an associate professor of Computing and Decision Sciences at the Stillman School of Business in Seton Hall University. In addition to teaching at the university, Viswa has conducted training programs for industry professionals. He has written several peer-reviewed research publications in journals such as Operations Research, IEEE Software, Computers and Industrial Engineering, and International Journal of Artificial Intelligence in Education.

Shanthi Viswanathan

Shanthi Viswanathan is an experienced technologist who, as a consultant, has helped several large organizations, such as Canon, Cisco, Celgene, Amway, Time Warner Cable, and GE, in areas such as data architecture and analytics, master data management, service-oriented architecture, business process management, and modeling.

Romeo Kienzler

Romeo Kienzler is the Chief Data Scientist of the IBM Watson IoT Division and works as an Advisory Architect, helping clients worldwide solve their data analysis problems. His current research focus is on cloud-scale data mining using open source technologies including R, Apache Spark, SystemML, Apache Flink, and DeepLearning4J.

This course is a blend of text, videos, and assessments, all packaged together keeping your journey in mind. It combines some of the best that Packt has to offer in one complete package. It includes content from the following Packt products:

R for Data Science Cookbook by Yu-Wei, Chiu (David Chiu)
R for Data Science Solutions [video] by Yu-Wei, Chiu (David Chiu)
Mastering R Programming [video] by Selva Prabhakaran
Data Analysis with R by Tony Fischetti
R Data Analysis Cookbook by Viswa Viswanathan and Shanthi Viswanathan
Learning Data Mining with R [video] by Romeo Kienzler

Who this course is for:

This course is useful whether you are a hobbyist, an analyst, an aspiring or professional data scientist, or someone learning data analysis for the first time. Those who are already familiar with the basics of R but want to learn to analyze real-world data problems efficiently will also find this course a match for their needs.

Course content

Data Extracting, Transforming, and Loading • 9 lectures • 26min

Data Preprocessing and Preparation • 10 lectures • 26min

Data Manipulation • 13 lectures • 27min

Simulation from Probability Distributions • 9 lectures • 21min

Statistical Inference in R • 9 lectures • 23min

Rule and Pattern Mining with R • 8 lectures • 22min

Time Series Mining with R • 9 lectures • 26min

Text Analytics In-depth • 6 lectures • 37min

Sources of Data • 5 lectures • 18min

Let's Do A Project: Social Network Analysis • 4 lectures • 22min