
Set the working directory, read a CSV into R with read.table or read.csv, and inspect the first six rows showing date, open, high, low, and close.
Learn to read text data in R with the scan function, define field types, skip headers, and handle fixed-width formats through practical stock data examples and data structure exploration.
Install and load the xlsx package, read the 2014 world economy Excel data, select key fields, inspect dimensions with dim, and write the filtered data to a new file.
Learn to connect to a database using RJDBC or MySQL, load drivers, authenticate with credentials, list tables, and run select queries to retrieve data for analysis.
Learn to rename data variables in R by using download.file to fetch datasets from GitHub, loading them with read.csv, and updating column names with names, colnames, and dimnames.
Learn to manipulate date data with a date package, convert data types, rename columns, and calculate employee age by computing intervals between birth date and hire date, using year-based results.
Learn to subset and filter data with square brackets, head and tail, and the subset function. Apply conditions to select rows and columns, including salary ranges and gender.
Drop data by excluding rows or columns with negative indices and the within function to remove unwanted attributes, discarding bad data during pre-processing.
Merge datasets by a common key with the merge function and plyr join, enabling left or right joins, then sort by salary and from date using sort and order.
Learn to subset and slice data with dplyr, using filter, slice, and head on data frames and data tables, including multiple conditions with in and or operators.
Use sample_n to randomly slice rows and sample_frac to select a percentage from the original dataset. Learn to specify the weight parameter whose length matches the number of rows.
Learn to select specific columns with the dplyr select function, including quantity and price, exclude with minus, and use patterns such as starting with P.
Generate random samples from a population using R functions, control reproducibility with seed settings, and simulate experiments like lotteries, coin tosses, and dice to illustrate probability, distribution, and statistical inference.
Generate binomial random variates by simulating independent trials, compute the probability of exactly one six in ten dice rolls and of fewer than three sixes, then visualize with bar plots.
Simulate a stochastic process by modeling stock trading with random samples, Brownian motion, and price changes; visualize returns and explore performance analytics.
Learn to perform one-sample and two-sample t tests, decide between known or unknown standard deviation (Student's t with n<30), and interpret means using box plots with weight data.
Apply nonparametric Kolmogorov-Smirnov tests to compare a sample with a reference distribution or to compare two samples, using the empirical cumulative distribution function and ks.test from the stats package.
Learn to perform Wilcoxon rank sum and signed rank tests using the stats package, analyzing Facebook likes, testing a median of 30, and comparing two fan pages.
Identify and prune redundant association rules by comparing support and confidence thresholds, using superset and subset checks to reveal meaningful rules and remove duplicates.
Plot time series data using the plot.ts function, visualize trends and seasonal composition clearly, and compare multiple time series either in separate subfigures or a single figure with colors.
Select optimal p, d, q parameters and build an ARIMA model from a time series using the forecast and stats packages, then assess training accuracy and review the model summary.
Forecast future values with the forecast package in R, generate predictions from an ARIMA model, summarize results, plot a line chart, and diagnose residuals with acf, Box.test, and tsdiag.
Scrape text from web pages and process it for analysis by reading text files and HTML content, extracting paragraphs with CSS selectors or XPath, and cleaning citations and spaces.
Explore cosine similarity to group documents by projecting them into a vector space, and apply latent semantic analysis using singular value decomposition to cluster themes.
Extract topics from a document corpus with latent Dirichlet allocation, modeling each document as a mix of topics and each topic by keywords, using a document-term matrix and Gibbs sampling.
Classify sentiment and mood in call transcripts using tidytext and syuzhet, performing token extraction, stop-word removal, and per-call sentiment averages with AFINN and NRC lexicons.
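As a small taste of the simulation lessons listed above, the dice question (exactly one six, and fewer than three sixes, in ten rolls) can be sketched in base R. This is only an illustrative sketch, not the course's own code: the variable names, the seed, and the number of simulated repetitions are assumptions chosen for the example.

```r
# Exact binomial probabilities for ten rolls of a fair die,
# where "success" means rolling a six (probability 1/6).
n_rolls <- 10
p_six <- 1 / 6

# P(exactly one six)
p_one_six <- dbinom(1, size = n_rolls, prob = p_six)

# P(fewer than three sixes) = P(X <= 2)
p_under_three <- pbinom(2, size = n_rolls, prob = p_six)

# Simulation check: the seed and the 10000 repetitions are
# arbitrary choices made for this sketch.
set.seed(42)
sixes <- rbinom(10000, size = n_rolls, prob = p_six)
sim_one_six <- mean(sixes == 1)
sim_under_three <- mean(sixes < 3)

print(c(exact = p_one_six, simulated = sim_one_six))
print(c(exact = p_under_three, simulated = sim_under_three))
```

The simulated proportions should land close to the exact values, and a call such as barplot(table(sixes)) would give the kind of bar plot the lesson mentions.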
If you are looking for that one course that includes everything about data analysis with R, this is it. Let’s get on this data analysis journey together.
This course is a blend of text, videos, code examples, and assessments, which together make your learning journey all the more exciting and truly rewarding. It includes sections that form a sequential flow of concepts covering a focused learning path presented in a modular manner. This helps you learn a range of topics at your own speed and also move towards your goal of solving data analysis problems with R.
The R language is a powerful open source functional programming language. R is becoming the go-to tool for data scientists and analysts. Its growing popularity is due to its open source nature and extensive development community. R is increasingly being used by experienced data science professionals instead of Python and it will remain the top choice for data scientists in 2017. Big companies continue to use R for their data science needs and this course will make you ready for when these opportunities come your way.
This course has been prepared using extensive research and curation skills. Each section adds to the skills learned and helps us to achieve mastery of data analysis. Every section is modular and can be used as a standalone resource.
This course has been designed to cover the topics a data scientist needs, and it does so in a step-by-step and practical manner. It offers practical solutions to data analysis problems using R and also adds an introduction to machine learning.
We will start off with learning how to prepare, process, and perform sophisticated ETL for heterogeneous data sources with R packages. An example of data manipulation will be provided, illustrating how to use the “dplyr” and “data.table” packages to efficiently process larger data structures. We will then understand how easily R can confront probability and statistics problems and look at R instructions to quickly organize and manipulate large datasets. We will then learn to predict user purchase behavior by adopting a classification approach and implement data mining techniques to discover items that are frequently purchased together. Finally, we will offer insight into time series analysis on financial data, after which there will be detailed information on the hot topic of machine learning, including data classification, regression, clustering, association rule mining, and dimension reduction.
This course has been authored by some of the best in their fields:
Yu-Wei, Chiu (David Chiu)
Yu-Wei, Chiu (David Chiu) is the founder of LargitData, a start-up company that mainly focuses on providing big data and machine learning products. He specializes in using Spark and Hadoop to process big data and apply data mining techniques for data analysis. Yu-Wei is also a professional lecturer and has delivered lectures on big data and machine learning in R and Python, and given tech talks at a variety of conferences.
Selva Prabhakaran
Selva Prabhakaran is a data scientist with a large e-commerce organization. In his 7 years of experience in data science, he has tackled complex real-world data science problems and delivered production-grade solutions for top multinational companies.
Tony Fischetti
Tony Fischetti is a data scientist at College Factual, where he gets to use R every day to build personalized rankings and recommender systems.
Viswa Viswanathan
Viswa Viswanathan is an associate professor of Computing and Decision Sciences at the Stillman School of Business in Seton Hall University. In addition to teaching at the university, Viswa has conducted training programs for industry professionals. He has written several peer-reviewed research publications in journals such as Operations Research, IEEE Software, Computers and Industrial Engineering, and International Journal of Artificial Intelligence in Education.
Shanthi Viswanathan
Shanthi Viswanathan is an experienced technologist who, as a consultant, has helped several large organizations, such as Canon, Cisco, Celgene, Amway, Time Warner Cable, and GE, among others, in areas such as data architecture and analytics, master data management, service-oriented architecture, business process management, and modeling.
Romeo Kienzler
Romeo Kienzler is the Chief Data Scientist of the IBM Watson IoT Division and works as an Advisory Architect, helping clients worldwide solve their data analysis problems. His current research focus is on cloud-scale data mining using open source technologies including R, Apache Spark, SystemML, Apache Flink, and DeepLearning4J.
This course is a blend of text, videos, and assessments, all packaged together keeping your journey in mind. It combines some of the best that Packt has to offer in one complete package. It includes content from the following Packt products: