
Explore interactive data input with scan and the wrappers read.table and read.csv, including keyboard entry, console focus, and distinguishing headers and delimiters for file reading.
Master preprocessing and visualization in R by cleaning the workspace, managing memory with gc, and exploring large birth data to plot births by day of week and delivery type.
Learn to preprocess birth data and visualize it with lattice graphics in R, using bar charts, histograms, density plots, and conditioned panels by plurality and delivery type.
Preprocess and visualize birth data using box plots, violin plots, and level plots to reveal distributions, Apgar scores, and gestation relationships.
Build and prune a body fat decision tree with the rpart package, using the minsplit and cp controls, and evaluate pruning via the CP table.
Apply regression and generalized linear modeling to the heart data set through detailed script-based examples, with exercises in PDF form to complete after reviewing the lessons and videos.
Model heart data as a proportion using a binomial glm in R, with counts of heart attacks and non-attacks, and compare models via residual deviance, diagnostic plots, and chi-square tests.
Explore Poisson GLM modeling of AIDS case counts over time, diagnose fit with plots, and improve the model by adding a quadratic term, then compare via deviance and ANOVA.
Explore k-means clustering on exoplanet data, using mass, periodicity, and eccentricity, visualized in a three-dimensional scatterplot with range normalization to form and compare three candidate clusters.
Explore density-based and hierarchical agglomerative clustering materials for section 7, including Becher 2013, k-means cluster analysis, Tamm residuals, MDA multivariate analysis, fpc, scripts, and slides.
Explore density-based clustering with dbscan in R using the fpc package, focusing on reachability distance and min points, identifying noise, and visualizing results on the iris dataset.
Explore section 8's final cluster analysis examples, tackle the residual analysis exercise, and compare standardization by standard deviation versus by range using crime rate data.
This is a "hands-on" business analytics (data analytics) course that teaches how to use the popular, no-cost R software to perform dozens of data mining tasks using real data and data mining cases. It teaches critical data analysis, data mining, and predictive analytics skills, including data exploration and data visualization, using one of the most popular business analytics software suites in industry and government today.

The course is structured as a series of dozens of demonstrations of how to perform classification and predictive data mining tasks, including building classification trees, building and training decision trees, using random forests, linear modeling, regression, generalized linear modeling, logistic regression, and many different cluster analysis techniques. The course also teaches "best practices" for using R: how to install R and RStudio, the characteristics of the basic data types and structures in R, and how to input data into an R session from the keyboard, from user prompts, or by importing files stored on a computer's hard drive.

All software, slides, data, and R scripts used in the dozens of case-based demonstration video lessons are included in the course materials so students can "take them home" and apply them to their own data analysis and mining cases. Each course section also includes "hands-on" exercises to reinforce the learning process.

The target audience includes undergraduate and graduate students seeking to acquire employable data analytics skills, as well as practicing predictive analytics professionals seeking to expand their repertoire of data analysis and data mining knowledge and capabilities.