Learning Path: R: Real-World Data Mining With R
2.7 (3 ratings)
Instead of using a simple lifetime average, Udemy calculates a course's star rating by considering a number of different factors such as the number of ratings, the age of ratings, and the likelihood of fraudulent ratings.
48 students enrolled
Learn data mining with R using real-world dataset analysis techniques and discover the versatility of R
Created by Packt Publishing
Last updated 4/2017
  • 7 hours on-demand video
  • 1 Supplemental Resource
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
  • Get to know the basic concepts of R: the data frame and data manipulation
  • Work with complex data sets and understand how to process data sets
  • Explore graphs and the statistical measure in graphs
  • Apply data management steps to handle large datasets
  • Implement various dimension reduction techniques to handle large datasets
  • Create predictive models in order to build a recommendation engine
  • Acquire knowledge about the neural network concept drawn from computer science and its applications in data mining
Requirements
  • Basic knowledge of R is required.

Packt’s Video Learning Paths are a series of individual video products put together in a logical, stepwise manner such that each video builds on the skills learned in the one before.

Demand for data mining is growing as the world generates data at an increasing pace. R is a popular programming language for statistics and is very useful for day-to-day data analysis tasks.

Data mining is a very broad topic and takes some time to learn. This Learning Path will help you understand the mathematical basics quickly so that you can apply what you’ve learned directly in R.

This Learning Path is the complete learning process for data-happy people. We begin with a thorough introduction to data mining and how R makes it easy with its many packages. We then move on to exploring data mining techniques, showing you how to apply different mining concepts to various statistical and data applications in a wide range of fields using R’s vast set of algorithms.

The goal of this Learning Path is to help you understand the basics of data mining with R and then get you working on real-world datasets and projects.

This Learning Path is authored by some of the best in their fields.

Romeo Kienzler

Romeo Kienzler is the Chief Data Scientist of the IBM Watson IoT Division and works as an Advisory Architect, helping clients worldwide solve their data analysis problems.

He holds an M.Sc. in Information Systems, Bioinformatics, and Applied Statistics from the Swiss Federal Institute of Technology. He works as an Associate Professor for data mining at a Swiss university, and his current research focus is on cloud-scale data mining using open source technologies including R, Apache Spark, SystemML, Apache Flink, and DeepLearning4J. He also contributes to various open source projects. Additionally, he is currently writing a chapter on Hyperledger for a book on blockchain technologies.

Pradeepta Mishra

Pradeepta Mishra is a data scientist, predictive modeling expert, deep learning and machine learning practitioner, and econometrician. He currently leads the data science and machine learning practice for Ma Foi Analytics, Bangalore, India. Ma Foi Analytics is an advanced analytics provider for Tomorrow's Cognitive Insights Ecology, using a combination of cutting-edge artificial intelligence, a proprietary big data platform, and data science expertise. He holds a patent for enhancing the planogram design for the retail industry. Pradeepta has published and presented research papers at IIM Ahmedabad, India. He is a visiting faculty member at various leading B-schools and regularly gives talks on data science and machine learning.

Pradeepta has spent more than 10 years working on projects involving classification, regression, pattern recognition, time series forecasting, and unstructured data analysis using text mining procedures, across domains such as healthcare, insurance, retail and e-commerce, and manufacturing.

Who is the target audience?
  • This course is ideal for data analysts from novice to intermediate level. You should have prior knowledge of basic statistics and some programming language experience in any tool or platform. Familiarity with R will be an added advantage.
Curriculum For This Course
78 Lectures
Learning Data Mining with R
30 Lectures 02:16:54

This video provides an overview of the entire course.

Preview 03:30

The aim of this video is to show how easy it is to use R for data mining. It also sets expectations, because R is sometimes a bit hard to learn, especially for programmers.

Getting Started with R

You have to accept that most of your work will involve data cleansing, which is one of the most important steps in data mining. Fortunately, R has all the tools in place to do this task as elegantly as possible.

Data Preparation and Data Cleansing

The aim of this video is to explain the basic concepts of R, illustrated by how easy it is to load data in R. You will see how this is done in most cases, as well as in some special cases such as databases and big data technologies.

The Basic Concepts of R

This video gives an overview of the data frame object, which is an essential part of R and part of every analysis. You will learn what a data frame is and how to use it for data manipulation.

Data Frames and Data Manipulation

This video explains, by way of an example, that data is nothing but points in a multidimensional vector space.

Preview 03:59

Points in a multidimensional vector space can be drawn and analyzed. This video introduces k-means, the simplest of the clustering algorithms, for that purpose.

An Algorithmic Approach to Find Hidden Patterns in Data
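
As a rough illustration of the k-means idea mentioned above, here is a minimal sketch. The course itself works in R; this example uses Python with only the standard library. The function name `kmeans`, the sample points, and the choice to seed the centroids with the first k points are assumptions made for this sketch, not the course's material.

```python
import math

def kmeans(points, k, iterations=10):
    """Plain k-means on a list of equal-length numeric tuples."""
    # Assumption for this sketch: the first k points seed the centroids,
    # which keeps the example deterministic.
    centroids = [tuple(p) for p in points[:k]]
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return centroids, assignment

# Made-up 2D points forming two well-separated groups.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.1, 4.9), (0.9, 1.1), (4.8, 5.2)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Each point ends up labeled with the index of its nearest centroid, so the two groups receive different labels.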

Starting from a hard-to-understand dataset, we process and visualize it to gain insights.

A Real-world Life Science Example

The aim of this video is to show how powerful R is as a data language. We will query a built-in example dataset and show how it can be filtered and aggregated.

Preview 04:00

The aim of this video is to show how powerful R is as a data language. Now we concentrate on data types.

R Data Types

Next, we concentrate on functions and indexing.

R Functions and Indexing

The aim of this video is to show how object-oriented programming is done in R since some of the algorithms covered rely on it.

S3 Versus S4 – Object-oriented Programming in R

The aim of this video is to motivate the viewer with a small example based on standard market basket analysis.

Preview 03:09

The aim of this video is to explain the mathematical structure "graph".

Introduction to Graphs

The aim of this video is to explain the different types of association rules.

Different Association Types

The aim of this video is to explain the Apriori Algorithm.

The Apriori Algorithm

The aim of this video is to explain the Eclat Algorithm.

The Eclat Algorithm

The aim of this video is to explain the FP-Growth Algorithm.

The FP-Growth Algorithm

This video introduces the discipline of classification, the mathematical foundation for understanding Bayes' theorem and the Naïve Bayes classifier.

Preview 06:00

Now that we understand Bayes' theorem, we can derive the Bayes classifier and use naïve Bayes for spam classification in R.

The Naive Bayes Classifier
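
Bayes' theorem, which the two videos above build on, states that P(A|B) = P(B|A) P(A) / P(B). The sketch below applies it to a single word in a spam filter. The course works in R; this example uses plain Python for illustration, and the function name `posterior_spam` and all probabilities are made up.

```python
def posterior_spam(p_word_given_spam, p_word_given_ham, p_spam):
    """Return P(spam | word) for one word, using Bayes' theorem."""
    p_ham = 1.0 - p_spam
    # Law of total probability gives the evidence term P(word).
    p_word = p_word_given_spam * p_spam + p_word_given_ham * p_ham
    return p_word_given_spam * p_spam / p_word

# Hypothetical numbers: the word appears in 50% of spam and 12.5% of
# legitimate mail, and 20% of all mail is spam.
p = posterior_spam(p_word_given_spam=0.5, p_word_given_ham=0.125, p_spam=0.2)
print(p)
```

A naïve Bayes classifier extends this to many words by assuming the words are conditionally independent given the class, so their likelihoods multiply.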

This is a practical example of using naïve Bayes for spam classification in R.

Spam Classification with Naïve Bayes

An introduction to support vector machines: understanding how to use them to separate points in multidimensional vector spaces, and finally using kernels on data that is not linearly separable.

Support Vector Machines

Introduction to lazy learning using k-nearest neighbors. This video explains how KNNs work and how they are applied in R.

K-nearest Neighbors

This video introduces the discipline of hierarchical clustering.

Preview 05:44

This video introduces the discipline of distribution-based clustering.

Distribution-based Clustering

This video introduces the discipline of density-based clustering.

Density-based Clustering

A practical example of using DBSCAN in R.

Using DBSCAN to Cluster Flowers Based on Spatial Properties

This video introduces neural networks.

Preview 06:09

This video shows an example in R: how to use the H2O deep learning framework for handwritten digit recognition (classification).

Using the H2O Deep Learning Framework

This video shows an example in R: how to use the H2O deep learning framework for anomaly detection on real-time IoT sensor data.

Real-time Cloud Based IoT Sensor Data Analysis
R Data Mining Projects
31 Lectures 03:19:39

This video provides an overview of the entire course.

Preview 03:52

The process of deciphering meaningful insights from existing databases and analyzing results for consumption by business users.

What Is Data Mining?

We are going to start with basic programming using R for data management and data manipulation.

Introduction to the R Programming Language

When data is not formatted properly, converting one data type to another is not difficult in R.

Data Type Conversion

When working on a client dataset with a large number of observations, you often need to subset the data based on some selection criteria and to sample it with or without replacement.

Sorting, Merging, Indexing, and Subsetting Dataframes

R's date functions return an object of class Date, which represents the number of days since January 1, 1970.

Date and Time Formatting
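
To make the "days since January 1, 1970" representation concrete, here is a small sketch. The course works in R; this example uses Python's standard `datetime` module for illustration, and the helper name `days_since_epoch` is an assumption of this sketch.

```python
from datetime import date

# The reference day used by the day-count representation.
EPOCH = date(1970, 1, 1)

def days_since_epoch(d):
    """Number of whole days between 1970-01-01 and the given date."""
    return (d - EPOCH).days

print(days_since_epoch(date(1970, 1, 2)))    # one day after the epoch
print(days_since_epoch(date(1969, 12, 31)))  # dates before the epoch are negative
print(days_since_epoch(date(2017, 1, 1)))
```

Storing dates as day counts is what makes date arithmetic, such as subtracting two dates, simple integer arithmetic.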

There are two different types of functions in R: user-defined functions and built-in functions.

Types of Functions

Using a loop, a similar task can be performed many times.

Loop Concepts

The apply function uses an array, a matrix, or a dataframe as an input and returns the result in an array format.

Applying Concepts

In typical data management, it is important to standardize the text columns or variables in a dataset, because R is case sensitive and treats any discrepancy in spelling or case as a new value.

String Manipulation

In the R programming language, missing values are represented as NA. NAs are not string or numeric values; they are an indicator for missing values.

NA and Missing Value Management and Imputation Techniques
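
One simple imputation technique is to replace each missing value with the mean of the observed values. The sketch below shows the idea. The course works in R with NA; this example uses Python, with `None` standing in for NA. The function name `impute_mean` and the sample data are assumptions of this sketch, and the course may cover other techniques.

```python
from statistics import mean

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing observed, so nothing to impute from
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Made-up column with two missing entries.
ages = [20, None, 30, 40, None]
imputed = impute_mean(ages)
print(imputed)
```

Mean imputation keeps the column mean unchanged but shrinks its variance, so it is usually treated as a baseline rather than a default.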

To generate univariate statistics about a dataset, we have to follow two approaches, one for continuous variables and the other for discrete or categorical variables.

Preview 09:18

Bivariate analysis studies the relationship or association between two variables. There are three possible ways of looking at the relationship.

Bivariate Analysis

The multivariate relationship is a statistical way of looking at multiple dependent and independent variables and their relationships.

Multivariate Analysis

Understanding probability distributions is important in order to have a clear idea about the assumptions of any statistical hypothesis test.

Understanding Distributions and Transformation

Interpretation of the calculated distribution helps in forming a hypothesis.

Interpreting Distributions and Variable Binning

Contingency tables are frequency tables represented by two or more categorical variables. A frequency table represents one categorical variable, whereas a contingency table represents two or more.

Contingency Tables, Bivariate Statistics, and Checking for Data Normality
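
The difference between a frequency table and a contingency table can be sketched in a few lines. The course works in R; this example uses Python's standard `collections.Counter` for illustration, and the helper names and sample data are made up.

```python
from collections import Counter

def frequency_table(values):
    """Counts per category of a single categorical variable."""
    return Counter(values)

def contingency_table(rows, cols):
    """Counts per (row category, column category) pair of two variables."""
    return Counter(zip(rows, cols))

# Hypothetical survey data: one entry per respondent.
gender = ["F", "M", "F", "M", "F"]
bought = ["yes", "no", "yes", "yes", "no"]

freq = frequency_table(gender)
table = contingency_table(gender, bought)
print(freq)
print(table)
```

The frequency table has one count per category, while the contingency table has one count per combination of categories.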

The null hypothesis states that nothing has happened: the population means are unchanged, and so on. The alternative hypothesis states that something different has happened and the population means differ.

Hypothesis Testing

When a training dataset does not conform to any specific probability distribution because of non-adherence to the assumptions of that specific probability distribution, the only option left to analyze the data is via non-parametric methods.

Non-Parametric Methods

This video will walk you through the basics of data visualization, along with how to create advanced data visualizations using existing libraries in the R programming language.

Preview 16:06

This video will let you explore different kinds of charts and plots and their creation. You'll also be able to use geo mapping.

Visualizing Charts, and Geo Mapping

By the end of this video, you will be able to use some data visualization techniques that are widely used for smart data representation.

Visualizing Scatterplot, Word Cloud and More

This video will teach you how to take plotting to a new level. Here, you will learn to use the plotly library, an interactive browser-based charting library built on a JavaScript library.

Using plotly

This video will let you explore geo mapping, a type of chart used by data mining experts when the dataset contains location information.

Creating Geo Mapping

How could you predict the future outcomes of a target variable? Regression is the answer to this. Let's have a brief introduction and understand regression.

Preview 04:08

This video will let you explore the linear regression model, which can be used to explain the relationship between a single dependent variable and an independent variable.

Linear Regression

This video will help you understand how to use the stepwise regression method to solve complex regression problems.

Stepwise Regression Method for Variable Selection

What can we do when the variable of interest is categorical, such as whether a customer buys a product, whether a credit card is approved, or whether a tumor is cancerous? Logistic regression is the answer for these scenarios.

Logistic Regression

Let's dive into another form of regression where the parameters in a linear regression model are increased up to one or two levels of polynomial calculation.

Cubic Regression

Market Basket Analysis is the study of relationships between various products and products that are purchased together or in a series of transactions.

Preview 12:29
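
Relationships between products purchased together are commonly quantified with support and confidence. These are standard market basket measures and are not named in the description above, so treat them as an assumption about what the video covers. The course works in R; this sketch uses plain Python for illustration, and the baskets are made up.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent in basket | antecedent in basket).

    Assumes the antecedent occurs in at least one transaction.
    """
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(baskets, {"bread", "milk"}))
print(confidence(baskets, {"bread"}, {"milk"}))
```

Here bread and milk appear together in two of four baskets, and two of the three baskets containing bread also contain milk.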

Implementing market basket analysis.

Practical project
Advanced Data Mining projects with R
17 Lectures 01:24:44

This video provides an overview of the entire course.

Preview 03:53

It is important to classify objects according to their similarities or dissimilarities so that their study becomes easier. We use clustering techniques for that purpose.

Understanding Customer Segmentation

There are many clustering methods available. Out of them, we will learn about two methods, K-means and hierarchical, in this video.

Clustering Methods – K means and Hierarchical

In this video, we will go a step further and learn about model-based and other clustering algorithms. We will also compare the algorithms.

Clustering Methods – Model Based, Other and Comparison

Recommendation is a technique by which the algorithm detects what the user is buying. You would always like to be recommended things similar to your interests or to things you have bought before. A recommendation engine helps do that.

Preview 07:29

There are different types of methods for building a recommendation engine. You need to know which method to use depending on the type of product being shopped for. Also, there are certain limitations to these methods.

Application of Methods and Limitations of Collaborative Filtering

As we are armed with the theory of recommendation, we will now build a recommendation engine.

Practical Project

When there are a lot of variables, it becomes difficult to extract information from the data. We need a way to capture the data in fewer variables. Dimensionality reduction provides that solution.

Preview 09:14

In order to understand dimensionality reduction, we need to work with it. Here, we will apply a dimensionality reduction procedure using both the model-based and the principal component-based approaches.

Practical Project around Dimensionality Reduction

We can also try some other approaches to perform dimensionality reduction according to the need of the dataset. Let's look at that in this video.

Parametric Approach to Dimension Reduction

Before working on neural networks, we need to understand the theory behind neural networks.

Preview 04:07

To understand and implement neural networks, we need to understand the math behind them. This video will do just that!

Understanding the Math Behind the Neural Network

After knowing about neural networks, we need to see how to implement neural networks in R.

Neural Network Implementation in R

Prediction is an important aspect of data mining. In this video, we will create a prediction model using a neural network to predict the average auction price.

Neural Networks for Prediction

We need to form clusters or groups of data so that performing actions on them becomes easier. Here we are going to classify customers based on marketing.

Neural Networks for Classification

We will also perform forecasting using neural networks. In this video, we will forecast a time series.

Neural Networks for Forecasting

After working with neural networks, we should also know the merits and demerits of this well-known technology.

Merits and Demerits of Neural Networks
About the Instructor
Packt Publishing
3.9 Average rating
7,297 Reviews
52,158 Students
616 Courses
Tech Knowledge in Motion

Packt has been committed to developer learning since 2004. A lot has changed in software since then - but Packt has remained responsive to these changes, continuing to look forward at the trends and tools defining the way we work and live, and at how to put them to work.

With an extensive library of content - more than 4000 books and video courses - Packt's mission is to help developers stay relevant in a rapidly changing world. From new web frameworks and programming languages to cutting-edge data analytics and DevOps, Packt takes software professionals in every field to what's important to them now.

From skills that will help you develop and future-proof your career to immediate solutions to everyday tech challenges, Packt is a go-to resource to make you a better, smarter developer.

Packt Udemy courses continue this tradition, bringing you comprehensive yet concise video courses straight from the experts.