Data mining with Rattle

Name: Data mining with Rattle
Rating: 4.9 (15 reviews)

A GUI-based data mining tool

Highest Rated

Created by Dr. Gomathi Srinivasan

Last updated 3/2021

English

English [Auto]

What you'll learn

Data mining
Rattle
Machine learning
Classification
Evaluating data mining models
All about data
Data mining tool

Course content

12 sections • 22 lectures • 1h 59m total length

About the Instructor • 0:40
Know your instructor
Modules • 1:56
Modules in the course

Introduction to Data Mining • 14:28
Data mining is about building models from data. We build models to gain insight into how the world works so that we can predict how things will behave in the future. In building models, a data miner deploys many different data analysis and model-building techniques, and the choice among them depends on the business problem to be solved. Although data mining is not the only approach, it is becoming very widely used because it is well suited to the data environments found in today's enterprises. These environments are characterised by the volume of data available, commonly in the gigabytes and fast approaching the terabytes, and by the complexity of that data, both in the relationships awaiting discovery and in the data types now available, including text, image, audio, and video. Business environments are also changing rapidly, so analyses need to be performed regularly and models updated regularly to keep up.
Introduction to Rattle and R • 1:52
Know the basics of R and Rattle
Data - Terminologies • 4:13
Instructions to install R and Rattle • 3:17
Download R and RStudio • 1:54
Find the link in the resources
Install R and RStudio • 3:17
Install and launch Rattle • 3:26
Install and launch Rattle, set up the tool, and start your first analysis.

Explore - Summary • 6:09
A key task in any data mining project is exploratory data analysis (often abbreviated as EDA), which generally involves getting a basic understanding of a dataset. Statistics, the fundamental tool here, is essentially about understanding uncertainty and thereby making allowance for it.
Explore - Distributions • 5:21
It is usually a good idea to review the distributions of the values of each of the variables in your dataset. The Distributions option allows you to visually explore the distributions for specific variables.
Using graphical tools to visually investigate the data's characteristics can help with understanding the data, correcting errors, and selecting and transforming variables.
Graphical presentations are more effective for most people, and Rattle provides a graphical summary of the distribution of the data through the Distributions option of the Explore tab.
Explore - Correlation • 3:01
A correlation plot will display correlations between the values of variables in the dataset. In addition to the usual correlation calculated between values of different variables, the correlation between missing values can be explored by checking the Explore Missing check box.
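As a rough illustration of the two kinds of correlation described above, the following sketch computes a Pearson correlation between the values of two variables and between their missing-value indicators. It is written in plain Python, not the R code that Rattle generates, and the column names and values are hypothetical toy data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def missing_indicator(values):
    """Return 1 where a value is missing (None) and 0 otherwise."""
    return [1 if v is None else 0 for v in values]

# Hypothetical toy columns; None marks a missing value.
temperature = [12.0, 15.5, None, 20.1, None, 25.3]
humidity = [80.0, None, None, 60.0, None, 45.0]

# Correlation between values, using only rows where both are observed.
pairs = [(t, h) for t, h in zip(temperature, humidity)
         if t is not None and h is not None]
value_corr = pearson([t for t, _ in pairs], [h for _, h in pairs])

# Correlation between missingness: do the two columns tend to be
# missing in the same rows?
missing_corr = pearson(missing_indicator(temperature),
                       missing_indicator(humidity))

print(f"value correlation:   {value_corr:.3f}")
print(f"missing correlation: {missing_corr:.3f}")
```

A strongly positive missingness correlation suggests that the two variables tend to be missing in the same observations, which is worth knowing before deciding how to handle the gaps.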
Explore - PCA & Interactive plot • 6:24
Learn how to use Principal Component Analysis and the interactive plot.

Test • 4:33
Statistical Tests: These tests apply to two samples. The paired two-sample tests assume that we have two samples or observations and that we are testing for a change, usually from one time period to another.

Distribution of the Data

* Kolmogorov-Smirnov (non-parametric): Are the distributions different?
* Wilcoxon Signed Rank (non-parametric): Do paired samples have different distributions?

Location of the Average

* T-test (parametric): Are the means different?
* Wilcoxon Rank-Sum (non-parametric): Are the medians different?

Variation in the Data

* F-test (parametric): Are the variances different?

Correlation

* Correlation (Pearson's): Are the values from the paired samples correlated?
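
To make the parametric tests in the list above more concrete, here is a minimal sketch of the statistics behind the t-test and the F-test. It is plain Python rather than the R code Rattle runs, it stops at the test statistic (no p-values), and the choice of the unequal-variance (Welch) form of the two-sample t statistic is an assumption made for illustration. The sample values are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Two-sample t statistic, unequal-variance (Welch) form."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

def paired_t(before, after):
    """Paired t statistic: is the mean of the differences far from zero?"""
    diffs = [y - x for x, y in zip(before, after)]
    return mean(diffs) / math.sqrt(variance(diffs) / len(diffs))

def f_ratio(a, b):
    """F statistic: ratio of the two sample variances."""
    return variance(a) / variance(b)

# Hypothetical measurements on the same five entities at two time periods.
before = [10.0, 12.0, 9.0, 11.0, 13.0]
after = [12.0, 14.0, 10.0, 13.0, 15.0]

print(f"Welch t:  {welch_t(before, after):.3f}")
print(f"paired t: {paired_t(before, after):.3f}")
print(f"F ratio:  {f_ratio(after, before):.3f}")
```

The paired statistic is much larger in magnitude than the unpaired one on this data because pairing removes the variation between entities and leaves only the change within each pair.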

Transform • 12:40
The Transform tab provides numerous options for transforming our datasets. Cleaning our data and creating new features from the data occupies much of our time as data miners. There is a myriad of approaches, and a programming language like R supports them all. Through the Rattle user interface, we can perform some of the more common transformations. These include normalising our data, filling in missing values, turning numeric variables into categorical variables and vice versa, dealing with outliers, and removing variables or entities with missing values.
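
Three of the transformations mentioned above can be sketched in a few lines. This is an illustrative Python sketch, not Rattle's own implementation: rescaling to the range 0 to 1 stands in for normalising, the median stands in for the fill value, and equal-width binning stands in for turning a numeric variable into a categorical one. All three choices, and the `ages` data, are assumptions made for the example.

```python
from statistics import median

def impute_median(values):
    """Replace missing values (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def rescale_01(values):
    """Linearly rescale values so the minimum maps to 0 and the maximum to 1.

    Assumes the values are not all identical.
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def bin_equal_width(values, n_bins):
    """Assign each value to one of n_bins equal-width bins, numbered from 0."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # The maximum value falls in the last bin rather than opening a new one.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Hypothetical numeric variable with two missing entries.
ages = [20.0, None, 35.0, 50.0, None, 80.0]

filled = impute_median(ages)
scaled = rescale_01(filled)
bins = bin_equal_width(filled, 3)

print(filled)
print(scaled)
print(bins)
```

The order matters in this sketch: missing values are filled first, because neither rescaling nor binning is defined for a missing entry.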

Model • 6:09
The task of classification is at the heart of data mining! Most of what we learn from a traditional data mining course focuses on the algorithms from machine learning and statistics that build classification models. These models can then be used to classify new entities. The actual structure of the model also gives us insight into the relationships between the variables that are important in differentiating the classes.
This chapter focuses on this common data mining task of classification and prediction. We consider binary (or two class) classification, but the concepts also apply to multi-class classification.
The two-class model builders provided by Rattle are Decision Trees, Boosted Decision Trees, Random Forests, Support Vector Machines, and Logistic Regression.

Evaluate • 5:31
Rattle's Evaluate tab offers several evaluation options, depending on the kind of model or clustering we have built.
Error Matrix
An error matrix shows the true outcomes against the predicted outcomes. Two tables will be presented here. The first will be the count of observations and the second will be the proportions. For a binary classification model, the cells of the error matrix are referred to, from the top left going clockwise, as the True Negatives, False Positives, True Positives, and False Negatives. An error matrix is also known as a confusion matrix.
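
The two tables described above can be reproduced by hand on a small example. The sketch below is plain Python, not Rattle's output, and it assumes class labels coded as 0 (negative) and 1 (positive) with rows for the actual class and columns for the predicted class, so that the clockwise reading from the top left matches the order given in the text. The labels are hypothetical.

```python
def error_matrix(actual, predicted):
    """Count matrix for binary labels 0/1: rows = actual, columns = predicted."""
    counts = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        counts[a][p] += 1
    return counts

def proportions(counts):
    """Convert a count matrix into proportions of all observations."""
    total = sum(sum(row) for row in counts)
    return [[c / total for c in row] for row in counts]

# Hypothetical true outcomes and model predictions for ten observations.
actual = [0, 0, 0, 0, 0, 1, 1, 1, 0, 1]
predicted = [0, 0, 0, 1, 0, 1, 0, 1, 0, 1]

counts = error_matrix(actual, predicted)
props = proportions(counts)

# Clockwise from the top left: TN, FP, TP, FN.
true_neg, false_pos = counts[0][0], counts[0][1]
false_neg, true_pos = counts[1][0], counts[1][1]
overall_error = (false_pos + false_neg) / len(actual)

print("counts:     ", counts)
print("proportions:", props)
print("overall error:", overall_error)
```

The off-diagonal cells hold the mistakes, so the overall error rate is the sum of the false positive and false negative proportions.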

Requirements

Be able to work with a computer
Know the basics of data mining

Description

In this course, you will learn about the Rattle GUI, an interactive tool for data mining.

Rattle GUI is a free and open-source software package providing a graphical user interface (GUI) for data mining using the R statistical programming language. Rattle is used in a variety of situations. Rattle provides considerable data mining functionality by exposing the power of the R Statistical Software through a graphical user interface.

Rattle is also used as a teaching aid for learning the R language. A Log Code tab replicates the R code for any activity undertaken in the GUI, and that code can be copied and pasted. Rattle can be used for statistical analysis or model generation. It allows the dataset to be partitioned into training, validation, and testing sets. The dataset can be viewed and edited, and there is an option for scoring an external data file.
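
The three-way partition mentioned above can be sketched as a shuffled split. This is an illustrative Python sketch, not the R code Rattle writes to its log; the 70/15/15 proportions, the seed value, and the `partition` helper name are assumptions chosen for the example.

```python
import random

def partition(rows, train=0.70, validate=0.15, seed=42):
    """Shuffle rows and split them into training, validation, and testing sets.

    Whatever is left after the training and validation shares goes to testing.
    """
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_validate = round(len(shuffled) * validate)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_validate],
            shuffled[n_train + n_validate:])

# Hypothetical dataset: 100 row identifiers.
rows = list(range(100))
train_set, validate_set, test_set = partition(rows)

print(len(train_set), len(validate_set), len(test_set))
```

Fixing the seed means the same observations land in the same partition on every run, which keeps model comparisons fair.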

Data mining is the analysis of data and the use of software techniques for finding patterns and regularities in sets of data. The computer is responsible for finding the patterns by identifying the underlying rules and features in the data. The actual data mining task is the semi-automatic or automatic analysis of large quantities of data to extract previously unknown, interesting patterns such as groups of data records.

Who this course is for:

Beginners who want to learn a data mining tool
Anybody who wishes to do data mining without code
Anyone looking for a GUI-based data mining tool

Course content

Introduction • 2 lectures • 3min
Introduction • 7 lectures • 32min
Rattle • 1 lecture • 4min
Data • 1 lecture • 12min
Explore • 4 lectures • 21min
Test • 1 lecture • 5min
Transform • 1 lecture • 13min
Model • 1 lecture • 6min
Evaluate • 1 lecture • 6min
Associate • 1 lecture • 5min
