What is Data Mining?

A free video tutorial from Minerva Singh
Bestselling Instructor & Data Scientist (Cambridge Uni)

Learn more from the full course

Data Science:Data Mining & Natural Language Processing in R

Harness the Power of Machine Learning in R for Data/Text Mining, & Natural Language Processing with Practical Examples

13:05:56 of on-demand video • Updated November 2020

  • Perform the most important pre-processing tasks needed prior to machine learning in R
  • Carry out data visualization in R
  • Use machine learning for unsupervised classification in R
  • Carry out supervised learning by building classification and regression models in R
  • Evaluate the accuracy of supervised machine learning algorithms and compare their performance in R
  • Carry out sentiment analysis using text data in R
In this section I'm going to deal with data mining, or at least some of the classical applications of data mining. When we think about data mining, a lot of words come to mind: apart from data mining itself, things like patterns, learning, customers, sets, processes and so on. So data mining is the art and science of discovering patterns in large datasets. It is a field which lies at the intersection of machine learning and statistics, and it covers topics like clustering. Data mining is the analysis step of the knowledge discovery in databases process, or KDD.

What we are going to focus on specifically in this section is association mining, which is an integral component of the data mining family. Association mining is the process of discovering interesting relationships between variables in a database or data frame. It identifies strong rules which connect different items with each other. Measures of strong rules include quantitative metrics like support, confidence and lift. Support indicates how often an itemset appears in the data, confidence indicates how often the rule is found to be true, and lift compares how often the items occur together with how often they would if they were independent of each other. We are going to deal with the implementation of these in R in the lectures of this section.

Just to tell you briefly, association mining comprises two main algorithms: the Apriori algorithm and the Eclat algorithm. These are the different steps in the Apriori algorithm. In case they don't make sense to you right now (it is quite complicated theoretically), we are going to cover the applications in depth, including applications with real-life data, in the subsequent sections. But just very quickly: the first step is to scan the transaction database to get the support for each itemset (an itemset is basically a set of items), compare this with the minimum support to get the frequent itemsets, and then generate a set of candidate itemsets. We are then going to use the Apriori property to prune the infrequent k-itemsets from this set. Then we scan the database again to get the support for each candidate itemset, compare it with the minimum support, and identify the frequent k-itemsets, that is, the most frequently occurring groups of items. Finally, we generate the non-empty subsets of each frequent itemset and try to generate rules from them. We're going to discuss the implementation further on.

Then we have something known as the Eclat algorithm, which is another method for frequent itemset generation. It determines the support for any k-itemset by intersecting the transaction lists of two of its (k-1)-subsets. So when you intersect them, the transactions that occur in both lists are the ones that are finally selected.

These are the classical applications of data mining. The subsequent sections in this course are going to cover things like text mining, extracting unstructured text data from the web, Twitter and Facebook, and making sense of that unstructured data. Text mining also belongs fairly and squarely within the family of data mining. But right now we are just going to look at the practical implementations of the Apriori and Eclat algorithms.
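
As a rough illustration of the steps described above, here is a minimal sketch in plain Python. It is not the R code used in the course, and it is not meant as a reference implementation. The toy transactions, the thresholds 0.5 and 0.7, and the names `support`, `apriori`, `rules` and `tid_lists` are all assumptions made up for this example. It shows the level-wise Apriori search with pruning, rule generation with support, confidence and lift, and the Eclat idea of getting a support by intersecting transaction lists.

```python
from itertools import combinations

# Hypothetical toy transaction database (made up for illustration).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "butter", "milk"},
]

def support(itemset, db):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in db if itemset <= t) / len(db)

def apriori(db, min_support):
    """Level-wise search; returns {frozenset: support} for frequent itemsets."""
    level = {frozenset([item]) for t in db for item in t}
    frequent = {}
    k = 1
    while level:
        # Scan the database and keep candidates that meet the minimum support.
        kept = {c: support(c, db) for c in level}
        kept = {c: s for c, s in kept.items() if s >= min_support}
        frequent.update(kept)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have k items.
        candidates = {a | b for a in kept for b in kept if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        level = {c for c in candidates
                 if all(frozenset(s) in kept for s in combinations(c, k - 1))}
    return frequent

def rules(frequent, min_confidence):
    """Rules s => (l - s) for each frequent itemset l, with their metrics."""
    out = []
    for l, sup_l in frequent.items():
        for r in range(1, len(l)):
            for s in map(frozenset, combinations(l, r)):
                conf = sup_l / frequent[s]        # P(l - s | s)
                if conf >= min_confidence:
                    lift = conf / frequent[l - s]  # conf relative to baseline
                    out.append((set(s), set(l - s), sup_l, conf, lift))
    return out

def tid_lists(db):
    """Vertical layout used by Eclat: item -> set of transaction ids."""
    tids = {}
    for tid, t in enumerate(db):
        for item in t:
            tids.setdefault(item, set()).add(tid)
    return tids

frequent = apriori(transactions, min_support=0.5)
found = rules(frequent, min_confidence=0.7)

# Eclat-style support of {bread, milk}: intersect the two items' tid-lists.
tids = tid_lists(transactions)
eclat_support = len(tids["bread"] & tids["milk"]) / len(transactions)

for lhs, rhs, sup, conf, lift in found:
    print(f"{sorted(lhs)} => {sorted(rhs)}: "
          f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

In this toy data every pair of items is frequent, but the three-item set falls below the minimum support, so it is scanned as a candidate and then dropped. The support obtained by intersecting transaction lists agrees with the one Apriori gets by scanning the transactions.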