How is Machine Learning Different from Statistical Data Analysis?

Minerva Singh
A free video tutorial from Minerva Singh
Bestselling Udemy Instructor & Data Scientist(Cambridge Uni)
4.3 instructor rating • 39 courses • 70,030 students

Learn more from the full course

Complete Data Science Training with Python for Data Analysis

Beginners python data analytics : Data science introduction : Learn data science : Python data analysis methods tutorial

12:49:50 of on-demand video • Updated July 2019

  • Python data analytics - Install Anaconda & Work Within The IPython/Jupyter Environment, A Powerful Framework For Data Science Analysis
  • Python Data Science - Become Proficient In Using The Most Common Python Data Science Packages Including Numpy, Pandas, Scikit & Matplotlib
  • Data analysis techniques - Be Able To Read In Data From Different Sources (Including Webpage Data) & Clean The Data
  • Data analytics - Carry Out Exploratory Data Analysis & Pre-processing Tasks Such As Tabulation, Pivoting & Data Summarizing In Python
  • Become Proficient In Working With Real Life Data Collected From Different Sources
  • Carry Out Data Visualization & Understand Which Techniques To Apply When
  • Carry Out The Most Common Statistical Data Analysis Techniques In Python Including T-Tests & Linear Regression
  • Understand The Difference Between Machine Learning & Statistical Data Analysis
  • Implement Different Unsupervised Learning Techniques On Real Life Data
  • Implement Supervised Learning (Both In The Form Of Classification & Regression) Techniques On Real Data
  • Evaluate The Accuracy & Generality Of Machine Learning Models
  • Build Basic Neural Networks & Deep Learning Algorithms
  • Use The Powerful H2O Framework For Implementing Deep Neural Networks
English [Auto] Now that we have come to the end of our statistics section, and before moving on to machine learning, I want to illustrate a couple of differences between statistical modeling and machine learning. Frankly, there is a lot of overlap between these fields, but it is important to know some of the key differences so that when you work with your own data (and the purpose of this course is to equip you to do that) you can decide whether you should be focusing on statistical modeling or on machine learning. Frankly, nothing in this world screams amateurish more than using machine learning when you should be using statistical modeling. This brief overview should let you make that decision.

There are some basic differences in the theoretical underpinnings of statistics and machine learning, and they should always be borne in mind when you work with your data. Statistics focuses on formalizing the relationship between variables in the form of mathematical equations. Machine learning, on the other hand, is comprised of algorithms that can learn from data without relying on rules-based programming or any such formalization. In statistics, the formalization of relationships can include things like generating confidence intervals. Obviously, these are not the things that you will get in machine learning, and that is something you will see for yourself in the subsequent sections.

Statistics is a subfield of mathematics which deals with finding relationships between variables to predict an outcome. Machine learning, on the other hand, comes from computer science and artificial intelligence.
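As a rough illustration of what "generating confidence intervals" means on the statistics side, here is a minimal sketch that builds a 95% interval for a sample mean. The data are synthetic, the function name `mean_confidence_interval` is made up for this example, and the interval uses a normal approximation; it is not taken from the course material.

```python
import random
from statistics import NormalDist, mean, stdev

# Synthetic sample (an assumption for illustration): 400 draws centred on 50.
rng = random.Random(42)
sample = [rng.gauss(50, 10) for _ in range(400)]


def mean_confidence_interval(data, level=0.95):
    """Normal-approximation confidence interval for the mean of `data`."""
    # z is the standard normal quantile for the chosen level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(data) / len(data) ** 0.5
    centre = mean(data)
    return centre - half_width, centre + half_width


low, high = mean_confidence_interval(sample)
print(f"sample mean = {mean(sample):.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

The interval is a statement about uncertainty in the estimate, which is the kind of output a purely predictive model usually does not report.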
Machine learning deals with building systems that can learn from data instead of following explicitly programmed instructions. Not only do machine learning systems learn from data, we also use the same systems to predict unseen data. In statistics, there is a lot of emphasis on quantifying uncertainty; in machine learning, the emphasis is almost exclusively on prediction and classification, and on how good your prediction or classification model is. So from a practical perspective, statistics emphasizes formal statistical inference (things like confidence intervals and hypothesis tests) in low-dimensional problems, while machine learning emphasizes prediction in high-dimensional problems.

One of the key things that works for me when I have to decide between statistical models and machine learning, and it may well work for you irrespective of your domain, is this: the choice of statistical model is influenced by the underlying data distribution. That is something we saw a lot in the previous sections, where we devoted a lot of time to checking whether our data were normally distributed, or whether the errors were normally distributed. For machine learning (ML) models, the choice is based on the predictive ability of the model, so it is decoupled from the data distribution. This makes machine learning more amenable to nonlinear and other types of complex and skewed data.

For most machine learning systems you do not need normally distributed data in any shape or form, and that matches real life, where we very rarely have normally distributed data. Since we often do not have normally distributed data, it makes a lot of sense to work with machine learning systems and then to work on identifying the algorithms which give the best possible results in terms of the predictive ability of the model.
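To make the point about non-normal data concrete, here is a small sketch that computes sample skewness for a roughly normal sample and for a right-skewed one. Both samples are synthetic, and `sample_skewness` is a helper written for this illustration. It is only a crude check and not a substitute for the formal normality checks used in the earlier sections.

```python
import random
from statistics import mean, pstdev


def sample_skewness(data):
    """Average cubed z-score; close to 0 for symmetric data such as a normal sample."""
    centre = mean(data)
    spread = pstdev(data)
    return mean(((x - centre) / spread) ** 3 for x in data)


rng = random.Random(7)
normal_like = [rng.gauss(0, 1) for _ in range(2000)]         # symmetric
right_skewed = [rng.expovariate(1.0) for _ in range(2000)]   # long right tail

normal_skew = sample_skewness(normal_like)
skewed_skew = sample_skewness(right_skewed)
print(f"normal-like sample skewness:  {normal_skew:.2f}")
print(f"right-skewed sample skewness: {skewed_skew:.2f}")
```

A skewness far from zero is a hint that methods assuming normality may be a poor fit for the data.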
In the case of statistics, we get a formalized relationship between y and the x's in the form of an equation. So when we worked with linear regression, we got an equation at the end. In machine learning, usually no equations are produced. What we have instead is a machine learning system that is going to predict unseen data, and the upshot is how well it predicts that unseen data. You are not going to get any equations at the end. So if you have to report an equation, machine learning is not the way to go, because the emphasis in machine learning is on building the best possible models on one set of data and then seeing how good they are on another set of unseen data.

In the next lecture I am going to talk more about what machine learning is and the different types of machine learning problems out there. In the next section we will start on practical examples of the different types of machine learning systems, and we will keep working with machine learning algorithms across all the remaining sections.
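The contrast described above, an explicit equation versus a model judged only on unseen data, can be sketched in a few lines. This is a minimal illustration on synthetic data using only the standard library. The helpers `fit_line`, `knn_predict` and `mse` are written for this example, and k-nearest neighbours is used here simply as a stand-in for "a machine learning model"; in practice you would more likely reach for a package such as scikit-learn.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def knn_predict(train_x, train_y, x, k=5):
    """Predict y at x as the mean y of the k nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return mean(train_y[i] for i in nearest)


def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return mean((a - p) ** 2 for a, p in zip(actual, predicted))


# Synthetic data (an assumption for illustration): y = 2x + 1 plus noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]

# Hold out the last 50 points as "unseen" data.
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

# Statistical modeling: the result is an equation you can report.
slope, intercept = fit_line(train_x, train_y)
print(f"fitted equation: y = {slope:.2f} * x + {intercept:.2f}")

# Machine learning style: no equation, only predictions scored on unseen data.
line_mse = mse(test_y, [slope * x + intercept for x in test_x])
knn_mse = mse(test_y, [knn_predict(train_x, train_y, x) for x in test_x])
print(f"held-out MSE, fitted line: {line_mse:.2f}")
print(f"held-out MSE, k-NN model:  {knn_mse:.2f}")
```

The fitted line gives coefficients you can write down and interpret, while the k-NN model can only be described by how well it predicts the held-out points.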