What is Statistical Data Analysis?

A free video tutorial from Minerva Singh
Bestselling Udemy Instructor & Data Scientist (Cambridge Uni)
4.3 instructor rating • 39 courses • 70,030 students

Learn more from the full course

Complete Data Science Training with Python for Data Analysis

Beginners python data analytics : Data science introduction : Learn data science : Python data analysis methods tutorial

12:49:50 of on-demand video • Updated July 2019

  • Python data analytics - Install Anaconda & Work Within The IPython/Jupyter Environment, A Powerful Framework For Data Science Analysis
  • Python Data Science - Become Proficient In Using The Most Common Python Data Science Packages Including NumPy, Pandas, Scikit & Matplotlib
  • Data analysis techniques - Be Able To Read In Data From Different Sources (Including Webpage Data) & Clean The Data
  • Data analytics - Carry Out Data Exploratory & Pre-processing Tasks Such As Tabulation, Pivoting & Data Summarizing In Python
  • Become Proficient In Working With Real Life Data Collected From Different Sources
  • Carry Out Data Visualization & Understand Which Techniques To Apply When
  • Carry Out The Most Common Statistical Data Analysis Techniques In Python Including T-Tests & Linear Regression
  • Understand The Difference Between Machine Learning & Statistical Data Analysis
  • Implement Different Unsupervised Learning Techniques On Real Life Data
  • Implement Supervised Learning (Both In The Form Of Classification & Regression) Techniques On Real Data
  • Evaluate The Accuracy & Generality Of Machine Learning Models
  • Build Basic Neural Networks & Deep Learning Algorithms
  • Use The Powerful H2O Framework For Implementing Deep Neural Networks
In this lecture I'm going to introduce you to what statistics are and their common uses and misuses. According to the Merriam-Webster dictionary, statistics is a branch of mathematics dealing with the collection, analysis, interpretation and presentation of masses of numerical data. Well, this is not strictly accurate, because we can also work with categorical or qualitative data, but that is something we will deal with later on in this course. The numerical data, or the data in question, are a sample or samples taken from a population of interest with a view to drawing inferences about that population. Usually, collecting data about an entire population (a census) is not possible. So representative samples are taken, inferences are drawn from them, and these are regarded as applicable to the entire population.
So the statistical population in question might be the population of a country, and different samples of voters may be taken to estimate voting intentions. Polling companies that publish voting intentions and things like that are very common in most countries where we have democratically held elections. Essentially, it is not possible to go to the entire population and ask them what their preferences are. So the next best thing they do is to collect, either by telephone or by email, what they think are representative samples of the entire population, ask them about their voting intentions and carry out mathematical analyses on these. Then, 10 days before an election, you might find that such-and-such candidate is poised to win.
So here are some bits about using statistics. The upshot is that while statistics are used in a variety of different fields (political forecasting, medicine, ecology), the samples used in all these cases are usually drawn to be representative of the entire population. So we don't use the entire population; we use the sample or samples to draw inferences.
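The idea of estimating a population's voting intention from a sample can be sketched in Python, the language used in this course. This is only an illustrative simulation: the function name `simulate_poll`, the true share of 0.55 and the sample size of 1000 are all made-up assumptions, and the 95% margin of error uses the usual normal approximation, which the lecture itself does not discuss.

```python
import math
import random


def simulate_poll(true_share, sample_size, seed=0):
    """Simulate a simple random sample of voters (hypothetical helper).

    true_share is the assumed fraction of the whole population that
    supports a candidate; in a real poll this value is unknown.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    # Each sampled voter supports the candidate with probability true_share.
    supporters = sum(1 for _ in range(sample_size) if rng.random() < true_share)
    estimate = supporters / sample_size
    # Approximate 95% margin of error (normal approximation).
    margin = 1.96 * math.sqrt(estimate * (1 - estimate) / sample_size)
    return estimate, margin


# Made-up numbers: 55% true support, 1000 people polled.
estimate, margin = simulate_poll(true_share=0.55, sample_size=1000)
print(f"Estimated support: {estimate:.3f} +/- {margin:.3f}")
```

The estimate from 1000 simulated respondents should land close to the assumed 0.55, which is why a well-drawn sample can stand in for the whole population. The sketch assumes every voter is equally likely to be sampled, which is exactly the assumption that fails when a sample is not representative.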
So here we have a very recent newspaper article, published on a Saturday in April 2017. It says that the Conservative Party in the United Kingdom is on course for a landslide victory in the election, a poll suggests. So the Observer and Opinium carried out a survey, and in that survey they spoke to people of the United Kingdom, or a sample that they believed represented the population of the United Kingdom in terms of its demographic, socio-economic and ethnic make-up. Based on that, they are predicting that the Tory party, or the Conservatives, will get a landslide victory in the forthcoming general election. That is something we will know on the 9th of June 2017.
Another survey was carried out, and it shows that one in seven Labour voters have turned Tory. So again, this particular newspaper commissioned a poll. Samples were collected and analyzed, and on the basis of that an inference has been drawn that, out of the entire population of Labour voters in the United Kingdom, one out of seven is going to vote for the Tories, or the Conservative Party.
There are other areas that also use statistics. This is a medical article, and it asks if food allergies are on the rise. In order to answer this question, they selected what they believed was a representative sample of the population of the United States of America, asked them about their food allergies and so on, and came to the conclusion that food allergies are on the rise.
This next case pertains to ecology. In a given study area in Asia, researchers looked at different habitats. They went to a forest, a degraded area and secondary growth, and they observed the number of species. They then produced a chart telling us about the number of avian species present in the different forests of that area. Obviously it is not possible to examine every forest in Asia, or every forest in a given territory. So they believe that these areas and forests represent the landscape.
And these are the number of species they believe they contain, based on this particular sample. So this is how we use statistics, and we use it for academic studies, like studies in ecology or medicine, or even when you open the newspaper. There is no getting past statistics. Now, while it's a valuable tool, this tool is also very easily and often misused, accidentally or deliberately. So now I'm going to talk a bit about the misuse of statistics, or statistical disasters.
Fifty-two percent. This particular figure, apart from representing the biggest upheaval post-war Europe has faced, is also a classic example of a statistical disaster. In the run-up to the United Kingdom's referendum to leave the European Union, also known as Brexit, many surveys were done about voting intentions and the data statistically analyzed. So different samples of voters were contacted (I was not), and they were asked about their voting intentions. After the data analysis was done, on June 22nd, one day or, well, less than 12 hours before the referendum opened for polling, this ComRes poll showed a significant lead for Remain. 30 minutes before polling closed on June 23, 2016, the UK was expected to remain. When we woke up on June 24, Leave, the people wanting to leave the European Union, had got 52 percent of the votes, compared to Remain's 48 percent.
And this was the BBC headline: Europe stunned by UK Leave vote. And I'm not surprised, because if two days before you read something of this sort, that the UK was expected to remain, and first thing in the morning on the 24th of June you read that the UK has decided to leave, then yes, it was stunning news, to say the least.
But just one more thing about this 52 percent. It does not represent 52 percent of the UK's population. And this is a very common way in which we end up misinterpreting or misusing statistics: so many people say, colloquially, that 52 percent of the UK's population has voted to leave.
No, it is 52 percent of the people who actually voted, either in person or by post. And there were eligibility criteria for who could vote. You had to be aged 18 or over, and you had to be either a citizen of the United Kingdom or a Commonwealth citizen; EU citizens resident in the UK were not allowed to vote. So out of the eligible people who voted, 52 percent voted for Brexit. And that is not 52 percent of the United Kingdom's 60-million-plus population.
Now, the reasons for such statistical disasters are either poorly collected, biased data or wrong statistical tests. The statistical tests in this case were accurate, because these are multimillion-dollar or multimillion-pound polling and survey firms. But in this case they ended up collecting biased data, which led to a major embarrassment for almost all the major polling firms in the United Kingdom.
This course is not going to focus on political polling and things like that. Most of the course will be devoted to learning about the different statistical tests that there are, how to implement them, and which test to implement when, which is also a very important thing to know. In the next lecture I'm going to tell you more about designing studies and collecting appropriate data, so that you don't end up with a biased sample and egg on your face like all these polling companies did.
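How a biased sample can flip a poll's conclusion, even when the arithmetic is done correctly, can be sketched with a small calculation. Everything here is hypothetical: the two voter groups, their support levels, their population shares and the helper name `expected_poll_share` are invented for illustration and are not figures from the referendum or from any polling firm.

```python
def expected_poll_share(group_support, sample_weights):
    """Expected support in a poll (hypothetical helper).

    group_support:  fraction of each group that supports an option
    sample_weights: how much of the sample comes from each group
    """
    total_weight = sum(sample_weights.values())
    return sum(
        group_support[group] * weight / total_weight
        for group, weight in sample_weights.items()
    )


# Made-up numbers: two groups with different levels of support for "Leave".
group_support = {"group_a": 0.40, "group_b": 0.70}

# A representative sample mirrors the groups' true share of voters.
representative_weights = {"group_a": 0.60, "group_b": 0.40}

# A biased sample reaches group_a far more often than it should.
biased_weights = {"group_a": 0.80, "group_b": 0.20}

true_share = expected_poll_share(group_support, representative_weights)
biased_share = expected_poll_share(group_support, biased_weights)

print(f"Representative sample: {true_share:.2f}")  # 0.52
print(f"Biased sample:         {biased_share:.2f}")  # 0.46
```

With these invented numbers the representative sample points to a 52 percent win, while the biased sample points to a loss at 46 percent, although the same formula is used in both cases. The error comes entirely from who was contacted, not from the analysis.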