Central Limit Theorem

Kirill Eremenko
A free video tutorial from Kirill Eremenko
Data Scientist

Learn more from the full course

Statistics for Business Analytics and Data Science A-Z™

Learn The Core Stats For A Data Science Career. Master Statistical Significance, Confidence Intervals And Much More!

06:02:26 of on-demand video • Updated April 2021

  • Understand what a Normal Distribution is
  • Understand standard deviations
  • Explain the difference between continuous and discrete variables
  • Understand what a sampling distribution is
  • Understand the Central Limit Theorem
  • Apply the Central Limit Theorem in practice
  • Apply Hypothesis Testing for Means
  • Apply Hypothesis Testing for Proportions
  • Use the Z-Score and Z-Tables
  • Use the t-Score and t-Tables
  • Understand the difference between a normal distribution and a t-distribution
  • Understand and apply statistical significance
  • Create confidence intervals
  • Understand the potential pitfalls of overusing p-Values
Hello, and welcome back to the course on business statistics. Today we're talking about the Central Limit Theorem. It's a super exciting topic, and I've got a very interesting tutorial prepared. More importantly, the Central Limit Theorem is said to be the most important theorem of statistics, and probably even of the whole of mathematics. The reason for that is how useful it is to us when we are observing the world, running experiments, assessing populations and so on, and how powerful it is in itself. Being powerful doesn't mean it has to be super complex. We'll break it down into simple steps and it'll all make total sense. At the end of the section you'll see a very applied example of how it is used, and along the way we'll mention a couple of examples of where else it is used in the world, so you'll see the whole puzzle start to come together and why it's so important.

All right, so let's get started. We stopped over here, where we have the population with its parameters and the sample with its statistics, and then we introduced the sampling distribution: we've taken lots and lots of different samples from our population, randomly, and we've recorded the sample mean every time. Now we are going to look at what that distribution looks like: what does the sampling distribution of the sample mean look like?

All right, so let's get rid of what we had here temporarily. Are you ready for this? The Central Limit Theorem states that the sampling distribution of the sample mean, given that you've taken enough samples and that each sample is reasonably large, will look like that: basically, it will be approximately a normal distribution. And that is regardless of what kind of distribution you had in the population. That is the real power.
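The statement above can be tried out with a small simulation. This is only a sketch under assumptions that are not in the tutorial: the population is taken to be exponential with mean 1, each sample has 50 observations, and 5,000 samples are drawn. The helper names (`sample_means`, `share_within_one_sd`) are made up for illustration. It uses one rough check of "looks normal": for a normal curve, about 68% of values fall within one standard deviation of the mean.

```python
import random
import statistics


def draw_exponential(rng):
    """One observation from the assumed skewed population (exponential, mean 1)."""
    return rng.expovariate(1.0)


def sample_means(draw, sample_size, num_samples, rng):
    """Take num_samples random samples of sample_size observations; return each sample's mean."""
    return [
        statistics.fmean([draw(rng) for _ in range(sample_size)])
        for _ in range(num_samples)
    ]


def share_within_one_sd(values):
    """Fraction of values within one standard deviation of their mean (about 0.68 for a normal curve)."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(abs(v - mu) <= sd for v in values) / len(values)


rng = random.Random(42)

# Raw observations from the population: a skewed, clearly non-normal shape
population_like = [draw_exponential(rng) for _ in range(20_000)]

# The sampling distribution of the sample mean
means = sample_means(draw_exponential, sample_size=50, num_samples=5_000, rng=rng)

print(round(share_within_one_sd(population_like), 2))  # noticeably above 0.68
print(round(share_within_one_sd(means), 2))            # close to 0.68
```

Swapping `draw_exponential` for any other population with a finite spread should give a similar second number, which is the "regardless of the population" part of the claim.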
So let's just reiterate what happened here. Let's start from the population, from the top, so I'm moving things to the left. Let's say we have a population and it has its own distribution. It could be the height of people, and for some reason, in the location that you're analyzing, there are two very different groups: very tall people, giants, and very, very short people. It might be on a different planet or something. It could be a completely different distribution, it could be an exponential distribution, it could be absolutely any type of distribution; it doesn't matter what it looks like. But we're going to use this example because it's very different from a normal distribution, just to make that point clear.

So you have a population with a distribution like that. If you take one sample from that distribution, the red part that we had here, what will it look like? Well, it could look something like this. It doesn't have to: samples are random, so you might get a lot from the left or from the right. But generally speaking, this is a valid sample that you could get from this population. As you can see, it kind of resembles the population a bit. Every single box here is an observation that you took from the population, and the more you take, the larger your sample, the closer it will generally be to your population distribution. But it doesn't have to be. Again, you could have got all of the boxes from here, just randomly, or all of the boxes from there.

And there are really two things that are quite restrictive about a sample. The first is that by taking a sample you might get something that resembles, in a way, your population distribution. But you don't know anything about that distribution, so how can you model it? How can you come up with equations for it, and so on? So even if the sample does resemble the population, that is not really helpful.
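To make the behaviour of a single sample concrete, here is a minimal sketch of the two-group height population. All numbers are assumptions for illustration only (a short group around 120 cm, giants around 230 cm, equal shares), and the function names are hypothetical. It prints a text histogram of one sample of 40 observations, then estimates how often a small sample of 5 lands entirely in one group.

```python
import random


def draw_height(rng):
    """One observation from an assumed two-group population (made-up numbers, in cm)."""
    if rng.random() < 0.5:
        return rng.gauss(120, 8)  # the very short group
    return rng.gauss(230, 8)      # the giants


def text_histogram(values, low=90, high=260, bins=17):
    """Return one line per bin: the bin's lower edge, then one '#' per observation in it."""
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        # Clamp out-of-range values into the first or last bin
        idx = min(max(int((v - low) / width), 0), bins - 1)
        counts[idx] += 1
    return [f"{low + i * width:5.0f} {'#' * c}" for i, c in enumerate(counts)]


rng = random.Random(7)

# One sample: each '#' is one "box", one observation taken from the population
one_sample = [draw_height(rng) for _ in range(40)]
print("\n".join(text_histogram(one_sample)))

# How often does a small sample of 5 land entirely in one group?
trials = 20_000
one_sided = 0
for _ in range(trials):
    small = [draw_height(rng) for _ in range(5)]
    if all(h < 175 for h in small) or all(h > 175 for h in small):
        one_sided += 1
print(one_sided / trials)  # theory for this setup: 2 * 0.5**5 = 0.0625
```

The histogram usually shows two clumps, like the population, but nothing guarantees it, and a small sample misses one group entirely a noticeable fraction of the time.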
And the second thing is that there's a chance you'll get something different, just by chance, because of the way your sample was picked. You might get something that's all over here or all over there. So it's completely unreliable in that sense, given that you don't know your underlying population distribution, which can be anything at all. It doesn't have to be a normal distribution.

But then when you take the sampling distribution, what happens is basically magic: it's always going to be approximately a normal distribution. That's what the Central Limit Theorem states, and that's just scratching the surface of the Central Limit Theorem, kind of the overall concept. There's much more to it than that, and we'll talk about it in the next tutorial. But just that in itself is a super powerful concept, and the reason it is so important is that we can apply it in so many different ways in our lives and in the world.

Take, for instance, a book, and look at the length of the words in it. They won't be distributed normally, no way. You will have very short words with one letter, for instance "I" or "a". Then the length of words peaks at around four letters, because you have a lot of words such as "the" and "and" that are used very commonly. And then it drops off, but you can have words which are up to 10 or 12 letters long, and so on. So it goes up and then tails off like that. It's not a normal distribution. But at the same time, take the average length of all of the words on every single page of the book. So basically each page is your sample: you take the average length of all of the words on the page, and you do that for every single page in the book. Then you look at the sampling distribution of those averages. It will be approximately a normal distribution.

Another example of how powerful the Central Limit Theorem is comes from biological systems.
Why is it the case that often, in biological systems, researchers can treat things that are happening as if they were normally distributed? Well, the reason is that in biological systems, when you have something happening, for instance you're observing what kind of effect some sort of medicine has on a human being, that is the result of thousands and thousands and thousands of random events. We might not know the underlying distribution of those events in the human body. But what we do know is that if we treat all of those events as samples from some distribution, and then we take the sampling distribution of the sample mean, so, as we can see here, you take the means for all of those events, then it will be distributed approximately normally. And that is super powerful. Of course, it's much more complex than that, and I'm not a researcher in medicine or biological systems, so I can't comment on it thoroughly. But that is the intuition behind what is going on there and why the Central Limit Theorem has so many applications.

For us, of course, it's more important in business scenarios. There are cases when you can extract additional information about business events using the power of the Central Limit Theorem, and people around you will be like, "Whoa, how did you just do that? What just happened?" But because you know the beauty and the power of the Central Limit Theorem, if you train yourself up, you will know exactly when you should apply it and how you can get information and insights when others simply don't know what to do.

And on that note, that was the intro to the Central Limit Theorem. In the next tutorial we'll dive a bit deeper and have a bit of a play around with some practical stuff. I hope you enjoyed today's tutorial, and I look forward to seeing you next time. Until then, happy analyzing.
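The book example from this tutorial can also be sketched in Python. Everything here is an assumption for illustration: the word-length frequencies are made up rather than measured from a real book, the "book" has 800 pages of 300 words each, and words are drawn independently, which real text does not do. The hypothetical `skewness` helper measures lopsidedness: it is clearly positive for a shape with a long right tail and near zero for a symmetric, bell-like shape.

```python
import random
import statistics


def skewness(values):
    """Average cubed z-score: about 0 for a symmetric shape, positive for a long right tail."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mu) / sd) ** 3 for v in values])


# Made-up word-length frequencies (assumption): a few one-letter words,
# a peak at three to four letters, then a long tail out to 12 letters
lengths = list(range(1, 13))
weights = [3, 17, 21, 18, 12, 9, 7, 5, 4, 2, 1, 1]

rng = random.Random(2021)

# Each page is one sample of 300 word lengths
pages = [rng.choices(lengths, weights=weights, k=300) for _ in range(800)]

all_words = [n for page in pages for n in page]
page_averages = [statistics.fmean(page) for page in pages]

print(round(skewness(all_words), 2))      # clearly positive: long right tail
print(round(skewness(page_averages), 2))  # near zero: roughly symmetric
```

The individual word lengths keep their lopsided shape, while the per-page averages come out roughly symmetric around the overall mean length.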