Central Limit Theorem

Kirill Eremenko
A free video tutorial from Kirill Eremenko
Data Scientist

Statistics for Business Analytics and Data Science A-Z™

Learn The Core Stats For A Data Science Career. Master Statistical Significance, Confidence Intervals And Much More!

06:02:26 of on-demand video • Updated July 2021

  • Understand what a Normal Distribution is
  • Understand standard deviations
  • Explain the difference between continuous and discrete variables
  • Understand what a sampling distribution is
  • Understand the Central Limit Theorem
  • Apply the Central Limit Theorem in practice
  • Apply Hypothesis Testing for Means
  • Apply Hypothesis Testing for Proportions
  • Use the Z-Score and Z-Tables
  • Use the t-Score and t-Tables
  • Understand the difference between a normal distribution and a t-distribution
  • Understand and apply statistical significance
  • Create confidence intervals
  • Understand the potential pitfalls of overusing p-Values
Hello and welcome back to the course on business statistics. Today we're talking about the central limit theorem. It's a super exciting topic, and I've got a very interesting tutorial prepared. More importantly, the central limit theorem is said to be the most important theorem of statistics, and probably even of the whole of mathematics. The reason for that is how useful it is to us when we are observing the world, running experiments, assessing populations and so on, and how powerful it is in itself. Being powerful doesn't mean it has to be super complex. We'll break it down into simple steps just now and it'll all make total sense. At the end of the section, you'll see a very applied example of how it is used, and along the way we'll mention a couple of examples of where else it is used in the world. You'll see the whole puzzle start to come together and why it's so important.
All right, so let's get started. We stopped over here last time, where we have the population with its parameters and the sample with its statistics, and then we introduced the sampling distribution, where we've taken lots and lots of different samples from our population randomly and recorded the sample mean every time. Now we're going to look at what that distribution looks like: what does the sampling distribution of the sample mean look like?
All right, so let's get rid of this temporary graph that we had. Are you ready for this? The central limit theorem states that the sampling distribution of the sample mean, given that you've taken enough samples, will look like that. Basically, it'll be a normal distribution. And that is regardless of what kind of distribution you had in the population. That is the real power. So let's just reiterate what happened here.
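The statement above can be checked with a small simulation. This is only a rough sketch and not part of the course material. It assumes an exponential population with mean 1 (strongly right-skewed), samples of 50 observations, and 5,000 repeated samples. The helper names `sample_means` and `skewness` are made up for illustration. Skewness close to 0 is used here as a crude sign of a symmetric, bell-like shape. The result is approximately normal only when each sample is reasonably large.

```python
import random
import statistics


def sample_means(draw, sample_size, num_samples, rng):
    """Take `num_samples` random samples of `sample_size` observations
    each and return the mean of every sample."""
    return [
        statistics.fmean(draw(rng) for _ in range(sample_size))
        for _ in range(num_samples)
    ]


def skewness(values):
    """Skewness of the data: roughly 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)


rng = random.Random(0)  # fixed seed so the run is repeatable


def draw_exponential(r):
    """Assumed population: exponential with mean 1, far from normal."""
    return r.expovariate(1.0)


# Raw observations, to show what the population itself looks like.
raw_observations = [draw_exponential(rng) for _ in range(20_000)]

# The sampling distribution of the sample mean.
means = sample_means(draw_exponential, sample_size=50, num_samples=5_000, rng=rng)

print("skewness of raw observations:", round(skewness(raw_observations), 2))
print("skewness of sample means:    ", round(skewness(means), 2))
print("mean of sample means:        ", round(statistics.fmean(means), 3))
```

The raw observations should come out heavily skewed, while the sample means should be much closer to symmetric and centered near the population mean of 1.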
Let's start from the population, from the top. So let's move these things to the left. Let's say we have a population and it has its own distribution. It could be the height of people, and for some reason, in this location that you're analyzing, the distribution has two very different peaks like that. So there are very tall people, giants, and very, very short people. It might be on a different planet or something. It could be a completely different distribution. It could be an exponential distribution; it could be absolutely any type of distribution. It doesn't matter what it looks like, but we're going to use this example because it's very different from a normal distribution, just to make that point clear.
So you have a population with a distribution like that. Then, if you take one sample from that distribution (the red part over here that we had), what will it look like? Well, it could look something like this. It doesn't have to: samples are random, so you might get a lot from the left or from the right. But generally speaking, this is a valid sample that you could get from this population, and as you can see, it kind of resembles the population a bit. Every single box here is an observation that you took from your population. The more you take, the larger your sample, the closer it will generally be to your population distribution. But it doesn't have to be. Again, you could have just got all of the boxes from here, just randomly, or all of the boxes from here.
And those are already two things that are quite restrictive about the sample. First, by taking the sample, you might get something that resembles, in a way, your population distribution, but you don't know anything about that distribution, so how can you model it? How can you come up with equations for it and so on? So even if the sample does resemble it, that's not really helpful.
And the second thing is that there's a chance that you'll get something different. Just by chance, because of the random way your sample is drawn, you might get something that's all over here or all over there. So it's completely unreliable in that sense, given that you don't know your underlying population distribution, which can be anything; it doesn't have to be a normal distribution.
But then, when you take the sampling distribution, what happens is magic. Basically, it's always going to be a normal distribution. That's what the central limit theorem states, and that's just scratching the surface of the central limit theorem. It's the overall concept. There's much more to it, and we'll talk about that in the next tutorial. But just that in itself is a super powerful concept, and the reason it is so important is that we can apply it in so many different ways in our lives and in the world.
Take, for instance, a book and look at the length of the words in your book. They won't be distributed normally, no way. You will have very short words of one letter, for instance "I" or "a". Then the length of words peaks at around four letters, because you have a lot of words such as "the" and "and" that are used very commonly. And then it drops off, but you can have words which are up to 10 or 12 letters long and so on. So it goes up and then goes down like that. It's not a normal distribution. But at the same time, take the average length of all of the words on every single page of the book: basically, your page is your sample, and you take the average length of all of the words on your page, and then you do that for every single page in the book. Then look at the sampling distribution of those averages that you got. It will be a normal distribution.
Another example of how powerful the central limit theorem is comes from biological systems.
Why is it the case that often, in biological systems, researchers can treat the things that are happening as if they were normally distributed? Well, the reason is that in biological systems, when you have something happening, for instance you're observing what kind of effect some sort of medicine has on a human being, that is the result of thousands and thousands of random events. And even though we might not know the underlying distribution of those events in the human body, what we do know is that if we treat all those events as samples from a certain distribution and take the means, as we can see here, then the sampling distribution of the sample mean will be distributed normally. And that is super powerful. Of course, it's much more complex than that, and I'm not a researcher in medicine or biological systems, so I can't comment on that thoroughly. But that is the intuition behind what is going on there and why the central limit theorem has so much application.
For us, of course, it's more important in business scenarios. There are cases when you can extract additional information about business events using the power of the central limit theorem, and people around you will be like, "Whoa, how did you just do that? What just happened?" But because you know the beauty and the power of the central limit theorem, if you train yourself up, you will know exactly when you should apply it and how you can get information or insights in situations where others simply don't know what to do.
On that note, that was the intro to the central limit theorem. In the next tutorial we'll go a bit deeper and have a bit of a play around with some practical stuff. So I hope you enjoyed today's tutorial, and I look forward to seeing you next time.
Until then, happy analyzing.
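As a recap of the book example from this tutorial, here is a minimal sketch that treats every page as one sample and looks at the averages across pages. Everything in it is an assumption for illustration: the word-length weights in `WEIGHTS` are invented (not measured from a real book), each simulated page has 300 words, and there are 1,000 pages. For a normal distribution, roughly 68% of values fall within one standard deviation of the mean and roughly 95% within two, so the page averages should land close to those shares.

```python
import random
import statistics

# Hypothetical word-length frequencies (1 to 12 letters). These weights
# are illustrative only: rising to a peak at short words, then a long tail.
LENGTHS = list(range(1, 13))
WEIGHTS = [4, 16, 22, 20, 12, 9, 7, 4, 3, 2, 0.7, 0.3]


def page_average(rng, words_per_page=300):
    """Average word length on one simulated page (one sample)."""
    words = rng.choices(LENGTHS, weights=WEIGHTS, k=words_per_page)
    return statistics.fmean(words)


rng = random.Random(1)  # fixed seed so the run is repeatable
page_means = [page_average(rng) for _ in range(1_000)]  # one mean per page

center = statistics.fmean(page_means)
spread = statistics.stdev(page_means)


def share_within(k):
    """Share of page averages within k standard deviations of the center."""
    inside = sum(abs(m - center) <= k * spread for m in page_means)
    return inside / len(page_means)


print("average of page averages:", round(center, 2))
print("share within 1 sd:", round(share_within(1), 3))
print("share within 2 sd:", round(share_within(2), 3))
```

Individual word lengths here are skewed and discrete, yet the per-page averages cluster symmetrically around the overall average word length.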