Central Limit Theorem

A free video tutorial from Kirill Eremenko
DS & AI Instructor


Statistics for Business Analytics and Data Science A-Z™

Learn The Core Stats For A Data Science Career. Master Statistical Significance, Confidence Intervals And Much More!

06:01:17 of on-demand video • Updated May 2024

Understand what a Normal Distribution is
Understand standard deviations
Explain the difference between continuous and discrete variables
Understand what a sampling distribution is
Understand the Central Limit Theorem
Apply the Central Limit Theorem in practice
Apply Hypothesis Testing for Means
Apply Hypothesis Testing for Proportions
Use the Z-Score and Z-Tables
Use the t-Score and t-Tables
Understand the difference between a normal distribution and a t-distribution
Understand and apply statistical significance
Create confidence intervals
Understand the potential pitfalls of overusing p-Values
Hello and welcome back to the course on business statistics. Today we're talking about the Central Limit Theorem. It's a super exciting topic, and I've got a very interesting tutorial prepared. More importantly, the Central Limit Theorem is often called the most important theorem in statistics, and some would even say in all of mathematics. The reason is how useful it is to us when we observe the world, run experiments, assess populations and so on, and how powerful it is in itself. Being powerful doesn't mean it has to be super complex, though. We'll break it down into simple steps and it'll all make sense. At the end of the section you'll see a very applied example of how it's used, and along the way we'll mention a couple of other places where it's used in the world, so the whole puzzle will start to come together and you'll see why it's so important. All right, let's get started. We stopped here last time: we have the population with its parameters and the sample with its statistics. Then we introduced the sampling distribution, where we took lots and lots of different random samples from our population and recorded the sample mean each time. Now we're going to look at what that distribution, the sampling distribution of the sample mean, actually looks like. So let's get rid of the temporary graph we had. Are you ready for this? The Central Limit Theorem states that, provided each sample is large enough, the sampling distribution of the sample mean will look like this: it will be approximately a normal distribution. And that holds regardless of what distribution the population has (as long as its variance is finite). That is the real power of the theorem. So let's reiterate what happened here.
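For readers who want the formal version, here is the classical (Lindeberg-Lévy) statement in standard textbook notation. The symbols are assumptions of this sketch rather than the tutorial's own: each sample consists of n independent draws X_1, ..., X_n from the same population with mean μ and finite variance σ².

```latex
% X_1, \dots, X_n: independent draws from one population,
% E[X_i] = \mu, \operatorname{Var}(X_i) = \sigma^2 < \infty
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{\;d\;} \mathcal{N}(0, 1)
\quad \text{as } n \to \infty,
\qquad \text{so} \qquad
\bar{X}_n \approx \mathcal{N}\!\left(\mu, \frac{\sigma^2}{n}\right) \text{ for large } n.
```

The second expression is the precise meaning of "the sampling distribution of the sample mean is approximately normal": the shape of the population enters only through μ and σ.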
Let's start from the population, at the top. Let's move these things to the left. Say we have a population and it has its own distribution. It could be height, the height of people. And for some reason, in the location you're analyzing, there are two very different groups, so the distribution has two peaks like this: very tall people, giants, and very, very short people. Maybe it's on a different planet or something. It could just as well be a completely different distribution, for example an exponential distribution. Absolutely any type of distribution; it doesn't matter what it looks like. We're using this example because it's very different from a normal distribution, just to make the point clear. So you have a population with a distribution like that. Now, if you take one sample from this distribution (the red part we had over here), what will it look like? Well, it could look something like this. It doesn't have to: samples are random, so you might get a lot of observations from the left or a lot from the right. But generally speaking, this is a valid sample that you could get from this population. As you can see, it resembles the population a bit. Every single box here is an observation you took from your population, and the more you take, that is, the larger your sample, the closer it will generally be to the population distribution. But it doesn't have to be. Again, you could just by chance get all of the boxes from here, or all of them from there. So there are really two things that are quite restrictive about a single sample. First, by taking the sample you might get something that in a way resembles your population distribution, but you don't know anything about that distribution, so how can you model it? How can you come up with equations for it, and so on?
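To make the single-sample point concrete, here is a small Python sketch using only the standard library. The bimodal "height" population (short people around 120 cm, giants around 220 cm) and the sample size of 30 are made-up assumptions for illustration. It draws one sample, counts how many observations landed in each cluster, and then draws a second sample to show that its mean comes out different.

```python
import random
import statistics

# Hypothetical bimodal "height" population (cm): half very short people,
# half giants. The means, spreads and size are invented for illustration.
def make_population(size, rng):
    short = [rng.gauss(120, 8) for _ in range(size // 2)]
    tall = [rng.gauss(220, 8) for _ in range(size - size // 2)]
    return short + tall

rng = random.Random(42)
population = make_population(10_000, rng)

# One random sample (without replacement) of 30 observations.
sample = rng.sample(population, 30)

# Count how many observations fell into each cluster (170 cm splits them).
n_short = sum(1 for h in sample if h < 170)
n_tall = len(sample) - n_short
print(f"short: {n_short}, tall: {n_tall}")
print(f"sample mean: {statistics.mean(sample):.1f} cm")

# A second sample generally gives a different split and a different mean.
other = rng.sample(population, 30)
print(f"another sample mean: {statistics.mean(other):.1f} cm")
```

Changing the seed passed to `random.Random` changes the split between the two clusters from run to run, which is the unreliability of a single sample discussed in the tutorial.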
So even if the sample resembles the population, that's not really helpful. The second thing is that you may get something quite different just by chance, because of the way your sample was picked: you might get observations that are all over here or all over there. So in that sense a single sample is completely unreliable, given that you don't know the underlying population distribution, which can be anything. It doesn't have to be a normal distribution. But when you take the sampling distribution, magic happens: it's always going to be approximately a normal distribution, and the approximation gets better as the sample size grows. That's what the Central Limit Theorem states, and that's just scratching the surface. That's the overall concept; there's much more to it, and we'll talk about that in the next tutorial. But that in itself is a super powerful concept, and it's so important because we can apply it in so many different ways in the world. Take, for instance, a book and look at the lengths of the words in it. They won't be normally distributed at all. You'll have very short words with one letter, for instance "I" or "a". Then the word length peaks at around three or four letters, because words such as "the" and "and" are used very commonly, and then it drops off. But you can also have words that are ten or 12 letters long and so on. So the distribution goes up and then comes down with a long tail; it's not a normal distribution. At the same time, take the average length of all the words on one page (so each page is your sample), and then do that for every single page in the book. If you look at the sampling distribution of those averages, it will be approximately a normal distribution.
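The book example can be simulated in a few lines of Python. This is a sketch under stated assumptions: the word-length frequencies in `WEIGHTS`, the 250 words per page and the 2,000 pages are invented, and pages are treated as independent random samples of words, which a real book only approximates. The helper names `page_average` and `skewness` are hypothetical. It compares the skewness of individual word lengths (clearly positive, a long right tail) with that of the page averages (close to 0, as for a symmetric normal curve), and checks the spread of the averages against σ/√n.

```python
import random
import statistics

# Hypothetical word-length distribution (letters -> relative frequency).
# Peaks around 3 letters with a long right tail; the weights are
# illustrative, not measured from a real book.
LENGTHS = list(range(1, 13))
WEIGHTS = [4, 17, 20, 16, 11, 9, 8, 6, 4, 2, 2, 1]
WORDS_PER_PAGE = 250

def page_average(rng, words_per_page=WORDS_PER_PAGE):
    """Average word length on one simulated page (one sample)."""
    words = rng.choices(LENGTHS, weights=WEIGHTS, k=words_per_page)
    return statistics.mean(words)

def skewness(xs):
    """Population-style skewness: roughly 0 for a symmetric shape."""
    m = statistics.mean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

rng = random.Random(0)
# Individual word lengths: the skewed population shape.
words = rng.choices(LENGTHS, weights=WEIGHTS, k=50_000)
# Sampling distribution of the page mean: one average per page.
averages = [page_average(rng) for _ in range(2_000)]

print(f"skewness of word lengths:  {skewness(words):.2f}")
print(f"skewness of page averages: {skewness(averages):.2f}")
# The CLT also predicts the spread of the averages: sigma / sqrt(n).
predicted_sd = statistics.pstdev(words) / WORDS_PER_PAGE ** 0.5
print(f"predicted sd of averages: {predicted_sd:.3f}")
print(f"observed sd of averages:  {statistics.pstdev(averages):.3f}")
```

Plotting a histogram of `averages` (for example with a plotting library of your choice) should show the familiar bell shape, even though a histogram of `words` is lopsided.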
Another example of how powerful the Central Limit Theorem is comes from biological systems. Why can researchers often treat what happens in biological systems as if it were normally distributed? Say you're observing what effect some medicine has on a human being. That effect is the result of thousands and thousands of small random events. We might not know the underlying distribution of those events in the human body, but if we treat them as roughly independent draws from some distribution and the overall effect as their combined (summed or averaged) outcome, the Central Limit Theorem tells us that this outcome will be approximately normally distributed. And that is super powerful. Of course, it's much more complex than that, and I'm not a researcher in medicine or biological systems, so I can't comment on it thoroughly. But that's the intuition behind what's going on there and why the Central Limit Theorem has so many applications. For us, of course, it's more important in business scenarios. There are cases where you can extract additional information from business events using the power of the Central Limit Theorem, and people around you will be like, "Whoa, how did you just do that? What just happened?" Because you know the beauty and the power of the Central Limit Theorem, if you train yourself up, you'll know exactly when to apply it and how to get information or insights in situations where others simply don't know what to do. And on that note, that was the intro to the Central Limit Theorem. In the next tutorial, we'll dive a bit deeper and have a bit of a play around with some practical stuff.
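The biological intuition, an outcome built from many small random events, can be sketched the same way. This is a toy model, not a biological one: each "response" is assumed to be the sum of 200 independent effects drawn from an exponential distribution (one of the non-normal shapes mentioned earlier), and the function name `response` is hypothetical. If the Central Limit Theorem applies, roughly 68% of responses should fall within one standard deviation of the mean and roughly 95% within two, as for a normal distribution.

```python
import random
import statistics

# Toy model (an assumption for illustration, not a biological model):
# a response is the sum of many small, independent, skewed effects,
# each drawn from an exponential distribution with mean 1.
N_EFFECTS = 200

def response(rng, n_effects=N_EFFECTS):
    return sum(rng.expovariate(1.0) for _ in range(n_effects))

rng = random.Random(7)
responses = [response(rng) for _ in range(2_000)]

m = statistics.mean(responses)
s = statistics.stdev(responses)

# For a normal distribution, about 68.3% of values lie within 1 sd
# of the mean and about 95.4% within 2 sd.
within_1 = sum(abs(x - m) <= s for x in responses) / len(responses)
within_2 = sum(abs(x - m) <= 2 * s for x in responses) / len(responses)
print(f"mean response: {m:.1f} (expected about {N_EFFECTS})")
print(f"within 1 sd: {within_1:.1%}  (normal: ~68.3%)")
print(f"within 2 sd: {within_2:.1%}  (normal: ~95.4%)")
```

A single exponential effect is strongly skewed, so the near-normal percentages come from adding many effects together, not from the shape of any one of them.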
So I hope you enjoyed today's tutorial and I look forward to seeing you next time. Until then, happy analyzing.