Understand the fundamentals of statistics
Learn how to work with different types of data
Plot different types of data
Calculate the measures of central tendency, asymmetry, and variability
Calculate correlation and covariance
Distinguish and work with different types of distributions
Estimate confidence intervals
Perform hypothesis testing
Make data-driven decisions
Understand the mechanics of regression analysis
Carry out regression analysis
Use and understand dummy variables
Understand the concepts needed for data science, even when working with Python and R!
Before we can talk about hypothesis testing, we have
to learn what a distribution is. In this lesson, we’ll do just that!
In statistics, when we use the term distribution, we usually mean a probability distribution.
Good examples are the Normal distribution, the Binomial distribution, and the Uniform
distribution. Alright. Let’s start with a definition!
A distribution is a function that shows the possible values for a variable and how often
they occur. Think about a fair die.
It has six sides, numbered from 1 to 6. We roll the die. What is the probability of getting
1? It is one out of six, so one-sixth, right?
Easy. What is the probability of getting 2? Once
again - one-sixth. The same holds for 3, 4, 5 and 6.
We have an equal chance of getting each of the 6 outcomes.
Now. What is the probability of getting a 7?
It is impossible to get a 7 when rolling a single die.
Therefore, the probability is 0. Okay. Let’s generalize.
The distribution of an event consists not only of the values that we happen to observe
but of all possible values. So, the distribution of the event - rolling a
die - is given by the following table. The probability of getting 1 is one-sixth,
or approximately 0.17, the probability of getting 2 is also approximately 0.17, and so on.
We can be sure that we have exhausted all possible values when the sum of their probabilities
is equal to 1, or 100%. As with getting
a 7, the probability of occurrence is 0 for all other values.
And that’s the probability distribution of rolling a die. By the way, it is called
a discrete uniform distribution. All outcomes have an equal chance of occurring.
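To make this concrete, here is a minimal Python sketch of the die example. The function name die_distribution and the choice to store exact fractions in a dictionary are our own illustrative assumptions, not part of the lesson. The sketch simply checks the three facts above: each face has a probability of one-sixth, a 7 has a probability of 0, and the probabilities sum to 1.

```python
from fractions import Fraction


def die_distribution(sides=6):
    """Map each face of a fair die to its probability (hypothetical helper)."""
    return {face: Fraction(1, sides) for face in range(1, sides + 1)}


dist = die_distribution()

# Every listed outcome has the same probability: one-sixth.
print(dist[1], dist[6])

# A value that is not a possible outcome has probability 0.
print(dist.get(7, 0))

# The probabilities of all possible outcomes add up to 1.
print(sum(dist.values()))
```

Exact fractions are used here only so that the sum comes out as exactly 1; with rounded values such as 0.17 it would not.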
Okay. Each probability distribution has a visual
representation. It is a graph describing the likelihood of occurrence of every event. Here’s
the graph for our example. It is crucial to understand that the graph
is JUST a visual representation of a distribution. Often, when we talk about distributions, we
make use of the graph. That’s why many people believe that a distribution is the graph itself.
However, this is NOT true. A distribution is defined by the underlying probabilities
and not the graph. The graph is just a visual representation.
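One way to see this point is to build the picture from the probabilities. The sketch below is only an illustration under our own assumptions: the helper text_bar_chart is a hypothetical name, and plain text bars stand in for a real chart. The distribution lives in the probabilities dictionary; the chart is derived from it and could be redrawn in any other style without changing the distribution.

```python
# The distribution itself: outcome -> probability for a fair six-sided die.
probabilities = {face: 1 / 6 for face in range(1, 7)}


def text_bar_chart(dist, width=30):
    """Render a distribution as text bars scaled to the largest probability."""
    largest = max(dist.values())
    lines = []
    for outcome, p in dist.items():
        bar = "#" * round(width * p / largest)
        lines.append(f"{outcome}: {bar} {p:.2f}")
    return lines


# The chart is just one possible view of the probabilities above.
chart = text_bar_chart(probabilities)
print("\n".join(chart))
```

Because all six outcomes are equally likely, every bar has the same length, which is what the graph of a discrete uniform distribution looks like.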
Alright. After this short clarification, let’s explore a different example.
Think about rolling two dice. What are the possible outcomes?
One and one, two and one, one and two, and so on. Here’s a table with all the possible
combinations. Say we are playing a game, where we are trying
to guess the sum of the two dice. What’s the probability of getting a sum
of 1? It’s 0, as this event is impossible. The minimum sum we can get is 2. So, what’s
the probability of getting a sum of 2? There is only one combination that would give us
a sum of 2 – when both dice are equal to 1.
Each die has 6 sides, so there are 6 times 6, or 36, equally likely combinations.
So, 1 out of 36 total outcomes, or approximately 0.03. Similarly, the probability of getting a sum
of 3 is given by the number of combinations that give a sum of 3 divided by 36. Therefore,
2 divided by 36, or approximately 0.06. We can continue in this way until we have
the full probability distribution. Let’s see the graph associated with it.
Looking at it, we can easily see that, when rolling two dice, the probability of
getting a sum of 7 is the highest. Moreover, we can also compare different outcomes, such as the
probability of getting a 10 and the probability of getting a 5. It’s evident that
we are less likely to get a 10. Great!
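The counting described above can be sketched in a few lines of Python. This is an illustration under our own assumptions (the function name two_dice_sum_distribution is hypothetical): it lists all 36 combinations, counts how many give each sum, and divides each count by 36.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def two_dice_sum_distribution(sides=6):
    """Map each possible sum of two fair dice to its probability."""
    faces = range(1, sides + 1)
    # Count how many of the sides * sides combinations produce each sum.
    counts = Counter(a + b for a, b in product(faces, repeat=2))
    total = sides ** 2
    return {s: Fraction(c, total) for s, c in sorted(counts.items())}


sum_dist = two_dice_sum_distribution()

# Print each sum with its exact and rounded probability.
for s, p in sum_dist.items():
    print(f"{s:2d}  {str(p):>5}  {float(p):.2f}")

# The most likely sum, and the comparison between a 10 and a 5.
most_likely = max(sum_dist, key=sum_dist.get)
print(most_likely, sum_dist[10] < sum_dist[5])
```

A sum of 1 never appears in the result, which matches the fact that its probability is 0.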
The examples that we saw here were of discrete variables. Next, we will focus on continuous
distributions, as they are more common in statistical inference.
In the next few lessons, we’ll examine some of the main types of continuous distributions,
starting with the Normal distribution. See you there!