Udemy

What is Hypothesis Testing?

A free video tutorial from Minerva Singh
Bestselling Instructor & Data Scientist (Cambridge Uni)
Instructor rating: 4.4 out of 5
60 courses
142,985 students

Learn more from the full course

Applied Statistical Modeling for Data Analysis in R

Your Complete Guide to Statistical Data Analysis and Visualization For Practical Applications in R

09:55:31 of on-demand video • Updated November 2024

Analyze their own data by applying appropriate statistical techniques
Interpret the results of their statistical analysis
Identify which statistical techniques are best suited to their data and questions
Have a strong foundation in fundamental statistical concepts
Implement different statistical analyses in R and interpret the results
Build intuitive data visualizations
Carry out formalized hypothesis testing
Implement linear modelling techniques such as multiple regression and GLMs
Implement advanced regression analysis and multivariate analysis
So before we move on to the practical aspects of this section, I'm going to discuss what hypothesis testing is. It is a technique for using data to validate a claim about a population in a formalized manner. Suppose we want to examine the claim that the population mean, the average time for getting a Domino's pizza delivered to your home, is 30 minutes. These are the kinds of claims we can examine using hypothesis testing. In hypothesis testing, we collect data from a sample of the population and derive a test statistic, something we are going to see further on in this section, to measure the claim about the population parameter. Before we derive the test statistic, we start from a baseline claim known as the null hypothesis, or H naught, which says that no significant difference exists between the specified populations, or that there is no relationship between the variables.

After we compute the test statistic, we weigh the strength of the evidence, and for that we compute something known as a P value. If the P value is less than 0.05, there is strong evidence against the null hypothesis and the results are statistically significant; that is, there are statistically significant differences between the specified populations. If the P value is greater than 0.05, we do not reject the null hypothesis. This cutoff value of 0.05 is the most commonly used in the literature, although you can use values like 0.01 or 0.1; across most disciplines, if your P value is less than 0.05, your results can be taken to be statistically significant.

So now I'm going to talk you through using P values. First we compute a test statistic, from which the P value is derived, and exactly how that works is something we will cover further on. If the test statistic is large enough in either direction, we reject the null hypothesis. But the question is, how far is far enough? If our data are normally distributed, or we have a large enough sample, then we assume a normal bell curve with a mean of zero and a standard deviation of one. If the null hypothesis is true, then 95% of test samples will produce a test statistic lying within two standard errors of the claimed value, in the central part of the curve. Beyond that, out in the tails, we reject the null hypothesis.

The simple definition of a P value is the probability associated with the test statistic: the further out the test statistic lies in the tail of the Z distribution, the smaller the P value. It is the probability of obtaining an effect as extreme as the one in our sample data, assuming the null hypothesis is true. So if our P value is 0.03, we would obtain such an observed difference, or a larger one, in only 3% of studies purely through random sampling error. In other words, if there were truly no difference between the populations, a difference as large as the one we are observing would arise from sampling error only about 3% of the time.

So when we ask how far out, and at what level, we reject the null hypothesis, we use a cutoff for rejecting or not rejecting the null hypothesis. This cutoff has been established in the literature and is much the same across disciplines. We select a cutoff point called the alpha level, and that alpha level sets the significance level of the test.
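As a minimal sketch of this workflow in R, here is what a one-sample test of the 30-minute delivery claim might look like. The delivery times below are simulated with made-up numbers purely for illustration, and the t-test is just one common choice of test; the course covers the appropriate tests for different data later in this section.

    # Simulated sample of 40 delivery times in minutes (illustrative data only)
    set.seed(42)
    delivery_times <- rnorm(40, mean = 32, sd = 6)

    # H0: the population mean delivery time is 30 minutes
    # H1: the population mean delivery time differs from 30 minutes
    result <- t.test(delivery_times, mu = 30)

    result$statistic   # the test statistic
    result$p.value     # the P value associated with it

    # For a two-sided test at alpha = 0.05 on the standard normal curve,
    # the rejection region starts roughly two standard errors out:
    qnorm(c(0.025, 0.975))   # approximately -1.96 and +1.96

    # Compare the P value against the conventional alpha = 0.05 cutoff
    if (result$p.value < 0.05) {
      print("Reject the null hypothesis: the mean differs from 30 minutes")
    } else {
      print("Do not reject the null hypothesis")
    }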
So if the P value falls below the 0.05 cutoff, we reject the null hypothesis and conclude that the differences are statistically significant. But since most of statistics revolves around samples, because we can rarely access the full population, there is scope for error, and there are two types of errors in hypothesis testing. A Type I error is the false-alarm kind of error: rejecting the null hypothesis when we should not. A Type II error is a missed detection: not rejecting the null hypothesis when we should, and a large enough sample reduces the chance of this. The ability of a test to detect a real effect, that is, to correctly reject a false null hypothesis, is the power of the hypothesis test. If these things feel a bit confusing to you, further on in this section we are going to look at the different ways of carrying out formalized hypothesis testing for different kinds of data samples. And finally, we are going to see how we can have confidence in our hypothesis testing and look at the power of the test.
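To get a rough feel for what the power of a test means in practice, here is a small sketch using R's built-in power.t.test function. The effect size, standard deviation, and sample size are made-up illustrative values, not figures from the course; the point is only that power depends on sample size.

    # Power of a two-sided one-sample t-test to detect a 2-minute difference
    # from the claimed 30-minute mean, assuming a standard deviation of 6
    power.t.test(n = 40, delta = 2, sd = 6, sig.level = 0.05,
                 type = "one.sample", alternative = "two.sided")

    # The reverse question: how many observations are needed for 80% power?
    power.t.test(power = 0.80, delta = 2, sd = 6, sig.level = 0.05,
                 type = "one.sample", alternative = "two.sided")

Leaving exactly one argument unspecified tells power.t.test which quantity to solve for: the first call solves for power given the sample size, the second solves for the sample size given a target power.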