What is Hypothesis Testing?

Minerva Singh
A free video tutorial from Minerva Singh
Bestselling Instructor & Data Scientist (Cambridge University)
4.3 instructor rating • 41 courses • 72,649 students

Learn more from the full course

Applied Statistical Modeling for Data Analysis in R

Your Complete Guide to Statistical Data Analysis and Visualization For Practical Applications in R

09:35:11 of on-demand video • Updated August 2020

  • Analyze their own data by applying appropriate statistical techniques
  • Interpret the results of their statistical analysis
  • Identify which statistical techniques are best suited to their data and questions
  • Have a strong foundation in fundamental statistical concepts
  • Implement different statistical analysis in R and interpret the results
  • Build intuitive data visualizations
  • Carry out formalized hypothesis testing
  • Implement linear modelling techniques such as multiple regression and GLMs
  • Implement advanced regression analysis and multivariate analysis
So before we move on to the practical aspects of this section, I'm going to discuss what hypothesis testing is. This is a technique for using data to validate a claim about a population in a formalized manner. So suppose we want to test the claim that the population mean, or the average delivery time for getting a Domino's pizza delivered home, is 30 minutes. These are the kinds of claims that we can examine using hypothesis testing. In hypothesis testing, we collect data from a sample of the population and derive a given test statistic, which is something we are going to see further on in the section, to measure a claim about a population parameter. Before we start deriving a test statistic, we start from a baseline claim known as the null hypothesis, or H0, which says that no significant difference exists between the specified populations, or that there is no relationship between the variables. After we compute the test statistic, we weigh the strength of the evidence, and for that we compute something known as p-values. If the p-value is less than 0.05, then there is strong evidence against the null hypothesis and the results are statistically significant. So, you know, there are statistically significant differences between the specified populations. And if the p-value is greater than 0.05, we do not reject the null hypothesis. This cutoff value, 0.05, is the most commonly used in the literature, although you can use values like 0.01 or 0.1. But across most disciplines, if your p-value is less than 0.05, then your results can be assumed to be statistically significant. So now I'm going to talk you through using p-values. First, we are going to start by computing a test statistic, the z value, and what this is is something that we will cover further on.
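Since the course works in R, the delivery-time claim above can be sketched as a one-sample t-test. The delivery times below are simulated for illustration, not real data from the course:

```r
# Hypothetical example: test the claim that the mean pizza delivery
# time is 30 minutes. The data are simulated, not real deliveries.
set.seed(42)
delivery_times <- rnorm(40, mean = 32, sd = 5)  # a sample of 40 delivery times

# One-sample t-test of the null hypothesis H0: population mean = 30
result <- t.test(delivery_times, mu = 30)
print(result$p.value)

# Decision at the conventional 0.05 cutoff discussed above
if (result$p.value < 0.05) {
  print("Reject the null hypothesis: mean delivery time differs from 30 minutes")
} else {
  print("Do not reject the null hypothesis")
}
```

The `t.test()` function returns the test statistic, the p-value, and a confidence interval in one object, which is the formalized workflow this section builds toward.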
And if the test statistic is large enough in either direction, we reject the null hypothesis. But the question is: how far? If our data are normally distributed, or we have a large enough data sample, then we assume a normal bell curve, with the mean value at zero and standard deviation one. If the null hypothesis is true, then 95 percent of samples will result in a test statistic lying within two standard errors of the claim. So 95 percent of the test samples will result in the test statistic lying somewhere in this central area. But beyond that, at these tails, we reject the null hypothesis. The simple definition of the p-value is the probability associated with the test statistic: the further out the test statistic is on the tail of the z distribution, the smaller the p-value. This is the probability of obtaining an effect as extreme as the one in our sample data, assuming the truth of the null hypothesis. So if our p-value is 0.03, we would obtain such an observed difference, or a larger one, in three percent of studies due to random sampling error alone. So whatever we are observing, say the difference between populations, the chance of that being due to error is around three percent, assuming your p-value is 0.03. Now, when we discuss how far out the statistic must be and at what level we reject the null hypothesis: we have a cutoff for accepting or rejecting the null hypothesis, and this is something that has been defined in the literature and is the same for almost all disciplines. We select a cutoff point called the alpha level, and with it we establish the significance level of the test. So if the p-value is less than the 0.05 cutoff, we reject the null hypothesis and assume that the differences are statistically significant.
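The tail-area idea above can be checked directly in R. For a hypothetical z statistic of 2.17 (an assumed value chosen to reproduce the 0.03 example), the two-tailed p-value is the area under both tails of the standard normal curve:

```r
# Two-tailed p-value for a given z test statistic, assuming the
# standard normal (z) distribution holds under the null hypothesis.
z <- 2.17                      # hypothetical test statistic
p_value <- 2 * pnorm(-abs(z))  # area in both tails beyond |z|
round(p_value, 3)              # about 0.03, as in the example above

# The central 95% region: the statistic stays roughly within
# two standard errors (about -1.96 to 1.96) if H0 is true.
qnorm(c(0.025, 0.975))
```

Here `pnorm()` gives the cumulative normal probability and `qnorm()` inverts it, which is where the familiar 1.96 cutoff comes from.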
But since all of statistics, or most of statistics, revolves around samples, because we can't really access the full population, there is scope for error. There are two types of errors in hypothesis testing. Type I errors: these are the false-alarm kind of errors, where we reject the null hypothesis when we should not. And Type II errors: this is the case of a missed detection, that is, not rejecting the null hypothesis when we should. A large enough data sample can help avoid this. The ability to detect a false null hypothesis is the power of the hypothesis test. Now, if these things feel a bit confusing to you, further on in this section we are going to take a look at the different ways of carrying out formalized hypothesis testing for different kinds of data samples. And finally, we are going to see how we can have confidence in our hypothesis testing and look at the power of the test.
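The link between sample size and Type II errors can be sketched with R's built-in `power.t.test()`. All the numbers here (effect size, standard deviation, sample size) are illustrative assumptions, not values from the course:

```r
# Power of a one-sample t-test: the probability of correctly rejecting
# a false null hypothesis. A larger sample raises the power, which is
# how more data reduces Type II (missed-detection) errors.
pw <- power.t.test(n = 40, delta = 2, sd = 5,
                   sig.level = 0.05, type = "one.sample")
pw$power   # power with 40 observations

# Conversely: sample size needed to reach 80% power for the same effect
power.t.test(delta = 2, sd = 5, sig.level = 0.05,
             power = 0.8, type = "one.sample")$n
```

Leaving exactly one argument of `power.t.test()` unspecified tells R which quantity to solve for, so the same function answers both "how much power do I have?" and "how many samples do I need?".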