What is Hypothesis Testing?

A free video tutorial from Minerva Singh
Bestselling Udemy Instructor & Data Scientist(Cambridge Uni)

Learn more from the full course

Applied Statistical Modeling for Data Analysis in R

Your Complete Guide to Statistical Data Analysis and Visualization For Practical Applications in R

09:35:11 of on-demand video • Updated August 2020

So before we move on to the practical aspects of this section, I'm going to discuss what hypothesis testing is. This is the technique of using data to validate a claim about a population in a formalized manner. So suppose we want to claim that the population mean, the average time for getting a Domino's pizza delivered to your home, is 30 minutes. These are the kinds of claims that we can examine using hypothesis testing.

In hypothesis testing we collect data from a sample of the population. We derive a given test statistic, which is something we are going to see further on in the section, to measure a claim about a population parameter. Before we start deriving our test statistic, we start from a baseline claim known as the null hypothesis, H0 (H-naught). It says that no significant difference exists between the specified populations, or that there is no relationship between the variables. Then we compute the test statistic.

We weigh the strength of the evidence, and for that we compute something known as p-values. If the p-value is less than 0.05, then there is strong evidence against the null hypothesis and the results are statistically significant. So there are statistically significant differences between the specified populations. If the p-value is greater than 0.05, we do not reject the null hypothesis. This cutoff value of 0.05 is the most commonly used in the literature, although you can use values like 0.01 or 0.1, but across most disciplines, if your p-value is less than 0.05, then your results can be taken to be statistically significant.

So now I'm going to talk you through using p-values. First we start by computing the test statistic value, and this is something that we will cover further on. If the test statistic is large enough in either direction, we reject the null hypothesis. But the question is: how far?
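The workflow described so far (state a null hypothesis, compute a test statistic from a sample, turn it into a p-value, compare against a cutoff) can be sketched as follows. This is only an illustration, not the course's own code: it uses Python's standard library rather than R, the function name `one_sample_z_test` is made up for this example, the delivery times are simulated rather than real data, and it relies on a large-sample normal approximation (for small samples a t-test would be the more usual choice).

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def one_sample_z_test(sample, claimed_mean):
    """Return (z, two-sided p-value) for H0: population mean == claimed_mean.

    Uses the sample standard deviation and a normal approximation,
    which is an assumption that is only reasonable for larger samples.
    """
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)
    z = (mean(sample) - claimed_mean) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Simulated delivery times in minutes (hypothetical data for illustration).
rng = random.Random(42)
delivery_times = [rng.gauss(32, 5) for _ in range(50)]

ALPHA = 0.05  # the conventional cutoff mentioned in the transcript
z, p_value = one_sample_z_test(delivery_times, claimed_mean=30)

print(f"test statistic z = {z:.2f}, p-value = {p_value:.4f}")
if p_value < ALPHA:
    print("Reject H0: the mean delivery time differs from 30 minutes.")
else:
    print("Do not reject H0: no significant difference from 30 minutes.")
```

Note that "do not reject" is deliberately not phrased as "accept": a large p-value only means the sample gives no strong evidence against the claim.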
So if our data are normally distributed like this, or we have a large enough data sample, then we assume a normal bell curve. And this is a normal bell curve, with the mean value at zero and standard deviation 1. If the null hypothesis is true, then 95 percent of samples will result in a test statistic lying within about two standard errors of the claim. So 95 percent of the samples will result in the test statistic lying somewhere here in this area. But beyond this, say in these tails, we reject the null hypothesis.

The simple definition of the p-value is the probability associated with a test statistic. The further out the test statistic is on the tails of the distribution, like so, the smaller the p-value. It is the probability of obtaining an effect at least as extreme as the one in our sample data, assuming the null hypothesis is true. So if our p-value is 0.03, we would obtain such an observed difference, or more, in 3 percent of studies due to random sampling error alone. In other words, if there were really no difference between the populations, sampling error by itself would produce a difference this large only about 3 percent of the time.

So now we discuss how far, and at what level, we reject the null hypothesis. We have a cutoff for rejecting or not rejecting the null hypothesis. This is something that has been defined in the literature, and it is the same for almost all disciplines. We select a cutoff point called the alpha level, like so, and this alpha level establishes the significance level of the test. So if the p-value is less than the cutoff of 0.05, we reject the null hypothesis and conclude that the differences are statistically significant. But since almost all statistics are computed from samples, because we can rarely access the full population, there is scope for error. There are two types of errors in hypothesis testing. The first is the type 1 error.
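To make the link between the bell curve, the "two standard errors" rule and the p-value concrete, here is a small sketch using the standard normal distribution from Python's standard library. The helper name `two_sided_p_value` is an assumption for this example. It also shows that the exact two-sided 5 percent cutoff sits at about 1.96 standard errors, with "two" being the usual rounding.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)


def two_sided_p_value(z):
    """Probability of a test statistic at least as extreme as z under H0."""
    return 2 * (1 - std_normal.cdf(abs(z)))


# Share of test statistics within two standard errors when H0 is true.
central_mass = std_normal.cdf(2) - std_normal.cdf(-2)

# Exact cutoff that leaves 5 percent in the two tails combined.
alpha = 0.05
critical_z = std_normal.inv_cdf(1 - alpha / 2)

print(f"mass within +/-2 standard errors: {central_mass:.3f}")
print(f"critical z for alpha = {alpha}: {critical_z:.2f}")

# The further out the test statistic, the smaller the p-value.
for z in (0.5, 1.0, 1.96, 2.17, 3.0):
    p = two_sided_p_value(z)
    decision = "reject H0" if p < alpha else "do not reject H0"
    print(f"z = {z:4.2f}  p-value = {p:.4f}  {decision}")
```

A test statistic of about 2.17 corresponds to the p-value of roughly 0.03 used as the example in the transcript.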
These are the false-alarm kind of errors, where we reject the null hypothesis when we should not. The second is the type 2 error. This is the case of a missed detection: not rejecting the null hypothesis when we should. A large enough data sample can help avoid this. The ability to reject the null hypothesis when it is in fact false is the power of the hypothesis test. Now, if these things feel a bit confusing to you, further on in this section we are going to look at the different ways of carrying out formalized hypothesis testing for different kinds of data samples. And finally we are going to see how we can have confidence in our hypothesis testing and look at the power of the test.
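The two error types and the idea of power can be illustrated with a small simulation. This is a sketch under simplifying assumptions that are not from the transcript: the null hypothesis is "population mean = 0", the population standard deviation is known to be 1, and the function name `rejection_rate` and all numeric settings are chosen only for this example.

```python
import random
from math import sqrt
from statistics import NormalDist


def rejection_rate(true_mean, n, trials=2000, alpha=0.05, seed=0):
    """Fraction of simulated samples in which H0: mean == 0 is rejected.

    Assumes normally distributed data with known standard deviation 1.
    """
    rng = random.Random(seed)
    critical_z = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = 1.0 / sqrt(n)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        z = (sum(sample) / n) / standard_error
        if abs(z) > critical_z:
            rejections += 1
    return rejections / trials


# H0 is true: every rejection is a type 1 error (false alarm).
type_1_rate = rejection_rate(true_mean=0.0, n=30)

# H0 is false: the rejection rate is the power, and 1 - power is the
# type 2 error rate (missed detection).
power_small_sample = rejection_rate(true_mean=0.5, n=10)
power_large_sample = rejection_rate(true_mean=0.5, n=50)

print(f"type 1 error rate (should be near alpha): {type_1_rate:.3f}")
print(f"power with n = 10: {power_small_sample:.3f}")
print(f"power with n = 50: {power_large_sample:.3f}")
```

With the null hypothesis true, the rejection rate should land close to the chosen alpha, and with a real difference present, the larger sample should detect it far more often, which is the sense in which a large enough sample reduces type 2 errors.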