
Introduction to statistical hypothesis testing


New ideas come to our minds every day. Some are good; some are not. These ideas drive changes in our lives. For example, is the new season more interesting than the old one? Is it worth the time at all? Which of these ideas can be tested? In this topic, we will delve into statistical hypotheses: what they are, how to test them, and how to make decisions based on the results.

You and your friends are about to start a new series on Netflix — "The new adventures of Thomas Bayes". You asked your friends to rate the series on a scale from 0 to 10. After collecting the answers, you got a mean of 8.5. According to your friends, the series must be worth it. You've googled the official ratings and found nothing — no published data on this series yet. You've done some research of your own; it turns out that similar series have an average rating of 7.0. So, did a few enthusiastic friends skew the result, or is the series just mediocre?

Statistical hypothesis

You've probably heard that the main purpose of statistics is to test hypotheses. So, what is a hypothesis? Formally, it is a claim that can be investigated or tested. As the definition suggests, testability is the main criterion that distinguishes a hypothesis from a simple assumption. Take a look at an example:

Two examples: a measurable claim versus a statement that is not a hypothesis, and a claim that can be checked against existing data versus one that cannot.

Now we know that a hypothesis must be not only testable but also measurable and grounded in existing data. Statistics can help us understand which hypotheses describe real effects, and which are mere fluctuations in the data.

In our case, the statement "The series 'The new adventures of Thomas Bayes' has average ratings higher than 7.0" is a hypothesis. Another statement: "The series 'The new adventures of Thomas Bayes' has an average rating of 7.0" is a hypothesis, too. We need to test it. But how? Let's take a closer look at it.

Alternative and null hypotheses

Above, we've made our first hypotheses. However, in statistics, we don't prove a single hypothesis true or false in isolation. To test a hypothesis, we must specify two complementary hypotheses: the null hypothesis (H₀) and the alternative hypothesis (H₁). The null hypothesis is the initial claim based on previous analysis; it states that there is no effect or no difference from prior knowledge. The alternative hypothesis is what we want to test: a statement that differs from our initial knowledge of the subject.

Let's return to our initial statement. Take a look at the following picture to determine what null and alternative hypotheses are:

Math expectation and meaning of the null hypothesis and an alternative hypothesis.

It is important to note that, much like in a court of law, we follow the principle of innocent until proven guilty — the null hypothesis is considered true until the opposite is shown. If we reject the null hypothesis, we state that the differences in the sample are statistically significant and most likely didn't arise by chance.

If we have insufficient evidence for the alternative hypothesis, we fail to reject the null hypothesis. Note that this is not the same as proving the null hypothesis true; we simply lack the evidence to reject it.

At the end of the previous section, we've stated two hypotheses, namely:

  1. The series "The new adventures of Thomas Bayes" has an average rating higher than 7.0;

  2. The series "The new adventures of Thomas Bayes" has an average rating not equal to 7.0.

The first hypothesis is called directional: it specifies the direction of the effect. In other words, we will test whether the positive effect ("higher than 7.0") took place.

The second one is called non-directional: the direction of the effect doesn't matter; we'll be testing whether there is any effect at all.
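To make the distinction concrete, here is a minimal sketch in Python. The numbers are made up for illustration (an observed mean of 8.5 from 10 ratings, a hypothesized mean of 7.0, and a standard deviation of 2.0 assumed to be known), and for simplicity it uses a z-test rather than the t-test you'd normally reach for with a small sample:

```python
import math

def phi(z):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical numbers: observed mean, hypothesized mean (H0),
# assumed known standard deviation, and sample size.
sample_mean, mu0, sigma, n = 8.5, 7.0, 2.0, 10

# Standardized test statistic.
z = (sample_mean - mu0) / (sigma / math.sqrt(n))

# Directional H1 (mean > 7.0): only results this high or higher count as extreme.
p_one_sided = 1 - phi(z)

# Non-directional H1 (mean != 7.0): extreme results in either direction count.
p_two_sided = 2 * (1 - phi(abs(z)))

print(f"z = {z:.3f}")
print(f"one-sided p = {p_one_sided:.4f}")
print(f"two-sided p = {p_two_sided:.4f}")
```

Note that the two-sided p-value is exactly twice the one-sided one here: by ignoring the direction, the non-directional test demands more evidence for the same data.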

Significance level and p-value

Now we are ready to test the hypotheses. How can we describe them numerically? For this purpose, we use test statistics; we will talk about them in the following topics. With a test statistic, we can examine the variable under the null hypothesis and calculate the so-called p-value. The p-value is the probability of obtaining a result at least as extreme as the observed one, assuming the null hypothesis is true. Sounds hard? Let's take a look at our example.

Say our sample mean is 8.5 and the p-value is 0.01. This means that if the true average rating were 7.0 (our H₀), a sample mean as extreme as 8.5 would occur only 1% of the time; such a result is unlikely to be pure chance, so it is strong evidence against H₀. On the other hand, a p-value of 0.75 means that such a sample mean would occur 75% of the time under H₀ — a result entirely consistent with chance, and thus no evidence against H₀.

But how do we decide which of the two hypotheses to choose? Is there a threshold for this? The answer is yes, and it's called the significance level (α). The lower the significance level, the more evidence is required to reject the null hypothesis. It's up to the researcher to set the significance level. For many tasks, such as marketing, it is set by default to 0.05; in fields like medicine, stricter values (for example, 0.01) are common.
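The decision rule itself is just a comparison. A tiny illustrative sketch (the p-values fed in are made up, matching the example above):

```python
def decide(p_value, alpha=0.05):
    """Compare a p-value to the significance level and phrase the decision."""
    if p_value < alpha:
        return "reject H0: the difference is statistically significant"
    return "fail to reject H0: not enough evidence against it"

print(decide(0.01))  # 0.01 < 0.05, so we reject H0
print(decide(0.75))  # 0.75 >= 0.05, so we fail to reject H0
```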

But what if you were wrong with the hypotheses?

Error types

Choosing between the two hypotheses can be tricky, so be careful.

If you've made a mistake, it is important to understand what type of mistake it was so you can make the right adjustments. For this analysis, take a look at the following terminology:

  • Type I Error — rejecting the null hypothesis, although H₀ is true;

  • Type II Error — failing to reject the null hypothesis, although H₁ is true.

To get a better understanding, let's take a look at the following picture:

The types of errors in relation to the null hypothesis.

The two error types are closely related: reducing the chance of a Type I error tends to increase the chance of a Type II error. Let's turn back to α. How is it connected with the two error types? Remember, α acts as a threshold. If α is too high, we are more likely to reject H₀ even when it is true (a Type I error). If α is too low, we may fail to reject H₀ even when there is real evidence against it (a Type II error).
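One way to see what α really means is a simulation. The sketch below (pure Python, with assumed parameters and a fixed seed) repeatedly draws samples from a population where H₀ is true by construction and runs a two-sided z-test on each one; the fraction of false rejections should land close to α:

```python
import math
import random

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def two_sided_p(sample, mu0, sigma):
    """Two-sided z-test p-value, assuming a known standard deviation."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    return 2 * (1 - phi(abs(z)))

random.seed(42)  # fixed seed so the run is reproducible
alpha, mu0, sigma = 0.05, 7.0, 1.0
trials, n = 5000, 30

# H0 is true in every trial: each sample comes from a mean-7.0 population,
# so every rejection is a Type I error.
false_rejections = sum(
    two_sided_p([random.gauss(mu0, sigma) for _ in range(n)], mu0, sigma) < alpha
    for _ in range(trials)
)

rate = false_rejections / trials
print(f"Type I error rate: {rate:.3f}")  # should be close to alpha = 0.05
```

This is exactly why α is called the significance *level*: it is the Type I error rate you agree to tolerate before seeing the data.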

The general steps to test a hypothesis

So, the solution to the problem is related to data and statistics. We can consider it to be the following workflow:

A workflow to conclude that the null hypothesis is True or to reject the null hypothesis.

Some advice:

  1. Formulate your hypotheses first;

  2. Collect enough data, as the sample size directly impacts the p-value;

  3. Select a correct and suitable test statistic; choosing one is an art that comes with experience;

  4. Calculate the p-value according to the test;

  5. Compare the p-value with the significance level and make a statistical decision;

  6. Draw your conclusion: do you reject the null hypothesis or fail to reject it?
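The steps above can be sketched end to end on our running example. The ratings below are made up so that their mean is 8.5, and for simplicity the sketch uses a one-sided z-test with the sample standard deviation plugged in (with only eight ratings, a t-test would be the more rigorous choice):

```python
import math

# Step 1: formulate the hypotheses.
# H0: the average rating is 7.0; H1: the average rating is higher than 7.0.
mu0, alpha = 7.0, 0.05

# Step 2: collect data (hypothetical ratings from eight friends, mean 8.5).
ratings = [9.0, 8.0, 8.5, 9.5, 7.5, 8.0, 9.0, 8.5]

# Steps 3-4: compute the test statistic and the p-value.
n = len(ratings)
mean = sum(ratings) / n
s = math.sqrt(sum((x - mean) ** 2 for x in ratings) / (n - 1))  # sample std
z = (mean - mu0) / (s / math.sqrt(n))
p_value = 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))  # P(Z >= z) under H0

# Steps 5-6: compare with the significance level and decide.
print(f"mean = {mean:.2f}, z = {z:.2f}, p = {p_value:.2e}")
if p_value < alpha:
    print("Reject H0: the average rating is significantly higher than 7.0.")
else:
    print("Fail to reject H0: no evidence the average rating exceeds 7.0.")
```

With this particular made-up sample the p-value comes out far below 0.05, so the workflow ends in rejecting H₀ — your friends' enthusiasm looks like more than chance.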

Conclusion

Hypothesis testing is a powerful tool that allows you to make decisions based on real data. You don't have to examine every observation individually, as it lets you look at everything from a bird's-eye view. In this topic, you've learned:

  • What a statistical hypothesis is;

  • How to formulate null and alternative hypotheses;

  • The difference between directional and non-directional hypotheses;

  • Error types and their connection with alternative and null hypotheses.

Further on, you'll learn more about hypothesis testing and will be able to test hypotheses by yourself. For now, let's practice what you've learned!
