The chi-squared distribution with parameter $k$, denoted as $\chi^2(k)$, is the gamma distribution with shape parameter $k/2$ and rate parameter $1/2$. The parameter $k$ is called the number of degrees of freedom (df).
In particular, a random variable $X$ follows the chi-squared distribution with $k$ degrees of freedom when its distribution is $\Gamma(k/2,\, 1/2)$. So, its probability density function is:

$$f(x) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{k/2 - 1} e^{-x/2}, \qquad x > 0.$$
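If you'd like to check this formula numerically, here's a minimal sketch in Python using numpy and scipy (libraries of our choosing, not part of the lesson itself); it compares the pdf above with scipy's built-in chi-squared density and with the equivalent gamma density:

```python
import numpy as np
from scipy import stats
from scipy.special import gamma as gamma_fn

k = 4                        # degrees of freedom (arbitrary choice for the check)
x = np.linspace(0.1, 15, 50)

# Hand-written chi-squared pdf: x^(k/2 - 1) * exp(-x/2) / (2^(k/2) * Gamma(k/2))
pdf_manual = x**(k/2 - 1) * np.exp(-x/2) / (2**(k/2) * gamma_fn(k/2))

# scipy's chi-squared pdf and the equivalent gamma(shape = k/2, scale = 2) pdf
pdf_chi2  = stats.chi2.pdf(x, df=k)
pdf_gamma = stats.gamma.pdf(x, a=k/2, scale=2)

print(np.allclose(pdf_manual, pdf_chi2))   # True
print(np.allclose(pdf_chi2, pdf_gamma))    # True
```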
That was pretty easy, don't you think? Now let's take a look at its main properties.
Basic properties
Since the $\chi^2(k)$ distribution is just the $\Gamma(k/2,\, 1/2)$ distribution, you can discover its properties by taking advantage of what you already know about the gamma distribution. To begin with, the expectation and variance are quite easy to remember:

$$E[X] = k, \qquad \operatorname{Var}(X) = 2k.$$
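As a quick sanity check, you can simulate a large chi-squared sample and verify these two formulas; this sketch uses numpy, and the degrees of freedom and sample size are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
k = 7                                    # degrees of freedom
sample = rng.chisquare(df=k, size=200_000)

print(sample.mean())         # should be close to E[X] = k = 7
print(sample.var(ddof=1))    # should be close to Var(X) = 2k = 14
```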
It's always useful to know the graph of the density functions and see how they vary as the parameters change. Since the only parameter is the degrees of freedom, this is straightforward:
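If you want to reproduce such a graph yourself, a small matplotlib sketch like the one below will do; the chosen degrees of freedom are just examples:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

x = np.linspace(0.01, 20, 500)
for k in (1, 2, 3, 5, 10):
    plt.plot(x, stats.chi2.pdf(x, df=k), label=f"k = {k}")

plt.xlabel("x")
plt.ylabel("density")
plt.legend()
plt.title("Chi-squared densities for different degrees of freedom")
plt.show()
```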
Here, you can notice some interesting points:
It's skewed to the right.
When $k = 1$, it's unbounded near $0$.
When $k = 2$, it's exponentially decreasing. Actually, the chi-squared distribution with $2$ df is an exponential distribution; can you guess its parameter?
When $k > 2$, it's unimodal with a peak at $k - 2$.
Since $E[X] = k$, it's roughly centered around $k$. As $k$ increases, the distribution becomes more spread out, due to the fact that $\operatorname{Var}(X) = 2k$.
In fact, as the degrees of freedom increase, the distribution appears increasingly symmetrical:
If you look closely, it looks a lot like the normal distribution, and this is precisely the case. Head over to the next section for the details!
Relationship with the normal distribution
This is the most important property of your new distribution and marks the beginning of its close connection with the normal distribution. When you have a random variable $Z$ with the standard normal distribution, it turns out that $Z^2$ has a chi-squared distribution with $1$ degree of freedom. But this is not all, since the result extends naturally as you add more variables.
If $Z_1, Z_2, \dots, Z_k$ are independent random variables with standard normal distribution, then:

$$Z_1^2 + Z_2^2 + \cdots + Z_k^2 \sim \chi^2(k).$$
This allows you to interpret a $\chi^2(k)$ random variable as the sum of $k$ independent and identically distributed $\chi^2(1)$ random variables.
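You can verify this result with a quick Monte Carlo experiment; the sketch below (with arbitrary choices of $k$ and the number of simulations) compares empirical quantiles of the sum of squares with the theoretical $\chi^2(k)$ quantiles:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
k, n_sims = 5, 100_000

# Sum of k squared independent standard normals, repeated n_sims times
z = rng.standard_normal(size=(n_sims, k))
sums_of_squares = (z**2).sum(axis=1)

# Compare empirical quantiles with the chi-squared(k) quantiles
probs = [0.1, 0.5, 0.9]
print(np.quantile(sums_of_squares, probs))
print(stats.chi2.ppf(probs, df=k))        # should match closely
```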
Coincidentally, the famous central limit theorem states that the sum of many independent and identically distributed random variables is approximately normally distributed. Actually, the more variables you add, the better the approximation will be. In future topics you will learn more about this fundamental result, but for now you've just solved the mystery of the previous section.
Normal sampling
Let's delve into inferential statistics. A random sample of size $n$ is a collection of $n$ independent and identically distributed (i.i.d.) random variables, denoted by $X_1, X_2, \dots, X_n$. These variables share the exact same distribution. You make inferences about this distribution by summarizing the sample, such as adding its values.
The most common tools for analyzing random samples are the sample mean and the sample variance. The sample mean gives you an idea of where the sample concentrates, while the sample variance provides an insight into how spread out the variables are from the mean. They are defined as follows:

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad S^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2.$$
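In code, just be careful that the sample variance divides by $n - 1$ rather than $n$; for example, with numpy (the data here is made up):

```python
import numpy as np

x = np.array([4.1, 5.3, 3.8, 6.0, 4.7])   # a made-up sample

x_bar = x.mean()          # sample mean
s2 = x.var(ddof=1)        # sample variance: divides by n - 1, not n
print(x_bar, s2)
```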
Usually, it's rather complex to determine the distributions of these new variables. However, when you obtain the sample from a normal distribution with parameters $\mu$ and $\sigma^2$, you can ascertain the distribution of both these variables. As expected, $\bar{X} \sim N\!\left(\mu,\, \sigma^2/n\right)$. But what about the sample variance? Here, the chi-squared distribution provides the answer!
If $X_1, X_2, \dots, X_n$ are independent random variables with a common distribution $N(\mu, \sigma^2)$, then:

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1).$$
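A simulation sketch along the following lines (all the parameter values are arbitrary) shows that the scaled sample variance really does follow a $\chi^2(n-1)$ distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, sigma, n, n_sims = 10.0, 2.0, 8, 50_000

samples = rng.normal(mu, sigma, size=(n_sims, n))
s2 = samples.var(axis=1, ddof=1)             # sample variance of each sample
scaled = (n - 1) * s2 / sigma**2             # should follow chi-squared(n - 1)

probs = [0.25, 0.5, 0.75]
print(np.quantile(scaled, probs))
print(stats.chi2.ppf(probs, df=n - 1))       # close agreement expected
```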
This finding is crucial for statistical inference. Many procedures and results rely on it, making the chi-squared distribution an important tool in statistics. But just how useful is it?
Applications in inferential statistics
You can find the chi-squared distribution in many areas of statistics; it's nearly as widespread as the normal distribution. One striking aspect is its universality: regardless of the data's original distribution, certain transformations of the data (known as statistics) often have a chi-squared distribution when you're dealing with large sample sizes.
This is particularly useful because it enables you to replace complex original distributions with the chi-squared distribution, allowing you to compute probabilities and draw conclusions about the mean, variance, and other aspects of your sample.
The two most utilized techniques are the goodness-of-fit test and the independence test. You use the former to see if a categorical data sample aligns well with a theoretical probability distribution you believe it might follow. For instance, imagine you're investigating the probability of a person's blood type being O, A, B, or AB. If you've guessed the probability of each type, denoted as $p_O$, $p_A$, $p_B$, and $p_{AB}$, you can then gather some data and check whether the observed behavior aligns with your hypothesized probabilities.
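In practice you'd usually run such a test with scipy.stats.chisquare; in the sketch below the counts and probabilities are purely hypothetical:

```python
from scipy import stats

# Hypothetical observed counts for blood types O, A, B, AB
observed = [180, 160, 45, 15]

# Hypothesized probabilities p_O, p_A, p_B, p_AB (must sum to 1)
p = [0.45, 0.40, 0.11, 0.04]
expected = [pi * sum(observed) for pi in p]

chi2_stat, p_value = stats.chisquare(observed, f_exp=expected)
print(chi2_stat, p_value)   # a small p-value argues against the hypothesized probabilities
```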
With the independence test, you're trying to spot a correlation between two categorical variables. Suppose you're examining different medicine options for a disease to see if they're more effective in certain age groups. After gathering some data, you use the chi-squared distribution to determine if this relationship actually exists, or if the effectiveness of the medications isn't dependent on age.
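Again, scipy offers a ready-made function, scipy.stats.chi2_contingency; the contingency table below is purely hypothetical:

```python
from scipy import stats

# Hypothetical counts of patients cross-classified by treatment outcome and age group
#                 young  middle  older
table = [[30,     25,     10],    # improved
         [20,     30,     35]]    # did not improve

chi2_stat, p_value, dof, expected = stats.chi2_contingency(table)
print(chi2_stat, p_value, dof)
# A small p-value suggests the outcome and the age group are not independent
```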
Conclusion
The chi-squared distribution with parameter $k$ is denoted as $\chi^2(k)$. Its parameter is known as the degrees of freedom.
This distribution is a particular case of the gamma distribution. In other words, a random variable has a chi-squared distribution with $k$ degrees of freedom when its distribution is $\Gamma(k/2,\, 1/2)$.
When $X \sim \chi^2(k)$, the pdf is given by

$$f(x) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{k/2 - 1} e^{-x/2}, \qquad x > 0.$$
If $X \sim \chi^2(k)$, then $E[X] = k$ and $\operatorname{Var}(X) = 2k$.
If $Z_1, Z_2, \dots, Z_k$ are independent random variables with standard normal distribution, then

$$Z_1^2 + Z_2^2 + \cdots + Z_k^2 \sim \chi^2(k).$$
If $X_1, X_2, \dots, X_n$ are independent random variables with distribution $N(\mu, \sigma^2)$, then

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1).$$