Statistics uses many different distributions, each suited to different purposes. Today we'll learn about Student's t-distribution, which is usually applied to small samples. Small samples behave differently from large ones. With a large sample, approximations based on the normal distribution usually work well. With a small sample, those approximations can understate the uncertainty. The t-distribution is often used to estimate population parameters and to test hypotheses.
Definition
In many real situations, we don't know the actual characteristics of the population. For instance, we might want to know the average height of all adult men in a country, but it's almost impossible to measure everyone. In such cases, we usually take a small sample and calculate its mean and variance. However, estimates from a small sample can vary considerably from one sample to the next. That's when Student's t-distribution, or simply the t-distribution, becomes useful. It resembles the normal distribution but assigns more probability to values far from the mean. This makes it better suited to small samples, because it accounts for their extra variability.
The t-distribution is therefore widely used for small samples (by a common rule of thumb, fewer than about 30 observations), particularly when population parameters such as the standard deviation are unknown.
William Sealy Gosset published the t-distribution under the pseudonym "Student", which is why it is often called Student's t-distribution.
T-value
We know when to use the t-distribution, but we haven't discussed how it's used. At its heart is the t-value, which helps determine whether a difference we observe in sample data is due to chance or reflects a real difference.
For example, imagine a bag of marbles whose average weight is $\mu$ grams. You pick up a handful of $n$ marbles and find that their average weight is $\bar{x}$ grams, a bit more than $\mu$. You might ask: is this handful genuinely heavier than average, or did you just happen to pick a few unusually heavy ones?
You can answer this question with the t-value. To compute it, take the difference between the average weight of your handful (the sample mean $\bar{x}$) and the average weight of all marbles in the bag (the population mean $\mu$), and divide it by the standard error, which is the sample standard deviation $s$ divided by the square root of the sample size $n$:
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
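As a minimal sketch, the t-value formula $t = (\bar{x} - \mu)/(s/\sqrt{n})$ can be computed with Python's standard `statistics` module. The function name `t_value` and the marble weights below are hypothetical, chosen only for illustration. `statistics.stdev` returns the sample standard deviation $s$ (dividing by $n - 1$), which is the quantity the formula calls for.

```python
import math
import statistics


def t_value(sample, mu):
    """One-sample t-value: (sample mean - mu) / (s / sqrt(n))."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations to estimate s")
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation, divides by n - 1
    standard_error = s / math.sqrt(n)
    return (x_bar - mu) / standard_error


# Hypothetical handful of marbles (grams) and bag average; not from the text.
handful = [5.3, 5.1, 5.4, 5.0, 5.2]
bag_average = 5.0
print(f"t = {t_value(handful, bag_average):.3f}")  # t = 2.828
```

For these five weights the sample mean is 5.2 g and $s \approx 0.158$ g, so the standard error is about 0.0707 g and the t-value is about 2.83.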
The larger the absolute value of the t-value, the stronger the evidence that the observed difference between the sample mean and the population mean is real rather than the result of random chance. If the t-value is close to zero, the difference could easily be due to chance. Whether a t-value counts as large or small is decided by comparing it with a critical value of the t-distribution.
In our marble example, plugging the measurements into this formula gives a t-value. Is it big or small? To decide, we use a t-test, which we'll discuss in the last part of this topic.
T-distribution and degrees of freedom
In statistics, the degrees of freedom, denoted $\nu$, are the number of values in a calculation that remain free to vary after certain quantities have been fixed. As an informal analogy: if you have three apples and decide to eat one, you have only two degrees of freedom left, because you can now choose from two apples, not three.
For Student's t-distribution, the degrees of freedom are tied directly to the sample size: for a sample of $n$ values they equal $n - 1$. For example, if you weigh a sample of 10 marbles and compute their average weight, the degrees of freedom are 10 minus 1, which equals 9. Why? To calculate the mean, you add up the weights of all the marbles and divide by the number of marbles (here, 10). Once the mean is fixed, the last marble's weight is no longer free to vary. Suppose the first 9 marbles weigh $S$ grams in total and the mean of all 10 is $\bar{x}$ grams. The total weight must then be $10\bar{x}$ grams, so the 10th marble must weigh $10\bar{x} - S$ grams. Its weight is therefore not an independent piece of information: it depends on the weights of all the other marbles. Subtracting 1 accounts for the degree of freedom used up in calculating the mean.
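The constraint can be checked with a few lines of arithmetic. This is a minimal sketch using hypothetical weights for the first 9 of 10 marbles and an assumed mean of 5.0 g (illustrative numbers, not from the original example): whatever the first 9 weights are, fixing the mean forces the 10th.

```python
import statistics

n = 10
sample_mean = 5.0  # hypothetical: the mean of all 10 weights is fixed at 5.0 g

# Hypothetical weights (grams) of the first 9 marbles; these are free to vary.
first_nine = [4.8, 5.1, 5.0, 5.3, 4.9, 5.2, 5.0, 4.7, 5.1]

# The 10 weights must add up to n * mean, so the 10th weight is forced.
tenth = n * sample_mean - sum(first_nine)
degrees_of_freedom = n - 1  # one value is "used up" by fixing the mean

# Sanity check: the completed sample really has the required mean.
assert abs(statistics.mean(first_nine + [tenth]) - sample_mean) < 1e-9
print(f"10th marble: {tenth:.2f} g, degrees of freedom: {degrees_of_freedom}")
```

Changing any of the first nine weights changes `tenth`, but only nine values can ever be chosen freely.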
The parameter $\nu$ (the degrees of freedom) influences the shape of the t-distribution. Let's examine how more closely.
Shape of the t-distribution
The t-distribution is symmetric and bell-shaped: it peaks in the middle and falls off equally on both sides. The peak, at 0, marks the most probable value, and values become less likely as you move from the peak towards the tails. Doesn't this remind you of a normal distribution? You can see both the standard normal distribution ($\mu = 0$, $\sigma = 1$) and the t-distribution plotted side by side here.
At a glance, the graph of the t-distribution looks similar to the normal distribution: both are symmetric and bell-shaped. There is, however, a visible difference in shape, since the t-distribution has a lower peak and heavier tails. This is because it accounts for the extra uncertainty that arises when the standard deviation is estimated from a small sample. Higher degrees of freedom $\nu$ produce a distribution that more closely resembles the normal distribution.
As we add t-distributions with different degrees of freedom to our plot, we can see how the degrees of freedom influence the shape of the t-distribution.
The larger the number of degrees of freedom, the more closely the t-distribution resembles the normal distribution! With fewer degrees of freedom, which means a smaller sample, the t-distribution is wider and has thicker tails, reflecting the greater spread and variability of small samples. As the sample size and degrees of freedom grow, the distribution becomes narrower, with thinner tails, indicating that larger samples generally give more reliable estimates. Whatever the degrees of freedom, the total area under the curve equals 1, as it does for any probability density.
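To see these shape differences numerically, here is a minimal standard-library sketch that evaluates the t density
$$f(x) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}$$
for a few degrees of freedom and compares it with the standard normal density from `statistics.NormalDist`. The helper names `t_pdf` and `simpson` are hypothetical, and the area check uses Simpson's rule over $[-100, 100]$, a numerical approximation rather than an exact integral.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # lgamma avoids the overflow that gamma() would hit for large df.
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def simpson(f, a, b, n=20_000):
    """Composite Simpson's rule on [a, b] with an even number n of intervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


normal = NormalDist(mu=0, sigma=1)
print("x  df=1    df=5    df=30   normal")
for x in (0, 1, 2, 3):
    row = [t_pdf(x, df) for df in (1, 5, 30)] + [normal.pdf(x)]
    print(x, " ".join(f"{v:.4f}" for v in row))

# The total area is 1 for every df (approximated over a wide window).
for df in (3, 10, 30):
    print(df, round(simpson(lambda x: t_pdf(x, df), -100, 100), 4))
```

At $x = 0$ every t density lies below the normal peak of about 0.399, while at $x = 3$ the ordering reverses: the fewer the degrees of freedom, the more density remains in the tail.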
Now that we understand the shape of the t-distribution, you might wonder how it is used. Let's explore that in the next part of this topic!
Application of the t-distribution
The primary use of Student's t-distribution is in hypothesis testing, particularly when comparing a sample mean with the mean of the population.
Here's a simple scenario. Suppose you're a math teacher and have written a test that you expect to produce an average score of $\mu_0$. However, when you give the test to a small class of 20 students, their average score is $\bar{x}$, higher than $\mu_0$. You might wonder: is this class genuinely performing better than average, or is the higher score just random variation due to the small sample size? This is where the t-distribution comes in handy. It underpins a statistical method called the t-test, which determines whether the difference between the sample mean (the average score of the 20 students) and the population mean (the expected average score $\mu_0$) is statistically significant.
Put simply, the t-test helps establish whether the observed difference likely reflects a real difference (the class genuinely performs better than average) or could just be random variation due to the small sample size. The t-test computes a t-value, which we compare with a critical value of the t-distribution determined by our chosen significance level and the degrees of freedom of our sample (for 20 students, $n - 1 = 19$). If the computed t-value exceeds the critical value, we reject the null hypothesis that the class is performing at the average level and conclude that the class is, indeed, doing better than average.
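Here is a minimal sketch of the one-sided test described above, using only the Python standard library. In practice the critical value is read from a t-table or obtained from a statistics library. Here, to keep the example self-contained, it is computed numerically: the t density is integrated with Simpson's rule to get the CDF, and the CDF is inverted by bisection. The function names, the 20 scores, and the expected averages `mu0` are all hypothetical.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x), using symmetry about 0 and Simpson's rule on [0, |x|]."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def t_critical(p, df, lo=0.0, hi=100.0):
    """Value t with P(T <= t) = p (for p > 0.5), found by bisection."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def one_sided_t_test(sample, mu0, alpha=0.05):
    """Test H0: mean = mu0 against H1: mean > mu0; returns (t, critical, reject)."""
    n = len(sample)
    t = (statistics.mean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(n))
    critical = t_critical(1 - alpha, n - 1)
    return t, critical, t > critical


# Hypothetical scores for a class of 20 students (illustrative only).
scores = [72, 78, 75, 80, 69, 74, 77, 81, 73, 76,
          79, 70, 75, 78, 74, 77, 72, 80, 76, 74]

for mu0 in (70, 75):  # hypothetical expected averages
    t, critical, reject = one_sided_t_test(scores, mu0)
    print(f"mu0={mu0}: t={t:.3f}, critical={critical:.3f}, reject H0: {reject}")
```

With 19 degrees of freedom and a 5% significance level, the computed critical value is about 1.729, matching standard t-tables. For these made-up scores the mean is 75.5, so the test rejects $\mu_0 = 70$ (t is about 7.4) but not $\mu_0 = 75$ (t is about 0.67).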
We've only briefly discussed the t-test here, but a more detailed explanation is provided in a separate topic.
Conclusion
Let's recap the vital points we've covered today:
Student's t-distribution is particularly useful when analyzing small samples.
The t-value helps determine whether a difference observed in sample data is due to chance or reflects a real difference; it is calculated using the formula $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$.
Similar to the normal distribution, the Student's t-distribution is symmetric and bell-shaped but has thicker tails.
Degrees of freedom influence the shape of Student's t-distribution as follows: the more degrees of freedom we have, the more the t-distribution resembles the normal distribution, becoming narrower and with thinner tails. With fewer degrees of freedom, the t-distribution is wider and has thicker tails, reflecting the lower reliability and higher variability of smaller samples.
The primary application of the Student's t-distribution is the t-test, a statistical method that helps determine if the difference between the sample mean and the population mean is statistically significant.