Imagine we are conducting a study to determine the average height of citizens in a city. Because measuring every individual is impractical, we take a large sample of people and find that their mean height is 1.7 meters.
We know that the sample mean is not necessarily equal to the population mean. It should be close, but how close? Could the population mean in this city be 2.5 meters? Could it be 1 meter? For a given level of confidence, within what margin of the sample mean does the population mean lie?
The answer lies in the confidence interval. A confidence interval is a concept based on the central limit theorem. It is used to make inferences about a population from just a sample, and it provides a range of values within which the population mean is estimated to fall.
In this topic, you will learn what the confidence interval for the mean is, how to calculate it, and how to use it in real-life situations.
Large sample confidence interval
A confidence interval is a range, not just a single value, that represents the population mean. It is determined by finding the largest and smallest values the population mean can plausibly take based on our sample. Any value within this range is considered a plausible estimate for the population mean. The central limit theorem is crucial in establishing this interval.
According to the theorem, if the sample size is sufficiently large (a common rule of thumb is $n \geq 30$), the sampling distribution of the sample mean is approximately normal, whatever the shape of the population distribution. We leverage this knowledge to make inferences using the properties of the normal distribution.
The way to find the confidence interval for our sample is by following these systematic steps:
Step one (confidence level)
The confidence level is a value that depends on our study's requirements. It describes how often the procedure produces an interval that contains the true population mean. For example, a 95% confidence level implies that if we took 100 samples from the population and computed a confidence interval from each, about 95 of those intervals would contain the true population mean.
To accomplish this, we can utilize the $z$-score. The $z$-score is a number used for confidence levels and many other topics; it gives the point on the standard normal curve at which we obtain the probability or confidence level we desire. Imagine you want to be 95% confident in your data. With the normal distribution curve, we find the exact point at which the central region of the curve covers this confidence level. This specific point is called the critical $z$-value ("$z$ critical"): the area under the curve between $-z$ and $z$ equals 95% of the total area. Each critical $z$-value is associated with a confidence level. Visually it looks as in this figure:
In a confidence interval calculation, we determine the appropriate $z$-score for our desired confidence level. For instance, if we want a 95% confidence level, we use a $z$-score of 1.96, because the region within 1.96 standard errors of the mean covers 95% of the area under the sampling distribution.
In this context, the most commonly used confidence levels and their $z$-scores are shown in this table:
Confidence level | $z$-score |
80% | 1.28 |
90% | 1.645 |
95% | 1.96 |
98% | 2.33 |
99% | 2.58 |
99.8% | 3.09 |
99.9% | 3.29 |
But wait, what if the confidence level we need isn't listed in this table, such as 75%? In such cases, we can resort to $z$-tables. $z$-tables are valuable resources that list the $z$-values corresponding to various areas under the normal curve, and therefore to various confidence levels.
The challenge is that typical $z$-tables give the area from the far left end of the curve up to a given $z$-value. Yet, when calculating a confidence interval, we need the area centered on the mean, extending equally to the right and to the left:
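Let's denote the area in each of the two tails by $\alpha/2$.
To find the $z$-value for a 75% confidence level:
$$\frac{\alpha}{2} = \frac{1 - 0.75}{2} = 0.125$$
Then we look in the $z$-table for the entry closest to $0.125$ and read off the row and column it belongs to.
The $z$-value for a 75% confidence level is the sum of the row and column labels at the highlighted intersection (the negative sign can be dropped because the curve is symmetric):
$$z = 1.1 + 0.05 = 1.15$$
Another example: to find the critical $z$-value for a confidence level of 92%:
$$\frac{\alpha}{2} = \frac{1 - 0.92}{2} = 0.04$$
Then we look for the entry closest to $0.04$ in the table, which sits at row $-1.7$ and column $0.05$.
The corresponding critical $z$-value from the table:
$$z = 1.7 + 0.05 = 1.75$$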
If you want to see the $z$-table in full, just refer to the z-table website.
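The table lookup described above can also be sketched in code. This is a minimal sketch, assuming Python 3.8+ and the standard-library `statistics.NormalDist`; the function name `z_table_lookup` is hypothetical. It converts a confidence level into the area of one tail, inverts the standard normal CDF, and splits the rounded result into the row and column labels a printed table would use.

```python
from statistics import NormalDist

def z_table_lookup(confidence):
    """Return (tail_area, row, column, z) for a two-sided confidence level."""
    # Area left in each tail outside the central region.
    tail_area = (1 - confidence) / 2
    # inv_cdf gives the negative z with that much area to its left;
    # the sign is dropped because the curve is symmetric.
    z = abs(NormalDist().inv_cdf(tail_area))
    # Work in whole hundredths to mimic a table with two decimal places.
    hundredths = round(z * 100)
    row = (hundredths // 10) / 10      # row label, one decimal place
    column = (hundredths % 10) / 100   # column label, second decimal place
    return tail_area, row, column, hundredths / 100

for level in (0.75, 0.92):
    tail_area, row, column, z = z_table_lookup(level)
    print(f"{level:.0%}: tail area {tail_area:.3f}, row {row}, column {column}, z = {z}")
```

For the 75% and 92% levels this gives $z$-values of 1.15 and 1.75, each read as a row label plus a column label.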
Step two (finding margin of error)
The margin of error is a single value that describes the variability of the mean estimate. It quantifies the potential discrepancy between the sample mean and the true population mean. We add it to the sample mean to get the largest estimate of the population mean, and we subtract it from the sample mean to get the smallest estimate. It is calculated using the following formula:
$$\text{Margin of error} = z \cdot \frac{\sigma}{\sqrt{n}}$$
$z$: the $z$-score associated with the chosen confidence level;
$\sigma$: population standard deviation;
$n$: sample size.
For example, suppose the population has a standard deviation of 1, we take a sample of size 100, and we need the margin of error at a 95% confidence level. From the previous step, we know that the $z$-value associated with the 95% confidence level is 1.96, so the margin of error will be:
$$\text{Margin of error} = 1.96 \cdot \frac{1}{\sqrt{100}} = 0.196$$
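The same calculation can be written as a small helper. This is a minimal sketch; the function name `margin_of_error` and its parameter names are illustrative choices, not part of any library.

```python
import math

def margin_of_error(z, sigma, n):
    """Margin of error for a mean when the population standard deviation is known."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    # z-score times the standard error of the mean (sigma / sqrt(n)).
    return z * sigma / math.sqrt(n)

# Values from the example: z = 1.96, standard deviation 1, sample size 100.
print(round(margin_of_error(z=1.96, sigma=1, n=100), 3))
```

Because the sample size sits under a square root, quadrupling the sample size only halves the margin of error.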
Step three (constructing confidence interval)
After calculating the margin of error, we add it to the sample mean $\bar{x}$ to get the largest estimate of the population mean, and we subtract it from the sample mean to get the smallest estimate:
$$\text{Confidence interval} = \bar{x} \pm \text{margin of error}$$
In the previous example, if the sample mean is 5 and the margin of error is 0.196, the confidence interval is calculated as follows:
$$5 \pm 0.196 = (4.804,\ 5.196)$$
This means we can be 95% confident that the population mean falls between 4.804 and 5.196.
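One way to see what "95% confident" means is to simulate many samples and count how often the interval captures the true mean. The sketch below assumes a hypothetical normally distributed population with a true mean of 5 and a standard deviation of 1, and samples of size 100 as in the example. The population, the seed, and the function name `coverage` are all assumptions made for illustration.

```python
import random
from statistics import NormalDist

def coverage(true_mean=5.0, sigma=1.0, n=100, confidence=0.95,
             trials=2000, seed=0):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Critical z-value leaving (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    margin = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        # Draw one sample from the hypothetical population.
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        # Does this sample's interval contain the true mean?
        if x_bar - margin <= true_mean <= x_bar + margin:
            hits += 1
    return hits / trials

print(f"Share of intervals containing the true mean: {coverage():.3f}")
```

The printed share should land close to 0.95. Each individual interval either contains the true mean or does not; the 95% describes how often the procedure succeeds across repeated samples.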
Example: average daily calorie intake
To estimate the average daily calorie intake per person in a city, a health organization took a sample of 50 citizens and found the average intake to be 3150 kcal, with the standard deviation of calorie intake for the city being 430 kcal. The sampling distribution was approximately normal. Construct the confidence interval for the mean at a 95% confidence level:
1) Finding the $z$-value for the confidence level:
From the table of commonly used confidence levels, the $z$-value for a 95% confidence level is 1.96 ($z = 1.96$).
2) Finding the margin of error:
$$\text{Margin of error} = 1.96 \cdot \frac{430}{\sqrt{50}} \approx 119.2$$
3) Constructing the confidence interval:
$$3150 \pm 119.2 = (3030.8,\ 3269.2)$$
Now we are 95% confident that the true average daily calorie intake of citizens in the city falls between 3030.8 and 3269.2 kcal, based on the sample.
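The three steps can be combined into one function. This is a minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`; the name `z_confidence_interval` is hypothetical. It computes the critical value directly instead of reading it from a table, so it uses an unrounded value slightly below 1.96.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for the mean with a known population standard deviation."""
    # Step one: critical z-value leaving (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    # Step two: margin of error.
    margin = z * sigma / math.sqrt(n)
    # Step three: lower and upper bounds around the sample mean.
    return sample_mean - margin, sample_mean + margin

# Values from the calorie example: mean 3150, sigma 430, n 50.
low, high = z_confidence_interval(3150, 430, 50, confidence=0.95)
print(f"{low:.1f} to {high:.1f} kcal")
```

Rounded to one decimal place, the bounds agree with the hand calculation.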
Confidence interval using sample standard deviation
We often face two problems when finding a confidence interval for our sample: the population standard deviation is unknown, which happens in most cases, or the sample size is small. To address them, we make two changes. First, rather than using the population standard deviation, we use the standard deviation of our sample.
Second, instead of $z$-tables we use what are called $t$-tables. $t$-tables allow us to obtain more accurate confidence intervals and margin of error estimates when dealing with small sample sizes or situations where the population standard deviation is unknown. This leads us to the following formula:
$$\text{Margin of error} = t \cdot \frac{s}{\sqrt{n}}$$
$t$: the value corresponding to the chosen confidence level in the $t$-table
$s$: sample standard deviation
$n$: sample size
Using t-tables
To use the $t$-table, we need two things: the confidence level and the degrees of freedom. The degrees of freedom are calculated by subtracting 1 from the sample size ($df = n - 1$). The $t$-table is organized with the degrees of freedom listed vertically and the confidence levels displayed horizontally.
For instance, if the sample size is 15 and the desired confidence level is 95%, the degrees of freedom will be 14. Therefore, the $t$-value can be found at the intersection of the row for 14 degrees of freedom and the column for the 95% confidence level in the $t$-table.
So, the $t$-value for this case is 2.145.
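A $t$-table entry can also be approximated numerically. The sketch below uses only the Python standard library: it integrates the Student's $t$ density with Simpson's rule and inverts the result by bisection. The function names are hypothetical, and the search range of 0 to 100 is an assumption that covers typical table entries. In practice a statistics library would normally be used, for example SciPy's `scipy.stats.t.ppf`.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """P(T <= x) for x > 0, using Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # The density is symmetric, so half of the area lies below zero.
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """Two-sided critical t-value, found by bisection on the CDF."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 100.0  # assumed search range
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Sample size 15 gives 14 degrees of freedom.
print(round(t_critical(0.95, df=14), 3))
```

For 14 degrees of freedom and a 95% confidence level, the result rounds to the table value of 2.145.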
Example: studying time in school
A teacher took a simple random sample of 20 students to estimate the average studying time in the entire school, and each student was asked how much time they spent studying. The average studying time for the sample turned out to be 2 hours, with a sample standard deviation of 0.5 hours, and the sample data was symmetric.
What is the confidence interval, given the sample and a confidence level of 90%?
1) Finding the $t$-value for the given sample:
The degrees of freedom: $df = 20 - 1 = 19$, with a confidence level of 90%.
From the table, $t = 1.729$.
2) Margin of error:
$$\text{Margin of error} = 1.729 \cdot \frac{0.5}{\sqrt{20}} \approx 0.193$$
3) Constructing the confidence interval:
$$2 \pm 0.193 = (1.807,\ 2.193)$$
This means the teacher can be 90% confident that the true average studying time of students in the school falls between 1.807 and 2.193 hours, based on the sample collected.
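The same example can be checked in code. This is a minimal sketch in which the $t$-value is passed in as read from a $t$-table; the function name `t_confidence_interval` and its parameters are illustrative.

```python
import math

def t_confidence_interval(sample_mean, s, n, t_value):
    """Confidence interval for the mean using the sample standard deviation."""
    # Margin of error: t-value times the estimated standard error s / sqrt(n).
    margin = t_value * s / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Values from the example: mean 2 hours, s = 0.5 hours, n = 20,
# and t = 1.729 for 19 degrees of freedom at a 90% confidence level.
low, high = t_confidence_interval(sample_mean=2, s=0.5, n=20, t_value=1.729)
print(f"{low:.3f} to {high:.3f} hours")
```

Keeping the $t$-value as an explicit argument mirrors the manual procedure, where the table lookup is a separate step from the arithmetic.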
For the whole $t$-table, go to https://www.tdistributiontable.com/
Valid confidence intervals
Our sample must satisfy some conditions for the confidence interval to be valid:
Normal sampling distribution: we know from the central limit theorem that if the sample size is large enough, the sampling distribution will approximately follow a normal distribution.
Random sampling: the data should be collected randomly from the population of interest, so that each member of the population has an equal chance of being selected for the sample.
Independence: the observations in the sample should not affect one another. For instance, consider two individuals in a height study: the height measured for one person should have no bearing on the height measured for the other.
Conclusion
The confidence interval is a concept based on the central limit theorem. It provides a valuable estimate for the likely range of the true population mean. It is calculated by following these three steps:
Finding the $z$-value (or $t$-value) associated with the confidence level;
Calculating the margin of error;
Constructing the confidence interval.