
Confidence Intervals for the Mean


Imagine we are conducting a study to determine the average height of citizens in a city. Because measuring the height of every individual is impractical, we take a large sample of people and find that their mean height is 1.7 meters.

We know that the sample mean is not necessarily equal to the population mean. It should be close, but how close? Could the population mean in this city be 2.5 meters? Could it be 1 meter? For a given level of confidence, within what margin of the sample mean does the population mean lie?

The answer lies in the confidence interval. The confidence interval is a concept based on the central limit theorem. It is used to make inferences about a population from just a sample, and it provides a range of values within which the population mean is estimated to fall.

In this topic, you are going to learn what the confidence interval for the mean is, how to calculate it, and how to use it in real-life situations.

Large sample confidence interval

A confidence interval is a range, not just a single value that represents the population mean. So, it is determined by finding the largest and smallest values the population mean can take based on our sample. Any value within this range is considered a plausible estimate for the population mean. The central limit theorem is crucial in establishing this interval.

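According to the theorem, if the sample size is sufficiently large (a common rule of thumb is $n \geq 30$), the sampling distribution of the sample mean is approximately normal, even when the population itself is not normally distributed. We leverage this knowledge to make inferences using the properties of the normal distribution.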
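To see the theorem at work, here is a minimal simulation sketch. The exponential population (mean 1, standard deviation 1), the sample size of 30, the number of repetitions, and the seed are all illustrative assumptions, not values from the study described above.

```python
import math
import random
import statistics
from typing import List

def simulate_sample_means(n: int, repetitions: int, seed: int = 0) -> List[float]:
    """Draw `repetitions` samples of size n from a skewed (exponential)
    population with mean 1 and standard deviation 1; return the sample means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(repetitions)
    ]

means = simulate_sample_means(n=30, repetitions=2000)

# The sample means should cluster around the population mean (1.0),
# with a spread close to sigma / sqrt(n).
print("mean of sample means:", round(statistics.fmean(means), 3))
print("std of sample means: ", round(statistics.stdev(means), 3))
print("sigma / sqrt(n):     ", round(1 / math.sqrt(30), 3))
```

Even though the population is strongly skewed, the spread of the sample means should come out close to $\sigma/\sqrt{n}$, which is the quantity the rest of this topic builds on.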

The way to find the confidence interval for our sample is by following these systematic steps:

Step one (confidence level)

The confidence level is a value that depends on our study's requirements. It describes how often the procedure produces an interval that contains the true population mean. For example, a 95% confidence level implies that if we took 100 samples from the population and computed a confidence interval from each, about 95 of those intervals would be expected to contain the true population mean.
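The "95 out of 100 intervals" reading can be checked with a small simulation. This is only a sketch: the population (normal, mean 170, standard deviation 10), the sample size, and the seed are hypothetical, and the interval is built with the margin of error $z\sigma/\sqrt{n}$ that step two of this topic introduces.

```python
import math
import random
import statistics

def coverage(cl: float, n: int, repetitions: int, seed: int = 1) -> float:
    """Fraction of simulated intervals that contain the true population mean."""
    mu, sigma = 170.0, 10.0  # hypothetical population mean and std
    z = statistics.NormalDist().inv_cdf(1 - (1 - cl) / 2)  # two-sided critical value
    margin = z * sigma / math.sqrt(n)
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        sample_mean = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        # Count the interval as a hit if it covers the true mean.
        if sample_mean - margin <= mu <= sample_mean + margin:
            hits += 1
    return hits / repetitions

print(coverage(cl=0.95, n=50, repetitions=1000))
```

The printed fraction should land near 0.95, though it will vary slightly with the seed and the number of repetitions.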

To accomplish this, we use the $z$-score. The $z$-score appears in confidence levels and in many other topics; here it gives the point on the standard normal curve that corresponds to the probability or confidence level we want. Imagine you want to be 95% confident in your estimate. On the normal distribution curve, we find the point $z$ such that the area under the curve between $-z$ and $z$ equals 95% of the total area. This specific point is called the critical value, or "$z$ critical". Each confidence level is associated with its own $z$-score. Visually, it looks as in this figure:

z-score for some confidence levels using normal distribution

In a confidence interval calculation, we determine the appropriate $z$-score for our desired confidence level. For instance, if we want a 95% confidence level, we use a $z$ value of 1.96, which covers the central 95% of the area under the sampling distribution.

In this context, the most commonly used confidence levels and their $z$-scores are shown in this table:

| Confidence level | $z$-score |
|------------------|-----------|
| 80%              | 1.28      |
| 90%              | 1.645     |
| 95%              | 1.96      |
| 98%              | 2.33      |
| 99%              | 2.58      |
| 99.8%            | 3.09      |
| 99.9%            | 3.29      |

But what if the confidence level our study needs isn't listed in this table, such as 75%? In such cases, we can resort to $z$-tables, which list the $z$-scores corresponding to a wide range of probabilities.

The challenge is that a typical $z$-table gives the area under the curve from the far left tail up to a given $z$-value. When calculating a confidence interval, however, we need the area centered on the mean, extending equally to the left and to the right:

the right and left directions

Let's denote the area in each of the two tails by $A$ and the confidence level, written as a proportion, by $cl$:

$$A = \frac{1-cl}{2}$$

For a 75% confidence level: $$A = \frac{1-cl}{2} = \frac{1-0.75}{2} = 0.125$$

Then we look in the table for the value closest to $A = 0.125$:

 value for A = 0.125

Then the $z$-value for a 75% confidence level is the sum of the row and column labels at the highlighted intersection (the negative sign can be ignored because the curve is symmetric):

$$z^* = 1.1 + 0.05 = 1.15$$

Another example: suppose we want to find the $z$-critical value for a confidence level of 92%:

$$A = \frac{1-0.92}{2} = 0.04$$

Then we look for the corresponding $z$-critical value in the table:

z-critical

The corresponding $z$-critical value from the table:

$$z^* = 1.7 + 0.05 = 1.75$$

If you want to see the $z$-table in full, just refer to the z-table website.
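When no table is at hand, the same lookup can be sketched in Python with the standard library's `statistics.NormalDist`. The helper name `z_critical` is an illustrative choice, not something defined in this topic.

```python
from statistics import NormalDist

def z_critical(cl: float) -> float:
    """Two-sided critical value z* such that the central area equals cl."""
    if not 0 < cl < 1:
        raise ValueError("confidence level must be strictly between 0 and 1")
    tail_area = (1 - cl) / 2  # A in the text: the area in each tail
    # inv_cdf takes the area to the LEFT of z, so pass 1 - A.
    return NormalDist().inv_cdf(1 - tail_area)

for cl in (0.75, 0.92, 0.95):
    print(f"{cl:.0%}: z* = {z_critical(cl):.2f}")
```

Rounded to two decimals, the results should agree with the table lookups for the 75% and 92% examples (1.15 and 1.75) and with the familiar 1.96 for 95%.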

Step two (finding margin of error)

The margin of error is a single value that describes how far the sample mean may plausibly be from the true population mean at the chosen confidence level. We add it to the sample mean to get the largest estimate of the population mean, and we subtract it from the sample mean to get the smallest estimate. It is calculated using the following formula:

$$E = z\frac{\sigma}{\sqrt{n}}$$

$z$: $z$-score associated with the chosen confidence level;

$\sigma$: population standard deviation;

$n$: sample size.

For example, suppose the population has a standard deviation of 1, we take a sample of size 100, and we need the margin of error at a 95% confidence level. From the previous step, we know that the $z$-value associated with the 95% confidence level is 1.96, so the margin of error is:

$$E = z\frac{\sigma}{\sqrt{n}} = 1.96\cdot\frac{1}{\sqrt{100}} = 0.196$$

Step three (constructing confidence interval)

After calculating the margin of error, we add it to the sample mean to get the largest estimate of the population mean, and we subtract it from the sample mean to get the smallest estimate.

$$\text{Confidence interval = sample mean }\pm \text{margin of error}$$

In the previous example, if the sample mean is 5 and the margin of error is 0.196, the confidence interval is calculated as follows:

$$[\text{sample mean} - \text{margin of error}, \text{sample mean} + \text{margin of error}] \\ = [5-0.196, 5+0.196] \\ = [4.804, 5.196]$$

This means that we are 95% confident that the population mean falls between 4.804 and 5.196.
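Steps two and three can be combined into one small helper. The function name `z_interval` and its signature are assumptions made for this sketch.

```python
import math
from typing import Tuple

def z_interval(sample_mean: float, sigma: float, n: int, z: float) -> Tuple[float, float]:
    """Return (lower, upper) = sample_mean -/+ z * sigma / sqrt(n)."""
    margin = z * sigma / math.sqrt(n)
    # The lower bound comes first so the pair reads as an interval.
    return sample_mean - margin, sample_mean + margin

lower, upper = z_interval(sample_mean=5, sigma=1, n=100, z=1.96)
print(f"[{lower:.3f}, {upper:.3f}]")
```

With the numbers from the running example, this should print the same interval as the hand calculation, [4.804, 5.196].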

Example: average daily calorie intake

To estimate the average daily calorie intake per person in a city, a health organization took a sample of 50 citizens and found the average intake to be 3150 kcal. The standard deviation of calorie intake for the city is known to be 430 kcal, and the sampling distribution was approximately normal. Construct the confidence interval for the mean at a 95% confidence level:

1) Finding the $z$-value for the confidence level:

From the table of commonly used confidence levels, the $z$-value for a 95% confidence level is 1.96 ($z = 1.96$).

2) Finding the margin of error:

E=zσn=1.9643050=119.2E= z\frac{\sigma}{\sqrt{n}}=1.96\cdot\frac{430}{\sqrt{50}}=119.2

3) Constructing confidence interval:

$$\text{Confidence interval = sample mean }\pm \text{margin of error}$$

$$[3150-119.2, 3150+119.2] = [3030.8, 3269.2]$$

Now we are 95% confident that the true average daily calorie intake of citizens in the city falls between 3030.8 and 3269.2 kcal, based on the sample.
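The whole calculation for this example can also be reproduced end to end. In this sketch the critical value is computed with `statistics.NormalDist` rather than read from the table, so it is very slightly more precise than 1.96. The variable names are illustrative.

```python
import math
from statistics import NormalDist

sample_mean = 3150.0  # kcal, sample mean from the example
sigma = 430.0         # kcal, population standard deviation from the example
n = 50                # sample size
cl = 0.95             # confidence level

# Step one: two-sided critical value for the chosen confidence level.
z = NormalDist().inv_cdf(1 - (1 - cl) / 2)
# Step two: margin of error.
margin = z * sigma / math.sqrt(n)
# Step three: interval around the sample mean.
lower, upper = sample_mean - margin, sample_mean + margin

print(f"z = {z:.2f}, E = {margin:.1f} kcal")
print(f"95% CI: [{lower:.1f}, {upper:.1f}] kcal")
```

Rounded to one decimal place, the bounds should match the hand calculation above.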

Confidence interval using sample standard deviation

We often face two problems when finding a confidence interval: the population standard deviation is unknown, which is the case in most studies, or the sample size is small. To handle both, we make two changes. First, rather than using the population standard deviation, we use the standard deviation of our sample.

Second, instead of $z$-tables, we use what are called $t$-tables. $t$-tables allow us to obtain more accurate confidence intervals and margin of error estimates when dealing with small sample sizes or situations where the population standard deviation is unknown. This leads us to the following formula:

$$E = t\cdot\frac{s}{\sqrt{n}}$$

$t$: the value in the $t$-table corresponding to the chosen confidence level;

$s$: sample standard deviation;

$n$: sample size.

Using t-tables

To use the $t$-table, we need two things: the confidence level and the degrees of freedom. The degrees of freedom are calculated by subtracting 1 from the sample size ($n$). The $t$-table is organized with the degrees of freedom listed vertically and the confidence level displayed horizontally.

t table

For instance, if the sample size is 15 and the desired confidence level is 95%, the degrees of freedom will be 14. Therefore, the $t$-value can be found at the intersection of the row for 14 degrees of freedom and the column for the 95% confidence level in the $t$-table.

highlighted 1

So, the $t$-value for this case is 2.145.
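Python's standard library has no $t$-distribution, so the sketch below approximates the table lookup numerically: it integrates the $t$ density with Simpson's rule and finds the critical value by bisection. This is an illustrative substitute for a $t$-table, not how the table itself was built. The function names are assumptions, and the search bracket assumes the critical value is below 100, which holds for the usual confidence levels once the degrees of freedom are 2 or more.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """CDF for x >= 0: 0.5 plus the Simpson's-rule integral of the density on [0, x]."""
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(cl: float, df: int) -> float:
    """Two-sided critical value: central area under the t curve equals cl."""
    target = 1 - (1 - cl) / 2  # area to the left of the critical value
    lo, hi = 0.0, 100.0        # assumed bracket, see the note above
    for _ in range(60):        # bisection on the monotone CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_critical(0.95, 14), 3))
```

For 14 degrees of freedom and a 95% confidence level, the result should agree with the table value of 2.145 to three decimals.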

Example: studying time in school

A teacher took a simple random sample of 20 students to estimate the average studying time in the entire school. Each student was asked how much time they spent studying. The average studying time for the sample turned out to be 2 hours, with a sample standard deviation of 0.5 hours, and the sample data was symmetric.

What is the confidence interval, given the sample and a confidence level of 90%?

1) Finding the $t$-value for the given sample:

The degrees of freedom: $df = n-1 = 20-1 = 19$, with a confidence level of 90%.

t-table

From the table, $t = 1.729$.

2) Margin of error:

E=tsn=1.7290.520=0.193E=t\cdot\frac{s}{\sqrt{n}}=1.729\cdot\frac{0.5}{\sqrt{20}}=0.193

3) Constructing confidence interval:

$$\text{Confidence interval = sample mean }\pm \text{margin of error}$$

$$[2-0.193, 2+0.193] = [1.807, 2.193]$$

This means the teacher can be 90% confident that the true average studying time of students in the school falls between 1.807 and 2.193 hours, based on the sample collected.

For the whole $t$-table, go to https://www.tdistributiontable.com/
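The studying-time example can be reproduced with a few lines of Python. This sketch takes the $t$-value 1.729 as given from the table, and the function name `t_interval` is an illustrative choice.

```python
import math
from typing import Tuple

def t_interval(sample_mean: float, s: float, n: int, t: float) -> Tuple[float, float]:
    """Return (lower, upper) = sample_mean -/+ t * s / sqrt(n),
    where s is the SAMPLE standard deviation and t comes from a t-table
    with n - 1 degrees of freedom."""
    margin = t * s / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# 20 students, mean 2 hours, s = 0.5 hours, 90% confidence, df = 19.
lower, upper = t_interval(sample_mean=2.0, s=0.5, n=20, t=1.729)
print(f"[{lower:.3f}, {upper:.3f}] hours")
```

The only differences from the large-sample version are the two substitutions described in this section: $s$ in place of $\sigma$ and a $t$-value in place of a $z$-value.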

Valid confidence intervals

Our sample must satisfy some conditions for the confidence interval to be valid:

  • Normal sampling distribution: We know from the central limit theorem that if the sample size is large enough, the sampling distribution will approximately follow a normal distribution.

  • Random Sampling: The data should be collected randomly from the population of interest, so that each member of the population has an equal chance of being selected for the sample.

  • Independence: Sample data should remain unaffected by one another. For instance, consider two individuals with different heights. The height of one person should have no bearing on the other person's height or any other characteristics.

Conclusion

The confidence interval is a concept based on the central limit theorem. It provides a valuable estimate for the likely range of the true population mean. It is calculated by following these three steps:

  • Finding the $z$-value associated with the confidence level;

  • Calculating the margin of error;

  • Constructing confidence interval.
