
Normal distribution


Imagine that annual rainfall is modeled as a normal distribution with a mean of $80\ \text{cm}$ and a standard deviation of $20\ \text{cm}$. A farmer will have a successful crop only if the rainfall is between $40\ \text{cm}$ and $100\ \text{cm}$. What's the probability that the farmer will have a successful crop this year? To answer this, you'll delve into the normal distribution.

Normal random variable

Normal random variables hold a crucial role in statistics and probability theory, serving as the foundation for many statistical models and analyses. Their importance goes beyond theory, having substantial practical applications, especially relating to error distribution assumptions.

A continuous random variable $X$ is called normal (or Gaussian) if its probability density function (PDF) has the following form:

$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-(x - \mu)^{2}/(2\sigma^{2})}$$

If you're seeing this formula for the first time, you might wonder where the term $\frac{1}{\sigma\sqrt{2\pi}}$ comes from. It's a normalizing coefficient that ensures the PDF integrates to $1$, a requirement for all probability distributions.

The normal distribution graph resembles a symmetrical bell-shaped curve, centered around its mean.

[Figure: Normal distribution]

Here $\mu$ is the mean, $\sigma$ is the standard deviation, and $\sigma^{2}$ is the variance of the normal distribution. $\sigma$ is assumed to be greater than zero.
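The role of the normalizing coefficient can be checked numerically. The snippet below is a minimal sketch rather than a reference implementation: the names `normal_pdf` and `integrate_pdf` are made up for illustration, the parameter values are the rainfall figures from the introduction, and the integral is approximated with a simple midpoint rule over ten standard deviations on each side of the mean.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x; sigma must be positive."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))  # normalizing coefficient
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def integrate_pdf(mu, sigma, width=10.0, steps=20_000):
    """Midpoint-rule area under the PDF on [mu - width*sigma, mu + width*sigma]."""
    a = mu - width * sigma
    h = 2.0 * width * sigma / steps
    return h * sum(normal_pdf(a + (i + 0.5) * h, mu, sigma) for i in range(steps))

# Illustrative parameters: the rainfall example (mean 80, standard deviation 20)
area = integrate_pdf(mu=80.0, sigma=20.0)
print(f"area under the PDF: {area:.6f}")
```

The printed area should be very close to 1, whatever positive value of `sigma` is used.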

To denote that a continuous random variable $X$ has a normal distribution, use the notation $X \sim \mathcal{N}(\mu, \sigma^{2})$.

In probability theory, the normal distribution is one of the most crucial distributions. It can accurately describe the distribution of values for many natural phenomena. Additionally, it offers convenient analytical properties.

Below are a few real-life examples of data that are approximately normally distributed:

  • Distribution of human height across the population;

  • Human IQ;

  • Blood pressure;

  • Measurement errors.

The normal distribution is also key in statistical tests used to draw inferences about population parameters from sample data, such as the z-test and the t-test. Both assume that the data, or at least the sample mean, are approximately normally distributed. The z-test is used to test hypotheses about the population mean when the sample size is large or the population variance is known. The t-test compares the means of two groups, or a sample mean with a known population mean, when the sample size is small and the population variance is unknown.

Standard normal distribution

A normal distribution with mean zero ($\mu = 0$) and standard deviation one ($\sigma = 1$) is called the standard normal distribution and is denoted by $\mathcal{N}(0,1)$. Its graph is centered at $x = 0$. Have a look at it below.

[Figure: PDF of the standard normal distribution for $x \in [-4, 4]$; the peak at the center is about 0.4 high]

The PDF formula for the standard normal variable is as follows:

$$f_X(x) = \frac{1}{\sqrt{2\pi}}e^{-x^{2}/2}$$

Let's also draw some more normal distributions with different values of $\mu$ and $\sigma$.

[Figure: Normal distributions with different values of mean and standard deviation]

So, if the value of $\mu$ increases, the distribution shifts to the right, and if the value of $\sigma$ increases, the distribution becomes flatter and wider.
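The effect of the two parameters can also be seen without a plot. The sketch below uses `statistics.NormalDist` from the Python standard library (available since Python 3.8); the three parameter pairs are arbitrary values chosen for illustration. It evaluates each density at its own mean, which is where the curve peaks.

```python
from statistics import NormalDist

# Arbitrary illustrative (mu, sigma) pairs
params = [(0.0, 1.0), (2.0, 1.0), (0.0, 2.0)]

peaks = {}
for mu, sigma in params:
    dist = NormalDist(mu, sigma)
    peaks[(mu, sigma)] = dist.pdf(mu)  # height of the curve at its center
    print(f"mu={mu}, sigma={sigma}: peak at x={mu}, height={peaks[(mu, sigma)]:.4f}")
```

Changing only `mu` moves the peak but leaves its height unchanged, while doubling `sigma` halves the peak height, which is why the curve looks flatter.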

Standard normal table

The CDF of the standard normal distribution is denoted by $\Phi$:

$$\Phi(x) = P(X \leq x) = P(X < x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}e^{-t^{2}/2}\,dt$$

These values correspond to the area under the PDF up to the point $x$.

For example, the CDF value for $x = 1$ is $\Phi(1) = P(X \leq 1) = P(X < 1)$, and it corresponds to the blue shaded area under the PDF of the standard normal distribution illustrated below:

[Figure: PDF of the standard normal distribution for $x \in [-4, 4]$, with the area for $x \leq 1$ shaded]

Don't be put off by the formula: there's a table, known as the standard normal table or Z-table, that records the CDF values of the standard normal distribution. It is a handy tool for solving problems that involve any normal distribution.

[Table: Standard normal table (Z-table)]

From the table above, you can find the probability of an event if its distribution is normal. As for the earlier example, the CDF value for $x = 1$ is $\Phi(1) = P(X \leq 1) = P(X < 1) = 0.84134$. Thus, the probability of $X$ being less than one is approximately 84%.
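Table values like this one can be reproduced in code. A common way to do so, sketched below, relies on the identity $\Phi(x) = \frac{1}{2}\left(1 + \operatorname{erf}\left(x/\sqrt{2}\right)\right)$, where `math.erf` is the error function from the Python standard library. The function name `phi` is chosen here for illustration only.

```python
import math

def phi(x):
    """CDF of the standard normal distribution, computed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

print(round(phi(1.0), 5))  # 0.84134, the table value quoted above
print(round(phi(0.0), 5))  # 0.5, since half of the area lies to the left of the mean
```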

If you want to see the Z-table in full, refer to the dedicated website.

Standardizing

You might wonder, how can you solve the general normal distribution problem when the mean is not zero and the variance is not one? Obviously, you can't have an infinite number of tables for varying mean and variance quantities. The solution is to convert a general normal distribution into a standard normal distribution, known as standardizing.

Let $X$ be a normal random variable with mean $\mu$ and variance $\sigma^{2} > 0$. To convert $X \sim \mathcal{N}(\mu, \sigma^{2})$ into the standard normal variable $Z \sim \mathcal{N}(0,1)$, use the z-score formula:

$$Z = \frac{X-\mu}{\sigma}$$

Here, $Z$ is the "z-score" (standard score): it is the value that you look up in the standard normal table to get the required probability.
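Standardizing can be checked with a quick computation. The sketch below assumes the rainfall parameters from the introduction ($\mu = 80$, $\sigma = 20$) and a hypothetical helper named `z_score`. It compares the CDF of the original distribution at a point with the standard normal CDF at the corresponding z-score, using `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standard score of x for a distribution with mean mu and std. deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

mu, sigma = 80.0, 20.0  # rainfall example from the introduction
x = 100.0

z = z_score(x, mu, sigma)
direct = NormalDist(mu, sigma).cdf(x)  # P(X <= 100) for X ~ N(80, 400)
via_z = NormalDist().cdf(z)            # P(Z <= z) for Z ~ N(0, 1)

print(z)  # 1.0
print(abs(direct - via_z) < 1e-12)  # True
```

Both routes give the same probability, which is why a single table for $\mathcal{N}(0,1)$ is enough.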

Chances of Cultivating Crop This Year

To tackle the general normal distribution problem, you need to follow a few straightforward steps:

  1. Define the problem in terms of normal distribution.

  2. Standardize the existing normal distribution.

  3. Look up the resulting z-score (or z-scores) in the standard normal table and calculate the probability.

You can now revisit the problem formulated at the start of the topic. To calculate the probability that the farmer will have a successful crop this year, just follow the steps outlined above.

1. Define the problem in terms of normal distribution.

Treat annual rainfall as a continuous random variable $X$. Given that it follows a normal distribution, you can express it as $X \sim \mathcal{N}(\mu, \sigma^{2})$. You already know that the mean is $\mu = 80\ \text{cm}$ and the standard deviation is $\sigma = 20\ \text{cm}$, so the variance is $\sigma^{2} = 400$. Now, put this information together to frame the problem:

$$X \sim \mathcal{N}(80, 400)$$

You have to find the probability that the farmer will have a successful crop this year. This will occur only if the rainfall lands between $40\ \text{cm}$ and $100\ \text{cm}$. Let's define it in terms of probability:

$$P(40 \leq X \leq 100)$$

2. Standardize the given normal distribution.

To convert $X \sim \mathcal{N}(80, 400)$ into the standard normal variable $Z \sim \mathcal{N}(0,1)$, apply the z-score formula to both boundaries of the interval:

$$Z = \frac{X-\mu}{\sigma}$$

$$P(40 \leq X \leq 100) = P\left(\frac{40-\mu}{\sigma} \leq \frac{X-\mu}{\sigma} \leq \frac{100-\mu}{\sigma}\right) = P\left(\frac{40-80}{20} \leq Z \leq \frac{100-80}{20}\right) = P(-2 \leq Z \leq 1)$$

3. Look up the resulting z-scores in the standard normal table and calculate the probability.

The standard normal table only provides probability values up to a specific point of interest. Thus, you need to rewrite the expression above as a difference of two CDF values:

$$P(-2 \leq Z \leq 1) = P(Z \leq 1) - P(Z \leq -2) = \Phi(1) - \Phi(-2)$$

Let's plot them on a graph and get their values from the standard normal table.

$$\Phi(1) = P(Z \leq 1) = 0.84134$$

[Figure: PDF of the standard normal distribution for $x \in [-4, 4]$, with the area for $x \leq 1$ shaded]

$$\Phi(-2) = P(Z \leq -2) = 0.02275$$

[Figure: PDF of the standard normal distribution for $x \in [-4, 4]$, with the area for $x \leq -2$ shaded]

$$\Phi(1) - \Phi(-2) = P(Z \leq 1) - P(Z \leq -2) = 0.84134 - 0.02275 = 0.81859$$

[Figure: PDF of the standard normal distribution for $x \in [-4, 4]$, with the area for $-2 \leq x \leq 1$ shaded]

So, the probability of the farmer having a successful crop this year is $0.81859$, which is roughly 82%.
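The three steps above can be reproduced in a few lines of code. This is a sketch that assumes `statistics.NormalDist` from the Python standard library as a stand-in for the Z-table; the variable names are illustrative.

```python
from statistics import NormalDist

# Step 1: define the problem, X ~ N(80, 400), and find P(40 <= X <= 100)
mu, sigma = 80.0, 20.0
low, high = 40.0, 100.0

# Step 2: standardize both boundaries
z_low = (low - mu) / sigma
z_high = (high - mu) / sigma

# Step 3: use the standard normal CDF in place of the table lookup
std_normal = NormalDist()  # mean 0, standard deviation 1
prob = std_normal.cdf(z_high) - std_normal.cdf(z_low)

print(z_low, z_high)   # -2.0 1.0
print(round(prob, 5))  # 0.81859
```

The result agrees with the value obtained from the table.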

Problems like this come up in practice, for example when a farmer decides whether to invest in crop insurance based on historical weather data. A farmer's income depends on the crop yield, so such decisions call for a careful analysis of weather conditions rather than guesswork.

More applications

Moreover, the normal distribution is widely used for modeling various real-life problems across different fields. Here are some more examples of its applications:

  • Financial Analysis: In finance, the normal distribution is often used to model stock returns, exchange rates, and other financial variables. It provides a useful framework for risk management, portfolio optimization, and option pricing.

  • Quality Control: The normal distribution is employed in quality control to model the distribution of product characteristics. It helps determine acceptable limits and detect deviations from the desired quality standards.

  • Biostatistics and Medical Research: Many biological and medical phenomena, such as blood pressure, height, and enzyme activity, follow a normal distribution. This distribution is used for hypothesis testing, estimating population parameters, and analyzing clinical trial data.

  • Social Sciences: The normal distribution is frequently used in social sciences to model variables like intelligence quotient (IQ), personality traits, and exam scores. It aids in understanding the distribution of these characteristics in a population and making statistical inferences.

  • Manufacturing and Process Control: The normal distribution is valuable for modeling process variations in manufacturing. It helps identify and control factors that affect product quality, ensuring that the manufacturing process operates within acceptable limits.

  • Demographics and Population Studies: When studying population characteristics, such as height, weight, or income, the normal distribution is often assumed. It allows researchers to make predictions, estimate percentiles, and analyze the distribution of these variables.

  • Weather and Climate Modeling: In meteorology and climate science, the normal distribution is used to model various weather parameters like temperature, rainfall, and wind speed. It helps understand the statistical behavior of these variables and make predictions based on historical data.

  • Machine Learning and Data Science: The normal distribution is frequently utilized as an assumption or prior knowledge in various machine learning algorithms.

Conclusion

Below is a summary of the concepts covered in this topic:

  • A continuous random variable $X$ is said to be normal or Gaussian if it has a PDF that can be described by the formula

$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-(x - \mu)^{2}/(2\sigma^{2})}$$
  • The normal distribution graph looks like a symmetric bell-shaped curve, centered around its mean.

  • To denote that a continuous random variable $X$ has a normal distribution, you can use the notation $X \sim \mathcal{N}(\mu, \sigma^{2})$.

  • A normal distribution with mean zero ($\mu = 0$) and standard deviation one ($\sigma = 1$) is called the standard normal distribution and is denoted by $\mathcal{N}(0,1)$.

  • The corresponding CDF of the standard normal distribution is the following:

$$\Phi(x) = P(X \leq x) = P(X < x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}e^{-t^{2}/2}\,dt$$
  • CDF values for the standard normal distribution are recorded in a table known as the standard normal table or Z-table.

  • The process of converting a general normal distribution into the standard normal distribution is known as standardizing.

In the following topics, you will learn where the normal distribution plays a key role, for example in the central limit theorem. You will also learn about other distributions and their implementations, and you will apply your knowledge of random variables to learn some laws of statistics. Before moving forward, make sure that you have mastered the concept of the normal distribution, since you'll be using its properties.
