We have already met the expectation of a random variable $X$, also called the mean and denoted by $\mathbb{E}(X)$. Another quantity of great importance in probability and statistics is the variance. Variance measures how widely the values of a random variable are spread around the mean, which makes it essential for analyzing the variable.

The intuition behind the concept

Real-life values almost always spread around their mean. For example, the outdoor temperature in a city differs at different times of the day: it is low in the morning, rises as the sun climbs, and drops again in the evening. Tourists who want to know the city's temperature could check the average value, but that single number does not convey enough information, because it is just the mean of the temperatures recorded throughout the day. Tourists who pack for the average daily temperature alone may feel cold in the morning or at night, when the temperature can drop significantly, and hot in the afternoon, because they did not consider the extreme values.

That is why weather apps show the maximum and minimum temperatures as well. If the difference between the maximum and the minimum is large, tourists should pack both warm and light clothes because the temperature will probably fluctuate a lot. If the difference is small, the temperature will not change much throughout the day, and they can base their expectations on the average value. However, the difference between the extreme values is not sufficient either. Nights are longer in winter and shorter in summer, and a longer night means that the probability density of lower temperatures is higher.

The temperature of the city is a random variable measured at random times. The random variable takes different values, or observations, and every observation has a probability associated with it. The rule that assigns probabilities to observations is called the probability distribution; for a continuous quantity such as temperature, it is described by a probability density. Some outcomes of the random variable have a high probability density, and others have a low one.

Formal definition and mathematical example

The general formula for the variance $\operatorname{Var}(X)$ is the same for discrete and continuous random variables:

$$\operatorname{Var}(X)=\mathbb{E}\left(X^{2}\right)-(\mathbb{E}(X))^{2}$$
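The formula above is a shortcut. A short derivation sketch shows where it comes from, assuming the usual definition of variance as the mean squared deviation from the mean. The symbol $\mu=\mathbb{E}(X)$ is introduced here only for brevity.

```latex
\begin{aligned}
\operatorname{Var}(X) &= \mathbb{E}\left((X-\mu)^{2}\right) \\
&= \mathbb{E}\left(X^{2}-2\mu X+\mu^{2}\right) \\
&= \mathbb{E}\left(X^{2}\right)-2\mu\,\mathbb{E}(X)+\mu^{2} \\
&= \mathbb{E}\left(X^{2}\right)-2\mu^{2}+\mu^{2} \\
&= \mathbb{E}\left(X^{2}\right)-(\mathbb{E}(X))^{2}
\end{aligned}
```

The first line is an average of squares, which is why the variance cannot be negative.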

$\mathbb{E}(X)$ is the mean or expected value of the variable $X$. By definition, it is obtained by summing the value-probability products if the random variable is discrete and by integrating the value-density product if it is continuous. That is why the variance formula is written differently for discrete and continuous random variables, even though the general formula is the same. Variance measures the squared distance of the values from the mean, and hence it can never be negative. Variance is zero when all the values are identical. For example, suppose a discrete random variable $X$ takes the following values in repeated trials:

5, 5, 5, 5, 5, 5

Mean or expected value: $\mathbb{E}(X)=5$

Each value of the discrete random variable is the same as the mean. There is no deviation between the mean and random variable values. Therefore, the variance is zero for identical values of the random variable.
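As a quick check of this special case, here is a minimal Python sketch. It assumes the six observations are equally likely, so the population variance from the standard `statistics` module applies; the variable names are only illustrative.

```python
from statistics import mean, pvariance

# The six identical observations from the example above.
values = [5, 5, 5, 5, 5, 5]

mu = mean(values)          # average of the observations
var = pvariance(values)    # population variance: mean squared deviation from mu

print(mu, var)
```

Every deviation from the mean is zero here, so their squares average to zero as well.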

If the random variable $X$ has a unit such as meters ($\mathrm{m}$), its variance has the squared unit $\mathrm{m}^2$.

The variance of a discrete random variable

Let us define the variance using a specific example. Suppose we toss a single unfair die. The discrete random variable $X$ can take any value from 1 to 6 (the six sides of the die). The probability $p_i$ of each value $x_i$ taken by the random variable is given below:

[Figure: a six-sided die]

| $x_i$ | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| $p_i$ | 0.5 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |

The formula for calculating the variance of a discrete random variable $X$ is the following:

$$\operatorname{Var}(X)=\sum x_i^{2} p_i-(\mathbb{E}(X))^{2}$$

In the above equation, $\mathbb{E}(X)$ is the mean or expected value of the random variable, and it is equal to $\sum x_i p_i$. So, to calculate the variance, we first have to calculate the mean $\mathbb{E}(X)$ of the random variable. Here are the three steps to calculate the variance.

1) Calculation of $(\mathbb{E}(X))^{2}$

The process of calculating $(\mathbb{E}(X))^{2}$ is illustrated in the table below:

| $x_i$ | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| $p_i$ | 0.5 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| $x_i p_i$ | 0.5 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 |

$$\mathbb{E}(X)=\sum x_i p_i=0.5+0.2+0.3+0.4+0.5+0.6=2.5$$

$$(\mathbb{E}(X))^{2}=(2.5)^{2}=6.25$$
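The first step can be sketched in a few lines of Python. This is only an illustration of the arithmetic above; the names `values`, `probs`, `mean`, and `mean_squared` are chosen here and are not part of the original text.

```python
# Values and probabilities of the unfair die from the table above.
values = [1, 2, 3, 4, 5, 6]
probs = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]

# The probabilities of all outcomes must add up to 1 (up to rounding).
assert abs(sum(probs) - 1.0) < 1e-12

# E(X) is the sum of the value-probability products.
mean = sum(x * p for x, p in zip(values, probs))
mean_squared = mean ** 2

# Round before printing to hide tiny floating-point errors.
print(round(mean, 10), round(mean_squared, 10))
```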

2) Calculation of $\sum x_i^{2} p_i$

We calculate the term $\sum x_i^{2} p_i$ as follows. First, square each value and multiply it by its probability. Then add up all the products.

| $x_i$ | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| $p_i$ | 0.5 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| $x_i^{2}$ | 1 | 4 | 9 | 16 | 25 | 36 |
| $x_i^{2} p_i$ | 0.5 | 0.4 | 0.9 | 1.6 | 2.5 | 3.6 |

$$\sum x_i^{2} p_i=0.5+0.4+0.9+1.6+2.5+3.6=9.5$$

3) Calculation of $\operatorname{Var}(X)$

After the previous two steps, the calculation of $\operatorname{Var}(X)$ becomes straightforward.

$$\operatorname{Var}(X)=\sum x_i^{2} p_i-(\mathbb{E}(X))^{2}=9.5-6.25=3.25$$

The variance is 3.25.
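The three steps can be combined into one small function. The sketch below is one possible way to write it, not a canonical implementation: the function name `discrete_variance` is hypothetical, and exact fractions are used (an assumption made here) so that the result can be compared without floating-point rounding.

```python
from fractions import Fraction

def discrete_variance(values, probs):
    """Return (mean, variance) of a discrete random variable.

    Uses the formula Var(X) = sum(x^2 * p) - (sum(x * p))^2.
    """
    if sum(probs) != 1:
        raise ValueError("probabilities must sum to 1")
    mean = sum(x * p for x, p in zip(values, probs))
    mean_of_squares = sum(x * x * p for x, p in zip(values, probs))
    return mean, mean_of_squares - mean ** 2

# The unfair die from the example, with exact probabilities 1/2 and 1/10.
values = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 2)] + [Fraction(1, 10)] * 5

mean, variance = discrete_variance(values, probs)

# Cross-check: the mean squared deviation from the mean gives the same number.
by_definition = sum((x - mean) ** 2 * p for x, p in zip(values, probs))

print(float(mean), float(variance), variance == by_definition)
```

The cross-check computes the average squared distance from the mean directly, which is the quantity the variance is meant to capture.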

The variance of a continuous random variable

If $X$ is a continuous random variable, the formula for the variance is slightly different from the one for a discrete random variable. We replace the sum ($\sum$) with an integral ($\int$) and the probability $p_i$ with the probability density function $f(x)$. The formula is as follows:

$$\operatorname{Var}(X)=\int_{-\infty}^{+\infty} x^{2} f(x)\,\mathrm{d}x-(\mathbb{E}(X))^{2}$$

Let's calculate the variance of a continuous random variable $X$ with the probability density function $f(x)$ given below:

$$f(x)= \begin{cases}\frac{1}{9} x^{2} & 0 \leq x \leq 3 \\ 0 & \text{otherwise}\end{cases}$$
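Before using this function, it is worth confirming that it is a valid density, that is, that it integrates to 1. This check is not part of the original example and is added here only as a sanity check:

```latex
\int_{-\infty}^{+\infty} f(x)\,\mathrm{d}x
= \int_{0}^{3} \frac{1}{9}x^{2}\,\mathrm{d}x
= \frac{1}{9}\left[\frac{x^{3}}{3}\right]_{0}^{3}
= \frac{1}{9}\times 9 = 1
```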

First, we need to find the mean $\mathbb{E}(X)$ to calculate the variance.

Here is the formula for calculating the mean $\mathbb{E}(X)$ of a continuous random variable:

$$\mathbb{E}(X)=\int_{-\infty}^{+\infty} x f(x)\,\mathrm{d}x$$

After substituting the probability density function $f(x)$ into the above equation, we get the following:

$$\begin{aligned} \mathbb{E}(X) &=\int_{0}^{3} x \times \frac{1}{9} x^{2}\,\mathrm{d}x \\ &=\frac{1}{9} \int_{0}^{3} x^{3}\,\mathrm{d}x \\ &=\frac{1}{9} \times\left[\frac{x^{4}}{4}\right]_{0}^{3} \\ &=\frac{1}{9} \times\left[\frac{3^{4}}{4}-\frac{0^{4}}{4}\right] \\ &=\frac{1}{9} \times\left[\frac{81}{4}\right] \\ &=\frac{9}{4} \end{aligned}$$

Hence, the mean $\mathbb{E}(X)$ is $\frac{9}{4}$. Now we can proceed to the calculation of the variance.

Calculating the variance is now a simple process:

$$\begin{aligned} \operatorname{Var}(X) &=\int_{-\infty}^{+\infty} x^{2} f(x)\,\mathrm{d}x-(\mathbb{E}(X))^{2} \\ &=\int_{0}^{3} x^{2} \times\left(\frac{1}{9} x^{2}\right)\mathrm{d}x-\left(\frac{9}{4}\right)^{2} \\ &=\frac{1}{9} \int_{0}^{3} x^{4}\,\mathrm{d}x-\frac{81}{16} \\ &=\frac{1}{9}\left[\frac{x^{5}}{5}\right]_{0}^{3}-\frac{81}{16} \\ &=\frac{1}{9} \times\left[\frac{3^{5}}{5}-\frac{0^{5}}{5}\right]-\frac{81}{16} \\ &=\frac{1}{9} \times\left[\frac{243}{5}\right]-\frac{81}{16} \\ &=\frac{27}{5}-\frac{81}{16} \\ &=\frac{27}{80} \approx 0.34 \end{aligned}$$

Hence, the variance is $\frac{27}{80}$.
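The hand calculation can be checked numerically. The sketch below approximates the integrals with a simple midpoint rule written in plain Python; the helper `integrate` and the step count are assumptions made for this illustration, and a library integrator would work equally well.

```python
def pdf(x):
    """Density from the example: f(x) = x^2 / 9 on [0, 3], 0 elsewhere."""
    return x * x / 9 if 0 <= x <= 3 else 0.0

def integrate(g, a, b, n=100_000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

# The density vanishes outside [0, 3], so integrating over [0, 3] is enough.
total = integrate(pdf, 0, 3)                                  # close to 1
mean = integrate(lambda x: x * pdf(x), 0, 3)                  # close to 9/4
mean_of_squares = integrate(lambda x: x * x * pdf(x), 0, 3)   # close to 27/5
variance = mean_of_squares - mean ** 2                        # close to 27/80

print(total, mean, variance)
```

The printed values are approximations, so they agree with $1$, $\frac{9}{4}$, and $\frac{27}{80}$ only up to a small numerical error.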

Applications

Variance has a very useful interpretation. Suppose you are looking for a basketball player for your team, and you want a player who averages a particular number of points per game. Two candidates have the same mean score in their previous matches, so the average alone cannot tell you which one would better fit your team. In this situation, the variance helps you decide quickly. Select the player with the lower variance, because that player is more reliable: their score in any given game is expected to be closer to the average. Note that this reasoning applies to one particular scenario, in which you want the player to perform a specific task, here scoring a given number of points.
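To make the comparison concrete, here is a small Python sketch. The scores are made-up numbers chosen only so that both players have the same mean; they do not come from real data or from the original text.

```python
from statistics import mean, pvariance

# Hypothetical points per game for two players (invented for illustration).
player_a = [18, 20, 22, 19, 21]
player_b = [5, 35, 10, 30, 20]

for name, scores in (("A", player_a), ("B", player_b)):
    # Same mean for both players, but very different spread around it.
    print(name, mean(scores), pvariance(scores))
```

Both players average 20 points, but player A's scores stay within two points of that average, while player B's scores swing by up to fifteen points, which shows up as a much larger variance.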

Similarly, suppose someone wants to invest money in the stock market. There are two companies, 'X' and 'Y'. Company 'X' has an average annual return of 15%, and company 'Y' has an average annual return of 5%. How can the investor decide which company to invest in?

At first glance, if the investor looks only at the average annual returns, company 'X' seems the better choice because its average annual return of 15% is higher than that of company 'Y'.

However, the average annual return is not enough to decide on the investment. In finance, 'risk' is related to variance, along with many other factors. Now suppose the variance of company 'X' is 784 and the variance of company 'Y' is 25. This means that the annual return of company 'X' is expected to deviate a lot from its average, while the annual return of company 'Y' is more likely to stay close to its average. Hence, an investment in company 'Y' is considered less risky than an investment in company 'X'. Please remember that variance is only one of many factors determining the risk. Once again, this reasoning applies to a situation in which you want the company to give you an annual return close to its average, i.e. low risk, low return.
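Because variance is expressed in squared units, it can help to take its square root (the standard deviation) to get back to the unit of the returns. The sketch below assumes the returns are measured in percent, so the variances above are in percent squared; this reading is an assumption, since the example does not state the units.

```python
import math

# Figures from the example; returns are assumed to be in percent,
# so the variances are assumed to be in percent squared.
companies = {
    "X": {"mean": 15.0, "variance": 784.0},
    "Y": {"mean": 5.0, "variance": 25.0},
}

for name, stats in companies.items():
    # The standard deviation is the square root of the variance.
    stats["std"] = math.sqrt(stats["variance"])
    print(f"{name}: mean {stats['mean']}%, standard deviation {stats['std']}%")
```

Under this assumption, the returns of company 'X' typically deviate from the 15% average by about 28 percentage points, compared with about 5 percentage points around the 5% average for company 'Y'.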

Conclusion

The average or mean of a random variable is sometimes not enough for a complete interpretation. Variance carries more information about the spread of the values around the mean:

  • The variance is a number representing the scatter of the random variable's values about the mean $\mathbb{E}(X)$.

  • If the values cluster around the mean, the variance is low.

  • If the values spread far from the mean, the variance is high.

  • The formula to calculate the variance of a discrete random variable $X$ is $\operatorname{Var}(X)=\sum x_i^{2} p_i-(\mathbb{E}(X))^{2}$.

  • $\operatorname{Var}(X)=\int_{-\infty}^{+\infty} x^{2} f(x)\,\mathrm{d}x-(\mathbb{E}(X))^{2}$ is the formula to calculate the variance of a continuous random variable, where $f(x)$ is the probability density function.
