
In past discussions, you've delved into the topic of continuous random variables, particularly how to compute their average, or expected value. This information is incredibly useful, but there's more to explore. What if you want to understand how the values of a random variable are distributed around its mean, or the degree of dispersion from the expected value? This dispersion, sometimes called a measure of spread, is captured by the concepts of variance and standard deviation. The standard deviation, in essence, is the average distance of a random variable's values from its mean, so you can view it as the typical deviation of the variable's values from its expected value.

To grasp the notion of "spread" more intuitively, envision an archery competition.

Pic 1. Shots with high variance

In this scenario, the shots are spread out significantly, indicating a high variance. Conversely, the following image shows a situation with low variance, where the shots are closely grouped.

Pic 2. Shots with low variance

Let's imagine you've planned a surprise birthday party for your friend at their home. You're aware that your friend gets off work at 5:00 PM and, depending on traffic or the route they take, should arrive home between 20 and 40 minutes later.

To answer the question of how likely it is that their actual arrival time falls comfortably around the middle of this range, we first need to define what we mean by "comfortably placed" and "around".

Definition of Variance of a continuous random variable

The term "comfortably placed" typically refers to being in the middle of an interval, or in the context of a random variable, the mean or expected value. On the other hand, "around" is often associated with variance, which is a measure of how spread out the values of a random variable are from its mean. Think about it as a value that represents the average distance of the random variable values from its mean.

With a clearer understanding of variance, let's revisit the concept of the expected value, also known as the mean. The expected value $E[X]$, or $\mu$, for a continuous random variable $X$ can be defined as below:

$$\mathrm{E}[X] = \mu = \int\limits_{-\infty}^{+\infty} x \cdot f(x)\, \mathrm{d}x$$

where $x$ is a specific value, and $f(x)$ is the Probability Density Function, aka PDF, of a given random variable $X$.

Due to certain mathematical reasons, it's often more practical to calculate the squared distance from the mean rather than the simple distance. The reason variance calculations involve squaring is twofold. Firstly, squaring emphasizes the impact of outliers, or extreme values, as these will have a greater effect when squared due to their distance from the mean. Secondly, squaring ensures that all differences are treated as positive values. Without squaring, positive differences (values above the mean) could cancel out negative differences (values below the mean), misleadingly suggesting that there is no variance, or spread, in the data when in fact there is. You may recall from previous topics that the squared distance of a random variable's values from the mean can be represented as $(X - \mu)^2$.

As you saw previously, variance is not just any distance, but an average distance. Therefore, the average of $(X - \mu)^2$ is just an expected value, $E[(X - \mu)^2]$. This gives us the definition of the variance of a continuous random variable $X$ as:

$$Var(X) = E[(X - \mu)^2]$$

You can further refine this equation for variance. Recall the expected value rule for a measurable function $g(X)$ of a continuous random variable $X$:

$$E[g(X)] = \int\limits_{-\infty}^{+\infty} g(x) \cdot f(x)\, \mathrm{d}x$$

By substituting $g(x) = (x - \mu)^2$ into the above formula, you get:

$$Var(X) = E[(X - \mu)^2] = E[g(X)] = \int\limits_{-\infty}^{+\infty} g(x) \cdot f(x)\, \mathrm{d}x = \int\limits_{-\infty}^{+\infty} (x - \mu)^2 \cdot f(x)\, \mathrm{d}x$$

It allows you to calculate the squared distance of each value from the mean and weigh that quantity according to the corresponding PDF value of the random variable.

There's also a very handy alternative formula to calculate the variance of the continuous random variable:

$$Var(X) = E[X^2] - (E[X])^2 = E[X^2] - \mu^2$$
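As a quick sanity check, here is a minimal Python sketch, assuming SciPy and NumPy are available, that evaluates both forms of the variance by numerical integration. The standard exponential PDF $f(x) = e^{-x}$ is just an illustrative choice; its mean and variance are both known to be 1.

```python
import numpy as np
from scipy.integrate import quad

# Illustrative PDF: standard exponential, f(x) = e^(-x) for x >= 0.
def pdf(x):
    return np.exp(-x)

# Expected value: E[X] = integral of x * f(x) dx
mu, _ = quad(lambda x: x * pdf(x), 0, np.inf)

# Definition form: Var(X) = E[(X - mu)^2]
var_definition, _ = quad(lambda x: (x - mu) ** 2 * pdf(x), 0, np.inf)

# Alternative form: Var(X) = E[X^2] - mu^2
e_x2, _ = quad(lambda x: x ** 2 * pdf(x), 0, np.inf)
var_alternative = e_x2 - mu ** 2

print(mu, var_definition, var_alternative)  # all three values are close to 1.0
```

Both forms agree, as the formula above promises.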

However, you might notice that the variance is expressed in the square of the original unit and does not represent the actual average distance. If the values of the random variable are in seconds ($s$), then the variance is in square seconds ($s^2$). So, the more intuitive quantity would be the square root of the variance, $\sqrt{Var(X)}$, which is known as the standard deviation.

The standard deviation for the continuous random variable $X$ is:

$$\sigma_X = \sqrt{Var(X)}$$

The standard deviation has the same unit as the random variable and captures the typical distance of its values from the mean of the distribution. It gives you a quantitative measure of spread.

Now that you're familiar with the general concept of variance, let's talk a bit more about its properties.

Properties of Variance of a continuous random variable

There are some very handy properties that you can leverage when you have a complicated problem. All of them follow from the definition of variance and basic calculus. It isn't necessary, but it may be a good exercise to derive them on your own.

Here is the list of those properties:

  • The variance is always non-negative, as it is the average of non-negative numbers ($E[(X - \mu)^2]$):

$$Var(X) \ge 0$$

  • Variance remains the same if you add a constant value $b$ to the random variable $X$:

$$Var(X + b) = Var(X)$$

  • However, if the random variable is multiplied by any constant value $a$, the variance will be:

$$Var(aX) = a^2 Var(X)$$

  • By combining the two statements above, you will get:

$$Var(aX + b) = a^2 Var(X)$$

We have almost clarified all the uncertainties we need to solve the problem from the introduction. To go further, we need to apply these properties to a specific continuous random variable.
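Before that, here is a small simulation sketch illustrating these properties; NumPy, the exponential distribution, and the constants $a = 3$ and $b = 5$ are arbitrary choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Any continuous random variable works; this exponential has Var(X) close to 4.
x = rng.exponential(scale=2.0, size=1_000_000)

a, b = 3.0, 5.0  # arbitrary constants for illustration

print(np.var(x))          # Var(X): roughly 4
print(np.var(x + b))      # adding a constant does not change the variance
print(np.var(a * x))      # scaling multiplies the variance by a^2: roughly 36
print(np.var(a * x + b))  # combined rule Var(aX + b) = a^2 * Var(X): roughly 36
```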

Variance of uniform continuous random variables

In a continuous uniform distribution, the PDF is constant, so its graph looks like a rectangle. Since the PDF is constant, every value has an equal chance of occurring. You know that the area under the PDF represents probability, so the total area under the PDF must be equal to 1. You can easily calculate it using basic geometry.

Now let's consider a continuous uniform random variable with parameters $a$ and $b$. The graph of the continuous uniform random variable is as follows:

Uniform Continuous random variable graph

You know that the area between $a$ and $b$ under the PDF line must be equal to 1. Since it is rectangular in shape, you can easily calculate the value of the constant PDF $f(x)$.

$$Area = 1 = (b - a) \cdot f(x)$$

$$f(x) = \frac{1}{b - a}$$

Hence, the more general PDF of the uniform continuous distribution is:

$$f(x) = \begin{cases} \frac{1}{b - a}, & \text{when } a \leq x \leq b \\ 0, & \text{otherwise} \end{cases}$$
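As a minimal sketch, this piecewise definition translates directly into Python; the function name and parameters here are chosen just for illustration.

```python
def uniform_pdf(x, a, b):
    """PDF of a continuous uniform random variable on [a, b]."""
    if a <= x <= b:
        return 1 / (b - a)
    return 0.0

# On [20, 40] the density is constant at 1/20 = 0.05 inside the interval:
print(uniform_pdf(30, 20, 40))  # 0.05
print(uniform_pdf(50, 20, 40))  # 0.0
```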

You can calculate the variance of the uniform continuous random variable by using the alternative formula of the variance:

$$Var(X) = E[X^2] - (E[X])^2$$

To use that formula, let's first calculate the expected value (for a uniform distribution, the expected value is the midpoint between $a$ and $b$):

$$\mathrm{E}[X] = \mu = \int\limits_{-\infty}^{+\infty} x \cdot f(x)\, \mathrm{d}x = \int\limits_{a}^{b} x \cdot \frac{1}{b - a}\, \mathrm{d}x = \frac{b^2 - a^2}{2(b - a)} = \frac{a + b}{2}$$

which is exactly the midpoint. You can calculate $E[X^2]$ using the expected value rule as:

$$E[X^2] = \int\limits_{a}^{b} x^2 \cdot \frac{1}{b - a}\, \mathrm{d}x = \frac{1}{b - a}\left(\frac{b^3}{3} - \frac{a^3}{3}\right)$$

If you insert these values into the variance formula, you get:

$$Var(X) = E[X^2] - (E[X])^2 = \frac{b^3 - a^3}{3(b - a)} - \left(\frac{a + b}{2}\right)^2 = \frac{a^2 + ab + b^2}{3} - \frac{a^2 + 2ab + b^2}{4} = \frac{(b - a)^2}{12}$$

The standard deviation of the uniform continuous random variable will be:

$$\sigma_X = \sqrt{Var(X)} = \frac{b - a}{\sqrt{12}}$$

You now have everything you need to answer the question from your surprise party planning.
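Before moving on, if you'd like to double-check this derivation, here is a short sketch using SymPy (assuming it is installed) that repeats the integrals symbolically for general $a$ and $b$.

```python
import sympy as sp

x, a, b = sp.symbols('x a b', real=True)
pdf = 1 / (b - a)  # constant PDF of the uniform distribution on [a, b]

mean = sp.integrate(x * pdf, (x, a, b))      # E[X]
e_x2 = sp.integrate(x**2 * pdf, (x, a, b))   # E[X^2]
variance = sp.factor(sp.simplify(e_x2 - mean**2))

print(sp.simplify(mean))  # a/2 + b/2, i.e. (a + b)/2
print(variance)           # (a - b)**2/12, which equals (b - a)**2/12
```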

Surprise party

Since you have everything you need to solve your surprise party planning problem, let's go back to it. You know it takes 20 to 40 minutes for your friend to arrive home, and the probability density of the arrival time is constant within this period. This simply means that they have an equal chance of arriving at any moment within the given period. Thus, you can treat the arrival time as a uniform continuous random variable.

You need to find the probability that the actual arrival time will not deviate from the mean by more than one standard deviation. Simply put, you should find the probability that their arrival time falls within $[\mu - \sigma, \mu + \sigma]$, where $\mu$ is the mean and $\sigma$ is the standard deviation.

To solve the problem, let's break the task down into simple steps:

  1. Calculate the expected value;
  2. Calculate the variance and the standard deviation;
  3. Calculate the probability for the required range.

With all the information provided, you can say $a = 20$ and $b = 40$.

The expected value for the continuous uniform random variable with $a = 20$ and $b = 40$ is the exact middle of this interval:

$$E[X] = \frac{a + b}{2} = \frac{20 + 40}{2} = 30$$

Now you can calculate the variance as:

$$Var(X) = \frac{(b - a)^2}{12} = \frac{(40 - 20)^2}{12} \approx 33.33$$

The standard deviation will be:

$$\sigma_X = \sqrt{Var(X)} = \frac{40 - 20}{\sqrt{12}} \approx 5.77$$

The required range is $[\mu - \sigma, \mu + \sigma] = [30 - 5.77, 30 + 5.77] = [24.23, 35.77] \approx [24, 36]$. This range covers one standard deviation on either side of the mean.

Now you can calculate the probability within the approximate range $[24, 36]$. To find this, you need to determine the relative area covered by the range in relation to the total possible area.

$$Probability = (36 - 24) \times \frac{1}{40 - 20} = 0.6$$

That means the chance that the actual arrival time will stay within one standard deviation of the mean is $60\%$.
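The same calculation can be reproduced in a few lines of Python; here is a sketch using scipy.stats.uniform, which parameterizes the distribution with loc = a and scale = b - a.

```python
from scipy.stats import uniform

a, b = 20, 40
arrival = uniform(loc=a, scale=b - a)  # uniform distribution on [20, 40]

mu = arrival.mean()     # 30.0
sigma = arrival.std()   # about 5.77

# Probability that the arrival time falls within one standard deviation of the mean
prob = arrival.cdf(mu + sigma) - arrival.cdf(mu - sigma)
print(mu, sigma, prob)  # 30.0, 5.77..., about 0.577
```

Note that this prints roughly 0.577 because it uses the exact interval $[24.23, 35.77]$; rounding the interval to $[24, 36]$, as above, gives the 60% figure.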

The graph of our continuous random variable example will be:

Pic 3. Graph of the continuous uniform random variable

Conclusion

In this topic, you've learned about the variance of a continuous random variable and about the continuous uniform random variable in particular. You've also seen how to calculate both the variance and the standard deviation of a continuous random variable. Let's have a quick recap:

  • The variance of a continuous random variable represents the spread of the random variable around its mean. You can find it in one of two ways:

    $$Var(X) = E[(X - \mu)^2] = \int\limits_{-\infty}^{+\infty} (x - \mu)^2 \cdot f(x)\, \mathrm{d}x$$

    $$Var(X) = E[X^2] - (E[X])^2$$

  • The standard deviation for the continuous random variable $X$ is:

    $$\sigma_X = \sqrt{Var(X)}$$

  • The properties of variance for the continuous case are consistent with those of the discrete case. The general rule for non-zero constants $a$ and $b$ can be expressed as:
    $$Var(aX + b) = a^2 Var(X)$$

  • The PDF of the uniform continuous distribution is:

    $$f(x) = \begin{cases} \frac{1}{b - a}, & \text{when } a \leq x \leq b \\ 0, & \text{otherwise} \end{cases}$$

  • The variance for the uniform continuous random variable as derived in the topic is:

    $$Var(X) = \frac{(b - a)^2}{12}$$
