Cumulative distribution function

Imagine you were in a hurry and almost got on a bus when its doors rapidly closed right in front of you. "Don't worry," said an old lady at the bus stop, looking at your awkward situation. She told you that within the following 10 minutes, the next bus would definitely arrive. What is the probability that the next bus will arrive in no more than 8 minutes? The cumulative distribution function, abbreviated as CDF, is literally the answer to such a question. Let's see how it works and what a PMF has to do with it.

Intuition behind the CDF and its definition

Let's get back to the situation introduced above. You know for sure that the bus will arrive within the next 10 minutes. However, you don't know the exact moment when this will happen, or whether this bus system follows any other rules. The next bus may arrive within the next minute, or only during the 10th minute.

Since you do not know any other rules, you can assume that the bus arrives with equal probability at any minute. Thus, the probability that the next bus will arrive within the first minute is $\frac{1}{10}$. The probability of its arrival within the first two minutes is then the sum of the probabilities that the bus will arrive during the first minute and during the second one: $\frac{1}{10} + \frac{1}{10} = \frac{2}{10} = \frac{1}{5}$.

Let's denote the number of minutes you wait for a bus as $t$ (rounded up: a wait of 3 minutes 20 seconds means that the bus arrived during the 4th minute of waiting). Then $P(t \le 2)$ is the probability that the waiting time is no more than two minutes, that is, that the bus will arrive in the first two minutes. The probability of this event, as calculated above, is $\frac{1}{5}$.

It should be pretty clear that if the bus arrives with equal probability at any moment within ten minutes, then it will arrive with a probability of $\frac{1}{2}$ in the first five minutes, as shown below:

$$P(t \le 5) = P(t = 5) + P(t \le 4) = P(t = 5) + P(t = 4) + P(t \le 3) = \dots = P(t = 5) + P(t = 4) + P(t = 3) + P(t = 2) + P(t = 1) = \frac{1}{10} + \frac{1}{10} + \frac{1}{10} + \frac{1}{10} + \frac{1}{10} = \frac{5}{10} = \frac{1}{2}$$

Accordingly, the bus arrives within the first eight minutes with probability $\frac{8}{10} = \frac{4}{5}$.
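The running sums above can be reproduced in a few lines of Python. This is only a minimal sketch, assuming the uniform model from the text (each of the 10 minutes has probability 1/10); the names `pmf` and `cdf` are illustrative and not part of any library.

```python
from fractions import Fraction
from itertools import accumulate

# Assumption from the text: the bus is equally likely to arrive in any of the 10 minutes.
minutes = range(1, 11)
pmf = {t: Fraction(1, 10) for t in minutes}

# Running sums of the per-minute probabilities give P(t <= k) for k = 1..10.
cdf = dict(zip(minutes, accumulate(pmf[t] for t in minutes)))

print(cdf[2])  # 1/5
print(cdf[5])  # 1/2
print(cdf[8])  # 4/5
```

Exact fractions are used instead of floats so that the results match the hand calculation digit for digit.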

Now, how can you scale from this specific example to a more general case? Luckily, the CDF works beyond the bus waiting time example: it works for any random variable. Let's denote a random variable as $X$. Instead of $t$ in the probability formula $P(t \le 2)$, you will write a random variable $X$ that is not tied to a specific case. You also can't limit a random variable's values to ten minutes, since you now want to deal with a more general case. So instead of the fixed bound 2, compare $X$ with an arbitrary value $x$: the formula $P(t \le 2)$ then takes the more general form $P(X \le x)$. This leads us to the formal definition of the CDF.

Formal definition of CDF

The cumulative distribution function $F_X$ of a real-valued random variable $X$ is the following function:

$$F_X(x) = P(X \le x),$$

where $P(X \le x)$ is the probability that $X$ takes a value less than or equal to $x$.

Since CDF equals probability, it will inherit the properties of probabilities, as you will see in the following sections.
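The definition translates almost directly into code for a discrete random variable. The sketch below assumes the distribution is given as a dictionary mapping each value to its probability; the function name `cdf` and the variable `bus` are illustrative choices, and the uniform bus model is the one assumed in the example above.

```python
from fractions import Fraction

def cdf(pmf, x):
    """Return F_X(x) = P(X <= x) for a PMF given as {value: probability}."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))

# Assumption: the uniform bus model from the example, P(X = k) = 1/10 for k = 1..10.
bus = {k: Fraction(1, 10) for k in range(1, 11)}

print(cdf(bus, 0.5))  # 0
print(cdf(bus, 2))    # 1/5
print(cdf(bus, 2.7))  # 1/5
print(cdf(bus, 12))   # 1
```

Note that $x$ does not have to be one of the values $X$ can take: the function is defined for any real argument and simply stays flat between the possible values.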

Illustration of CDF

Let's plot the CDF for your bus example.

If $x$ is less than 1, then $F_X(x) = 0$, because the waiting time is counted in whole minutes starting from 1, so $X$ can't take a value below 1.

When $x = 1$, $F_X(1) = P(X \le 1) = P(X = 1) = \frac{1}{10}$. If $x = 2$, then $F_X(2) = P(X \le 2) = P(X = 1) + P(X = 2) = \frac{2}{10} = \frac{1}{5}$.

CDF: probability rises with each minute

You can probably already imagine what the whole graph will look like. As you might have guessed, with every new minute, the value of the function increases by $\frac{1}{10}$. But can you guess what the largest value of such a function will be? Since every CDF value is a probability, it can't be greater than $1$. The function takes this value once you have waited the entire 10 minutes: after all, the old lady at the bus stop told you that the bus would definitely arrive within 10 minutes. Then the whole graph will look like this:

CDF: probability reaches 100% after 10 minutes

Now you see the cumulative effect: as $x$ becomes bigger, the probability of $X$ being less than or equal to $x$ grows (or remains the same).

Properties of CDF

As you have seen before, $F_X$ is a non-decreasing function. This means that moving to the right along the x-axis will never decrease the value of $F_X$.

The other properties of the CDF come from the fact that its values are probabilities. Here they are:

  1. Values of $F_X$ lie between 0 and 1 (because each value of $F_X$ is a probability).

  2. For all $a \leq b$, you have $P(a < X \leq b) = F_X(b) - F_X(a)$.

  3. $\lim_{x \rightarrow +\infty} F_X(x) = 1$, $\lim_{x \rightarrow -\infty} F_X(x) = 0$.

    The first limit means that as $x$ grows, the CDF approaches 1 and never exceeds it: for a large enough $x$, the event $X \le x$ covers every possible value of $X$.

    The second limit also comes from the fact that the CDF is defined via probability. As $x$ decreases, the CDF approaches 0 and never falls below it, because a probability can't be negative.
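These properties can be spot-checked numerically. The sketch below again assumes the uniform bus model, $P(X = k) = \frac{1}{10}$ for $k = 1, \dots, 10$; the helper `cdf` and the sample grid are illustrative. A check on a handful of points is only a sanity test, not a proof of the limits.

```python
from fractions import Fraction

# Assumption: the uniform bus model from earlier, P(X = k) = 1/10 for k = 1..10.
pmf = {k: Fraction(1, 10) for k in range(1, 11)}

def cdf(x):
    """F_X(x) = P(X <= x), summed directly from the PMF."""
    return sum((p for k, p in pmf.items() if k <= x), Fraction(0))

# Non-decreasing, with every value between 0 and 1 (property 1).
grid = [x / 2 for x in range(-4, 30)]  # sample points from -2.0 to 14.5
values = [cdf(x) for x in grid]
assert all(0 <= v <= 1 for v in values)
assert all(lo <= hi for lo, hi in zip(values, values[1:]))

# Property 2: P(a < X <= b) = F_X(b) - F_X(a), here with a = 2 and b = 5.
a, b = 2, 5
direct = sum(p for k, p in pmf.items() if a < k <= b)
assert direct == cdf(b) - cdf(a)
print(cdf(b) - cdf(a))  # 3/10

# Property 3: far to the left the CDF is 0, far to the right it is 1.
print(cdf(-1000), cdf(1000))  # 0 1
```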

Building CDF from scratch

At the very beginning, you assumed that the bus would arrive with an equal probability at any minute. You made this assumption because you didn't have any additional information. However, in real life that is not the case: the bus doesn't arrive with the same probability during minute 1 and during minute 9.

Imagine that the old lady from the bus stop also told you this: "The bus will not arrive in the next five minutes; most likely, it will arrive not earlier than in 8 minutes". It would mean that $P(X \le 5) = 0$. Let's assume that the phrase "most likely" means that for each minute after the 8th, the probability of the bus arriving is $\frac{1}{3}$, that is, $P(X = 9) = P(X = 10) = \frac{1}{3}$. Now you are left with minutes 6, 7, and 8, and the probabilities for these minutes must add up to $1 - \frac{2}{3} = \frac{1}{3}$. Let's again assume that the probability is the same for all these minutes, that is, $\frac{1}{9}$ for each of them.

If you put everything together, here's what you'll have: $P(X \le 5) = 0$, $P(X = 6) = P(X = 7) = P(X = 8) = \frac{1}{9}$, and $P(X = 9) = P(X = 10) = \frac{1}{3}$.

By definition, you get the following CDF:

$$F_X(x) = \begin{cases} 0, & x < 6,\\ \frac{1}{9}, & 6 \leq x < 7,\\ \frac{2}{9}, & 7 \leq x < 8,\\ \frac{1}{3}, & 8 \leq x < 9,\\ \frac{2}{3}, & 9 \leq x < 10,\\ 1, & x \geq 10. \end{cases}$$

The graph will look like this:

CDF if probability changes
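The values of this CDF can be cross-checked by building it from the per-minute probabilities. This is a minimal sketch under the assumptions stated in the text (no arrival in minutes 1 to 5, $\frac{1}{9}$ for each of minutes 6 to 8, $\frac{1}{3}$ for each of minutes 9 and 10); the names `pmf` and `cdf` are illustrative.

```python
from fractions import Fraction

# Assumptions taken from the text: no arrival in minutes 1-5,
# 1/9 for each of minutes 6-8, and 1/3 for each of minutes 9-10.
pmf = {k: Fraction(0) for k in range(1, 6)}
pmf.update({k: Fraction(1, 9) for k in (6, 7, 8)})
pmf.update({k: Fraction(1, 3) for k in (9, 10)})

assert sum(pmf.values()) == 1  # a valid PMF must sum to 1

def cdf(x):
    """F_X(x) = P(X <= x), summed directly from the PMF."""
    return sum((p for k, p in pmf.items() if k <= x), Fraction(0))

for x in (5, 6, 7, 8, 9, 10):
    print(x, cdf(x))
# 5 0
# 6 1/9
# 7 2/9
# 8 1/3
# 9 2/3
# 10 1
```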

CDF via PMF

The probability mass function (PMF) gives the probability $P(X = x)$ of each individual value of a discrete random variable. The CDF is the running sum of these probabilities, so when it comes to building a CDF, the PMF shows the points where the CDF value changes and by how much.

Let's look at the graph that illustrates both the PMF (green) and the CDF (red) in your bus case:

PMF shows points where the CDF value changes

The PMF shows that the bus has no chance of arriving within the first 5 minutes and specifies the weight of every remaining minute. Each non-zero value of the PMF corresponds to a jump of the same size in the CDF.
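The relationship also works in the other direction: the size of each jump in the CDF gives back the PMF. The sketch below assumes the CDF values of the second bus scenario at whole minutes, as derived in the previous section; the names `cdf_at` and `pmf` are illustrative.

```python
from fractions import Fraction

# CDF values at each minute for the second bus scenario (taken from the text).
cdf_at = {5: Fraction(0), 6: Fraction(1, 9), 7: Fraction(2, 9),
          8: Fraction(1, 3), 9: Fraction(2, 3), 10: Fraction(1)}

# The size of the jump at k is F_X(k) - F_X(k - 1), which equals P(X = k).
pmf = {k: cdf_at[k] - cdf_at[k - 1] for k in range(6, 11)}

for k, p in pmf.items():
    print(k, p)
# 6 1/9
# 7 1/9
# 8 1/9
# 9 1/3
# 10 1/3
```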

Conclusion

To sum up what this topic has covered, below are some crucial points.

  1. The CDF of a random variable $X$ gives the accumulated probability $F_X(x) = P(X \le x)$.

  2. The CDF is a non-decreasing function with values between 0 and 1.

  3. The PMF shows where the CDF changes its value.
