
Imagine yourself in an expensive restaurant. It takes only 5 minutes on average to serve a customer. Can you guess the probability that you will get your food between minute 3 and minute 4 of your waiting time? Putting it another way, what is the probability that the waiter will bring your order at some point between the third and fourth minutes of waiting? The probability density function (PDF) helps us answer this question.

PDF and its use

The probability density function (PDF for short) of a continuous random variable $X$ is a function $f_X:\mathbb{R} \to [0,\infty)$ such that $P(X \in [a, b]) = \int_{a}^{b} f_X(x)\,dx$ for any interval $[a, b]$.

This is the formal definition of the PDF. It looks quite scary, but let's try to understand it by breaking it down and solving a simple problem.

You know that a continuous random variable takes values over a continuous range of numbers. Let's take the real number line $x$ as our sample space. As with a PMF, we still have a total of one unit of probability. The only difference is that the probability mass is now spread over the real line, which is an uncountable set. Some parts of the real line carry more probability per unit length and some parts carry less. The probability density function describes how much probability per unit length sits on top of each part of the real line. The PDF is denoted $f_X(x)$, where the subscript $X$ is the continuous random variable of interest.

Suppose we have a continuous random variable $X$. We use the PDF to calculate the probability that $X$ falls within a certain interval of the real line. Let's say the interval runs from $a$ to $b$, as shown below.

Figure: a Gaussian PDF with the interval from $a$ to $b$ shaded.

The probability that our random variable $X$ is within the interval from $a$ to $b$ is $P(a \leq X \leq b)$. To calculate $P(a \leq X \leq b)$, we have to find the area under the PDF that sits on top of the interval from $a$ to $b$, which is the green shaded area on the plot above.

From calculus, we know that the area under a curve can be found with an integral. The area over the interval from $a$ to $b$ is equal to the definite integral from $a$ to $b$:

$$P(a \leq X \leq b) = \int_{a}^{b} f_X(x)\,dx$$
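The area interpretation can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses the standard normal density as an example PDF, approximates the integral with a midpoint rule, and compares the result with a reference value computed from the error function. The helper names `normal_pdf`, `integrate_midpoint` and `normal_cdf`, and the interval endpoints, are chosen for this example and do not come from the text above.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal random variable (used here only as an example PDF)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def integrate_midpoint(f, a, b, n=10_000):
    """Approximate the definite integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

def normal_cdf(x):
    """Closed-form P(X <= x) for the standard normal, used as a reference value."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Hypothetical interval endpoints chosen for the illustration.
a, b = -0.5, 1.0
area = integrate_midpoint(normal_pdf, a, b)
reference = normal_cdf(b) - normal_cdf(a)

print(f"area under the PDF on [{a}, {b}]: {area:.6f}")
print(f"reference value:                 {reference:.6f}")
```

The two printed numbers should agree to several decimal places, which is the numerical counterpart of saying that the probability of an interval is the area under the PDF over that interval.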

Properties of the PDF

A PDF must satisfy two properties:

  1. $f_X(x) \geq 0$ for all $x$: the PDF must be non-negative. This corresponds to the nonnegativity axiom, which states that all probabilities are non-negative.
  2. $\int_{-\infty}^{+\infty} f_X(x)\,dx = 1$: the total area under the PDF curve must be equal to 1, that is, integrating over the entire real line must give 1. This corresponds to the normalization axiom, which states that the probability of the sample space is equal to one.
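A rough numerical check of both properties might look like the sketch below. It assumes the candidate function is zero outside a known interval `[lo, hi]`, so the integral over the whole real line reduces to an integral over that interval. The function `looks_like_pdf` and the three candidate functions are made up for this illustration.

```python
def looks_like_pdf(f, lo, hi, n=100_000, tol=1e-6):
    """Check both properties numerically, assuming f is zero outside [lo, hi]."""
    width = (hi - lo) / n
    # Evaluate f at the midpoint of each of the n sub-intervals.
    values = [f(lo + (i + 0.5) * width) for i in range(n)]
    non_negative = all(v >= 0.0 for v in values)
    total_area = sum(values) * width  # midpoint-rule estimate of the integral
    return non_negative and abs(total_area - 1.0) < tol

def candidate(x):
    # f(x) = x on [0, 2]: non-negative, but the area is 2, so it is not a PDF.
    return x if 0.0 <= x <= 2.0 else 0.0

def rescaled(x):
    # Dividing by the total area (2) gives a valid PDF on [0, 2].
    return candidate(x) / 2.0

def negative_part(x):
    # Integrates to 1 over [0, 2] but dips below zero, so it is not a PDF.
    return 1.5 * x - 1.0 if 0.0 <= x <= 2.0 else 0.0

for name, f in [("candidate", candidate), ("rescaled", rescaled),
                ("negative_part", negative_part)]:
    print(name, looks_like_pdf(f, 0.0, 2.0))
```

Only `rescaled` passes both checks. The other two each violate exactly one property, which shows that neither property alone is enough.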

Now that we are familiar with the properties of the PDF, we can give a formal definition of a continuous random variable: a random variable is continuous only if it can be described by a PDF.

Taking values in a continuous set is not enough for a random variable to be a proper continuous random variable. It must also be describable by a proper PDF, that is, a function that meets both of the properties above.

Now let's look at what happens when the interval is very small. Let the length of our interval be $\delta$, where $\delta > 0$ is very small.

Figure: a Gaussian PDF with the interval from $a$ to $b$ and a very small interval of length $\delta$.

The probability that our random variable falls within this interval of length $\delta$ is approximately:

$$P(a \leq X \leq a + \delta) \approx f_X(a) \cdot \delta$$

If $\delta$ is very small, the area of the region is approximately equal to the area of a rectangle with height $f_X(a)$ and width $\delta$:

$$A_{\text{rectangle}} = \text{height} \cdot \text{width} = f_X(a) \cdot \delta$$

If we divide both sides by $\delta$, we get a formula for the PDF at $a$:

$$f_X(a) \approx \frac{P(a \leq X \leq a + \delta)}{\delta}$$

From this formula, we can see that the PDF is not a probability but rather a probability per unit length. Probability per unit length is a density, and that is why the PDF is called the probability density function.
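The "probability per unit length" reading can be illustrated with a small numerical experiment. The sketch below assumes a standard normal random variable, for which $P(a \leq X \leq a + \delta)$ can be computed from the error function, and shows the ratio $P(a \leq X \leq a + \delta) / \delta$ for shrinking $\delta$. The point $a = 1$ and the list of $\delta$ values are arbitrary choices for this example.

```python
import math

def normal_pdf(x):
    """Standard normal density, used here only as an example PDF."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    """P(X <= x) for the standard normal, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

a = 1.0  # hypothetical point at which we inspect the density
ratios = []
for delta in (1.0, 0.1, 0.01, 0.001):
    probability = normal_cdf(a + delta) - normal_cdf(a)  # P(a <= X <= a + delta)
    ratios.append(probability / delta)
    print(f"delta={delta:<6} P={probability:.6f}  P/delta={probability / delta:.6f}")

print(f"f_X(a) = {normal_pdf(a):.6f}")
```

As $\delta$ shrinks, the probability itself goes to zero, while the ratio settles near $f_X(a)$.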

If $f_X(a)$ is finite and $\delta$ is sent to zero, the probability that our random variable $X$ lies in the interval goes to zero:

$$P(a \leq X \leq a + \delta) \approx f_X(a) \cdot 0 = 0$$

To explain this more formally, let's rewrite the general formula with $b = a$:

$$P(a \leq X \leq a) = P(X = a) = \int_{a}^{a} f_X(x)\,dx = 0$$

Setting $\delta = 0$ means we are looking at the interval from point $a$ to point $a$, that is, at the probability that our random variable $X$ is exactly equal to $a$, written $P(X = a)$. On the integral side of the formula, we integrate over an interval of zero length, which gives zero.

The statement above tells us that for any specific point $x_i$ in a continuous set of values, the probability that $X$ equals $x_i$ is zero: $P(X = x_i) = 0$. This is why any particular point has zero probability for a continuous random variable. However, the infinitely many points in an interval can together have a positive probability.

A consequence of $P(X = x_i) = 0$ is that the probability of a closed interval is equal to the probability of the corresponding open interval:

$$P(a \leq X \leq b) = P(X = a) + P(X = b) + P(a < X < b) = P(a < X < b)$$
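A quick simulation can make this less abstract. The sketch below assumes a random variable that is uniform on $[0, 1)$, whose PDF equals 1 on that interval, and counts how often a sample hits one exact point versus how often it lands in an interval. Floating-point numbers form a finite grid, so this is only an illustration of the idea and not a proof. The endpoints 0.4 and 0.6, the seed and the sample size are arbitrary.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable
n = 100_000
# Samples from a uniform random variable on [0, 1); its PDF is 1 on that interval.
samples = [random.random() for _ in range(n)]

a, b = 0.4, 0.6
exact_hits = sum(1 for x in samples if x == a)          # how often X == a
closed = sum(1 for x in samples if a <= x <= b) / n     # estimate of P(a <= X <= b)
open_ = sum(1 for x in samples if a < x < b) / n        # estimate of P(a < X < b)

print("samples exactly equal to a:", exact_hits)
print("closed-interval estimate:  ", closed)
print("open-interval estimate:    ", open_)
```

The closed and open interval estimates should both be close to $b - a = 0.2$, while exact hits on the single point $a$ are not expected in practice.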

We will need the PDF to calculate the expectation (mean) and variance later on. A little later you will also see how the PDF relates to cumulative distribution functions and normal random variables.

Probability calculation steps using PDF

Let's break down the probability calculation of a continuous random variable into three small steps.

1) Represent the experiment in terms of a continuous random variable within the required interval:

$$a \leq X \leq b$$

2) Make sure that the given PDF meets the required properties:

$$f_X(x) \geq 0$$

for all $x$ and

$$\int_{-\infty}^{+\infty} f_X(x)\,dx = 1$$

3) Calculate the probability that the continuous random variable lies in the given interval by finding the area under the PDF:

$$P(a \leq X \leq b) = \int_{a}^{b} f_X(x)\,dx$$

Probability of receiving your food within a one-minute interval

At the very beginning of the topic, you were asked for the probability that the waiter will bring your order at some point between the third and fourth minutes of waiting.

The PDF that we are given is:

$$f_X(x) = \begin{cases} \frac{x}{12} & \text{if } 1 \leq x \leq 5 \\ 0 & \text{otherwise} \end{cases}$$

Now let's follow probability calculation steps using PDF:

1) Represent our experiment in terms of a continuous random variable within a needed interval:

  • Our continuous random variable $X$ will be in the interval from 3 to 4: $3 \leq X \leq 4$.

2) Make sure that the given PDF function meets the required properties.

  • $f_X(x) \geq 0$ for all $x$: this holds because $\frac{x}{12}$ is positive for $1 \leq x \leq 5$ and the PDF is zero everywhere else.
  • $\int_{-\infty}^{+\infty} f_X(x)\,dx = 1$: the total area under the PDF curve must be equal to 1. Splitting the real line into three parts gives $\int_{-\infty}^{+\infty} f_X(x)\,dx = \int_{-\infty}^{1} 0\,dx + \int_{1}^{5} \frac{x}{12}\,dx + \int_{5}^{+\infty} 0\,dx = \int_{1}^{5} \frac{x}{12}\,dx = \frac{x^2}{24}\bigg|_1^5 = \frac{25}{24} - \frac{1}{24} = 1$. The calculation shows that our PDF meets the second required property as well, so we can move on to the next step.
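The two checks above can also be reproduced with exact rational arithmetic. The sketch below is one possible way to do it: the function names `pdf` and `antiderivative` and the spot-check grid are choices made for this illustration, and the non-negativity check only samples a finite set of points, so it supports the argument above and does not replace it.

```python
from fractions import Fraction

def pdf(x):
    """The PDF from the example: x / 12 on [1, 5], zero elsewhere."""
    x = Fraction(x)
    return x / 12 if 1 <= x <= 5 else Fraction(0)

def antiderivative(x):
    """An antiderivative of x / 12, valid on [1, 5]."""
    return Fraction(x) ** 2 / 24

# Property 1: spot-check non-negativity on a grid from -2 to 8 in steps of 1/4.
grid = [Fraction(k, 4) for k in range(-8, 33)]
non_negative = all(pdf(x) >= 0 for x in grid)

# Property 2: area over the support [1, 5]; the PDF is zero everywhere else.
total_area = antiderivative(5) - antiderivative(1)

print("non-negative on the grid:", non_negative)
print("total area:", total_area)
```

Working with `Fraction` avoids rounding, so the total area comes out as exactly 25/24 minus 1/24.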

3) Calculate the probability that our continuous random variable lies in a given interval by finding the area under PDF.

Our continuous random variable $X$ lies in the interval $3 \leq X \leq 4$. Let's put those values into our probability formula:

$$P(3 \leq X \leq 4) = \int_{3}^{4} f_X(x)\,dx = \int_{3}^{4} \frac{x}{12}\,dx = \frac{x^2}{24}\bigg|_3^4 = \frac{16}{24} - \frac{9}{24} = \frac{7}{24} \approx 0.292$$

The probability of receiving your food between the third and fourth minutes is $P(3 \leq X \leq 4) \approx 0.292$, that is, a $29.2\%$ chance.
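As a sanity check, the result can be compared with a simulation. The sketch below uses inverse transform sampling, a technique not covered in this topic: integrating the PDF from 1 to $x$ gives $F(x) = \frac{x^2 - 1}{24}$ on $[1, 5]$, and solving $u = F(x)$ for $x$ gives $x = \sqrt{24u + 1}$. The function name `sample_waiting_time`, the seed and the sample size are assumptions made for this example.

```python
import math
import random

def sample_waiting_time(rng):
    """Draw one waiting time from f_X(x) = x / 12 on [1, 5] by inverse transform.

    On [1, 5] the integral of the PDF from 1 to x is (x**2 - 1) / 24,
    so a uniform u in [0, 1) maps to x = sqrt(24 * u + 1).
    """
    return math.sqrt(24.0 * rng.random() + 1.0)

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 200_000
samples = [sample_waiting_time(rng) for _ in range(n)]

# Fraction of simulated customers served between minute 3 and minute 4.
estimate = sum(1 for x in samples if 3.0 <= x <= 4.0) / n
exact = 7 / 24

print(f"simulated estimate: {estimate:.4f}")
print(f"exact value 7/24:   {exact:.4f}")
```

With this many samples the simulated fraction should land close to 0.292, which agrees with the integral computed above.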

Conclusion

Let's summarize the concepts we have learned on this topic:

  • The probability density function (PDF) describes how much probability per unit length sits on top of each part of the real line.
  • The probability that our random variable $X$ is within the interval from $a$ to $b$ is $P(a \leq X \leq b)$.
  • To calculate $P(a \leq X \leq b)$, we have to find the area under the PDF that sits on top of the interval from $a$ to $b$. The area is equal to the definite integral of the PDF from $a$ to $b$.
  • The required properties of a PDF are $f_X(x) \geq 0$ for all $x$ and $\int_{-\infty}^{+\infty} f_X(x)\,dx = 1$.
  • If the interval length $\delta$ is very small, then $P(a \leq X \leq a + \delta) \approx f_X(a) \cdot \delta$.