Imagine yourself in an expensive restaurant. It takes only 5 minutes on average to serve a customer. Can you guess the probability that you will get your food between minute 3 and minute 4 of your waiting time? Put another way, what is the probability that the waiter will bring your order at some time between the third and fourth minutes of waiting? The probability density function (PDF) helps us answer this question.
PDF and its use
The probability density function (PDF for short) of a continuous random variable $X$ is a function $f_X$ such that for any interval $[a, b]$,
$$P(a \le X \le b) = \int_a^b f_X(x)\,dx.$$
This is the formal definition of the PDF. It may look scary, but let's try to understand it by breaking it down and solving a simple problem.
A continuous random variable takes values over a continuous range of numbers. Let's take the real number line as our sample space. As with a PMF, the total probability is still equal to one. The only difference is that this probability is spread over the real line, an uncountable set of points, instead of being placed on countably many points. From the graph, we can see that some parts of the real line carry more probability per unit length and some parts carry less. The probability density function describes the value sitting on top of each part of the real line. The PDF is denoted as $f_X(x)$, where the subscript $X$ is the continuous random variable of interest.
Suppose we have a continuous random variable $X$. We use the PDF to calculate the probability that $X$ lies within a certain interval of the real line. Let's say that this interval is from $a$ to $b$, as shown below.
The probability that our random variable is within the interval from $a$ to $b$ is $P(a \le X \le b)$. To calculate $P(a \le X \le b)$, we have to find the area under the PDF that sits on top of the interval from $a$ to $b$, which is the green shaded area on the plot above.
From calculus, we know that the area under a curve can be found with an integral. The area under $f_X$ over the interval from $a$ to $b$ is equal to the definite integral from $a$ to $b$:
$$P(a \le X \le b) = \int_a^b f_X(x)\,dx$$
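To make the definite integral concrete, here is a minimal sketch that approximates $P(a \le X \le b)$ numerically with the midpoint rule. The helper `prob_between` and the example density `triangle_pdf` (a triangle on $[0, 2]$ peaking at $f(1) = 1$) are hypothetical illustrations, not part of the original example.

```python
def triangle_pdf(x: float) -> float:
    """Hypothetical example PDF: a triangle on [0, 2] that peaks at f(1) = 1."""
    if 0.0 <= x <= 1.0:
        return x
    if 1.0 < x <= 2.0:
        return 2.0 - x
    return 0.0  # no probability density outside [0, 2]


def prob_between(pdf, a: float, b: float, n: int = 10_000) -> float:
    """Approximate P(a <= X <= b), the integral of pdf from a to b, with the midpoint rule."""
    width = (b - a) / n
    # Sum the areas of n thin rectangles whose heights are the PDF at each midpoint.
    return sum(pdf(a + (i + 0.5) * width) for i in range(n)) * width


# Area under the triangle between 0.5 and 1.5; the exact value is 0.75.
print(prob_between(triangle_pdf, 0.5, 1.5))
```

The printed value is very close to 0.75, the exact area of the shaded region for this density.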
Properties of the PDF
The required properties that PDF must meet are:
- $f_X(x) \ge 0$ for all $x$: the PDF must be non-negative. This complies with the nonnegativity axiom, which states that all probabilities are non-negative.
- $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$: the total area under the PDF curve must be equal to 1, which means the integral over the entire real number line must be equal to 1. This complies with the normalization axiom, which states that the probability of the sample space is equal to one.
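Both properties can be sanity-checked numerically. The sketch below, using a hypothetical helper `check_pdf_properties` and the same hypothetical triangle density, tests non-negativity on a grid and the total area with the midpoint rule. It assumes the range `[lo, hi]` contains all of the density's mass, and a finite grid can only catch violations; it cannot prove the properties hold everywhere.

```python
def triangle_pdf(x: float) -> float:
    """Hypothetical example PDF: a triangle on [0, 2] that peaks at f(1) = 1."""
    if 0.0 <= x <= 1.0:
        return x
    if 1.0 < x <= 2.0:
        return 2.0 - x
    return 0.0


def check_pdf_properties(pdf, lo: float, hi: float, n: int = 100_000, tol: float = 1e-6):
    """Return (nonnegative, normalized) for pdf, checked on a grid over [lo, hi].

    Assumes [lo, hi] contains all of the PDF's mass.
    """
    h = (hi - lo) / n
    # Property 1: f(x) >= 0 at every grid point.
    nonnegative = all(pdf(lo + i * h) >= 0.0 for i in range(n + 1))
    # Property 2: the total area (midpoint rule) equals 1 within the tolerance.
    total = sum(pdf(lo + (i + 0.5) * h) for i in range(n)) * h
    normalized = abs(total - 1.0) < tol
    return nonnegative, normalized


print(check_pdf_properties(triangle_pdf, -1.0, 3.0))                   # (True, True)
print(check_pdf_properties(lambda x: 2 * triangle_pdf(x), -1.0, 3.0))  # (True, False): area is 2
```

Doubling a valid density keeps it non-negative but breaks normalization, so it is no longer a PDF.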
Now that we are familiar with the properties of the PDF, we can give a formal definition of a continuous random variable: a random variable $X$ is continuous if it can be described by a PDF.
Now let's look at what happens when the interval is very small. Let our interval be $[x, x+\delta]$, where $\delta > 0$ but is very small.
The probability that our random variable lies within this interval is:
$$P(x \le X \le x+\delta) = \int_x^{x+\delta} f_X(t)\,dt$$
If $\delta$ is very small, the area of this region is approximately equal to the area of a rectangle with height $f_X(x)$ and width $\delta$:
$$P(x \le X \le x+\delta) \approx f_X(x)\cdot\delta$$
Dividing both sides by $\delta$ gives a formula for the PDF:
$$f_X(x) \approx \frac{P(x \le X \le x+\delta)}{\delta}$$
From this formula, we can see that the PDF is not a probability but rather a probability per unit length. Probability per unit length is a density, and that is why the PDF is called the probability density function.
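The "probability per unit length" reading can be checked numerically. This sketch assumes a hypothetical exponential density with rate 1, $f(x) = e^{-x}$ for $x \ge 0$, chosen only because its antiderivative $-e^{-x}$ gives the interval probability in closed form. As $\delta$ shrinks, the probability of $[x, x+\delta]$ itself shrinks, while the ratio $P/\delta$ approaches $f(x)$.

```python
import math


def exp_pdf(x: float) -> float:
    """Hypothetical example PDF: exponential with rate 1, f(x) = e^(-x) for x >= 0."""
    return math.exp(-x) if x >= 0 else 0.0


def prob_small_interval(x: float, delta: float) -> float:
    """Exact P(x <= X <= x + delta) for the PDF above, valid for x >= 0.

    Uses the antiderivative -e^(-t): the integral from x to x + delta is e^(-x) - e^(-(x + delta)).
    """
    return math.exp(-x) - math.exp(-(x + delta))


x = 1.0
for delta in (1.0, 0.1, 0.01, 0.001):
    p = prob_small_interval(x, delta)
    # p is a probability; p / delta is a probability per unit length (a density).
    print(f"delta={delta:<6} P={p:.6f}  P/delta={p / delta:.6f}  f(x)={exp_pdf(x):.6f}")
```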
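If $f_X(x)$ is finite and the length $\delta$ of the interval is sent to zero, the probability that our random variable lies in that interval goes to zero:
$$\lim_{\delta \to 0} P(x \le X \le x+\delta) = \lim_{\delta \to 0} f_X(x)\cdot\delta = 0$$
To explain this more formally, let's rewrite the general formula with $a = b$:
$$P(a \le X \le a) = \int_a^a f_X(x)\,dx$$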
When we set $a = b$, we are looking at the interval from point $a$ to point $a$. This means we are looking for the probability that our random variable is equal to $a$, that is, $P(X = a)$. On the integral side of the general formula, we integrate over an interval of zero length, which gives zero:
$$P(X = a) = \int_a^a f_X(x)\,dx = 0$$
The statement above tells us that the probability that $X$ takes any single specific value $a$ from a continuous set of values is zero: $P(X = a) = 0$. That is why any particular point of a continuous random variable has zero probability. However, the infinitely many points of an interval can cumulatively have a positive probability.
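The same idea can be seen numerically. This sketch again assumes a hypothetical exponential density $f(x) = e^{-x}$ for $x \ge 0$, so that $P(a \le X \le b) = e^{-a} - e^{-b}$; the function name `exp_prob_between` is illustrative only. A zero-length interval gives exactly zero, shrinking intervals give probabilities that head to zero, and a full interval still has positive probability.

```python
import math


def exp_prob_between(a: float, b: float) -> float:
    """Exact P(a <= X <= b) for a hypothetical PDF f(x) = e^(-x), x >= 0, with 0 <= a <= b."""
    return math.exp(-a) - math.exp(-b)


a = 2.0
print(exp_prob_between(a, a))  # zero-length interval [a, a], so P(X = a) = 0.0
for delta in (1.0, 0.1, 0.001, 1e-6):
    # Shrinking the interval around a drives its probability toward zero.
    print(delta, exp_prob_between(a, a + delta))
# A whole interval of such zero-probability points still has positive probability.
print(exp_prob_between(1.0, 3.0))
```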
A consequence of $P(X = a) = 0$ is that the probability of a closed interval is equal to the probability of the corresponding open (or half-open) interval:
$$P(a \le X \le b) = P(a < X < b) = P(a \le X < b) = P(a < X \le b)$$
Later, we will use the PDF to calculate the expectation (mean) and variance. You will also see how a PDF relates to cumulative distribution functions and to normal random variables.
Probability calculation steps using the PDF
Let's break down the probability calculation of a continuous random variable into three small steps.
1) Represent our experiment in terms of a continuous random variable within the needed interval: $P(a \le X \le b)$
2) Make sure that the given PDF meets the required properties:
$f_X(x) \ge 0$ for all $x$ and $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$
3) Calculate the probability that our continuous random variable lies in the given interval by finding the area under the PDF: $P(a \le X \le b) = \int_a^b f_X(x)\,dx$
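The three steps can be bundled into one function. The sketch below is a hypothetical helper, `probability_from_pdf`, not an established API. It assumes the caller passes a `support` range that holds all of the PDF's mass, checks the two properties numerically (step 2), and then integrates over $[a, b]$ with the midpoint rule (step 3). The example density $f(x) = 2x$ on $[0, 1]$ is also made up for illustration.

```python
def probability_from_pdf(pdf, a, b, support, n=100_000, tol=1e-6):
    """Hypothetical helper following the three steps; returns P(a <= X <= b).

    `support` is an assumed (lo, hi) range that holds all of the PDF's mass.
    """
    lo, hi = support
    h = (hi - lo) / n
    # Step 2a: the PDF must be non-negative (checked on a grid only).
    if any(pdf(lo + i * h) < 0.0 for i in range(n + 1)):
        raise ValueError("PDF takes a negative value")
    # Step 2b: the total area under the PDF must be 1.
    total = sum(pdf(lo + (i + 0.5) * h) for i in range(n)) * h
    if abs(total - 1.0) > tol:
        raise ValueError(f"total area is {total}, not 1")
    # Step 3: the probability is the area under the PDF over [a, b].
    w = (b - a) / n
    return sum(pdf(a + (i + 0.5) * w) for i in range(n)) * w


def linear_pdf(x: float) -> float:
    """Hypothetical example PDF: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


# Step 1: the event of interest is 0.5 <= X <= 1.
print(probability_from_pdf(linear_pdf, 0.5, 1.0, support=(0.0, 1.0)))  # close to 0.75
```

Raising an error in step 2 mirrors the text: if a function fails either property, it is not a valid PDF, and the area in step 3 would not be a probability.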
Probability of receiving your food within a one-minute interval
At the very beginning of the topic, you were asked for the probability that the waiter will bring your order at some time between the third and fourth minutes of waiting.
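Let's assume that the waiting time $X$, in minutes, follows an exponential distribution with a mean of 5 minutes, so the PDF that we are given is:
$$f_X(x) = \begin{cases} \frac{1}{5}e^{-x/5}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$
Now let's follow the probability calculation steps using the PDF:
1) Represent our experiment in terms of a continuous random variable within the needed interval:
- Our continuous random variable $X$ must lie in the interval from 3 to 4: $P(3 \le X \le 4)$.
2) Make sure that the given PDF meets the required properties.
- $f_X(x) \ge 0$ for all $x$: this is met, since $e^{-x/5}$ is positive for every $x$, so no value of $x$ inserted into the given PDF formula gives a negative result.
- $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$: the total area under the PDF curve must be equal to 1, which means the integral over the entire real number line must be equal to 1:
$$\int_{-\infty}^{\infty} f_X(x)\,dx = \int_0^{\infty} \frac{1}{5}e^{-x/5}\,dx = \Big[-e^{-x/5}\Big]_0^{\infty} = 0 - (-1) = 1$$
We can see from the calculation above that our PDF meets the second required property as well, so we can move on to the next step.
3) Calculate the probability that our continuous random variable lies in the given interval by finding the area under the PDF.
Our continuous random variable lies in the interval $[3, 4]$. Let's put these values into our probability formula:
$$P(3 \le X \le 4) = \int_3^4 \frac{1}{5}e^{-x/5}\,dx = \Big[-e^{-x/5}\Big]_3^4 = e^{-3/5} - e^{-4/5} \approx 0.5488 - 0.4493 = 0.0995$$
The probability of receiving your food within the 4th minute is approximately $0.0995$. That means there is roughly a $10\%$ chance.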
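As a cross-check of the restaurant calculation, this sketch computes $P(3 \le X \le 4)$ both from the closed-form antiderivative and with a midpoint-rule integral. It assumes the waiting time has the exponential PDF $f_X(x) = \frac{1}{5}e^{-x/5}$ for $x \ge 0$ (mean 5 minutes); the names `MEAN_WAIT`, `wait_pdf`, and `prob_between` are hypothetical.

```python
import math

MEAN_WAIT = 5.0  # assumed average serving time in minutes (exponential model)


def wait_pdf(x: float) -> float:
    """Assumed exponential PDF of the waiting time: (1/5) e^(-x/5) for x >= 0."""
    return math.exp(-x / MEAN_WAIT) / MEAN_WAIT if x >= 0 else 0.0


def prob_between(pdf, a: float, b: float, n: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of pdf from a to b."""
    h = (b - a) / n
    return sum(pdf(a + (i + 0.5) * h) for i in range(n)) * h


exact = math.exp(-3 / MEAN_WAIT) - math.exp(-4 / MEAN_WAIT)  # from the antiderivative -e^(-x/5)
approx = prob_between(wait_pdf, 3.0, 4.0)
print(f"exact={exact:.4f}  numeric={approx:.4f}")  # both round to 0.0995
```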
Conclusion
Let's summarize the concepts we have learned on this topic:
- The probability density function, PDF for short, is used to describe the value sitting on top of each part of the real line.
- The probability that our random variable is within the interval from $a$ to $b$ is $P(a \le X \le b)$.
- To calculate $P(a \le X \le b)$, we find the area under the PDF that sits on top of the interval from $a$ to $b$. This area is equal to the definite integral from $a$ to $b$: $\int_a^b f_X(x)\,dx$.
- The required properties of the PDF are $f_X(x) \ge 0$ for all $x$ and $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$.
- If the interval $[x, x+\delta]$ is very small, then $P(x \le X \le x+\delta) \approx f_X(x)\cdot\delta$.