
Bernoulli distribution


Previously, we covered topics like the probability mass function and the cumulative distribution function. In this topic, we will see how to apply these concepts to a particular discrete probability distribution: the Bernoulli distribution.

The Bernoulli distribution can be applied when there is a single experiment with only two possible outcomes, such as success/failure, red/blue, heads/tails, or true/false. Such an experiment is called a Bernoulli trial. Researchers will often denote the two possibilities as success vs. failure, with success referring to the expected or desired result. The Bernoulli distribution has applications in medicine, marketing, finance, etc.

As an example, suppose that you are in a group of four friends and you are deciding when to watch a movie together. Three out of four people are free this Saturday, but one person is not.

The "experiment" of randomly discovering whether one of these people is free on a chosen day only has two outcomes: a person will either be free or not free on that day. As "available" and "not available" are the only two outcomes, the Bernoulli distribution can be applied to this scenario.

PMF and CDF of the Bernoulli distribution

In general, Bernoulli trials must follow certain conditions:

  • Bernoulli trials must be independent of each other
  • The probability of success must remain the same across trials

A Bernoulli trial has two possible outcomes: success or failure. For success, the random variable takes the value 1 with probability p. For failure, it takes the value 0 with probability 1-p. Note that if the probability of one outcome is p, then the probability of the opposite one is 1-p.

Looking back at the example in the introduction, if we consider availability to watch a movie as a success, then p = 0.75 and 1-p = 0.25. Since 1/4 is 0.25, the probability of the outcome with value 0 is 0.25, and the probability of the other outcome is 1 - 0.25 = 0.75.
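
Before formalizing this, it can help to simulate the trial. Below is a minimal sketch using only Python's standard library; the helper name bernoulli_trial is ours, invented purely for illustration, and we assume success means "a randomly picked friend is free on Saturday" with p = 0.75.

```python
import random

def bernoulli_trial(p):
    """Return 1 (success) with probability p, otherwise 0 (failure)."""
    return 1 if random.random() < p else 0

# Model "a randomly chosen friend is free on Saturday": p = 3/4
p = 0.75

# Repeating the trial many times, the fraction of successes approaches p
samples = [bernoulli_trial(p) for _ in range(100_000)]
print(sum(samples) / len(samples))  # close to 0.75
```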

The probability mass function (PMF) gives the probability that a discrete random variable equals a specific value. For the Bernoulli distribution, the PMF, denoted f, of a discrete random variable X with parameter p, evaluated at x, is as follows:

\begin{equation*} f(x, p) = \left\{ \begin{array}{ll} p & \quad x = 1 \\ 1-p & \quad x = 0 \end{array} \right. \end{equation*}

This means that x will equal 1 with probability p, and it will equal 0 with probability 1-p. It is common to use 0 to denote an unfavorable outcome, but strictly speaking, nothing prevents us from doing otherwise.
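
If you would like to verify these values numerically, here is a small sketch using SciPy's bernoulli distribution object. SciPy is not mentioned in the text, so treat it as just one possible tool, assuming it is installed:

```python
from scipy.stats import bernoulli

p = 0.75  # probability of success from the movie example

# PMF evaluated at the two possible values of x
print(bernoulli.pmf(1, p))  # 0.75, i.e. p
print(bernoulli.pmf(0, p))  # 0.25, i.e. 1 - p
```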

The histogram below shows the PMF for the example described in the introduction.

[Figure: PMF histogram — x equals 0 with probability 0.25 and 1 with probability 0.75]

As a side note, if we were flipping a fair coin, we would see two bars of the same height, since the probability of getting heads and the probability of getting tails are both equal to 0.5, and 1-0.5 is still 0.5.

The cumulative distribution function (CDF) gives the probability that a real-valued random variable is less than or equal to a specific value. For the Bernoulli distribution, the CDF, denoted F, of a random variable X with parameter p, evaluated at x, is as follows:

\begin{equation*} F(x, p) = \left\{ \begin{array}{ll} 0 & \quad x < 0 \\ 1-p & \quad 0 \leq x < 1 \\ 1 & \quad x \geq 1 \end{array} \right. \end{equation*}

This means that the probability of X being at most x is 0 when x < 0, 1-p when 0 ≤ x < 1, and 1 when x ≥ 1.
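
Again, a short sketch (assuming SciPy is available) that evaluates the CDF at a few points to make the step behavior visible:

```python
from scipy.stats import bernoulli

p = 0.75

# F(x) is a step function: 0 before 0, 1 - p on [0, 1), and 1 from 1 onward
for x in (-0.5, 0, 0.5, 1, 1.5):
    print(f"F({x}) = {bernoulli.cdf(x, p)}")
# F(-0.5) = 0.0, F(0) = 0.25, F(0.5) = 0.25, F(1) = 1.0, F(1.5) = 1.0
```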

The step function below represents the CDF for the example from the introduction.

[Figure: CDF step function — the CDF is 0 when x is less than 0, 0.25 when x is between 0 and 1, and 1 when x is greater than or equal to 1]

Mean and variance

Two important values in the field of probability and statistics are mean and variance. The mean is essentially a weighted average. The variance is a measure of variability in the data; that is, it tells you how spread out the data is.

The mean or expected value of the distribution is calculated as follows:

\begin{equation*} \mu = (1-p)\cdot 0 + p\cdot 1 = p \end{equation*}

The variance, or the expected squared distance of a value from the mean, is calculated as follows:

\begin{equation*} \sigma^2 = (1-p)\cdot(0-p)^2 + p\cdot(1-p)^2 = (1-p)\cdot p^2 + p\cdot(1-2p+p^2) = p\cdot(1-p) \end{equation*}

As you probably noticed, the formulas for the mean and the variance are quite simple. This is because one of the outcomes is 0, so several terms vanish.
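
If you would rather not expand this algebra by hand, a quick sketch with SymPy (our choice of tool, purely as an assumption; any computer algebra system would do) confirms the identity:

```python
from sympy import symbols, simplify

p = symbols('p')

# Variance as the expected squared distance from the mean mu = p
variance = (1 - p) * (0 - p)**2 + p * (1 - p)**2

# The difference from p*(1 - p) simplifies to zero, confirming the identity
print(simplify(variance - p * (1 - p)))  # 0
```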

Let's revisit the problem from the introduction. The mean of the experiment is:

\begin{equation*} \mu = p = 0.75 \end{equation*}

Consequently, the variance of the experiment is:

\begin{equation*} \sigma^2 = p \cdot (1-p) = 0.75 \cdot 0.25 = 0.1875 \end{equation*}

Therefore, the expected value of the experiment amounts to 0.75. The variance, 0.1875, gives an indication of how far the data points are spread out from the expected value.
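
The same numbers can be obtained directly from SciPy's distribution object (again assuming SciPy is available), which is a handy way to sanity-check hand calculations:

```python
from scipy.stats import bernoulli

p = 0.75

# 'mv' asks for the mean and the variance
mean, var = bernoulli.stats(p, moments='mv')
print(mean)  # 0.75, matching mu = p
print(var)   # 0.1875, matching sigma^2 = p * (1 - p)
```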

Symmetry of notation

In the example we used in this topic, we considered success to be when a person is free on a given date, and failure to be when they are not free. It is important to note that success and failure have nothing to do with the actual "SUCCESS, MONEY, AND FAME". 'Success' and 'failure' are just the labels we use.

So, what would happen if we assigned 0 and 1 differently? Let's say that success is now when a person is not free. Now, p = 0.25 and 1-p = 0.75.

This is what the PMF would look like.

[Figure: PMF histogram — x equals 0 with probability 0.75 and 1 with probability 0.25]

And this is what the CDF would look like.

[Figure: CDF step function — the CDF is 0 when x is less than 0, 0.75 when x is between 0 and 1, and 1 when x is greater than or equal to 1]

The mean:

\begin{equation*} \mu = p = 0.25 \end{equation*}

The variance:

\begin{equation*} \sigma^2 = p \cdot (1-p) = 0.25 \cdot 0.75 = 0.1875 \end{equation*}

Thus, while the real-life situation and the experiment do not change, our model changes slightly: our diagrams and the mean value change, but the variance does not.
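
A tiny sketch (SciPy again, as an assumption) makes the symmetry explicit: swapping the labels replaces p with 1 - p, which flips the mean but leaves the variance untouched.

```python
from scipy.stats import bernoulli

p = 0.75  # success = "free on Saturday"

mean_free, var_free = bernoulli.stats(p, moments='mv')
mean_busy, var_busy = bernoulli.stats(1 - p, moments='mv')  # success = "not free"

print(mean_free, mean_busy)  # 0.75 0.25  -> the mean changes
print(var_free, var_busy)    # 0.1875 0.1875 -> the variance does not
```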

In general, noticing symmetry in such problems is a useful skill, since such a relabeling can sometimes lead to easier calculations.

Conclusion

  • The Bernoulli distribution has two possible outcomes:
    • Success: value 1, probability p
    • Failure: value 0, probability 1-p
  • \begin{equation*} \text{Probability mass function} = \left\{ \begin{array}{ll} p & \quad x = 1 \\ 1-p & \quad x = 0 \end{array} \right. \end{equation*}
  • \begin{equation*} \text{Cumulative distribution function} = \left\{ \begin{array}{ll} 0 & \quad x < 0 \\ 1-p & \quad 0 \leq x < 1 \\ 1 & \quad x \geq 1 \end{array} \right. \end{equation*}
  • Expected value: μ = p
  • Variance: σ² = p·(1-p)