
Discrete random variable


Quite often, the sample space itself does not help us determine the numerical characteristics of an experiment. An excellent example is the sum of two dice: here we are always dealing with a set of pairs, yet it is much easier to work with a single set of numbers, each representing the sum of a pair.

For this purpose, we will now become familiar with the concepts of a discrete random variable, probability distribution, and the independence of random variables. These concepts will later help us define characteristics of a random variable, such as expected value and dispersion (variance).

In fact, you deal with discrete random variables all the time in real life: when you pick an apple from the fridge, knowing that some apples are green and some are red, or when you choose a route through the park, knowing that some routes are long and others are short.

Discrete random variable

Imagine that your friend said they picked a number from 1 to 10 and asked you to guess it in three tries. We know that the sample space has 10 elements, so it's discrete. We also know that your friend picked the number at random. So, we can say that this number is a discrete random variable. What would that mean if we were stricter about the definition? And how would we use such an entity?

A random variable is an arbitrary function from the sample space $\Omega$ to the real numbers, $\xi: \Omega \rightarrow \mathbb{R}$. However, we will only need a discrete random variable $\xi: \Omega \rightarrow \mathbb{Z}$, which is a function from the sample space to the integers.

Let's discuss some more examples to better understand this concept.

Imagine tossing two dice. In this experiment, $\xi$ can be defined as:

$\xi((i, j)) = i + j$

where $i$ is the result of the first die and $j$ is the result of the second. The argument here is a pair because the sample space is a set of pairs. All the possible outcomes are presented in the following picture.

Outcomes of rolling two dice

Imagine that the first die shows 2 points and the second one shows 3. Their sum will look like this:

$\xi((2, 3)) = 2 + 3 = 5$
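To make this concrete, here is a minimal Python sketch (our own illustration; the names `omega` and `xi` are just our choices) that enumerates the sample space of two dice and applies $\xi$ to each pair:

```python
from itertools import product

# Sample space: all ordered pairs (i, j) of results of the two dice
omega = list(product(range(1, 7), repeat=2))

# The random variable xi maps each pair to the sum of its elements
def xi(outcome):
    i, j = outcome
    return i + j

print(len(omega))   # 36 elementary outcomes
print(xi((2, 3)))   # 5
```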

Let's move on to a coin toss, which is also a random experiment. Say we toss a coin $n$ times and want to find out how many times tails comes up. In that case, $\Omega$ is the set of binary strings of length $n$, where 1 stands for tails and 0 for heads.

$\Omega = \{ \text{binary string} : \text{length} = n \}$

The outcomes in which tails comes up exactly $k$ times are the strings with exactly $k$ ones. So, if we toss a coin three times and tails comes up twice, the matching binary strings are 011, 101, and 110. We can define $\xi$ as the sum of bits in the string, so it gives us the number of tails.
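Here is a similar sketch (again our own, with hypothetical naming) that builds the binary strings of length $n$ and counts tails as the sum of bits:

```python
from itertools import product

n = 3

# Sample space: all binary strings of length n (1 = tails, 0 = heads)
omega = ["".join(bits) for bits in product("01", repeat=n)]

# xi counts the number of tails, i.e. the sum of the bits
def xi(outcome):
    return sum(int(bit) for bit in outcome)

# Outcomes where tails comes up exactly twice
print([s for s in omega if xi(s) == 2])   # ['011', '101', '110']
```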

We've already learned about one-dimensional random variables. But such variables can also have more than one dimension! For example, when you pick a seat at the cinema and expect to be seated next to your friend who already got a ticket somewhere in the middle, you are dealing with a two-dimensional random variable.

Picking a seat next to a friend

Distribution

Technically, we cannot calculate probabilities with a random variable itself, because it only maps elementary outcomes to numbers. However, as a workaround, we can define the probability of the random variable taking a specific value instead.

Let's assume that $\Omega$ is a finite set. Then the set of values of $\xi$ is finite, too; we can denote it as $X = \{x_1, x_2, \dots, x_n\}$. Define the event $A_k$, "$\xi$ takes the value $x_k$", as the set of points $\omega$ of the sample space for which $\xi(\omega) = x_k$. Mathematically, this is written as:

$A_k = \{ \omega \in \Omega : \xi(\omega) = x_k \}$

We can calculate the probability of $A_k$ by summing the probabilities of the outcomes in this set:

$\mathbb{P}(A_k) = \sum\limits_{\omega:\ \xi(\omega) = x_k} \mathbb{P}(\omega)$

The function that describes the probability of the variable taking each specific value is called the probability distribution. It can also be represented as the set below.

$\{ (x_1, \mathbb{P}(A_1)), (x_2, \mathbb{P}(A_2)), \dots, (x_n, \mathbb{P}(A_n)) \}$

Let's take our example with two dice and calculate the distribution. There are $6 \cdot 6 = 36$ different outcomes in this experiment. Now we have to calculate the probability of each possible sum. How do we do it? We collect all the sample points that belong to the event. So, the probability that the sum $\xi((i, j)) = i + j$ equals 2 is $\frac{1}{36}$, as there is only one suitable outcome, $(1, 1)$. If we are looking for 3, there are two options, $(1, 2)$ and $(2, 1)$, so the probability is $\frac{2}{36}$. Now you can do the calculations for the other values and compare the result with the following diagram.

Diagram of probability of outcomes when tossing two dice
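If you want to verify the diagram, a short Python sketch of our own can compute the whole distribution of the sum by counting outcomes:

```python
from itertools import product
from collections import Counter
from fractions import Fraction

# Count how many of the 36 equally likely pairs give each possible sum
counts = Counter(i + j for i, j in product(range(1, 7), repeat=2))

# Probability of each value = number of suitable outcomes / 36
for value, count in sorted(counts.items()):
    print(value, Fraction(count, 36))   # e.g. 2 1/36, 3 1/18, ..., 7 1/6, ..., 12 1/36
```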

Let's consider another example: the distribution of a random variable representing the number of siblings a person has. In this case, the random variable can take the discrete values 0, 1, 2, 3, and so on. Let's assume that we have a sample of 10 people: 3 of them have no siblings, 4 have 1 sibling, 2 have 2 siblings, and 1 has 3 siblings. The following diagram represents the distribution in this case.

Distribution of number of siblings
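As a rough illustration (using the made-up sample above), the distribution table can be built directly from the counts:

```python
from fractions import Fraction

# Sample of 10 people: number of siblings -> how many people reported it
counts = {0: 3, 1: 4, 2: 2, 3: 1}

# Empirical distribution: value -> probability of picking a person with that many siblings
for value, count in counts.items():
    print(value, Fraction(count, 10))   # 0 3/10, 1 2/5, 2 1/5, 3 1/10
```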

There are two things to remember. First, a simple table or diagram is enough to represent any discrete distribution. Second, the distribution does not depend on a specific $\Omega$. Thus, every random variable is fully characterized by its set of values and its distribution.

Common discrete distributions

As you progress in probability, you will encounter many distributions, but here we will focus on the two most common ones.

First, let's consider the Bernoulli distribution. It is commonly used when we have a single yes-no question, like one coin toss. It is a special case of the binomial distribution, which is used when we have a series of identical yes-no questions.
Let's discuss the binomial distribution in more detail. Imagine that we toss a coin $n$ times and count how many times tails appears. We can generalize this even further: with every toss, there is a probability of success ($p$) and a probability of failure ($q = 1 - p$). It is crucial for the experiments to be independent of each other. If we want to count the number of successful outcomes, we can refer to binary strings again. A random variable $\xi$ then gives the number of successes in a series of $n$ identical experiments.
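As a small illustration of our own, using the standard binomial formula $\mathbb{P}(\xi = k) = \binom{n}{k} p^k q^{n-k}$, here is how the distribution of the number of tails can be computed for a fair coin:

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Fair coin tossed 3 times: probability of each possible number of tails
n, p = 3, 0.5
print([binomial_pmf(k, n, p) for k in range(n + 1)])   # [0.125, 0.375, 0.375, 0.125]
```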

Independence

Let's go back to the dice example and consider one more random variable $\eta$ that's equal to the product of the scores:

$\eta((i, j)) = i \cdot j$

Suppose $\eta$ takes values in the set $Y = \{y_1, y_2, \dots, y_m\}$. If we need to calculate the probabilities $\mathbb{P}(\xi = x_i, \eta = y_j)$, we can refer to the joint distribution. In general, it is extremely difficult or even impossible to calculate a joint distribution. There is one important exception: the independence of random variables. Two random variables $\xi$ and $\eta$ are independent if:

$\mathbb{P}(\xi = x, \eta = y) = \mathbb{P}(\xi = x) \cdot \mathbb{P}(\eta = y)$ for all $x$ and $y$

So, $\xi$ and $\eta$ from the dice example are not independent because:

$\frac{2}{36} = \mathbb{P}(\xi = 4, \eta = 3) \ne \mathbb{P}(\xi = 4) \cdot \mathbb{P}(\eta = 3) = \frac{3}{36} \cdot \frac{2}{36} = \frac{1}{216}$

When the sum of two dice equals 2, we know that the outcome of each die is 1: the values are clearly related. However, if a random variable $\xi$ represents the outcome of the first die and $\eta$ represents the outcome of the second die, they are independent. For all $i, j \in \{1, \dots, 6\}$, the following is true:

$\frac{1}{36} = \mathbb{P}(\xi = i, \eta = j) = \mathbb{P}(\xi = i) \cdot \mathbb{P}(\eta = j) = \frac{1}{6} \cdot \frac{1}{6}$
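As a quick sanity check, here is a brute-force sketch of our own that verifies both claims by enumerating all 36 outcomes:

```python
from itertools import product
from fractions import Fraction

omega = list(product(range(1, 7), repeat=2))   # 36 equally likely pairs

def prob(event):
    """Probability of an event, given as a predicate on outcomes."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

# Sum and product of the scores are NOT independent:
joint = prob(lambda w: w[0] + w[1] == 4 and w[0] * w[1] == 3)
marginals = prob(lambda w: w[0] + w[1] == 4) * prob(lambda w: w[0] * w[1] == 3)
print(joint, marginals)   # 1/18 1/216

# The two individual dice ARE independent:
independent = all(
    prob(lambda w: w == (i, j)) == prob(lambda w: w[0] == i) * prob(lambda w: w[1] == j)
    for i in range(1, 7) for j in range(1, 7)
)
print(independent)   # True
```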

Later we will see that the independence of random variables greatly simplifies many calculations.

Now we know the basic definitions and properties of random variables and distributions. But what are they needed for? Let's find out together in the next topic!

Calculations and expectations

If you know the distribution, you can estimate the expected value of a random variable. It is a measure of its average or long-term value. It represents the average outcome we would expect to observe if we repeated an experiment or observation many times.

Calculating the expected value can be a little tricky, and it deserves a separate topic. Here we'll just briefly discuss what can be concluded using the expected value. Imagine that you were asked whether it is challenging to get a sum of at least 10 when throwing 5 dice. For each die, the expected value is 3.5, so for 5 dice the expected sum is roughly $3.5 \cdot 5 = 17.5$. So it is not that challenging to get a sum of at least 10.
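To get a feel for this, here is a small simulation sketch of our own (not a formal calculation) that estimates both the average sum and the chance of getting at least 10 with 5 dice:

```python
import random

trials = 100_000
sums = [sum(random.randint(1, 6) for _ in range(5)) for _ in range(trials)]

print(sum(sums) / trials)                   # close to 17.5
print(sum(s >= 10 for s in sums) / trials)  # close to 1, so a sum of at least 10 is easy to get
```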

You can also estimate more properties using distributions, but this will be discussed in the following topics. There's one thing left before finishing our topic: hints for solving tasks.

Challenges with discrete random variables

Here are some points that you should remember when dealing with discrete random variables.

  1. The gambler's fallacy. It is a cognitive bias that occurs when people believe that the outcome of a random event is more likely to occur or not occur based on previous outcomes, even when the events are statistically independent. For example, let's consider a game of flipping a fair coin. If the coin is flipped and lands on heads five times in a row, someone experiencing the gambler's fallacy might incorrectly believe that the next flip is more likely to result in tails, thinking that "tails is due" or "it's about time for tails to come up." However, in reality, the probability of getting heads or tails on each flip remains 50%, regardless of the previous outcomes. The outcome of one coin flip does not influence the outcome of subsequent flips.

  2. Conditions are important. Let's consider an example of rolling a fair six-sided die. Each outcome is equally likely to occur, so the probability of getting any specific number is $\frac{1}{6}$. Now, let's modify the example slightly. Suppose we have a biased six-sided die, where the probability of rolling a 1 is $\frac{1}{3}$, the probability of rolling a 2 is $\frac{1}{6}$, the probabilities of rolling a 3 or a 5 are $\frac{1}{6}$ each, and the probabilities of rolling a 4 or a 6 are $\frac{1}{12}$ each (so that all the probabilities sum to 1). To calculate the probability of a compound event, let's say we want to find the probability of rolling an even number and then rolling a 1 on the next roll. The probability of rolling an even number on the first roll is the sum of the probabilities of rolling a 2, 4, or 6, which is $\frac{1}{6} + \frac{1}{12} + \frac{1}{12} = \frac{1}{3}$. Since we want to roll a 1 on the next roll, and the rolls are independent, the probability of that is $\frac{1}{3}$. To find the probability of both events occurring, we multiply the individual probabilities: $\frac{1}{3} \cdot \frac{1}{3} = \frac{1}{9}$, as shown in the sketch after this list.

  3. Dealing with samples of large size. If you're dealing with a large sample, it might be useful to find a subset of the sample or redefine the problem as the opposite one and then use the outcome to find what is asked. For example, suppose we flip a fair coin 100 times and count the number of heads. The random variable in this case is the number of heads obtained; it can take values from 0 to 100, and its probability distribution is binomial. By redefining the problem as the opposite one, we can instead consider the number of tails obtained. Since every flip is either heads or tails, the number of tails equals the number of flips minus the number of heads, so it also takes values from 0 to 100 and also follows a binomial distribution. Looking at the complement of a random variable in this way gives a different perspective and can be useful in certain scenarios, such as calculating probabilities or comparing outcomes.
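Here is the sketch referenced in point 2 (our own illustration with the assumed face probabilities), computing the compound probability directly:

```python
from fractions import Fraction

# Biased die from point 2 (our assumed probabilities, chosen so they sum to 1)
p = {1: Fraction(1, 3), 2: Fraction(1, 6), 3: Fraction(1, 6),
     4: Fraction(1, 12), 5: Fraction(1, 6), 6: Fraction(1, 12)}
assert sum(p.values()) == 1

p_even = p[2] + p[4] + p[6]   # probability of an even number on the first roll
print(p_even, p_even * p[1])  # 1/3 1/9 (independent rolls, so probabilities multiply)
```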

Conclusion

Let's summarize what we have learned:

  • We deal with discrete random variables in everyday situations when we pick or choose something.

  • We can use a function from $\Omega$ to $\mathbb{Z}$ to avoid dealing with a specific $\Omega$.

  • We can consider the distribution of a discrete random variable $\xi$ to be a simple table where each of its values is associated with the probability of the variable taking that specific value. Diagrams help to visualize it.

  • The Bernoulli distribution is a special case of the binomial distribution with only one trial.

  • Dependence and independence affect calculations with random variables.

  • If you know the distribution, you can estimate the expected value of the random variable, which shows its average or long-term value.

  • Don't forget about the gambler's fallacy and the importance of conditions when dealing with random variables.

  • If you deal with large sample sizes, it might be useful to use subsets of the sample or even redefine the problem.
