
Discrete random variable


Quite often, the sample space itself does not help us determine the numerical characteristics of an experiment. An excellent example is the sum of two dice: here we are always dealing with a set of pairs, yet it is much easier to work with a single set of numbers, each representing the sum of a pair.

For this purpose, we will now become familiar with the concepts of a discrete random variable, probability distribution, and the independence of random variables. These concepts will later help us define characteristics of a random variable, such as expected value and dispersion (variance).

In fact, you deal with discrete random variables all the time in real life: when you pick an apple from the fridge, knowing that some apples are green and some are red, or when you choose a route through the park, knowing that some routes are long and others are short.

Discrete random variable

Imagine that your friend said they picked a number from 1 to 10 and asked you to guess it in three tries. We know that the sample space has 10 elements, so it's discrete. We also know that your friend picked the number at random. So, we can say that this number is a discrete random variable. What would that mean if we were stricter about the definition? And how would we use such an entity?

A random variable is an arbitrary function from the sample space $\Omega$ to the real numbers, $\xi: \Omega \rightarrow \mathbb{R}$. However, we will only need a discrete random variable $\xi: \Omega \rightarrow \mathbb{Z}$, which is a function from the sample space to the integers.

Let's discuss some more examples to better understand this concept.

Imagine tossing two dice. In this experiment, $\xi$ can be defined as:

$\xi((i, j)) = i + j$

where $i$ is the result of the first die and $j$ is the result of the second. The argument here is a pair because the sample space is a set of pairs. All the possible outcomes are presented in the following picture.

Outcomes of rolling two dice

Imagine that the first die shows 2 points and the second one shows 3. Their sum will look like this:

$\xi((2, 3)) = 2 + 3 = 5$
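To make this concrete, here is a minimal Python sketch (our own illustration; the names `omega` and `xi` are just our choices) that enumerates the sample space of two dice and applies $\xi$ to each pair:

```python
from itertools import product

# Sample space: all ordered pairs (i, j) of results of the two dice
omega = list(product(range(1, 7), repeat=2))

# The random variable xi maps each pair to the sum of its elements
def xi(outcome):
    i, j = outcome
    return i + j

print(len(omega))   # 36 elementary outcomes
print(xi((2, 3)))   # 5
```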

Let's move on to a coin toss, which is also a random experiment. Say we toss a coin $n$ times and want to find out how many times tails comes up. In that case, $\Omega$ is the set of binary strings of length $n$, where 1 stands for tails and 0 for heads.

$\Omega = \{ \text{binary string} : \text{length} = n \}$

The outcomes in which tails comes up exactly $k$ times are the strings with exactly $k$ ones. So, if we toss a coin three times and tails comes up twice, the matching binary strings are 011, 101, and 110. We can define $\xi$ as the sum of bits in the string, so it gives us the number of tails.
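Here is a similar sketch (again our own, with hypothetical naming) that builds the binary strings of length $n$ and counts tails as the sum of bits:

```python
from itertools import product

n = 3

# Sample space: all binary strings of length n (1 = tails, 0 = heads)
omega = ["".join(bits) for bits in product("01", repeat=n)]

# xi counts the number of tails, i.e. the sum of the bits
def xi(outcome):
    return sum(int(bit) for bit in outcome)

# Outcomes where tails comes up exactly twice
print([s for s in omega if xi(s) == 2])   # ['011', '101', '110']
```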

We've already learned about one-dimensional random variables. But such variables can also have more than one dimension! For example, when you pick a seat at the cinema and expect to be seated next to your friend who already got a ticket somewhere in the middle, you are dealing with a two-dimensional random variable.

Picking a seat next to a friend

Distribution

Technically, we cannot calculate probabilities with a random variable itself, because it only maps elementary outcomes to numbers. However, as a workaround, we can define the probability of the random variable taking a specific value instead.

Let's assume that $\Omega$ is a finite set. Then the set of values of $\xi$ is finite, too; we can denote it as $X = \{x_1, x_2, \dots, x_n\}$. Define the event $A_k$, "$\xi$ takes the value $x_k$", as the set of points $\omega$ of the sample space for which $\xi(\omega) = x_k$. Mathematically, this is written as:

$A_k = \{ \omega \in \Omega : \xi(\omega) = x_k \}$

We can calculate the probability of $A_k$ by summing the probabilities of the outcomes in this set:

$\mathbb{P}(A_k) = \sum\limits_{\omega:\ \xi(\omega) = x_k} \mathbb{P}(\omega)$

The function that describes the probability of the variable taking each specific value is called the probability distribution. It can also be represented as the set below.

$\{ (x_1, \mathbb{P}(A_1)), (x_2, \mathbb{P}(A_2)), \dots, (x_n, \mathbb{P}(A_n)) \}$

Let's take our example with two dice and calculate the distribution. There are $6 \cdot 6 = 36$ different outcomes in this experiment. Now we have to calculate the probability of each possible sum. How do we do it? We collect all the sample points that belong to the event. So, the probability that the sum $\xi((i, j)) = i + j$ equals 2 is $\frac{1}{36}$, as there is only one suitable outcome, $(1, 1)$. If we are looking for 3, there are two options, $(1, 2)$ and $(2, 1)$, so the probability is $\frac{2}{36}$. Now you can do the calculations for the other values and compare the result with the following diagram.

Diagram of probability of outcomes when tossing two dice
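If you want to verify the diagram, a short Python sketch of our own can compute the whole distribution of the sum by counting outcomes:

```python
from itertools import product
from collections import Counter
from fractions import Fraction

# Count how many of the 36 equally likely pairs give each possible sum
counts = Counter(i + j for i, j in product(range(1, 7), repeat=2))

# Probability of each value = number of suitable outcomes / 36
for value, count in sorted(counts.items()):
    print(value, Fraction(count, 36))   # e.g. 2 1/36, 3 1/18, ..., 7 1/6, ..., 12 1/36
```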

Let's consider another example: the distribution of a random variable representing the number of siblings a person has. In this case, the random variable can take the discrete values 0, 1, 2, 3, and so on. Let's assume that we have a sample of 10 people: 3 of them have no siblings, 4 have 1 sibling, 2 have 2 siblings, and 1 has 3 siblings. The following diagram represents the distribution in this case.

Distribution of number of siblings
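As a rough illustration (using the made-up sample above), the distribution table can be built directly from the counts:

```python
from fractions import Fraction

# Sample of 10 people: number of siblings -> how many people reported it
counts = {0: 3, 1: 4, 2: 2, 3: 1}

# Empirical distribution: value -> probability of picking a person with that many siblings
for value, count in counts.items():
    print(value, Fraction(count, 10))   # 0 3/10, 1 2/5, 2 1/5, 3 1/10
```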

There are two things to remember. First, a simple table or diagram is enough to represent any discrete distribution. Second, the distribution does not depend on a specific $\Omega$. Thus, every random variable is fully characterized by its set of values and its distribution.

Common discrete distributions

As you progress in probability, you will encounter many distributions, but here we will focus on the two most common ones.

First, let's consider the Bernoulli distribution. It is commonly used when we have a single yes-no question, like one coin toss. It is a special case of the binomial distribution, which is used when we have a series of identical yes-no questions.
Let's discuss the binomial distribution in more detail. Imagine that we toss a coin $n$ times and count how many times tails appears. We can generalize this even further: with every toss, there is a probability of success ($p$) and a probability of failure ($q = 1 - p$). It is crucial for the experiments to be independent of each other. If we want to count the number of successful outcomes, we can refer to binary strings again. A random variable $\xi$ then gives the number of successes in a series of $n$ identical experiments.
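As a small illustration of our own, using the standard binomial formula $\mathbb{P}(\xi = k) = \binom{n}{k} p^k q^{n-k}$, here is how the distribution of the number of tails can be computed for a fair coin:

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Fair coin tossed 3 times: probability of each possible number of tails
n, p = 3, 0.5
print([binomial_pmf(k, n, p) for k in range(n + 1)])   # [0.125, 0.375, 0.375, 0.125]
```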

Independence

Let's go back to the dice example and consider one more random variable $\eta$ that's equal to the product of the scores:

$\eta((i, j)) = i \cdot j$

Suppose $\eta$ takes values in the set $Y = \{y_1, y_2, \dots, y_m\}$. If we need to calculate the probabilities $\mathbb{P}(\xi = x_i, \eta = y_j)$, we can refer to the joint distribution. In general, it is extremely difficult or even impossible to calculate a joint distribution. There is one important exception: the independence of random variables. Two random variables $\xi$ and $\eta$ are independent if:

$\mathbb{P}(\xi = x, \eta = y) = \mathbb{P}(\xi = x) \cdot \mathbb{P}(\eta = y)$ for all $x$ and $y$

So, $\xi$ and $\eta$ from the dice example are not independent because:

$\frac{2}{36} = \mathbb{P}(\xi = 4, \eta = 3) \ne \mathbb{P}(\xi = 4) \cdot \mathbb{P}(\eta = 3) = \frac{3}{36} \cdot \frac{2}{36} = \frac{1}{216}$

When the sum of two dice equals 2, we know that the outcome of each die is 1: the values are clearly related. However, if a random variable $\xi$ represents the outcome of the first die and $\eta$ represents the outcome of the second die, they are independent. For all $i, j \in \{1, \dots, 6\}$, the following is true:

$\frac{1}{36} = \mathbb{P}(\xi = i, \eta = j) = \mathbb{P}(\xi = i) \cdot \mathbb{P}(\eta = j) = \frac{1}{6} \cdot \frac{1}{6}$
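As a quick sanity check, here is a brute-force sketch of our own that verifies both claims by enumerating all 36 outcomes:

```python
from itertools import product
from fractions import Fraction

omega = list(product(range(1, 7), repeat=2))   # 36 equally likely pairs

def prob(event):
    """Probability of an event, given as a predicate on outcomes."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

# Sum and product of the scores are NOT independent:
joint = prob(lambda w: w[0] + w[1] == 4 and w[0] * w[1] == 3)
marginals = prob(lambda w: w[0] + w[1] == 4) * prob(lambda w: w[0] * w[1] == 3)
print(joint, marginals)   # 1/18 1/216

# The two individual dice ARE independent:
independent = all(
    prob(lambda w: w == (i, j)) == prob(lambda w: w[0] == i) * prob(lambda w: w[1] == j)
    for i in range(1, 7) for j in range(1, 7)
)
print(independent)   # True
```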

Later we will see that the independence of random variables greatly simplifies many calculations.

Now we know the basic definitions and properties of random variables and distributions. But what are they needed for? Let's find out together in the next topic!

Calculations and expectations

If you know the distribution, you can estimate the expected value of a random variable. It is a measure of its average or long-term value. It represents the average outcome we would expect to observe if we repeated an experiment or observation many times.

Calculating the expected value can be a little tricky, and it deserves a separate topic. Here we'll just briefly discuss what can be concluded using the expected value. Imagine that you were asked whether it is challenging to get a sum of at least 10 when throwing 5 dice. For each die, the expected value is 3.5, so for 5 dice the expected sum is roughly $3.5 \cdot 5 = 17.5$. So it is not that challenging to get a sum of at least 10.
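To get a feel for this, here is a small simulation sketch of our own (not a formal calculation) that estimates both the average sum and the chance of getting at least 10 with 5 dice:

```python
import random

trials = 100_000
sums = [sum(random.randint(1, 6) for _ in range(5)) for _ in range(trials)]

print(sum(sums) / trials)                   # close to 17.5
print(sum(s >= 10 for s in sums) / trials)  # close to 1, so a sum of at least 10 is easy to get
```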

You can also estimate more properties using distributions, but this will be discussed in the following topics. There's one thing left before finishing our topic: hints for solving tasks.

Challenges with discrete random variables

Here are some points that you should remember when dealing with discrete random variables.

  1. The gambler's fallacy. It is a cognitive bias that occurs when people believe that the outcome of a random event is more likely to occur or not occur based on previous outcomes, even when the events are statistically independent. For example, let's consider a game of flipping a fair coin. If the coin is flipped and lands on heads five times in a row, someone experiencing the gambler's fallacy might incorrectly believe that the next flip is more likely to result in tails, thinking that "tails is due" or "it's about time for tails to come up." However, in reality, the probability of getting heads or tails on each flip remains 50%, regardless of the previous outcomes. The outcome of one coin flip does not influence the outcome of subsequent flips.

  2. Conditions are important. Let's consider an example of rolling a fair six-sided die. Each outcome is equally likely to occur, so the probability of getting any specific number is $\frac{1}{6}$. Now, let's modify the example slightly. Suppose we have a biased six-sided die, where the probability of rolling a 1 is $\frac{1}{3}$, the probability of rolling a 2 is $\frac{1}{6}$, the probabilities of rolling a 3 or a 5 are $\frac{1}{6}$ each, and the probabilities of rolling a 4 or a 6 are $\frac{1}{12}$ each (so that all the probabilities sum to 1). To calculate the probability of a compound event, let's say we want to find the probability of rolling an even number and then rolling a 1 on the next roll. The probability of rolling an even number on the first roll is the sum of the probabilities of rolling a 2, 4, or 6, which is $\frac{1}{6} + \frac{1}{12} + \frac{1}{12} = \frac{1}{3}$. Since we want to roll a 1 on the next roll, and the rolls are independent, the probability of that is $\frac{1}{3}$. To find the probability of both events occurring, we multiply the individual probabilities: $\frac{1}{3} \cdot \frac{1}{3} = \frac{1}{9}$, as shown in the sketch after this list.

  3. Dealing with samples of large size. If you're dealing with a large sample, it might be useful to find a subset of the sample or redefine the problem as the opposite one and then use the outcome to find what is asked. For example, suppose we flip a fair coin 100 times and count the number of heads. The random variable in this case is the number of heads obtained; it can take values from 0 to 100, and its probability distribution is binomial. By redefining the problem as the opposite one, we can instead consider the number of tails obtained. Since every flip is either heads or tails, the number of tails equals the number of flips minus the number of heads, so it also takes values from 0 to 100 and also follows a binomial distribution. Looking at the complement of a random variable in this way gives a different perspective and can be useful in certain scenarios, such as calculating probabilities or comparing outcomes.
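Here is the sketch referenced in point 2 (our own illustration with the assumed face probabilities), computing the compound probability directly:

```python
from fractions import Fraction

# Biased die from point 2 (our assumed probabilities, chosen so they sum to 1)
p = {1: Fraction(1, 3), 2: Fraction(1, 6), 3: Fraction(1, 6),
     4: Fraction(1, 12), 5: Fraction(1, 6), 6: Fraction(1, 12)}
assert sum(p.values()) == 1

p_even = p[2] + p[4] + p[6]   # probability of an even number on the first roll
print(p_even, p_even * p[1])  # 1/3 1/9 (independent rolls, so probabilities multiply)
```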

Conclusion

Let's summarize what we have learned:

  • We deal with discrete random variables in everyday situations when we pick or choose something.

  • We can use a function from $\Omega$ to $\mathbb{Z}$ to avoid dealing with a specific $\Omega$.

  • We can consider the distribution of a discrete random variable $\xi$ to be a simple table where each of its values is associated with the probability of the variable taking that specific value. Diagrams help to visualize it.

  • The Bernoulli distribution is a special case of the binomial distribution with only one trial.

  • Dependence and independence affect calculations with random variables.

  • If you know the distribution, you can estimate the expected value of the random variable, which shows its average or long-term value.

  • Don't forget about the gambler's fallacy and the importance of conditions when dealing with random variables.

  • If you deal with large sample sizes, it might be useful to use subsets of the sample or even redefine the problem.
