Random variables that are defined by similar questions belong to the same class of random variables. Consider a random variable X that describes how many times you must play a game before you win, given the probability of winning. Another random variable Y describes how many times you must toss a fair coin before you get heads. The random variables X and Y are both defined by the same question: "how many times must you repeat an action before getting the desired result?". These random variables belong to the same class, or family: they behave similarly and can be described and calculated using the same rules and principles.
If a random variable belongs to such a class, we say that it is described by some distribution. We will learn about three main distributions of discrete random variables: the geometric, binomial, and Poisson distributions. Learning them will give you a solid foundation for working with random variables and their distributions.
Geometric distribution
Don't mind the name: it has nothing to do with geometry. The formal definition of this distribution states: consider a sequence of independent trials (the result of each separate trial doesn't affect the results of the others) where each trial either succeeds or fails. The probability that a trial succeeds is denoted by p and is the same for all trials. A random variable X with geometric distribution describes the number of the trial on which the first success occurs, and we express it this way: X ~ Geom(p).
For n = 2 there are 4 possible outcomes: {HH, HT, TH, TT}. HH and HT don't satisfy our condition, because in these cases the first success happens on the first toss, not the second one, and TT contains no success at all. We are left with only one variant, TH, so its probability is 1/4, or 25%. So the larger n is, the lower the chances of failing on all the previous tries and succeeding exactly on the n-th.
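The enumeration above can be checked with a short script. This is just a sketch of the reasoning: the two-letter strings stand for the sequences of two tosses, and we pick out the outcomes where the first heads appears on toss 2:

```python
from itertools import product

# Enumerate all outcomes of two fair coin tosses: HH, HT, TH, TT.
outcomes = ["".join(toss) for toss in product("HT", repeat=2)]

# Keep only the outcomes where the FIRST heads appears on toss 2:
# toss 1 must be tails, toss 2 must be heads.
first_success_on_second = [o for o in outcomes if o[0] == "T" and o[1] == "H"]

# Each of the 4 outcomes is equally likely, so the probability is 1/4.
probability = len(first_success_on_second) / len(outcomes)
print(outcomes)                 # ['HH', 'HT', 'TH', 'TT']
print(first_success_on_second)  # ['TH']
print(probability)              # 0.25
```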
This time, calculating these values wasn't that hard, but for a large n or a different p it may get much more complicated. For such tasks we introduce the probability mass function (PMF), which maps the number of trials n to the probability P(X = n). The probability mass function describes the probability of a random variable X taking the value n, and it uses different formulas for random variables with different distributions.
For a random variable X with geometric distribution it is computed as follows: P(X = n) = (1 - p)^(n-1) · p. Now let's repeat the calculation from above using the PMF. The probability p is still equal to 50%, and for n = 1 we have: P(X = 1) = (1 - 0.5)^0 · 0.5 = 0.5, or 50%, just what we got before. Now, for n = 2: P(X = 2) = (1 - 0.5)^1 · 0.5 = 0.25. Again, we get the same result. Now, let's experiment for a bit. Using the PMF, we calculate the probabilities of X taking different values of n for different values of p and map the results onto a graph to compare them:
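The PMF formula above translates directly into a few lines of Python (the function name geometric_pmf is ours, not from the text), reproducing the coin-toss values:

```python
def geometric_pmf(n: int, p: float) -> float:
    """P(X = n): first success on trial n, with success probability p per trial."""
    return (1 - p) ** (n - 1) * p

# Fair coin, p = 0.5: reproduce the values from the text.
print(geometric_pmf(1, 0.5))  # 0.5  -> 50%
print(geometric_pmf(2, 0.5))  # 0.25 -> 25%

# The probability drops as n grows:
print([geometric_pmf(n, 0.5) for n in range(1, 5)])
# [0.5, 0.25, 0.125, 0.0625]
```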
As you may see, the graphs look quite different depending on the value of p, but all of them follow the same pattern: the probability drops as the value of n rises. All graphs that describe the PMF of a random variable with geometric distribution follow this trend and look somewhat like the graph of a decaying exponential function.
Binomial distribution
This time we look at the total number of desired results in a fixed number of trials. We have a number of independent trials n, and the probability of success p is the same for all trials. A random variable X with the binomial distribution describes the number of successful trials. This may be written as follows: X ~ B(n, p).
Imagine yourself participating in an archery contest. You may hit the target with a probability of 70%, and there is a total of n shots. The random variable X would describe the number of shots that hit the target during the contest. The probability mass function of a random variable with binomial distribution is calculated as follows: P(X = k) = C(n, k) · p^k · (1 - p)^(n-k), where C(n, k) is the number of ways to choose which k of the n trials succeed.
Using the PMF formula, let's try to find the probability of you hitting exactly k out of n shots: P(X = k) = C(n, k) · 0.7^k · 0.3^(n-k). Now let's calculate the values for different n and p, and draw a graph:
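As a sketch, here is the binomial PMF in Python. The concrete numbers below (10 shots, hitting exactly 7 of them) are illustrative assumptions of ours, since the text leaves n and k unspecified:

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k): exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Illustrative numbers (our assumption): 10 shots, 70% hit chance, 7 hits.
n, p = 10, 0.7
print(round(binomial_pmf(7, n, p), 4))  # 0.2668, i.e. about a 27% chance

# The PMF over all possible k sums to 1 and traces out the "bell" shape:
print([round(binomial_pmf(k, n, p), 3) for k in range(n + 1)])
```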
The graph takes the form of a "bell". Its height, as well as its shift to the left or right, depends on the probability p and the number of trials n.
Poisson distribution
A random variable with the Poisson distribution has the same properties as one with the binomial distribution, but we use Poisson instead when the probability of each separate event is extremely small, or the number of trials is extremely large (hundreds of thousands, millions, etc.). Such random variables are denoted as X ~ P(λ) with λ = n · p.
For example, suppose a factory has produced n light bulbs, and the probability of a defect in a light bulb is p. A random variable X that describes the number of faulty bulbs would have the Poisson distribution. It would take too many resources to calculate the PMF using the binomial formula for these values, so we use a much simpler PMF formula that gives us an approximate value: P(X = k) = λ^k · e^(-λ) / k!. So, for the given values, the probability that X = k, where k is the number of defective light bulbs, would be approximately 0.104. Now let's compare the graphs for different values of λ:
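To see that the Poisson formula really does approximate the binomial one, we can compare the two PMFs numerically. The values n = 100,000 and p = 0.0001 (so λ = n · p = 10) are illustrative assumptions of ours, not taken from the text:

```python
from math import exp, factorial, comb

def poisson_pmf(k: int, lam: float) -> float:
    """Approximate P(X = k) for rare events, where lam = n * p."""
    return lam ** k * exp(-lam) / factorial(k)

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Exact P(X = k) from the binomial formula, for comparison."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Illustrative values (our assumption): 100,000 bulbs, defect rate 0.0001,
# so we expect lam = n * p = 10 defective bulbs on average.
n, p = 100_000, 0.0001
lam = n * p

# The two columns of probabilities agree to several decimal places:
for k in (5, 10, 15):
    print(k, round(poisson_pmf(k, lam), 5), round(binomial_pmf(k, n, p), 5))
```

The binomial column requires computing C(100000, k) and tiny powers of p, which is exactly the kind of heavy arithmetic the Poisson formula lets us avoid.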
As you may notice, the graphs once again look like bells. This makes sense, because the PMF of a random variable with the Poisson distribution is an approximation of the PMF of a random variable with the binomial distribution, so their graphs look similar. Note that the parameters n and p once again influence the form of the graphs. You will learn how and why this happens in future topics.
Conclusion
In this topic, you have learned that random variables may be divided into classes depending on their underlying question, and such classes are called distributions of random variables. The three main discrete distributions are the geometric, binomial, and Poisson distributions. If, for example, a random variable X has the geometric distribution, we write: X ~ Geom(p). Each type of distribution has its own ways of describing the corresponding random variables and making calculations with them. You have also learned about the probability mass function: a function that maps each possible value of a random variable to the probability that the random variable will actually take that value. This function is denoted as P(X = n). The way we compute the PMF depends on the distribution a random variable has. The graphs of different random variables with the same distribution share a similar form.