Understanding distributions is essential in statistics. Distributions help us model the world: they let us estimate the probability that a certain event will occur, or how much its occurrence varies. They are a common way to describe, and possibly predict, the probability of an event, from the level of job satisfaction to the population of rare insects.
This topic covers a variety of distributions as well as how they are classified and used.
Distribution
Sometimes there is so much data that it is impossible to go through all the values manually. This is where the idea of a distribution comes in.
A statistical, or probability, distribution describes how values are distributed and how often they occur. In other words, a distribution in statistics is a function that shows the possible values for a variable and the probability of them occurring.
For example, here's a histogram that shows the distribution of shoe sizes in the USA.
Data about 10 people will be less accurate than data about a much larger group of people. So, the more values we have, the narrower we can make our "stacks", and the more accurate and universal our graph will be.
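To make the idea of "values and how often they occur" concrete, here is a minimal sketch that turns a list of observations into relative frequencies. The sample of shoe sizes and the helper name `empirical_distribution` are invented for this illustration and are not real survey data.

```python
from collections import Counter


def empirical_distribution(values):
    """Map each distinct value to the share of observations equal to it."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in sorted(counts.items())}


# Hypothetical sample of shoe sizes, made up for this example.
shoe_sizes = [8, 9, 9, 10, 10, 10, 11, 11, 12, 9]

for size, share in empirical_distribution(shoe_sizes).items():
    # Each line is one "stack" of the histogram, expressed as a share.
    print(f"size {size}: {share:.2f}")
```

The shares always add up to 1, and with a larger sample they would be a more reliable estimate of the underlying probabilities.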
Types of distributions
Now that we know what a distribution is, we shall look at two types of distributions: discrete and continuous.
A discrete distribution is one in which the data can only take on discrete values. It describes the probability of each possible outcome of a discrete random variable, which takes only distinct and separate values. Discrete values are countable and are often non-negative integers, such as 1, 10, 15, and so on. Examples include the number of phone calls per day, the number of passengers carried, and the number of defects in a batch of products.
Common discrete distributions that you can meet in statistics:
The Bernoulli distribution
Binomial distribution
The Poisson distribution
A continuous distribution is one that describes a continuous random variable. A continuous random variable is a random variable whose set of possible values (known as the range) is infinite and uncountable. Examples include the duration of a phone call and daily power consumption.
Common continuous distributions that you can meet in statistics:
Normal, or the Gaussian, distribution
Uniform distribution
Exponential distribution
Discrete distributions
The Bernoulli distribution – a discrete distribution with only two values: "success" and "failure", or 1 and 0. It is used to model a random experiment with a predetermined probability of success or failure. The idea behind the Bernoulli distribution is that the experiment is performed only once.
Let's say we know that $p$ is the probability that a visitor entering the store will be a buyer. Then $1 - p$ is the probability that a visitor entering the store will not be a buyer. There are only two outcomes, "buyer" or "not a buyer", so in this type of distribution the probability can only equal $p$ or $1 - p$, depending on the outcome.
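The store example can be sketched in a few lines of Python. Here `p` stands for the probability that a visitor buys; the value 0.3, the seed, and the function name `bernoulli_pmf` are assumptions chosen only for illustration.

```python
import random


def bernoulli_pmf(x, p):
    """Probability of outcome x, where 1 means "buyer" and 0 means "not a buyer"."""
    if x == 1:
        return p
    if x == 0:
        return 1 - p
    return 0.0  # any other outcome is impossible


p = 0.3  # assumed probability that a visitor buys (not from the text)

# Simulate a single visitor: one experiment, performed once.
rng = random.Random(42)
visitor = 1 if rng.random() < p else 0

print("P(buyer) =", bernoulli_pmf(1, p))
print("P(not a buyer) =", bernoulli_pmf(0, p))
print("simulated visitor outcome:", visitor)
```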
Binomial distribution – a distribution of the number of "successes" in a fixed number of random experiments, such that the probability of "success" in each of them is constant. The observations are independent of each other, and each observation represents one of two outcomes ("success" or "failure"). Basically, this is what happens if we run a Bernoulli experiment more than once, under the assumption that the experiments are independent of each other.
An example is the successive tossing of a coin, because:
1) There are two outcomes in each toss: heads and tails.
2) The probability of "success" (for example, heads) is constant and, for a fair coin, always equals 0.5.
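For the coin example, the probability of getting exactly k heads in n tosses follows the standard binomial formula C(n, k) · p^k · (1 − p)^(n − k). The sketch below assumes ten tosses of a fair coin; the function name `binomial_pmf` and the choice of ten tosses are illustrative.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.5  # ten tosses of a fair coin (illustrative values)

for k in range(n + 1):
    print(f"{k} heads: {binomial_pmf(k, n, p):.4f}")
```

With `n = 1` the formula reduces to the Bernoulli case: the only possible counts are 0 and 1, with probabilities `1 - p` and `p`.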
The Poisson distribution – a discrete distribution that models the number of events that occur in a fixed time interval, provided that these events occur with some fixed average intensity and independently of each other. This distribution is heavily used in quality control charts, telecommunications, medical statistics, etc.
Consider a very typical situation for the Poisson distribution: purchases in a shop occur at random times. Let us count the number of such events in a certain time interval. The random number of events that occur in this interval is distributed according to Poisson's law.
$x$ axis is the number of purchases in the shop in a time interval
$y$ axis is the probability of this number of purchases happening in our time interval
$\lambda$ is the average number of purchases that usually happen over this time
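The probabilities described above can be computed with the standard Poisson formula P(k) = λ^k · e^(−λ) / k!, where λ is the average number of purchases in the interval and k is the number of purchases we ask about. In this sketch the average of 4 purchases per interval and the function name `poisson_pmf` are assumptions made for the example.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k events when the average count is lam."""
    return lam**k * exp(-lam) / factorial(k)


lam = 4  # assumed average number of purchases per time interval

for k in range(11):
    # k plays the role of the horizontal axis, the printed value the vertical one.
    print(f"{k} purchases: {poisson_pmf(k, lam):.4f}")
```

Unlike the binomial case, there is no upper limit on k; the probabilities simply become very small for counts far above the average.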
Continuous distributions
Below are some of the most common continuous distributions that you can meet in statistics:
Uniform distribution – a distribution in which every value in a given range is equally likely. For example, suppose the weight of a box for transporting vegetables is uniformly distributed between a smallest possible weight $a$ and a biggest possible weight $b$, in grams. One box is randomly selected. Unfortunately, we cannot predict the exact weight of this box, even knowing that the distribution is uniform. However, we know another important thing: the box is just as likely to fall in any one-gram range between $a$ and $b$ as in any other range of the same width. Below you can find a graph that represents such a distribution:
$x$ axis is the weight of a box
$y$ axis is the probability density at a specific weight
$b$ and $a$ are the biggest and the smallest possible weights, respectively
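For a continuous uniform distribution, probabilities belong to ranges of weights rather than to single exact weights: the chance of landing in a range is its width divided by the width of the whole interval. The sketch below assumes bounds of 1000 g and 1500 g purely for illustration; these numbers and the function name `uniform_interval_probability` are not taken from the text.

```python
def uniform_interval_probability(low, high, a, b):
    """P(low <= X <= high) for X uniformly distributed on [a, b]."""
    # Clip the requested range to the part that lies inside [a, b].
    low, high = max(low, a), min(high, b)
    if high <= low:
        return 0.0
    return (high - low) / (b - a)


a, b = 1000, 1500  # hypothetical smallest and biggest weights, in grams

# Two ranges of equal width are equally likely, wherever they sit.
print(uniform_interval_probability(1000, 1100, a, b))
print(uniform_interval_probability(1400, 1500, a, b))
```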
Normal, or the Gaussian, distribution – a distribution that plays a critical role in many fields of knowledge, especially in physics. A physical quantity tends to follow a normal distribution when it is influenced by a large number of small, independent random factors. Examples include the coordinates of the point of impact of a projectile and the height and weight of a person. The normal distribution is a whole topic of its own, so we will just recall two important things. First, the mean, mode, and median all coincide at one central point, and second, the normal distribution is symmetrical around the mean.
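Both properties can be checked with `statistics.NormalDist` from the Python standard library (available since Python 3.8). The mean of 170 and standard deviation of 10, loosely thought of as heights in centimetres, are assumed values for this sketch only.

```python
from statistics import NormalDist

# Hypothetical parameters: mean 170, standard deviation 10.
heights = NormalDist(mu=170, sigma=10)

# First property: mean, median, and mode are the same point.
print(heights.mean, heights.median, heights.mode)

# Second property: the density is symmetrical around the mean,
# so points equally far below and above the mean have equal density.
print(heights.pdf(160), heights.pdf(180))

# Symmetry also means half of the probability lies below the mean.
print(heights.cdf(170))
```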
Exponential distribution – a distribution that allows you to model the time intervals between the occurrence of events. For example, the time after which the customer ends the search and orders something in the store (success), the time after which the equipment fails (failure), and the waiting time for the bus (arrival).
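As a rough illustration of the bus example, the sketch below assumes that buses arrive on average once every 10 minutes (a made-up rate). It uses the standard exponential formula P(T ≤ t) = 1 − e^(−rate · t) and compares it with simulated waiting times from `random.expovariate`; the function name `exponential_cdf` is chosen for this example.

```python
import random
from math import exp


def exponential_cdf(t, rate):
    """Probability that the waiting time is at most t."""
    if t < 0:
        return 0.0
    return 1 - exp(-rate * t)


rate = 1 / 10  # assumed: one bus every 10 minutes on average

# Chance that the bus arrives within 5 minutes.
print(f"P(wait <= 5 min) = {exponential_cdf(5, rate):.3f}")

# Simulated waiting times; their average should be close to 1 / rate.
rng = random.Random(0)
waits = [rng.expovariate(rate) for _ in range(10_000)]
print(f"average simulated wait: {sum(waits) / len(waits):.2f} min")
```

The same event rate that drives a Poisson count of events per interval also determines the exponential waiting time between consecutive events.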
Conclusion
A statistical distribution, or probability distribution, describes how the values are distributed. In other words, the statistical distribution shows which values are common and which are uncommon.
Of course, there are many kinds of statistical distributions. All of them show how a group can be classified into subgroups based on different characteristics. Here we have covered the most popular and often-used types of distributions.
Discrete distributions. Variables in them, as a result of a test, take on individual values with certain probabilities. The number of possible values can be finite or infinite. Such distributions include Bernoulli, binomial, and Poisson.
Continuous distributions. Variables in them, as a result of a test, can take any value from a certain numerical interval. The number of possible values is infinite. Such distributions include uniform, normal, and exponential.
All of these statistical distributions help us to determine how likely a given value is.