
Quartiles


There are different ways to analyze a set of quantitative data. Today, we're going to talk about a very helpful way to characterize distributions of random variables: quartiles. The notion of distribution was discussed earlier in the topic "Histograms and distributions".

Definition

Quartiles are three values that divide an ordered collection of data points into four equal-sized parts, each containing 25% of the sample. The first quartile ($Q_1$, the lower quartile) cuts off the ¼ of the units with the smallest values, the third quartile ($Q_3$, the upper quartile) cuts off the ¼ of the units with the largest values, and the second quartile ($Q_2$) is the median. To calculate the quartiles, first find the median of the sample, which is the second quartile. The median splits the data into two halves of equal size: the median of the lower half is the first quartile, and the median of the upper half is the third quartile. Sometimes the zeroth and fourth quartiles are also used; they are simply the minimum and maximum values, respectively.

Quartiles can only be calculated after the data has been sorted from the smallest to the largest value.
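The median-of-halves method described above can be sketched in Python as follows. This is a minimal illustration, not the only way to compute quartiles: it assumes that, for an odd number of values, the median is left out of both halves (some textbooks include it instead), and the function name `quartiles_by_halves` is just an illustrative choice.

```python
from statistics import median


def quartiles_by_halves(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    Assumption: for an odd number of values, the median itself is excluded
    from both halves. Needs at least two values.
    """
    data = sorted(values)      # quartiles require sorted data
    n = len(data)
    half = n // 2              # number of values in each half
    lower = data[:half]        # values below the median
    upper = data[n - half:]    # values above the median
    return median(lower), median(data), median(upper)


# Hypothetical six-item sample: Q1 is the 2nd sorted item, Q3 the 5th.
print(quartiles_by_halves([12, 3, 8, 1, 11, 7]))  # (3, 7.5, 11)
```

For the made-up six-item sample, the sorted values are 1, 3, 7, 8, 11, 12, so the median is the mean of 7 and 8, while the first and third quartiles are the single elements 3 and 11.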

For example, if the sample consists of 6 items, then the second item is taken as the first quartile of the sample, and the fifth item is taken as the third quartile. Let's take a look at the graph below:

An example of splitting a data sample into quartiles.

Here no single data value sits in the middle, so we calculate the median ourselves as the mean of 7 and 8: $(7+8)/2 = 7.5$. Luckily, the 1st and 3rd quartiles do correspond to single elements: 3 and 11. But what if, as with the median, they don't coincide with any element of the sample? Then we apply the following formulas:

The first quartile is the element at position $\frac{n+3}{4}$: $Q_1 = x_{\frac{n+3}{4}}$, where $n$ is the number of elements in the sample and $x_i$ is the $i$-th element of the sorted sample.

The third quartile is the element at position $\frac{3n+1}{4}$: $Q_3 = x_{\frac{3n+1}{4}}$, with the same notation.

Summing up, we have the following table:

| Quartile | How to find it |
|---|---|
| $Q_0$ | Minimum value |
| $Q_1$ | $Q_1 = x_{\frac{n+3}{4}}$ |
| $Q_2$ | Median |
| $Q_3$ | $Q_3 = x_{\frac{3n+1}{4}}$ |
| $Q_4$ | Maximum value |
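The formulas in the table can be turned into a small function. This sketch assumes 1-based positions as in the text and uses linear interpolation between neighboring elements when a position is fractional (the rule applied in the worked example at the end of this topic). It also finds the median at position $(n+1)/2$, which is equivalent to the odd/even rule for the median. The names `value_at_position` and `five_quartiles` are hypothetical.

```python
def value_at_position(data, h):
    """Value at 1-based position h of sorted data.

    Assumption: a fractional position is resolved by linear interpolation
    between the two neighboring elements.
    """
    k = int(h)        # integer part: the element at or just below h
    frac = h - k      # fractional part of the position
    if frac == 0:
        return data[k - 1]
    return data[k - 1] + frac * (data[k] - data[k - 1])


def five_quartiles(values):
    """Return (Q0, Q1, Q2, Q3, Q4) using the positions from the table."""
    data = sorted(values)
    n = len(data)
    q1 = value_at_position(data, (n + 3) / 4)
    q2 = value_at_position(data, (n + 1) / 2)      # median position
    q3 = value_at_position(data, (3 * n + 1) / 4)
    return data[0], q1, q2, q3, data[-1]


print(five_quartiles([4, 20, 50, 7, 77, 66, 80, 90, 250, 40]))
# (4, 25.0, 58.0, 79.25, 250)
```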

Note:

  • If the number of values is odd, the median equals the middle value of a sorted list of values.

  • If the number of values is even, the median equals the sum of the middle pair of values divided by two.
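The two cases of the note translate directly into code. Python's standard `statistics.median` already implements the same rule, so the hand-written `sample_median` below (a hypothetical name) is only a minimal illustration.

```python
def sample_median(values):
    """Median of a sample, following the odd/even rule."""
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                       # odd: the single middle value
    return (data[mid - 1] + data[mid]) / 2     # even: mean of the middle pair


print(sample_median([20, 1, 15, 5, 10]))                       # 10
print(sample_median([4, 20, 50, 7, 77, 66, 80, 90, 250, 40]))  # 58.0
```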

Imagine we have the sample $\{1, 5, 10, 15, 20\}$ with only five elements. The task is to find all five quartiles. Clearly, $Q_0 = 1$ and $Q_4 = 20$. The median is $10$, so $Q_2 = 10$. Applying the formulas for the remaining two quartiles, the first quartile is the second element ($\frac{5+3}{4} = 2$) and the third quartile is the fourth element ($\frac{3 \cdot 5+1}{4} = 4$), so $Q_1 = 5$ and $Q_3 = 15$. Note that in some cases the position you get from these formulas is a fraction; the quartile is then found by linear interpolation between the two neighboring elements, as in the example at the end of this topic. Also keep in mind that the formulas and the median-of-halves method from the definition are different conventions: for some sample sizes (for example, $n = 6$) they give slightly different values.
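To double-check such calculations, Python's standard library (3.8+) offers `statistics.quantiles`. With `method="inclusive"`, it treats the minimum and maximum as the 0th and 100th percentiles, which for quartiles amounts to the same positions $(n+3)/4$ and $(3n+1)/4$ with linear interpolation. The default `method="exclusive"` follows a different convention and gives different values for this sample.

```python
from statistics import quantiles

sample = [1, 5, 10, 15, 20]

# method="inclusive" places Q1 and Q3 at the 1-based positions (n + 3)/4 and
# (3n + 1)/4, interpolating linearly between neighbors when needed.
q1, q2, q3 = quantiles(sample, n=4, method="inclusive")

print(min(sample), q1, q2, q3, max(sample))  # 1 5.0 10.0 15.0 20
```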

Other quantiles

Quantiles are values that divide an ordered sample into a given number of parts containing equal numbers of elements. We have just discussed quartiles, which divide a ranked series into 4 equal parts, but there are other types of quantiles. The best-known quantile is the median, which divides the data into 2 equal parts. Besides the median and quartiles, there are deciles and percentiles.

Deciles are values that divide a ranked series into 10 equal parts. The first decile cuts off the lowest 1/10 of the population, and the ninth decile cuts off the lowest 9/10. Thus, there are 9 deciles.

Percentiles, whose name comes from the word "percent", divide a ranked series into 100 equal parts. Accordingly, the median is the 50th percentile, and the first and third quartiles are the 25th and 75th percentiles, respectively. In general, every quantile can be expressed as a percentile (the quantile of level $p$ is the $100p$-th percentile), which is why the two terms are often used interchangeably.
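The quartile formulas fit a more general pattern. One common convention places the quantile of level $p$ at the 1-based position $h(p) = 1 + (n-1)p$ of the sorted sample, interpolating linearly when $h$ is fractional. As a sketch under that assumption, substituting specific levels recovers the positions used in this topic:

```latex
\begin{aligned}
h(p) &= 1 + (n - 1)\,p \\
h\left(\tfrac{1}{4}\right) &= 1 + \frac{n - 1}{4} = \frac{n + 3}{4} \quad (Q_1) \\
h\left(\tfrac{1}{2}\right) &= 1 + \frac{n - 1}{2} = \frac{n + 1}{2} \quad (Q_2) \\
h\left(\tfrac{3}{4}\right) &= 1 + \frac{3(n - 1)}{4} = \frac{3n + 1}{4} \quad (Q_3) \\
h\left(\tfrac{1}{10}\right) &= 1 + \frac{n - 1}{10} = \frac{n + 9}{10} \quad (D_1) \\
h\left(\tfrac{9}{10}\right) &= 1 + \frac{9(n - 1)}{10} = \frac{9n + 1}{10} \quad (D_9)
\end{aligned}
```

For a sample of $n = 30$ values, this convention would place $D_1$ at position 3.9 and $D_9$ at position 27.1; the souvenir example below instead uses the whole positions 3 and 27, which is another convention.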

Let's say a souvenir company wants to assess its production rate. To do that, we find the quartiles, the first decile, and the ninth decile of the daily output. The list below shows the number of souvenirs made by each worker on a given day:

92, 100, 89, 98, 101, 84, 113, 93, 81, 14, 113, 86, 98, 99, 105, 88, 101, 89, 93, 102, 101, 99, 87, 109, 92, 99, 111, 98, 102, 95

First of all, we sort the values and note that there are 30 of them: 14, 81, 84, 86, 87, 88, 89, 89, 92, 92, 93, 93, 95, 98, 98, __, 98, 99, 99, 99, 100, 101, 101, 101, 102, 102, 105, 109, 111, 113, 113.

Next, we find the quartiles using the median-of-halves method. The median, indicated by the blank space, is $(98 + 98)/2 = 98$, and it divides our sample into two halves with 15 values each. The 1st quartile is the median of the lower half, that is, the 8th value, and the 3rd quartile is the median of the upper half, that is, the 23rd value: $Q_1 = 89$, $Q_3 = 101$. Finally, we find the deciles: the 1st decile is the 3rd value ($30/10 = 3$), and the 9th decile is the 27th value ($9 \cdot 30/10 = 27$): $D_1 = 84$, $D_9 = 109$.
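The steps of this example can be reproduced in a few lines of Python. The sketch follows the text: quartiles by the median-of-halves method, and deciles taken at the whole positions $n/10$ and $9n/10$. That decile rule assumes $n$ is divisible by 10, as it is here with $n = 30$.

```python
from statistics import median

# Daily output per worker, copied from the list above.
souvenirs = [92, 100, 89, 98, 101, 84, 113, 93, 81, 14, 113, 86, 98, 99, 105,
             88, 101, 89, 93, 102, 101, 99, 87, 109, 92, 99, 111, 98, 102, 95]

data = sorted(souvenirs)
n = len(data)               # 30 values
half = n // 2               # 15 values in each half (n is even here)

q1 = median(data[:half])    # median of the lower half: the 8th value
q2 = median(data)           # mean of the 15th and 16th values
q3 = median(data[half:])    # median of the upper half: the 23rd value

# Assumption: deciles at the 1-based positions n/10 and 9n/10 (n divisible by 10).
d1 = data[n // 10 - 1]
d9 = data[9 * n // 10 - 1]

print(n, q1, q2, q3, d1, d9)  # 30 89 98.0 101 84 109
```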

Range and interquartile range

Range ($R$) is the difference between the maximum and minimum values of the sample: $R = x_{max} - x_{min}$.

The interquartile range ($IQR$) is a measure of the variability of a sample. It is defined as the difference between the upper and lower quartiles: $IQR = Q_3 - Q_1$.

Range and interquartile range both measure the spread of a data set, that is, how much the data varies. The range is a quick way to get an idea of the spread. The IQR takes longer to find, but it often gives more useful information, because it depends only on the middle 50% of the data: a single extreme value can change the range dramatically while barely affecting the IQR. This also makes the IQR a good basis for detecting outliers.
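A quick way to see why the IQR is more robust than the range is to compute both with and without an extreme value. This sketch uses the sample from the example at the end of this topic and computes quartiles with `statistics.quantiles(..., method="inclusive")`, which matches the $(n+3)/4$ and $(3n+1)/4$ formulas; the helper name `spread` is hypothetical.

```python
from statistics import quantiles


def spread(values):
    """Return (range, IQR); quartiles use method="inclusive" (an assumption)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return max(values) - min(values), q3 - q1


sample = [4, 20, 50, 7, 77, 66, 80, 90, 250, 40]
without_250 = [x for x in sample if x != 250]   # drop the extreme value

print(spread(sample))        # (246, 54.25)
print(spread(without_250))   # (86, 57.0)
```

Dropping the single value 250 shrinks the range from 246 to 86, while the IQR barely moves (from 54.25 to 57.0).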

Let's see how the IQR is used to detect outliers. Outliers are usually defined as observations that fall below the lower fence $Q_1 - 1.5 \cdot IQR$ or above the upper fence $Q_3 + 1.5 \cdot IQR$.
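The fence rule can be sketched as a function. The 1.5 multiplier is the conventional choice (Tukey's rule) and is exposed here as a parameter `k`; the function name `iqr_outliers` and the use of `method="inclusive"` quartiles are assumptions for illustration. Applied to the sample from the example at the end of this topic, it yields the same fences as the hand calculation there.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using IQR-based fences.

    Assumption: quartiles come from statistics.quantiles(method="inclusive").
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [x for x in values if x < lower or x > upper]
    return lower, upper, outliers


print(iqr_outliers([4, 20, 50, 7, 77, 66, 80, 90, 250, 40]))
# (-56.375, 160.625, [250])
```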

Example

It's time to practice. Let's assume we have the sample $\{4, 20, 50, 7, 77, 66, 80, 90, 250, 40\}$. Our task is to find all the quartiles of the sample, its interquartile range, and any outliers.

First, we sort the data: $\{4, 7, 20, 40, 50, 66, 77, 80, 90, 250\}$.

From the sorted data, it is apparent that $Q_0 = 4$ and $Q_4 = 250$.

Now we calculate the median: $Q_2 = \frac{1}{2}\left(x_{\frac{n}{2}} + x_{\frac{n}{2}+1}\right) = \frac{1}{2}\left(x_{\frac{10}{2}} + x_{\frac{10}{2}+1}\right) = \frac{1}{2}(x_5 + x_6) = \frac{50+66}{2} = 58$.

Next, we find the position of the element that is the first quartile: $Q_1 = x_{\frac{10+3}{4}} = x_{\frac{13}{4}} = x_{3.25}$. The third element is $x_3 = 20$ and the fourth element is $x_4 = 40$, so we find the exact value of the first quartile by linear interpolation: $Q_1 = x_3 + (3.25 - 3)(x_4 - x_3) = 20 + 0.25 \cdot (40 - 20) = 20 + 5 = 25$.

For the third quartile, we repeat almost the same calculations: $Q_3 = x_{\frac{3 \cdot 10+1}{4}} = x_{\frac{31}{4}} = x_{7.75}$; $x_7 = 77$ and $x_8 = 80$, so $Q_3 = 77 + (7.75 - 7)(80 - 77) = 77 + 2.25 = 79.25$.

Now we find the interquartile range: $IQR = Q_3 - Q_1 = 79.25 - 25 = 54.25$.

Let's check whether our data has outliers. Since $1.5 \cdot IQR = 1.5 \cdot 54.25 = 81.375$, the lower fence is $25 - 81.375 = -56.375$ and the upper fence is $79.25 + 81.375 = 160.625$. No observations lie below $-56.375$, but one observation lies above the upper fence, namely $250$. So $250$ is an outlier in our sample.

Conclusion

In this topic, we have discussed the idea of quartiles and interquartile ranges, how to calculate them, and what other types of quantiles exist.
