What is the five-number summary in statistics?

Article reviewed and approved by our editorial team, following YuBrain's writing and editing criteria.

Descriptive statistics lets us summarize a data set with a small number of measures that describe how the data are distributed. Different measures describe the central tendency of the data, their dispersion, and the shape of the distribution; several of them appear in the five-number summary.

What is the five-number summary?

The five-number summary is a set of five statistics that describe, in a very simple way, the range of a data set and its dispersion, while also providing a measure of its central tendency. The summary can also be represented graphically, which makes these characteristics easy to visualize and makes it easy to compare one data set with other related ones.

What are the five numbers and what do they mean?

The five-number summary is made up of the minimum value, the three quartiles, and the maximum value of a data series. The quartiles are the values that divide the ordered data set into four subgroups with the same number of elements. For example, in a set of 100 data points, the quartiles divide the set into 4 subsets of 25 data points each.

The quartiles are named in ascending order: first, second, and third quartile. Each is written with the capital letter Q followed by its ordinal position. By definition, the second quartile, Q2, is also the median, or midpoint, of the data. It should not be confused with the mean, which is the arithmetic average of the data.

In addition to the three quartiles (Q1, Q2, and Q3), the five-number summary includes the minimum and maximum values of the data. In other words, the five numbers in this summary are:

  • Minimum: the lowest value in the data set, that is, the first value when the data are ordered from lowest to highest.
  • Q1 or first quartile: the value that divides the data set so that 25% (a quarter) of the data lie below it and the other 75% above.
  • Q2 or second quartile: the value that divides the data set into two equal groups, leaving 50% of the data below it and 50% above. It is therefore the median, or midpoint, of the data.
  • Q3 or third quartile: the value that leaves 75% (three quarters) of the data below it and the other 25% above.
  • Maximum: the highest value in the data set, that is, the last value when the data are ordered from lowest to highest.
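As a minimal sketch of how these five values could be computed, the Python snippet below uses `statistics.quantiles` from the standard library. The function name `five_number_summary` and the sample values are hypothetical. The quartiles rely on that function's default "exclusive" method, which is only one of several conventions in use, so other tools may return slightly different quartiles.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (minimum, Q1, Q2, Q3, maximum) for a sequence of at least two numbers."""
    ordered = sorted(data)
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    # The default method ("exclusive") is one of several quartile conventions.
    q1, q2, q3 = quantiles(ordered, n=4)
    return ordered[0], q1, q2, q3, ordered[-1]


sample = [12, 15, 11, 20, 18, 14, 30]  # hypothetical values, not from the article
print(five_number_summary(sample))
```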

When interpreting the five-number summary, the difference between the maximum and the minimum gives what is known as the range of the data series. The difference between the third and first quartiles, called the interquartile range (IQR), shows how dispersed the data are, since it is the width of the interval that contains the central 50% of the data.

The second quartile, or median, is a measure of central tendency: a single number that can be used to represent a typical value of the series. Although the mean is the more common measure of central tendency, the median has the advantage of not being sensitive to extreme values (too high or too low).
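The short sketch below illustrates these interpretations on a small, hypothetical sample: it computes the range and the interquartile range, then replaces the largest value with an extreme one to compare how the mean and the median react. The variable names are illustrative, and the quartiles again come from the default method of `statistics.quantiles`.

```python
from statistics import mean, median, quantiles

sample = [11, 12, 14, 15, 18, 20, 30]  # hypothetical values, already sorted
q1, q2, q3 = quantiles(sample, n=4)

data_range = max(sample) - min(sample)  # width of the whole series
iqr = q3 - q1                           # width of the central 50% of the data

# Replace the largest value with an extreme one and compare both measures.
with_extreme = sample[:-1] + [300]

print("range:", data_range, "IQR:", iqr)
print("mean:", mean(sample), "->", mean(with_extreme))
print("median:", median(sample), "->", median(with_extreme))
```

In this sample the median is the same before and after the change, while the mean shifts noticeably.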

Box plots: the graphical representation of the five-number summary

A practical way to visualize a five-number summary is the box plot. In this type of chart, the interquartile range (IQR) is drawn as a rectangle, or box, that extends from Q1 to Q3 and is divided in two by a line perpendicular to the measurement axis at Q2, that is, at the median.

Finally, lines parallel to the measurement axis are drawn on each side of the box. They extend from the minimum to Q1 and from Q3 to the maximum, as long as the minimum and maximum lie no more than 1.5 × IQR below Q1 and above Q3, respectively. These lines are known as the whiskers of the box. If there are data outside the interval from Q1 - 1.5 × IQR to Q3 + 1.5 × IQR, each whisker extends only to the data point furthest from the box that is still within that interval, and the remaining points are marked as outliers.
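The whisker rule described above can be sketched as follows. `box_plot_stats` is a hypothetical helper, the sample values are made up for illustration, and the quartiles are taken from `statistics.quantiles` with its default method, so this is one possible reading of the rule and not a reference implementation.

```python
from statistics import quantiles


def box_plot_stats(data, k=1.5):
    """Compute the pieces of a box plot: box edges, whisker ends, and outliers."""
    ordered = sorted(data)
    q1, q2, q3 = quantiles(ordered, n=4)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    inside = [x for x in ordered if lower_fence <= x <= upper_fence]
    outliers = [x for x in ordered if x < lower_fence or x > upper_fence]
    return {
        "q1": q1, "median": q2, "q3": q3,
        "whisker_low": inside[0],    # furthest value still inside the lower fence
        "whisker_high": inside[-1],  # furthest value still inside the upper fence
        "outliers": outliers,
    }


sample = [11, 12, 14, 15, 18, 20, 22, 45]  # hypothetical; 45 lies far from the rest
stats = box_plot_stats(sample)
print(stats)
```

Here the upper whisker stops at 22 and the value 45 is reported as an outlier, because it lies above Q3 + 1.5 × IQR.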

Example: preparing the five-number summary for a data series

The following step-by-step procedure builds a five-number summary from a set of statistical data. It also explains how to construct the box plot that displays the summary graphically.

The data correspond to the number of items sold in the women’s department of a department store during a 10-week period. The results of the study are presented below:

         Monday  Tuesday  Wednesday  Thursday  Friday  Saturday  Sunday
Week 1   158     145      156        156       164     167       147
Week 2   161     146      157        152       162     160       153
Week 3   152     150      157        155       164     166       152
Week 4   150     149      153        162       169     162       149
Week 5   157     152      154        155       168     161       155
Week 6   157     145      160        164       164     168       149
Week 7   160     152      151        152       168     163       145
Week 8   157     152      155        156       162     169       155
Week 9   160     148      157        150       164     170       154
Week 10  158     146      163        158       165     169       150

Step 1: Sort all the data from smallest to largest and assign them an index starting with 1.

The result of this step is presented below:

Index  Value    Index  Value    Index  Value    Index  Value
1      145      22     152      43     158      64     168
2      145      23     153      44     160      65     168
3      145      24     153      45     160      66     168
4      146      25     154      46     160      67     169
5      146      26     154      47     160      68     169
6      147      27     155      48     161      69     169
7      148      28     155      49     161      70     170
8      149      29     155      50     162
9      149      30     155      51     162
10     149      31     155      52     162
11     150      32     156      53     162
12     150      33     156      54     163
13     150      34     156      55     163
14     150      35     157      56     164
15     151      36     157      57     164
16     152      37     157      58     164
17     152      38     157      59     164
18     152      39     157      60     164
19     152      40     157      61     165
20     152      41     158      62     166
21     152      42     158      63     167
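Step 1 could be reproduced with a short script like the one below, assuming the weekly table is typed in as one list per week. The variable names are illustrative.

```python
# Daily sales, Monday to Sunday, one inner list per week (from the table above).
weekly_sales = [
    [158, 145, 156, 156, 164, 167, 147],  # week 1
    [161, 146, 157, 152, 162, 160, 153],  # week 2
    [152, 150, 157, 155, 164, 166, 152],  # week 3
    [150, 149, 153, 162, 169, 162, 149],  # week 4
    [157, 152, 154, 155, 168, 161, 155],  # week 5
    [157, 145, 160, 164, 164, 168, 149],  # week 6
    [160, 152, 151, 152, 168, 163, 145],  # week 7
    [157, 152, 155, 156, 162, 169, 155],  # week 8
    [160, 148, 157, 150, 164, 170, 154],  # week 9
    [158, 146, 163, 158, 165, 169, 150],  # week 10
]

# Flatten the table, sort it, and number the values starting at 1.
values = sorted(x for week in weekly_sales for x in week)
indexed = list(enumerate(values, start=1))

print(len(values))               # total number of data points, N
print(indexed[:3], indexed[-1])  # first three entries and the last one
```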

Step 2: Determine the quartiles Q1, Q2, and Q3

To determine the quartiles Q1, Q2, and Q3, we begin by calculating the index (position) of the data point that corresponds to each quartile. The formulas are the following:

$$i_{Q_1} = \frac{N + 1}{4}$$

$$i_{Q_2} = \frac{2(N + 1)}{4}$$

$$i_{Q_3} = \frac{3(N + 1)}{4}$$

Where N is the total number of data points. The result may or may not be an integer, so the procedure is divided into two cases:

Case 1: Integer result

If the result is an integer, the quartile is the value of the data point with that index. For example, if the index of Q1 is 10, then Q1 is the value of data point number 10 (149 in our example).

Case 2: Decimal result

If the index is a decimal number, the quartile does not correspond exactly to any data point in the series. In this case, the index is rounded down and the quartile is calculated from that data point and the one that follows it, using the following formula:

$$Q = x_i + d\,(x_{i+1} - x_i)$$

Where d is the decimal part of the index, x_i is the data point at the index rounded down, and x_{i+1} is the next data point.
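A possible implementation of this two-case procedure is sketched below. It assumes that the index of quartile k is k(N + 1)/4 for k = 1, 2, 3, a common convention that is consistent with the quartile values reported for this example. The function name `quartile` and the two sample lists are hypothetical.

```python
import math


def quartile(ordered, k):
    """k-th quartile (k = 1, 2, 3) of a sorted list, using the 1-based index k*(N+1)/4."""
    n = len(ordered)
    if n < 3:
        raise ValueError("this sketch needs at least three data points")
    index = k * (n + 1) / 4   # assumed index formula
    i = math.floor(index)     # index rounded down
    d = index - i             # decimal part of the index
    if d == 0:
        # Case 1: the index is an integer, so the quartile is that data point.
        return ordered[i - 1]
    # Case 2: interpolate between data point i and the one that follows it.
    x_i, x_next = ordered[i - 1], ordered[i]
    return x_i + d * (x_next - x_i)


seven = [11, 12, 14, 15, 18, 20, 30]      # N = 7: index of Q1 is 2 (case 1)
eight = [11, 12, 14, 15, 18, 20, 22, 45]  # N = 8: index of Q1 is 2.25 (case 2)
print(quartile(seven, 1), quartile(eight, 1), quartile(eight, 3))
```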

In our example, N = 70, so the indices of the three quartiles are:

$$i_{Q_1} = \frac{70 + 1}{4} = 17.75$$

$$i_{Q_2} = \frac{2(70 + 1)}{4} = 35.5$$

$$i_{Q_3} = \frac{3(70 + 1)}{4} = 53.25$$

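In all three cases the result is a decimal number, so we apply the formula from case 2 to determine the value of each quartile:

$$Q_1 = x_{17} + 0.75\,(x_{18} - x_{17}) = 152 + 0.75\,(152 - 152) = 152$$

$$Q_2 = x_{35} + 0.5\,(x_{36} - x_{35}) = 157 + 0.5\,(157 - 157) = 157$$

$$Q_3 = x_{53} + 0.25\,(x_{54} - x_{53}) = 162 + 0.25\,(163 - 162) = 162.25$$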

Step 3: Identify the five numbers

Now that the data are ordered and the three quartiles are known, the five-number summary is:

Minimum: 145
Q1: 152
Q2 or Median: 157
Q3: 162.25
Maximum: 170
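These values can be cross-checked with Python's standard library. The sketch below rebuilds the 70 data points from the frequencies read off the sorted table in Step 1 and calls `statistics.quantiles`, whose default "exclusive" method places quartile k at position k(N + 1)/4. The dictionary and variable names are illustrative.

```python
from statistics import quantiles

# value: number of times it appears in the sorted table from Step 1
frequencies = {
    145: 3, 146: 2, 147: 1, 148: 1, 149: 3, 150: 4, 151: 1, 152: 7,
    153: 2, 154: 2, 155: 5, 156: 3, 157: 6, 158: 3, 160: 4, 161: 2,
    162: 4, 163: 2, 164: 5, 165: 1, 166: 1, 167: 1, 168: 3, 169: 3, 170: 1,
}

# Expand the frequencies back into the full sorted list of 70 values.
values = sorted(v for v, count in frequencies.items() for _ in range(count))

q1, q2, q3 = quantiles(values, n=4)  # default method="exclusive"
summary = (values[0], q1, q2, q3, values[-1])
print(len(values), summary)
```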

Step 4: Construct the box plot

We already have everything needed to build the box plot except the IQR. Using the results of the previous step, the difference between Q3 and Q1 is:

$$\mathrm{IQR} = Q_3 - Q_1 = 162.25 - 152 = 10.25$$

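To determine whether there are outliers, we calculate Q1 - 1.5 × IQR and Q3 + 1.5 × IQR and compare them with the minimum and maximum:

$$Q_1 - 1.5 \times \mathrm{IQR} = 152 - 1.5 \times 10.25 = 136.625$$

$$Q_3 + 1.5 \times \mathrm{IQR} = 162.25 + 1.5 \times 10.25 = 177.625$$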

As we can see, there are no outliers at the low end, since the minimum, 145, is greater than 136.625. There are no outliers at the high end either, since the maximum, 170, is less than 177.625.
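The same check can be written as a few lines of Python, starting only from the five numbers found in Step 3. This is a small sketch with illustrative variable names.

```python
q1, q3 = 152, 162.25          # quartiles from Step 3
minimum, maximum = 145, 170   # extremes from Step 3

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# A value beyond either fence would be drawn as an outlier.
has_low_outliers = minimum < lower_fence
has_high_outliers = maximum > upper_fence

print(iqr, lower_fence, upper_fence)
print(has_low_outliers, has_high_outliers)
```

Since both flags are false, the whiskers of the box plot reach all the way to the minimum and the maximum.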

The following figure shows the result of building the box plot corresponding to the example:

[Figure: box plot of the example data]


Israel Parada (Licentiate, Professor at ULA)
(Licentiate in Chemistry) - AUTHOR. University professor of chemistry. Science communicator.
