Quartile

A more informative measure of dispersion is given by the quartiles and the related interquartile range. Quartile is short for quarterly percentile: the quartiles are the values on the measurement scale below which 25, 50, and 75 percent of the scores in the sorted dataset fall. In other words, the quartiles are three points that split the dataset into four groups, with each group containing one-fourth of the data. To illustrate this, suppose we have a dataset of 20 test scores, which we sort and rank as follows:

    In [27]: import random
             random.seed(100)
             testScores = [random.randint(0,100) for p in 
                           xrange(0,20)]
             testScores
    Out[27]: [14, 45, 77, 71, 73, 43, 80, 53, 8, 46, 4, 94, 95, 33, 31, 77, 20, 18, 19, 35]
    
    In [28]: #data needs to be sorted for quartiles
             import numpy as np
             sortedScores = np.sort(testScores) 
    In [30]: rankedScores = {i+1: sortedScores[i] for i in 
                             xrange(len(sortedScores))}
    
    In [31]: rankedScores
    Out[31]:
    {1: 4,
     2: 8,
     3: 14,
     4: 18,
     5: 19,
     6: 20,
     7: 31,
     8: 33,
     9: 35,
     10: 43,
     11: 45,
     12: 46,
     13: 53,
     14: 71,
     15: 73,
     16: 77,
     17: 77,
     18: 80,
     19: 94,
     20: 95}

The first quartile (Q1) lies between the fifth and sixth score, the second quartile (Q2) lies between the tenth and eleventh score, and the third quartile (Q3) lies between the fifteenth and sixteenth score. Thus, we get the following results by using linear interpolation and calculating the midpoint:

Q1 = (19 + 20)/2 = 19.5
Q2 = (43 + 45)/2 = 44
Q3 = (73 + 77)/2 = 75
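
The midpoint calculation above can be sketched in plain Python. This is only an illustration: the helper name `midpoint_after` is hypothetical, the scores are copied from the ranked listing, and the rank arithmetic assumes that the number of scores is divisible by four, as it is here.

```python
# Sorted scores copied from the ranked listing above.
sorted_scores = [4, 8, 14, 18, 19, 20, 31, 33, 35, 43,
                 45, 46, 53, 71, 73, 77, 77, 80, 94, 95]


def midpoint_after(values, rank):
    """Midpoint between the value at 1-based `rank` and the next value.

    Hypothetical helper for illustration; `values` must already be sorted.
    """
    return (values[rank - 1] + values[rank]) / 2.0


n = len(sorted_scores)
# Assumption: n is divisible by 4, so the cuts fall after ranks n/4, n/2, 3n/4
# (ranks 5, 10, and 15 for these 20 scores).
q1, q2, q3 = [midpoint_after(sorted_scores, n * k // 4) for k in (1, 2, 3)]
print(q1, q2, q3)
```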

To see this in IPython, we can use the mquantiles function from scipy.stats.mstats or the numpy.percentile function:

    In [38]: from scipy.stats.mstats import mquantiles
             mquantiles(sortedScores)
    Out[38]: array([ 19.45,  44.  ,  75.2 ])
    
    In [40]: [np.percentile(sortedScores, perc) for perc in [25,50,75]]
    Out[40]: [19.75, 44.0, 74.0]

The values do not match our earlier calculations exactly because each function uses a different interpolation method. The interquartile range (IQR) is obtained by subtracting the first quartile from the third quartile (Q3 - Q1). It represents the middle 50 percent of the values in a dataset.
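
To make the effect of the interpolation method concrete, the following sketch reproduces both sets of results in plain Python and then computes the interquartile range for each. The function names are hypothetical, and the position formulas are assumptions about the default settings of the two library functions: a fractional rank of 1 + (n - 1)p for numpy.percentile, and np + 0.4 + 0.2p for mquantiles with its default alphap and betap of 0.4. They do reproduce the outputs shown above for this dataset.

```python
import math

# Sorted scores copied from the ranked listing above.
sorted_scores = [4, 8, 14, 18, 19, 20, 31, 33, 35, 43,
                 45, 46, 53, 71, 73, 77, 77, 80, 94, 95]


def value_at_rank(values, pos):
    """Linearly interpolate at a 1-based fractional rank (assumes 1 <= pos < n)."""
    k = int(math.floor(pos))
    gamma = pos - k
    return (1 - gamma) * values[k - 1] + gamma * values[k]


def numpy_style_quantile(values, p):
    # Assumed default rule of numpy.percentile: rank 1 + (n - 1) * p.
    return value_at_rank(values, 1 + (len(values) - 1) * p)


def mquantiles_style_quantile(values, p, alphap=0.4, betap=0.4):
    # Assumed default rule of mquantiles: rank n*p + alphap + p*(1 - alphap - betap).
    return value_at_rank(values, len(values) * p + alphap + p * (1 - alphap - betap))


probs = [0.25, 0.5, 0.75]
np_q = [numpy_style_quantile(sorted_scores, p) for p in probs]
mq_q = [mquantiles_style_quantile(sorted_scores, p) for p in probs]

# Interquartile range (Q3 - Q1) under each rule.
np_iqr = np_q[2] - np_q[0]
mq_iqr = mq_q[2] - mq_q[0]
print(np_q, np_iqr)
print(mq_q, mq_iqr)
```

The interquartile range therefore also depends on the method chosen: roughly 54.25 under the first rule and 55.75 under the second, compared with 75 - 19.5 = 55.5 from the midpoint calculation.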

For more information on statistical measures, refer to https://statistics.laerd.com/statistical-guides/measures-central-tendency-mean-mode-median.php.
For more details on the scipy.stats.mstats.mquantiles and numpy.percentile functions, see the documentation at http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mstats.mquantiles.html and http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.percentile.html.
