2.4 Numerical Measures of Variability

Measures of central tendency provide only a partial description of a quantitative data set. The description is incomplete without a measure of the variability, or spread, of the data set. Knowledge of the data set’s variability, along with knowledge of its center, can help us visualize the shape of the data set as well as its extreme values.

For example, suppose we want to compare response time to a stimulus for subjects treated with two different drugs, A and B. The histograms for the response times (in seconds) for each of the two drugs are shown in Figure 2.19. If you examine the two histograms, you’ll notice that both data sets are symmetric, with equal modes, medians, and means. However, Drug A (Figure 2.19a) has response times spread with almost equal relative frequency over the measurement classes, while Drug B (Figure 2.19b) has most of its response times clustered about its center. Thus, the response times for Drug B are less variable than those for Drug A. Consequently, you can see that we need a measure of variability as well as a measure of central tendency to describe a data set.

Figure 2.19

Response time histograms for two drugs

Perhaps the simplest measure of the variability of a quantitative data set is its range.

The range of a quantitative data set is equal to the largest measurement minus the smallest measurement.

The range is easy to compute and easy to understand, but it is a rather insensitive measure of data variation when the data sets are large. This is because two data sets can have the same range and be vastly different with respect to data variation. The phenomenon is demonstrated in Figure 2.19: Both distributions of data shown in the figure have the same range, but we already know that the response times for Drug B are much less variable than those for Drug A. Thus, you can see that the range does not always detect differences in data variation for large data sets.
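The point can be checked numerically with a minimal sketch. The two data sets below are hypothetical (they are not the drug response times of Figure 2.19) and were chosen only so that both share the same smallest and largest values; the helper names are likewise illustrative.

```python
# Two hypothetical data sets (not from the text) with identical ranges
# but very different clustering around their common center of 5.
spread_out = [1, 2, 3, 4, 5, 6, 7, 8, 9]
clustered = [1, 5, 5, 5, 5, 5, 5, 5, 9]


def data_range(values):
    """Range = largest measurement minus smallest measurement."""
    return max(values) - min(values)


def count_near(values, center, width):
    """Count the measurements lying within `width` units of `center`."""
    return sum(1 for x in values if abs(x - center) <= width)


print(data_range(spread_out), data_range(clustered))            # 8 8
print(count_near(spread_out, 5, 1), count_near(clustered, 5, 1))  # 3 7
```

Both ranges equal 8, yet only 3 of the 9 values in the first set lie within one unit of the center, compared with 7 of 9 in the second. The range uses only the two most extreme measurements, so it cannot see this difference.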

Teaching Tip

To illustrate the drawback associated with the range, draw a picture of two distributions that have approximately the same range, but vastly different spreads in the data.

Table 2.5 Two Hypothetical Data Sets

                                           Sample 1                             Sample 2
Measurements                               1, 2, 3, 4, 5                        2, 3, 3, 3, 4
Mean                                       x̄ = (1+2+3+4+5)/5 = 15/5 = 3         x̄ = (2+3+3+3+4)/5 = 15/5 = 3
Deviations of measurement values from x̄    (1−3), (2−3), (3−3), (4−3), (5−3)    (2−3), (3−3), (3−3), (3−3), (4−3)
                                           or −2, −1, 0, 1, 2                   or −1, 0, 0, 0, 1

Let’s see if we can find a measure of data variation that is more sensitive than the range. Consider the two samples in Table 2.5: Each has five measurements. (We have ordered the numbers for convenience.) Note that both samples have a mean of 3 and that we have calculated the distance and direction, or deviation, between each measurement and the mean. What information do these deviations contain? If they tend to be large in magnitude, as in sample 1, the data are spread out, or highly variable. If the deviations are mostly small, as in sample 2, the data are clustered around the mean, x¯, and therefore do not exhibit much variability. You can see that these deviations, displayed graphically in Figure 2.20, provide information about the variability of the sample measurements.

Figure 2.20

Dot plots for deviations in Table 2.5

The next step is to condense the information in these distances into a single numerical measure of variability. Averaging the deviations from x¯ won’t help because the negative and positive deviations cancel; that is, the sum of the deviations (and thus the average deviation) is always equal to zero.
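A short sketch makes the cancellation concrete for the two samples of Table 2.5. The function name `deviations` is an illustrative choice, not notation from the text.

```python
# Deviations from the mean for the two samples in Table 2.5.
sample_1 = [1, 2, 3, 4, 5]
sample_2 = [2, 3, 3, 3, 4]


def deviations(values):
    """Return each measurement minus the sample mean."""
    mean = sum(values) / len(values)
    return [x - mean for x in values]


for name, sample in [("Sample 1", sample_1), ("Sample 2", sample_2)]:
    devs = deviations(sample)
    # The positive and negative deviations cancel, so the sum is zero.
    print(name, devs, "sum =", sum(devs))
```

For these samples the sums are exactly zero. With measurements that are not exactly representable in floating point, the computed sum may differ from zero by a tiny rounding error.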

Two methods come to mind for dealing with the fact that positive and negative deviations from the mean cancel. The first is to treat all the deviations as though they were positive, ignoring the sign of the negative values. We won’t pursue this line of thought because the resulting measure of variability (the mean of the absolute values of the deviations) presents analytical difficulties beyond the scope of this text. A second method of eliminating the minus signs associated with the deviations is to square them. The quantity we can calculate from the squared deviations provides a meaningful description of the variability of a data set and presents fewer analytical difficulties in making inferences.

To use the squared deviations calculated from a data set, we first calculate the sample variance.

The sample variance for a sample of n measurements is equal to the sum of the squared deviations from the mean, divided by (n − 1). The symbol s² is used to represent the sample variance.

Formula for the Sample Variance:

$$
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
$$

Note: A shortcut formula for calculating s² is

$$
s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}
$$
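The text states the shortcut formula without derivation. One way to see that its numerator equals the sum of squared deviations is to expand the square and substitute x̄ = (Σxᵢ)/n; the steps below are a sketch of that algebra, with all sums running over i = 1, …, n:

$$
\begin{aligned}
\sum (x_i - \bar{x})^2
  &= \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2 \\
  &= \sum x_i^2 - \frac{2\left(\sum x_i\right)^2}{n} + \frac{\left(\sum x_i\right)^2}{n} \\
  &= \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}
\end{aligned}
$$

Dividing both sides by (n − 1) gives the shortcut formula, so the two expressions for s² always agree.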

Teaching Tip

Explain that the variance is used to calculate a measure of variation. The standard deviation will be used in the next section to interpret what this measure of variation represents.

Referring to the two samples in Table 2.5, you can calculate the variance for sample 1 as follows:

$$
s^2 = \frac{(1-3)^2 + (2-3)^2 + (3-3)^2 + (4-3)^2 + (5-3)^2}{5 - 1} = \frac{4 + 1 + 0 + 1 + 4}{4} = 2.5
$$
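The same calculation can be sketched in code, once from the defining formula and once from the shortcut formula. The function names are illustrative, and the cross-check against `statistics.variance` from Python's standard library (which also divides by n − 1) is an addition here, not part of the text.

```python
import statistics


def variance_definition(values):
    """Sum of squared deviations from the mean, divided by (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def variance_shortcut(values):
    """Shortcut form: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x_squared = sum(x * x for x in values)
    return (sum_x_squared - sum_x ** 2 / n) / (n - 1)


sample_1 = [1, 2, 3, 4, 5]  # sample 1 from Table 2.5
print(variance_definition(sample_1))    # 2.5
print(variance_shortcut(sample_1))      # 2.5
print(statistics.variance(sample_1))    # library cross-check
```

For sample 1 the shortcut works out to (55 − 15²/5)/4 = (55 − 45)/4 = 2.5, matching the value obtained from the deviations.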

The second step in finding a meaningful measure of data variability is to calculate the standard deviation of the data set.

The sample standard deviation, s, is defined as the positive square root of the sample variance, s², or, mathematically,

$$
s = \sqrt{s^2}
$$

The population variance, denoted by the symbol σ² (sigma squared), is the average of the squared deviations from the mean, μ, of the measurements on all units in the population, and σ (sigma) is the square root of this quantity.

Symbols for Variance and Standard Deviation

  • s² = Sample variance

  • s = Sample standard deviation

  • σ² = Population variance

  • σ = Population standard deviation

Teaching Tip

Let students know that the divisor question will become clearer when they learn more about estimating parameters with sampling distributions.

Notice that, unlike the variance, the standard deviation is expressed in the original units of measurement. For example, if the original measurements are in dollars, the variance is expressed in the peculiar units “dollars squared,” but the standard deviation is expressed in dollars.

You may wonder why we use the divisor (n − 1) instead of n when calculating the sample variance. Wouldn’t using n seem more logical, so that the sample variance would be the average squared distance from the mean? The trouble is, using n tends to produce an underestimate of the population variance σ². So we use (n − 1) in the denominator to provide the appropriate correction for this tendency.* Since sample statistics such as s² are used primarily to estimate population parameters such as σ², (n − 1) is preferred to n in defining the sample variance.
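The tendency to underestimate can be illustrated with a small enumeration. This is a sketch under stated assumptions, not the text's own argument: the five-value population is hypothetical, samples of size n = 2 are drawn with replacement, and every possible ordered sample is listed so that no random simulation is needed.

```python
from itertools import product
from statistics import mean, pvariance

population = [1, 2, 3, 4, 5]   # hypothetical population, for illustration only
n = 2                          # sample size (assumed)


def sum_squared_deviations(sample):
    """Sum of squared deviations of a sample from its own mean."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample)


# All 25 equally likely ordered samples of size 2, drawn with replacement.
samples = list(product(population, repeat=n))

avg_divisor_n = mean(sum_squared_deviations(s) / n for s in samples)
avg_divisor_n_minus_1 = mean(sum_squared_deviations(s) / (n - 1) for s in samples)
sigma_squared = pvariance(population)  # population variance (divisor N)

print("population variance:       ", sigma_squared)
print("average using divisor n:   ", avg_divisor_n)
print("average using divisor n-1: ", avg_divisor_n_minus_1)
```

Averaged over all possible samples, the divisor n gives 1, only half of the population variance of 2, while the divisor (n − 1) gives 2. The gap is largest for small samples and shrinks as n grows.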

Example 2.9 Computing Measures of Variation

Problem

  1. Calculate the variance and standard deviation of the following sample: 2, 3, 3, 3, 4.

Solution

  1. If you use the formula in the box to compute s² and s, you first need to find x̄. From Table 2.6, we see that Σx = 15. Thus, x̄ = Σx/n = 15/5 = 3. Now, for each measurement, find (x − x̄) and (x − x̄)², as shown.

    Table 2.6 Calculating s²

    x          (x − x̄)    (x − x̄)²
    2          −1          1
    3           0          0
    3           0          0
    3           0          0
    4           1          1
    Σx = 15                Σ(x − x̄)² = 2

    Then we use

    $$
    s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{2}{5 - 1} = \frac{2}{4} = .5
    $$

    $$
    s = \sqrt{.5} = .71
    $$

Look Back

As the sample size n increases, these calculations can become very tedious. As the next example shows, we can use the computer to find s² and s.
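As a quick check of the hand calculation in Example 2.9, the sketch below uses Python's standard `statistics` module. This is a substitute for the SAS and MINITAB printouts the text relies on; both `statistics.variance` and `statistics.stdev` use the (n − 1) divisor.

```python
import statistics

sample = [2, 3, 3, 3, 4]  # the sample from Example 2.9

s_squared = statistics.variance(sample)  # sample variance, divisor n - 1
s = statistics.stdev(sample)             # positive square root of the variance

print(f"s^2 = {s_squared:.2f}, s = {s:.2f}")  # s^2 = 0.50, s = 0.71
```

The values agree with the hand-calculated s² = .5 and s = .71.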

Now Work Exercise 2.80a

EPAGAS Example 2.10 Finding Measures of Variation on a Printout

Problem

  1. Use the computer to find the sample variance s² and the sample standard deviation s for the 100 gas mileage readings given in Table 2.2.

Solution

  1. The SAS printout describing the gas mileage data is reproduced in Figure 2.21. The variance and standard deviation, highlighted on the printout, are s² = 5.85 and s = 2.42 (rounded to two decimal places). The value s = 2.42 represents a typical deviation of a gas mileage from the sample mean, x̄ = 36.99.

    Figure 2.21

    SAS numerical descriptive measures for 100 EPA mileages

You now know that the standard deviation measures the variability of a set of data, and you know how to calculate the standard deviation. The larger the standard deviation, the more variable the data are. The smaller the standard deviation, the less variation there is in the data. But how can we practically interpret the standard deviation and use it to make inferences? This is the topic of Section 2.5.

Exercises 2.73–2.92

Understanding the Principles

  1. 2.73 What is the range of a data set?

  2. 2.74 What is the primary disadvantage of using the range to compare the variability of data sets?

  3. 2.75 Describe the sample variance in words rather than with a formula. Do the same with the population variance.

  4. 2.76 Can the variance of a data set ever be negative? Explain. Can the variance ever be smaller than the standard deviation? Explain. 

  5. 2.77 If the standard deviation increases, does this imply that the data are more variable or less variable?

Learning the Mechanics

  1. 2.78 Calculate the variance and standard deviation for samples for which

    1. n = 10, Σx² = 84, Σx = 20

    2. n = 40, Σx² = 380, Σx = 100

    3. n = 20, Σx² = 18, Σx = 17

  2. 2.79 Calculate the range, variance, and standard deviation for the following samples:

    1. 39, 42, 40, 37, 41

    2. 100, 4, 7, 96, 80, 3, 1, 10, 2

    3. 100, 4, 7, 30, 80, 30, 42, 2

  3. 2.80 Calculate the range, variance, and standard deviation for the following samples:

    1. 4, 2, 1, 0, 1

    2. 1, 6, 2, 2, 3, 0, 3

    3. 8, 2, 1, 3, 5, 4, 4, 1, 3

    4. 0, 2, 0, 0, 1, 1, 2, 1, 0, 1, 1, 1, 0, 3, 2, 1, 0, 1

  4. 2.81 Using only integers between 0 and 10, construct two data sets with at least 10 observations each such that the two sets have the same mean, but different variances. Construct dot plots for each of your data sets, and mark the mean of each data set on its dot plot.

  5. 2.82 Using only integers between 0 and 10, construct two data sets with at least 10 observations each such that the two sets have the same range, but different means. Construct a dot plot for each of your data sets, and mark the mean of each data set on its dot plot.

  6. 2.83 Consider the following sample of five measurements: 2, 1, 1, 0, 3.

    1. Calculate the range, s², and s.

    2. Add 3 to each measurement and repeat part a.

    3. Subtract 4 from each measurement and repeat part a.

    4. Considering your answers to parts a, b, and c, what seems to be the effect on the variability of a data set of adding the same number to or subtracting the same number from each measurement?

  7. 2.84 Compute s² and s for each of the data sets listed. Where appropriate, specify the units in which your answer is expressed.

    1. 3, 1, 10, 10, 4

    2. 8 feet, 10 feet, 32 feet, 5 feet

    3. 1, 4, 3, 1, 4, 4

    4. 1/5 ounce, 1/5 ounce, 1/5 ounce, 2/5 ounce, 1/5 ounce, 4/5 ounce

Applet Exercise 2.4

Use the applet entitled Standard Deviation to find the standard deviation of each of the four data sets listed in Exercise 2.80. For each data set, set the lower limit to a number less than all of the data, set the upper limit to a number greater than all of the data, and then click on Update. Click on the approximate location of each data item on the number line. You can get rid of a point by dragging it to the trash can. To clear the graph between data sets, simply click on the trash can.

    1. Compare the standard deviations generated by the applet with those you calculated by hand in Exercise 2.80. If there are differences, explain why the applet might give values slightly different from the hand-calculated values.

    2. Despite the fact that it provides a slightly different value of the standard deviation of a data set, describe some advantages of using the applet.

Applet Exercise 2.5

Use the applet Standard Deviation to study the effect that multiplying or dividing each number in a data set by the same number has on the standard deviation. Begin by setting appropriate limits and plotting the given data on the number line provided in the applet.

    0   1   1   1   2   2   3   4

    1. Record the standard deviation. Then multiply each data item by 2, plot the new data items, and record the standard deviation. Repeat the process, first multiplying each of the original data items by 3 and then by 4. Describe what happens to the standard deviation as the data items are multiplied by ever higher numbers. Divide each standard deviation by the standard deviation of the original data set. Do you see a pattern? Explain.

    2. Divide each of the original data items by 2, plot the new data, and record the standard deviation. Repeat the process, first dividing each of the original data items by 3 and then by 4. Describe what happens to the standard deviation as the data items are divided by ever higher numbers. Divide each standard deviation by the standard deviation of the original data set. Do you see a pattern? Explain.

    3. Using your results from parts a and b, describe what happens to the standard deviation of a data set when each of the data items in the set is multiplied or divided by a fixed number n. Experiment by repeating parts a and b for other data sets if you need to.

Applet Exercise 2.6

Use the applet Standard Deviation to study the effect that an extreme value has on the standard deviation. Begin by setting appropriate limits and plotting the following data on the number line provided in the applet:

    0   6   7   7   8   8   8   9   9   10

    1. Record the standard deviation. Replace the extreme value of 0 with 2, then 4, and then 6. Record the standard deviation each time. Describe what happens to the standard deviation as 0 is replaced by ever higher numbers.

    2. How would the standard deviation of the data set compare with the original standard deviation if the 0 were replaced by 16? Explain.

Applying the Concepts—Basic

  1. TURTLES 2.85 Shell lengths of sea turtles. Aquatic Biology (Vol. 9, 2010) reported on a study of green sea turtles inhabiting the Grand Cayman South Sound lagoon. The data on curved carapace (shell) length (in centimeters) for 76 captured turtles are saved in the TURTLES file. Descriptive statistics for the data are shown on the accompanying MINITAB printout.

    1. Locate the range of the shell lengths on the printout.

    2. Locate the variance of the shell lengths on the printout.

    3. Locate the standard deviation of the shell lengths on the printout.

    4. If the target of your interest is these specific 76 captured turtles, what symbols would you use to represent the variance and standard deviation?

  2. SHAFTS 2.86 Shaft graves in ancient Greece. Refer to the American Journal of Archaeology (Jan. 2014) study of sword shaft graves in ancient Greece, Exercise 2.60 (p. 61). The number of sword shafts buried at each of 13 recently discovered grave sites is reproduced in the following table.

    1 2 3 1 5 6 2 4 1 2 4 2 9

    Source: Harrell, K. “The fallen and their swords: A new explanation for the rise of the shaft graves.” American Journal of Archaeology, Vol. 118, No. 1, January 2014 (Figure 1).

    1. Calculate the range of the sample data.

    2. Calculate the variance of the sample data.

    3. Calculate the standard deviation of the sample data.

    4. Which of the measures of variation computed in parts a–c have the same units of measure (number of sword shafts) as the original variable?

  3. PAI 2.87 Music performance anxiety. Refer to the British Journal of Music Education (Mar. 2014) study of music performance anxiety, Exercise 2.62 (p. 61). Scores (measured in points) on the Performance Anxiety Inventory (PAI) scale for participants in eight different studies are reproduced in the table.

    54 42 51 39 41 43 55 40

    Source: Patston, T. “Teaching stage fright? Implications for music educators.” British Journal of Music Education, Vol. 31, No. 1, Mar. 2014 (adapted from Figure 1).

    1. Find the variance of the sample PAI scores. Give the units of measurement for the variance.

    2. Find the standard deviation of the sample PAI scores. Give the units of measurement for the standard deviation.

Applying the Concepts—Intermediate

  1. ROCKS 2.88 Characteristics of a rockfall. Refer to the Environmental Geology (Vol. 58, 2009) study of how far a block from a collapsing rock wall will bounce, Exercise 2.61 (p. 61). The rebound lengths (meters) for a sample of 13 rock bounces are reproduced in the table below.

    10.94 13.71 11.38 7.26 17.83 11.92 11.87
     5.44 13.35  4.90 5.85 5.10  6.77

    Based on Paronuzzi, P. “Rockfall-induced block propagation on a soil slope, northern Italy.” Environmental Geology, Vol. 58, 2009 (Table 2).

    1. Compute the range of the 13 rebound lengths. Give the units of measurement of the range.

    2. Compute the variance of the 13 rebound lengths. Give the units of measurement of the variance.

    3. Compute the standard deviation of the 13 rebound lengths. Give the units of measurement of the standard deviation.

  2. COUGH 2.89 Is honey a cough remedy? Refer to the Archives of Pediatrics and Adolescent Medicine (Dec. 2007) study of honey as a remedy for coughing, Exercise 2.64 (p. 62). The coughing improvement scores for the patients in the over-the-counter cough medicine dosage (DM) group, honey dosage group, and control group are reproduced in the accompanying table.

    Honey Dosage:
      12 11 15 11 10 13 10  4 15 16  9
      14 10  6 10  8 11 12 12  8 12  9
      11 15 10 15  9 13  8 12 10  8  9
       5 12
    DM Dosage:
       4  6  9  4  7  7  7  9 12 10 11
       6  3  4  9 12  7  6  8 12 12  4
      12 13  7 10 13  9  4  4 10 15  9
    No Dosage (Control):
       5  8  6  1  0  8 12  8  7  7  1
       6  7  7 12  7  9  7  9  5 11  9
       5  6  8 86  7 10  9  4  8  7  3
       1  4  3

    Based on Paul, I. M., et al. “Effect of honey, dextromethorphan, and no treatment on nocturnal cough and sleep quality for coughing children and their parents.” Archives of Pediatrics and Adolescent Medicine, Vol. 161, No. 12, Dec. 2007 (data simulated).

    1. Find the standard deviation of the improvement scores for the honey dosage group.

    2. Find the standard deviation of the improvement scores for the DM dosage group.

    3. Find the standard deviation of the improvement scores for the control group.

    4. Based on the results, parts a–c, which group appears to have the most variability in coughing improvement scores? The least variability?

  3. SAND 2.90 Permeability of sandstone during weathering. Refer to the Geographical Analysis (Vol. 42, 2010) study of the decay properties of sandstone when exposed to the weather, Exercise 2.69 (p. 63). Recall that slices of sandstone blocks were tested for permeability under three conditions: no exposure to any type of weathering (A), repeatedly sprayed with a 10% salt solution (B), and soaked in a 10% salt solution and dried (C). Measures of variation for the permeability measurements (mV) of each sandstone group are displayed in the accompanying MINITAB printout.

    1. Find the range of the permeability measurements for group A sandstone slices. Verify its value using the minimum and maximum values shown on the printout.

    2. Find the standard deviation of the permeability measurements for group A sandstone slices. Verify its value using the variance shown on the printout.

    3. Which condition (A, B, or C) has the more variable permeability data?

  4. MOLARS 2.91 Cheek teeth of extinct primates. Refer to the American Journal of Physical Anthropology (Vol. 142, 2010) study of the characteristics of cheek teeth (e.g., molars) in an extinct primate species, Exercise 2.65 (p. 62). The data on dentary depth of molars (in millimeters) for 18 cheek teeth extracted from skulls are reproduced in the table.

    Data on Dentary Depth (mm) of Molars
    18.12 15.76 13.25
    19.48 17.00 16.12
    19.36 13.96 18.13
    15.94 16.55 14.02
    15.83 15.70 14.04
    19.70 17.83 16.20

    Based on Boyer, D. M., Evans, A. R., and Jernvall, J. “Evidence of dietary differentiation among late Paleocene–early Eocene Plesiadapids (Mammalia, primates),” American Journal of Physical Anthropology, Vol. 142, 2010 (Table A3).

    1. Find the range of the data set. If the largest depth measurement in the sample were doubled, how would the range change? Would it increase or decrease?

    2. Find the variance of the data set. If the largest depth measurement in the sample were doubled, how would the variance change? Would it increase or decrease?

    3. Find the standard deviation of the data set. If the largest depth measurement in the sample were doubled, how would the standard deviation change? Would it increase or decrease?

  5. NUC 2.92 Active nuclear power plants. Refer to Exercise 2.72 (p. 64) and the U.S. Energy Information Administration’s data on the number of nuclear power plants operating in each of 20 states.

    1. Find the range, variance, and standard deviation of this data set.

    2. Eliminate the largest value from the data set and repeat part a. What effect does dropping this measurement have on the measures of variation found in part a?

    3. Eliminate the smallest and largest value from the data set and repeat part a. What effect does dropping both of these measurements have on the measures of variation found in part a?
