
2.3 Numerical Measures of Central Tendency

When we speak of a data set, we refer to either a sample or a population. If statistical inference is our goal, we’ll ultimately wish to use sample numerical descriptive measures to make inferences about the corresponding measures for a population.

As you’ll see, a large number of numerical methods are available to describe quantitative data sets. Most of these methods measure one of two data characteristics:

The central tendency of the set of measurements, that is, the tendency of the data to cluster, or center, about certain numerical values. (See Figure 2.14a.)
The variability of the set of measurements, that is, the spread of the data. (See Figure 2.14b.)

Figure 2.14

Numerical descriptive measures

In this section, we concentrate on measures of central tendency. In the next section, we discuss measures of variability.

The most popular and best understood measure of central tendency for a quantitative data set is the arithmetic mean (or simply the mean) of the data set.

The mean of a set of quantitative data is the sum of the measurements, divided by the number of measurements contained in the data set.

Teaching Tip

In calculating a population mean, the denominator is the population size N.

In everyday terms, the mean is the average value of the data set and is often used to represent a “typical” value. We denote the mean of a sample of measurements by $\bar{x}$ (read “x-bar”) and represent the formula for its calculation as shown in the following box:

Formula for a Sample Mean

$\bar{x} = \frac{\sum_{i = 1}^{n} x_{i}}{n}$

[Note: $\sum_{i = 1}^{n} x_{i} = (x_{1} + x_{2} + \dots + x_{n})$. For more details and examples on this summation notation, see Appendix A.]

Example 2.3 Computing the Sample Mean

Problem

Calculate the mean of the following five sample measurements: 5, 3, 8, 5, 6.

Solution

Using the definition of sample mean and the summation notation, we find that

$\bar{x} = \frac{\sum_{i = 1}^{5} x_{i}}{5} = \frac{5 + 3 + 8 + 5 + 6}{5} = \frac{27}{5} = 5.4$

Thus, the mean of this sample is 5.4.
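The same computation can be mirrored in a few lines of Python. This is a minimal sketch rather than anything the text prescribes; the function name `sample_mean` is an assumption introduced here for illustration.

```python
def sample_mean(measurements):
    """Return the sum of the measurements divided by the number of measurements."""
    n = len(measurements)
    if n == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(measurements) / n


# The five sample measurements from Example 2.3.
sample = [5, 3, 8, 5, 6]
x_bar = sample_mean(sample)
print(x_bar)  # 27 / 5
```

The function returns the unrounded quotient, which leaves any rounding decision to the point where the value is reported.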

Look Back

There is no specific rule for rounding when calculating $\bar{x}$ because $\bar{x}$ is specifically defined to be the sum of all measurements, divided by n; that is, it is a specific fraction. When $\bar{x}$ is used for descriptive purposes, it is often convenient to round the calculated value of $\bar{x}$ to the number of significant figures used for the original measurements. When $\bar{x}$ is to be used in other calculations, however, it may be necessary to retain more significant figures.

Now Work Exercise 2.57

EPAGAS Example 2.4 Finding the Mean on a Printout—Mean Gas Mileage

Problem

Calculate the sample mean for the 100 EPA mileages given in Table 2.2.

Solution

The mean gas mileage for the 100 cars is denoted

$\bar{x} = \frac{\sum_{i = 1}^{100} x_{i}}{100}$

Rather than compute $\bar{x}$ by hand (or even with a calculator), we employed SAS to compute the mean. The SAS printout is shown in Figure 2.15. The sample mean, highlighted on the printout, is $\bar{x} = 36.9940$.

Figure 2.15

SAS numerical descriptive measures for 100 EPA gas mileages

Look Back

Given this information, you can visualize a distribution of gas mileage readings centered in the vicinity of $\bar{x} \approx 37$. An examination of the relative frequency histogram (Figure 2.10) confirms that $\bar{x}$ does in fact fall near the center of the distribution.

The sample mean $\bar{x}$ will play an important role in accomplishing our objective of making inferences about populations on the basis of information about the sample. For this reason, we need to use a different symbol for the mean of a population, that is, the mean of the set of measurements on every unit in the population. We use the Greek letter $\mu$ (mu) for the population mean.

Symbols for the Sample Mean and the Population Mean

In this text, we adopt a general policy of using Greek letters to represent numerical descriptive measures of the population and Roman letters to represent corresponding descriptive measures of the sample. The symbols for the mean are

$\bar{x} = \text{Sample mean} \qquad \mu = \text{Population mean}$

Teaching Tip

Explain that Greek letters are used to represent population values throughout the text.

Teaching Tip

Average, mean, and expected value are all terms that are used to represent the same descriptive measure.

We’ll often use the sample mean $\bar{x}$ to estimate (make an inference about) the population mean $\mu$. For example, the EPA mileages for the population consisting of all cars have a mean equal to some value $\mu$. Our sample of 100 cars yielded mileages with a mean of $\bar{x} = 36.9940$. If, as is usually the case, we don’t have access to the measurements for the entire population, we could use $\bar{x}$ as an estimator or approximator for $\mu$. Then we’d need to know something about the reliability of our inference. That is, we’d need to know how accurately we might expect $\bar{x}$ to estimate $\mu$. In Chapter 7, we’ll find that this accuracy depends on two factors:

The size of the sample. The larger the sample, the more accurate the estimate will tend to be.
The variability, or spread, of the data. All other factors remaining constant, the more variable the data, the less accurate is the estimate.

Teaching Tip

Look ahead to sampling distributions to plant the idea that measures of center and spread will be used together to generate estimates of population values.

Another important measure of central tendency is the median.

The median of a quantitative data set is the middle number when the measurements are arranged in ascending (or descending) order.

The median is of most value in describing large data sets. If a data set is characterized by a relative frequency histogram (Figure 2.16), the median is the point on the x-axis such that half the area under the histogram lies above the median and half lies below. [Note: In Section 2.2, we observed that the relative frequency associated with a particular interval on the x-axis is proportional to the amount of area under the histogram that lies above the interval.] We denote the median of a sample by M. As with the population mean, we use a Greek letter ($\eta$) to represent the population median.

Calculating a Sample Median `M`

Arrange the n measurements from the smallest to the largest.

If n is odd, M is the middle number.
If n is even, M is the mean of the middle two numbers.

Teaching Tip

Remind students to order the data before calculating a value for the median.

Symbols for the Sample and Population Median

$M = \text{Sample median}$

$\eta = \text{Population median}$

Example 2.5 Computing the Median

Problem

Consider the following sample of $n = 7$ measurements: 5, 7, 4, 5, 20, 6, 2.
1. Calculate the median M of this sample.
2. Eliminate the last measurement (the 2), and calculate the median of the remaining $n = 6$ measurements.

Solution

The seven measurements in the sample are ranked in ascending order: 2, 4, 5, 5, 6, 7, 20. Because the number of measurements is odd, the median is the middle measurement. Thus, the median of this sample is $M = 5$.
After removing the 2 from the set of measurements, we rank the sample measurements in ascending order as follows: 4, 5, 5, 6, 7, 20. Now the number of measurements is even, so we average the middle two measurements. The median is $M = (5 + 6) / 2 = 5.5$.
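
The procedure for a sample median (order the measurements, then take the middle one or average the middle two) can be sketched in Python as follows. The function name `sample_median` is an assumption made here for illustration, and the sketch is one straightforward way to code the rule, not the only one.

```python
def sample_median(measurements):
    """Return the middle value of the ordered data (mean of the middle two if n is even)."""
    ordered = sorted(measurements)      # arrange from smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:                      # n odd: the single middle measurement
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # n even: average the middle two


part_a = [5, 7, 4, 5, 20, 6, 2]   # the n = 7 measurements of Example 2.5
part_b = part_a[:-1]              # the same sample with the last measurement (the 2) removed

print(sample_median(part_a))
print(sample_median(part_b))
```

Sorting a copy of the data inside the function means the caller's list is left in its original order.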

Look Back

When the sample size n is even (as in part b), exactly half of the measurements will fall below the calculated median M. However, when n is odd (as in part a), the percentage of measurements that fall below M is approximately 50%. The approximation improves as n increases.

Now Work Exercise 2.55

In certain situations, the median may be a better measure of central tendency than the mean. In particular, the median is less sensitive than the mean to extremely large or small measurements. Note, for instance, that all but one of the measurements in part a of Example 2.5 are close to $x = 5$. The single relatively large measurement, $x = 20$, does not affect the value of the median, 5, but it causes the mean, $\bar{x} = 7$, to lie to the right of most of the measurements.
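
A short Python check makes the contrast concrete. The sketch below uses the standard-library `statistics` module on the part a data from Example 2.5, then repeats the calculation with the measurement 20 left out; the variable names are chosen here for illustration only.

```python
from statistics import mean, median

sample = [5, 7, 4, 5, 20, 6, 2]                      # part a of Example 2.5
without_extreme = [x for x in sample if x != 20]     # same data, large value dropped

# Mean and median with the extreme measurement included.
print(mean(sample), median(sample))

# Mean and median once the extreme measurement is removed.
print(mean(without_extreme), median(without_extreme))
```

Dropping the single large measurement moves the mean from 7 to about 4.83, while the median stays at 5.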

As another example of data for which the central tendency is better described by the median than the mean, consider the household incomes of a community being studied by a sociologist. The presence of just a few households with very high incomes will affect the mean more than the median. Thus, the median will provide a more accurate picture of the typical income for the community. The mean could exceed the vast majority of the sample measurements (household incomes), making it a misleading measure of central tendency.

EPAGAS Example 2.6 Finding the Median on a Printout—Median Gas Mileage

Problem

Calculate the median for the 100 EPA mileages given in Table 2.2. Compare the median with the mean computed in Example 2.4.

Solution

For this large data set, we again resort to a computer analysis. The median is highlighted on the SAS printout displayed in Figure 2.15 (p. 55). You can see that the median is 37.0. Thus, half of the 100 mileages in the data set fall below 37.0 and half lie above 37.0. Note that the median, 37.0, and the mean, 36.9940, are almost equal, a relationship that indicates a lack of skewness in the data. In other words, the data exhibit a tendency to have as many measurements in the left tail of the distribution as in the right tail. (Recall the histogram of Figure 2.10.)

$M = 37$

Look Back

In general, extreme values (large or small) affect the mean more than the median, since these values are used explicitly in the calculation of the mean. The median is not affected directly by extreme measurements, since only the middle measurement (or two middle measurements) is explicitly used to calculate the median. Consequently, if measurements are pulled toward one end of the distribution, the mean will shift toward that tail more than the median will.

Teaching Tip

Explain the median as the point on the graph that has 50% of the data below it and 50% of the data above it. Explain the mean as the point in the distribution that would balance the graph if it could be placed on your finger.

A data set is said to be skewed if one tail of the distribution has more extreme observations than the other tail.

A comparison of the mean and the median gives us a general method for detecting skewness in data sets, as shown in the next box. With rightward skewed data, the right tail (high end) of the distribution has more extreme observations. These few, but large, measurements tend to pull the mean away from the median toward the right; that is, rightward skewness typically indicates that the mean is greater than the median. Conversely, with leftward skewed data, the left tail (low end) of the distribution has more extreme observations. These few, but small, measurements also tend to pull the mean away from the median, but toward the left; consequently, leftward skewness typically implies that the mean is smaller than the median.

Teaching Tip

Use a numerical example with one or two extreme values to show how those values affect the value of the mean and how they have no effect on the median.

Detecting Skewness by Comparing the Mean and the Median

If the data set is skewed to the right, then typically the median is less than the mean.

If the data set is symmetric, then the mean equals the median.

If the data set is skewed to the left, then typically the mean is less than the median.
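
The comparison can be sketched as a small Python helper. The function name `skewness_hint` and the two extra data sets are assumptions introduced for illustration; the result is only the rule of thumb stated above, so it suggests a direction of skewness rather than establishing one.

```python
from statistics import mean, median


def skewness_hint(measurements):
    """Compare the mean with the median and report which way any skew leans."""
    x_bar = mean(measurements)
    m = median(measurements)
    if x_bar > m:
        return "right"           # mean pulled above the median
    if x_bar < m:
        return "left"            # mean pulled below the median
    return "none indicated"      # mean equals median


example_2_5 = [5, 7, 4, 5, 20, 6, 2]        # part a of Example 2.5
mirrored = [22 - x for x in example_2_5]    # hypothetical: the same data reflected
balanced = [1, 2, 3, 4, 5]                  # hypothetical symmetric data

print(skewness_hint(example_2_5))
print(skewness_hint(mirrored))
print(skewness_hint(balanced))
```

Equal mean and median is what symmetry produces, but the converse need not hold, which is why the last label is worded cautiously.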

Teaching Tip

Explain that, in skewed distributions, the median is the preferred measure of center because the mean is affected by the extreme values while the median is not.

Now Work Exercise 2.54

A third measure of central tendency is the mode of a set of measurements.

Teaching Tip

Show that the mode is the only measure of center that has to be an actual data value in the sample.

The mode is the measurement that occurs most frequently in the data set.

Therefore, the mode shows where the data tend to concentrate.

Example 2.7 Finding the Mode

Problem

Each of 10 taste testers rated a new brand of barbecue sauce on a 10-point scale, where 1 = awful and 10 = excellent. Find the mode for the following 10 ratings:

$\begin{array}{cccccccccc} 8 & 7 & 9 & 6 & 8 & 10 & 9 & 9 & 5 & 7 \end{array}$

Solution

Since 9 occurs most often (three times), the mode of the ten taste ratings is 9.
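
Counting occurrences by hand becomes tedious for longer lists, so a tally in Python may help. The sketch below is one possible approach using `collections.Counter`; the function name `modes` is an assumption, and returning a list (so that ties can be reported, and an empty list when every value appears only once) is a design choice made here rather than something the example requires.

```python
from collections import Counter


def modes(measurements):
    """Return every value that occurs most often, in ascending order."""
    counts = Counter(measurements)
    if not counts:
        return []
    highest = max(counts.values())
    # If each of several distinct values appears exactly once, report no mode.
    if highest == 1 and len(counts) > 1:
        return []
    return sorted(value for value, count in counts.items() if count == highest)


ratings = [8, 7, 9, 6, 8, 10, 9, 9, 5, 7]   # the ten taste ratings of Example 2.7
print(modes(ratings))
```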

Look Back

Note that the data are actually qualitative in nature (e.g., “awful,” “excellent”). The mode is particularly useful for describing qualitative data. The modal category is simply the category (or class) that occurs most often.

Now Work Exercise 2.56

Because it emphasizes data concentration, the mode is also used with quantitative data sets to locate the region in which much of the data is concentrated. A retailer of men’s clothing would be interested in the modal neck size and sleeve length of potential customers. The modal income class of the laborers in the United States is of interest to the U.S. Department of Labor.

Teaching Tip

Present an example that has two modes (is bimodal), and explain that no mode exists when all data values appear just once.

For some quantitative data sets, the mode may not be very meaningful. For example, consider the EPA mileage ratings in Table 2.2. A reexamination of the data reveals that the gas mileage of 37.0 occurs most often (four times). However, the mode of 37.0 is not particularly useful as a measure of central tendency.

A more meaningful measure can be obtained from a relative frequency histogram for quantitative data. The measurement class containing the largest relative frequency is called the modal class. Several definitions exist for locating the position of the mode within a modal class, but the simplest is to define the mode as the midpoint of the modal class. For example, examine the frequency histogram for the EPA mileage ratings in Figure 2.10 (p. 45). You can see that the modal class is the interval 37–38. The mode (the midpoint) is 37.5. This modal class (and the mode itself) identifies the area in which the data are most concentrated and, in that sense, is a measure of central tendency. However, for most applications involving quantitative data, the mean and median provide more descriptive information than the mode.
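
The midpoint definition can be sketched in Python. Everything named here is an assumption for illustration: the function `modal_class`, its `lower` and `width` parameters (the start of the first class and the common class width), and above all the data, which are a handful of made-up mileages and not the 100 EPA values of Table 2.2.

```python
from collections import Counter


def modal_class(measurements, lower, width):
    """Return ((start, end), midpoint) for the class holding the most measurements.

    Classes are [lower + k*width, lower + (k+1)*width) for k = 0, 1, 2, ...
    If two classes tie, the lower one is returned.
    """
    # Map each measurement to the index k of the class it falls in, then tally.
    counts = Counter(int((x - lower) // width) for x in measurements)
    # Sorting by class index first makes max() pick the lowest class on ties.
    k, _ = max(sorted(counts.items()), key=lambda item: item[1])
    start = lower + k * width
    return (start, start + width), start + width / 2


# Hypothetical mileages, NOT the EPA data from Table 2.2.
mileages = [35.0, 36.3, 37.0, 37.1, 37.4, 37.9, 38.2]
interval, mode = modal_class(mileages, lower=30, width=1)
print(interval, mode)
```

Because the answer depends on where the classes start and how wide they are, a different choice of `lower` or `width` can move the modal class.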

QUAKE Example 2.8 Comparing the Mean, Median, and Mode—Earthquake Aftershocks

Problem

Seismologists use the term “aftershock” to describe the smaller earthquakes that follow a main earthquake. Following the Northridge earthquake, the Los Angeles area experienced a record 2,929 aftershocks in a three-week period. The magnitudes (measured on the Richter scale) of these aftershocks as well as their interarrival times (in minutes) were recorded by the U.S. Geological Survey. (The data are saved in the QUAKE file.) Today seismologists continue to use these data to model future earthquake characteristics. Find and interpret the mean, median, and mode for both of these variables. Which measure of central tendency is better for describing the magnitude distribution? The distribution of interarrival times?

Solution

Measures of central tendency for the two variables, magnitude and interarrival time, were produced using MINITAB. The means, medians, and modes are displayed in Figure 2.17.

Figure 2.17

MINITAB descriptive statistics for earthquake data

Teaching Tip

Review the relationship the mean, median, and mode have in both symmetric and skewed distributions.

Magnitude: All 3

Interarrival Times: Median

For magnitude, the mean, median, and mode are 2.12, 2.00, and 1.8, respectively, on the Richter scale. The average magnitude is 2.12; half the magnitudes fall below 2.0; and the most commonly occurring magnitude is 1.8. These values are nearly identical, with the mean slightly larger than the median. This implies a slight rightward skewness in the data, which is shown graphically in the MINITAB histogram for magnitude displayed in Figure 2.18a. Because the distribution is nearly symmetric, any of the three measures would be adequate for describing the “center” of the earthquake aftershock magnitude distribution.

The mean, median, and mode of the interarrival times of the aftershocks are 9.77, 6.0, and 2.0 minutes, respectively. On average, the aftershocks arrive 9.77 minutes apart; half the aftershocks have interarrival times below 6.0 minutes; and the most commonly occurring interarrival time is 2.0 minutes. Note that the mean is much larger than the median, implying that the distribution of interarrival times is highly skewed to the right. This extreme rightward skewness is shown graphically in the histogram, in Figure 2.18b. The skewness is due to several exceptionally large interarrival times. Consequently, we would probably want to use the median of 6.0 minutes as the “typical” interarrival time for the aftershocks. You can see that the mode of 2.0 minutes is not very descriptive of the “center” of the interarrival time distribution.

Figure 2.18a

MINITAB Histogram for Magnitudes of Aftershocks

Figure 2.18b

MINITAB Histogram for Inter-Arrival Times of Aftershocks

Look Back

The choice of which measure of central tendency to use will depend on the properties of the data set analyzed and the application of interest. Consequently, it is vital that you understand how the mean, median, and mode are computed.

Now Work Exercise 2.68

Exercises 2.49–2.72

Understanding the Principles

2.49 Give three different measures of central tendency.
2.50 Explain the difference between a measure of central tendency and a measure of variability.
2.51 What is the symbol used to represent the sample mean? The population mean?

$\bar{x}$; $\mu$
2.52 Explain the concept of a skewed distribution.
2.53 What two factors affect the accuracy of the sample mean as an estimate of the population mean?

n; Variation
2.54 Describe how the mean compares with the median for a distribution as follows:
1. Skewed to the left
  
  $\text{Mean} < \text{Median}$
2. Skewed to the right
  
  $\text{Mean} > \text{Median}$
3. Symmetric
  
  $\text{Mean} = \text{Median}$

Learning the Mechanics

L02055 2.55 Calculate the mean and median of the following grade point averages:

2.72; 2.65

$\begin{array}{cccccc} 3.2 & 2.5 & 2.1 & 3.7 & 2.8 & 2.0 \end{array}$
L02056 2.56 Calculate the mode, mean, and median of the following data:

14.55; 15

$\begin{array}{ccccccccccc} 18 & 10 & 15 & 13 & 17 & 15 & 12 & 15 & 18 & 16 & 11 \end{array}$
2.57 Calculate the mean for samples for which
1. $n = 10, \sum x = 85$
  
  8.5
2. $n = 16, \sum x = 400$
  
  25
3. $n = 45, \sum x = 35$
  
  .78
4. $n = 18, \sum x = 242$
  
  13.44
2.58 Construct one data set consisting of five measurements, and another consisting of six measurements, for which the medians are equal.
2.59 Calculate the mean, median, and mode for each of the following samples:
1. $7, -2, 3, 3, 0, 4$
  
  2.5, 3, 3
2. 2, 3, 5, 3, 2, 3, 4, 3, 5, 1, 2, 3, 4
  
  3.08, 3, 3
3. 51, 50, 47, 50, 48, 41, 59, 68, 45, 37
  
  49.6, 49, 50

Applet Exercise 2.1

Use the applet entitled Mean versus Median to find the mean and median of each of the three data sets presented in Exercise 2.59. For each data set, set the lower limit to a number less than all of the data, set the upper limit to a number greater than all of the data, and then click on Update. Click on the approximate location of each data item on the number line. You can get rid of a point by dragging it to the trash can. To clear the graph between data sets, simply click on the trash can.

1. Compare the means and medians generated by the applet with those you calculated by hand in Exercise 2.59. If there are differences, explain why the applet might give values slightly different from the hand-calculated values.
2. Despite providing only approximate values of the mean and median of a data set, describe some advantages of using the applet to find those values.

Applet Exercise 2.2

Use the applet Mean versus Median to illustrate your descriptions in Exercise 2.54. For each part a, b, and c, create a data set with 10 items that has the given property. Using the applet, verify that the mean and median have the relationship you described in Exercise 2.54.

Applet Exercise 2.3

Use the applet Mean versus Median to study the effect that an extreme value has on the difference between the mean and median. Begin by setting appropriate limits and plotting the following data on the number line provided in the applet:

$\begin{array}{cccccccccc} 0 & 6 & 7 & 7 & 8 & 8 & 8 & 9 & 9 & 10 \end{array}$

1. Describe the shape of the distribution and record the value of the mean and median. On the basis of the shape of the distribution, do the mean and median have the relationship that you would expect?
2. Replace the extreme value of 0 with 2, then 4, and then 6. Record the mean and median each time. Describe what is happening to the mean as 0 is replaced, in turn, by the higher numbers stated. What is happening to the median? How is the difference between the mean and the median changing?
3. Now replace 0 with 8. What values does the applet give you for the mean and the median? Explain why the mean and the median should now be the same.

Applying the Concepts—Basic

SHAFTS 2.60 Shaft graves in ancient Greece. Refer to the American Journal of Archaeology (Jan. 2014) study of sword shaft graves in ancient Greece, Exercise 2.37 (p. 50). The number of sword shafts buried at each of 13 recently discovered grave sites is reproduced in the following table.

1 2 3 1 5 6 2 4 1 2 4 2 9

Source: Harrell, K. “The fallen and their swords: A new explanation for the rise of the shaft graves.” American Journal of Archaeology, Vol. 118, No. 1, January 2014 (Figure 1).
1. Calculate the mean of the data. Interpret the result.
2. Calculate the median of the data. Interpret the result.
3. Find the mode of the data. Interpret the result.
  
  2
ROCKS 2.61 Characteristics of a rock fall. In Environmental Geology (Vol. 58, 2009) computer simulation was employed to estimate how far a block from a collapsing rock wall will bounce—called rebound length—down a soil slope. Based on the depth, location, and angle of block-soil impact marks left on the slope from an actual rockfall, the following 13 rebound lengths (in meters) were estimated. Compute the mean and median of the rebound lengths and interpret these values.

9.72; 10.94

10.94 13.71 11.38 7.26 17.83 11.92 11.87 5.44 13.35 4.90 5.85 5.10 6.77

Based on Paronuzzi, P. “Rockfall-induced block propagation on a soil slope, northern Italy,” Environmental Geology, Vol. 58, 2009 (Table 2).
PAI 2.62 Music performance anxiety. Refer to the British Journal of Music Education (Mar. 2014) study of music performance anxiety, Exercise 2.39 (p. 50). Scores on the Performance Anxiety Inventory (PAI) scale for participants in eight different studies are reproduced in the table.

54 42 51 39 41 43 55 40

Source: Patston, T. “Teaching stage fright? Implications for music educators.” British Journal of Music Education, Vol. 31, No. 1, Mar. 2014 (adapted from Figure 1).
1. Find and interpret the mean of the PAI scores.
  
  45.6
2. Find and interpret the median of the PAI scores.
  
  42.5
3. Suppose the PAI score of 39 results from a study involving development-delayed children; consequently, the researchers eliminated this data value from the analysis. What impact does this have on the value of the mean? The median?
  
  Increase; Slight Increase
SUSTAIN 2.63 Corporate sustainability of CPA firms. Refer to the Business and Society (Mar. 2011) study on the sustainability behaviors of CPA corporations, Exercise 2.36 (p. 50). Recall that the level of support for corporate sustainability (measured on a quantitative scale ranging from 0 to 160 points) was obtained for each of 992 senior managers at CPA firms. Numerical measures of central tendency for level of support are shown in the accompanying MINITAB printout.

MINITAB Output for Exercise 2.63
1. Locate the mean on the printout. Comment on the accuracy of the statement: “On average, the level of support for corporate sustainability for the 992 senior managers is 67.76 points.”
  
  Accurate
2. Locate the median on the printout. Comment on the accuracy of the statement: “Half of the 992 senior managers reported a level of support for corporate sustainability below 68 points.”
  
  Accurate
3. Locate the mode on the printout. Comment on the accuracy of the statement: “Most of the 992 senior managers reported a level of support for corporate sustainability below 64 points.”
  
  Inaccurate
4. Based on the values of the measures of central tendency, make a statement about the type of skewness (if any) that exists in the distribution of 992 support levels. Check your answer by examining the histogram shown in Exercise 2.36.
  
  Little Skewness

COUGH 2.64 Is honey a cough remedy? Refer to the Archives of Pediatrics and Adolescent Medicine (Dec. 2007) study of honey as a remedy for coughing, Exercise 2.40 (p. 51). Recall that the 105 ill children in the sample were randomly divided into three groups: those who received a dosage of an over-the-counter cough medicine (DM), those who received a dosage of honey (H), and those who received no dosage (control group). The coughing improvement scores for the patients are reproduced in the table below.

Honey Dosage:	12 11 15 11 10 13 10 4 15 16 9 14 10 6 10 8 11 12 12 8 12 9 11 15 10 15 9 13 8 12 10 8 9 5 12
DM Dosage:	4 6 9 4 7 7 7 9 12 10 11 6 3 4 9 12 7 6 8 12 12 4 12 13 7 10 13 9 4 4 10 15 9
No Dosage (Control):	5 8 6 1 0 8 12 8 7 7 1 6 7 7 12 7 9 7 9 5 11 9 5 6 8 8 6 7 10 9 4 8 7 3 1 4 3

Based on Paul, I. M., et al. “Effect of honey, dextromethorphan, and no treatment on nocturnal cough and sleep quality for coughing children and their parents.” Archives of Pediatrics and Adolescent Medicine, Vol. 161, No. 12, Dec. 2007 (data simulated).

1. Find the median improvement score for the honey dosage group.
  
  11
2. Find the median improvement score for the DM dosage group.
  
  9
3. Find the median improvement score for the control group.
  
  7
4. Based on the results, parts a–c, what conclusions can pediatric researchers draw? (We show how to support these conclusions with a measure of reliability in subsequent chapters.)

Applying the Concepts—Intermediate

MOLARS 2.65 Cheek teeth of extinct primates. Refer to the American Journal of Physical Anthropology (Vol. 142, 2010) study of the characteristics of cheek teeth (e.g., molars) in an extinct primate species, Exercise 2.38 (p. 50). The data on dentary depth of molars (in millimeters) for 18 cheek teeth extracted from skulls are reproduced below.

Data on Dentary Depth (mm) of Molars
18.12	16.55
19.48	15.70
19.36	17.83
15.94	13.25
15.83	16.12
19.70	18.13
15.76	14.02
17.00	14.04
13.96	16.20

Based on Boyer, D. M., Evans, A. R., and Jernvall, J. “Evidence of dietary differentiation among late Paleocene–early Eocene Plesiadapids (Mammalia, primates).” American Journal of Physical Anthropology, Vol. 142, © 2010 (Table A3).
1. Find and interpret the mean of the data set. If the largest depth measurement in the sample were doubled, how would the mean change? Would it increase or decrease?
  
  16.5; increase
2. Find and interpret the median of the data set. If the largest depth measurement in the sample were doubled, how would the median change? Would it increase or decrease?
  
  16.16; no change
3. Note that there is no single measurement that occurs more than once. How does this fact impact the mode?
  
  No mode

PGA 2.66 Ranking driving performance of professional golfers. A group of Northeastern University researchers developed a new method for ranking the total driving performance of golfers on the Professional Golf Association (PGA) tour (The Sport Journal, Winter 2007). The method requires knowing a golfer’s average driving distance (yards) and driving accuracy (percent of drives that land in the fairway). The values of these two variables are used to compute a driving performance index. Data for the top 40 PGA golfers (ranked by the new method) are saved in the PGA file. The first five and last five observations are listed in the accompanying table.

Rank	Player	Driving Distance (yards)	Driving Accuracy (%)	Driving Performance Index
1	Woods	316.1	54.6	3.58
2	Perry	304.7	63.4	3.48
3	Gutschewski	310.5	57.9	3.27
4	Wetterich	311.7	56.6	3.18
5	Hearn	295.2	68.5	2.82
⋮	⋮	⋮	⋮	⋮
36	Senden	291	66	1.31
37	Mickelson	300	58.7	1.30
38	Watney	298.9	59.4	1.26
39	Trahan	295.8	61.8	1.23
40	Pappas	309.4	50.6	1.17

Based on Wiseman, F., et al. “A new method for ranking total driving performance on the PGA Tour,” The Sport Journal, Vol. 10, No. 1, Winter 2007 (Table 2).

1. Find the mean, median, and mode for the 40 driving performance index values.
  
  1.93; 1.755; 1.4
2. Interpret each of the measures of central tendency calculated in part a.
3. Use the results from part a to make a statement about the type of skewness in the distribution of driving performance indexes. Support your statement with a graph.
  
  Skewed Right

2.67 Symmetric or skewed? Would you expect the data sets that follow to possess relative frequency distributions that are symmetric, skewed to the right, or skewed to the left? Explain.
1. The salaries of all persons employed by a large university
  
  Skewed right
2. The grades on an easy test
  
  Skewed left
3. The grades on a difficult test
  
  Skewed right
4. The amounts of time students in your class studied last week
  
  Symmetric
5. The ages of automobiles on a used-car lot
6. The amounts of time spent by students on a difficult examination (maximum time is 50 minutes)

ANTS 2.68 Mongolian desert ants. The Journal of Biogeography (Dec. 2003) published an article on the first comprehensive study of ants in Mongolia (Central Asia). Botanists placed seed baits at 11 study sites and observed the ant species attracted to each site. Some of the data recorded at each study site are provided in the table below.
1. Find the mean, median, and mode for the number of ant species discovered at the 11 sites. Interpret each of these values.
  
  12.82; 5; 4 and 5
2. Which measure of central tendency would you recommend to describe the center of the number-of-ant-species distribution? Explain.
  
  Median
3. Find the mean, median, and mode for the percentage of total plant cover at the five Dry Steppe sites only.
  
  40.4; 40; 40
4. Find the mean, median, and mode for the percentage of total plant cover at the six Gobi Desert sites only.
  
  28; 26; 30
5. On the basis of the results of parts c and d, does the center of the distribution for total plant cover percentage appear to differ between the two regions?
  
  Yes
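
The summary values asked for in this exercise can be checked with a few lines of Python. This is a minimal sketch, assuming Python 3.8 or later and only the standard `statistics` module. The variable names and the `summarize` helper are illustrative, not part of the exercise, and the values are transcribed from the exercise's data table.

```python
from statistics import mean, median, multimode

# Values transcribed from the exercise table (sites 1-11).
ant_species = [3, 3, 52, 7, 5, 49, 5, 4, 4, 5, 4]
plant_cover_dry_steppe = [40, 52, 40, 43, 27]    # sites 1-5
plant_cover_gobi = [30, 16, 30, 56, 22, 14]      # sites 6-11


def summarize(values):
    """Return (mean rounded to 2 places, median, list of modes)."""
    return round(mean(values), 2), median(values), multimode(values)


ants_summary = summarize(ant_species)
steppe_summary = summarize(plant_cover_dry_steppe)
gobi_summary = summarize(plant_cover_gobi)

for label, summary in [
    ("Ant species, all sites", ants_summary),
    ("Plant cover, Dry Steppe", steppe_summary),
    ("Plant cover, Gobi Desert", gobi_summary),
]:
    print(label, summary)
```

The two sites with 52 and 49 species pull the mean of the ant counts well above the median, which is why the median is the suggested measure of center for that variable.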

SAND 2.69 Permeability of sandstone during weathering. Natural stone, such as sandstone, is a popular building construction material. An experiment was carried out in order to better understand the decay properties of sandstone when exposed to the weather (Geographical Analysis, Vol. 42, 2010). Blocks of sandstone were cut into 300 equal-sized slices and the slices randomly divided into three groups of 100 slices each. Slices in group A were not exposed to any type of weathering; slices in group B were repeatedly sprayed with a 10% salt solution (to simulate wetting by driven rain) under temperate conditions; and slices in group C were soaked in a 10% salt solution and then dried (to simulate blocks of sandstone exposed during a wet winter and dried during a hot summer). All sandstone slices were then tested for permeability, measured in milliDarcies (mD). These permeability values measure pressure decay as a function of time. The data for the study (simulated) are saved in the SAND file. Measures of central tendency for the permeability measurements of each sandstone group are displayed in the MINITAB printout on p. 64.

Based on Pfeiffer, M., et al. “Community organization and species richness of ants in Mongolia along an ecological gradient from steppe to Gobi desert.” Journal of Biogeography, Vol. 30, No. 12, Dec. 2003 (Table 1 and 2).

Site	Region	Annual Rainfall (mm)	Max. Daily Temp. (°C)	Total Plant Cover (%)	Number of Ant Species	Species Diversity Index
1	Dry Steppe	196	5.7	40	3	.89
2	Dry Steppe	196	5.7	52	3	.83
3	Dry Steppe	179	7.0	40	52	1.31
4	Dry Steppe	197	8.0	43	7	1.48
5	Dry Steppe	149	8.5	27	5	.97
6	Gobi Desert	112	10.7	30	49	.46
7	Gobi Desert	125	11.4	16	5	1.23
8	Gobi Desert	99	10.9	30	4
9	Gobi Desert	125	11.4	56	4	.76
10	Gobi Desert	84	11.4	22	5	1.26
11	Gobi Desert	115	11.4	14	4	.69

1. Interpret the mean and median of the permeability measurements for group A sandstone slices.
2. Interpret the mean and median of the permeability measurements for group B sandstone slices.
3. Interpret the mean and median of the permeability measurements for group C sandstone slices.
4. Interpret the mode of the permeability measurements for group C sandstone slices.
5. The lower the permeability value, the slower the pressure decay in the sandstone over time. Which type of weathering (type B or type C) appears to result in faster decay?
  
  B

SILICA 2.70 Mineral flotation in water study. Refer to the Minerals Engineering (Vol. 46–47, 2013) study of the impact of calcium and gypsum on the flotation properties of silica in water, Exercise 2.48 (p. 53). The zeta potential (mV) was determined for each of 50 liquid solutions prepared without calcium/gypsum and for 50 liquid solutions prepared with calcium/gypsum.
1. Find the mean, median, and mode for the zeta potential measurements of the liquid solutions prepared without calcium/gypsum. Interpret these values.
  
  Mean: −52.07
2. Find the mean, median, and mode for the zeta potential measurements of the liquid solutions prepared with calcium/gypsum. Interpret these values.
  
  Mean: −10.96
3. In Exercise 2.48, you used graphs to compare the zeta potential distributions for the two types of solutions. Now use the measures of central tendency to make the comparison. How does the addition of calcium/gypsum to the solution impact water quality (measured by zeta potential of silica)?

Applying the Concepts—Advanced

MYOPIA 2.71 Contact lenses for myopia. Myopia (i.e., nearsightedness) is a visual condition that affects over 100 million Americans. Two treatments that may slow myopia progression are the use of (1) corneal reshaping contact lenses and (2) bifocal soft contact lenses. In Optometry and Vision Science (Jan., 2013), university optometry professors compared the two methods for treating myopia. A sample of 14 myopia patients participated in the study. Each patient was fitted with a contact lens of each type for the right eye, and the peripheral refraction was measured for each type of lens. The differences (bifocal soft minus corneal reshaping) are shown in the following table. (These data are simulated, based on information provided in the journal article.)

Peripheral refraction differences

−0.15	−8.11	−0.79	−0.80	−0.81	−0.39	−0.68
−1.13	−0.32	0.01	−0.63	0.05	−0.41	−1.11

1. Find measures of central tendency for the difference measurements and interpret their values.
  
  Mean: −1.09
2. Note that the data contain one unusually large (negative) difference relative to the other difference measurements. Find this difference. (In Section 2.7, we call this value an outlier.)
  
  −8.11
3. The large negative difference of −8.11 is actually a typographical error. The actual difference for this patient is −0.11. Rerun the analysis in part a using the corrected difference. Which measure of central tendency is most affected by correcting the outlier?
  
  Mean: −.52
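
The effect of the outlier on each measure can be seen by computing the mean and median before and after the correction. The following is a minimal sketch, assuming Python's standard `statistics` module; the variable names are illustrative, and the values are the 14 differences listed for this exercise.

```python
from statistics import mean, median

# The 14 peripheral refraction differences from the exercise table.
differences = [-0.15, -8.11, -0.79, -0.80, -0.81, -0.39, -0.68,
               -1.13, -0.32, 0.01, -0.63, 0.05, -0.41, -1.11]

# Replace the mistyped value -8.11 with the actual difference -0.11.
corrected = [-0.11 if d == -8.11 else d for d in differences]

# How far each measure moves once the outlier is corrected.
mean_shift = mean(corrected) - mean(differences)
median_shift = median(corrected) - median(differences)

print(f"mean:   {mean(differences):.3f} -> {mean(corrected):.3f}")
print(f"median: {median(differences):.3f} -> {median(corrected):.3f}")
print(f"shift in mean: {mean_shift:.3f}, shift in median: {median_shift:.3f}")
```

The mean moves by about 0.57 while the median moves by about 0.14, which illustrates that the mean is more sensitive to a single extreme value. No value repeats in this data set, so the mode is not informative here.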

NUC 2.72 Active nuclear power plants. The U.S. Energy Information Administration monitors all nuclear power plants operating in the United States. The table below lists the number of active nuclear power plants operating in each of a sample of 20 states.

Find the mean, median, and mode of this data set.

Data for Exercise 2.72

State	Number of Power Plants
Alabama	5
Arizona	3
California	4
Florida	5
Georgia	4
Illinois	11
Kansas	1
Louisiana	2
Massachusetts	1
Mississippi	1
New Hampshire	1
New York	6
North Carolina	5
Ohio	3
Pennsylvania	9
South Carolina	7
Tennessee	3
Texas	4
Vermont	1
Wisconsin	3

Based on Statistical Abstract of the United States, 2012 (Table 942). U.S. Energy Information Administration, Electric Power Annual.

Eliminate the largest value from the data set and repeat part a. What effect does dropping this measurement have on the measures of central tendency found in part a?

3.58; 3; 1
Arrange the 20 values in the table from lowest to highest. Next, eliminate the lowest two values and the highest two values from the data set, and find the mean of the remaining data values. The result is called a 10% trimmed mean, since it is calculated after removing the highest 10% and the lowest 10% of the data values. What advantages does a trimmed mean have over the regular arithmetic mean?

3.56
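
The trimmed-mean procedure described above (sort, drop the same share of values from each end, average what remains) can be sketched as follows. This is an illustrative sketch using only Python's standard `statistics` module; the function name `trimmed_mean` and its `proportion` parameter are assumptions made for this example, not notation from the text.

```python
from statistics import mean

# Number of active nuclear power plants in the 20 sampled states.
plants = [5, 3, 4, 5, 4, 11, 1, 2, 1, 1,
          1, 6, 5, 3, 9, 7, 3, 4, 1, 3]


def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the values from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)   # count to drop from each tail
    return mean(ordered[k:len(ordered) - k])


regular = mean(plants)                # 79 / 20 = 3.95
trimmed = trimmed_mean(plants, 0.10)  # 57 / 16 = 3.5625

print(f"regular mean: {regular}, 10% trimmed mean: {trimmed}")
```

With 20 values and a 10% trim, two values are removed from each end (1, 1 and 9, 11), so the extreme counts for Illinois and Pennsylvania no longer influence the result.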

Honey	12	11	15	11	10	13	10	4	15	16	9
Dosage:	14	10	6	10	8	11	12	12	8	12	9
	11	15	10	15	9	13	8	12	10	8	9
	5	12
DM	4	6	9	4	7	7	7	9	12	10	11
Dosage:	6	3	4	9	12	7	6	8	12	12	4
	12	13	7	10	13	9	4	4	10	15	9
No Dosage	5	8	6	1	0	8	12	8	7	7	1
(Control):	6	7	7	12	7	9	7	9	5	11	9
	5	6	8	8	6	7	10	9	4	8	7
	3	1	4	3
