2.2 Graphical Methods for Describing Quantitative Data

Teaching Tip

Explain that quantitative data must be condensed in some manner to generate any kind of meaningful graphical summary of data.

Recall from Section 1.4 that quantitative data sets consist of data that are recorded on a meaningful numerical scale. To describe, summarize, and detect patterns in such data, we can use three graphical methods: dot plots, stem-and-leaf displays, and histograms. Since most statistical software packages can be used to construct these displays, we’ll focus here on their interpretation rather than their construction.

For example, the Environmental Protection Agency (EPA) performs extensive tests on all new car models to determine their mileage ratings. Suppose that the 100 measurements in Table 2.2 represent the results of such tests on a certain new car model. How can we summarize the information in this rather large sample?

A visual inspection of the data indicates some obvious facts. For example, most of the mileages are in the 30s, with a smaller fraction in the 40s. But it is difficult to provide much additional information on the 100 mileage ratings without resorting to some method of summarizing the data. One such method is a dot plot.

Dot Plots

Teaching Tip

The dot plot condenses the data by grouping all values that are the same.

A MINITAB dot plot for the 100 EPA mileage ratings is shown in Figure 2.8. The horizontal axis of the figure is a scale for the quantitative variable in miles per gallon (mpg). The rounded (to the nearest half mile per gallon) numerical value of each measurement in the data set is located on the horizontal scale by a dot. When data values repeat, the dots are placed above one another, forming a pile at that particular numerical location. As you can see, this dot plot verifies that almost all of the mileage ratings are in the 30s, with most falling between 35 and 40 miles per gallon.

Table 2.2 EPA Mileage Ratings on 100 Cars

36.3 41.0 36.9 37.1 44.9
32.7 37.3 41.2 36.6 32.9
40.5 36.5 37.6 33.9 40.2
36.2 37.9 36.0 37.9 35.9
38.5 39.0 35.5 34.8 38.6
36.3 36.8 32.5 36.4 40.5
41.0 31.8 37.3 33.1 37.0
37.0 37.2 40.7 37.4 37.1
37.1 40.3 36.7 37.0 33.9
39.9 36.9 32.9 33.8 39.8
36.8 30.0 37.2 42.1 36.7
36.5 33.2 37.4 37.5 33.6
36.4 37.7 37.7 40.0 34.2
38.2 38.3 35.7 35.6 35.1
39.4 35.3 34.4 38.8 39.7
36.6 36.1 38.2 38.4 39.3
37.6 37.0 38.7 39.0 35.8
37.8 35.9 35.6 36.7 34.5
40.1 38.0 35.2 34.8 39.5
34.0 36.8 35.0 38.1 36.9

Data Set: EPAGAS

Figure 2.8

MINITAB dot plot for 100 EPA mileage ratings
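The piling of repeated values in a dot plot can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not the procedure MINITAB uses: the data are the first 15 measurements of Table 2.2 (its first three rows), and the names `sample`, `dot_piles`, and `print_dot_plot` are hypothetical. Each value is rounded to the nearest half unit, and equal rounded values are stacked.

```python
from collections import Counter

# First 15 measurements of Table 2.2 (first three rows), used only as a small sample.
sample = [36.3, 41.0, 36.9, 37.1, 44.9,
          32.7, 37.3, 41.2, 36.6, 32.9,
          40.5, 36.5, 37.6, 33.9, 40.2]


def dot_piles(values, step=0.5):
    """Round each value to the nearest multiple of `step` and count the pile heights."""
    return Counter(round(v / step) * step for v in values)


def print_dot_plot(piles):
    """Print one row per occupied position, with one '.' per measurement in the pile."""
    for position in sorted(piles):
        print(f"{position:5.1f} | {'.' * piles[position]}")


piles = dot_piles(sample)
print_dot_plot(piles)
```

The printed plot is turned on its side (one row per position rather than one column), which keeps the sketch short; the pile heights are the same as in a conventional dot plot.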

Stem-and-Leaf Display

Another graphical representation of these same data, a MINITAB stem-and-leaf display, is shown in Figure 2.9. In this display, the stem is the portion of the measurement (mpg) to the left of the decimal point, while the remaining portion, to the right of the decimal point, is the leaf.

Teaching Tip

The stem-and-leaf display condenses the data by grouping all data with the same stem.

In Figure 2.9, the stems for the data set are listed in the second column, from the smallest (30) to the largest (44). Then the leaf for each observation is listed to the right, in the row of the display corresponding to the observation’s stem. For example, the leaf 3 of the first observation (36.3) in Table 2.2 appears in the row corresponding to the stem 36. Similarly, the leaf 7 for the second observation (32.7) in Table 2.2 appears in the row corresponding to the stem 32, while the leaf 5 for the third observation (40.5) appears in the row corresponding to the stem 40. (The stems and leaves for these first three observations are highlighted in Figure 2.9.) Typically, the leaves in each row are ordered as shown in the MINITAB stem-and-leaf display.

Teaching Tip

Choices for the stems and the leaves are critical to producing the most meaningful stem-and-leaf display. Encourage students to try different options until they produce the display that they think best characterizes the data.

Figure 2.9

MINITAB stem-and-leaf display for 100 mileage ratings

The stem-and-leaf display presents another compact picture of the data set. You can see at a glance that the 100 mileage readings were distributed between 30.0 and 44.9, with most of them falling in stem rows 35 to 39. The 6 leaves in stem row 34 indicate that 6 of the 100 readings were at least 34.0, but less than 35.0. Similarly, the 11 leaves in stem row 35 indicate that 11 of the 100 readings were at least 35.0, but less than 36.0. Only five cars had readings equal to 41 or larger, and only one was as low as 30.

Biography John Tukey (1915–2000)

The Picasso of Statistics

Like the legendary artist Pablo Picasso, who mastered and revolutionized a variety of art forms during his lifetime, John Tukey is recognized for his contributions to many subfields of statistics. Born in Massachusetts, Tukey was home schooled, graduated with his bachelor’s and master’s degrees in chemistry from Brown University, and received his Ph.D. in mathematics from Princeton University. While at Bell Telephone Laboratories in the 1960s and early 1970s, Tukey developed exploratory data analysis, a set of graphical descriptive methods for summarizing and presenting huge amounts of data. Many of these tools, including the stem-and-leaf display and the box plot, are now standard features of modern statistical software packages. (In fact, it was Tukey himself who coined the term software for computer programs.)

The definitions of the stem and leaf for a data set can be modified to alter the graphical description. For example, suppose we had defined the stem as the tens digit for the gas mileage data, rather than the ones and tens digits. With this definition, the stems and leaves corresponding to the measurements 36.3 and 32.7 would be as follows:

Measurement Stem Leaf
36.3 3 6
32.7 3 2

Note that the decimal portion of the numbers has been dropped. Generally, only one digit is displayed in the leaf.

If you look at the data, you’ll see why we didn’t define the stem this way. All the mileage measurements fall into the 30s and 40s, so all the leaves would fall into just two stem rows in this display. The resulting picture would not be nearly as informative as Figure 2.9.
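The effect of the two stem definitions can be checked with a short Python sketch. It is an illustration only: it uses the first 15 measurements of Table 2.2 rather than all 100, it assumes nonnegative values recorded to one decimal place, and the function names (`split_fine`, `split_coarse`, `stem_and_leaf`) are our own, not part of any statistical package.

```python
from collections import defaultdict

# First 15 measurements of Table 2.2 (first three rows), used only as a small sample.
sample = [36.3, 41.0, 36.9, 37.1, 44.9,
          32.7, 37.3, 41.2, 36.6, 32.9,
          40.5, 36.5, 37.6, 33.9, 40.2]


def split_fine(x):
    """Stem = digits left of the decimal point, leaf = tenths digit."""
    return divmod(round(x * 10), 10)


def split_coarse(x):
    """Stem = tens digit, leaf = ones digit; the decimal portion is dropped."""
    return divmod(int(x), 10)


def stem_and_leaf(values, split):
    """Group leaves by stem, list every stem from smallest to largest, sort each row."""
    rows = defaultdict(list)
    for x in values:
        stem, leaf = split(x)
        rows[stem].append(leaf)
    return {stem: sorted(rows.get(stem, []))
            for stem in range(min(rows), max(rows) + 1)}


fine = stem_and_leaf(sample, split_fine)
coarse = stem_and_leaf(sample, split_coarse)

for name, display in (("Stem = integer part", fine), ("Stem = tens digit", coarse)):
    print(name)
    for stem, leaves in display.items():
        print(f"{stem:2d} | {''.join(str(leaf) for leaf in leaves)}")
```

With the coarser definition, every leaf lands in one of only two rows (stems 3 and 4), which is the loss of detail described above.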

Now Work Exercise 2.31

Histograms

An SPSS histogram for the 100 EPA mileage readings given in Table 2.2 is shown in Figure 2.10. The horizontal axis of the figure, which gives the miles per gallon for a given automobile, is divided into class intervals, commencing with the interval from 30–31 and proceeding in intervals of equal size to 44–45 mpg. The vertical axis gives the number (or frequency) of the 100 readings that fall into each interval. It appears that about 21 of the 100 cars, or 21%, attained a mileage between 37 and 38 mpg. This class interval contains the highest frequency, and the intervals tend to contain a smaller number of the measurements as the mileages get smaller or larger.

Teaching Tip

The histogram condenses data by grouping similar data values into the same class.

Histograms can be used to display either the frequency or relative frequency of the measurements falling into the class intervals. The class intervals, frequencies, and relative frequencies for the EPA car mileage data are shown in the summary table, Table 2.3.

Figure 2.10

SPSS histogram for 100 EPA gas mileage ratings

By summing the relative frequencies in the intervals 35–36, 36–37, 37–38, and 38–39, you find that 62% of the mileages are between 35 and 39. Similarly, only 2% of the cars obtained a mileage rating over 42.0. Many other summary statements can be made by further examining the histogram and accompanying summary table. Note that the sum of all class frequencies will always equal the sample size n.

Teaching Tip

Classes of equal width should be used in generating a histogram.

In interpreting a histogram, consider two important facts. First, the proportion of the total area under the histogram that falls above a particular interval on the x-axis is equal to the relative frequency of measurements falling into that interval. For example, the relative frequency for the class interval 37–38 in Figure 2.10 is .21. Consequently, the rectangle above the interval contains .21 of the total area under the histogram.
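The link between area and relative frequency can be verified numerically. The sketch below is a minimal illustration that takes the class frequencies from Table 2.3 and assumes equal-width classes with bar height equal to the class frequency; the names `frequencies` and `area_shares` are hypothetical.

```python
# Class frequencies from Table 2.3, keyed by the lower class limit (30 means 30–31).
frequencies = {30: 1, 31: 1, 32: 4, 33: 6, 34: 6, 35: 11, 36: 20, 37: 21,
               38: 10, 39: 8, 40: 7, 41: 3, 42: 1, 43: 0, 44: 1}


def area_shares(freqs, width=1.0):
    """Share of the total bar area above each class (bar height = class frequency)."""
    areas = {lower: width * f for lower, f in freqs.items()}
    total_area = sum(areas.values())
    return {lower: area / total_area for lower, area in areas.items()}


shares = area_shares(frequencies)
n = sum(frequencies.values())

# The area share above 37–38 equals that class's relative frequency, 21/n.
print(f"n = {n}, area share above 37–38 = {shares[37]:.2f}")
```

Because every bar has the same width, the width cancels in the ratio, so the result does not depend on the value passed for `width`.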

Teaching Tip

When constructing histograms, use more classes as the number of values in the data set gets larger.

Table 2.3 Class Intervals, Frequencies, and Relative Frequencies for the Gas Mileage Data

Class Interval Frequency Relative Frequency
30–31 1 0.01
31–32 1 0.01
32–33 4 0.04
33–34 6 0.06
34–35 6 0.06
35–36 11 0.11
36–37 20 0.20
37–38 21 0.21
38–39 10 0.10
39–40 8 0.08
40–41 7 0.07
41–42 3 0.03
42–43 1 0.01
43–44 0 0.00
44–45 1 0.01
Totals 100 1.00
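A summary table of this kind can be rebuilt directly from the raw data. The following Python sketch is one way to do it, not the method used by SPSS: it assumes each class contains its lower endpoint but not its upper endpoint (so 37.0 falls in 37–38), and the names `mileages` and `frequency_table` are our own.

```python
import math
from collections import Counter

# The 100 EPA mileage ratings of Table 2.2, entered row by row.
mileages = [
    36.3, 41.0, 36.9, 37.1, 44.9, 32.7, 37.3, 41.2, 36.6, 32.9,
    40.5, 36.5, 37.6, 33.9, 40.2, 36.2, 37.9, 36.0, 37.9, 35.9,
    38.5, 39.0, 35.5, 34.8, 38.6, 36.3, 36.8, 32.5, 36.4, 40.5,
    41.0, 31.8, 37.3, 33.1, 37.0, 37.0, 37.2, 40.7, 37.4, 37.1,
    37.1, 40.3, 36.7, 37.0, 33.9, 39.9, 36.9, 32.9, 33.8, 39.8,
    36.8, 30.0, 37.2, 42.1, 36.7, 36.5, 33.2, 37.4, 37.5, 33.6,
    36.4, 37.7, 37.7, 40.0, 34.2, 38.2, 38.3, 35.7, 35.6, 35.1,
    39.4, 35.3, 34.4, 38.8, 39.7, 36.6, 36.1, 38.2, 38.4, 39.3,
    37.6, 37.0, 38.7, 39.0, 35.8, 37.8, 35.9, 35.6, 36.7, 34.5,
    40.1, 38.0, 35.2, 34.8, 39.5, 34.0, 36.8, 35.0, 38.1, 36.9,
]


def frequency_table(values, low, high):
    """Rows of (lower, upper, frequency, relative frequency) for classes of width 1."""
    counts = Counter(math.floor(v) for v in values)
    n = len(values)
    return [(lower, lower + 1, counts[lower], counts[lower] / n)
            for lower in range(low, high)]


table = frequency_table(mileages, 30, 45)
for lower, upper, freq, rel in table:
    print(f"{lower}–{upper} {freq:3d} {rel:.2f}")

# Number of readings in the four classes 35–36 through 38–39.
in_35_to_39 = sum(freq for lower, _, freq, _ in table if 35 <= lower <= 38)
print("Total:", sum(freq for _, _, freq, _ in table), "| 35 to 39:", in_35_to_39)
```

Summing the frequency column returns the sample size, which is a quick check that no measurement was dropped or counted twice.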

Second, imagine the appearance of the relative frequency histogram for a very large set of data (representing, say, a population). As the number of measurements in a data set is increased, you can obtain a better description of the data by decreasing the width of the class intervals. When the class intervals become small enough, a relative frequency histogram will (for all practical purposes) appear as a smooth curve. (See Figure 2.11.)

Figure 2.11

The effect of the size of a data set on the outline of a histogram

Some recommendations for selecting the number of intervals in a histogram for smaller data sets are given in the following box:

Determining the Number of Classes in a Histogram

Number of Observations in Data Set Number of Classes
Fewer than 25 5–6
25–50 7–14
More than 50 15–20
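The box can be read as a simple lookup rule. The function below is a minimal sketch of that rule; the name `recommended_classes` is hypothetical, and the ranges are guidelines rather than requirements.

```python
def recommended_classes(n):
    """Return the (fewest, most) classes suggested in the box for n observations."""
    if n < 25:
        return (5, 6)
    if n <= 50:
        return (7, 14)
    return (15, 20)


# The EPA mileage data set has 100 observations.
print(recommended_classes(100))
```

For the 100 EPA mileage ratings the rule suggests 15 to 20 classes, and Table 2.3 uses 15.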

While histograms provide good visual descriptions of data sets—particularly very large ones—they do not let us identify individual measurements. In contrast, each of the original measurements is visible to some extent in a dot plot and is clearly visible in a stem-and-leaf display. The stem-and-leaf display arranges the data in ascending order, so it’s easy to locate the individual measurements. For example, in Figure 2.9 we can easily see that two of the gas mileage measurements are equal to 36.3, but we can’t see that fact by inspecting the histogram in Figure 2.10. However, stem-and-leaf displays can become unwieldy for very large data sets. A very large number of stems and leaves causes the vertical and horizontal dimensions of the display to become cumbersome, diminishing the usefulness of the visual display.

Example 2.2 Graphing a Quantitative Variable—The “Water-Level Task”

Problem

  1. Over 60 years ago, famous child psychologist Jean Piaget devised a test of basic perceptual and conceptual skills dubbed the “water-level task.” Subjects were shown a drawing of a glass being held at a 45° angle and asked to draw a line representing the true surface of the water. Today, research psychologists continue to use the task to test the perception of both adults and children. In one study, the water-level task was given to several groups that included 20 male bartenders and 20 female waitresses (Psychological Science, Mar. 1995). For each participant, the researchers measured the deviation (in angle degrees) of the judged line from the true line. These deviations (simulated on the basis of summary results presented in the journal article) are shown in Table 2.4. [Note: Deviations can be negative if the judged angle is smaller than the angle of the true line.]

    Teaching Tip

    Have students use a dot plot to describe the data in Table 2.4. Compare to the graphs of Example 2.2.

    Table 2.4 Water-Level Task Deviations (angle degrees)

    Bartenders: 9  6 10 6 10 3  7 8  6 14  7 8 5  2 1   0 2 3   0   2
    Waitresses:   7 10 25 8 10   8 12 9 35 10 12 11   7 10 21 1 4 0 16 1

    Data Set: WLTASK

    1. Use a statistical software package to create a frequency histogram for the combined data in Table 2.4. Then, shade the area under the histogram that corresponds to deviations recorded for waitresses. Interpret the result.

    2. Use a statistical software package to create a stem-and-leaf display for these combined data. Again, shade each leaf of the display that corresponds to a deviation recorded for a waitress. Interpret the result.

Solution

  1. We used SPSS to generate the frequency histogram, shown in Figure 2.12. Note that SPSS formed 20 classes, with class intervals −20 to −15, −15 to −10, …, 30 to 35, and 35 to 40. This histogram clearly shows the clustering of the deviation angles between 0° and 15°, with a few deviations in the upper end of the distribution (greater than 20°). SPSS used green bars to shade the areas of the histogram that correspond to the measurements for waitresses. The graph clearly shows that waitresses tend to have greater (positive) deviations than do bartenders and fewer deviations near 0° relative to bartenders.

    Figure 2.12

    SPSS histogram for task deviations

  2. We used MINITAB to produce the stem-and-leaf display in Figure 2.13. Note that the stem (the second column on the printout) represents the first digit (including 0) in the deviation angle measurement while the leaf (the third column on the printout) represents the second digit. Thus, the leaf 5 in the stem 2 row represents the deviation angle of 25°. The shaded leaves represent deviations recorded for waitresses. As with the histogram, the stem-and-leaf display shows that deviations for waitresses tend to appear in the upper tail of the distribution. Together, the graphs imply that waitresses tend to overestimate the angle of the true line relative to bartenders.

Look Back

As is usually the case with data sets that are not too large (say, fewer than 100 measurements), the stem-and-leaf display provides more detail than the histogram without being unwieldy. For instance, the stem-and-leaf display in Figure 2.13 clearly indicates the values of the individual measurements in the data set: the largest deviation angle (representing the measurement 35°) is shown in the last stem row. By contrast, histograms are most useful for displaying very large data sets when the overall shape of the distribution of measurements is more important than the identification of individual measurements.

Figure 2.13

MINITAB stem-and-leaf display for task deviations

Now Work Exercise 2.44

Most statistical software packages can be used to generate histograms, stem-and-leaf displays, and dot plots. All three are useful tools for graphically describing data sets. We recommend that you generate and compare the displays whenever you can.

Summary of Graphical Descriptive Methods for Quantitative Data

  • Dot Plot: The numerical value of each quantitative measurement in the data set is represented by a dot on a horizontal scale. When data values repeat, the dots are placed above one another vertically.

  • Stem-and-Leaf Display: The numerical value of the quantitative variable is partitioned into a “stem” and a “leaf.” The possible stems are listed in order in a column. The leaf for each quantitative measurement in the data set is placed in the corresponding stem row. Leaves for observations with the same stem value are listed in increasing order horizontally.

  • Histogram: The possible numerical values of the quantitative variable are partitioned into class intervals, each of which has the same width. These intervals form the scale of the horizontal axis. The frequency or relative frequency of observations in each class interval is determined. A vertical bar is placed over each class interval, with the height of the bar equal to either the class frequency or class relative frequency.

Statistics in Action Revisited

Interpreting Histograms for the Body Image Data

In the Body Image: An International Journal of Research (Jan. 2010) study of 92 BDD patients, the researchers asked each patient to respond to a series of questions on body image (e.g., “How satisfied are you with your physical attractiveness and looks?”). Recall that the scores were summed to yield an Appearance Evaluation score that ranged from 7 to 35 points. This score represents a quantitative variable. Consequently, to graphically investigate whether BDD females tend to be more dissatisfied with their looks than BDD males, we can form side-by-side histograms for the total score, one histogram for females and one for males. These histograms are shown in Figure SIA2.4.

Like the pie charts in the previous Statistics in Action Revisited section, the histograms tend to support the theory. For females, the histogram for appearance evaluation score is centered at about 17 points, while for males the histogram is centered higher, at about 20 points. Also from the histograms you can see that about 55% of the female patients had a score of less than 20, compared to only about 45% of the males. Again, the histograms seem to indicate that BDD females tend to be more dissatisfied with their looks than males. In later chapters, we’ll learn how to attach a measure of reliability to such an inference.

Data Set: BDD

Figure SIA2.4

MINITAB side-by-side histograms for Appearance Evaluation by Gender

Exercises 2.25–2.48

Understanding the Principles

  1. 2.25 Explain the difference between a dot plot and a stem-and-leaf display.

  2. 2.26 Explain the difference between a bar graph and a histogram.

  3. 2.27 Explain the difference between the stem and the leaf in a stem-and-leaf display.

  4. 2.28 In a histogram, what are the class intervals?

  5. 2.29 How many classes are recommended in a histogram of a data set with more than 50 observations?

Learning the Mechanics

  1. 2.30 Consider the MINITAB histogram shown below.

    1. Is this a frequency histogram or a relative frequency histogram? Explain.

    2. How many class intervals were used in the construction of this histogram?

    3. How many measurements are there in the data set described by this histogram?

  2. 2.31 Consider the stem-and-leaf display shown here:

    Stem Leaf
    5 1
    4 457
    3 00036
    2 1134599
    1 2248
    0 012
    1. How many observations were in the original data set?

    2. In the bottom row of the stem-and-leaf display, identify the stem, the leaves, and the numbers in the original data set represented by this stem and its leaves.

    3. Re-create all the numbers in the data set, and construct a dot plot.

  3. 2.32 Graph the relative frequency histogram for the 500 measurements summarized in the accompanying relative frequency table.

    Class Interval Relative Frequency
    .5–2.5 .10
    2.5–4.5 .15
    4.5–6.5 .25
    6.5–8.5 .20
    8.5–10.5 .05
    10.5–12.5 .10
    12.5–14.5 .10
    14.5–16.5 .05
  4. 2.33 Refer to Exercise 2.32. Calculate the number of the 500 measurements falling into each of the measurement classes. Then graph a frequency histogram of these data.

Applying the Concepts—Basic

  1. ISR 2.34 Irrelevant speech effects. In a psychological study of short-term memory, irrelevant speech effects refer to the degree to which the memorization process is impaired by irrelevant background speech (for example, trying to memorize a list of numbers while listening to a speech in an unfamiliar language). An analysis of irrelevant speech effects was carried out and published in Acoustical Science & Technology (Vol. 35, 2014). Subjects performed the memorization task under two conditions: (1) with irrelevant background speech and (2) in silence. The difference in the error rates for the two conditions—called the relative difference in error rate (RDER)—was computed for each subject. A MINITAB histogram summarizing the RDER values for 71 subjects is displayed here.

    1. Convert the frequency histogram into a relative frequency histogram.

    2. What proportion of the subjects had RDER values between 75 and 105?

    3. What proportion of the subjects had RDER values below 15?

  2. 2.35 Stability of compounds in new drugs. Testing the metabolic stability of compounds used in drugs is the cornerstone of new drug discovery. Two important values computed from the testing phase are the fraction of compound unbound to plasma (fup) and the fraction of compound unbound to microsomes (fumic). A key formula for assessing stability assumes that the fup/fumic ratio is 1. Pharmacologists at Pfizer Global Research and Development investigated this phenomenon and reported the results in ACS Medicinal Chemistry Letters (Vol. 1, 2010). The fup/fumic ratio was determined for each of 416 drugs in the Pfizer database. A graph describing the fup/fumic ratios is shown below.

    1. What type of graph is displayed?

    2. What is the quantitative variable summarized in the graph?

    3. Determine the proportion of fup/fumic ratios that fall above 1.

    4. Determine the proportion of fup/fumic ratios that fall below .4.

  3. SUSTAIN 2.36 Corporate sustainability of CPA firms. Refer to the Business and Society (Mar. 2011) study on the sustainability behaviors of CPA corporations, Exercise 1.26 (p. 22). Recall that corporate sustainability refers to business practices designed around social and environmental considerations. Data on the level of support for corporate sustainability were obtained for 992 senior managers. Level of support was measured quantitatively. Simulation was used to convert the data from the study to a scale ranging from 0 to 160 points, where higher point values indicate a higher level of support for sustainability.

    1. A histogram for level of support for sustainability is shown below. What type of histogram is produced, frequency or relative frequency?

    2. Use the graph to estimate the percentage of the 992 senior managers who reported a high (100 points or greater) level of support for corporate sustainability.

  4. SHAFTS 2.37 Shaft graves in ancient Greece. Archeologists have discovered a rise in shaft graves during the Middle Helladic period in ancient Greece (i.e., around 2000 BC). Shaft graves are named for the beautifully decorated sword shafts that are buried along with the bodies. An analysis of shaft graves was carried out and published in the American Journal of Archaeology (Jan. 2014). The table below gives the number of shafts buried at each of 13 recently discovered grave sites. Construct a dot plot for the data. What number of sword shafts was observed most often in the sample of 13 graves?

    1 2 3 1 5 6 2 4 1 2 4 2 9

    Source: Harrell, K. “The fallen and their swords: A new explanation for the rise of the shaft graves.” American Journal of Archaeology, Vol. 118, No. 1, January 2014 (Figure 1).

  5. MOLARS 2.38 Cheek teeth of extinct primates. Refer to the American Journal of Physical Anthropology (Vol. 142, 2010) study of the characteristics of cheek teeth (e.g., molars) in an extinct primate species, Exercise 2.9 (p. 38). In addition to degree of wear, the researchers recorded the dentary depth of molars (in millimeters) for 18 cheek teeth extracted from skulls. These depth measurements are listed in the accompanying table. Summarize the data graphically with a stem-and-leaf display. Is there a particular molar depth that occurs more frequently in the sample?

    Data on Dentary Depth (mm) of Molars
    18.12 16.55
    19.48 15.70
    19.36 17.83
    15.94 13.25
    15.83 16.12
    19.70 18.13
    15.76 14.02
    17.00 14.04
    13.96 16.20

    Based on Boyer, D. M., Evans, A. R., and Jernvall, J. “Evidence of dietary differentiation among late Paleocene–early Eocene Plesiadapids (Mammalia, primates).” American Journal of Physical Anthropology, Vol. 142, © 2010 (Table A3).

  6. PAI 2.39 Music performance anxiety. The nature of performance anxiety by music students was investigated in the British Journal of Music Education (Mar. 2014). Symptoms of music performance anxiety include increased heart rate, shallow breathing, anxious thoughts, and the avoidance of practice. A Performance Anxiety Inventory (PAI)—measured on a scale from 20 to 80 points—was developed to measure music performance anxiety. The table below gives average PAI values for participants in eight different studies.

    54 42 51 39 41 43 55 40

    Source: Patston, T. “Teaching stage fright? Implications for music educators.” British Journal of Music Education, Vol. 31, No. 1, Mar. 2014 (adapted from Figure 1).

    1. Construct a stem-and-leaf plot for the data.

    2. Locate the PAI value of 42 on the plot from part 1.

    3. Based on the graph, which of the following PAI score ranges is most likely to occur, 20–29, 30–39, 40–49, 50–59, 60–69, or 70–79?

Applying the Concepts—Intermediate

  1. COUGH 2.40 Is honey a cough remedy? Does a teaspoon of honey before bed really calm a child’s cough? To test the folk remedy, pediatric researchers at Pennsylvania State University carried out a designed study involving a sample of 105 children who were ill with an upper respiratory tract infection (Archives of Pediatrics and Adolescent Medicine, Dec. 2007). On the first night, parents rated their children’s cough symptoms on a scale from 0 (no problems at all) to 30 (extremely severe). On the second night, the parents were instructed to give their sick child a dosage of liquid “medicine” prior to bedtime. Unknown to the parents, some were given a dosage of dextromethorphan (DM)—an over-the-counter cough medicine—while others were given a similar dose of honey. Also, a third group of parents (the control group) gave their sick children no dosage at all. Again, the parents rated their children’s cough symptoms, and the improvement in total cough symptoms score was determined for each child. The data (improvement scores) for the study are shown in the table below.

    1. Construct a dot plot for the coughing improvement scores for the 35 children in the honey dosage group.

    2. Refer to part 1. What coughing improvement score occurred most often in the honey dosage group?

      Honey Dosage: 12 11 15 11 10 13 10 4 15 16 9 14 10 6 10 8 11 12 12 8 12 9 11 15 10 15 9 13 8 12 10 8 9 5 12
      DM Dosage: 4 6 9 4 7 7 7 9 12 10 11 6 3 4 9 12 7 6 8 12 12 4 12 13 7 10 13 9 4 4 10 15 9
      No Dosage (Control): 5 8 6 1 0 8 12 8 7 7 1 6 7 7 12 7 9 7 9 5 11 9 5 6 8 8 6 7 10 9 4 8 7 3 1 4 3

      Based on Paul, I. M., et al. “Effect of honey, dextromethorphan, and no treatment on nocturnal cough and sleep quality for coughing children and their parents.” Archives of Pediatrics and Adolescent Medicine, Vol. 161, No. 12, Dec. 2007 (data simulated).

    3. A MINITAB dot plot for the improvement scores of all three groups is shown below. Note that the green dots represent the children who received a dose of honey, the red dots represent those who got the DM dosage, and the black dots represent the children in the control group. What conclusions can pediatric researchers draw from the graph? Do you agree with the statement (extracted from the article), “Honey may be a preferable treatment for the cough and sleep difficulty associated with childhood upper respiratory tract infection”?

  2. SANIT 2.41 Sanitation inspection of cruise ships. To minimize the potential for gastrointestinal disease outbreaks, all passenger cruise ships arriving at U.S. ports are subject to unannounced sanitation inspections. Ships are rated on a 100-point scale by the Centers for Disease Control and Prevention. A score of 86 or higher indicates that the ship is providing an accepted standard of sanitation. The latest (as of Aug. 2013) sanitation scores for 186 cruise ships are saved in the SANIT file. The first five and last five observations in the data set are listed in the following table:

    Ship Name Sanitation Score
    Adonia  96
    Adventure of the Seas  93
    AIDAaura  86
    AIDAbella  95
    AIDAluna  93
    Voyager of the Seas  96
    VSP beta 100
    Westerdam  98
    Zaandam 100
    Zuiderdam  96

    Based on National Center for Environmental Health, Centers for Disease Control and Prevention, Aug. 5, 2013.

    1. Generate a stem-and-leaf display of the data. Identify the stems and leaves of the graph.

    2. Use the stem-and-leaf display to estimate the proportion of ships that have an accepted sanitation standard.

    3. Locate the inspection score of 69 (MS Columbus 2) on the stem-and-leaf display.

    MINITAB dot plot for Exercise 2.40

  3. SPIDER 2.42 Crab spiders hiding on flowers. Crab spiders use camouflage to hide on flowers while lying in wait to prey on other insects. Ecologists theorize that this natural camouflage also enables the spiders to hide from their own predators, such as birds and lizards. Researchers at the French Museum of Natural History conducted a field test of this theory and published the results in Behavioral Ecology (Jan. 2005). They collected a sample of 10 adult female crab spiders, each sitting on the yellow central part of a daisy. The chromatic contrast between each spider and the flower it was sitting on was measured numerically with a spectroradiometer, on which higher values indicate a greater contrast (and, presumably, easier detection by predators). The data for the 10 crab spiders are shown in the following table.

    57 75 116 37 96 61 56 2 43 32

    Based on Thery, M., et al. “Specific color sensitivities of prey and predator explain camouflage in different visual systems.” Behavioral Ecology, Vol. 16, No. 1, Jan. 2005 (Table 1).

    1. Summarize the chromatic contrast measurements for the 10 spiders with a stem-and-leaf display.

    2. For birds, the detection threshold is 70. (A contrast of 70 or greater allows the bird to see the spider.) Locate the spiders that can be seen by bird predators by circling their respective contrast values on the stem-and-leaf display.

    3. Use the result of part 2 to make an inference about the likelihood of a bird detecting a crab spider sitting on the yellow central part of a daisy.

  4. BBALL 2.43 Sound waves from a basketball. An experiment was conducted to characterize sound waves in a spherical cavity (American Journal of Physics, June 2010). A fully inflated basketball, hanging from rubber bands, was struck with a metal rod, producing a series of metallic-sounding pings. Of particular interest were the frequencies of sound waves resulting from the first 24 resonances (echoes). A mathematical formula, well known in physics, was used to compute the theoretical frequencies. These frequencies (measured in hertz) are listed in the table. Use a graphical method to describe the distribution of sound frequencies for the first 24 resonances.

    Resonance Frequency
     1  979
     2 1572
     3 2113
     4 2122
     5 2659
     6 2795
     7 3181
     8 3431
     9 3638
    10 3694
    11 4038
    12 4203
    13 4334
    14 4631
    15 4711
    16 4993
    17 5130
    18 5210
    19 5214
    20 5633
    21 5779
    22 5836
    23 6259
    24 6339

    Based on Russell, D. A. “Basketballs as spherical acoustic cavities.” American Journal of Physics, Vol. 78, No. 6, June 2010 (Table I).

  5. BULIMIA 2.44 Research on eating disorders. Data from a psychology experiment were reported and analyzed in The American Statistician (May 2001). Two samples of female students participated in the experiment. One sample consisted of 11 students known to suffer from the eating disorder bulimia; the other sample consisted of 14 students with normal eating habits. Each student completed a questionnaire from which a “fear of negative evaluation” (FNE) score was produced. (The higher the score, the greater was the fear of negative evaluation.) The data are displayed in the following table:

    Bulimic students: 21 13 10 20 25 19 16 21 24 13 14
    Normal students: 13  6 16 13  8 19 23 18 11 19  7 10 15 20

    Based on Randles, R. H. “On neutral responses (zeros) in the sign test and ties in the Wilcoxon–Mann–Whitney test.” The American Statistician, Vol. 55, No. 2, May 2001 (Figure 3).

    a. Construct a dot plot or stem-and-leaf display for the FNE scores of all 25 female students.

    b. Highlight the bulimic students on the graph you made in part a. Does it appear that bulimics tend to have a greater fear of negative evaluation? Explain.

    c. Why is it important to attach a measure of reliability to the inference made in part b?
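As a rough sketch of how the stem-and-leaf display for this exercise could be built, the following uses the tens digit as the stem and the units digit as the leaf. The helper name `stem_and_leaf` and the two-column layout used to separate the groups are assumptions made for illustration; a software package may split stems or arrange the display differently.

```python
def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted list of leaves (units digit)."""
    stems = {}
    for v in sorted(values):
        stems.setdefault(v // 10, []).append(v % 10)
    return stems


# FNE scores copied from the table.
bulimic = [21, 13, 10, 20, 25, 19, 16, 21, 24, 13, 14]
normal = [13, 6, 16, 13, 8, 19, 23, 18, 11, 19, 7, 10, 15, 20]

all_stems = stem_and_leaf(bulimic + normal)
bulimic_stems = stem_and_leaf(bulimic)
normal_stems = stem_and_leaf(normal)

# Display for all 25 students.
print("All 25 students")
for stem, leaves in all_stems.items():
    print(f"{stem} | {' '.join(map(str, leaves))}")

# Same stems, with the leaves separated by group so the bulimic students stand out.
print("\nstem | bulimic leaves | normal leaves")
for stem in all_stems:
    b = "".join(map(str, bulimic_stems.get(stem, [])))
    n = "".join(map(str, normal_stems.get(stem, [])))
    print(f"{stem:>4} | {b:<14} | {n}")
```

Separating the leaves by group is one way to do the highlighting; marking the bulimic leaves with a different symbol on a single display would serve the same purpose.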

  6. BRAIN 2.45 Research on brain specimens. The postmortem interval (PMI) is defined as the time elapsed (in days) between death and an autopsy. Knowledge of the PMI is considered essential to conducting medical research on human cadavers. The data in the accompanying table are the PMIs of 22 human brain specimens obtained at autopsy in a recent study (Brain and Language, June 1995). Describe the PMI data graphically with a dot plot. On the basis of the plot, make a summary statement about the PMIs of the 22 human brain specimens.

    Postmortem Intervals for 22 Human Brain Specimens
    5.5 14.5  6.0 5.5  5.3 5.8 11.0 6.1
    7.0 14.5 10.4 4.6  4.3 7.2 10.5 6.5
    3.3  7.0  4.1 6.2 10.4 4.9

    Based on Hayes, T. L., and Lewis, D. A. “Anatomical specialization of the anterior motor speech area: Hemispheric differences in magnopyramidal neurons.” Brain and Language, Vol. 49, No. 3, June 1995, p. 292 (Table 1).
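A minimal sketch of a dot plot for these data follows. It is a text approximation with the scale running down the page rather than along a horizontal axis, and it lists only the values that occur, so the gaps between values are not drawn to scale. The names `pmi_days`, `dot_counts`, and `below_8` are chosen here for illustration.

```python
from collections import Counter

# Postmortem intervals (days) for the 22 brain specimens, copied from the table.
pmi_days = [
    5.5, 14.5, 6.0, 5.5, 5.3, 5.8, 11.0, 6.1,
    7.0, 14.5, 10.4, 4.6, 4.3, 7.2, 10.5, 6.5,
    3.3, 7.0, 4.1, 6.2, 10.4, 4.9,
]

# A dot plot stacks one dot per measurement at each distinct value.
dot_counts = Counter(pmi_days)
for value in sorted(dot_counts):
    print(f"{value:>5.1f} | {'.' * dot_counts[value]}")

# One number that may help with the summary statement.
below_8 = sum(1 for v in pmi_days if v < 8)
print(f"{below_8} of {len(pmi_days)} specimens have a PMI under 8 days")
```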

Applying the Concepts—Advanced

  1. SAT 2.46 State SAT scores. Educators are constantly evaluating the efficacy of public schools in the education and training of U.S. students. One quantitative assessment of change over time is the difference in scores on the SAT, which has been used for decades by colleges and universities as one criterion for admission. The SAT file contains average SAT scores for each of the 50 states and the District of Columbia for 2011 and 2014. Selected observations are shown in the following table:

    State 2011 2014
    Alabama 1623 1608
    Alaska 1513 1495
    Arizona 1539 1551
    Arkansas 1692 1697
    California 1513 1505
    Wisconsin 1767 1771
    Wyoming 1692 1757

    Based on College Entrance Examination Board, 2014.

    a. Use graphs to display the two SAT score distributions. How have the distributions of state scores changed from 2011 to 2014?

    b. As another method of comparing the 2011 and 2014 average SAT scores, compute the paired difference by subtracting the 2011 score from the 2014 score for each state. Summarize these differences with a graph.

    c. Interpret the graph you made in part b. How do your conclusions compare with those of part a?

    d. Identify the state with the largest improvement in the SAT score between 2011 and 2014.
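The paired-difference calculation can be sketched as follows. This uses only the seven selected observations printed in the table, so the state it identifies is the largest gain among those rows, not necessarily among all 51 observations in the SAT file. The dictionary layout and the names `sat_scores`, `differences`, and `largest_gain` are assumptions for illustration.

```python
# Selected observations from the table: state -> (2011 score, 2014 score).
sat_scores = {
    "Alabama": (1623, 1608),
    "Alaska": (1513, 1495),
    "Arizona": (1539, 1551),
    "Arkansas": (1692, 1697),
    "California": (1513, 1505),
    "Wisconsin": (1767, 1771),
    "Wyoming": (1692, 1757),
}

# Paired difference: 2014 score minus 2011 score for each state.
differences = {
    state: y2014 - y2011 for state, (y2011, y2014) in sat_scores.items()
}

# List the differences from largest decline to largest gain.
for state, diff in sorted(differences.items(), key=lambda kv: kv[1]):
    print(f"{state:<11} {diff:+4d}")

# State with the largest improvement among the rows shown.
largest_gain = max(differences, key=differences.get)
print("Largest improvement (rows shown only):", largest_gain)
```

A positive difference indicates an increase from 2011 to 2014. Applying the same steps to the full SAT file would give the differences to graph.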

  2. PHISH 2.47 Phishing attacks to e-mail accounts. Phishing is the term used to describe an attempt to extract personal/financial information (e.g., PIN numbers, credit card information, bank account numbers) from unsuspecting people through fraudulent e-mail. An article in Chance (Summer 2007) demonstrates how statistics can help identify phishing attempts and make e-commerce safer. Data from an actual phishing attack against an organization were used to determine whether the attack may have been an “inside job” that originated within the company. The company set up a publicized e-mail account—called a “fraud box”—which enabled employees to notify it if they suspected an e-mail phishing attack. The interarrival times, i.e., the time differences (in seconds), for 267 fraud box e-mail notifications were recorded. Researchers showed that if there is minimal or no collaboration or collusion from within the company, the interarrival times would have a frequency distribution similar to the one shown in the accompanying figure. The 267 interarrival times are saved in the PHISH file. Construct a frequency histogram for the interarrival times. Give your opinion on whether the phishing attack against the organization was an “inside job.”

  3. SILICA 2.48 Mineral flotation in water study. A high concentration of calcium and gypsum in water can affect the water quality and limit mineral flotation. In Minerals Engineering (Vol. 46–47, 2013), chemical and materials engineers published a study of the impact of calcium and gypsum on the flotation properties of silica in water. Solutions of deionized water were prepared both with and without calcium/gypsum, and the level of flotation of silica in the solution was measured using a variable called zeta potential (measured in millivolts, mV). Assume that 50 specimens for each type of liquid solution were prepared and tested for zeta potential. The data (simulated, based on information provided in the journal article) are provided in the table. Create side-by-side graphs to compare the zeta potential distributions for the two types of solutions. How does the addition of calcium/gypsum to the solution affect water quality (measured by zeta potential of silica)?

    Without calcium/gypsum
    47.1 53.0 50.8 54.4 57.4 49.2 51.5 50.2 46.4 49.7
    53.8 53.8 53.5 52.2 49.9 51.8 53.7 54.8 54.5 53.3
    50.6 52.9 51.2 54.5 49.7 50.2 53.2 52.9 52.8 52.1
    50.2 50.8 56.1 51.0 55.6 50.3 57.6 50.1 54.2 50.7
    55.7 55.0 47.4 47.5 52.8 50.6 55.6 53.2 52.3 45.7
    With calcium/gypsum
    9.2 11.6 10.6 8.0 10.9 10.0 11.0 10.7 13.1 11.5
    11.3 9.9 11.8 12.6 8.9 13.1 10.7 12.1 11.2 10.9
    9.1 12.1 6.8 11.5 10.4 11.5 12.1 11.3 10.7 12.4
    11.5 11.0 7.1 12.4 11.4 9.9 8.6 13.6 10.1 11.3
    13.0 11.9 8.6 11.3 13.0 12.2 11.3 10.5 8.8 13.4
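One way to set up the side-by-side comparison is to tally both samples into classes of the same width so the two distributions share a common scale. The sketch below uses the values exactly as printed in the table; the class width of 2 mV and the names `CLASS_WIDTH` and `class_counts` are assumptions made for illustration.

```python
import math

# Zeta potential measurements (mV), copied from the table as printed.
without_ca = [
    47.1, 53.0, 50.8, 54.4, 57.4, 49.2, 51.5, 50.2, 46.4, 49.7,
    53.8, 53.8, 53.5, 52.2, 49.9, 51.8, 53.7, 54.8, 54.5, 53.3,
    50.6, 52.9, 51.2, 54.5, 49.7, 50.2, 53.2, 52.9, 52.8, 52.1,
    50.2, 50.8, 56.1, 51.0, 55.6, 50.3, 57.6, 50.1, 54.2, 50.7,
    55.7, 55.0, 47.4, 47.5, 52.8, 50.6, 55.6, 53.2, 52.3, 45.7,
]
with_ca = [
    9.2, 11.6, 10.6, 8.0, 10.9, 10.0, 11.0, 10.7, 13.1, 11.5,
    11.3, 9.9, 11.8, 12.6, 8.9, 13.1, 10.7, 12.1, 11.2, 10.9,
    9.1, 12.1, 6.8, 11.5, 10.4, 11.5, 12.1, 11.3, 10.7, 12.4,
    11.5, 11.0, 7.1, 12.4, 11.4, 9.9, 8.6, 13.6, 10.1, 11.3,
    13.0, 11.9, 8.6, 11.3, 13.0, 12.2, 11.3, 10.5, 8.8, 13.4,
]

CLASS_WIDTH = 2  # assumed class width in mV, shared by both samples


def class_counts(values, width):
    """Count values in each class [k*width, (k+1)*width), keyed by lower limit."""
    counts = {}
    for v in values:
        lower = math.floor(v / width) * width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


counts_without = class_counts(without_ca, CLASS_WIDTH)
counts_with = class_counts(with_ca, CLASS_WIDTH)

# Print both text histograms on the same class scale, one after the other.
for label, counts in (("Without calcium/gypsum", counts_without),
                      ("With calcium/gypsum", counts_with)):
    print(label)
    for lower, n in counts.items():
        print(f"  {lower:>3} to <{lower + CLASS_WIDTH:<3} {'*' * n}")
```

Because the classes are shared, the separation between the two samples shows up directly in which classes are occupied.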