2.9 Distorting the Truth with Descriptive Statistics

A picture may be “worth a thousand words,” but pictures can also color messages or distort them. In fact, the pictures displayed in statistics—histograms, bar charts, and other graphical images—are susceptible to distortion, so we have to examine each of them with care. Accordingly, we begin this section by mentioning a few of the pitfalls to watch for in interpreting a chart or a graph. Then we discuss how numerical descriptive statistics can be used to distort the truth.

Teaching Tip

Use this section to emphasize the importance of looking past the picture to the information it is trying to convey. If a student can successfully interpret the graph, she will be able to see through the deception.

Graphical Distortions

One common way to change the impression conveyed by a graph is to alter the scale on the vertical axis, the horizontal axis, or both. For example, consider the data on collisions of large marine vessels operating in European waters over a certain five-year period, summarized in Table 2.10. Figure 2.38 is a MINITAB bar graph showing the frequency of collisions for each of the three locations listed in the table. The graph shows that in-port collisions occur more often than collisions at sea or collisions in restricted waters.

Table 2.10 Collisions of Marine Vessels by Location

Location                Number of Ships
At Sea                  376
In Restricted Waters    273
In Port                 478
Total                   1,127

Based on The Dock and Harbour Authority.

Data Set: MARINE

Now, suppose you want to use the same data to exaggerate the difference between the number of in-port collisions and the number of collisions in restricted waters. One way to do this is to increase the distance between successive units on the vertical axis, that is, to stretch the vertical axis by graphing only a few units per inch. A telltale sign of stretching is a long vertical axis, but this indication is often hidden by starting the vertical axis at some point above the origin, 0. Such a graph is shown in the SPSS printout in Figure 2.39. Because the bar chart starts at 250 collisions (instead of 0), the frequency of in-port collisions appears to be many times greater than the frequency of collisions in restricted waters.
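The effect of moving the start of the vertical axis can be checked with a little arithmetic. The following is a minimal sketch in Python, using the counts from Table 2.10; the function name `drawn_height_ratio` is a hypothetical helper introduced only for this illustration. It compares how tall the in-port bar is drawn relative to the restricted-waters bar when the axis starts at 0 and when it starts at 250.

```python
# Collision counts from Table 2.10.
collisions = {"At Sea": 376, "In Restricted Waters": 273, "In Port": 478}


def drawn_height_ratio(larger, smaller, axis_start=0):
    """Ratio of two bar heights as drawn when the vertical axis starts at axis_start."""
    return (larger - axis_start) / (smaller - axis_start)


in_port = collisions["In Port"]
restricted = collisions["In Restricted Waters"]

honest = drawn_height_ratio(in_port, restricted)          # axis starts at 0
truncated = drawn_height_ratio(in_port, restricted, 250)  # axis starts at 250

print(f"Axis from 0:   in-port bar is {honest:.2f} times as tall")
print(f"Axis from 250: in-port bar is {truncated:.2f} times as tall")
```

With the axis starting at 0, the in-port bar is about 1.75 times as tall as the restricted-waters bar, which matches the ratio of the counts. With the axis starting at 250, the same two counts produce bars whose heights differ by a factor of nearly 10.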

Teaching Tip

Compare these two graphs to show how changes in the scaling can affect the information that a graph is portraying.

Ethics in Statistics

Intentionally distorting a graph to portray a particular viewpoint is considered unethical statistical practice.

Figure 2.38

MINITAB bar graph of vessel collisions by location

Figure 2.39

SPSS bar graph of vessel collisions by location, with adjusted vertical axis

Another method of achieving visual distortion with bar graphs is by making the width of the bars proportional to their height. For example, look at the bar chart in Figure 2.40a, which depicts the percentage of the total number of motor vehicle deaths in a year that occurred on each of four major highways. Now suppose we make both the width and the height grow as the percentage of fatal accidents grows. This change is shown in Figure 2.40b. The distortion is that the reader may tend to equate the area of the bars with the percentage of deaths occurring on each highway when, in fact, the true relative frequency of fatal accidents is proportional only to the height of the bars.
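To see how strongly area can overstate a difference, consider a small sketch in Python. The percentages below are hypothetical values chosen for illustration; they are not the values behind Figure 2.40. The helper `area_shares` (also hypothetical) assumes that each bar's width is drawn proportional to its height, so that each bar's area is proportional to the square of its height.

```python
# Hypothetical percentages of deaths on four highways (NOT the Figure 2.40 data).
percent_of_deaths = {"Highway A": 10, "Highway B": 20, "Highway C": 30, "Highway D": 40}


def area_shares(heights):
    """Share of total drawn area per bar when width is proportional to height."""
    # area = width * height, and width is proportional to height,
    # so area is proportional to height squared.
    areas = {name: h * h for name, h in heights.items()}
    total = sum(areas.values())
    return {name: 100 * a / total for name, a in areas.items()}


shares = area_shares(percent_of_deaths)
for name, pct in percent_of_deaths.items():
    print(f"{name}: true share {pct}%, share of drawn area {shares[name]:.1f}%")
```

Under these assumed values, the highway with 40% of the deaths occupies more than half of the total drawn area, while the highway with 10% occupies only a few percent of it. A reader who judges by area would therefore perceive a much larger gap than the data support.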

Figure 2.40

Relative frequency of fatal motor vehicle accidents on each of four major highways

Teaching Tip

Use these two graphs to discuss the information conveyed by both. What has changed, the information or the way it has been presented? The wise student will be able to collect the information presented and analyze it for him- or herself.

Although we’ve discussed only a few of the ways that graphs can be used to convey misleading pictures of phenomena, the lesson is clear: Look at all graphical descriptions of data with a critical eye. In particular, check the axes and the size of the units on each axis. Ignore the visual changes, and concentrate on the actual numerical changes indicated by the graph or chart.

Misleading Numerical Descriptive Statistics

The information in a data set can also be distorted by using numerical descriptive measures, as Example 2.20 shows.

Example 2.20 Misleading Descriptive Statistics

Problem

  1. Suppose you’re considering working for a small law firm—one that currently has a senior member and three junior members. You inquire about the salary you could expect to earn if you join the firm. Unfortunately, you receive two answers:

    • Answer A: The senior member tells you that an “average employee” earns $107,500.

    • Answer B: One of the junior members later tells you that an “average employee” earns $95,000.

    Which answer can you believe?

Solution

  1. The confusion exists because the phrase “average employee” has not been clearly defined. Suppose the four salaries paid are $95,000 for each of the three junior members and $145,000 for the senior member. Then,

    $$\text{Mean} = \frac{3(\$95{,}000) + \$145{,}000}{4} = \frac{\$430{,}000}{4} = \$107{,}500$$

    $$\text{Median} = \$95{,}000$$

    You can now see how the two answers were obtained: The senior member reported the mean of the four salaries, and the junior member reported the median. The information you received was distorted because neither person stated which measure of central tendency was being used.
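The two answers can be reproduced directly from the four salaries assumed in the solution. This is a minimal sketch using Python's standard `statistics` module; the variable name `salaries` is introduced only for this illustration.

```python
from statistics import mean, median

# Salaries assumed in the solution: three junior members and one senior member.
salaries = [95_000, 95_000, 95_000, 145_000]

print(f"Mean:   ${mean(salaries):,.0f}")    # the figure quoted by the senior member
print(f"Median: ${median(salaries):,.0f}")  # the figure quoted by the junior member
```

The single large salary pulls the mean up to $107,500, while the median stays at $95,000, the salary that three of the four employees actually earn.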

Look Back

On the basis of our earlier discussion of the mean and median, we would probably prefer the median as the number that best describes the salary of the “average employee.”

Teaching Tip

Discuss the shape of the distribution of the salaries for these two statements. Remind the student of the role of the distribution shape as it pertains to the measures of center.

Another distortion of information in a sample occurs when only a measure of central tendency is reported. Both a measure of central tendency and a measure of variability are needed to obtain an accurate mental image of a data set.

Suppose, for instance, that you want to buy a new car and are trying to decide which of two models to purchase. Since energy and economy are both important issues, you decide to purchase model A because its EPA mileage rating is 32 miles per gallon in the city, whereas the mileage rating for model B is only 30 miles per gallon in the city.

However, you may have acted too quickly. How much variability is associated with the ratings? As an extreme example, suppose that further investigation reveals that the standard deviation for model A mileages is 5 miles per gallon, whereas that for model B is only 1 mile per gallon. If the mileages form a mound-shaped distribution, they might appear as shown in Figure 2.41. Note that the larger amount of variability associated with model A implies that more risk is involved in purchasing that model. That is, the particular car you purchase is more likely to have a mileage rating that will differ greatly from the EPA rating of 32 miles per gallon if you purchase model A, while a model B car is not likely to vary from the 30-miles-per-gallon rating by more than 2 miles per gallon.
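The difference in risk can be made concrete with a short calculation. The sketch below uses the ratings and standard deviations given above and makes one added assumption: that the mound-shaped distributions are exactly normal. Under that assumption, it computes the probability that an individual car's mileage differs from its model's rating by more than 2 miles per gallon. The helper name `prob_off_by_more_than` is hypothetical.

```python
from statistics import NormalDist

# Ratings and standard deviations from the text; normality is an added assumption.
models = {"A": NormalDist(mu=32, sigma=5), "B": NormalDist(mu=30, sigma=1)}


def prob_off_by_more_than(dist, margin):
    """P(|mileage - rating| > margin) under the assumed normal distribution."""
    within = dist.cdf(dist.mean + margin) - dist.cdf(dist.mean - margin)
    return 1 - within


for name, dist in models.items():
    p = prob_off_by_more_than(dist, 2)
    print(f"Model {name}: P(off by more than 2 mpg) = {p:.3f}")
```

Under the normality assumption, roughly 69% of model A cars would miss the 32-miles-per-gallon rating by more than 2 miles per gallon, compared with fewer than 5% of model B cars missing the 30-miles-per-gallon rating by that much.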

We conclude this section with another example on distorting the truth with numerical descriptive measures.

Figure 2.41

Mileage distributions for two car models

Example 2.21 More Misleading Descriptive Statistics—Delinquent Children

Problem

  1. Children Out of School in America is a report on the delinquency of school-age children written by the Children’s Defense Fund (CDF). Consider the following three reported results of the CDF survey.

    • Reported result: Twenty-five percent of the 16- and 17-year-olds in the Portland, Maine, Bayside East Housing Project were out of school. Actual data: Only eight children were surveyed; two were found to be out of school.

    • Reported result: Of all the secondary school students who had been suspended more than once in census tract 22 in Columbia, South Carolina, 33% had been suspended two times and 67% had been suspended three or more times. Actual data: CDF found only three children in that entire census tract who had been suspended; one child was suspended twice and the other two children three or more times.

    • Reported result: In the Portland Bayside East Housing Project, 50% of all the secondary school children who had been suspended more than once had been suspended three or more times. Actual data: The survey found just two secondary school children who had been suspended in that area; one of them had been suspended three or more times.

    Identify the potential distortions in the results reported by the CDF.

Solution

  1. In each of these examples, the reporting of percentages (i.e., relative frequencies) instead of the numbers themselves is misleading. No inference we might draw from the examples cited would be reliable. (We’ll see how to measure the reliability of estimated percentages in Chapter 7.) In short, either the report should state the numbers alone instead of percentages, or, better yet, it should state that the numbers were too small to report by region.

Look Back

If several regions were combined, the numbers (and percentages) would be more meaningful.
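One way to see how fragile these percentages are is to ask how far each would move if the status of a single child changed. The following Python sketch uses the counts given in Example 2.21; the labels and the helper names `percent` and `points_per_child` are introduced only for this illustration.

```python
# (label, children counted, children surveyed) from the three CDF results.
results = [
    ("Out of school, Bayside East", 2, 8),
    ("Suspended three or more times, census tract 22", 2, 3),
    ("Suspended three or more times, Bayside East", 1, 2),
]


def percent(count, n):
    """Reported percentage for count children out of n."""
    return 100 * count / n


def points_per_child(n):
    """Percentage points by which the result moves if one child's status changes."""
    return 100 / n


for label, count, n in results:
    print(f"{label}: {percent(count, n):.1f}% "
          f"(one child = {points_per_child(n):.1f} percentage points)")
```

With only 8, 3, and 2 children behind the three results, a single child accounts for 12.5, about 33.3, and 50 percentage points, respectively, which is why the reported percentages convey far more precision than the data can support.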

Teaching Tip

Discuss how these results would change if different samples of the same sample size were collected. Use this information to look ahead at the variability associated with sample statistics. The information will tie in nicely when sampling distributions are discussed later in the text.

Ethics in Statistics

Purposeful reporting of numerical descriptive statistics in order to mislead the target audience is considered unethical statistical practice.

Exercises 2.170–2.173

Applying the Concepts—Intermediate

  1. MMC 2.170 Museum management. Refer to the Museum Management and Curatorship (June 2010) study of how museums evaluate their performance, Exercise 2.22 (p. 41). Recall that managers of 30 museums of contemporary art identified the performance measure used most often. A summary of the results is reproduced in the table. Consider the bar graph shown. Identify two ways in which the bar graph might mislead the viewer by overemphasizing the importance of one of the performance measures.

    Performance Measure    Number of Museums    Proportion of Museums
    Total visitors         8                    .267
    Paying visitors        5                    .167
    Big shows              6                    .200
    Funds raised           7                    .233
    Members                4                    .133

  2. 2.171 Trend in Iraq War casualties. While the United States was still actively fighting in the Iraq War, a news media outlet produced a graphic showing a dramatic decline in the annual number of American casualties. The number of deaths for the years 2003, 2004, 2005, and 2006 were (approximately) 475, 850, 820, and 130, respectively.

    1. Create a time series plot showing the dramatic decline in the number of American deaths per year.

    2. The graphic was based on data collected through February 2006. Knowing this fact, why is the time series plot misleading?

    3. What information would you like to have in order to construct a graph that accurately reflects the trend in American casualties from the Iraq War?

  3. BPOIL 2.172 BP oil leak. In April 2010, an explosion on the Deepwater Horizon oil drilling rig caused a leak in one of British Petroleum (BP) Oil Company’s wells in the Gulf of Mexico. Crude oil rushed unabated for 3 straight months into the Gulf until BP could fix the leak. During the disaster, BP used suction tubes to capture some of the gushing oil. In May 2010, in an effort to demonstrate the daily improvement in the process, a BP representative presented a graphic on the daily number of 42-gallon barrels (bbl) of oil collected by the suctioning process. A graphic similar to the one used by BP is shown below.

    1. Note that the vertical axis represents the “cumulative” number of barrels collected per day. This is calculated by adding the amounts of the previous days’ oil collection to the current day’s oil collection. Explain why this graph is misleading.

    2. Estimates of the actual number of barrels of oil collected per day for each of the 8 days are listed in the accompanying table. Construct a graph for this data that accurately depicts BP’s progress in its daily collection of oil. What conclusions can you draw from the graph?

      Estimates of Daily Collection of Oil

      Day       Number of Barrels (bbl)
      May 16    500
      May 17    1,000
      May 18    3,000
      May 19    2,500
      May 20    2,500
      May 21    2,000
      May 22    1,000
      May 23    1,500

  4. ISR 2.173 Irrelevant speech effects. Refer to the Acoustical Science & Technology (Vol. 35, 2014) study of irrelevant speech effects, Exercise 2.104 (p. 77). Recall that subjects performed a memorization task under two conditions: (1) with irrelevant background speech and (2) in silence. The difference in the error rates for the two conditions—called the relative difference in error rate (RDER)—was computed for each subject. Descriptive statistics for the RDER values are shown in the MINITAB printout on p. 103. The media has requested that the researchers provide a single statistic that best represents the center of the distribution of RDER values. This statistic will be used to publicize the study findings.

    MINITAB Output for Exercise 2.173

    1. Suppose you are in support of the irrelevant speech effect theory. Consequently, you want to magnify the difference in error rates between the two conditions. Which statistic would you select, and why?

    2. Suppose you do not believe in the irrelevant speech effect theory. Consequently, you want to diminish the difference in error rates between the two conditions. Which statistic would you select, and why?

    3. If you truly are neutral in regard to the irrelevant speech effect theory, explain why it is better to obtain descriptive statistics for the absolute difference between error rates (i.e., the absolute value of RDER).
