Clarity, honesty, and a sense of purpose

There are two big schools of thought in data visualization at the moment: there's the ultra-minimalist philosophy espoused by Alberto Cairo and Edward Tufte, where the primary goal of data visualization is to reduce confusion, and then there are those who use data to create beautiful things that put design ahead of communication. If you couldn't tell from the title of this section, I generally believe that the former is far more appropriate in most cases. As somebody wishing to communicate data visually, the absolute worst thing you can do is mislead an audience, whether intentionally or not; not only do you lose credibility with your audience once they discover how they've been misled, but you also increase public skepticism about the ability of data to communicate the truth.

Axes and scales are among the easiest things to get wrong. You should usually start them at zero, because not doing so can dramatically distort the shape of the chart and hide information from the viewer. For time series data, the length of the period you show can also affect how a chart is perceived. Here's an example from Alan Smith's Chart Doctor series:

Taken from How alternative facts rewrite history by Alan Smith: https://www.ft.com/content/3062d082-e3da-11e6-8405-9e5580d6e5fb

The preceding screenshot shows how simply changing the time span of a chart can change what it says. Both charts show the same dataset: the number of people in the UK who claim Jobseeker's Allowance and can prove that they're looking for work, which is to some degree a proxy for the unemployment rate. The chart on the right is particularly misleading because the y axis doesn't start at zero, which exaggerates the 0.2 million increase in 2016. The chart on the left, meanwhile, starts at zero, and while one can still see the rise depicted in the second chart, the added perspective of showing the change since 2012 reveals that the trend has been effectively flat since 2015, and that the count is far lower than it was just a few years earlier, in 2012.
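
To put a rough number on that kind of exaggeration, here's a minimal Python sketch. The claimant figures (0.6 and 0.8 million) and the axis ranges are hypothetical values I picked for illustration, not numbers read off Smith's charts, and the helper names are my own. It compares how tall the later value is drawn relative to the earlier one on a linear axis that starts at zero versus one that has been truncated.

```python
def plotted_height(value, axis_min, axis_max):
    """Fraction of the plot height at which a value is drawn on a linear axis."""
    return (value - axis_min) / (axis_max - axis_min)


def apparent_growth(start, end, axis_min, axis_max):
    """How many times taller the end value is drawn than the start value."""
    return (plotted_height(end, axis_min, axis_max)
            / plotted_height(start, axis_min, axis_max))


# Hypothetical claimant counts, in millions (illustrative only).
start, end = 0.6, 0.8

# Axis starting at zero: drawn heights match the real ratio, about 1.33x.
honest = apparent_growth(start, end, axis_min=0.0, axis_max=1.0)

# Axis truncated at 0.5: the same change is drawn about 3x taller.
truncated = apparent_growth(start, end, axis_min=0.5, axis_max=1.0)

print(f"zero baseline: {honest:.2f}x, truncated baseline: {truncated:.2f}x")
```

The data hasn't changed between the two calls; only the axis minimum has. Under these assumed numbers, a one-third increase ends up looking like a tripling.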

Here's another contemporary example. In September 2015, the U.S. Congress held a hearing on Planned Parenthood, the American reproductive and women's health group. During the hearing, Republican congressman Jason Chaffetz showed the following chart, created by the anti-abortion group Americans United for Life:

There are many problems with this chart, not least the complete lack of a y axis. Politifact redrew the chart with corrected axes and it came out like this:

Vox took it a step further and drew the rest of Planned Parenthood's services:

As you can see, while there had definitely been a decrease in cancer screenings and prevention services (as well as contraceptives, for that matter) and a slight rise in the number of abortions, there had also been a dramatic increase in spending for STI/STD treatment and prevention. As Alberto Cairo commented on the original chart,

"That graphic is a damn lie ... Regardless of whatever people think of this issue, this distortion is ethically wrong."

The public backlash against this one misleading chart led to it being named 2015's Most Misleading Chart by Quartz. Regardless of what the creators of the chart originally intended, any hope of achieving that goal was obliterated once viewers felt they were being misled.

For a more thorough discussion of everything wrong with Chaffetz's chart, I highly recommend the commentary by both Politifact and Vox, at http://www.politifact.com/truth-o-meter/statements/2015/oct/01/jason-chaffetz/chart-shown-planned-parenthood-hearing-misleading-/ and http://www.vox.com/2015/9/29/9417845/planned-parenthood-terrible-chart respectively.

In the preceding quote, Cairo makes an interesting point in that communicating data carries with it certain fundamental ethical requirements. This is how data visualization differs from data art; in the latter, what's ethically required of the artist is to purposefully and honestly communicate their emotions, beliefs, fears, and so on. This is a long way off from the ethical requirement of the visualizer, which is to communicate specific qualities of the data through visual methods. Taking this a step further, the ethics of data journalism compel the journalist to explain to the audience what the data really means and how it relates to that audience.

In your projects, decide where your scope lies. Are you acting in the role of a data journalist with a desire to walk the reader through a specific set of numbers and figures? Are you acting as a data visualizer, perhaps creating a dashboard designed to quickly and effectively summarize a very large, multivariate dataset? Or do you want to build something fun and entertaining that leverages data merely as a means to that end? All three of these roles are perfectly acceptable, and there is room for work ranging from incisively explained line charts all the way through to objets d'art that give us a better understanding of our size and place in the universe. Whatever you do, be clear about your intentions and never mislead.

Sometimes, however, you need to pay attention to what the data is saying and where the drama lies. A good example of when not to start a chart at zero is when the data never goes near zero. If you build a stacked area chart depicting votes by party in U.S. federal elections over two decades, starting the y axis at zero will effectively show a flat line. Similarly, if you plot the Dow Jones Industrial Average since 1900 starting at zero using a linear scale, the entirety of the Great Depression is basically flattened into oblivion (in fact, when depicting data that spans multiple orders of magnitude, you may well want to use a logarithmic scale; for a good overview of when and why to use log scales, see the Chart Doctor piece mentioned at the beginning of the chapter).
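
Here's a minimal Python sketch of that flattening effect, under some loudly stated assumptions. The index values are approximations I've chosen for illustration (roughly 381 for the 1929 peak and 41 for the 1932 low), the axis top of 20,000 and the log-axis floor of 10 are arbitrary, and the function names are my own. It measures what share of the plot height the crash occupies on a linear axis starting at zero versus a logarithmic axis.

```python
import math


def linear_position(value, axis_min, axis_max):
    """Fraction of the plot height for a value on a linear axis."""
    return (value - axis_min) / (axis_max - axis_min)


def log_position(value, axis_min, axis_max):
    """Fraction of the plot height for a value on a base-10 log axis.

    A log axis cannot start at zero, so axis_min must be positive.
    """
    span = math.log10(axis_max) - math.log10(axis_min)
    return (math.log10(value) - math.log10(axis_min)) / span


# Approximate, illustrative index levels (assumptions, not exact data).
peak_1929, low_1932 = 381.0, 41.0
axis_top = 20_000.0  # arbitrary top of the axis for a chart spanning a century

# Linear axis from zero: the crash covers under 2% of the plot height.
linear_drop = (linear_position(peak_1929, 0.0, axis_top)
               - linear_position(low_1932, 0.0, axis_top))

# Log axis from 10: the same crash covers roughly 29% of the plot height.
log_drop = (log_position(peak_1929, 10.0, axis_top)
            - log_position(low_1932, 10.0, axis_top))

print(f"linear: {linear_drop:.1%} of height, log: {log_drop:.1%} of height")
```

A log axis gives equal vertical distance to equal ratios rather than equal differences, which is why a fall of nearly 90 percent stays visible even when later values are hundreds of times larger.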