27
Tamara Munzner
Visualization
A major application area of computer graphics is visualization, where
computer-generated images are used to help people understand both spatial and
non-spatial data. Visualization is used when the goal is to augment human
capabilities in situations where the problem is not sufficiently well defined
for a computer to handle algorithmically. If a totally automatic solution can
completely replace human judgement, then visualization is not typically
required. Visualization can be used to generate new hypotheses when exploring a
completely unfamiliar dataset, to confirm existing hypotheses in a partially
understood dataset, or to present information about a known dataset to another
audience.
Visualization allows people to offload cognition to the perceptual system,
using carefully designed images as a form of external memory. The human visual
system is a very high-bandwidth channel to the brain, with a significant amount
of processing occurring in parallel and at the pre-conscious level. We can thus
use external images as a substitute for keeping track of things inside our own
heads. As an example, let us consider the task of understanding the
relationships between a subset of the topics in the splendid book Gödel,
Escher, Bach: An Eternal Golden Braid (Hofstadter, 1979); see Figure 27.1.
When we see the dataset as a text list, at the low level we must read words
and compare them to memories of previously read words. It is hard to keep track
of just these dozen topics using cognition and memory alone, let alone the
hundreds of topics in the full book. The higher-level problem of identifying
neighborhoods, for instance finding all the topics two hops away from the
target topic Paradoxes, is very difficult.
Infinity - Lewis Carroll
Infinity - Zeno
Infinity - Paradoxes
Infinity - Halting problem
Zeno - Lewis Carroll
Paradoxes - Lewis Carroll
Paradoxes - Epimenides
Paradoxes - Self-ref
Epimenides - Self-ref
Epimenides - Tarski
Tarski - Epimenides
Halting problem - Decision procedures
Halting problem - Turing
Lewis Carroll - Wordplay
Tarski - Truth vs. provability
Tarski - Undecidability
Figure 27.1. Keeping track of relationships between topics is difficult using a text list.
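To make the neighborhood task concrete, the links of Figure 27.1 can be treated as an undirected graph and searched breadth-first. The following is a minimal sketch, not a method from the chapter: the names EDGES, build_adjacency, and topics_at_distance are hypothetical, and "two hops away" is assumed to mean a shortest path of exactly two links.

```python
from collections import defaultdict

# Links transcribed from Figure 27.1; the pair Epimenides/Tarski, which the
# figure lists in both directions, appears once because links are undirected.
EDGES = [
    ("Infinity", "Lewis Carroll"),
    ("Infinity", "Zeno"),
    ("Infinity", "Paradoxes"),
    ("Infinity", "Halting problem"),
    ("Zeno", "Lewis Carroll"),
    ("Paradoxes", "Lewis Carroll"),
    ("Paradoxes", "Epimenides"),
    ("Paradoxes", "Self-ref"),
    ("Epimenides", "Self-ref"),
    ("Epimenides", "Tarski"),
    ("Halting problem", "Decision procedures"),
    ("Halting problem", "Turing"),
    ("Lewis Carroll", "Wordplay"),
    ("Tarski", "Truth vs. provability"),
    ("Tarski", "Undecidability"),
]


def build_adjacency(edges):
    """Map each topic to the set of topics it is directly linked to."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def topics_at_distance(adjacency, start, hops):
    """Return the topics whose shortest path from `start` is exactly `hops` links."""
    visited = {start}
    frontier = {start}
    for _ in range(hops):
        next_frontier = set()
        for node in frontier:
            next_frontier |= adjacency[node]
        # Drop anything already reached by a shorter path.
        frontier = next_frontier - visited
        visited |= frontier
    return frontier


adjacency = build_adjacency(EDGES)
two_hops = sorted(topics_at_distance(adjacency, "Paradoxes", 2))
print(two_hops)
```

A program answers this query easily. The difficulty described above is that a person reading the text list has to carry out the same bookkeeping in working memory.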
Figure 27.2 shows an external visual representation of the same dataset as a
node-link graph, where each topic is a node and the linkage between two topics
is shown directly with a line. Following the lines by moving our eyes around the
image is a fast low-level operation with minimal cognitive load, so higher-level
neighborhood finding becomes possible. The placement of the nodes and the
routing of the links between them were created automatically by the dot graph
drawing program (Gansner et al., 1993).
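As a rough illustration of how such a layout might be requested, the sketch below writes the same links in the DOT language that Graphviz reads. It only builds the text description and assumes nothing about how the original figure was produced; the helper name to_dot and the graph name topics are hypothetical.

```python
# Links transcribed from Figure 27.1 (undirected, duplicate pair listed once).
EDGES = [
    ("Infinity", "Lewis Carroll"),
    ("Infinity", "Zeno"),
    ("Infinity", "Paradoxes"),
    ("Infinity", "Halting problem"),
    ("Zeno", "Lewis Carroll"),
    ("Paradoxes", "Lewis Carroll"),
    ("Paradoxes", "Epimenides"),
    ("Paradoxes", "Self-ref"),
    ("Epimenides", "Self-ref"),
    ("Epimenides", "Tarski"),
    ("Halting problem", "Decision procedures"),
    ("Halting problem", "Turing"),
    ("Lewis Carroll", "Wordplay"),
    ("Tarski", "Truth vs. provability"),
    ("Tarski", "Undecidability"),
]


def to_dot(edges, name="topics"):
    """Return DOT source for an undirected graph with one statement per link."""
    lines = [f"graph {name} {{"]
    for a, b in edges:
        # Quote the node names because several contain spaces or punctuation.
        lines.append(f'    "{a}" -- "{b}";')
    lines.append("}")
    return "\n".join(lines)


dot_source = to_dot(EDGES)
print(dot_source)
```

If the output is saved to a file such as topics.dot, a Graphviz installation can lay it out with a command like `dot -Tpng topics.dot -o topics.png`. The node placement and link routing are then chosen by the layout program rather than by hand.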
We call the mapping of dataset attributes to a visual representation a visual
encoding. One of the central problems in visualization is choosing appropriate
encodings from the enormous space of possible visual representations, taking
into account the characteristics of the human perceptual system, the dataset in
question, and the task at hand.
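[Node-link diagram with one node per topic (Infinity, Halting problem, Zeno,
Paradoxes, Lewis Carroll, Turing, Decision procedures, Self-ref, Epimenides,
Wordplay, Tarski, Truth vs. provability, Undecidability) and one line per link
listed in Figure 27.1.]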
Figure 27.2. Substituting perception for cognition and memory allows us to understand
relationships between book topics quickly.
27.1 Background
27.1.1 History
People have a long history of conveying meaning through static images, dating
back to the oldest known cave paintings from over thirty thousand years ago. We
continue to communicate visually today in ways ranging from rough sketches on
the back of a napkin to the slick graphic design of advertisements. For thousands
of years, cartographers have studied the problem of making maps that represent
some aspect of the world around us. The first visual representations of abstract,
non-spatial datasets were created in the 18th century by William Playfair
(Friendly, 2008).
Although we have had the power to create moving images for over one hundred and
fifty years, creating dynamic images interactively is a more recent development,
made possible only by the widespread availability of fast computer graphics
hardware and algorithms in the past few decades. Static visualizations of tiny
datasets can be created by hand, but computer graphics enables interactive
visualization of large datasets.
27.1.2 Resource Limitations
When designing a visualization system, we must consider three different kinds
of limitations: computational capacity, human perceptual and cognitive capacity,
and display capacity.
As with any application of computer graphics, computer time and memory are
limited resources and we often have hard constraints. If the visualization system
needs to deliver interactive response, then it must use algorithms that can run in a
fraction of a second rather than minutes or hours.
On the human side, memory and attention must be considered as finite resources.
Human memory is notoriously limited, both for long-term recall and for
shorter-term working memory. Later in this chapter, we discuss some of the
power and limitations of the low-level visual attention mechanisms that carry out
massively parallel processing of the visual field. We store surprisingly little
information internally in visual working memory, leaving us vulnerable to change
blindness, the phenomenon where even very large changes are not noticed if we
are attending to something else in our view (Simons, 2000). Moreover, vigilance
is a highly limited resource; our ability to perform visual search tasks
degrades quickly, with far worse results after several hours than in the first
few minutes (Ware, 2000).
Display capacity is a third kind of limitation to consider. Visualization
designers often “run out of pixels,” where the resolution of the screen is not
large enough to show all desired information simultaneously. The information
density of a particular frame is a measure of the amount of information encoded
versus the amount of unused space. There is a tradeoff between the benefits of
showing as much as possible at once, to minimize the need for navigation and
exploration, and the costs of showing too much at once, where the user is
overwhelmed by visual clutter.
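The text does not give a formula for information density. One crude proxy, assumed here purely for illustration, is the fraction of pixels in a frame that differ from the background value. The function name information_density and the tiny frame below are hypothetical.

```python
def information_density(frame, background=0):
    """Fraction of pixels that are not background (an assumed, crude proxy)."""
    total = sum(len(row) for row in frame)
    if total == 0:
        return 0.0
    used = sum(1 for row in frame for pixel in row if pixel != background)
    return used / total


# A tiny 3 x 4 "frame": 0 is background, any other value is drawn content.
frame = [
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
density = information_density(frame)
print(density)
```

A count like this says nothing about whether the used pixels are legible. A frame can score high on this proxy and still be unusable because of clutter, which is the tradeoff described above.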
27.2 Data Types
Many aspects of a visualization design are driven by the type of the data that we
need to look at. For example, is it a table of numbers, a set of relations between
items, inherently spatial data such as a location on the Earth's surface, or a
collection of documents?
We start by considering a table of data. We call the rows items of data and the
columns dimensions, also known as attributes. For example, the rows might
represent people, and the columns might be name, age, height, shirt size, and
favorite fruit.
We distinguish between three types of dimensions: quantitative, ordered, and
categorical. Quantitative data, such as age or height, is numerical and we can
do arithmetic on it. For example, the quantity of 68 inches minus 42 inches is
26 inches. With ordered data, such as shirt size, we cannot do full-fledged
arithmetic, but there is a well-defined ordering. For example, Large minus Medium
is not a meaningful concept, but we know that Medium falls between Small and
Large. Categorical data, such as favorite fruit or names, does not have an implicit
ordering. We can only distinguish whether two things are the same (apples) or
different (apples vs. bananas).
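The three kinds of dimension differ in which operations make sense, and a type system can mirror that. The sketch below is one possible modeling, not a prescribed one: the ShirtSize class is hypothetical, it supports comparison but deliberately not subtraction, and plain strings stand in for categorical values.

```python
from enum import Enum
from functools import total_ordering


@total_ordering
class ShirtSize(Enum):
    """Ordered dimension: values can be compared but not subtracted."""
    SMALL = 1
    MEDIUM = 2
    LARGE = 3

    def __lt__(self, other):
        if not isinstance(other, ShirtSize):
            return NotImplemented
        return self.value < other.value


# Quantitative: arithmetic is meaningful.
height_difference = 68 - 42  # inches

# Ordered: Medium falls between Small and Large...
is_between = ShirtSize.SMALL < ShirtSize.MEDIUM < ShirtSize.LARGE

# ...but Large minus Medium is not defined for this type.
try:
    ShirtSize.LARGE - ShirtSize.MEDIUM
    subtraction_defined = True
except TypeError:
    subtraction_defined = False

# Categorical: only equality tests are meaningful.
same_fruit = "apple" == "apple"
different_fruit = "apple" != "banana"

print(height_difference, is_between, subtraction_defined, same_fruit, different_fruit)
```

A plain Enum with an explicit comparison is used here rather than IntEnum, because IntEnum members behave as integers and would allow the subtraction that the text calls meaningless.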
Relational data, or graphs, are another data type where nodes are connected by
links. One specific kind of graph is a tree, which is typically used for hierarchical
data. Both nodes and edges can have associated attributes. The word graph is
unfortunately overloaded in visualization. The node-link graphs we discuss here,
following the terminology of graph drawing and graph theory, could also be called
networks. In the field of statistical graphics, graph is often used for chart, as in
the line charts for time-series data shown in Figure 27.10.
Some data is inherently spatial, such as geographic location or a field of
measurements at positions in three-dimensional space as in the MRI or CT scans used
by doctors to see the internal structure of a person’s body. The information
associated with each point in space may be an unordered set of scalar quantities,
or indexed vectors, or tensors. In contrast, non-spatial data can be visually
encoded using spatial position, but that encoding is chosen by the designer rather
than given implicitly in the semantics of the dataset itself. This choice is one
of the most central and difficult problems of visualization design.
27.2.1 Dimension and Item Count
The number of data dimensions that need to be visually encoded is one of the most
fundamental aspects of the visualization design problem. Techniques that work
for a low-dimensional dataset with a few columns will often fail for very high-
dimensional datasets with dozens or hundreds of columns. A data dimension may
have hierarchical structure, for example with a time series dataset where there are
interesting patterns at multiple temporal scales.
The number of data items is also important: a visualization that performs well
for a few hundred items often does not scale to millions of items. In some cases
the difficulty is purely algorithmic, where a computation would take too long; in
others it is an even deeper perceptual problem that even an instantaneous
algorithm could not solve, where visual clutter makes the representation unusable
by a person. The range of possible values within a dimension may also be relevant.
27.2.2 Data Transformation and Derived Dimensions
Data is often transformed from one type to another as part of a visualization
pipeline for solving the domain problem. For example, an original data dimension
might be made up of quantitative data: floating point numbers that represent
temperature. For some tasks, like finding anomalies in local weather patterns, the
raw data might be used directly. For another task, like deciding whether water is
an appropriate temperature for a shower, the data might be transformed into an
ordered dimension: hot, warm, or cold. In this transformation, most of the detail
is aggregated away. In a third example, when making toast, an even more lossy
transformation into a categorical dimension might suffice: burned or not burned.
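The transformations just described amount to binning a quantitative value. The sketch below is only illustrative: the function names and every threshold (30, 42, and 200 degrees Celsius) are assumptions chosen for the example, not values from the text.

```python
def shower_level(temp_c, warm_from=30.0, hot_from=42.0):
    """Quantitative to ordered: bin a temperature into cold < warm < hot.

    The thresholds are illustrative assumptions.
    """
    if temp_c >= hot_from:
        return "hot"
    if temp_c >= warm_from:
        return "warm"
    return "cold"


def toast_state(temp_c, burn_from=200.0):
    """Quantitative to categorical: keep only burned vs. not burned.

    The threshold is an illustrative assumption.
    """
    return "burned" if temp_c >= burn_from else "not burned"


water_readings = [18.5, 36.0, 47.2]  # raw floating point temperatures
levels = [shower_level(t) for t in water_readings]
print(levels)

toast_readings = [150.0, 230.0]
states = [toast_state(t) for t in toast_readings]
print(states)
```

Each step discards detail: 36.0 and 41.9 both become "warm", and the raw readings cannot be recovered from the derived values.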
The principle of transforming data into derived dimensions, rather than simply
visually encoding the data in its original form, is a powerful idea. In Figure 27.10,
the original data was an ordered collection of time-series curves. The
transformation was to cluster the data, reducing the amount of information to
visually encode to a few highly meaningful curves.