27

Tamara Munzner

Visualization

A major application area of computer graphics is visualization, where computer-generated images are used to help people understand both spatial and non-spatial data. Visualization is used when the goal is to augment human capabilities in situations where the problem is not sufficiently well defined for a computer to handle algorithmically. If a totally automatic solution can completely replace human judgement, then visualization is not typically required. Visualization can be used to generate new hypotheses when exploring a completely unfamiliar dataset, to confirm existing hypotheses in a partially understood dataset, or to present information about a known dataset to another audience.

Visualization allows people to offload cognition to the perceptual system, using carefully designed images as a form of external memory. The human visual system is a very high-bandwidth channel to the brain, with a significant amount of processing occurring in parallel and at the pre-conscious level. We can thus use external images as a substitute for keeping track of things inside our own heads. As an example, consider the task of understanding the relationships between a subset of the topics in the splendid book Gödel, Escher, Bach: An Eternal Golden Braid (Hofstadter, 1979); see Figure 27.1.

When we see the dataset as a text list, at the low level we must read words and compare them to memories of previously read words. It is hard to keep track of just these dozen topics using cognition and memory alone, let alone the hundreds of topics in the full book. The higher-level problem of identifying neighborhoods, for instance finding all the topics two hops away from the target topic Paradoxes, is very difficult.

Infinity - Lewis Carroll
Infinity - Zeno
Infinity - Paradoxes
Infinity - Halting problem
Zeno - Lewis Carroll
Paradoxes - Lewis Carroll
Paradoxes - Epimenides
Paradoxes - Self-ref
Epimenides - Self-ref
Epimenides - Tarski
Tarski - Epimenides
Halting problem - Decision procedures
Halting problem - Turing
Lewis Carroll - Wordplay
Tarski - Truth vs. provability
Tarski - Undecidability

Figure 27.1. Keeping track of relationships between topics is difficult using a text list.
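To make the "two hops away" task concrete, here is a minimal Python sketch that transcribes the edge list of Figure 27.1 and uses breadth-first search to find every topic whose shortest-path distance from Paradoxes is exactly two. It assumes the links are undirected (the figure lists Epimenides - Tarski in both orders), and the names `EDGES`, `build_adjacency`, and `nodes_at_distance` are illustrative, not from the text. The sketch only pins down what the task means; the point of the chapter is that a good picture lets a person answer it by eye.

```python
from collections import defaultdict, deque

# Edge list transcribed from Figure 27.1 (the repeated Tarski/Epimenides link is kept).
EDGES = [
    ("Infinity", "Lewis Carroll"),
    ("Infinity", "Zeno"),
    ("Infinity", "Paradoxes"),
    ("Infinity", "Halting problem"),
    ("Zeno", "Lewis Carroll"),
    ("Paradoxes", "Lewis Carroll"),
    ("Paradoxes", "Epimenides"),
    ("Paradoxes", "Self-ref"),
    ("Epimenides", "Self-ref"),
    ("Epimenides", "Tarski"),
    ("Tarski", "Epimenides"),
    ("Halting problem", "Decision procedures"),
    ("Halting problem", "Turing"),
    ("Lewis Carroll", "Wordplay"),
    ("Tarski", "Truth vs. provability"),
    ("Tarski", "Undecidability"),
]


def build_adjacency(edges):
    """Treat every link as undirected; sets absorb duplicate links."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def nodes_at_distance(adj, source, k):
    """Breadth-first search; return nodes whose shortest-path distance from source is exactly k."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # nothing farther than k hops is needed
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return {n for n, d in dist.items() if d == k}


adj = build_adjacency(EDGES)
print(sorted(nodes_at_distance(adj, "Paradoxes", 2)))
# ['Halting problem', 'Tarski', 'Wordplay', 'Zeno']
```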

Figure 27.2 shows an external visual representation of the same dataset as a node-link graph, where each topic is a node and the linkage between two topics is shown directly with a line. Following the lines by moving our eyes around the image is a fast low-level operation with minimal cognitive load, so higher-level neighborhood finding becomes possible. The placement of the nodes and the routing of the links between them were created automatically by the dot graph drawing program (Gansner et al., 1993).
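The dot program is part of Graphviz and reads graphs written in the DOT language, so preparing a drawing like Figure 27.2 mostly means writing the edge list in that syntax. As a hedged sketch, the hypothetical function `to_dot` below serializes an undirected edge list (here a few edges from Figure 27.1) as a DOT `graph` with `--` edges; dot then computes node placement and link routing on its own.

```python
def to_dot(edges, graph_name="topics"):
    """Serialize an undirected edge list as a DOT 'graph' with '--' edges."""

    def quote(label):
        # DOT IDs containing spaces or punctuation must be double-quoted.
        return '"' + label.replace('"', '\\"') + '"'

    lines = [f"graph {graph_name} {{"]
    for a, b in edges:
        lines.append(f"    {quote(a)} -- {quote(b)};")
    lines.append("}")
    return "\n".join(lines)


sample = [
    ("Infinity", "Zeno"),
    ("Halting problem", "Turing"),
    ("Tarski", "Truth vs. provability"),
]
print(to_dot(sample))
```

Saving the output as `topics.dot` and running `dot -Tpng topics.dot -o topics.png` renders it with Graphviz.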

We call the mapping of dataset attributes to a visual representation a visual encoding. One of the central problems in visualization is choosing appropriate encodings from the enormous space of possible visual representations, taking into account the characteristics of the human perceptual system, the dataset in question, and the task at hand.
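One way to think of a visual encoding is as a set of functions from attribute values to visual channels. The minimal sketch below uses hypothetical names, made-up data, an arbitrary pixel range, and arbitrary colors. It maps a quantitative attribute (height) to horizontal position with a linear scale, and a categorical attribute (favorite fruit) to color hue with a lookup table. Deciding which attribute goes to which channel is the design decision the text describes.

```python
def linear_scale(domain, pixel_range):
    """Map a quantitative value from domain [d0, d1] to pixels [r0, r1] linearly."""
    (d0, d1), (r0, r1) = domain, pixel_range

    def scale(value):
        return r0 + (value - d0) * (r1 - r0) / (d1 - d0)

    return scale


def categorical_scale(categories, palette):
    """Give each distinct category its own color; categories have no order."""
    if len(categories) > len(palette):
        raise ValueError("not enough distinguishable colors for every category")
    return dict(zip(categories, palette))


# Hypothetical data items and encodings.
items = [
    {"height_in": 42, "fruit": "apples"},
    {"height_in": 55, "fruit": "bananas"},
    {"height_in": 68, "fruit": "apples"},
]
x_of = linear_scale((42, 68), (0, 260))
color_of = categorical_scale(["apples", "bananas"], ["#1b9e77", "#d95f02"])

marks = [(x_of(it["height_in"]), color_of[it["fruit"]]) for it in items]
print(marks)  # [(0.0, '#1b9e77'), (130.0, '#d95f02'), (260.0, '#1b9e77')]
```

The categorical scale raises an error rather than recycling colors, because reusing a hue would make two different categories look identical.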

Figure 27.2. Substituting perception for cognition and memory allows us to understand relationships between book topics quickly.


27.1 Background

27.1.1 History

People have a long history of conveying meaning through static images, dating back to the oldest known cave paintings from over thirty thousand years ago. We continue to visually communicate today in ways ranging from rough sketches on the back of a napkin to the slick graphic design of advertisements. For thousands of years, cartographers have studied the problem of making maps that represent some aspect of the world around us. The first visual representations of abstract, nonspatial datasets were created in the 18th century by William Playfair (Friendly, 2008).

Although we have had the power to create moving images for over one hundred and fifty years, creating dynamic images interactively is a more recent development only made possible by the widespread availability of fast computer graphics hardware and algorithms in the past few decades. Static visualizations of tiny datasets can be created by hand, but computer graphics enables interactive visualization of large datasets.

27.1.2 Resource Limitations

When designing a visualization system, we must consider three different kinds of limitations: computational capacity, human perceptual and cognitive capacity, and display capacity.

As with any application of computer graphics, computer time and memory are limited resources, and we often have hard constraints. If the visualization system needs to deliver interactive response, then it must use algorithms that can run in a fraction of a second rather than minutes or hours.

On the human side, memory and attention must be considered as finite resources. Human memory is notoriously limited, both for long-term recall and for shorter-term working memory. Later in this chapter, we discuss some of the power and limitations of the low-level visual attention mechanisms that carry out massively parallel processing of the visual field. We store surprisingly little information internally in visual working memory, leaving us vulnerable to change blindness, the phenomenon where even very large changes are not noticed if we are attending to something else in our view (Simons, 2000). Moreover, vigilance is a highly limited resource: our ability to perform visual search tasks degrades quickly, with far worse results after several hours than in the first few minutes (Ware, 2000).


Display capacity is a third kind of limitation to consider. Visualization designers often “run out of pixels,” where the resolution of the screen is not large enough to show all desired information simultaneously. The information density of a particular frame is a measure of the amount of information encoded versus the amount of unused space. There is a tradeoff between the benefits of showing as much as possible at once, to minimize the need for navigation and exploration, and the costs of showing too much at once, where the user is overwhelmed by visual clutter.
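The text defines information density only informally. One crude, hypothetical way to put a number on a rendered frame is the fraction of pixels that differ from the background color. The sketch below assumes the frame is a 2D list of pixel values and treats anything not equal to `background` as encoding information, which ignores how much information each pixel actually carries.

```python
def information_density(frame, background=0):
    """Fraction of pixels in a 2D frame that differ from the background value.

    A crude proxy: it counts "ink", not how much information that ink carries.
    """
    total = sum(len(row) for row in frame)
    if total == 0:
        return 0.0
    used = sum(1 for row in frame for pixel in row if pixel != background)
    return used / total


sparse = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
dense = [[1, 2, 1, 3] for _ in range(4)]
print(information_density(sparse), information_density(dense))  # 0.125 1.0
```

A value near 1.0 means little unused space, which reduces the need for navigation but moves toward the clutter end of the tradeoff.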

27.2 Data Types

Many aspects of a visualization design are driven by the type of the data that we need to look at. For example, is it a table of numbers, a set of relations between items, inherently spatial data such as a location on the Earth’s surface, or a collection of documents?

We start by considering a table of data. We call the rows items of data and the columns dimensions, also known as attributes. For example, the rows might represent people, and the columns might be names, age, height, shirt size, and favorite fruit.

We distinguish between three types of dimensions: quantitative, ordered, and categorical. Quantitative data, such as age or height, is numerical and we can do arithmetic on it. For example, the quantity of 68 inches minus 42 inches is 26 inches. With ordered data, such as shirt size, we cannot do full-fledged arithmetic, but there is a well-defined ordering. For example, Large minus Medium is not a meaningful concept, but we know that Medium falls between Small and Large. Categorical data, such as favorite fruit or names, does not have an implicit ordering. We can only distinguish whether two things are the same (apples) or different (apples vs. bananas).
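The sketch below encodes the three dimension types for the example people table as a small Python schema and allows only the meaningful operations: subtraction for quantitative dimensions, comparison for quantitative and ordered ones, and equality for everything. The schema, the `ORDERINGS` table, and the function names are hypothetical illustrations, not part of the text.

```python
from enum import Enum


class DimType(Enum):
    QUANTITATIVE = "quantitative"
    ORDERED = "ordered"
    CATEGORICAL = "categorical"


# Hypothetical schema for the people table: one type per column (dimension).
SCHEMA = {
    "name": DimType.CATEGORICAL,
    "age": DimType.QUANTITATIVE,
    "height_in": DimType.QUANTITATIVE,
    "shirt_size": DimType.ORDERED,
    "fruit": DimType.CATEGORICAL,
}
# Ordered dimensions carry an explicit ordering but support no arithmetic.
ORDERINGS = {"shirt_size": ["Small", "Medium", "Large"]}


def difference(dim, a, b):
    """Arithmetic is meaningful only for quantitative dimensions."""
    if SCHEMA[dim] is not DimType.QUANTITATIVE:
        raise TypeError(f"{dim} is {SCHEMA[dim].value}; subtraction is not meaningful")
    return a - b


def precedes(dim, a, b):
    """Ordering is meaningful for quantitative and ordered dimensions only."""
    kind = SCHEMA[dim]
    if kind is DimType.QUANTITATIVE:
        return a < b
    if kind is DimType.ORDERED:
        rank = ORDERINGS[dim].index
        return rank(a) < rank(b)
    raise TypeError(f"{dim} is categorical; it has no implicit ordering")


def same(a, b):
    """Equality is the one operation every dimension type supports."""
    return a == b


print(difference("height_in", 68, 42))  # 26
print(precedes("shirt_size", "Small", "Medium") and precedes("shirt_size", "Medium", "Large"))  # True
print(same("apples", "bananas"))  # False
```

Calling `difference("shirt_size", "Large", "Medium")` or `precedes("fruit", "apples", "bananas")` raises `TypeError`, mirroring the text's point that those operations have no meaning.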

Relational data, or graphs, are another data type where nodes are connected by links. One specific kind of graph is a tree, which is typically used for hierarchical data. Both nodes and edges can have associated attributes. The word graph is unfortunately overloaded in visualization. The node-link graphs we discuss here, following the terminology of graph drawing and graph theory, could also be called networks. In the field of statistical graphics, graph is often used for chart, as in the line charts for time-series data shown in Figure 27.10.
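To illustrate relational data with attributes, the sketch below stores a small hierarchy, borrowed from this chapter's own section numbering, as nodes carrying a `title` attribute plus an edge list. It then checks whether the graph is a tree, using the standard characterization that an undirected graph with |V| nodes is a tree exactly when it is connected and has |V| - 1 edges. The names are illustrative; edge attributes could be stored the same way, for example in a dictionary keyed by node pair.

```python
# Nodes with an attribute (title); each edge links a section to its parent.
nodes = {
    "27": {"title": "Visualization"},
    "27.1": {"title": "Background"},
    "27.1.1": {"title": "History"},
    "27.1.2": {"title": "Resource Limitations"},
    "27.2": {"title": "Data Types"},
    "27.2.1": {"title": "Dimension and Item Count"},
    "27.2.2": {"title": "Data Transformation and Derived Dimensions"},
}
edges = [
    ("27", "27.1"),
    ("27.1", "27.1.1"),
    ("27.1", "27.1.2"),
    ("27", "27.2"),
    ("27.2", "27.2.1"),
    ("27.2", "27.2.2"),
]


def is_tree(node_ids, edge_list):
    """An undirected graph is a tree iff it is connected and has |V| - 1 edges."""
    node_ids = list(node_ids)
    if not node_ids or len(edge_list) != len(node_ids) - 1:
        return False
    adj = {n: set() for n in node_ids}
    for a, b in edge_list:
        adj[a].add(b)
        adj[b].add(a)
    seen = {node_ids[0]}
    stack = [node_ids[0]]
    while stack:  # depth-first traversal from an arbitrary start node
        for nbr in adj[stack.pop()]:
            if nbr not in seen:
                seen.add(nbr)
                stack.append(nbr)
    return len(seen) == len(node_ids)


print(is_tree(nodes, edges))  # True
print(is_tree(nodes, edges + [("27.1.1", "27.1.2")]))  # False: the extra link creates a cycle
```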

Some data is inherently spatial, such as geographic location or a field of measurements at positions in three-dimensional space, as in the MRI or CT scans used by doctors to see the internal structure of a person’s body. The information associated with each point in space may be an unordered set of scalar quantities, or indexed vectors, or tensors. In contrast, non-spatial data can be visually encoded using spatial position, but that encoding is chosen by the designer rather than given implicitly in the semantics of the dataset itself. This choice is one of the most central and difficult problems of visualization design.
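For spatial data such as a CT or MRI volume, position comes from the data itself: each sample sits at grid indices (i, j, k). The hypothetical `Volume` class below stores one value per grid point in a flat list using a common x-fastest layout. The per-point value can be a scalar, a vector (a tuple), or a tensor (a nested tuple), matching the possibilities the text lists. The class and layout are assumptions for illustration, not a specific file format.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    """Regular 3D grid storing one value per sample point (scalar, vector, or tensor)."""
    nx: int
    ny: int
    nz: int
    values: list = field(default_factory=list)

    def __post_init__(self):
        if not self.values:
            self.values = [0.0] * (self.nx * self.ny * self.nz)
        if len(self.values) != self.nx * self.ny * self.nz:
            raise ValueError("values must have nx * ny * nz entries")

    def index(self, i, j, k):
        # x varies fastest, then y, then z.
        return i + self.nx * (j + self.ny * k)

    def __getitem__(self, ijk):
        return self.values[self.index(*ijk)]

    def __setitem__(self, ijk, value):
        self.values[self.index(*ijk)] = value


density = Volume(4, 3, 2)  # scalar field, e.g. CT intensity
density[3, 2, 1] = 0.8
flow = Volume(2, 2, 2, [(0.0, 0.0, 0.0)] * 8)  # vector field: one 3-vector per point
flow[1, 0, 1] = (0.1, 0.0, -0.2)
print(density.index(3, 2, 1), density[3, 2, 1], flow[1, 0, 1])  # 23 0.8 (0.1, 0.0, -0.2)
```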

27.2.1 Dimension and Item Count

The number of data dimensions that need to be visually encoded is one of the most fundamental aspects of the visualization design problem. Techniques that work for a low-dimensional dataset with a few columns will often fail for very high-dimensional datasets with dozens or hundreds of columns. A data dimension may have hierarchical structure, for example with a time-series dataset where there are interesting patterns at multiple temporal scales.
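A time dimension is a common example of hierarchical structure: the same samples can be grouped by day, by month, and so on, and each grouping can reveal different patterns. The sketch below uses made-up readings and hypothetical helper names to average one series at two temporal scales with only the standard library.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean


def aggregate(samples, key):
    """Average (timestamp, value) samples within each group defined by key(timestamp)."""
    groups = defaultdict(list)
    for t, v in samples:
        groups[key(t)].append(v)
    return {k: fmean(vs) for k, vs in sorted(groups.items())}


def by_day(t):
    return t.date()


def by_month(t):
    return (t.year, t.month)


# Hypothetical readings: two per day over three days spanning a month boundary.
samples = [
    (datetime(2024, 1, 31, 6), 1.0),
    (datetime(2024, 1, 31, 18), 3.0),
    (datetime(2024, 2, 1, 6), 2.0),
    (datetime(2024, 2, 1, 18), 4.0),
    (datetime(2024, 2, 2, 6), 6.0),
    (datetime(2024, 2, 2, 18), 8.0),
]
daily = aggregate(samples, by_day)
monthly = aggregate(samples, by_month)
print(list(daily.values()))  # [2.0, 3.0, 7.0]
print(monthly)  # {(2024, 1): 2.0, (2024, 2): 5.0}
```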

The number of data items is also important: a visualization that performs well for a few hundred items often does not scale to millions of items. In some cases the difficulty is purely algorithmic, where a computation would take too long; in others it is a deeper perceptual problem that even an instantaneous algorithm could not solve, because visual clutter makes the representation unusable by a person. The range of possible values within a dimension may also be relevant.

27.2.2 Data Transformation and Derived Dimensions

Data is often transformed from one type to another as part of a visualization pipeline for solving the domain problem. For example, an original data dimension might be made up of quantitative data: floating-point numbers that represent temperature. For some tasks, like finding anomalies in local weather patterns, the raw data might be used directly. For another task, like deciding whether water is an appropriate temperature for a shower, the data might be transformed into an ordered dimension: hot, warm, or cold. In this transformation, most of the detail is aggregated away. In a third example, when making toast, an even more lossy transformation into a categorical dimension might suffice: burned or not burned.
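The temperature examples amount to binning: choosing cut points and collapsing each interval to a label. The Python sketch below uses `bisect.bisect_right` from the standard library. The function names and every threshold are hypothetical, since the text gives no numbers. The same routine yields the three-level ordered dimension for the shower and the two-level categorical one for the toast; only the number of bins changes, and with it how much detail is aggregated away.

```python
from bisect import bisect_right


def bin_value(value, thresholds, labels):
    """Map a quantitative value to one of len(thresholds) + 1 labels.

    thresholds must be sorted; a value equal to a threshold falls in the upper bin.
    """
    if len(labels) != len(thresholds) + 1:
        raise ValueError("need exactly one more label than thresholds")
    return labels[bisect_right(thresholds, value)]


# Hypothetical cut points in degrees Celsius; the text gives no numbers.
def shower_feel(temp_c):
    return bin_value(temp_c, [30.0, 42.0], ["cold", "warm", "hot"])  # ordered


def toast_state(surface_temp_c):
    return bin_value(surface_temp_c, [200.0], ["not burned", "burned"])  # categorical


readings = [18.5, 37.0, 45.2]
print([shower_feel(t) for t in readings])  # ['cold', 'warm', 'hot']
print(toast_state(150.0), toast_state(230.0))  # not burned burned
```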

The principle of transforming data into derived dimensions, rather than simply visually encoding the data in its original form, is a powerful idea. In Figure 27.10, the original data was an ordered collection of time-series curves. The transformation was to cluster the data, reducing the amount of information to visually encode to a few highly meaningful curves.
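The text does not say which clustering method produced Figure 27.10. As one hedged illustration, the sketch below applies plain k-means (Lloyd's algorithm, a swapped-in standard technique) to time-series curves sampled at the same time points. It treats each curve as a vector and uses squared Euclidean distance as dissimilarity. Each returned centroid is a representative curve that could be drawn in place of its members. Seeding from the first k curves is a simplification for reproducibility; practical implementations usually seed randomly or with k-means++.

```python
from statistics import fmean


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_curves(curves, k, max_iterations=100):
    """Cluster equal-length curves with Lloyd's k-means; return (centroids, labels)."""
    centroids = [list(c) for c in curves[:k]]  # naive, deterministic seeding
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each curve joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(c, centroids[j]))
            for c in curves
        ]
        # Update step: each centroid becomes the pointwise mean of its members.
        new_centroids = []
        for j in range(k):
            members = [c for c, lab in zip(curves, labels) if lab == j]
            if members:
                new_centroids.append([fmean(vals) for vals in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments will not change
        centroids = new_centroids
    return centroids, labels


# Hypothetical curves, each sampled at the same four time points.
curves = [
    [0, 1, 2, 3],  # rising
    [3, 2, 1, 0],  # falling
    [0, 2, 2, 4],  # rising
    [4, 2, 1, 0],  # falling
    [1, 1, 3, 3],  # rising
]
centroids, labels = kmeans_curves(curves, 2)
print(labels)  # [0, 1, 0, 1, 0]
print(centroids[1])  # [3.5, 2.0, 1.0, 0.0]
```

Five input curves reduce to two representative curves, one rising and one falling, which is the kind of reduction the text describes.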
