Table of Contents
Avoiding Data Pitfalls
by Ben Jones
Cover
Preface
Chapter One: The Seven Types of Data Pitfalls
Seven Types of Data Pitfalls
Avoiding the Seven Pitfalls
“I've Fallen and I Can't Get Up”
Notes
Chapter Two: Pitfall 1: Epistemic Errors
How We Think About Data
Avoiding the Swan Pitfall and the God Pitfall
Notes
Chapter Three: Pitfall 2: Technical Trespasses
How We Process Data
Notes
Chapter Four: Pitfall 3: Mathematical Miscues
How We Calculate Data
Notes
Chapter Five: Pitfall 4: Statistical Slipups
How We Compare Data
Notes
Chapter Six: Pitfall 5: Analytical Aberrations
How We Analyze Data
Notes
Chapter Seven: Pitfall 6: Graphical Gaffes
How We Visualize Data
Notes
Chapter Eight: Pitfall 7: Design Dangers
How We Dress up Data
Notes
Chapter Nine: Conclusion
Avoiding Data Pitfalls Checklist
The Pitfall of the Unheard Voice
Notes
Index
End User License Agreement
List of Illustrations
Chapter 1
FIGURE 1.1 An ominous warning sign of a pitfall on the path to Coal Creek Fa...
Chapter 2
FIGURE 2.1 Meteorite strikes by Ramon Martinez.
FIGURE 2.2 A timeline of recorded meteorite falls, 2,500 BCE–2012
FIGURE 2.3 A line plot of earthquakes of magnitude of 6.0 and greater.
FIGURE 2.4 Breaking out worldwide earthquakes by magnitude group.
FIGURE 2.5 The Fremont Bridge as seen from the Aurora Bridge in Seattle, Was...
FIGURE 2.6 A time series of the counts of bicycles crossing the Fremont brid...
FIGURE 2.7 Cumulative timeline of Ebola fatality count in West Africa, 2014.
FIGURE 2.8 World Health Organization classification table of Ebola cases.
FIGURE 2.9 Example of rounding in human-keyed data.
FIGURE 2.10 Geometric regularity.
FIGURE 2.11 Diapers changed by minute of timestamp.
FIGURE 2.12 A histogram of NBA player weights, with bin size of 10 pounds.
FIGURE 2.13 An adjusted histogram of NBA player weights, this time with bin ...
FIGURE 2.14 A histogram of weights of North American football players.
FIGURE 2.15 Histogram of weights of 283 players in the 2018 NFL Combine.
FIGURE 2.16 Histogram of the number (and %) of players having weights ending...
FIGURE 2.17 Social media poll.
FIGURE 2.18 Bananas in various stages of ripeness.
FIGURE 2.19 Results of the banana ripeness assessment.
FIGURE 2.20 Respondents' changes in ripeness ratings.
FIGURE 2.21 Ratings of photo 2 vs ratings of photo 10.
FIGURE 2.22 The ninth banana photograph shown.
FIGURE 2.23 A black swan I photographed on a recent trip to Maui.
FIGURE 2.24 Fremont Bridge bike counter measurements.
Chapter 3
FIGURE 3.1 Baltimore City Department of Transportation tow records.
FIGURE 3.2 The raw vehicle year data visualized in a histogram.
FIGURE 3.3 The adjusted vehicle year data visualized in a histogram.
FIGURE 3.4 Outliers in the adjusted vehicle year data.
FIGURE 3.5 This word cloud gives a general sense of which vehicle makes were...
FIGURE 3.6 Clustering of vehicle make names in OpenRefine.
FIGURE 3.7 Imperfections in the recommendations of the clustering algorithm.
FIGURE 3.8 The 36 different ways Volkswagen was spelled in the data set.
FIGURE 3.9 Editing the data values one by one that the clustering algorithm ...
FIGURE 3.10 Before and after: analysis of vehicle make frequencies before cl...
FIGURE 3.11 A treemap of vehicle colors based on towing records to the yards...
FIGURE 3.12 A treemap of records of towed vehicles to the Pulaski tow yard o...
FIGURE 3.13 A calculated field in Tableau to correct for a few known discrep...
FIGURE 3.14 The final treemap of known, non-null vehicle colors towed to the...
FIGURE 3.15 Cleaning car colors further.
FIGURE 3.16 The treemap resulting from our second cleaning pass using Tablea...
FIGURE 3.17 A world map of 2016 Pageviews of Allison's company website.
FIGURE 3.18 An overview of the number of distinct country names in three dif...
FIGURE 3.19 A Venn diagram showing overlap between Google Analytics and Worl...
FIGURE 3.20 Comparison of pageviews per thousand analysis before and after c...
FIGURE 3.21 A Venn diagram showing overlap between Google Analytics and the ...
FIGURE 3.22 A Venn diagram comparing all three country name lists.
Chapter 4
FIGURE 4.1 Count of reported wildlife strikes by aircraft in the U.S., by ye...
FIGURE 4.2 Visualizing reported collisions by various levels of data aggrega...
FIGURE 4.3 An imagined circumnavigation of the New Zealand island pair.
FIGURE 4.4 Wildlife strikes by month.
FIGURE 4.5 Wildlife strikes by month, with added yearly bar segments.
FIGURE 4.6 Wildlife strikes by month, with yearly segments, 2017 data exclud...
FIGURE 4.7 A timeline of the complete works of Edgar Allan Poe by year publi...
FIGURE 4.8 A timeline of Poe's works with years plotted continuously on the ...
FIGURE 4.9 A timeline of Poe's works showing missing years at the default va...
FIGURE 4.10 Poe's works depicted as columns, with missing years shown.
FIGURE 4.11 Reported infectious diseases.
FIGURE 4.12 First 10 entries in the data set.
FIGURE 4.13 Reported infectious diseases.
FIGURE 4.14 Choropleth map for tuberculosis infections, 2015.
FIGURE 4.15 Geographic roles.
FIGURE 4.16 Percent of urban population in 2016, all countries included.
FIGURE 4.17 Percent of urban population in 2016, all countries included.
FIGURE 4.18 Percent of urban population in 2016, null values excluded.
FIGURE 4.19 Table showing percent of urban population for countries in North...
FIGURE 4.20 Computing the regional percent using the arithmetic average, or ...
FIGURE 4.21 Percent of urban population for each country shown as a quotient...
FIGURE 4.22 Table with both percent of urban population and total population...
FIGURE 4.23 Table showing percent of urban population, total population, and...
FIGURE 4.24 Scatterplot of North American countries.
FIGURE 4.25 Slopegraph showing difference between incorrect (left) and corre...
FIGURE 4.26 Artist's rendering of the Mars Climate Orbiter.
Chapter 5
FIGURE 5.1 Guess which distribution shows age, weight, salary, height, jerse...
FIGURE 5.2 Match the histogram letter with the player variable type.
FIGURE 5.3 The distribution matching game answer key.
FIGURE 5.4 Histogram of American Football player jersey numbers, bin size = ...
FIGURE 5.5 Smaller bin size.
FIGURE 5.6 Player height.
FIGURE 5.7 The standard normal distribution.
FIGURE 5.8 Calculating the distance in standard deviations from the mean to ...
FIGURE 5.9 Player age.
FIGURE 5.10 Name length in characters.
FIGURE 5.11 Player weight.
FIGURE 5.12 Player weight by rough position group.
FIGURE 5.13 Salary cap hit.
FIGURE 5.14 Cumulative salary.
FIGURE 5.15 Cumulative heights of players.
FIGURE 5.16 Mislabeled fish, misleading bars.
FIGURE 5.17 Fish mislabeling by city and outlet.
FIGURE 5.18 A fishy sample platter.
FIGURE 5.19 Mislabeled fish, with error bars.
FIGURE 5.20 Interactive dashboard.
FIGURE 5.21 The traffic sign welcoming drivers to Lost Springs, Wyoming.
Chapter 6
FIGURE 6.1 Albert Einstein.
FIGURE 6.2 Facial expressions.
FIGURE 6.3 Two versions of the same hockey player scatterplot.
FIGURE 6.4 Quote from Garry Kasparov's book How Life Imitates Chess.
FIGURE 6.5 A tale of two Koreas: life expectancy in North and South Korea, 1...
FIGURE 6.6 A tale of two Koreas: life expectancy in North and South Korea, 1...
FIGURE 6.7 Linear extrapolation from 1975.
FIGURE 6.8 Life expectancy of people born in China, 1960–1972.
FIGURE 6.9 Slopegraph of increase in life expectancy.
FIGURE 6.10 Timeline of change in life expectancy.
FIGURE 6.11 Unemployment forecasts by the Office of Management and Budget of...
FIGURE 6.12 Unemployment forecasts.
FIGURE 6.13 Average speed versus player impact estimate.
Chapter 7
FIGURE 7.1 Relative frequencies of letters in English text.
FIGURE 7.2 A recreation of the data visualizations my fitness network site p...
FIGURE 7.3 My extended analysis of my 2018 trip stats.
FIGURE 7.4 Reported cases of narcotics crimes in Orlando for 41 weeks, June ...
FIGURE 7.5 Weekly reported cases of narcotics crimes in Orlando, 2010–2017.
FIGURE 7.6 A line chart showing 24 different categories of reported crimes i...
FIGURE 7.7 Focusing our audience's attention on the shoplifting timeline.
FIGURE 7.8 Four chart types that show number of reported crimes by category.
FIGURE 7.9 A simplified color palette to focus on the top three categories.
FIGURE 7.10 Seven ways to compare reported cases of assault and narcotics cr...
FIGURE 7.11 Adding data labels and gridlines to afford greater precision of ...
FIGURE 7.12 A pie chart and a treemap that convey that theft accounts for ha...
FIGURE 7.13 A timeline showing the change in number of reported cases of cri...
FIGURE 7.14 Breakdown of reported crime by category for 2017.
FIGURE 7.15 Breakdown of reported crime by category for 2010.
FIGURE 7.16 An individuals control chart showing signals in the time series ...
FIGURE 7.17 Two approaches to selecting solutions.
FIGURE 7.18 Four ways to show the 100 most common passwords.
FIGURE 7.19 An example of how to determine which factors are important.
Chapter 8
FIGURE 8.1 A Boston Marathon dashboard that uses the same color for differen...
FIGURE 8.2 A dashboard with two different sequential color palettes using th...
FIGURE 8.3 A version of the dashboard using only one sequential color encodi...
FIGURE 8.4 A company sales dashboard using a fabricated store data set.
FIGURE 8.5 A redesigned version of the dashboard that limits use to a single...
FIGURE 8.6 Poe's works displayed as columns with missing years shown.
FIGURE 8.7 A modified version of the Poe chart that adds aesthetic elements.
FIGURE 8.8 A “bumps chart” showing ranking of skills over time.
FIGURE 8.9 A modified version of the dashboard that adds images to further e...
FIGURE 8.10 A common stovetop design, with no natural mapping between burner...
FIGURE 8.11 An example of default versus natural mapping of filters on a dat...
FIGURE 8.12 A dashboard of bicycle stands in Dublin that uses an outrageous ...
FIGURE 8.13 Research results showing error involved with different encoding ...