Unless you define your data inside your script (as in Example 5-1, for example) or have your script generate it (as in Examples 4-6 and 4-7), you will have to get the data into your script somehow. This involves two separate steps: fetching the data from its location (which may be the local filesystem, a remote server, or another resource, such as a web service) and parsing it into a usable data structure. If you want to create text labels from data, you will need to do the opposite and format data for textual output. This chapter describes the facilities D3 offers to help with these tasks. Discussions of file formats are usually a drag—I’ll try to make it brief.
The JavaScript Fetch API is a modern replacement for the venerable XMLHttpRequest object, the technology that first enabled web pages to exchange data with servers asynchronously, thus giving rise to AJAX and the entire contemporary, “dynamic” web experience. D3 wraps this Fetch API, replicates some of its methods, and adds functionality that is convenient when working with web pages or tabular data. Some artifacts of the underlying API remain visible through D3; for this reason, it is frequently worth consulting the Fetch API reference as well.
Table 6-1 lists all functions provided by D3 that can retrieve (“fetch”) a resource (such as a file) from a URL. Of course the resource need not be a file—it can be anything, as long as it is describable by a URL (like a server that generates data on demand).
- All functions take a specification of the desired resource as a string containing a URL.
- All functions return a Promise object (see “JavaScript Promises”).
- All functions take an optional RequestInit object. The elements permitted in this object and their values are defined by the Fetch standard; they control various aspects of the remote communication, such as permission and caching issues. Some of these may be relevant even in relatively simple applications; we will discuss some of them toward the end of this section.
- The convenience functions that parse tabular data may also take a conversion function that will be applied to the data as it is read. (We will discuss conversion functions in the next section.)
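Function | Description |
---|---|
d3.text(url, init) | Fetches the specified resource and treats it as a UTF-8 decoded string. (a) |
d3.json(url, init) | Fetches the specified resource and parses it as JSON into an object. (a) |
d3.dsv(delimiter, url, init, converter) | Takes a delimiter (such as ",") and a URL as arguments. Fetches the specified resource, which must include a descriptive header line, and parses it as delimiter-separated values; the result will be an array of objects. An optional conversion function may be specified as the last parameter. |
d3.csv(url, init, converter) | Fetches the specified resource, which must include a descriptive header line, and parses it as comma-separated values; the result will be an array of objects. An optional conversion function may be specified as the last parameter. |
d3.tsv(url, init, converter) | Fetches the specified resource, which must include a descriptive header line, and parses it as tab-separated values; the result will be an array of objects. An optional conversion function may be specified as the last parameter. |
d3.html(url, init) | Fetches the specified resource and parses it into an HTMLDocument. |
d3.svg(url, init) | Fetches the specified resource and parses it into an SVGDocument. |
d3.xml(url, init) | Fetches the specified resource and parses it into a Document. |
d3.image(url, init) | Fetches the specified resource and parses it into an HTMLImageElement. |
d3.blob(url, init) | Fetches the specified resource and treats it as a Blob. |
d3.buffer(url, init) | Fetches the specified resource and treats it as an ArrayBuffer. |

(a) This function replicates part of the Fetch API.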
Let’s consider a few examples. Assuming that you have a simple JSON file, like this one:
    {
        "val": 5,
        "txt": "Hello"
    }
then you can read it (and access its properties) like so:
    d3.json("simple.json").then(res => console.log(res.val, res.txt));
JSON parsers are picky; make sure your JSON is correctly formed! (In particular, JSON property keys must be double-quoted strings, in marked contrast to the syntax for JavaScript object initializers.)
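The difference is easy to check with the built-in JSON.parse(), which follows the same JSON grammar. The following is a minimal sketch; the helper name isValidJson is made up for this illustration.

```javascript
// Hypothetical helper: reports whether a string is well-formed JSON.
function isValidJson(text) {
    try {
        JSON.parse(text);
        return true;
    } catch (err) {
        return false;   // JSON.parse throws a SyntaxError on malformed input
    }
}

const strict = '{ "val": 5, "txt": "Hello" }';   // keys in double quotes
const relaxed = "{ val: 5, txt: 'Hello' }";      // valid object initializer, invalid JSON

console.log(isValidJson(strict));    // true
console.log(isValidJson(relaxed));   // false
```

The second string is perfectly legal as a JavaScript object initializer, which is exactly why this mistake is so easy to make.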
It is also easy to fetch a bitmap image and attach it to the document (or page):
    d3.image("image.jpg").then(function(res) {
        d3.select("#figure").append(() => res);
    });
This code assumes that the page contains a placeholder with an appropriate id attribute (for example, <div id="figure">...</div>).
Note the argument to the append() function: it is a function, taking no arguments, and returning the result of the fetch! The reason for this roundabout route is that append() can handle either a string or a function returning a node, but not a node by itself. The result of the fetch is a node, hence it is necessary to “wrap it into a function” like this.
Finally, let’s assume that you have an SVG file, for example, containing the definition of a symbol that you would like to reuse:

    <svg xmlns="http://www.w3.org/2000/svg"
         xmlns:svg="http://www.w3.org/2000/svg">
      <defs>
        <g id="heart">
          <path d="M0 -3 A3 6.6 -35 1 0 0 6 A3 6.6 35 1 0 0 -3Z" />
        </g>
      </defs>
    </svg>
You can insert the <defs> section into the current document and use the defined symbol like this:
    d3.svg("heart.svg").then(function(res) {
        d3.select("svg").insert(() => res.firstChild, ":first-child");

        d3.select("svg").append("use").attr("href", "#heart")
            .attr("transform", "translate(100,100) scale(2)");
    });
Again, note how the result of the fetch is “wrapped into a function.” In this snippet, res is a DOM SVGDocument instance (not a D3 data type), hence you must use the native DOM API to extract the content to insert. Here, res.firstChild is the first and only child of this SVG document: its root <svg> element, which in turn contains the <defs> section.
The external SVG file in this example contains declarations for XML namespaces. We have not worried about XML namespaces so far (because D3 for the most part takes care of them for us). Here, they are required, because otherwise the SVG parser will not work correctly.
Most of the time, the functions in Table 6-1 just work without any further tweaking. But the simplicity and convenience of the API hide underlying complexity (and sometimes obstruct proper diagnosis if something goes wrong). The means to control aspects of the remote communication is the RequestInit object that all functions in Table 6-1 accept as an optional parameter.2
Browsers may cache resources fetched from a remote location, with the consequence that changes to the remote resource will not become visible in the browser. In particular during development, this can become a major nuisance! A simple way to prevent all browser caching of a remote resource is to set the cache property to no-store:
    d3.svg("heart.svg", { cache: "no-store" }).then(...);
While good practice during development, preventing browser-side caching is wasteful when used in production. The details of resource caching can be complex; you may want to check the appropriate reference.3
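One way to keep this setting out of production code is to build the RequestInit object in a single place. This is only a sketch; the function name fetchOptions and the isDevelopment flag are assumptions, not part of D3 or the Fetch API.

```javascript
// Hypothetical helper: returns a RequestInit object for the D3 fetch functions.
// The isDevelopment flag is an assumption; set it however your build
// distinguishes development from production.
function fetchOptions(isDevelopment) {
    // "no-store" bypasses the browser cache entirely; an empty object
    // leaves the browser's default caching behavior in place.
    return isDevelopment ? { cache: "no-store" } : {};
}

const devInit = fetchOptions(true);
const prodInit = fetchOptions(false);

console.log(devInit);    // { cache: 'no-store' }
console.log(prodInit);   // {}

// Intended use (requires D3 and a browser):
//   d3.svg("heart.svg", fetchOptions(true))
```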
When attempting to load resources using the Fetch API from third-party websites, you may occasionally encounter strange failures and permission issues. The browser refuses to complete the request when made from within the JavaScript runtime—even though the resource may be readily available using a command-line tool or even by pointing the browser itself to the URL. The cause is likely to be the browser’s same-origin policy and the cross-origin resource sharing (CORS) mechanism, which limit JavaScript access to third-party resources.4
The CORS protocol is strange in the way it splits responsibility between browser and server (under certain conditions, the server will send the requested resource to the browser, but the browser will refuse to make it accessible to its own JavaScript runtime!), and browsers differ in the CORS policy they implement. The mode property of the RequestInit object lets you state how a cross-origin request should be handled.5
CORS relies on an interplay between browser and server. In particular, the server must be configured to send the appropriate header information. If it isn’t, there is nothing you can do. You will have to download the required resource separately and serve it from your own server, or access the resource through a proxy server.
Occasionally, it is necessary to write a file—for example, to save a graph as an SVG file. That is not easy, though, because the local filesystem is not accessible from within JavaScript. It is, however, possible to upload a file (or any data, for that matter) to a server. Example 6-1 shows how this can be accomplished. (A very similar function was used to capture the graphs for this book.)
    function upload() {
        var out = new FormData();                                      // (1)

        d3.selectAll("svg").each(function() {                          // (2)
            var id = d3.select(this).attr("id");                       // (3)
            if (id) {
                out.set("filename", id);
                out.set("data", this.outerHTML);                       // (4)

                d3.text("http://localhost:8080/upload",
                        { method: "POST", body: out })                 // (5)
                    .then(function(r) { console.log("Succ:", id) },    // (6)
                          function(r) { console.log("FAIL:", id) });
            }
        });
    }

1. Create a FormData object as the container for the data to upload.
2. Invoke the following anonymous function for every SVG element in the page.
3. Grab the value of the element’s id attribute; it will later become the filename. If no id attribute exists, skip the upload.
4. The outerHTML of a page element is the element’s content together with the tags that make up the element itself. In this case, this constitutes the <svg> tag itself and all of its children. (The innerHTML is just the contents without the enclosing tags.)
5. Upload the FormData element, using the HTTP POST method, to a suitable server. Notice how the RequestInit object is used to hold the payload and method specification.
6. Print a confirmation message to the browser console.
Of course, all this assumes that a server is listening at the specified URL that can handle the uploaded data and do something useful with it (for example, save it to disk).
D3 provides functions to parse (and write) strings containing delimiter-separated data (see Tables 6-2 and 6-3). They are primarily intended for files of the "text/csv" MIME type (as laid down in RFC 4180), commonly used by spreadsheet programs. Some notes on parsing more general file formats follow in the next subsection.
The library supports two different styles to represent a data set:

- The functions parse() and format() treat each record as an object.
  - Names of the object properties are taken from the file’s first (or header) line, which must be present.
  - The data set is returned as an array of objects.
- The functions parseRows() and formatRows() treat each record as an array (of columns).
  - The entire file, including its first line, is assumed to contain data.
  - The data set is returned as an array of arrays.

Use parseRows() if the input file does not contain a header line with metadata.
The Array object returned by parse() provides an additional member variable columns, which contains a list of the original column names in the order of the input file.
Function | Description |
---|---|
d3.dsvFormat(delim) | Returns a parser-formatter instance. The mandatory argument specifies the delimiter to use; it should be a single character. |
parser.parse(string, converter) | Parses the input string and returns an array of objects. The first record of the input is expected to contain column names that will be used as property keys within the created objects. If supplied, the optional conversion function will be invoked on each record after it has been split into fields; it should return an object. |
parser.parseRows(string, converter) | Parses the input string and returns an array of arrays. If supplied, the optional conversion function will be invoked on each record after it has been split into fields; it should return an array. |
parser.format(data, columns) | Takes an array of objects and returns a delimiter-separated string. The ordered array of property names to be included in the output is optional; if it is omitted, all properties are included (in arbitrary order). |
parser.formatRows(data) | Takes an array of arrays and returns a delimiter-separated string. |
Whether you use parse() or parseRows(), the field values are strings; values are not automatically converted to numbers. This does cause problems occasionally when other parts of the program require numerical input; for this reason it is good practice to always convert input to numbers explicitly.
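The kind of problem meant here is easy to reproduce in plain JavaScript. The sketch below uses a made-up record of the shape that a parser would deliver; no D3 is required to see the effect.

```javascript
// A record as a DSV parser would deliver it: every field value is a string.
const row = { count: "10", limit: "9" };

// Surprise 1: "+" concatenates when one operand is a string.
const wrongSum = row.count + 5;              // "105"

// Surprise 2: two strings are compared character by character.
const wrongOrder = row.count < row.limit;    // true, because "1" sorts before "9"

// Explicit conversion with the unary plus operator yields numbers.
const count = +row.count;
const limit = +row.limit;
const rightSum = count + 5;                  // 15
const rightOrder = count < limit;            // false

console.log(wrongSum, wrongOrder, rightSum, rightOrder);
```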
You can supply an optional second argument to either parse() or parseRows() to perform such conversions or other desired clean-ups. This function will be invoked for each input line after the line has been split into fields and turned into an object or array. It is therefore not intended to parse each row into fields but to apply conversions to the individual field values.

The conversion function will be passed three arguments:

- The field values in the current row (as object or array)
- The line number of the current row (starting at zero, not counting the header line)
- An array of column names (for parse() only)

The conversion function should return an object or array representing the current line (or null or undefined to skip the current line).
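A conversion function of this kind might look like the following sketch. The column names (Name, Weight) and the function name toRecord are hypothetical; the function is called by hand here, so that it can be tried without D3.

```javascript
// Hypothetical conversion function for parse(): d is the current record,
// i the line number, columns the array of column names.
function toRecord(d, i, columns) {
    const weight = +d.Weight;            // convert the string to a number
    if (d.Name === "" || Number.isNaN(weight)) {
        return null;                     // skip incomplete or malformed lines
    }
    return { line: i, name: d.Name, weight: weight };
}

// Records as they would arrive from the parser: all values are strings.
const columns = ["Name", "Weight"];
const input = [
    { Name: "Peter", Weight: "86.3" },
    { Name: "Paul",  Weight: "n/a" },    // not a number: will be skipped
    { Name: "Mary",  Weight: "61.0" }
];

// Mimic what the parser does with the return values: drop null and undefined.
const records = input
    .map((d, i) => toRecord(d, i, columns))
    .filter(r => r != null);

console.log(records);
```

With D3 loaded, the same function could be passed as the second argument to d3.csvParse().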
For data files containing only strings, numbers, and dates, you can use the built-in conversion function d3.autoType(), which converts entries that “look like” numbers or dates to the appropriate type. For more complicated situations, you have to write your own conversion function.
Consider the following CSV file:
    Year,Month,Name,Weight (kg)
    2005,1,Peter,86.3
    2007,7,Paul,72.5
The following code will turn it into an array of objects with lowercase member names and using appropriate data types:
    d3.text("csv.csv").then(function(res) {
        var data = d3.csvParse(res, (d, i, cs) => {
            return {
                date: new Date(d.Year, d.Month - 1),
                name: d.Name,
                weight: +d["Weight (kg)"]
            };
        });
        console.log(data);
    });
The preceding code snippet uses the convenience function d3.csvParse(), which assumes a comma-separated file. Because comma- and tab-separated files are so common, D3 provides a set of shorthands for them (see Table 6-3).
For arbitrary delimiters you must first instantiate a parser-formatter instance using d3.dsvFormat(delim), then invoke parse() or format() (or parseRows() and formatRows()) on this instance, while supplying the input string to parse (or the array to format into a string). The explicit delimiter argument is mandatory; it should be a single character. On the other hand, you can fetch and parse a resource in one fell swoop using d3.csv(). Here are three ways to achieve the same effect (data will always be an array of objects; use d3.csvParseRows() or parser.parseRows() otherwise):
    d3.csv("csv.csv").then(function(res) {
        var data = res;
    });

    d3.text("csv.csv").then(function(res) {
        var data = d3.csvParse(res);
    });

    d3.text("csv.csv").then(function(res) {
        var parser = d3.dsvFormat(",");
        var data = parser.parse(res);
    });
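Comma-delimited | Tab-delimited |
---|---|
d3.csvParse() | d3.tsvParse() |
d3.csvParseRows() | d3.tsvParseRows() |
d3.csvFormat() | d3.tsvFormat() |
d3.csvFormatRows() | d3.tsvFormatRows() |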
The functions format() and formatRows() implement the inverse operation: serializing a data structure into a string. The input must be an array of objects (for format()) or an array of arrays (for formatRows()). The format() function takes as an additional, optional argument a list of object property names to be included in the output; if this is omitted, a union of all property names found across the entire input is used. Fields in the created string are separated using the specified delimiter, records are separated using newlines (\n), and fields are quoted as necessary.
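To make “quoted as necessary” concrete, here is a rough sketch of the quoting rule described in RFC 4180. It is not D3's implementation, and the function names quoteField and formatRow are invented for this illustration.

```javascript
// Quote a single field if it contains the delimiter, a double quote,
// or a line break; embedded double quotes are doubled (RFC 4180).
function quoteField(value, delimiter) {
    const text = String(value);
    const needsQuotes = text.includes(delimiter) || /["\r\n]/.test(text);
    return needsQuotes ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Join the fields of one record with the delimiter.
function formatRow(fields, delimiter) {
    return fields.map(f => quoteField(f, delimiter)).join(delimiter);
}

const line = formatRow(["Peter", "Smith, Jr.", 'the "boss"', 86.3], ",");
console.log(line);   // Peter,"Smith, Jr.","the ""boss""",86.3
```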
Even if data files do not conform to the formats required by the methods just described, the infrastructure provided by them can still be useful. For example, consider the case of a data file with columns separated by whitespace: any combination of tabs and spaces. This is a situation calling for regular expressions, and the following snippet shows how to use them in conjunction with the overall framework:
    d3.text("txt.txt").then(function(res) {
        var parser = d3.dsvFormat("");                             // (1)
        var rows = parser.parseRows(res, function(d, i, cs) {      // (2)
            return d[0].split(/\s+/g)                              // (3)
                .map(x => +x);                                     // (4)
        });
        console.log(rows);
    });

1. Create a parser-formatter instance, selecting a delimiter character that you are certain does not occur in the input file. (The empty string seems to work, but the ASCII NUL character "\0", or any other character that you are sure won’t occur, present alternatives.)
2. Because the delimiter does not occur in the input, no separation into columns is performed (but the input is correctly broken into lines or records)…
3. … so that the array d of “column” values has only a single element. The split() function is invoked on it with a regular expression that matches any combination of whitespace characters.
4. Finally, all resulting field values are converted to numbers.
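For comparison, the same result can be obtained with string methods alone. The sketch below is an alternative to the D3-based snippet, not a description of what D3 does internally; the function name parseWhitespace is hypothetical. It trims each line first, since leading whitespace would otherwise produce an empty first field.

```javascript
// Split whitespace-separated text into an array of arrays of numbers.
function parseWhitespace(text) {
    return text
        .split(/\r?\n/)                    // break into lines
        .map(line => line.trim())          // drop leading and trailing whitespace
        .filter(line => line.length > 0)   // ignore blank lines
        .map(line => line.split(/\s+/).map(x => +x));
}

const sample = "1 2\t3\n  4.5   6 \t 7\n";
const parsed = parseWhitespace(sample);
console.log(parsed);   // [ [ 1, 2, 3 ], [ 4.5, 6, 7 ] ]
```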
JavaScript does not have built-in routines to convert arbitrary scalars into formatted strings, comparable to the family of printf() functions familiar from many other programming languages. D3 provides a remedy: a sophisticated formatting facility modeled after similar functionality in Python 3. This section describes how to convert numbers into human-readable strings; routines for formatting timestamps will be discussed in Chapter 10.
In full generality, the workflow to format a value involves three steps:
1. Obtain a locale object (or use the current “default locale”).
2. Use the locale object to obtain a formatter instance for the intended output format.
3. Apply the formatter to a numeric value to obtain the value’s formatted, human-readable string representation.
Of course you can bundle all three steps into a single statement without assigning the intermediate objects to individual variables. For example, using the default locale, you might simply say:
    var str = d3.format(".2f")(Math.PI);
There are two functions to obtain a locale object (see Table 6-4). Both require a locale definition as input. A locale definition specifies details such as the currency symbol, prevailing number formats, names of months and weekdays, and so on (see the D3 Reference Documentation for details). Locale definitions in a format suitable for D3 are available from https://unpkg.com, a repository for content packaged using the JavaScript package manager npm. The following snippet shows how to fetch and use a new locale definition. It prints the string 3,1416 to the console, following the German convention of using a comma (not a point) as a decimal indicator:
    d3.json("https://unpkg.com/d3-format/locale/de-DE.json").then(
        function(res) {
            var loc = d3.formatLocale(res);
            console.log(loc.format(".4f")(Math.PI));
        },
        function(err) {
            throw err;
        }
    );
Function | Description |
---|---|
d3.formatLocale(def) | Takes a locale definition and returns a locale object. |
d3.formatDefaultLocale(def) | Takes a locale definition and returns a locale object, but also sets the default locale to the supplied definition. |
A locale object serves as a factory for formatters. Once you have chosen a locale (either a specific one or the default locale), you use it to obtain a formatter instance by providing the intended output format as a string. The formatter is a function object, which takes a single number and returns a formatted string. (It is not possible to create a string containing multiple formatted values simultaneously, in contrast to printf().) All the factory functions that produce a formatter instance are listed in Table 6-5.
Function | Description |
---|---|
d3.format(fmt) | Returns a formatter instance for the current default locale using the format specification in the string fmt. |
locale.format(fmt) | Returns a formatter instance for the receiver locale using the format specification in the string fmt. |
d3.formatPrefix(fmt, scale) | Returns a formatter instance for the current default locale. Quantities will be expressed as multiples of the supplied “scale” argument, which should be an engineering power 10^±3, 10^±6, 10^±9, … The scale is represented through SI prefixes in the output string. |
locale.formatPrefix(fmt, scale) | Same as d3.formatPrefix(), but for the receiver locale. |
In addition to the usual formatter, there is also a special formatter that expresses all quantities in a fixed engineering unit. For instance, it will express all quantities in “kilos,” “millis,” or whichever scale you choose. (See the D3 Reference Documentation or the Wikipedia Metric Prefixes page for all available prefixes.) This behavior is not indicated through the supplied output format specifier; instead, you must use a special factory function to obtain a formatter with this behavior. For example:
    d3.formatPrefix(".4f", 1e-3)(Math.PI) === "3141.5927m"
    d3.formatPrefix(".4f", 1e3)(Math.PI) === "0.0031k"
A formatter object can be reused to format several values. This snippet turns an array of numbers into an array of formatted strings:
    var f = d3.format(".3f");
    [ Math.E, Math.PI ].map(f);
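The results of the fixed-prefix formatter shown earlier follow from simple arithmetic: the value is divided by the chosen scale, formatted, and the matching SI symbol is appended. The sketch below spells this out in plain JavaScript; it is not D3's implementation, and the function name withFixedPrefix is made up.

```javascript
// Express value as a multiple of scale, with a fixed number of decimals,
// followed by the SI symbol that belongs to that scale.
function withFixedPrefix(value, scale, symbol, decimals) {
    return (value / scale).toFixed(decimals) + symbol;
}

const milli = withFixedPrefix(Math.PI, 1e-3, "m", 4);
const kilo = withFixedPrefix(Math.PI, 1e3, "k", 4);

console.log(milli);   // 3141.5927m
console.log(kilo);    // 0.0031k
```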
The format specifier string can consist of up to nine different fields:
[[fill]align][sign][symbol][zero][width][,][.precision][type]
See Table 6-6 for the permissible values and their effects. Be aware that you don’t use % in the format specification (in contrast to what’s customary for printf()).
Field | Specifier | Description |
---|---|---|
fill | any character | Used for padding when aligning values |
align | > | Right-align value within available space (default if missing) |
align | < | Left-align value within available space |
align | ^ | Center value within available space |
align | = | Right-align value, left-align sign and symbol |
sign | - | Minus sign for negative values, nothing otherwise (default if missing) |
sign | + | Minus sign for negative values, plus sign otherwise |
sign | ( | Parentheses for negative values, nothing otherwise |
sign | blank | Minus sign for negative values, a blank space otherwise |
symbol | $ | Insert currency symbol per the locale definition |
symbol | # | Prefix binary, octal, or hexadecimal numbers by 0b, 0o, or 0x |
zero | 0 | When a 0 is present, the value is padded with zeros; this implicitly sets fill to 0 and align to = |
width | number | Defines the minimum field width. If the value does not exhaust the width, the value will be padded. If not present, the width will be determined by the content. |
, | , | When a comma is present, a grouping separator will be used. |
precision | number | Number of digits to the right of the decimal (for f and %) or number of significant digits (for g, r, s, and p) |
type | e | Exponential notation |
type | f | Floating point notation |
type | g | Decimal notation if the resulting string has fewer than precision significant digits, otherwise exponential notation |
type | r | Decimal notation, rounded to significant digits |
type | s | Decimal notation with an SI prefix, rounded to significant digits |
type | % | Multiply by 100, then decimal notation with a percent sign |
type | p | Multiply by 100, round to significant digits, then decimal notation with a percent sign |
type | b | Binary notation, rounded to integer |
type | o | Octal notation, rounded to integer |
type | d | Decimal notation, rounded to integer |
type | x | Hexadecimal notation, using lowercase letters, rounded to integer |
type | X | Hexadecimal notation, using uppercase letters, rounded to integer |
type | c | Convert integer to the corresponding Unicode character |
type | n | Shorthand for ,g |
type | missing | Like g, but with insignificant trailing zeros trimmed |
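The field layout can be made concrete with a regular expression that mirrors the field template given earlier. This is only a sketch of the grammar, written for illustration; it is not D3's own parser, and the names SPECIFIER and splitSpecifier are hypothetical.

```javascript
// One capture group per field of
// [[fill]align][sign][symbol][zero][width][,][.precision][type]
const SPECIFIER =
    /^(?:(.)?([<>=^]))?([+\-( ])?([$#])?(0)?(\d+)?(,)?(?:\.(\d+))?([a-zA-Z%])?$/;

function splitSpecifier(spec) {
    const m = SPECIFIER.exec(spec);
    if (!m) throw new Error("invalid format specifier: " + spec);
    return {
        fill: m[1], align: m[2], sign: m[3], symbol: m[4],
        zero: m[5] !== undefined,
        width: m[6] === undefined ? undefined : +m[6],
        comma: m[7] !== undefined,
        precision: m[8] === undefined ? undefined : +m[8],
        type: m[9]
    };
}

console.log(splitSpecifier("+$012,.2f"));
console.log(splitSpecifier("*^20"));
```

Note that fill is only recognized when an align character follows it, which is why the two appear nested in the template.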
D3 includes some functions that can help construct format specifiers. First among these is the function d3.formatSpecifier(fmt) that parses the specifier fmt into its constituent fields. You can now inspect individual values, even change some of the fields (possibly programmatically), and then glue the values together again to obtain a new specifier based upon the old one. Other functions (d3.precisionFixed(), d3.precisionPrefix(), and d3.precisionRound()) help you find the proper value for the precision field in a specifier, given the finest resolution (that is, the smallest difference between consecutive values) that you still would like to be visible. These functions are used internally (for instance, to determine the appropriate format for axis tick marks), but they are available for general uses as well.
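As an illustration of the idea, the number of decimals needed to distinguish values that differ by a given step follows from the step's decimal exponent. The sketch below mimics what d3.precisionFixed() is meant to compute; it is written from that description rather than taken from the library, and the names decimalExponent and fixedPrecisionFor are assumptions.

```javascript
// Decimal exponent of x, read off its exponential notation
// (0.01 -> -2, 0.5 -> -1, 20 -> 1).
function decimalExponent(x) {
    return +Math.abs(x).toExponential().split("e")[1];
}

// Digits after the decimal point needed so that values one step apart
// still produce different strings in fixed-point ("f") notation.
function fixedPrecisionFor(step) {
    return Math.max(0, -decimalExponent(step));
}

const digits = fixedPrecisionFor(0.01);
const labels = [1, 1.01, 1.02].map(v => v.toFixed(digits));

console.log(digits, labels);   // 2 [ '1.00', '1.01', '1.02' ]
```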
1 For example, the MDN Promises Guide.
2 See the MDN Fetch Reference for a list of all legal parameters.
3 For example: MDN Request Cache Reference and Cache Control for Fetch.
4 The best short introduction I am aware of is Spring’s “Understanding CORS”.
5 See the MDN Request.mode Reference.