Chapter 6. Files, Fetches, Formats: Getting Data In and Out

Unless you define your data inside your script (as in Example 5-1) or have your script generate it (as in Examples 4-6 and 4-7), you will have to get the data into your script somehow. This involves two separate steps: fetching the data from its location (which may be the local filesystem, a remote server, or another resource, such as a web service) and parsing it into a usable data structure. If you want to create text labels from data, you will need to do the opposite and format data for textual output. This chapter describes the facilities D3 offers to help with these tasks. Discussions of file formats are usually a drag; I’ll try to keep this one brief.

Fetching a File

The JavaScript Fetch API is a modern replacement for the venerable XMLHttpRequest object—the technology that first enabled web pages to exchange data with servers asynchronously, thus giving rise to AJAX and the entire contemporary, “dynamic” web experience. D3 wraps this Fetch API, replicates some of its methods, and adds functionality that is convenient when working with web pages or tabular data. Some artifacts of the underlying API remain visible through D3; for this reason, it is frequently worth consulting the Fetch API reference as well.

Table 6-1 lists all functions provided by D3 that can retrieve (“fetch”) a resource (such as a file) from a URL. Of course the resource need not be a file—it can be anything, as long as it is describable by a URL (like a server that generates data on demand).

  • All functions take a specification of the desired resource as a string containing a URL.

  • All functions return a Promise object (see “JavaScript Promises”).

  • All functions take an optional RequestInit object. The elements permitted in this object and their values are defined by the Fetch standard; they control various aspects of the remote communication, such as permission and caching issues. Some of these may be relevant even in relatively simple applications; we will discuss some of them toward the end of this section.

  • The convenience functions that parse tabular data may also take a conversion function that will be applied to the data as it is read. (We will discuss conversion functions in the next section.)

Table 6-1. Methods for retrieving a resource (all methods return a Promise object)
Function Description

d3.text( url, init )

Fetches the specified resource and decodes it as a UTF-8 string.a

d3.json( url, init )

Fetches the specified resource and parses it as JSON into an object. a

d3.dsv( delimiter, url, init, converter )

Takes a delimiter (such as ",") and a URL as required parameters. Fetches the specified resource, which must include a descriptive header line, and parses it as delimiter-separated values; the result will be an array of objects. An optional conversion function may be specified as the last parameter.

d3.csv( url, init, converter )

Fetches the specified resource, which must include a descriptive header line, and parses it as comma-separated values; the result will be an array of objects. An optional conversion function may be specified as the last parameter.

d3.tsv( url, init, converter )

Fetches the specified resource, which must include a descriptive header line, and parses it as tab-separated values; the result will be an array of objects. An optional conversion function may be specified as the last parameter.

d3.html( url, init )

Fetches the specified resource and parses it into an HTMLDocument.

d3.svg( url, init )

Fetches the specified resource and parses it into an SVGDocument.

d3.xml( url, init )

Fetches the specified resource and parses it into a Document.

d3.image( url, init )

Loads the specified image and resolves to an HTMLImageElement. Unlike the other functions, it does not use the Fetch API; instead, the properties of the init object (such as crossOrigin) are copied onto the image element.

d3.blob( url, init )

Fetches the specified resource and treats it as a Blob object.a

d3.buffer( url, init )

Fetches the specified resource and treats it as an ArrayBuffer object.a

a This function replicates part of the Fetch API.

Examples

Let’s consider a few examples. Assuming that you have a simple JSON file, like this one:

{ "val": 5, "txt": "Hello" }

then you can read it (and access its properties) like so:

d3.json("simple.json").then(res=>console.log(res.val,res.txt));

JSON parsers are picky; make sure your JSON is correctly formed! (In particular, JSON property keys must be double-quoted strings, in marked contrast to the syntax for JavaScript object initializers.)

It is also easy to fetch a bitmap image and attach it to the document (or page):

d3.image( "image.jpg" ).then( function(res) {
    d3.select( "#figure" ).append( () => res ) } );

This code assumes that the page contains a placeholder with an appropriate id attribute (for example, <div id="figure">...</div>). Note the argument to the append() function: it is a function, taking no arguments, and returning the result of the fetch! The reason for this roundabout route is that append() can handle either a string or a function returning a node, but not a node by itself. The result of the fetch is a node, hence it is necessary to “wrap it into a function” like this.

Finally, let’s assume that you have an SVG file, for example, containing the definition of a symbol that you would like to reuse:

<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:svg="http://www.w3.org/2000/svg">
  <defs>
    <g id="heart">
      <path d="M0 -3 A3 6.6 -35 1 0 0 6 A3 6.6 35 1 0 0 -3Z" />
    </g>
  </defs>
</svg>

You can insert the <defs> section into the current document and use the defined symbol like this:

d3.svg( "heart.svg" ).then( function(res) {
    d3.select("svg").insert( ()=>res.querySelector("defs"), ":first-child" );
    d3.select("svg").append( "use" ).attr( "href", "#heart" )
        .attr( "transform", "translate(100,100) scale(2)" );
} );

Again, note how the result of the fetch is “wrapped into a function.” In this snippet, res is a DOM SVGDocument instance (not a D3 data type), hence you must use the native DOM API (here, querySelector()) to extract the <defs> element. Note that res.firstChild would not do. The first child of the document is its root <svg> element, and the first child of that element is a whitespace text node.

The external SVG file in this example contains declarations for XML namespaces. We have not worried about XML namespaces so far (because D3 for the most part takes care of them for us). Here, at least the default namespace declaration (xmlns="http://www.w3.org/2000/svg") is required. Without it, the parsed elements would not belong to the SVG namespace, and the browser would not render them as SVG.

Controlling Fetches with the RequestInit Object

Most of the time, the functions in Table 6-1 just work without any further tweaking. But the simplicity and convenience of the API hides underlying complexity (and sometimes obstructs proper diagnosis if something goes wrong). The means to control aspects of the remote communication is the RequestInit object that all functions in Table 6-1 accept as an optional parameter.2

Caching

Browsers may cache resources fetched from a remote location, with the consequence that changes to the remote resource will not become visible in the browser. In particular during development, this can become a major nuisance! A simple way to prevent all browser caching of a remote resource is to set the cache property to no-store:

d3.svg( "heart.svg", { cache: "no-store" } ).then( ... );

While good practice during development, preventing browser-side caching is wasteful when used in production. The details of resource caching can be complex; you may want to check the appropriate reference.3

Third-party resources and CORS

When attempting to load resources using the Fetch API from third-party websites, you may occasionally encounter strange failures and permission issues. The browser refuses to complete the request when made from within the JavaScript runtime—even though the resource may be readily available using a command-line tool or even by pointing the browser itself to the URL. The cause is likely to be the browser’s same-origin policy and the cross-origin resource sharing (CORS) mechanism, which limit JavaScript access to third-party resources.4

The CORS protocol is strange in the way it splits responsibility between browser and server (under certain conditions, the server will send the requested resource to the browser, but the browser will refuse to make it accessible to its own JavaScript runtime!), and browsers differ in the CORS policy they implement. The mode property of the RequestInit object contains additional information.5

CORS relies on an interplay between browser and server. In particular, the server must be configured to send the appropriate header information. If it isn’t, there is nothing you can do about it from within the browser. You will have to download the required resource separately and serve it from your own server, or access the resource through a proxy server.

Writing a file

Occasionally, it is necessary to write a file, for example, to save a graph as an SVG file. That is not easy, though, because the local filesystem is not accessible from JavaScript running in the browser. It is, however, possible to upload a file (or any data, for that matter) to a server. Example 6-1 shows how this can be accomplished. (A very similar function was used to capture the graphs for this book.)

Example 6-1. A JavaScript function to upload all SVG figures in a page to a server
function upload() {
    var out = new FormData();                                     1

    d3.selectAll( "svg" ).each( function() {                      2
        var id = d3.select( this ).attr( "id" );                  3
        if( id ) {
            out.set( "filename", id );
            out.set( "data", this.outerHTML );                    4

            d3.text( "http://localhost:8080/upload",              5
                     { method: "POST", body: out } )
                .then( function(r) { console.log("Succ:", id) },  6
                       function(r) { console.log("FAIL:", id) } );
        } } );
}
1

Creates a FormData object as the container for the data to upload.

2

Invoke the following anonymous function for every SVG element in the page.

3

Grab the value of the element’s id attribute—it will later become the filename. If no id attribute exists, skip the upload.

4

The outerHTML of a page element is the element’s content together with the tags that make up the element itself. In this case, it consists of the <svg> tag itself and all of its children. (The innerHTML is just the content, without the enclosing tags.)

5

Upload the FormData object, using the HTTP POST method, to a suitable server. Notice how the RequestInit object is used to hold the payload and the method specification.

6

Print a success or failure message to the browser console.

Of course, all this assumes that a server is listening at the specified URL that can handle the uploaded data and do something useful with it (for example, save it to disk).
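For completeness, here is a minimal sketch of such a server for Node.js. Everything in it is an assumption rather than part of D3:

- The /upload path matches Example 6-1.
- The hypothetical createUploadServer() writes each upload to <id>.svg in a directory of your choice.
- It relies on the Request class built into recent Node.js versions (assumed 18 or later) to decode the multipart/form-data body.

The page that calls upload() is usually served from a different origin. For that reason, the response carries an Access-Control-Allow-Origin header. Without this header, the browser would still send the upload but would reject the Promise returned by d3.text().

```javascript
import { createServer } from "node:http";
import { writeFile } from "node:fs/promises";
import path from "node:path";

// Hypothetical receiver for Example 6-1: accepts POST /upload with the
// form fields "filename" and "data", and saves data as <filename>.svg.
function createUploadServer(outDir) {
  return createServer(async (req, res) => {
    // The uploading page usually has another origin: allow it to read the reply.
    res.setHeader("Access-Control-Allow-Origin", "*");
    if (req.method !== "POST" || req.url !== "/upload") {
      res.writeHead(404).end("not found");
      return;
    }
    try {
      // Collect the raw multipart/form-data body.
      const chunks = [];
      for await (const chunk of req) chunks.push(chunk);

      // Let the Fetch API's Request class decode the form fields.
      const form = await new Request("http://localhost/upload", {
        method: "POST",
        headers: { "content-type": req.headers["content-type"] ?? "" },
        body: Buffer.concat(chunks),
      }).formData();

      const name = form.get("filename");
      const data = form.get("data");
      if (typeof name !== "string" || name === "" || typeof data !== "string") {
        res.writeHead(400).end("missing fields");
        return;
      }
      // basename() keeps ids such as "../x" from escaping outDir.
      await writeFile(path.join(outDir, path.basename(name) + ".svg"), data, "utf8");
      res.writeHead(200, { "Content-Type": "text/plain" }).end("ok");
    } catch (err) {
      res.writeHead(500).end(String(err));
    }
  });
}
```

To try it, start the server with createUploadServer("figures").listen(8080), where figures is an existing directory. Then call upload() from the page.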

Parsing and Writing Tabular Data

D3 provides functions to parse (and write) strings containing delimiter-separated data (see Tables 6-2 and 6-3). They are primarily intended for files of the "text/csv" MIME type (as laid down in RFC 4180), commonly used by spreadsheet programs. Some notes on parsing more general file formats follow in the next subsection.

The library supports two different styles to represent a data set:

  • The functions parse() and format() treat each record as an object.

    • Names of the object properties are taken from the file’s first (or header) line, which must be present.

    • The data set is returned as an array of objects.

  • The functions parseRows() and formatRows() treat each record as an array (of columns).

    • The entire file, including its first line, is assumed to contain data.

    • The data set is returned as an array of arrays.

Use parseRows() if the input file does not contain a header line with metadata. The Array object returned by parse() has an additional property, columns, which contains a list of the original column names in the order of the input file.

Table 6-2. Methods to parse and format delimiter-separated data (p is a parser-formatter instance)
Function Description

d3.dsvFormat( delim )

Returns a parser-formatter instance. The mandatory argument specifies the delimiter to use; it should be a single character.

p.parse( string, converter )

Parses the input string and returns an array of objects. The first record of the input is expected to contain column names that will be used as property keys within the created objects. If supplied, the optional conversion function will be invoked on each record after it has been split into fields; it should return an object.

p.parseRows( string, converter )

Parses the input string and returns an array of arrays. If supplied, the optional conversion function will be invoked on each record after it has been split into fields; it should return an array.

p.format( data, columns )

Takes an array of objects and returns a delimiter-separated string. The ordered array of property names to be included in the output is optional; if it is omitted, all properties found in any of the objects are included, in order of first appearance.

p.formatRows( data )

Takes an array of arrays and returns a delimiter-separated string.

Field Value Conversions

Whether you use parse() or parseRows(), the field values are strings; values are not automatically converted to numbers. This does cause problems occasionally when other parts of the program require numerical input; for this reason it is good practice to always convert input to numbers explicitly.

You can supply an optional second argument to either parse() or parseRows() to perform such conversions or other desired clean-ups. This function will be invoked for each input line after the line has been split into fields and turned into an object or array. It is therefore not intended to parse each row into fields but to apply conversions to the individual field values.

The conversion function will be passed three arguments:

  • The field values in the current row (as object or array)

  • The line number of the current row (starting at zero, not counting the header line)

  • An array of column names (for parse() only)

The conversion function should return an object or array representing the current line (or null or undefined to skip the current line).

For data files containing only strings, numbers, booleans, and dates (in ISO 8601 format), you can use the built-in conversion function d3.autoType(). It converts entries that “look like” numbers, booleans, or dates to the appropriate type (and empty fields to null). For more complicated situations, you have to write your own conversion function. Consider the following CSV file:

Year,Month,Name,Weight (kg)
2005,1,Peter,86.3
2007,7,Paul,72.5

The following code will turn it into an array of objects with lowercase member names and using appropriate data types:

d3.text( "csv.csv" ).then( function(res) {                    1
    var data = d3.csvParse( res, (d,i,cs) => {
        return {
            date: new Date( d.Year, d.Month-1 ),              2
            name: d.Name,                                     3
            weight: +d["Weight (kg)"]                         4
        };
    } );
    console.log( data );
} );
1

Load as plain text, parse, and convert in callback.

2

Combine two columns into Date type. (JavaScript’s month index is zero-based.)

3

Convert property name to lowercase.

4

Replace the awkward property name (it contains a blank and parentheses, so it cannot be used with dot notation) and convert the value to a number.

Parsing Input Containing Arbitrary Delimiters

The preceding code snippet uses the convenience function d3.csvParse(), which assumes a comma-separated file. Because comma- and tab-separated files are so common, D3 provides a set of shorthands for them (see Table 6-3). For arbitrary delimiters, you must first create a parser-formatter instance using d3.dsvFormat(delim). Then invoke parse() or parseRows() on this instance with the input string, or format() or formatRows() with the data to serialize. The explicit delimiter argument is mandatory; it should be a single character. Alternatively, you can fetch and parse a resource in one fell swoop using d3.csv(). Here are three ways to achieve the same effect. In each case, data will be an array of objects (use d3.csvParseRows() or parser.parseRows() if you want an array of arrays instead):

d3.csv( "csv.csv" ).then( function(res) {
    var data = res;
} );

d3.text( "csv.csv" ).then( function(res) {
    var data = d3.csvParse( res );
} );

d3.text( "csv.csv" ).then( function(res) {
    var parser = d3.dsvFormat( "," );
    var data = parser.parse( res );
} );
Table 6-3. Shorthands for comma- and tab-separated files. The functions are equivalent to those in Table 6-2.
Comma-delimited                          Tab-delimited

d3.csvParse( string, converter )         d3.tsvParse( string, converter )

d3.csvParseRows( string, converter )     d3.tsvParseRows( string, converter )

d3.csvFormat( data, columns )            d3.tsvFormat( data, columns )

d3.csvFormatRows( data )                 d3.tsvFormatRows( data )

Generating Tabular Output

The functions format() and formatRows() implement the inverse operation: serializing a data structure into a string. The input must be an array of objects (for format()) or an array of arrays (for formatRows()). The format() function takes an optional second argument: an array of the property names to be included in the output. If it is omitted, the union of all property names found across the entire input is used, in order of first appearance. Fields in the created string are separated using the specified delimiter, and records are separated using newlines (\n). Fields are quoted as necessary, that is, when they contain the delimiter, a double quote, or a line break.

Using Regular Expressions to Parse Whitespace-Separated Data

Even if data files do not conform to the formats required by the methods just described, the infrastructure provided by them can still be useful. For example, consider the case of a data file with columns separated by whitespace: any combination of tabs and spaces. This is a situation calling for regular expressions, and the following snippet shows how to use them in conjunction with the overall framework:

d3.text( "txt.txt" ).then( function(res) {
    var parser = d3.dsvFormat( "" );                          1
    var rows = parser.parseRows( res, function(d, i, cs) {    2
        return d[0].trim().split( /\s+/ ).map( x => +x );  3 4
    } );
    console.log( rows );
} );
1

Create a parser-formatter instance, selecting a delimiter character that you are certain does not occur in the input file. (The empty string seems to work, since no input character can ever match it, but the ASCII NUL character "\0", or any other character that you are sure won’t occur, presents an alternative.)

2

Because the delimiter does not occur in the input, no separation into columns is performed (but the input is correctly broken into lines or records)…

3

… so that the array d of “column” values has only a single element. The split() function is invoked on it with a regular expression that matches any combination of whitespace characters.

4

Finally, all resulting field values are converted to numbers.

Formatting Numbers

JavaScript does not have built-in routines to convert arbitrary scalars into formatted strings, comparable to the family of printf() functions familiar from many other programming languages. D3 provides a remedy: a sophisticated formatting facility modeled after similar functionality in Python 3. This section describes how to convert numbers into human-readable strings; routines for formatting timestamps will be discussed in Chapter 10.

In full generality, the workflow to format a value involves three steps:

  1. Obtain a locale object (or use the current “default locale”).

  2. Use the locale object to obtain a formatter instance for the intended output format.

  3. Apply the formatter to a numeric value to obtain the value’s formatted, human-readable string representation.

Of course you can bundle all three steps into a single statement without assigning the intermediate objects to individual variables. For example, using the default locale, you might simply say:

var str = d3.format( ".2f" )( Math.PI );

Locales

There are two functions to obtain a locale object (see Table 6-4). Both require a locale definition as input. A locale definition specifies details such as the currency symbol, prevailing number formats, names of months and weekdays, and so on (see the D3 Reference Documentation for details). Locale definitions in a format suitable for D3 are available from https://unpkg.com, a repository for content packaged using the JavaScript package manager npm. The following snippet shows how to fetch and use a new locale definition. It prints the string 3,1416 to the console, following the German convention of using a comma (not a point) as a decimal indicator:

d3.json( "https://unpkg.com/d3-format/locale/de-DE.json" ).then(
    function( res ) {
        var loc = d3.formatLocale( res );
        console.log( loc.format( ".4f" )( Math.PI ) );
    },
    function( err ) {
        console.error( err );
    }
);
Table 6-4. Factory methods for creating locale objects
Function Description

d3.formatLocale( def )

Takes a locale definition and returns a locale object.

d3.formatDefaultLocale( def )

Takes a locale definition and returns a locale object, but also sets the default locale to the supplied definition.

Formatters

A locale object serves as a factory for formatters. Once you have chosen a locale (either a specific one or the default locale), you use it to obtain a formatter instance by providing the intended output format as a string. The formatter is a function object, which takes a single number and returns a formatted string. (It is not possible to create a string containing multiple formatted values simultaneously, in contrast to printf().) All the factory functions that produce a formatter instance are listed in Table 6-5.

Table 6-5. Factory methods to create number formatter instances (loc is a locale instance)
Function Description

d3.format( fmt )

Returns a formatter instance for the current default locale using the format specification in the string fmt.

loc.format( fmt )

Returns a formatter instance for the receiver locale using the format specification in the string fmt.

d3.formatPrefix( fmt, scale )

Returns a formatter instance for the current default locale. Quantities will be expressed as multiples of the supplied “scale” argument, which should be an engineering power 10±3, 10±6, 10±9, … The scale is represented through SI prefixes in the output string.

loc.formatPrefix( fmt, scale )

Same as d3.formatPrefix(), but for the receiver locale.

In addition to the usual formatter, there is also a special formatter that expresses all quantities in a fixed engineering unit. For instance, it will express all quantities in “kilos,” “millis,” or whichever scale you choose. (See the D3 Reference Documentation or the Wikipedia Metric Prefixes page for all available prefixes.) This behavior is not indicated through the supplied output format specifier; instead, you must use a special factory function to obtain a formatter with this behavior. For example:

d3.formatPrefix( ".4f", 1e-3 )(Math.PI) === "3141.5927m"
d3.formatPrefix( ".4f", 1e3 )(Math.PI)  === "0.0031k"

A formatter object can be reused to format several values. This snippet turns an array of numbers into an array of formatted strings:

var f = d3.format( ".3f" );
[ Math.E, Math.PI ].map( f );

Format or Conversion Specifiers

The format specifier string can consist of up to nine different fields:

[[fill]align][sign][symbol][zero][width][,][.precision][type]

See Table 6-6 for the permissible values and their effects. Be aware that, in contrast to what’s customary for printf(), the format specification does not begin with %. (A % may appear only at the end, where it selects the percentage type.)

Table 6-6. Conversion specifiers for numbers
Field Specifier Description

fill

any character

Used for padding when aligning values

align

>

Right-align value within available space (default if missing)

<

Left-align value within available space

^

Center value within available space

=

Right-align value, left-align sign and symbol

sign

-

Minus sign for negative values, nothing otherwise (default if missing)

+

Minus sign for negative values, + otherwise

(

Parentheses for negative values, nothing otherwise

blank

Minus sign for negative values, a blank space otherwise

symbol

$

Insert currency symbol per the locale definition

#

Prefix binary, octal, or hexadecimal numbers by 0b, 0o, or 0x, respectively

zero

0

When a 0 is present, it enables zero-padding: it implicitly sets the fill character to 0 and the alignment to =, overriding other settings.

width

number

Defines the minimum field width. If the value does not exhaust the width, the value will be padded. If not present, the width will be determined by the content.

,

,

When a comma is present, a grouping separator will be used.

precision

number

Number of digits to the right of the decimal (for f and %); number of significant digits (for e, g, r, s, p, and missing type). Defaults to 6, but equals 12 when the type indicator is missing. Ignored for integer formats (b, o, d, x, and X).

type

e

Exponential notation

f

Floating point notation

g

Decimal notation, rounded to significant digits; switches to exponential notation for very large or very small magnitudes (following JavaScript’s toPrecision())

r

Decimal notation, rounded to significant digits

s

Decimal notation with an SI prefix, rounded to significant digits

%

Multiply by 100, then decimal notation with a percent sign

p

Multiply by 100, round to significant digits, then decimal notation with a percent sign

b

Binary notation, rounded to integer

o

Octal notation, rounded to integer

d

Decimal notation, rounded to integer

x

Hexadecimal notation, using lowercase letters, rounded to integer

X

Hexadecimal notation, using uppercase letters, rounded to integer

c

Convert integer to the corresponding Unicode character

n

Shorthand for ,g (with grouping separator)

missing

Like g, but omits trailing zeros

D3 includes some functions that can help construct format specifiers. First among these is the function d3.formatSpecifier(fmt) that parses the specifier fmt into its constituent fields. You can now inspect individual values, even change some of the fields (possibly programmatically), and then glue the values together again to obtain a new specifier based upon the old one. Other functions (d3.precisionFixed(), d3.precisionPrefix(), and d3.precisionRound()) help you find the proper value for the precision field in a specifier, given the finest resolution (that is, the smallest difference between consecutive values) that you still would like to be visible. These functions are used internally (for instance, to determine the appropriate format for axis tick marks), but they are available for general uses as well.

1 For example, the MDN Promises Guide.

2 See the MDN Fetch Reference for a list of all legal parameters.

3 For example: MDN Request Cache Reference and Cache Control for Fetch.

4 The best short introduction I am aware of is Spring’s “Understanding CORS”.

5 See the MDN Request.mode Reference.
