Chapter 6. Data Manipulation and Sources

Until now, data rendered in this book's examples have been defined within the code itself. Given this is a fairly limited method of storing data it naturally follows that a connection to external data sources must be developed. This chapter is dedicated to the topic of data accessibility for use in the Visualization API, including the following topics:

  • Preparing/cleaning data
  • Manipulating data with the Visualization API
  • Configuring the following to be consumed as a data source:
    • Spreadsheets
    • Fusion Tables
  • Manipulating data with their native APIs
  • Apps Script equivalent functionality
  • Creating your own data source

Data management is a topic that ranges well beyond the pages of this book, and the survey of methods presented in this chapter give only a handful of technical methods for working with data. This chapter is also not an exhaustive description of all Google data capabilities. For example, during the writing of this book, the Google Big Query (no SQL) and Prediction APIs are available as experimental options from the Apps Script development platform. It is very likely that as Google continues to further integrate API accessibility, additional data-focused APIs will emerge and become integrated as well. Also, beyond the tools provided by Google, traditional methods of database management are equally valid methods of data maintenance.

Preparing data

In general, it can be reasonably assumed that most collections of data are not as pristine as one would like them to be. Inconsistencies, poorly designed structures, and even mysterious extra whitespaces can be a nightmare for an average developer attempting to mine even the smallest amount of useful information. While not required, a best practice for data visualization is to first verify that the data to be visualized is formatted as expected.

Tip

Best practice

Before visualizing data, be sure it is arranged as expected and is free from inconsistencies.

The Google answer to the prevailing issue of dirty data is the free software tool, Google Refine. At the time of publication, Google Refine was in the process of transitioning from a Google Code hosted project to a GitHub hosted project called Open Refine. In either case, the capabilities and intentions of the tool remain the same. The application is a downloadable installation for Mac, Windows, and Linux operating systems. The stand-alone application is intentionally not a web service by design, as users with sensitive or private data can also use the tool without exposing their data on the Web. The intention of Google Refine is to provide the big picture of data, clean up messy raw data, and to discover and fix inconsistencies. It is particularly suited for public data, but can be used with any desired dataset.

Note

The Google Refine documentation is available at https://code.google.com/p/google-refine/.

The Open Refine documentation is available at http://openrefine.org/.

Google Refine – importing data

Google Refine allows data of various formats and origins to be imported. The import wizard lists the formats Google Refine will accept. The application also allows for import from various locations, including Google Drive. The 2010 Chicago Census data that has been the subject of examples in the previous chapters can be imported into Refine from Google Drive by selecting Google Data | Chicago_Census_2010_Ages when signed into the Google Drive account used for previous examples.

Google Refine – importing data

Google Refine – Facets

In Google Refine, a Facet is a feature or element of data. The primary capabilities of a Facet are two-fold. First, a Facet on a column of data allows the user to view the big picture of that column's data. Second, Facets are useful when a bulk change is to be performed on a subset of dataset values. To apply a Facet to a column, select the down arrow button to view general options for the column. Select Facet from the drop-down menu options, and then select whichever type of Facet is desired.

Google Refine – Facets

Google Refine contains a variety of Facets, but standard Facets can be described as similar to numeric or text-style filters. For users with unique requirements, custom Facets can be defined by using Google Refine Expression Language (GREL), Jython, and Clojure query languages. Selecting the Customized facets menu option opens up the dialog in which a custom query can be entered. References for the aforementioned query languages have been included at the end of this section, but their use is beyond the scope of this chapter.

Once a Facet has been selected or defined, it will appear as a window in the panel to the left of the primary application window. Facets are interactive in nature, and can be manipulated and deleted from the side panel location. The behavior of Facets is cumulative, which implies that the manipulation of one Facet applied to a section of data will also be reflected in other current Facets. This is the case in the following example where the U.S. 2010 Chicago Census data has been imported from Google Spreadsheets. Text and numeric Facets have been applied to the Age and Male columns, respectively. When the slider bars are adjusted to only select the two groups to the far right of the graph, the change is reflected in the Age Facet and two age categories are currently selected.

Google Refine – Facets

Google Refine – clean and supplement

Standard bulk operations to clean up messy data are fairly intuitive with Google Refine. For single cell edits, select the Edit button while hovering over a cell. The Edit button opens a dialog that allows editing of that cell. Options for bulk replace, changing the type of data in the cell, and editing the cell value itself are possible through the Edit option. Additional cell editing options are also available through the column's drop-down arrow menu under the Edit cells option.

Google Refine – clean and supplement

To make broader changes to a column's data, again select the down arrow in the column's label cell, this time choosing the Edit column option. In addition to the more standard column manipulations reminiscent of a Spreadsheet function, Edit column options can be used to collect additional information based on an existing column's values. A common use of this feature is the collection of latitude and longitude, also known as geocoding, information from an existing column of addresses. A third party service, such as Open Street Map's Nominatum, can be used by Google Refine to retrieve and store corresponding geocode information for each address in the data column. The Google Refine project home page hosts a how-to video on this and other Refine uses.

Finally, for anyone making sweeping changes to a dataset, the fear of making a change without the ability to retract is very real. It is also desirable to keep a record of operations performed on the data, just in case data transformations must be recreated. Google Refine provides an interactive history with redo/undo capability to address these concerns. To view the operation history, select the Undo/Redo tab, which can be found next to the Facet/Filter tab. The history actions are interactive, and simply selecting an operation in the history list will revert the dataset back to the specified state.

Google Refine – clean and supplement

Google Refine – export options

Once a dataset has been modified as desired, Google Refine provides a variety of export options to deliver the newly cleaned data. Select the Export button to choose an export format.

Google Refine – export options

Alternatively, Google Refine is closely aligned with the Freebase open data repository, which is also an option for data export. To upload to Freebase, select the drop-down menu from the Freebase button directly below the Export button.

Note

The Google Refine documentation is available at http://code.google.com/p/google-refine/.

The Open Refine documentation is available at http://openrefine.org/.

The Open Street Map Nominatim Service documentation is available at http://wiki.openstreetmap.org/wiki/Nominatim.

The Jython Language documentation is available at http://www.jython.org/.

The Clojure Language documentation is available at http://clojure.org/.

The Freebase Open Data Repository documentation is available at http://www.freebase.org.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset