Chapter 3. Exploring the Library and the Ecosystem

TensorFlow itself, while impressive, is “just” an open source library for numerical computation using data flow graphs. As described in Chapter 2, there are plenty of open source competitors that you could use to build, train, and run inference with complex neural networks; more will arise. It is the ecosystem surrounding a library, built not only by the original author(s) but also by the community, that forms a long-term solution in an ever-evolving space. It is TensorFlow’s rich and growing ecosystem that compels many to use it. To go through the detailed use of each TensorFlow component would be beyond the scope of this report, but we will strive to introduce the relevant pieces and provide some perspective on the larger overall puzzle.

Python serves as an excellent example of the power of the ecosystem. One of the language’s selling points was its “batteries-included” philosophy; it came with a standard library that made many tasks (such as making HTTP requests) simple. Even with this approach, Python owes some of its success to its ecosystem. The NumPy and SciPy libraries created a strong foundation for numerical and scientific computing, extending the language’s core capabilities and the community that uses it. Libraries such as scikit-learn, which serves almost as a reference implementation for algorithms within the field of machine learning, and pandas, the de facto standard for Python-based data analysis, have built upon NumPy and SciPy and have helped Python contend for the throne of data science programming languages. Companies like Enthought and then Continuum Analytics created distributions of Python that included critical libraries whose numerous external dependencies had made installation difficult. This simplified the deployment of Python, broadening the community of users. The IPython Notebook has evolved into Project Jupyter (named for Julia, Python, and R) to support new languages beyond Python, and Jupyter is emerging as the standard IDE for data science, deep learning, and artificial intelligence. Even the deep learning libraries based on Python not only extend Python’s ecosystem but also are only possible because of that ecosystem.

We divide the TensorFlow ecosystem into several functional categories. The first group increases the direct utility of the library by making it easier for you to design, build, and train neural networks with or on top of TensorFlow. Examples include prebuilt and even pretrained deep neural networks, graphical interfaces for tracking training progress (TensorBoard), and a higher-level interface to TensorFlow (Keras). The second category contains tools that make inference possible and easier to manage. The third category comprises the components used to connect to and interact with other popular open source projects such as Hadoop, Spark, Docker, and Kubernetes. The last category covers technologies that decrease the time and cost of training deep neural networks, because training is often the rate-limiting step.

This division loosely follows the three stages of the TensorFlow pipeline: (1) data preparation, (2) training, and (3) inference and model serving. We will not focus significant prose on preparing data for use with TensorFlow; it is assumed that enterprises transitioning from other types of machine learning will already have mechanisms in place to clean, wrangle, and otherwise prepare data for analysis and training. Figure 3-1 shows how the ecosystem lines up with the pipeline.

Figure 3-1. The alignment between various parts of the TensorFlow ecosystem and the overall workflow

Improving Network Design and Training

The following tools and open source projects help the software engineer and data scientist design, build, and train deep learning models, seeking to create immediate value for the TensorFlow user. If you’re new to TensorFlow, we recommend that you take a look at the relevant prebuilt neural networks as a starting point and then take a close look at Keras, which can simplify the creation of more complex networks and offers some portability for models. If your application involves sequence data (text, audio, time series, etc.), do not skip Tensor2Tensor. Regardless of your experience level, expect to use TensorBoard.

Estimators

TensorFlow offers a higher-level API for machine learning (tf.estimator). It contains a number of built-in models—linear classifier, linear regressor, neural network classifier, neural network regressor, and combined models—and allows for more rapid configuration, training, evaluation, and inference.
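As a minimal sketch of how a canned estimator is used (assuming TensorFlow 1.x and purely illustrative NumPy feature and label arrays), a neural network classifier can be configured, trained, and evaluated in a few lines:

import numpy as np
import tensorflow as tf

# Hypothetical data: 100 samples with 4 features, 3 classes.
x_train = np.random.rand(100, 4).astype(np.float32)
y_train = np.random.randint(0, 3, size=100)

feature_columns = [tf.feature_column.numeric_column("x", shape=[4])]

classifier = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[16, 16],   # two hidden layers of 16 units
    n_classes=3)

train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": x_train}, y=y_train, batch_size=32, num_epochs=None, shuffle=True)
classifier.train(input_fn=train_input_fn, steps=1000)

eval_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": x_train}, y=y_train, num_epochs=1, shuffle=False)
print(classifier.evaluate(input_fn=eval_input_fn))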

Prebuilt Neural Networks

Deep neural network design remains somewhat of an academic pursuit and an art form. To speed the adoption and use of DL, TensorFlow comes with a number of example neural networks available for immediate use. Before starting any project, check this collection to see if a potential jumpstart is available. Of special note is the Inception network, a convolutional neural network that achieved state-of-the-art performance in both classification and detection in the 2014 ImageNet Large-Scale Visual Recognition Challenge.1
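One convenient way to experiment with a pretrained Inception network is through tf.keras.applications; the following hedged sketch assumes that module is available, that the pretrained ImageNet weights can be downloaded, and that an image exists at the hypothetical path example.jpg:

import numpy as np
from tensorflow.keras.applications.inception_v3 import (
    InceptionV3, preprocess_input, decode_predictions)
from tensorflow.keras.preprocessing import image

model = InceptionV3(weights="imagenet")          # downloads pretrained ImageNet weights

img = image.load_img("example.jpg", target_size=(299, 299))  # Inception v3 input size
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

preds = model.predict(x)
print(decode_predictions(preds, top=3)[0])       # top-3 ImageNet class labels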

Keras

Keras is a high-level API written in Python and designed for humans to build and experiment with complex neural networks in the shortest amount of time possible. You can use Keras as a model definition abstraction layer for TensorFlow, and it is also compatible with other TF-related tools. Interestingly, Keras offers a potential portability pathway to move networks from one deep learning library to another, coming closest to achieving a standard model abstraction. It is currently capable of running on top of not only TensorFlow, but also Theano, Microsoft Cognitive Toolkit, and, recently, MXNet. Further, Keras is the Python API for Deeplearning4j.
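A minimal sketch of the Keras Sequential API follows, using hypothetical NumPy arrays x_train and y_train for a binary classification task:

import numpy as np
from tensorflow import keras

# Hypothetical data: 1,000 samples with 20 features and binary labels.
x_train = np.random.rand(1000, 20)
y_train = np.random.randint(0, 2, size=(1000, 1))

model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=32, validation_split=0.1)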

Machine Learning Toolkit for TensorFlow

This toolkit provides out-of-the-box, high-level machine learning algorithms (see the list that follows), inspired by the popular scikit-learn library, so that you can use them immediately rather than rewriting them with TF’s lower-level API.

  • Neural networks (DNN, RNN, LSTM, etc.)

  • Linear and logistic regression

  • K-means clustering

  • Gaussian mixture models

  • WALS matrix factorization

  • Support vector machine (with L1 and L2 regularization)

  • Stochastic dual coordinate ascent for convex optimization

  • Random forests

  • Decision trees

Importantly, all of these algorithms have distributed implementations and can execute across machines in parallel, offering significant performance increases over nonparallelized implementations.
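As a hedged sketch of one toolkit algorithm (assuming TensorFlow 1.x with tf.contrib available), k-means clustering can be run over a hypothetical NumPy array of points as follows:

import numpy as np
import tensorflow as tf

points = np.random.rand(1000, 2).astype(np.float32)   # hypothetical 2-D samples

def input_fn():
    # Return the full dataset once per call; limit_epochs ends each iteration.
    return tf.train.limit_epochs(tf.convert_to_tensor(points), num_epochs=1)

kmeans = tf.contrib.factorization.KMeansClustering(
    num_clusters=5, use_mini_batch=False)

for _ in range(10):                                    # a handful of training iterations
    kmeans.train(input_fn)

print("cluster centers:", kmeans.cluster_centers())
assignments = list(kmeans.predict_cluster_index(input_fn))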

Tensor2Tensor (T2T)

T2T is an open source system built on top of TensorFlow to support the development and training of state-of-the-art deep learning networks, with a particular focus on sequence-to-sequence models (the kinds used to translate text or provide a caption for images). This library, released in 2017, is being actively used, developed, maintained, and supported by the Google Brain team.

The goal of the software is to provide a level of abstraction higher than that provided by the base TensorFlow API and encapsulate many best practices and hard-learned “tricks” of the trade into software that enforces a standardized interface between all of its pieces.

TensorBoard

Machine learning in general is difficult to visualize, and neural networks have long been criticized for being a black box, affording almost no transparency into their inner workings. Deep network graphs can be difficult to visualize. Dataflow within the many layers of a graph is difficult to observe in situ or even a posteriori. From a practitioner’s perspective, understanding and debugging neural networks can be notoriously difficult.

TensorBoard is a collection of visualization tools that provides insight into a TensorFlow graph and allows the developer or analyst to understand, debug, and optimize the network. The UI for the tools is browser based. It provides three core capabilities:

Visualization of the graph structure

The first step to understanding a neural network, which could be composed of dozens of layers, with hundreds of thousands of nodes or more, is to inspect and verify the structure of the network visually.

Visualization of summaries

TensorBoard allows you to attach summary operations that capture various types of tensors flowing through the graph during training and execution. These tensors could represent input data or network weights, with histograms demonstrating how network weights or other tensors in the network change over time.

Embedding visualizer

TensorBoard also allows you to visualize machine learning results, such as learned embeddings, projected into a three-dimensional interface.

Typical functions, such as graphing summary statistics during learning, are available. You can also gain insight into the outputs of specific layers to run your own analysis; this makes it possible to review the distribution of outputs from one layer before those values serve as input to the next layer. TensorBoard reads serialized TensorFlow event data. Although some features and visualizations come for free without setup, others require code changes to capture the data to be visualized, and you can choose the nodes or objects about which you collect summary information.
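The following minimal sketch, assuming TensorFlow 1.x graph-mode APIs, shows how scalar and histogram summaries are attached to a toy graph and written to an event-file directory (here the hypothetical /tmp/tf_logs) for TensorBoard to read:

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 10], name="x")
w = tf.Variable(tf.random_normal([10, 1]), name="w")
y = tf.matmul(x, w)

loss = tf.reduce_mean(tf.square(y))                 # a stand-in loss for illustration
tf.summary.scalar("loss", loss)                     # scalar summary, plotted over steps
tf.summary.histogram("weights", w)                  # weight distribution over time

merged = tf.summary.merge_all()

with tf.Session() as sess:
    writer = tf.summary.FileWriter("/tmp/tf_logs", sess.graph)   # hypothetical log dir
    sess.run(tf.global_variables_initializer())
    for step in range(100):
        summary, _ = sess.run(
            [merged, loss],
            feed_dict={x: np.random.rand(32, 10).astype(np.float32)})
        writer.add_summary(summary, step)
    writer.close()

# Then inspect the results with: tensorboard --logdir=/tmp/tf_logs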

Google has big goals for continued TensorBoard development. First, the TensorFlow debugger will be integrated with TensorBoard so that you can visualize debugging information through this tool. Next, TensorBoard will soon support plug-ins that allow for complex and custom visualizations designed for interrogating specific neural networks types with unique visualization needs in various problem domains. Finally, Google plans to release an “organizational scale” TensorBoard designed not just for the individual but also for the team so that results can be rapidly disseminated and a shared history of development can be kept.

TensorFlow Debugger

TensorFlow comes out of the box with a specialized debugger (tfdbg) that allows for introspection of any data as it flows through TensorFlow graphs during both training and inference. There was also an interesting third-party open source debugger for TensorFlow (tdb) with robust visualization capabilities, described by its author as follows: “TDB is to TensorBoard as GDB is to printf. Both are useful in different contexts.” However, the author, Eric Jang, was apparently hired by Google Brain, and the external effort has been abandoned.
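A minimal sketch of using tfdbg, assuming TensorFlow 1.x, wraps an existing session so that each run() call drops into the interactive command-line debugger:

import tensorflow as tf
from tensorflow.python import debug as tf_debug

x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
y = tf.matmul(x, x)

sess = tf.Session()
sess = tf_debug.LocalCLIDebugWrapperSession(sess)    # wrap the session with the tfdbg CLI
sess.add_tensor_filter("has_inf_or_nan", tf_debug.has_inf_or_nan)  # flag bad numerics

print(sess.run(y))   # this run() call opens the interactive debugger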

Deploying Networks for Inference

Deep learning training often gets most of the press due to its large computational demands. However, a state-of-the-art deep neural network is without value if no one uses it. Providing inference capabilities in a robust, scalable, and efficient way is critical for the success of the deep learning library and ecosystem.

TensorFlow Serving

After training, the enterprise faces the decision of how to operationalize deep learning networks and machine learning models. There will be some use cases, such as in research, experimentation, or asynchronous prediction/classification activities, for which operationalization is not required. However, in many instances, the enterprise will want to provide real-time inference for user-facing applications (like object detection in a mobile application such as the “Not Hotdog” application from HBO’s Silicon Valley), human decision-making support, or automated command and control systems; this requires moving a previously trained network into production for inference.

Operationalizing machine learning models opens a Pandora’s box of problems in terms of designing a production system. How can we provide the highest level of performance? What if we need to expose multiple models for different operations? How do we manage a deployment process or deal with the configuration management of multiple model versions?

TensorFlow Serving provides a production-oriented and high-performance system to address this issue of model deployment; it hosts TensorFlow models and allows remote access to them to meet client requests. Importantly, the models served are versionable, making it easy to update networks with new weights or iterations while maintaining separate research and production branches. You cannot make HTTP requests via the browser to communicate with TensorFlow Serving. Instead, the server, built in C++ for performance, implements a gRPC interface. gRPC is Google’s Remote Procedure Call framework, designed to performantly connect services in and across datacenters in a scalable fashion. Thus, a client needs to be built to communicate with the server. Deep learning and machine learning models must be saved in Google’s protobuf-based SavedModel format. TensorFlow Serving can (auto)scale within CloudML or by using Docker/Kubernetes.
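A hedged sketch of such a client follows, assuming the tensorflow-serving-api Python package; the model name my_model, the input key input, and the server address are hypothetical and must match your deployment and the exported SavedModel signature:

import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

channel = grpc.insecure_channel("localhost:8500")            # default gRPC serving port
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = "my_model"                         # hypothetical model name
request.model_spec.signature_name = "serving_default"
request.inputs["input"].CopyFrom(                            # hypothetical input key
    tf.make_tensor_proto(np.random.rand(1, 4).astype(np.float32)))

response = stub.Predict(request, 10.0)                       # 10-second timeout
print(response.outputs)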

In-Process Serving

In some cases, organizations might not want to deploy TensorFlow Serving or cannot use the TensorFlow Serving RPC server to serve models. In these situations, you can still use saved models directly by including core TensorFlow libraries in the application. In-process serving offers a very lightweight mechanism to provide inference capabilities but none of the benefits provided by TensorFlow Serving, like automated request batching or model versioning.

As an example, let’s consider a basic website built with the Python library Flask. This website allows a user to upload an image and identify objects within it using a deep convolutional neural network. From our perspective, the interesting part happens after the user has uploaded the photo and after the trained convolutional neural network has been loaded. Upon receipt of a photo, the Flask server feeds the photo to the network, performs the inference, and returns the results. All of the inference capability is provided by TensorFlow libraries that can easily be called from the Flask-based web server. A similar library approach is used for inference on mobile devices.
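The following hedged sketch illustrates the idea with a simplified JSON endpoint rather than an image upload; it assumes TensorFlow 1.x, a SavedModel exported to the hypothetical path /models/classifier, and input/output tensor names that match the exported graph:

import numpy as np
import tensorflow as tf
from flask import Flask, request, jsonify

app = Flask(__name__)

# Load the SavedModel once at startup and keep the session open for serving.
sess = tf.Session()
tf.saved_model.loader.load(sess, [tf.saved_model.tag_constants.SERVING],
                           "/models/classifier")
graph = sess.graph
input_tensor = graph.get_tensor_by_name("input:0")        # hypothetical tensor names
output_tensor = graph.get_tensor_by_name("predictions:0")

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON body such as {"instances": [[...], [...]]}.
    instances = np.array(request.get_json()["instances"], dtype=np.float32)
    predictions = sess.run(output_tensor, feed_dict={input_tensor: instances})
    return jsonify(predictions=predictions.tolist())

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)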

Integrating with Other Systems

A critical aspect for any new technology being considered for adoption in the enterprise is how it fits into the existing corporate infrastructure. Although today’s big data landscape is incredibly crowded and complex, there are some obvious technologies that play a role in many big data stacks across industries.

Data Ingestion Options

Key to deep learning are the often massive amounts of data that must be cleaned, conditioned, and then used to train neural networks. For this to happen, before anything else the data must be ingested. Fortunately, there are many options. First, TensorFlow supports its own native format (tf.Example and tf.SequenceExample), built on protocol buffers and stored in TFRecords; note that Apache Beam has native support for TFRecords. Second, and slightly slower, TensorFlow has built-in functionality to read JSON, comma-separated value (CSV), and Avro data files. Third, the end user can use Python to read data, including data from pandas data tables; because this option is slowest, it is best for testing and experimentation. Finally, TensorFlow supports several different distributed storage options, including Apache Hadoop HDFS, Google Cloud Storage, and Amazon Elastic File System.
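As a minimal sketch (assuming TensorFlow 1.x), the following writes a few tf.Example records to a TFRecord file and reads them back with the tf.data API; the feature names and values are illustrative:

import tensorflow as tf

# Write a few tf.Example records to a TFRecord file.
with tf.python_io.TFRecordWriter("/tmp/data.tfrecords") as writer:
    for i in range(3):
        example = tf.train.Example(features=tf.train.Features(feature={
            "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[i])),
            "value": tf.train.Feature(float_list=tf.train.FloatList(value=[i * 0.5])),
        }))
        writer.write(example.SerializeToString())

# Read them back as a tf.data pipeline, parsing each serialized record.
def parse(serialized):
    return tf.parse_single_example(serialized, {
        "label": tf.FixedLenFeature([], tf.int64),
        "value": tf.FixedLenFeature([], tf.float32),
    })

dataset = tf.data.TFRecordDataset("/tmp/data.tfrecords").map(parse)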

TensorFlowOnSpark

Yahoo was kind enough to open source code that allows distributed TensorFlow training and inference to run on clusters built for Apache Spark. From the enterprise perspective, this is potentially very powerful as many shops looking to use TensorFlow might already be using Spark for data analysis and machine learning and have a Spark cluster operational. Thus, the organization can reuse its existing cluster assets instead of setting up separate infrastructure solely for deep learning, making the transition significantly easier. Further, this can alleviate the need to move data from one cluster to another—an often painful and time-intensive process.

From a tactical perspective, TensorFlowOnSpark is compatible with TensorBoard, going so far as to configure a Spark executor to run TensorBoard during training when the cluster is set up. The API is minimal, making it quick to learn and use and requiring very few changes to existing TensorFlow code. TensorFlowOnSpark provides the means to do three things:

  • Start/configure a TensorFlow cluster within Spark

  • Feed data to a TensorFlow graph by converting Spark’s Resilient Distributed Datasets (RDDs) to a feed_dict

  • Shut down the TensorFlow cluster when finished
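A hedged sketch of this three-step flow follows, assuming the tensorflowonspark package’s TFCluster API, a running SparkContext sc, a user-defined training function main_fun containing the TensorFlow code, and an RDD of training records data_rdd (all hypothetical):

from tensorflowonspark import TFCluster

# 1. Start/configure a TensorFlow cluster on Spark executors.
cluster = TFCluster.run(sc, main_fun, None,
                        num_executors=4, num_ps=1,
                        tensorboard=True,
                        input_mode=TFCluster.InputMode.SPARK)

# 2. Feed RDD partitions to the TensorFlow graph as feed_dict input.
cluster.train(data_rdd, num_epochs=1)

# 3. Shut down the TensorFlow cluster when finished.
cluster.shutdown()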

To make the most use of Spark, you want to run TensorFlow programs that fully saturate the cluster’s resources; otherwise, as with any distributed application, performance will not scale linearly. In terms of downsides, TensorFlowOnSpark is not fully compatible with all community projects (like Keras). Further, Spark running in the Java Virtual Machine (JVM) can produce some relatively inscrutable error messages upon failure. Regardless, this is possibly the easiest way to run distributed TensorFlow training if your enterprise is already using a Spark cluster.

“Ecosystem” Repo

The ecosystem repo is an Apache 2.0-licensed open source repository on GitHub from Google that contains examples of integrating TensorFlow with numerous other open source projects, including those listed here:

Docker

A set of example Dockerfiles to build containers with various TensorFlow configurations.

Kubernetes

A YAML template file for running distributed TensorFlow on Kubernetes.

Marathon (on top of Mesos)

A configuration file for running TensorFlow in Marathon, a container orchestration platform for the Mesos cluster manager.

Hadoop

An implementation of InputFormat/OutputFormat for Apache Hadoop MapReduce using the TFRecords format.

Spark-tensorflow-connector

A library for reading and writing TensorFlow records (TFRecords) in and out of Spark 2.0+ SQL DataFrames.

Consider the “ecosystem” repo a starting point to exploring how you can integrate TensorFlow with other software.

Accelerating Training and Inference

Training deep neural networks takes a significant amount of computational horsepower, often exceeding what a cluster of general-purpose microprocessors can deliver. However, as deep learning’s value became more obvious, the search for higher performance hardware became critical. GPUs were quickly repurposed for this task and, later, custom hardware designed specifically for this use case was and is in development. The important thing to note is that without sufficient training data and sufficient computational horsepower, deep learning would be irrelevant and would not have experienced its impressive success to date.

GPUs and CUDA

Using graphics processing units (GPUs) to perform floating-point calculations in a massively parallel fashion has intrigued performance-minded programmers for nearly two decades; in fact, the term general-purpose computing on GPUs (GPGPU) was coined in 2002. NVIDIA has been a long-standing promoter of this use case and developed its proprietary Compute Unified Device Architecture (CUDA) as a parallel computing platform and programming model for the company’s GPUs.

Training deep learning networks has emerged as the killer application for this field and NVIDIA augmented its CUDA offering with the NVIDIA Deep Learning Software Development Kit, which contains a GPU-accelerated library of key primitives needed for neural networks called cuDNN. Using the fastest GPU available from NVIDIA can offer a 10- to 100-times speedup for training deep networks versus the fastest available CPU from Intel.
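As a minimal sketch (assuming TensorFlow 1.x built with CUDA and cuDNN support), you can check whether a GPU is visible and pin an operation to it explicitly:

import tensorflow as tf

print("GPU available:", tf.test.is_gpu_available())

with tf.device("/gpu:0"):                     # place this matmul on the first GPU
    a = tf.random_normal([1000, 1000])
    b = tf.random_normal([1000, 1000])
    c = tf.matmul(a, b)

config = tf.ConfigProto(log_device_placement=True)   # log where each op actually runs
with tf.Session(config=config) as sess:
    sess.run(c)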

Tensor Processing Units

GPUs ruled the benchmarks for accelerating deep neural networks until the world learned of Google’s Tensor Processing Units (TPUs), announced at the Google I/O conference in May 2016. The first-generation TPU accelerated only inference workloads (not training), using quantized integer arithmetic rather than floating point. An excellent technical overview of this first-generation TPU is available online, and a very thorough technical article on its performance was presented this past year.2 Importantly, the first-generation TPU has been in operation in Google’s datacenters for more than a year and helped power Google’s AlphaGo win over Go world champion Lee Sedol.

The second generation of TPU was announced in 2017 and can perform both inference and training using floating-point arithmetic. Each individual processor offers 45 teraflops of performance, and four chips are arranged into a 180-teraflop device. Sixty-four such devices are assembled into a pod that offers 11.5 petaflops of deep learning performance.3 The key to both chips, as is important for any server-oriented processor, is that they not only provide impressive floating-point performance but also consume less power than traditional processors.

Why does this matter for the enterprise? Because the TPU reduces the cost and time required to train models. Even though Google has no plans to sell TPUs, this capability is being made available via Google’s Cloud offerings. Further, there are some intriguing options for Google to optimize across both software and hardware given that the company controls the entire stack. Google is not the only company in this space; Nervana, a small company making custom silicon for accelerating deep learning, was purchased by Intel in August of 2016.

Google Cloud TPU and CloudML

Cloud TPU is a Google Cloud service offering currently in alpha that gives users the ability to perform training and inference of machine learning models using the second-generation TPUs. You can connect to the Cloud TPUs from both standard and custom virtual machine types, and the offering is also fully integrated with other Google Cloud Platform offerings including Google Compute Engine and BigQuery.4 This is the most direct way for the enterprise to take advantage of Google’s TPUs. Google also exposes TPUs indirectly through some of the functionality of the Cloud Machine Learning Engine (Cloud ML).

Summary

The question for any enterprise adopting deep learning is how it will integrate into the organization’s existing workflows and data pipelines. The TensorFlow data pipeline is composed of three stages: (1) data preparation, (2) model training, and (3) model serving and inference. All three see substantial support from both the TensorFlow library directly and the emerging ecosystem. This data pipeline is very similar to the traditional machine learning pipeline found in the enterprise, with one notable difference: model training can be substantially more time and resource intensive for deep learning models. The ecosystem attempts to remedy this situation with support for multiple GPUs and even Google’s own TPUs.

1 Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott E. Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich, “Going Deeper with Convolutions”, Computer Vision and Pattern Recognition (2015).

2 N.P. Jouppi et al., “In-Datacenter Performance Analysis of a Tensor Processing Unit”, Proceedings of the 44th Annual International Symposium on Computer Architecture (June 2017): 1-12.

3 Patrick Kennedy, “Google Cloud TPU Details Revealed,” STH, May 17, 2017, https://www.servethehome.com/google-cloud-tpu-details-revealed/.

4 “Cloud TPUs: ML Accelerators for TensorFlow,” Google Cloud Platform.
