Chapter 2. Selecting a Deep Learning Framework

When the decision is made to adopt deep learning, the first question that arises is which deep learning library you should choose (and why). Deep learning has become a crucial differentiator for many large technology firms, and each has either developed or is championing a particular option. Google has TensorFlow. Microsoft has the Microsoft Cognitive Toolkit (aka CNTK). Amazon is supporting the academia-built MXNet, causing some to question the longevity of the internally developed DSSTNE (Deep Scalable Sparse Tensor Network Engine). Baidu has the PArallel Distributed Deep LEarning (PADDLE) library. Facebook has Torch and PyTorch. Intel has BigDL. The list goes on, and more options will inevitably appear.

We can evaluate the various deep learning libraries on a large number of characteristics: performance, supported neural network types, ease of use, supported programming languages, the authoring organization, supporting industry players, and so on. To be a contender at this point, each library should offer support for the use of graphics processing units (GPUs), preferably multiple GPUs, and for distributed compute clusters. Table 2-1 summarizes a dozen of the top open source deep learning libraries available.

Table 2-1. General information and GitHub statistics for the 12 selected deep learning frameworks (the “best” value in each applicable column is highlighted in bold)

| Framework | Org | Year | License | Current version | Time since last commit | Watches | Commits | Contributors |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Caffe (GitHub) | UC Berkeley | 2014 | BSD 2-Clause | 1.0 | 21 days | 2037 | 4045 | 248 |
| Caffe2 (GitHub) | Facebook | 2017 | BSD 2-Clause | 0.8.0 | **1 hour** | 437 | 2406 | 113 |
| BigDL (GitHub) | Intel | 2017 | Apache 2.0 | 0.2.0 | 13 hours | 179 | 1752 | 37 |
| Deeplearning4J (GitHub) | Skymind | 2014 | Apache 2.0 | 0.9.2 | 11 hours | 712 | 8621 | 124 |
| DyNet (GitHub) | Carnegie Mellon | 2015 | Apache 2.0 | 2.0 | 8 hours | 160 | 2769 | 79 |
| DSSTNE (GitHub) | Amazon | 2016 | Apache 2.0 |  | 255 days | 345 | 221 | 22 |
| Microsoft Cognitive Toolkit (CNTK) (GitHub) | Microsoft Research | 2015 | MIT | 2.1 | 3 hours | 1235 | 14791 | 145 |
| MXNet (GitHub) | Amazon | 2015 | Apache 2.0 | 0.1 | **1 hour** | 987 | 5820 | 405 |
| PADDLE (GitHub) | Baidu | 2016 | Apache 2.0 | 0.10.0 | 2 hours | 492 | 6324 | 72 |
| TensorFlow (GitHub) | Google | 2015 | Apache 2.0 | 1.3 | 8 hours | **6087** | 21600 | **1028** |
| Theano (GitHub) | Université de Montréal | 2008 | BSD License | 0.9 | 1 day | 552 | **27421** | 321 |
| Torch7 (GitHub) | Several | 2002 | BSD License | 7.0 | 2 days | 680 | 1331 | 134 |

Supported programming languages

Nearly every framework listed was implemented in C++ (and many use Nvidia's CUDA for GPU acceleration), with the exception of Torch, which has a backend written in Lua, and Deeplearning4J, whose backend targets the Java Virtual Machine (JVM). The more important issue when using these frameworks, however, is which programming languages are supported for training (the compute-intensive task of letting the neural network learn from data and update its internal weights) and which languages are supported for inference (showing the previously trained network new data and reading out predictions). Because inference is a far more common task in production, one could argue that the more languages a library supports for inference, the easier it will be to plug into existing enterprise infrastructure. Training is more specialized, so its language support may reasonably be more limited. Ideally, a framework would support the same set of languages for both tasks.
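
To make the distinction concrete, here is a minimal sketch using the TensorFlow 1.x Python API (the model, hyperparameters, and export path are purely illustrative): a trivial model is trained in Python and then exported as a SavedModel that a runtime in another supported language (for example, Java or C++) could load for inference only.

```python
import numpy as np
import tensorflow as tf

# Training side (Python): a trivial linear model y = w*x + b
x = tf.placeholder(tf.float32, shape=[None, 1], name="x")
y = tf.placeholder(tf.float32, shape=[None, 1], name="y")
w = tf.Variable(tf.zeros([1, 1]))
b = tf.Variable(tf.zeros([1]))
pred = tf.add(tf.matmul(x, w), b, name="pred")

loss = tf.reduce_mean(tf.square(pred - y))
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

# Synthetic data: learn y = 3x + 1
data_x = np.random.rand(100, 1).astype(np.float32)
data_y = 3.0 * data_x + 1.0

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(200):  # the compute-intensive training loop
        sess.run(train_op, {x: data_x, y: data_y})

    # Inference side: export a SavedModel that other TensorFlow runtimes can load.
    # The export directory must not already exist; a production export would also
    # attach a signature_def_map describing the inputs and outputs.
    builder = tf.saved_model.builder.SavedModelBuilder("exported_model")
    builder.add_meta_graph_and_variables(sess, [tf.saved_model.tag_constants.SERVING])
    builder.save()
```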

Different types of networks

There are many different types of neural networks, and researchers in academia and industry are developing new network types, with corresponding new acronyms, almost daily. To name just a few, there are feedforward networks, fully connected networks, convolutional neural networks (CNNs), restricted Boltzmann machines (RBMs), deep belief networks (DBNs), denoising autoencoders, stacked denoising autoencoders, generative adversarial networks (GANs), recurrent neural networks (RNNs), recursive neural networks, and many more. If you would like graphical representations of these architectures or an even longer list of network types, the Neural Network Zoo is a good place to start.

Two network types that have received significant press are convolutional neural networks, which can handle images as inputs, and recurrent neural networks and their variants, such as long short-term memory (LSTM) networks, which can handle sequences (text in a sentence, time-series data, audio streams, and so on) as input. The deep learning library that you choose should support the broadest range of networks and, at the very least, those most relevant to your business needs.
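
As a rough illustration of the two network families, the following sketch defines a tiny CNN for image-shaped input and a tiny LSTM for sequence input using the Keras API (which, at the time of writing, could run on top of TensorFlow, CNTK, or Theano backends); the layer sizes and input shapes are illustrative only.

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Embedding, LSTM

# CNN: image-like inputs (e.g., 28x28 grayscale images classified into 10 classes)
cnn = Sequential([
    Conv2D(16, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(10, activation="softmax"),
])
cnn.compile(optimizer="adam", loss="categorical_crossentropy")

# LSTM: sequence inputs (e.g., sentences of word indices, binary sentiment label)
rnn = Sequential([
    Embedding(input_dim=10000, output_dim=32, input_length=100),
    LSTM(64),
    Dense(1, activation="sigmoid"),
])
rnn.compile(optimizer="adam", loss="binary_crossentropy")
```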

Deployment and operationalization options

Although both machine learning and deep learning often require a significant amount of data for training, deep learning truly heralded the transition from big data to big compute. For the enterprise, this is likely the largest issue and potential obstacle in transitioning from more traditional machine learning techniques to deep learning. Training large-scale neural networks can take weeks or even months; thus, even a 50% performance gain can offer enormous benefits. To make the process feasible, training networks requires significant raw computing power that often comes in the form of one or more GPUs, or even more specialized processors. Ideally, a framework would support both single- and multi-CPU and GPU environments, as well as heterogeneous combinations of the two.
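
The sketch below shows what explicit device placement looks like in the TensorFlow 1.x Python API; it assumes a machine that may or may not have a GPU and relies on soft placement to fall back to the CPU when it does not.

```python
import tensorflow as tf

with tf.device("/cpu:0"):          # keep the small constant ops on the CPU
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[1.0, 1.0], [0.0, 1.0]])

with tf.device("/gpu:0"):          # place the matrix multiply on a GPU, if present
    c = tf.matmul(a, b)

# allow_soft_placement lets TensorFlow fall back to the CPU when no GPU exists;
# log_device_placement prints where each op actually ran.
config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
with tf.Session(config=config) as sess:
    print(sess.run(c))
```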

Accessibility of help

The degree to which help is available is a very important component of the usefulness and success of a library. The volume of documentation is a strong indicator of the success (and potential longevity) of a platform, and it makes adopting and using the library easier. As the ecosystem grows, so too should the documentation, in numerous forms: online tutorials, electronic and in-print books, videos, online and offline courses, and even conferences. Of particular note to the enterprise is the issue of commercial support. Although all of the aforementioned libraries are open source, only one offers direct commercial support: Deeplearning4J. It is highly likely that third parties will be more than eager to offer consulting services to support the use of each library.

Enterprise-Ready Deep Learning

Narrowing down from the dozen deep learning frameworks, we examine four libraries in depth because of their potential enterprise readiness: TensorFlow, MXNet, Microsoft Cognitive Toolkit, and Deeplearning4J. To give an approximate estimate of popularity, Figure 2-1 presents the relative worldwide interest in each by search term, as measured by Google search volume.

TensorFlow

Google has a rich and storied history in handling data at scale and applying machine learning and deep learning to create useful services for both consumers and enterprises. When Google open sources software, the industry takes notice, especially when it is the second generation of an internal system. In 2011, Google used an internal system called DistBelief for deep learning that was capable of using “large-scale clusters of machines to distribute training and inference in deep networks.”1 The lessons learned from years of operating this platform ultimately guided the development of TensorFlow, announced in November of 2015.2

TensorFlow [1] is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery.
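
The core idea of that description, expressing a computation once and then executing it on whatever devices are available, is visible even in a toy example (TensorFlow 1.x-style API; the computation itself is arbitrary):

```python
import tensorflow as tf

# Declare the computation once: y = 2x + 1 as a graph, with x fed in later
x = tf.placeholder(tf.float32, name="x")
y = 2.0 * x + 1.0

# Execute it wherever a session is available (CPU, GPU, or a remote cluster)
with tf.Session() as sess:
    print(sess.run(y, feed_dict={x: 3.0}))  # prints 7.0
```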

Some who believe that this is a winner-take-all space would say that TensorFlow has already won the war for developer mindshare. Although that pronouncement is likely premature, TensorFlow does currently have impressive momentum. By nearly all metrics, TensorFlow is the most active open source project in the deep learning space. It also has the most books written about it, has an official conference, has generated the most worldwide interest as measured by Google search volume, and has the most associated meetups. This type of lead will be difficult to overcome for its competitors.

MXNet

MXNet is the youngest deep learning framework that we will examine more closely. It entered Apache incubation in January 2017 and the latest version as of October 2017 was the 0.11.0 release. The question is, given its youth and the credentials of competitors, should enterprises even be aware of this alternative deep learning framework? The very loud answer to that question came from Amazon, which announced in November 2016 that “Apache MXNet is Amazon Web Services’ deep learning framework of choice”. It is probably no coincidence that one of the founding institutions behind MXNet was Carnegie Mellon University and Dr. Alexander Smola, a professor in the CMU machine learning department, joined Amazon in July 2017.

To further make the case that MXNet is a contender, the latest release candidate of the framework allows developers to convert MXNet deep learning models to Apple’s Core ML format, meaning that billions of iOS devices can now provide inference capability to applications using MXNet. Also note that Apple is the one large technology company not associated with any of the aforementioned deep learning frameworks.

The next question is, what is MXNet and how does it improve upon existing libraries?3

MXNet is a multilanguage machine learning library to ease the development of machine learning algorithms, especially for deep neural networks. Embedded in the host language, it blends declarative symbolic expression with imperative tensor computation. It offers auto differentiation to derive gradients. MXNet is computation and memory efficient and runs on various heterogeneous systems, ranging from mobile devices to distributed GPU clusters.

MXNet arose out of a collaboration between a number of top universities including CMU, MIT, Stanford, NYU, the University of Washington, and the University of Alberta. Given its more recent development, the authors had an opportunity to learn from the deep learning frameworks that have come before and potentially improve upon them. The framework strives to provide both flexibility and performance. Developers can mix symbolic and imperative programming models, and both can be parallelized by the dynamic dependency scheduler. Developers can also take advantage of the predefined neural network layers to construct complex networks with little code. Importantly, MXNet goes far beyond supporting Python; it also has full APIs for Scala, R, Julia, C++, and even Perl. Finally, the MXNet codebase is small and was designed for efficient scaling over both GPUs and CPUs.
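
A minimal sketch of those two styles, using the MXNet Python API roughly as it existed around the 0.11 release (the shapes and the computation are illustrative), might look like this:

```python
import mxnet as mx

# Imperative style: operations execute immediately, much like NumPy
a = mx.nd.ones((2, 3))
b = a * 2 + 1
print(b.asnumpy())              # every entry is 3.0

# Symbolic style: declare a computation graph first, bind data and run it later
x = mx.sym.Variable("x")
y = mx.sym.Variable("y")
z = 2 * x + y                   # nothing is computed yet

executor = z.simple_bind(ctx=mx.cpu(), x=(2, 3), y=(2, 3))
outputs = executor.forward(x=mx.nd.ones((2, 3)), y=mx.nd.ones((2, 3)))
print(outputs[0].asnumpy())     # every entry is 3.0
```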

Microsoft Cognitive Toolkit (CNTK)

Despite the rise of the web, Microsoft is still one of the dominant vendors in the enterprise space. Thus, it should come as no surprise that the Microsoft Research deep learning framework is one to examine. Formerly known as the Computational Network Toolkit (CNTK), the toolkit apparently emerged from the world-class speech transcription team at Microsoft Research and was then generalized for additional problem sets. The first general paper emerged in 2014, and the software appeared on GitHub in January of 2016.4 It was used in 2016 to achieve human-level performance in conversational speech recognition. The toolkit promises efficient scalability and impressive performance in comparison to competitors.

Deeplearning4J

Deeplearning4J is somewhat of an outlier in this list. Even though Python has become all but the de facto language for deep learning, Deeplearning4J was developed in Java and designed to use the JVM and be compatible with JVM-based languages such as Scala, Clojure, Groovy, Kotlin, and JRuby (the underlying calculations are coded in C/C++ and CUDA). This also means that Deeplearning4J works with both Hadoop and Spark out of the box. Further, many of the earlier deep learning frameworks arose out of academia, and the second wave came from larger technology companies; Deeplearning4J is different because it was created, starting in 2014, by Skymind, a smaller technology startup based in San Francisco. Although Deeplearning4J is open source, Skymind is willing to provide paid support for customers using the framework.

Industry Perspectives

Jet.com provides an interesting example of deep learning library selection. Jet.com is a Microsoft shop from top to bottom and is one of the few remaining shops in the United States focused on the F# programming language (it also uses C# and the .NET framework). For cloud services, the company uses Microsoft Azure. Despite this strong focus on Microsoft, Jet.com uses TensorFlow for deep learning. The company originally started with Theano but transitioned to TensorFlow quickly after its release. TensorFlow runs on GPU-equipped virtual instances within the Microsoft Azure cloud.

As a technology startup, PingThings has substantial leeway to select different technologies. Google’s prominence in the field and the volume of documentation and tutorials were both strong motivators for choosing TensorFlow. However, PingThings is working on projects funded by the National Science Foundation (NSF) and the Advanced Research Projects Agency-Energy (ARPA-E) with collaborators from multiple research institutes while also deploying hardware inside of utilities. Thus, the fact that Google designed TensorFlow to balance the needs of both research and robust operation at scale was particularly important. Tensor2Tensor (a part of the ecosystem that we discuss later) was particularly appealing because of its focus on sequence data. MXNet is an interesting new option whose future development will be watched, especially given its strong performance and support from Amazon.

Summary

TensorFlow does everything well enough: competitive performance, strong support for different neural network types, numerous hardware deployment options, multi-GPU support, numerous programming language options, and more. However, the library’s allure transcends this feature set.

Google played a large part in helping to start the big data revolution with the combination of a distributed file system and the MapReduce computing framework, and continues to lead the industry today.5, 6 Further, there are numerous successful examples of technologies iterating within Google and then seeing a widespread release. Kubernetes, the popular container orchestration system, is one such example, being the result of years of experience and lessons learned from earlier internal systems like Borg and Omega.7 Google is well regarded for both advancing the state of the art and software engineering at web scale, a balance between academia and industry that seems particularly appropriate for the gold rush that is deep learning.

TensorFlow inherits this goodwill with an aim to be sufficiently flexible for intense research while also being robust enough to allow production deployment of its models. Newer frameworks might arise that build upon the lessons learned from TensorFlow, improving various aspects of the library, making available multiple programming methodologies, or offering improved performance; many of the libraries described above attempt to do one or more of these things. However, TensorFlow is constantly applying these lessons as well, striving for better performance and exploring new approaches. As long as Google’s weight and effort remain behind TensorFlow, it will remain a strong, safe, and practically default choice of deep learning library, especially given the ecosystem that we describe in Chapter 3.

1 J. Dean et al., “Large Scale Distributed Deep Networks,” Advances in Neural Information Processing Systems 25 (2012).

2 M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G.S. Corrado et al., “TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems”, preliminary white paper (2015).

3 T. Chen et al., “MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems”, NIPS Machine Learning Systems Workshop (2016).

4 A. Agarwal et al., “An Introduction to Computational Networks and the Computational Network Toolkit,” Microsoft Technical Report MSR-TR-2014-112 (2014).

5 S. Ghemawat, H. Gobioff, and S. Leung, “The Google File System,” Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles (2003).

6 J. Dean and S. Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation 6 (2004): 10.

7 B. Burns, B. Grant, D. Oppenheimer, E. Brewer, and J. Wilkes, “Borg, Omega, and Kubernetes: Lessons Learned from Three Container-Management Systems over a Decade,” ACM Queue 14, no. 1 (2016).
