Case scenarios: Using IBM PowerAI
To identify opportunities for using deep learning (DL) and IBM PowerAI, this chapter describes three different use cases that are based on our field experiences. For each scenario, we describe the requirement, solution, and benefits.
 
Note: The scenarios in this chapter make many assumptions regarding the number of servers, networking, operating systems, open source software, and other items to help you understand what you can do with IBM PowerAI DL. There are many ways to help solve your challenges, and the scenarios illustrate one of the many ways.
The scenarios that are described in this chapter show how to use IBM PowerAI, but they do not represent customers’ configurations for their production systems. Any likeness or similarity to the solutions that are represented in this chapter is purely coincidental.
This chapter contains the following topics:
7.1 Use case one: Bare metal environment
In many modern industries, companies require and depend on real-time responses from their applications and systems to maintain a successful business. Such customers are receptive to new technology (such as DL) and have expectations about any new technology that quickly contributes to satisfying their business goals and requirements.
7.1.1 Customer requirements
When users and data scientists initially experiment with an implementation, they do not consider the infrastructure that hosts their DL frameworks because their focus is on the accuracy of the results that their models produce. However, as their familiarity grows and they implement more complex models with larger data sets, the training phase inevitably takes longer. As a result, their focus shifts to the infrastructure under the frameworks, and they start to evaluate differences in performance depending on the servers’ configurations. As a criterion for vendor selection, customers tend to evaluate the performance of one specific framework against a specific data set, such as ImageNet.
Here are examples of customer requests:
Capability to host an on-premises DL infrastructure for real-time analysis of large quantities of internal data
Capability to run frameworks, such as TensorFlow and Torch, on the DL platform
Proven reduction in training time compared to other providers (both on-premises and cloud providers)
Reduced total cost of ownership (TCO) compared to other solutions
7.1.2 IBM solution
The customer requirements that are described in 7.1.1, “Customer requirements” on page 232 can be satisfied by implementing the following environment:
Hardware: IBM Power System S822LC for High Performance Computing server
Operating system: Ubuntu 16.04
Software: IBM PowerAI Release 4, including TensorFlow and Torch
 – CUDA: Version 8
 – CUDA Deep Neural Network (cuDNN): Version 6
 – NVIDIA driver: Version 384.66
Network: 10 Gb Ethernet (see Figure 7-1)
Figure 7-1 Network configuration for the bare metal environment
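After installation, a quick sanity check can confirm that the deployed stack matches the versions that are listed above. The following Python sketch is a hypothetical helper, not part of IBM PowerAI; it assumes the reported version strings were collected elsewhere (for example, from `nvidia-smi` or the CUDA installation tree, which is not shown here):

```python
# Hypothetical helper: validate a reported software stack against the
# versions this scenario assumes (CUDA 8, cuDNN 6, NVIDIA driver 384.66).
REQUIRED = {"cuda": "8", "cudnn": "6", "driver": "384.66"}

def check_stack(reported):
    """Return (component, required, found) tuples for mismatches.

    `reported` maps a component name to its version string, as gathered
    by the caller (for example, parsed from `nvidia-smi` output).
    """
    mismatches = []
    for component, required in REQUIRED.items():
        found = reported.get(component, "missing")
        # Compare only the leading (major) component of each version.
        if found.split(".")[0] != required.split(".")[0]:
            mismatches.append((component, required, found))
    return mismatches

print(check_stack({"cuda": "8.0", "cudnn": "6.0", "driver": "384.66"}))  # []
```

An empty result means the stack matches; any returned tuple names the component to reinstall or upgrade before proceeding.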
7.1.3 Benefits
The proposed solution that is described in 7.1.2, “IBM solution” on page 232 provides the following benefits:
Better training performance for TensorFlow and Torch on Power S822LC for High Performance Computing with NVLink (proven by proof of concept)
Evaluation item: Training time with single graphical processing unit (GPU), which results in faster performance than other offerings
Lower running cost than a public cloud for a 3-year period
Enterprise-ready package and quick installation of DL frameworks and related libraries
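The 3-year running-cost comparison above can be illustrated with simple arithmetic. In the following sketch, every figure is a hypothetical placeholder, not IBM or cloud-provider pricing; the point is only the shape of the comparison (one-time purchase plus operating cost versus ongoing rental):

```python
# Illustrative 3-year total cost of ownership (TCO) comparison.
# All monetary figures below are hypothetical placeholders.
def three_year_cost(upfront, monthly, months=36):
    """Total cost over `months` of operation."""
    return upfront + monthly * months

# Hypothetical: purchased server plus power/administration costs.
on_prem = three_year_cost(upfront=120_000, monthly=1_500)
# Hypothetical: equivalent GPU capacity rented from a public cloud.
cloud = three_year_cost(upfront=0, monthly=6_000)

print(on_prem, cloud, on_prem < cloud)  # 174000 216000 True
```

Under these assumed figures the on-premises option costs less over 3 years; with different pricing the comparison can of course flip, which is why the evaluation period matters.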
Figure 7-2 illustrates the installation of IBM PowerAI Release 4 and IBM PowerAI Vision on a bare metal server.
Figure 7-2 IBM PowerAI V1.4 integrated with IBM PowerAI Vision on a bare metal machine
Figure 7-3 shows the installation of IBM PowerAI Enterprise on a bare metal server with all the benefits of IBM PowerAI Release 5, which adds the option of Red Hat Enterprise Linux.
Figure 7-3 IBM PowerAI V1.5 integrated with IBM Spectrum Conductor with Spark and Deep Learning Impact
7.2 Use case two: Multitenant environment
Cloud computing is the standard platform for many companies from an IT consumer perspective and a viable alternative to hosting internal environments. However, from the viewpoint of IT providers, some platforms must remain on on-premises servers and maintain the highest level of system and data security. Regarding data protection, healthcare institutions and research institutions are frequently mandated to host data within their institutions or within specific countries because of the required treatment of personal, financial, or regulatory information.
It is a common requirement to securely use servers for multiple and distinct customers. This section introduces the multitenancy environment.
7.2.1 Customer requirements
This use case is an example of running NVIDIA-Docker on Power S822LC for High Performance Computing with IBM PowerAI in a Docker container. This NVIDIA-Docker solution is generally used for cloud services and for internal use across the departments within a company.
Here are examples of customer requests:
The platform supports current NVIDIA GPUs.
Use the servers as a shared platform across the departments.
Use the platform for DL and other open source applications.
Address concerns about the potential bottleneck of GPU computing that is caused by limited memory and storage I/O.
High compatibility with existing storage file systems (for example, IBM Spectrum Scale).
7.2.2 IBM solution
In this situation, flexibility of the platform is a key requirement. By using GPU-enabled Docker technology, that is, NVIDIA-Docker, IBM can provide the following components in the solution:
Hardware: Five Power S822LC for High Performance Computing servers
Host operating system: Red Hat Enterprise Linux 7.3
Software:
 – IBM PowerAI Release 4
 – CUDA: Version 8
 – cuDNN: Version 6
 – NVIDIA driver: Version 384.66
NVIDIA-Docker with guest operating system images: Ubuntu and CentOS
Network: InfiniBand Enhanced Data Rate (EDR) (see Figure 7-4)
Figure 7-4 Network configuration image for the NVIDIA-Docker case
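One way to obtain the dedicated per-container GPU assignment that this solution relies on is the `NV_GPU` variable of the `nvidia-docker` wrapper, which restricts a container to the listed GPU indices. The following Python sketch only builds such command lines; the image names, container names, and the GPU-per-department mapping are hypothetical examples, not a prescribed configuration:

```python
# Sketch of per-tenant GPU assignment with the nvidia-docker wrapper.
# NV_GPU limits the container to the listed GPU indices; all names and
# the department-to-GPU mapping below are hypothetical.
def nvidia_docker_cmd(gpu_ids, image, name):
    """Build an `nvidia-docker run` command that pins `gpu_ids` to one container."""
    env = "NV_GPU=" + ",".join(str(g) for g in gpu_ids)
    return f"{env} nvidia-docker run -d --name {name} {image}"

# Two departments share one 4-GPU server without overlapping devices.
print(nvidia_docker_cmd([0, 1], "powerai:ubuntu", "dept-research"))
print(nvidia_docker_cmd([2, 3], "powerai:centos", "dept-analytics"))
```

Because each container sees only its assigned GPUs, the departments cannot contend for the same devices, which is the multitenancy benefit that is described in 7.2.3.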
7.2.3 Benefits
From the solution that is proposed in 7.2.2, “IBM solution” on page 236, the following benefits can be achieved:
Establishment of a multitenant environment for the departments in the company
Easier, faster implementation by the IBM PowerAI toolkit with optimized DL libraries
Dedicated use of installed GPUs for each Docker container
CPU-GPU NVLink support
Reduced bottleneck in storage I/O with the InfiniBand connection
Figure 7-5 shows the architectural diagram of NVIDIA-Docker that is installed on Red Hat Enterprise Linux to obtain the multitenancy environment for business use.
Figure 7-5 NVIDIA-Docker solution for IBM PowerAI
7.3 Use case three: High-performance computing environment
IBM Power Systems servers have been used within high-performance computing (HPC) environments since the days of the original IBM RS/6000® SP clusters. The scalability and flexibility of the hardware platform makes it an attractive choice for the HPC community. This relationship continues with the advent and popularity of DL, and the usage of GPU-enabled IBM Power Systems for HPC environments is increasing. These servers provide an alternative to traditional HPC environments: where the largest server models with the greatest quantity of resources were once expected, GPU-enabled models now provide the same processing power in a much smaller footprint. Where entire rooms were previously required, we can now talk in terms of racks.
Presently, many data scientists run DL frameworks on such clusters. The largest announced environment at the time of writing is the one that will be adopted for the Collaboration of Oak Ridge, Argonne, and Livermore (CORAL) project for the US Department of Energy. This environment will be constructed by using the recently announced POWER9-based Power AC922 servers. Especially for DL frameworks in HPC, users manage the environment with job scheduling software, such as IBM Spectrum LSF®, and more recently with Spark technology to enable distributed parallel processing.
7.3.1 Customer requirements
Here are examples of customer requests:
Adopt the latest technology available.
Support the following components:
 – NVIDIA GPU
 – InfiniBand network
 – Open source DL frameworks, such as Caffe, Torch, or TensorFlow
Use other vendors’ technologies.
Accommodate the HPC cluster with Spark.
Mitigate bottlenecks in networking between devices.
7.3.2 IBM solution
To satisfy the customer requirements that are shown in 7.3.1, “Customer requirements” on page 238, IBM provides experience and solutions that can be integrated with other software. As a basic proposal (Figure 7-6), IBM can provide a solution with the following components:
Hardware: Thirty Power S822LC for High Performance Computing servers
Host operating system: Ubuntu 16.04
Software: IBM PowerAI Release 4 with distributed deep learning (DDL) and large model support (LMS) functions:
 – CUDA: Version 8
 – cuDNN: Version 6
 – NVIDIA driver: Version 384.66
 – IBM Spectrum Conductor with Spark
 – LSF
Network: InfiniBand switch and cable for EDR
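The value of DDL on a cluster of this size (30 servers, each with 4 GPUs, or roughly 120 GPUs) can be illustrated with a back-of-the-envelope scaling estimate. The following sketch is not a DDL performance model; the per-doubling efficiency factor and the baseline training time are hypothetical figures that only show why near-linear GPU scaling matters at cluster scale:

```python
# Back-of-the-envelope estimate of distributed training time under an
# imperfect-scaling model. The 0.95 per-doubling efficiency and the
# 96-hour single-GPU baseline are hypothetical illustration values.
import math

def estimated_time(single_gpu_hours, gpus, efficiency=0.95):
    """Estimated training time on `gpus` devices.

    Each doubling of GPU count retains `efficiency` of ideal speedup,
    so effective speedup = gpus * efficiency ** log2(gpus).
    """
    speedup = gpus * efficiency ** math.log2(gpus)
    return single_gpu_hours / speedup

baseline = 96.0  # hypothetical single-GPU training time, in hours
print(round(estimated_time(baseline, 1), 1))    # 96.0
print(round(estimated_time(baseline, 120), 1))  # roughly an hour at ~70% efficiency
```

Even with efficiency losses at every doubling, a multi-day single-GPU job shrinks to hours across the cluster, which is why the scheduler-managed, InfiniBand-connected topology above is worthwhile.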
7.3.3 Benefits
By providing an IBM PowerAI end-to-end solution consisting of all of the components that are shown in 7.3.2, “IBM solution” on page 238, customers can obtain these benefits:
Reduce the training time by double-digit percentages with the latest available technology.
LMS and DDL provide better scalability of GPUs.
Easily alter both the data and the artificial intelligence (AI) model by using an on-premises Power System solution.
Easier and faster implementation with the IBM PowerAI toolkit with optimized DL libraries.
Easy management and control of resources with a job scheduler.
Figure 7-6 IBM PowerAI V1.5 integrated with Deep Learning Impact and IBM Spectrum Conductor with Spark in a high-performance computing environment
7.4 Conclusion
In the previous sections, we described the key components of the IBM PowerAI Deep Learning framework, summarized the benefits of the GPU-accelerated Power Systems hardware, and provided example installations and implementations of IBM PowerAI versions 4 and 5.
IBM PowerAI makes the field of DL available and consumable to every audience, including the casual user, small businesses, and traditional HPC organizations. The acceleration that is provided by GPUs is realized as savings in both processing time and physical footprint. The smaller use cases that use capacity from a cloud provider (see IBM and NVIDIA Team Up on World’s Fastest Deep Learning Enterprise Solution) provide a low-cost route for evaluation and implementation, and in the larger cases, the number of physical servers is reduced compared to traditional computing requirements from previous decades.
If you are unfamiliar with the field of DL, these example use cases provide context and perspective. Understanding what data you have is key to using DL. What connections, trends, or observations might be hidden from your existing view? Using DL for data processing provides many previously unavailable opportunities and areas of interest for academic, scientific, industrial, retail, and commercial businesses. DL has been growing in popularity for some time, but it is the current ease of adoption that is making it commonplace in many areas of our everyday life.