System overview
This chapter provides an overview of the IBM Blue Gene/Q system and its software environment.
1.1 Blue Gene/Q environment overview
The Blue Gene/Q system, shown in Figure 1-1, is the third-generation computer architecture in the Blue Gene family of supercomputers.
Figure 1-1 Blue Gene/Q system architecture
The Blue Gene/Q system comprises multiple components, including one or more compute racks and, optionally, I/O racks. The system contains densely packaged compute nodes, I/O drawers, and service cards. Additional hardware is associated with the storage subsystem, the primary service node (SN), the front end nodes (FENs), and the communications subsystem. The I/O drawers containing I/O nodes connect to the functional local area network (LAN) to communicate with file servers, FENs, and the SN. The service cards connect to the control subnets and are used by the SN to control the Blue Gene/Q hardware.
A service node provides a single point of control and administration for the Blue Gene/Q system. It is possible to operate a Blue Gene/Q system with a single service node. However, the system environment can also be configured to include distributed subnet service nodes (SSN) for high scalability. System administration is outside the scope of this book and is covered in the IBM System Blue Gene Solution: Blue Gene/Q System Administration, SG24-7869 Redbooks publication.
A front end node, also known as a login node, comprises the system resources that application developers log in to for access to the Blue Gene/Q system. Application developers edit and compile applications, create job control files, launch jobs on the Blue Gene/Q system, post-process output, and perform other interactive activities.
1.2 Blue Gene/Q hardware overview
Figure 1-2 shows the primary hardware components of the Blue Gene/Q system.
Figure 1-2 Blue Gene/Q hardware overview
Each compute card contains an IBM Blue Gene/Q processor with 16 PowerPC® A2 cores and 16 GB of memory. Thirty-two such cards plug into a node board, and 16 node boards are contained in a midplane. A Blue Gene/Q compute rack has either one (half rack configuration) or two fully populated midplanes. The system can be scaled to 512 compute racks.
Compute rack components are cooled either by water or by air. Water cools the processing nodes. Air cools the power supplies and the I/O drawers that are mounted in the Blue Gene/Q rack.
I/O drawers are installed either in separate I/O racks or in I/O enclosures on top of the compute racks, sometimes described as top hats. Each I/O drawer houses eight I/O nodes. In the compute rack, up to four I/O drawers, two per midplane, can be configured using the I/O enclosure (top hat). Placing I/O drawers in separate I/O racks is advisable in large installations where the required number of I/O nodes cannot be accommodated in the compute racks.
For an introduction to the Blue Gene/Q hardware components, see the Blue Gene/Q Hardware Overview and Installation Planning Guide, SG24-7822 Redbooks publication.
1.3 Blue Gene/Q software overview
The Blue Gene/Q software includes the following features:
Scalable Blue Gene/Q system administration and management services running on service nodes, subnet service nodes, and front end nodes
Compute Node Kernel (CNK) running on the compute nodes
Full Linux kernel running on I/O nodes
Message Passing Interface (MPI) between compute nodes through MPI library support
Open multi-processing (OpenMP) application programming interface (API)
Support for the IBM XL compiler family (XL C/C++ and XL Fortran) and for the GNU Compiler Collection
Software support that includes IBM Tivoli® Workload Scheduler LoadLeveler®, IBM General Parallel File System (GPFS™), and Engineering and Scientific Subroutine Library (ESSL)
Support for running Python applications
Support for debuggers including GNU Project Debugger (GDB)
1.3.1 System administration and management
The responsibilities of a Blue Gene/Q system administrator can be wide-ranging, but the administrator typically maintains and monitors the health of the Blue Gene/Q system. Most of the system administrator tasks are performed from the service node. The Navigator web application that runs on the service node plays an important role in helping administrators perform their job. The IBM System Blue Gene Solution: Blue Gene/Q System Administration, SG24-7869 Redbooks publication provides a comprehensive description of administering a Blue Gene/Q system, including how to use the key features of Navigator, manage compute and I/O blocks, run diagnostics, perform service actions, use the console, handle alerts, manage various servers, submit and manage jobs, and configure I/O nodes.
1.3.2 Compute Node Kernel and services
The Compute Node Kernel (CNK) software is an operating system that is similar to Linux and provides an environment for running user processes on compute nodes. The CNK includes the following services:
Process creation and management
Memory management
Process debugging
Reliability, availability, and serviceability (RAS) management
File I/O
Network
The Blue Gene/Q software stack includes a standard set of runtime libraries for C, C++, and Fortran. To the extent possible, the supported functions maintain open-standard, Portable Operating System Interface (POSIX)-compliant interfaces. The CNK provides a robust threading implementation on the Blue Gene/Q system that supports the pthread, XL OpenMP, and GNU OpenMP implementations. The Native POSIX Thread Library (NPTL) pthreads implementation in the GNU C Library (GLIBC) runs without modification.
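For illustration, the following minimal OpenMP sketch is representative of the threaded code that this support is intended to run. It is a generic example, not part of the Blue Gene/Q software; compile it with an OpenMP-enabled compiler invocation (for example, -qsmp=omp with the XL compilers or -fopenmp with the GNU compilers).

/* Minimal OpenMP sketch; each OpenMP thread maps onto the pthread support in the CNK. */
#include <omp.h>
#include <stdio.h>

int main(void)
{
    #pragma omp parallel
    {
        printf("Thread %d of %d\n", omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}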
Although statically linked executable programs provide optimal performance, the CNK also has support for dynamically linked executable programs. This support enables dynamically linked scripting languages, such as Python, to be used in CNK environments.
For more information about the Compute Node Kernel, see 2.1, “Compute Node Kernel” on page 10.
1.3.3 I/O node kernel and services
The I/O node kernel is a patched Red Hat Enterprise Linux 6 kernel running on I/O nodes. The patches provide support for the Blue Gene/Q platform and contain modifications to improve performance.
The I/O node software provides I/O services to compute nodes. For example, applications that are running on compute nodes can access file servers and communicate with processes in other machines. The I/O nodes also play an important role in starting and stopping jobs and in coordinating activities with debug and monitoring tools.
Blue Gene/Q is a diskless system, so file servers must be present. A high-performance parallel file system is expected. The Blue Gene/Q system is flexible and accepts various file systems that are supported by Linux. Typical parallel file systems are the IBM General Parallel File System (GPFS) and Lustre.
The I/O node includes a complete Internet Protocol (IP) stack with Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) services. A subset of these services is available to user processes running on the compute nodes that are associated with an I/O node. Application processes use client-side sockets to communicate with processes that are running on other systems. Support for server-side sockets is also provided. The I/O node implements the sockets so that a group of compute nodes behaves as though the compute tasks are running on the I/O node. In particular, the socket port numbers form a single address space within the group. The compute nodes share the IP address of the I/O node.
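As a sketch of the client-side socket use described above, the following generic function opens an outbound TCP connection; the server address and port are placeholders supplied by the caller. From the remote system, the connection appears to originate from the IP address of the I/O node.

/* Generic client-side TCP connection sketch; ip_text and port are placeholders. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int connect_to_server(const char *ip_text, unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip_text, &addr.sin_addr) != 1 ||
        connect(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0) {
        close(fd);
        return -1;
    }
    /* To the remote peer, traffic appears to come from the I/O node IP address. */
    return fd;
}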
The I/O node kernel is designed to be booted as infrequently as possible. The bootstrap process includes loading a ramdisk image and booting the Linux kernel. The ramdisk image is extracted to provide the initial file system. This file system contains the minimal set of commands needed to mount, over the Network File System (NFS), the file system exported by the service node. The boot continues by running startup scripts from the NFS mount. It also runs customer-supplied startup scripts to perform site-specific actions, such as configuring logging and mounting high-performance file systems.
Toolchain shared libraries and all of the basic Linux text and shell utilities are local to the ramdisk. Packages, such as GPFS, and customer-provided scripts are NFS mounted for administrative convenience.
A complete description of the I/O node software is provided in the IBM System Blue Gene Solution: Blue Gene/Q System Administration, SG24-7869 Redbooks publication.
1.3.4 Message Passing Interface
The Message Passing Interface (MPI) implementation on the Blue Gene/Q system is based on MPICH2, which was developed by Argonne National Laboratory. For more information about MPICH2, see the Message Passing Interface (MPI) standard website at:
The dynamic process management function (creating new MPI processes) of the MPI-2 standard is not supported by the Blue Gene/Q system. However, the various thread modes are supported.
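For example, a program can request a threaded mode through MPI_Init_thread in the usual way. The following sketch is generic MPI code, not specific to Blue Gene/Q.

/* Requesting a threaded MPI mode; dynamic process creation (MPI_Comm_spawn) is not available. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int provided = 0;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank == 0)
        printf("%d ranks, provided thread level %d\n", size, provided);

    MPI_Finalize();
    return 0;
}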
1.3.5 Compilers
The Blue Gene/Q toolchain compilers and the IBM XL compilers for Blue Gene/Q compute nodes are available for use on the Blue Gene/Q system. Because compilation occurs on the front end node and not the Blue Gene/Q system, the compilers for the Blue Gene/Q system are cross-compilers. See 7.2, “Compilers for the Blue Gene/Q system” on page 80 for more information about compilers.
GNU compilers
The compilers in the Blue Gene/Q toolchain are based on the GNU compilers. When the Blue Gene/Q software is installed, RPM packages are provided so that the user can build and install the Blue Gene/Q toolchain into the gnu-linux directory of the software stack. The Blue Gene/Q toolchain compilers are used to build much of the Blue Gene/Q system software and provide the base libraries for user applications. They can also be used to build applications that run on the Blue Gene/Q compute nodes. See 7.2.2, “GNU Compiler Collection” on page 81 for more information about the GNU compilers.
IBM XL compilers
The IBM XL compilers for Blue Gene/Q can be used to build applications that run on the Blue Gene/Q system. The IBM XL compilers can provide higher levels of optimization than the Blue Gene/Q toolchain compilers. The XL compilers for Blue Gene/Q support single instruction, multiple data (SIMD) vectorization (simdization). Simdization enables automatic code generation to use the quad floating‑point unit (FPU) of the Blue Gene/Q system. This unit can handle four simultaneous floating‑point instructions. The Blue Gene/Q XL compilers also provide support for source code syntax to use transactional memory and speculative threads. See 7.2.1, “IBM XL compilers” on page 80 for more information about the IBM XL compilers.
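The following loop is an illustrative candidate, not taken from the compiler documentation, of the kind of code that the XL compilers can simdize for the quad FPU; the restrict qualifiers simply make the absence of aliasing clear to the compiler.

/* Illustrative candidate for 4-wide SIMD code generation on the quad FPU. */
void scaled_add(int n, double a, const double * restrict x, double * restrict y)
{
    int i;
    for (i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}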
MPI wrapper scripts for Blue Gene/Q compilers
The MPI wrapper scripts are compiler wrapper scripts that are provided in the Blue Gene/Q driver. These scripts can be used to compile and link programs that use MPI. Various MPI scripts are available, depending on which compiler is used to compile the code and the version of the libraries to be linked. The wrapper scripts start the appropriate compiler and add all necessary directories, libraries, and options that are required to compile programs for MPI. For each compiler language and standard that is provided for the Blue Gene/Q system, there is a corresponding MPI wrapper script. There are also thread-safe versions for each of the IBM XL compilers. The MPI wrapper scripts are described in 6.5, “Compiling MPI programs on the Blue Gene/Q system” on page 74.
1.3.6 Application development and debugging
Application developers access front end nodes to compile and debug applications, submit Blue Gene/Q jobs, and perform other interactive activities.
Debuggers
The Blue Gene/Q system includes support for running GNU Project Debugger (GDB) with applications that run on compute nodes. Other third-party debuggers are also available. See 8.2, “Debugging applications” on page 104.
Running applications
Blue Gene/Q applications can be run in several ways. The most common method is to use a job scheduler that supports the Blue Gene/Q system, such as the LoadLeveler scheduler. Another less common option is to use the runjob command directly. All Blue Gene/Q job schedulers use the runjob interface for job submission, but schedulers can wrap it with another command or job submission interface. The runjob command is described in the IBM System Blue Gene Solution: Blue Gene/Q System Administration, SG24-7869 Redbooks publication.
For more information about the LoadLeveler scheduler, see 8.1.1, “IBM LoadLeveler” on page 104.
Application memory considerations
On the Blue Gene/Q system, the entire physical memory of a compute node is 16 GB, so careful consideration of memory is required when writing applications. Some of that space is allocated for the CNK. Shared memory space is also allocated to the user process at the time the process is created.
The CNK tracks collisions of the stack and heap as the heap is expanded with brk() and mmap() system calls. The CNK and its private data are protected from reads and writes by the user process or threads. The code space of the process is protected from writing by the process or threads. Code and read-only data are shared between the processes that share each node.
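Because heap growth is backed by the brk() and mmap() system calls, a failed allocation is the portable sign that the per-process memory limit has been reached. The following generic sketch simply checks the result of malloc(); it is not a Blue Gene/Q-specific interface.

/* Generic heap-allocation check; a NULL return indicates that the heap cannot grow further. */
#include <stdio.h>
#include <stdlib.h>

double *allocate_work_array(size_t elements)
{
    double *buffer = malloc(elements * sizeof(double));
    if (buffer == NULL)
        fprintf(stderr, "heap allocation of %zu elements failed\n", elements);
    return buffer;   /* caller must test for NULL and later free() the buffer */
}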
The amount of memory required by the application is an important topic for Blue Gene/Q. The memory used by an application falls into one of the following classifications:
bss: Uninitialized static and common variables
data: Initialized static and common variables
heap: Controlled allocatable arrays
stack: Controlled automatic arrays and variables
text: Application text (instructions) and read-only data
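The following short C sketch, included here only for illustration, shows where typical variables fall in these classifications.

#include <stdlib.h>

int counter;                    /* bss: uninitialized static variable */
int limit = 1024;               /* data: initialized static variable  */
static const char tag[] = "ok"; /* text: read-only data               */

int main(void)                  /* text: application instructions     */
{
    int local = 0;                                  /* stack: automatic variable */
    double *work = malloc(limit * sizeof(double));  /* heap: dynamic allocation  */
    if (work == NULL)
        return 1;
    work[0] = local + tag[0];
    free(work);
    return 0;
}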
The Blue Gene/Q system implements a 64-bit memory model. You can use the Linux size command to display the memory size of the program. However, the size command does not provide any information about the runtime memory usage of the stack or heap.
The memory that is available to the application depends on the number of processes per node. The 16 GB of available memory is partitioned as evenly as possible among the processes on each node. Because memory is a limited resource, it is generally advisable to conserve memory in the application. In some cases, the memory requirement can be reduced by distributing data that was replicated in the original code. However, additional communication might be required. On Blue Gene/Q systems, the total number of processes can be large. Consider the memory that is required to store arrays that have the number of processes as one or more of the array dimensions.
Other considerations
It is important to understand that the operating system present on the compute node, the CNK, is not a full version of the Linux operating system. Therefore, use care in the areas explained in the following sections when writing applications for the Blue Gene/Q system. For a full list of supported system calls, see 5.3, “System calls” on page 63.
Input and output
Pay special attention to I/O in your application. The CNK does not perform I/O. I/O is managed by the I/O node.
File I/O
A limited set of file I/O functions is supported. Do not attempt to use asynchronous file I/O, because it causes runtime errors.
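For example, prefer synchronous calls such as pread() over the asynchronous aio_* interfaces. The following function is a generic sketch, not part of the Blue Gene/Q software.

/* Synchronous, offset-based read; avoids the unsupported asynchronous I/O interfaces. */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

ssize_t read_block(const char *path, void *buffer, size_t length, off_t offset)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t bytes = pread(fd, buffer, length, offset);  /* blocking read at offset */
    close(fd);
    return bytes;
}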
Standard input
Standard input (stdin) is supported on the Blue Gene/Q system.
Socket calls
Socket calls are supported on the Blue Gene/Q system. For more information, see Chapter 5, “Compute Node Kernel interfaces” on page 55.
Linking
Dynamic linking is supported on the Blue Gene/Q system. You can statically link all code into your application or use dynamic linking.
Shell scripts
The CNK does not provide a mechanism for a command interpreter or shell when applications start on the Blue Gene/Q system. Only the executable program can be started. Therefore, if the application includes shell scripts that control workflow, the workflow must be adapted. For example, an application workflow shell script cannot be started with the runjob command. Instead, run the application workflow scripts on the front end node and start the runjob command only at the innermost shell script level where the main application binary is called.