Chapter 10

Linux on the Coprocessor

It’s just Linux. The very first users of the Intel® Xeon Phi™ coprocessor asked us general, and sometimes very specific, “how to” questions about managing the coprocessor and porting or running code on it, assuming it differed from what they considered “normal.” We were happy to surprise them. Most often, our answer was simply “It’s just Linux!” Much of why we could give that answer to application developers was explained in Chapters 8 and 9 on the hardware and software architecture of the coprocessor. We continue that theme in this chapter while looking at the installation and management aspects of a system.

In this chapter, we focus more on knowledge useful to the administration of a system that contains Intel Xeon Phi coprocessors. The information in this chapter is generally uninteresting if your focus is purely developing applications for the coprocessors. We will delve a little deeper into the Linux implementation on the coprocessor, how to configure it, use of the micctrl utility, and Intel® Cluster Checker. In particular, while the coprocessor is indeed running a true Linux operating system, the nature of the design as an attached coprocessor means there are some key elements to consider and understand if you need to adapt the coprocessor Linux operating environment to specific requirements of your network, applications, and storage system. We will describe the key considerations and tools for working with and adapting the coprocessor Linux environment to your needs. We will include information on how coprocessors may be included as part of a cluster from a software perspective, and give an overview of some of the resources from Intel to assist in this.

There is additional information available, including coprocessor installation documentation, “quick start” documentation, and the complete coprocessor Linux package. Further information resources are also listed at the end of this chapter.

Coprocessor Linux baseline

As explained in Chapter 9, the coprocessor does not use or boot “off the shelf” Linux distributions such as Fedora, Ubuntu, Red Hat, or SUSE. The starting point for the coprocessor Linux is the same core baseline those distributions utilize from http://kernel.org. The Linux kernel for the coprocessor uses version 2.6.34 or greater. Because Linux is a growing, improving open source project, we expect the coprocessor Linux to be adapted to newer Linux versions over time, like any distribution. A goal of the adaptation to the coprocessor was to make the minimal changes required to both maintain operational compatibility with the familiar, standard Linux environment and to more easily allow future upgrades. The coprocessor Linux implementation provides typical capabilities such as process/task creation, scheduling, and memory management. It also provides configuration, power management, and server system management.

Given that the coprocessor has a comparatively small memory footprint versus a typical Intel® Xeon® platform and no directly attached storage, a minimal, embedded Linux environment was chosen as the baseline to be ported to the coprocessor. For primary coverage of prevalent applications, the Linux Standard Base (LSB) Core libraries are also included along with a Busybox minimal shell environment. Table 10.1 lists the LSB components.

Table 10.1

Linux Standard Base (LSB) Core Libraries on the Coprocessor

Component Description
glibc the GNU C standard library
libc the C standard library
libm the math library
libdl programmatic interface to the dynamic linking loader
librt POSIX real-time library (POSIX shared memory, clock and time functions, timers)
libcrypt password and data encryption library
libutil library of utility functions
libstdc++ the GNU C++ standard library
libgcc_s a low-level runtime library
libz a lossless data compression library
libcurses a terminal-independent method of updating character screens
libpam the Pluggable Authentication Module (PAM) interfaces allow applications to request authentication via a system administrator-defined mechanism

Of course, the coprocessor Linux kernel can be extended with loadable kernel modules (LKMs); LKMs may be added or removed with modprobe as with any standard distribution. An Intel Xeon Phi targeted gcc compiler for such kernel module ports is provided as part of the coprocessor Linux installation packages within the Intel® Manycore Platform Software Stack (Intel® MPSS). Finally, standard “user” level libraries from http://linux.org, open source or third parties can be ported using Intel® Parallel Studio XE or other coprocessor compatible compilers. For example, simple recompile ports of MPI libraries such as MPICH1 and MPICH2 have been done as needed by several customers. We expect optimized libraries of these and many other useful libraries to be completed over time by their primary open source contributors. Remember, “It’s just Linux!”

Introduction to coprocessor Linux bootstrap and configuration

The Intel Xeon Phi coprocessors run autonomous Linux operating systems and are controlled through the host processor platform via the PCI Express bus they are plugged into. The host platform hardware and/or software provide the data communications paths to the coprocessor. To support this requirement the host must:

• Provide the Linux boot image to the coprocessor

• Instruct the coprocessor to boot, shut down, reset, and so on

• Provide the root file system for running a Linux operating system

• Provide a virtual console

• Provide a networking connection

• Provide power management control information for the coprocessor

• Provide high speed data transfer to and from the coprocessor

Software for the coprocessor is installed and configured on the host platform. This somewhat complex process is usually hidden from the system administrator by Linux OS installation scripts.

The “MPSS Boot” portion of Intel MPSS provides the software to perform the entire configuration and booting of the Linux operating system on the coprocessor. It provides the PCI Express access driver, a daemon for initialization and monitoring of the coprocessors and a utility to simplify the configuration process.

The host coprocessor driver (mic.ko) provides the PCI Express bus access. It contains code to inject the Linux kernel and its command line into the coprocessor’s memory and signal it to begin executing (booting). It includes a virtual console driver and a virtual network driver. The host driver also coordinates the power management on the coprocessor. Finally, it implements the high speed PCI Express transfer protocol via the Symmetric Communications Interface (SCIF) that was described in Chapter 9. SCIF is utilized to provide the control and data transfer functionality required to launch and communicate with the coprocessor Linux OS.

The mpssd daemon controls the initialization and booting of the coprocessor based on a set of configuration files. The mpssd daemon is started and stopped as a Linux service (such as with the command service mpss start) and instructs the coprocessors on a platform to boot or shut down. The daemon supplies the final file system image to the coprocessors when requested. It monitors the coprocessor state and logs it as it changes. It will also log error information from the coprocessor.

The micctrl configuration utility is the workhorse of the configuration process and has numerous options to control the configuration and the resulting Linux OS environment on the coprocessor. It also gives the system administrator the ability to individually boot and shut down coprocessors in the system. We will explain the micctrl utility and key uses throughout this chapter.

Note

We expect the micctrl utility will continue to grow significantly in capabilities beyond the original publishing date of this book. Our goal is to provide you the key knowledge to understand the coprocessor Linux configuration process. Please refer to the latest Intel MPSS documentation for the most up-to-date information on the micctrl utility.

Default coprocessor Linux configuration

After the installation of Intel MPSS (consult the readme.txt file at http://intel.com/software/mic under the Intel MPSS section for installation instructions), the system administrator must complete the coprocessor configuration before starting the Intel MPSS service.

Step 1: Ensure root access

User access to each coprocessor Linux OS node is provided through the secure shell utilities, such as SSH. By default, the mpssd daemon enables users with existing SSH access on the host platform, although specific exclusions or inclusions of users can be controlled with configuration changes. In particular, it is important that the root user, who is typically a trusted system administrator, has access. The general method to ensure a user will have access is to generate the proper secure key file(s) configured in the standard user home directory structure. To ensure the root user has keys created, the root user must be logged in. Look in the /root/.ssh directory for either the id_rsa.pub or id_dsa.pub key files. If no SSH keys exist, use the ssh-keygen command to generate a set:

root_prompt# ssh-keygen

Step 2: Generate the default configuration

Each coprocessor has a unique configuration file in the /etc/sysconfig/mic directory. Initialize the default configuration for all the coprocessors installed on the host platform:

user_prompt$ sudo micctrl --initdefaults

The micctrl --initdefaults command creates and populates the default configuration files corresponding to each coprocessor installed in the system. These configuration files are default.conf and micN.conf located at /etc/sysconfig/mic/.

Note

In the configuration file micN.conf, N is an integer number (0, 1, 2, 3, and so on) that identifies each coprocessor installed in the system.

Step 3: Change configuration

Examine the files in the /etc/sysconfig/mic directory. If the default configuration meets the requirements of the system, continue to Step 4. Otherwise, edit the configuration files in /etc/sysconfig/mic (refer to section “Changing Coprocessor Configuration”).

Step 4: Start the Intel® MPSS service

By default the Intel MPSS service boots all coprocessors on the platform when it is started. Other options are available to boot specific coprocessors or delay booting any coprocessors until later. To start the Intel MPSS service, execute the Linux service command:

user_prompt$ sudo service mpss start

The call to service completes once it determines that each coprocessor has either booted successfully or failed to boot; the status of the coprocessors is then displayed.
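The four steps above can be summarized as a short planning routine. The following is a minimal sketch, not part of Intel MPSS: the function name plan_default_setup is hypothetical, and the routine only lists the commands already shown in this section rather than running them. Step 3, the manual review of the generated files, is intentionally left to the administrator.

```python
import os


def plan_default_setup(ssh_dir="/root/.ssh"):
    """Return the commands from Steps 1-4 that still need to run, in order."""
    commands = []
    key_files = ("id_rsa.pub", "id_dsa.pub")
    # Step 1: root needs an SSH key pair before the configuration is generated.
    if not any(os.path.isfile(os.path.join(ssh_dir, name)) for name in key_files):
        commands.append(["ssh-keygen"])
    # Step 2: create default.conf and micN.conf under /etc/sysconfig/mic.
    commands.append(["micctrl", "--initdefaults"])
    # Step 3 (reviewing the generated files) is a manual step and is not listed.
    # Step 4: start the service, which boots the coprocessors by default.
    commands.append(["service", "mpss", "start"])
    return commands
```

Each entry in the returned list is an argument vector that could be passed to subprocess.run by a script executing as root; printing the list first is a reasonable dry run.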

Changing coprocessor configuration

This section focuses on configuring coprocessors, including configuration files, kernel command line parameters, and authentication.

Configurable components

On a typical Linux system, installation and configuration are performed by a graphical utility that prompts the system administrator for input. Since the Intel Xeon Phi coprocessors do not have a file system of their own, this process is replaced by:

• Installing Intel MPSS (containing the required software) on the host platform

• Configuration by a combination of the following:

– Editing the configuration file(s)

– Using the micctrl utility

The configuration parameters have three categories:

1. Parameters for the host to load the Linux kernel on the target coprocessor and initiate the boot process.

2. Parameters to define the root file system to be used for the coprocessor.

3. Parameters to configure the host processor platform’s virtual Ethernet connection described in Chapter 9.

The current configuration parameters can be displayed with the micctrl --config command. For example, the default configuration on most systems looks like the following:

mic0:

=============================================================

Linux Kernel: /lib/firmware/mic/uos.img

BootOnStart:   Enabled

Shutdowntimeout: 300 seconds

ExtraCommandLine: highres=off pm_qos_cpu_dma_lat=75

UserAuthentication: Local

Root Device:  Dynamic Ram /opt/intel/mic/filesystem/mic0.image

BaseDir:  /opt/intel/mic/filesystem/base.filelist

CommonDir: /opt/intel/mic/filesystem/common.filelist

MicDir:  /opt/intel/mic/filesystem/mic0.filelist

Overlay:  /opt/intel/mic/filesystem/allinea-dev.filelist

Overlay:  /opt/intel/mic/coi/config/coi.filelist

Network: Static Pair

Hostname: sys1-mic0.tinynet.com

Host IP: 172.31.1.254

MIC IP:  172.31.1.1

Host MAC: 4a:70:e7:0c:2c:57

MIC MAC:  22:88:36:2b:0f:97

Net Bits: 24

NetMask:  255.255.255.0

Console: hvc0

VerboseLogging: Enabled

Configuration files

This section briefly discusses configuration file format and using the Include parameter, which enables one configuration file to include additional configuration files.

File location and format

Configuration is controlled by per-coprocessor configuration files located in the /etc/sysconfig/mic directory. Each coprocessor uses a file identified by its ID: micN.conf, where N is an integer number (0, 1, 2, 3, and so forth) that identifies each coprocessor installed in the system.

Each of the configuration files contains a list of configuration parameters and their arguments. Each parameter must be on a single line. Comments begin with the “#” character and terminate at the end of the same line.

Including other configuration files

Parameter syntax:

Include <config_file_name>

Each of these configuration files has the ability to include other configuration files. The Include parameter lists the configuration file(s) to be included. The configuration file(s) to be included must be located in the base directory /etc/sysconfig/mic. The configuration parser processes each parameter sequentially. When the Include parameter is encountered, the included configuration file(s) are immediately processed. If the same parameter is set multiple times, the last instance of the parameter setting will be applied.

By default, the /etc/sysconfig/mic/default.conf file is included at the start of each coprocessor specific file (e.g. mic0.conf, mic1.conf, etc.). This allows the coprocessor specific files to override any parameter set in default.conf.

The last entry in the default.conf file is typically the line:

Include conf.d/*.conf

This is a special rule, specifying that all the files in the /etc/sysconfig/mic/conf.d directory will be included.
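The include-and-override behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not the parser shipped with Intel MPSS: the function name load_config, the in-memory dictionary standing in for the /etc/sysconfig/mic directory, the sorted ordering of wildcard matches, and the conf.d/quiet.conf file are all assumptions made for the example.

```python
import fnmatch


def load_config(name, files, settings=None):
    """Apply the parameters in `name`, and anything it includes, to `settings`.

    `files` maps paths relative to /etc/sysconfig/mic to their text, so the
    sketch runs without touching a real system.
    """
    if settings is None:
        settings = {}
    for raw_line in files[name].splitlines():
        line = raw_line.split("#", 1)[0].strip()  # comments run to end of line
        if not line:
            continue
        parts = line.split(None, 1)
        parameter = parts[0]
        argument = parts[1].strip() if len(parts) > 1 else ""
        if parameter == "Include":
            # Included files are processed immediately; sorting wildcard
            # matches is an assumption, the real ordering is not documented here.
            for included in sorted(fnmatch.filter(files, argument)):
                load_config(included, files, settings)
        else:
            settings[parameter] = argument  # the last assignment wins
    return settings


# Hypothetical file contents used only for illustration.
example_files = {
    "mic0.conf": "Include default.conf\nBootOnStart Disabled\n",
    "default.conf": "BootOnStart Enabled\nVerboseLogging Enabled  # default\n"
                    "Include conf.d/*.conf\n",
    "conf.d/quiet.conf": "VerboseLogging Disabled\n",
}
example_settings = load_config("mic0.conf", example_files)
```

Because default.conf is included first, the BootOnStart line in mic0.conf overrides it, while the conf.d file overrides the VerboseLogging default. A parameter that legitimately appears several times, such as Overlay, would need a list rather than the single slot used in this sketch.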

Configuring boot parameters

The host system boots the coprocessor by injecting the Linux kernel image and kernel command line into its memory and then instructing the coprocessor to start. To perform this operation, the host system must read the coprocessor specific configuration file and load all the parameters into the kernel command line.

What to boot

Parameter syntax:

OSimage <linux_kernel_image>

The default value for the coprocessor Linux OS image is /lib/firmware/mic/uos.img. Optionally, the system owner can substitute a new kernel image. To do so, it is necessary to set the parameter in the correct coprocessor specific configuration file and change the image name. The change takes effect upon executing either service mpss start or micctrl -b.

When to boot

Parameter syntax:

BootOnStart <Enabled | Disabled>

This parameter controls whether the coprocessor is booted when the Intel MPSS service starts. The BootOnStart parameter should be defined in the coprocessor specific configuration file. If set to Enabled, the mpssd daemon will attempt to boot the coprocessor when service mpss start is called.

VerboseLogging kernel command line parameter

Parameter syntax:

VerboseLogging <Enabled | Disabled>

The VerboseLogging parameter specifies whether the quiet kernel command line parameter is passed to the coprocessor on boot. The quiet kernel parameter suppresses most kernel messages during kernel boot. VerboseLogging is enabled by default. Disabling VerboseLogging will reduce boot times.

Changes to VerboseLogging take effect upon executing service mpss start.

Console kernel command line parameter

Parameter syntax:

Console “<console device>”

Intel MPSS includes a PCI Express bus virtual console driver. Its device node (hvc0) is the default value assigned to the Console parameter. Other possible values are intended for internal use.

Changes to Console take effect upon executing service mpss start.

PowerManagement kernel command line parameter

Parameter syntax:

PowerManagement “<string>”

The PowerManagement parameter is a string of four attributes passed directly to the kernel command line for the coprocessor’s power management driver. The mpssd daemon and micctrl utility do not validate any of the parameters in this string or its format. For the correct values, consult the Intel MPSS power management documentation at http://intel.com/software/mic. Changes to PowerManagement take effect upon executing service mpss start.

RootDevice kernel command line parameter

Parameter syntax:

RootDevice RamFS <location>

RootDevice StaticRamFS <location>

RootDevice NFS <location>

RootDevice SplitNFS <location> </usr location>

RootDevice InitRD

The RootDevice parameter defines the type of root device to mount. The type argument is a string specifying the device type. The location argument is the location information of the file system for the coprocessor. Some supported types as of this writing are RamFS, StaticRamFS, NFS, SplitNFS, and InitRD.

The InitRD type boots to the initial RAM disk image included in the downloaded kernel. This option exists for debug purposes only.

The RamFS and StaticRamFS types have a second argument specifying the file name of the compressed cpio image to be used for the coprocessor file system. The RamFS type builds its image when the coprocessor requests download of the image file.

The StaticRamFS boot will fail if the image file is not already present.

The NFS root device type has a second argument of the directory path on the host to mount as the root directory. This directory must be correctly exported by the host as an NFS share.

The SplitNFS root device type is the same as the NFS type with a third argument specifying the location of an NFS export to mount under /usr on the embedded Linux OS. Typically the /usr NFS mount will be shared between coprocessors.

Changes to RootDevice take effect upon executing service mpss start. More information on the cpio file system image is given in the section titled “The Coprocessor File System Creation Process.”
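The argument counts for each root device type can be checked mechanically. The following is a minimal sketch under stated assumptions: the function parse_root_device and the table ROOT_DEVICE_ARGS are hypothetical names, and the counts are taken only from the descriptions above, not from the micctrl or mpssd source.

```python
# Number of location arguments that follow the type name, per the text above.
ROOT_DEVICE_ARGS = {
    "RamFS": 1,        # compressed cpio image, built on request
    "StaticRamFS": 1,  # compressed cpio image, must already exist
    "NFS": 1,          # exported directory mounted as the root
    "SplitNFS": 2,     # root export plus a second export mounted under /usr
    "InitRD": 0,       # initial RAM disk inside the kernel image, debug only
}


def parse_root_device(line):
    """Split a RootDevice line into (type, [locations]) or raise ValueError."""
    fields = line.split()
    if len(fields) < 2 or fields[0] != "RootDevice":
        raise ValueError("not a RootDevice line: %r" % line)
    device_type, locations = fields[1], fields[2:]
    if device_type not in ROOT_DEVICE_ARGS:
        raise ValueError("unknown root device type: %s" % device_type)
    expected = ROOT_DEVICE_ARGS[device_type]
    if len(locations) != expected:
        raise ValueError("%s takes %d location argument(s)" % (device_type, expected))
    return device_type, locations
```

For example, parse_root_device("RootDevice RamFS /opt/intel/mic/filesystem/mic0.image") returns the type together with a one-element location list, while a SplitNFS line with only one location raises ValueError.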

Coprocessor root file system

Every Linux operating system needs a root file system with a minimal set of files. Other nonessential files may be on the root or they may be on secondary mounts. Most modern Linux OS releases assume the root file system will be large enough to install the complete required release files. The Intel Xeon Phi coprocessor embedded file system follows the same rule. Files on the root file system fall into three categories:

1. The binaries installed with the system.

2. The files in the /etc directory, which define an individual system.

3. The set of files for the users of the system.

The Intel MPSS configuration provides syntax for setting up the root file system.

File location parameters

Parameter syntax:

BaseDir <location> <descriptor file>

CommonDir <location> <descriptor file>

MicDir <location> <descriptor file>

Overlay <location> <descriptor file>

Each parameter has two required arguments. The first is the top-level directory name where the files are located. The second is a file describing where each file gets placed on the coprocessor’s file system and the permissions for that file. The format of the descriptor file will be explained in the section, “Adding Files to the Coprocessor Root File System.”

The BaseDir parameter is the location of the coprocessor binaries installed by the Intel MPSS installation process. The files in this directory should never be changed since the next install will overwrite any changes.

The CommonDir parameter defines a set of files the system administrator wishes to have on all the coprocessor file systems installed in the host platform. The Intel MPSS installation process does not install files in this directory and any added files will be maintained across updates to the Intel MPSS installation.

The MicDir parameter defines the per coprocessor information for each unique coprocessor in the host platform. The Intel MPSS installation process installs no files in this directory and most of its content is created by the configuration process. Specifically, user access and network configuration each has its own set of configuration parameters.

The Overlay parameter is the only one of the set that is likely to be used many times in a configuration. Each entry specifies a new set of files to add to the file system. This parameter is used to add additional software to be automatically included. For an example of its syntax see /etc/sysconfig/mic/conf.d/coi.conf in a host platform with Intel MPSS installed.

The RamFS root device type will use these configuration parameters to automatically build and download the file system image to the booting coprocessor. For the StaticRamFS type, the micctrl --updateramfs command can be used to update the image. The section “Coprocessor File System Creation Process” explains this further.

User access

Parameter syntax:

UserAuthentication None

UserAuthentication Local <low uid> <high uid>

User authentication for the coprocessor Linux operating system is controlled through the standard Unix /etc/passwd, /etc/shadow and /etc/group files. Although the mechanisms for populating these files also copy the password defined for the user on the host system, it is recommended to control access through the secure shell.

Note

The passwords do not work with a SUSE Linux host because it uses a different encryption algorithm than the coprocessor.

The UserAuthentication configuration parameter specifies two default sets of users. If None is specified, the /etc/passwd file on the coprocessor will default to one containing the root, sshd and micuser accounts. If Local is specified, all the users on the host between the low user ID (uid) and high user ID will also generate entries on the coprocessor files.
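The effect of the two settings can be illustrated with a small filter over the host password file. This is a minimal sketch, not the micctrl implementation: the function name coprocessor_users is hypothetical, and treating both user ID bounds as inclusive is an assumption, since the text above does not say how the endpoints are handled.

```python
DEFAULT_ACCOUNTS = ("root", "sshd", "micuser")


def coprocessor_users(host_passwd, mode, low_uid=None, high_uid=None):
    """Return the account names that would appear on the coprocessor."""
    names = list(DEFAULT_ACCOUNTS)
    if mode == "None":
        return names
    if mode != "Local":
        raise ValueError("unsupported UserAuthentication mode: %s" % mode)
    for entry in host_passwd.splitlines():
        fields = entry.split(":")
        if len(fields) < 3 or not fields[2].isdigit():
            continue  # skip blank or malformed lines
        name, uid = fields[0], int(fields[2])
        # Inclusive bounds are an assumption made for this sketch.
        if low_uid <= uid <= high_uid and name not in names:
            names.append(name)
    return names
```

Passing the text of the host /etc/passwd file with mode "Local" and a range such as 500 to 1000 yields the three default accounts followed by every host user in that range; mode "None" yields only the defaults.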

Changing UserAuthentication can be done in two ways. The first method is to edit the entry in the coprocessor specific configuration file and then run micctrl --resetconfig. The second, and easier, method is to use the micctrl --userconfig command.

Note

Every user to be populated to the coprocessor file system should set their secure shell configuration files with the ssh-keygen utility, as in the example for root authentication earlier in the chapter, before running either the micctrl --resetconfig or micctrl --userconfig commands.

The micctrl utility will attempt to find all users in the /etc/passwd file and /home directories and populate them to the coprocessor file system. After the initial user authentication configuration is completed, additional users may be added to the coprocessor file system with the micctrl --useradd command. Further information is provided in the section “The micctrl Utility.”

For coprocessors where it is required to strictly control user access, it is recommended to set UserAuthentication to None and add each user specifically as required.

Network access

Parameter syntax:

Hostname <name>

HostMacAddress <address>

MicMacAddress <address>

BridgeName <name>

Subnet <subnet>

HostIPaddress <address>

MicIPaddress <address>

NetBits <bits>

MTUsize <bytes>

On the host operating system, files are added to the network configuration based on the host OS type (Red Hat or SUSE). On the coprocessor file systems, the files added are:

/etc/sysconfig/network/ifcfg-mic0

/etc/sysconfig/hostname

/etc/ssh/ssh_host_key

/etc/ssh/ssh_host_key.pub

/etc/ssh/ssh_host_rsa_key

/etc/ssh/ssh_host_rsa_key.pub

/etc/ssh/ssh_host_dsa_key

/etc/ssh/ssh_host_dsa_key.pub

/etc/resolv.conf

/etc/nsswitch.conf

/etc/hosts

All network configuration parameters take effect upon executing service mpss start.

Host Name Assignment. The Hostname parameter defines the value assigned to the host name on a coprocessor. The initial value from the micctrl --initdefaults command is set to the host name with a dash and the coprocessor name appended to it. The host name string may be edited in the coprocessor specific configuration.

MAC Address Assignment. Configuring the virtual network interface is a nontrivial process and differs based on the required topology. However, as a prerequisite, both ends of the virtual network need to have MAC addresses assigned.

MAC addresses for both ends of the virtual network are randomly created by the micctrl --initdefaults command. It is possible to generate a duplicate address when there are a large number of coprocessors in a cluster. The values in the coprocessor-specific configuration file may be edited to specific values by the system administrator.

Note

When editing to specific values, it is important to set the locally administered bit of the first (high) byte, that is, the second least significant bit with mask 0x02, to indicate the value is locally generated.

Many cluster managers use the MAC address to identify systems on the network. For this reason the MAC address assignment is treated as highly persistent: it is noted and saved by both the micctrl --resetconfig and micctrl --resetdefaults commands.
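The locally generated bit mentioned in the note above is easy to get wrong when choosing addresses by hand. The following minimal sketch shows one way to generate and check such an address. The function names are hypothetical, and the code does not reproduce how micctrl itself picks addresses; clearing the multicast bit is an addition made here so that the result is always a unicast address.

```python
import random


def random_local_mac(rng=random):
    """Return a random unicast MAC address with the locally administered bit set."""
    octets = [rng.randrange(256) for _ in range(6)]
    octets[0] |= 0x02  # mark the address as locally administered
    octets[0] &= 0xFE  # clear the multicast bit so the address stays unicast
    return ":".join("%02x" % octet for octet in octets)


def is_locally_administered(mac):
    """True when the 0x02 bit of the first byte is set."""
    return bool(int(mac.split(":")[0], 16) & 0x02)
```

Both addresses in the sample micctrl --config output earlier in the chapter, 4a:70:e7:0c:2c:57 and 22:88:36:2b:0f:97, pass the is_locally_administered check. On a large cluster it is still worth comparing new addresses against those already assigned, since random generation can produce duplicates.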

Static Pair Topology (Default). In the static pair topology, every coprocessor gets a separate subnet under the host. In the default configuration it is controlled by the Subnet parameter. The BridgeName parameter must be commented out. In the default configuration Subnet is set to 172.31.

Using Subnet to control the static pair configuration automatically assigns the subnet argument as the first two quads of the IP address. It is an error for the subnet to have more than the first two quads specified. The micctrl --initdefaults or micctrl --resetconfig commands add the board number as the third quad for each coprocessor. The coprocessor side of the virtual connection is assigned 1 for the last quad and the host end is assigned 254. For example, the host end of the mic0 coprocessor will have the IP address 172.31.0.254 and the coprocessor end will be assigned the IP address 172.31.0.1.

Subnet may be overridden in the coprocessor-specific configuration file by specifying the HostIPaddress and MicIPaddress parameters. They must both be provided and the IP addresses are set to the corresponding address values. It is an error to not specify complete, valid IP addresses using standard dot-decimal notation.

It is up to the system administrator to correctly route the virtual Ethernet nodes to the external network or each other.

Internal Bridge Topology. Linux operating systems provide a mechanism for bridging network devices to a common network. The terminology “internal bridge” in the context of Intel Xeon Phi coprocessor configuration refers to the process of bridging more than one coprocessor virtual network interface, on the same host, together.

Internal bridge configuration is specified by uncommenting the BridgeName parameter and setting the value to a bridge name starting with the string mic. The Subnet parameter may be left at the first two quads of the requested subnet and the third quad will become zero. Or the subnet can be further specified by providing the first three quads of the requested subnet. In either case the host is assigned the fourth quad value of 254 and each coprocessor is assigned the coprocessor number plus one.

For example, in a configuration of two Intel® Xeon Phi™ coprocessors, and where Subnet is 172.31.1, then the host will be assigned the IP address 172.31.1.254. The coprocessors mic0 and mic1 will be assigned IP addresses 172.31.1.1 and 172.31.1.2 respectively.

The network information will be updated when micctrl --resetconfig is executed. The resetconfig operation will create the correct network configuration files for the bridge on the host, the host side of the virtual network attachments to the bridge and the network configuration for the coprocessor.

External Bridge Topology. The Linux bridging mechanism can also bridge the coprocessor virtual connections to a physical Ethernet device. In this topology, the virtual network interfaces become configurable on the wider subnet.

To specify an external bridge topology, uncomment the BridgeName parameter and point it to an existing bridge connected to a physical network device. The bridge must already exist for the coprocessor configuration to become effective. As an example, the configuration files on a Red Hat system may be:

/etc/sysconfig/network-scripts/ifcfg-br0:

DEVICE=br0

TYPE=Bridge

ONBOOT=yes

NM_CONTROLLED=no

BOOTPROTO=dhcp

STP=on

/etc/sysconfig/network-scripts/ifcfg-eth0:

DEVICE=eth0

NM_CONTROLLED=yes

ONBOOT=yes

BRIDGE=br0

NAME="System eth0"

Note

This is not a complete file listing. Please edit the appropriate files provided in the installation as needed for the particular system network topology required.

The host network configuration file for the virtual network connection will be attached to the bridge (much as the eth0 device is) and does not need an IP address.

The Subnet parameter needs to be set to the full IP address of the first coprocessor. It must also specify an address on a subnet the host platform exists on. Each additional coprocessor will get an IP address with the coprocessor number added to the fourth byte in the IP quad. For example: with two coprocessors in the system, if Subnet is set to 10.10.10.15, the first coprocessor will be assigned 10.10.10.15 and the second will be assigned 10.10.10.16.

The Subnet parameter may also be set to the string dhcp. The coprocessor network interface will attempt to retrieve its IP address using the DHCP protocol. This is the recommended value for most clusters.
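The Subnet parameter means something different in each of the three topologies, so it can help to see the addressing rules side by side. The following is a minimal sketch that follows the rules exactly as worded in this section; the function names are hypothetical, the code is not taken from micctrl, and the numbering on a particular Intel MPSS release should be confirmed with micctrl --config.

```python
import ipaddress


def static_pair(subnet, board):
    """Static pair: Subnet holds two quads; the board number is the third."""
    if len(subnet.split(".")) != 2:
        raise ValueError("static pair Subnet must be exactly two quads")
    prefix = "%s.%d" % (subnet, board)
    return {"host": prefix + ".254", "mic": prefix + ".1"}


def internal_bridge(subnet, board):
    """Internal bridge: two or three quads; all coprocessors share one subnet."""
    quads = subnet.split(".")
    if len(quads) == 2:
        quads.append("0")  # the third quad becomes zero when not given
    elif len(quads) != 3:
        raise ValueError("internal bridge Subnet must be two or three quads")
    prefix = ".".join(quads)
    return {"host": prefix + ".254", "mic": "%s.%d" % (prefix, board + 1)}


def external_bridge(subnet, board):
    """External bridge: Subnet is the first coprocessor's full address, or dhcp."""
    if subnet == "dhcp":
        return {"mic": "dhcp"}
    first = ipaddress.IPv4Address(subnet)
    # The board number is added to the fourth quad of the first address.
    return {"mic": str(first + board)}
```

With the values used in the examples above, static_pair("172.31", 0) gives 172.31.0.254 and 172.31.0.1, internal_bridge("172.31.1", 1) gives 172.31.1.254 and 172.31.1.2, and external_bridge("10.10.10.15", 1) gives 10.10.10.16.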

Assigning the Netmask. Each network interface is assigned a netmask. The netmask for coprocessors is controlled by the NetBits parameter, whose argument is the number of bits to assign to the mask. The default value of 24 translates into the mask FFFFFF00 and allows a subnet with 254 devices. If bigger subnets are required, the system administrator may change this value. For instance, setting it to 22 will generate the netmask FFFFFC00 with a maximum of 1022 devices.
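The relationship between a NetBits value, its mask, and the number of usable device addresses can be worked out with the Python standard library. This is a minimal sketch; the function name netmask_summary is hypothetical, and the device count assumes the usual convention of reserving the network and broadcast addresses, so it is meaningful for values up to 30.

```python
import ipaddress


def netmask_summary(net_bits):
    """Return (dotted mask, hexadecimal mask, usable device addresses)."""
    network = ipaddress.IPv4Network("0.0.0.0/%d" % net_bits)
    mask = network.netmask
    usable = network.num_addresses - 2  # exclude network and broadcast addresses
    return str(mask), "%08X" % int(mask), usable
```

For the default, netmask_summary(24) returns the dotted mask 255.255.255.0, the hexadecimal form FFFFFF00, and 254 addresses; netmask_summary(22) returns 255.255.252.0, FFFFFC00, and 1022.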

Assigning the MTU size. The coprocessor virtual network defaults to packet sizes of 64 KB instead of the standard 1500 bytes. This is much more efficient, but will not route correctly over an external bridge in most networks. The MTUsize configuration parameter changes the MTU size. Its argument is the number of bytes to set it to.

Not all network hardware supports the maximum IPv4 MTU size of 64 KB. Typically clusters have the MTU size set to 9000 and MTUsize should be set to match.

Host Platform SSH Keys. The secure shell utilities recognize a Linux operating system on the network by its host key files. These files are found in the /etc/ssh directory. The micctrl --initdefaults command uses the ssh-keygen utility to generate key values if they are not found for the coprocessor. These values, like the MAC addresses, are considered to be highly persistent, and the micctrl command will retain their values if they exist.

In some clusters, detecting and protecting against “man in the middle” and other such attacks might not be required. In this case, the system administrator may use the micctrl --hostkeys command to set the host SSH keys to be the same, cluster wide.

Name Resolution Configuration. Name resolution on the coprocessor is set by creating the /etc/nsswitch.conf file and copying the /etc/resolv.conf file from the host to the coprocessor file system.

The micctrl utility

The micctrl utility is a multipurpose toolbox for the system administrator. It provides the following categories of functionality.

• Coprocessor state control—boot, shutdown and reset control while the mpssd daemon is running.

• Configuration files initialization and propagation of values.

• Helper functions for modifying configuration parameters.

• Helper functions for modifying the root file system directory or associated download image.

The micctrl utility requires a first argument specifying the action to perform, followed by option-specific arguments. The arguments may be followed by a list of coprocessor names, which is shown in the syntax statements as [MIC coprocessor list]. If no coprocessors are specified, the host driver (mic.ko) must be loaded, and the existing coprocessor list is probed. Otherwise, only the named coprocessors are acted upon. For example, the list may be “mic1 mic3” if these are the coprocessors to control.
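The argument order just described (action, then option-specific arguments, then the optional coprocessor list) can be sketched as a small helper that assembles an argument vector. This is only an illustration: micctrl_argv is a name we made up, the helper does not run anything, and we assume the long options printed in this chapter are typed with two ASCII hyphens.

```python
def micctrl_argv(action, options=(), coprocessors=()):
    """Build an argument vector: action, then options, then the coprocessor list."""
    if not action.startswith("-"):
        raise ValueError("the action must be an option such as -b or --boot")
    return ["micctrl", action, *options, *coprocessors]


# Boot mic1 and mic3 and wait for them to come up.
print(micctrl_argv("--boot", ["-w"], ["mic1", "mic3"]))
```

The resulting list could be handed to a process launcher such as subprocess.run on a host where the utility is installed.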

Coprocessor state control

Starting the mpssd daemon typically initiates booting of all the system’s coprocessors, and stopping the daemon shuts them down. However, this global behavior is not desired if only one coprocessor needs to be restarted. The micctrl utility provides mechanisms for individual coprocessor control.

State is controlled for each coprocessor by the sysfs entry /sys/class/mic/<micname>/state. The micname value is literally the name of the coprocessor and will be in the format mic0 or mic1, and so on. Reading from the state will show the current run state of the selected coprocessor. Writing to it is limited to the root user and may cause the coprocessor to change states.
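Reading the state entry is an ordinary file read. A minimal sketch follows; the function name read_mic_state is ours, and the sysfs_root parameter exists only so the function can be pointed at a test directory. We make no claim here about the exact spelling or case of the state strings the driver returns.

```python
from pathlib import Path


def read_mic_state(micname, sysfs_root="/sys/class/mic"):
    """Return the current run state string for one coprocessor."""
    state_file = Path(sysfs_root) / micname / "state"
    return state_file.read_text().strip()
```

On a host with the driver loaded, read_mic_state("mic0") would return whatever the driver reports for the first coprocessor.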

Booting coprocessors

Command syntax:

micctrl -b [-w] [mic coprocessor list]

micctrl ––boot [-w] [mic coprocessor list]

The coprocessor must be in the Ready state. The command writes the string boot:linux:<image> (where image is the OSimage configuration parameter) to the /sys/class/mic/<micname>/state sysfs file. The host driver will inject the indicated Linux image into the coprocessor’s memory and start it booting.

The optional -w parameter may be specified to instruct the micctrl command to wait until the specified coprocessors have entered either the Online or the Failed state. The wait option will time out after 300 seconds.

Shutting down coprocessors

Command syntax:

micctrl -S [-w] [mic coprocessor list]

micctrl ––shutdown [-w] [mic coprocessor list]

The coprocessor must be in the Online state. This command writes the string shutdown to the /sys/class/mic/<micname>/state sysfs file. The driver instructs the coprocessor to perform an orderly shutdown and waits for completion. It then resets the coprocessor to place it in the Ready state again.

The optional -w parameter may be specified to instruct the micctrl command to wait until the specified coprocessors have entered the Ready state. The wait option will time out after 300 seconds.

Rebooting the coprocessors

Command syntax:

micctrl -R [-w] [mic coprocessor list]

micctrl ––reboot [-w] [mic coprocessor list]

The coprocessor must be in the Online state. This command sequentially performs the shutdown and boot functions from the sections “Shutting down coprocessors” and “Booting coprocessors.”

The optional -w parameter may be specified to instruct the micctrl command to wait until the specified coprocessors have entered the Ready state. The wait option will time out after 300 seconds.

Resetting coprocessors

Command syntax:

micctrl -r [-w] [mic coprocessor list]

micctrl ––reset [-w] [mic coprocessor list]

The coprocessor can be in any state. This command writes the string reset to the /sys/class/mic/<micname>/state sysfs file. The driver will perform a soft reset on the coprocessor.

Note

It is recommended to do a shutdown where possible instead of a reset.

The optional -w parameter may be specified to instruct the micctrl command to wait until the specified coprocessors have entered the Ready state. The wait option will time out after 300 seconds.

Waiting for a coprocessor state change

Command syntax:

micctrl -w [mic coprocessor list]

micctrl ––wait [mic coprocessor list]

The wait option waits for the status of the coprocessor to be either Online or Ready. It also allows for a brief pause in the Ready state during mpssd startup. It is intended for users to verify that the mpssd startup procedure is complete. It has a built-in timeout value of 300 seconds.
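The waiting behavior can be pictured as a polling loop with a deadline. The sketch below is not how micctrl is implemented; polling, the lowercase state strings, and the names wait_for_state, read_state, clock, and sleep are all our assumptions. It also does not model the brief pause in the Ready state during mpssd startup.

```python
import time


def wait_for_state(read_state, targets=("online", "ready"), timeout=300.0,
                   interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Poll read_state() until it returns one of targets or the timeout expires.

    Returns the matching state, or None on timeout.
    """
    deadline = clock() + timeout
    while True:
        state = read_state().strip().lower()
        if state in targets:
            return state
        if clock() >= deadline:
            return None
        sleep(interval)
```

The clock and sleep parameters are injectable so the loop can be exercised without actually waiting.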

Coprocessor status

Command syntax:

micctrl -s [mic coprocessor list]

micctrl ––status [mic coprocessor list]

The status option displays the status of the coprocessors in the system. If the status is “online” or “booting” it also displays the name of the associated boot image.

Coprocessor configuration initialization and propagation

This section discusses the micctrl command options for initializing configuration files, and propagating, resetting, and cleaning configuration parameters.

Initializing the configuration files

Command syntax:

micctrl ––initdefaults [mic coprocessor list]

The Intel MPSS installation does not provide the configuration files. They are created by the micctrl ––initdefaults command. micctrl ––initdefaults can be run anytime but will not change files if they have valid information.

The ––initdefaults option first checks to see if the /etc/sysconfig/mic/default.conf file is present. If not, it creates the default version of it. Then, for each supplied coprocessor, it checks for the existence of the coprocessor-specific configuration file /etc/sysconfig/mic/<micname>.conf. If it is not present, it creates a default version with an Include parameter including the default.conf file.

The ––initdefaults option then proceeds to parse the per-coprocessor configuration files. For each parameter that is not set, it will add a default value to the per-coprocessor configuration file. Each parameter that gets created also has its operation performed at configuration time. For example, the UserAuthentication parameter being set to its default of Local will cause the coprocessor’s /etc/passwd, /etc/shadow and /etc/group files to be created along with any corresponding user directories and SSH key files.

Propagating changed configuration parameters

Command syntax:

micctrl ––resetconfig [mic coprocessor list]

Changes to the configuration files are propagated with the micctrl ––resetconfig command. The ––resetconfig option first removes the files in MicDir created by the configuration process with the exception of the highly persistent SSH host key files. It then regenerates those files according to the parameters in the /etc/sysconfig/mic/<micname>.conf and /etc/sysconfig/mic/default.conf files. This process will not add default parameters, but only causes the changed parameters to be propagated.

Resetting configuration parameters

Command syntax:

micctrl ––resetdefaults [mic coprocessor list]

In the event of a failed or problematic configuration process, the best remedy may be to start again. The micctrl ––resetdefaults command deletes the configuration files and executes the same process as the ––initdefaults option.

Since ––initdefaults only affects the files known to the configuration, it does not delete any files the system administrator has added to a coprocessor’s file system.

Cleaning configuration parameters

Command syntax:

micctrl ––cleanconfig [mic coprocessor list]

Since Intel MPSS configuration commands will update the configuration when parameters change, it may not be possible to return to a previous version of the Intel MPSS software. Indeed, removing the whole coprocessor configuration may be required.

The ––cleanconfig option not only removes a coprocessor’s configuration files, but also removes all files in the MicDir parameter directory along with the other values specified by RootDevice.

Helper functions for configuration parameters

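This section discusses command options for changing user authentication, adding and removing users and groups, setting the root device, managing NFS mounts, and specifying the host SSH keys.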

Change the UserAuthentication configuration parameter

Command syntax:

micctrl ––configuser=none [-ids] [mic coprocessor list]

micctrl ––configuser=local [––low=<low uid>] [––high=<high uid>] [-ids] [mic coprocessor list]

The ––configuser option provides an easy method for changing the UserAuthentication configuration parameter. It performs the same process as ––resetconfig for this single parameter.

When specifying the local mode, low and high user ID values may optionally be supplied. The defaults are 500 for the low user ID and 65000 for the high user ID. Although any 32-bit user ID may be entered, a low value below 500 is not recommended: most Linux releases start allocating regular user IDs at 500, and migrating the user IDs of system-level accounts is not recommended.

The optional -i, -d, and -s parameters are mutually exclusive; trying to use more than one will result in an error.

The -d option indicates to remove all the current users in the coprocessor’s file system before resetting the user authentication mode. This is the default for the None value.

The -s option indicates to save, or not delete, all the current users before changing the user authentication mode. This is the default for the Local value.

The -i option prompts the user to delete or save each current user before resetting the user authentication mode.

It is legal to specify changing the UserAuthentication parameter to its current value. Doing so together with the -i option gives the system administrator the chance to clean up the current user list.

Note

Every user to be populated on the coprocessor file system should set their secure shell configuration files before running this command.

Adding users to the coprocessor file system

Command syntax:

micctrl ––useradd=<name> ––uid=<uid> ––gid=<gid> [––home=<dir>] [––comment=<string>] [––app=<exec>] [––sshkeys=<keydir>] [mic coprocessor list]

The ––useradd option adds the specified user name to the /etc/passwd and /etc/shadow files on the coprocessor file system. The system administrator must specify the correct user and group IDs for the user.

Default values are supplied for the ––home, ––comment, ––app, and ––sshkeys arguments, and can be overridden. If a home directory for the user is not specified, one will be created in /home/<name>. If a comment string is not specified, the user name will be placed in the comment field. The default application to execute is /bin/sh. If a directory for the user’s secure shell key files is not provided, the ––useradd option will attempt to find them in /home/<name>/.ssh. In addition, a default .profile file will be added for the user.

Removing users from the coprocessor file system

Command syntax:

micctrl ––userdel=<name> [mic coprocessor list]

The ––userdel option removes the specified user from the coprocessor’s /etc/passwd and /etc/shadow files. It also removes the directory stored in the home field of the /etc/passwd file.

Adding groups to the coprocessor file system

Command syntax:

micctrl ––groupadd=<name> ––gid=<gid> [mic coprocessor list]

The ––groupadd option adds the specified group name and ID to the coprocessor’s /etc/group file.

Removing groups from the coprocessor file system

Command syntax:

micctrl ––groupdel=<name> [mic coprocessor list]

The ––groupdel option removes the specified group name entry from the coprocessor’s /etc/group file.

Setting the root device

Command syntax:

micctrl ––rootdev=RamFS ––target=<location> [mic coprocessor list]

micctrl ––rootdev=StaticRamFS ––target=<location> [mic coprocessor list]

micctrl ––rootdev=NFS ––target=<NFS Share> [-c] [-d] [mic coprocessor list]

micctrl ––rootdev=SplitNFS ––target=<NFS Share> ––usr=</usr NFS share> [-c] [-d] [mic coprocessor list]

micctrl ––rootdev=InitRD [mic coprocessor list]

The ––rootdev option changes the configured RootDevice parameter. For the RamFS and StaticRamFS types, the target argument is the name of the compressed cpio image to be used for the coprocessor file system. Setting rootdev to NFS optionally uses the target parameter to specify a valid NFS share on the host containing the root file system. If target is not specified, the default name of /opt/intel/mic/filesystem/<micN>.export will be used. If the -c parameter is also specified, the micctrl utility will create the root directory using the same information required for building the compressed cpio download image. The created directory must be in the NFS exports list on the host.

Setting rootdev to SplitNFS is the same as the NFS option except the /usr files are mounted as a separate NFS share. On a multicard system the coprocessors will typically share the same /usr NFS export. If the usr parameter is not specified then the default value of /opt/intel/mic/filesystem/usr.export will be used.

Adding an NFS mount

Command syntax:

micctrl ––addnfs=<NFS export> ––dir=<mount dir> [––server=<server>] [mic coprocessor list]

The ––addnfs option adds an NFS mount entry to the coprocessor’s /etc/fstab file. It specifies the NFS export and the mount directory. If the optional server argument is not specified, it places the IP address of the host in the server field.

Removing an NFS mount

Command syntax:

micctrl ––remnfs=<mount dir> [mic coprocessor list]

The ––remnfs option searches the coprocessor’s /etc/fstab file for the specified mount point and removes it from the file.

Specifying the host secure shell keys

Command syntax:

micctrl ––hostkeys=<keydir> [mic coprocessor list]

The ––hostkeys option removes the host keys randomly generated by the ––initdefaults command and replaces them with the files from the specified directory. These files are considered to be highly persistent and should stay resident unless the ––resetdefaults or ––cleanconfig option is performed.

Other file system helper functions

Updating the compressed CPIO image

Command syntax:

micctrl ––updateramfs [mic coprocessor list]

The StaticRamFS root file system image is only changed when the system administrator requests it. In many cluster systems this image will be built externally and put in place. The ––updateramfs option updates the image from the same parameters used by the RamFS specification. The new image will be used the next time the coprocessor boots.

Adding software

No installation is static. Additional software eventually needs to be added. The system administrator must therefore add files to the downloaded root file system to meet user needs.

The File System Creation Process

The root file system for the coprocessor is built from the configuration parameters BaseDir, CommonDir, MicDir and Overlay. It is accomplished by using the filelist argument to each of the parameters as a list of the files to be placed into the file system image. Each filelist is processed in the order of configuration parameters stated. Here are the filelist directives:

dir <name> <perms> <uid> <gid>

file <name> <source> <perms> <uid> <gid>

slink <name> <to> <perms> <uid> <gid>

nod <name> <perms> <uid> <gid> <type> <major> <minor>

pipe <name> <perms> <uid> <gid>

sock <name> <perms> <uid> <gid>

Each filelist entry defines one of six types of files available on a Linux file system.
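To make the directive syntax concrete, here is a minimal sketch of a parser for one filelist line. It is not the parser that mpssd or micctrl uses; the names FIELDS and parse_filelist_line are ours, and we assume whitespace-separated fields, octal permissions, and that lines beginning with # are comments.

```python
import shlex

# Field names for each directive, in the order given by the syntax lines.
FIELDS = {
    "dir": ("name", "perms", "uid", "gid"),
    "file": ("name", "source", "perms", "uid", "gid"),
    "slink": ("name", "to", "perms", "uid", "gid"),
    "nod": ("name", "perms", "uid", "gid", "type", "major", "minor"),
    "pipe": ("name", "perms", "uid", "gid"),
    "sock": ("name", "perms", "uid", "gid"),
}


def parse_filelist_line(line):
    """Parse one filelist line into a dict; return None for blanks and comments."""
    tokens = shlex.split(line, comments=True)
    if not tokens:
        return None
    kind, args = tokens[0], tokens[1:]
    if kind not in FIELDS:
        raise ValueError(f"unknown directive: {kind}")
    names = FIELDS[kind]
    if len(args) != len(names):
        raise ValueError(f"{kind} expects {len(names)} arguments, got {len(args)}")
    entry = dict(zip(names, args), kind=kind)
    entry["perms"] = int(entry["perms"], 8)  # permissions are octal
    for key in ("uid", "gid", "major", "minor"):
        if key in entry:
            entry[key] = int(entry[key])
    return entry
```

Rejecting a line with the wrong number of fields catches the common mistake of two entries fused onto one line.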

The dir filelist directive

The dir directive specifies a directory that must be present on the file system. It requires the information to set the permissions, user ID, and group ID of the directory. A typical entry is:

dir /tmp 0777 0 0

The example defines the directory /tmp to be owned by user root and group root with global permissions for everybody.

The file filelist directive

The file directive creates a file called name in the file system image from the file source on the host, with the specified permissions, user ID, and group ID. The source file is located by prepending the directory from the associated configuration parameter to the source string. For example, the configuration parameter MicDir may be:

MicDir /opt/intel/mic/filesystem/mic0 /opt/intel/mic/filesystem/mic0.filelist

The entry to copy the /etc/passwd file to the coprocessor’s file system image is:

file /etc/passwd etc/passwd 644 0 0

This specifies using the file /opt/intel/mic/filesystem/mic0/etc/passwd for the coprocessor’s /etc/passwd file. It will be owned by user and group root with global read permissions and root modification permission.
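The path and permission handling in this example can be checked with a few lines of Python. This is a sketch of the rule as described, not the tool's own code; resolve_source and describe_perms are names we chose. Note that posixpath.join discards the directory if the source string is absolute, so the sketch assumes relative source strings as in the example.

```python
import posixpath
import stat


def resolve_source(param_dir, source):
    """Prepend the parameter's directory to the source string of a file entry."""
    return posixpath.join(param_dir, source)


def describe_perms(perms):
    """Render octal permission bits the way `ls -l` shows them for a regular file."""
    return stat.filemode(stat.S_IFREG | int(perms, 8))


print(resolve_source("/opt/intel/mic/filesystem/mic0", "etc/passwd"))
# /opt/intel/mic/filesystem/mic0/etc/passwd
print(describe_perms("644"))
# -rw-r--r--
```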

The slink filelist directive

The slink directive specifies creation of a symbolic link on the file system. The link is created at name and points to the path given by to. It is created with the specified user and group IDs and permissions. A typical use of symbolic links is found in the Linux OS startup scripts. In the filelist associated with the MicDir parameter you will find:

slink /etc/rc3.d/S80sshd ../init.d/sshd 0755 0 0

This directs the creation of a symbolic link on the coprocessor’s file system, so that accessing /etc/rc3.d/S80sshd reaches the /etc/init.d/sshd file.

The nod filelist directive

The nod directive creates a device node on the coprocessor’s file system. It will be located at name and will be of type. Type must be either the character b for block device or c for character device. The arguments major and minor must be integer values defining the correct values of the node. The node will be created with the specified user and group IDs and permissions.

Most devices on Linux today are created dynamically by the device driver. However, some legacy devices still require a hard-coded entry. For example the filelist for BaseDir includes the following entry, which specifies the creation of a device node for the console:

nod /dev/console 0600 0 0 c 5 1

The pipe filelist directive

The pipe directive creates a device file of type pipe under the specified name. It is created with the specified user and group IDs and associated permissions.

The sock filelist directive

The sock directive creates a device file of type socket under the specified name. It is created with the specified user and group IDs and associated permissions.

Creating the download image file

The download image file for the RamFS root device type is created by processing the configuration directives BaseDir, CommonDir, MicDir and any Overlays in that order. As the configuration directives are processed, a tree of file names and their information is created.

When the tree is completely processed, mpssd or micctrl ––updateramfs creates a cpio entry for each file in the tree and appends it to the file named by the RootDevice directive. When processing is complete, it compresses the file.
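The ordered processing of BaseDir, CommonDir, MicDir, and Overlay entries can be pictured as building one name-keyed tree. The sketch below is only a model: build_tree is our name, and the rule that a later entry replaces an earlier one with the same name is our assumption, not something stated above.

```python
def build_tree(layers):
    """Merge filelist entries from successive layers into one name-keyed tree.

    layers: iterable of (label, entries); each entry is a dict with a "name" key.
    """
    tree = {}
    for label, entries in layers:
        for entry in entries:
            # Assumption: a later layer replaces an earlier entry of the same name.
            tree[entry["name"]] = dict(entry, origin=label)
    return tree
```

Recording the originating layer in each entry makes it easy to see which parameter supplied a given file in the final image.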

Adding files to the root file system

Adding a file to the root file system can be done in two ways. The system administrator can add an entry to an existing filelist, indicating the location of the file. Alternatively, the system administrator can add a new Overlay configuration parameter with location and descriptor file arguments that describe the files to be added.

Adding files by copying

When adding a file to an existing filelist, the first decision is whether the file should be accessible by all the cards or only a particular one. If all cards need access, copy the file to a location under the directory specified by the location argument to the CommonDir configuration parameter, and amend its filelist. Otherwise, copy the file to a location under the directory specified by the location argument to the MicDir configuration parameter, and amend its filelist.

If a directory had to be created for the added file, do not forget to insert the appropriate dir entry prior to the new file entry.
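A quick way to catch a forgotten dir entry is to scan a filelist in order and flag any name whose parent directory has not yet been declared. The following is a sketch with names of our own choosing (missing_parent_dirs, known_dirs). It looks at a single filelist only, so directories declared by an earlier filelist must be passed in through known_dirs.

```python
import posixpath


def missing_parent_dirs(entries, known_dirs=("/",)):
    """Return names whose parent directory has no earlier dir entry.

    entries: iterable of (kind, name) pairs in filelist order.
    """
    seen_dirs = set(known_dirs)
    missing = []
    for kind, name in entries:
        parent = posixpath.dirname(name) or "/"
        if parent not in seen_dirs:
            missing.append(name)
        if kind == "dir":
            seen_dirs.add(name)
    return missing
```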

Adding an overlay

The process for adding an Overlay set is similar to the description in the previous section. The file must be placed in the correct location specified under Overlay and added to the filelist file specified. The power of the Overlay is that it may be called from a new configuration file in /etc/sysconfig/mic/conf.d.

Example: Adding a new global file set

The Intel® Coprocessor Offload Infrastructure (Intel® COI) as part of Intel MPSS (see Chapter 9) is configured as add-in software and is a good example of how to add a new set of software to the coprocessor file systems. The Intel COI configuration file is installed at /etc/sysconfig/mic/conf.d/coi.conf. It contains the following:

# COI download files

Overlay /opt/intel/mic/coi /opt/intel/mic/coi/config/coi.filelist

From this file, it is clear the files for the coprocessor file system were installed into the /opt/intel/mic/coi directory. The installation also provides a coi.filelist file describing the files to include in the coprocessor file systems. It looks like the following:

dir /bin 755 0 0

file /bin/coi_daemon device-linux-release/bin/coi_daemon 755 0 0

dir /etc 755 0 0

dir /etc/init.d 755 0 0

file /etc/init.d/coi config/coi 775 0 0

dir /etc/rc3.d 755 0 0

slink /etc/rc3.d/S95coi ../init.d/coi 777 0 0

dir /lib64 755 0 0

file /lib64/libcoi_device.so device-linux-release/lib/libcoi_device.so 755 0 0

slink /lib64/libcoi_device.so.0 libcoi_device.so 777 0 0

From this we can see:

• COI requires that the /bin directory exists

• The COI daemon is at /bin/coi_daemon

– It is found on the host file system at /opt/intel/mic/coi/device-linux-release/bin/coi_daemon

– It requires read/write/execute permissions for root

– It requires read/execute permissions for other

– It runs under the root (0) user and group IDs

• COI requires the /etc and /etc/init.d directories

• The COI startup script should be put at /etc/init.d/coi

– It can be found on the host at /opt/intel/mic/coi/config/coi

– Set permissions, user ID and group ID as described.

– For the coprocessor to start COI, it needs a reference in /etc/rc3.d, so make sure it exists.

• The startup will find the script at /etc/rc3.d/S95coi, so create a symbolic link to the actual startup script.

• COI requires /lib64 directory to exist.

• Install libcoi_device.so library.

• Create the required symbolic link to it.

Since Intel COI installs a new daemon on the file system, it needs a startup script for it. The COI startup script appears in the following code example.

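#!/bin/sh
# Startup script for the COI daemon.
coiexec=/bin/coi_daemon

case "$1" in
start)
    # Give up if the daemon binary is missing.
    if [ ! -f "$coiexec" ]; then
        exit 1
    fi
    # Launch the daemon in the background.
    "$coiexec" &
    ;;
esac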

Coprocessor Linux boot process

As previously mentioned, the host driver and mpssd daemon manage the process of booting the Linux kernel on the coprocessor. Figure 10.1 shows the sequence of steps that are performed during the boot process.

Figure 10.1 Boot Process for the Intel® Manycore Platform Software Stack (Intel® MPSS).

Booting the coprocessor

The key steps in the process flow in Figure 10.1 performed during the Intel MPSS boot process on the coprocessor are described in this section.

Set the kernel command line

On most Intel Architecture based systems, loading and executing the Linux kernel image is controlled by the grub boot loader. In the grub configuration file, each possible kernel definition contains a number of parameters to be passed to Linux through its kernel command line. In the Intel MPSS boot system, this is done by the mpssd daemon parsing its configuration files. The kernel command line is created based on values in the configuration files and placed in /sys/class/mic/mic<id>/cmdline for the driver to retrieve it.

Instruct the driver to boot the coprocessor

The mpssd daemon requests the coprocessor to start executing the Linux image by writing a boot string to the /sys/class/mic/mic<id>/state file. This file is a link into the coprocessor driver through a Linux sysfs (see Chapter 9) entry. The format of the request must be:

boot:linux:<Linux image file name>

The options reset and shutdown may also be written to the state entry. The second part of the boot argument indicates that a Linux image is to be booted. It may also be set to elf to indicate booting a standard ELF format file. This option is beyond our scope; please see standard Linux documentation for more information.

When the driver receives the boot request, it first checks that the coprocessor is in the Ready state. If the coprocessor is not ready to boot, the driver returns an error through the write call to the sysfs entry and does not attempt to boot the coprocessor. Otherwise it sets the state of the coprocessor to Booting.

The driver then saves the image file name for later retrieval through the /sys/class/mic/mic<id>/image sysfs entry. It also sets the mode to indicate it is booting a Linux image.

The driver then copies the kernel command line set by the mpssd daemon, along with a number of addresses in host memory required by various drivers in the Linux image. It then copies the requested Linux image file into the coprocessor’s memory.

The last step is to write to the coprocessor register instructing it to start executing the injected image.
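The bookkeeping the driver performs on a boot request can be summarized with a toy model. This is not driver code: the class name CoprocessorModel, its attributes, and the lowercase state strings are ours, and the copying of the command line and image into coprocessor memory is left out entirely.

```python
class CoprocessorModel:
    """Toy model of the boot-request handling described above; not driver code."""

    def __init__(self):
        self.state = "ready"
        self.mode = None
        self.image = None

    def write_state(self, request):
        """Handle a string written to the state entry."""
        verb, _, rest = request.partition(":")
        if verb != "boot":
            raise NotImplementedError("only boot requests are modeled here")
        mode, _, image = rest.partition(":")
        if mode not in ("linux", "elf") or not image:
            raise ValueError(f"malformed boot request: {request!r}")
        if self.state != "ready":
            # The real driver reports this through the failed write call.
            raise OSError("coprocessor is not in the Ready state")
        self.state = "booting"
        self.image = image  # retrievable later through the image sysfs entry
        self.mode = mode
```

A second boot request while the model is still in the booting state raises an error, mirroring the Ready-state check.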

Execute the Linux kernel image

Executing the Linux kernel code functions as it does on any Intel Architecture based machine. It initializes hardware, starts kernel services, and sets all the CPU cores to the Online state. When the kernel is ready, it initializes its attached initial RAM disk image and starts executing the init script in the image.

The initial RAM disk contains the loadable modules required for the real root file system. Many of the arguments passed in the kernel command line are addresses required for the modules to access host memory. The init script parses the kernel command line for needed information and loads the driver modules.

The last step is for the init script to check the root parameter in the kernel command line for the type of device containing the root file system, and take the appropriate actions.
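Picking a parameter such as root out of a kernel command line is a matter of splitting it into key=value tokens. The real init script is a shell script; the sketch below only illustrates the parsing idea, and the example command line, including the form of the root and ip values, is hypothetical.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a dict of key=value parameters.

    Bare flags (tokens without "=") map to True.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else True
    return params


# Hypothetical command line; the real parameter names and values may differ.
example = "quiet root=nfs:/some/export ip=192.0.2.10"
print(parse_cmdline(example)["root"])  # nfs:/some/export
```

On a running Linux system the command line the kernel was booted with can be read from /proc/cmdline.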

Root is the initial RAM disk

Setting the root to be the initial RAM disk is for debug purposes only. The initial RAM disk contains only a minimal set of tools and utilities.

Root is a RAM disk image

If the root is set to be a RAM file system, the init script must first download the file system information from the host. It makes a request to the mpssd daemon through a SCIF (see Chapter 9) connection and receives a compressed cpio format file.

After the file has been downloaded, the init script creates a tmpfs (Linux RAM disk file system type) in coprocessor memory and extracts the file system information into it. This image must contain everything needed to start a fully functional Linux system.

The RAM disk image is activated as the root device by calling the Linux switch_root utility. This special utility instructs the Linux kernel to remount the root device on the tmpfs mount directory, release all file system memory references to the old initial RAM disk, and start executing the new /sbin/init function.

The /sbin/init function performs the normal Linux user level initialization. All the information required must have already been in the compressed cpio file.

Root is an NFS export

If an NFS mount is indicated to supply the root device, the init script will initialize the mic0 virtual network interface to the IP address supplied on the kernel command line and mount the NFS export from the host.

As in the RAM disk image, the NFS mount is activated as the root device by calling the Linux switch_root utility. This special utility instructs the Linux kernel to remount the root device on the NFS mount directory, release all file system memory references to the old initial RAM disk, and start executing the new /sbin/init function.

The /sbin/init function performs the normal Linux user level initialization. All the information required must have already been in the NFS export.

Notify the host that the coprocessor system is ready

The last step of any of the three initializations is to notify the host that the coprocessor is ready for access. The coprocessor does this by writing to its /sys/class/micnotify/notify/host_notified entry. This causes an interrupt into the host driver, which updates the coprocessor’s state to Online.

Coprocessors in a Linux cluster

So far in this chapter, and much of this book, we have focused on explaining how a coprocessor is programmed and operates in a single host processor platform, in other words, a single node in a cluster. This focus has been intentional, as we believe it is essential for success on a cluster of coprocessor-enabled platforms to first understand how to parallelize and optimize code at the node level in order to maximize use across a cluster.

Using Intel Xeon Phi coprocessors in a cluster environment is, of course, one of the primary usages. The foundation to enable it is provided by the coprocessor’s Linux operating system environment and its configurability, which we have described thus far in this chapter. However, configuring, optimizing, and running code in clusters in general, and in clusters with Intel Xeon Phi coprocessors in particular, is a significant topic on its own; covering it properly would warrant multiple chapters or even its own book. From an application development point of view, MPI is the prevalent cluster-wide scaling tool, and the software architecture for MPI and its uses on the coprocessor described in Chapters 9, 11, and 12 should be a solid basis to start cluster-wide application development.

Here, we will touch upon some key information and resources to consider when using Intel Xeon Phi coprocessors in a standard cluster configuration that Intel calls Intel® Cluster Ready.

Intel® Cluster Ready

This section is particular to systems that are certified Intel Cluster Ready and therefore have access to the tools to verify and maintain that compliance. Compliance of machines and software to a common specification greatly enhances portability of software between clusters.

Intel created the Intel Cluster Ready program in collaboration with High Performance Computing (HPC) industry hardware and software vendors to make it easier to set up, to operate, and to maintain HPC clusters including those with Intel Xeon Phi coprocessors. At the heart of the program is a standard and tools to check for adherence to the standard. The compliance checks are a great checklist in setting up a system that can be used to find many issues that would otherwise take time to discover or debug.

The Intel® Cluster Checker, a key element of the Intel Cluster Ready validation process, is a software tool that helps to verify that cluster components are working well when the cluster is being deployed and helps check compliance status throughout the life of the machine.

How Intel® Cluster Ready works

Intel Cluster Ready works in the following manner:

1. Initial machine setup. Platform providers and system integrators use the Intel Cluster Ready architecture and specification to develop interoperable clusters that are easy to deploy and manage. They ensure that all cluster components work together, so an end user can buy an Intel Cluster Ready cluster with the confidence that it will work right from the start. Intel Cluster Ready clusters must pass a certification process in order to verify they are interoperable with registered Intel Cluster Ready software.

2. Software compliance for portability. Software developers test their applications running representative workloads on certified Intel Cluster Ready systems to ensure that they run as expected. Once proper execution is verified, the application is registered as capable of running on any system compliant with the Intel Cluster Ready specification. An end user can choose the right cluster to run the software that has been registered with Intel Cluster Ready and experience great application performance, right from the start.

3. Ongoing machine/environment compliance. Intel Cluster Checker, which is provided with all Intel Cluster Ready systems, eases the headaches of traditional cluster maintenance and gives the end user the power of HPC clusters with less need for specialized experience. Intel Cluster Checker enhances system reliability, reduces total cost of ownership over the cluster’s lifetime, and ensures continuing performance.

How Intel® Cluster Checker works

Before a certified Intel Cluster Ready system arrives, the platform provider has used Intel Cluster Checker to verify that the cluster fully complies with the Intel Cluster Ready specification. Application software developers have defined representative workloads and used Intel Cluster Checker to confirm that their applications run successfully on an Intel Cluster Ready system.

Intel Cluster Checker can be run as part of regular cluster maintenance, helping to keep components working together over the cluster’s lifetime or spotting potential issues before they can affect productivity. If problems arise, Intel Cluster Checker can be used to identify the problems quickly and obtain detailed diagnostic information. The time required to find a solution can be reduced from days to hours or even minutes.

Intel Cluster Checker is a tool that acts as a proxy for application binaries. With more than 100 checks aggregated in parallel test modules, Intel Cluster Checker can perform a wide array of cluster evaluations. This feature set is also extensible: additional checks can be integrated and existing checks can be customized. The cluster is examined at both the node and cluster level, making sure that all components work together and deliver optimal performance. Intel Cluster Checker assesses firmware, kernel, storage, and network settings, and conducts high-level tests of node and network performance using the Intel® MPI Benchmarks, STREAM, the HPC Challenge benchmark (HPCC), and other benchmarks.

Intel® Cluster Checker support for coprocessors

One of the key benefits of Intel Cluster Checker is that it acts as a real-life user executing commands. The end user does not have to be an expert in using HPC clusters in order to be able to fully exploit the capabilities of Intel Cluster Checker. Intel Cluster Checker supports Intel Xeon Phi coprocessors and incorporates a variety of checks in order to ensure the proper functioning of the coprocessor.

For example, to ensure that the coprocessor is up and running, there are three recommended steps. Two of them run the Intel MPSS supporting tools micinfo and miccheck, described in Chapter 9. The third runs a benchmark on the host system that uses offload to speed up computation, as described in Chapter 7.

The built-in test module micinfo checks whether the coprocessor information is correct and uniform across nodes. Any error, undefined value, or abnormal difference among coprocessors is reported. For example, the test module ensures that voltage, memory size, and temperature are nonzero and that they differ among coprocessors by less than 100,000 µV, 128 MB, and 20°C, respectively. This default behavior can be altered with custom configuration if desired.
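The uniformity rule described above can be illustrated with a minimal Python sketch. This is not how Intel Cluster Checker is implemented; the field names, the dictionary layout, and the function name check_uniformity are all hypothetical. Only the three default tolerances come from the text.

```python
from typing import Dict, List

# Default tolerances quoted in the text: 100,000 µV, 128 MB, 20 °C.
# The field names are hypothetical and chosen only for this sketch.
DEFAULT_TOLERANCES = {
    "voltage_uV": 100_000,
    "memory_MB": 128,
    "temperature_C": 20,
}


def check_uniformity(readings: Dict[str, Dict[str, float]],
                     tolerances: Dict[str, float] = DEFAULT_TOLERANCES) -> List[str]:
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    for field, limit in tolerances.items():
        valid = {}
        for node, fields in readings.items():
            value = fields.get(field)
            if value is None:
                problems.append(f"{node}: {field} is undefined")
            elif value == 0:
                problems.append(f"{node}: {field} is zero")
            else:
                valid[node] = value
        if valid:
            # Only differences smaller than the limit are allowed.
            spread = max(valid.values()) - min(valid.values())
            if spread >= limit:
                problems.append(f"{field}: spread {spread} is not below {limit}")
    return problems


# Made-up readings for two nodes, purely for illustration.
readings = {
    "computej-1": {"voltage_uV": 1_050_000, "memory_MB": 7936, "temperature_C": 52},
    "computej-2": {"voltage_uV": 1_040_000, "memory_MB": 7936, "temperature_C": 81},
}
for problem in check_uniformity(readings):
    print(problem)
```

With these made-up readings, only the temperature is flagged, because the two coprocessors differ by 29°C. Passing a different tolerances dictionary plays the role of the custom configuration mentioned above.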

The test module miccheck checks the sanity of the coprocessors by running the miccheck diagnostic tool on every node in the cluster in parallel. Only the failing checks are reported. Since the Intel Math Kernel Library has very good support for direct offload to an Intel Xeon Phi coprocessor for certain routines, like the GEMM routines, the test module dgemm of Intel Cluster Checker can be used directly to test the offload functionality and performance as well.

To run a benchmark that offloads work to a coprocessor, two related environment variables need to be set: MKL_MIC_ENABLE to force offload and OFFLOAD_REPORT to enable reporting. Since the offloading of GEMM routines works best with symmetric values (here, equal values for k, m, and n), the parameters to the test dgemm in the configuration file (default config.xml) should be adjusted as well.

<cluster>
   <include_module>micinfo</include_module>
   <include_module>miccheck</include_module>
   <test>
    <dgemm>
     <k>6000</k>
     <m>6000</m>
     <n>6000</n>
    </dgemm>
   </test>
</cluster>

Then run the following command to explicitly test the Intel Xeon Phi coprocessor:

$ source /opt/intel/clck/<version>/bin/clckvars.sh

$ OFFLOAD_REPORT=2 MKL_MIC_ENABLE=1 clck -I micinfo -I miccheck -I dgemm
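For scripted maintenance runs, the same invocation can be wrapped in a few lines of Python. The following is only a sketch: the function names are hypothetical, and it assumes clckvars.sh has already been sourced in the calling shell so that clck is on the PATH. The environment variables and the module list are the ones used in the command above.

```python
import os
import shutil
import subprocess


def build_clck_run(modules=("micinfo", "miccheck", "dgemm")):
    """Build the argument list and environment for a coprocessor-focused run."""
    env = dict(os.environ)
    env["MKL_MIC_ENABLE"] = "1"   # force offload of supported MKL routines
    env["OFFLOAD_REPORT"] = "2"   # enable offload reporting
    argv = ["clck"]
    for module in modules:
        argv += ["-I", module]
    return argv, env


def run_coprocessor_checks():
    """Run clck if it is available; return its exit code, or None if not found."""
    argv, env = build_clck_run()
    if shutil.which(argv[0]) is None:
        # Assumption: clck is only on PATH after sourcing clckvars.sh.
        return None
    return subprocess.run(argv, env=env).returncode
```

Calling run_coprocessor_checks() from a scheduled job is one way to make the check part of regular cluster maintenance.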

Here is an example of some lines of corresponding diagnostic and verbose output when running Intel Cluster Checker with the option -certification, which executes a larger set of test modules, including micinfo and miccheck:

Intel(R) Cluster Checker, Version 2.0
Commandline: '/opt/intel/clck/2.0.013/bin/clck -certification'
User: 'cmsupport'
Date: 'Thu Nov 15 20:21:45 2012'
Configuration File: '/etc/intel/clck/config.xml'
...
Nodefile: '/etc/intel/clck/nodelist'
Checking 5 nodes:
    jerry, computej-[1-4]
Modules to be executed:
...
    'micinfo'
...
    'miccheck'
    'mpi_internode'
    'hpl'
Test            Result
---------------------------------------------------------------
...
Check Intel Xeon Phi wellness and uniformity, (micinfo).....................................Succeeded
   subtest 'micinfo uniformity' passed
    node: computej-[1-4]: 82 fields matched
   subtest 'micinfo valid output' passed
    node: computej-[1-4]: passed
Mounted filesystems check, ...
Intel Xeon Phi diagnostic test, (miccheck).............................................................................Succeeded
   subtest 'MIC 0 Test 1 Find the MIC' passed
    node: computej-[1-4]: OK
   ...
    subtest 'Test 2 Ensure host driver is loaded' passed
     node: computej-[1-4]: OK
    subtest 'Test 3 Ensure driver matches manifest' passed
     node: computej-[1-4]: OK
    subtest 'Test 4 Detect all MICs' passed
     node: computej-[1-4]: OK
...
20 Modules passed:
   clock dgemm disk_bandwidth ethernet hardware hpl libraries miccheck micinfo mount mpi_internode mpi_local packages ping process remote_login shells storage stream tools
0 Modules failed
Check has SUCCEEDED
Log saved in: /var/log/intel/clck/clck-20121115-202144.log
Total elapsed time: 0:03:12
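When such reports are collected regularly, it can be handy to extract the summary automatically. The following Python sketch parses only the summary lines shown in the excerpt above. It assumes the log keeps exactly that layout, which may differ in other versions of the tool, and the function name summarize_clck_output is hypothetical.

```python
import re


def summarize_clck_output(text: str) -> dict:
    """Extract passed modules, failed-module count, and overall verdict."""
    summary = {"passed": [], "failed_count": None, "succeeded": None}
    # Drop blank lines so that a heading is directly followed by its content.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    for i, line in enumerate(lines):
        if re.fullmatch(r"\d+ Modules passed:", line) and i + 1 < len(lines):
            summary["passed"] = lines[i + 1].split()
        match = re.fullmatch(r"(\d+) Modules failed:?", line)
        if match:
            summary["failed_count"] = int(match.group(1))
        if line.startswith("Check has "):
            # Anything other than SUCCEEDED is treated as a failure.
            summary["succeeded"] = (line == "Check has SUCCEEDED")
    return summary


# Summary lines copied from the excerpt above.
SAMPLE = """
20 Modules passed:
   clock dgemm disk_bandwidth ethernet hardware hpl libraries miccheck micinfo mount mpi_internode mpi_local packages ping process remote_login shells storage stream tools
0 Modules failed
Check has SUCCEEDED
"""

result = summarize_clck_output(SAMPLE)
print(len(result["passed"]), result["failed_count"], result["succeeded"])
```

A monitoring script could raise an alert whenever succeeded is not True or failed_count is nonzero.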

Clusters are a challenge to build or manage, but an Intel Cluster Ready certified platform with Intel Cluster Checker validation tools helps make the process more predictable and reliable.

Summary

A key goal in making the Intel Xeon Phi coprocessor a readily usable platform for high performance parallel programs is to let developers and operational personnel focus on delivering applications and solutions, not on learning new and different ways to interact with a new product. Adapting a standard Linux operating system to the coprocessor is at the heart of making this goal a reality.

In this chapter, we provided insight into how this goal was achieved. We also described the tools and mechanisms that allow the coprocessor Linux operating system to be configured and booted so that it operates as a standalone Linux computer in almost any network configuration, with the desired file system topology and the software required for target applications.

We also discussed some of the tools and considerations in using the coprocessor in cluster environments, including the Intel Cluster Ready certification program and the Intel Cluster Checker tool, which helps confirm that a cluster is operating properly.

In this chapter, along with Chapters 8 and 9, we explained various architectural and operational aspects of the Intel Xeon Phi coprocessor to give you important foundational elements that enable high performance applications to run on the coprocessor. In the next three chapters, we switch focus to some of the libraries and development tools that can help you develop high performance parallel applications more easily.

For more information

Here are some additional reading materials we recommend related to this chapter.

• Intel Manycore Platform Software (Intel MPSS) documentation and downloads, http://intel.com/software/mic

• Linux information, http://linux.org

• Linux kernel source distributions, http://kernel.org

• Information on Intel Clusters, the Intel Cluster Ready Program, and related tools, http://www.intel.com/go/cluster
