Introduction to InfiniBand on System z
In this chapter, we introduce the InfiniBand architecture and technology and discuss the advantages that InfiniBand brings to a Parallel Sysplex environment compared to earlier coupling technologies. InfiniBand is a powerful and flexible interconnect technology designed to provide connectivity for large server infrastructures, and it plays a vital role in the performance, availability, and cost-effectiveness of your Parallel Sysplex.
In this chapter, we discuss the following topics:
InfiniBand architecture
IBM System z InfiniBand implementation
Advantages of InfiniBand
The importance of an efficient coupling infrastructure
Terminology
 
Note: This document reflects the enhancements that were announced for IBM zEnterprise 196 on July 12, 2011 and delivered with Driver Level 93.
Any z196 that is using Driver 86 must be migrated to Driver 93 before an upgrade to add HCA3 adapters can be applied. Therefore, this document reflects the rules and capabilities for z196 CPCs at Driver 93¹ or later.

1 For more information about Driver levels, see Appendix B, “Processor driver levels” on page 245.
1.1 Objective of this book
This book is a significant update to a previous introductory edition. Since that edition was published several years ago, IBM has made many announcements related to InfiniBand on System z, and we have gained more experience with implementing InfiniBand in large production environments. We provide that information here so that all clients can benefit from the experience of those that have gone before them. Finally, the focus of this book has changed somewhat, with less emphasis on InfiniBand architecture and more focus on how InfiniBand is used in a System z environment.
1.2 InfiniBand architecture
The use, management, and topology of InfiniBand links is significantly different from traditional coupling links, so a brief explanation of InfiniBand architecture is useful before continuing on to the rest of the book.
InfiniBand background and capabilities
In 1999, two competing I/O standards called Future I/O (developed by Compaq, IBM, and Hewlett-Packard) and Next Generation I/O (developed by Intel, Microsoft, and Sun) merged into a unified I/O standard called InfiniBand. The InfiniBand Trade Association (IBTA) is the organization that maintains the InfiniBand specification. The IBTA is led by a steering committee staffed by members of these corporations, and it is responsible for compliance testing of commercial products; a list of compliant products is available on the IBTA website.
InfiniBand is an industry-standard specification that defines an input and output architecture that can be used to interconnect servers, communications infrastructure equipment, storage, and embedded systems. InfiniBand is a true fabric architecture that leverages switched, point-to-point channels with data transfers up to 120 Gbps, both in chassis backplane applications and through external copper and optical fiber connections.
InfiniBand addresses the challenges that IT infrastructures face. Specifically, InfiniBand can help you in the following ways:
Superior performance
InfiniBand provides superior latency characteristics, with products supporting connections of up to 120 Gbps.
Reduced complexity
InfiniBand allows for the consolidation of multiple I/Os on a single cable or backplane interconnect, which is critical for blade servers, data center computers and storage clusters, and embedded systems.
Highest interconnect efficiency
InfiniBand was developed to provide efficient scalability of multiple systems. InfiniBand provides communication processing functions in hardware, thereby relieving the processor of this task, and it enables full resource utilization of each node added to the cluster.
In addition, InfiniBand incorporates Remote Direct Memory Access (RDMA), which is an optimized data transfer protocol that further enables the server processor to focus on application processing. RDMA contributes to optimal application processing performance in server and storage clustered environments.
Reliable and stable connections
InfiniBand provides reliable end-to-end data connections. This capability is implemented in hardware. In addition, InfiniBand facilitates the deployment of virtualization solutions that allow multiple applications to run on the same interconnect with dedicated application partitions.
1.2.1 Physical layer
The physical layer specifies the way that the bits are put on the wire in the form of symbols, delimiters, and idles. The InfiniBand architecture defines electrical, optical, and mechanical specifications for this technology. The specifications include cables, receptacles, and connectors and how they work together, including how they need to behave in certain situations, such as when a part is hot-swapped.
Physical lane
InfiniBand is a point-to-point interconnect architecture developed for today’s requirements for higher bandwidth and the ability to scale with increasing bandwidth demand. Each link is based on a two-fiber 2.5 Gbps bidirectional connection for an optical (fiber cable) implementation or a four-wire 2.5 Gbps bidirectional connection for an electrical (copper cable) implementation. This 2.5 Gbps connection is called a physical lane.
Each lane supports multiple transport services for reliability and multiple prioritized virtual communication channels. Physical lanes are grouped into widths of one lane (1X), four lanes (4X), eight lanes (8X), or 12 lanes (12X).
InfiniBand currently defines the following data rates at the physical layer and negotiates their use:
Single Data Rate (SDR), delivering 2.5 Gbps per physical lane
Double Data Rate (DDR), delivering 5.0 Gbps per physical lane
Quadruple Data Rate (QDR), delivering 10.0 Gbps per physical lane
Bandwidth negotiation determines the maximum data rate (frequency) that is achievable on the link, based on the capabilities of both ends and the signal integrity of the interconnect. The number of lanes (1X, 4X, 8X, or 12X) is negotiated in the same way, so that the link operates at the maximum width that both ends support.
Combining the bandwidths with the number of lanes gives the link or signaling rates that are shown in Table 1-1.
Table 1-1   Interface width and link ratings

Width   Single Data Rate      Double Data Rate¹      Quadruple Data Rate
1X      2.5 Gbps              5.0 Gbps               10 Gbps (1 GBps)
4X      10.0 Gbps (1 GBps)    20.0 Gbps (2 GBps)     40 Gbps (4 GBps)
8X      20.0 Gbps (2 GBps)    40.0 Gbps (4 GBps)     80 Gbps (8 GBps)
12X     30.0 Gbps (3 GBps)    60.0 Gbps (6 GBps)     120 Gbps (12 GBps)

1 All InfiniBand coupling links on IBM z10™ and later CPCs use Double Data Rate.
 
Important: The quoted link rates are only theoretical. The message architecture, link protocols, CF utilization, and CF MP effects make the effective data rate lower than these values.
Links use 8b/10b encoding (every 10 bits sent carry 8 bits of data), so the useful data transmission rate is four-fifths of the signaling or link rate (the signaling and link rates equal the raw bit rate). Therefore, the 1X single, double, and quadruple data rates carry 2 Gbps, 4 Gbps, or 8 Gbps of useful data, respectively.
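To make the arithmetic concrete, the following Python sketch (our own illustration, not part of the InfiniBand specification) derives the figures in Table 1-1 from the per-lane signaling rate, the lane count, and the 8b/10b encoding overhead:

    # Derive Table 1-1 link rates from signaling rate, width, and 8b/10b encoding.
    LANE_GBPS = {"SDR": 2.5, "DDR": 5.0, "QDR": 10.0}   # signaling rate per lane

    def link_rate_gbps(rate, lanes):
        """Raw signaling (link) rate in Gbps for a given data rate and width."""
        return LANE_GBPS[rate] * lanes

    def effective_gbps(rate, lanes):
        """Useful data rate in Gbps; 8b/10b carries 8 data bits in every 10 bits."""
        return link_rate_gbps(rate, lanes) * 8 / 10

    # Example: a 12X IB-DDR link, as used by HCA2-O and HCA3-O adapters
    print(link_rate_gbps("DDR", 12))       # 60.0 Gbps signaling rate
    print(effective_gbps("DDR", 12) / 8)   # 6.0 GBps of useful data

For example, a 12X IB-DDR link has a 60 Gbps signaling rate and carries about 6 GBps of useful data, which matches the HCA2-O and HCA3-O link rates described later in this chapter.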
In this book we use the following terminology:
Data rate This is the data transfer rate expressed in bytes where one byte equals eight bits.
Signaling rate This is the raw bit rate expressed in bits.
Link rate This is equal to the signaling rate expressed in bits.
We use the following terminology for link ratings. Notice that the terminology is a mix of standard InfiniBand phrases and implementation wording:
12X IB-SDR
This uses 12 lanes for a total link rate of 30 Gbps. It is a point-to-point connection with a maximum length of 150 meters.
12X IB-DDR
This uses 12 lanes for a total link rate of 60 Gbps. It is a point-to-point connection with a maximum length of 150 meters.
1X IB-SDR LR (Long Reach)
This uses one lane for a total link rate of 2.5 Gbps. It supports an unrepeated distance of up to 10 km¹, or up to 175 km² with a qualified DWDM solution.
1X IB-DDR LR (Long Reach)
This uses one lane for a total link rate of 5 Gbps. It supports an unrepeated distance of up to 10 km¹, or up to 175 km² with a qualified DWDM solution.
The link and physical layers are the interface between the packet byte stream of the higher layers and the serial bit stream of the physical media. Physically, you can implement the media as 1, 4, 8, or 12 physical lanes. The packet byte stream is striped across the available physical lanes and encoded using the industry-standard 8b/10b encoding method that is also used by Ethernet, FICON (Fibre Connection), and Fibre Channel.
 
Note: There is no relationship between the number of CHPIDs associated with an InfiniBand port and the number of lanes that will be used. You can potentially assign 16 CHPIDs to a port with only one lane, or assign only one CHPID to a port with 12 lanes, and the signals will be spread over all 12 lanes.
Virtual lanes
InfiniBand allows for multiple independent data streams over the same physical link, which are called virtual lanes (VLs). VLs are separate logical flows with their own buffering. They allow more efficient and speedier communications between devices because no buffer or task can slow down the communication on the physical connection. InfiniBand supports up to 16 virtual lanes (numbered 0 to 15).
 
Note: There is no relationship between virtual lanes and CHPIDs. The fact that you can have up to 16 of each is coincidental.
1.3 IBM System z InfiniBand implementation
As you can see, InfiniBand is an industry architecture. It is supported by many vendors, and each vendor might have its own unique way of implementing or exploiting it. This section describes how InfiniBand is implemented on IBM System z CPCs.
1.3.1 Host channel adapters
Host channel adapters (HCAs) are physical devices in processors and I/O equipment that create and receive packets of information. The host channel adapter is a programmable Direct Memory Access (DMA) engine that is able to initiate local memory operations. The DMA engine offloads costly memory operations from the processor, because it can access system memory directly for reading and writing independently from the central processor. This enables the transfer of data with significantly less CPU overhead. The CPU initiates the transfer and switches to other operations while the transfer is in progress. Eventually, the CPU receives an interrupt after the transfer operation has been completed.
A channel adapter has one or more ports. Each port has its own set of transmit and receive buffers that enable the port to support multiple simultaneous send and receive operations. For example, the host channel adapter ports provide multiple communication interfaces by providing separate send and receive queues for each CHPID. Figure 1-1 shows a schematic view of the host channel adapter.
A host channel adapter provides an interface to a host device and supports “verbs” defined to InfiniBand. Verbs describe the service interface between a host channel adapter and the software that supports it. Verbs allow the device driver and the hardware to work together.
Figure 1-1 Host channel adapter
1.3.2 Processor-specific implementations
 
 
Note: At the time of writing, System z9 CPCs are withdrawn from marketing, meaning that if you have a z9 and it does not already have InfiniBand adapters on it, it is no longer possible to purchase those adapters from IBM. For this reason, this book focuses on IBM System z10® and later CPCs.
System z10 and subsequent CPCs exploit InfiniBand for internal connectivity and CPC-to-CPC communication. System z9 CPCs only use InfiniBand for CPC-to-CPC communication. Fabric components, such as routers and switches, are not supported in these environments, although qualified DWDMs can be used to extend the distance of 1X InfiniBand links.
System z CPCs take advantage of InfiniBand technology in the following ways:
On System z10 and later CPCs, InfiniBand, including the InfiniBand Double Data Rate (IB-DDR) infrastructure, replaces the Self-Timed Interconnect (STI) features that provided CPC-to-I/O cage connectivity on prior System z CPCs.
Parallel Sysplex InfiniBand (PSIFB) 12X links (both IFB and IFB3 mode) support point-to-point connections up to 150 meters (492 feet).
Parallel Sysplex InfiniBand (PSIFB) Long Reach links (also referred to as 1X links) support point-to-point connections up to 10 km unrepeated (up to 20 km with RPQ 8P2340), or up to 175 km when repeated through a Dense Wave Division Multiplexer (DWDM) and normally replace InterSystem Channel (ISC-3). The PSIFB Long Reach feature is not available on System z9.
Server Time Protocol (STP) signals are supported on all types of PSIFB links.
 
Note: InfiniBand is used for both coupling links and internal processor-to-I/O cage (and processor-to-drawer) connections in System z10 and later CPCs.
However, you do not explicitly order the InfiniBand fanouts that are used for processor-to-I/O cage connections; the number of those fanouts that you need will depend on the I/O configuration of the CPC. Because the focus of this book is on the use of InfiniBand links for coupling and STP, we do not go into detail about the use of InfiniBand for internal connections.
Host channel adapter types on System z CPCs
System z CPCs provide a number of host channel adapter types for InfiniBand support:
HCA1-O A host channel adapter that is identified as HCA1-O (Feature Code (FC) 0167) provides an optical InfiniBand connection on System z9³. HCA1-O is used in combination with the 12X IB-SDR link rating to provide a link rate of up to 3 GBps.
HCA2-C A host channel adapter that is identified as HCA2-C (FC 0162) provides a copper InfiniBand connection from a book to I/O cages and drawers on a System z10, zEnterprise 196, or zEnterprise 114.
HCA2-O A host channel adapter that is identified as HCA2-O (FC 0163) provides an optical InfiniBand connection.
HCA2-O supports connection to:
HCA1-O For connection to System z9.
HCA2-O For connection to System z10 or later CPCs.
HCA3-O For connection to zEnterprise 196 and later CPCs.
HCA2-O is used in combination with the 12X IB-DDR link rating to provide a link rate of up to 6 GBps between z10 and later CPCs and up to 3 GBps when connected to a z9 CPC.
HCA2-O LR⁴ A host channel adapter that is identified as HCA2-O LR (FC 0168) provides an optical InfiniBand connection for long reach coupling links for System z10 and later CPCs. HCA2-O LR is used in combination with the 1X IB-DDR link rating to provide a link rate of up to 5 Gbps. It automatically scales down to 2.5 Gbps (1X IB-SDR) depending on the capability of the attached equipment. The PSIFB Long Reach feature is available only on System z10 and later CPCs.
HCA3-O A host channel adapter that is identified as HCA3-O (FC 0171) provides an optical InfiniBand connection to System z10 and later CPCs. A HCA3-O port can be connected to a port on another HCA3-O adapter, or a HCA2-O adapter. HCA3 adapters are only available on zEnterprise 196 and later CPCs.
HCA3-O adapters are used in combination with the 12X IB-DDR link rating to provide a link rate of up to 6 GBps. A port on a HCA3-O adapter can run in one of two modes:
IFB mode This is the same mode that is used with HCA2-O adapters and offers equivalent performance.
IFB3 mode This mode is only available if the HCA3-O port is connected to another HCA3-O port and four or fewer CHPIDs are defined to share that port. This mode offers improved performance compared to IFB mode⁵.
HCA3-O LR A host channel adapter that is identified as HCA3-O LR (FC 0170) provides an optical InfiniBand connection for long reach coupling links. The adapter is available for z196, z114, and later CPCs, and is used to connect to:
HCA2-O LR For connection to z10, and z196.
HCA3-O LR For connection to z196, z114, and later CPCs.
HCA3-O LR is used in combination with the 1X IB-DDR link rating to provide a link rate of up to 5 Gbps. It automatically scales down to 2.5 Gbps (1X IB-SDR) depending on the capability of the attached equipment. This adapter also provides additional ports (four ports versus two ports on HCA2-O LR adapters).
1.4 InfiniBand benefits
System z is used by enterprises in different industries and in different ways. It is probably fair to say that no two mainframe environments are identical. System z configurations span from sysplexes with over 100,000 MIPS to configurations with only one or two CPCs. Some installations intensively exploit sysplex capabilities for data sharing and high availability; others exploit only the resource sharing functions. Some enterprises run a single sysplex containing both production and non-production systems; others have multiple sysplexes, each with a different purpose.
InfiniBand addresses the requirements of all these configurations. Depending on your configuration and workload, one or more InfiniBand attributes might particularly interest you.
The benefits that InfiniBand offers compared to previous generations of System z coupling technologies are listed here:
The ability to have ICB4-levels of performance for nearly all in-data-center CPC coupling connections.
ICB4 links are limited to 10 meters, meaning that the maximum distance between connected CPCs is limited to about 7 meters. As a result, many installations that wanted to use ICB4 links were unable to do so because of physical limitations on how close the CPCs could be located to each other.
InfiniBand 12X links can provide performance similar to or better than ICB4 links, yet they support distances of up to 150 meters. It is expected that InfiniBand 12X links will be applicable to nearly every configuration where the CPCs being connected are in the same data center. This is designed to result in significant performance improvements (and reductions in coupling overhead) for any installation that was forced to use ISC links in the past.
The ability to provide coupling connectivity over large distances with performance that is equivalent to or better than ISC3 links, but with significantly fewer links.
HCA2-O LR and HCA3-O LR 1X links on z196 and later support either 7 or 32 subchannels and link buffers⁶ per CHPID, depending on the Driver level of the CPCs at both ends of the link. For long-distance sysplexes, the use of 32 subchannels means that fewer links are required to provide the same number of subchannels and link buffers than is the case with ISC3 links. And if 64 subchannels (two CHPIDs with 32 subchannels each) are not sufficient, additional CHPIDs can be defined to use the same link (in the past, the only way to add CHPIDs was to add more physical links).
For a long-distance sysplex, the ability to deliver the same performance with fewer links might translate to fewer DWDM ports or fewer dark fibers for unrepeated links. Also, fewer host channel adapters might be required to deliver the same number of subchannels. Both of these characteristics can translate into cost savings.
The ability to more cost effectively handle peak CF load situations.
Because InfiniBand provides the ability to assign multiple CHPIDs to a single port, you can potentially address high subchannel utilization or high path busy conditions by adding more CHPIDs (and therefore more subchannels) to a port. This is a definition-only change; no additional hardware is required, and there is no financial cost associated with assigning another CHPID to an InfiniBand port.
The IBM experience has been that many clients with large numbers of ICB4 links do not actually require that much bandwidth. The reason for having so many links is to provide more subchannels to avoid delays caused by all subchannels or link buffers being busy during workload spikes. You might find that the ability to assign multiple CHPIDs to an InfiniBand port means that you actually need fewer InfiniBand ports than you have ICB4 links today.
Every Parallel Sysplex requires connectivity from the z/OS systems in the sysplex to the CFs being used by the sysplex. Link types prior to InfiniBand cannot be shared across sysplexes, meaning that every sysplex required its own set of links.
Although InfiniBand does not provide the ability to share CHPIDs across multiple sysplexes, it does provide the ability to share links across sysplexes. Because InfiniBand supports multiple CHPIDs per link, multiple sysplexes can each have their own CHPIDs on a shared InfiniBand link. For clients with large numbers of sysplexes, this can mean significant savings in the number of physical coupling links that must be provided to deliver the required connectivity.
zEnterprise 196 and later CPCs support larger numbers of CF link CHPIDs (increased to 128 CHPIDs from the previous limit of 64 CHPIDs). The InfiniBand ability to assign multiple CHPIDs to a single link helps you fully exploit this capability⁷.
1.5 The importance of an efficient coupling infrastructure
Efficient systems must provide a balance between CPU performance, memory bandwidth and capacity, and I/O capabilities. However, semiconductor technology evolves much faster than I/O interconnect speeds, which are governed by mechanical, electrical, and speed-of-light limitations, thus increasing the imbalance and limiting overall system performance. This imbalance suggests that I/O interconnects must change to maintain balanced system performance.
Each successive generation of System z CPC is capable of performing more work than its predecessors. To keep up with the increasing performance, it is necessary to have an interconnect architecture that is able to satisfy the I/O interconnect requirements that go along with it. InfiniBand offers a powerful interconnect architecture that by its nature is better able to provide the necessary I/O interconnect to keep the current systems in balance.
Table 1-2 highlights the importance that link technology plays in the overall performance and efficiency of a Parallel Sysplex. The cells across the top indicate the CPC where z/OS is running. The cells down the left side indicate the type of CPC where the CF is running and the type of link that is used to connect z/OS to the CF.
Table 1-2 Coupling z/OS CPU cost
CF/Host            z10 BC   z10 EC   z114   z196   zBC12   zEC12
z10 BC ISC3        16%      18%      17%    21%    19%     24%
z10 BC 1X IFB      13%      14%      14%    17%    18%     19%
z10 BC 12X IFB     12%      13%      13%    16%    16%     17%
z10 BC ICB4        10%      11%      NA     NA     NA      NA
z10 EC ISC3        16%      17%      17%    21%    19%     24%
z10 EC 1X IFB      13%      14%      14%    17%    17%     19%
z10 EC 12X IFB     11%      12%      12%    14%    14%     16%
z10 EC ICB4        9%       10%      NA     NA     NA      NA
z114 ISC3          16%      18%      17%    21%    19%     24%
z114 1X IFB        13%      14%      14%    17%    17%     19%
z114 12X IFB       12%      13%      12%    15%    15%     17%
z114 12X IFB3      NA       NA       10%    12%    12%     13%
z196 ISC3          16%      17%      17%    21%    19%     24%
z196 1X IFB        13%      14%      13%    16%    16%     18%
z196 12X IFB       11%      12%      11%    14%    14%     15%
z196 12X IFB3      NA       NA       9%     11%    10%     12%
zBC12 1X IFB       14%      15%      14%    18%    17%     20%
zBC12 12X IFB      13%      13%      12%    15%    14%     17%
zBC12 12X IFB3     NA       NA       10%    11%    11%     12%
zEC12 1X IFB       13%      13%      13%    16%    16%     18%
zEC12 12X IFB      11%      11%      11%    13%    13%     15%
zEC12 12X IFB3     NA       NA       9%     10%    10%     11%
These values are based on 9 CF requests per second per MIPS.
The XES Synch/Async heuristic algorithm effectively caps overhead at about 18%.
To determine the z/OS CPU cost associated with running z/OS on a given CPC and using a CF on a given CPC, find the column that indicates the CPC your z/OS is on, and the row that contains your CF and the type of link that is used. For example, if you are running z/OS on a z10 EC, connected to a z10 EC CF using ISC3 links and performing 9 CF requests per second per MIPS, the overhead is 17%.
The overhead reflects the percent of available CPC cycles that are used to communicate with the CF. A given CF with a given link type will deliver a certain average response time. For a given response time, a faster z/OS CPC is able to complete more instructions in that amount of time than a slower one. Therefore, as you move z/OS to a faster CPC, but do not change the CF configuration, the z/OS CPU cost (in terms of “lost” CPU cycles) increases. Using the table, you can see that upgrading the z/OS CPC from a z10 EC to a faster CPC (a z196) increases the cost to 21%⁸.
To keep the z/OS CPU cost at a consistent level, you also need to reduce the CF response time by a percent that is similar to the percent increase in the z/OS CPU speed. The most effective way to address this is by improving the coupling technology. In this example, if you upgrade the CF to a z196 with the same link type (ISC3), the cost remains about the same (21%). Replacing the ISC3 links with 12X IFB links further reduces the response time, resulting in a much larger reduction in the cost, to 14%. And replacing the ISC3 links with 12X IFB3 links reduces the cost further, to 11%.
These z/OS CPU cost numbers are based on a typical data sharing user profile of 9 CF requests per MIPS per second. The cost scales with the number of requests. For example, if your configuration drives 4.5 CF requests per MIPS per second, the cost is 50% of the numbers in Table 1-2 on page 9.
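As a rough illustration of that scaling (a sketch of the reasoning only, not a formal capacity planning method), the Table 1-2 value can be scaled linearly by the actual request rate:

    # Approximate scaling of the Table 1-2 coupling cost with CF request rate.
    TABLE_RATE = 9.0   # CF requests per MIPS per second assumed in Table 1-2

    def scaled_cost(table_cost_pct, actual_rate):
        """Approximate z/OS coupling cost at a different CF request rate."""
        return table_cost_pct * (actual_rate / TABLE_RATE)

    # z/OS on z196, CF on z196 over ISC3: 21% at 9 requests per MIPS per second
    print(scaled_cost(21.0, 4.5))   # about 10.5% at half the request rate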
To further illustrate the relationship between coupling link types and response times, Table 1-3 contains information about expected response times for different link types and different types of requests on z10, z196, and EC12 CPCs.
Table 1-3   Expected CF synchronous response time ranges (microseconds)

                     ISC3     PSIFB 1X   ICB-4    PSIFB 12X IFB   PSIFB 12X IFB3   ICP
zEC12
 Lock request        20-30    12-16      N/A      10-14           5-8              2-6
 Cache/List request  25-40    14-24      N/A      13-17           7-9              4-8
z196
 Lock request        20-30    14-17      N/A      10-14           5-8              2-8
 Cache/List request  25-40    16-25      N/A      14-18           7-9              4-9
z10
 Lock request        20-30    14-18      8-12     11-15           N/A              3-8
 Cache/List request  25-40    18-25      10-16    15-20           N/A              6-10
These represent average numbers. Many factors (distance, CF CPU utilization, link utilization, and so on) can impact the actual performance. However, you can see a pattern in this table that is similar to Table 1-2; that is, faster link types deliver reduced response times, and those reduced response times can decrease the z/OS CPU cost of using a CF with that link type.
1.5.1 Coupling link performance factors
Note that there is a difference between speed (which is typically observed through the CF service times) and bandwidth.
Consider the example of a 2-lane road and a 6-lane highway. A single car can travel at the same speed on both roads. However, if 1000 cars are trying to traverse both roads, they will traverse the highway in much less time than on the narrow road. In this example, the bandwidth is represented by the number of lanes on the road. The speed is represented by the ability of the car to travel at a given speed (because the speed of light is the same for all coupling link types that exploit fiber optic cables⁹).
To take the analogy a step further, the time to traverse the road depends partially on the number of lanes that are available on the entry to the highway. After the traffic gets on to the highway, it will tend to travel at the speed limit. However, if many cars are trying to get on the highway and there is only a single entry lane, there will be a delay for each car to get on the highway. Similarly, the time to get a large CF request (a 64 KB DB2 request, for example) into a low-bandwidth link will be significantly longer than that required to place the same request into a higher bandwidth link.
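To illustrate only the bandwidth part of this analogy, the following sketch estimates the serialization time, that is, the time needed just to place a 64 KB request onto links of different effective bandwidths. The bandwidth figures are nominal effective data rates; actual service times also include protocol, adapter, CF processing, and distance-related delays.

    # Estimate serialization time for a 64 KB request on links of different
    # effective bandwidths (nominal values; real service times are higher).
    REQUEST_BYTES = 64 * 1024

    effective_gbytes_per_second = {
        "1X IB-DDR (PSIFB 1X)": 0.5,    # 5 Gbps signaling, 8b/10b -> ~0.5 GBps
        "12X IB-DDR (PSIFB 12X)": 6.0,  # 60 Gbps signaling, 8b/10b -> 6 GBps
    }

    for link, gbytes in effective_gbytes_per_second.items():
        microseconds = REQUEST_BYTES / (gbytes * 1e9) * 1e6
        print(f"{link}: about {microseconds:.0f} microseconds to serialize 64 KB")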
Therefore, the “performance” of a coupling link is a combination of:
The type of requests being sent on the link (large or small, or some specific mix of short-running or long-running).
The bandwidth of the link (this becomes more important for requests with large amounts of data, or if there is a significantly large volume of requests).
The technology in the card or adapter that the link is connected to.
The distance between the z/OS system and the CF.
The number of buffers associated with the link.
Another aspect of the performance of CF requests that must be considered is, how many CF requests do your systems issue and how does that affect the cost of using the CF? Changing from one link type to another is expected to result in a change in response times. How that change impacts your systems depends to a large extent on the number and type of requests that are being issued.
If your CF is processing 1000 requests a second and the synchronous response time decreases by 10 microseconds, that represents a savings of .01 seconds of z/OS CPU time per second across all the members of the sysplex, which is a change that is unlikely to even be noticeable.
However, if your CF processes 200,000 synchronous requests a second and the synchronous response time improves by just half that amount (5 microseconds), that represents a saving of one second of z/OS CPU time per second; that is, a savings of one z/OS engine’s worth of capacity.
Using this example, you can see that the impact of CF response times on z/OS CPU utilization is heavily influenced by the number and type of requests being sent to the CF; the larger the number of requests, the more important the response time is.
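The arithmetic behind these two examples can be expressed as a simple calculation (a minimal sketch, with a function name of our own choosing):

    # z/OS CPU time saved per second when synchronous CF response times improve.
    def cpu_seconds_saved_per_second(requests_per_second, improvement_microseconds):
        """Seconds of z/OS CPU time saved each second, across the sysplex."""
        return requests_per_second * improvement_microseconds * 1e-6

    print(cpu_seconds_saved_per_second(1_000, 10))    # 0.01 - barely noticeable
    print(cpu_seconds_saved_per_second(200_000, 5))   # 1.0  - one engine's worth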
This section illustrates the importance of using the best performing coupling links possible. However, the coupling link connectivity in many enterprises has not kept up with changes to the z/OS CPCs, resulting in performance that is less than optimal.
As you migrate from your current link technology to InfiniBand links, you are presented with an ideal opportunity to create a coupling infrastructure that delivers the optimum performance, flexibility, availability, and financial value. The primary objective of this book is to help you make the best of this opportunity.
1.5.2 PSIFB 12X and 1X InfiniBand links
As stated previously, there are two bandwidths available for System z InfiniBand links: 12X and 1X.
12X links support a maximum distance of 150 meters. It is expected that anyone with a need to connect two CPCs within a single data center is able to exploit 12X InfiniBand links.
1X links support larger distances, and therefore are aimed at enterprises with a need to provide coupling connectivity between data centers.
The InfiniBand enhancements announced in July 2011 further differentiate the two types of links. The new HCA3-O 12X adapters were enhanced to address the high bandwidth/low response time needs that typically go with a sysplex that is contained in a single data center. Specifically, they support a new, more efficient, protocol that enables reduced response times, and the ability to process a larger number of requests per second.
The InfiniBand 1X adapters were enhanced in a different way. Sysplexes that span large distances often experience high response times, resulting in high subchannel and link buffer utilization. However, because each subchannel and link buffer can only handle one CF request at a time, the utilization of the fiber between the two sites tends to be quite low.
To alleviate the impact of high subchannel and link buffer utilization, Driver 93 delivered the ability to specify 32 subchannels and link buffers per CHPID for 1X links on z196 and later CPCs. This provides the ability to process more requests in parallel without requiring additional physical links. Additionally, because of the greatly increased capability to handle more concurrent requests on each CHPID, the HCA3-O LR adapters have four ports rather than two. This allows you to connect to more CPCs with each adapter, while still supporting more concurrent requests to each CPC than was possible with the previous two-port adapter.
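As a simple sketch of why 32 subchannels per CHPID reduces the number of physical links needed over distance, the following comparison assumes seven subchannels per ISC3 CHPID (one CHPID per physical ISC3 link) versus 32 subchannels per 1X PSIFB CHPID at Driver 93 or later; the target figure and names are our own:

    # Physical resources needed to provide 64 subchannels between two sites.
    import math

    TARGET_SUBCHANNELS = 64

    isc3_links   = math.ceil(TARGET_SUBCHANNELS / 7)    # 10 physical ISC3 links
    psifb_chpids = math.ceil(TARGET_SUBCHANNELS / 32)   # 2 CHPIDs

    print(isc3_links, psifb_chpids)
    # The two 1X CHPIDs can share a single physical InfiniBand link, whereas
    # each ISC3 CHPID requires its own physical link.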
 
Note: IBM recommends specifying seven subchannels per CHPID for coupling links between CPCs in the same site. For links that will span sites, it is recommended to specify 32 subchannels per CHPID.
1.6 Terminology
Before the availability of InfiniBand coupling links, there was a one-to-one correspondence between CF link CHPIDs and the actual link. As a result, terms such as link, connection, port, and CHPID tended to be used interchangeably. However, because InfiniBand supports the ability to assign multiple CHPIDs to a single physical link, it becomes much more important to use the correct terminology. To avoid confusion, the following list describes how common terms are used in this book:
CF link Before InfiniBand links, there was a one-to-one correspondence between CF links and CF link CHPIDs. As a result, the terms were often used interchangeably. However, given that InfiniBand technology supports multiple CHPIDs sharing a given physical connection, it is important to differentiate between CF link CHPIDs and CF links. In this book, to avoid confusion, we do not use the term “CF link” on its own.
CF link CHPID A CF link CHPID is used to communicate between z/OS and CF, or between two CFs. A CF link CHPID can be associated with one, and only one, coupling link. However, an InfiniBand coupling link can have more than one CF link CHPID associated with it.
Coupling link When used on its own, “coupling link” is used generically to describe any type of link that connects z/OS-to-CF, or CF-to-CF, or is used purely for passing STP timing signals. It applies to all link types: PSIFB, ICB4, ISC3, and ICP.
Timing-only link This is a link that is used to carry only STP signals between CPCs.
CPCs that are in the same Coordinated Timing Network (CTN) must be connected by some type of coupling link. If either of the CPCs connected by a coupling link contains a CF LPAR, the CHPIDs associated with all links between those CPCs must be defined in hardware configuration definition (HCD) as Coupling Link CHPIDs. If neither CPC contains a CF LPAR, the CHPIDs must be defined as timing-only link CHPIDs. You cannot have both coupling links and timing-only links between a given pair of CPCs.
Port A port is a receptacle on an HCA adapter into which an InfiniBand cable is connected. There is a one-to-one correspondence between ports and InfiniBand links. Depending on the adapter type, an InfiniBand adapter will have either two or four ports.
PSIFB coupling links This refers generically to both 1X and 12X PSIFB links.
12X InfiniBand links This refers generically to both IFB and IFB3-mode 12X links.
1X InfiniBand links This refers generically to HCA2-O LR and HCA3-O LR links.
12X IFB links This refers to InfiniBand links connected to HCA1-O adapters, HCA2-O adapters, or HCA3-O adapters when running in IFB mode.
12X IFB3 links This refers to InfiniBand links where both ends are connected to HCA3-O adapters, and that are operating in IFB3 mode.
Gbps or GBps The convention is that the bandwidth of ISC3 and 1X PSIFB links is described in terms of Gigabits per second, and the bandwidth of ICB4 and 12X PSIFB links is described in terms of Gigabytes per second.
Subchannel In the context of coupling links, a subchannel is a z/OS control block that represents a link buffer. Each z/OS LPAR that shares a CF link CHPID will have one subchannel for each link buffer associated with that CHPID.
Link buffer Every CF link CHPID has a number of link buffers associated with it. The number of link buffers will be either 7 or 32, depending on the adapter type, the Driver level of the CPC, and the type of adapter and driver level of the other end of the associated coupling link. Link buffers reside in the link hardware.
For more information about the relationship between subchannels and link buffers, see Appendix C, “Link buffers and subchannels” on page 247.
System z server This refers to any System z CPC that contains either a z/OS LPAR or CF LPAR or both and, in the context of this book, supports InfiniBand links.
zEC12 This is the short form of zEnterprise System zEC12.
zBC12 This is the short form of zEnterprise System zBC12.
z196 This is the short form of zEnterprise System 196.
z114 This is the short form of zEnterprise System 114.
zEnterprise server This refers to both zEnterprise System 196 and zEnterprise System 114.
z10 or System z10 This refers to both System z10 EC and System z10 BC.
z9 or System z9 This refers to both System z9 EC and System z9 BC.
CPC Many different terms have been used to describe the device that is capable of running operating systems such as z/OS or IBM z/VM®, namely, server, CPU, CEC, CPC, machine, and others. For consistency, we use the term “CPC” throughout this book. One exception is in relation to STP, where we continue to use the term “server” to be consistent with the terminology on the Hardware Management Console (HMC), the Support Element (SE), and the STP documentation.
1.7 Structure of this book
The objective of this book is to help you successfully implement InfiniBand links in a System z environment. To this end, the following chapters are provided:
Chapter 3, “Preinstallation planning” on page 37 describes the information that you need as you plan for the optimum InfiniBand infrastructure for your configuration.
Chapter 4, “Migration planning” on page 63 provides samples of what we believe are the most common migration scenarios for clients moving to InfiniBand links.
Chapter 5, “Performance considerations” on page 121 provides information about the results of a number of measurements we conducted, to compare the relative performance of the various coupling link technologies.
Chapter 6, “Configuration management” on page 155 provides information to help you successfully define the configuration you want using HCD.
Chapter 7, “Operations” on page 189 provides information to help you successfully manage an InfiniBand configuration.
The following Redbooks provide information that supplements the information provided in this book:
IBM System z Connectivity Handbook, SG24-5444
Server Time Protocol Planning Guide, SG24-7280
Server Time Protocol Implementation Guide, SG24-7281
Server Time Protocol Recovery Guide, SG24-7380
IBM System z10 Enterprise Class Technical Guide, SG24-7516
IBM System z10 Business Class Technical Overview, SG24-7632
IBM zEnterprise 196 Technical Guide, SG24-7833
IBM zEnterprise 114 Technical Guide, SG24-7954
IBM zEnterprise EC12 Technical Guide, SG24-8049
IBM zEnterprise BC12 Technical Guide, SG24-8138

1 RPQ 8P2340 may be used to increase the unrepeated distance to up to 20 km.
2 The supported repeated distance can vary by DWDM vendor and specific device and features.
3 InfiniBand cannot be used to connect one System z9 CPC to another z9 CPC. Only connection to a later CPC type is supported.
4 HCA2-O LR adapters are still available for z10. However, they have been withdrawn from marketing for z196. On z196, HCA3-O LR functionally replaces HCA2-O LR.
5 The maximum bandwidth for a HCA3-O link is 6 GBps, regardless of the mode. IFB3 mode delivers better response times through the use of a more efficient protocol.
6 The relationship between subchannels and link buffers is described in Appendix C, “Link buffers and subchannels” on page 247.
7 The number of physical coupling links that you can install depends on your CPC model and the number of books that are installed.
8 In practice, the XES heuristic algorithm effectively caps overhead at about 18% by converting longer-running synchronous CF requests to be asynchronous.
9 The speed of light in a fiber is about 2/3 of the speed of light in a vacuum. The speed of a signal in a copper coupling link (ICB4, for example) is about 75% of the speed of light in a vacuum.