2
Cooling Servers

2.1. Evolution of cooling for mainframe, midrange and distributed computers from the 1960s to 1990s

The first general-purpose digital computer, the Harvard Mark I, was built by IBM in 1944. This electromechanical device, codeveloped with Harvard University, was the pioneer in computing. It was followed in 1948 by the electronic calculator SSEC which, with about 21,400 relays and 12,500 vacuum tubes, delivered thousands of calculations per second and first made power consumption a critical design consideration. The ENIAC, a large mainframe computer unveiled in 1946, consisted of 30 separate units, plus power supply and forced-air cooling, and weighed over 30 tons. Its 19,000 vacuum tubes, 1,500 relays and hundreds of thousands of resistors, capacitors and inductors consumed almost 200 kW of electrical power.

By the mid-1950s, transistors had begun to replace vacuum tubes in computers to improve reliability, performance and power consumption, and in 1958, IBM announced the 7070 Data Processing System, which incorporated solid-state transistor technology. A big leap was eventually made in the mid-1960s with the introduction of the IBM S/360, which led to the dominance of mainframe computing for a few decades. The processing unit of the IBM S/360 model 85 (IBM Systems Reference Library 1974) was composed of 13 frames with a total footprint of 15 m², a power consumption of 9.3 kW and a clock frequency of 12.5 MHz. In addition to the power consumption, the mainframes needed to operate in a controlled environment where ambient temperature, relative humidity (RH) and environmental particulate levels had to be tightly controlled. The large airflow and cold air temperatures started placing a burden on the construction of the data centers where the mainframes were housed.

To deliver the capacity, redundancy and reliability expected of the mainframe, inlet temperatures of 20–23°C and RH of 40–60% had to be maintained within tight tolerances.

Redundancy of the chilled water cooling loop was also essential. The expectation of reliable, resilient operation meant that infrastructure and operational costs for power and cooling became the dominant expenses after the acquisition cost of the computer equipment.

A new paradigm began to emerge in the late 1960s that allowed computing on a smaller scale: the minicomputer. The pioneer here was the Digital Equipment Corporation (DEC), whose computer was contained in a single 19″ rack cabinet, whereas a mainframe would fit only in a very large room. The minicomputer provided a large relief to infrastructure costs. A fully configured PDP-11/70 (PDP-11 1975) consumed 3.3 kW, ran at a clock frequency of 18 MHz and could operate in a room at 40°C and 90% RH. Computing environments could now be less regulated from a power and cooling perspective.

2.2. Emergence of cooling for scale out computers from 1990s to 2010s

In the 1990s, computers evolved further to reach the hands of the end customer. This was enabled by processors from Intel and Motorola in the form of the personal computer, which could be placed on the desk of the user. The footprint shrank to the size of a large book, with the power consumption of the units ranging from about 100 to 300 W.

This dramatic change was enabled by the reduction in the power consumption of the central processing unit (CPU), which ranged from about 5 W for the Intel 486 processor to 15 W for the popular Pentium processor.

The client computers were capable of operating in a wide variety of environments with limited control of temperature or humidity. Special systems could be designed to work in ambient temperatures of 55°C in factory environments. Higher performing processors using the Intel x86 architecture began to be used in servers that housed the client data, analyzed it and stored the information for quick retrieval.

From the early 1990s through 2018, the demand for improved performance kept growing at a nonlinear pace. While silicon manufacturing processes kept shrinking transistor sizes to deliver improved performance at higher frequencies, the growth in performance expectations resulted in an increasing demand on the power consumption of the server.

An example of the growth of processor power from the 1980s to the present time is shown in Table 2.1.

Table 2.1. Intel Xeon power and thermal characteristics

Year | Processor | Process Size | Processor Power (W) | Silicon Junction Temperature | Processor Lid Temperature | Processor Package Type
1988 | Intel 486 | 160 micron | 5 | 100°C | 70°C | Ceramic PGA
1993 | Pentium | 120 micron | 15 | 90°C | 70°C | Ceramic PGA
1995 | Pentium Pro | 90 micron | 30 | 90°C | 70°C | Ceramic PGA with Cu spreader
1997 | Pentium II | 65 micron | 45 | 90°C | 70°C | Cartridge with aluminum spreader
2000 | Pentium III | 45 micron | 30 | 70°C | 70°C | Organic carrier with exposed bare silicon
2005 | Xeon | 90 nm | 110 | 65°C | 70°C | Organic carrier with Cu lid
2010 | Xeon | 65 nm | 130 | 80°C | 90°C | Organic carrier with Cu lid
2012 | Xeon | 45 nm | 145 | 80°C | 90°C | Organic carrier with Cu lid
2016 | Xeon SP | 14 nm | 205 | 80°C | 90°C | Organic carrier with Cu lid

The evolution of systems based on x86 processor technology allowed systems to be installed outside standard data center environments, which typically used tightly controlled ambient temperatures, humidity levels and air quality. Systems with a power consumption of 100–200 W at the extreme could be placed in locations with standard power, even as low as 110 V, and could be operated reliably without recourse to high cooling costs for the installation.

Over time, though, the steady increase in processor, memory, disk and I/O devices has resulted in servers consuming 1–2 kW and even more, depending on the number of processors in the system. Servers that consume power in this range require careful planning of the installed environment and special consideration in the design of the cooling to allow for reliable operation with minimal energy consumption attributed to the cooling of the system.

The primary cooling requirement for all systems starts with ensuring that the processor is cooled reliably and that the silicon junction temperature is maintained such that long-term operation of the silicon is ensured at the requisite operating frequency and switching voltage. The heat flux generated at the silicon needs to be transmitted from the junction of the processor to the external environment. The heat flux path from the switching transistor to the outside of the silicon is shown in Figure 2.1.


Figure 2.1. Heat flux on an air-cooled chip. For a color version of this figure, see www.iste.co.uk/brochard/energy.zip

The illustrative figure above shows an example of a processor or chip package. Although the details of the various processor packages can vary significantly based on the processor technology, the heat flux generated by the processor is extracted from the package through the TIM1 interface, which can be a thermal grease, cured gel, soldered indium alloy, etc. The heat spreader couples this heat flux tightly and, in some instances, allows for efficient transmission into a heat sink. The main heat flow path from the chip to air is from the silicon junction, through TIM1, into the lid (heat spreader), then through a second interface into the heat sink, and finally from the heat sink fins into the ambient air.

The package contribution to this path is determined by the thermal conductivity of TIM1 and of the lid. Advances in the thermal performance of processor packages are ultimately determined by innovations in materials for these components.

There is a synergy in the choice of materials for the TIM and the lid. To achieve enhanced thermal conductivities, manufacturers have been substituting metallic particles for the traditional ceramic ones in thermal greases and gels. To achieve even higher levels of thermal conductivity, it is necessary to use a TIM1 that is 100% metal, namely a solder. This places further demands on the lid materials: not only must they have a high thermal conductivity, but their coefficient of thermal expansion also has to be a reasonably close match to silicon in order to maintain the integrity of the solder joint over many thermal cycles.
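The junction-to-air path described above can be approximated as a simple series of thermal resistances. The sketch below estimates the junction temperature for a given processor power and heat sink inlet air temperature; the individual resistance values are illustrative assumptions, not vendor data.

```python
# Minimal series thermal-resistance model of the junction-to-air path.
# Resistance values below are illustrative assumptions, not vendor data.

def junction_temperature(power_w, t_air_c, r_tim1=0.05, r_lid=0.03,
                         r_tim2=0.07, r_heatsink=0.20):
    """Return the estimated silicon junction temperature in deg C.

    power_w : heat dissipated by the die (W)
    t_air_c : inlet air temperature at the heat sink (deg C)
    r_*     : assumed thermal resistances of each stage (K/W)
    """
    r_total = r_tim1 + r_lid + r_tim2 + r_heatsink   # series path, K/W
    return t_air_c + power_w * r_total

if __name__ == "__main__":
    # Example: 150 W processor with 35 deg C air at the heat sink inlet.
    tj = junction_temperature(150, 35)
    print(f"Estimated junction temperature: {tj:.1f} C")   # ~87.5 C
```

The model makes it easy to see why TIM1 and lid materials matter: each milli-kelvin per watt removed from the package resistances lowers the junction temperature at a given power, or allows the same junction temperature with less airflow through the heat sink.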

2.3. Chassis and rack cooling methods

An example of a 1U server that consists of processor, memory, disk and I/O is shown in Figure 2.2. This server, based on the Intel Xeon processor, supports 2 CPUs, 24 memory DIMMs, 3 I/O slots, 10 drives and 2 power supplies.

Cooling of such a unit is provided by seven system fans that move air front to back through the server. In addition, two power supply fans assist in the cooling of the dense power supplies. The typical dimensions of such a 1U server are 400 mm (wide) × 800 mm (deep) × 43 mm (high). Air enters the server through the front bezel and typically exits through vertical vent regions at the back of the server, enhanced by an additional vent area on the top surface of the server.


Figure 2.2. Lenovo 1U full width SR630 server. For a color version of this figure, see www.iste.co.uk/brochard/energy.zip


Figure 2.3. Front view of an SR630 server. For a color version of this figure, see www.iste.co.uk/brochard/energy.zip


Figure 2.4. Rear view of an SR630 server. For a color version of this figure, see www.iste.co.uk/brochard/energy.zip

When considering the amount of cooled air necessary to remove about 1 kW of server power, the thermodynamics of the solution very quickly leads to the following requirements. It becomes necessary to provide about 120–150 CFM (cubic feet per minute) of airflow through the unit at the extreme power states. In the example shown, each of the seven system fans would be required to deliver about 15–20 CFM of airflow. To deliver this airflow, each cooling fan could consume about 15 W, or about 100 W in total. In effect, 10% of the power consumed by the server could be driven by the cooling requirement of the components.
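The 120–150 CFM figure follows from a simple energy balance on the air stream, Q̇ = ρ · c_p · V̇ · ΔT. The short sketch below solves that balance for the volumetric flow; the air properties and the assumed 12–15°C temperature rise are chosen to reproduce the numbers quoted above.

```python
# Airflow needed to remove a given heat load with a given air temperature rise.
# Air properties are assumed values for warm air near sea level.

RHO_AIR = 1.16        # kg/m^3, assumed air density
CP_AIR = 1005.0       # J/(kg*K), specific heat of air
M3S_TO_CFM = 2118.88  # 1 m^3/s expressed in cubic feet per minute

def required_airflow_cfm(heat_load_w, delta_t_c):
    """Volumetric airflow (CFM) needed to carry heat_load_w with delta_t_c rise."""
    flow_m3s = heat_load_w / (RHO_AIR * CP_AIR * delta_t_c)
    return flow_m3s * M3S_TO_CFM

if __name__ == "__main__":
    for dt in (15, 12):
        print(f"1 kW server, {dt} C rise: {required_airflow_cfm(1000, dt):.0f} CFM")
    # ~121 CFM at a 15 C rise and ~152 CFM at a 12 C rise,
    # which brackets the 120-150 CFM range quoted in the text.
```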

Efficient system operation demands that the 100 W of cooling power is not a steady value; it is modulated based on actual system power consumption and inlet air temperature. Control algorithms are used to limit the cooling power consumed by the system fans. Typical system designs target cooling power as low as 2–3% of system power at idle in benign ambient air environments, increasing to 5% for heavy workloads. However, as the ambient inlet air temperature increases, a system consuming 20 W of cooling power can increase up to 100 W under steady heavy workloads.
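One simple illustration of such modulation, sketched below, maps the inlet air temperature to a fan duty cycle between a floor and a ceiling, with fan power then scaling roughly with the cube of speed. This is not the control firmware of any particular server; the setpoints, duty limits and the 15 W rated fan power are assumptions.

```python
# Toy fan-speed control loop: duty cycle ramps linearly with inlet temperature
# between two setpoints; fan power scales ~cubically with speed (fan law 1c).
# Setpoints, duty limits and the 15 W rated fan power are assumed values.

RATED_FAN_POWER_W = 15.0   # per fan, at 100% duty (assumption)
NUM_FANS = 7

def fan_duty(inlet_temp_c, t_low=22.0, t_high=35.0,
             duty_min=0.30, duty_max=1.00):
    """Linear ramp of duty cycle between t_low and t_high (clamped)."""
    if inlet_temp_c <= t_low:
        return duty_min
    if inlet_temp_c >= t_high:
        return duty_max
    frac = (inlet_temp_c - t_low) / (t_high - t_low)
    return duty_min + frac * (duty_max - duty_min)

def cooling_power_w(inlet_temp_c):
    """Total fan power, assuming power varies with the cube of fan speed."""
    duty = fan_duty(inlet_temp_c)
    return NUM_FANS * RATED_FAN_POWER_W * duty ** 3

if __name__ == "__main__":
    for t in (20, 25, 30, 35):
        print(f"inlet {t:2d} C -> duty {fan_duty(t):.2f}, "
              f"fan power {cooling_power_w(t):5.1f} W")
    # ~2.8 W at a 20 C inlet, rising to ~105 W at a 35 C inlet
```

The cubic relation between duty cycle and fan power is what makes modulation so valuable: a modest reduction in fan speed at benign inlet temperatures yields a disproportionate saving in cooling power.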

Within a server, the cooling air, which has an energy content associated with its delivery, needs to be efficiently managed. Optimal control of the airflow in the server requires modulation of the system fan speeds as well as calibrated flow control to regions and zones within the server that demand it the most.

The magnitude of airflow for a densely populated, high-power 1U server is difficult to sustain if deployed in a dense data center environment. Typical racks allow for placement of about 42 units; nonstandard racks of 48U or 60U are also available now. When fully populated, a 42U rack could require 6,000 CFM of airflow in extreme circumstances. This airflow requirement would typically exceed the capacity of a dense data center footprint to deliver ambient temperature-controlled air to the rack.

The following relationships between fan rotational speed, power and impeller diameter, which are represented by the Fan laws1, highlight the cooling constraints (a short numerical application follows the list of symbols below):

Law 1: With fan diameter (D) held constant:

Law 1a: Flow is proportional to fan speed:

Q_1/Q_2 = N_1/N_2 [2.1a]

Law 1b: Pressure or head is proportional to the square of fan speed:

H_1/H_2 = (N_1/N_2)^2 [2.1b]

Law 1c: Power is proportional to the cube of fan speed:

P_1/P_2 = (N_1/N_2)^3 [2.1c]

Law 2: With fan rotational speed (N) held constant:

Law 2a: Flow is proportional to the fan impeller diameter:

Q_1/Q_2 = D_1/D_2 [2.2a]

Law 2b: Pressure is proportional to the square of the fan impeller diameter:

H_1/H_2 = (D_1/D_2)^2 [2.2b]

Law 2c: Power is proportional to the cube of fan impeller diameter (assuming constant fan rotational speed):

P_1/P_2 = (D_1/D_2)^3 [2.2c]

where:

  • – Q is the volumetric flow rate (e.g. CFM or L/s);
  • – D is the impeller diameter (e.g. inches or mm);
  • – N is the shaft rotational speed (e.g. rpm);
  • – H is the pressure or head developed by the fan (e.g. psi, Pa or inAq);
  • – P is the fan shaft power (e.g. W).
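As a short numerical application of laws 1a–1c, the sketch below scales an assumed fan operating point to full speed. The baseline flow, pressure and power figures are illustrative assumptions, not data for a specific fan.

```python
# Fan affinity laws at constant diameter: Q ~ N, H ~ N^2, P ~ N^3.
# The baseline flow, pressure and power below are assumed illustrative values.

def scale_fan(n_ratio, q1_cfm, h1_inaq, p1_w):
    """Return (Q2, H2, P2) after changing speed by the factor n_ratio = N2/N1."""
    q2 = q1_cfm * n_ratio          # law 1a: flow is proportional to speed
    h2 = h1_inaq * n_ratio ** 2    # law 1b: pressure rises with speed squared
    p2 = p1_w * n_ratio ** 3       # law 1c: power rises with speed cubed
    return q2, h2, p2

if __name__ == "__main__":
    # Assumed baseline: one fan at 60% speed delivers 12 CFM at 0.4 inAq for 3.2 W.
    q2, h2, p2 = scale_fan(1.0 / 0.6, 12.0, 0.4, 3.2)
    print(f"At full speed: {q2:.1f} CFM, {h2:.2f} inAq, {p2:.1f} W")
    # Flow rises ~1.67x, but power rises ~4.6x (1.67^3): the cost of extra airflow.
```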

2.4. Metrics considered for cooling

2.4.1. Efficiency

The following factors have to be considered in cooling solutions for typical air-cooled servers: the volume occupied by the air-moving devices (cm³); the power consumed in typical operation (W); the power allocated to the cooling elements (W), which has to be subtracted from the power supply delivery allocation to the compute elements; and the cost of the solution to the supplier as it is passed on to the purchaser of the system. Efficiency metrics also include the temperature rise of the air across the server: about a 15°C rise would be considered efficient, and 20°C would be ideal. When high-speed air-moving devices are used, the spinning blades can generate aerodynamic noise that impacts disk I/O rates, or cause vibration that is transmitted to the spinning drives and can cause a device to stop functioning.

Typically, the smaller system fans in 1U servers are not as efficient in airflow delivery; that is, their ratio of cooling airflow to power used (CFM/W) is lower than that of 2U servers, which can use larger diameter fans.

Typical efficiency curves (also known as "fan curves") are shown in Figure 2.5 for an 80 mm fan, where the x-axis is the fan airflow in cubic feet per minute (CFM), the left y-axis is the static air pressure in inches of water (inAq) and the right y-axis is the power in watts (W). The bottom curves represent pressure versus CFM, the top curves represent power versus CFM, and the green square dots represent the operating points of the fan under study.


Figure 2.5. Typical efficiency curve for an air moving device. For a color version of this figure, see www.iste.co.uk/brochard/energy.zip
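The operating point in a figure such as Figure 2.5 is where the fan's pressure–flow curve meets the system impedance curve, whose pressure drop rises roughly with the square of flow. The sketch below finds that intersection for a simple quadratic fan-curve model; the curve coefficients are assumptions for illustration, not measurements of any specific 80 mm fan.

```python
# Locate a fan operating point: intersection of an assumed fan curve
# H_fan(Q) = H0 * (1 - (Q/Q_max)^2) with a system impedance H_sys(Q) = k * Q^2.
# H0, Q_max and k are illustrative assumptions, not measured data.

def operating_point(h0_inaq=0.9, q_max_cfm=60.0, k=4.0e-4):
    """Return (Q, H) where the fan pressure equals the system pressure drop."""
    # H0 * (1 - Q^2/Qmax^2) = k * Q^2  ->  Q^2 = H0 / (k + H0/Qmax^2)
    q_sq = h0_inaq / (k + h0_inaq / q_max_cfm ** 2)
    q = q_sq ** 0.5
    return q, k * q_sq

if __name__ == "__main__":
    q, h = operating_point()
    print(f"Operating point: {q:.1f} CFM at {h:.2f} inAq")
```

Raising the fan speed shifts the fan curve up and to the right (laws 1a and 1b), moving the operating point to a higher flow, while the power at that point grows with the cube of speed (law 1c).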

Systems design for cooling needs to optimize the essential metrics of efficient cooling design. These include: the cooling energy/power allocated to the system fans; the power consumed for cooling at typical operating temperatures for a server in data center operation, normally in the recommended range of 18–27°C; and an optimized temperature rise of the airflow across the server, typically up to 20°C between the inlet and the exhaust. The amount of power that is normally allocated to efficient server cooling is about 10% of the system power consumption, with true consumption in the range of 3–5% for efficiently designed systems operating in the recommended temperature range.

The driving reason to ensure efficient operation of servers is the cooling energy requirement of the server in a data center. Excessive allocation of cooling fan power in the server can lead to power constraints for the servers allocated to a rack. Excessive airflow through a server will result in a temperature differential that is too low for efficient operation of data center cooling infrastructure such as CRAH/CRAC/in-row cooling units. Since air-cooled data centers can allocate between 40% and 60% of their power budget to cooling, inefficiency in allocating system fan power and airflow can lead to expensive operational costs for a data center.

Systems such as the 1U server can allocate about 100 W of power to the cooling fans and consume about 15–20 W of cooling power during a high-performance workload. Aggregated at the rack level, a dense configuration has about 4 kW of allocated cooling power, or about 1 kW of consumed power. Some data centers allocate only 5–10 kW per rack; in effect, cooling optimization is essential for dense, compute-intensive data center deployments.

2.4.2. Reliability cost

2.4.2.1. Operating temperature and reliability

System design constraints always need to balance the orthogonal vectors of performance, cost of operation and cost of acquisition to the customer, along with the reliability of server operation, which is sensitive to operating temperature. Several studies describe thermal optimization as it relates to the reliability of system operation. Standard reliability models use activation energy-based approaches, where the Arrhenius equation defines the relative magnitude of failure rates for systems operating at differing temperatures. Rules of thumb have in the past identified a 2× increase in the failure rate attributable to operating temperature for every 10°C rise in device operating temperature. In the specific instance of Intel processor operation, for example, there is a definition of the processor thermal load line that maps the required operating junction temperature to the power state of the processor. There is a requirement from the silicon provider that the system maintains the silicon junction temperature versus power state during its operating life (Intel 2018, Chapter 5).
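The 2× per 10°C rule of thumb can be reproduced with the Arrhenius acceleration factor, AF = exp[(E_a/k)·(1/T_1 − 1/T_2)]. The sketch below evaluates it; the 0.7 eV activation energy is an assumed, commonly quoted illustrative value, not a figure for any specific component.

```python
import math

BOLTZMANN_EV = 8.617e-5   # Boltzmann constant in eV/K

def arrhenius_af(t1_c, t2_c, ea_ev=0.7):
    """Acceleration factor of the failure rate going from t1_c to t2_c (deg C).

    ea_ev is the activation energy; 0.7 eV is an assumed illustrative value.
    """
    t1_k, t2_k = t1_c + 273.15, t2_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV) * (1.0 / t1_k - 1.0 / t2_k))

if __name__ == "__main__":
    print(f"50 C -> 60 C: failure rate x{arrhenius_af(50, 60):.1f}")  # ~2.1x
```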

In addition to the impact of operating temperatures of the devices, the RH of the environment can impose a much higher burden on the failure rates of essential components such as the drives, memory, processors and power supplies. For example, a recent study on system operation in a hyperscale data center documented the stronger impact of changes in RH on failure rates of rotating disks compared to the actual operating temperature of the devices.

2.4.2.2. Gaseous and particulate contamination and reliability

In many data centers, there is a desire to improve cooling efficiency by leveraging free-air or air-side economizer-based cooling. These approaches can amplify system failure modes if suitable monitoring and controls are not placed on the airflow in the data center. Excessive air throughput through the systems can amplify the consequences of imperfect environmental controls within the data center. ASHRAE standards do require air quality to be maintained as defined for reliable operation, with methods such as corrosion coupons and secondary monitoring being essential.

2.4.3. Thermal performance

Each component in the system has the following thermal metrics that must be adhered to in order to ensure functional and reliable operation. In some data sheets, device vendors quote an operating temperature of 55°C or 70°C, which is normally defined as the ambient temperature of the inlet cooling air for the server. It is essential to determine whether the specified value is the air temperature or the device junction/case temperature for an adequate assessment of the cooling requirements.

The values include the following (a short sketch after the list shows one way to encode and check these limits):

  • – minimum operating temperature: this is the temperature below which the device signal timings are not validated by the supplier. An example of this value is a typical value of 0°C for the processor;
  • – maximum operating temperature: this is the temperature above which the device will not function with acceptable signal quality and will start throttling. It is also called maximum junction temperature. An example of this value is 90–100°C for processors or 95°C for memory;
  • – critical temperature: this is the temperature at which irreparable damage of the hardware will occur. In many instances, the system will power itself off before it gets to this temperature. An example of this value is about 120°C for typical silicon and about 65°C for rotating drives;
  • – reliable operating temperature: this is the expected normal operating limit where the device can operate 24×7 for its stated operating life. Processors can typically operate reliably in the 70–80°C range, while drives expect a 50°C operating temperature, for example. In reliability predictive models for optimized operating temperatures, there is an expectation that the power consumption of the components is not at this limiting thermal design power state for continuous operation. Utilization rates of the devices are assumed to be in the ~50–70% range to determine the power consumption for the state at which the thermal behavior of the device is to be measured against its predicted reliability. In the example of the Intel Xeon 6148 CPU, which has a TDP of 150 W and a maximum allowed operating temperature of 96°C, the processor is allowed to operate at 86°C at a typical power state of 100 W.
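The sketch below encodes the four thresholds just described for a hypothetical processor and classifies a measured temperature against them. The limit values follow the typical figures quoted above, not any specific datasheet.

```python
# Encode the four thermal thresholds described above for a hypothetical CPU
# (values follow the typical figures quoted in the text, not a datasheet).

CPU_LIMITS = {
    "min_operating_c": 0,    # below this, signal timings are not validated
    "reliable_c": 80,        # 24x7 long-term operating target
    "max_operating_c": 96,   # throttling starts above this junction temperature
    "critical_c": 120,       # risk of irreparable damage; system should power off
}

def classify(temp_c, limits=CPU_LIMITS):
    """Return a coarse status string for a measured junction temperature."""
    if temp_c < limits["min_operating_c"]:
        return "below validated range"
    if temp_c <= limits["reliable_c"]:
        return "normal (reliable long-term operation)"
    if temp_c <= limits["max_operating_c"]:
        return "above reliable limit (acceptable, not for continuous operation)"
    if temp_c <= limits["critical_c"]:
        return "above maximum operating temperature: throttling expected"
    return "at or above critical temperature: shut down"

if __name__ == "__main__":
    for t in (-5, 70, 90, 110, 125):
        print(f"{t:4d} C -> {classify(t)}")
```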

Cooling solutions need to recognize the expectations for both maximum and typical operating temperatures in a systems design. If a device were allowed to operate continuously close to its maximum operating temperature, the operational life of the system could be impacted significantly, and this situation should be avoided. While a 10°C reduction is critical for improved reliability, and further reductions in temperature would improve reliability further, the consequence of this change can significantly increase the amount of airflow needed to attain this goal.

Figure 2.6 illustrates the specific example of the improvements in airflow needed to reduce the processor operating temperature.


Figure 2.6. Thermal resistance and air flow pressure drop. For a color version of this figure, see www.iste.co.uk/brochard/energy.zip

To improve the operating temperature of a 150 W processor by 10°C, the thermal resistance of the heat sink that enables the solution has to be reduced by about 25%. According to Figure 2.6, from the operating point, the airflow through the heat sink has to be increased from 15 CFM to about 28 CFM. This roughly 85% increase in airflow can only be satisfied by higher fan speeds, which are accompanied by an increase in fan, and hence system, power. In the example of the dense system, the reduction of 10°C can result in an increase in fan power of 40 W per server, in effect a 2× increase in the power consumed by the cooling compared to its normally expected value of 35 W per server. Another aspect to be considered is that the almost doubled airflow would reduce the average exhaust temperature rise through the server by about 50%. The lower exhaust temperature then places an added burden on data center cooling efficiency. The net result is that optimization of system cooling needs to be balanced with reliability, the energy consumption of the server and efficient operation of the data center.
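The 25% figure can be reproduced with a simplified case-to-air resistance calculation, R = (T_case − T_air)/P. The 35°C heat sink inlet temperature and 75°C starting case temperature used below are assumptions chosen for illustration.

```python
# Heat sink thermal resistance needed to lower a 150 W processor by 10 C.
# The 35 C inlet air and 75 C starting case temperature are assumptions.

def sink_resistance(t_case_c, t_air_c, power_w):
    """Case-to-air thermal resistance (K/W) at the given operating point."""
    return (t_case_c - t_air_c) / power_w

if __name__ == "__main__":
    p, t_air = 150.0, 35.0
    r_now = sink_resistance(75.0, t_air, p)    # 0.267 K/W
    r_new = sink_resistance(65.0, t_air, p)    # 0.200 K/W
    print(f"Required resistance reduction: {(1 - r_new / r_now) * 100:.0f}%")  # ~25%
```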

2.5. Material used for cooling

Air cooling is traditionally achieved by ensuring the most efficient transport mechanism from the silicon to the external ambient air. Since the heat fluxes involved are high, typically 10–100 W/cm², the surface area of the device itself is not adequate to transport all the heat with a reasonable temperature rise on the device. An enhanced surface area for heat transport to the air is essential. This is accomplished by adding heat sinks to the device.

The heat sink materials typically available that are both economical and of reasonable performance are aluminum, which is the most widely used, and copper, with thermal conductivity values of 180 W/m-K and 380 W/m-K, respectively.

2.6. System layout and cooling air flow optimization

Let us now consider another air-cooled server with a higher density than the 1U SR630 server presented in section 2.3: the Lenovo SD530. The Lenovo SD530 is a 2U chassis with four air-cooled nodes, leading to a ½U density per node, or ½-width nodes. Each node contains two processors from the Intel Xeon processor Scalable family, up to 16 DIMMs, six drive bays, two PCIe slots and two power supplies.

Figure 2.7 shows the 2U chassis, whereas Figure 2.8 shows the node inside.


Figure 2.7. Front view of the 2U Lenovo SD530 server with four ½-width nodes. For a color version of this figure, see www.iste.co.uk/brochard/energy.zip

Comparing the SR630 and SD530 from a processor configuration standpoint, we see a different architecture.


Figure 2.8. Inside view of the ½ width node of a Lenovo SD530. For a color version of this figure, see www.iste.co.uk/brochard/energy.zip

As shown in Figure 2.9, the SR630 has a spread core configuration while the SD530, due to a reduced board area, has a shadow core configuration, which is used when the motherboard is too narrow to allow a spread core layout. As we will see in Chapter 5, a shadow core configuration has an impact on the processor's thermal behavior.


Figure 2.9. Example of a spread core configuration with 12 DIMMs and a shadow core configuration with eight DIMMs

Due to this higher density, although the SD530 supports only eight DIMM slots per CPU versus 12 DIMM slots for the SR630, we can expect some power density challenges. For example, the maximum DC power per chassis of the SR630 and SD530 with a 150 W TDP SKU, full memory and full disk configuration is 515 W and 1,798 W, respectively, leading to a 75% higher power density per U for the SD530 versus the SR630. To support such a high power density configuration with air cooling, some restrictions and innovations are needed. Here, we present some examples for the SD530 (Lenovo n.d.).
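A quick check of the per-U comparison, using the chassis heights (1U for the SR630, 2U for the SD530 chassis) and the maximum DC powers quoted above:

```python
# Per-U power density comparison using the maximum DC powers quoted above.
sr630_w, sr630_u = 515.0, 1      # 1U chassis
sd530_w, sd530_u = 1798.0, 2     # 2U chassis holding four half-width nodes

sr630_density = sr630_w / sr630_u    # 515 W per U
sd530_density = sd530_w / sd530_u    # 899 W per U

print(f"SD530 density is {sd530_density / sr630_density - 1:.0%} higher")  # ~75%
```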

With low TDP SKUs, standard 85 mm heat sinks are used for both CPU1 and CPU2 with 16 DIMMs. For medium TDP SKUs, a 105 mm heat sink is used on CPU1 and, due to its shape, limits the number of DIMMs to 12; a standard 85 mm heat sink is used on CPU2. For high TDP SKUs, a thermal transfer module heat sink is used on CPU1. As shown in Figure 2.10, the thermal transfer module consists of two heat sinks connected together via heat pipes. Due to its shape, this heat sink prevents an adapter (RAID or HBA) from being installed in PCIe slot 1 at the rear of the server and limits the number of DIMMs to 12. CPU2 uses a larger 102 mm heat sink.

This new thermal transfer module (Artman 2018) improves the heat transfer and lowers the average heat sink inlet temperature thanks to the added remote heat displacement area. As shown in Tables 2.2 and 2.3, it results in lower fan speeds, power and acoustic levels compared to standard heat sinks.


Figure 2.10. Thermal transfer module for SD530. For a color version of this figure, see www.iste.co.uk/brochard/energy.zip

Table 2.2. Comparison of SD530 fan speed, power and acoustics with 205 W TDP SKU

Heat sink comparison with 205 W SKU | Fan Power (W) | Fan Speed 60 mm (rpm) | Fan Speed 80 mm (rpm) | Sound Power (bels)
Standard heat sink (205 W TDP) | 300 | 19,500 | 15,500 | 9.6
TTM heat sink (205 W TDP) | 175 | 16,000 | 12,500 | 9.1

Table 2.3. Comparison of SD530 fan speed, power and acoustics with 150 W TDP SKU

Heat sink comparison with 150 W SKU | Fan Power (W) | Fan Speed 60 mm (rpm) | Fan Speed 80 mm (rpm) | Sound Power (bels)
Standard heat sink (150 W TDP) | 135 | 14,500 | 11,000 | 8.8
TTM heat sink (150 W TDP) | 50 | 9,600 | 7,500 | 8.1

Looking at fan speed and fan power, we note that fan power varies as fan speed to the power of ~2.5, which is close to the cube given by the fan law [2.1c]. Similarly, we note that the 60 mm fans spin ~1.28 times faster than the 80 mm fans, close to the diameter ratio of 1.33 given by [2.2b].
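The ~2.5 exponent can be recovered directly from Tables 2.2 and 2.3 by comparing the ratio of fan powers with the ratio of fan speeds. The short check below uses the 60 mm fan speeds from the tables.

```python
import math

# Fan power vs speed exponent, derived from Tables 2.2 and 2.3 (60 mm fan speeds).
cases = [
    # (power_std_w, power_ttm_w, rpm_std, rpm_ttm)
    (300.0, 175.0, 19500.0, 16000.0),   # 205 W TDP SKU
    (135.0, 50.0, 14500.0, 9600.0),     # 150 W TDP SKU
]

for p_std, p_ttm, n_std, n_ttm in cases:
    exponent = math.log(p_std / p_ttm) / math.log(n_std / n_ttm)
    print(f"power ~ speed^{exponent:.1f}")
# Prints exponents of roughly 2.7 and 2.4, i.e. ~2.5 on average,
# close to the cube predicted by fan law [2.1c].
```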

At a typical data center ambient temperature (25±2°C), with the thermal transfer module the server fan power can be reduced by 85 W for the 150 W TDP SKU and the sound power can be reduced to 8.1 bels, which is about a 70% reduction in loudness. The reduced fan power improves the server ITUE, while the reduced server airflow can improve the data center PUE through reduced CRAH fan power.

This last example and the others presented in this chapter highlight the challenge that power density poses for air cooling; Chapter 3 will present various solutions to this problem at the data center level.

  1. Available at: https://en.wikipedia.org/wiki/Affinity_laws [Accessed April 29, 2019].