7
PUE, ERE and TCO of Various Cooling Solutions

7.1. Power usage effectiveness, energy reuse effectiveness and total cost of ownership

7.1.1. Power usage effectiveness and energy reuse effectiveness

When we introduced energy reuse effectiveness (ERE) in [I.3], we used the definition given by Patterson et al. (2010), in which ERE measures the benefit of the waste heat produced by the data center without taking into account any potential benefit to the data center itself. In other words, the energy reuse in [I.3] is the waste heat energy dissipated by the IT equipment and reused outside the data center, as we will see with the National Renewable Energy Laboratory (NREL) Research Support Facility (RSF) in section 7.2.1. With such an approach, ERE is defined by:

[I.3] Image

with:

[7.1]

where ERF is the energy recovery factor:

[7.2]
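
Since the equations themselves are not reproduced above, the following LaTeX sketch gives the forms these definitions usually take, reconstructed from the surrounding prose and Patterson et al. (2010); the exact notation of [I.3], [7.1] and [7.2] in the original may differ:

\begin{align*}
\text{ERE} &= \frac{\text{Total energy} - \text{Energy reused}}{\text{IT equipment energy}} \tag{I.3}\\
\text{ERE} &= (1 - \text{ERF}) \times \text{PUE} \tag{7.1}\\
\text{ERF} &= \frac{\text{Energy reused}}{\text{Total energy}} \tag{7.2}
\end{align*}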

If we want to measure the impact when waste heat is reused by the data center itself, as with adsorption chillers producing cold water (see section 3.7) as done with CoolMUC at the Leibniz Supercomputing Centre (LRZ) (see section 7.2.2), we have to introduce a new definition of ERE, which we will call ERE_DC, given by:

[7.3]

where energy_DCreused is the waste heat energy effectively reused by the data center, so that the numerator, total energy minus energy_DCreused, is the net total energy consumed by the data center.

It should be noted that in such a situation:

[7.4]

where PUE_ERE_DC is the power usage effectiveness (PUE) computed as defined in [I.1], but using the net total energy consumed by the data center, that is, the total energy without reuse minus the energy reused by the data center.
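
Reconstructed in the same way from the prose, [7.3] and [7.4] take the form (a sketch, not the book's exact typography):

\begin{align*}
\text{ERE\_DC} &= \frac{\text{Total energy} - \text{Energy\_DCreused}}{\text{IT equipment energy}} \tag{7.3}\\
\text{ERE\_DC} &= \text{PUE}_{\text{ERE\_DC}} \tag{7.4}
\end{align*}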

In such a case, as we will see with CoolMUC-2, PUE_ERE_DC is lower than 1 (see section 7.2.2.4), whereas Patterson et al. (2010) and Sheppy et al. (2011) considered that this could not happen.

Total cost of ownership (TCO) represents the cost of the system over its lifetime. TCO is the sum of CAPEX and OPEX, where CAPEX includes all capital expenditure to install the system (acquisition and installation) and OPEX includes all operating expenditure to run the system (maintenance, electricity, etc.). In the situation where waste heat is not reused, we have:

[7.5]
[7.6]

where the data center installation cost includes the price to install or upgrade cooling equipment, which carries some weight in the TCO.

[7.7]

where the operational cost includes maintenance costs and the floor space cost per square meter or square foot, which has an impact on TCO when server density is taken into consideration to maximize the capacity of the data center.

[7.8]

where energy cost_noreuse is the energy cost when waste heat is not reused, total energy is the amount of energy consumed by the computer facility over its lifetime and electricity price is the price of 1 kWh.

Substituting the PUE definition from [I.1] into [7.8], this becomes:

[7.9] Image
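
Based on the definitions in the text, the cost relations [7.5]–[7.9] can be sketched as follows (a reconstruction from the surrounding prose, not the book's exact equations):

\begin{align*}
\text{TCO} &= \text{CAPEX} + \text{OPEX} \tag{7.5}\\
\text{CAPEX} &= \text{IT acquisition cost} + \text{data center installation cost} \tag{7.6}\\
\text{OPEX} &= \text{operational cost} + \text{energy cost} \tag{7.7}\\
\text{Energy cost}_{\text{noreuse}} &= \text{Total energy} \times \text{electricity price} \tag{7.8}\\
\text{Energy cost}_{\text{noreuse}} &= \text{PUE} \times \text{IT equipment energy} \times \text{electricity price} \tag{7.9}
\end{align*}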

In the case of hybrid cooling, the overall PUE is computed by:

[7.10]

where PUE_air, PUE_rdhx and PUE_dwc are, respectively, the PUE of systems cooled by air (such as computer room air handling units (CRAH) or in-row coolers), by chilled water (such as rear door heat exchangers (RDHX)) and by chilled or hot water (direct water cooled (DWC) nodes), and where:

[7.11]
[7.12]
[7.13]

As we will see later, these different PUEs have different values and a great impact on TCO.
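
One plausible reconstruction of [7.10]–[7.13], consistent with the description above and with the overall PUE values used later in this chapter, is a heat-fraction weighted average (a sketch; the original equations may differ in notation):

\begin{align*}
\text{PUE} &= f_{\text{air}} \times \text{PUE}_{\text{air}} + f_{\text{rdhx}} \times \text{PUE}_{\text{rdhx}} + f_{\text{dwc}} \times \text{PUE}_{\text{dwc}} \tag{7.10}\\
f_{\text{air}} &= \frac{\text{heat removed by air}}{\text{total heat}}, \quad
f_{\text{rdhx}} = \frac{\text{heat removed by RDHX}}{\text{total heat}}, \quad
f_{\text{dwc}} = \frac{\text{heat removed by DWC}}{\text{total heat}} \tag{7.11--7.13}
\end{align*}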

As TCO has become a critical element in the selection of IT equipment, some organizations have introduced it in procurements to select the right solution. There is therefore a need for a proper way to evaluate TCO before the end of the system's lifetime. To do that, a reference workload has to be defined whose energy consumption is measured over a short period of time at system delivery; this energy is then scaled up to the lifetime of the system and introduced into the OPEX part of the TCO.

The easiest way is to use High Performance Linpack (HPL) (Petitet et al. 2018) or another classic benchmark like SPEC Power (SPEC 2008) as the reference workload. But as we have seen, HPL is not representative of real applications since it consumes more power than real workloads. Therefore, some organizations (Boyer 2017) have introduced their own TCO model using a benchmark to measure the energy of the system (including compute nodes, networks and disks) when running this workload, multiplied by the number of times this benchmark can be executed over the lifetime of the system. In principle, this benchmark should be composed of a suite of workloads that represents as closely as possible what the system will be executing in production, whether it is HPC or AI oriented or a mix of both.

This approach is rather thorough but still does not consider the possible waste heat reuse or the impact of free cooling. This will be discussed in the following sections.

7.1.2. PUE and free cooling

As discussed in section 3.6, free cooling occurs when chillers or mechanical compressors do not need to run because the outside temperature is at least 1°C or 2°C (2–4°F) below the required cold water temperature.

To take this effect into consideration in the PUE calculation, we introduce f_air, f_cold and f_warm as:

[7.14]
[7.15]
[7.16]

And in such case PUE is computed by:

[7.17]

where PUE_free is the PUE of free cooling, which is very close to 1 since free cooling uses only the energy of the pumps. We will see the impact of free cooling on TCO in section 7.3.3.
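
A sketch of how [7.14]–[7.17] can be read from the prose: each f is the fraction of hours in the year during which the corresponding loop (air, cold water, warm water) can run on free cooling, and the yearly PUE is the time-weighted blend of the mechanical-cooling PUE and the free-cooling PUE:

\begin{align*}
f_{\text{air}} &= \frac{\text{free-cooling hours (air)}}{\text{hours per year}}, \quad
f_{\text{cold}} = \frac{\text{free-cooling hours (cold water)}}{\text{hours per year}}, \quad
f_{\text{warm}} = \frac{\text{free-cooling hours (warm water)}}{\text{hours per year}} \tag{7.14--7.16}\\
\text{PUE} &= f \times \text{PUE}_{\text{free}} + (1 - f) \times \text{PUE}_{\text{mechanical}} \tag{7.17}
\end{align*}

With the individual PUEs of Table 7.3 and a 20% free-cooling ratio, this blend gives, for example, 0.8 × 1.6 + 0.2 × 1.06 ≈ 1.49 for the Air design, which is the overall PUE used in section 7.3.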

7.1.3. ERE and waste heat reuse

When waste heat is reused, we have to introduce a variant of [7.8] in which we deduct the energy produced from the waste heat and delivered back to the data center:

[7.18]

where energy cost_reuse is the energy cost when waste heat is reused by the data center and energy_DCreused is the amount of reused energy delivered back to the data center. A typical example of waste heat reused by the data center is the adsorption chillers described in section 3.7.2, which produce cold water from the waste heat captured by warm water cooling. Such an example will be presented in section 7.2.2.4.

Substituting in [7.18] “Total energy − energy_DCreused” with “ERE_DC × IT equipment energy” from [7.3], and “IT equipment energy” with “Total energy/PUE” from [I.1], and using the definition of “Energy cost_noreuse” [7.8], we have:

[7.19]

where PUE is the system PUE with no heat reuse.
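
Following the substitutions described above, [7.18] and [7.19] can be sketched as (a reconstruction from the prose):

\begin{align*}
\text{Energy cost}_{\text{reuse}} &= (\text{Total energy} - \text{Energy\_DCreused}) \times \text{electricity price} \tag{7.18}\\
\text{Energy cost}_{\text{reuse}} &= \frac{\text{ERE\_DC}}{\text{PUE}} \times \text{Energy cost}_{\text{noreuse}} \tag{7.19}
\end{align*}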

Therefore, when waste heat is not reused, according to [7.9] a low PUE is important to reduce the energy cost, while when waste heat is reused, according to [7.19] a low ERE_DC/PUE ratio is critical. In the latter case, it is important to note that a low PUE is not as critical as it is when waste heat is not reused, as long as ERE_DC is low. For example, in a data center where waste heat is reused, it is more important to get the lowest ERE_DC value than to get a lower PUE value.

We will now present a few examples of data centers and their PUE, ERE and ERE_DC.

7.2. Examples of data center PUEs and EREs

We present two data center examples: one air-cooled, built by NREL in Golden, CO, and one water-cooled, built by LRZ in Garching, near Munich, Germany.

7.2.1. NREL Research Support Facility, CO

7.2.1.1. RSF data center

From breakthroughs in fundamental science to new clean technologies to integrated energy systems that power people's lives, the mission of the NREL is to work on technologies that will transform the way the United States and the world use energy. In June 2010, the NREL completed construction of the new 22,000 m2/220,000 ft2 RSF, which included a 190 m2/1,900 ft2 data center. The RSF was then expanded to 36,000 m2/360,000 ft2 with the RSF expansion wing in December 2011. The original “legacy” data center had an annual energy consumption as high as 2,394,000 kWh, which would have exceeded the total building energy goal. As part of meeting the building energy goal, the RSF data center annual energy use had to be approximately 50% less than the legacy data center's annual energy use.

7.2.1.2. RSF PUE and ERE

Sheppy et al. (2011) present a comparison of a legacy data center and the RSF, both managed by NREL, where the electricity price was $0.057/kWh. The RSF was designed to reduce the energy cost of an air-cooled data center. Sheppy et al. (2011) present all the techniques they used to build a best-of-breed air-cooled data center, including neat rack cabling, hot aisle/cold aisle rack arrangements with hot aisle containment and maximum use of free air cooling, since, thanks to the climate in Golden, outside air can provide much of the data center cooling needs.

The figures in this section are reprinted with permission of the NREL from Sheppy et al. (2011).

Figure 7.1 presents RSF data center power consumption and its different components for the first 11 months of production from October 2010 to August 2011. Figure 7.2 presents RSF hourly PUE for the same period.

Image

Figure 7.1. RSF load profile for the first 11 months of operations. For a color version of this figure, see www.iste.co.uk/brochard/energy.zip

Image

Figure 7.2. RSF hourly PUE over the first 11 months. For a color version of this figure, see www.iste.co.uk/brochard/energy.zip

With such a limited use of cooling energy, an RSF annual average PUE of 1.16 was measured for the first 11 months of operations versus a PUE of 2.28 for the legacy data center. Free air cooling was key to achieving such a low cooling energy and PUE, as we can see in Figures 7.1 and 7.2, where low cooling loads correspond to the lowest outside temperature periods and to the lowest PUE of 1.10. We will come back to this aspect in section 7.3.3 on TCO.

Waste heat was also reused in other parts of the building outside the data center during office hours and days and under specific temperature conditions. As the RSF did not have the appropriate metering equipment to measure ERE, Sheppy et al. (2011) approximated the average ERE to be 0.91 through a detailed energy balance analysis. Figure 7.3 shows the calculated ERE for the RSF data center for the 11-month period, where ERE is shown as a function of outdoor air temperature (TOA).

Image

Figure 7.3. RSF ERE as a function of outdoor temperature. For a color version of this figure, see www.iste.co.uk/brochard/energy.zip

7.2.2. Leibniz Supercomputing data center in Germany

7.2.2.1. LRZ mission

LRZ is located in Garching, near Munich, in Germany. It is the IT service provider for the Munich universities and colleges as well as a growing number of research organizations in Munich and throughout Bavaria. In addition, LRZ is a National Supercomputing Centre for Germany (along with Juelich and Stuttgart) and a European Supercomputing Centre (PRACE n.d.).

As LRZ paid €0.15/kWh for electricity in 2012, €0.178/kWh in 2014 and €0.185/kWh in 2018, electricity cost was a major driver of LRZ's energy-efficiency strategy.

7.2.2.2. LRZ data center

The LRZ data center has 3,160 m2 of floor space for IT equipment, 6,393 m2 of floor space for infrastructure, 2 × 10 MW of power supply provided by renewable energy and an average power consumption of 5.5 MW.

The LRZ data center is composed of two cubes housing the two supercomputers we will describe below.

Figure 7.4 presents LRZ data center sections with cooling towers on the roof, supercomputers on the third floor, Linux clusters and general-purpose servers on the second floor, storage and archive/backup on the first floor, cooling and water processing on ground level and power and UPS below.

Image

Figure 7.4. Sectional view of LRZ data center. For a color version of this figure, see www.iste.co.uk/brochard/energy.zip

For cooling, LRZ has cold water and hot water distribution circuits. Figure 7.5 presents an overview of the cooling infrastructure for the different floors and systems.

Image

Figure 7.5. LRZ cooling infrastructure overview. For a color version of this figure, see www.iste.co.uk/brochard/energy.zip

The hot water loop is exclusively used for the compute nodes of the supercomputers on the third floor and of the CoolMUC systems on the second floor, and the cold water loop for the remaining elements.

7.2.2.3. LRZ supercomputing systems

LRZ supercomputers from 2012 to 2018 are named SuperMUC.

SuperMUC Phase 1 was installed in 2012. It is composed of 9,216 IBM iDataPlex dx360M4 nodes with two Intel Xeon E5-2680 (Sandy Bridge architecture) sockets and 32 GB of memory per compute node. It has a peak performance of 3.2 PFlops and was ranked number 4 in the June 2012 Top500 (Top500 Lists n.d.) with 2.987 PFlops. The IBM iDataPlex was a DWC node design with an inlet water temperature ranging from 20°C to 45°C and a heat-to-water ratio of about 85% depending on the ambient air temperature.

SuperMUC Phase 2 was installed in 2015. It is composed of 3,072 Lenovo NeXtScale nx360M5 nodes with two Intel Xeon E5-2697v3 (Haswell architecture) sockets and 64 GB of memory per compute node. It has a peak performance of 3.6 PFlops and was ranked number 20 in the June 2015 Top500 (Top500 Lists n.d.) with 2.8 PFlops. The Lenovo NeXtScale is a DWC node design with an inlet water temperature ranging from 20°C to 45°C and a heat-to-water ratio of about 75% depending on the ambient air temperature.

From 2012 up to the present, the LRZ buildings have been heated by reusing waste heat from SuperMUC.

In 2015, LRZ installed another, smaller hot-water cooled system, named CoolMUC-2, to further explore waste heat reuse. It is composed of 384 Lenovo NeXtScale nx360M5 nodes with two Intel Xeon E5-2697v3 (Haswell architecture) sockets. It has a peak performance of 466 TFlops and was ranked number 252 in the November 2015 Top500 (Top500 Lists n.d.) with 366 TFlops. To help LRZ conduct its waste heat reuse project, the Lenovo NeXtScale supported inlet water temperatures up to 50°C. In 2017, 70 similar compute nodes were added to CoolMUC-2, leading to a configuration with 454 Lenovo NeXtScale nx360M5 nodes with two Intel Xeon E5-2697v3 sockets each.

In 2018, LRZ is installing a new supercomputer named SuperMUC-NG (“Next Generation”), which will replace SuperMUC. It is composed of 6,480 Lenovo ThinkSystem SD650 nodes with two Intel Xeon 8174 (Skylake architecture) sockets and 96 GB of memory per thin compute node (768 GB for the fat compute nodes).

The Skylake 8174 is an off-roadmap SKU, which is a derivative of the 8168 (see Table 7.1).

Table 7.1. Xeon 8168 and 8174 TDP and frequencies

SKU | Cores | LLC (MB) | TDP (W) | Base non-AVX core frequency (GHz) | Max. non-AVX core turbo frequency (GHz), all cores active | Base AVX2 core frequency (GHz) | Max. AVX2 core turbo frequency (GHz), all cores active | Base AVX-512 core frequency (GHz) | Max. AVX-512 core turbo frequency (GHz), all cores active
8168 | 24 | 33.0 | 205 | 2.7 | 3.4 | 2.3 | 3.0 | 1.9 | 2.5
8174 | 24 | 33.0 | 249 | 3.1 | 3.8 | 2.7 | 3.2 | 2.3 | 2.8

SuperMUC-NG has a peak performance of 22.4 PFlops and is ranked number 8 in November 2018 Top500 (Top500 Lists n.d.) with 19.5 PFlops. HPL power consumption of SuperMUC-NG is 4.0 MW when the 8174 is run with Turbo ON. As LRZ is focusing on energy-efficient computing, the 8174 will run in production at lower frequencies controlled by EAR (see section 6.3.2).

The Lenovo SD650 is the latest generation of DWC node, with an inlet water temperature ranging from 20°C to 50°C and a heat-to-water ratio of about 85% depending on the ambient air temperature. SD650 cooling has been addressed in section 3.5. As discussed in sections 3.4 and 3.5, such performance and power consumption per socket is only possible in a dense server with DWC.

7.2.2.4. LRZ PUE and ERE

Figure 7.6 presents the measured PUE of SuperMUC during the year 2015.

Image

Figure 7.6. SuperMUC PUE for 2015

On this graph, the PUE peaks are due to system maintenance periods, which lead to a much lower IT power consumption while the cooling energy stays about constant. On average for the year, the PUE has been around 1.16. It should be noted that, due to the efficiency of the dynamic UPS systems alone, the PUE cannot be better than 1.10.

Figures 7.7–7.9 present CoolMUC-2 power consumption, its heat transfer to the adsorption chiller and the cold water produced, based on a hot water outlet temperature of 50°C, during one week in 2016.

Image

Figure 7.7. CoolMUC-2 power consumption in 2016

Image

Figure 7.8. CoolMUC-2 heat transfer to the adsorption chiller in 2016

Image

Figure 7.9. CoolMUC-2 cold water generated by the adsorption chiller in 2016

On average, CoolMUC-2 power consumption was 121.5 kW, with a maximum of 143.5 kW and a minimum of 106.9 kW. This is what we would expect from a 384-node system based on two-socket 2697v3 nodes (145 W TDP), 64 GB of memory and one Mellanox FDR adapter, knowing that its expected HPL power consumption for a water-cooled configuration is 147 kW. On average, the heat output into the hot water loop was 90 kW (heat-to-hot-water ratio = 74%) and the average cold water generated by the adsorption chillers was 47.7 kW. Therefore, during this period, COP = 0.53 [3.2], ERE = 0.36 [I.3] and ERE_DC = PUE_ERE_DC = 0.71 [7.4].
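
A consistency check of these figures can be done with a few lines of Python (a sketch: the assumed no-reuse PUE of about 1.10 is our assumption, chosen to reproduce the published values; it is not stated explicitly for CoolMUC-2):

it_power_kw = 121.5     # average CoolMUC-2 IT power in 2016
hot_water_kw = 90.0     # average heat output into the hot water loop
cold_water_kw = 47.7    # average cold water generated by the adsorption chillers
assumed_pue = 1.10      # assumption: CoolMUC-2 PUE without any heat reuse

cop = cold_water_kw / hot_water_kw                   # [3.2] -> ~0.53
ere = assumed_pue - hot_water_kw / it_power_kw       # ERE = PUE - reused/IT -> ~0.36
ere_dc = assumed_pue - cold_water_kw / it_power_kw   # ERE_DC = PUE - DCreused/IT -> ~0.71
print(f"COP = {cop:.2f}, ERE = {ere:.2f}, ERE_DC = PUE_ERE_DC = {ere_dc:.2f}")

The same arithmetic applied to the summer 2017 averages (142 kW, 92 kW and 54 kW) with a PUE of about 1.16 reproduces, within rounding, the published COP = 0.58, ERE = 0.51 and ERE_DC = 0.77.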

Figure 7.10 presents the power consumption of CoolMUC-2, the heat transfer of the hot water loop (HT) to the adsorption chiller, the cold water produced (LT) and the electricity saved by the adsorption chillers during summer 2017, based on a hot water outlet temperature of 50°C.

On average, CoolMUC-2 power consumption during summer 2017 was 142 kW, the average heat output into the hot water loop was 92 kW (heat-to-hot-water transfer = 65%) and the average cold water generated by the adsorption chillers was 54 kW. The operation of the adsorption chillers saved on average 18 kW of electricity, computed as the electricity LRZ would have spent to generate the chilled water with the compression chillers minus the electricity spent on the adsorption chillers, their pumps and the cooling tower, which amounts to around 6 kW. Therefore, during this period, COP = 0.58 [3.2], ERE = 0.51 [I.3] and ERE_DC = PUE_ERE_DC = 0.77 [7.4].

Image

Figure 7.10. CoolMUC-2 operations 05/2017–09/2017. For a color version of this figure, see www.iste.co.uk/brochard/energy.zip

A detailed analysis of the impact of adsorption chillers on ERE and TCO is discussed in section 7.4.1.

7.3. Impact of cooling on TCO with no waste heat reuse

We saw earlier that electricity price has a linear impact on the energy cost. In this section, we will analyze how cooling affects TCO by comparing the TCO of different cooling solutions with no waste heat reuse. We will use a similar methodology to the one used by Demetriou et al. (2016).

As all the solutions will be based on the same IT equipment, CAPEX will include only the acquisition cost of the cooling infrastructure (CRAH, RDHX, chillers, etc.) and the cooling cost of the servers (fans, heat sinks, cold plates, etc.). Therefore, TCO will be very different depending on whether the data center is already equipped with cooling devices or not. A Brownfield data center is an existing data center already equipped with a cooling infrastructure, designed for air cooling since it is the de facto standard today. A Greenfield data center is a new data center being built where no cooling infrastructure has been installed. In the latter situation, the cost of all cooling devices will be factored in, while in a Brownfield data center air cooling will have an obvious advantage.

For a given configuration and cooling design, the total energy is computed as the IT equipment energy plus the respective cooling energy. The cooling energy takes into account the PUE of the different cooling technologies. Based on the different ratios f_air, f_rdhx and f_dwc [7.11–7.13] and the number of hours in a year during which an air-cooled system can use free cooling [7.14–7.16], an overall PUE is calculated [7.17] and the energy cost is calculated according to [7.9] with the appropriate electricity price.
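
This calculation can be sketched with a small Python model (the function and variable names are ours and the inputs are illustrative; a simplified model, not the exact spreadsheet behind the tables in this section):

HOURS_PER_YEAR = 8760

def overall_pue(heat_fractions, individual_pues, f_free, pue_free=1.06):
    """Heat-fraction weighted PUE, blended with the free-cooling hours ratio."""
    pue_mech = sum(f * p for f, p in zip(heat_fractions, individual_pues))
    return f_free * pue_free + (1.0 - f_free) * pue_mech

def energy_cost(it_power_kw, pue, price_per_kwh, years=5):
    """Energy cost over `years`, following [7.9]."""
    return pue * it_power_kw * HOURS_PER_YEAR * years * price_per_kwh

# Example: the Air design of this section (100% of heat to CRAH with PUE 1.6),
# 20% free cooling, 768 kW of IT power and $0.15/kWh.
pue_air = overall_pue([1.0], [1.6], f_free=0.20)                    # ~1.49
print(round(pue_air, 2), round(energy_cost(768, pue_air, 0.15)))    # ~1.49, ~$7.53M

This lands within a fraction of a percent of the 5-year Air energy cost reported later in Table 7.4.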

The financial analysis uses standard discounted cash flow methods (Park 2012) to compute the acquisition costs. An incremental analysis is used to compare each technology to a traditional air-cooled design. We compute the acquisition cost of each component using Table 7.2 and its annual cost A_n using the following capital recovery formula:

[7.20]

where P_0 is the present cost at installation time, n is the number of years (1–5) and i is the discount rate per year, which is set to 5%. The electricity price is kept constant over the 5 years.
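
Equation [7.20] is not reproduced above; the standard capital recovery (annuity) formula, which matches the description of P_0, n and i, is sketched below. This is the usual textbook form and may differ in detail from the book's own equation:

def annual_cost(p0, i=0.05, n=5):
    """Capital recovery: spread a present cost P0 over n years at discount rate i.
    A_n = P0 * i * (1 + i)**n / ((1 + i)**n - 1)."""
    return p0 * i * (1 + i) ** n / ((1 + i) ** n - 1)

# Illustrative use: a $241/kW RDHX (Table 7.2) sized for a hypothetical 24 kW rack
print(round(annual_cost(241 * 24), 2))   # ~1335.96 dollars per year for 5 years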

Table 7.2. Acquisition Costs of the different cooling components

Cooling Infrastructure Costs (excludes installation, service, plumbing, external pumps, etc.)
Chiller (0.5 kW/Ton) $120 per kW
Tower, Evap cooler $36 per kW
Dry-cooler $120 per kW
In-Row Cooler CW $600 per kW
RDHX $241 per kW
CDU $100 per kW
CRAH $270 per kW
CRAC $486 per kW

Node Cooling Costs (fan, heat sink, cold plate, manifold, etc.)
Air $54 per node
DWC $294 per node
Hybrid $204 per node
RDHX $54 per node

It should be noted that our cooling infrastructure costs are realistic but exclude installation, service, plumbing and external pumps, which have a financial impact but are too data center dependent to be taken into account here. That is why a specific TCO analysis will have to be done before deciding which technology to use for a given data center.

The IT equipment we consider is composed of 32 racks with 72 nodes per rack. Each node has two central processing unit (CPU) sockets, 12 DIMMs, one HDD and one high-performance network adapter, very similar to the nodes we used in Chapter 5 to measure the power and performance of workloads. In our analysis, the base SKU TDP will be 150 W, but some variations will be calculated with 200 W and 250 W TDP SKUs. With a 150 W TDP SKU, the DC node power is assumed to be 335 W for an air-cooled node, which leads to 768 kW for the 32 racks. This node power consumption, as we have seen in section 5.1.6, is representative of the power consumption of real workloads on a two-socket node with a 150 W TDP SKU. We assume, as did Demetriou et al. (2016), that the cooling technology will only affect the power consumption of the servers and not their performance, which is a choice the data center can make as described in Chapter 4.

We compare four different data center cooling designs based on three different cooling technologies of the IT equipment.

The three cooling technologies for the IT equipment are air cooling (Air), direct water cooling (DWC) and hybrid cooling (Hybrid).

In an air-cooled node, 100% of the heat load is removed to air using the node's internal fans. The maximum inlet air temperature is 25°C since at this temperature the processor has the same performance as in the DWC node. The water-cooled (DWC) node is a server similar to the one described in section 3.5.5, where the most power-consuming components of the node (CPU, memory, network device and voltage regulators) are directly cooled by water through conductive plates and heat pipes. With the latest generation of DWC servers, about 90% of the node's heat is removed directly to water and 10% is rejected to the data center air by the power supply fans. The latest generation of DWC servers can support inlet water up to 50°C and inlet air up to 25°C. In a hybrid-cooled node, as described in section 3.7.3, water removes only the CPU heat with a conductive plate. With hybrid cooling, about 60% of the heat goes to water and 40% to the data center air. We assume the thermal characteristics of DWC servers result in about a 10% reduction of the overall node power consumption compared to an equivalent air-cooled node, with 7% due to the removal of internal fans and 3% due to the node power reduction resulting from decreased processor leakage power. Under the same assumptions, the power consumption of the hybrid node is reduced by 6% due to fan power and leakage power reductions.

All the above parameters are used for illustrative purposes since they will vary depending on the workloads, the specific processor, the node and data center characteristics.

The four data center cooling designs are air cooled (Air), DWC, hybrid cooled (Hybrid) and an extension of the air-cooled design called RDHX.

In an air-cooled data center (Air), CRAHs are supplied with 10°C chilled water from a water-cooled chiller. A plate and frame heat exchanger allows for free cooling when the cooling tower supply water is colder than the data center return water. The Hybrid cooling data center uses a combination of low-density rear door heat exchangers (RDHX), capable of removing 15 kW, for the heat going to air and direct cold plate cooling for the heat going to water. The RDHX are supplied via cooling distribution units (CDUs). The CDUs are fed via a compression chiller with a 10°C supply. The cold plates remove the majority of the load and are fed using the return water from the RDHX. In a DWC data center, evaporative coolers provide chilled liquid to the CDUs, which remove 90% of the IT equipment heat. Water-cooled direct expansion (DX) in-row cooling units cool the 10% of IT equipment heat rejected to the data center. The fourth scenario (RDHX) is similar to the air-cooled design except that RDHX are introduced on all racks to extract the heat in place of less efficient CRAH units. These RDHX are fed by CDUs, which receive 10°C water from compression chillers. Water economizer operation is provided using plate and frame heat exchangers.

In our analysis of the different data center cooling designs, the individual PUEs extracted from Table 7.3 will be used. We take these values as examples, and other data centers will have higher or lower individual PUEs, which will impact the overall PUE of the cooling design.

Table 7.3. Individual PUEs of the different cooling solutions

Air RDHX DWC free cooling
individual PUEs 1.6 1.3 1.1 1.06

The set of figures in the following sections present the percentage of incremental cumulative discounted cash flow over 5 years for each of the cooling solutions (DWC, Hybrid and RDHX) with respect to air cooling. A negative value means air cooling is a better financial alternative. A positive value means DWC, Hybrid or RDHX design is a better choice. When the DWC, Hybrid or RDHX curve crosses the 0% value, it shows the number of years needed for the solution to pay back versus an air-cooled solution.

Figure 7.11 presents the impact of electricity on project payback with three different electricity prices ($0.10, $0.15 and $0.20 per kWh), 20% of free air-cooling ratio and a 150 W processor TDP. Figure 7.12 presents the impact of processor TDP with an electricity price set to $0.15/kWh and the same free air-cooling ratio of 20%. Figure 7.13 presents the impact of three free air-cooling ratios (10%, 20% and 30%) on the project payback with an electricity price of $0.15/kWh and a 150 W processor TDP.

7.3.1. Impact of electricity price on TCO

Tables 7.4 and 7.5 present the overall PUE [7.10] of each cooling solution, the cooling infrastructure cost, the servers cooling cost, the total installation cost and the energy cost after 5 years.

In a Greenfield data center (see Table 7.4), for RDHX the installation cost is about the same as Air and the 5-year energy costs are much higher than the total installation cost at every electricity price. For Hybrid and DWC, the cooling infrastructure cost is lower than Air and RDHX, while the server cooling costs are much higher. The 5-year energy costs are much higher than the total installation costs only for high electricity prices ($0.15/kWh and above).

Table 7.4. Greenfield costs and PUE of the different designs and electricity price

Greenfield Air RDHX Hybrid DWC
overall PUE 1.49 1.25 1.16 1.12
Cooling infra cost $326,653 $381,497 $238,714 $171,161
Servers cooling cost $124,416 $124,416 $470,016 $677,376
Total Installation cost $451,069 $505,913 $708,730 $848,537
5 year Energy cost at $0.20 $10,035,362 $8,421,095 $7,339,213 $6,750,866
5 year Energy cost at $0.15 $7,526,521 $6,315,821 $5,504,409 $5,063,150
5 year Energy cost at $0.10 $5,017,681 $4,210,547 $3,669,606 $3,375,433
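
As a quick consistency check of Table 7.4 (a sketch under our assumptions of 8,760 hours per year and the free-cooling blend of [7.17]; the DWC and Hybrid rows additionally fold in the ~10% and ~6% node power reductions described above):

hours, years, price = 8760, 5, 0.15
f_free, pue_free = 0.20, 1.06

for design, pue_mech, it_kw in (("Air", 1.6, 768), ("RDHX", 1.3, 768)):
    pue = f_free * pue_free + (1 - f_free) * pue_mech
    cost = pue * it_kw * hours * years * price
    print(design, round(pue, 2), round(cost))   # Air 1.49 ~7.53M, RDHX 1.25 ~6.32M

Both values land within a fraction of a percent of the Table 7.4 figures at $0.15/kWh.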

In a Brownfield data center (see Table 7.5), the cooling infrastructure cost for Air is zero and therefore much less than for RDHX, Hybrid or DWC. Server cooling costs are identical to Greenfield. The energy costs are the same for Brownfield and Greenfield data centers, which are proportional to their respective overall PUEs.

Table 7.5. Brownfield costs and PUE of the different designs and electricity price

Brownfield Air RDHX Hybrid DWC
overall PUE 1.49 1.25 1.16 1.12
Cooling infra cost $0 $261,937 $193,759 $171,161
Servers cooling cost $124,416 $124,416 $470,016 $677,376
Total Installation cost $124,416 $386,353 $663,775 $848,537
5 year Energy cost at $0.20 $10,035,362 $8,421,095 $7,339,213 $6,750,866
5 year Energy cost at $0.15 $7,526,521 $6,315,821 $5,504,409 $5,063,150
5 year Energy cost at $0.10 $5,017,681 $4,210,547 $3,669,606 $3,375,433

Figure 7.11 shows that for a Greenfield data center, all three cooling solutions (RDHX, Hybrid and DWC) present a positive payback compared to Air after only 1 year regardless of the electricity price, except for DWC at $0.10/kWh. When electricity price is $0.15/kWh, RDHX has a 10% payback ratio after 1 year while DWC and Hybrid have a better payback than RDHX after 2 years. At $0.20/kWh, DWC provides a 20% payback ratio after 5 years. For a Brownfield data center, due to the low installation cost of Air, when electricity price is $0.10/kWh, RDHX has a positive payback after 3 years, Hybrid after 5 years and DWC has no financial benefit even after 5 years. At $0.20/kWh, RDHX has a positive payback after 1 year and a payback ratio of 9% after 5 years, while Hybrid and DWC have a positive payback after 2 years and a payback ratio of 13% and 15% after 5 years.

7.3.2. Impact of node power on TCO

Another important factor for TCO is the power consumed by the IT equipment and what will happen as servers use higher TDP SKU.

Figure 7.12 presents the same analysis as Figure 7.11 with a constant electricity price of $0.15/kWh and three different node powers (333, 413 and 493 W), representing the typical power consumption of a workload running on the two-socket node described earlier with a processor TDP of 150, 200 and 250 W.

Figure 7.12 shows that higher power consumption improves the attractiveness of DWC and Hybrid for both Greenfield and Brownfield data centers. For Greenfield data centers, DWC and Hybrid have positive payback ratios after 1 year, which increase up to 20% after 5 years for SKU TDPs of 200 W and higher. For Brownfield data centers, with a 200 W TDP SKU, all three cooling solutions have a positive payback after 2 years, while DWC and Hybrid have payback ratios of 10% after 5 years. With a 250 W SKU TDP, DWC and Hybrid have payback ratios of 13% and 15% after 5 years. We note that the SKU TDP has nearly no influence on the RDHX payback versus Air.

Image

Figure 7.11. Impact of electricity on project payback. For a color version of this figure, see www.iste.co.uk/brochard/energy.zip

Image

Figure 7.12. Impact of SKU TDP on project payback at $ 0.15/kWh electricity price. For a color version of this figure, see www.iste.co.uk/brochard/energy.zip

Table 7.6 shows that, for a Greenfield data center, the cooling infrastructure costs increase with the SKU TDP as more cooling equipment has to be installed. It also shows that the energy cost increases with the SKU TDP, to the point where the energy cost overwhelms the installation cost.

Table 7.6. Greenfield costs and PUE of the different designs and SKU TDP

Greenfield Air RDHX Hybrid DWC
overall PUE 1.49 1.25 1.16 1.12
Cooling infra cost at 250W $483,483 $564,659 $353,324 $253,337
Servers cooling cost $124,416 $124,416 $212,312 $677,376
Total Installation cost at 250W $607,899 $689,075 $565,636 $930,713
5 year Energy cost at 250W $11,140,093 $9,348,121 $8,147,141 $7,494,027
5 year Energy cost at 200W $9,333,307 $7,831,971 $6,825,775 $6,278,588
5 year Energy cost at 150W $7,526,521 $6,315,821 $5,504,409 $5,063,150

7.3.3. Impact of free cooling on TCO

Figure 7.13 presents the same analysis as Figure 7.11 with a constant electricity price of $0.15/kWh, a 333 W node power (corresponding to the typical power of an air-cooled node with a 150 W SKU TDP) and three different free air-cooling ratios of 10%, 20% and 30%, which represent the ratio of the number of free cooling hours per year to the total number of hours per year (8,760 h).

As shown in Table 7.7 and Figure 7.13, the impact of free air cooling depends greatly on the data center cooling design. It has a big impact on the Air design, a bit less on the RDHX design, much less on the Hybrid design and a close to marginal one on the DWC design. This is due to the fact that with the Air and RDHX cooling designs, 100% of the heat is extracted to air with a high PUE (respectively 1.6 and 1.3), which is higher than the free-cooling PUE (1.06). On the contrary, the DWC design has only 10% of its heat extracted to air, while 90% of the heat is extracted with a PUE of 1.1, comparable to the free-cooling PUE. The Hybrid design is in between, with 60% of the heat extracted by water and 40% by RDHX.

Image

Figure 7.13. Impact of free cooling ratio on project payback at $0.15 electricity price. For a color version of this figure, see www.iste.co.uk/brochard/energy.zip

Table 7.7. Greenfield costs and PUEs of different designs and free air-cooling ratios

Greenfield Air RDHX Hybrid DWC
overall PUE at 10% 1.55 1.28 1.17 1.12
overall PUE at 20% 1.49 1.25 1.16 1.12
overall PUE at 30% 1.44 1.23 1.15 1.11
Cooling infra cost $326,653 $381,497 $238,714 $171,161
Servers cooling cost $124,416 $124,416 $470,016 $677,376
Total Installation cost $451,069 $505,913 $708,730 $848,537
5 year Energy cost at 10% $7,798,929 $6,436,891 $5,549,932 $5,074,046
5 year Energy cost at 20% $7,526,521 $6,315,821 $5,504,409 $5,063,150
5 year Energy cost at 30% $7,254,114 $6,194,751 $5,458,887 $5,052,253

This explains why air-cooled data centers can be very efficient in terms of overall PUE and TCO when they are located in geographies with a large number of free air-cooling hours per year and when the air PUE is low thanks to a careful design of the data center. That is the case with the data centers of cloud providers such as Microsoft, which can achieve an overall average PUE of ~1.25 with air-cooled data centers (Data Center Knowledge 2010), and with the Google data centers, which can achieve an overall average PUE of ~1.15 (Google n.d.).

In geographies where the outside environmental conditions do not provide a large number of free air-cooling hours, cooling designs other than Air may have to be used, like RDHX or the “Hot Hut” (Kava 2012), which provide a much better individual PUE than CRAH. A complementary approach is to increase the room temperature up to 27–35°C to maximize the use of free air cooling.

Under such circumstances, the DWC and Hybrid designs show little benefit even though the payback can be slightly positive with a Greenfield data center and more positive as the electricity price is increasing. This conclusion will change when waste heat reuse is taken into account as we will see in the following section.

7.4. Emerging technologies and their impact on TCO

We present now the impact of emerging technologies on TCO. The two technologies we discuss are waste heat reuse and electricity generation. They both address a different aspect of the power cycle and therefore can be complementary.

7.4.1. Waste heat reuse

We discussed in section 3.7 the different ways to reuse waste heat. We will focus here on the PUE and TCO impact of reusing waste heat to produce cold water with adsorption chillers as described in section 3.7.2 and with the CoolMUC-2 use case presented in section 7.2.2.3.

To evaluate the impact of such a technology on TCO, we reuse the same framework as we did in section 7.3 and instead of comparing DWC, Hybrid and RDHX designs to the Air design, we compare DWC, DWC with adsorption chillers (which we will call DWC + ERE) and RDHX designs to the Air design.

7.4.1.1. Ideal scenario of hot water reuse

Figure 7.14 presents the relative payback ratios for the DWC + ERE scenario with a 150 W SKU TDP, a free cooling ratio of 20% and various electricity prices ($0.10/kWh, $0.15/kWh and $0.20/kWh), as in Figure 7.11, but with a COP of 50%. We call it the “ideal scenario” for reasons we will explain later.

Energy cost with reused energy is calculated according to [7.18] and the energy reused by the data center is calculated by:

[7.21]

where COP, as defined by [3.2], is set to 50% and Energy_hotwater is calculated as 90% of the IT power of a DWC node (333 W × 0.90 ≈ 300 W) times 72 × 32 nodes, leading to 622 kW of Energy_hotwater and 311 kW of reused Energy_DCreused.
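
A minimal sketch of this arithmetic (the variable names are ours; the 10% DWC node power reduction and the 90% heat-to-water ratio are the assumptions stated earlier in this chapter):

nodes = 72 * 32                    # 2,304 nodes
dwc_node_power_w = 333 * 0.90      # ~300 W per DWC node (10% below the air-cooled node)
heat_to_water = 0.90               # fraction of DWC node heat captured by the hot water loop
cop = 0.50                         # adsorption chiller COP [3.2]

energy_hotwater_kw = nodes * dwc_node_power_w * heat_to_water / 1000.0   # ~622 kW (621.5 kW here)
energy_dc_reused_kw = cop * energy_hotwater_kw                           # ~311 kW
print(round(energy_hotwater_kw), round(energy_dc_reused_kw))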

Table 7.8 presents the costs and overall PUEs of the different cooling designs. It should be noted that for the DWC + ERE design, the PUE shown is PUE_ERE_DC = 0.62, which is equal to ERE_DC.

Image

Figure 7.14. Impact of hot water reuse on project payback and different electricity price. For a color version of this figure, see www.iste.co.uk/brochard/energy.zip

Table 7.8. Greenfield costs and PUE of the different designs

Greenfield Air RDHX DWC+ERE DWC
overall PUE 1.49 1.25 ERE_DC = 0.62 1.12
Cooling infra cost $326,653 $381,497 $706,469 $171,161
Servers cooling cost $124,416 $124,416 $677,376 $677,376
Total Installation cost $451,069 $505,913 $1,383,845 $848,537
5 year Energy cost at $0.20 $10,035,362 $8,421,095 $3,754,382 $6,750,866
5 year Energy cost at $0.15 $7,526,521 $6,315,821 $2,815,787 $5,063,150
5 year Energy cost at $0.10 $5,017,681 $4,210,547 $1,877,191 $3,375,433

Figure 7.14 shows that, although the DWC + ERE installation costs are high, the payback is significant. For a Greenfield data center, even at a low electricity price of $0.10/kWh, payback happens after 3 years but is barely better than RDHX after 5 years. When the electricity price increases, the payback is large and fast and the DWC + ERE design has the best payback of all designs. For a Brownfield data center, at a low electricity price a slightly positive payback is reached after 5 years. With an increased electricity price, the DWC + ERE payback is large and fast, with a 13% payback ratio after 2 years and a 31% payback ratio after 5 years.

As said earlier, we should emphasize that the installation costs do not take into account installation service and plumbing costs, which can be significant and could shift the payback point our analysis shows by 1 or 2 years.

Finally, it should be noted that with this DWC + ERE scenario we assume that Energy_DCreused = Energy_produced meaning all the energy produced is exactly reused by the data center. This energy is used either for cooling the building or for cooling the 10% of heat going to air from the DWC servers since there is no air-cooled equipment such as air-cooled servers, storage or networks in this configuration.

This is not the usual situation in a data center, which is why we call it an “ideal” scenario. We will analyze a more realistic scenario in the following section, with a mix of water-cooled and air-cooled IT devices.

7.4.1.2. Realistic scenario of hot water reuse

In this scenario, the configuration is composed of the same DWC servers as above plus additional air-cooled IT devices, which consume 20%, 30% and 40% of the DWC servers' power consumption, leading to a configuration with a total power consumption of 1.2, 1.3 and 1.4 times the power of the configuration analyzed in the previous section. To simplify the calculation of the server cooling cost, we assume the air-cooled devices are storage. The other parameters are unchanged: an electricity price of $0.15/kWh, a free air-cooling ratio of 20% and a COP of 50%. We compare the Air, RDHX, DWC and DWC + ERE designs. The Air design uses CRAH to extract the heat going to air with a PUE of 1.6. The RDHX design uses RDHX with a PUE of 1.3 to extract 100% of the heat going to air. The DWC design uses RDHX to extract the 10% of heat going to air from the DWC servers and 100% of the heat of the air-cooled devices. The DWC + ERE design uses the same cooling as the DWC design but reuses the hot water produced by the DWC servers to produce, with adsorption chillers, the cold water needed to cool all the air-cooled devices.

Table 7.9 presents the overall PUE of the different cooling designs with the various air to water power ratio configurations (20%, 30% and 40%); the cold-water energy produced, needed and used by each configuration of the DWC + ERE cooling design; and the corresponding ERF value.

Table 7.9. Cold water balance and PUE of the different power ratio configurations

Overall PUE
AIR vs DWC power ratio 20% 30% 40%
DWC 1.14 1.15 1.16
RDHX 1.25 1.25 1.25
Air 1.49 1.49 1.49
DWC + ERE 0.80 0.78 0.81
Cold water balance
produced kW 311 311 311
needed kW 289 389 489
used kW 289 311 311
Yearly energy cost reused $380,362 $449,473 $449,473
Energy Recovery Factor (%) 30.1% 32.3% 29.6%

We note that with an air to water power ratio of 20%, the DWC servers produce more cold water than the capacity needed by the air-cooled devices (311 kW vs. 289 kW). With air to water power ratios of 30% and 40%, the DWC servers produce less cold water capacity than needed (respectively 311 kW vs. 389 kW and 311 kW vs. 489 kW), leading to a constant reused energy cost value.

The overall PUEs of the Air and RDHX designs are unchanged versus Table 7.8, since these cooling designs have no energy reuse. The only difference between the ideal and realistic scenarios for the Air and RDHX designs is found in Table 7.10, where we see an increased total power and energy cost versus Table 7.8. For the DWC design, the overall PUE increases with the air to water power ratio since the amount of heat going to air increases and is cooled by RDHX with a PUE of 1.3.

For the DWC + ERE design, ERE_DC (which is equal to PUE_ERE_DC) is less than 1 but varies depending on the air to water power ratio. With a power ratio of 20%, the servers produce more cold water than needed, while with power ratios of 30% and 40% they produce less cold water than needed. The optimal ERE_DC value of 0.77 would be achieved with a power ratio of 23%, when the cold water generated equals the cold water needed. This optimal ERE_DC value of 0.77 for the realistic scenario is worse than the ERE_DC value of 0.62 of the ideal scenario (see Table 7.8), since for the same amount of cold water generated by the DWC servers there are additional air-cooled IT devices, while there were none in the ideal scenario. This behavior is also visible in the PUE, which varies between 1.14 and 1.16 in Table 7.9 while the PUE is 1.12 in Table 7.8.

ERF, as defined in [7.2], is computed by dividing the reused energy cost over 5 years (five times the yearly value in Table 7.9) by the 5-year energy cost of the DWC design in Table 7.10. The ERF value is quite high and varies with the PUE. ERF is maximum with the 30% air to water power ratio and the ERE_DC of 0.78. We note that ERE_DC (which is equal to PUE_ERE_DC), PUE_DWC and ERF verify [7.1].
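
The ERF row of Table 7.9 can be checked directly from the two tables (a sketch; the figures are taken from Tables 7.9 and 7.10):

# 5-year reused energy cost (5 x the yearly value in Table 7.9) divided by the
# 5-year energy cost of the plain DWC design (Table 7.10), per [7.2].
yearly_reused_cost = {0.20: 380_362, 0.30: 449_473, 0.40: 449_473}
dwc_5yr_cost = {0.20: 6_326_314, 0.30: 6_957_896, 0.40: 7_589_478}

for ratio in (0.20, 0.30, 0.40):
    erf = 5 * yearly_reused_cost[ratio] / dwc_5yr_cost[ratio]
    print(f"air/water power ratio {ratio:.0%}: ERF = {erf:.1%}")
# -> 30.1%, 32.3%, 29.6%, matching the last row of Table 7.9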

Table 7.10 presents, for a Greenfield data center as in Table 7.8, the cooling infrastructure cost for a power ratio of 40%, the server cooling cost and the energy cost after 5 years with power ratios of 20%, 30% and 40% for each cooling design.

Table 7.10. Greenfield costs of the different designs

Greenfield Air RDHX DWC+ERE DWC
Cooling infra cost at 40% $457,314 $534,096 $859,068 $323,759
Servers cooling cost $124,416 $124,416 $677,376 $677,376
Total Installation cost $581,730 $658,512 $1,536,444 $1,001,135
5 year Energy cost at 40% $10,537,130 $8,842,149 $5,342,115 $7,589,478
5 year Energy cost at 30% $9,784,478 $8,210,567 $4,710,533 $6,957,896
5 year Energy cost at 20% $9,031,826 $7,578,985 $4,424,505 $6,326,314

In Table 7.10, we note a significant difference of costs between the DWC and DWC + ERE designs due to the additional cost of adsorption chillers.

From an energy saving perspective, Table 7.10 shows that the energy savings between the DWC + ERE design and the Air design for all power ratios are around 50%, and are around 40% compared with the RDHX design.

Figure 7.15 presents the same relative payback ratio over time of RDHX, DWC and DWC + ERE versus Air as in Figure 7.14 but with a varying air to water power ratio.

Although the realistic scenario is financially less attractive than the ideal scenario, with a lower payback after 5 years, the DWC + ERE design provides a rather significant relative payback ratio of 22%, 25% and 24% for a Greenfield data center, and of 15%, 18% and 17% for a Brownfield data center, with the 20%, 30% and 40% power ratios. It also shows that the DWC + ERE design has a much better payback than the RDHX design for both Greenfield and Brownfield data centers.

Image

Figure 7.15. Impact of hot water energy reused on project payback with various air to water power ratios, $0.15/kWh electricity price, 20% free air-cooling ratio and 50% COP. For a color version of this figure, see www.iste.co.uk/brochard/energy.zip

7.4.2. Renewable electricity generation

So far, we have discussed how to minimize the total power consumption and how to reuse the waste heat generated by a data center, both of which are essential to minimize the data center energy cost. Once we have reduced the energy needed, it is important to look for solutions that reduce the amount of electricity the data center needs to draw from the grid. This section presents how renewable electricity generation can be used to produce “free” electricity for the data center through photovoltaic panels (PV) and wind turbines.

In section 7.2.1, we analyzed how NREL reduced its data center load by optimizing the design of the RSF air-cooled data center and reusing waste heat. NREL has also conducted other research in the area of renewable electricity generation.

Hootman et al. (2012) describe an expansion of the RSF data center to build a net-zero energy high-performance building based, among other technologies, on PV located on the RSF and RSF expansion roofs as well as on the RSF visitor and staff parking roofs. The total PV size (theoretical power capacity) is 2.5 MW. NREL built a model of the energy generated by these PVs over 1 year: 3.4 GWh per year, with a split of 606 MWh for the RSF roof, 551 MWh for the RSF expansion roof, 707 MWh for the visitor parking and 1,560 MWh for the staff parking. A comparison with the measured energy shows the error is just a few percent. As the RSF data center has an energy consumption of 935 MWh per year (Sheppy et al. 2011), the PV electricity generated by the RSF roof itself would cover about two-thirds of the RSF data center energy needs, while adding the PV energy produced by the visitor parking (707 MWh) would cover 100% of the data center energy.
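
A quick check of these coverage figures, using the numbers quoted above (a sketch):

rsf_dc_annual_mwh = 935          # RSF data center consumption (Sheppy et al. 2011)
pv_mwh = {"RSF roof": 606, "RSF expansion roof": 551,
          "visitor parking": 707, "staff parking": 1560}

roof_only = pv_mwh["RSF roof"] / rsf_dc_annual_mwh
roof_plus_visitor = (pv_mwh["RSF roof"] + pv_mwh["visitor parking"]) / rsf_dc_annual_mwh
print(f"total modeled PV: {sum(pv_mwh.values())/1000:.1f} GWh/year")       # ~3.4 GWh
print(f"RSF roof alone covers {roof_only:.0%} of the data center energy")  # ~65%, about two-thirds
print(f"adding the visitor parking covers {roof_plus_visitor:.0%}")        # ~140%, i.e. above 100%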

But as shown in Figure 7.1, the RSF data center energy needs are not constant over the year. PV production varies even more, depending on the season and the time of day, while the data center consumption is comparatively constant. Table 7.11 summarizes the RSF data center power consumption variations and the variations of the energy produced by the NREL PVs installed on the RSF data center roof.

Table 7.11. RSF data center power consumption and PV power production

RSF DC min power consumption (kWh): 90.0
RSF DC max power consumption (kWh): 130.0
Golden, CO PV efficiency: 15.7%
RSF PV avg. power production (kWh): 69.2
RSF PV min power production (kWh): 11.3
RSF PV max power production (kWh): 116.3

The RSF power consumption estimates are based on Figure 7.1. The RSF PV power production is based on NREL measured data, including the PV efficiency (ratio of produced vs. peak kWh), while the minimum and maximum have been estimated using a simple tool (Wattuneed n.d.). As expected, PV production varies considerably: if on average PV can cover two-thirds of the RSF data center need, there are times when PV will cover only a tenth of it and times when it could cover 100% or more. Table 7.12 presents the same results assuming that all PVs installed on the NREL RSF premises (RSF roof, RSF visitor parking, RSF staff parking) are used, with the exception of the PVs installed on the RSF expansion roof.

Table 7.12. RSF data center power consumption and all PV power production

RSF DC min power consumption (kWh): 90.0
RSF DC max power consumption (kWh): 130.0
Golden, CO PV efficiency: 15.7%
“All” PV avg. power production (kWh): 328.1
“All” PV min power production (kWh): 53.4
“All” PV max power production (kWh): 551.5

With this PV capacity, there are periods when the PVs produce more power than needed and also periods when the PVs do not produce enough power for the data center. The following section addresses the technologies used to store excess PV energy and retrieve it to cover the shortage periods.

Wind turbines are another way to provide renewable energy for data centers (Dykes et al. 2017). With a program called “System Management of Atmospheric Resource through Technology” (SMART), NREL researchers are able to accurately model the behavior of wind flow into and through a wind plant at a level of resolution that captures the full flow physics. Scientists will apply supercomputing to high-fidelity physics models (HFMs) of complex flows and will use “Big Data” along with data science to manage the extensive measurements that provide formal validation of the supercomputing models.

Wind turbine companies like Vestas are also using “Big Data” to model the effectiveness of a wind turbine to select the location where wind turbines will be built and then manage their production in real time (Vestas n.d.).

The latest example is a new data center built by Google in Denmark (Kava 2018), using at large scale techniques similar to those of NREL's RSF data center in order to reduce PUE and use 100% carbon-free electricity, which will come from onshore and offshore wind and PVs.

7.4.3. Storing excess energy for later reuse

Electrolysis (Energy.gov n.d.) can produce hydrogen from water: when a DC current is applied, water splits into oxygen and hydrogen:

[7.22] 2 H2O → 2 H2 + O2

A fuel cell is an electrochemical cell that converts the energy of a fuel (like hydrogen) into electricity through an electrochemical reaction of the hydrogen fuel with oxygen. A proton exchange membrane (PEM) fuel cell (PEMFC)1 is a specific type of fuel cell using a PEM as the electrolyte, with a response time of about 1 s, which makes it usable in the context of energy generation in a data center. Figure 7.16 presents the schematics of a PEMFC.

Image

Figure 7.16. PEMFC diagram. For a color version of this figure, see www.iste.co.uk/brochard/energy.zip

On the anode side, hydrogen is split into protons (H+) and electrons (e−). The protons permeate through the polymer electrolyte membrane to the cathode side. The electrons travel through the external circuit to the cathode side. At the cathode, oxygen molecules react with the permeating protons and with the electrons arriving through the external circuit to form water molecules, the reverse of [7.22].

Zao et al. (2014) show that a 10 kW PEMFC can reliably be used to power servers in a data center with an efficiency between 40% and 60%. Duan et al. (2019) present a reversible protonic ceramic electrochemical cell with above 97% overall electricity-to-hydrogen energy conversion efficiency, a repeatable round-trip (electricity-to-hydrogen-to-electricity) efficiency above 75% and stable operation.

Therefore, water electrolysis coupled with PEMFCs is becoming an effective way to use excess electricity to produce hydrogen, which can be stored and used later to produce electricity, as shown in Figure 7.17.

Image

Figure 7.17. Storing and reusing excess energy with PEMFC. For a color version of this figure, see www.iste.co.uk/brochard/energy.zip

7.4.4. Toward a net-zero energy data center

The different technologies we have presented in this book can minimize the data center's total energy consumption by reducing both the IT energy and the cooling energy. By introducing renewable energy in the data center and storing excess energy for later reuse, solutions exist that can lead to net-zero energy data centers.

Figure 7.18 presents a schematic of a carbon-free data center with a net-zero energy goal. For the sake of simplicity, it does not show the N+1 generators, UPS, static switches and batteries that provide uninterruptible power.

In this data center, servers are hot-water cooled and the waste heat generated is reused by adsorption chillers to produce cold water for the data center's air-cooled devices. As we have seen, other low-PUE solutions exist, but they do not address waste heat reuse within the data center. Such a solution minimizes the data center energy needs.

Image

Figure 7.18. Toward a net-zero energy data center. For a color version of this figure, see www.iste.co.uk/brochard/energy.zip

PVs and/or wind turbines are used in conjunction with PEMFCs to provide as much renewable energy as possible to the data center. As it is unlikely that the local renewable generation will produce 100% of the data center energy 100% of the time, the coexistence of a classic electrical power supply, distributed power generation and the associated AC/DC/AC converters is still needed. In a perfect net-zero energy data center, those AC/DC/AC converters would be removed.

  1. Available at: https://en.wikipedia.org/wiki/Proton-exchange_membrane_fuel_cell#Efficiency [Accessed April 30, 2019].