Open access peer-reviewed chapter

The Performance and Characteristics of the Cooling System of Processors in Data Centres

Written By

Raksha Manel Shenoy

Submitted: 19 January 2023 Reviewed: 25 January 2023 Published: 27 March 2023

DOI: 10.5772/intechopen.1001300


Abstract

In the contemporary world, technology is advancing at a rapid pace, and information technology in particular has grown exponentially over recent decades. Improving data centre energy efficiency is therefore an urgent issue with significant economic and environmental impacts. Data centres generate enormous amounts of heat, and the distribution of that heat is a crucial parameter affecting data centre cooling and energy consumption. Heat reduction is the crux of environmentally sustainable computing, yet data centres typically lack effective sensing systems to monitor heat distribution at a large scale. In this paper, I use sensor networks as a dense instrumentation technology to understand and control cooling in data centres. I present the Aquasar project as a case study, explore the environmental challenges of deploying sensor networks in data centres, examine effective ways of dealing with the excess heat from multiple servers, and discuss the advantages of water cooling over air cooling.

Keywords

  • water-cooling
  • air cooling
  • data centre
  • processor
  • Aquasar

1. Introduction

Green computing is the study and practice of environmentally sustainable computing. It encompasses designing, manufacturing, using and disposing of computers, servers, and related subsystems such as monitors, printers, storage drives and networking and communications equipment efficiently and effectively, with minimal or no impact on the environment, as in [1]. Data centres, which have been criticized for their extraordinarily high energy demand, are a primary focus for proponents of green computing. E-waste refers to discarded, unusable electronic and electrical goods. Most electronic items contain elements such as lead and beryllium which, if untreated, may cause harmful effects; e-waste treatment is therefore a very important part of environmental management. To make matters worse, e-waste generation has been growing exponentially over the past few years, and data centres are among the main contributors. Data centres are facilities housing many computers serving as servers or storage devices. According to the StEP (Solving the E-waste Problem) initiative, data centres are responsible for 60% of the e-waste produced worldwide. Based on current trends, e-waste will grow from 48.9 million metric tonnes in 2013 to 65.4 million tonnes in 2017; additional information can be found in [2]. This scale of e-waste generation has alarmed environmentalists, and it needs attention.

E-waste is generated mainly when data centres either upgrade their hardware or replace it completely. The main e-waste generated is that of processors. Since they have the shortest lifetime and the heaviest workload, they wear out easily and thus require a lot of attention. Processors support over-clocking, which is used at times to obtain higher speeds; because the processor is pushed beyond its rated capacity, this generates a lot of heat. Such heat generation almost halves the processor life, as given in [3]. The main reason processors are discarded is degradation due to heat and dust. The processor lifetime is about 3 years; with effective cooling, it can be extended to about 6 years. Thus, a data centre that was throwing away processors every 3 years will now throw away the same processors after 6 years, reducing the e-waste by a direct factor of half. In this way, cooling helps to “reduce” e-waste.

The method discussed here as the case study requires extensive use of copper, giving e-waste containing copper a chance to be used again. Thus, I am “reusing” e-waste components.


2. Air cooling

Figure 1 is a pictorial representation of the cross-section of a data centre room, also called a server colocation facility (colo for short). Racks are installed on an elevated floor in aisles. Computer room air conditioning (CRAC) units supply cool air to the subfloor. To make cool air available to servers, some floor tiles are perforated and serve as vents. The aisles with these vents are called cold aisles. Usually, cool air is drawn in at the front of the servers, and hot exhaust air is blown out the back into hot aisles. Servers are arranged face to face across the cold aisles for efficient use of the cool air. As shown in Figure 1, cool air and hot air ultimately mix near the ceiling and are drawn into the CRAC, where this mixed air exchanges heat with chilled water. To regulate the temperature to a set point, as in [4], the chilled-water valve opening is controlled, and a temperature sensor is located at the intake of the CRAC. Air cooling was the technique used initially in data centres.

Figure 1.

Cross-section of a colo, as in [4].

Most data centres originally cooled with air for practical reasons: air is abundant, it normally poses no threat to humans or equipment, it is a poor conductor of electricity, it is easy to move and it is free. But air also falls short on several counts; for example, its thermal capacity is low (it does not hold much heat relative to liquids), so cooling of high-density installations is not feasible, details of which are provided in [5]. Moreover, the cold air may simply pass between the aisles instead of through the servers. That is precisely where air cooling fails. To increase efficiency, most companies are now using water (or other liquids with similar properties) to provide more targeted cooling for high-power computing and similar conditions. Water is used mainly because it is cheap, readily available and has the highest specific heat among commonly available liquids.
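To put the thermal-capacity gap into numbers, the short Python sketch below compares how much heat one litre of air and one litre of water absorb per degree of temperature rise. The property values are standard textbook approximations at roughly room conditions, assumed for illustration rather than taken from the cited studies.

AIR_DENSITY = 1.18        # kg/m^3 at ~25 degC, 1 atm (assumed textbook value)
AIR_CP = 1005.0           # J/(kg*K)
WATER_DENSITY = 997.0     # kg/m^3 at ~25 degC
WATER_CP = 4186.0         # J/(kg*K)

def heat_per_litre_per_kelvin(density_kg_m3, cp_j_per_kg_k):
    # Joules absorbed by one litre of fluid for a 1 K temperature rise (1 m^3 = 1000 L).
    return density_kg_m3 * cp_j_per_kg_k / 1000.0

air = heat_per_litre_per_kelvin(AIR_DENSITY, AIR_CP)
water = heat_per_litre_per_kelvin(WATER_DENSITY, WATER_CP)
print(f"air:   {air:7.1f} J per litre per K")
print(f"water: {water:7.1f} J per litre per K")
print(f"ratio: {water / air:.0f}x")  # roughly 3500x, the same order as the 'about 4000 times' figure quoted later for Aquasar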


3. Water cooling

Rising power loadings are now incorporated into novel data centre designs. Other cooling methods, such as phase-change liquid systems or single-phase forced liquid flow, are now being used [6, 7]. Using liquid cooling systems can result in significant reductions in the overall cooling energy needed, as mentioned by Greenberg et al. [8].

The requirement for both computer room air conditioning (CRAC) units and chiller plants can be decreased or, in some circumstances, even eliminated by ambient free-air cooling and water/air-side economizers, resulting in significant energy savings. According to Brunschwiler et al. [9], water-cooled systems exhibit modest temperature variations throughout the system because of more effective heat transfer. The microprocessor junction temperature can be kept well below 85°C with a water flow rate of 0.7 L/min and an inlet temperature of 60°C, and if the chip is allowed to approach 85°C, the inlet water temperature can be as high as 75°C [9]. At these higher coolant temperatures, the waste heat produced by liquid cooling systems is of higher quality, and energy recovery from the direct discharge liquid stream is simpler. Chillers may not be necessary with the higher-temperature coolant, which lowers the system's energy consumption. This technique places the cold plates as near as feasible to the heat-generating components [9]. The thermal resistance of these systems, which can remove more than 200 W/cm2 of heat, is less than 20% of that of air-cooling systems [9].
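To illustrate how such thermal-resistance figures translate into junction temperatures, the following Python sketch applies the simple lumped relation T_junction = T_coolant + R_th × P. The chip power and the two resistance values are assumptions chosen only to show the direction of the effect; they are not data from [9].

CHIP_POWER_W = 150.0        # assumed chip power, not from [9]
R_TH_WATER = 0.10           # K/W, assumed cold-plate (water) thermal resistance
R_TH_AIR = 0.55             # K/W, assumed air heat-sink resistance (water value ~18% of this)

def junction_temp_c(coolant_temp_c, r_th_k_per_w, power_w):
    # Lumped model: junction temperature = coolant temperature + thermal resistance * power.
    return coolant_temp_c + r_th_k_per_w * power_w

print("water, 60 degC inlet:", junction_temp_c(60.0, R_TH_WATER, CHIP_POWER_W), "degC")   # 75.0
print("air, 25 degC inlet:  ", junction_temp_c(25.0, R_TH_AIR, CHIP_POWER_W), "degC")     # 107.5
# Even with 60 degC water the junction stays near 75 degC, below the 85 degC limit,
# which is why warm-water cooling can protect the chip while still yielding reusable heat.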

The benefits of water-cooled systems over air-cooled ones are illustrated by a thorough study comparing the energy efficiency of water-cooled and air-cooled high-density servers, carried out by Ellsworth and Iyengar [10]. According to their research, employing water cooling can boost processor performance by 33%. In another water-cooling study, by Ellsworth et al. [11], all energy-dissipating components, such as the CPU, memory, power conversion and I/O circuits, are cooled by aluminium/copper tube cold plates. In this instance, condensation was prevented by setting the cold-plate supply temperature at least 7°C above the dew point. The effectiveness of a water-cooling system for an electronic module containing a 150 W dual-core CPU and an 8 W memory chip was evaluated by Campbell and Tuma [12]. According to their calculations, the maximum temperature of water entering the module must be higher than 28°C in order to maintain a junction temperature of 65°C with a water flow rate of 0.95 L/min. In a dual-loop chiller-less data centre that IBM recently developed and built, the cooling energy requirement was decreased from the industry standard of 45% of the total energy consumed by the data centre to just 3.5% [13, 14, 15]. This test-scale data centre houses 38 “warm water” cooled servers in a single rack. Each server has cold plates for the processors and memory modules, and the remaining components are cooled by recirculated air that has been pre-cooled by the water entering the servers.

The requirement for a CRAC unit is essentially eliminated because the water loop removes most of the server heat dissipation. The cooling water from the rack is circulated internally in a loop that transfers heat to an external coolant loop, such as a water-glycol solution. Using a dry cooler, heat is transferred from the closed external loop to the surrounding air without the make-up water required by wet cooling tower methods. The water flow rate varied from 4 to 8 GPM, and the rack power ranged from 13.4 to 14.5 kW [13, 14, 15]. To balance maximizing waste heat recovery with maintaining the thermal stability of the chip, Sharma et al. [16] found that the ideal water inlet temperature for cooling a microprocessor chip should be between 40 and 47.5°C at an ideal flow rate of 1 L/min.


4. Aquasar project

As an attempt to develop efficient cooling methods, IBM created this project, mainly for the benefit of supercomputers and data centres. IBM and the Swiss Federal Institute of Technology are working on a water-cooled supercomputer whose excess heat will be used to heat the university's buildings. The project is IBM-patented and unique in that it uses warm water instead of cold water for cooling. Water, which is about 4000 times more efficient as a coolant than air, enters the system at 60°C and keeps the chips at operating temperatures below their maximum of 85°C. Because the input temperature is high, high-grade heat (in this case about 65°C, as in [17]) is obtained as output.

The system makes use of jet impingement cooling, in which water comes into direct contact with the back of the chip via microchannels in the heat sink. “This method neither causes the overhead in thermal resistance of a base plate, nor the overhead and reliability problem of thermal interface materials, and thus is responsible for removing highest-power densities,” according to a paper published by the scientists at IBM, as in [17]. Figure 2 depicts the connection between the pipelines from the individual blades and the server rack's water-pipe network, which is further connected to the main water transportation network. Aquasar needs about 10 liters of water for cooling, pumped at some 30 liters per minute, as described in [17].

Figure 2.

Water flows along copper pipes in a blade server used in the Aquasar supercomputer as in [17].
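As a rough consistency check on these figures, the Python sketch below estimates the heat that a 30 L/min loop carries when the water enters at 60°C and leaves at about 65°C, as quoted from [17]; the water properties are standard values, and the result is indicative only.

FLOW_L_PER_MIN = 30.0   # pump rate quoted for Aquasar
T_IN_C = 60.0           # water entering the system
T_OUT_C = 65.0          # approximate grade of the recovered heat
WATER_DENSITY = 0.983   # kg/L near 60-65 degC (standard value)
WATER_CP = 4185.0       # J/(kg*K)

mass_flow_kg_s = FLOW_L_PER_MIN / 60.0 * WATER_DENSITY
heat_kw = mass_flow_kg_s * WATER_CP * (T_OUT_C - T_IN_C) / 1000.0
print(f"heat carried by the loop: about {heat_kw:.1f} kW")  # ~10 kW for a 5 K rise
# A larger temperature rise or flow rate scales this figure proportionally.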

In a closed-circuit cooling system, the water is heated by the chips and subsequently cooled back to the required temperature as it passes through a passive heat exchanger, which delivers the rejected heat for direct reuse. The water and the electrical components remain isolated from each other.

My work on this project focuses on the performance and characteristics of the cooling system, which will be measured with an extensive system of sensors in order to optimize it further. I propose the following sensor algorithms.

Sensor Algorithms

Algorithm for temperature sensing in data centres

STEP 1: start

STEP 2: repeat STEP 3 while the temperature remains within the optimal temperature range (which depends on the processor used)

STEP 3: do nothing

[End of STEP 2 loop]

STEP 4: signal the OS

STEP 5: repeat STEP 6 while the temperature is above or below the optimal temperature range

STEP 6: continue signaling the OS

[End of STEP 5 loop]

STEP 7: go to STEP 2

STEP 8: END

Note: [temperature < optimal temperature is checked to prevent undercooling of processors]

Algorithm for pressure sensing in data centres

STEP 1: start

STEP 2: repeat STEP 3 while the pressure exerted by the liquid flowing in the copper pipes remains within the optimal pressure range.

STEP 3: do nothing

[End of STEP 2 loop]

STEP 4: signal the OS

STEP 5: repeat STEP 6 while the pressure is above or below the optimal pressure range

STEP 6: continue signaling the OS.

[End of STEP 5 loop]

STEP 7: go to STEP 2

STEP 8: END
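The two algorithms above share the same structure, so a single generic monitoring loop can implement both. The following Python sketch is a minimal illustration of that structure; the sensor-reading and OS-signalling functions, the range limits and the polling interval are placeholders I have assumed for illustration, not part of the Aquasar design.

import time

def monitor(read_sensor, signal_os, low, high, poll_s=1.0):
    # Generic out-of-range monitor implementing the two algorithms above.
    # read_sensor() and signal_os() are hypothetical placeholders for the real
    # sensor driver and the OS notification mechanism.
    while True:
        # STEPs 2-3: do nothing while the reading stays within the optimal range.
        while low <= read_sensor() <= high:
            time.sleep(poll_s)
        # STEPs 4-6: keep signalling the OS until the reading returns to the range.
        while not (low <= read_sensor() <= high):
            signal_os()
            time.sleep(poll_s)
        # STEP 7: loop back and resume watching the in-range condition.

# Example usage, one monitor per sensed quantity (range values are illustrative only):
# monitor(read_temperature_c, lambda: signal_os("temperature"), low=10.0, high=35.0)
# monitor(read_pressure_kpa, lambda: signal_os("pressure"), low=150.0, high=300.0)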


5. Sensor deployment

The sensors used are wireless, as this enables real-time decisions for operating a data centre at peak efficiency. This real-time capability relies on streaming data packets to gateway devices that relay the data to software, where it is displayed visually and interpreted for operational adjustments, as in [18]. The temperature sensor is placed above the server cabinet (see Figure 3). The pressure sensor is placed between the pipes used to circulate the coolant, to monitor the pressure of the liquid coolant flowing in them.

Figure 3.

Stacked server racks with sensors as in [18].

When the temperature of the processor exceeds the allowed temperature range (50–95°F), the temperature sensor detects this and signals the operating system (OS), which in turn signals the motor to increase the quantity of water flowing in the pipes. This prevents overheating of the processor due to over-clocking.

When the temperature of the processor falls below the allowed range (below 50°F), the temperature sensor detects this and signals the OS, which in turn signals the motor to decrease the quantity of water flowing in the pipes. This prevents overcooling of the processor.

Similarly, if the pressure falls below the required pressure (depending on the radius of the pipe, the nature of the liquid and several other factors), the pressure sensor detects this and signals the operating system, which in turn signals the motor to increase the water flow in the pipes.

If the pressure exceeds the required pressure, the pressure sensor detects this and signals the OS, which in turn signals the motor to decrease the water flow.

If the temperature and pressure sensors were to signal the motor directly, the motor could receive conflicting inputs, resulting in errors. There are four possible cases.

  • Temperature and pressure sensors signal the motor to increase the water flow.

  • Temperature and pressure sensors signal the motor to decrease the water flow.

  • The temperature sensor signals the motor to decrease the water flow while the pressure sensor signals it to increase the flow, and vice versa (two conflicting cases).

Therefore, the sensors should signal the occurrence of these events to the OS and not to the motor. The whole point of an event is to notify a listener that something has happened to a component, analogous to events in a user interface (UI). An event carries all the data a listener needs to determine what took place and in which component it happened, so that the listener can work out precisely what occurred and respond accordingly [19]. The OS controls the motor state via listeners, namely the temperature and pressure listeners. It is possible to create one's own event-listener methods with appropriate commands to handle each event.
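A minimal sketch of this listener-based arbitration is given below in Python. The event class, the motor interface and the priority rule applied when the two requests conflict are all assumptions made for illustration; the design only requires that the motor receives a single, unambiguous command decided by the OS.

from dataclasses import dataclass

@dataclass
class SensorEvent:
    # Hypothetical event object: sensors post these to the OS instead of driving the motor.
    source: str    # "temperature" or "pressure"
    reading: float
    request: str   # "increase" or "decrease" the water flow

class CoolingController:
    # OS-side listener that turns sensor events into a single motor command.

    def __init__(self, motor):
        self.motor = motor   # hypothetical object exposing set_flow("increase"/"decrease")
        self.pending = {}    # latest request per sensor

    def on_event(self, event: SensorEvent):
        self.pending[event.source] = event.request
        self._arbitrate()

    def _arbitrate(self):
        temp = self.pending.get("temperature")
        pres = self.pending.get("pressure")
        if temp is not None and pres is not None and temp != pres:
            command = temp          # assumed policy: the temperature request wins on conflict
        else:
            command = temp or pres  # agreement, or only one sensor has reported so far
        if command is not None:
            self.motor.set_flow(command)  # one unambiguous command reaches the motor

The rule that the temperature request wins when the two sensors disagree is purely an assumed policy; any other arbitration rule could be substituted, provided the decision is made by the OS rather than by the sensors themselves.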

If the water requirement cannot be met because the processor is executing too many processes, the OS must kill some processes. Different algorithms can be used to choose the victim process.


6. Thermal-responsive server rearrangement to reduce thermal stress

An increase in inlet air temperature may decrease the rate of heat removal from air-cooled servers, establishing thermal stress in these servers. As a result, hotspot regions of thermal stress originate from the poorly cooled active servers, which begin conducting heat to the adjacent servers. Hardware failure in these servers may then cause performance and financial losses, while the cooling mechanism consumes more energy. Therefore, thermal profiling needs to be done.

This section illustrates a model to reduce thermal stress through thermal-aware relocation of servers. All servers likely to initiate hotspots, or that have been part of one, are considered in order to build a thermal profile of all the data centre servers with respect to inlet temperature. For this purpose, the inlet temperature effect (ITE) thermal benchmark test can be used. The result of this test indicates the change in a server's outlet temperature in response to changes in inlet temperature at full and zero CPU utilization levels. (Since the CPU is the most power-consuming and heat-dissipating hardware component of any computer system, it dominates what is meant by server utilization.) The values of T_i^max-inlet and T_i^outlet(increased) can be derived from the ITE test. Homogeneous servers have the same T_i^max-inlet; however, T_i^outlet(increased) is influenced by the location of the temperature monitoring sensors, which can be verified with multiple tests using different sensor locations, as described in [20].


7. Thermal state transition

This section defines a finite set of thermal states for a server within a data centre with respect to inlet and outlet temperature, thermal stress and server utilization level. The thermal state of a server i can be defined as a tuple (T_i^inlet, T_i^outlet, μ, σ_i), denoted by S_i^n, where n is a whole number ranging from 0 to 3 as per the state transition diagram shown in Figure 4.

Figure 4.

State diagram of the data centre, as in [20].

Under the assumption that T_i^received < T_threshold for all states, the states S_i^0 and S_i^1 are the desired states with no thermal stress. State S_i^0 is the idle state, where the server has no workload μ and the inlet temperature is close to the set temperature T_set. State S_i^1 is an active state of the server, where μ is non-idle and the inlet temperature is the same as in S_i^0. Both of these initial states have an outlet temperature below the red-line temperature T_threshold. States S_i^2 and S_i^3 differ: S_i^2 indicates possible future thermal stress in the form of hotspots, whereas S_i^3, owing to a higher inlet temperature than S_i^2, may already be under thermal stress. Moreover, because its outlet temperature exceeds the maximum threshold and thermal stress is present, as in [20], state S_i^3 is a hotspot state.
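As a simple illustration of this state definition, the Python sketch below classifies a server into S0–S3 from its inlet temperature, outlet temperature and utilization. The threshold values and the simplified rules are assumptions made for the sketch; the authoritative definition is the transition diagram of Figure 4, as in [20].

T_INLET_MAX_C = 32.0    # assumed maximum acceptable inlet temperature (near T_set)
T_THRESHOLD_C = 45.0    # assumed red-line outlet temperature

def thermal_state(t_inlet_c, t_outlet_c, utilization):
    near_set = t_inlet_c <= T_INLET_MAX_C
    below_redline = t_outlet_c < T_THRESHOLD_C
    if near_set and below_redline:
        return "S0" if utilization == 0.0 else "S1"   # desired states, no thermal stress
    if below_redline:
        return "S2"   # raised inlet temperature: potential future hotspot
    return "S3"       # outlet above the red line: hotspot state

print(thermal_state(24.0, 38.0, 0.0))   # S0: idle, inlet near the set point
print(thermal_state(24.5, 43.0, 0.7))   # S1: active, still below the red line
print(thermal_state(35.0, 44.0, 0.9))   # S2: hot inlet, stress expected soon
print(thermal_state(36.0, 48.0, 0.9))   # S3: hotspot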


8. Understanding the economic factors

Because the initial hardware investment for water-cooled data centres is high, between 10 and 30% more than for conventional data centres according to the survey details in [21], some companies may be discouraged from adopting them. IBM's model requires appropriate copper pipe-based cooling and heat distribution networks to be in place. However, the benefit of going green is that computer room air conditioners and chillers are not required in the data centre, which can slash energy costs by up to 50% and dramatically lower the facility's carbon footprint. From a business perspective, a metric that gives a slightly better idea of what a business measures itself on is the carbon footprint: a carbon footprint measures the total greenhouse gas emissions caused directly and indirectly by a person, organization, event or product, as in [22]. The overall total cost of ownership of an air-cooled data centre is greater than that of a water-cooled one.
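The following arithmetic sketch in Python shows how the two figures quoted above, a 10–30% capital premium and up to 50% lower cooling energy cost, can be combined into a simple payback estimate. The baseline capital and annual energy costs are hypothetical placeholders, not survey data, so the output is indicative only.

BASELINE_CAPEX = 10_000_000.0       # assumed build cost of a conventional air-cooled data centre
BASELINE_ENERGY_COST = 1_500_000.0  # assumed annual cooling-related energy cost

def payback_years(capex_premium_frac, energy_saving_frac):
    # Years needed for the energy savings to repay the extra capital spent on water cooling.
    extra_capex = BASELINE_CAPEX * capex_premium_frac
    annual_saving = BASELINE_ENERGY_COST * energy_saving_frac
    return extra_capex / annual_saving

for premium in (0.10, 0.30):
    print(f"capital premium {premium:.0%}, 50% energy saving -> payback in "
          f"{payback_years(premium, 0.50):.1f} years")
# With these placeholder figures the premium is recovered in roughly 1 to 4 years;
# the real payback depends entirely on a site's actual capital and energy costs.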

Moreover, buying new processors costs more than operating an efficient cooling system, so the approach is economically feasible as well.


9. Awareness of implications to society at large

The reduction of carbon dioxide emissions will benefit society as a whole: as noted, emissions will be reduced by about 30 tonnes annually, and according to the figures in [23], energy consumption will be lowered by 40%. Energy crises like the one seen in the early 1970s may be mitigated by this lower consumption of energy.

9.1 Awareness of contemporary issues

The main contemporary issue is that companies must be convinced to adopt these methods. The design must be carefully planned, because any small leakage of water can damage the whole system.

Another issue concerns the sensors, their operation and their lifetime, since they must work at full efficiency for the entire 6 years. If they do not, they will reduce the overall efficiency and may cause hazards by failing to perform their required task.


10. Understanding the ethical and professional issues

Professional challenges will only arise when trying to persuade data centres to accept and use the architecture.

Nearly no ethical problems exist; the only concern is that some individuals might not agree with the procedure because water is brought into direct contact with the CPU.

11. Conclusion

A proper comparison has been provided between the conventional air-cooling techniques employed in data centres and modern water-cooling methods, and the necessity of employing efficient water-cooling techniques has been elaborated. A detailed set of sensor algorithms is proposed in this paper, and this deployment of sensors further improves the much-acclaimed Aquasar project. Processor cooling is thus one of the most important aspects of green computing. The heat captured can be used to warm the building in winter and to provide year-round hot water for bathroom and kitchen use. Excess heat can be supplied to swimming pools, or converted to other forms of energy and used for various purposes.

References

  1. Clark J. A journal article from The Data Centre Journal, proposed at The GraphLab Conference. San Francisco: The Data Centre Association; September 2023. Available from: www.datacentrejournal.com/…/liquid-cooling-the-wave-of-the-present/
  2. Murugesan S. Harnessing Green IT: Principles and Practices. Florida, USA: IEEE IT Professional; 2008. pp. 24-33
  3. Dijkstra R. The fundamental problem of IT and data centre e-waste [online]. 11 January 2014. Available from: http://infrarati.wordpress.com/category/e-waste-2/
  4. Goth G, et al. Thermal and mechanical analysis and design of the IBM Power 775 water cooled supercomputing central electronics complex. In: 13th IEEE Intersociety Conference (ITherm). San Diego, USA: Institute of Electrical and Electronics Engineers; 2012. pp. 700-709. DOI: 10.1109/ITHERM.2012.6231496
  5. Patel CD, Bash CE, Sharma R, Beitelmal M, Friedrich R. Smart cooling of data centres. In: Proceedings of the International Electronic Packaging Technical Conference and Exhibition. Maui, Hawaii: The American Society of Mechanical Engineers; June 2003
  6. Schmidt RR, Cruz EE, Iyengar MK. Challenges of data center thermal management. IBM Journal of Research and Development. 2005;49(4/5):709-723
  7. Joshi Y, Samadiani E. Energy efficient thermal management of data centers via open multi-scale design: A review of research questions and approaches. Journal of Enhanced Heat Transfer. 2011;18(1):15-30
  8. Greenberg S, Mills E, Tschudi B, Rumsey P, Myatt B. Best practices for data centers: Lessons learned from benchmarking 22 data centers. In: Proceedings of the 14th ACEEE Summer Study on Energy Efficiency in Buildings. Pacific Grove (CA, USA): ACEEE; 13-18 August 2006
  9. Brunschwiler T, Smith B, Ruetsche E, Michel B. Toward zero-emission data centres through direct reuse of thermal energy. IBM Journal of Research and Development. 2009;53(3):11:1-11:13. DOI: 10.1147/JRD.2009.5429024
  10. Ellsworth MJ, Iyengar MK. Energy efficiency analyses and comparison of air and water cooled high-performance servers. In: Proceedings of InterPACK '09. San Francisco (CA, USA): The American Society of Mechanical Engineers; 19-23 July 2009
  11. Ellsworth MJ, Goth GF, Zoodsma RJ, Arvelo A, Campbell LA, Andrel JW. An overview of the IBM Power 775 supercomputer water cooling system. In: Proceedings of InterPACK '11. Portland (OR, USA): The American Society of Mechanical Engineers; 2011
  12. Campbell L, Tuma P. Numerical prediction of the junction-to-fluid thermal resistance of a 2-phase immersion-cooled IBM dual core POWER6 processor. In: Proceedings of the 28th IEEE SEMI-THERM Symposium. San Jose (CA, USA): Institute of Electrical and Electronics Engineers; 18-22 March 2012. pp. 36-45
  13. David MP, Iyengar M, Parida P, Simons R, Schultz M, Schmidt R, et al. Experimental characterization of an energy efficient chiller-less data center test facility with warm water cooled servers. In: Proceedings of the 28th IEEE SEMI-THERM Symposium. San Jose (CA, USA): Institute of Electrical and Electronics Engineers; 18-22 March 2012. pp. 232-237
  14. Parida PR, David M, Iyengar M, Schultz M, Gaynes M, Kamath V, et al. Experimental investigation of water cooled server microprocessors and memory devices in an energy efficient chiller-less data center. In: Proceedings of the 28th IEEE SEMI-THERM Symposium. San Jose (CA, USA): Institute of Electrical and Electronics Engineers; 18-22 March 2012. pp. 224-231
  15. Iyengar M, David M, Parida P, Kamath V, Kochuparambil B, Graybill D. Server liquid cooling with chiller-less data center design to enable energy savings. In: Proceedings of the 28th IEEE SEMI-THERM Symposium. San Jose (CA, USA): Institute of Electrical and Electronics Engineers; 18-22 March 2012. pp. 212-223
  16. Sharma CS, Zimmermann S, Tiwari MK, Michel B, Poulikakos D. Optimal thermal operation of liquid-cooled electronic chips. International Journal of Heat and Mass Transfer. 2012;55:1957-1969
  17. EDN Network. Made in IBM Labs: IBM hot water-cooled supercomputer goes live at ETH Zurich. 2 July 2010. Retrieved 25 June 2014. Available from: https://www-03.ibm.com/press/us/en/pressrelease/32049.wss
  18. US Department of Energy. Wireless sensors improve data centre energy efficiency. Federal Energy Management Program; September 2010. p. 1
  19. Sintes T. Events and listeners. JavaWorld. 4 August 2000 [online]. Available from: http://www.javaworld.com/article/2077351/java-se/events-and-listeners.html
  20. Chaudhry MT, Ling TC, Hussain SA, Manzoor A. Minimizing thermal stress for data centre servers through thermal-aware relocation. The Scientific World Journal. 2014;2014:Article ID 684501. Hindawi Publishing Corporation
  21. Sciacca C (IBM Media Relations). IBM and ETH Zurich unveil plan to build new kind of water-cooled supercomputer. Zurich; 20 June 2009 [online]. Available from: http://www.computerweekly.com/feature/IBM-builds-water-cooled-processor-for-Zurich-supercomputer
  22. Carbon Trust. Carbon footprinting. April 2010. Retrieved 4 June 2014. Available from: http://www.carbontrust.co.uk/cut-carbon-reducecosts/calculate/carbon-footprinting/pages/carbon-footprinting.aspx
  23. Brunschwiler T, Smith B, Ruetsche E, Michel B. Toward zero-emission data centers through direct reuse of thermal energy. IBM Journal of Research and Development. 2009;53(3):1-13
