We are IntechOpen, the world's leading publisher of Open Access books Built by scientists, for scientists






### **Multi-Core CPU Air Cooling**

M. A. Elsawaf, A. L. Elshafei and H. A. H. Fahmy Faculty of Engineering, Cairo University Egypt

#### 1. Introduction

High-speed electronic devices generate more heat than other devices. This chapter addresses the air cooling problem of portable electronic devices, whose design is increasingly constrained by the limits of air cooling. Multi-core CPUs are expected to dominate mobile handset platforms in the coming few years. Advanced control techniques offer solutions for central processing unit (CPU) dynamic thermal management (DTM). The objective of this chapter is to minimize the effect of the air cooling limitation and to ensure stable CPU utilization using fuzzy logic control. The proposed solution focuses on the design of a DTM controller based on fuzzy logic control. This approach reduces design time because it does not depend on the transfer functions of the CPU chip and its cooling system.

On-chip thermal analysis calculates and reports thermal gradients, or variations in operating temperature, across a design. This analysis is increasingly important for advanced digital integrated circuits (ICs). At today's 65 nm and 45 nm technologies, adding cores to a CPU chip increases its power density and leads to thermal throttling, and advanced control techniques offer a solution to this problem. Towards this objective, a thermal model similar to a real 8-core IBM CPU chip is built and integrated into a semiconductor thermal simulator. The open-loop response of the CPU chip is extracted, and the resulting thermal profile illustrates CPU thermal throttling. The proposed DTM controller design is based on 3D fuzzy logic. A CPU chip contains many cores, each of which is a heat source. Exploiting the correlation between the core temperatures and their operating frequencies improves the DTM response and reduces the effect of the air cooling limitation; the 3D fuzzy controller takes these correlations into account. This chapter also presents a new DTM technique called the "Thermal Spare Core" (TSC) algorithm.

Thermal Spare Core (TSC) is a completely new DTM algorithm. It reserves cores during low CPU utilization and activates them during thermal crises. Reserving some cores as TSCs does not impact overall CPU utilization. Because of the air cooling limitations, these cores are not activated simultaneously. Semiconductor technology permits more cores to be added to the CPU chip, so no chip area is wasted by TSC. TSC is thus a solution to the air cooling limitations of multi-core CPUs.

#### 2. The CPU air cooling limitations

We live in a computer-controlled epoch. We do not even realize how often our lives depend on machines and their programming. For example, mobile handsets, portable electronic devices, laptops, medical instruments, and many other devices all depend on digital processors in our everyday lives. There is no doubt that the size and weight of this portable equipment affect its usability. Unfortunately, many factors limit the portability of electronic systems: power consumption drains the battery, and efficient cooling of portable electronic devices is becoming a problem because of air cooling limitations.

The on-chip temperature gradient is a design challenge, and many technology factors affect it. In particular, power density (power per unit area) increases with each new technology node (ITRS, 2006). Smaller geometries allow more functionality to fit within the same chip area, which can result in high thermal gradients (Huangy et al., 2006). As shown in Fig.1A, adding more cores to the CPU chip increases the total power consumption. Fig.1B illustrates the maximum number of cores per chip and their maximum operating frequencies (D.D.Kim et al., 2008).



Fig. 1. Multi-Core CPU evolutions

CPU cores run relatively hot while on-chip memory tends to run relatively cold. The result is an ever-varying mix of "hot" and "cold" spots that depends on the mode of operation. A cell phone is a good example of this type of design. Creating a text message exercises certain functionality, which creates a specific thermal profile, but transmitting the message exercises different functionality, which results in a different profile. The same can be said for using the cell phone to make a voice call, play an mp3 file, take a picture, and so forth. The resulting temperature variation across a chip is typically around 10 to 15 °C. If this temperature distribution is not managed, the variation can be as high as 30 to 40 °C (Mccrorie, 2008).

CPU power dissipation is a combination of dynamic power and leakage power (S.Kim et al., 2007). Dynamic power is a function of logic toggle rates, buffer strengths, and parasitic loading. Leakage power is a function of the technology and device characteristics. Thermal-analysis solutions must account for both sources of power. In Fig.1C, the thermal profile of a CPU chip shows the temperature variation across the chip surface. This variation is due to the different power densities of the individual functional blocks, which create "hotspot" and "coldspot" areas across the CPU chip surface (Huangy et al., 2006). A high CPU operating temperature increases leakage current, degrades transistor performance, decreases electromigration limits, and increases interconnect resistivity (Mccrorie, 2008). The increased leakage current, in turn, increases power consumption.
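To make the two power components concrete, the sketch below uses the standard first-order CMOS approximation: dynamic power is about α·C·V²·f, where the activity factor α stands in for the toggle rate and C for the switched (parasitic) capacitance, and leakage power is V·I_leak. The function names and the example values are hypothetical, and the leakage current is passed in as a constant even though, as noted above, it actually rises with temperature.

```python
def dynamic_power(alpha: float, c_switched: float, vdd: float, freq_hz: float) -> float:
    """First-order CMOS switching power, alpha * C * Vdd^2 * f, in watts."""
    return alpha * c_switched * vdd ** 2 * freq_hz


def leakage_power(vdd: float, i_leak: float) -> float:
    """Static (leakage) power Vdd * I_leak, in watts.

    I_leak is taken as a given constant here; in a real chip it rises with temperature.
    """
    return vdd * i_leak


def total_power(alpha: float, c_switched: float, vdd: float,
                freq_hz: float, i_leak: float) -> float:
    """Total CPU power = dynamic power + leakage power."""
    return dynamic_power(alpha, c_switched, vdd, freq_hz) + leakage_power(vdd, i_leak)


if __name__ == "__main__":
    # Hypothetical core: 20% activity, 1 nF switched capacitance, 1.0 V, 2 GHz, 0.1 A leakage.
    print(total_power(0.2, 1e-9, 1.0, 2e9, 0.1))  # roughly 0.5 W
```

Because dynamic power is linear in f but quadratic in V, scaling frequency alone gives a proportional saving, while scaling voltage together with frequency (as DVFS does) saves considerably more.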

#### 3. The CPU thermal throttling problem

Fabrication technology permits more, faster, and smaller devices, and hence more cores, to be placed on a CPU chip. But adding more cores to a CPU chip increases the power density and creates additional dynamic power management challenges. Since the invention of the integrated circuit (IC), the number of transistors that can be placed on an IC has increased exponentially, doubling approximately every two years. The trend was first observed by Intel co-founder Gordon E. Moore in a 1965 paper (Moore, 1965), and Moore's law has held for almost half a century. It is no coincidence that Moore was already discussing the heat problem in 1965: "will it be possible to remove the heat generated by tens of thousands of components in a single silicon chip?" (Moore, 1965). Static power consumption in CMOS ICs used to be negligible compared to dynamic power; it is now a design problem. The millions of transistors in a CPU chip dissipate more heat than ever before, and the capacity of the CPU cooling system limits the number of cores on the chip (ITRS, 2008).

The International Technology Roadmap for Semiconductors (ITRS) is a set of documents produced by a group of semiconductor industry experts. ITRS specifies the maximum power that high-performance heat-sink air cooling can remove as 198 W (ITRS, 2006). Chip power consumption is therefore limited by the capacity of the cooling system, and this air cooling limit was already reached in 2008, as shown in Fig.1D.

As shown in Fig.2A, under maximum utilization the CPU reaches its maximum operating temperature after a certain time. CPU utilization is then reduced to a safe level so that the maximum temperature is not exceeded. This phenomenon is called CPU thermal throttling. Fig.2B compares three cases: the ideal case with no thermal constraints, low power consumption with thermal constraints, and high power consumption with thermal constraints. Adding more cores to the CPU chip does not necessarily increase CPU utilization. With low power consumption, the curve drifts to lower CPU utilization because of the CPU thermal limitation. With high power consumption, CPU utilization actually decreases as more cores are added to the CPU chip. Thus, the improvement in CPU utilization is not proportional to the number of cores.



Fig. 2. CPU thermal throttling (Passino & Yurkovich, 1998)

#### 4. The advanced DTM controller design

Advanced dynamic thermal management techniques are mandatory to avoid CPU thermal throttling. Fuzzy control provides a convenient method for constructing nonlinear controllers from heuristic information. Such heuristic information may come from an operator who has acted as a "human-in-the-loop" controller for a process. The fuzzy control design methodology is to write down a set of rules on how to control the process and then incorporate these rules into a fuzzy controller that emulates the operator's decision-making. Regardless of where the control knowledge comes from, fuzzy control provides user-friendly, high-performance control (Patyra et al., 1996).

DTM techniques are required to make maximum use of CPU resources. For portable devices, DTM not only avoids thermal throttling but also reduces battery consumption. The DTM controller measures the core temperatures and accordingly selects the speed (operating frequency) of each core. The power consumed is a function of operating frequency and temperature, and the change in temperature is a function of the temperature and the dissipated power.

Dynamic voltage and frequency scaling (DVFS) is a DTM technique that changes the operating frequency of a core at run time (Wu et al., 2004). Clock gating (CG), or the stop-go technique, freezes all dynamic operations (Donald & Martonosi, 2006): CG turns off the clock signals to freeze progress until the thermal emergency is over. While dynamic operations are frozen, the processor state, including registers, branch predictor tables, and local caches, is maintained (Chaparro et al., 2007), so little dynamic power is consumed during the wait period. CG therefore acts more like a suspend or sleep switch than an off switch. Thread migration (TM), also known as core hopping, is a real-time OS-based DTM technique that reduces CPU temperature by migrating core tasks (threads) from an overheated core to a cooler one. Current traditional DTM controllers use proportional (P), proportional-integral (PI), or proportional-integral-derivative (PID) control to perform DVFS (Donald & Martonosi, 2006; Ogras et al., 2008).
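For comparison with the fuzzy approach developed below, here is a minimal sketch of the traditional PI-based DVFS loop, assuming a single-node RC thermal model of one core (`LumpedThermalModel`, hypothetical) in which power is linear in frequency at fixed voltage. All parameter values are illustrative only: with them, the core would settle near 95 °C at full speed, so the PI controller throttles it to hold an 80 °C setpoint. Conditional integration is used as a simple anti-windup measure; this is one common choice, not necessarily what the cited controllers use.

```python
from dataclasses import dataclass


@dataclass
class LumpedThermalModel:
    """Single-node RC thermal model of one core (all values hypothetical)."""
    r_th: float = 0.5        # junction-to-ambient thermal resistance, K/W
    c_th: float = 2.0        # lumped heat capacity, J/K
    t_amb: float = 45.0      # ambient temperature, degC
    k_dyn: float = 30.0      # dynamic power per GHz at fixed voltage, W/GHz
    p_static: float = 10.0   # static power, W

    def step(self, temp: float, freq_ghz: float, dt: float) -> float:
        """One forward-Euler step of C dT/dt = P - (T - Ta) / R."""
        power = self.k_dyn * freq_ghz + self.p_static
        dtemp_dt = (power - (temp - self.t_amb) / self.r_th) / self.c_th
        return temp + dt * dtemp_dt


def simulate_pi_dvfs(setpoint: float = 80.0, kp: float = 0.05, ki: float = 0.5,
                     f_min: float = 0.5, f_max: float = 3.0,
                     dt: float = 0.01, duration: float = 60.0):
    """PI-controlled DVFS; returns the final (temperature, frequency) pair."""
    model = LumpedThermalModel()
    temp = model.t_amb
    integral = f_max / ki          # start the integrator at full speed
    freq = f_max
    for _ in range(int(duration / dt)):
        error = setpoint - temp    # positive error = thermal headroom
        candidate = kp * error + ki * (integral + error * dt)
        freq = min(max(candidate, f_min), f_max)
        if f_min <= candidate <= f_max:
            integral += error * dt  # integrate only when unsaturated (anti-windup)
        temp = model.step(temp, freq, dt)
    return temp, freq


if __name__ == "__main__":
    temp, freq = simulate_pi_dvfs()
    print(f"final temperature {temp:.2f} degC at {freq:.2f} GHz")
```

Note that the controller needs no explicit transfer function, but its gains were chosen with this particular plant in mind; retuning is required when the chip or cooling system changes, which is the design-time cost the fuzzy approach aims to avoid.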

Fuzzy logic was introduced by Lotfi A. Zadeh in 1965 (Trabelsi et al., 2004). The traditional fuzzy set is two-dimensional (2D), with one dimension for the universe of discourse of the variable and the other for its membership degree. A 2D fuzzy logic controller (FC) can handle a nonlinear system without identifying the system transfer function, but a 2D fuzzy set cannot handle a system with spatially distributed parameters. A three-dimensional (3D) fuzzy set, by contrast, consists of a traditional fuzzy set plus an extra dimension for spatial information. Unlike the traditional 2D FC, the 3D FC uses multiple sensors to provide 3D fuzzy inputs, and it fuses these inputs into a "spatial membership function". The 3D rules are the same as 2D fuzzy rules, and the number of rules is independent of the number of spatial sensors. The computation of the 3D FC is suitable for real-world applications.

#### 5. DTM evaluation index

An evaluation index for the DTM controller outputs is required. By the definition of thermal throttling, "the operating frequency is reduced in order not to exceed the maximum temperature". Both frequency and temperature changes are monitored because the relation between CPU frequency and temperature is nonlinear. One DTM objective is to minimize frequency changes: in theory, a core should run at its open-loop frequency for higher utilization, but because of CPU thermal constraints, the core frequency is decreased depending on the core hotspot temperature.

The second DTM objective is to decrease the CPU temperature as much as possible without affecting CPU utilization. A multi-parameter evaluation index $\zeta_t$ is proposed. It is the sum of the evaluations of each parameter over a normalized time period and is based on the weighted sum method. The index exposes the effect of each parameter on the CPU response, so the designer can select the DTM controller that fulfils their requirements. In particular, it permits the selection of a DTM design that provides the best frequency parameter value without leading to the worst temperature parameter value. Calculating the DTM evaluation index $\zeta_t$ consists of 5 phases:

1. Identify the required parameters.
2. Identify the design parameter ranges.
3. Identify the desired value of each range, $\sigma_{ij}^{\text{Desired}}$.
4. Identify the actual value of each range, $\sigma_{ij}^{\text{Actual}}$.
5. Evaluate each parameter and the overall multi-parameter evaluation index:

$$\zeta_t = \sum_{i=1}^l \lambda_i \tag{1}$$

The value of parameter $\lambda_i$ during the evaluation time period is the sum of its range evaluations divided by the number of ranges $m_i$:

$$\lambda_i = \frac{1}{m_i} \sum_{j=1}^{m_i} \sigma_{ij} \tag{2}$$

Each range evaluation $\sigma_{ij}$ is computed over a normalized time period:

$$\sigma_{ij} = \frac{\sigma_{ij}^{\text{Actual}}}{\sigma_{ij}^{\text{Desired}}} \tag{3}$$

where $\sigma_{ij}^{\text{Actual}}$ is the actual percentage of time the CPU runs in range $j$ of parameter $i$, and $\sigma_{ij}^{\text{Desired}}$ is the desired percentage of time the CPU runs in that range.

The value of $\lambda_i$ should be 1 or near 1. If a range ratio $\sigma_{ij} < 1$, the CPU spends less time than desired within range $j$; if $\sigma_{ij} > 1$, it spends more time than desired within that range. Substituting (2) and (3) into (1) gives the multi-parameter evaluation index:

$$\zeta_t = \sum_{i=1}^l \frac{1}{m_i} \sum_{j=1}^{m_i} \left( \frac{\sigma_{ij}^{\text{Actual}}}{\sigma_{ij}^{\text{Desired}}} \right) \tag{4}$$

The desired value of the DTM controller evaluation index is $\zeta_t = l$, or near $l$, where $l$ is the number of parameters. The multi-parameter evaluation index permits the designer to evaluate each range independently of the other ranges and also to evaluate the overall DTM controller response.
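Equations (1) to (4) translate directly into a few lines of code. The sketch below is a plain transcription; the function names, the dictionary layout (parameter name mapped to its actual and desired percentages per range), and the desired split and frequency parameter in the example are hypothetical. Only the actual temperature split comes from the Fig.3 example.

```python
from typing import Dict, Sequence, Tuple


def range_ratio(actual: float, desired: float) -> float:
    """sigma_ij = actual / desired percentage of time in one range (eq. 3)."""
    if desired <= 0:
        raise ValueError("desired percentage must be positive")
    return actual / desired


def parameter_score(actual: Sequence[float], desired: Sequence[float]) -> float:
    """lambda_i = mean of the m_i range ratios of one parameter (eq. 2)."""
    if len(actual) != len(desired) or not actual:
        raise ValueError("need the same, non-zero number of actual and desired ranges")
    return sum(range_ratio(a, d) for a, d in zip(actual, desired)) / len(actual)


def evaluation_index(params: Dict[str, Tuple[Sequence[float], Sequence[float]]]) -> float:
    """zeta_t = sum of lambda_i over the l parameters (eqs. 1 and 4)."""
    return sum(parameter_score(actual, desired) for actual, desired in params.values())


if __name__ == "__main__":
    # Actual temperature split from the Fig.3 example; the desired split and the
    # frequency parameter are hypothetical placeholders.
    params = {
        "temperature": ([20.5, 76.0, 3.5], [10.0, 80.0, 10.0]),
        "frequency": ([30.0, 70.0], [30.0, 70.0]),
    }
    print(evaluation_index(params))  # compare against l = len(params) = 2
```

A range whose desired share is 0% makes (3) undefined, so the sketch rejects it; in practice every range included in the index needs a non-zero target.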

The multi-parameter evaluation index is flexible and accepts additional evaluation parameters, so the DTM controller designer can add or remove any parameter without changing the evaluation algorithm. Fig.3 shows an example of calculating parameter $\lambda_i$, where the parameter is temperature. The temperature curve is divided into 3 ranges, High (H), Medium (M), and Low (L), selected as follows: High "greater than 78 °C", Medium "between 74 °C and 78 °C", and Low "lower than 72 °C". The actual values of the ranges $\sigma_{ij}^{\text{Actual}}$ are: $\sigma_{i \text{ High}}^{\text{Actual}} = 20.5\%$, $\sigma_{i \text{ Medium}}^{\text{Actual}} = 76\%$, and $\sigma_{i \text{ Low}}^{\text{Actual}} = 3.5\%$.
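The actual percentages $\sigma_{ij}^{\text{Actual}}$ can be obtained from a sampled temperature trace by counting samples per range. The sketch below assumes equally spaced samples, so the fraction of samples equals the fraction of time. Since the three percentages in the example sum to 100%, it also assumes contiguous ranges and uses 74 °C as the Medium/Low boundary; the text lists Low as below 72 °C, so that boundary is exposed as the `low_below` parameter. The function name and the trace are hypothetical.

```python
from typing import Dict, Sequence


def time_in_ranges(samples: Sequence[float], high_above: float = 78.0,
                   low_below: float = 74.0) -> Dict[str, float]:
    """Percentage of equally spaced temperature samples in each range.

    High: T > high_above; Low: T < low_below; Medium: everything in between.
    """
    if not samples:
        raise ValueError("empty temperature trace")
    n = len(samples)
    high = sum(1 for t in samples if t > high_above)
    low = sum(1 for t in samples if t < low_below)
    medium = n - high - low
    return {"High": 100.0 * high / n, "Medium": 100.0 * medium / n, "Low": 100.0 * low / n}


if __name__ == "__main__":
    trace = [73.0, 75.5, 76.2, 77.9, 78.4, 79.1, 77.0, 76.5]  # hypothetical samples, degC
    print(time_in_ranges(trace))
```

The resulting percentages are what `parameter_score` style calculations of (2) and (3) take as their actual values.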

#### 6. Thermal spare core

Because a CPU is not 100% utilized all the time, some of its cores can be reserved for thermal crises. Consider Fig.4A: when a core reaches the steady-state temperature $T_1$, the cooling system is able to remove the generated heat from the chip. If the core overheats, however, the cooling system can no longer remove the heat, and the core temperature rises until it reaches the thermal throttling temperature $T_3$ (Rao & Vrudhula, 2007).

The same thermal phenomenon shown in Fig.4A also occurs when the cooling system develops faults (Ferreira et al., 2007). Semiconductor technology permits more cores to be added to the CPU chip, while the total chip area overhead is up to 27.9% according to ITRS (ITRS, 2009). This means no chip area is wasted by TSC, so reserving cores as thermal spare cores does not impact overall CPU utilization. These cores are not activated simultaneously because of thermal limitations. According to Amdahl's law, "parallel speedups limited by serial portions" (Gustafson, 1988), so adding more cores to a CPU chip does not yield a proportional speedup. As a result, not all cores are fully loaded, and some are not utilized at all if parallelism does not exist.

Fig. 3. Example of actual parameter value calculation

The TSC concept uses chip space that semiconductor technology already makes available. From the thermal point of view, the horizontal heat transfer path accounts for up to 30% of the CPU chip heat transfer (Stan et al., 2006). A TSC is a large coldspot within the CPU area that handles the horizontal heat transfer path. A cold TSC reduces static power because the TSC core is turned off. TSC can also be used simultaneously with other DTM techniques. Equation (5) calculates the number of TSCs, which depends on the number of cores per chip and the maximum power consumed per core:

$$N_{TSC} = \left| \frac{P_{mx} N_C - 198}{198} \right| \tag{5}$$

where $N_{TSC}$ is the minimum number of TSCs, $P_{mx}$ is the maximum power consumed per core, $N_C$ is the total number of cores, and 198 W is the thermal limit of the air cooling system. Fig.4A shows a core thermal profile: the lower curve is the normal thermal behavior and the upper curve is the overheated core. $T_1$ is the steady-state temperature; $T_1 = 80$ °C corresponds to the temperature at $t_1$. $t_2$ is the time required for a thermal spare core to take over the threads of the overheated core; $T_2 = 100$ °C corresponds to the temperature at $t_2$. $T_3$ is the throttling temperature; $T_3 = 120$ °C corresponds to the temperature at $t_3$.
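Equation (5) can be evaluated as follows. This is a sketch under two labeled assumptions: the vertical bars are read as rounding up to a whole number of cores (since $N_{TSC}$ is a minimum core count), and a chip whose total power is already within the 198 W limit needs no TSC. The function name and the example chip are hypothetical.

```python
import math

AIR_COOLING_LIMIT_W = 198.0  # ITRS (2006) high-performance heat-sink limit quoted above


def min_thermal_spare_cores(p_max_per_core_w: float, n_cores: int,
                            limit_w: float = AIR_COOLING_LIMIT_W) -> int:
    """Eq. (5): number of TSCs from the chip's excess power over the air cooling limit.

    Assumptions: no TSC is needed when the chip is within the limit, and the
    ratio is rounded up to a whole number of cores.
    """
    excess_w = p_max_per_core_w * n_cores - limit_w
    if excess_w <= 0:
        return 0
    return math.ceil(excess_w / limit_w)


if __name__ == "__main__":
    # Hypothetical 8-core chip at 30 W per core: 240 W total, 42 W over the limit.
    print(min_thermal_spare_cores(30.0, 8))  # -> 1
```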

TSC uses cores that already exist within the CPU chip to avoid CPU thermal throttling, in two forms:

- Hot TSC: a core within the CPU chip that is powered on but has its clock stopped. It consumes only static power and is a fast replacement core, but it is still a heat source.
- Cold TSC: a core within the CPU chip that is powered off, so it consumes neither dynamic nor static power. It is not a heat source, but it is a slow replacement core whose activation takes longer than that of a hot TSC. A cold TSC reduces static power dissipation, and it creates a relatively large coldspot that helps carry heat along the horizontal heat transfer path out of the chip.



Fig. 4. TSC Illustration

The TSC activation temperature $T_{tsc}$ and activation time $t_{tsc}$ are defined as follows:

$$T_{ss} \leq T_{tsc} \leq T_{th} \tag{6}$$

$$t_{tsc} = \min \{ (t_{th} - t_{CT}), (t_{th} - t_{TM}) \} \tag{7}$$

where $T_{ss}$ is the core steady-state temperature, $T_{tsc}$ is the temperature that triggers the TSC process, $T_{th}$ is the CPU throttling temperature, $t_{tsc}$ is the time of activating TSC, $t_{th}$ is the time required to reach thermal throttling, $t_{CT}$ is the estimated time required to complete the current tasks on the overheated core (this estimate is not always accurate at run time), and $t_{TM}$ is the time required to migrate threads from the overheated core to the TSC. If any core reaches $T_{tsc}$, the DTM controller informs the OS, which then stops assigning new tasks to the overheated core. $T_{tsc}$ is therefore not a predefined constant but a variable temperature between $T_{ss}$ and $T_{th}$; the DTM selects $T_{tsc}$ according to the minimum time required to evacuate the overheated core.
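A minimal sketch of (6) and (7) follows. The function names are hypothetical, and the timings and trigger temperature in the example are illustrative values, not results from the chapter; the steady-state and throttling temperatures reuse $T_1 = 80$ °C and $T_3 = 120$ °C from Fig.4A.

```python
def tsc_activation_time(t_th: float, t_ct: float, t_tm: float) -> float:
    """Eq. (7): t_tsc = min(t_th - t_CT, t_th - t_TM)."""
    return min(t_th - t_ct, t_th - t_tm)


def should_evacuate(core_temp: float, t_tsc: float, t_ss: float, t_th: float) -> bool:
    """True once a core reaches the TSC trigger temperature T_tsc.

    Eq. (6) requires T_ss <= T_tsc <= T_th.
    """
    if not t_ss <= t_tsc <= t_th:
        raise ValueError("T_tsc must lie between T_ss and T_th (eq. 6)")
    return core_temp >= t_tsc


if __name__ == "__main__":
    # Hypothetical timings in seconds: throttling in 10 s, current tasks need 6 s,
    # thread migration needs 2 s.
    print(tsc_activation_time(10.0, 6.0, 2.0))  # -> 4.0
    # T_ss = T_1 = 80 degC and T_th = T_3 = 120 degC from Fig.4A; T_tsc is hypothetical.
    print(should_evacuate(core_temp=101.0, t_tsc=100.0, t_ss=80.0, t_th=120.0))  # -> True
```

Because both terms of (7) share $t_{th}$, the minimum is simply $t_{th} - \max(t_{CT}, t_{TM})$: the TSC process has to start early enough for the slower of the two ways of emptying the overheated core.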

#### 6.1 TSC illustration

This section illustrates the thermal spare core (TSC) technique.

As shown in Fig.4B, the CPU is 100% utilized for about 50 seconds. From the OS point of view, the CPU is congested and executes its tasks slowly; in fact, the CPU is suffering from thermal throttling. This CPU utilization curve therefore shows congestion, as seen by the OS, caused by thermal limitations.

As shown in Fig.4C, the DTM controller detects the high CPU temperature and executes the TSC algorithm. At the 40-second mark, a TSC core replaces a hot core. The handover between the hot core and the TSC core leads to a CPU peak, but the CPU then speeds up, because the TSC is still relatively cold and operates at a higher frequency. At 86 seconds, the CPU reaches thermal throttling and becomes congested again. Activating one TSC core during the thermal crisis thus reduces the duration of CPU degradation from 50 seconds to 15 seconds.

As shown in Fig.4D, activating 3 TSC cores during the thermal crises at 25, 45, and 85 seconds, respectively, increases CPU utilization. The CPU executes its tasks normally without congestion, apart from a few CPU peaks. Because this CPU chip has many spare cores, the DTM controller can activate the required TSCs during each thermal crisis, so in theory the CPU avoids thermal throttling.

#### 6.2 3D Fuzzy DTM controller

3D fuzzy control is able to handle the correlation between the variables of a distributed parameter system (Li & Li, 2007), so 3D fuzzy logic can process the correlation information of a multi-core CPU. 3D fuzzy control has demonstrated its potential in a wide range of engineering applications and is feasible for real-world, real-time applications (Li & Li, 2007). Thermal management is a distributed parameter process, represented by nonlinear partial differential equations (Doumanidis & Fourligkas, 2001).



Fig. 5. Actuator *u* and the measurement sensors at *p* point.

Fig.5 presents a nonlinear distributed parameter system with one actuator ($\gamma = 1$). $p$ measurement sensors are located at $z_1, z_2, \dots, z_p$ in the one-dimensional space domain, and an actuator $u$ with some spatial distribution acts on the distributed process. The inputs are measurements from sensors at different spatial locations, i.e., the deviations $e_1, e_2, \dots, e_p$ and the deviation changes $\Delta e_1, \Delta e_2, \dots, \Delta e_p$, where $e_i = y_d(z_i) - y(z_i, n)$ and $\Delta e_i = e_i(n) - e_i(n-1)$. Here $y_d(z_i)$ denotes the desired value and $y(z_i, n)$ the measured value at location $z_i$, and $n$, $n-1$ denote the current and previous sampling instants. The input-output relationship is described by fuzzy rules extracted from knowledge. Thus $p$ sensors provide $2p$ inputs.
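The $2p$ controller inputs can be formed as below. This generic sketch (function name hypothetical) assumes the desired values $y_d(z_i)$ are fixed between samples; in the DTM setting, $y$ could be a core hotspot temperature and $y_d$ its target, with the values in the example being made up.

```python
from typing import List, Sequence, Tuple


def spatial_errors(desired: Sequence[float], measured_now: Sequence[float],
                   measured_prev: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return (e, delta_e) for p sensors.

    e_i(n) = y_d(z_i) - y(z_i, n) and delta_e_i = e_i(n) - e_i(n-1), giving 2p inputs.
    """
    if not len(desired) == len(measured_now) == len(measured_prev):
        raise ValueError("one desired and two measured values per sensor are required")
    e_now = [yd - y for yd, y in zip(desired, measured_now)]
    e_prev = [yd - y for yd, y in zip(desired, measured_prev)]
    delta_e = [a - b for a, b in zip(e_now, e_prev)]
    return e_now, delta_e


if __name__ == "__main__":
    # Hypothetical: two hotspot sensors with an 80 degC target.
    e, de = spatial_errors([80.0, 80.0], [78.0, 83.0], [77.0, 81.0])
    print(e, de)  # -> [2.0, -3.0] [-1.0, -2.0]
```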



Fig. 6. 3D fuzzy set (Li & Li, 2007)

A 3D fuzzy control system, referred to here as a 3D FC, is able to capture and process spatial domain information. One of its essential elements is the 3D fuzzy set, which is used to model 3D uncertainty. As shown in Fig.6, a 3D fuzzy set is developed from the traditional fuzzy set by adding a third dimension for spatial information. The 3D fuzzy set $\overline{V}$ defined on the universe of discourse $X$ and the one-dimensional space $Z$ is given by:

$$\overline{V} = \left\{ \left( (x,z), \mu_{\overline{V}}(x,z) \right) \mid \forall x \in X, \ z \in Z \right\}, \quad 0 \le \mu_{\overline{V}}(x,z) \le 1 \tag{8}$$

When $X$ and $Z$ are discrete, $\overline{V}$ is commonly written as $\overline{V} = \sum_{z \in Z} \sum_{x \in X} \mu_{\overline{V}}(x, z) / (x, z)$, where $\sum \sum$ denotes the union over all admissible $x$ and $z$. Using this 3D fuzzy set, a 3D fuzzy membership function (3D MSF) is developed to describe the relationship between the input $x$ and the spatial variable $z$ with the fuzzy grade $\mu$.



Fig. 7. 3D fuzzy system illustration (Li & Li, 2007)

Theoretically, the 3D fuzzy set, or 3D global fuzzy MSF, is the assembly of traditional 2D fuzzy sets at every spatial location (Li & Li, 2007). However, the complexity of this global 3D nature may make the FC difficult to develop. In practice, the 3D fuzzy MSF is approximated by a 2D fuzzy MSF at each sensing location. A centralized rule base is therefore more appropriate, as it avoids an exponential explosion of rules as the number of sensors grows. The new FC has the same basic structure as the traditional one: fuzzification, rule inference, and defuzzification, as shown in Fig.7A. Because of its 3D nature, however, some detailed operations differ from the traditional FC. Crisp inputs from the space domain are first transformed into one 3D fuzzy input via the 3D global fuzzy MSF. This 3D fuzzy input then goes through spatial information fusion and dimension reduction to become a traditional 2D fuzzy input. Finally, traditional fuzzy inference is carried out, and a crisp output is produced by traditional defuzzification. As with the traditional 2D FC, there are two kinds of fuzzification: singleton and non-singleton.

A singleton fuzzifier is selected as follows. Let $\overline{A}$ be a 3D fuzzy set, $x \in X$ a crisp input, and $z \in Z$ a point in the one-dimensional space $Z$. The singleton fuzzifier maps $x$ into $\overline{A}$ in $X$ at location $z$; $\overline{A}$ is a fuzzy singleton with support $x'$ if $\mu_{\overline{A}}(x,z) = 1$ for $x = x'$, $z = z'$ and $\mu_{\overline{A}}(x,z) = 0$ for all other $x \in X$, $z \in Z$ with $x \neq x'$, $z \neq z'$, when finite sensors are used. This 3D fuzzification can be regarded as the assembly of traditional 2D fuzzifications at each sensing location. Therefore, for $p$ discrete measurement sensors located at $z_1, z_2, \dots, z_p$, $x_z = [x_1(z), x_2(z), \dots, x_J(z)]$ is defined as the $J$ crisp spatial input variables in the space domain $Z = \{z_1, z_2, \dots, z_p\}$, where $x_j(z_i) \in X_j \subset \mathbb{R}$ $(j = 1, 2, \dots, J)$ denotes the crisp input of the spatial input variable $x_j(z)$ at the measurement location $z = z_i$, and $X_j$ denotes the domain of $x_j(z_i)$. The variable $x_j(z)$ is marked with "$z$" to distinguish it from an ordinary input variable, indicating that it is a spatial input variable. The fuzzification of each crisp spatial input variable $x_j(z)$ is uniformly expressed as one 3D fuzzy input $\overline{A}_{X_j}$ in discrete form as follows:

$$\overline{A}_{X_1} = \sum_{z \in Z} \sum_{x_1(z) \in X_1} \mu_{X_1}(x_1(z), z) / (x_1(z), z)$$

$$\vdots$$

$$\overline{A}_{X_J} = \sum_{z \in Z} \sum_{x_J(z) \in X_J} \mu_{X_J}(x_J(z), z) / (x_J(z), z)$$

Then, the fuzzification result of the $J$ crisp inputs $x_z$ can be represented by:

$$\overline{A}_X = \sum_{z \in Z} \sum_{x_1(z) \in X_1} \sum_{x_2(z) \in X_2} \cdots \sum_{x_J(z) \in X_J} \left\{ \mu_{X_1}(x_1(z), z) * \cdots * \mu_{X_J}(x_J(z), z) \right\} / \left\{ (x_1(z), z) * \cdots * (x_J(z), z) \right\} \tag{9}$$

where $*$ denotes a triangular norm (t-norm for short), a binary operation that plays the role of logical AND. It is also assumed that the membership function $\mu_{\overline{A}_X}$ is separable.

Using the 3D fuzzy set, the $\gamma^{th}$ rule in the rule base is expressed as follows:

$$\overline{R}^{\gamma}: \text{if } x_1(z) \text{ is } \overline{C}_1^{\gamma} \text{ and } \dots \text{ and } x_J(z) \text{ is } \overline{C}_J^{\gamma} \text{ then } u \text{ is } G^{\gamma} \tag{10}$$

where $\overline{R}^{\gamma}$ denotes the $\gamma^{th}$ rule, $\gamma = 1, 2, \dots, N$; $x_j(z)$ $(j = 1, 2, \dots, J)$ denotes a spatial input variable; $\overline{C}_j^{\gamma}$ denotes a 3D fuzzy set; $u \in U \subset \mathbb{R}$ denotes the control action; $G^{\gamma}$ denotes a traditional fuzzy set; and $N$ is the number of fuzzy rules. The inference engine of the 3D FC is expected to transform a 3D fuzzy input into a traditional fuzzy output, so it must be able to cope with spatial information. The inference of the 3D fuzzy DTM controller is designed as three operations: spatial information fusion, dimension reduction, and traditional inference. The inference process relies on 3D fuzzy set operations, including union, intersection, and complement. The fuzzy rule (10) represents a fuzzy relation $\overline{R}^{\gamma}: \overline{C}_1^{\gamma} \times \dots \times \overline{C}_J^{\gamma} \to G^{\gamma}$, $\gamma = 1, 2, \dots, N$; a traditional fuzzy set is generated by combining the 3D fuzzy input with the fuzzy relation represented by the rules.

Spatial information fusion is the first operation in the inference. It transforms the 3D fuzzy input $\overline{A}_X$ into a 3D set $\overline{W}^{\gamma}$ that appears as a 2D fuzzy spatial distribution at each input $x_z$. $\overline{W}^{\gamma}$ is defined by an extended sup-star composition of the input set and the antecedent sets. Fig.7B demonstrates spatial information fusion for two crisp inputs from the space domain $Z$, $x_z = [x_1(z), x_2(z)]$.

This spatial 3D MSF is produced by the extended sup-star operation on two input sets from singleton fuzzification and two antecedent sets in a discrete space $Z$ at each input value $x_z$. The extended sup-star composition of the input set and the antecedent sets of the rule is denoted by:

$$\overline{W}^{\gamma} = \overline{A}_X \circ \left( \overline{C}_1^{\gamma} \times \dots \times \overline{C}_J^{\gamma} \right) \tag{11}$$

The grade of the resulting 3D MSF is derived as:

$$\mu_{\overline{W}^{\gamma}}(z) = \mu_{\overline{A}_X \circ (\overline{C}_1^{\gamma} \times \dots \times \overline{C}_J^{\gamma})}(x_z, z) \tag{12}$$

$$\mu_{\overline{W}^{\gamma}}(z) = \sup_{x_1(z) \in X_1, \dots, x_J(z) \in X_J} \left[ \mu_{\overline{A}_X}(x_z, z) * \mu_{\overline{C}_1^{\gamma} \times \dots \times \overline{C}_J^{\gamma}}(x_z, z) \right]$$

where $z \in Z$ and $*$ denotes the t-norm operation. Because the membership functions are separable, this becomes:

$$\mu_{\overline{W}^{\gamma}}(z) = \left\{ \sup_{x_1(z) \in X_1} \left[ \mu_{\overline{A}_{X_1}}(x_1(z), z) * \mu_{\overline{C}_1^{\gamma}}(x_1(z), z) \right] \right\} * \dots * \left\{ \sup_{x_J(z) \in X_J} \left[ \mu_{\overline{A}_{X_J}}(x_J(z), z) * \mu_{\overline{C}_J^{\gamma}}(x_J(z), z) \right] \right\}$$
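With the singleton fuzzifier chosen above, the input grade is 1 only at the crisp value, so each supremum in the separable form is attained there and reduces to $\mu_{\overline{C}_j^{\gamma}}(x_j(z_i), z_i)$; the fused grade at each sensor is then the t-norm of these antecedent grades. The sketch below assumes, as the practical construction above suggests, that each 3D antecedent is approximated by one 2D MSF used at every sensor location. The triangular shape, the `min` default t-norm, the function names, and the example sets are all assumptions.

```python
from functools import reduce
from typing import Callable, List, Sequence

MembershipFn = Callable[[float], float]


def triangular(a: float, b: float, c: float) -> MembershipFn:
    """2D triangular membership function with feet a, c and peak b (a < b < c)."""
    def mu(x: float) -> float:
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu


def spatial_fusion(inputs: Sequence[Sequence[float]],
                   antecedents: Sequence[MembershipFn],
                   tnorm: Callable[[float, float], float] = min) -> List[float]:
    """mu_W(z_i) of one rule at every sensor location, for singleton fuzzification.

    inputs[j][i] is the crisp value of spatial input x_j at sensor z_i;
    antecedents[j] is the 2D MSF standing in for C_j at every location.
    """
    n_sensors = len(inputs[0])
    return [
        reduce(tnorm, (mf(inputs[j][i]) for j, mf in enumerate(antecedents)))
        for i in range(n_sensors)
    ]


if __name__ == "__main__":
    # Hypothetical rule: "e is positive and delta_e is about zero", two sensors.
    e, de = [2.0, -3.0], [-1.0, -2.0]
    rule = [triangular(0.0, 4.0, 8.0), triangular(-4.0, 0.0, 4.0)]
    print(spatial_fusion([e, de], rule))  # -> [0.5, 0.0]
```

Passing `tnorm=lambda a, b: a * b` switches to the product t-norm; the singleton reduction holds for any t-norm because $1 * \mu = \mu$ and $0 * \mu = 0$.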

The dimension reduction operation is to compress the spatial distribution information  $(x_z, \mu, z)$  into 2D information  $(x_z, \mu)$  as shown in Fig.7B. The set  $W^{\gamma}$  shows an approximate

fuzzy spatial distribution for each input  $x_z$  in which contains the physical information. The 3D set  $W^{\gamma}$  is simply regarded as a 2D spatial MSF on the plane  $(\mu, z)$  for each input  $x_z$ . Thus, the option to compress this 3D set  $W^{\gamma}$  into a 2D set  $\varphi^{\gamma}$  is approximately described as the overall impact of the spatial distribution with respect to the input  $x_z$ . The traditional

$$\mu_V^{\gamma}(u) = \varphi^{\gamma} * \mu_{G^{\gamma}}(u), \quad u \in U$$
(13)

Where \* stands for a t-norm,  $\mu_{G^{\gamma}}(u)$  is the membership grade of the consequent set of the fired rule  $\overline{R}^{\gamma}$ . Finally, the inference engine combines all the fired rules (14) .Where  $V_{\gamma}$  the output is fuzzy set of the fired rule  $\overline{R}^{\gamma}$ , N' denotes the number of fired rules and V denotes the composite output fuzzy set.

inference operation is the last operation in the inference. Where implication and rules'

combination are similar to those in the traditional inference engine.

$$V = \bigcup_{\gamma=1}^{N'} V_{\gamma} \tag{14}$$
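The following is a minimal sketch of the implication (13) and the combination (14) on a discretized output universe. It assumes the minimum as the t-norm and the pointwise maximum as the union. The output universe, the two consequent sets and the firing strengths $\varphi^{\gamma}$ are hypothetical, and the sketch does not reproduce the dimension-reduction step that produces $\varphi^{\gamma}$.

```python
import numpy as np


def implicate(phi, mu_g):
    """Eq. (13) with the minimum t-norm: clip consequent G^gamma at phi^gamma."""
    return np.minimum(phi, mu_g)


def combine(rule_outputs):
    """Eq. (14): union of the fired-rule sets V_gamma as a pointwise maximum."""
    return np.max(np.stack(rule_outputs), axis=0)


u = np.linspace(0.0, 1.0, 11)                 # hypothetical normalized frequency axis U
mu_g_low = np.clip(1.0 - 2.0 * u, 0.0, 1.0)   # hypothetical "low" consequent set
mu_g_high = np.clip(2.0 * u - 1.0, 0.0, 1.0)  # hypothetical "high" consequent set
phis = [0.3, 0.8]                             # hypothetical firing strengths
v = combine([implicate(p, g) for p, g in zip(phis, [mu_g_low, mu_g_high])])
print(np.round(v, 2))
```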

Traditional defuzzification is used to produce a crisp output. The center of area (COA) is chosen as the defuzzifier because it is simple to compute (Yager et al., 1994):

$$u = \frac{\sum_{\gamma=1}^{N'} C^{\gamma}\, \varphi^{\gamma}}{\sum_{\gamma=1}^{N'} \varphi^{\gamma}} \tag{15}$$

where $C^{\gamma} \in U$ is the centroid of the consequent set $G^{\gamma}$ in (13) of the fired rule $\overline{R}^{\gamma}$ $(\gamma = 1, 2, \dots, N')$, and $N' \leq N$ is the number of fired rules.
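Equation (15) is a firing-strength-weighted mean of the consequent centroids. A short sketch follows. The two centroids and strengths in the usage line are hypothetical normalized values, not taken from the chapter.

```python
from typing import Sequence


def coa_defuzzify(centroids: Sequence[float], strengths: Sequence[float]) -> float:
    """Eq. (15): weighted mean of the consequent centroids C^gamma.

    The weights are the firing strengths phi^gamma of the N' fired rules.
    """
    if len(centroids) != len(strengths):
        raise ValueError("one firing strength per centroid is required")
    total = sum(strengths)
    if total <= 0:
        raise ValueError("no rule fired (all strengths are zero)")
    return sum(c * s for c, s in zip(centroids, strengths)) / total


# Hypothetical: two fired rules with normalized output-frequency centroids.
print(coa_defuzzify([0.2, 0.9], [0.3, 0.8]))  # (0.06 + 0.72) / 1.1 ≈ 0.709
```

Only the centroids and the firing strengths enter the computation. This is why the chapter describes COA as simple to compute.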

In a multi-core CPU system, each core is considered a heat source. By (16), the heat $Q$ conducted along a path is inversely proportional to the distance between the heat sources. The nearest hotspot therefore has the largest effect on a core's temperature increase, and the farthest hotspot has the smallest.

$$Q = \frac{\sigma A \, \Delta T}{d} \tag{16}$$

where $Q$ is the heat conducted, $\sigma$ the thermal conductivity, $A$ the cross-sectional area of the heat path (a constant value), $\Delta T$ the temperature difference between the hotspot locations, and $d$ the length of the heat path (the distance between the heat sources). Accordingly, the 3D MSF gain $G_{ij}$ is selected as the inverse of the distance between the hotspot locations of the two cores.
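Equation (16) is Fourier's law for one-dimensional conduction. The short sketch below shows the inverse-distance dependence that motivates choosing $G_{ij}$ as an inverse distance: doubling $d$ halves $Q$. The values are illustrative only and are not taken from the chapter. They assume a silicon conductivity of about 150 W/(m·K), a hypothetical path cross-section and a hypothetical temperature difference.

```python
def conducted_heat(sigma_w_per_mk, area_m2, delta_t_k, distance_m):
    """Eq. (16): Q = sigma * A * dT / d, the heat flow in watts."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return sigma_w_per_mk * area_m2 * delta_t_k / distance_m


SIGMA_SI = 150.0  # approximate thermal conductivity of silicon, W/(m*K)
AREA = 1e-7       # hypothetical cross-section: 100 um thick x 1 mm wide, m^2
DELTA_T = 10.0    # hypothetical hotspot temperature difference, K

near = conducted_heat(SIGMA_SI, AREA, DELTA_T, 2e-3)  # hotspots 2 mm apart
far = conducted_heat(SIGMA_SI, AREA, DELTA_T, 4e-3)   # hotspots 4 mm apart
print(f"near: {near * 1e3:.1f} mW, far: {far * 1e3:.1f} mW")  # near: 75.0 mW, far: 37.5 mW
```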

$$MSF_{3D} = \sum MSF_{2D}G_{ij} \tag{17}$$

where $MSF_{2D}$ is the 2D MSF and $G_{ij}$ is the correlation gain between core i and core j. $G_{ij}$ is not constant, because the hotspot locations change at run time. The maximum gain, 1, is obtained for the local correlation gain $G_{ii}$.
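The following sketch computes the correlation gains and evaluates (17) for a hypothetical 2×2 core layout. Several details are assumptions: the hotspot coordinates, the distance unit (core pitches), and the cap of 1 on $G_{ij}$. They are chosen so that only the local gain $G_{ii}$ reaches the maximum of 1. The chapter does not say whether the sum in (17) is normalized, so the sketch leaves normalization as an option.

```python
import numpy as np


def correlation_gains(hotspots):
    """G_ij = 1 / d_ij (d in core pitches, an assumption), with G_ii = 1."""
    pts = np.asarray(hotspots, dtype=float)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    with np.errstate(divide="ignore"):
        gains = np.minimum(1.0, 1.0 / d)  # cap keeps every gain <= 1
    np.fill_diagonal(gains, 1.0)  # local gain G_ii is the maximum, 1
    return gains


def msf_3d(mu_2d, gains, normalize=False):
    """Eq. (17): grade for core i = sum_j G_ij * MSF_2D(core j)."""
    mu = gains @ np.asarray(mu_2d, dtype=float)
    return mu / gains.sum(axis=1) if normalize else mu


# Hypothetical hotspot locations (in core pitches) and 2D MSF grades.
hotspots = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
mu_2d = [0.9, 0.2, 0.2, 0.1]
G = correlation_gains(hotspots)
print(np.round(G, 3))
print(np.round(msf_3d(mu_2d, G, normalize=True), 3))
```

Because the hotspot locations move at run time, a controller following this scheme would recompute `G` whenever the hotspot positions are updated.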

The 3D FC is based on 32 variables, as follows (Yager et al., 1994):

- Inputs: at step n, the 3D fuzzy input variables for each core are the 8 frequency deviation variables calculated as in (3).
- Output: for each core, the output is the core operating frequency at step n+1.
- Relationships: at step n, the CPU throughput is proportional to the cores' operating frequencies. The core operating frequency is in turn proportional to the power consumption, and the maximum power consumption leads to the maximum temperature increase.

To compare the 2D FC and 3D FC responses, the 3D FC reuses the 2D FC configuration. This means the same control objectives, the same fuzzy inputs, the same meta-decision rules, the same rule space, and the same normal-distribution input 2D MSF configurations. The output membership functions are tuned per DTM controller. In general, there are four output MSFs: Max, DVFS, TSC and FS. Thus the only design difference between the 2D FC and the 3D FC is that the 3D FC DTM takes into account the hotspot temperatures and operating frequencies of the surrounding cores. Fig.8 shows the 3D fuzzy DTM controller implementation.
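The input 2D MSFs use a normal-distribution (Gaussian) shape. A minimal sketch of such a membership function is given below. The means and standard deviation are hypothetical. The second set shows the effect of shifting the mean by 0.2 while keeping the standard deviation, which is the kind of output-MSF tuning discussed for 3D-FC5 and 3D-FC6 in the simulation results.

```python
import math


def gaussian_msf(x: float, mean: float, std: float) -> float:
    """Gaussian membership grade: 1 at x == mean, decaying with distance."""
    if std <= 0:
        raise ValueError("std must be positive")
    return math.exp(-0.5 * ((x - mean) / std) ** 2)


# Hypothetical sets on a normalized frequency-deviation axis.
for x in (-0.4, -0.2, 0.0, 0.2, 0.4):
    base = gaussian_msf(x, mean=0.0, std=0.2)
    shifted = gaussian_msf(x, mean=0.2, std=0.2)  # same std, mean + 0.2
    print(f"x={x:+.1f}  base={base:.3f}  shifted={shifted:.3f}")
```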

#### 3D-Fuzzy Example:

Consider $p = 5$ discrete measurement sensors located at $z_1, z_2, \dots, z_5$ in the one-dimensional space domain $Z = \{z_1, z_2, \dots, z_5\}$, and $J = 2$ crisp spatial input variables $x_z = [x_1(z), x_2(z)]$, where $x_j(z_i) \in X_j \subset \mathbb{R}$ $(j = 1, 2)$. Each crisp spatial input variable $x_j(z)$ is fuzzified into a 3D fuzzy input, $\overline{A}_{x1}$ and $\overline{A}_{x2}$ respectively, in discrete form. Table 1 lists three sets of values at each location $z$, as shown in Fig.7B:

- The $\mu_1$ values are the local substitutions of $x_1(z)$ into the 2D MSF.
- The $\mu_2$ values are the corresponding substitutions of $x_2(z)$.
- The $\mu_{W^1}$ values are the sup-star composition of $\mu_1$ and $\mu_2$.

With the minimum t-norm, the sup-star composition in the fuzzy inference engine becomes a sup-min composition, so each $\mu_{W^1}$ entry is the smaller of $\mu_1$ and $\mu_2$.

| $x_1(z)$ | $x_2(z)$ | $z$  | $\mu_1$ | $\mu_2$ | $\mu_{W^1}$ |
|----------|----------|------|---------|---------|-------------|
| -0.5     | -0.6     | 0.0  | 0.8     | 0.4     | 0.4         |
| 0.0      | 0.2      | 0.5  | 0.8     | 0.9     | 0.8         |
| 0.3      | 0.1      | 0.25 | 0.9     | 1.0     | 0.9         |
| 0.7      | 0.0      | 0.75 | 0.6     | 0.7     | 0.6         |
| 0.2      | -0.1     | 1.0  | 0.8     | 0.3     | 0.3         |

Table 1. 3D fuzzy example with two crisp inputs
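The arithmetic of Table 1 can be checked in a few lines of Python. The $\mu_1$ and $\mu_2$ values are already the local substitutions of the crisp readings, so the sup-min composition at each location reduces to taking the smaller of the two grades. In this sketch, the third-row $\mu_2$ value is read as 1.0.

```python
# mu_1 and mu_2 columns of Table 1 (third-row mu_2 read as 1.0).
z = [0.0, 0.5, 0.25, 0.75, 1.0]
mu_1 = [0.8, 0.8, 0.9, 0.6, 0.8]
mu_2 = [0.4, 0.9, 1.0, 0.7, 0.3]

# Minimum t-norm: the sup-min composition reduces to min(mu_1, mu_2) per location.
mu_w1 = [min(a, b) for a, b in zip(mu_1, mu_2)]
for zi, grade in zip(z, mu_w1):
    print(f"z={zi:<4}  mu_W1={grade}")
print(mu_w1)  # [0.4, 0.8, 0.9, 0.6, 0.3]
```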

#### 7. Simulation results

Simulation is used to validate the designed 3D fuzzy DTM controller. The CPU chip was selected based on the amount of published information. The IBM POWER processor family was chosen because its published information includes the floor plan, thermal design power (TDP), technology, chip area and operating frequencies. The IBM POWER4 MCM chip is the selected chip. The floor plans of the POWER4 processor and the MCM are published as pictures. Processor manufacturers treat the CPU floor plan and its power density map as confidential data, so it is very difficult to build a thermal model from real CPU chip information. Only thermal data for old CPU chips is published. The MCM POWER4 floor plan and power density map are available, so the only way to build a CPU thermal model was to reverse engineer the IBM MCM POWER4 chip (Fig.9). The reverse engineering process took considerable time and effort. POWER4 is built on the older 90nm technology, so the extracted MCM POWER4 chip was scaled to 45nm technology (Sinharoy et al., 2005).

Fig. 8. 3D-Fuzzy controller block diagram



Fig. 9. The extracted IBM POWER4 MCM floor plan
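The chapter does not detail how the extracted floor plan was scaled from 90nm to 45nm. As an illustration only, the sketch below applies an ideal linear shrink with factor 45/90, so each block's area falls to a quarter. It operates on a floor plan in HotSpot's `.flp` text layout. That layout is assumed here to be one block per line, with a name followed by width, height, left-x and bottom-y in meters, and any further columns are passed through unchanged. The block name and dimensions in the example are hypothetical.

```python
SCALE = 45.0 / 90.0  # assumed ideal linear shrink from 90nm to 45nm


def scale_flp_line(line: str, scale: float = SCALE) -> str:
    """Scale the geometry of one .flp line; comments and blank lines pass through."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return line
    fields = stripped.split()
    name, geometry, extra = fields[0], fields[1:5], fields[5:]
    scaled = [f"{float(value) * scale:.6e}" for value in geometry]
    return "\t".join([name, *scaled, *extra])


def scale_flp(text: str, scale: float = SCALE) -> str:
    """Scale every block of a floor plan given as .flp text."""
    return "\n".join(scale_flp_line(ln, scale) for ln in text.splitlines())


example = """# hypothetical block: name width height left-x bottom-y (meters)
core0\t0.004\t0.002\t0.0\t0.001"""
print(scale_flp(example))
```

This sketch changes geometry only. It does not rescale power density, which a real technology migration would also need to address.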

The HotSpot simulator from the University of Virginia was selected for its features and for the online support provided by the HotSpot team. HotSpot 5 uses the duality between RC circuits and thermal systems to model heat transfer in silicon. It solves the differential equations that govern the thermal RC circuit's operation with a fourth-order Runge-Kutta numerical approximation (LAVA, 2009).
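HotSpot solves a full thermal RC network. As a reduced illustration of the same idea, the sketch below integrates a single lumped RC node, $C\,dT/dt = P - (T - T_{amb})/R$, with the classical fourth-order Runge-Kutta method. All parameter values are hypothetical and do not represent the chapter's model. With these values, the node settles at $T_{amb} + PR = 75$ °C with time constant $RC = 10$ s.

```python
import math


def rk4_step(f, t, y, dt):
    """One classical fourth-order Runge-Kutta step for dy/dt = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + dt / 2, y + dt * k1 / 2)
    k3 = f(t + dt / 2, y + dt * k2 / 2)
    k4 = f(t + dt, y + dt * k3)
    return y + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6


def rc_node(power_w, r_k_per_w, c_j_per_k, t_amb_c):
    """Right-hand side of C dT/dt = P - (T - T_amb) / R for one thermal node."""
    def d_temp_dt(_t, temp_c):
        return (power_w - (temp_c - t_amb_c) / r_k_per_w) / c_j_per_k
    return d_temp_dt


def simulate(f, temp0_c, dt_s, steps):
    """Integrate from temp0_c and return the temperature at every step."""
    temps, t = [temp0_c], 0.0
    for _ in range(steps):
        temps.append(rk4_step(f, t, temps[-1], dt_s))
        t += dt_s
    return temps


# Hypothetical node: 60 W, R = 0.5 K/W, C = 20 J/K, ambient 45 C.
f = rc_node(60.0, 0.5, 20.0, 45.0)
temps = simulate(f, 45.0, 0.1, 1000)  # 100 s of simulated time
crossing = next(i for i, temp in enumerate(temps) if temp >= 70.0) * 0.1
print(f"final {temps[-1]:.2f} C, reaches 70 C after ~{crossing:.1f} s")
print(f"analytic crossing: {10.0 * math.log(6.0):.1f} s")
```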

#### 7.1 Simulation analysis

All simulations start at 814 seconds, because the CPU thermal model requires 814 seconds to reach $T_{Control}$ = 70 °C. The CPU response is assumed to follow the open-loop curve until it reaches 70 °C. At $T_{Control}$, the DTM controller output selects the cores' operating frequencies, and each core temperature then changes according to its operating frequency. All fuzzy DTM designs are tuned through their output membership functions (MSF) only, without changing the fuzzy rules. The DTM evaluation index covers the simulation time from 814 seconds to 1014 seconds. The simulation tests 3D-FC1, FC1, 3D-FC2, FC2, 3D-FC3 and FC3 perform both DVFS and TSC. FC4, 3D-FC4, 3D-FC5 and 3D-FC6 perform DVFS only.

Here the DTM controller evaluation index (4) has only two parameters ($l = 2$), frequency and temperature. Its desired value is $\zeta_t = 2$ or close to 2. This section presents two implementations of the DTM evaluation index. The first implementation assumes that the CPU is required to run 20% of its time at the maximum frequency, 50% at high frequency, 20% at medium frequency and 10% at low frequency. The CPU is also required to spend 30% of its time at high temperature, 40% at medium temperature and 30% at low temperature. The DTM controller designs are evaluated against this first requirement as follows. Table 2 shows the percentage of time the CPU operates in each frequency range.


Table 3 shows the percentage of time the CPU operates in each temperature range. The best results are highlighted in bold. As shown in Table 4, the DTM evaluation index selects FC3 and 3D-FC6 as the best DTM controller designs. Only these two controllers score well on both the frequency and the temperature evaluation indexes. As shown in Fig.10A, the frequency responses of both controllers oscillate throughout, but 3D-FC6 has fewer frequency oscillations with smaller amplitudes. The FC3 controller operates at maximum frequency and is then switched off between 1014 and 1100 seconds. The 3D-FC6 controller is never switched off and operates in the high frequency range, though not at the maximum frequency. The temperatures of both controllers oscillate. As shown in Fig.10B, the 3D-FC6 controller has its minimum temperature amplitudes at 970 and 1070 seconds, and it always operates at a lower temperature than the FC3 controller. Thus the 3D-FC6 controller is better than the FC3 controller.

For the second implementation, Table 5, Table 6 and Table 7 show that only the FC4, 3D-FC3 and 3D-FC6 controllers score well on both the frequency and the temperature evaluation indexes. As shown in Fig.10 A, C, E, the frequency responses of all these DTM controllers oscillate throughout. The 3D-FC6 controller has the fewest frequency oscillations. The 3D-FC3 controller has the smallest frequency-change amplitudes and operates in the high frequency range, though not at the maximum frequency. As shown in Fig.10 B, D, F, the temperatures of all the controllers increase. The 3D-FC6 temperature oscillates and has its minimum amplitudes at 970 and 1070 seconds, but no controller has a large temperature advantage over the others. Thus 3D-FC3 is better than both FC4 and 3D-FC6, because it operates at higher frequency ranges within almost the same temperature ranges.

Several observations can be drawn from these two DTM evaluation index implementations:

- 3D-FC5 vs. 3D-FC6: in the first implementation, the DTM evaluation indexes of both controllers are almost the same from the frequency point of view. Their DVFS membership functions (MSF) have the same standard deviation, but the mean is shifted by 0.2. This shift changes the frequency objective insignificantly but lowers the CPU temperature. In the second implementation, the DTM evaluation index values are completely different. Thus the similarity between two DTM controller responses under one DTM design objective is not preserved under another design objective.
- 2D fuzzy vs. 3D fuzzy: these DTM controllers share the same input and output membership functions. The correlation between the CPU cores has a significant effect, e.g. FC1 vs. 3D-FC1 and FC3 vs. 3D-FC3. For FC2 vs. 3D-FC2, however, the correlation has almost no effect in either implementation. This means that an improper choice of membership functions can cancel out the correlation effect between the CPU cores.
- (TSC+DVFS) vs. (DVFS alone): the DTM temperature design objectives can be fulfilled either by TSC+DVFS or by DVFS alone, e.g. 3D-FC3 vs. 3D-FC4. The driver for combining TSC with DVFS is the CPU thermal throttling limit. If DVFS alone can fulfil the temperature design objective, there is no need to combine TSC with DVFS.



Fig. 10. The Simulation Results

| Controller | M, j=1 (%) | H, j=2 (%) | m, j=3 (%) | L, j=4 (%) | $\sigma_{11}$ | $\sigma_{12}$ | $\sigma_{13}$ | $\sigma_{14}$ | $\lambda_1$ |
|---|---|---|---|---|---|---|---|---|---|
| $\sigma_{1j}^{\text{Desired}}$ | 20% | 50% | 20% | 10% | 1.0 | 1.0 | 1.0 | 1.0 | 1.00 |
| Switch | 0% | 100% | 0% | 0% | 0.0 | 2.0 | 0.0 | 0.0 | 0.500 |
| P | 56% | 0% | 22% | 22% | 2.7 | 0.0 | 1.1 | 2.2 | 1.528 |
| FC1 | 12% | 22% | 44% | 22% | 0.5 | 0.4 | 2.2 | 2.2 | 1.315 |
| 3D-FC1 | 0% | 56% | 33% | 11% | 0.0 | 1.1 | 1.7 | 1.1 | 0.972 |
| FC2 | 0% | 100% | 0% | 0% | 0.0 | 2.0 | 0.0 | 0.0 | 0.500 |
| 3D-FC2 | 0% | 89% | 11% | 0% | 0.0 | 1.8 | 0.6 | 0.0 | 0.123 |
| FC3 | 22% | 22% | 56% | 0% | 1.1 | 0.4 | 2.8 | 0.0 | 1.083 |
| 3D-FC3 | 0% | 78% | 22% | 0% | 0.0 | 1.6 | 1.1 | 0.0 | 0.667 |
| FC4 | 0% | 66% | 33% | 0% | 0.0 | 1.3 | 1.7 | 0.0 | 0.750 |
| 3D-FC4 | 22% | 56% | 22% | 0% | 1.1 | 1.1 | 1.1 | 0.0 | 0.833 |
| 3D-FC5 | 0% | 56% | 33% | 11% | 0.0 | 1.1 | 1.7 | 1.1 | 0.972 |
| 3D-FC6 | 0% | 78% | 0% | 22% | 0.0 | 1.6 | 0.0 | 2.2 | 0.944 |

Table 2. The frequency comparisons of the first implementation

| Controller | H, j=1 (%) | m, j=2 (%) | L, j=3 (%) | $\sigma_{21}$ | $\sigma_{22}$ | $\sigma_{23}$ | $\lambda_2$ |
|---|---|---|---|---|---|---|---|
| $\sigma_{2j}^{\text{Desired}}$ | 30% | 40% | 30% | 1.0 | 1.0 | 1.0 | 1.00 |
| Switch | 0% | 100% | 0% | 0.0 | 2.5 | 0.0 | 0.83 |
| P | 78% | 0% | 22% | 2.6 | 0.0 | 0.7 | 1.11 |
| FC1 | 11% | 89% | 0% | 0.4 | 2.2 | 0.0 | 0.86 |
| 3D-FC1 | 22% | 78% | 0% | 0.7 | 1.9 | 0.0 | 0.90 |
| FC2 | 67% | 33% | 0% | 2.2 | 0.8 | 0.0 | 1.02 |
| 3D-FC2 | 56% | 44% | 0% | 1.8 | 1.1 | 0.0 | 0.99 |
| FC3 | 67% | 33% | 0% | 2.2 | 0.8 | 0.0 | 1.02 |
| 3D-FC3 | 33% | 67% | 0% | 1.1 | 1.7 | 0.0 | 0.93 |
| FC4 | 44% | 56% | 0% | 1.5 | 1.4 | 0.0 | 0.96 |
| 3D-FC4 | 33% | 67% | 0% | 1.1 | 1.7 | 0.0 | 0.93 |
| 3D-FC5 | 0% | 100% | 0% | 0.0 | 2.5 | 0.0 | 0.83 |
| 3D-FC6 | 33% | 56% | 11% | 1.1 | 1.4 | 0.4 | 0.96 |

Table 3. The temperature comparisons of the first implementation
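The percentages in Tables 2 and 3 are shares of the sampled time spent in each range. The $\sigma$ values appear to equal the measured share divided by the desired share. For example, 33%/20% ≈ 1.7 and 11%/10% ≈ 1.1 for 3D-FC1 in Table 2, and 100%/40% = 2.5 for Switch in Table 3. A minimal sketch of that bookkeeping follows. The nine-sample trace is hypothetical, and the range labels follow the tables (M, H, m, L). The sketch does not reproduce the aggregation of the $\sigma$ values into $\lambda$ by index (4).

```python
from collections import Counter


def time_shares(samples, labels):
    """Percentage of samples falling in each range label."""
    counts = Counter(samples)
    return {label: 100.0 * counts[label] / len(samples) for label in labels}


def sigma_ratios(shares, desired):
    """Measured share divided by desired share for each range."""
    return {label: shares[label] / desired[label] for label in desired}


# Hypothetical trace: the frequency range selected at each of nine samples.
trace = ["H", "H", "H", "H", "H", "m", "m", "m", "L"]
desired = {"M": 20.0, "H": 50.0, "m": 20.0, "L": 10.0}  # first implementation

shares = time_shares(trace, desired)
sigma = sigma_ratios(shares, desired)
for label in desired:
    print(f"{label}: {shares[label]:5.1f}%  sigma={sigma[label]:.2f}")
```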

| Controller | Frequency index $\lambda_1$ | Temperature index $\lambda_2$ | Evaluation index $\zeta_t$ |
|---|---|---|---|
| Desired | 1.00 | 1.00 | 2.00 |
| Switch | 0.500 | 0.83 | 1.33 |
| P | 1.528 | 1.11 | 2.64 |
| FC1 | 1.315 | 0.86 | 2.23 |
| 3D-FC1 | 0.972 | 0.90 | 1.87 |
| FC2 | 0.500 | 1.02 | 1.52 |
| 3D-FC2 | 0.123 | 0.99 | 1.11 |
| FC3 | 1.083 | 1.02 | 2.10 |
| 3D-FC3 | 0.667 | 0.93 | 1.13 |
| FC4 | 0.750 | 0.96 | 1.71 |
| 3D-FC4 | 0.833 | 0.93 | 1.76 |
| 3D-FC5 | 0.972 | 0.83 | 1.81 |
| 3D-FC6 | 0.944 | 0.96 | 1.90 |

Table 4. The DTM evaluation index of the first implementation
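The desired value of $\zeta_t$ is 2, so one simple way to read Table 4 is to rank the controllers by their distance from 2. The sketch below does this with the $\zeta_t$ column of Table 4. Treating the absolute distance as the ranking criterion is an assumption, but it reproduces the selection of FC3 and 3D-FC6 stated in the text.

```python
# Evaluation index zeta_t from Table 4 (first implementation).
ZETA_T = {
    "Switch": 1.33, "P": 2.64, "FC1": 2.23, "3D-FC1": 1.87,
    "FC2": 1.52, "3D-FC2": 1.11, "FC3": 2.10, "3D-FC3": 1.13,
    "FC4": 1.71, "3D-FC4": 1.76, "3D-FC5": 1.81, "3D-FC6": 1.90,
}
DESIRED = 2.0


def rank_by_distance(index, desired=DESIRED):
    """Sort controllers by |zeta_t - desired|, closest first."""
    return sorted(index.items(), key=lambda item: abs(item[1] - desired))


for name, value in rank_by_distance(ZETA_T)[:4]:
    print(f"{name:7s} zeta_t={value:.2f}  |zeta_t - 2|={abs(value - DESIRED):.2f}")
```

The text also weighs the frequency and temperature indexes separately. This single-number ranking is therefore only a first filter, not a replacement for that discussion.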

| Controller | M, j=1 (%) | H, j=2 (%) | m, j=3 (%) | L, j=4 (%) | $\sigma_{11}$ | $\sigma_{12}$ | $\sigma_{13}$ | $\sigma_{14}$ | $\lambda_1$ |
|---|---|---|---|---|---|---|---|---|---|
| $\sigma_{1j}^{\text{Desired}}$ | 10% | 70% | 10% | 10% | 1.0 | 1.0 | 1.0 | 1.0 | 1.00 |
| Switch | 0% | 100% | 0% | 0% | 0.0 | 1.4 | 0.0 | 0.0 | 0.311 |
| P | 56% | 0% | 22% | 22% | 5.6 | 0.0 | 2.2 | 2.2 | 2.500 |
| FC1 | 12% | 22% | 44% | 22% | 1.1 | 0.3 | 4.4 | 2.2 | 2.024 |
| 3D-FC1 | 0% | 56% | 33% | 11% | 0.0 | 0.8 | 3.3 | 1.1 | 1.309 |
| FC2 | 0% | 100% | 0% | 0% | 0.0 | 1.4 | 0.0 | 0.0 | 0.311 |
| 3D-FC2 | 0% | 89% | 11% | 0% | 0.0 | 1.3 | 1.1 | 0.0 | 0.135 |
| FC3 | 22% | 22% | 56% | 0% | 2.2 | 0.3 | 5.6 | 0.0 | 2.024 |
| 3D-FC3 | 0% | 78% | 22% | 0% | 0.0 | 1.1 | 2.2 | 0.0 | 0.833 |
| FC4 | 0% | 67% | 33% | 0% | 0.0 | 0.9 | 3.3 | 0.0 | 1.071 |
| 3D-FC4 | 22% | 56% | 22% | 0% | 2.2 | 0.8 | 2.2 | 0.0 | 1.309 |
| 3D-FC5 | 0% | 56% | 33% | 11% | 0.0 | 0.8 | 3.3 | 1.1 | 1.309 |
| 3D-FC6 | 0% | 78% | 0% | 22% | 0.0 | 1.1 | 0.0 | 2.2 | 0.833 |

Table 5. The frequency comparisons of the second implementation

| Controller | H, j=1 (%) | m, j=2 (%) | L, j=3 (%) | $\sigma_{21}$ | $\sigma_{22}$ | $\sigma_{23}$ | $\lambda_2$ |
|---|---|---|---|---|---|---|---|
| $\sigma_{2j}^{\text{Desired}}$ | 30% | 40% | 30% | 1.0 | 1.0 | 1.0 | 1.00 |
| Switch | 0% | 100% | 0% | 0.0 | 2.0 | 0.0 | 0.67 |
| P | 78% | 0% | 22% | 3.9 | 0.0 | 0.7 | 1.54 |
| FC1 | 11% | 89% | 0% | 0.6 | 1.8 | 0.0 | 0.78 |
| 3D-FC1 | 22% | 78% | 0% | 1.1 | 1.6 | 0.0 | 0.89 |
| FC2 | 67% | 33% | 0% | 3.3 | 0.7 | 0.0 | 1.33 |
| 3D-FC2 | 56% | 44% | 0% | 2.8 | 0.9 | 0.0 | 1.22 |
| FC3 | 67% | 33% | 0% | 3.3 | 0.7 | 0.0 | 1.33 |
| 3D-FC3 | 33% | 67% | 0% | 1.7 | 1.3 | 0.0 | 1.00 |
| FC4 | 44% | 56% | 0% | 2.2 | 1.1 | 0.0 | 1.11 |
| 3D-FC4 | 33% | 67% | 0% | 1.7 | 1.3 | 0.0 | 1.00 |
| 3D-FC5 | 0% | 100% | 0% | 0.0 | 2.0 | 0.0 | 0.67 |
| 3D-FC6 | 33% | 56% | 11% | 1.7 | 1.1 | 0.4 | 1.05 |

Table 6. The temperature comparisons of the second implementation

| Controller | Frequency index $\lambda_1$ | Temperature index $\lambda_2$ | Evaluation index $\zeta_t$ |
|---|---|---|---|
| Desired | 1.00 | 1.00 | 2.00 |
| Switch | 0.311 | 0.67 | 1.02 |
| P | 2.500 | 1.54 | 4.04 |
| FC1 | 2.024 | 0.78 | 2.80 |
| 3D-FC1 | 1.309 | 0.89 | 2.20 |
| FC2 | 0.311 | 1.33 | 1.69 |
| 3D-FC2 | 0.135 | 1.22 | 1.82 |
| FC3 | 2.024 | 1.33 | 3.36 |
| 3D-FC3 | 0.833 | 1.00 | 1.83 |
| FC4 | 1.071 | 1.11 | 2.18 |
| 3D-FC4 | 1.309 | 1.00 | 2.31 |
| 3D-FC5 | 1.309 | 0.67 | 1.98 |
| 3D-FC6 | 0.833 | 1.05 | 1.88 |

Table 7. The DTM evaluation index of the second implementation

#### 8. Conclusion

Moore's Law continues through technology scaling in three ways: improving transistor performance to increase frequency, increasing transistor integration capacity to realize complex architectures, and reducing the energy consumed per logic operation to keep power dissipation within limits. The technology provides an integration capacity of billions of transistors, but it faces several fundamental barriers. Power consumption, energy, energy-delay, power density and floor planning are all design challenges. Multi-core CPU design increases CPU performance while maintaining the power dissipation level for the same chip area. However, the cores are not fully utilized unless the workload offers parallelism. Exploring low-cost portable cooling techniques becomes more important every day as air cooling reaches its limit of 198 W. To study the multi-core CPU thermal problem, a thermal model was built whose floor plan is similar to the IBM MCM POWER4 chip scaled to 45nm technology. This floor plan was integrated into the HotSpot 5 thermal simulator, and the CPU open-loop thermal profile curve was extracted. Advanced dynamic thermal management (DTM) techniques are mandatory to avoid CPU thermal throttling. The CPU is not 100% utilized all the time, so the thermal spare cores (TSC) technique is proposed. TSC reserves cores during low CPU utilization. Because of the cooling limitations, these reserved cores are not activated simultaneously with the other cores. During thermal crises, the reserved cores are activated to enhance CPU utilization. Semiconductor technology permits more cores to be added to a CPU chip, and the total chip area overhead is up to 27.9% according to the ITRS (ITRS, 2009). TSC therefore does not waste chip area. From the thermal point of view, the horizontal heat transfer path carries up to 30% of the CPU chip heat transfer (Stan et al., 2006). The TSC is a large coldspot within the CPU area that handles this horizontal heat transfer path.

Because the TSC core is turned off, the cold TSC also reduces static power. TSC is used simultaneously with other DTM techniques. From the CPU utilization point of view, TSC activation is equivalent to applying DVFS to the CPU cores over a low operating-frequency range. Fuzzy logic improves the DTM controller response. Fuzzy control handles the CPU thermal process without knowing its transfer function, which simplifies DTM controller design and reduces design time. Fuzzy control also lets designers select the appropriate CPU temperature and frequency responses, and for the same CPU chip, the DTM response depends on the fuzzy DTM controller design. A 3D fuzzy design can preserve the portable device's battery at the cost of CPU utilization, or it can target high performance computing (HPC). Because of the cooling limitations, however, an HPC-oriented DTM design is not suitable for portable devices. The 3D-FC was successfully applied to the CPU DTM problem, and different DTM techniques were compared using simulation tests. The results demonstrate the effectiveness of the 3D fuzzy DTM controller for the nonlinear multi-core CPU thermal problem. The 3D fuzzy DTM takes into account the hotspot temperatures and operating frequencies of the surrounding cores. It avoids the resulting complexity while maintaining the correlations. It calculates the correlation between the local core hotspot and the surrounding cores' hotspots, then selects the appropriate operating frequency for the local core. The fuzzy DTM controller responds better than the traditional DTM P controller. For the same input rules and the same output membership functions (MSF), 3D fuzzy logic reduces the CPU temperature more than 2D fuzzy logic does. The fuzzy output MSF is a critical DTM design parameter, because a small deviation from the appropriate output membership function changes the DTM controller behavior.

The 3D fuzzy controller takes into account multiple temperature readings distributed over the CPU chip floor plan. From the CPU temperature point of view, the TSC looks like a large coldspot that absorbs heat along the horizontal path, as if it were a heat sink. The behavior of the CPU cooling system depends on the combination of operating frequencies and temperatures. The multi-parameter evaluation index shows the effect of the different parameters on the CPU response, so designers can select the DTM controller that fulfils their requirements. In particular, it permits selecting a DTM design that provides the best frequency parameter value without leading to the worst temperature parameter value.

#### 9. References

- Chaparro, P. ; Lez, J. G. Cai, Q. & Lez, A. G. (2007). Understanding The Thermal Implications of Multicore Architectures, *IEEE Transactions*, Vol.18, No.8, pp. 109-1065.
- Chung, S. W. ; & Skadron, K. (2006). Using on-chip event counters for high-resolution, realtime temperature measurements, Proceedings of International Conference For Scientific & Engineering Exploration Of Thermal, Thermomechanical & Emerging Technology, IEEE ITHERM06, pp. 114-120.
- Donald, J. ; & Martonosi, M. (2006). Techniques For Multicore Thermal Management Classification & New Exploration, Proceedings of *International Symposium on Computer Architecture*, IEEE ISCA'06, pp. 78-88.
- Doumanidis, C. C.; & Fourligkas, N. (2001). Temperature Distribution Control In Scanned Thermal Processing Of Thin Circular Parts, *IEEE Transaction Control System Technolgy*, Vol.9, No.5, (May 2001), pp. 708–717.
- Ferreira, A. P.; Moss, D. & Oh, J. C. (2007). Thermal Faults Modeling using an RC model with an Application to Web Farms, Proceedings of 19th Euromicro Conference on Real-Time Systems, Italy, pp. 113-124.
- Huangy, W.; Stany, M. R. Skadronz, K. Sankaranarayananz, K. Ghoshyz, S. & VelUSAmyz, S (2006). Hotspot: A Compact Thermal Modeling Methodology For Early-Stage Vlsi Design, *IEEE Transactions*, 2006, Vol.5, pp. 501-513.
- Gustafson, J. L.(1988). Re-Evaluating Amdahl's Law, ACM Communications, Vol.31, No.5, pp. 82-83.
- Kim, D. D.; J. Kim, Cho, C. Plouchart, J.O. & Trzcinski, R. (2008). 65nm SOI CMOS SoC Technology for Low-Power mmWave & RF Platform, Silicon Monolithic Integrated Circuits in RF Systems, pp. 46-49.
- Kim, S. ; Dick, R. P. & Joseph, R. (2007). Power Deregulation: Eliminating Off-Chip Voltage Regulation Circuitry From Embedded Systems, Proceedings of the International Conference on Hardware-Software Codesign & System Synthesis, IEEE/ACM (CODES+ISSS), pp. 105-110.

- Li, H. Zhang; X. & Li, S. (2007). A Three-Dimensional Fuzzy Control Methodology For A Class Of Distributed Parameter Systems, IEEE Transactions, *Fuzzy Systems*, Vol.15, No.3, pp. 470-481.
- Mccrorie, P. (2008). On-Chip Thermal Analysis Is Becoming M&atory, Chip Design Magazine.
- Moore, G. E. (1965). Cramming More Components Onto Integrated Circuits, *IEEE Electronics*, Vol.38, No.8, (19 April 1965), pp.114. This Paper Appears Again In *IEEE Solid-State Circuits Newsletter*, 2006, Vol.20, No.3, pp. 33-35.
- Ogras, U.Y. et al. (2008). Variation-Adaptive Feedback Control for Networks-on-Chip with Multiple Clock Domains, Proceedings of International Conference on Design Automation Conference, IEEE DAC08, pp. 154-159.
- Passino, K. M.; & Yurkovich, S. (1998). Fuzzy Control, Addison Wesley Longman.
- Patyra, M. J.; Grantner, J.L. & Koster, K. (1996). Digital Fuzzy Logic Controller Design & Implementation, *IEEE Transactions Fuzzy Systems*, Vol.4, No.4, pp. 439-413.
- Rao, R. ; & Vrudhula, S. (2007). Performance Optimal Processor Throttling Under Thermal Constraints, Proceedings of International Conference On Compilers, Architecture, & Synthesis For Embedded Systems, CASES'07, pp. 211-266.
- Sinharoy, B.; Kalla, R. N. Tendler, J. M. & Eickemeyer, R. J. (2005). *POWER5 System Microarchitecture, IBM J. Res. & Dev.* Vol.49 No. 4/5 July/September 2005.
- Stan, M. R.; Skadron, K.; Barcella, M.; Huang, W.; Sankaranarayanan, K. & Velusamy, S. (2006). HotSpot: A Compact Thermal Modeling Methodology for Early-Stage VLSI Design, *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, Vol.14, No.5, pp. 501-513.
- Trabelsi, A.; Lafont, F.; Kamoun, M. & Enea, G. (2004). Identification of Nonlinear Multivariable Systems by Adaptive Fuzzy Takagi-Sugeno Model, *International Journal of Computational Cognition*, Vol.2, No.3, pp. 137-18.
- Wu, Q. et al. (2004). Formal Online Methods for Voltage/Frequency Control in Multiple Clock Domain Microprocessors, *Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)*, Vol.32, No.5, pp. 248-259.
- Yager, R. & Filev, D. (1994). *Essentials of Fuzzy Modeling and Control*, Wiley, New York, pp. 121.
- http://lava.cs.virginia.edu/hotspot
- http://www.itrs.net



#### Heat Transfer - Engineering Applications Edited by Prof. Vyacheslav Vikhrenko

ISBN 978-953-307-361-3, hard cover, 400 pages. **Publisher:** InTech. **Published online:** 22 December 2011. **Published in print edition:** December 2011.

Heat transfer is involved in numerous industrial technologies. This interdisciplinary book comprises 16 chapters dealing with the combined action of heat transfer and concomitant processes. The five chapters of its first section discuss heat effects due to laser-, ion- and plasma-solid interaction. The eight chapters of the second section consider engineering applications of heat conduction equations to curing reaction kinetics in manufacturing processes, their combination with mass transport or ohmic and dielectric losses, and heat conduction in metallic porous media and power cables. This section also describes the safety analysis of mine hoists under the influence of heat produced by mechanical friction, heat transfer in boilers and internal combustion engine chambers, and management of ultrahigh strength steel manufacturing. The three chapters of the third and last section are devoted to air cooling of electronic devices.

#### How to reference

In order to correctly reference this scholarly work, feel free to copy and paste the following:

M. A. Elsawaf, A. L. Elshafei and H. A. H. Fahmy (2011). Multi-Core CPU Air Cooling, Heat Transfer - Engineering Applications, Prof. Vyacheslav Vikhrenko (Ed.), ISBN: 978-953-307-361-3, InTech, Available from: http://www.intechopen.com/books/heat-transfer-engineering-applications/multi-core-cpu-air-cooling



© 2011 The Author(s). Licensee IntechOpen. This is an open access article distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
