Three-Dimensional Integrated Circuits Design for Thousand-Core Processors: From Aspect of Thermal Management

On-chip many-core architecture plays an indispensable role in significantly enhancing the performance of a processing system. As the number of transistors on a chip grows rapidly, two-dimensional topologies face significant increases in interconnection (wire) delay and power consumption, two factors that are often regarded as the primary limitations of current processor architectures (Hennessy & Patterson, 2007; Kurd et al., 2001; Tiwari et al., 1998). One solution that overcomes these obstacles and continues the performance scaling is to integrate many cores and their communication network on a chip (Beigne, 2008; Yu & Baas, 2006). Through concerted processors, routers, and links, the network-on-chip (NoC) provides the advantages of low power dissipation and abundant connectivity. Moreover, because of the widespread use of radio frequency (RF) circuits, micro-electro-mechanical systems (MEMS) (Lu, 2009), and various sensors in mobile applications, three-dimensional integrated circuits (3D ICs) with through-silicon via (TSV) implementations in a layered architecture have been proposed (Lee, 1992; Tsai & Kang, 2000). A suitable 3D IC with TSVs that realizes a large number of processing units and highly dense interconnects therefore attracts considerable attention, and 3D fabrics are a necessity for interconnection scalability from layer to layer. However, combining processors, memories, and/or sensors in a stacked die leaves the cooling problem in a precarious state (Tiwari et al., 1998). Consequently, a thermal solution with a high heat removal rate seems unavoidable.

Thermal-aware floorplanning is the key, and in it the inter-layer interconnection plays a role beyond signal transmission or power delivery. Figure 1 depicts the use of thermal TSVs to alleviate heat accumulation, a technique borrowed from printed circuit boards (PCBs) (Lee et al., 1992). For 3D ICs, the problem of high power and thermal density can be more serious than in the planar form, so thermal TSVs become essential for heat dissipation. Of particular interest is the design of an efficient heat transfer path. Some recent works have discussed the placement of thermal TSVs. However, not only the routing but also the floorplan may need to change substantially after the thermal TSVs are inserted (Tsai & Kang, 2000), which leads to long design iterations. Further, as circuit complexity increases, inserting thermal TSVs without largely changing the floorplan becomes an important technique to develop (Tsui et al., 2003). To keep the original routing and floorplan as much as possible, temperature-driven design should be brought into the early phases of the design procedure.

Design and theoretical analysis of on-chip thermal ridge

Theoretical analysis
The thermal TSVs are intended to be placed in the whitespace between core groups (CGs), which is called a thermal ridge. In this section, we derive analytical expressions for some key parameters. Heat conduction in the chip is governed by
$$\rho C \frac{\partial T}{\partial \theta} = \frac{\partial}{\partial x}\left(k_{xx}\frac{\partial T}{\partial x}\right) + \frac{\partial}{\partial y}\left(k_{yy}\frac{\partial T}{\partial y}\right) + \frac{\partial}{\partial z}\left(k_{zz}\frac{\partial T}{\partial z}\right) + g$$
where $T$ is the temperature, $g$ is the heat generation rate in W/cm², $\rho$ is the density of the material, $C$ is the thermal capacity of the material, $\theta$ is time, and $k$ is the thermal conductivity of the material. This fundamental thermal conduction equation states that the temperature transmitted through the thermal volume depends on time $\theta$ and on the directional thermal conductivities $k_{xx}$, $k_{yy}$, and $k_{zz}$ (Chieh et al., 2010; Lung et al., 2010). The boundary conditions of the top and bottom surfaces of the chip are adiabatic and those of the surrounding surfaces are convective.
To dissipate the heat into the substrate homogeneously, the inter-core-group thermal ridges are aligned orthogonally in columns and rows. The temperature prediction of the many-core system is performed with CFD-RC, a commercial thermal and fluidic simulation package. However, to illustrate the physical phenomenon more intuitively, a simplified one-dimensional conduction equation that ignores the transient behavior is used here.
The heat removal rate of the thermal ridge is assumed to be $q$. Consider two CGs, CG1 and CG2. With $x$ measured across the ridge from CG1 toward CG2, the temperature distribution between CG1 and CG2 can be expressed by
$$T(x) = T_1 + \frac{T_2 - T_1}{w}\,x - \frac{q}{2k_s}\,x\,(w - x) \qquad (3)$$
where $T_1$ and $T_2$ are the temperatures of CG1 and CG2, respectively, $q$ is the heat conducted to the ambient environment by the thermal ridge, $k_s$ is the equivalent thermal conductivity of the thermal ridge, and $w$ is the width of the thermal ridge. Since $T$ denotes the temperature at location $x$, examining the mid-point temperature $T_{1/2}$ by substituting $x = w/2$ into (3) gives
$$T_{1/2} = \frac{T_1 + T_2}{2} - \frac{q\,w^2}{8k_s} \qquad (4)$$
From (4), it is easy to see that a lower target mid-point temperature $T_{1/2}$ requires a larger $w$.
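The trade-off between ridge width and mid-point temperature can be explored numerically. The following is a minimal sketch, assuming a steady one-dimensional slab with uniform volumetric heat removal `q` (in W/m³) and fixed edge temperatures; the function names and all numeric values are hypothetical and chosen only for illustration, not taken from the chapter's simulations.

```python
import math


def ridge_temperature(x, w, t1, t2, q, k_s):
    """Steady 1D temperature (K) at position x (m) across a ridge of width w (m).

    Assumption: uniform volumetric heat removal q (W/m^3) and equivalent
    conductivity k_s (W/(m*K)); t1 and t2 are the edge temperatures (K).
    """
    return t1 + (t2 - t1) * x / w - q * x * (w - x) / (2.0 * k_s)


def midpoint_temperature(w, t1, t2, q, k_s):
    """Temperature at x = w/2: the edge average minus q*w^2/(8*k_s)."""
    return 0.5 * (t1 + t2) - q * w ** 2 / (8.0 * k_s)


def width_for_midpoint(t_mid, t1, t2, q, k_s):
    """Ridge width (m) needed to reach a target mid-point temperature."""
    drop = 0.5 * (t1 + t2) - t_mid
    if drop < 0:
        raise ValueError("target mid-point is above the edge average")
    return math.sqrt(8.0 * k_s * drop / q)


if __name__ == "__main__":
    # Hypothetical inputs: 400 um ridge, k_s = 120 W/(m*K), q = 1e11 W/m^3.
    w = 400e-6
    t_mid = midpoint_temperature(w, 400.0, 390.0, 1e11, 120.0)
    print(f"mid-point temperature: {t_mid:.2f} K")
    print(f"width recovered from it: {width_for_midpoint(t_mid, 400.0, 390.0, 1e11, 120.0) * 1e6:.1f} um")
```

Because the mid-point drop scales with the square of the width, doubling `w` quadruples the temperature reduction for the same `q` and `k_s`, which is why a lower target calls for a wider ridge.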

Effective thermal conductivity of the thermal ridge
The equivalent thermal conductivity $k_{szz}$ of a thermal ridge is determined by the density of the thermal TSVs in the ridge (Chieh et al., 2010; Lung et al., 2010). It is the effective thermal conductivity given by (5), where $k_{emb}$ is the equivalent thermal conductivity of the thermal TSVs, $k_{sub}$ is the thermal conductivity of the silicon substrate, and $d$ is the percent contribution of the thermal TSVs in the thermal ridge. Since the thermal TSVs run longitudinally along the z direction, this effective thermal conductivity cannot be applied to the lateral heat transfer computation. For heat transfer in the x and y directions, the thermal conductivity is given by (6),
where $m$ is the percent contribution of the metal lines for thermal conduction in the silicon substrate. In general, the vertical thermal conductivity $k_{szz}$ is much larger than the lateral thermal conductivities $k_{sxx}$ and $k_{syy}$. By (5) and (6), $k_{sxx}$ is around 10 W/(m·K) and $k_{szz}$ is around 120 W/(m·K). Thus, the heat flowing through the thermal ridge is mostly dissipated by the heat sink rather than transferred laterally. Substituting the equivalent $k_s$ and the temperatures $T_1$, $T_2$, and $T_{1/2}$ into (3), we obtain thermal ridge widths of 200 µm to 400 µm.

Design parameters and assumptions
Here, we focus on a mesh-connected NoC with 1,024 cores. A globally asynchronous, locally synchronous (GALS) digital-signal processor (DSP) design is adopted (Tran et al., 2009a, 2009b; Truong et al., 2008). Each DSP, constituting a tile, is composed of a core with an on-chip oscillator for its own clocking and a switch with associated buffers, as shown in Figure 2. The tile allows a repetitive, mirrored layout and occupies an area of 0.168 mm² (410 μm × 410 μm) (Tran et al., 2009a, 2009b). Consider a simple power map with two major sources in the tile: one is attributed to computation and the other to communication. Correspondingly, the average power consumption in the active state breaks down to 17.6 mW and 1.1 mW, respectively (Tran et al., 2009a, 2009b). The cores are arranged as a 32 × 32 square mesh. Since the International Technology Roadmap for Semiconductors (ITRS) predicts that the maximum chip size will maintain similar dimensions, we assume 20 mm × 20 mm as our upper bound. Under this constraint, the area not occupied by the tiles is used for the input/output and peripheral circuits. The total power consumption of the chip is around 20 W, which leads to an average power density of 5 W/cm². Since ITRS also predicts that power densities up to 100 W/cm² are reasonable, the power density assumed in this chapter is a plausible value (Brunschwiler et al., 2009; Xu et al., 2004).
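The power and area figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the text (tile side, per-tile power, mesh size, chip side); the constant and function names are illustrative, not part of the original design flow.

```python
# Design parameters quoted in the text.
CORES = 32 * 32            # 32 x 32 mesh
TILE_SIDE_UM = 410.0       # tile is 410 um x 410 um
P_COMPUTE_MW = 17.6        # average active power, computation
P_COMM_MW = 1.1            # average active power, communication
CHIP_SIDE_MM = 20.0        # assumed upper bound on chip size


def power_budget():
    """Return tile area, total tile area, total power, and power density."""
    tile_area_mm2 = (TILE_SIDE_UM / 1000.0) ** 2
    total_power_w = CORES * (P_COMPUTE_MW + P_COMM_MW) / 1000.0
    chip_area_cm2 = (CHIP_SIDE_MM / 10.0) ** 2
    return {
        "tile_area_mm2": tile_area_mm2,
        "tiles_area_mm2": CORES * tile_area_mm2,
        "total_power_w": total_power_w,
        "power_density_w_per_cm2": total_power_w / chip_area_cm2,
    }


if __name__ == "__main__":
    for name, value in power_budget().items():
        print(f"{name}: {value:.4f}")
```

The computed total is slightly above 19 W and the density slightly below 5 W/cm², consistent with the rounded values of 20 W and 5 W/cm² used in the text; the tiles cover well under half of the 400 mm² die, leaving room for I/O, peripherals, and whitespace.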
In this chapter, we assume a stack of three dies with the many-core NoC sandwiched in the middle. As mentioned earlier, a commercial tool based on the finite element method (FEM) is used. The three-dimensional model of the NoC is created with the widely used package model, in a fashion similar to that shown in Figure 1. However, the heat sink is not modelled and analyzed in our case. Instead, it is simplified to a heat loss, and a proper heat transfer coefficient is applied as the boundary condition on the top surface where the heat sink would have been located. First, the 1,024 cores are divided into 8 × 8 CGs, each consisting of 4 × 4 cores. As shown in Figure 3, thermal ridges are inserted between the hottest CGs. According to where they are inserted, the thermal ridges fall into two types: the type-I thermal ridge has a low density of thermal TSVs, and the type-II thermal ridge has a high density of thermal TSVs. The type-I thermal ridge is located between two CGs whose routing occupies most of the silicon area, even after the expansion to gain more whitespace. The type-II thermal ridge, on the other hand, lies in the intersectional area where no wires pass through, so a large quantity of thermal TSVs can be planted.
The physical effect of the thermal ridge can be illustrated with the electrical lumped model shown in Figure 4. By the duality between electrical and thermal models, the temperature $T$ is replaced by a voltage $V$, the power $P$ by a current $I$, and the thermal resistance $R$ is, by definition, proportional to the reciprocal of the thermal conductivity. Without a thermal ridge, since the vertical thermal resistance $R_{11}$ ($R_{21}$) is much larger than the lateral thermal resistance $R_{12}$ ($R_{22}$), the voltage $V_1$ ($V_2$) stays at a high value. Figure 4(b) shows the case when a type-I thermal ridge is inserted between CG1 and CG2. Another conduction path is added through the thermal resistance $R_{TS1}$. As mentioned above, $R_{TS1}$ is inversely proportional to $k_s$. As long as $k_s$ is much larger than the thermal conductivity $k_{sub}$ of the silicon substrate, $R_{TS1}$ is much smaller than $R_{11}$ ($R_{21}$), and the current $I_1$ ($I_2$) flows mostly through $R_{TS1}$ rather than $R_{11}$ ($R_{21}$). In addition, by voltage division, $V_{TS1}$ is lower than $V_1$ (or $V_2$). In other words, the temperature of the type-I thermal ridge is lower than the temperature of CG1 or CG2. Figure 4(c) shows the case when a type-II thermal ridge is inserted at the intersectional area between the CGs to remove more heat. The value of $R_{TS2}$ depends on that of $k_s$. Since the thermal TSVs are densely planted on the type-II thermal ridge, $R_{TS2}$ is much smaller than $R_{11}$ (or $R_{21}$). Compared with CG1 and CG2, the type-II thermal ridge has a lower temperature and is designed to act as an on-chip heat sink.
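The electrical analogy lends itself to a small numerical illustration. The sketch below assumes a simplified topology that is not taken from Figure 4: one CG node injects power `p_w`, which reaches ambient either through a vertical resistance or through a lateral resistance in series with the ridge resistance. All resistance values and function names are hypothetical.

```python
def parallel(r_a, r_b):
    """Equivalent resistance of two thermal resistances in parallel (K/W)."""
    return r_a * r_b / (r_a + r_b)


def cg_temperature(p_w, t_amb, r_vertical, r_lateral=None, r_ridge=None):
    """Temperature (K) of a core-group node dissipating p_w watts.

    Without a ridge, all heat leaves through r_vertical. With a ridge, a second
    path (r_lateral in series with r_ridge) is placed in parallel with it.
    """
    if r_lateral is None or r_ridge is None:
        r_total = r_vertical
    else:
        r_total = parallel(r_vertical, r_lateral + r_ridge)
    return t_amb + p_w * r_total


def ridge_node_temperature(t_cg, t_amb, r_lateral, r_ridge):
    """Ridge temperature by 'voltage division' along the lateral path."""
    return t_amb + (t_cg - t_amb) * r_ridge / (r_lateral + r_ridge)


if __name__ == "__main__":
    # Hypothetical values: 0.3 W per CG, 300 K ambient, resistances in K/W.
    without = cg_temperature(0.3, 300.0, r_vertical=300.0)
    with_ridge = cg_temperature(0.3, 300.0, 300.0, r_lateral=20.0, r_ridge=30.0)
    ridge = ridge_node_temperature(with_ridge, 300.0, 20.0, 30.0)
    print(f"CG without ridge: {without:.1f} K")
    print(f"CG with ridge:    {with_ridge:.1f} K")
    print(f"ridge node:       {ridge:.1f} K")
```

In this toy network the ridge node always sits between ambient and the CG temperature, mirroring the voltage-division argument; lowering `r_ridge` (denser TSVs, as in the type-II ridge) pulls both the CG and the ridge closer to ambient.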

Rotation of the hotspots
To verify the feasibility of the proposed scheme for thermal-aware floorplanning, we first obtain the temperature distribution of the basic CG. There are 4 × 4 cores within a CG, as shown in Figure 5. The cores are homogeneous, with the hotspot near the lower right corner. Since the hotspot is not located at the center of the core, the temperature distribution is asymmetric when the cores are assembled into the CG. The situation becomes worse when 64 such CGs are put together to construct the 1,024-core NoC. Figure 6 shows a typical layout in which the orientation of each core is kept the same as in Figure 5, with the hotspot near the lower right corner. The design maintains regularity in connectivity, with the same routing distance between cores, but unfortunately it is not thermal-aware. The temperature distribution is still asymmetric, and the maximum temperature of the whole chip rises to 408.9 K, which requires a heat sink. Because of the lack of symmetry, the heat sink cannot be placed in a simple orientation with equal heat dissipation ability.
Let us define the temperature non-uniformity as follows:
$$U = \frac{\Delta T}{\Delta x}$$
where $\Delta T$ is the temperature difference and $\Delta x$ is the distance between any two points on a single core. Hence, $U$ represents the temperature gradient per unit length. Clearly, the bigger the value of $U$, the more severe the temperature difference between neighboring cores. In the case of Figure 6, the maximum $U$ is around 4.1 K/cm and the average $U$ is around 3.1 K/cm. To mitigate the non-uniformity, we may rotate either the cores in the CG or the CGs themselves so as to align the temperature profile symmetrically (Xu et al., 2006). Figure 7 shows the latter approach: the CGs are divided into four quadrants, the orientation of the second quadrant is kept, and the CGs in the other three quadrants are rotated so that their hotspots point to the upper left, upper right, and lower left corners, respectively.
Compared with the results in Figure 6, the maximum temperature decreases by 1 K, but the average temperature non-uniformity increases to 3.8 K/cm. If we rotate the cores in the CG in a similar fashion and then assemble such CGs, the result is not much different and hence is not shown here. This illustrates that rotating the hotspots cannot reduce the maximum temperature effectively.
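The non-uniformity metric used in this comparison, a temperature difference divided by a distance, is straightforward to evaluate on sampled data. The sketch below is one possible reading of it, assuming temperatures sampled at equal spacing along a line and differences taken between adjacent samples; the sample values and helper names are hypothetical.

```python
def non_uniformity(temps_k, spacing_cm):
    """Return |dT|/dx (K/cm) between each pair of adjacent samples."""
    if len(temps_k) < 2:
        raise ValueError("need at least two temperature samples")
    return [abs(b - a) / spacing_cm for a, b in zip(temps_k, temps_k[1:])]


def summarize(temps_k, spacing_cm):
    """Return (maximum U, average U) over a line of samples."""
    u = non_uniformity(temps_k, spacing_cm)
    return max(u), sum(u) / len(u)


def neighbor_delta_t(u_k_per_cm, pitch_um):
    """Temperature difference (K) across one tile pitch for a given U."""
    return u_k_per_cm * pitch_um * 1e-4  # 1 um = 1e-4 cm


if __name__ == "__main__":
    # Hypothetical samples taken one tile pitch (410 um = 0.041 cm) apart.
    samples = [400.00, 400.10, 400.30, 400.35]
    u_max, u_avg = summarize(samples, spacing_cm=0.041)
    print(f"max U = {u_max:.2f} K/cm, average U = {u_avg:.2f} K/cm")
    print(f"dT across one tile at 1.5 K/cm: {neighbor_delta_t(1.5, 410.0):.4f} K")
```

Reporting both the maximum and the average is useful because a layout change can lower one while raising the other, as happens with the quadrant rotation.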

Insertion of the thermal ridges
The primary objective of the thermal ridges is to reduce the maximum temperature and the temperature non-uniformity at the same time. The thermal ridges are introduced into the design, with the required extra space limited by the manufacturing cost. In our case, at most 20% of the chip area is allowed for the thermal ridges, and their locations are depicted in Figure 8. Straits with widths of 400 μm and 200 μm are created by expanding the routing distances between CGs.

Simulation results of the proposed scheme
First, the type-I thermal ridges are inserted into the straits, except for their intersectional areas, as shown in Figure 8(a). The resulting temperature distribution is shown in Figure 9. The maximum temperature is 373.4 K, which occurs in the center of the chip. Compared with the previous solutions, the maximum temperature decreases significantly, by about 35 K, when the thermal ridges are used. The temperature difference at the center of the chip is about 32 K. The thermal map also changes considerably, since the thermal ridges are distributed in the suburb areas.
Fig. 9. The temperature distribution of the 1024-core NoC with type I thermal ridge.
Furthermore, the design affects the temperature non-uniformity substantially. In Figure 6 and Figure 7, the value of $U$ stays almost constant all around the chip. After the thermal ridges are inserted, however, $U$ takes several values across the chip. The largest $U$ is around 4.6 K/cm, but the average $U$ decreases substantially to 1.5 K/cm. The temperature non-uniformity is largely improved, to 0.5 K/cm at the center and 1.5 K/cm in the suburb areas. About 85% of the chip area is covered by this region, which means that around 850 cores have better temperature non-uniformity. Since the tile size is 410 μm × 410 μm, the temperature difference between neighboring cores in the region is less than 0.3 K.
In addition, the type-II thermal ridges are inserted, as shown in Figure 8(b). The temperature profile is shown in Figure 10. The maximum temperature of 371.8 K is about 1.5 K lower than that shown in Figure 9. It can be reduced further because the thermal conductivity of the type-I thermal ridge is lower than that of the type-II thermal ridge. The temperature non-uniformity and the temperature profile remain quite similar. Compared with the results from the traditional scheme with mere rotation of the hotspots, the maximum temperature decreases from 408.9 K to 372.8 K, and the temperature non-uniformity decreases from 3.2~4.0 K/cm to 0.5~1.5 K/cm in 80% of the chip area, at the cost of 20% extra area for the thermal ridges.

Chip design and implementation by using metallic thermal skeletons
In this section, a realistic thermal dissipation enhancement methodology for the NoC system is introduced. The on-chip virtual 126-core network acts as the hotspot and dissipates the generated heat through the metallic thermal skeletons. To evaluate the feasibility of the thermal enhancement, 9 arrays of metallic thermal skeletons are designed in the test chip. Essentially, by adding metallic thermal skeletons in the back end of line (BEOL) metals to improve the lateral thermal dissipation path, the heat generated by the virtual cores can be conducted into an on-chip heat sink such as the TSVs. The temperature of the hotspot can be lowered substantially if the metallic thermal skeletons are arranged properly. In addition, we design an on-chip thermal sensor network to facilitate the measurement and evaluation of the heat transfer capability. Last, some important thermal characteristics of the metallic thermal skeletons are listed in this section.
To design a better thermal dissipation path, metallic thermal skeletons provide an alternative to simply increasing the number of thermal TSVs. The FEM simulation is performed with CFD-RC, based on the following assumptions. As shown in Figure 11, a TSV is on the left and a heat source is on the right. The other half of the structure is mirrored about the cross section. The heat source consists of 12 squares, each with a power of 0.5 mW and an area of 1 µm × 1 µm, which run to the top through local interconnects (not shown in the figure because they are buried in the structure), just shy of the front metal layer at the top. The neighboring TSV is electrically unconnected and cold. The simulation assumes a TSV with a dielectric thickness of 0.5 µm, a diameter of 10 µm, and a length of 50 µm.
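For a sense of scale, the vertical thermal resistance of a single TSV with the stated geometry can be estimated from the one-dimensional conduction formula R = L / (kA). The sketch below is a rough estimate under loud assumptions: the fill conductivity of 400 W/(m·K) (copper-like) is not given in the text, the full 10 µm diameter is treated as conductor, and the dielectric liner and any lateral spreading are ignored.

```python
import math


def cylinder_thermal_resistance(length_m, diameter_m, k_w_per_mk):
    """Axial conduction resistance (K/W) of a solid cylinder: L / (k * A)."""
    area_m2 = math.pi * (diameter_m / 2.0) ** 2
    return length_m / (k_w_per_mk * area_m2)


# Geometry from the text; K_FILL is an assumption (copper-like fill).
TSV_LENGTH_M = 50e-6
TSV_DIAMETER_M = 10e-6
K_FILL = 400.0

r_tsv = cylinder_thermal_resistance(TSV_LENGTH_M, TSV_DIAMETER_M, K_FILL)
heat_w = 12 * 0.5e-3  # 12 heat-source squares at 0.5 mW each
delta_t = heat_w * r_tsv  # upper bound: all heat forced through one TSV

if __name__ == "__main__":
    print(f"TSV axial resistance: {r_tsv:.0f} K/W")
    print(f"temperature drop for {heat_w * 1e3:.1f} mW: {delta_t:.2f} K")
```

The estimate only bounds the axial path through the via itself; the FEM simulation accounts for the liner and the surrounding silicon and metals, which this sketch omits.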

Overall floorplan of the chip
The floorplan of the proposed test chip is depicted in Figure 12. The metallic thermal skeletons are arranged between, and enclosed by, the core-sensor blocks. The peripheral area is used for the input/output and power/ground connections, which provide external access. The test chip is designed without resorting to a complex control scheme. The virtual cores are arranged in three groups, each consisting of three rows and seven columns. The whole chip can be divided into nine regions. Each region consists of two separate areas, which are enclosed by the core-sensor blocks named A1-A7, B1-B7, and C1-C7, respectively, and represent 3 types of metallic thermal skeletons. α1 to α6 are identical designs of the metallic thermal skeleton, as are β1 to β6 and γ1 to γ6. The major differences among the nine regions are the combinations of the α, β, and γ elements, which are shown in Figure 13. In this design, as shown in Figure 13(a), the elements α, β, and γ differ in the distribution density of metal in the BEOL. For better visualization, Figure 13(b) shows a three-dimensional view of the metallic thermal skeletons. The combination of TSVs with front metals forms the on-chip heat sink, and BEOL metal 1 to metal 4 form the metallic thermal skeletons.
[Figure labels: core-sensor block; metallic thermal skeleton areas α1 to α6, β1 to β6, and γ1 to γ6.]
In this chapter, the stacking of identical chips is not discussed; only the planar die is reported. The future thermal TSV test chip will divide the core area into blocks, each of which, as shown in Figure 14, consists of virtual cores, temperature sensors, and a TSV array with metallic thermal skeletons that constructs the on-chip heat sink. The virtual cores and temperature sensors are laid out on the left and right sides of the on-chip heat sink. As shown in Figure 14, the thermal TSVs with front metals will be the on-chip heat sink, and the metallic thermal skeletons serve as the conduction path for high-speed heat transfer. Therefore, the performance of the different metallic thermal skeletons is emphasized and compared.
To verify the heat conduction capability, triplet experiments are designed into the test chip. Since A1-A3 is at the corner of the chip, heat transfers more to the peripherals than to the central area of the chip. Such location factors often occur in chip-level thermal measurements. Hence, A1-A3, B1-B3, and C1-C3 use identical combinations of the metallic thermal skeletons to cancel the location effects. The layout of the designed test chip is shown in Figure 15. The core-sensor blocks, metallic thermal skeletons, peripherals, I/Os, and power domains are integrated in one SoC chip as the NoC. The virtual core system, composed of on-chip heaters, can be operated at the same time. The die measures 5,040 µm × 5,040 µm, including the seal ring. There are three voltage levels, four power domains, and nine test regions in this chip. Each voltage level can be separately controlled by the programmable logic analysis instrument. All the cores in the chip can be operated independently through the power gating mechanism.
To observe the temperature distribution of the chip surface precisely, all sensors on the chip are activated simultaneously, and the measured temperature values are read out as matrix data.

Design of the core-sensor block
The temperature-sensitive ring oscillator (TSRO) thermal sensor in Figure 16 is based on a ring oscillator whose oscillation frequency is sensitive to temperature, albeit not completely linearly. The ring oscillator is also sensitive to supply voltage; hence, minimizing power droop is important for improving the accuracy. By establishing the relationship between temperature and frequency, and opting for on-die calibration, the thermal sensor can be quite accurate. The frequency is converted by a counter and read out to a register. Figure 16(a) shows the block diagram. The control unit (CU) accepts a reference clock TS_CK and an input TS_EN, which enables the sensing operation when transitioning from 0 to 1. As shown in Figure 16(a), four signals a, b, c, and RDY are generated. When the internal signal a changes from 0 to 1, the counter is reset and the count is cleared. When the internal signal b changes from 0 to 1, the ring oscillator is activated and the counter starts; when it changes from 1 to 0, the ring oscillator is deactivated and the counter stops. When the internal signal c changes from 0 to 1, the count is loaded into an output register TS_REG to be read out. The handshake signal RDY indicates that the count is ready. The physical view of the thermal sensor used in this test chip is shown in Figure 16.
The virtual core circuit is composed of a PMOS switch and a p-type diffusion resistor, as shown in Figure 17. The diffusion resistor is non-silicided and placed in an n-well. Consequently, when the heater in the virtual core is turned on, the n-well becomes hot first, which is slightly different from a conventional CMOS circuit, in which the substrate is more likely to be the heat source. The maximum current flowing into the resistor is regulated below 13.5 mA.
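The sensing sequence of the ring-oscillator thermal sensor (reset, gated counting, load, ready) and the role of calibration can be mimicked in software. The sketch below is a behavioral illustration only: the gate length expressed in reference-clock cycles, the oscillator and clock frequencies, and the calibration table are all hypothetical, and piecewise-linear interpolation is just one possible way to handle the non-linear frequency-temperature relationship.

```python
from bisect import bisect_left


def measure(f_osc_hz, f_ref_hz, gate_cycles):
    """Replay the a/b/c/RDY sequence and return (TS_REG, RDY).

    Assumption: signal b stays high for gate_cycles periods of the reference
    clock, so the counter sees floor(f_osc * gate_cycles / f_ref) edges.
    Integer arithmetic keeps the count exact.
    """
    counter = 0                                     # a: 0 -> 1 clears the counter
    counter = (f_osc_hz * gate_cycles) // f_ref_hz  # b high: oscillator runs, counter counts
    ts_reg = counter                                # c: 0 -> 1 loads the count into TS_REG
    rdy = True                                      # RDY: count is ready to be read
    return ts_reg, rdy


def count_to_temperature(count, calib):
    """Map a count to a temperature by piecewise-linear interpolation.

    calib is a list of (count, temperature_c) pairs sorted by ascending count,
    for example obtained from an on-die calibration run.
    """
    counts = [c for c, _ in calib]
    if not counts[0] <= count <= counts[-1]:
        raise ValueError("count outside calibrated range")
    i = bisect_left(counts, count)
    if counts[i] == count:
        return calib[i][1]
    (c0, t0), (c1, t1) = calib[i - 1], calib[i]
    return t0 + (t1 - t0) * (count - c0) / (c1 - c0)


if __name__ == "__main__":
    # Hypothetical calibration: the count falls as the die gets hotter.
    table = [(9000, 100.0), (9500, 50.0), (10000, 0.0)]
    reg, ready = measure(f_osc_hz=97_500_000, f_ref_hz=1_000_000, gate_cycles=100)
    print(reg, ready, count_to_temperature(reg, table))
```

A longer gate gives more counts per kelvin and therefore finer resolution, at the cost of a longer measurement during which supply droop can disturb the oscillator.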

Thermal property analysis of the metallic thermal skeletons
The metallic thermal skeletons are intended to be placed in the regions enclosed by the core-sensor blocks. In this section, we derive analytical expressions for some key parameters.

Analytical model of the metallic thermal skeleton
The heat removal rate of the metallic thermal skeletons is assumed to be $q$. Consider a pair of core-sensor blocks, CS1 and CS2, as the heat sources. The temperature distribution on the metallic thermal skeletons between the pair of core-sensor blocks takes the same form as in the thermal ridge analysis and can be expressed as
$$T_k(x) = T_a + \frac{T_b - T_a}{w}\,x - \frac{q}{2k_{sk}}\,x\,(w - x) \qquad (9)$$
where, as shown in Figure 18, $T_a$ and $T_b$ are the temperatures of CS1 and CS2, respectively, $q$ is the heat conducted to the ambient environment by the metallic thermal skeletons, $k_{sk}$ is the equivalent thermal conductivity of the metallic thermal skeletons, and $w$ is the width of the metallic thermal skeletons. Since $T_k$ denotes the temperature at location $x$, examining the mid-point temperature $T_{1/2}$ by substituting $x = w/2$ into (9) gives
$$T_{1/2} = \frac{T_a + T_b}{2} - \frac{q\,w^2}{8k_{sk}}$$

Effective thermal conductivity of the metallic thermal skeletons
For the die with 9 μm of BEOL and 450 μm of silicon substrate, substituting the thermal conductivities into (6) gives $k_{sxx}$ of around 12 to 68 W/(m·K) and $k_{szz}$ of around 116 to 147 W/(m·K). The variation in the equivalent thermal conductivity depends on the percentage distribution of the metal in the BEOL. Thus, the heat flowing through the silicon substrate is mostly dissipated by the metallic thermal skeletons rather than transferred through the silicon dioxide in the BEOL. Substituting the equivalent $k_{sk}$ and the temperatures $T_a$, $T_b$, and $T_{1/2}$ into (9), we obtain a metallic thermal skeleton width of 420 µm. FEM simulations have been performed to examine the effectiveness of the proposed metallic thermal skeletons, as shown in Figure 19. For compatibility, we have combined the simulation results from both CFD-RC and ANSYS, so as to link the design platform for our circuit designers. Hence, to design the metallic thermal skeletons shown in Figure 12, we assumed types α, β, and γ with different distribution densities of metal in the BEOL, as in the following equation.
The matrix D represents the weighting coefficients of the metallic thermal skeletons. The percent contribution of the element  is limited by the metal density constraint in the design rule released from the foundry. The enable signal H_EN is broadcast to all virtual cores.

Experimental setup
The die photo of the proposed test chip is shown in Figure 20. The chip is fabricated by TSMC in a 0.18 μm 1P4M mixed-mode process technology. The package is a 256-pin IST Universal PGA. The front of the chip is covered by the package glue. To observe the thermal behavior of the test chip, the back of the chip is exposed to air through a transparent PYREX® glass of 120 μm. There is a 6 cm × 6 cm open window in the central area of the evaluation board to facilitate observation during temperature measurement. The principal measurement setup includes a DC power supply (MOTECH PPS 3210), a current meter (FLUKE 189), a function generator (HP 8166A), a temperature-humidity chamber (HOLINK EZ040-72001), a logic analyzer (Agilent N6705A), an infrared camera (FLIR SC5700), and the thermal management total analysis platform. As shown in Figure 21(a), the FLIR SC5700 with a microscope of 3 μm resolution is responsible for the infrared radiation (IR) inspection. The temperature responses are measured by the thermal management total analysis platform designed by ICL, ITRI, as shown in Figure 21(b). As shown in Figure 21(c), the test environment is controlled at a constant ambient temperature, with a temperature error within ±0.5 °C. The programmable temperature-humidity chamber HOLINK EZ040-72001 is used to control the operating temperature from 0 °C to 100 °C. The MOTECH PPS 3210 power supply provides the three voltage levels. The control signals (TS_EN and CLK) are generated by the HP 8166A. The FLUKE 189 current meter is used to measure the current consumption. Last, the output signals are collected and analyzed by the Agilent N6705A.

Results and discussions
The experimental results are shown in Table 1. When a power density of 7.38 W/cm² is applied to the virtual cores, each core operates at a power of 20 mW. To evaluate the thermal conduction capability of the metallic thermal skeletons, their average temperature is an important index. Since the metallic thermal skeletons are employed to conduct the heat flux generated by the virtual cores, the temperature at w/2 (refer to Figure 18) in particular represents the result of the lateral thermal diffusion. To compare with the experimental steady state data shown in On the other hand, the transient temperature response is recorded by high-speed infrared radiation dynamic photos, as shown in Figure 22. Take the region in the photo as an example; the 2  -A5-1  region (refer to Figure 12) includes 2 types of metallic thermal skeletons. It is clear that the temperature of 1  is higher than that of 2  . This result shows that the thermal conductive capability of 1  is better than that of 2  . The area of  is limited by the metal density constraint in the design rule released by the foundry; therefore, no more metal is allowed to be placed. However, the  region may be reserved for the placement of thermal TSVs or front metal stripes during the post-CMOS process to serve as the on-chip heat sink.
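The stated operating point of the virtual cores (7.38 W/cm² at 20 mW per core) implies a heated footprint per core, and the 126 cores together imply a total heater power. The short sketch below works these out, assuming the power density is defined over a square per-core area and that all 126 virtual cores are switched on at once; both are assumptions made only for this illustration.

```python
import math

POWER_PER_CORE_W = 20e-3          # stated per-core power
POWER_DENSITY_W_PER_CM2 = 7.38    # stated power density
VIRTUAL_CORES = 126               # virtual cores on the test chip

# Footprint implied by power / density, converted from cm^2 to mm^2.
area_cm2 = POWER_PER_CORE_W / POWER_DENSITY_W_PER_CM2
area_mm2 = area_cm2 * 100.0

# Side of an equivalent square footprint, converted from cm to um.
side_um = math.sqrt(area_cm2) * 1e4

# Total heater power if every virtual core is active (assumption).
total_power_w = VIRTUAL_CORES * POWER_PER_CORE_W

if __name__ == "__main__":
    print(f"implied footprint per core: {area_mm2:.3f} mm^2 (about {side_um:.0f} um square)")
    print(f"total heater power with all cores on: {total_power_w:.2f} W")
```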

Conclusion
The cost of thermal ridges and metallic thermal skeletons may be compared with that of advanced techniques such as micro-channel liquid cooling or thermo-electric cooling (TEC). Since the ITRS expects the number of stacked dies to increase in the future, cooling the inter-layer dies will become more challenging. If the heat must be removed by pumping liquid or external energy into the stacked dies, the cooling cost will grow exponentially. The thermal ridges and metallic thermal skeletons proposed in this chapter are relatively cost-effective and energy-saving. Moreover, the proposed method locally improves the temperature non-uniformity, and the thermal gradient over most of the chip also decreases. Nevertheless, the global temperature non-uniformity, which affects chip operation from the electrical perspective, deserves further effort. Since the 3D IC with TSVs is an emerging technology, early floorplanning for the insertion of thermal ridges and metallic thermal skeletons for thermal management will be discussed more and more widely. The temperature distributions measured by infrared radiation and by the thermal sensors are compared in this study. These results show that the two sets of data can be calibrated against each other if the package of the chip is chosen properly. The authors would also like to show that the proposed thermal test chip is capable of evaluating the thermal properties and characteristics of packages if desired. In the 3D design of stacked dies, thermal measurement and verification are becoming much more important. This research may give engineers a direction or inspiration for investigating the feasibility of better thermal designs.