Power Consumption Comparison between 2-D and 3-D Computer Systems
According to the U.S. Energy Information Administration (EIA) and the International Energy Agency (IEA), worldwide energy consumption will continue to increase by an average of 2% per year. A yearly increase of 2% doubles energy consumption every 35 years (Michaelbluejay, 2012). Worldwide energy consumption is therefore predicted to be twice as high in 2047 as it is today (2012), about 2500 GW of capacity 35 years from now. Consumer electronics devices (TVs, PCs, etc.) currently account for roughly 15% of residential electricity usage each year, and the share is growing. This amounts to nearly 185 GW of capacity, which requires about 370 medium-size (500 MW) power plants. The associated CO2 emissions are estimated at 650 megatons per year today and could surpass 1000 megatons by 2030 without policy intervention. If the design architecture of all TV and PC electronics were switched from 2-D to 3-D, energy consumption could be reduced to one tenth of the current figure, about 18 GW of capacity (Cheng, K., 2011). This could save about 600 megatons of CO2 emissions per year, and only 37 medium-size (500 MW) power plants would be needed instead of 370 (Massoud Pedram, 2009; Noah Horowitz, 2011).
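The arithmetic behind these figures can be checked with a short Python sketch. The function names are illustrative only; the inputs (2% growth, 185 GW, 500 MW plants, a tenfold reduction) are the values quoted in the paragraph above.

```python
import math


def doubling_time_years(growth_rate: float) -> float:
    """Years for a quantity growing at `growth_rate` per year to double."""
    return math.log(2) / math.log(1 + growth_rate)


def plants_needed(capacity_gw: float, plant_size_mw: float = 500.0) -> int:
    """Number of plants of `plant_size_mw` needed to supply `capacity_gw`."""
    return math.ceil(capacity_gw * 1000.0 / plant_size_mw)


years = doubling_time_years(0.02)            # close to 35 years
plants_2d = plants_needed(185.0)             # plants for today's 185 GW
capacity_3d_gw = 185.0 / 10.0                # tenfold reduction quoted in the text
plants_3d = plants_needed(capacity_3d_gw)    # plants after the switch to 3-D

print(f"doubling time: {years:.1f} years")
print(f"2-D: {plants_2d} plants, 3-D: {plants_3d} plants")
```

The text rounds the reduced capacity of 18.5 GW down to about 18 GW.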
Recent study indicates that the 3-dimensional integrated circuit device (3-D IC) and the 3-dimensional computing system (3-D Computer) will be the ultimate architecture for everyday electronic equipment such as TVs, PCs, cell phones, PDAs and GPS devices, because the 3-D IC offers small size, light weight, high speed and low power consumption, which helps to meet global warming regulations. No PCB, no package and not even a cable connector is needed in a future 3-D computer system. It has been shown in the laboratory that 3-D computing systems consume only about one seventh to one tenth of the power of their 2-D counterparts. In the next few years each of us will carry a mobile device such as an iPhone or a tablet computer at all times, which means the number of devices needed will be about the population of the world. It is therefore urgent to convert current 2-D computing systems to the 3-D architecture. This will yield a seven- to tenfold power saving and will benefit the environment by reducing carbon emissions and helping to meet global warming regulations.
The era of cloud computing is near and is becoming a reality. More business organizations are drawing on the very large volumes of real-time data available from cloud computing to analyze and direct their business for profit. Private investors, too, increasingly use the real-time information provided by the cloud to guide their investments. The advantage of cloud computing technology is the very large real-time database it makes available to the public. Adapting this large amount of real-time data to local needs is one of the biggest challenges of this new era. To process the data obtained from the cloud quickly, a local smart terminal computing system is required. An example of a cloud computing terminal is an advanced GPS device, which can process live traffic information in real time so that alternate routes can be derived. The smart terminals required for such applications could be further advanced by applying 3-D computing systems. In the coming era of cloud computing, these smart terminals will be needed in almost every corner of the world. Again, it is urgent to convert current 2-D computing systems to the 3-D architecture, which would cut the share of residential power used by electronic gadgets from 15% to 1.5%-2.1% and reduce the overall power requirement substantially; on the basis of current data, approximately 600 megatons of CO2 emissions per year could be saved. If all electronic equipment were switched to 3-D, this would protect the environment by reducing carbon emissions and help to meet global warming regulations. Source: (Latest industry research & analysis; Prediction of energy consumption world-wide).
2. Construction features of 2-D to 3-D integrated circuit devices
A 2-dimensional (2-D) system chip or device may, for instance, consist of 5 elements as shown in Fig. 1-a. This system can be partitioned and then stacked to form a 3-dimensional (3-D) system chip or device with the same functionality, as shown in Fig. 1-c. Fig. 1-d is also a 3-D device, but it is sometimes called 2.5-D because only a small portion of the whole system is stacked. The global wirings (wirings between elements), shown in orange in Fig. 1-b, are much longer than the global wirings shown in Fig. 1-d; that is, the wirings in the 3-D system are much shorter than those in the 2-D system. With shorter wirings the system can operate more efficiently, and it also consumes less power.
In general, the power consumed by an IC circuit can be expressed by the following equation:
$$P = C V^{2} f \qquad (1)$$
where C = capacitance, V = voltage swing, and f = frequency of operation.
The lower power consumption of 3-D therefore comes from the smaller capacitance that results from the reduced size and length of the interconnections within the system.
The 2-D IC system described in Fig. 1-a can be partitioned further to form a fully 3-D IC device or chip (Fig. 1-c) with all 5 elements stacked vertically in 5 layers. By the same token, vertically stacking more 3-D IC chips with the same or different functions finally yields a complete, self-contained system, a 3-D Computer, as shown in Fig. 2 below. The form factor (size and weight) of the 3-D computer system is much more compact than that of its 2-D counterpart.
Fig. 2 shows the 3 systems: the 2-D IC, the 3-D IC and the 3-D Computer. Laboratory results indicate that a 2.5-D IC, in which only a portion of the system is stacked to form a 3-D IC device like the one in Fig. 1-d, consumes about three quarters of the power of its 2-D IC counterpart. With more detailed partitioning and stacking into more layers, as in the 3-D Computer of Fig. 2, the power consumed is only one seventh to one tenth of that of the 2-D counterpart. The laboratory results are shown in Table 1 below:
|System||Size||Weight||Power|
|System with all MSI/LSI only||~1.000||~125||~0.6|
|System with Conventional VLSI||~0.038||~16||~1.27|
|System with Wafer Scale VLSI||~0.012||~6||~0.95|
|System with Multi-Wafers Stacked VLSI (3-D Computer)||~0.003||~1||~0.14|
From Table 1 above, the 3-D Computer is at most 1/4 the size (0.003/0.012), 1/6 the weight (1/6) and 1/7 the power (0.14/0.95) of its 2-D Computer counterparts.
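The ratios quoted above can be reproduced directly from the table values. The following Python sketch transcribes the four rows of Table 1 as (size, weight, power) tuples; the dictionary keys and the helper name are illustrative, and the units are whatever the source table uses.

```python
# Values transcribed from Table 1 as (size, weight, power).
TABLE_1 = {
    "MSI/LSI": (1.000, 125, 0.6),
    "Conventional VLSI": (0.038, 16, 1.27),
    "Wafer Scale VLSI": (0.012, 6, 0.95),
    "3-D Computer": (0.003, 1, 0.14),
}


def ratios_vs(reference: str, target: str = "3-D Computer"):
    """Return (size, weight, power) of `target` divided by those of `reference`."""
    ref = TABLE_1[reference]
    tgt = TABLE_1[target]
    return tuple(t / r for t, r in zip(tgt, ref))


size_r, weight_r, power_r = ratios_vs("Wafer Scale VLSI")
print(f"size 1/{1 / size_r:.1f}, weight 1/{1 / weight_r:.1f}, power 1/{1 / power_r:.1f}")

# The same comparison against the conventional VLSI row.
_, _, power_vs_conventional = ratios_vs("Conventional VLSI")
print(f"power vs conventional VLSI: 1/{1 / power_vs_conventional:.1f}")
```

Against the wafer scale row the power ratio is close to 1/7, and against the conventional VLSI row it is close to 1/9.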
According to equation (1), the system power consumption is also proportional to the square of the voltage swing, V. With the system stacked in a very compact form, as in the 3-D Computer, the voltage swing can be reduced to the point where the logic states "1" and "0" are still valid. If the voltage swing of the 3-D Computer can be halved, the power consumption is reduced by a factor of 4. The power consumed by the 3-D computing system could therefore fall to about 1/(4 x 7) = 1/28, or roughly one thirtieth, of that of the 2-D system, and possibly lower. This is a very effective way of saving power. It follows that systems must go 3-D in order to reduce power consumption and meet global warming regulations.
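The scaling argument above can be checked numerically. The sketch below uses the standard dynamic switching power relation P = C·V²·f, which matches the statement that halving V cuts power by a factor of 4. It also assumes, purely for illustration, that the 1/7 power ratio from Table 1 is due entirely to reduced capacitance; the source does not state this split.

```python
def dynamic_power(capacitance: float, voltage_swing: float, frequency: float) -> float:
    """Dynamic switching power P = C * V^2 * f (arbitrary consistent units)."""
    return capacitance * voltage_swing ** 2 * frequency


# Normalised 2-D baseline: C = V = f = 1.
baseline = dynamic_power(1.0, 1.0, 1.0)

# Assumption: stacking reduces effective capacitance to 1/7,
# and the voltage swing is additionally halved.
stacked = dynamic_power(1.0 / 7.0, 0.5, 1.0)

reduction = baseline / stacked
print(f"power reduction factor: {reduction:.0f}x")
```

The factor of 28 is what the text rounds to "one thirtieth".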
3. Signal, Information processing and Super Computer Architecture of 3-D IC
In this section the power of 3-D IC signal and information processing is described through some sample applications. 3-D ICs have the advantage of parallel processing over their 2-D IC counterparts, which leads to better image, signal and data processing (Little, M.J.,1989; Toborg, S.T., 1990; Boguslaw C. et al., 2009; Orion Jones, 2012; Sarah Perrin, 2012).
The main technologies for integrating and fabricating the 3D-IC device and system are:
Through Silicon Via (TSV) technology for vertical communication;
Interconnecting technology to accomplish the up and down signal flow between wafer layers and
Stacking technology to complete the final assembly of the 3D-IC device and system.
The 3-D IC fabrication technology and its manufacturing processes are becoming more mature nowadays. TSV and multi-layer wafer stacking technologies are becoming more prevalent in the manufacturing industry. These advancing techniques now give the integrated circuit industry an opportunity to move towards optimal circuit design using 3-D ICs. The advantages of 3-D IC design are what we always strive for in IC manufacturing, that is, smaller size, lighter weight and lower power consumption.
Most microprocessors (μP) currently used in computing systems are 2-D. With the advancement of TSV, wafer-to-wafer interconnection and stacking technologies in 3-D IC manufacturing, it is possible to produce a 3-D μP relatively easily and at low cost. The difference between the 2-D and the 3-D μP lies in the processing algorithm and the circuit design architecture. With the implementation of SIMD (Single Instruction Multiple Data) and BSWP (Bit Serial Word Parallel) processing technologies, a 3-D μP or super computer can be designed to process data signals in parallel, whereas the conventional 2-D design processes them sequentially. Therefore, the power of 3-D signal processing could be tremendously better than that of the 2-D version (Singh, A. D., 1985).
The conventional 2-D microprocessor consists of 5 processing cells, as shown in Fig. 3. These 5 processing cells can be stacked vertically to form a 3-D μP, as shown in Fig. 4. This single 3-D μP can easily be turned into a multiple or wafer scale 3-D high performance microprocessor, as shown in Fig. 5. This wafer scale 3-D microprocessor could be the ultimate high performance processor or super computer we are searching for.
Placing N x N identical cells on one wafer level and stacking it with 4 other wafers, each carrying a different type of cell, yields N x N processors. Each processor can process its own instruction or data signal, and all N x N operations are executed in parallel in the same time frame. Fig. 6 shows a complete 3-D multiple microprocessor system or 3-D super computer. The data signals can travel horizontally and vertically in this 5-wafer stack. This 3-D IC design architecture forms a very powerful N x N processor. For example, assume this is a 16-bit word processor. The data on any wafer level with N x N cells are transmitted in parallel to another wafer level through the TSV links between levels and are then executed in parallel according to the instruction. This process is called "word parallel" because all N x N data words or instructions are processed or executed in parallel at the same time. If there is only one TSV for each cell to communicate with the other level, the 16 bits are shifted bit by bit through that single TSV; we therefore call it "bit serial", because the word is shifted to the other wafer level one bit at a time.
The BSWP (bit serial word parallel) transfer is commanded by a single instruction, and the operation is performed on all N x N data at the same time. Therefore, we call it a SIMD (single instruction multiple data) operation. For example:
SHIFT A TO C;
ADD A TO C;
MOVE DATA TO D;
The 3 single instructions above will:
shift the data on the A wafer level through the TSV to the C level;
add all N x N data of A to C;
move the resulting N x N data to the D wafer level, which is the I/O level, where they can be accessed by the outside world (a printer, for example).
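The three steps listed above can be mimicked in a few lines of Python. This is only a behavioural sketch under stated assumptions: each wafer level is modelled as an N x N list of 16-bit words, the C level is assumed to hold an accumulator that the incoming operand is added to, and the function names are hypothetical. The nested Python loops run one cell after another, whereas in the hardware described here every cell would act within the same clock.

```python
WORD_BITS = 16
MASK = (1 << WORD_BITS) - 1


def bit_serial_transfer(src):
    """Copy an N x N array of words to another level, one bit per clock.

    Bit serial: each cell has a single vertical wire, so a word needs
    WORD_BITS clocks. Word parallel: all cells move their bit in the same clock.
    """
    n = len(src)
    dst = [[0] * n for _ in range(n)]
    clocks = 0
    for bit in range(WORD_BITS):
        for r in range(n):
            for c in range(n):
                dst[r][c] |= ((src[r][c] >> bit) & 1) << bit
        clocks += 1
    return dst, clocks


def simd_add(x, y):
    """One instruction applied to every cell: 16-bit wrap-around addition."""
    n = len(x)
    return [[(x[r][c] + y[r][c]) & MASK for c in range(n)] for r in range(n)]


level_a = [[1, 2], [3, 4]]             # data on wafer level A (N = 2)
accumulator_c = [[10, 20], [30, 40]]   # assumed prior contents of level C

operand_c, clocks = bit_serial_transfer(level_a)      # shift A to C
accumulator_c = simd_add(operand_c, accumulator_c)    # add on all N x N cells
level_d, _ = bit_serial_transfer(accumulator_c)       # move result to I/O level D

print(level_d, "transferred in", clocks, "clocks per move")
```

The number of clocks per transfer depends only on the word width, not on N, which is the point of the word-parallel organisation.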
With the concepts of BSWP and SIMD, the 3-D IC design architecture could be enhanced to achieve a super high performance computer. Theoretically, N could be a very large number and the system performance could be tremendously powerful. In reality, however, manufacturing encounters yield and fault tolerance problems. Redundancy and interposer technologies could be applied in the circuit design to improve yield and fault tolerance and make the dream system come true.
Wafers with new types of cell can be added to the stack at any position to expand the system, because the system can be reconfigured by built-in software after stacking. Fig. 7 indicates the relative manufacturing cost per system for an N x N array as a function of the number of stacked wafers; this cost will fall as TSV and interconnection technologies advance over the next few years.
3.1. 3-D Computer for Image Understanding Application
Image understanding technology has broad application. In today's LCD display panel and LED lighting device production in particular, it can be applied to test and evaluate the quality of display panels and lighting products. For industrial applications such as bin picking, assembly and inspection, image understanding technology can support the task at real-time rates. Another area of applicability is the automatic interpretation of aerial or satellite imagery for use in GPS or computer aided cartography.
IC design for machine vision and image understanding is one of the most computationally intensive domains of artificial intelligence research. It requires that the interpretation of a changing scene be updated with every new video frame, once every thirtieth of a second. With three quarters of a million color-intensity data values making up the picture elements (pixels) of the image, performing a single operation on each pixel requires an execution rate of about twenty-three million instructions per second just to keep up with the input. Of course, far more than one operation per pixel is necessary. Image understanding is not limited to data processing in the visible spectrum; it is also applicable to computer aided systems for image signal processing such as avionics flight control, synthetic array radar and sonar systems. The bandwidth requirement of these systems is even higher. To achieve these enormous bandwidths, a 3-D IC parallel processing design architecture and its algorithm will be discussed. With the high density packaging and the level of 3-D IC integration available today, implementing SIMD (Single Instruction Multiple Data) and BSWP (Bit Serial Word Parallel) processing in the 3-D IC circuitry may be the best approach. SIMD and BSWP will be discussed and analyzed as approaches to meeting the enormous bandwidth requirements of machine vision and image understanding.
The era of cloud computing is near. The advantage of cloud computing technology is the very large real-time database it makes available to the public. Adapting this large amount of real-time data to local needs is one of the greatest challenges of this new era. To process the data obtained from the cloud quickly enough to meet immediate needs, a local smart terminal computer is required. The 3-D data processing architecture is believed to be the best candidate for this smart terminal computing requirement.
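The instruction-rate estimate above is simple arithmetic, reproduced in the Python sketch below. The 750,000 pixels and 30 frames per second come from the text; the figure of 100 operations per pixel and the 256 x 256 array size are hypothetical values chosen only to illustrate how the load divides across a SIMD array.

```python
PIXELS_PER_FRAME = 750_000     # "three quarters of a million" data values
FRAMES_PER_SECOND = 30         # one frame every thirtieth of a second


def required_ops_per_second(ops_per_pixel: int) -> int:
    """Total operations per second needed to keep up with the video input."""
    return PIXELS_PER_FRAME * FRAMES_PER_SECOND * ops_per_pixel


def ops_per_cell_per_second(ops_per_pixel: int, array_side: int) -> float:
    """Load on each cell if pixels are spread evenly over an N x N SIMD array."""
    cells = array_side * array_side
    return required_ops_per_second(ops_per_pixel) / cells


single_op_rate = required_ops_per_second(1)
print(f"{single_op_rate / 1e6:.1f} million operations per second for one op per pixel")

# Hypothetical workload: 100 operations per pixel on a 256 x 256 cell array.
per_cell = ops_per_cell_per_second(100, 256)
print(f"{per_cell:,.0f} operations per second per cell")
```

One operation per pixel gives 22.5 million operations per second, which the text rounds to about twenty-three million.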
4. SIMD and BSWP technologies
With the advancement of TSV (Through Silicon Via), wafer-to-wafer interconnection and stacking technologies in 3-D IC manufacturing, it is possible to produce a 3-D smart computer that meets the requirements of cloud computing relatively easily and at low cost. The difference between the 2-D and the 3-D computer lies in the processing algorithm and the circuit design architecture. With the implementation of SIMD (Single Instruction Multiple Data) and BSWP (Bit Serial Word Parallel) processing technologies, a 3-D smart computer can be designed to process data signals in parallel, whereas the conventional 2-D design processes them sequentially. Therefore, the power of 3-D signal processing could be tremendously better than that of the 2-D version. Although the TSV and micro-bump technologies developed for stacking silicon chips or wafers are quite advanced now, the vertical interconnection or wiring density is still significantly lower than in the other two dimensions. It is therefore still important to partition the system so as to make the best use of the limited vertical dimension in 3-D IC circuit design.
The MEMS technology used to form the vertical connections, i.e. TSV and wafer-to-wafer interconnection, still works at micron geometries, while 2-D surface wiring is now at the nanometer scale. The vertical connections are thus about three orders of magnitude coarser than the horizontal wiring in linear dimension, or roughly a million times larger in area. With the vertical interconnection bandwidth so limited compared with the other two dimensions, how to optimize the 3-D IC circuit design architecture has become a central issue.
DRIE (Deep Reactive Ion Etching), or the Bosch process, is the most popular method for forming TSVs. The maximum aspect ratio reached so far is 13:1. For 3-D stacked chips or wafers, the thinnest silicon chip or wafer that can be achieved now is about 50 µm. With an aspect ratio of 13:1, the smallest through-silicon hole is therefore about 3.85 µm in diameter (50 µm / 13). Wafer surface taken up by a 3.85 µm diameter TSV is lost to circuitry, so the fewer the vertical interconnections, the more circuitry can be placed on the wafer surface.
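The via diameter quoted above follows from dividing the wafer thickness by the etch aspect ratio. The Python sketch below repeats that calculation and, as an illustrative extension not taken from the source, compares the surface area used by one via per cell with the area a 16-wire parallel bus would need. It treats each via as a plain circle and ignores any keep-out zone around it.

```python
import math


def min_tsv_diameter_um(wafer_thickness_um: float, aspect_ratio: float) -> float:
    """Smallest via diameter for a given depth and depth-to-diameter aspect ratio."""
    return wafer_thickness_um / aspect_ratio


def tsv_area_um2(diameter_um: float) -> float:
    """Surface area of one circular via opening."""
    return math.pi * (diameter_um / 2.0) ** 2


diameter = min_tsv_diameter_um(50.0, 13.0)
area_single = tsv_area_um2(diameter)
area_16_wire_bus = 16 * area_single   # hypothetical fully parallel 16-bit link

print(f"diameter {diameter:.2f} um, area per via {area_single:.1f} um^2")
print(f"16 parallel vias would use {area_16_wire_bus:.0f} um^2 per cell")
```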
For example, a conventional 2-D microprocessor is formed by 5 processing cells, as shown in Fig. 3. The interconnections between the cells are mostly a data bus and a few control signals. To save physical silicon area on each chip level after the cells are stacked into a 3-D structure, a single wire is designed to connect the 5 cells vertically through all 5 chip levels. The 5 processing cells, each on a different level, are stacked vertically to form a 3-D microprocessor, as shown in Fig. 4. Communication between levels can now only pass through this single vertical wire. For the data and the few control signals to pass through the single channel, a special shift register circuit must be installed in each cell on each of the 5 levels, so that signals from one cell can pass sequentially through the single vertical channel to a cell on a different level. If the data or control signal is a 16-bit word, the 16 bits are passed one by one through the vertical interconnection wire. By the same token, the receiving cell needs a similar shift register to reassemble the original data and signals.
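The sending and receiving shift registers described above can be sketched in Python as a serializer and a deserializer for one 16-bit word. Sending the least significant bit first is an assumption of this sketch; the source does not state the bit order, and the function names are illustrative.

```python
WORD_BITS = 16


def serialize(word: int):
    """Parallel-in, serial-out: yield the bits of a word, least significant first."""
    for _ in range(WORD_BITS):
        yield word & 1
        word >>= 1


def deserialize(bits) -> int:
    """Serial-in, parallel-out: rebuild the word from bits received LSB first."""
    word = 0
    for position, bit in enumerate(bits):
        word |= (bit & 1) << position
    return word


sent = 0xA5C3
wire_trace = list(serialize(sent))   # one bit per clock on the single vertical wire
received = deserialize(wire_trace)

print(wire_trace)
print(hex(received))
```

Both ends must agree on the word width and bit order, which is why the receiving cell needs a matching shift register.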
If a system contains more than a single processor, it can be expanded to an N x N multiple processor system on 5 full size wafers stacked vertically (wafer scale), as shown in Fig. 5. The data signals can be transferred simultaneously from all N x N cells on one level to the corresponding N x N cells on another level. After the data have been transferred to the designated level, the data signals in all N x N cells are processed in parallel, at the same time, on that level. Transferring the data signals sequentially, bit by bit, between wafer levels and then processing the data signals of all N x N cells on the designated level in parallel is called BSWP technology. Because an adaptive multi-microprocessor message passing network makes better use of limited bandwidth than any other communication structure we know of, the natural vertical partition in the packaging is likely to match one dimension of the message network.
Based on the above criteria, it is suggested that applying BSWP (bit serial word parallel) and SIMD (single instruction multiple data) technologies to 3-D circuit design, together with circuit partitioning that minimizes interconnections, may be the more desirable approach to achieving good circuit performance given the limited vertical interconnection bandwidth.
At this point, two questions arise: how can the cells on one level transfer their data signals to the designated level or levels, and how can the cells on a level receive data signals from another level without interference? This is done by a so-called wired-AND scheme or a tri-state I/O buffer circuit connected to the single vertical interconnection wire, as shown in Fig. 8. Suppose data signals on level 1 are to be transferred to level 3 and/or 4. The path between level 1 and level 3 and/or 4 is opened for the transfer, while the other levels are placed in the tri-state or "Z" state. The tri-state or "Z" state is sometimes called the electrically "floating" state because it is a high impedance condition.
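The bus-sharing scheme above can be illustrated with a small behavioural model in Python. It is a logic-level sketch only, not a description of the circuit in Fig. 8: `Z` stands for the high impedance state, the five levels are numbered 1 to 5 as in the text, and all function names are hypothetical.

```python
from typing import Dict, Iterable, Optional

Z = None  # high impedance ("floating") output


def resolve_wire(drivers: Dict[int, Optional[int]]) -> Optional[int]:
    """Value seen on the shared vertical wire with tri-state drivers.

    Exactly one level may drive the wire; all others must be in the Z state.
    """
    active = [value for value in drivers.values() if value is not Z]
    if len(active) > 1:
        raise ValueError("bus contention: more than one level is driving the wire")
    return active[0] if active else Z


def transfer_bit(bit: int, source: int, receivers: Iterable[int],
                 levels: Iterable[int] = range(1, 6)) -> Dict[int, Optional[int]]:
    """Drive `bit` from `source`; only the enabled receivers latch the wire value."""
    drivers = {level: (bit if level == source else Z) for level in levels}
    wire = resolve_wire(drivers)
    return {level: wire for level in receivers}


def wired_and(outputs: Iterable[int]) -> int:
    """Wired-AND alternative: idle levels release the line (logic 1)."""
    return int(all(outputs))


latched = transfer_bit(1, source=1, receivers=[3, 4])
print(latched)                       # levels 3 and 4 receive the bit from level 1
print(wired_and([1, 0, 1, 1, 1]))    # any level pulling low forces the wire to 0
```

In the tri-state model a second active driver is treated as an error, which mirrors the requirement that idle levels stay in the "Z" state.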
5. Example of image understanding by implementing BSWP and SIMD
Three types of cell array are needed for this example, and every cell is assumed to hold its data in a 16-bit shift register (a 2-byte word). Each cell is a simple logic circuit with a gate count of only a few hundred, and all N x N cells on a wafer level have an identical circuit, which simplifies the circuit design tremendously. Type 1, shown in Fig. 9, is an N x N cell array whose data can be shifted right and left, up and down, with wraparound; it is named SHIFTER and in this example is the B wafer level in Fig. 10. Type 2 is an N x N cell array that can perform the "XNOR" operation; it is named COMPARATOR and in this example is the C wafer level in Fig. 10. The data of the COMPARATOR can only be communicated vertically. Type 3 is an N x N cell array whose data can be loaded from and unloaded to the I/O bus; it is named DATA I/O and in this example is the D wafer level in Fig. 10. The remaining wafer levels hold a database of many kinds of images. Assume that DATA-1 is the image of a ball, DATA-2 is the image of a bird, DATA-3 is the image of an airplane, and so on. All the cell arrays can, of course, communicate vertically. Fig. 10 shows a simple system for this example containing all 3 types of wafer level and the image database. To give a better picture of how image understanding works, consider the following example.
Assume the pilot of a commercial airliner sees a spot far ahead. It could be an airplane heading toward him. Suppose the pilot has image understanding equipment similar to that shown in Fig. 10, and a picture of the spot is taken by a digital camera connected to the system. The first step is to center the spot and magnify (zoom) it to a size suitable for image processing. Assuming the image has already been centered and zoomed to the right size by the digital camera, the following simple program performs the target recognition:
|MOVE D TO B||step 1|
|SHIFT B RIGHT-LEFT||step 2|
|SHIFT B UP-DOWN||step 3|
|MOVE B TO C||step 4|
|FOR i=1 TO 3||step 5|
|MOVE DATA-i TO C||step 6|
|PERFORM COMPARE||step 7|
|IF C = “1” THEN||step 8|
|PRINT “IT’S AN AIRPLANE”||step 9|
|END ELSE NEXT i||step 10|
The 10 single instructions above, shown in Table 2, do the following:
Step 1: the image data have already been loaded to the D level from the I/O bus, so this step moves the image data from the D level to the B level (because the B wafer level can perform right-left, up-down and wraparound shifting).
Steps 2 and 3 make the fine adjustment so that the image is exactly centered and at the right size.
Step 4 moves the image from B to C, ready to be compared.
Step 5 indicates that the program performs at most 3 comparisons, because the system in this example holds only 3 stored images.
Step 6 moves the stored image data from DATA-i to the C level, where C will perform the "XNOR" operation, as shown at the top of Fig. 11.
Step 7 orders the C level to perform the "XNOR" operation, as shown in Fig. 11 (1), (2) and (3).
Step 8 checks whether the resulting data are all "1" across the N x N array, as shown in Fig. 11 (1), (2) and (3).
Step 9 ends the program: if all the cell data in the N x N array are "1", the image is understood to be DATA-3, an airplane, as shown by the results marked in red in Fig. 11 (3).
Step 10 continues the loop if the image has not yet been recognized.
The example above shows how the BSWP and SIMD technologies are applied to accomplish the task at lightning speed. The single "MOVE" instructions in steps 1, 4 and 6, which move the N x N data from one wafer level to the destination level, are all carried out by BSWP, which a 2-D computer system can hardly do. The single "SHIFT" and "PERFORM" instructions operate on the entire N x N data at the same time, which again a 2-D computer system cannot do. In a 2-D system the pixels of the N x N data are processed sequentially, while in our 3-D system they are processed in parallel. Therefore, the processing speed of the 3-D system is much higher than that of the 2-D version.
From the above study, it is concluded that:
The 3-D computing system can be custom designed for a broad range of applications thanks to its high performance signal processing power. It will meet the challenges of the coming era of cloud computing.
The 3-D computing system described in the application example above can be expanded easily by adding more wafer levels, and the size of the N x N array is in principle unlimited. This will make our dream system come true as 3-D computer manufacturing technology advances.
The power consumption of the 3-D computer system is about one tenth (1/10) to one seventh (1/7) of that of its 2-D counterpart, based on the laboratory results. The 3-D system will therefore address environmental concerns and help to meet global warming limits.
It is also assessed that once 2-D ICs are switched to 3-D ICs in all domestic electronic appliances and equipment, 3-dimensional architecture integrated circuits will cut the residential power requirement from 15% to between 1.5% and 2.1%, reducing the overall power requirement substantially. Approximately 600 megatons of CO2 emissions per year could be saved on the basis of current EIA data. This would help to protect the environment by reducing carbon emissions and help to meet global warming regulations.
Authors express their thanks to Ms. Daria Nahtigal, Commissioning Editor, Editor Care and Support Department, for her continued support to get this paper in final shape.