In-Situ Supply-Noise Measurement in LSIs with Millivolt Accuracy and Nanosecond-Order Time Resolution

This chapter explores signal analysis of a circuit embedded in an LSI to probe the voltage fluctuation conditions, and is described as an example of digital signal processing1. As process scaling has continued steadily, the number of devices on a chip continues to grow according to Moore’s Law and, subsequently, highly integrated LSIs such as multi-CPU-core processors and system-level integrated Systems-on-a-Chip (SoCs) have become available. This technology trend can also be applied to low-cost and low-power LSIs designed especially for mobile use. However, it is not the increase in device count alone that is making chip design difficult. Rather, it is the fact that parasitic effects of interconnects such as interconnect resistance now dominate the performance of the chip. Figure 1 shows the trends in sheet resistance and estimated power density of LSIs. These effects have greatly increased the design complexity and made power-distribution design a considerable challenge.


Introduction
This chapter explores signal analysis of a circuit embedded in an LSI to probe the voltage fluctuation conditions, and is described as an example of digital signal processing 1 .A s process scaling has continued steadily, the number of devices on a chip continues to grow according to Moore's Law and, subsequently, highly integrated LSIs such as multi-CPU-core processors and system-level integrated Systems-on-a-Chip (SoCs) have become available. This technology trend can also be applied to low-cost and low-power LSIs designed especially for mobile use. However, it is not the increase in device count alone that is making chip design difficult. Rather, it is the fact that parasitic effects of interconnects such as interconnect resistance now dominate the performance of the chip. Figure 1 shows the trends in sheet resistance and estimated power density of LSIs. These effects have greatly increased the design complexity and made power-distribution design a considerable challenge.  (Kanno, et al., 2007).

In-Situ Supply-Noise Measurement in LSIs with Millivolt Accuracy and Nanosecond-Order
Time Resolution 5 www.intechopen.com Power supply integrity is thus a key for achieving higher performance of SoCs fabricated using an advanced process technology. This is because degradation of the power integrity causes a voltage drop across the power supply network, commonly referred to as the IR-drop, which, in turn, causes unpredictable timing violations or even logic failures (Saleh et al., 2000). To improve power integrity, highly accurate analysis of a power-supply network is required. However, sophisticated SoCs, such as those for mobile phones, have many IPs and many power domains to enable a partial-power-down mode in a single chip. Thus, many spots of concentrated power consumption, called "hot spots", appear at many places in the chip as shown in the Fig. 2. Analysis of the power-supply network is therefore becoming more difficult. To address these issues, it is necessary to understand the influence of supply noise in product-level LSIs, gain more knowledge of it, and improve evaluation accuracy in the design of power supply networks via this knowledge. Above all, this understanding is very important; therefore, in-situ measurement and analysis of supply-noise maps for product-level LSIs has become more important, and can provide valuable knowledge for establishing reliable design guidelines for power supplies.

Hotspot:
Heavy current consumption part. e.g. high speed operation, or simultaneous operation of plural circuits .
Power gating switch to reduce standby leakage current consumption.
LSI Fig. 2. Hotspots in the LSIs. The hotspots are defined as heavy current consumption parts in the LSIs. The sophisticated LSI has many CPUs and hardware Intellectual Properties (HW-IPs) in it, so the many hotspots become appearing.
In-depth analysis of the power supply network based on this in-situ power supply noise measurement can be helpful in designing the power supply network, which is becoming requisite for 65-nm process technology and beyond.

Related work
Several on-chip voltage measurement schemes have recently been reported (Okumoto et al., 2004;Takamiya et al., 2004), and the features are illustrated in Fig. 3. One such scheme involves the use of an on-chip sampling oscilloscope (Takamiya et al., 2004). This function accurately measures high-speed signal waveforms such as the clock signal in a chip. Achieving such high measurement accuracy requires a sample/hold circuit which consist of an analog-to-digital converter (ADC) in the vicinity of the measurement point. This method can effectively avoid the influence of the noise on the measurement. Therefore, a large chip footprint is required for implementing measurement circuits such as a voltage noise filter, a reference-voltage generator and a timing controller.  (Takamiya et al., 2004) and (b) is a simple analog measurement (Okumoto et al., 2004).
A small, simple analog measurement was reported in (Okumoto et al., 2004). This probe consists of a small first amplifier, and the output signal of the probe is sent to a second amplifier and then transmitted to the external part of the chip. Because the probe is very small and has the same layout height as standard cells and needs only one second amplifier, many probes can be implemented in a single LSI with minimal area overhead. This method, however, requires dedicated power supplies for measuring voltages that are different from local power supplies V DD and V SS . These measurements are therefore basically done under test-element-group (TEG) conditions, and they may find it difficult to capture supply noise at multiple points in product-level LSIs when actually running applications. To resolve this difficulty, an in-situ measurement scheme is proposed. This method requires only a CMOS digital process and can be applied to standard-cell based design. Thus, it is easy to apply to product-level LSIs. The effect was demonstrated on a 3G cellular phone processor , and the measurement of power supply noise maps induced by running actual application programs was demonstrated.

Key points for an in-situ measurement
Three key points need to be considered in order to measure the power supply noise at multiple points on a chip: area overhead, transmission method, and dynamic range.
1. The first point is the area overhead of the measurement probes. Because the power-consumption sources are distributed over the chip and many independent power domains are integrated in an LSI, analyzing the power supply network for product-level LSIs is very complicated. To analyze these power-supply networks, many probes must be embedded in the LSI. Thus, the probes must be as small as possible.
Minimal area overhead and high adaptability to process scaling and ready-made electrical design automation (EDA) tools are therefore very important factors regarding the probes.
2. The second point is the method used to transmit the measured signal. It is impossible to transmit the measured voltage by using a single-ended signal, because there is no flat (global) reference voltage in an LSI. Dual-ended signal transmission is a promising technique to get around this problem; however, this method gives rise to another issue: the difficulty of routing by using a ready-made EDA tool. Noise immunity of the transmission is another concern, because analog signal transmission is still needed.
3. The third point is the dynamic range of the voltage measurement.
To measure supply-voltage fluctuation, a dedicated supply voltage for the probes needs to have a greater range than that of the measured local supply voltage difference.

In-situ supply-noise map measurement
An in-situ power-supply-noise map measurement scheme was developed by considering the above key points. Figure 4 shows the overall configuration of our proposed measurement scheme. The key feature of this scheme is the minimal size of the on-chip measurement circuits and the support of off-chip high resolution digital signal processing with frequent calibration , (Kanno, et al., 2007). The on-chip measurement circuit therefore does not need to have a sample-and-hold circuit. The on-chip circuits consist of several voltage monitors (VMONs) and their controller (VMONC). The VMON is a ring oscillator that acts as a supply-voltage-controlled oscillator, so that the local supply difference (LSD) between V DD1 and V SS1 can be translated to a frequency-modulated signal (see Fig. 5). The VMONC activates only one of the VMONs and outputs the selected frequency-modulated signal to the external part of the chip. Every VMON can be turned off when measurement is not necessary. The output signal is then demodulated in conjunction with time-domain analysis by an oscilloscope and calibrations by a PC. The frequency-modulated signal between the VMONs and VMONC is transmitted only via metal wires, so dozens of power-domain partitions can be easily implemented in an LSI . The frequency-modulated signal has high noise immunity for long-distance, wired signal transmission. Although the measurement results are averaged out in the nanoseconds of the VMON's sampling period, this method can analyze voltage fluctuation easily as the voltage fluctuation map in LSIs by using multi-point measurement. The dynamic range of the measuring voltage is not limited despite requiring no additional dedicated supply voltage. This is because we measure a frequency fluctuation as a voltage fluctuation, which is based on the fact that the oscillation frequency of a ring oscillator is a simple monotonic increasing function of the supply voltage (Chen, et al., 1996).

Time resolution and tracking of LSD
The ring oscillator's oscillation period consists of each inverter's delay, which depends on its LSD (Chen, et al., 1996). The voltage-measurement mechanism of the ring oscillator and the definition of our measured voltage are depicted in Fig In the ring oscillator, since only one inverter in the ring is activated, each inverter converts the LSD voltages into delays one after another. This converted delay τ is a unique value based on the LSD, where τ ri is the rise delay of the i-th stage, τ fi is the fall delay of the i-th stage, and V LSDi is the LSD supplying the i-th stage. The output signal of the ring oscillator used to measure the external part of the chip has ap e r i o do fT osc , which is the sampling period of the ring oscillator. The T osc is the total summation of all of the rise and fall delays of all the stages; that is, Since we can only measure the period of the ring oscillator T osc and its inverse frequency( f osc ), we must calculate the voltage from (3) in order to determine the LSD. However, it is impossible to solve (3) because there are many combinations of V LSDi that satisfy (3). Therefore, the measured LSD, V LSDm , is defined as the constant voltage which provides the same period T osc , The period T osc is thus the time resolution of the V LSDm . In this scheme, the LSD is calculated from a measured period T osc or a measured frequency f osc . The measured LSD denoted as V LSDm is therefore an average value. Since the voltage fluctuation is integrated through the period T osc , the time resolution is determined by the period T osc . Next the tracking of the LSD is discussed. There is a limitation in the tracking because the measurement of the voltage fluctuation is done by a ring oscillator as mentioned above, and the local voltage fluctuation is averaged out at the period of the ring oscillator. When the voltage fluctuation has a high-frequency element, the reproduction is difficult. In addition, a single measurement is too rough to track the target voltage fluctuation. However, although the voltage fluctuation is synchronized to the system clock, in general, since the ring oscillator oscillates asynchronously to the system frequency, the sampling points are staggered with each measurement. It is well known that averaging multiple low-resolution samples yields a higher resolution measurement if the samples have an appropriate dither signal added to them (Gray,et al., 1993). For example, Fig. 7 (a) illustrates the case where the supply voltage fluctuation frequency is 150 MHz, which is about half the frequency of the ring oscillator. In this case, a single measurement cannot track the original fluctuation, but a composite of all measured voltages follows the power supply fluctuation. Another example is shown in Fig. 7 (b). In this case, since the frequency of the power supply fluctuation is similar to the frequency of the ring oscillator, the measured voltage V LSDm is almost constant. These examples show that this scheme tracks the LSD as an averaged value during the period of T osc . Therefore, as shown in these examples, a rounding error occurs even when the frequency of the LSD is the half that of the VMON frequency. Thus, for precise tracking, the frequency of the ring oscillator should be designed to be more than 10 times higher than that of the LSD. In general, the frequency of the power-supply voltage fluctuation can be classified into three domains; a low-frequency domain (∼MHz), a middle-frequency domain (∼100 MHz), and a high-frequency domain (>GHz). Especially, the low-frequency domain is important in the case such as the operational mode switching and the power gating by on-chip power switches. Thus, in these cases, the accuracy of this method is sufficient to the tracking with high accuracy and the time resolution. Recently measurement of the influence of the on-chip power gating is reported (Fukuoka, 2007). Although the measured voltage is averaged out in the period of the VMON, however, the measurement of the voltage fluctuations at the actual operational mode in the product level LSI is innovative. The higher the frequency of the ring oscillator, the higher the time resolution and improving the tracking accuracy; however, signal transmission at a higher frequency limits the length of the transmission line between the VMONs and VMONC due to the bandwidth limitation of the transmission line. There is therefore a trade-off between time resolution and transmission length. Although bandwidth can be widened by adding a repeater circuit, isolation cells, µI/O s (Kanno, et al., 2002), are needed when applying many power domains, and, thus, the design will be complicated.

Accuracy of waveform analysis
Accurate measurement of the VMON output frequency is also important in the in-situ measurement scheme. The accuracy also depends on the resolution of the oscilloscope  used. Generally, frequency measurement is carried out by using a fast-Fourier-transform (FFT) based digital sampling oscilloscope. Sampling frequency and memory capacity of the oscilloscope are key for the FFT analysis. First, the sampling frequency of the oscilloscope must be set in compliance with Shannon's sampling theorem. To satisfy this requirement, the sampling frequency must be set to at least double that of the VMONs. Second, the frequency resolution of the oscilloscope must be determined in order to obtain the necessary voltage resolution. Basically, the frequency resolution ∆ f of an FFT is equal to the inverse of the measurement period T meas .I f a 100-M word memory and a sampling speed of 40 GS/s are used, continuous measurement during a maximum measurement period of 25 ms can be carried out. If the frequency of the VMON output is several hundred megahertz and the coefficient of voltage-to-frequency conversion is about several millivolts per megahertz, highly accurate voltage measurement of the low-frequency LSD with an accuracy of about 1 mV can be achieved.

Support of off-chip digital signal processing
The proposed scheme has several drawbacks due to the simplicity of the ring-oscillator probe.
One of the drawbacks is that the voltage-to-frequency dependence of the ring oscillator suffers from process and temperature variation. However, we can calibrate it by measuring the frequency-to-voltage dependence of each VMON before the in-situ measurement by setting the chip in standby mode. We can also compensate for temperature variation by doing this calibration frequently. Figure 8 shows the measurement procedure of the proposed in-situ measurement scheme. First, the chip must be preheated in order to set the same condition for in-situ measurement, because the temperature is one of the key parameters for the measurement. This preheating is carried out by running a measuring program in the same condition as for the in-situ measurement. A test program is coded in order to execute an infinite loop because multiple measurements are necessary for improving the measurement accuracy. Because the measuring program is executed continuously, the temperature of the chip eventually reaches a state of thermal equilibrium. After the chip has reached this state, the calibration for the target VMON is executed just before the in-situ measurement. In the calibration, the frequency of the VMON output of a selected VMON is measured by varying the supply voltage while the chip is set in standby mode. Note that the calibration method can compensate for macroscopic temperature fluctuations, but not for microscopic fluctuations that occur in a short period of time that are much less than the calibration period. After the calibration, the in-situ measurement is executed by resetting the supply voltage being measured. In measuring the other VMONs continuously, the calibration step is repeated for each measurement. If other measurement conditions such as supply voltage, clock frequency, and the program being measured are changed, the chip must be preheated again.  Fig. 8. Procedure for in-situ measurement as described in section2.1, since the ring oscillator oscillates asynchronously with the chip operating frequency, high resolution can be achieved by averaging multiple low-resolution measurements using an oscilloscope (Abramzon, et al., 2004). This method is also effective for eliminating noise from measurements. If the wire length between VMON and VMONC is longer, the amplitude of the signal becomes small. This small amplitude suffers from the effect of noise. However, by using this averaging method, the influence of noise can be reduced, and signals can be measured clearly.

Measurement results
The in-situ measurement scheme was implemented in a 3G cellular phone processor  as an example. Supply-noise maps for the processor were obtained while several actual applications were running. Figure 9 shows a chip photomicrograph. Three CPU cores and several IPs, such as an MPEG-4 accelerator, are implemented in the chip. A general-purpose OS runs on the AP-SYS CPU, and a real-time OS runs on the APL-RT CPU. The chip was fabricated using 90-nm, 8-Metal (7Cu+1Al), dual-Vth low-power CMOS process technology. This chip has 20 power domains, and seven VMONs are implemented in several of the power domains . Five VMONs are implemented in the application part (AP-Part), and two VMONs are implemented in the baseband part (BB-Part). VMONs 1, 3, 4, and 5 are in the same power domain, whereas the others are in separate power domains. The reason these four VMONs were implemented in the same power domain is that this domain is the largest one, and many IPs are integrated in it.  Fig. 9. Implementation example. This chip has three CPUs and several hardware accelerator such as a moving picture encoder (MPEG-4). The 20-power domains for partial power-shut down are implemented in a single LSI. This chip has a distributed common power domains (CPD) whose power-down opportunity is very rare. Seven VMONs and one VMOC are implemented in this chip.
Each VMON was only 2.52 µm × 25.76 µm, and they can be designed as a fundamental standard cell. Figure 10 shows the dependence of each VMON frequency on voltage, which were between 2.9 and 3.1 mV/MHz. In Fig. 10, the frequency of the ring oscillators was designed to be about 200 MHz. Time resolution was about 5 ns. Note that we used LeCroy's SDA 11000 XXL oscilloscope with a 100-M-word-long time-interval recording memory and a maximum sampling speed of 40 GS/s.

Dhrystone measurement
We show the results of measurements taken while executing the Dhrystone benchmark program in the APL-RT CPU and a system control program in the AP-SYS CPU. The Dhrystone is known as a typical benchmark program for measuring performance per unit power, MIPS/mW, and the activation ratio of the circuit in the CPU core is thus high. Figure 11 shows the local supply noise from VMON1 embedded in the APL-RT CPU that was measured while executing the Dhrystone benchmark program. In these measurements, the cache of the APL-RT CPU was ON, and the hit ratio of the cache was 100%. This is the heaviest load for the APL-RT CPU executing the Dhrystone program. The measured maximum local supply noise was 69 mV under operation of the APL-RT CPU at 312 MHz and V DD =1.25 V. In this measurement, the baseband part was powered on, but the clock distribution was stopped. (c) the local supply noise is at its maximum; (d) the AP-SYS CPU shows a supply "bounce" due to an inductive effect; (e) A typical situation where the APL-RT CPU executes Dhrystone and (f) both CPUs show a supply bounce due to an inductive effect. Although the seven measurement points are insufficient for showing in a 3D surface expression, this expression helps to understand the voltage relation between these points. Figure 12 shows supply-noise maps obtained using these VMONs. Generally, although seven measurement points is insufficient for rendering in a 3D surface expression, this simple expression helps to understand the voltage relation between these points. This scheme can also produce a supply-noise-map animation, and Figs. 12(a) to (f) show snapshots of supply-noise maps corresponding to the timing points indicated in Fig. 11. Figure 12(a) is a snapshot when the CPUs are not operating but are consuming clock power. The location of each VMON is shown in Fig. 12(a). Note that the APL-RT CPU was running at 312 MHz, and the AP-SYS CPU was running at 52 MHz. Figure 12(b) is a snapshot taken when the Dhrystone program has just started. Two hot spots are clearly observed. Figure 12(c) is a snapshot when the local supply noise is at its maximum. Figure 12(d) is an image taken when the AP-SYS CPU shows a supply "bounce" due to an inductive effect. A typical situation where the APL-RT CPU executes Dhrystone while the AP-SYS CPU is not operating but is consuming clock power is depicted in Fig. 12(e). Figure 12(f) is a snapshot when both CPUs show a supply bounce due to an inductive effect. At this time, the Dhrystone program was terminated, and both CPUs changed their operating modes, causing large current changes. It looks as if clock power consumption has vanished, although the clock remains active.

Measurement of moving picture encoding
Another measurement example involves moving picture encoding. A hardware accelerator that executes moving picture encoding and decoding (MPEG4) was implemented in this chip, as shown in Fig. 9, and VMON5 was embedded in it. The waveform measured by VMON5 is shown in Fig. 13. In this MPEG4-encoding operation, a QCIF-size picture was encoded using the MPEG4 accelerator. In the measurement, the APL-RT CPU was running at 312 MHz, and the AP-SYS CPU was running at 208 MHz. The maximum local supply noise measured by VMON5 was 30.9 mV, and the average voltage drop was smaller than that when executing the Dhrystone benchmark program. This result confirms that good power efficiency was attained using hardware accelerators. Measured maps of the typical situations are shown in Fig. 14. Figure 14(a) is a snapshot taken when neither CPU was operating but both were consuming clock power; it also shows the location of each VMON. Note that the APL-RT CPU was running at 312 MHz, and the AP-SYS CPU was running at 208 MHz. Figure 14(b) is a snapshot when the APL-RT CPU was initializing the MPEG4 accelerator. Figure 14(c) depicts the situation when the local supply noise was at its maximum. The image in Fig. 14(d) illustrates the period when the execution of the MPEG4 accelerator was dominant. Figure 14(e) is a snapshot when the APL-RT CPU was executing an interruption operation from the MPEG4 accelerator, and Fig. 14 (f) shows the typical situation where the MPEG4 accelerator was encoding a QCIF-size image. This measurement was done using simple picture-encoding programs, so frequent interruptions were necessary to manage the execution of the program. However, in real situations, since operation would not be carried out with frequent interruptions , and the APL-RT CPU might be in the sleep mode, the power consumption of the APL-RT CPU would be reduced, and the map would show a calmer surface. These results show that by using a hardware accelerator, the power consumption was also distributed over the chip, resulting in a reduction in the total power consumption. This voltage-drop map therefore visually presents the effectiveness of implementing a hardware accelerator.

Conclusion
An in-situ power supply noise measurement scheme for obtaining supply-noise maps was developed. The key features of this scheme are the minimal size of simple on-chip measurement circuits, which consist of a ring oscillator based probe circuit and analog amplifier, and the support of off-chip high resolution digital signal processing with frequent calibration. Although the probe circuit based on the ring oscillator does not require a sampling-and-hold circuit, high accuracy measurements were achieved by off-chip digital signal processing and frequent calibrations. The frequent calibrations can compensate for process and temperature variations. This scheme enables voltage measurement with millivolt accuracy and nanosecond-order time resolution, which is the period of the ring oscillator.
Using the scheme, we demonstrated the world's first measured animation of a supply-noise map in product-level LSIs, that is, 69-mV local supply noise with 5-ns time resolution in a 3G-cellular-phone processor. In this book the reader will find a collection of chapters authored/co-authored by a large number of experts around the world, covering the broad field of digital signal processing. This book intends to provide highlights of the current research in the digital signal processing area, showing the recent advances in this field. This work is mainly destined to researchers in the digital signal processing and related areas but it is also accessible to anyone with a scientific background desiring to have an up-to-date overview of this domain. Each chapter is self-contained and can be read independently of the others. These nineteenth chapters present methodological advances and recent applications of digital signal processing in various domains as communications, filtering, medicine, astronomy, and image processing.