In VLSI circuit design, nonlinear signal-processing circuits such as minimum (MIN), maximum (MAX), median (MED), winner-take-all (WTA), loser-take-all (LTA), k-WTA, and arbitrary rank-order extraction provide useful functions (Lippmann, 1987; Lazzaro et al., 1989). In general, a median filter is used to suppress impulse noise. The MAX and MIN circuits are important elements in fuzzy logic design. WTA is the major function in pattern classification and artificial neural networks. Thus, designing these nonlinear signal-processing circuits so that they integrate smoothly within SoC (system-on-a-chip) applications has become an important research topic. Complementary metal-oxide-semiconductor (CMOS) technology is now widely used to fabricate various chips, and all circuits in this chapter are realized in a CMOS process. However, as CMOS transistors are continuously scaled down through thinner gate oxides and reduced device size, the supply voltage must be reduced in order to maintain device reliability. Therefore, a highly reliable WTA/LTA circuit, a simple MED circuit, and a low-voltage rank-order extractor are addressed in this chapter. The organization of this chapter is as follows. Section 1 introduces the background of these nonlinear functions, including definitions and applications. Section 2 describes conventional WTA/LTA architectures and presents a highly reliable winner-take-all/loser-take-all circuit. Section 3 presents an analog median circuit whose advantage is its simplicity. Section 4 describes a CMOS circuit design for arbitrary rank-order extraction; restrictions and design techniques for low-voltage CMOS circuits are also addressed. Section 5 briefly concludes the chapter.
Given a set of n external input variables a1, …, an, a MAX (or MIN) circuit determines the maximum (or minimum) value. A median filter outputs the median variable within a window of input samples. The function of a WTA network is to select and identify the largest variable from a specified set of variables. The counterpart of WTA, LTA, identifies the smallest input variable and inhibits the remaining ones. Instead of choosing only one winner, the k-WTA network selects the largest k numbers among n competing variables (k ≤ n), which allows for more flexibility in applications. For arbitrary rank-order identification, a rank-order filter (extractor) is designed to select the k-th largest element ak among n variables a1, …, an. Depending on application requirements, these input variables are either voltage or current signals.
To describe these nonlinear functions clearly, consider an example. A circuit produces two kinds of output response to a set of input currents Iin1, Iin2, …, and IinN: one is an analog output current Io; the other is a set of digital outputs Vo1(rank), Vo2(rank), …, and VoN(rank). Assume five external input currents of 9, 7, 10, 5, and 3 μA. Depending on the required function, the output current Io and the corresponding digital output responses are as follows.
MAX: Io = Maximum(Iin1, Iin2, …, IinN)= Iin3= 10 μA
MIN: Io = Minimum(Iin1, Iin2, …, IinN)= Iin5= 3 μA
MED: Io = Median(Iin1, Iin2, …, IinN)= Iin2= 7 μA
WTA: The output voltages Vo1(rank), Vo2(rank), …, and Vo5(rank) go to logic high to identify which input is the maximum among Iin1, Iin2, …, and IinN. In this case, (Vo1(rank), Vo2(rank), …, Vo5(rank))= (0, 0, 1, 0, 0), where “0” and “1” are the logic low and logic high, respectively.
LTA: The reverse operation of the WTA function; the output set is (0, 0, 0, 0, 1) in this case.
k-WTA: Depending on the value of k, k winners are selected. This function is more flexible in application than WTA. For example, the outputs of 2-WTA are (Vo1(rank), Vo2(rank), …, Vo5(rank))= (1, 0, 1, 0, 0) in this case.
Rank order: The rth rank-order extraction identifies the rth largest magnitude among Iin1, Iin2, …, and IinN. For example, the outputs of the 2nd and 3rd rank orders are (1, 0, 0, 0, 0) and (0, 1, 0, 0, 0), respectively, in this case.
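The definitions above can be checked numerically. The following is a minimal behavioural sketch in Python, not a circuit model; the function name `nonlinear_functions` and its parameters are illustrative assumptions, and distinct input values and an odd number of inputs are assumed.

```python
def nonlinear_functions(currents, k=2, r=2):
    """Evaluate the example functions on a list of input currents (in uA)."""
    n = len(currents)
    # Indices sorted from the largest input to the smallest one.
    order = sorted(range(n), key=lambda i: currents[i], reverse=True)

    def one_hot(winners):
        # Digital output set: 1 for selected inputs, 0 for the others.
        return tuple(1 if i in winners else 0 for i in range(n))

    return {
        "MAX": currents[order[0]],
        "MIN": currents[order[-1]],
        "MED": currents[order[n // 2]],   # odd n assumed
        "WTA": one_hot(order[:1]),
        "LTA": one_hot(order[-1:]),
        "k-WTA": one_hot(order[:k]),
        "rank": one_hot(order[r - 1:r]),  # r-th largest input
    }

result = nonlinear_functions([9, 7, 10, 5, 3], k=2, r=2)
print(result)
```

With the five example currents, the sketch returns the same analog values and digital output sets as listed above.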
Various applications of these nonlinear functions are described as follows. The MAX and MIN circuits are important elements in fuzzy logic design (Yamakawa, 1993). Figure 1 shows the MAX and MIN operations in fuzzy inference. Variables “x” and “y” are inputs; variable “z” is the corresponding output response. In a specific state, either rule 1 or rule 2 is satisfied. The MIN function realizes the “and” operation in fuzzy rules, and the MAX function realizes the “or” operation. In image signal processing, the MED function is generally used to filter impulse noise and thereby suppress impulsive distortion. Figure 2 shows a one-dimensional application for noise cancellation. Figure 2(a) shows a 1.2-V peak-to-peak sinusoidal signal corrupted by noise, and Figure 2(b) shows the signal after MED filtering with a window of size five. In addition, Figure 3 shows a two-dimensional application, also for noise cancellation, on an image. WTA is the major function in pattern classification, vector quantization, data compression, and self-organizing neural networks. Figure 4 shows a WTA application for pattern identification. An analogue rank-order filter is widely used in signal sorting and classification.
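As an illustration of the one-dimensional noise-cancellation case, the following Python sketch applies a sliding median with a window of size five. The sample values and the edge handling (repeating the end samples) are assumptions made for this example only and are not taken from Figure 2.

```python
from statistics import median

def median_filter_1d(samples, window=5):
    """Sliding-window median; edges are padded by repeating the end samples."""
    half = window // 2
    padded = [samples[0]] * half + list(samples) + [samples[-1]] * half
    return [median(padded[i:i + window]) for i in range(len(samples))]

# A slow ramp corrupted by two impulses (5.0 and -4.0).
noisy = [0.0, 0.1, 0.2, 5.0, 0.4, 0.5, -4.0, 0.7, 0.8]
filtered = median_filter_1d(noisy)
print(filtered)
```

Both impulses are removed because an isolated outlier can never be the middle value of a five-sample window.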
In general, these nonlinear functions are achieved by either digital or analog implementations. In a digital implementation, since most signals obtained from the real world are continuous, the continuous inputs must first be converted to digital form by one or more analog-to-digital (A/D) converters. As a result, circuit complexity, chip area, and power consumption increase because of the extra data converters. An analog implementation, by contrast, is somewhat less accurate than digital operation and is less tolerant of fabrication process variation. However, without the extra data conversion, analog operation has many advantages, such as saving time, bandwidth, and computation at the system level. Considering practicality and flexibility, the design issues of a CMOS analog signal-processing circuit therefore must include 1) precision; 2) speed; 3) high tolerance to fabrication process variation; 4) wide range of supply voltage; 5) wide input range; 6) low circuit complexity; 7) low power consumption; 8) scalability; 9) programmability, and so forth, so that these functions can be easily integrated within various system-embedded chips. Additionally, as CMOS transistors shrink and gate oxides become thinner, the supply voltage must be scaled down in order to maintain device reliability. A forecast of high-performance CMOS circuits operating at low voltage has been reported (Semiconductor Industry Association, 2008). Figure 5 shows the trend of CMOS supply voltage and physical gate length. Moreover, portable equipment such as biomedical electronics, computers, and portable telecommunication devices has become common. Battery operation and low power consumption are therefore also important design requirements for these circuits.
2. Winner-Take-All and Loser-Take-All circuit
2.1 Architectures of WTA/LTA circuits
Based on their circuit structures, conventional WTA/LTA circuits are roughly categorized into four types: 1) global-inhibition structure, in which the connectivity increases linearly with the number of inputs (Lazzaro et al., 1989; Starzyk & Fang, 1993); 2) cell-based tree topology (Smedley et al., 1995; Demosthenous et al., 1998); 3) excitatory/inhibitory connection (He & Sanchez-Sinencio, 1993); and 4) serial cascade structure (Aksin, 2002). Figure 6(a-d) shows the conceptual diagrams of these topologies. In Figure 6(a), each cell receives the same global inhibition, and a common current Icomn or voltage Vcomn is shared by all the competing cells. The cells, represented as square blocks, are nonlinear signal-processing elements. Therefore, the precision of the circuit degrades as the number of inputs increases. Since the operation of this circuit relies on cell matching, a stable fabrication process is required to manufacture a high-precision system. The connectivity complexity of the circuit is O(N), where N is the number of inputs. Figure 6(b) shows a cell-based tree topology, with N-1 cells arranged in a tree for N inputs. Each cell receives two input variables, compares them, and outputs the larger (or smaller) of the two. The backward digits from the bottom cell are then successively fed back to the first-layer cells to identify the maximum (or minimum) input. The precision of this circuit is also sensitive to cell matching. With this circuit design, the device sizes must be rescaled when the supply voltage is modified.
Figure 6(c) shows an excitatory/inhibitory connection with O(N^2) connectivity complexity. Each cell receives inhibitory signals from the other cells and an excitatory signal from itself. With this design, chip area increases with the square of the number of inputs. Based on comparator operation, Figure 6(d) shows N-1 analog comparison blocks and N-1 digital blocks cascaded in series. Within a comparison time Tcomp, the larger of the inputs to each analog block is sent to the next stage to be compared with the other inputs. The result of each comparison is then sent to the corresponding digital block, and a decision digit is fed back from the right block to the left block to identify the maximum input. As a result, the response time of the circuit is approximately (N-1)Tcomp + Tdig, where Tdig is the total propagation time of the digital part. The offset voltage of each comparator dominates the precision of the architecture. Circuit implementation of Figure 6(d) is also sensitive to process variation. For a high-precision application, the internal circuit blocks shown in Figure 6(a-d) must be identical. The primary limitations on the accuracy of the conventional architectures are fabrication process variations and the matching requirement of the internal cells. Variations in a CMOS fabrication process include transistor threshold voltage, actual device size, gate-oxide thickness, and a variety of other factors. In a common process, the threshold voltage generally varies from -10% to +10% of its nominal value. Because of non-uniform etch and diffusion procedures, actual device sizes also vary. In a real CMOS process, these variations are hard to eliminate completely. How can the accuracy of an analog circuit be improved in a conventional process?
2.2 A highly reliable WTA/LTA circuit
In this section, a highly reliable CMOS signal-processing circuit with programmable capability for the WTA and LTA functions is described (Hung & Liu, 2004). A symbol (1≦j, k≦N and N is the number of inputs) is defined such that the ith comparator cell receives two input variables (Vinj and Vink) and compares their magnitudes at time t, and the output of the cell is the larger variable or a binary value. For a operation, is defined as
Therefore, returning to the conventional architecture the tree topology of Figure 6(b), WTA mode, is represented as:t1:
After time O( ) the maximum (or the minimum) input variable is obtained. A total of N-1 identical comparators is necessary for this operation.
To reduce the matching requirement of the internal cells, Figure 7 shows a conceptual diagram of a highly reliable circuit. In this scheme, there are N identical ‘digital’ control cells and a single comparator for N input variables. The single comparator block is multiplexed in time to perform all input comparisons. The operating procedures are described as follows:t1: :
The strategy adopted to find the maximum/minimum among a set of variables is as follows: two variables are first compared; then the result of this comparison is compared with the next input variable using the same comparator. The procedure continues until all input variables have been compared. Conceptually, circuit operation is similar to a serial comparison. Unlike the traditional architectures, which require N-1 analogue comparators, this architecture requires only a single comparator, which eliminates sensitivity to component matching. Using the same algorithm, the LTA function is easily obtained in the same architecture simply by reversing the output state.
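The serial strategy can be sketched behaviourally as follows. This is a simplified Python model under stated assumptions, not the circuit itself: the function `serial_select` and its `mode` argument are hypothetical names, a single inner `compare` function stands in for the one shared comparator, and ties between inputs are not considered.

```python
def serial_select(inputs, mode="WTA"):
    """Return one-hot flags marking the winner (WTA) or the loser (LTA).

    The same `compare` function is reused for every comparison, mirroring
    the single time-multiplexed comparator.
    """
    def compare(candidate, held):
        # The max/min selection only changes the polarity of the decision.
        return candidate > held if mode == "WTA" else candidate < held

    held_index = 0                  # input 1 is the initial winner
    comparisons = 0
    for index in range(1, len(inputs)):
        comparisons += 1
        if compare(inputs[index], inputs[held_index]):
            held_index = index      # the new winner replaces the held one
    flags = [1 if i == held_index else 0 for i in range(len(inputs))]
    return flags, comparisons

print(serial_select([9, 7, 10, 5, 3], mode="WTA"))
print(serial_select([9, 7, 10, 5, 3], mode="LTA"))
```

In this model the comparator is used N-1 times in sequence, so the saving in analog hardware is paid for with a decision time that grows linearly with the number of inputs.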
The key block in this architecture is the comparator cell. Comparator performance is a crucial factor in realizing high-speed data conversion systems and telecommunication interfaces. The precision of a comparator is usually defined as the minimum identifiable differential voltage (or current) between inputs, that is, the comparator’s resolution capability. A comparator design from (Hosotani et al., 1990) is used herein; the schematic diagram is shown in Figure 8. Transistors Msw1, Msw2, and Msw3 are used as switches. The circuit operates in two phases, an auto-zero phase and a comparison phase. Assume the voltage at node B is Vx. Based on charge conservation, after the comparison phase, Vx arrives at the following:
The effect of the term in (1) represents a degrading factor. To reduce the decision time, the succeeding inverters amplify the voltage difference (Vin2 - Vin1) to pull node D up to high (logic 1) or down to 0 V (logic 0). The functions of the N-latch are to sample the voltage at node D when latch_clk goes high and to hold the comparison result when latch_clk goes low. Finally, the output polarity of the N-latch is changed according to the max/min selector setting. The max/min selector signal modifies the polarity of the comparison result; therefore, without any structural modification, this circuit can be configured for either win or lose operation. The comparison block shown in Figure 8 is reused in all comparison procedures. The architecture of the N-input circuit is shown in Figure 9, in which Control_Celln (1≦ n ≦N) are identical. N cells are required for N input variables. Each cell contains a status block, a control_switch block, and two latch blocks.
Figure 10 shows the clocks for the whole circuit. The reset signal and the reg_clk clock must be generated externally; the other clocks are derived from reg_clk by logic gates.
To describe the operation of the entire circuit, refer to the circuit architecture in Figure 9 and the clock waveforms in Figure 10. First, at t1, the reset signal initializes the status blocks, control_switch blocks, and latch blocks. The N-latch in the status block and Ro1, Ro2, …, RoN are reset to zero by the reset signal. Based on the max/min selector signal, the MOS transistors Ms1, Ms2, Ms3, and Ms4 preset the initial sampling voltage (0 V or VDD) at node cap_comn. Regardless of its magnitude, the input-1 variable must be the winner during the initial interval of a serial comparison. The initial sampling voltage at node cap_comn is thus set to 0 V when the max/min selector signal is set to logic 1 for WTA operation, and to VDD when it is set to logic 0 for LTA operation.
Then, at t2, the Vs1 clock goes high (auto-zero phase) to sample the initial voltage (0 V or VDD) at node cap_comn. Next, at t3, Ro1 goes high to sample voltage Vin1. At this time, the clock Vs1 goes low (comparison phase) to compare Vin1 with the initial sampling voltage, and the comparison result is stored in the N-latch of the first status block. The state of the N-latch is logic 1 if the variable is the winner. At t4, the present winner Vin1 is sampled again. At t5, a new comparison between the previous winner Vin1 and Vin2 is performed. At t6, the winner (the result of the Vin1 and Vin2 comparison) is sampled again. After this procedure, a new comparison between the present winner and Vin3 is performed. The procedure continues until all the input voltages have been compared. Ultimately, only one state Vosn (n=1,..., N) in these cells is logic 1 for WTA/LTA indication; the others are logic 0. Thus, a WTA or an LTA operation is accomplished.
Figure 11 shows the status block. Figure 12 shows the control_switch block, which receives an input variable and controls the transmission gate that samples the input level. A true single-phase latch composed of an N-latch and a P-latch is used to reduce the clock skew issue (Yuan & Stensson, 1989).
2.3 Simulation results and reliability test
For the highly reliable WTA/LTA circuit, an experimental chip with six inputs was fabricated using a 0.5-μm CMOS technology. The sampling capacitance Cs, implemented with two-layer polysilicon, is set to 3 pF. The period of the reg_clk clock is 100 ns with a 50% duty cycle. The WTA/LTA functions, the supply-voltage range, and a Monte Carlo analysis of transistor variation were also tested by simulation.
1) WTA/LTA functions
To test the function of the circuit, each example takes ten input voltages for the WTA/LTA operation. For supply voltage VDD=3.3 V, the input variables Vin1, Vin2, …, and Vin10 are 0.003, 0.006, 1.000, 0.997, 2.000, 2.003, 2.000, 3.297, 3.300, and 3.297 V for testing WTA function, respectively, and 3.297, 3.294, 2.000, 1.997, 2.000, 1.000, 0.997, 0.006, 0.009, and 0.003 V for testing LTA function. During the WTA operation, the logic state Vosn of each cell at each time slice becomes:
Vos1 = 1,0,0,0,0,0,0,0,0,0
Vos2 = 0,1,0,0,0,0,0,0,0,0
Vos3 = 0,0,1,1,0,0,0,0,0,0
Vos4 = 0,0,0,0,0,0,0,0,0,0
Vos5 = 0,0,0,0,1,0,0,0,0,0
Vos6 = 0,0,0,0,0,1,1,0,0,0
Vos7 = 0,0,0,0,0,0,0,0,0,0
Vos8 = 0,0,0,0,0,0,0,1,0,0
Vos9 = 0,0,0,0,0,0,0,0,1,1
Vos10 = 0,0,0,0,0,0,0,0,0,0
When all comparisons are finished, the outputs Vos1, Vos2, Vos3,..., and Vos10 respond as logic 0, 0, 0, 0, 0, 0, 0, 0, 1, and 0, respectively. Therefore, among these ten inputs, input variable Vin9 is the maximum. Figure 13 shows the HSPICE simulation results for the WTA operation. The period of the latch clock (top trace) is 100 ns. Similarly, Figure 14 shows the results for the LTA operation. The final outputs Vos1, Vos2, Vos3, …, and Vos10 are logic 0, 0, 0, 0, 0, 0, 0, 0, 0, and 1, respectively, and the input variable Vin10 is the minimum. The above test voltages were chosen on the following basis: 1) input voltages of neighboring cells should be as close as possible to test discrimination capability; 2) input voltages are distributed from 0 V to 3.3 V to test for wide dynamic range.
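The per-slice logic states listed for the WTA test can be reproduced with an idealized behavioural model. The Python sketch below assumes an ideal comparator and one comparison per time slice; `wta_trace` is a hypothetical helper and does not model the HSPICE waveforms of Figures 13 and 14.

```python
def wta_trace(inputs, find_max=True):
    """Return (traces, final): traces[cell][time_slice] and the final outputs."""
    n = len(inputs)
    traces = [[0] * n for _ in range(n)]
    winner = 0
    for t in range(n):
        if find_max:
            challenger_wins = inputs[t] > inputs[winner]
        else:
            challenger_wins = inputs[t] < inputs[winner]
        if t == 0 or challenger_wins:
            winner = t                  # input t becomes the present winner
        traces[winner][t] = 1           # only the present winner is logic 1
    final = [1 if cell == winner else 0 for cell in range(n)]
    return traces, final

wta_inputs = [0.003, 0.006, 1.000, 0.997, 2.000, 2.003, 2.000, 3.297, 3.300, 3.297]
lta_inputs = [3.297, 3.294, 2.000, 1.997, 2.000, 1.000, 0.997, 0.006, 0.009, 0.003]

wta_traces, wta_final = wta_trace(wta_inputs, find_max=True)
lta_traces, lta_final = wta_trace(lta_inputs, find_max=False)
for cell, trace in enumerate(wta_traces, start=1):
    print(f"Vos{cell} =", ",".join(map(str, trace)))
```

A cell whose input never beats the present winner, such as cells 4, 7, and 10 in the WTA test, stays at logic 0 in every time slice.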
2) Supply voltage range
All circuit parameters, such as transistor dimensions, clock periods, and sampling capacitance Cs, are held constant. The supply voltage VDD is varied from 2 V to 5 V in 0.1-V steps, and the logic-high level of the clocks is changed along with the supply voltage. The simulation results show that the circuit operates successfully with 3-mV discrimination when the supply voltage ranges from 2.7 V to 5 V. Without any rescaling of device sizes, the circuit works under various commonly used supply voltages.
3) Process variations
A statistical distribution of manufacturing parameters often occurs during CMOS fabrication. Wafer-to-wafer, run-to-run, and transistor-to-transistor process variations determine the electrical yield and critical second-order effects. The threshold voltage, channel width, and channel length of all MOS transistors were set to their nominal values with ±5% variation at the 3-sigma level, and each transistor was given an independent random Gaussian distribution. After 30 Monte Carlo iterations, HSPICE results indicate that circuit precision and speed are not degraded over this range. In addition, to verify that the circuit supports multiple technologies, its performance was also simulated using parameters from various CMOS fabrication processes. The results show that the circuit remains functional under these fabrication processes without any tuning of device dimensions. The following reasons contribute to the robustness of this circuit: 1) the circuit is designed with only a single analog cell (the comparator), while the other active components are digital; 2) the comparator itself is designed with an auto-zero property, so its operation is more tolerant of manufacturing process variation.
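The sampling scheme used in such a Monte Carlo analysis can be illustrated as follows. This Python sketch only shows how independent Gaussian samples with a ±5% bound at the 3-sigma level might be drawn for each transistor; the nominal values, the number of transistors, and the helper `sample_parameters` are assumptions for illustration, and no circuit simulation is performed.

```python
import random

def sample_parameters(nominal, tolerance=0.05, n_sigma=3, rng=random):
    """Draw one Gaussian sample per parameter; tolerance is the n_sigma bound."""
    sigma_fraction = tolerance / n_sigma
    return {name: rng.gauss(value, abs(value) * sigma_fraction)
            for name, value in nominal.items()}

# Hypothetical nominal values for one transistor (not taken from the chapter).
nominal = {"vth_V": 0.7, "width_um": 5.0, "length_um": 0.5}

rng = random.Random(0)      # fixed seed so that the run is repeatable
# 30 iterations; each of 4 hypothetical transistors gets independent samples.
runs = [[sample_parameters(nominal, rng=rng) for _transistor in range(4)]
        for _iteration in range(30)]
print(runs[0][0])
```

Each iteration would then be passed to the circuit simulator as one parameter set.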
4) Circuit precision
The accuracy of the comparator cell dominates the identification precision. The comparator accuracy depends on two factors. One is the clock feed-through error and charge-injection error in transistor Msw3, shown in Figure 8; the other is the degrading factor in Eq. (1). Charge-injection error is a complicated function of substrate doping concentration, load capacitance, input level, clock voltage, clock falling rate, MOS channel dimensions, and threshold voltage. Therefore, this error is difficult to eliminate completely. In general, complementary clocks, transmission gates, and dummy transistors are adopted in the switch realization to reduce the error.
3. CMOS analogue median cell
A median (MED) filter is a useful function in image-processing applications for eliminating impulse noise. Given a set of n external input variables a1, …, an, a MED circuit determines the median value. Median extraction is a nonlinear operation. MED circuit realizations can be classified as analog or digital filtering, depending on the type of the input signals. The digital filtering architecture is supported by a variety of sophisticated algorithms and therefore has the advantages of higher flexibility and higher reliability. In terms of power consumption and chip area, however, it is more costly than the analog architecture. In 1994, an analogue median extractor with a simple structure and a sharp DC transfer characteristic was presented that does not use an operational amplifier (Opris & Kovacs, 1994). The circuit is intended to reduce the errors in the transition region. In 1997, the same authors proposed an improved version with high-speed operation. That median circuit has a transient recovery of less than 200 ns in a 2-μm CMOS process (Opris & Kovacs, 1997). In 1999, a current-input analog median filter composed of absolute-value and minimum circuits was proposed (Vlassis & Siskos, 1999). Operational amplifiers and transconductors are likewise not needed in the design of that circuit. Based on transconductance comparators and analog delay elements, a fully continuous-time analog median filter was presented in 2004 (Diaz-Sanchez et al., 2004). Using these median filter cells, an image of 91×80 pixels can be processed in less than 8 μs to remove salt-and-pepper noise. In this section, an intuitive and simple CMOS analog median cell is described (Hung et al., 2007). The cell is built from current mirrors, current comparison, and some basic digital logic. Using TSMC 0.35-μm CMOS technology, simulation shows that the median filter provides 0.4-μA discriminability and tracks the median value of the input currents well.
Figure 15 shows a basic one-input current cell composed of current-mirror and control-logic circuits. The cell has one signal input (is), a current-source output (is_src), a current-sink output (is_sink), a control signal Vctr, and an output current (iout). Transistors M1-M12 form cascode current mirrors. Mswp and Mswn constitute a transmission gate that serves as an analog switch. Mdummy is designed to compensate for the loading of Mswn and Mswp to improve the accuracy of the output current. Miso is used to isolate the clock noise from the transmission gate. Mdis1-2 and Mres are used to speed up the transmission operation and control the discharge timing. Figure 15(b) is the symbol representation of Figure 15(a); the cell is named the current signal control unit, abbreviated CSCU.
Given three input signals is1, is2, and is3, how can the circuit extract the median value? Assume is2 is the median current. Its value must then lie between those of is1 and is3; this is the criterion to be satisfied.
As a result, current-level comparison and logic decision are required to realize the function. Figure 16 shows a three-input median circuit composed of three CSCU cells and three decision logic blocks. The decision logic is realized simply by an AND-OR gate circuit to perform
where (1), (2), (3), and (4) represent the corresponding logic inputs, that is, the signals carrying the comparison results. Depending on the output state of each decision logic block, Eq. (3) sets Vctr to a low or a high level. A low Vctr turns on the transmission gate of the corresponding CSCU cell to switch on the input current; otherwise, the input current is blocked. As a result, a three-input MED filter cell is obtained. Because of transition pulse noise, a capacitor Cfilter is used to suppress the switching noise.
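The comparison-plus-decision idea can be sketched in Python as below. The exact Boolean expression of Eq. (3) is not reproduced here; the AND-OR form used in the sketch (an input is selected when it lies between the other two) is an assumption derived from the definition of the median, and `median_of_three` is a hypothetical name. Distinct input values are assumed.

```python
def median_of_three(is1, is2, is3):
    """Select the median current using pairwise comparisons and AND-OR logic."""
    inputs = (is1, is2, is3)
    # Pairwise comparison results, standing in for the current comparators.
    greater = {(a, b): inputs[a] > inputs[b]
               for a in range(3) for b in range(3) if a != b}

    i_out = 0.0
    for k in range(3):
        a, b = [i for i in range(3) if i != k]
        # AND-OR decision: input k lies between the other two inputs.
        is_median = ((greater[(k, a)] and greater[(b, k)]) or
                     (greater[(k, b)] and greater[(a, k)]))
        if is_median:           # corresponds to a low Vctr: the switch is on
            i_out += inputs[k]  # the selected current reaches the common output
    return i_out

print(median_of_three(3.0, 7.0, 5.0))
```

When all three inputs differ, exactly one decision is true, so only one input current is passed to the common output.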
In the circuit, NMOS transistor size (W/L)N=5μ/1μ and PMOS transistor size (W/L)P=10μ/1μ are used for M1-M12. The inverter sizes are (W/L)N=5μ/0.35μ and (W/L)P=20μ/0.35μ. The device sizes of the switch transistors Mswn and Mswp are (W/L)N-P=20μ/0.35μ. All transistors in the decision logic block are sized (W/L)N=5μ/0.35μ and (W/L)P=10μ/0.35μ. The filter capacitance Cfilter is 10 pF. The supply voltage VDD is the commonly used 3.3 V. The input current signals is1, is2, and is3 have a 10-μA peak value in different time slots at 5 μs, 10 μs, and 15 μs, respectively. Figure 17 shows the three triangular waves and the corresponding median output. The red line represents the MED output, which tracks the median value of the three input currents well. As Figure 17 shows, when two input values are close to each other, their difference must be larger than 0.4 μA to be resolved; that is the discriminability of the MED filter. However, some small spikes occur at the transition points.
Inspecting Figure 16, the proposed three-input median cell has three input pins (is1, is2, and is3) and a common output pin (iout). By modifying the switch transistors and decision logic, the MED cell can easily be changed to three inputs and three outputs. The modified MED cell then provides the maximum value imaxmin, the median value imedian, and the minimum value iminmum simultaneously. As a result, multiple modified MED cells can be organized to cooperate in performing a ‘sorting’ function. No critical components such as an operational amplifier or a precise voltage reference are required in the MED cell. These properties make it easy to embed the MED cell in a larger system.
4. Low-voltage arbitrary rank order extraction
4.1 Principle of rank-order extraction
Each of the WTA, LTA, and MED functions, however, is only a single-order operation. In 2002, a low-voltage rank-order filter with a compact structure was designed (Cilingiroglu & Dake, 2002). The filter is based on a pair of multiple-winners-take-all circuits and a set of logic gates. In this section, a new architecture with both arbitrary rank-order extraction and k-WTA functionalities is described (Hung & Liu, 2002). An rth rank-order extraction is defined as identifying the rth largest magnitude among the input variables. In this design, the circuit locates an arbitrary rank order among a set of input voltages according to the setting of binary select signals. A set of output voltages Vo_1, Vo_2, …, and Vo_M corresponds to the outputs of a rank-order extractor for a set of input variables V1, V2, …, and VM. The output state Dij of a comparator with two input terminals is defined as
where M is the number of input variables. For convenience of description, a temporary index Si is defined as the total number of wins of the ith input variable in its comparisons with the others. Thus, Si is represented as
Based on the definition of (5), Si is expanded as follows
Thus, from the left-hand side of (6), the cooperation of M(M-1) comparators is required for M input variables to identify the rank order. Since Dji is the complement of Dij, each Dji term is replaced by the complement of Dij on the right-hand side of (6). The physical meaning is that if both the output of a comparator and its complement are available, the total number of comparators can be reduced from M(M-1) to M(M-1)/2.
In this section, the comparator generates a unit current Iunit when input variable Vi is larger than Vj. Thus, the index Si in (5) is rewritten as
where n is the number of wins in the comparisons. If the inputs are arranged in ascending order of magnitude, V1, V2, …, VM, which satisfy V1 < V2 < … < VM, then S1 = 0, S2 = Iunit, …, SM = (M-1)Iunit. Obviously, the minimum, next minimum, …, maximum input variables can be found by checking the index Si. The k-WTA function is defined so that the outputs must be logic high when Si ≥ (M-k)Iunit.
For example, if the input variables are (0.5, 0.6, 0.9, 0.2, 0.4), the first variable 0.5 is larger than variables 0.2 and 0.4. Thus, the index S1 is 2Iunit, meaning that the variable wins against two other input variables among all comparisons. For the same reason, S2 = 3Iunit, S3 = 4Iunit, S4 = 0, and S5 = Iunit. Therefore, the rank order is found among the input variables by checking the index Si. In this example, the output voltages (Vo_1, Vo_2, …, Vo_5) of the extractor respond as (0, 0, 1, 0, 0), (0, 1, 0, 0, 0), (1, 0, 0, 0, 0), and (0, 0, 0, 1, 0) for the maximum, next-maximum, median, and minimum operations, respectively. The “0” and “1” are the logic low and high. Similarly, if the extractor is configured for the k-WTA function, the output voltages (Vo_1, Vo_2, …, Vo_5) of the circuit respond as (1, 1, 1, 1, 1), (1, 1, 1, 0, 1), (1, 1, 1, 0, 0), …, and (0, 0, 1, 0, 0) for 5-WTA, 4-WTA, 3-WTA, …, and 1-WTA operations, respectively.
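The win-counting principle can be checked with a short behavioural sketch. In the Python code below, the win counts are expressed in units of Iunit, one comparison is made per unordered pair of inputs, and the function names `win_counts`, `rank_order`, and `k_wta` are illustrative assumptions; distinct input values are assumed.

```python
from itertools import combinations

def win_counts(values):
    """Count, for each input, how many of the other inputs it exceeds."""
    counts = [0] * len(values)
    # One comparison per unordered pair: M(M-1)/2 comparisons in total.
    for i, j in combinations(range(len(values)), 2):
        if values[i] > values[j]:
            counts[i] += 1
        else:
            counts[j] += 1      # the complementary output credits input j
    return counts

def rank_order(values, r):
    """One-hot output for the r-th largest input (r = 1 is the maximum)."""
    target = len(values) - r
    return tuple(int(s == target) for s in win_counts(values))

def k_wta(values, k):
    """Flags for the k largest inputs."""
    threshold = len(values) - k
    return tuple(int(s >= threshold) for s in win_counts(values))

values = (0.5, 0.6, 0.9, 0.2, 0.4)
print(win_counts(values))
print(rank_order(values, 3), k_wta(values, 3))
```

Rank-order extraction uses an equality test on the win count, whereas k-WTA uses a threshold test on the same count, which is why both functions can share one set of comparators.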
4.2 Architecture of rank-order extraction
The structure of the extractor is shown in Figure 18 for five input variables (Hung & Liu, 2002). There are a total of M(M-1)/2 comparators and M evaluation cells for M input variables. Each comparator cell accepts two input signals, and the result of each comparison is fed into the individual evaluation cells. In the first row of Figure 18, the input V1 is compared with the other input variables, and each comparison generates the proper unit current Iunit. These currents are summed in the Eval-1 cell if V1 is larger than the other samples; otherwise, the result of the comparison is fed into the corresponding evaluation cell. The connection strategy is the same for the other input variables. Equation (7) is thereby realized in this architecture.
The signal Vchoice in Figure 18 selects the function of the circuit. Vchoice is preset to logic high to enable the rank-order operation; otherwise, the k-WTA function is enabled. The binary signals sel_1, sel_2, and sel_3 determine which rank order or k-WTA is located. Based on the setting of the select signals (sel_1-3), the logic states of the evaluation cells indicate which input variable belongs to this rank order. For example, in the seven-input rank-order operation, the (sel_1, sel_2, sel_3) signals are set to logic (0, 0, 0) to find the minimum variable; the settings (0, 1, 1) and (1, 1, 0) give the median and maximum functions, respectively. Similarly, in the k-WTA operation, setting (sel_1, sel_2, sel_3) to (0, 0, 1) and (1, 1, 0) yields the 6-WTA and 1-WTA functions, respectively.
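The mapping from the select signals to the selected function can be illustrated in Python. The sketch assumes that sel_1 is the most significant bit and that the encoded value equals the required number of wins; both assumptions are inferred from the seven-input examples above rather than stated in the text, and `decode_select` is a hypothetical helper.

```python
def decode_select(sel, num_inputs, v_choice):
    """Map (sel_1, sel_2, sel_3) and Vchoice to the function they select.

    Returns ("rank", r), where r = 1 denotes the maximum, or ("k-WTA", k).
    """
    sel_1, sel_2, sel_3 = sel
    wins = 4 * sel_1 + 2 * sel_2 + sel_3    # assumed: sel_1 is the MSB
    if v_choice:                            # logic high: rank-order mode
        return ("rank", num_inputs - wins)
    return ("k-WTA", num_inputs - wins)     # inputs with at least `wins` wins

print(decode_select((0, 1, 1), num_inputs=7, v_choice=1))
print(decode_select((0, 0, 1), num_inputs=7, v_choice=0))
```

For seven inputs, the setting (0, 1, 1) requires three wins, which identifies the fourth largest input, that is, the median.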
4.3 Circuit design
4.3.1 1.2-V comparator
The comparator is a key element in Figure 18. An auto-zero comparator, shown in Figure 19, is designed to operate at a low supply voltage. To improve the speed of the comparator, the succeeding gain stage is designed to operate in dynamic mode. First, in the auto-zero phase, the input V1 is sampled at the top plate of the capacitor Cs, and the MOS transistor M11 is biased at the voltage Vbias. Next, during the comparison phase, the voltage at node E is Vbias + (V2 - V1)Cs/(Cs + Cp). This deviation voltage is then amplified by transistors M11 and M12. To reduce power dissipation, the adjustable biasing voltage Vbias is chosen just high enough to overcome the threshold voltage of a MOS transistor, and it is also adjusted for comparator operation at different supply voltages. The succeeding transistors M13 and M14 provide the current to generate the proper voltage at node F. Depending on which input voltage is larger, either node H or node G will be at logic high. The output node G of the comparator and its complementary node H are fed into the next stage to generate the unit currents Ilarge_1, Ilarge_2, Ismall_1, and Ismall_2. During the evaluation phase, the unit currents Ilarge_1 and Ilarge_2 are generated when V1 is larger than V2; otherwise, Ismall_1 and Ismall_2 are generated. The symbol representation of the comparator cell is shown at the bottom right of Figure 19.
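The node-E expression shows how the parasitic capacitance Cp attenuates the input difference. The following Python sketch evaluates that expression; the numeric values for Vbias, Cs, and Cp are hypothetical and chosen only for illustration.

```python
def node_e_voltage(v1, v2, v_bias, c_s, c_p):
    """Voltage at node E in the comparison phase (ideal charge sharing)."""
    attenuation = c_s / (c_s + c_p)     # the parasitic Cp reduces the step
    return v_bias + (v2 - v1) * attenuation

# Hypothetical values: a 3-mV input difference, Cs = 1 pF, Cp = 0.1 pF.
v_e = node_e_voltage(v1=0.600, v2=0.603, v_bias=0.5, c_s=1e-12, c_p=0.1e-12)
print(f"deviation at node E: {(v_e - 0.5) * 1e3:.3f} mV")
```

With these assumed values, the deviation seen by the gain stage is smaller than the 3-mV input difference by the factor Cs/(Cs + Cp), so a larger Cs relative to Cp preserves more of the input difference.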
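The function of the comparator shown in Figure 19 is summarized as
$$
(I_{large\_1},\ I_{large\_2},\ I_{small\_1},\ I_{small\_2}) =
\begin{cases}
(I_{unit},\ I_{unit},\ 0,\ 0), & V_1 > V_2 \\
(0,\ 0,\ I_{unit},\ I_{unit}), & \text{otherwise}
\end{cases}
$$
where Iunit is the unit current of the PMOS transistor Mbase.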
4.3.2 Evaluation cell
The circuit of the evaluation cell is shown in Figure 20. The MOS transistors Mgen and Munit reproduce the same unit current, which equals Ilarge_1, Ilarge_2, Ismall_1, and Ismall_2 in Figure 19. To find the various rank orders for all input signals, the cell must identify the unit-current summation in (7) that arrives from the Out_com1 and Out_com2 terminals. It is not easy to identify an exact current value in a VLSI circuit. However, whether the summation current lies inside a valid range can be checked by the criterion,
It is a reasonable and safe design to choose Therefore, the dimensions of these MOS transistors are designed as
where W is the channel width and L is the channel length. MOS transistors Madd1 and M4 realize the effect, and the M8 realizes the one. Depending on the setting of the sel_1-3 signals, the transistors Mcnt_1-6 enable the corresponding binary-weighted current. The inverters inv4-7 provide sufficient gain to amplify the difference between the currents coming from the Out_com1-2 terminals and the binary-weighted currents. This mechanism is similar to a current comparator. In the upper row of Figure 20, the extra PMOS transistor Madd1 generates an extra unit current; therefore, the voltage Vout_h is always larger than or equal to Vout_l. If Vchoice is preset to 0, the dashed block in Figure 20 resets Vout_l to 0, and the lower row in Figure 20 is disabled. At this time, the function of the cell resembles performing only the
Thus, this is a k-WTA criterion.
Consider an example that illustrates the function of the evaluation cell. There are seven input variables, and the sel_1-3 signals are set to (0, 0, 1) to find the second-smallest input variable. Since the second-smallest variable is larger only than the minimum one, only a single unit current comes from the Out_com1-2 terminals of the corresponding evaluation cell. In the upper row of Figure 20, the sum of this unit current and the extra unit current (Madd1) is larger than the binary-weighted current 1.5Iunit; therefore, Vout_h is logic 1. In the lower row, by contrast, the unit current Iunit (which comes from the Out_com1-2 terminals) is smaller than the binary-weighted current 1.5Iunit; therefore, Vout_l is logic 0. The transistors Mid1 and Mid2 allow only the combination (Vout_h, Vout_l) = (1, 0) to pull up the corresponding output (Vo_n, n = 1, …, 7) to logic 1. In all other cases, Vo_n is at logic 0 or in an open state. Therefore, inspecting the logic state of Vo_n reveals which input variable belongs to the desired rank order.
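The example can be reproduced with a small behavioral model of the evaluation cell. This is a sketch under stated assumptions, not the circuit itself: each pairwise comparison won by an input is assumed to contribute one unit current to that input's cell, the reference current is assumed to be (s + 0.5)Iunit for select code s (generalizing the 1.5Iunit of the example), and all inputs are assumed distinct. The names `evaluation_cell` and `extractor` and the sample voltages are hypothetical.

```python
def evaluation_cell(i_sum, s, vchoice=1, i_unit=1.0):
    """Return (Vout_h, Vout_l, Vo) for one evaluation cell.

    i_sum : summed comparator current arriving at Out_com1-2
    s     : binary code set by sel_1-3
    Assumption: the binary-weighted reference current is (s + 0.5) * i_unit.
    """
    i_ref = (s + 0.5) * i_unit
    vout_h = i_sum + i_unit > i_ref           # upper row adds Madd1's extra unit current
    vout_l = vchoice == 1 and i_sum > i_ref   # lower row, forced low when Vchoice = 0
    vo = vout_h and not vout_l                # only (Vout_h, Vout_l) = (1, 0) pulls Vo high
    return vout_h, vout_l, vo


def extractor(inputs, s, vchoice=1):
    """Output vector Vo_1..Vo_n: one-hot in rank-order mode, multi-hot in k-WTA mode."""
    outputs = []
    for i, a in enumerate(inputs):
        # Each comparison won by this input contributes one unit current (assumption).
        wins = sum(1 for j, b in enumerate(inputs) if j != i and a > b)
        outputs.append(int(evaluation_cell(float(wins), s, vchoice)[2]))
    return outputs


samples = [0.31, 0.92, 0.55, 0.10, 0.74, 0.48, 0.67]  # hypothetical input voltages
print(extractor(samples, s=1))             # flags the second-smallest input
print(extractor(samples, s=3))             # flags the median
print(extractor(samples, s=1, vchoice=0))  # 6-WTA: every input except the minimum
```

In rank-order mode the two thresholds form a window one unit current wide, so exactly one cell responds; disabling the lower row removes the upper bound and turns the same cell into a k-WTA detector.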
4.4 Measured results and design consideration
A seven-input experimental chip was fabricated using a 0.5 μm CMOS technology. The bias voltage Vbias is set to 0.9 V in this design. The sampling capacitor Cs is 0.8 pF, and the analog switches in the circuit are implemented by CMOS transmission gates. The micrograph of the experimental chip is shown in Figure 21, and the active area is 610 × 780 μm². An individual comparator cell was built on the chip for measuring the accuracy. The supply voltages of the core circuit and the input/output pads were all set to 1.2 V. The accuracy of the individual comparator was measured as roughly 40 mV; that is, the resolution of the comparator was close to five bits under a 1.2 V supply voltage. Figure 22(a) shows the rank-order function, whereas Figure 22(b) shows the k-WTA function. On average, the accuracy of the whole circuit was approximately 150 mV. The performance of the chip was degraded by many factors, such as mismatch among the comparator cells, the different capacitances at the input terminals of the evaluation cells, and clock feed-through error. Owing to these non-ideal effects, each rank-order function was completed in 20 μs. The performance of the circuit can be improved by increasing the supply voltage to 1.5 V and adjusting the biasing voltage Vbias accordingly. Including the power consumption of the input/output pads, the static power consumption of the chip was 1.4 mW.
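The resolution figures can be checked with a short calculation, assuming that the full 1.2 V supply is taken as the input range (an assumption; the usable input range is not stated here) and that resolution in bits is the base-2 logarithm of the range divided by the accuracy:

```latex
N_{\text{comparator}} = \log_2\!\left(\frac{1.2\ \text{V}}{40\ \text{mV}}\right) = \log_2 30 \approx 4.9\ \text{bits},
\qquad
N_{\text{circuit}} = \log_2\!\left(\frac{1.2\ \text{V}}{150\ \text{mV}}\right) = \log_2 8 = 3\ \text{bits}.
```

Under this assumption, the 40 mV comparator accuracy corresponds to nearly five bits, while the 150 mV accuracy of the whole circuit corresponds to about three bits.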
Many factors, such as precision, speed, process variation, and chip area, must be considered in the design of a low-power, low-voltage rank-order extractor.
Limitations of low voltage and low power
The average power consumption of the circuit is expressed by
$$
P_{avg} = f\,C\,V_{DD}^{2} + V_{DD}\,(I_{o} + I_{leakage}) + f\,Q_{sc}\,V_{DD}
$$
where f is the frequency, C is the capacitance in the circuit, VDD is the supply voltage, Io is the standby current, Ileakage is the leakage current, and Qsc is the short-circuit charge during the clock transient period. To reduce the power consumption, the supply voltage VDD must be reduced, and the standby current in the comparator and the evaluation cell must be made as small as possible. In the mask layout, the clock and its complement are generated locally to reduce delay and mismatch. Thus, the probability of a short-circuit current occurring in the circuit is minimized.
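To make the role of each term concrete, the sketch below evaluates a commonly used decomposition of average power, P = f·C·VDD² + VDD·(Io + Ileakage) + f·Qsc·VDD, which is assumed here to match the quantities defined above. All numerical values are hypothetical placeholders rather than measured data, Qsc is treated as independent of VDD for simplicity, and the function name `average_power` is an assumption.

```python
def average_power(f, c, vdd, i_standby, i_leak, q_sc):
    """Return (dynamic, static, short_circuit) power terms in watts.

    Assumed model: P = f*C*VDD^2 + VDD*(Io + Ileakage) + f*Qsc*VDD.
    """
    dynamic = f * c * vdd ** 2
    static = vdd * (i_standby + i_leak)
    short_circuit = f * q_sc * vdd
    return dynamic, static, short_circuit


# Hypothetical operating point, for illustration only.
params = dict(f=50e3, c=20e-12, i_standby=1e-3, i_leak=1e-9, q_sc=1e-12)

for vdd in (1.2, 1.5):
    dyn, stat, sc = average_power(vdd=vdd, **params)
    print(f"VDD = {vdd} V: dynamic {dyn * 1e6:.3f} uW, "
          f"static {stat * 1e3:.3f} mW, short-circuit {sc * 1e6:.3f} uW")
```

In this model the dynamic term scales with the square of VDD while the static term scales linearly, so at low clock rates the standby current dominates, which is consistent with the emphasis on minimizing it.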
Speed and precision
The accuracy of the comparators determines the resolution of the circuit. For the comparator design, the smallest differential voltage that can be distinguished correctly is influenced by two factors. One is the charge-injection error in the analog switches, and the other is the effect of the parasitic capacitor Cp. These effects are reduced by enlarging the sampling capacitor Cs and keeping the switch dimensions as small as possible. In this design, the response time of the extractor is the sum of the auto-zero time, the comparison time, and the evaluation time.
Reducing these component times improves the response time. A minimum auto-zero time is required to sample the input voltage correctly on the sampling capacitor Cs and to bias the inverter properly in its high-gain region. Larger switches in Figure 19 reduce the auto-zero time; however, the clock feed-through error and the charge-injection error during the clock transition also increase. Similarly, a smaller sampling capacitor Cs shortens this time. Unfortunately, it also reduces the effective magnitude of the difference voltage, so the comparator accuracy is degraded. The comparison time dominates the response time, especially when the input levels are close to each other. Since the gain in the transition region of a CMOS inverter operated at a low supply voltage is not high enough, the comparator takes a long time to identify which input variable has the larger level. The evaluation time is defined as the interval between the moment the comparator cells generate the proper currents and the moment the extractor has finished finding the desired rank order. The evaluation time is a function of the current Iunit. The maximum number M of input variables is also influenced by the current Iunit. Although reducing the magnitude of Iunit reduces the power consumption, the relationship among the evaluation time, Iunit, and M in this architecture is a complicated function.
Process variation analysis
With contemporary technology, process variation during fabrication cannot be completely eliminated; as a result, mismatch error must be taken into account in VLSI circuit design. The dimensional matching of the binary-weighted MOS transistors in the evaluation cell (M1 - M8 in Figure 20) is an important factor for correct circuit operation. If the mismatch induces an error current Ierr whose magnitude exceeds half of the unit current Iunit, the decision of the evaluation cell fails. Thus, a rough estimated constraint for Ierr is
$$
|I_{err}| < \frac{I_{unit}}{2}
$$
The chapter describes various nonlinear signal-processing CMOS circuits, including a highly reliable WTA/LTA circuit, a simple MED cell, and a low-voltage arbitrary rank-order extractor. The discussion focuses on CMOS analog circuit design with reliability, programmability, and low-voltage operation. Matching multiple identical cells realized within a single chip in a conventional process is a practical problem; thus, highly reliable circuit design is needed. Low-voltage operation also becomes an important design issue as CMOS processes continue to scale down. In the chapter, Section 1 introduces various CMOS nonlinear functions and related applications. Section 2 describes the design of a highly reliable WTA/LTA circuit that uses a single analog comparator. The analog comparator has an auto-zero characteristic that improves the overall reliability. Section 3 describes a simple analog MED cell. Section 4 presents a low-voltage rank-order extractor with a k-WTA function. Flexible and programmable functions are useful features when the nonlinear circuit is integrated with other systems. Depending on the application requirements, different design strategies are needed for these nonlinear signal-processing circuits to achieve optimum performance. In state-of-the-art processes, small chip area, low-voltage operation, low power consumption, high reliability, and programmable capability remain important factors in realizing these circuits.