## 1. Introduction

In VLSI circuit design, nonlinear signal-processing circuits such as minimum (MIN), maximum (MAX), median (MED), winner-take-all (WTA), loser-take-all (LTA), *k*-WTA, and arbitrary rank-order extraction provide useful functions (Lippmann, 1987; Lazzaro et al., 1989). A median filter is generally used to suppress impulse noise. MAX and MIN circuits are important elements in fuzzy logic design. WTA is a major function in pattern classification and artificial neural networks. Designing these nonlinear signal-processing circuits so that they integrate smoothly into system-on-a-chip (SoC) applications has therefore become an important research topic. Complementary metal-oxide-semiconductor (CMOS) technology is now widely used to fabricate chips, and all circuits in this chapter are realized in CMOS processes. However, as CMOS transistors are continuously scaled down through thinner gate oxides and smaller device sizes, the supply voltage must be reduced to maintain device reliability. Therefore, a highly reliable WTA/LTA circuit, a simple MED circuit, and a low-voltage rank-order extractor are addressed in this chapter.

The chapter is organized as follows. Section 1 introduces the background of these nonlinear functions, including definitions and applications. Section 2 describes conventional WTA/LTA architectures and presents a highly reliable winner-take-all/loser-take-all circuit. Section 3 presents an analog median circuit whose advantage is its simplicity. Section 4 describes a CMOS circuit for arbitrary rank-order extraction, together with the restrictions and design techniques of low-voltage CMOS circuits. Section 5 briefly concludes the chapter.

Given a set of *n* external input variables *a*_{1}, …, *a*_{n}, a MAX (or MIN) circuit determines the maximum (or minimum) value. A median filter outputs the median value within a window of input samples. A WTA network selects and identifies the largest variable in a specified set of variables. Its counterpart, the LTA network, identifies the smallest input variable and inhibits the remaining ones. Instead of choosing only one winner, a *k*-WTA network selects the largest *k* values among *n* competing variables (up to the *k*-th largest element among the *n* variables *a*_{1}, …, *a*_{n}). Depending on the application requirements, the input variables are either voltage or current signals.

To make these definitions concrete, consider a circuit that produces two kinds of output in response to a set of input currents I_{in1}, I_{in2}, …, and I_{inN}: an analog output current I_{o} and a set of digital outputs V_{o1(rank)}, V_{o2(rank)}, …, and V_{oN(rank)}. Assume five external input currents of 9, 7, 10, 5, and 3 μA. Depending on the function required, the output current I_{o} and the corresponding digital outputs are as follows.

*MAX:* I_{o} = Maximum(I_{in1}, I_{in2}, …, I_{inN}) = I_{in3} = 10 μA.

*MIN:* I_{o} = Minimum(I_{in1}, I_{in2}, …, I_{inN}) = I_{in5} = 3 μA.

*MED:* I_{o} = Median(I_{in1}, I_{in2}, …, I_{inN}) = I_{in2} = 7 μA.

*WTA:* The output voltages V_{o1(rank)}, V_{o2(rank)}, …, and V_{o5(rank)} indicate with a logic high which input is the maximum among I_{in1}, I_{in2}, …, and I_{inN}. In this case, (V_{o1(rank)}, V_{o2(rank)}, …, V_{o5(rank)}) = (0, 0, 1, 0, 0), where “0” and “1” are logic low and logic high, respectively.

*LTA:* The reverse operation of the WTA function; the output set is (0, 0, 0, 0, 1) in this case.

*k-WTA:* Depending on the value of k, k winners are selected. This function is more flexible in application than WTA. For example, the output of 2-WTA is (V_{o1(rank)}, V_{o2(rank)}, …, V_{o5(rank)}) = (1, 0, 1, 0, 0) in this case.

*Rank order:* The rth rank-order extraction identifies the rth largest magnitude among I_{in1}, I_{in2}, …, and I_{inN}. For example, the outputs of the 2nd and 3rd rank orders are (1, 0, 0, 0, 0) and (0, 1, 0, 0, 0), respectively, in this case.
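The definitions above can be checked with a small behavioral model. The following Python sketch is only a software illustration of the functions, not a description of any circuit; the helper names `rank_flags` and `k_wta_flags` are hypothetical, and distinct input values are assumed.

```python
def rank_flags(inputs, rank):
    """One-hot flags marking the input with the rank-th largest value (rank=1 is the maximum)."""
    order = sorted(range(len(inputs)), key=lambda i: inputs[i], reverse=True)
    return tuple(1 if i == order[rank - 1] else 0 for i in range(len(inputs)))


def k_wta_flags(inputs, k):
    """Flags marking the k largest inputs."""
    order = sorted(range(len(inputs)), key=lambda i: inputs[i], reverse=True)
    winners = set(order[:k])
    return tuple(1 if i in winners else 0 for i in range(len(inputs)))


i_in = [9, 7, 10, 5, 3]  # input currents in microamperes, from the example above

i_max = max(i_in)                      # MAX
i_min = min(i_in)                      # MIN
i_med = sorted(i_in)[len(i_in) // 2]   # MED, assuming an odd number of inputs

wta = rank_flags(i_in, 1)              # winner-take-all
lta = rank_flags(i_in, len(i_in))      # loser-take-all
two_wta = k_wta_flags(i_in, 2)         # 2-WTA
rank2 = rank_flags(i_in, 2)            # 2nd rank order
rank3 = rank_flags(i_in, 3)            # 3rd rank order
```

With the five example currents, the variables reproduce the analog values and logic patterns listed in the definitions.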

Various applications of these nonlinear functions are described as follows. The MAX and MIN circuits are important elements in fuzzy logic design (Yamakawa, 1993). Figure 1 shows the MAX and MIN operations in fuzzy inference. Variables “x” and “y” are the inputs; variable “z” is the corresponding output response. In a specific state, either rule 1 or rule 2 is satisfied. The MIN function realizes the “and” operation in fuzzy rules, and the MAX function realizes the “or” operation. In image signal processing, the MED function is generally used to filter impulse noise and thus suppress impulsive distortion. Figure 2 shows a one-dimensional application for noise cancellation. Figure 2(a) shows a 1.2-V peak-to-peak (V_{pp}) sinusoidal signal corrupted by noise, and Figure 2(b) shows the processed signal after MED filtering with a window of size five. In addition, Figure 3 shows a two-dimensional application, also for noise cancellation, on an image. WTA is a major function in pattern classification, vector quantization, data compression, and self-organizing neural networks. Figure 4 shows a WTA application for pattern identification. Analogue rank-order filters are widely used in signal sorting and classification.
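The one-dimensional noise-cancellation case can be illustrated in software. The sketch below applies a sliding median with a window of five samples to a sinusoid carrying two impulses. The test signal, the impulse positions, and the edge handling (the window is truncated at the signal boundaries) are assumptions made for illustration; they are not taken from Figure 2.

```python
import math
from statistics import median


def median_filter(samples, window=5):
    """Sliding-window median; the window is truncated at both ends of the signal."""
    half = window // 2
    out = []
    for n in range(len(samples)):
        lo = max(0, n - half)
        hi = min(len(samples), n + half + 1)
        out.append(median(samples[lo:hi]))
    return out


# Hypothetical test signal: 1.2 V peak-to-peak sine with two injected impulses.
clean = [0.6 * math.sin(2 * math.pi * n / 50) for n in range(100)]
noisy = list(clean)
noisy[20] += 1.0   # positive impulse
noisy[63] -= 1.0   # negative impulse

filtered = median_filter(noisy, window=5)

# Largest deviation from the clean sine before and after filtering.
worst_before = max(abs(a - b) for a, b in zip(noisy, clean))
worst_after = max(abs(a - b) for a, b in zip(filtered, clean))
```

Comparing `worst_before` with `worst_after` shows how strongly the isolated impulses are suppressed, at the cost of a small distortion of the underlying sine.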

In general, these nonlinear functions are implemented in either digital or analog form. In a digital implementation, since most real-world signals are continuous, the inputs must first be converted to digital form by one or more analog-to-digital (A/D) converters. The extra data converters increase circuit complexity, chip area, and power consumption. An analog implementation, by contrast, is somewhat less accurate than digital operation and less tolerant of fabrication process variation. However, because no extra data conversion is needed, analog operation saves time, bandwidth, and computation at the system level. For practicality and flexibility, the design issues of a CMOS analog signal-processing circuit must therefore include 1) precision; 2) speed; 3) high tolerance to fabrication process variation; 4) wide supply-voltage range; 5) wide input range; 6) low circuit complexity; 7) low power consumption; 8) scalability; and 9) programmability, among others, so that these functions can be easily integrated into various system-embedded chips. Additionally, as CMOS transistors shrink, the supply voltage must be scaled down to maintain device reliability. A forecast of high-performance CMOS circuits operating at low voltage has been reported (Semiconductor Industry Association, 2008). Figure 5 shows the trend of CMOS supply voltage and physical gate length. Moreover, portable equipment such as biomedical electronics, computers, and telecommunication devices is now in common use. Battery operation and low power consumption are therefore also important design requirements for these circuits.

## 2. Winner-Take-All and Loser-Take-All circuit

### 2.1 Architectures of WTA/LTA circuits

Based on their circuit structures, conventional WTA/LTA circuits are roughly classified into four types: 1) global-inhibition structure, in which the connectivity increases linearly with the number of inputs (Lazzaro et al., 1989; Starzyk & Fang, 1993); 2) cell-based tree topology (Smedley et al., 1995; Demosthenous et al., 1998); 3) excitatory/inhibitory connection (He & Sanchez-Sinencio, 1993); and 4) serial cascade structure (Aksin, 2002). Figure 6(a-d) shows the conceptual diagrams of these topologies. In Figure 6(a), each cell receives the same global inhibition, and a common current I_{comn} or voltage V_{comn} is shared by all the competing cells. The cells, drawn as square blocks, are nonlinear signal-processing elements. Therefore, the precision of the circuit degrades as the number of inputs increases. Since the operation of this circuit relies on cell matching, a stable fabrication process is required to manufacture a high-precision system. The connectivity complexity of the circuit is O(N), where N is the number of inputs. Figure 6(b) shows a cell-based tree topology, with N-1 cells arranged in a tree for N inputs. Each cell compares two input variables and outputs the larger (or smaller) of the two. The digital decisions from the bottom cell are then successively fed back to the first-layer cells to identify the maximum (or minimum) input. The precision of this circuit is also sensitive to cell matching. With this circuit design, the device sizes must be rescaled when the supply voltage is modified.

Figure 6(c) shows an excitatory/inhibitory connection with O(N^{2}) connectivity complexity. Each cell receives inhibitory signals from the other cells and an excitatory signal from itself. With this design, chip area increases with the square of the number of inputs. Figure 6(d) shows a comparator-based structure in which N-1 analog comparison blocks and N-1 digital blocks are cascaded in series. Within a comparison time T_{comp}, the larger input of each analog block is sent to the next stage to be compared with the other inputs. The result of each comparison is then sent to the corresponding digital block, and a decision bit is fed back from the right block to the left block to identify the maximum input. As a result, the response time of the circuit is approximately (N-1)T_{comp} + T_{dig}, where T_{dig} is the total propagation time of the digital part. The offset voltage of each comparator dominates the precision of the architecture. A circuit implementation of Figure 6(d) is also sensitive to process variation. For high-precision applications, the internal circuit blocks shown in Figure 6(a-d) must be identical. The primary limitations on the accuracy of the conventional architectures are fabrication process variations and the matching requirement of the internal cells. Variations in a CMOS fabrication process affect the transistor threshold voltage, the actual device size, the gate-oxide thickness, and a variety of other factors. In a common process, the threshold voltage generally varies from –10% to +10% of its nominal value. Because of non-uniform etch and diffusion procedures, the actual device sizes also vary. In a real CMOS process, these variations are hard to eliminate completely. How can the accuracy of an analog circuit be improved in a conventional process?

### 2.2 A highly reliable WTA/LTA circuit

In this section, a highly reliable CMOS signal-processing circuit that can be programmed for either the WTA or the LTA function is described (Hung & Liu, 2004). A symbol_{inj} and V_{ink} ) to compare in magnitude at time t, and the output

Therefore, returning to the conventional architecture the tree topology of Figure 6(b), WTA mode, is represented as:

t_{1}:

_{2}:

After time O(

To reduce the matching requirement of the internal cells, Figure 7 shows a conceptual diagram of the highly reliable circuit. In this scheme, there are N identical ‘digital’ control cells and a single comparator for N input variables. The single comparator block is time-multiplexed to perform all input comparisons. The operating procedure is as follows:

t_{1}:

_{2}:

The strategy adopted to find the maximum/minimum among a set of variables is that two variables are first compared; then the result of this comparison is compared with the next input variable using the same comparator. The procedure continues until the comparisons of all input variables are completed. Conceptually, the circuit operation is similar to a serial comparison. Unlike the traditional architectures, which require N-1 analogue comparators, this architecture requires only a single comparator and thereby eliminates the sensitivity to component matching. Using the same algorithm, the LTA function is easily obtained by only reversing the output state
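The serial strategy can be sketched in software as a single comparison routine that is reused for every input. The Python model below is a behavioral illustration under stated assumptions: the comparator is ideal (no offset or resolution limit), the function names `comparator` and `serial_select` are hypothetical, and the LTA mode is modeled by inverting the comparator decision.

```python
def comparator(v_candidate, v_held):
    """Idealized comparator: True when the new candidate is larger than the held value."""
    return v_candidate > v_held


def serial_select(inputs, find_max=True):
    """Return the index of the winner (WTA) or loser (LTA) using one reused comparator."""
    winner = 0  # the first input is the provisional winner
    for n in range(1, len(inputs)):
        larger = comparator(inputs[n], inputs[winner])
        # LTA reuses the same comparison with the decision polarity reversed.
        replaces = larger if find_max else not larger
        if replaces:
            winner = n
    return winner


# Hypothetical input voltages for illustration.
v_in = [0.4, 1.2, 0.9, 0.1]
wta_index = serial_select(v_in, find_max=True)    # index of the largest input
lta_index = serial_select(v_in, find_max=False)   # index of the smallest input
```

The loop performs N-1 sequential comparisons, so the model trades comparison time for the removal of matched analog cells, which is the design choice described above.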

The key block in this architecture is the comparator cell. Comparator performance is a crucial factor in realizing high-speed data conversion systems and telecommunication interfaces. The precision of a comparator is usually defined as the minimum identifiable differential voltage (or current) between its inputs, that is, the comparator’s resolution. The comparator design of Hosotani et al. (1990) is used here; the schematic diagram is shown in Figure 8. Transistors M_{sw1}, M_{sw2}, and M_{sw3} are used as switches. The circuit operates in two phases, an auto-zero phase and a comparison phase. Let V_{x} denote the voltage at node B. Based on charge conservation, after the comparison phase, V_{x} becomes:

The effect of the_{in2} - V_{in1}) to pull node D up to high (logic 1) or push it down to 0 V (logic 0). The functions of the N-latch are to sample the voltage at node D as latch_clk turns high and to hold the comparison result as latch_clk turns low. Ultimately, the output polarity of the N-latch will be changed according to the max/min selector setting. The max/min selector signal modifies the polarity of the compared result; therefore, without the need for structural modification, this circuit possesses win/lose configurable capability. The comparison block shown in Figure 8 is reused during all comparison procedures. The architecture of N-inputs circuit is shown in Figure 9, in which Control_Cell_{n} (1≦ n ≦N) are identical. N cells are required for N input variables. Each cell contains a status block, a control_switch block, and two latch blocks.

Figure 10 shows the clocks for the whole circuit. The reset signal and the reg_clk clock must be generated externally; the other clocks are derived from reg_clk with logic gates.

The operation of the entire circuit is described with reference to the circuit architecture in Figure 9 and the clock waveforms in Figure 10. First, at t1, the reset signal initializes the status blocks, control_switch blocks, and latch blocks. The N-latch in each status block and R_{o1}, R_{o2}, …, R_{oN} are reset to zero by the reset signal. Based on the max/min selector signal, the MOS transistors Ms1, Ms2, Ms3, and Ms4 preset the initial sampling voltage (0 V or V_{DD}) at node cap_comn. Regardless of its magnitude, the input-1 variable must be the winner during the initial interval of a serial comparison. The initial sampling voltage at node cap_comn is thus set to 0 V when the max/min selector signal is set to logic 1 for WTA operation, and to V_{DD} when it is set to logic 0 for LTA operation.

Then, at t2, the V_{s1} clock turns high (auto-zero phase) to sample the initial voltage (0 V or V_{DD}) at node cap_comn. Next, at t3, R_{o1} turns high to sample voltage V_{in1}. At this time, the clock V_{s1} turns low (comparison phase) to compare V_{in1} with the initial sampling voltage, and the result is stored in the N-latch of the first status block. The state of the N-latch is logic 1 if the variable is the winner. At t4, the present winner V_{in1} is sampled again. At t5, a new comparison between the previous winner V_{in1} and V_{in2} is performed. At t6, the winner of the comparison between V_{in1} and V_{in2} is sampled again. After this procedure, a new comparison between the present winner and V_{in3} is performed. The procedure continues until all the input voltages have been compared. Ultimately, only one state V_{osn} (n=1,..., N) in these cells is logic 1 to indicate the winner or loser; the others are logic 0. A WTA or an LTA operation has thus been accomplished.

Figure 11 shows the status block. Figure 12 shows the control_switch block, which receives an input variable and controls the transmission gate that samples the input level. A true single-phase latch composed of an N-latch and a P-latch is used to reduce the clock skew issue (Yuan & Svensson, 1989).

### 2.3 Simulation results and reliability test

For the highly reliable WTA/LTA circuit, an experimental chip with six inputs was fabricated in a 0.5-μm CMOS technology. The sampling capacitance C_{s}, implemented with two polysilicon layers, is set to 3 pF. The period of the reg_clk clock is 100 ns with a 50% duty cycle. The WTA/LTA functions, the supply-voltage range, and a Monte Carlo analysis of transistor variation were also tested by simulation.

1) WTA/LTA functions

To test the function of the circuit, each example takes ten input voltages for the WTA/LTA operation. For supply voltage V_{DD}=3.3 V, the input variables V_{in1}, V_{in2}, …, and V_{in10} are 0.003, 0.006, 1.000, 0.997, 2.000, 2.003, 2.000, 3.297, 3.300, and 3.297 V for testing WTA function, respectively, and 3.297, 3.294, 2.000, 1.997, 2.000, 1.000, 0.997, 0.006, 0.009, and 0.003 V for testing LTA function. During the WTA operation, the logic state V_{osn} of each cell at each time slice becomes:

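| Output | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| V_{os1} | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| V_{os2} | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| V_{os3} | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| V_{os4} | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| V_{os5} | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| V_{os6} | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| V_{os7} | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| V_{os8} | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| V_{os9} | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| V_{os10} | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |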

When all comparisons are finished, the outputs V_{os1}, V_{os2}, V_{os3}, …, and V_{os10} respond as logic 0, 0, 0, 0, 0, 0, 0, 0, 1, and 0, respectively. Therefore, among these ten inputs, input variable V_{in9} is the maximum. Figure 13 shows the results of the HSPICE simulation for the WTA operation. The period of the latch clock (top trace) is 100 ns. In the same manner, Figure 14 shows the results for the LTA operation. The final outputs V_{os1}, V_{os2}, V_{os3}, …, and V_{os10} are logic 0, 0, 0, 0, 0, 0, 0, 0, 0, and 1, respectively, and the input variable V_{in10} is the minimum. The test voltages above were chosen for the following reasons: 1) the input voltages of neighboring cells should be as close as possible to test the discrimination capability; 2) the input voltages are distributed from 0 V to 3.3 V to test for a wide dynamic range.
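The per-time-slice states reported for the WTA and LTA tests can be reproduced with an idealized behavioral model. The sketch below assumes an ideal comparator with strict inequality and models the initial sampling voltage at node cap_comn as 0 V for WTA and V_{DD} for LTA; the function name `wta_lta_trace` is hypothetical, and no circuit-level effect is modeled.

```python
def wta_lta_trace(v_in, v_dd=3.3, find_max=True):
    """Return (history, final); history[n][t] is V_os of cell n+1 after comparison t+1."""
    n_cells = len(v_in)
    held = 0.0 if find_max else v_dd   # initial sampling voltage at cap_comn
    winner = None                      # stays None only if input 1 equals the initial voltage
    history = [[0] * n_cells for _ in range(n_cells)]
    for t, v in enumerate(v_in):
        wins = v > held if find_max else v < held
        if wins:
            winner, held = t, v        # the new winner is sampled for the next comparison
        if winner is not None:
            history[winner][t] = 1     # only the present winner holds logic 1
    final = [1 if n == winner else 0 for n in range(n_cells)]
    return history, final


wta_inputs = [0.003, 0.006, 1.000, 0.997, 2.000, 2.003, 2.000, 3.297, 3.300, 3.297]
lta_inputs = [3.297, 3.294, 2.000, 1.997, 2.000, 1.000, 0.997, 0.006, 0.009, 0.003]

wta_history, wta_final = wta_lta_trace(wta_inputs, find_max=True)
lta_history, lta_final = wta_lta_trace(lta_inputs, find_max=False)
```

For the ten test voltages, `wta_final` marks the ninth input and `lta_final` marks the tenth, in agreement with the simulated outputs described above.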

2) Supply voltage range

All circuit parameters, such as transistor dimensions, clock periods, and the sampling capacitance C_{s}, are held constant. The supply voltage V_{DD} is swept from 2 V to 5 V in 0.1-V steps, and the logic-high level of the clocks is changed along with the supply voltage. The simulation results show that the circuit operates successfully with 3-mV discrimination when the supply voltage ranges from 2.7 V to 5 V. Without any rescaling of the device sizes, the circuit works at various commonly used supply voltages.

3) Process variations

A statistical distribution of manufacturing parameters often occurs during CMOS fabrication. Wafer-to-wafer, run-to-run, and transistor-to-transistor process variations determine the electrical yield and critical second-order effects. The threshold voltage, channel width, and channel length of every MOS transistor were set to their nominal values with ±5 % variation at the 3-sigma level, and each transistor was given an independent random Gaussian distribution. After 30 Monte Carlo iterations, the HSPICE results indicate that circuit precision and speed are not degraded over this range. In addition, to verify that the circuit can be ported across technologies, it was also simulated with the parameters of various CMOS fabrication processes. The results show that the circuit remains functional under these processes without tuning any device dimension. The following reasons contribute to the robustness of this circuit: 1) the circuit is designed with only a single analog cell (the comparator), while the other active components are digital; 2) the comparator itself has an auto-zero property and is therefore more tolerant of manufacturing process variation.
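The parameter sampling behind such a Monte Carlo analysis can be sketched as follows. This Python fragment is not the HSPICE setup used for the reported results; it only illustrates how a ±5 % tolerance at the 3-sigma level translates into a Gaussian standard deviation, with an independent draw for every transistor. The nominal values, the parameter names, and the number of transistors are hypothetical.

```python
import random


def sample_parameter(nominal, tolerance=0.05, rng=random):
    """Draw one Gaussian sample whose 3-sigma spread equals tolerance * nominal."""
    sigma = abs(nominal) * tolerance / 3.0
    return rng.gauss(nominal, sigma)


def monte_carlo_runs(nominals, n_transistors, n_runs=30, seed=0):
    """Return n_runs lists; each holds one independent parameter dict per transistor."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    runs = []
    for _ in range(n_runs):
        run = [
            {name: sample_parameter(value, rng=rng) for name, value in nominals.items()}
            for _ in range(n_transistors)
        ]
        runs.append(run)
    return runs


# Hypothetical nominal values for illustration only (not taken from the chapter).
nominals = {"vth_V": 0.7, "width_um": 5.0, "length_um": 0.5}
runs = monte_carlo_runs(nominals, n_transistors=4)
```

Each of the 30 runs would then be passed to a circuit simulator, and the precision and speed of the circuit checked per run.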

4) Circuit precision

The accuracy of the comparator cell dominates the identification precision. The comparator accuracy depends on two factors. One is the clock feed-through error and charge-injection error in transistor M_{sw3}, shown in Figure 8; the other is the degrading factor in Eq. (1). Charge-injection error is a complicated function of the substrate doping concentration, load capacitance, input level, clock voltage, clock falling rate, MOS channel dimensions, and threshold voltage. This error is therefore difficult to eliminate completely. In general, complementary clocks, transmission gates, and dummy transistors are adopted in the switch realization to reduce the error.

## 3. CMOS analogue median cell

The median (MED) filter is a useful function in image processing applications for eliminating impulse noise. Given a set of n external input variables a_{1}, …, a_{n}, a MED circuit determines the median value. Median extraction is a nonlinear function. MED circuit realizations can be classified as analog filtering or digital filtering, depending on the type of the input signals. The digital filtering architecture is supported by a variety of sophisticated algorithms and thus offers higher flexibility and higher reliability. In terms of power consumption and chip area, however, it is more expensive than the analog architecture. In 1994, an analogue median extractor with a simple structure and a sharp DC transfer characteristic was presented that does not use an operational amplifier (Opris & Kovacs, 1994). The circuit is intended to reduce the errors in the transition region. In 1997, the same authors proposed an improved version with high-speed operation; the median circuit achieves a transient recovery of less than 200 ns in a 2-μm CMOS process (Opris & Kovacs, 1997). In 1999, a current-input analog median filter composed of absolute-value and minimum circuits was proposed (Vlassis & Siskos, 1999). This circuit also requires neither an operational amplifier nor a transconductor. Based on transconductance comparators and analog delay elements, a fully continuous-time analog median filter was presented in 2004 (Diaz-Sanchez et al., 2004). Using these median filter cells, an image of 91×80 pixels can be processed in less than 8 μs to remove salt-and-pepper noise. In this section, an intuitive and simple CMOS analog median cell is described (Hung et al., 2007). The cell is built from current mirrors, current comparison, and some basic digital logic. Simulation in TSMC 0.35-μm CMOS technology shows that the median filter provides 0.4-μA discriminability and tracks the median value of the input currents well.

Figure 15 shows a basic one-input current cell composed of current mirrors and control logic. The cell has one signal input (i_{s}), a current-source output (i_{s_src}), a current-sink output (i_{s_sink}), a control signal V_{ctr}, and an output current (i_{out}). Transistors M_{1}-M_{12} form cascode current mirrors. M_{swp} and M_{swn} constitute a transmission gate that acts as an analog switch. M_{dummy} compensates for the loading of M_{swn} and M_{swp} to improve the accuracy of the output current. M_{iso} isolates the clock noise of the transmission gate. M_{dis1-2} and M_{res} speed up the transmission operation and control the discharge timing. Figure 15(b) shows the symbol corresponding to Figure 15(a), which is named the current signal control unit (CSCU).

Given three input signals i_{s1}, i_{s2}, and i_{s3}, how can a circuit extract the median value? Assuming that i_{s2} is the median current, the following criterion must be satisfied.

As a result, current level comparison and logic decision are required to realize the function. Figure 16 shows a three-input median circuit composed of three CSCU cells and three decision logic blocks. The decision logic circuit is simply realized by AND-OR gate circuit to perform

where (1), (2), (3), and (4) represent the corresponding logic inputs, that is, the signals produced by the comparisons. Depending on the output state of each decision logic block, Eq. (3) sets V_{ctr} to a low level or a high level. A low V_{ctr} turns on the transmission gate of the corresponding CSCU cell and switches on its input current; otherwise, the input current is blocked. As a result, a three-input MED filter cell is obtained. A capacitor C_{filter} is used to suppress the switching noise caused by transition pulses.
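The selection principle of the three-input cell can be modeled behaviorally. In the sketch below, three pairwise comparison signals feed one two-level AND-OR function per input, each with four logic inputs, and the result drives an active-low control bit. The specific Boolean expressions are an assumption chosen so that exactly one input is selected; they are not claimed to be the chapter's Eq. (3), and the names `and_or` and `median_of_three` are hypothetical.

```python
def and_or(a, b, c, d):
    """Two-level AND-OR gate: (a AND b) OR (c AND d)."""
    return (a and b) or (c and d)


def median_of_three(i_s1, i_s2, i_s3):
    """Return (i_out, v_ctr); v_ctr holds the active-low control bit of each cell."""
    # Pairwise current comparisons (assumed comparator outputs).
    c12, c13, c23 = i_s1 > i_s2, i_s1 > i_s3, i_s2 > i_s3
    selected = (
        and_or(c12, not c13, not c12, c13),   # i_s1 lies between i_s2 and i_s3
        and_or(c12, c23, not c12, not c23),   # i_s2 lies between i_s1 and i_s3
        and_or(c13, not c23, not c13, c23),   # i_s3 lies between i_s1 and i_s2
    )
    v_ctr = tuple(0 if s else 1 for s in selected)  # a low V_ctr switches the input on
    i_out = sum(i for i, s in zip((i_s1, i_s2, i_s3), selected) if s)
    return i_out, v_ctr


i_out, v_ctr = median_of_three(9, 7, 10)  # currents in microamperes
```

In this model exactly one control bit is low for any combination of inputs, so the output current equals the selected input; spikes at the transition points of a real circuit are not modeled.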

In the circuit, an NMOS transistor size of (W/L)_{N}=5 μm/1 μm and a PMOS transistor size of (W/L)_{P}=10 μm/1 μm are used for M_{1}-M_{12}. The inverter sizes are (W/L)_{N}=5 μm/0.35 μm and (W/L)_{P}=20 μm/0.35 μm. The device sizes of the switch transistors M_{swn} and M_{swp} are both (W/L)_{N-P}=20 μm/0.35 μm. All transistors in the decision logic block are sized (W/L)_{N}=5 μm/0.35 μm and (W/L)_{P}=10 μm/0.35 μm. The filter capacitance C_{filter} is 10 pF. The supply voltage V_{DD} is the commonly used 3.3 V. The input current signals i_{s1}, i_{s2}, and i_{s3} each have a 10-μA peak value, occurring at 5 μs, 10 μs, and 15 μs, respectively. Figure 17 shows the three triangular waves and the corresponding median output. The red line represents the MED output. The output tracks the median value of the three input currents well. As Figure 17 shows, when two input values are close to each other, their difference must be larger than 0.4 μA to be distinguished; this is the discriminability of the MED filter. However, small spikes occur at the transition points.

As Figure 16 shows, the proposed three-input median cell has three input pins (i_{s1}, i_{s2}, and i_{s3}) and a common output pin (i_{out}). By modifying the switch transistors and decision logic, the MED cell can easily be changed to have three inputs and three outputs. The modified MED cell then provides the maximum value i_{maximum}, the median value i_{median}, and the minimum value i_{minimum} simultaneously. As a result, multiple modified MED cells can be organized to cooperate in performing a ‘sorting’ function. The design requires no critical components such as an operational amplifier or a precise voltage reference. These properties make the MED cell easy to embed in a larger system.

## 4. Low-voltage arbitrary rank order extraction

### 4.1 Principle of rank-order extraction

The WTA, LTA, and MED functions, however, each perform only a single-order operation. In 2002, a low-voltage rank-order filter with a compact structure was designed (Cilingiroglu & Dake, 2002). The filter is based on a pair of multiple-winners-take-all circuits and a set of logic gates. In this section, a new architecture with both arbitrary rank-order extraction and k-WTA functionality is described (Hung & Liu, 2002). The rth rank-order extraction is defined as identifying the rth largest magnitude among the input variables. In this design, the circuit locates an arbitrary rank order among a set of input voltages by setting different binary signals. A set of output voltages V_{o_1}, V_{o_2}, …, and V_{o_M} corresponds to the outputs of a rank-order extractor for a set of input variables V_{1}, V_{2}, …, and V_{M}. The output status D_{ij} of a comparator with two input terminals is defined as

where M is the number of the input variables. For convenience of description, a temporal index S_{i} defines the total number of winners for the ith input variable compared with the others. Thus, S_{i} is represented as

Based on the definition of (5), S_{i} is expanded as follows

Thus, from the left-hand side of (6), M(M-1) comparators’ cooperation is required for M input variables to identify the rank order. Since D_{ji} is the complementary of D_{ij} ( D_{ji}=

In this section, the comparator generates a unit current I_{unit} when input variable V_{i} is larger than V_{j}. Thus, the index S_{i} in (5) is rewritten as

where n is the number of the winner in comparison. If the inputs are arranged in ascending order of magnitude, V_{1}, V_{2}, …, V_{M}, which satisfy V_{1}<V_{2}< … <V_{M}, then

For example, if the input variables are (0.5, 0.6, 0.9, 0.2, 0.4), the first variable 0.5 is larger than variables 0.2 and 0.4. Thus, the index S_{1} is 2I_{unit}, which means that the variable wins against two other input variables among all comparisons. For the same reason, the indices S_{2}, S_{3}, S_{4}, and S_{5} are 3I_{unit}, 4I_{unit}, 0, and I_{unit}, respectively. The output voltages (V_{o_1}, V_{o_2}, …, V_{o_5}) of the extractor respond as (0, 0, 1, 0, 0), (0, 1, 0, 0, 0), (1, 0, 0, 0, 0), and (0, 0, 0, 1, 0) for the maximum, next-maximum, median, and minimum operations, respectively. Here “0” and “1” are logic low and logic high. Similarly, if the extractor is configured for the k-WTA function, the output voltages (V_{o_1}, V_{o_2}, …, V_{o_5}) of the circuit respond as (1, 1, 1, 1, 1), (1, 1, 1, 0, 1), (1, 1, 1, 0, 0), …, and (0, 0, 1, 0, 0) for the 5-WTA, 4-WTA, 3-WTA, …, and 1-WTA operations, respectively.
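The counting principle in this example can be written as a short behavioral model. The sketch below expresses each index S_{i} in multiples of I_{unit} by counting pairwise wins, then derives the rank-order and k-WTA outputs from the counts. It assumes distinct inputs and ideal comparisons; the function names are hypothetical.

```python
def win_counts(v):
    """S_i in units of I_unit: the number of other inputs that input i exceeds."""
    m = len(v)
    return [sum(1 for j in range(m) if j != i and v[i] > v[j]) for i in range(m)]


def rank_order(v, r):
    """One-hot output for the r-th largest input (r = 1 selects the maximum)."""
    target = len(v) - r          # the r-th largest input wins M - r comparisons
    return tuple(1 if s == target else 0 for s in win_counts(v))


def k_wta(v, k):
    """Flags for the k largest inputs: those winning at least M - k comparisons."""
    threshold = len(v) - k
    return tuple(1 if s >= threshold else 0 for s in win_counts(v))


v = (0.5, 0.6, 0.9, 0.2, 0.4)   # example inputs from the text
s = win_counts(v)               # win counts in units of I_unit
maximum = rank_order(v, 1)
median_out = rank_order(v, 3)
minimum = rank_order(v, 5)
four_wta = k_wta(v, 4)
```

The equality test in `rank_order` and the threshold test in `k_wta` differ only in the comparison operator, which mirrors how a single structure can serve both functions.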

### 4.2 Architecture of rank-order extraction

The structure of the extractor is shown in Figure 18 for five input variables (Hung & Liu, 2002). A total of M(M – 1)/2 comparators and M evaluation cells are needed for M input variables. Each comparator cell accepts two input signals, and the result of each comparison is fed into the individual evaluation cell. In the first row of Figure 18, the input V_{1} is compared with the other input variables, and the comparison results generate the proper unit currents I_{unit}. These currents are summed in the Eval-1 cell if V_{1} is larger than the other samples; otherwise, the result of the comparison is fed into the corresponding evaluation cell. The connecting strategy is the same for the other input variables. Equation (7) is thereby realized in this architecture.

The signal V_{choice} in Figure 18 selects the function of the circuit. V_{choice} is preset to logic high to enable the rank-order operation; otherwise, the k-WTA function is enabled. The binary signals sel_1, sel_2, and sel_3 determine which rank order or k-WTA is located. Based on the setting of the select signals (sel_1-3), the logic states of the evaluation cells indicate which input variable belongs to this rank order. For example, in a seven-input rank-order operation, the (sel_1, sel_2, sel_3) signals are set to logic (0, 0, 0) to find the minimum variable; the settings (0, 1, 1) and (1, 1, 0) give the median and maximum functions, respectively. Similarly, in the k-WTA operation, setting (sel_1, sel_2, sel_3) to (0, 0, 1) and (1, 1, 0) yields the 6-WTA and 1-WTA functions, respectively.

### 4.3 Circuit design

#### 4.3.1 1.2-V comparator

The comparator is a key element in Figure 18. The auto-zero comparator shown in Figure 19 is designed to operate at a low supply voltage. To improve the speed of the comparator, the succeeding gain stage is designed to operate in dynamic mode. First, in the auto-zero phase, the input V_{1} is sampled at the top plate of the capacitor C_{s}, and the MOS transistor M11 is biased at the voltage V_{bias}. In the following comparison phase, the voltage at node E becomes V_{bias}+(V_{2}-V_{1})C_{s}/(C_{s}+C_{p}). This voltage deviation is then amplified by transistors M11 and M12. To reduce the power dissipation, the adjustable biasing voltage V_{bias} is chosen to be just enough to overcome the threshold voltage of a MOS transistor, and it is also adjusted to let the comparator operate at different supply voltages. The succeeding transistors M13 and M14 provide the current that generates the proper voltage at node F. Depending on which input voltage is larger, either node H or node G is at logic high. The output node G of the comparator and its complementary node H are fed into the next stage to generate the unit currents I_{large_1}, I_{large_2}, I_{small_1}, and I_{small_2}. During the evaluation phase, the unit currents I_{large_1} and I_{large_2} are produced when V_{1} is larger than V_{2}; otherwise, I_{small_1} and I_{small_2} are generated. The symbol of the comparator cell is shown at the bottom right of Figure 19.
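The comparison-phase voltage and the resulting unit currents can be evaluated numerically with a small behavioral model. In the sketch below, the node-E expression follows the text, with the ratio read as C_{s}/(C_{s}+C_{p}); the function names and all component values are hypothetical and chosen only to illustrate how the parasitic capacitance C_{p} attenuates the input difference.

```python
def node_e_voltage(v1, v2, v_bias, c_s, c_p):
    """Voltage at node E after the comparison phase (charge sharing between C_s and C_p)."""
    return v_bias + (v2 - v1) * c_s / (c_s + c_p)


def unit_current_outputs(v1, v2, i_unit):
    """Currents delivered in the evaluation phase, keyed by the names used in Figure 19."""
    v1_larger = v1 > v2
    large = i_unit if v1_larger else 0.0   # produced when V1 is larger than V2
    small = 0.0 if v1_larger else i_unit   # produced otherwise
    return {"I_large_1": large, "I_large_2": large, "I_small_1": small, "I_small_2": small}


# Hypothetical values for illustration; none are taken from the chapter.
v_e = node_e_voltage(v1=0.50, v2=0.60, v_bias=0.45, c_s=1.0e-12, c_p=0.1e-12)
currents = unit_current_outputs(v1=0.50, v2=0.60, i_unit=1.0e-6)
```

With these assumed values the 0.1-V input difference appears at node E scaled by 1/1.1, which shows why a sampling capacitance much larger than the parasitic capacitance preserves resolution.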

The function of the comparator shown in Figure 19 is summarized as

where_{base}.

#### 4.3.2 Evaluation cell

The circuit of the evaluation cell is shown in Figure 20. The MOS transistors M_{gen} and M_{unit} reproduce the same unit current, which is equal to I_{large_1}, I_{large_2}, I_{small_1}, and I_{small_2} in Figure 19. To find the various rank orders for all input signals, the cell must identify the unit-current summation in (7) that arrives from the Out_com1 and Out_com2 terminals. It is not easy to identify the exact current value in a VLSI circuit. However, whether the summation current

It is a reasonable and safe design to choose

where W is a channel width and L is a channel length. MOS transistors M_{add1} and M4 realize the_{cnt_1-6} enable the corresponding binary-weight current. The inverters inv4-7 provide sufficient gain to amplify the difference between the currents from the Out_com1-2 terminals and the binary-weight currents. This mechanism is similar to a current comparator. In the upper row of Figure 20, the extra PMOS transistor M_{add1} generates an extra unit current; therefore, the voltage V_{out_h} is always larger than or equal to V_{out_l}. If V_{choice} is preset to 0, the dashed block in Figure 20 resets V_{out_l} to 0, which disables the lower row in Figure 20. In this case, the function of the cell resembles performing only the

Thus, this is a k-WTA criterion.

As an example of the function of the evaluation cell, consider seven input variables with the sel_1-3 signals set to (0, 0, 1) to find the second-smallest input variable. Since the second-smallest input is larger than only the minimum, only a single unit current comes from the Out_com1-2 terminals of the corresponding evaluation cell. In the upper row of Figure 20, the sum of this unit current and the extra unit current (M_{add1}) is larger than the binary-weight current 1.5I_{unit}; therefore, V_{out_h} is logic 1. In the lower row, by contrast, the unit current I_{unit} (from the Out_com1-2 terminals) is smaller than the binary-weight current 1.5I_{unit}; therefore, V_{out_l} is logic 0. The transistors M_{id1} and M_{id2} pull the corresponding output (V_{o_n}, n=1, …, 7) up to logic 1 only when (V_{out_h}, V_{out_l}) = (1, 0). In all other cases, V_{o_n} is logic 0 or in an open state. Therefore, inspecting the logic state of V_{o_n} reveals which input variable belongs to the desired rank order.
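The worked example above can be written as a short behavioral sketch of one evaluation cell. It assumes that the binary-weight reference current for a select code s is (s + 0.5)I_{unit}, which generalizes the single value 1.5I_{unit} quoted for (0, 0, 1) and is an assumption rather than a statement from the text. The function name `evaluation_cell` and the normalized unit current are hypothetical.

```python
def evaluation_cell(n_units, sel_code, v_choice=1, i_unit=1.0):
    """Behavioral model of one evaluation cell.

    n_units  : number of unit currents arriving from Out_com1-2
    sel_code : integer value of (sel_1, sel_2, sel_3)
    v_choice : 1 for rank-order mode, 0 for k-WTA mode
    Returns (V_out_h, V_out_l, V_o) as logic levels.
    """
    # Assumed reference: half a unit above the select code.
    reference = (sel_code + 0.5) * i_unit

    # Upper row: incoming current plus the extra unit current from M_add1.
    v_out_h = int((n_units + 1) * i_unit > reference)

    # Lower row: incoming current only; forced to 0 when V_choice = 0.
    v_out_l = int(n_units * i_unit > reference) if v_choice else 0

    # Output is pulled high only for (V_out_h, V_out_l) = (1, 0).
    v_o = int(v_out_h == 1 and v_out_l == 0)
    return v_out_h, v_out_l, v_o


if __name__ == "__main__":
    # Second-smallest input with sel = (0, 0, 1): one unit current arrives.
    print(evaluation_cell(n_units=1, sel_code=1))
    # A larger input (two unit currents) is rejected in rank-order mode ...
    print(evaluation_cell(n_units=2, sel_code=1))
    # ... but accepted in k-WTA mode, where the lower row is disabled.
    print(evaluation_cell(n_units=2, sel_code=1, v_choice=0))
```

Under this model the rank-order output is high only when the number of incoming unit currents equals the select code, and the k-WTA output is high whenever it is at least the select code.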

### 4.4 Measured results and design consideration

A seven-input experimental chip was fabricated using a 0.5 μm CMOS technology. The bias voltage V_{bias} is set to 0.9 V in this design. The sampling capacitor C_{s} is 0.8 pF, and the analog switches in the circuit are implemented by CMOS transmission gates. The micrograph of the experimental chip is shown in Figure 21, and the active area is 610 × 780 μm^{2}. An individual comparator cell was built into the chip to measure the accuracy. The supply voltages of the core circuit and the input/output pads were all set to 1.2 V. The accuracy of the individual comparator was measured as roughly 40 mV; that is, the resolution of the comparator was close to five bits under a 1.2 V supply voltage.

Figure 22(a) shows the rank-order function, whereas Figure 22(b) shows the k-WTA function. On average, the accuracy of the whole circuit was approximately 150 mV. The performance of the chip was degraded by many factors, such as mismatch among the comparator cells, differing capacitances at the input terminals of the evaluation cells, and clock feed-through error. Because of these non-ideal effects, each rank-order operation took 20 μs to complete. The performance of the circuit can be improved by increasing the supply voltage to 1.5 V and adjusting the bias voltage V_{bias} accordingly. Including the power consumption of the input/output pads, the static power consumption of the chip was 1.4 mW.
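The link between the measured accuracy and the quoted resolution can be checked with a short calculation. This sketch assumes that the resolution in bits is the base-2 logarithm of the supply voltage divided by the smallest distinguishable voltage; the function name `resolution_bits` is hypothetical, and the figure derived for the whole circuit is an inference from this assumption rather than a reported measurement.

```python
import math


def resolution_bits(v_dd, accuracy):
    """Equivalent resolution in bits, assuming a full-scale range of v_dd."""
    return math.log2(v_dd / accuracy)


if __name__ == "__main__":
    # Individual comparator: 40 mV accuracy at a 1.2 V supply.
    print(round(resolution_bits(1.2, 0.040), 2))
    # Whole circuit: 150 mV accuracy at a 1.2 V supply.
    print(round(resolution_bits(1.2, 0.150), 2))
```

For the individual comparator this gives about 4.9 bits, in line with the "close to five bits" statement; under the same assumption the whole circuit corresponds to about three bits.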

Many factors, such as precision, speed, process variation, and chip area, must be considered in the design of a low-power, low-voltage rank-order extractor.

Limitations of low voltage and low power

The average power consumption of the circuit is expressed by

where f is the frequency, C is the capacitance in the circuit, V_{DD} is the supply voltage, I_{o} is the standby current, I_{leakage} is the leakage current, and Q_{sc} is the short-circuit charge during the clock transient period. To reduce the power consumption, the supply voltage V_{DD} must be reduced, and the standby current in the comparator and evaluation cell must be made as small as possible. In the mask layout, the clock and its complement are generated locally to reduce delay and mismatch. Thus, the probability of a short-circuit current occurring in the circuit is minimized.
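For reference, a commonly used power model that is consistent with the symbols defined above is sketched below. It is an assumption, not necessarily the exact expression intended here: it combines a dynamic switching term, a static term from the standby and leakage currents, and a short-circuit term, and the original may group or weight these terms differently.

```latex
P_{avg} = f\,C\,V_{DD}^{2} + \left(I_{o} + I_{leakage}\right)V_{DD} + f\,Q_{sc}\,V_{DD}
```

In this form the switching term falls quadratically with V_{DD}, while the static and short-circuit terms fall linearly, which is consistent with the emphasis on lowering the supply voltage and the standby current.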

Speed and precision

The accuracy of the comparators determines the resolution of the circuit. In the comparator design, the smallest differential voltage that can be distinguished correctly is influenced by two factors. One is the charge-injection error in the analog switches, and the other is the effect of the parasitic capacitor C_{p}. These effects are reduced by enlarging the sampling capacitor C_{s} and making the switch dimensions as small as possible. In the design, the response time

Reducing_{s} and to bias the inverter properly at high gain region. The switches shown in Figure 19 with larger dimension reduce auto-zero time_{s} will reduce the time_{unit}. The maximum number M of input variables is also influenced by the current I_{unit}. Reducing the magnitude of the current I_{unit} reduces the power consumption; however, the relationship among_{unit}, and M in this architecture is a complicated function.

Process variation analysis

With contemporary technology, process variation during fabrication cannot be completely eliminated; as a result, mismatch error must be taken into account in VLSI circuit design. Dimensional matching of the binary-weight MOS transistors in the evaluation cell (M1 - M8 in Figure 20) is an important factor for the circuit operation. If the mismatch induces an error current I_{err} whose magnitude exceeds half of the unit current I_{unit}, the decision of the evaluation cell fails. Thus, a rough estimated constraint for I_{err} is

|I_{err}| < I_{unit}/2.
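The half-unit-current criterion described above can be turned into a rough mismatch budget. The sketch below rests on two assumptions that are not taken from the text: the reference current for a select code s is (s + 0.5)I_{unit}, and the mismatch acts as a single relative error on that reference. The function names are hypothetical.

```python
def max_relative_mismatch(sel_code):
    """Largest relative error on the reference current that still keeps
    |I_err| below I_unit / 2, assuming a reference of (sel_code + 0.5) units."""
    return 0.5 / (sel_code + 0.5)


def decision_is_safe(sel_code, rel_mismatch, i_unit=1.0):
    """Check the rough constraint |I_err| < I_unit / 2 for a given mismatch."""
    i_err = rel_mismatch * (sel_code + 0.5) * i_unit
    return abs(i_err) < 0.5 * i_unit


if __name__ == "__main__":
    # Worst case for seven inputs: the maximum function, select code 6.
    print(max_relative_mismatch(6))
    print(decision_is_safe(6, 0.05))
    print(decision_is_safe(6, 0.10))
```

Under these assumptions the tolerable relative mismatch shrinks as the select code grows, so the largest rank order sets the matching requirement for the binary-weight transistors.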

## 5. Conclusion

This chapter describes various nonlinear signal-processing CMOS circuits, including a highly reliable WTA/LTA circuit, a simple MED cell, and a low-voltage arbitrary rank-order extractor. The discussion focuses on CMOS analog circuit design with reliability, programmability, and low-voltage operation. Matching multiple identical cells within a single chip fabricated in a conventional process is a practical problem; thus, highly reliable circuit design is needed. Low-voltage operation also becomes an important design issue as CMOS processes scale down further. In the chapter, Section 1 introduces various CMOS nonlinear functions and related applications. Section 2 describes the design of a highly reliable WTA/LTA circuit that uses a single analog comparator, whose auto-zero characteristic improves the overall reliability. Section 3 describes a simple analog MED cell. Section 4 presents a low-voltage rank-order extractor with a k-WTA function. Flexible and programmable functions are useful features when the nonlinear circuit is integrated with other systems. Depending on the application requirements, different design strategies are needed for these nonlinear signal-processing circuits to achieve optimum performance. In state-of-the-art processes, small chip area, low-voltage operation, low power consumption, high reliability, and programmability remain important factors in realizing these circuits.