Design of 4-Bit 4-Tap FIR Filter Based on Quantum-Dot Cellular Automata (QCA) Technology with a Realistic Clocking Scheme

The increasing demand for efficient signal processors necessitates the design of digital finite impulse response (FIR) filters that occupy less area and consume less power. FIR filters have simple, regular and scalable structures. This paper presents the design and implementation of a low-power 4-tap FIR filter based on quantum-dot cellular automata (QCA) using a realistic clocking scheme. The QCADesigner software, which is widely used in QCA circuit design and verification, has been used to implement and verify all of the designs in this study. The power dissipation of the proposed circuit has been computed using the QCADesigner-E software. The proposed QCA FIR filter achieves about a 97.74% reduction in power compared to previously existing designs. The outcome of this work can open up new opportunities for low-power signal processing systems.


Introduction
Recently, the design of high-performance digital circuits that meet area, power and speed targets has become a challenge. On one side, several digital signal processing applications are based on complex algorithms which require great computational power per unit of silicon area. On the other side, there are stringent portability and energy requirements which further complicate the design task. Therefore, achieving the required computational throughput with minimum energy consumption has become the key design goal, as it affects the total power budget as well as the reliability of the target application. So far, the VLSI industry has successfully followed Moore's law. Simultaneous reduction in the critical dimensions and operating voltage of CMOS transistors yields higher speed and packaging density while decreasing the silicon area and power consumption [1]. However, this trend of successive transistor scaling cannot continue for long, as CMOS technology is reaching its fundamental physical limits and faces many challenges [2][3][4]. Low-power digital design is being investigated at all levels of design abstraction.
At the device level, a number of CMOS alternatives are summarized in the International Technology Roadmap for Semiconductors (ITRS) report, such as quantum-dot cellular automata (QCA), single-electron transistors (SET), carbon nanotube field-effect transistors (CNTFET) and resonant tunneling diodes (RTD) [5]. The use of QCA at the nanoscale has a promising future because of its ability to achieve high performance in terms of device density, clock frequency and power consumption [6][7][8][9]. Essentially, QCA offers the potential advantage of ultralow power dissipation. QCA is expected to achieve a very high device density of 10¹² devices/cm², switching speeds of 10 ps and a power dissipation of 100 W/cm² [10]. These features, which are not offered by CMOS devices, can open new opportunities to save power in mobile systems design. In addition, they can make the proposed QCA approach useful for signal and image processing systems on portable communication devices, where real-time processing and low power consumption are needed in order to extend battery life. Several attempts have been made towards the cost-effective realization of QCA circuits in [11][12][13][14][15][16][17][18][19]. Although QCA technology has advantages over CMOS technology, various limitations have been identified. These include the placement of long lines of cells within clocking zones, which leads to thermal fluctuation issues and increases the delay of the circuit. Recently, the universal, scalable and efficient (USE) clocking scheme [20] has been proposed to overcome these limitations. This scheme supports feedback paths with different loop sizes. It is regular and flexible enough to allow placement and routing, besides avoiding thermodynamic effects due to long wires. On the other hand, the finite impulse response (FIR) filter is widely used as a critical component in the design of many digital signal processors (DSP).
Owing to their guaranteed linear phase and stability, FIR filters are used in the design of highly efficient hardware circuits. These circuits perform the key operation in many recent mobile computing and portable multimedia applications, such as high-efficiency video coding (HEVC), channel equalization, speech processing and software-defined radio (SDR). Indeed, an efficient FIR filter design substantially improves the performance of a complex DSP system. This fact has pushed designers to search for new methods to achieve low power consumption in FIR filters [21][22][23][24][25][26][27][28]. QCA logic circuit design is motivated by its applications in low-power electronic design and has lately attracted significant attention. All of the above factors motivate us to investigate a new QCA architecture, based on the USE clocking scheme, which can efficiently perform the FIR operation.
The main concern of this paper is to present a new design for an FIR filter based on QCA technology which yields a significant reduction in power. This paper is organized as follows. Section 2 presents the background of FIR filter structures. Section 3 introduces the preliminaries of QCA technology. Section 4 discusses the FIR filter power optimization by QCA technology. Section 5 presents the results of the proposed QCA-based FIR filter and discusses them. Finally, conclusions are drawn in Section 6.

Background of FIR filter structures
FIR filters are important building blocks in various digital signal processing applications. Recently, due to the popularity of portable battery-powered wireless communication systems, low-power and high-performance digital filter designs have become more and more important.
An FIR filter of length N performs an N-point linear convolution of the input sequence with the filter coefficients for each new input sample. The output of the linear time-invariant (LTI) FIR filter can be expressed as the following equation:
$$y(n) = \sum_{k=0}^{N-1} h_k \, x(n-k) \tag{1}$$
where N represents the length of the filter, h_k is the kth coefficient, and x(n-k) is the input data at time instant (n-k). The z transform of the data output is
$$Y(z) = H(z) \, X(z) \tag{2}$$
where H(z) is the transfer function of the filter, given by
$$H(z) = \sum_{k=0}^{N-1} h_k \, z^{-k} \tag{3}$$
Several architectures have been proposed in recent years. A filter can be implemented in direct form (DF) or transposed direct form (TDF) [29]. The transposed form and the direct form of an FIR filter are equivalent. It is easy to show that, in the direct form, the word length of each delay element is equal to the word length of the input signal. In the transposed form, however, each delay element has a longer word length than in the direct form. The transposed structure reduces the critical path delay, but it uses more hardware. The DF FIR filter is area-efficient, while the TDF filter is delay-efficient. The architecture of the FIR filter proposed in this paper is based on the transposed direct form structure, as shown in Figure 1. This structure comprises adders, D flip-flops and multipliers.
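As a rough illustration of the two structures compared above, the following Python sketch evaluates the convolution sum directly and with a transposed-direct-form register chain. The function names `fir_direct` and `fir_tdf`, the coefficients and the input samples are hypothetical and are not taken from the paper; word lengths are not modeled.

```python
def fir_direct(h, xs):
    """Direct evaluation of y(n) = sum_k h[k] * x(n - k), with x(n) = 0 for n < 0."""
    out = []
    for n in range(len(xs)):
        acc = 0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * xs[n - k]
        out.append(acc)
    return out


def fir_tdf(h, xs):
    """Transposed direct form: every tap multiplies the current sample, and the
    partial sums travel through a chain of registers (D flip-flops)."""
    regs = [0] * (len(h) - 1)            # delay elements between the adders
    out = []
    for x in xs:
        products = [hk * x for hk in h]  # all multipliers see the same input
        out.append(products[0] + (regs[0] if regs else 0))
        # Register i captures the product of tap i+1 plus the register after it.
        regs = [products[i + 1] + (regs[i + 1] if i + 1 < len(regs) else 0)
                for i in range(len(regs))]
    return out


h = [1, 2, 3, 4]             # hypothetical 4-tap coefficients
xs = [1, 0, 0, 0, 0, 5, 7]   # hypothetical input samples
assert fir_tdf(h, xs) == fir_direct(h, xs)
```

In the transposed form, each output needs only one multiplication and one addition after the input arrives, which is why its critical path is shorter than that of the direct form.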

QCA review
The QCA approach, introduced in 1993 by Lent et al. [6], is a candidate to replace devices based on field-effect transistors (FET) at the nanoscale. This nanotechnology was conceived based on some of Landauer's ideas regarding energy-efficient and robust digital devices [30]. A QCA circuit consists of an array of cells. Each cell contains four quantum dots at the corners of a square, and each dot can hold a single electron. Two electrons are injected into each cell and, due to the Coulomb interaction, they occupy diametrically opposite dots [31]. Through Coulombic effects, two possible polarizations (labeled −1 and +1) can be formed. These polarizations represent binary "0" and binary "1", as shown in Figure 2. Figure 3 shows the propagation of logic "0" and logic "1", respectively, from the input to the output of QCA binary wires due to Coulombic repulsion. Generally, the Coulombic interaction between electrons in neighboring cells is used to implement logic functions, which are controlled by the clocking mechanism [32].
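For reference, the polarization of a cell is commonly quantified in the QCA literature by the expression below. The symbols are not defined in this paper and are introduced here only for illustration: ρ_i denotes the electron charge on dot i, with the four dots numbered consecutively around the square.

```latex
P = \frac{(\rho_1 + \rho_3) - (\rho_2 + \rho_4)}{\rho_1 + \rho_2 + \rho_3 + \rho_4}
```

With the two electrons on dots 1 and 3 this gives P = +1, and with the electrons on dots 2 and 4 it gives P = −1, which matches the two polarizations described above.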
Majority and inverter gates are the fundamental logic gates in QCA implementations; each is composed of a few QCA cells, as shown in Figure 4 [7,33]. Furthermore, the majority gate acts as an AND gate or an OR gate simply by fixing one input permanently to 0 or 1, respectively. Its logical function can be expressed by Eq. (4):
$$M(A, B, C) = AB + BC + AC \tag{4}$$
3.1 QCA clocking
The clocking system is an important factor in the dynamics of QCA. Its principal functions are the synchronization of data flows and the implementation of adiabatic cell operation, which gives QCA circuits their high energy efficiency [34]. Generally, QCA clocking has four different phases, which are switch, hold, release and relax, as illustrated in Figure 5. During the switch phase, in which the actual computation occurs, the barriers are raised, and the cell is affected by the polarization of its adjacent cells and acquires a definite polarity. During the hold phase, the barriers are high, and the polarization of the cell is retained. During the release phase, the barriers are lowered, and the cell loses its polarity. During the relax phase, the cell is non-polarized [35].
Over recent years, various clocking schemes have been proposed, but they have introduced some difficulties such as long feedback paths [35]. Recently, the USE clocking scheme has been proposed for the clocking and timing of QCA circuits. It may be implemented using current integrated circuit fabrication technologies. This scheme supports feedback paths with different loop sizes, and its routing is flexible [20]. It defines a grid of clock zones, which are consecutively numbered from 1 to 4, as depicted in Figure 6. This grid ensures the correct arrangement of the clock zones. More information about the clocking circuitry is given in [20].
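The four-phase operation can be sketched as a simple lookup, assuming, as is usual for QCA clocking, that each successive clock zone lags the previous one by a quarter of a clock period. The function name `zone_phase` and the discrete time steps are hypothetical and serve only as an illustration.

```python
PHASES = ("switch", "hold", "release", "relax")


def zone_phase(zone, t):
    """Phase of clock zone `zone` (1..4) at integer time step `t`.

    Assumes zone 1 is in the switch phase at t = 0 and that every following
    zone is delayed by one quarter period (one time step).
    """
    if zone not in (1, 2, 3, 4):
        raise ValueError("zone must be 1, 2, 3 or 4")
    return PHASES[(t - (zone - 1)) % 4]


# Print the phase of all four zones over one full clock period.
for t in range(4):
    print(t, [zone_phase(z, t) for z in (1, 2, 3, 4)])
```

Under this assumption, a zone is always in its switch phase while the zone before it is in its hold phase, so data moves forward by one zone per time step.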

QCA 4 × 4 multiplier
The multiplier plays an important role in DSP systems. In diverse DSP applications, it is not necessary to use all output bits of the multiplier. In most FIR implementations, the FIR output can be obtained using only the most significant bits (MSBs) of the multiplier output [29]. In the literature, there are various multiplier algorithms, such as the array multiplier, the parallel multiplier and the Booth multiplier [36][37][38][39], which consume more area and do not meet the propagation delay requirements. This problem is overcome in this paper by using the Vedic multiplier, which is much faster and has minimum propagation delay [40][41][42][43]. To design the QCA circuit, we have used the version of the circuit proposed in [44]. Figure 8 shows the schematic of the 4-bit Vedic multiplier architecture, where A = A3 A2 … A0 and B = B3 B2 … B0 are the inputs and P = P7 P6 … P0 is the output of the multiplication. This multiplier can be implemented using four 2 × 2 Vedic multiplier blocks and three 4-bit adder blocks.
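The decomposition into four 2 × 2 blocks and three 4-bit adders can be sketched in Python as follows. The exact wiring of the circuit in [44] is not given in the text, so this is one common arrangement under stated assumptions: the names `add4`, `mul2x2` and `vedic4x4` are hypothetical, and the OR of the two adder carries is an assumption of this sketch rather than a detail confirmed by the paper.

```python
def add4(a, b):
    """4-bit adder: returns (4-bit sum, carry out)."""
    total = a + b
    return total & 0xF, total >> 4


def mul2x2(a, b):
    """2-bit x 2-bit block; the result fits in 4 bits (at most 3 * 3 = 9)."""
    return (a & 0x3) * (b & 0x3)


def vedic4x4(a, b):
    """4-bit x 4-bit product from four 2x2 blocks and three 4-bit adders."""
    a_lo, a_hi = a & 0x3, (a >> 2) & 0x3
    b_lo, b_hi = b & 0x3, (b >> 2) & 0x3

    q0 = mul2x2(a_lo, b_lo)   # weight 2^0
    q1 = mul2x2(a_hi, b_lo)   # weight 2^2
    q2 = mul2x2(a_lo, b_hi)   # weight 2^2
    q3 = mul2x2(a_hi, b_hi)   # weight 2^4

    s1, c1 = add4(q1, q2)        # adder 1: the two cross products
    s2, c2 = add4(s1, q0 >> 2)   # adder 2: plus the upper two bits of q0
    carry = c1 | c2              # never both 1: the true sum is at most 20
    s3, _ = add4(q3, (carry << 2) | (s2 >> 2))   # adder 3: upper product bits

    # P1..P0 come from q0, P3..P2 from adder 2, P7..P4 from adder 3.
    return (s3 << 4) | ((s2 & 0x3) << 2) | (q0 & 0x3)


assert vedic4x4(15, 15) == 225
```

If only the MSBs of the product are needed, as noted above for FIR implementations, they can be taken from the upper bits of the returned value.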

QCA 4-bit parallel adder
The 4-bit adder performs the addition function of the FIR filter. The half adder and the full adder are used to construct the 4-bit binary adder. The proposed half adder is composed of three majority gates and one inverter gate. Figure 9 shows the block diagram and the QCA layout of the proposed half adder. It consists of 232 cells covering an area of 0.76 μm². It needs 16 clock phases to generate the sum and carry outputs. In addition, the proposed full adder consists of three majority gates and two inverters. Figure 10 depicts the block diagram and the QCA layout of the proposed full adder. The proposed QCA full adder requires 349 cells and an area of 0.76 μm². It requires 16 clock phases. The layout of the 4-bit parallel adder is depicted in Figure 11. It is designed by cascading one half adder and three 1-bit full adders. In this way, the carry out (Cout) of each stage is transmitted to the carry in (Cin) of the next higher-order bit. The final result is a 4-bit sum plus a carry out (Cout 4). This design uses 2735 cells and occupies a circuit area of 11.46 μm². The circuit has a critical path length of 61 clock zones, which is marked by a blue dashed line.
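The gate counts quoted above can be reproduced at the logic level with the following Python sketch. The gate arrangements shown are well-known majority-logic forms that match the stated counts (three majority gates and one inverter for the half adder, three majority gates and two inverters for the full adder); whether they are identical to the layouts in Figures 9 and 10 is an assumption. All function names are hypothetical, and cell counts, area and clock zones are not modeled.

```python
def maj(a, b, c):
    """Three-input majority gate: M(a, b, c) = ab + bc + ac."""
    return (a & b) | (b & c) | (a & c)


def inv(a):
    """Inverter on a single bit."""
    return a ^ 1


def half_adder(a, b):
    """Three majority gates and one inverter."""
    carry = maj(a, b, 0)                       # AND
    total = maj(maj(a, b, 1), inv(carry), 0)   # (a OR b) AND NOT(a AND b) = XOR
    return total, carry


def full_adder(a, b, cin):
    """Three majority gates and two inverters."""
    cout = maj(a, b, cin)
    total = maj(inv(cout), cin, maj(a, b, inv(cin)))
    return total, cout


def to_bits(value):
    """4-bit value as a list of bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(4)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def add4_ripple(a_bits, b_bits):
    """One half adder for bit 0, then three full adders; the carry ripples upward."""
    s0, carry = half_adder(a_bits[0], b_bits[0])
    sums = [s0]
    for a, b in zip(a_bits[1:4], b_bits[1:4]):
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry


sums, cout = add4_ripple(to_bits(9), to_bits(8))
assert from_bits(sums) + (cout << 4) == 17
```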

QCA 2 × 2 Vedic multiplier
The block diagram of the 2 × 2 bit Vedic multiplier is shown in Figure 12. Firstly, B0 is multiplied with A0; the generated partial product is taken as the LSB of the final product. Secondly, B0 is multiplied with A1, and B1 is multiplied with A0. To add the generated partial products (B0*A1 + A0*B1), a QCA half adder is required, which generates a 2-bit result (Carry and S1), in which S1 is taken as the second bit of the final product and Carry is saved as the pre-carry for the next step. Finally, B1 is multiplied with A1, and this product is added to the pre-carry by the second half adder to give the two upper bits of the 2 × 2 Vedic multiplier product. Here, four majority gates and two half adder circuits are used, and the output is four bits (s0, s1, s2 and s3). The proposed 2 × 2 multiplier takes only 1683 QCA cells with an area of 8.42 μm². The simulation result of the proposed Vedic multiplier confirms that the expected operation is correctly achieved with a delay of 60 clock zones, as depicted in Figure 13.

Table 1. Bistable approximation parameter model.
Parameter                        Value
Clock amplitude factor           2,000,000
Layer separation                 11,500,000
Maximum iterations per sample    100
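The three steps above can be sketched at the gate level in Python. The four AND operations are written as majority gates with one input fixed to 0, and the internal structure of the half adder is an assumed majority-logic form, not a detail taken from Figure 12. The names `and_gate`, `half_adder` and `vedic2x2` are hypothetical.

```python
def maj(a, b, c):
    """Three-input majority gate."""
    return (a & b) | (b & c) | (a & c)


def and_gate(a, b):
    """AND realized as a majority gate with one input fixed to 0."""
    return maj(a, b, 0)


def half_adder(a, b):
    """Half adder in majority logic (assumed form): returns (sum, carry)."""
    carry = and_gate(a, b)
    total = maj(maj(a, b, 1), carry ^ 1, 0)   # (a OR b) AND NOT carry
    return total, carry


def vedic2x2(a1, a0, b1, b0):
    """2-bit x 2-bit Vedic multiplication; returns the bits (s3, s2, s1, s0)."""
    s0 = and_gate(a0, b0)                                            # step 1: LSB
    s1, pre_carry = half_adder(and_gate(a1, b0), and_gate(a0, b1))   # step 2
    s2, s3 = half_adder(and_gate(a1, b1), pre_carry)                 # step 3
    return s3, s2, s1, s0


assert vedic2x2(1, 1, 1, 1) == (1, 0, 0, 1)   # 3 * 3 = 9 = 0b1001
```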

QCA adder
Since the FIR output can be obtained using only the MSBs of the Vedic multiplier output, the proposed FIR filter structure needs a 4-bit QCA adder. The same 4-bit adder designed above is used in this subsection (Table 1).

Results and discussions
The complete QCA FIR design is implemented using the functional units discussed in the previous section. The proposed hardware designs are implemented and simulated using the QCADesigner 2.0.3 tool [45]. The coherence vector simulation engine is used for this purpose. Table 1 lists the simulation parameters. In the first step, the sub-module schematics and layouts were completed and verified by functional simulations.
These designs have been implemented using the regular and flexible USE clocking scheme. In addition, we have successfully demonstrated that the sub-module designs of the FIR unit satisfy all logic and timing constraints when using the 4 × 4 USE grid with square tiles of 5 × 5 QCA cells. With a well-defined methodology and regular timing zones, this design is therefore a good candidate for fabrication. We note that the proposed complete system requires a very large number of QCA cells, mostly due to the long wires needed for delay compensation. Since FIR circuit design in QCA technology is still at an early stage, we have only compared the full adder module with existing circuits based on regular clocking schemes. Table 2 shows a comparison of the proposed full adder with some existing designs [35,46]. The proposed full adder shows improvements of 1.13%, 56.9% and 11% in terms of cell count, occupied area and circuit latency, respectively, compared to the design reported in [35].
In QCA technology, the power consumption of any circuit depends on the number of majority and inverter gates [47]. Therefore, this technology consumes less power than CMOS technology. The power consumption of the FIR unit in 18 nm QCA technology is 1.6 mW. This value was obtained using the QCADesigner-E software [48].
Moreover, the QCA FIR circuit consumes 97.74% less power than the previously existing designs [49]. In addition, the proposed FIR filter design can operate at a higher frequency (above 1 GHz) than the conventional solution, and it can be useful for future digital signal processing applications that need high processing speed. The proposed QCA design is therefore superior to the existing techniques in terms of power consumption. We think that this work forms an essential step in the building of QCA circuits for low-power design in this area.
The influence of temperature variations on the polarization of the proposed design has also been investigated. Figure 14 illustrates the effect of temperature variations on the output polarization of the FIR circuit. The QCADesigner tool is used to observe this effect. As the temperature increases, the average output polarization (AOP) of any output cell of the QCA circuit decreases. The FIR circuit works correctly between 1 K and 7 K. Above 7 K, the output polarization drops sharply and the circuit produces incorrect outputs.

Conclusion
The design of low-power, high-speed FIR filters is always a challenge for DSP applications. In this article, a novel design of an FIR filter architecture in QCA technology has been presented. The functionality of the proposed circuits has been verified with the QCADesigner version 2.0.3 software. The proposed QCA FIR filter achieves a frequency of up to 1 GHz and consumes 1.6 mW of power. From the comparison between previous designs and the proposed design, it can be concluded that the proposed design has appropriate features and performance. This work can therefore provide better silicon area utilization, a higher clock speed and much lower power consumption than traditional VLSI technology. It should be an important step towards high-performance and low-power design in this field. Future extensions, such as various applications based on this QCA FIR unit, could be investigated.

Figure 14. Effect of polarization on output of FIR filter due to temperature.