Towards Optimised FPGA Realisation of Microprogrammed Control Unit Based FIR Filters

Finite impulse response (FIR) filter is one of the most common type of digital filter used in digital signal processing (DSP) applications. An FIR filter is usually realised in hardware using multipliers, adders and registers. Field programmable gate arrays (FPGAs) have been widely explored for the hardware realisation of FIR filters using different algorithms and techniques. One such technique that has recently gained considerable attention is the use of microprogrammed control unit (MPCU) in designing FIR filters. In this chapter, we further explore MPCU technique for optimised hardware realisation of digital FIR filter. To evaluate the performance, two different architectures of FIR filter are designed using Wallace tree multiplier. Both the architectures are coded in Verilog hardware description language (HDL). The performance is analysed by evaluating the resource utilisation and timing reports of Virtex-5 FPGA generated by the Synopsys Synplify Pro tool. Based on the implementation results, as compared to conventional design, Wallace tree multiplier using carry skip adder (CSKA) provides optimal digital FIR filter.


Introduction
Digital filters play an important role in many digital signal processing (DSP) applications. These applications range from noise reduction, spectral shaping, equalisation, signal detection and signal analysis, etc. The basic building blocks of digital filter are adder, multiplier and register based delay elements. Based on the application requirement, these blocks are connected to realise a particular architecture of filter. There are several ways to realise digital filters. Two such filters used in different applications are finite impulse response (FIR) and infinite impulse response (IIR) filters. FIR filters are widely preferred for DSP applications because they are always stable, exhibit linear phase properties and provide no feedback. Convolution, the core operation of FIR filter, performed on a window of N data samples involves multiplication and addition. For optimal realisation of FIR filter, these arithmetic operation needs to be optimised.
Direct form is the most commonly used FIR filter. As can be seen from Figure 1, N-tap or (N-1) th order FIR filter consist of N multipliers, N-1 adders and N shift registers. The tap coefficients, {W 0 , W 1 , W 2 ,……,W N-1 } constitute the filter impulse response. The filter type (low pass, high pass or band pass) is determined by these coefficients.
Different techniques for the field programmable gate array (FPGA) realisation of FIR filter using microprogrammed control unit (MPCU) have been reported in the literature [1][2][3]. Multipliers and adders play a dominant role in the optimal realisation of FIR filters [4,5]. The objective of this chapter is to further explore this technique using Wallace tree multiplier with different adder configurations for optimal realisation of FIR filter [6]. The proposed design is modular and scalable which enables realisation of higher-order FIR filter.
The rest of the chapter is organised as follows. Section 2 presents two different designs of MPCU-based FIR filters. Section 3 describes the design of Wallace tree multiplier using two different adder configuration. Section 4 presents the FPGA implementation results and its analysis. Finally, Section 5 concludes the chapter.

MPCU based FIR filter architectures
The FIR filter top-level module as shown in Figure 2 consists of a datapath unit and a control unit. The control unit is realised using the microprogrammed approach. The MPCU consist of two main parts, the first part addresses the microinstructions stored in the control memory while the second part holds and generates microinstruction for the datapath unit [1][2][3].   The first architecture of N-tap FIR filter as shown in Figure 3 comprises of a control and datapath units. The control signals generated by MPCU are fed to the datapath unit. For demonstration, the sequence of operation for a third-order FIR filter is listed in Table 1 [1]. Control Theory in Engineering 4 The datapath consist of the following modules: For (N-1) th order FIR filter, the datapath unit of second architecture uses N multipliers and N-1 adders. In addition to the multiplier and adder, the datapath also need the following modules for proper functioning of FIR filter as illustrated in Figure 4. For illustration, the sequence of operation for third-order FIR filter in this case is listed in Table 2 [2].

Wallace tree multiplier
To overcome the drawbacks associated with conventional array multiplier, tree multiplier is considered. Wallace tree is one such implementation of adder tree that results in high speed. A conventional Wallace tree multiplier uses half and full adders to multiply two numbers in three steps as shown in Figure 5 [4]. First step is to multiply each bit of n-bit multiplicand with every bit of n-bit multiplier to yield n 2 results. Each bit carry different weights based on the position of the generated bits. The second step involves reduction of partial products using full and half adders. This process continues until two layer of partial products remain. In the last step, the remaining two layers of partial product are added using conventional adder [5].
In this chapter, two different variants of Wallace tree multiplier are realised. First variant uses conventional full and half adders, while a carry skip adder (CSKA) is used in the second variant.
Carry look ahead adder (CLA) provides high-speed computation but at the cost of high power and high area. To overcome the drawbacks of CLA, CSKA is used Control Theory in Engineering 8 which provides a balanced implementation [5]. A CSKA comprises of a basic ripple carry adder with a distinctive speed-up carry chain referred to as a skip chain. As shown in Figure 6, a skip chain comprises of AND gate and 2:1 MUX.

Results and analysis
All the top-level modules and sub-modules described in this chapter are coded in Verilog HDL using top-down hierarchical design methodology. The proposed designs are synthesised and implemented in Virtex-5 (xc5vlx50t-1ff1136) FPGA device using Synplify pro electronic design automation (EDA) tool [7]. The results are evaluated based on the slice look-up tables (LUTs), minimum period and maximum clock frequency of the target FPGA. Tables 3 and 4 summarise the implementation results for both the FIR filter architectures.
It can be inferred that, generally, the Wallace tree multiplier using conventional full and half adder consumes less FPGA slice LUT (area) but at the cost of higher minimum period (delay). In the first architecture, Wallace tree multiplier using CSKA  has the lowest minimum period. It is therefore concluded that the first FIR filter architecture using Wallace tree multiplier with CSKA provides optimal result. It is also observed that more FPGA resources are utilised as we increase the order of the filter.

Conclusion
In this chapter, we further explored the design of MPCU-based digital FIR filters. MPCU is a promising technique that could be utilised for optimal realisation of digital filters used in DSP systems. The overall performance of the FIR filter depends on the multiplier and adder used in the multiply-accumulate unit. Two different architectures of FIR filter were designed using Wallace tree multiplier employing two variants of adder, one using conventional full/half adders and the other using CSKA. All the designs were realised in Xilinx Virtex-5 FPGA using Synplify pro EDA tool. Based on the reports generated by the EDA tool, it is concluded that the design of first FIR filter using the Wallace tree multiplier with CSKA provides optimal result in comparison to the one using conventional full and half adders.