



## Radio-Frequency (RF) Beamforming Using Systolic FPGA-based Two Dimensional (2D) IIR Space-time Filters

Arjuna Madanayake, *Member, IEEE*, and Leonard T. Bruton, *Fellow, IEEE*

*Multidimensional Signal Processing Group, University of Calgary, Calgary, Alberta, Canada (hmadanay, bruton@ucalgary.ca)*

#### 1. Introduction

Plane-waves are far-field solutions to (1) the vector wave equation, for the case of electromagnetic waves, (2) the scalar wave equation, for the case of longitudinal pressure waves in seismic, acoustic, and ultrasonic systems, as well as to (3) linear surface waves, such as those created by dropping a pebble into the still waters of a pond. Far-field beamforming refers to the highly-selective directional enhancement of propagating spatio-temporal plane-waves based on their directions-of-arrival (DOA).

The directional enhancement (beamforming) of electromagnetic plane-waves is of importance in many areas of electrical engineering, such as wireless communications, radar and radio-frequency (RF) imaging (multi-GHz range). Of particular importance are applications in radio-astronomy and space physics (Van Ardenne, 2000), where far-field beamforming is increasingly employed in aperture arrays, and in wireless mobile voice and data communication systems. In the case of data communications, beamforming is used for mitigating the fading effects of multipath propagation (Litva and Lo, 1996; Liberti Jr. and Rappaport, 1999; Huseyin Arslan, Zhi-Ning Chen and Maria-Gabriella Di Benedetto, 2006). Beamforming is also used in satellite-borne remote sensing applications involving synthetic aperture and Doppler radar, in navigation and location devices based on GPS (Silva, Worrel and Brown), as well as in various ultra-wideband location technologies (Ghavami, Michael and Kohno).

Traditionally, receiver-side beamforming has been achieved using highly-directional receiving antennas, typically employing parabolic reflectors and horns, antenna array configurations with passive phasing networks (such as delay-and-sum networks and phased-array feeds) and reflect-arrays (Hum, Okoniewski and Davies). Digital beamforming algorithms are often based on fractional-delay steering algorithms and/or finite impulse response (FIR) digital filters (Ghavami, Michael et al.; Johnson and Dudgeon, 1993; Liberti Jr. and Rappaport, 1999; Staderini, 2002; Huseyin Arslan, Zhi-Ning Chen et al., 2006; J. Roderick, H. Krishnaswamy, K. Newton et al., 2006). The use of digital signal processing (DSP) in far-field broadband beamforming for smart antenna array applications is currently receiving much attention, mainly due to the continuously increasing availability of digital programmable logic and custom silicon fabrication technologies that are gradually enabling the typically high levels of real-time computational throughput necessitated by such DSP-based broadband smart antenna arrays.

In this contribution, we describe a particular type of recently proposed far-field beamformer that is based on two-dimensional (2D) space-time digital filters having infinite impulse responses (IIRs) (Ramamoorthy and Bruton; Agathoklis and Bruton,1983; Bruton and Bartley,1985). Unlike the more widely-used DSP-based 2D FIR beamformers, the described 2D IIR beamformers have 2D **z**-domain transfer-functions  $H(z_1, z_2) \equiv N(z_1, z_2) / D(z_1, z_2)$ having pole-manifolds, as well as zero-manifolds, in the 2D complex plane  $\mathbb{C}^2$ . Further, for beamforming applications,  $D(z_1, z_2)$  must be non-separable, implying non-trivial design challenges to avoid multidimensional instability and computability constraints (such challenges are not encountered in 1D design or if  $D(z_1, z_2)$  is separable in the 2D case). However, suitable closed-form algebraic expressions for beamforming discrete-domain 2D IIR transfer functions are available in the literature that are computable, stable and realizable using hardware with lower complexity than FIR beamformers having similar directional selectivity, angular half-power bandwidth, etc. Furthermore, the existence of such closed-form algebraic transfer functions facilitates the real-time continuous steering of the direction of the beam and adjustment of its bandwidth, making these filters attractive for applications in emerging software-defined radio (SDR), microwave imaging and cognitive radio systems.

This paper is organized as follows: in section 2, we provide a brief review of space-time plane-waves and their properties in the 2D space-time frequency domain, followed by section 3, where we provide a comprehensive review of broadband plane-wave digital filter design. Thereafter, in section 4, we discuss practical hardware implementations starting from 2D difference equations that lead to signal flow graphs and massively-parallel systolic-array hardware realizations. Section 5 describes some recent progress we have achieved in prototyping these new systolic-array circuits using field programmable gate array (FPGA) technology. Finally, in section 6 we describe recently available technology and electronic design automation (EDA) tools that may eventually lead to 2D IIR beamformers that operate in real-time for various broadband microwave beamforming applications.

#### 2. Electromagnetic Plane-waves in Space-time

We consider here either the transverse electric field  $E_y(x,y,z,ct)$  or magnetic field  $H_x(x,y,z,ct)$  of a propagating electromagnetic plane wave, where  $(x,y,z,ct) \in \mathbb{R}^4$  is the 4D space-time continuous-domain,  $(x,y,z) \in \mathbb{R}^3$  is 3D space,  $t \in \mathbb{R}$  is time and  $c \approx 3 \times 10^8 \ ms^{-1}$  is the speed of light in air/vacuum. By analogy with 3D planes, the equation

$$\alpha_1 x + \alpha_2 y + \alpha_3 z + ct = \pm \lambda, \ \alpha_{1,2,3} \in \mathbb{R}, \text{ and, } \sqrt{\sum_{k=1}^3 \alpha_k^2} \equiv 1 \tag{1}$$

is a 4D hyper-plane in the 4D continuous-domain  $(x, y, z, ct) \in \mathbb{R}^4$ . Propagating electromagnetic plane waves are 4D hyper-plane waves in (x, y, z, ct) given by

$$w(x, y, z, ct) = w_{PW} \left( \underbrace{\alpha_1 x + \alpha_2 y + \alpha_3 z + ct}_{\pm \lambda} \right), \qquad \sqrt{\sum_{k=1}^{3} \alpha_k^2} \equiv 1 \quad \forall \lambda \in \mathbb{R}^1 \tag{2}$$

and therefore have the property that they are constant-valued in each of the hyper-planes (1): that is, for each  $\lambda \in \mathbb{R}^1$ . Equivalently, for each value of  $\lambda$ ,  $w_{PW}(\pm \lambda)$  is a corresponding 4D iso-surface in (x,y,z,ct). In Fig. 1, we show the 4D plane wave  $w_{PW}\left(\alpha_1x+\alpha_2y+\alpha_3z+ct\right)$  in the 3D spatial domain  $(x,y,z)\in\mathbb{R}^3$  in an iso-plane which, by simple 3D geometry, is at perpendicular distance ct from the origin, as shown. In 3D, we may therefore visualize the 4D space-time plane wave of equation (2) as an infinite set of such iso-planes  $w(\pm \lambda)$ , each of which is *propagating in* (x,y,z) *over time t* with speed c in a direction normal to the iso-planes. Depending on the 1D spectral properties of the c-scaled temporal signal  $w_{PW}(ct)$ , the plane wave might be temporally-narrowband or temporally-broadband.

Note that, for the case of the *ideal* plane wave, the region of support (ROS) of equation (2) in  $(x, y, z, ct) \in \mathbb{R}^4$  extends, in general, to infinity in at least some directions in  $\mathbb{R}^4$ . Equation (2) represents either the electric or magnetic field of the plane wave in 4D space-time. In this chapter, we are only concerned with the values of the 4D plane wave signal as received on a straight line in (x, y, z). Therefore, we consider only the special case of the 2D space-time representation for which equation (2) reduces to the form  $w(x,0,0,ct) = w_{PW}\left(\underbrace{\alpha_1 x + ct}_{\lambda}\right)$  for signals on the x-axis. With the DOA in 3D space defined by the angles  $\theta_o$  (measured in the x-z plane) and  $\psi_o$ , as shown in Fig. 1, it is easily shown that (2) may be written in the form

$$w(x, y, z, ct) = w_{PW} \left( \underbrace{\left(-\sin\theta_o \cos\psi_o\right) x + \left(\cos\theta_o \cos\psi_o\right) z + \left(\sin\psi_o\right) y + ct}_{\lambda} \right) \tag{3}$$

with  $-\pi/2 < \theta_o, \psi_o \le \pi/2$ , from which it follows that the corresponding 2D space-time plane wave signal *received on the x-axis* is given by

$$w(x,ct) = w_{PW} \left( \underbrace{\left(-\sin\theta_o\cos\psi_o\right)x + ct}_{\lambda} \right) \tag{4}$$

The 4D hyper-planar iso-*surfaces* of constant  $\lambda$  in (3) become 2D iso-lines of (4) in  $(x,ct) \in \mathbb{R}^2$  given by

$$\left(-\sin\theta_o\cos\psi_o\right)x + ct = \lambda \tag{5}$$

As shown in Fig. 2, the space-time direction of the 2D space-time plane wave is defined by the normal to these contours and is given by (Gunaratne and Bruton; Khademi)

$$\theta = \tan^{-1}(\sin\theta_o\cos\psi_o) \tag{6}$$

with respect to the ct-axis in the 2D Cartesian space-time domain (x,ct), where  $\sin\phi = \sin\theta_o \cos\psi_o$ . Note from (6) that the 2D spatio-temporal direction  $\theta = \tan^{-1}(\sin\phi)$  is constrained by  $-\pi/4 \le \theta < \pi/4$ , where the extreme values  $\pm\pi/4$  occur where  $\theta_o = \pm\pi/2$  and  $\psi_o = 0$ . These extreme directions are known as the 'end-fire' angles, corresponding to plane waves having DOAs in 3D space that are in the direction of the x-axis. The so-called 'broadside' DOAs, with respect to the x-axis, are those directions for which the 4D space-time signal in (3) is constant everywhere on the x-axis at all instants of time t, which corresponds to the 2D space-time direction  $\theta = 0$  and is equivalent to the DOAs in 3D space given by  $(\theta_o = 0, \pi)$  and/or  $\psi_o = \pm\pi/2$ .
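As a quick numerical check of (6), the following minimal Python sketch maps a 3D DOA  $(\theta_o, \psi_o)$  to the 2D space-time direction  $\theta$ . The function name `space_time_angle` and the sample angles are illustrative assumptions, not part of the original formulation.

```python
import math


def space_time_angle(theta_o: float, psi_o: float) -> float:
    """Return the 2D space-time direction theta of equation (6), in radians."""
    # sin(phi) is the apparent spatial DOA seen by a linear array on the x-axis.
    sin_phi = math.sin(theta_o) * math.cos(psi_o)
    return math.atan(sin_phi)


# Illustrative DOAs (hypothetical values chosen for this sketch).
broadside = space_time_angle(0.0, 0.0)
end_fire = space_time_angle(math.pi / 2, 0.0)
oblique = space_time_angle(math.radians(30.0), math.radians(20.0))

for name, angle in (("broadside", broadside), ("end-fire", end_fire), ("oblique", oblique)):
    print(f"{name}: theta = {math.degrees(angle):.2f} degrees")
```

Because  $|\sin\phi| \le 1$ , the returned angle never leaves the interval bounded by  $\pm\pi/4$ , which is the fan-shaped constraint discussed above.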



Fig. 1. The direction of arrival (DOA) of a plane-wave in 3D space,  $(\theta_o, \psi_o)$ , and the apparent spatial DOA seen by a linear array along the x-axis,  $\phi$ .



Fig. 2. Propagating plane-wave in 3D space (a), 2D spatial view on the y = 0 plane (b), 2D spatio-temporal DOA (c), and region of support (ROS) on 2D frequency-domain, aligned along the spatio-temporal DOA (d).

### 2.3 On The Region of Support (ROS) of the 2D Fourier Transform of 2D space-time Plane Waves

Given the 2D Fourier transform pair for 2D space-time plane waves  $w_{PW}\left(-x\sin\phi+ct\right) \Leftrightarrow W_{PW}\left(e^{j\omega_x},e^{j\omega_{ct}}\right)$ , it may be shown (Bruton and Bartley, 1985) that the Region of Support (ROS) of the spectrum  $\left|W_{PW}\left(e^{j\omega_x},e^{j\omega_{ct}}\right)\right|$  in the 2D Cartesian frequency domain  $(\omega_x,\omega_{ct})$  is confined to the straight line

$$\omega_{x} + \omega_{ct} \sin \phi = 0 \tag{7}$$

which passes through the origin and, because  $\tan\theta = \sin\phi$  from (6), subtends angle  $\theta$  to the  $\omega_{ct}$  axis. It lies on the  $\omega_{ct}$  axis for broadside DOAs and on  $\omega_{ct} = \pm \omega_x$  for end-fire DOAs. Importantly therefore, the ROS of all 2D space-time electromagnetic plane wave signals, propagating at speed c, cannot lie outside the 90-degree wide 2D fan-shaped region  $|\omega_x| \leq |\omega_{ct}|$  in  $(\omega_x, \omega_{ct})$ .
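The line-shaped ROS can be motivated by a short continuous-domain calculation. The sketch below assumes that the 1D profile  $w_{PW}(\tau)$  has a Fourier transform  $F(\omega_{ct})$ ; the symbols  $F$  and  $\tau \equiv ct$  are introduced only for this illustration, and  $\sin\phi = \tan\theta$  as in (6).

$$
\begin{aligned}
W(\omega_x,\omega_{ct}) &= \iint w_{PW}\left(-x\sin\phi + \tau\right) e^{-j\left(\omega_x x + \omega_{ct}\tau\right)}\, d\tau\, dx \\
&= F(\omega_{ct}) \int e^{-j\left(\omega_x + \omega_{ct}\sin\phi\right)x}\, dx \\
&= 2\pi\, F(\omega_{ct})\, \delta\left(\omega_x + \omega_{ct}\sin\phi\right)
\end{aligned}
$$

The Dirac delta confines the spectrum to the line  $\omega_x + \omega_{ct}\sin\phi = 0$ , and the temporal bandwidth of the plane wave only determines how far along that line the spectrum extends.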

### 2.4 Spectral-Filtering a Desired 2D Space-time Plane-wave in the Presence of other Plane Waves and Noise

Typically, the signal received on the x-axis may be represented by M broadband plane-waves, each having a different orientation  $\theta_{o,k}, \psi_{o,k}$ ,  $k=0,1,2,\ldots,M-1$ , in 2D space-time, and additive 2D noise. We assume the first plane wave, given by k=0, is the desired plane wave to be recovered by filtering. Then the received signal may be written in the form

$$w(x,ct) = \sum_{k=0}^{M-1} w_{PW,k}(-x\sin\phi_k + ct) + n_v(x,ct) \tag{8}$$

where  $\sin \phi_k = \sin \theta_{o,k} \cos \psi_{o,k}$ , and where  $n_v(x,ct)$  represents 2D space-time noise. The Fourier transform of (8) is therefore given by

$$W(e^{j\omega_x}, e^{j\omega_{ct}}) = \sum_{k=0}^{M-1} W_{PW,k}(e^{j\omega_x}, e^{j\omega_{ct}}) + N_{v}(e^{j\omega_x}, e^{j\omega_{ct}}) \tag{9}$$

where  $N_v(\omega_x,\omega_{ct}) \stackrel{2D}{\Leftrightarrow} n_v(x,ct)$ . Typically,  $n_v(x,ct)$  corresponds to non-plane-wave electromagnetic propagating interference or other sources of 2D broadband noise, modelled as additive white Gaussian noise (AWGN). Therefore, the 2D ROS of the noise spectrum  $|N_v(\omega_x,\omega_{ct})|$  is typically uniform throughout the 2D frequency-domain  $(\omega_x,\omega_{ct}) \in \mathbb{R}^2$ . The ROS of  $W(e^{j\omega_x},e^{j\omega_{ct}})$  therefore consists of the uniform ROS of  $|N_v(\omega_x,\omega_{ct})|$  and M lines through the origin, where the orientation of each line is given by the M different angles  $\theta_k = \tan^{-1}(\sin\phi_k)$ .

For notational convenience in the rest of the chapter, we will use  $\omega_1 \equiv \omega_x$  as the spatial frequency variable, and  $\omega_2 \equiv \omega_{ct}$  as the time-frequency variable corresponding to spacetime ct.

#### 2.5 Beamforming of a Broadband Plane Wave using Space-time Filters

The simplest 2D space-time plane-wave filter is known as a 'frequency-planar beam' filter (Bruton and Bartley, 1985), because its passband lies on a line through the origin and ideally has a 'beam' shaped 2D passband of uniform width. The beam-shaped 2D passband is oriented to enclose the ROS of the 2D spectrum of the desired plane-wave over its full temporal bandwidth while attenuating all spectra away from this narrow passband. Such beam filters can be realized using several methods involving either FIR or IIR digital filters (Gunaratne and Bruton; Khademi; Bruton and Bartley, 1985). FIR filters are inherently stable but are of high arithmetic complexity because a given selectivity requires a higher filter order than an approximately-equivalent IIR filter. However, the latter are not as straightforward to design and to implement.

The focus of this chapter is on the design and real-time hardware implementation of a first-order 2D IIR beam digital plane-wave filter.

#### 2.6 Effects of 2D Space-time Sampling using a Uniform Linear Array (ULA)

The 2D continuous-domain space-time signal w(x,ct) is sampled in space using a number of equi-spaced antennas along the x-axis and sampled in time to yield the corresponding 2D sampled space-time signal  $w(n_1\Delta x, n_2c\Delta T_{CLK})$ ,  $n_{1,2}=0,1,2,3,\ldots$ . In order to prevent undesirable spatial aliasing, such uniform linear arrays (ULAs) of antennas require that the uniform distance between antennas satisfy the Nyquist condition (the distance between antennas is  $\Delta x \le c \Delta T$ , where  $f_U \equiv 1/\Delta T$  is the upper temporal frequency of the input signal beyond which its spectrum lacks support). For N antennas, the 2D spatially-sampled continuous-time antenna array signals are given by  $w(n_1\Delta x, ct), n_1 = 0, 1, ..., N-1$ . The 2D spectrum of  $w(n_1\Delta x, ct)$  replicates on the spatial-frequency  $\omega_1$  axis with periodicity  $2\pi$ . The N continuous-time signals  $w(n_1\Delta x, ct)$  are amplified, using a low noise amplifier having the required temporal bandwidth and noise performance, then low-pass filtered prior to A/D conversion, resulting in the 2D discrete-domain antenna signal  $w(n_1\Delta x, n_2c\Delta T_{CLK})$ . The ADC clock frequency  $F_{CLK} = 1/\Delta T_{CLK}$  is chosen by selecting the inter-sample time  $\Delta T_{CLK} \leq \Delta T$  such that the Nyquist sampling theorem is satisfied in both the spatial and temporal frequency domains. Usually,  $\Delta T_{CLK} = \Delta T / K_U$ ,  $K_U > 1$ , where  $K_U$  is the so-called temporal oversampling factor, which is sometimes necessary for minimizing the frequency-warping effects that are introduced in the design of the digital filter transfer function.
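The sampling constraints above can be turned into a small calculator. The following is a minimal sketch that takes the text's definitions literally ( $\Delta T = 1/f_U$  and  $\Delta x \le c\Delta T$ ) and assumes that oversampling shortens the sample period, so that  $\Delta T_{CLK} = \Delta T / K_U \le \Delta T$ . The function name `ula_sampling` and the 2 GHz example are hypothetical.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s (exact SI value)


def ula_sampling(f_upper_hz: float, k_u: float = 1.0):
    """Return (dx_max_m, dt_clk_s, f_clk_hz) for a sampled ULA.

    Assumptions of this sketch: dT = 1 / f_U as in the text, and the
    oversampled clock period is dT_CLK = dT / K_U with K_U >= 1.
    """
    if f_upper_hz <= 0.0:
        raise ValueError("f_upper_hz must be positive")
    if k_u < 1.0:
        raise ValueError("k_u must be at least 1")
    d_t = 1.0 / f_upper_hz          # sample period dT set by the upper frequency
    dx_max = SPEED_OF_LIGHT * d_t   # largest spacing allowed by dx <= c * dT
    dt_clk = d_t / k_u              # ADC sample period after oversampling
    return dx_max, dt_clk, 1.0 / dt_clk


# Hypothetical example: f_U = 2 GHz with a temporal oversampling factor of 2.
dx_max, dt_clk, f_clk = ula_sampling(2.0e9, k_u=2.0)
print(f"dx <= {dx_max * 1e3:.1f} mm, dT_CLK = {dt_clk * 1e12:.0f} ps, "
      f"F_CLK = {f_clk / 1e9:.1f} GHz")
```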

Methods have been proposed and implemented for significantly reducing the required number of antennas without significantly reducing performance, for wireless communications and other applications. These methods also lead to much reduced arithmetic complexity of the filter and are based on allowing a controlled amount of multidimensional spatial aliasing and thereby spatial under-sampling, as reported in (Khademi and Bruton; Madanayake, Hum and Bruton).

#### 3. First-order 2D IIR Frequency-Beam Plane-wave Filter

In the above, it is established that the directional enhancement of an ideal desired space-time plane-wave may be achieved using a 2D space-time filter having a 2D passband that encompasses the line-shaped ROS of the desired plane-wave signal. Further, undesired signals, such as other plane-waves and/or noise, may be attenuated by ensuring that the ROS of the 2D stopband corresponds to the ROS of the spectrum of the undesired signals. Although this approach has been extended to 3D and 4D space-time signals (Kuenzle and Bruton; Bolle, 1994; Bruton, 2003; Dansereau, 2003; Kuenzle and Bruton, 2005; Dansereau and Bruton, 2007), here we focus on the simplest 2D case by describing the design and implementation of a suitable 2D filter.

#### 3.1 The Prototype Resistively-terminated 2D Passive Frequency-Beam Network

Consider a 2D first-order continuous-domain inductance-resistance network (Bruton and Bartley,1985), where  $s_1$  and  $s_2$  are spatial and temporal Laplace variables (Dudgeon and Mersereau,1990; Johnson and Dudgeon,1993; Schroeder and Blume,2000), respectively. The input-output 2D Laplace voltage transfer function of this network is given by

$$T(s_1, s_2) = \frac{R}{R + L_1 s_1 + L_2 s_2} \equiv \frac{Y(s_1, s_2)}{W(s_1, s_2)} \tag{10}$$

where the parameters  $L_1 \geq 0, L_2 \geq 0$ , and R > 0 correspond to a passive *spatial* inductor, passive *temporal* inductor and passive resistance, respectively, with transform inputs and outputs  $W(s_1,s_2)$  and  $Y(s_1,s_2)$ , respectively. We denote the respective transform pairs by  $w(x,ct) \Leftrightarrow W(s_1,s_2)$  and  $y(x,ct) \Leftrightarrow Y(s_1,s_2)$ , respectively. The steady-state input-output frequency-response of (10) is found by setting  $s_1 = j\omega_1$  and  $s_2 = j\omega_2$ , leading to the 2D frequency response transfer function

$$T(j\omega_1, j\omega_2) = \frac{R}{R + j(L_1\omega_1 + L_2\omega_2)} = \frac{Y(j\omega_1, j\omega_2)}{W(j\omega_1, j\omega_2)} \tag{11}$$

From (11), the network under consideration is 2D resonant on the 2D line-shaped region

$$\omega_1 L_1 + \omega_2 L_2 = 0 \tag{12}$$

passing through the frequency-origin (Note: *In 2D, capacitors are not required to induce resonance*). At all finite frequencies where (12) is satisfied (i.e. throughout the 2D passband), network energy resonates between the two inductance elements and  $T(j\omega_1, j\omega_2)$  is unity. By choosing  $L_1 = \cos\theta$  and  $L_2 = \sin\theta$ ,  $0 \le \theta \le 90^\circ$ , we can orient the axis of the 2D passband to the angle  $\theta$ . A typical response is shown in Fig. 3. The shape of the 2D gain  $|T(j\omega_1, j\omega_2)|$  of the filter may be envisaged in 2D frequency space by noting that  $L_1\omega_1 + L_2\omega_2 = \gamma$  describes, for constant  $\gamma$ , a line that is parallel to the 2D passband and along which  $|T(j\omega_1, j\omega_2)|$  is constant and less than unity. Importantly,  $|T(j\omega_1, j\omega_2)|$  decreases monotonically with increasing values of  $|\gamma|$ , with the two -3dB lines having gain 0.707 given by  $\gamma = \pm R$ . We make the following summary observations (see (Bruton and Bartley, 1985) for details):

1. At 2D resonance, that is on the frequency-line in  $(\omega_1, \omega_2)$  where  $\gamma = 0$ , the transfer function  $T(j\omega_1, j\omega_2) = 1$ . This defines the 2D passband axis and unity gain on the centre of the beam-shaped passband.
2. Along all directions orthogonal to the passband axis in  $(\omega_1, \omega_2)$ , the magnitude and phase frequency response of the filter correspond to that of a first-order low-pass transfer function, monotonically-decreasing with distance from the passband axis.
3. The uniform -3dB bandwidth of the beam is given by  $\omega_{-3dB} = R / \sqrt{L_1^2 + L_2^2}$ .
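The three observations can be checked numerically from (11). The sketch below is a minimal illustration; the function name `beam_gain` and the parameter values (taken to match Fig. 3) are choices made here, not a prescribed implementation.

```python
import math


def beam_gain(w1: float, w2: float, theta_deg: float, R: float) -> float:
    """Magnitude |T(j*w1, j*w2)| of the prototype network in equation (11)."""
    L1 = math.cos(math.radians(theta_deg))
    L2 = math.sin(math.radians(theta_deg))
    gamma = L1 * w1 + L2 * w2  # zero on the passband axis, equation (12)
    return abs(R / complex(R, gamma))


THETA_DEG, R = 30.0, 0.1
L1 = math.cos(math.radians(THETA_DEG))
L2 = math.sin(math.radians(THETA_DEG))

# A point on the passband axis: any multiple of the direction (-L2, L1).
on_axis = beam_gain(-2.0 * L2, 2.0 * L1, THETA_DEG, R)

# A point displaced by distance R orthogonally to the axis, along (L1, L2).
# Since L1**2 + L2**2 = 1 here, this gives gamma = R, i.e. the -3 dB line.
edge = beam_gain(R * L1, R * L2, THETA_DEG, R)

# Moving farther from the axis lowers the gain monotonically.
far = beam_gain(5.0 * R * L1, 5.0 * R * L2, THETA_DEG, R)

print(on_axis, edge, far)
```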

#### 3.2 The Transfer-functions of the First-order Beam Filter in the 2D s- and z- Domains

Although the inverse 2D Laplace transform of equation (10) yields a continuous-domain partial differential equation for the input-output transfer-function, practical implementations have so far been in the discrete-domain of 2D finite-difference equations, implemented in the form of digital circuits. Transformation to the discrete-domain is achieved by applying the normalized 2D bilinear transform (2D BLT)  $s_k = \frac{z_k - 1}{z_k + 1}, k = 1, 2$ , to equation (10), which leads, after considerable algebraic manipulation (Bruton and Bartley, 1985), to the 2D **z**-transform transfer function

$$H(z_{1}, z_{2})|_{T\left(\frac{z_{1}-1}{z_{1}+1}, \frac{z_{2}-1}{z_{2}+1}\right)} = \frac{\left(1+z_{1}^{-1}\right)\left(1+z_{2}^{-1}\right)}{1+\left(b_{10}+b_{11}z_{2}^{-1}\right)z_{1}^{-1}+b_{01}z_{2}^{-1}} \equiv \frac{Y(z_{1}, z_{2})}{W(z_{1}, z_{2})} \tag{13}$$

where  $W(z_1,z_2) \stackrel{2D}{\Leftrightarrow} w(n_1 \Delta x, n_2 c \Delta T_{CLK})$  and  $Y(z_1,z_2) \stackrel{2D}{\Leftrightarrow} y(n_1 \Delta x, n_2 c \Delta T_{CLK})$ , respectively, and where  $b_{ij} = \left(R + (-1)^i L_1 + (-1)^j L_2\right) / \left(R + L_1 + L_2\right)$ . The above application of the 2D BLT, which is a conformal mapping between the 2D Laplace and 2D **z**-domains, results in a distortion of the high frequency part of the 2D passband, known as bilinear warping, that leads to a practical limitation of the upper frequency  $\left(0.5\pi < |\omega_2| < \pi\right)$  of the beam-shaped passband. The effects of this limitation may be avoided by suitable temporal and/or spatial oversampling of the input signal.

For example, here we shall employ a temporal over-sampling factor of 2, for which we show in Fig. 4 the corresponding 'weakly-warped' magnitude response of the discrete-domain frequency-response transfer-function over the useful range  $|\omega_2| \le 0.5\pi$ .
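The effect of the 2D BLT can be examined numerically. On the unit bi-circle the BLT maps  $s_k$  to  $j\tan(\omega_k/2)$ , so the discrete response should equal the prototype (11) evaluated at the warped frequencies. The sketch below checks this and also locates the warped passband axis. It multiplies (13) by the constant  $R/(R+L_1+L_2)$  so that the result is directly comparable with (10); that normalization, the function names and the test frequencies are assumptions of this illustration.

```python
import cmath
import math


def prototype_response(w1, w2, R, L1, L2):
    """Continuous-domain T(j*w1, j*w2) of equation (11)."""
    return R / complex(R, L1 * w1 + L2 * w2)


def direct_form_response(w1, w2, R, L1, L2):
    """Equation (13) on the unit bi-circle, scaled by R / (R + L1 + L2)."""
    S = R + L1 + L2
    b10 = (R - L1 + L2) / S
    b01 = (R + L1 - L2) / S
    b11 = (R - L1 - L2) / S
    z1_inv = cmath.exp(-1j * w1)
    z2_inv = cmath.exp(-1j * w2)
    num = (1 + z1_inv) * (1 + z2_inv)
    den = 1 + (b10 + b11 * z2_inv) * z1_inv + b01 * z2_inv
    return (R / S) * num / den


def warped_axis_w1(w2, theta):
    """Spatial frequency of the discrete passband axis at temporal frequency w2."""
    # Solves L1*tan(w1/2) + L2*tan(w2/2) = 0 with L1 = cos(theta), L2 = sin(theta).
    return -2.0 * math.atan(math.tan(theta) * math.tan(w2 / 2.0))


R, theta = 0.1, math.radians(30.0)
L1, L2 = math.cos(theta), math.sin(theta)

discrete = direct_form_response(0.3, -0.4, R, L1, L2)
warped = prototype_response(math.tan(0.15), math.tan(-0.2), R, L1, L2)
print("BLT identity error:", abs(discrete - warped))

# Compare the warped axis with the ideal straight line w1 = -w2 * tan(theta).
for w2 in (0.25 * math.pi, 0.5 * math.pi, 0.9 * math.pi):
    print(f"w2 = {w2:.3f}: warped w1 = {warped_axis_w1(w2, theta):.3f}, "
          f"ideal w1 = {-w2 * math.tan(theta):.3f}")
```

The gap between the warped and ideal axis positions grows as  $|\omega_2|$  approaches  $\pi$ , which is the motivation for restricting the useful range to  $|\omega_2| \le 0.5\pi$ .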



Fig. 3. The 2D continuous-domain steady-state Magnitude Frequency Response  $|T(j\omega_1, j\omega_2)|$  of the filter in (10) for  $|\omega_k| \le \pi, k = 1, 2$ , and R = 0.1,  $L_1 = \cos(30^\circ)$ ,  $L_2 = \sin(30^\circ)$ .



Fig. 4. A Beam-shaped Response that is warped by the 2D BLT, shown in the usable range  $-\pi \le \omega_1 \le \pi$  and  $-0.5\pi \le \omega_2 \le 0.5\pi$ . The beam shape at frequencies  $\left(0.5\pi < |\omega_2| < \pi\right)$  is not used in our application because the beam-shape is significantly off-axis, due to bilinear warping. The interested reader is referred to (Madanayake and Bruton; Bruton, 2003) for details.

#### 3.3 On Implementing the 2D Difference-Equations Using Differential Operators

Taking the inverse 2D **z**-transform of (13) leads to the 2D space-time input-output direct-form difference-equation, which we have shown can be implemented in massively-parallel systolic-array hardware for real-time filtering applications. However, the recently proposed hybrid-form signal flow graph (Madanayake, Hum et al.; Madanayake, 2008), although relatively complicated to design, offers lower complexity than direct-form methods while retaining their high speed of operation. In this chapter, we pursue the hybrid-form signal flow graph method. The 2D **z**-domain transfer-function that leads to the so-called hybrid-form structure (an architecture that enjoys the high speed of operation of direct-form structures while being as low in complexity as so-called differential-form structures) can be obtained by methods available in the literature (Madanayake, Hum et al.; Madanayake and Bruton, 2007; Madanayake, 2008). For example, we may write a suitable transfer function in terms of the spatial differential operator  $z_{1D}^{-1} \equiv z_1^{-1}/(1+z_1^{-1})$  as

$$H(z_{1}, z_{2}) = \frac{1 + z_{2}^{-1}}{1 - \alpha \underbrace{\frac{z_{1}^{-1}}{1 + z_{1}^{-1}}}_{z_{1D}^{-1}} \left(1 + z_{2}^{-1}\right) + \beta z_{2}^{-1}} = \frac{Y(z_{1}, z_{2})}{W(z_{1}, z_{2})} \tag{14}$$

where we require

$$\alpha = \frac{2\cos\theta}{R + \cos\theta + \sin\theta}, \quad \beta = 1 - \frac{2\sin\theta}{R + \cos\theta + \sin\theta} \tag{15}$$

Note that the passband gain in (14) is scaled by the constant  $\frac{R}{R+L_1}$ , relative to the direct-form case (Bertschmann, Bartley and Bruton; Madanayake, Hum et al.; Liu and Bruton,1989; Madanayake and Bruton,2007; Madanayake,2008), and is ignored in the following because it is not of practical significance. Re-writing the direct-form transfer-function using the spatial-differential operator results in just two filter coefficients in the denominator of (14) instead of three, implying a 33% reduction in the number of parallel hardware multipliers required in circuit realizations, relative to direct-form realizations.

The difference-equation realizations described here lead to practical-bounded-input-bounded-output (practical-BIBO) stable (Agathoklis and Bruton, 1983) performance under finite precision arithmetic, assuming zero initial conditions (ZICs) for both spatial and temporal iterations. Methods that guarantee ZICs are considered in the following sections.
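The relationship between the two hybrid-form coefficients in (15) and the three direct-form coefficients  $b_{ij}$  of (13) can be verified with a few lines of Python. Multiplying the numerator and denominator of (14) by  $(1+z_1^{-1})$  gives the denominator  $1 + (1-\alpha)z_1^{-1} + \beta z_2^{-1} + (\beta-\alpha)z_1^{-1}z_2^{-1}$ , which the sketch below compares term by term with (13). The function names and the values  $\theta = 30^\circ$ ,  $R = 0.1$  are illustrative.

```python
import math


def hybrid_coefficients(theta_deg: float, R: float):
    """Return (alpha, beta) of equation (15), using L1 = cos(theta), L2 = sin(theta)."""
    c = math.cos(math.radians(theta_deg))
    s = math.sin(math.radians(theta_deg))
    S = R + c + s
    return 2.0 * c / S, 1.0 - 2.0 * s / S


def direct_form_coefficients(theta_deg: float, R: float):
    """Return (b10, b01, b11) from b_ij = (R + (-1)^i L1 + (-1)^j L2) / (R + L1 + L2)."""
    L1 = math.cos(math.radians(theta_deg))
    L2 = math.sin(math.radians(theta_deg))
    S = R + L1 + L2
    return (R - L1 + L2) / S, (R + L1 - L2) / S, (R - L1 - L2) / S


alpha, beta = hybrid_coefficients(30.0, 0.1)
b10, b01, b11 = direct_form_coefficients(30.0, 0.1)

# Denominator coefficients of (14) after clearing the factor (1 + z1^-1).
expanded = (1.0 - alpha, beta, beta - alpha)

print("hybrid  :", alpha, beta)
print("expanded:", expanded)
print("direct  :", (b10, b01, b11))
```

Only `alpha` and `beta` need hardware multipliers in the hybrid form, whereas the direct form needs all three of `b10`, `b01` and `b11`.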

### 4. A Real-time High Throughput Implementation using the Hybrid-form Systolic-Array Processor Architecture

Systolic-array processors are massively-parallel computers having identical, synchronously clocked, fully-pipelined, high-throughput processing elements, which are connected in a linear- or meshed-array configuration (Sid-Ahmed; Kung, 1988a; Kung, 1988b; Shanbhag, 1991; Rader, 1996; Zajc, Sernec and Tasic, 2000). Such systolic-array processors are modular, regular, and locally interconnected, making them well-suited for real-time signal processing using application-specific VLSI hardware for digital signal processing applications at radio-frequency.

Fig. 5. Overview of the plane-wave filter implementation consisting of N parallel processing core modules (PPCMs), which have an internal signal flow graph based on the hybrid-form transfer-function in equation (14).

Research on novel systolic-array architectures for 2D/3D IIR frequency-planar digital plane-wave filters for beamforming applications has led to field-programmable gate-array (FPGA) based single-chip multiprocessor implementations capable of real-time operation at a sustained arithmetic throughput of one-frame-per-clock-cycle (OFPCC), a requirement for real-time plane-wave filtering at RF using linear- or rectangular-arrays of antenna elements (Hum, Madanayake and Bruton; Madanayake, Bruton and Comis; Madanayake and Bruton; Madanayake and Bruton; Madanayake, Hum et al.; Madanayake, Hum and Bruton; Madanayake, 2004; Madanayake, 2008; Madanayake and Bruton, 2008). The OFPCC throughput rate, required for multi-GHz implementations, arises because the signals of interest are of ultra-wide RF bandwidth, which leads to Nyquist sample rates that are at least twice the full RF bandwidth of the signal.

The beamformers therefore directly sample RF signals from the antennas without down-conversion (or bandpass sampling), which leads to frame sample rates in the GHz range. Such excessively-high frame sample rates (multiple GHz) make software-based realizations infeasible using traditional DSP technologies. Our research indicates (Madanayake and Bruton; Madanayake, 2008) that massively-parallel, synchronously-clocked, speed-optimized, fully-pipelined systolic-array processors are currently the best available solution for broadband real-time DSP-based radio-frequency (RF) beamforming applications using sampled antenna arrays (Arnold Van Ardenne; Ellingson, 1999; Liberti Jr. and Rappaport, 1999; Weem, Noratos and Popovic, 1999; Frederick, Wang and Itoh, 2002; Do-Hong and Russer, 2004; Rodenbeck, Sang-Gyu, Wen-Hua et al., 2005; Madanayake, 2008; Devlin, Spring 2003).

#### 4.1 Overview of the Architecture

The massively-parallel systolic-array architecture consists of an array of identical parallel-processing core-modules (PPCMs), sometimes called "processing elements" in the literature (Kung,1988a). Each PPCM is dedicated to processing an antenna element, and signals from all N elements are amplified using a low-noise amplifier (LNA), low-pass filtered (LPF), and time-synchronously sampled using N identical analog signal processing chains. The PPCMs and analog-to-digital converters (ADCs) are clocked using a single-phase master clock signal, of frequency  $F_{CLK} = 1/\Delta T_{CLK}$ . The PPCMs are derived using the recently-proposed hybrid-form signal flow graph having the required **z**-domain transfer-function (14) (Hum, Madanayake et al.; Madanayake, Hum et al.).

The PPCMs that comprise the systolic-array processor are fully-parallel, speed-maximized, fully-pipelined, multi-input-multi-output (MIMO) processors, each consisting of 2 input ports and 2 output ports. A PPCM at spatial location  $n_1$  has its input port A connected to the ADC at location  $n_1$  and input port B connected to the output port C of the PPCM at location  $n_1 - 1$ . Port D provides the computed output signal  $y(n_1\Delta x, n_2c\Delta T_{CLK})$  for spatial location  $n_1$ .

#### 4.2 Inter-PPCM and Intra-PPCM Pipelines

The PPCMs are pipelined such that signals entering through the input ports  $A_{n_1}$  and  $B_{n_1}$  undergo p additional clocked delays, as a result of internal pipelining. These additional delays can be compensated by delaying the input signal  $A_{n_1+1}$  by  $(n_1+1)p$  clock cycles, leading to a delay of  $(n_1+1)\Delta T_x$  seconds, where  $\Delta T_x = p\Delta T_{CLK}$  is the pipelining latency of a PPCM. For N PPCMs, the final output signal at the output of the  $N^{th}$  PPCM therefore undergoes a pipelining delay of Np clock cycles. When 2D space-time output signals are required, the output signals from each PPCM must be fed through additional clocked FIFOs, having depth  $(N-1-n_1)p$ , so that the signals at all spatial output locations are uniformly delayed by Np clock cycles. The implemented transfer-function is therefore modified, in the presence of pipelining, to the linear phase-delayed form  $H(z_1,z_2)z_2^{-Np}$ , which has no effect on the magnitude frequency-response function (because  $|e^{-j\omega_2Np\Delta_{CLK}}|=1$ ).
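The delay bookkeeping described above can be tabulated for a given array size. The following minimal sketch lists, for each PPCM location  $n_1$ , the input delay, the output FIFO depth and the resulting total latency, which should equal  $Np$  clock cycles everywhere. The function name `pipeline_schedule` and the example values N = 4, p = 12 are assumptions made for illustration.

```python
def pipeline_schedule(n_ppcm: int, p: int):
    """Return a list of (n1, input_delay, fifo_depth, total_latency) in clock cycles."""
    if n_ppcm < 1 or p < 0:
        raise ValueError("need n_ppcm >= 1 and p >= 0")
    rows = []
    for n1 in range(n_ppcm):
        input_delay = n1 * p                 # compensates the latency of PPCMs 0..n1-1
        ppcm_latency = p                     # internal pipelining of this PPCM
        fifo_depth = (n_ppcm - 1 - n1) * p   # output alignment FIFO
        total = input_delay + ppcm_latency + fifo_depth
        rows.append((n1, input_delay, fifo_depth, total))
    return rows


schedule = pipeline_schedule(n_ppcm=4, p=12)
for n1, input_delay, fifo_depth, total in schedule:
    print(f"n1={n1}: input delay={input_delay}, FIFO depth={fifo_depth}, total={total}")
```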



Fig. 6. Hybrid-form PPCM circuit having 2 inputs, 2 outputs, 4 two-input parallel adder/subtractors, and 2 parallel hardware multipliers.

#### 4.3 Design of the Hybrid-form PPCMs

Having obtained a basic overview of the systolic-array architecture, we now derive the internal signal flow and internal components of each PPCM. Recall the 2D hybrid-form transfer-function (14):

$$H(z_{1}, z_{2}) = \frac{1 + z_{2}^{-1}}{1 - \alpha \frac{z_{1}^{-1}}{1 + z_{1}^{-1}} \left(1 + z_{2}^{-1}\right) + \beta z_{2}^{-1}} \equiv \frac{Y(z_{1}, z_{2})}{W(z_{1}, z_{2})} \tag{16}$$

Cross-multiplying terms in (16), we get the 2D **z**-domain input-output form, given by

$$(1+z_2^{-1})W(z_1,z_2) = \left(1-\alpha \frac{z_1^{-1}}{1+z_1^{-1}}\left(1+z_2^{-1}\right)+\beta z_2^{-1}\right)Y(z_1,z_2),\tag{17}$$

leading to

$$Y(z_{1}, z_{2}) = \frac{\left(W(z_{1}, z_{2}) + \alpha \frac{z_{1}^{-1}}{1 + z_{1}^{-1}} Y(z_{1}, z_{2})\right)}{1 + \beta z_{2}^{-1}} \left(1 + z_{2}^{-1}\right) \tag{18}$$

Multiplying both sides by  $z_2^{-p}$  yields the required form

$$Y(z_{1}, z_{2})z_{2}^{-p} = \frac{\left(W(z_{1}, z_{2})z_{2}^{-p} + \alpha z_{2}^{-p} \frac{z_{1}^{-1}}{1 + z_{1}^{-1}}Y(z_{1}, z_{2})\right)}{1 + \beta z_{2}^{-1}} \left(1 + z_{2}^{-1}\right) \tag{19}$$

Fig. 7. Interconnections between PPCMs, shown here in the mixed domain  $(n_1, z_2) \in \mathbb{ZC}$ , lead to the massively-parallel systolic-array processor implementation of the beam plane-wave filter.

Computing the inverse $z_1$-transform of (19) under spatial ZICs, we obtain the 2D mixed-domain  $(n_1, z_2) \in \mathbb{ZC}$  form, given by

$$Y(n_1, z_2)z_2^{-p} = \frac{\left(W(n_1, z_2)z_2^{-p} + \alpha z_2^{-p}U(n_1, z_2)\right)}{1 + \beta z_2^{-1}} \left(1 + z_2^{-1}\right)$$
(20)

where

$$U(n_1, z_2) = Y(n_1 - 1, z_2)z_2^{-p} - U(n_1 - 1, z_2)z_2^{-p}$$
(21)

and where  $z_2^{-p}$  is the $z_2$-transform of the internal pipelining delays at each PPCM. Because the depth of pipelining is arbitrary, the numerator of (20) can be pipelined at will using straightforward 1D FIR filter pipelining methods, noting that this 1D FIR section has two terms,  $W(n_1, z_2)$  and  $U(n_1, z_2)$ , which are obtained using digital ports  $A_{n_1}$  and  $B_{n_1}$ , respectively. Equations (20) and (21) describe the 2-input-2-output $z_2$-domain transfer-functions of a PPCM at location  $0 \le n_1 < N - 1$ . The hybrid-form signal flow-graph is thereby obtained, and is shown in Fig. 6. The first 3 PPCMs in the systolic-array are shown in Fig. 7 as an interconnection of processors.
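To make the recursion concrete, the following Python sketch evaluates (20) and (21) in the space-time domain for the unpipelined case p = 0, using floating-point arithmetic and zero initial conditions. It is a behavioural model under those stated assumptions, not the FPGA design: the function name `hybrid_beam_filter`, the array layout `w[n1, n2]`, and the coefficient values in the usage lines are hypothetical.

```python
import numpy as np


def hybrid_beam_filter(w, alpha, beta):
    """Behavioural model of the hybrid-form recursion with p = 0.

    w[n1, n2] is the space-time input; the return value is y[n1, n2].
    Each outer iteration plays the role of one PPCM at spatial index n1.
    """
    num_ppcm, num_samples = w.shape
    y = np.zeros((num_ppcm, num_samples))
    u_prev = np.zeros(num_samples)  # U(n1-1, .), zero for n1 = 0
    y_prev = np.zeros(num_samples)  # Y(n1-1, .), zero for n1 = 0
    for n1 in range(num_ppcm):
        u = y_prev - u_prev          # eq. (21) with p = 0
        v = w[n1] + alpha * u        # W + alpha*U, the input to the 1D section
        v_delayed = 0.0
        y_delayed = 0.0
        for n2 in range(num_samples):
            # 1D temporal section (1 + z2^-1) / (1 + beta z2^-1)
            y[n1, n2] = v[n2] + v_delayed - beta * y_delayed
            v_delayed = v[n2]
            y_delayed = y[n1, n2]
        u_prev, y_prev = u, y[n1]
    return y


if __name__ == "__main__":
    w = np.zeros((4, 8))
    w[0, 0] = 1.0  # 2D unit impulse applied to the first PPCM
    print(hybrid_beam_filter(w, alpha=0.3, beta=-0.4))
```

Because each spatial location only needs $Y$ and $U$ from its immediate neighbour, the outer loop maps naturally onto a chain of locally connected processors.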

#### 4.4 A Pipelining Example

In order to familiarize the reader with pipelining concepts, we now provide a simple example in which it is assumed that p=12 internal pipelining stages are sufficient for achieving the required throughput. The 12-stage pipeline is distributed as follows: the multiplier  $\alpha$  has 3 levels of pipelining, and each of the three 2-input adders/subtractors, denoted A1, A2, and A3, has 3 levels of pipelining. It is important to ensure that all signal components that connect to a particular 2-input adder/subtractor undergo equal delays. This is essential for correct operation, and must be satisfied for all pipelined designs.

The hybrid-form signal-flow graph does not allow pipelining of A4, because only one unit-delay buffer is available in the first-order feedback loop, which is usually absorbed inside the parallel logic of multiplier  $\beta$ . Provided all feed-forward paths are fully pipelined, the critical path delay of the hybrid-form PPCM cannot be reduced below  $T_{CPD} \approx T_{Mul} + T_{A/S}$ , where  $T_{Mul}$  and  $T_{A/S}$  are the propagation delays of a parallel multiplier and adder/subtractor circuit, respectively. The maximum clock frequency of a hybrid-form PPCM is therefore bounded by  $F_{CLK} \leq 1/T_{CPD}$  unless additional speed-optimization methods, based on look-ahead optimization, are employed. This method is discussed in Section 4.5.



Fig. 8. Signal flow graph of a hybrid-form PPCM having 12 cycles of pipeline latency (arbitrarily chosen for the purpose of demonstration). The 12 clock-cycles of additional pipelining can be used as required to reduce the critical-path delay (CPD) of the systolic-array.

The pipelined version, having p=12 for the hybrid form PPCM, is shown in Fig. 8. We now describe look-ahead speed-optimization of the internal 1D temporal IIR digital filter section having transfer-function  $\frac{1+z_2^{-1}}{1+\beta z_2^{-1}}$  that enables much greater levels of real-time throughput at the cost of additional circuit complexity.

#### 4.5 Additional Look-Ahead Speed-Maximization

We extend well-known 1D IIR pipelining using "look-ahead" optimization, a method pioneered by Parhi et al (Parhi; Parhi,1991; Parhi,1999). Look-ahead is a method for introducing additional delays into a critical feedback loop and is based on pole-zero cancellation of 1D z-domain transfer-functions.

In Section 4.4, we described an example for which 10 additional delays are distributed in the forward (that is, FIR, also known as feed-forward) signal paths of the PPCM, such that the critical path delay of the PPCM (and therefore of the systolic-array) is reduced to the latency of a multiply-add operation, denoted  $T_{CPD}$ . The speed bottleneck for this example lies within the first-order feedback IIR filter, which has a simple real pole at  $z_2 = -\beta$ , where it may be shown that  $|\beta| \le 1$  for passive filter network prototypes. Because this pole is within (or on) the unit circle  $|z_2| = 1$ , the 1D IIR filter section is unconditionally stable (ignoring effects due to finite precision).

Let us further assume that our objective is to halve the critical path delay using look-ahead optimization of the IIR section. This can be achieved by increasing the number of internal delays in the first-order feedback loop to 2 (causing the feedback loop to increase in order). This is easily achieved by multiplying both the numerator and denominator of  $\frac{1+z_2^{-1}}{1+\beta z_2^{-1}}$  by  $1-\beta z_2^{-1}$ , leading to the 2nd-order section

$$\frac{\left(1+z_2^{-1}\right)\left(1-\beta z_2^{-1}\right)}{1-\beta^2 z_2^{-2}}$$

which has a new critical path delay in the feedback loop of  $T_{CPD,LA} \approx T_{CPD}/2$ , implying an almost 100% increase in the maximum speed of operation (Parhi; Parhi; Parhi and Messerschmitt; Parhi and Messerschmitt; Sundarajan and Parhi; Parhi and Messerschmitt,1989; Parhi,1991; Parhi,1999). This "look-ahead" speed-maximization process may be repeated: for example, multiplying the numerator and denominator of the 2nd-order section by  $1+\beta^2 z_2^{-2}$  leads to a 4th-order feedback loop having transfer-function

$$\frac{\left(1+z_2^{-1}\right)\left(1-\beta z_2^{-1}\right)\left(1+\beta^2 z_2^{-2}\right)}{1-\beta^4 z_2^{-4}}$$

which allows the multiplier  $\beta^4$  to consist of 3 levels of internal pipelining, while the fourth delay can be used in the 2-input adder that completes the feedback loop (Madanayake and Bruton; Madanayake,2008). The additional terms in the numerator that appear due to the application of look-ahead speed-maximization lead to additional circuit complexity; this is the price for the extensive gain in real-time throughput, which is 300% for  $4^{th}$  order feedback loops. The additional arithmetic circuits that appear in the feed-forward sections can be easily pipelined by increasing the depth of pipelining p as required. In our example, we have increased the depth of pipelining to p = 22, which allows 3-level pipelining of all additional adders/subtractors and multipliers in the PPCM.
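The pole-zero cancellation behind look-ahead can be verified numerically. The Python sketch below is a minimal check, assuming ideal (floating-point) arithmetic: it builds the transformed numerator and denominator polynomials in powers of $z_2^{-1}$ and confirms that the impulse response is unchanged. The function names `look_ahead` and `impulse_response` and the test value $\beta = 0.5$ are hypothetical choices made for illustration.

```python
import numpy as np


def look_ahead(beta, stages):
    """Apply `stages` look-ahead steps to (1 + z^-1) / (1 + beta z^-1).

    Returns (num, den) coefficient arrays in ascending powers of z^-1.
    After k steps the denominator has the form 1 + c z^-(2^k).
    """
    num = np.array([1.0, 1.0])
    den = np.array([1.0, beta])
    for _ in range(stages):
        # The denominator is 1 + c z^-m; multiply top and bottom by 1 - c z^-m.
        mirror = den.copy()
        mirror[-1] = -mirror[-1]
        num = np.convolve(num, mirror)
        den = np.convolve(den, mirror)
    return num, den


def impulse_response(num, den, length):
    """Impulse response of num/den by direct evaluation of the difference equation."""
    h = np.zeros(length)
    for n in range(length):
        acc = num[n] if n < len(num) else 0.0
        for k in range(1, len(den)):
            if n - k >= 0:
                acc -= den[k] * h[n - k]
        h[n] = acc / den[0]
    return h


if __name__ == "__main__":
    for stages in (0, 1, 2):
        num, den = look_ahead(beta=0.5, stages=stages)
        print(stages, den, impulse_response(num, den, 6))
```

With two stages the denominator reduces to $1-\beta^4 z_2^{-4}$, so four unit delays are available inside the feedback loop, while the numerator grows to fourth order; this is the added feed-forward complexity noted above.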

#### 5. FPGA Circuit Prototypes

In this section, we provide a proof-of-concept circuit design using a field programmable gate array (FPGA) device. An example implementation of a hybrid-form systolic-array processor containing 21 fully pipelined PPCMs is provided. The target FPGA is a Xilinx Virtex-4 Sx35-10ff668 device installed on a Nallatech BenADDA daughter card, which in turn is installed on a Nallatech BenONE mainboard. This particular combination is widely known as the Xilinx XtremeDSP Kit-4.

The logic design flow starts with the Xilinx System Generator (XSG) design tool, which is a plug-in for Matlab/Simulink. We chose XSG as our FPGA design tool, although conventional design methods based on hardware description languages such as VHDL or Verilog may also be used. The modular, regular nature of the systolic-array, together with its complicated pipelines and dataflow structure, makes a graphical FPGA design method such as XSG easier to use than text-based design tools. We note, however, that XSG ultimately produces synthesizable VHDL (or Verilog), which is subsequently processed by conventional FPGA logic synthesis tools such as the Xilinx Synthesis Tool (XST) or Synplify Pro.

#### 5.1 Finite Precision Arithmetic and the FPGA Circuit

The arithmetic circuits on the FPGA are obviously based on finite precision hardware blocks for the multipliers, adders/subtractors, and memory devices. The designation of precisions (word sizes) is an important design step that requires extensive further research. Our example is based on experience with many similar circuits, and is largely a result of experiential learning accumulated over several years of research on similar systolic-arrays. At this time, a comprehensive design method that can lead to optimal finite precision levels (in terms of hardware resource consumption, quantization noise statistics, power consumption, and throughput) is not available, and is an interesting subject for research activities.

Fig. 9. Xilinx FPGA circuit for a hybrid-form PPCM having 12 cycles of pipeline latency (corresponding to the signal-flow graph in Fig. 8).

Fig. 10. First 4 PPCMs of a hybrid-form systolic-array FPGA circuit showing inter-PPCM interconnections. The FPGA circuit is tested on-chip by stepped hardware co-simulation, using a 2D unit impulse input at PPCM #1 with the inputs of PPCMs #2, #3, ..., #21 set to zero, leading to the measured 2D impulse response  $h(n_1\Delta x, n_2c\Delta T_{CLK})$ . A bit-true cycle-accurate FPGA circuit simulation of the 2D impulse response is available in the Matlab variables simout, simout1, ..., simout20, and the measured on-chip FPGA circuit responses are available in Matlab variables h0, h1, ..., h20.

The following example assumes input signals obtained from 4-bit A/D converters. Preliminary studies show that 3-bit A/D converters are quite sufficient for ultra-wideband wireless communications applications. Our choice of 4 bits for the A/D converters results from 1-bit overdesign, mainly as a margin of safety, in order to ensure good performance in a real-world application. The multiplier coefficients are assumed to be 12 bits, with the binary point at position 10. All other registers, including quantized outputs from multipliers and adder/subtractor blocks, are fixed at 14 bits, with the binary point assumed at position 10. The design of a PPCM is shown in Fig. 9, followed by the systolic-array in Fig. 10. The finite precision values at various locations on the PPCM signal flow graph can be widely optimized against various requirements, but this is not attempted here, because we are only interested in giving our readers a basic design overview of the hybrid-form systolic-array processor.
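The effect of the quoted word lengths can be explored with a simple fixed-point model. The Python sketch below is a rough illustration only: it assumes signed two's-complement words with round-to-nearest and saturation, none of which is specified in the text, and the function name `quantize` is hypothetical.

```python
import numpy as np


def quantize(x, word_bits, frac_bits):
    """Round x to a signed fixed-point grid and saturate to the word length.

    Assumed behaviour (not stated in the chapter): two's-complement format,
    round-to-nearest, and saturation on overflow.
    """
    scale = 2 ** frac_bits
    q_min = -(2 ** (word_bits - 1))
    q_max = 2 ** (word_bits - 1) - 1
    q = np.clip(np.round(np.asarray(x, dtype=float) * scale), q_min, q_max)
    return q / scale


if __name__ == "__main__":
    # 12-bit coefficient and 14-bit register, both with 10 fractional bits
    print(quantize(0.3, word_bits=12, frac_bits=10))
    print(quantize([0.1, 3.14159, 9.0], word_bits=14, frac_bits=10))
```

Under these assumptions a 12-bit coefficient with 10 fractional bits covers the range [-2, 2) in steps of $2^{-10}$, and a 14-bit register covers [-8, 8) with the same step.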

The FPGA circuit was tested by on-chip hardware-in-the-loop co-simulation with Matlab/Simulink, XSG, and FUSE, using the XtremeDSP Kit-4 device installed in the 5V 32-bit PCI slot of the host PC. Figure 11 shows the measured 2D magnitude frequency response of an example beam filter having spatial DOA  $\phi_o = 25^\circ$  and bandwidth parameter R = 0.02, computed for 21 spatial samples and 256 time samples of the impulse response. The "uneven" nature of the measured response is attributed to quantization effects and magnitude sensitivity, for which a comprehensive study remains as useful future research.
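One common way to obtain a 2D magnitude response from a finite 2D impulse response is a zero-padded 2D FFT. The Python sketch below illustrates that approach; it is not necessarily the procedure used to produce Fig. 11. The function name `magnitude_response_2d`, the FFT size, and the array `h` (standing in for the measured responses h0, ..., h20 stacked by row) are assumptions.

```python
import numpy as np


def magnitude_response_2d(h, n_fft=(64, 256)):
    """Return |H(e^{j w1}, e^{j w2})| on a centred frequency grid.

    h[n1, n2] is a finite 2D impulse response (space by time). The spatial
    axis is zero-padded to n_fft[0] points to give a smoother plot.
    """
    spectrum = np.fft.fft2(h, s=n_fft)
    return np.abs(np.fft.fftshift(spectrum))  # move zero frequency to the centre


if __name__ == "__main__":
    h = np.zeros((21, 256))
    h[0, 0] = 1.0  # placeholder: a 2D unit impulse has a flat magnitude response
    mag = magnitude_response_2d(h)
    print(mag.shape, mag.min(), mag.max())
```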



Fig. 11. Measured 2D magnitude response,  $-\pi \le \omega_1 \le \pi$  and  $-0.5\pi \le \omega_2 \le 0.5\pi$ , obtained from a 21 PPCM FPGA implementation using on-chip hardware co-simulation using the Xilinx XtremeDSP Kit-4. Quantization effects cause the implementation to have extra ripples in both pass- and stop-bands, and lead to the addition of AWGN. A detailed study of quantization effects remains for future work.

#### 5.2 High-speed Implementation Technologies

At present, systolic-array implementations of the proposed 2D IIR beam plane-wave filters have been limited to proof-of-concept realizations on FPGA circuits that operate at clock rates of up to 100 MHz. However, real-world electromagnetic applications require frame rates in excess of 1 GHz, which can be as high as 21 GHz for full-band UWB radio systems. FPGA circuit implementations are often impractical for product applications and are mostly used as prototypes for eventual implementation as application-specific integrated circuits (ASICs) on high-speed VLSI platforms such as the state-of-the-art 40nm digital CMOS process. Porting the available FPGA designs to 40nm CMOS (or similar) VLSI technology remains an exciting field for future research.

#### 5.3 Field-Programmable Object Arrays (FPOAs) and Asynchronous FPGA Circuits

In general, although FPGAs from vendors such as Xilinx and Altera are limited to speeds less than  $\approx 300$  MHz for most *recursive filter designs*, future developments may facilitate the use of conventional FPGA technology to implement the proposed systolic arrays at 1 GHz or higher. Furthermore, it should be noted that fab-less semiconductor technologies, such as MathStar's (*http://www.mathstar.com/*) Arrix field programmable object array (FPOA) devices (Anonymous,2007) and high-speed "picoPIPE" FPGAs capable of 1.5 GHz operation (Anonymous,2008) from *Achronix Semiconductor (http://www.achronix.com/*), are emerging as an alternative to conventional ASIC solutions, and form a basis for future research.

FPOAs from MathStar consist of an array of hard IP blocks, such as arithmetic-and-logic units (ALUs) and multiply-accumulate (MAC) blocks, within a reconfigurable switching fabric, and are ideal for systolic-array realizations due to their modularity, regularity, and local interconnectivity. These "objects" are pre-fabricated onto the FPOA structure and meet stringent timing standards, enabling deterministic design at 1 GHz that is independent of the logic being implemented. On the other hand, *Speedster* FPGAs from *Achronix Semiconductor* employ asynchronous handshaking protocols between combinational logic blocks, which the vendor describes as the merging of clock and data tokens into one signal and which, according to *Achronix* documentation, enables faster operation compared to conventional synchronous FPGA architectures. The Speedster family is specified at 1.5 GHz, and is a potential candidate for real-time implementation of the systolic-array architectures described herein, following future research.

#### 6. Conclusions

The above new systolic implementation of a 2D IIR frequency-beam filter transfer function has promising engineering applications for the directional enhancement of a propagating broadband space-time plane-wave received on an array of sensors. A particularly important case is the use of an array of broadband antennas for the directional enhancement (that is, beamforming) of ultra-wideband electromagnetic plane-waves.

A massively-parallel systolic-array custom architecture that is capable of processing one linear frame per clock cycle (OFPCC) has been described, together with detailed design and optimization information. The architecture is based on the recently proposed hybrid-form 2D signal flow graph, which has been shown to be optimal in terms of critical path delay (hence maximum throughput, because at OFPCC, the clock rate is equal to the frame rate in these architectures) and to have low computational complexity.

A design example for the proposed systolic-array processor architecture has been described using a Xilinx Virtex-4 Sx35 FPGA device, and the Matlab/Simulink based FPGA design tool called Xilinx System Generator. The example FPGA implementation of the 2D IIR frequency-beam filter was tested on-chip using the hardware-in-the-loop verification method called 'hardware co-simulation', and the on-chip 2D unit-impulse response was *measured*, which in turn led to *measured* 2D frequency response results that confirm correct implementation of the hardware.

Although the FPGA-based example is generally too slow for microwave imaging applications, it serves as a validation of the proposed OFPCC systolic-array processor and can be used in its current form for slower applications in audio, ultra-sound, and lower radio frequencies (of up to approximately 100 MHz). Finally, promising new VLSI implementation platforms are described here, which may eventually enable the proposed architecture to operate at the required multi-GHz clock frequency to enable real-time ultra-wideband digital smart antenna array applications.

#### 7. References

- Agathoklis, P. and L. T. Bruton (1983). "Practical-BIBO stability of N-dimensional discrete systems." Proc. Inst. Elec. Eng. **130**, **Pt. G**(6): 236-242.
- Anonymous (2007). Arrix FPOA Overview. Available online at http://www.mathstar.com.
- Anonymous (2008). Using High-Performance FPGAs for Advanced Radio Signal Processing. Available online at http://www.achronix.com.
- Arnold Van Ardenne. The Technology Challenges for the Next Generation Radio Telescopes. Perspectives on Radio Astronomy Technologies for Large Antenna Arrays, Netherlands Foundation for Research in Astronomy.
- Bertschmann, R. K., N. R. Bartley and L. T. Bruton A 3-D integrator-differentiator double-loop (IDD) filter for raster-scan video processing. IEEE Intl. Symp. on Circuits and Systems, ISCAS'95.
- Bolle, M. (1994). A Closed-form Design Method for 3-D Recursive Cone Filters IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP.
- Bruton, L. T. (2003). "Three-dimensional cone filter banks." IEEE Trans. on Circuits and Systems I: Fundamental Theory and Applications **50**(2): 208-216.
- Bruton, L. T. and N. R. Bartley (1985). "Three-dimensional image processing using the concept of network resonance." IEEE Trans. on Circuits and Systems **32**(7): 664-672.
- Dansereau, D. (2003). 4D Light Field Processing and its Application to Computer Vision. Electrical and Computer Engineering. Calgary, University of Calgary. **MSc**.
- Dansereau, D. and L. T. Bruton (2007). "A 4-D Dual-Fan Filter Bank for Depth Filtering in Light Fields." Signal Processing, IEEE Transactions on **55**(2): 542-549.
- Devlin, M. (Spring 2003) "How to Make Smart Antenna Arrays." Xcell Journal Online
- Do-Hong, T. and P. Russer (2004). Signal Processing for Wideband Smart Antenna Array Applications. IEEE Microwave Magazine. **5**.

- Dudgeon, D. E. and R. M. Mersereau (1990). Multidimensional Digital Signal Processing. Englewood Cliffs, N.J. 07632, Prentice-Hall.
- Ellingson, S. W. (1999). A DSP Engine for a 64-Element Array. Proceedings on Perspectives for Radio Astronomy-- Technologies for Large Antenna Arrays, Netherlands.
- Frederick, J. D., Y. Wang and T. Itoh (2002). "A smart antenna receiver array using a single RF channel and digital beamforming." IEEE Trans. on Microwave Theory and Techniques **50**(12): 3052-3058.
- Ghavami, M., L. B. Michael and R. Kohno Ultra wideband signals and systems in communication engineering, John Wiley and Sons., Inc.
- Gunaratne, T. K. and L. T. Bruton "Beamforming of Broadband-bandpass Plane Waves using Polyphase 2D FIR Trapezoidal Filters." IEEE Trans. on. Circuits and Systems: Regular Papers 55(3): 838-850.
- Hum, S. V., H. L. P. A. Madanayake and L. T. Bruton "UWB Beamforming using 2D Beam Digital Filters." IEEE Trans. on Antennas and Propagation **57**(3): 804-807.
- Hum, S. V., M. Okoniewski and R. J. Davies "Modeling and Design of Electronically Tunable Reflectarrays." IEEE Trans. on Antennas and Propagation 55(8): 2200-2210.
- Huseyin Arslan, Zhi-Ning Chen and Maria-Gabriella Di Benedetto (2006). Ultra-wideband Wireless Communication, Wiley Interscience.
- J. Roderick, H. Krishnaswamy, K.Newton, et al. (2006). "Silicon-based ultra-wideband beamforming." IEEE Journal of Solid-State Circuits **41**(8): 1726-1739.
- Johnson, D. H. and D. E. Dudgeon (1993). Array Signal Processing-Concepts and Techniques. Englewood Cliffs, N.J. 07632, Prentice-Hall.
- Khademi, L. Reducing the computational complexity of FIR 2D fan and 3D cone filters [MSc Thesis]. Electrical and Computer Engineering, University of Calgary.
- Khademi, L. and L. T. Bruton On the limitations of narrow 2D fan filters speech processing. IEEE 2003 Pacific Rim Conference on Communications, Computers, and Signal Processing (PACRIM'03).
- Kuenzle, B. and L. T. Bruton "3-D IIR filtering using decimated DFT-polyphase filter bank structures." IEEE Trans. on Circuits and Systems I: Regular Papers **53**(2): 394-408.
- Kuenzle, B. and L. T. Bruton (2005). A novel low-complexity spatio-temporal ultra wide-angle polyphase cone filter bank applied to sub-pixel motion discrimination. IEEE Intl. Symp. on Circuits and Systems, ISCAS'05, Kobe, Japan.
- Kung, S. Y. (1988a). VLSI Array Processors, Prentice-Hall, Englewood Cliffs, N.J.
- Kung, S. Y. (1988b). VLSI Array Processors: Designs and Applications. 1988 IEEE International Symp. on Circuits and Systems, ISCAS'88.
- Liberti Jr., J. C. and T. S. Rappaport (1999). Smart Antennas for Wireless Communications-IS-95 and Third Generation CDMA Applications. Upper Saddle River, N.J. 07632, Prentice-Hall.
- Litva, J. and T. K.-Y. Lo (1996). Digital Beamforming in Wireless Communications, Artech House.
- Liu, Q. and L. T. Bruton (1989). "Design of 3-D planar and beam recursive digital filters using spectral transformation." IEEE Trans. Circuits and Systems **36**(3): 365-374.
- Madanayake, A. (2004). FPGA Architectures for 2D/3D Digital Filters. Electrical and Computer Engineering. Calgary, University of Calgary. **MSc:** 205.

- Madanayake, A. (2008). Real-time FPGA Architectures for Space-time Frequency-planar MDSP. Electrical and Computer Engineering. Calgary, University of Calgary. **PhD:** 371.

- Madanayake, A. and L. Bruton A Review of 2D/3D IIR Plane-wave Real-time Digital Filter Circuits. IEEE Canadian Conference on Electrical and Computer Engineering, CCECE'05, Saskatoon, Saskatchewan, Canada.
- Madanayake, A., L. Bruton and C. Comis FPGA architectures for real-time 2D/3D FIR/IIR plane wave filters. IEEE Intl. Symp. on Circuits and Systems, ISCAS'04.
- Madanayake, A. and L. T. Bruton "A Speed-optimized systolic-array processor architecture for spatio-temporal 2D IIR broadband beam filters." IEEE Trans. on. Circuits and Systems: Regular Papers 55(7): 1953-1966.
- Madanayake, H. L. P. A. and L. T. Bruton "A Systolic-array Architecture for First-Order 3D IIR Frequency-planar Filters." IEEE Trans. Circuits and Systems: Regular Papers 55(6): 1546-1559.
- Madanayake, H. L. P. A. and L. T. Bruton (2007). "Low-complexity distributed-parallel-processor for 2D IIR broadband beam plane-wave filters." Canadian Journal of Electrical and Computer Engineering (CJECE) **32**(3): 123-131.
- Madanayake, H. L. P. A. and L. T. Bruton (2008). A Real-time Systolic Array Processor Implementation of Two-dimensional IIR Filters for Radio-frequency Smart Antenna Applications. IEEE Intl. Symp. on Circuits and Systems (ISCAS'08), Seattle.
- Madanayake, H. L. P. A., S. V. Hum and L. T. Bruton "A Systolic Array 2D IIR Broadband RF Beamformer." IEEE Trans. on Circuits and Systems-II: Express Briefs **55**(12): 1244-1248.
- Madanayake, H. L. P. A., S. V. Hum and L. T. Bruton UWB Beamforming Using Digital 2D Frequency-planar Filters. IEEE 2008 Antenna and Propagation Society Symposium/URSI Symposium, San Diego.
- Parhi, K. K. "Finite word effects in pipelined recursive filters." IEEE Trans. on Signal Processing, IEEE Trans. on Acoustics, Speech, and Signal Processing. **39**(6): 1450-1454.
- Parhi, K. K. "Pipelining in algorithms with quantizer loops." IEEE Trans. on Circuits and Systems **38**(7): 745-754.
- Parhi, K. K. (1991). "Finite word effects in pipelined recursive filters." IEEE Trans. on Signal Processing [see also IEEE Trans. on Acoustics, Speech, and Signal Processing] **39**(6): 1450-1454.
- Parhi, K. K. (1999). VLSI Digital Signal Processing Systems: Design and Implementation, John Wiley and Sons.
- Parhi, K. K. and D. Messerschmitt Look-ahead computation: Improving iteration bound in linear recursions. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, ICASSP'87.
- Parhi, K. K. and D. G. Messerschmitt Pipelined VLSI Recursive Filter Architectures using Scattered Look-Ahead and Decomposition. 1988 IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, ICASSP88, New York, N.Y., USA.
- Parhi, K. K. and D. G. Messerschmitt (1989). "Concurrent architectures for two-dimensional recursive digital filtering." IEEE Trans. on Circuits and Systems **36**(6): 813-829.
- Rader, C. M. (1996). VLSI Systolic Arrays for Adaptive Nulling. IEEE Signal Processing Magazine. **13:** 29-49.

- Ramamoorthy, P. A. and L. T. Bruton "Design of stable two-dimensional analog and digital filters with applications in image processing." Int. J. Circuit Theory Appl. 7: 229-245.
- Rodenbeck, C. T., K. Sang-Gyu, T. Wen-Hua, et al. (2005). "Ultra-wideband low-cost phased-array radars." Microwave Theory and Techniques, IEEE Transactions on **53**(12): 3697-3703.
- Schroeder, H. and H. Blume (2000). One- and Multidimensional Signal Processing-Algorithms and Applications in Image Processing, John Wiley and Sons, Ltd.
- Shanbhag, N. R. (1991). "An improved systolic architecture for 2-D digital filters." Signal Processing, IEEE Transactions on [see also Acoustics, Speech, and Signal Processing, IEEE Transactions on] **39**(5): 1195-1202.
- Sid-Ahmed, M. A. "A systolic realization for 2-D digital filters." IEEE Trans. on Acoustics, Speech, and Signal Processing **37**(4): 560-565.
- Silva, R., R. Worrel and A. Brown Reprogrammable, Digital Beam Steering GPS Receiver Technology for Enhanced Space Vehicle Operations. Core Technologies for Space Systems Conference, Colorado Springs, CO.
- Staderini, E. M. (2002). "UWB radars in medicine." Aerospace and Electronic Systems Magazine, IEEE 17(1): 13-18.
- Sundarajan, V. and K. K. Parhi Synthesis of folded multidimensional DSP systems. IEEE Intl. Symp. on Circuits and Systems (ISCAS'98).
- Van Ardenne, A. (2000). Concepts of the Square Kilometre Array; toward the new generation radio telescopes. IEEE 2000 Intl. Symp. on Antennas and Propagation.
- Weem, J. P., B. M. Noratos and Z. Popovic (1999). Broadband Array Considerations for SKA. Proceedings on Perspectives for Radio Astronomy-- Technologies for Large Antenna Arrays.
- Zajc, M., R. Sernec and J. Tasic (2000). Array processors for DSP: implementation considerations. 10th Mediterranean Electrotechnical Conference, 2000, MELECON 2000






## **VLSI**

Edited by Zhongfeng Wang

ISBN 978-953-307-049-0 Hard cover, 456 pages Publisher InTech Published online 01, February, 2010 Published in print edition February, 2010


#### How to reference

In order to correctly reference this scholarly work, feel free to copy and paste the following:

Arjuna Madanayake and Leonard T. Bruton (2010). Radio-Frequency (RF) Beamforming Using Systolic FPGA-based Two Dimensional (2D) IIR Space-Time Filters, VLSI, Zhongfeng Wang (Ed.), ISBN: 978-953-307-049-0, InTech, Available from: http://www.intechopen.com/books/vlsi/radio-frequency-rf-beamforming-using-systolic-fpga-based-two-dimensional-2d-iir-space-time-filters






© 2010 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the <u>Creative Commons Attribution-NonCommercial-ShareAlike-3.0</u> <u>License</u>, which permits use, distribution and reproduction for non-commercial purposes, provided the original is properly cited and derivative works building on this content are distributed under the same license.



