Most Efficient Digital Filter Structures: The Potential of Halfband Filters in Digital Signal Processing

In this book the reader will find a collection of chapters authored or co-authored by a large number of experts from around the world, covering the broad field of digital signal processing. The book highlights current research in digital signal processing and shows recent advances in the field. It is mainly intended for researchers in digital signal processing and related areas, but it is also accessible to anyone with a scientific background who wants an up-to-date overview of this domain. Each chapter is self-contained and can be read independently of the others. The nineteen chapters present methodological advances and recent applications of digital signal processing in various domains such as communications, filtering, medicine, astronomy, and image processing.

In Section 4 of this chapter we consider the application of the two-channel DF as a building block of a multiple-channel tree-structured FDMUX filter bank according to Fig. 2, typically applied for on-board processing in satellite communications [Danesfahani et al. (1994); Göckler & Felbecker (2001); Göckler & Groth (2004); Göckler & Eyssele (1992)]. If the number of channels is large and/or the bandwidth requirements are challenging, the implementation of the front-end DF, which must be operated at (extremely) high sampling rates, is crucial. To cope with this issue, in Section 4 we present an approach to parallelise at least the front end of the FDMUX filter bank according to Fig. 2.

Single halfband filters 1
In this Section 2 we recall the properties of the well-known HBF with real coefficients (real HBF with centre frequencies $f_c \in \{f_0, f_4\} = \{0, f_n/2\}$ according to (1)), and investigate those of the complex HBF with their passbands (stopbands) centred at $f_c = c \cdot f_n/8$, $c = 1, 2, 3, 5, 6, 7$, which require roughly the same amount of computation as their real HBF prototype ($f_c = f_0 = 0$). In particular, we derive the most efficient elementary SFG for sample rate alteration. These will be given both for LP FIR [Göckler (1996b)] and MP IIR HBF, for real- and complex-valued input and/or output signals, respectively. The expenditure of all eight versions of HBF according to (1) is determined and thoroughly compared. The organisation of Section 2 is as follows: First, we recall the properties of both classes of the aforementioned real HBF, the linear-phase (LP) FIR and the minimum-phase (MP) IIR approaches. The efficient multirate implementations presented are based on the polyphase decomposition of the filter transfer functions [Bellanger (1989); Göckler & Groth (2004); Mitra (1998); Vaidyanathan (1993)]. Next, we present the corresponding results on complex HBF (CHBF), the classical HT, obtained by shifting a real HBF to a centre frequency according to (2).

Real halfband filters (RHBF)
In this subsection we recall the essentials of LP FIR and MP IIR lowpass HBF with real-valued impulse responses $h(k) = h_k \longleftrightarrow H(z)$, where $H(z)$ represents the associated z-transform transfer function. From such a lowpass (prototype) HBF a corresponding real highpass HBF is readily derived by using the modulation property of the z-transform [Oppenheim & Schafer (1989)] and setting $c = 4$ in accordance with (1). This replaces $h(k)$ with $(-1)^k h(k) \longleftrightarrow H(-z)$ and results in a frequency shift by $f_4 = f_n/2$ ($\Omega_4 = \pi$).

Linear-Phase (LP) FIR filters
Throughout this Section 2 we describe a real LP FIR (lowpass) filter by its non-causal impulse response with its centre of symmetry located at the time or sample index $k = 0$ according to

$$h(-k) = h(k) \quad \forall k, \qquad (5)$$

where the associated frequency response $H(e^{j\Omega}) \in \mathbb{R}$ is zero-phase [Mitra & Kaiser (1993); Oppenheim & Schafer (1989)].

Specification and properties
A real zero-phase (LP) lowpass HBF, also called Nyquist(2) filter [Mitra & Kaiser (1993)], is specified in the frequency domain as shown in Fig. 5, for instance, for an equiripple or constrained least squares design, respectively, allowing for a don't care transition band between passband and stopband [Mintzer (1982); Mitra & Kaiser (1993); Schüssler & Steffen (1998)]. Passband and stopband constraints $\delta_p = \delta_s = \delta$ are identical, and for the cut-off frequencies we have the relationship:

$$\Omega_p + \Omega_s = \pi. \qquad (6)$$

As a result, the zero-phase desired function $D(e^{j\Omega}) \in \mathbb{R}$ as well as the frequency response $H(e^{j\Omega}) \in \mathbb{R}$ are centrosymmetric about $D(e^{j\pi/2}) = H(e^{j\pi/2}) = \frac{1}{2}$. From this frequency domain symmetry property immediately follows

$$H(e^{j\Omega}) + H(e^{j(\Omega - \pi)}) = 1, \qquad (7)$$

indicating that this type of halfband filter is strictly complementary [Schüssler & Steffen (1998)]. According to (5), a real zero-phase FIR HBF has a symmetric impulse response of odd length $N = n + 1$ (denoted as type I filter in [Mitra & Kaiser (1993)]), where $n$ represents the even filter order. In case of a minimal (canonic) monorate filter implementation, $n$ is identical to the minimum number $n_{mc}$ of delay elements required for realisation, where $n_{mc}$ is known as the McMillan degree [Vaidyanathan (1993)]. Due to the odd symmetry of the HBF zero-phase frequency response about the transition region (don't care band according to Fig. 5), roughly every other coefficient of the impulse response is zero [Mintzer (1982); Schüssler & Steffen (1998)], resulting in the additional filter length constraint:

$$N = n + 1 = 4m - 1, \quad m \in \mathbb{N}. \qquad (8)$$

Hence, the non-causal impulse response of a real zero-phase FIR HBF is characterized by [Bellanger et al. (1974); Göckler & Groth (2004); Mintzer (1982); Schüssler & Steffen (1998)]:

$$h(k) = h_k = \begin{cases} \frac{1}{2}, & k = 0 \\ 0, & k = 2l, \; l = \pm 1, \pm 2, \dots \\ h(-k), & k = 2l - 1, \end{cases} \qquad -\frac{n}{2} \le k \le \frac{n}{2}, \qquad (9)$$

giving rise to efficient implementations. Note that the name Nyquist(2) filter is justified by the zero coefficients of the impulse response (9).
Moreover, if an HBF is used as an anti-imaging filter of an interpolator for upsampling by two, the coefficients (9) are scaled by the upsampling factor of two, replacing the central coefficient with $h_0 = 1$ [Fliege (1993); Göckler & Groth (2004); Mitra (1998)]. As a result, independently of the application, this coefficient never contributes to the computational burden of the filter.
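That the strict complementarity (7) holds for any coefficient set with the halfband pattern (centre coefficient 1/2, zero even-indexed coefficients, symmetric odd-indexed coefficients), irrespective of how the odd-indexed coefficients were designed, can be checked numerically. The following is a minimal Python sketch under stated assumptions: the helper name `zero_phase_response` and the values in `h_odd` are hypothetical illustrations, not taken from a design in this chapter.

```python
import math

def zero_phase_response(h_odd, omega):
    """Zero-phase response of an HBF with h(0) = 1/2, zero even-indexed taps
    and symmetric odd-indexed taps h(k) = h(-k) = h_odd[k], k = 1, 3, 5, ..."""
    return 0.5 + sum(2.0 * hk * math.cos(k * omega) for k, hk in h_odd.items())

# Hypothetical odd-indexed coefficients (illustrative values, not a real design).
h_odd = {1: 0.31, 3: -0.09, 5: 0.03}

test_frequencies = (0.0, 0.4, 1.1, math.pi / 2, 2.5)
sums = [
    zero_phase_response(h_odd, w) + zero_phase_response(h_odd, w - math.pi)
    for w in test_frequencies
]
for w, s in zip(test_frequencies, sums):
    print(f"Omega = {w:.3f}: H(Omega) + H(Omega - pi) = {s:.12f}")
```

Since cos(k(Ω − π)) = −cos(kΩ) for odd k, the odd-indexed terms cancel in the sum and only twice the centre coefficient remains, which is why the result does not depend on the chosen values.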

Design outline
Assuming an ideal lowpass desired function consistent with the specification of Fig. 5 with a cut-off frequency of $\Omega_t = (\Omega_p + \Omega_s)/2 = \pi/2$ and zero transition bandwidth, and minimizing the integral squared error, yields the coefficients [Göckler & Groth (2004); Parks & Burrus (1987)] in compliance with (9):

$$h(k) = \frac{1}{2} \, \frac{\sin(k\pi/2)}{k\pi/2}, \quad |k| = 1, 2, \dots, \frac{n}{2}, \qquad h(0) = \frac{1}{2}. \qquad (10)$$

This least squares design is optimal for multirate HBF in conjunction with spectrally white input signals since, e.g. in case of decimation, the overall residual power aliased by downsampling onto the usable signal spectrum is minimum [Göckler & Groth (2004)]. To master the Gibbs phenomenon connected with (10), a centrosymmetric smoothed desired function can be introduced in the transition region [Parks & Burrus (1987)]. Requiring, for instance, a transition band of width $\Delta\Omega = \Omega_s - \Omega_p > 0$ and using spline transition functions for $D(e^{j\Omega})$, the above coefficients (10) are modified according to (11) [Göckler & Groth (2004); Parks & Burrus (1987)]. Least squares design can also be subjected to constraints that confine the maximum deviation from the desired function: the Constrained Least Squares (CLS) design [Evangelista (2001); Göckler & Groth (2004)]. This approach has also been applied efficiently to the design of high-order LP FIR filters with quantized coefficients [Evangelista (2002)]. Subsequently, all comparisons are based on equiripple designs obtained by minimization of the maximum deviation $\max |H(e^{j\Omega}) - D(e^{j\Omega})| \; \forall \Omega$ on the region of support according to [McClellan et al. (1973)]. To this end, we briefly recall the clever use of this minimax design procedure in order to obtain the exact values of the predefined (centre and zero) coefficients of (9), as proposed in [Vaidyanathan & Nguyen (1987)]: To design a two-band HBF of even order $n = N - 1 = 4m - 2$, as specified in Fig. 5, start with i) designing a single-band zero-phase FIR filter $g(k) \longleftrightarrow G(z)$ of odd order $n/2 = 2m - 1$ for a passband cut-off frequency of $2\Omega_p$ which, as a type II filter [Mitra & Kaiser (1993)], has a centrosymmetric zero-phase frequency response about $G(e^{j\pi}) = 0$, ii) upsample the impulse response $g(k)$ by two by inserting between any pair of coefficients an additional zero coefficient (without actually changing the sample rate), which yields an interim filter impulse response $h'(k) \longleftrightarrow H'(z^2)$ of the desired odd length $N$ with a centrosymmetric frequency response about $H'(e^{j\pi/2}) = 0$ [Göckler & Groth (2004); Vaidyanathan (1993)], iii) lift the passband (stopband) of $H'(e^{j\Omega})$ to 2 (0) by replacing the zero centre coefficient with $2h(0) = 1$, and iv) scale the coefficients of the final impulse response $h(k) \longleftrightarrow H(z)$ with $\frac{1}{2}$.
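Steps ii) to iv) of this procedure amount to a few lines of code once the single-band filter g(k) of step i) is available. The Python sketch below assumes that g(k) has already been produced by some minimax design routine; the function name `halfband_from_type2` and the numerical values of `g` are hypothetical placeholders, not an actual equiripple design.

```python
def halfband_from_type2(g):
    """Turn a symmetric type II filter g (even number of taps, odd order n/2)
    into a halfband filter of order n by steps ii) to iv)."""
    if len(g) % 2 != 0:
        raise ValueError("g must have an even number of taps (type II filter)")
    # ii) upsample by two: insert one zero between any pair of coefficients
    interim = []
    for i, gk in enumerate(g):
        interim.append(gk)
        if i < len(g) - 1:
            interim.append(0.0)
    # iii) replace the zero centre coefficient with 2 h(0) = 1
    centre = len(interim) // 2
    interim[centre] = 1.0
    # iv) scale all coefficients with 1/2
    return [0.5 * c for c in interim]

# Hypothetical type II prototype with 2m = 6 taps (m = 3), i.e. n = 4m - 2 = 10.
g = [0.06, -0.19, 0.63, 0.63, -0.19, 0.06]
h = halfband_from_type2(g)
print(h)
```

The result has the odd length N = 2 · 6 − 1 = 11, a centre coefficient of exactly 1/2 and zeros at all other even distances from the centre, as required by (9).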

Efficient implementations
Monorate FIR filters are commonly realized by using one of the direct forms [Mitra (1998)]. In our case of an LP HBF, minimum expenditure is obtained by exploiting coefficient symmetry, as is well known [Mitra & Kaiser (1993); Oppenheim & Schafer (1989)]. The count of operations or hardware required, respectively, is included below in Table 1 (column MoR). Note that the "multiplication" by the central coefficient $h_0$ does not contribute to the overall expenditure. The minimal implementation of an LP HBF decimator (interpolator) for twofold down(up)sampling is based on the decomposition of the HBF transfer function into two (type 1) polyphase components [Bellanger (1989); Göckler & Groth (2004); Vaidyanathan (1993)]:

$$H(z) = E_0(z^2) + z^{-1} E_1(z^2). \qquad (12)$$

In the case of decimation, downsampling of the output signal (cf. upper branch of Fig. 1) is shifted from filter output to system input by exploiting the noble identities [Göckler & Groth (2004); Vaidyanathan (1993)], as shown in Fig. 6(a). As a result, all operations (including delay and its control) can be performed at the reduced (decimated) output sample rate $f_d = f_n/2$. In Fig. 6(b), the input demultiplexer of Fig. 6(a) is replaced with a commutator where, for consistency, the shimming delay $z_d^{-1/2} := z^{-1}$ must be introduced [Göckler & Groth (2004)]. As an example, in Fig. 7(a) an optimum, causal real LP FIR HBF decimator of order $n = 10$ for twofold downsampling is recalled [Bellanger et al. (1974)]. Here, the odd-numbered coefficients of (9) are assigned to the zeroth polyphase component $E_0(z_d)$ of Fig. 6(b), whereas the only non-zero even-numbered coefficient $h_0$ belongs to $E_1(z_d)$. For implementation we assume a digital signal processor as a hardware platform. Hence, the overall computational load of its arithmetic unit is given by the total number of operations $N_{Op} = N_M + N_A$, comprising multiplication (M) and addition (A), times the operational clock frequency $f_{Op}$ [Göckler & Groth (2004)].
All contributions to the expenditure are listed in Table 1 as a function of the filter order $n$, where the McMillan degree includes the shimming delays. Obviously, both coefficient symmetry ($N_M < n/2$) and the minimum memory property ($n_{mc} < n$ [Bellanger (1989); Fliege (1993); Göckler & Groth (2004)]) are exploited concurrently. (Note that, according to [Göckler & Groth (2004)], for Nyquist(M) filters with $M > 2$ only either coefficient symmetry or the minimum memory property can be exploited.) The application of the multirate transposition rules to the optimum decimator according to Fig. 7(a), as detailed in Section 3 and [Göckler & Groth (2004)], yields the optimum LP FIR HBF interpolator, as depicted in Fig. 6(c) and Fig. 7(b), respectively. Table 1 shows that the interpolator obtained by transposition requires less memory than that published in [Bellanger (1989); Bellanger et al. (1974)].

Table 1. Expenditure of real LP FIR HBF as a function of the filter order $n$

| | MoR: $f_{Op} = f_n$ | Dec: $f_{Op} = f_n/2$ | Int: $f_{Op} = f_n/2$ |
|---|---|---|---|
| $n_{mc}$ | $n$ | $n/2 + 1$ | $n/2 + 1$ |
| $N_M$ | $(n + 2)/4$ | $(n + 2)/4$ | $(n + 2)/4$ |
| $N_A$ | $n/2 + 1$ | $n/2 + 1$ | $n/2$ |
| $N_{Op}$ | $3n/4 + 3/2$ | $3n/4 + 3/2$ | $3n/4 + 1/2$ |
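The saving obtained by moving the downsampler in front of the filter and exploiting coefficient symmetry can be illustrated by comparing a direct implementation (filter at the input rate, then discard every other output) with a polyphase implementation that computes only the retained samples. The sketch below is a minimal Python illustration under stated assumptions: the function names and the tap values in `c` (a causal order-10 halfband filter with centre tap 1/2) are hypothetical and not taken from Fig. 7(a).

```python
import math

def direct_decimator(c, x):
    """Filter x with the causal taps c at the input rate, keep even-indexed outputs."""
    return [
        sum(c[i] * x[k - i] for i in range(len(c)) if k - i >= 0)
        for k in range(0, len(x), 2)
    ]

def polyphase_decimator(c, x):
    """Compute only the retained outputs, skipping the zero taps and
    pre-adding the input samples that share a symmetric coefficient."""
    n = len(c) - 1          # filter order, n = 4m - 2
    half = n // 2           # index of the centre tap (odd for a halfband filter)

    def sample(i):
        return x[i] if 0 <= i < len(x) else 0.0

    outputs, mults = [], 0
    for k in range(0, len(x), 2):
        # Centre tap 1/2: a binary shift in hardware, not counted as a multiplication.
        acc = c[half] * sample(k - half)
        for i in range(0, half, 2):          # non-zero taps left of the centre
            acc += c[i] * (sample(k - i) + sample(k - (n - i)))
            mults += 1
        outputs.append(acc)
    return outputs, mults

# Hypothetical causal halfband taps of order n = 10 (illustrative values).
c = [0.03, 0.0, -0.095, 0.0, 0.315, 0.5, 0.315, 0.0, -0.095, 0.0, 0.03]
x = [math.sin(0.3 * i) + 0.5 * math.cos(1.7 * i) for i in range(40)]

y_ref = direct_decimator(c, x)
y_poly, mults = polyphase_decimator(c, x)
max_err = max(abs(a - b) for a, b in zip(y_ref, y_poly))
print(f"max deviation: {max_err:.2e}, multiplications per output: {mults // len(y_poly)}")
```

For n = 10 the inner loop runs three times per output sample, which matches the multiplier count (n + 2)/4 listed in Table 1.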

Minimum-Phase (MP) IIR filters
In contrast to FIR HBF, we describe an MP IIR HBF always by its transfer function H(z) in the z-domain.

Specification and properties
The magnitude response of an MP IIR lowpass HBF is specified in the frequency domain by $|D(e^{j\Omega})|$, as shown in Fig. 8, again for a minimax or equiripple design. The constraints of the designed magnitude response $|H(e^{j\Omega})|$ are characterized by the passband and stopband deviations, $\delta_p$ and $\delta_s$, which according to [Lutovac et al. (2001); Schüssler & Steffen (1998)] are related by

$$(1 - \delta_p)^2 + \delta_s^2 = 1. \qquad (13)$$

The cut-off frequencies of the IIR HBF satisfy the symmetry condition (6), and the squared magnitude response $|H(e^{j\Omega})|^2$ is centrosymmetric about $|D(e^{j\pi/2})|^2 = |H(e^{j\pi/2})|^2 = \frac{1}{2}$. We consider real MP IIR lowpass HBF of odd order $n$.
$H(z)$ has a single pole at the origin of the z-plane, $(n - 1)/2$ complex-conjugate pole pairs on the imaginary axis within the unit circle, and all zeros on the unit circle [Schüssler & Steffen (2001)]. Hence, the odd-order MP IIR HBF is suitably realized by a parallel connection of two allpass polyphase sections as expressed by

$$H(z) = \frac{1}{2} \left[ A_0(z^2) + z^{-1} A_1(z^2) \right], \qquad (16)$$

where the allpass polyphase components can be derived by alternating assignment of adjacent complex-conjugate pole pairs of the IIR HBF to the polyphase components. The polyphase components $A_l(z^2)$, $l = 0, 1$, consist of cascade connections of second-order allpass sections:

$$A_l(z^2) = \prod_{i = l, \, l+2, \, \dots} \frac{a_i + z^{-2}}{1 + a_i z^{-2}}, \qquad (17)$$

where the coefficients $a_i$, $i = 0, 1, \dots, \left(\frac{n-1}{2} - 1\right)$, with $a_i < a_{i+1}$, denote the squared moduli of the HBF complex-conjugate pole pairs in ascending order; the complete set of $n$ poles is given by $\{0, \pm j\sqrt{a_0}, \pm j\sqrt{a_1}, \dots, \pm j\sqrt{a_{\frac{n-1}{2} - 1}}\}$ [Mitra (1998)].
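The parallel allpass form makes the power symmetry about Ω = π/2 a structural property that holds for any real coefficients. The following Python sketch evaluates the transfer function as one half of the sum of two allpass branches, the second one preceded by a unit delay, with the poles assigned alternately to the branches. It is a sketch under assumptions: the function names are hypothetical, and the coefficient values in `a` (order n = 5) are illustrative rather than the result of an elliptic design.

```python
import cmath
import math

def allpass_branch(coeffs, z):
    """Cascade of second-order allpass sections (a + z^-2) / (1 + a z^-2)."""
    zm2 = z ** -2
    value = 1.0 + 0.0j
    for a_i in coeffs:
        value *= (a_i + zm2) / (1.0 + a_i * zm2)
    return value

def iir_hbf(a, z):
    """H(z) = 1/2 [A0(z^2) + z^-1 A1(z^2)] with alternating pole assignment."""
    return 0.5 * (allpass_branch(a[0::2], z) + allpass_branch(a[1::2], z) / z)

a = [0.12, 0.55]    # hypothetical squared pole moduli, a_0 < a_1, order n = 5

power_sums = []
for omega in (0.2, 0.9, 1.4, math.pi / 2):
    p_low = abs(iir_hbf(a, cmath.exp(1j * omega))) ** 2
    p_high = abs(iir_hbf(a, cmath.exp(1j * (math.pi - omega)))) ** 2
    power_sums.append(p_low + p_high)
    print(f"Omega = {omega:.3f}: |H(Omega)|^2 + |H(pi - Omega)|^2 = {p_low + p_high:.12f}")

dc_gain = abs(iir_hbf(a, 1.0 + 0.0j))
nyquist_gain = abs(iir_hbf(a, -1.0 + 0.0j))
```

Because both branches have unit magnitude on the unit circle, the two squared magnitudes always add up to one; the coefficients only determine how steeply the response moves from the passband to the stopband.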

Design outline
In order to compare MP IIR and LP FIR HBF, we subsequently consider elliptic filter designs.
Since an elliptic (minimax) HBF transfer function satisfies the conditions (6) and (13), the design result is uniquely determined by specifying the passband $\Omega_p$ (stopband $\Omega_s$) cut-off frequency and one of the three remaining parameters: the odd filter order $n$, the allowed minimal stopband attenuation $A_s = -20\log(\delta_s)$, or the allowed maximum passband attenuation $A_p = -20\log(1 - \delta_p)$. There are two common approaches to elliptic HBF design. The first group of methods is performed in the analogue frequency domain and is based on classical analogue filter design techniques: The desired magnitude response $|D(e^{j\Omega})|$ of the elliptic HBF transfer function $H(z)$ to be designed is mapped onto an analogue frequency domain by applying the bilinear transformation [Mitra (1998); Oppenheim & Schafer (1989)]. The magnitude response of the analogue elliptic filter is approximated by appropriate iterative procedures to satisfy the design requirements [Ansari (1985); Schüssler & Steffen (1998; 2001); Valenzuela & Constantinides (1983)]. Finally, the analogue filter transfer function is remapped to the z-domain by the bilinear transformation. The other group of algorithms starts from an elliptic HBF transfer function, as given by (17). The filter coefficients $a_i$, $i = 0, 1, \dots, \left(\frac{n-1}{2} - 1\right)$, are obtained by iterative nonlinear optimization techniques minimizing the peak stopband deviation. For a given transition bandwidth, the maximum deviation is minimized e.g. by the Remez exchange algorithm or by Gauss-Newton methods [Valenzuela & Constantinides (1983); Zhang & Yoshikawa (1999)]. For the particular class of elliptic HBF with minimal Q-factor, closed-form equations for calculating the exact values of stopband and passband attenuation are known, allowing for straightforward designs if the cut-off frequencies and the filter order are given [Lutovac et al. (2001)].

Efficient implementation
In case of a monorate filter implementation, the McMillan degree $n_{mc}$ is equal to the filter order $n$. Having the same hardware prerequisites as in the previous subsection on FIR HBF, the computational load of hardware operations per output sample is given in Table 2 (column MoR). Note that multiplication by a factor of 0.5 does not contribute to the overall expenditure. In the general decimating structure, as shown in Fig. 9(a), decimation is performed by an input commutator in conjunction with a shimming delay according to Fig. 6(b). By the underlying exploitation of the noble identities [Göckler & Groth (2004); Vaidyanathan (1993)], the cascaded second-order allpass sections of the transfer function (17) in Fig. 9(a) operate at the reduced output sampling rate $f_d = f_n/2$, and the McMillan degree $n_{mc}$ is almost halved. The optimum interpolating structure is readily derived from the decimator by applying the multirate transposition rules (cf. Section 3 and [Göckler & Groth (2004)]). Computational complexity is presented in Table 2, also indicating the respective operational rates $f_{Op}$ for the $N_{Op}$ arithmetical operations. Elliptic filters also allow for multiplierless implementations with small quantization error, or implementations with a reduced number of shift-and-add operations in multipliers [Lutovac & Milic (1997; 2000); Milic (2009)].
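The effect of the noble identities on the allpass sections can be made concrete: a second-order section in z at the input rate becomes a first-order section in $z_d$ at the output rate, fed by the even-indexed or the delayed odd-indexed input samples. The Python sketch below compares both variants. It rests on assumptions: the function names are hypothetical, the coefficient values are illustrative, and the one-multiplier recursion used for the output-rate section is one common allpass form, not necessarily the SFG of Fig. 9.

```python
import math

def allpass_z2(a, x):
    """Input-rate section (a + z^-2) / (1 + a z^-2), zero initial state."""
    y = []
    for k, xk in enumerate(x):
        x2 = x[k - 2] if k >= 2 else 0.0
        y2 = y[k - 2] if k >= 2 else 0.0
        y.append(a * xk + x2 - a * y2)
    return y

def allpass_zd(a, x):
    """Output-rate section (a + z_d^-1) / (1 + a z_d^-1) with a single multiplier."""
    y, x1, y1 = [], 0.0, 0.0
    for xk in x:
        yk = a * (xk - y1) + x1
        y.append(yk)
        x1, y1 = xk, yk
    return y

def monorate_then_downsample(a, x):
    """Reference: both branches run at the input rate, then every other output is kept."""
    upper = list(x)
    lower = [0.0] + list(x[:-1])            # branch delay z^-1
    for a_i in a[0::2]:
        upper = allpass_z2(a_i, upper)
    for a_i in a[1::2]:
        lower = allpass_z2(a_i, lower)
    return [0.5 * (upper[k] + lower[k]) for k in range(0, len(x), 2)]

def polyphase_decimator(a, x):
    """Input commutator with shimming delay, all sections at the output rate."""
    upper = list(x[0::2])                               # x(2m)
    lower = ([0.0] + list(x[1::2]))[: len(upper)]       # x(2m - 1)
    for a_i in a[0::2]:
        upper = allpass_zd(a_i, upper)
    for a_i in a[1::2]:
        lower = allpass_zd(a_i, lower)
    return [0.5 * (u + v) for u, v in zip(upper, lower)]

a = [0.12, 0.55]                                        # hypothetical, n = 5
x = [math.sin(0.3 * i) + 0.5 * math.cos(1.7 * i) for i in range(40)]
y_ref = monorate_then_downsample(a, x)
y_dec = polyphase_decimator(a, x)
max_err = max(abs(u - v) for u, v in zip(y_ref, y_dec))
print(f"max deviation between both implementations: {max_err:.2e}")
```

In the decimating variant each allpass section holds state at the output rate only and is evaluated once per output sample, which reflects the reduction in memory and operation rate described above.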

Comparison of real FIR and IIR HBF
A comparison of Tables 1 and 2 shows that $N_{Op}^{FIR} < N_{Op}^{IIR}$ for the same filter order $n$, where all operations are performed at the operational rate $f_{Op}$, as given in these tables. Since, however, the filter order $n_{IIR} < n_{FIR}$ or even $n_{IIR} \ll n_{FIR}$ for any type of approximation, the computational load of an MP IIR HBF is generally smaller than that of an LP FIR HBF, as is well known [Lutovac et al. (2001); Schüssler & Steffen (1998)]. The relative computational advantage of equiripple minimax designs of monorate IIR halfband filters and polyphase decimators [Parks & Burrus (1987)], respectively, is depicted in Fig. 10 where, in extension to [Lutovac et al. (2001)], the expenditure $N_{Op}$ is indicated as a parameter along with the filter order $n$. Note that the IIR and FIR curves of the lowest-order filters differ by just one operation despite the LP property of the FIR HBF. A specification of a design example is deduced from Fig. 10: $n_{IIR} = 5$ and $n_{FIR} = 14$, respectively, with a passband cut-off frequency of $f_p = 0.1769 f_n$ at the intersection point of the associated expenditure curves (see Fig. 11). As a result, the stopband attenuations of both filters are the same (cf. Fig. 10). In addition, for both designs the typical pole-zero plots are shown [Schüssler & Steffen (1998; 2001)]. From the point of view of expenditure, the MP IIR HBF decimator ($N_{Op} = 9$, $n_{mc} = 3$) outperforms its LP FIR counterpart ($N_{Op} = 12$, $n_{mc} = 8$).
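The FIR figures quoted for the design example can be reproduced from the order-dependent expressions of Table 1. The small Python helper below is a sketch under assumptions: the function name `fir_hbf_expenditure` and the mode labels are hypothetical, and the column assignment follows a reading of Table 1 in which the monorate and decimator columns share the adder count n/2 + 1 while the interpolator needs n/2 adders.

```python
def fir_hbf_expenditure(n, mode):
    """Expenditure of a real LP FIR HBF of order n = 4m - 2.
    mode: 'MoR' (monorate), 'Dec' (decimator) or 'Int' (interpolator)."""
    if mode not in ("MoR", "Dec", "Int"):
        raise ValueError("mode must be 'MoR', 'Dec' or 'Int'")
    if n % 4 != 2:
        raise ValueError("halfband filter order must satisfy n = 4m - 2")
    n_mc = n if mode == "MoR" else n // 2 + 1     # delays incl. shimming delays
    n_mult = (n + 2) // 4                         # centre tap is not counted
    n_add = n // 2 if mode == "Int" else n // 2 + 1
    return {"n_mc": n_mc, "N_M": n_mult, "N_A": n_add, "N_Op": n_mult + n_add}

fir_example = fir_hbf_expenditure(14, "Dec")
print(fir_example)
```

For n = 14 the decimator entry gives 12 operations and 8 delays, in agreement with the values stated for the LP FIR counterpart of the design example.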

Linear-Phase (LP) FIR filters
In the FIR CHBF case the frequency shift operation (3) is applied directly to the impulse response $h(k)$ in the time domain. As a result of the modulation of the impulse response (9) of any real LP HBF on a carrier of frequency $f_2$ according to (18), the complex-valued CHBF impulse response

$$\underline{h}(k) = h(k) \, e^{jk\pi/2} = j^k h(k) \qquad (19)$$

is obtained. (Underlining indicates complex quantities in the time domain.) By directly evaluating (19) and relating the result to (9), we get:

$$\underline{h}(k) = \begin{cases} \frac{1}{2}, & k = 0 \\ 0, & k = 2l, \; l = \pm 1, \pm 2, \dots \\ j^k h(k), & k = 2l - 1, \end{cases} \qquad (20)$$

where, in contrast to (5), the imaginary part of the impulse response is skew-symmetric about zero, as is expected from a Hilbert-Transformer. Note that the centre coefficient $h_0$ is still real, whilst all other coefficients are purely imaginary rather than generally complex-valued.
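The coefficient pattern of the CHBF follows directly from multiplying a real halfband prototype by powers of j. The Python sketch below performs this modulation and checks the resulting passband position. It is an illustration under assumptions: the function names are hypothetical, the prototype values in `h` are illustrative (zero-valued even-indexed coefficients are simply omitted), and the modulation is taken to be $h(k)\,e^{jk\pi/2}$ for a shift towards positive frequencies.

```python
import cmath
import math

J_POWERS = (1 + 0j, 1j, -1 + 0j, -1j)       # j**k for k mod 4 = 0, 1, 2, 3

def chbf_coefficients(h):
    """Modulate real zero-phase taps h(k) by exp(j k pi/2) = j**k (exact, no rounding)."""
    return {k: J_POWERS[k % 4] * hk for k, hk in h.items()}

def response(taps, omega):
    """Frequency response of non-causal taps given as {k: value}."""
    return sum(c * cmath.exp(-1j * k * omega) for k, c in taps.items())

# Hypothetical real halfband prototype of order n = 10 (non-zero taps only).
h = {0: 0.5}
for k, hk in ((1, 0.315), (3, -0.095), (5, 0.03)):
    h[k] = h[-k] = hk

hc = chbf_coefficients(h)
for k in sorted(hc):
    print(k, hc[k])

gain_at_centre = abs(response(hc, math.pi / 2))      # new passband centre f_n/4
gain_opposite = abs(response(hc, -math.pi / 2))      # new stopband centre
```

With these values the centre coefficient stays real, every other non-zero coefficient is purely imaginary with opposite signs at k and −k, and the gain of the prototype at Ω = 0 reappears at Ω = π/2.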

Specification and properties
All properties of the real HBF are basically retained, except for those that are affected by the frequency shift operation of (18). This applies to the filter specification depicted in Fig. 5 and, hence, (6) is modified accordingly, with $\Omega_{p+}$ representing the upper passband cut-off frequency and $\Omega_{s-}$ the associated stopband cut-off frequency. Obviously, strict complementarity (7) is retained as follows:

$$H(e^{j(\Omega - \pi/2)}) + H(e^{j(\Omega - 3\pi/2)}) = 1, \qquad (23)$$

where (3) is applied in the frequency domain.

Efficient implementations
The optimum implementation of an LP FIR CHBF of order $n = 10$ for twofold downsampling is again based on the polyphase decomposition of (20) according to (12). Its SFG, which exploits the odd symmetry of the HT part of the system, is depicted in Fig. 12(a). Note that all imaginary units are included deliberately. Hence, the optimal FIR CHBF interpolator according to Fig. 12(b), which is derived from the original decimator of Fig. 12(a) by applying the multirate transposition rules [Göckler & Groth (2004)], performs the dual operation with respect to the underlying decimator. Since, however, an LP FIR CHBF is strictly rather than power complementary (cf. (23)), the inverse functionality of the decimator is only approximated [Göckler & Groth (2004)].
In addition, Fig. 13 shows the optimum SFG of an LP FIR CHBF for decimation of a complex signal by a factor of two. In essence, it represents a doubling of the SFG of Fig. 12(a). Again, the dual interpolator is readily derived by transposition of multirate systems, as outlined in Section 3. The expenditure of the half- (R ⇋ C) and the full-complex (C → C) CHBF decimators and their transposes is listed in Table 3. A comparison of Tables 1 and 3 shows that the overall numbers of operations $N_{Op}^{CFIR}$ of the half-complex CHBF sample rate converters (cf. Fig. 12) are almost the same as those of the real FIR HBF systems depicted in Fig. 7. Only the number of delays is, for obvious reasons, higher in the case of CHBF.

Minimum-Phase (MP) IIR filters
In the IIR CHBF case the frequency shift operation (3) is applied in the z-domain. Using (18), this is achieved by substituting the complex z-domain variable in the respective transfer functions $H(z)$ and all corresponding SFG according to:

$$z := z \, e^{-j\pi/2} = -jz. \qquad (24)$$

Efficient implementations
Introducing (24) into (16) performs a frequency shift of the transfer function $H(z)$ by $f_2 = f_n/4$ ($\Omega_2 = \pi/2$):

$$H(z \, e^{-j\pi/2}) = \frac{1}{2} \left[ A_0(-z^2) + j z^{-1} A_1(-z^2) \right].$$

The optimum general block structure of a decimating MP IIR HT, being up-scaled by 2, is shown in Fig. 14(a) along with the SFG of the 1st (system theoretic 2nd) order allpass sections (b), where the noble identities [Göckler & Groth (2004); Vaidyanathan (1993)] are exploited. By doubling this structure, as depicted in Fig. 15, the IIR CHBF for decimating a complex signal by two is obtained. Multirate transposition [Göckler & Groth (2004)] can again be applied to derive the corresponding dual structures for interpolation. The expenditure of the half- (R ⇋ C) and the full-complex (C → C) CHBF decimators and their transposes is listed in Table 4. A comparison of Tables 2 and 4 shows that, basically, the half-complex IIR CHBF sample rate converters (cf. Fig. 14) require almost the same expenditure as the real IIR HBF systems depicted in Fig. 9.
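The reason why the half-complex IIR CHBF costs hardly more than the real IIR HBF is that the substitution leaves the allpass coefficients real: only the sign of $z^{-2}$ changes and the branch delay acquires a factor j. The Python sketch below verifies this numerically. It is a sketch under assumptions: the function names are hypothetical, the coefficients in `a` are illustrative, the shift is modelled as $z \to z\,e^{-j\pi/2} = -jz$, and the factor 1/2 is kept here although the structure of Fig. 14(a) is described as up-scaled by 2.

```python
import cmath
import math

def branch(coeffs, zm2):
    """Cascade of allpass sections (a + zm2) / (1 + a zm2) for a given value of z^-2."""
    value = 1.0 + 0.0j
    for a_i in coeffs:
        value *= (a_i + zm2) / (1.0 + a_i * zm2)
    return value

def iir_hbf(a, z):
    """Real prototype: 1/2 [A0(z^2) + z^-1 A1(z^2)]."""
    return 0.5 * (branch(a[0::2], z ** -2) + branch(a[1::2], z ** -2) / z)

def iir_chbf_by_substitution(a, z):
    """Frequency shift by pi/2: evaluate the prototype at z exp(-j pi/2) = -j z."""
    return iir_hbf(a, -1j * z)

def iir_chbf_explicit(a, z):
    """Same filter with real section coefficients: z^-2 -> -z^-2 and z^-1 -> j z^-1."""
    zm2 = -(z ** -2)
    return 0.5 * (branch(a[0::2], zm2) + 1j * branch(a[1::2], zm2) / z)

a = [0.12, 0.55]                                   # hypothetical, n = 5
deviations = []
for omega in (-2.0, -0.7, 0.3, 1.2, 2.9):
    z = cmath.exp(1j * omega)
    deviations.append(abs(iir_chbf_by_substitution(a, z) - iir_chbf_explicit(a, z)))

passband_gain = abs(iir_chbf_explicit(a, 1j))      # Omega = +pi/2
stopband_gain = abs(iir_chbf_explicit(a, -1j))     # Omega = -pi/2
print(max(deviations), passband_gain, stopband_gain)
```

The explicit form uses the same real multipliers as the prototype, so the additional effort is confined to sign changes and the bookkeeping of real and imaginary signal paths.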

Comparison of FIR and IIR CHBF
As is obvious from the similarity of the corresponding expenditure tables of the previous subsections, the expenditure chart in Fig. 10 can likewise be used for the comparison of CHBF.

Complex Offset Halfband Filters (COHBF)
A complex offset HBF, a Hilbert-Transformer with a frequency offset of $\Delta f = \pm f_n/8$ relative to an RHBF, is readily derived from a real HBF according to Subsection 2.1 by applying the zT modulation theorem (3) with $c \in \{1, 3, 5, 7\}$, as introduced in (2). As a result, the real prototype HBF is shifted to a passband centre frequency of $f_c \in \{\pm f_n/8, \pm 3f_n/8\}$. In the sequel, we predominantly consider the case $f_c = f_1$ ($\Omega_1 = \pi/4$).

Linear-Phase (LP) FIR filters
Again, the frequency shift operation (3) is applied in the time domain. However, in order to get the smallest number of full-complex COHBF coefficients, we introduce an additional complex scaling factor of unity magnitude. As a result, the modulation of a carrier of frequency $f_c$ according to (28) by the impulse response (9) of any real LP FIR HBF yields the complex-valued COHBF impulse response, where $-\frac{n}{2} \le k \le \frac{n}{2}$ and $c = 1, 3, 5, 7$. By directly evaluating (39) for $c = 1$ and relating the result to (9), we obtain the COHBF coefficients where, in contrast to (21), the impulse response exhibits a different symmetry property. Note that the centre coefficient $h_0$ is the only truly complex-valued coefficient where, fortunately, its real and imaginary parts are identical. All other coefficients are again either purely imaginary or real-valued. Hence, the symmetry of the impulse response can still be exploited, and the implementation of an LP FIR COHBF requires just one multiplication more than that of a real or complex HBF [Göckler (1996b)].
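The effect of the additional scaling factor can be illustrated for c = 1. The Python sketch below assumes that the scaling factor is $e^{j\pi/4}$, so that the COHBF coefficients read $h(k)\,e^{j(k+1)\pi/4}$; this particular choice is an assumption made for illustration (it reproduces the coefficient properties stated in the text), and the function name and prototype values are hypothetical.

```python
import cmath
import math

QUARTER_TURNS = (1 + 0j, 1j, -1 + 0j, -1j)     # exp(j q pi/2) for q mod 4

def cohbf_coefficients_c1(h):
    """Assumed COHBF taps for c = 1: h(k) * exp(j (k + 1) pi/4)."""
    taps = {}
    for k, hk in h.items():
        m = k + 1
        if m % 2 == 0:                          # odd k: exact multiple of pi/2
            taps[k] = QUARTER_TURNS[(m // 2) % 4] * hk
        else:                                   # even k: only k = 0 is non-zero
            taps[k] = cmath.exp(1j * math.pi * m / 4) * hk
    return taps

# Hypothetical real halfband prototype of order n = 10 (non-zero taps only).
h = {0: 0.5}
for k, hk in ((1, 0.315), (3, -0.095), (5, 0.03)):
    h[k] = h[-k] = hk

hc = cohbf_coefficients_c1(h)
for k in sorted(hc):
    print(k, hc[k])

centre_gain = abs(sum(t * cmath.exp(-1j * k * math.pi / 4) for k, t in hc.items()))
```

Under this assumption the taps at k and −k have the same magnitude and differ only by a factor of ±j, so each real prototype coefficient is still multiplied only once per signal component, while the centre tap needs a single real multiplier for its identical real and imaginary parts.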

Specification and properties
All properties of the real HBF are basically retained, except for those that are affected by the frequency shift operation according to (28). This applies to the filter specification depicted in Fig. 5 and, hence, (6) is modified accordingly, with $\Omega_{p+}$ representing the upper passband cut-off frequency and $\Omega_{s-}$ the associated stopband cut-off frequency. Obviously, strict complementarity (7) reads as follows:

$$H(e^{j(\Omega - c\pi/4)}) + H(e^{j(\Omega - \pi(1 + c/4))}) = 1.$$

Efficient implementations
The optimum implementation of an LP FIR COHBF of order $n = 10$ for twofold downsampling is again based on the polyphase decomposition of (40). Its SFG, which exploits the coefficient symmetry as given by (41), is depicted in Fig. 16(a). The optimum FIR COHBF interpolator according to Fig. 16(b) is readily derived from the original decimator of Fig. 16(a) by applying the multirate transposition rules, as discussed in Section 3. As a result, the overall expenditure is again retained (cf. the invariance property of transposition [Göckler & Groth (2004)]). In addition, Fig. 17 shows the optimum SFG of an LP FIR COHBF for decimation of a complex signal by a factor of two. It represents essentially a doubling of the SFG of Fig. 16(a). The dual interpolator can be derived by transposition [Göckler & Groth (2004)].
The expenditure of the half- (R ⇋ C) and the full-complex (C → C) LP COHBF decimators and their transposes is listed in Table 5, calling for only one extra multiplication. The number $n_{mc}$ of delays is, however, of the order of $n$, since a (nearly) full delay line is needed both for the real and imaginary parts of the respective signals. Note that the shimming delays are always included in the delay count. (The number of delays required for a monorate COHBF corresponding to Fig. 17 is $2n$.)

Minimum-Phase (MP) IIR filters
In the IIR COHBF case the frequency shift operation (3) is again applied in the z-domain. This is achieved by substituting the complex z-domain variable in the respective transfer functions $H(z)$ and all corresponding SFG according to:

$$z := z \, e^{-j\pi/4}. \qquad (34)$$

Table 6. Expenditure of minimum-phase IIR COHBF; $n$: order, $n_{mc}$: McMillan degree, $N_M$ ($N_A$): number of multipliers (adders), operational clock frequency: $f_{Op} = f_n/2$

| | Dec: R → C | Int: C → R | Dec: C → C | Int: C → C |
|---|---|---|---|---|
| $n_{mc}$ | $n$ | $n$ | $2n$ | $2n$ |
| $N_M$ | $n$ | $n$ | $2n$ | $2n$ |
| $N_A$ | $3(n - 1)$ | $3(n - 1)$ | $6(n - 1) + 2$ | $6(n - 1)$ |
| $N_{Op}$ | $4n - 3$ | $4n - 3$ | $8n - 4$ | $8n - 6$ |

Efficient implementations
Introducing (34) in (16), the transfer function is frequency-shifted by $f_1 = f_n/8$ ($\Omega_1 = \pi/4$):

$$H(z \, e^{-j\pi/4}) = \frac{1}{2} \left[ A_0(-jz^2) + e^{j\pi/4} z^{-1} A_1(-jz^2) \right].$$

The optimal structure of an MP IIR COHBF decimator of order $n = 5$ for real input signals is shown in Fig. 18(a) along with the elementary SFG of the allpass sections in Fig. 18(b). Doubling of the structure according to Fig. 19 allows for full-complex signal processing. Multirate transposition [Göckler & Groth (2004)] is again applied to derive the corresponding dual structure for interpolation. The expenditure of the half- (R ⇋ C) and the full-complex (C → C) COHBF decimators and their transposes is listed in Table 6. A comparison of Tables 2 and 6 shows that the half-complex IIR COHBF sample rate converter (cf. Fig. 18(a)) requires almost twice, whereas the full-complex IIR COHBF (cf. Fig. 19) requires even four times the expenditure of the real IIR HBF system depicted in Fig. 9.
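The stated ratios can be checked against the expressions of Table 6 for the order n = 5 of the example. The Python helper below is a sketch under assumptions: the function name and the mode labels are hypothetical, the column assignment follows a reading of Table 6 in which both half-complex converters share one set of entries, and the reference value of 9 operations for the real IIR HBF decimator of order 5 is the one quoted in the comparison of real FIR and IIR HBF.

```python
def iir_cohbf_expenditure(n, mode):
    """Expenditure of an MP IIR COHBF of odd order n at f_Op = f_n / 2.
    mode: 'half' (Dec R->C or Int C->R), 'dec_full' (Dec C->C), 'int_full' (Int C->C)."""
    if n % 2 != 1:
        raise ValueError("IIR halfband filter order must be odd")
    if mode == "half":
        n_mc, n_mult, n_add = n, n, 3 * (n - 1)
    elif mode == "dec_full":
        n_mc, n_mult, n_add = 2 * n, 2 * n, 6 * (n - 1) + 2
    elif mode == "int_full":
        n_mc, n_mult, n_add = 2 * n, 2 * n, 6 * (n - 1)
    else:
        raise ValueError("unknown mode")
    return {"n_mc": n_mc, "N_M": n_mult, "N_A": n_add, "N_Op": n_mult + n_add}

REAL_IIR_DECIMATOR_OPS = 9      # N_Op of the real MP IIR HBF decimator, n = 5

half = iir_cohbf_expenditure(5, "half")
full = iir_cohbf_expenditure(5, "dec_full")
print(half["N_Op"], half["N_Op"] / REAL_IIR_DECIMATOR_OPS)
print(full["N_Op"], full["N_Op"] / REAL_IIR_DECIMATOR_OPS)
```

For n = 5 this yields 17 operations for the half-complex and 36 operations for the full-complex decimator, i.e. ratios of about 1.9 and 4 relative to the real decimator.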

Comparison of FIR and IIR COHBF
LP FIR COHBF structures allow for implementations that utilize the coefficient symmetry property. Hence, the required expenditure is just slightly higher than that needed for CHBF. On the other hand, the expenditure of MP IIR COHBF is almost twice as high as that of the corresponding CHBF, since it is not possible to exploit memory and coefficient sharing. Almost the whole structure has to be doubled for a full-complex decimator (cf. Fig. 19).

Conclusion: Family of single real and complex halfband filters
We have recalled basic properties and design outlines of linear-phase FIR and minimum-phase IIR halfband filters, predominantly for the purpose of sample rate alteration by a factor of two, which have a passband centre frequency out of the specific set defined by (1). It has been confirmed that, for the even-numbered centre frequencies $c \in \{0, 2, 4, 6\}$, MP IIR HBF outperform their LP FIR counterparts, the more so the tighter the filter specifications. However, for phase-sensitive applications (e.g. software radio employing quadrature amplitude modulation), the LP property of FIR HBF may justify the higher amount of computation to some extent.
In the case of the odd-numbered HBF centre frequencies of (2), $c \in \{1, 3, 5, 7\}$, there exist specification domains where the computational loads of complex FIR HBF with frequency offset range below those of their IIR counterparts. This is confirmed by the two bottom rows of Table 7. This sectoral computational advantage of LP FIR COHBF is, despite $n_{IIR} < n_{FIR}$, due to the fact that these FIR filters still allow for memory sharing in conjunction with the exploitation of coefficient symmetry [Göckler (1996b)]. However, the amount of storage $n_{mc}$ required for IIR HBF is always below that of their FIR counterparts.

Halfband filter pairs 2
In this Section 3, we address a particular class of efficient directional filters (DF). These DF are composed of two real or complex HBF, respectively, of different centre frequencies out of the set given by (1). To this end, we conceptually introduce and investigate two-channel frequency demultiplexer filter banks (FDMUX) that extract from an incoming complex-valued frequency division multiplex (FDM) signal, being composed of up to four uniformly allocated independent user signals of identical bandwidth (cf. Fig. 20), two of its constituents by concurrently reducing the sample rate by two [Göckler & Groth (2004)]. Moreover, the DF shall allow the selection of any pair of user signals out of the four constituents of the incoming FDM signal, where the individual centre frequencies are to be selectable with minimum switching effort. At first glance, there are two optional approaches: The selectable combination of two filter functions out of a pool of i) two RHBF according to Subsection 2.1 and two CHBF (HT), as described in Subsection 2.2, where the centre frequencies of this filter quadruple are given by (1) with $c \in \{0, 2, 4, 6\}$, or ii) four COHBF, as described in Subsection 2.3, where the centre frequencies of this filter quadruple are given by (1) with $c \in \{1, 3, 5, 7\}$. Since centre frequency switching is more crucial in case one (switching between real and/or complex filters), we subsequently restrict our investigations to case two, where the FDM input spectrum must be allocated as shown in Fig. 20. These DF with easily selectable centre frequencies are frequently used in receiver front-ends to meet routing requirements [Göckler (1996c)], in tree-structured FDMUX filter banks [Göckler & Felbecker (2001); Göckler & Groth (2004); Göckler & Eyssele (1992)], and, in modified form, for frequency re-allocation to avoid hard-wired frequency-shifting [Eghbali et al. (2009)].
Efficient implementation is crucial if these DF are operated at high sampling rates at the system input or output port. To cope with this high-rate challenge, we introduce a systematic approach to system parallelisation according to [Groth (2003)] in Section 4. In continuation of the investigations reported in Section 2, we combine two linear-phase (LP) FIR complex offset halfband filters (COHBF) with different centre frequencies, being characterized by (1) and two output signals [Göckler (1996a)]. For convenience, we map the original odd indices $c \in \{1, 3, 5, 7\}$ of the COHBF centre frequencies to natural numbers as defined by $c = 2o + 1$, $o \in \{0, 1, 2, 3\}$, for subsequent use throughout Section 3. Section 3 is organized as follows: In Subsection 3.1, we detail the statement of the problem, and recall the major properties of COHBF needed for our DF investigations. In the main Subsection 3.2, we present and compare two different approaches to implement the outlined LP DF for signal separation with selectable centre frequencies: i) A four-channel uniform complex-modulated FDMUX filter bank undercritically decimating by two, where the respective undesired two output signals are discarded, and ii) a synergetic connection of two COHBF that share common multipliers and exploit coefficient symmetry for minimum computation. In Subsection 3.3, we apply the transposition rules of [Göckler & Groth (2004)] to derive the dual DF for signal combination (FDM multiplexing). Finally, we draw some further conclusions in Subsection 3.4. (38), $o \in \{0, 1, 2, 3\}$, with the RHBF impulse response $h(k)$ defined by (9). According to (39), highest efficiency is obtained by additionally introducing a suitable complex scaling factor of unity magnitude:

Statement of the DF problem
where − N−1 2 ≤ k ≤ N−1 2 and o ∈{0, 1, 2, 3}. By directly equating (39), and relating the result to (9) with a suitable choice of the constant a = 2o + 1 compliant with (29), we get : with the symmetry property: The respective COHBF centre coefficient is the only truly complex-valued coefficient, where its real and imaginary parts always possess identical moduli. All other coefficients are either purely imaginary or real-valued. Obviously, all frequency domain symmetry properties, including also those related to strict complementarity, are retained in the respective frequency-shifted versions, cf. Subsection 2.3.1 and [Göckler & Damjanovic (2006a)].
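The coefficient properties stated above can be checked numerically. The following Python sketch is an illustration only: the Hamming-windowed prototype `hbf_prototype` is a hypothetical stand-in for the RHBF (9), and the modulation-plus-scaling form $h_o(k) = h(k)\,e^{j(k+1)(2o+1)\pi/4}$ is an assumption chosen to agree with a = 2o + 1; it is not copied from (40).

```python
import cmath
import math


def hbf_prototype(N=11):
    """Hypothetical real halfband prototype (Hamming-windowed sinc).

    Returns a dict k -> h(k) for -(N-1)/2 <= k <= (N-1)/2 with
    h(0) = 1/2 and h(k) = 0 for all even k != 0.
    """
    M = (N - 1) // 2
    h = {}
    for k in range(-M, M + 1):
        if k == 0:
            h[k] = 0.5
        elif k % 2 == 0:
            h[k] = 0.0
        else:
            window = 0.54 + 0.46 * math.cos(math.pi * k / (M + 1))
            h[k] = window * math.sin(math.pi * k / 2) / (math.pi * k)
    return h


def cohbf(h, o):
    """Shift the prototype to centre frequency (2o+1)*f_n/8.

    ASSUMPTION: unity-magnitude scaling factor exp(j*(2o+1)*pi/4).
    """
    a = 2 * o + 1
    return {k: hk * cmath.exp(1j * math.pi / 4 * (k + 1) * a)
            for k, hk in h.items()}


def classify(value, tol=1e-12):
    """Label a coefficient as zero, real, imaginary, or truly complex."""
    if abs(value) < tol:
        return "zero"
    if abs(value.imag) < tol:
        return "real"
    if abs(value.real) < tol:
        return "imaginary"
    return "complex"


if __name__ == "__main__":
    proto = hbf_prototype(11)
    for o in range(4):
        coeffs = cohbf(proto, o)
        print(o, [classify(coeffs[k]) for k in sorted(coeffs)])
```

Under these assumptions only the centre coefficient is classified as truly complex, while every other non-zero coefficient is either real or purely imaginary.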

FDMUX approach
Using time-domain convolution, the I = 4 potentially required complex output signals, decimated by 2 and related to the channel indices o ∈{0, 1, 2, 3}, are obtained as follows: where the complex impulse responses of channels o are introduced in causal (realizable) form.
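As a reference point for the polyphase derivation that follows, the decimating convolution can be sketched in plain Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, the impulse response is an arbitrary causal complex sequence, and no halfband structure is exploited. The second function computes the same output from M polyphase branches, each operating on an already decimated input sequence.

```python
def decimating_fir(x, h, M=2):
    """Direct form: y(m) = sum_k h(k) * x(M*m - k), retained samples only."""
    n_out = (len(x) + M - 1) // M
    return [
        sum(h[k] * x[M * m - k] for k in range(len(h)) if 0 <= M * m - k < len(x))
        for m in range(n_out)
    ]


def polyphase_decimating_fir(x, h, M=2):
    """Same result from M polyphase branches running at the low rate."""
    n_out = (len(x) + M - 1) // M
    y = [0j] * n_out
    for p in range(M):
        h_p = h[p::M]  # polyphase component h(M*r + p)
        # Decimated input of branch p: x_p(n) = x(M*n - p), zero before start
        x_p = [x[M * n - p] if M * n - p >= 0 else 0 for n in range(n_out)]
        for m in range(n_out):
            y[m] += sum(h_p[r] * x_p[m - r] for r in range(len(h_p)) if m - r >= 0)
    return y
```

Both functions return the same sequence; the polyphase form performs all multiplications at the reduced output rate.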
Replacing the complex impulse responses with the respective modulation forms (39), and setting the constant to a = (2o + 1)(N − 1)/2, we get: where h[k − (N − 1)/2] represents the real HBF prototype (9) in causal form. Next, in order to introduce an I-component polyphase decomposition for efficient decimation, we split the convolution index κ into two indices, $\kappa = 4r + p$, where p = 0, 1, 2, I − 1 = 3 and r = 0, 1, ..., ⌊(N − 1)/I⌋ = ⌊(N − 1)/4⌋. As a result, it follows from (44): Rearranging the exponent of the exponential term according to $\frac{\pi}{4}(4r + p)(2o + 1) = 2\pi r o + \pi r + p\frac{\pi}{4} + \frac{2\pi}{4} o p$, (46) can compactly be rewritten as [Oppenheim & Schafer (1989)]: where the quantity $v_p(m)$ encompasses all complex signal processing to be performed by the modified causal HBF prototype. An illustrative example with an underlying HBF prototype filter of length N = n + 1 = 11 is shown in Fig. 22 [Göckler & Groth (2004)]. Due to the polyphase decomposition (45) and (46), sample rate reduction can be performed in front of any signal processing (shimming delays: $z^{-1}$). Always two polyphase components of the real and the imaginary parts of the complex input signal share a delay chain in the direct form implementation of the modified causal HBF, where all coefficients are either real- or imaginary-valued except for the centre coefficient $h_0 = \frac{1}{2} e^{j\pi/4}$. As a result, only N + 3 real multiplications must be performed to calculate a set of complex output samples at the two (i.e. all) DF output ports. Furthermore, for the FDMUX DF implementation a total of (3N − 5)/2 delays are needed (not counting shimming delays). The calculation of $v_p(m)$, p = 0, 1, 2, 3, is readily understood from the signal flow graph (SFG) of Fig. 22, where for any filter length N always one of these quantities vanishes as a result of the zero coefficients of (9). Hence, the I = 4 point IDFT, depicted in Fig. 23(a,b) in detailed form, requires only 4 real additions to provide a complex output sample at any of the output ports o ∈ {0, 1, 2, 3}; cf. Fig. 23(b). Channel selection, for instance as shown in Fig. 21, is simply achieved by selection of the respective two output ports of the SFG of Figs. 22 and 23(a), respectively. Moreover, the remaining two unused output ports may be deactivated by disconnection from the power supply.

COHBF approach
The COHBF approach is based on the impulse responses (40) derived from (39) (subsequently: a = 2o + 1). These impulse responses are presented in Table 8 as a function of the channel number o ∈ {0, 1, 2, 3} for the non-zero coefficients of (40), related to the respective real RHBF coefficients. Except for the centre coefficient exhibiting identical real and imaginary parts, one half of the coefficients is real (R) and independent of the desired centre frequency represented by the channel indices o ∈ {0, 1, 2, 3}. Hence, these coefficients are common to all four transfer functions. The other half of the coefficients is purely imaginary (I: i.e., their real parts are zero) and dependent on the selected centre frequency. However, this dependency on the channel number is identical for all these coefficients and just requires a simple sign operation. Finally, the repetitive pattern of the coefficients, as a result of coefficient symmetry (41), is reflected in Table 8. A COHBF implementation of a demultiplexing DF aiming at minimum computational load must exploit the inherent coefficient symmetry (41), cf. Table 8. To this end, we consider the COHBF as depicted in Fig. 17 of Subsection 2.3.1, applying input commutators for sample rate reduction. In contrast to the FDMUX approach of Fig. 22, the SFG of Fig. 17 is based on the transposed FIR direct form [Bellanger (1989); Mitra (1998)], where the incoming signal samples are concurrently multiplied by the complete set of all coefficients, and the delay chains are directly connected to the output ports.
Based on the outlined DF implementation strategy of combining two of these COHBF, an illustrative example is presented in Fig. 24 with an underlying RHBF of length N = 11. The front end for polyphase decomposition and sample rate reduction by 2 is identical to that of the FDMUX approach of Fig. 22. Contrary to the former approach, the delay chains for the odd-numbered coefficients are outbound and duplicated (rather than interlaced) to allow for simple channel selection. As a result, channel selection is performed by combining the respective sub-sequences that have passed the R-set coefficients (cf. Table 8) with those having passed the corresponding I-set coefficients, where the latter sub-sequences are pre-multiplied by $b_i = (-1)^{o_i}$; $o_i$ ∈ {0, 1, 2, 3}, i ∈ {I, II}. Multipliers and delays for the signal processing of the centre coefficient $h_{0,o_i}$ are implemented similarly to Fig. 22 without need for duplication of delays. However, the post-delay inner lattice must be realized for each transfer function individually; its channel dependency follows from Table 8 and (40): where $o_i$ ∈ {0, 1, 2, 3}, i ∈ {I, II} and $h_0 = 1/2$ according to (9). Rearranging (49) yields with obvious abbreviations: It is easily recognized that the inner lattices of Fig. 24 implement the operations within the brackets of (50) with their results displayed at the respective inner nodes A, B, C, D. In compliance with (50), these inner node sequences must be multiplied by the respective signs $d_i = (-1)^{\lfloor o_i/2 \rfloor}$; $o_i$ ∈ {0, 1, 2, 3}, i ∈ {I, II}, prior to their combination with the above R/I sub-sequences.
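The two channel-dependent sign factors named above are simple functions of the channel index. The following sketch tabulates them; the function name `selection_signs` is hypothetical, and the code only evaluates the two sign expressions, not the SFG of Fig. 24.

```python
def selection_signs(o):
    """Return (b, d) for channel index o in {0, 1, 2, 3}.

    b = (-1)**o          sign applied to the I-set sub-sequences
    d = (-1)**floor(o/2) sign applied to the inner lattice node sequences
    """
    if o not in (0, 1, 2, 3):
        raise ValueError("channel index o must be 0, 1, 2 or 3")
    b = (-1) ** o
    d = (-1) ** (o // 2)
    return b, d


if __name__ == "__main__":
    for o in range(4):
        print(o, selection_signs(o))
```

Since each of the four channels maps to a distinct sign pair, two sign bits per DF output suffice to select the channel.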
To calculate a set of complex output samples at the two DF output ports, only the minimum number of (N + 5)/2 real multiplications must be carried out. Furthermore, for the COHBF approach to DF implementation a total of (5N − 11)/2 delays are needed (not counting shimming delays, $z^{-1}$, and the two superfluous delays at the input nodes of the outer delay chains, indicated in grey). Finally, we want to show and emphasise the simplicity of the channel selection procedure. There is a total of 8 summation points, the inner 4 lattice output nodes A, B, C, and D, and the 4 system output port nodes, where the signs of some input sequences of the output port nodes must be set compliant to the desired channel transfer functions: $o_i$ ∈ {0, 1, 2, 3}, i ∈ {I, II}. The sign selection is most easily performed as shown in Fig. 25. A concise survey of the required expenditure of the two approaches to the implementation of a demultiplexing DF is given in Table 9, not counting sign manipulations for channel selection. Obviously, the COHBF approach requires the minimum number of multiplications at the expense of a higher count of delay elements. Finally, it should be noticed that the DF group delay is independent of its (FDMUX or COHBF) implementation.

Table 9. Comparison of expenditure of FDMUX and COHBF DF approaches
Approach          Real multiplications   Delays
FDMUX             N + 3                  (3N − 5)/2
  ex.: N = 11     14                     14
COHBF             (N + 5)/2              (5N − 11)/2
  ex.: N = 11     8                      22
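The expenditure figures quoted in the text for both approaches can be evaluated for any odd prototype length. The sketch below simply codes the four expressions; the function names are hypothetical.

```python
def fdmux_cost(N):
    """(real multiplications, delays) of the FDMUX approach: N + 3, (3N - 5)/2."""
    if N % 2 == 0:
        raise ValueError("prototype length N must be odd")
    return N + 3, (3 * N - 5) // 2


def cohbf_cost(N):
    """(real multiplications, delays) of the COHBF approach: (N + 5)/2, (5N - 11)/2."""
    if N % 2 == 0:
        raise ValueError("prototype length N must be odd")
    return (N + 5) // 2, (5 * N - 11) // 2


if __name__ == "__main__":
    for N in (7, 11, 15, 19):
        print(N, fdmux_cost(N), cohbf_cost(N))
```

For N = 11 the functions return (14, 14) for the FDMUX approach and (8, 22) for the COHBF approach.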

Linear-phase directional combination filter
Using transposition techniques, we subsequently derive DF being complementary (dual) to those presented in Subsection 3.2: They combine two complex-valued signals of identical sampling rate f d that are likewise oversampled by at least 2 to an FDM signal, where different oversampling factors allow for different bandwidths. An example can be deduced from

Transposition of complex multirate systems
The goal of transposition is to derive a system that is complementary or dual to the original one: The various filter transfer functions must be retained, demultiplexing and decimating operations must be replaced with the dual operations of multiplexing and interpolation, respectively [Göckler & Groth (2004)].
The types of systems we want to transpose, Figs. 22 and 24, represent complex-valued 4 × 2 multiple-input multiple-output (MIMO) multirate systems. Obviously, these systems are composed of complex monorate sub-systems (complex filtering of polyphase components) and real multirate sub-systems (down- and upsamplers), cf. [Göckler & Groth (2004)]. While the transposition of real MIMO monorate systems is well-known and unique [Göckler & Groth (2004); Mitra (1998)], in the context of complex MIMO monorate systems the Invariant (ITr) and the Hermitian (HTr) transposition must be distinguished, where the former retains the original transfer functions, $H_o^T(z) = H_o(z)\ \forall o$, as desired in our application. As detailed in [Göckler & Groth (2004)], the ITr is performed by applying the transposition rules known for real MIMO monorate systems provided that all imaginary units "j", both of the complex input and output signals and of the complex coefficients, are conceptually considered and treated as multipliers within the SFG (denoted as truly complex implementation, cf. footnote 3), as can be seen from Figs. 22 and 24. The transposition of an M-downsampler, representing a real single-input single-output (SISO) multirate system, uniquely leads to the corresponding M-upsampler, the complementary (dual) multirate system, and vice versa [Göckler & Groth (2004)].

3 The imaginary units of the input signals and the coefficients must not be eliminated by simple multiplication and consideration of the correct signs in subsequent adders; this approach would transform the original complex MIMO SFG to a corresponding real SFG, where the direct transposition of the latter would perform the HTr [Göckler & Groth (2004)].
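The distinction between the two transpositions can be made concrete for a single complex multiplier. In the sketch below (hypothetical helper names, a toy case rather than the SFG of Fig. 22 or 24), the multiplier c = a + jb is expanded into its equivalent real 2 × 2 system; transposing that real system yields the representation of the conjugate coefficient, i.e. the Hermitian rather than the invariant transposition.

```python
def real_matrix(c):
    """Real 2x2 system equivalent to multiplication by c = a + jb.

    [y_re]   [a  -b] [x_re]
    [y_im] = [b   a] [x_im]
    """
    return [[c.real, -c.imag], [c.imag, c.real]]


def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def as_complex(m):
    """Read the complex coefficient back from a matrix of the above pattern."""
    return complex(m[0][0], m[1][0])


if __name__ == "__main__":
    c = complex(0.25, -0.75)
    print(as_complex(transpose(real_matrix(c))), c.conjugate())
```

Treating j as a multiplier inside a truly complex SFG avoids this conjugation, since a scalar complex multiplier is its own (invariant) transpose.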

Connecting all of the above considerations, the ITr transposition of a complex-valued MIMO multirate system is performed as follows [Göckler & Groth (2004)]:
• The system SFG to be transposed must be given as truly complex implementation.
• Reverse all arrows of the given SFG, both the arrows representing signal flows and those symbolic arrows of down- and upsamplers or rotating switches (commutators), respectively.
As a result of transposition [Göckler & Groth (2004)]
• all input (output) nodes become output (input) nodes, a 4 × 2 MIMO system is transformed to a 2 × 4 MIMO system,
• the number of delays and multipliers is retained,
• the overall number of branching and summation nodes is retained, and
• the overall number of down- and upsamplers is retained.

Transposition of the SFG of the COHBF approach to DF
As an example, we transpose the SFG of the COHBF approach to the implementation of a separating DF, as depicted in Fig. 24. The application of the transposition rules of the preceding Subsection 3.3.1 to the SFG of Fig. 24 results in the COHBF approach to a multiplexing DF shown in Fig. 26. The invariant properties are easily confirmed by comparing the original and the transposed SFG. Hence, the numbers of delays and multipliers required by both mutually dual DF systems are identical. As expected, the numbers of adders required are different, since only the overall number of branching and summation nodes is retained. Moreover, it should be noted that the simplicity of the channel selection procedure is retained as well. To this end, we have shifted the channel-dependent sign-setting operators $d_i = (-1)^{\lfloor o_i/2 \rfloor}$, $o_i$ ∈ {0, 1, 2, 3}, i ∈ {I, II}, to more suitable positions in front of the summation nodes G and H. Again, there is a total of 8 summation points, where the signs of the respective input sequences must be adjusted: the 4 inner lattice output nodes A, B, C, and D, the 2 input summation nodes E and F immediately fed by the imaginary parts of the input sequences, and the 2 inner post-lattice summing nodes G and H. At all these summation nodes, the signs of some or all input sequences must be set in compliance with the desired channel transfer functions $H_{o_i}(z)$, $o_i$ ∈ {0, 1, 2, 3}, i ∈ {I, II}, cf. Fig. 26. The sign selection is again most easily performed, as shown in Fig. 27.

Conclusion: Halfband filter pair combined to directional filter
In this Section 3, we have derived and analyzed two different approaches to linear-phase directional filters that separate two complex user signals from a complex-valued FDM input signal, where the FDM signal may be composed of up to four independent user signals: The FDMUX approach (Subsection 3.2.1) needs the least number of delays, whereas the synergetic COHBF approach (Subsection 3.2.2) requires minimum computation. Signal extraction is always combined with decimation by two. While the four frequency slots of the user signals to be processed (corresponding to the four potential DF transfer functions $H_{o_i}(z)$, $o_i$ ∈ {0, 1, 2, 3}, i ∈ {I, II}, centred according to (38); cf. Fig. 21) are equally wide and uniformly allocated, as indicated in Fig. 28, the user signals may possess different bandwidths. However, each user signal must completely be contained in one of the four frequency slots, as exemplified in Fig. 28. Furthermore, by applying the transposition rules of [Göckler & Groth (2004)], the corresponding complementary (dual) combining directional filters have been derived, where the multiplication rates and the delay counts of the original structures are always retained. Obviously, transposing a system allows for the derivation of an optimum dual system by applying the simple transposition rules, provided that the original system is optimal. Thus, a tedious re-derivation and optimization of the complementary system is circumvented. Nevertheless, it should be noted that transposition always yields just one particular structure, rather than a variety of structures [Göckler & Groth (2004)]. Finally, to give an idea of the filter lengths required, we recall the design result reported in [Göckler & Eyssele (1992)] where, as depicted in the above Fig. 21(a,b), the passband, stopband and transition bands were assumed equally wide: With an HBF prototype filter length of N = 11 and 10 bit coefficients, a stopband attenuation of > 50 dB was achieved.

Parallelisation of tree-structured filter banks composed of directional filters 4
4 Underlying original publication: Göckler et al. (2006)

In this Section 4 of the chapter we consider the combination of multiple two-channel DF investigated in Section 3 to construct tree-structured filter banks. To this end, we cascade separating DF in a hierarchical manner to demultiplex (split) a frequency division multiplex (FDM) signal into its constituting user signals: this type of filter bank (FB) is denoted by FDMUX FB (Fig. 2). Its transposed counterpart (cf. Subsection 3.3.1), the FMUX FB, is a cascade connection of combining DF considered in Subsection 3.3 to form an FDM signal of independent user signals. Finally, we call an FDMUX FB followed by an FMUX FB an FDFMUX FB, which may contain a switching unit for channel routing between the two FB. Subsequently, we consider an application of FDFMUX FB for on-board processing in satellite communications. If the number of channels and/or the bandwidth requirements are high, efficient implementation of the high-end DF is crucial, since they are operated at (extremely) high sampling rates. To cope with this issue, we propose to parallelise at least the front-end (back-end) of the FDMUX (FMUX) filter bank. For this outlined application, we give the following introduction and motivation. Digital signal processing on-board communication satellites (OBP) is an active field of research where, in conjunction with frequency division multiple access (FDMA) systems, presently two trends and challenges are observed: i) The need of an ever-increasing number of user channels makes it necessary to digitally process, i.e. to demultiplex, cross-connect and remultiplex, ultra-wideband FDM signals requiring high-end sampling rates that range considerably beyond 1 GHz [Arbesser-Rastburg et al. (2002); Maufroid et al. (2004; 2003); Rio-Herrero & Maufroid (2003); Wittig (2000)], and ii) the desire for flexibility of channel bandwidth-to-user assignment calling for simply reconfigurable OBP systems [Abdulazim & Göckler (2005); Göckler & Felbecker (2001); Johansson & Löwenborg (2005); Kopmann et al. (2003)]. Yet, overall power consumption must be minimum, demanding highly efficient FB for FDM demultiplexing (FDMUX) and remultiplexing (FMUX). Two baseline approaches to most efficient uniform digital FB, as required for OBP, are known: a) The complex-modulated (DFT) polyphase (PP) FB applying single-step sample rate alteration [Vaidyanathan (1993)], and b) the multistage tree-structured FB as depicted in Fig. 2, where its directional filters (DF) are either based on the DFT PP method [Göckler & Groth (2004); Göckler & Eyssele (1992)] according to Subsection 3.2.1, or on the COHBF approach investigated in Subsection 3.2.2. For both approaches it has been shown that bandwidth-to-user assignment is feasible within reasonable constraints [Johansson & Löwenborg (2005); Kopmann et al. (2003)]: A minimum user channel bandwidth, denoted by slot bandwidth b, can stepwise be extended by any integer number of additional slots up to a desired maximum overall bandwidth that shall be assigned to a single user. However, as to challenge i), the above two FB approaches fundamentally differ from each other: In a DFT PP FDMUX (a) the overall sample rate reduction is performed in compliance with the number of user channels in a single step: all arithmetic operations are carried out at the (lowest) output sampling rate [Vaidyanathan (1993)]. In contrast, in the multistage FDMUX (b) the sampling rate is reduced stepwise, in each stage by a factor of two [Göckler & Eyssele (1992)].
As a result, the polyphase approach (a) inherently represents a completely parallelised structure, immediately usable for extremely high front-end sampling frequencies, whereas the high-end stages of the tree-structured FDMUX (b) cannot be implemented with standard space-proved CMOS technology. Hence, the tree structure, FDMUX as well as FMUX, calls for a parallelisation of the high rate stages. As motivated, this contribution deals with the parallelisation of multistage multirate systems.
To this end, we recall a general systematic procedure for multirate system parallelisation [Groth (2003)], which is described in detail in Subsection 4.1. For proper understanding, in Subsection 4.2 this procedure is applied to the high rate front-end stages of the FDMUX part of the recently proposed tree-structured SBC-FDFMUX FB [Abdulazim & Göckler (2005)], which uniformly demultiplexes an FDM signal always down to slot level (of bandwidth b) and which, after on-board switching, recombines these independent slot signals to an FDM signal (FMUX) with different channel allocation (FDFMUX functionality). If a single user occupies a multiple slot channel, the corresponding parts of FDMUX and FMUX are matched for (nearly) perfect reconstruction of this wideband channel signal (SBC functionality) [Vaidyanathan (1993)]. Finally, some conclusions are drawn.

Sample-by-sample approach to parallelisation
In this subsection, we introduce the novel sample-by-sample processing (SBSP) approach to parallelisation of digital multirate systems, as proposed by [Groth (2003)] where, without any additional delay, all incoming signal samples are directly fed into assigned units for immediate signal processing. Hence, in contrast to the widely used block processing (BP) approach, SBSP does not increase latency. In order to systematically parallelise a (multirate) system, we distinguish four procedural steps [Groth (2003)]:
1. Partition the original system into (elementary SISO or MIMO) subsystems E(z) with single or multiple input and/or output ports, respectively, still operating at the original high clock frequency $f_n = 1/T$, that are simply amenable to parallelisation. To enumerate some of these: delay, multiplier, down- and up-sampler, summation and branching, but also suitable compound subsystems such as SISO filters and FFT transform blocks.
2. Parallelise each subsystem E(z) in an SBSP manner according to the desired individual degree of parallelisation P, where P ∈ ℕ. To this end, each subsystem is cascaded with a P-fold SBSP serial-to-parallel (SP) commutator for signal decomposition (demultiplexing) followed by a consistently connected P-fold parallel-to-serial (PS) commutator for recomposition (remultiplexing) of the original signal, as depicted in Fig. 29(a). Here, obviously $P = P_{SP} = P_{PS}$, and p ∈ [0, P − 1] denotes the relative time offsets of connected pairs of down- and up-samplers, respectively. Evidently, the P output signals of the SP interface comprise all polyphase components of its input signal in a time-interleaved (SBSP) manner at a P-fold lower sampling rate $f_d = f_n/P$ [Göckler & Groth (2004); Vaidyanathan (1993)].
Fig. 29. P-parallelisation of SISO subsystem E(z) to P × P MIMO system $E(z_d)$
Since the subsequent PS interface is inverse to the preceding SP interface [Göckler & Groth (2004)], the SP-PS commutator cascade has unity transfer with zero delay in contrast to the (P − 1)-fold delay of the BP Delay-Chain Perfect-Reconstruction system [Göckler & Groth (2004); Vaidyanathan (1993)], as anticipated (cf. also Fig. 30). After this preparation, P-fold parallelisation is readily achieved by shifting the (SISO) subsystem E(z) between the SP and PS interfaces by exploiting the noble identities [Göckler & Groth (2004); Vaidyanathan (1993)] and some novel generalized SBSP multirate identities [Groth (2003); Groth & Göckler (2001)]. Thus, as shown in Fig. 29(b), the two interfaces are interconnected by an equivalent P × P MIMO system $E(z_d)$, which represents the P-fold parallelisation of E(z), and all operations of which are performed at the P-fold reduced operational clock frequency $f_d$.
3. Reconnect all parallelised subsystems exactly in the same manner as in the original system. This is always possible, since parallelisation does not change the original numbers of input and output ports of SISO or MIMO subsystems, respectively.
4. Eliminate all interfractional cascade connections of PS-SP interfaces using the obvious multirate identity depicted in Fig. 30. Note that this elimination process requires identical up- and down-sampling factors, $P_{out,a}^{PS} = P_{in,b}^{SP}$, of each PS-SP interface cascade, which restricts the free choice of P for subsystem parallelisation.
As a result of parallelisation, all input signals of the original (possibly MIMO) system are decomposed into P time-interleaved polyphase components by an SP demultiplexer for subsequent parallel processing at a P-fold lower rate, and all system output ports are provided with a PS commutator to interleave all low rate subsignals to form the high speed output signals. For illustration, we present the parallelisation of a unit delay $z^{-1} := z_d^{-1/P}$, and of an M-fold down-sampler with zero time offset [Groth (2003)], as shown in Fig. 31. The unit delay (a) is realized by P parallel time-interleaved shimming delays to be implemented by suitable system control: where permutation is introduced for straightforward elimination of interfractional PS-SP cascades according to Fig. 30 (I: identity matrix). In case of down-sampling, Fig. 31(b), to increase efficiency, the P parallel down-samplers of the diagonal MIMO system $E(z_d)$ are merged with the P down-samplers of the SP interface. Hence, by using suitable multirate identities [Groth (2003)], the contiguous PM-fold down-samplers of the SP demultiplexer have a relative time offset of M.
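The two elementary cases above can be mimicked on finite sequences. The sketch below is a block-wise (array) model under stated assumptions: function names are hypothetical, the input length is a multiple of P (and of PM for down-sampling), and the sample-by-sample timing and shimming delays of the SBSP scheme are not modelled; only the mapping between polyphase components is shown.

```python
def sp_split(x, P):
    """SP commutator: branch p carries the polyphase component x(m*P + p)."""
    return [x[p::P] for p in range(P)]


def ps_merge(branches):
    """PS commutator: interleave P low rate branches into one sequence."""
    return [branches[p][m]
            for m in range(len(branches[0]))
            for p in range(len(branches))]


def parallel_unit_delay(branches):
    """Unit delay at the high rate, expressed on P parallel branches.

    Output branch p takes input branch p-1; output branch 0 takes the last
    input branch delayed by one low rate sample (cyclic permutation).
    """
    P = len(branches)
    delayed_last = [0] + branches[P - 1][:-1]
    return [delayed_last] + [branches[p - 1] for p in range(1, P)]


def parallel_downsample(x, M, P):
    """M-fold down-sampler on P branches: PM-fold down-samplers, offsets M*p."""
    return [x[M * p::M * P] for p in range(P)]


if __name__ == "__main__":
    x = list(range(1, 17))
    print(ps_merge(sp_split(x, 4)))
    print(ps_merge(parallel_unit_delay(sp_split(x, 4))))
    print(ps_merge(parallel_downsample(x, 2, 4)))
```

The merged outputs reproduce the input, the input delayed by one sample, and every second input sample, respectively, while each branch only handles samples at the P-fold lower rate.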

Parallelisation of SBC-FDFMUX filter bank
Subsequently, we describe the parallelisation of the high rate FDMUX front-end section of the versatile tree-structured SBC-FDFMUX FB for flexible channel and bandwidth allocation [Abdulazim & Göckler (2005); Abdulazim et al. (2007)]. The first three hierarchically cascaded stages of the FDMUX are shown in Fig. 32 in block diagram form applying BP. In each stage, ν = 1, 2, 3, the respective input spectrum is split into two subbands of equal bandwidth in conjunction with decimation by two. For convenience of presentation, all DF have identical coefficients and, in contrast to Section 3, are assumed as critically sampling 2-channel DFT PP FB with zero frequency offset. The branch filter transfer functions $H_\lambda(z_\nu)$, λ = 0, 1, represent the two PP components of the prototype filter [Göckler & Groth (2004); Vaidyanathan (1993)] where, by setting $z_\nu := e^{j\Omega^{(\nu)}}$ with $\Omega^{(\nu)} = 2\pi f/f_\nu$ and ν = 1, 2, 3, the respective frequency responses $H_\lambda(e^{j\Omega^{(\nu)}})$ are obtained, which are related to the operational sampling rate $f_\nu$ of stage ν. The respective DF lowpass and highpass filter transfer functions of stage ν, related to the original sampling rate $2f_\nu$, are generated by the two branch filter transfer functions $H_\lambda(z_\nu)$, λ = 0, 1, in combination with the simple "butterfly" across the output ports of each DF: Summation produces the lowpass, subtraction the complementary highpass filter transfer function [Bellanger (1989); Kammeyer & Kroschel (2002); Mitra (1998); Schüssler (2008); Vaidyanathan (1993)]. Assuming, for instance, a high-end input sampling frequency of $f_n = f_0 = 2.4$ GHz [Kopmann et al. (2003); Maufroid et al. (2003)], the operational clock rate of the third stage is $f_3 = f_n/2^3 = 300$ MHz, which is deemed feasible using present-day CMOS technology. Hence, front-end parallelisation has to reduce the operational clock of all subsystems preceding the third stage down to $f_d = f_3 = 300$ MHz.
This is achieved by 8-fold parallelisation of input branching and blocking (delay $z_0^{-1}$), 4-fold parallelisation of the first stage of the FDMUX tree (comprising input decimation by two, the PP branch filters $H_\lambda(z_1)$, λ = 0, 1, and butterfly), and of the input branching and blocking (delay $z_1^{-1}$) of the second stage and, finally, corresponding 2-fold parallelisation of the two parallel 2-channel FDMUX FB of the second stage of the tree, as indicated in Fig. 32. The result of parallelisation, as required above, is shown in Fig. 33, where all interfractional interfaces have been removed by straightforward application of the identity of Fig. 30. Subsequently, parallelisation of elementary subsystems is explained in detail:
1. Down-sampling by M = 2: In compliance with Fig. 31(b), each 2-fold down-sampler is replaced with $P_\nu$ units in parallel for $2P_\nu$-fold down-sampling with even time offset 2p, where p = 0, 1, 2, 3 applies to the first tree stage ($P_1 = 4$), and p = 0, 1 to the second stage ($P_2 = 2$). The result of 4-fold parallelisation of the front end input down-sampler of the upper branch (ν = 1, λ = 0) is readily visible in Fig. 33 preceding the filter MIMO block $H_0^1(z_d)$: In fact, it represents an 8-to-4 parallelisation, where all odd PP components are removed according to Fig. 31(b).
2. Input blocking (delay $z_0^{-1}$) and down-sampling preceding the lower branch filter $H_1(z_1)$: To this end, as required by Fig. 32, the unit delay $z_0^{-1}$ is parallelised by $P_0 = 8$, as shown in Fig. 31(a), while the subsequent down-sampler applies $P_1 = 4$, as described above w.r.t. Fig. 31(b). Immediate cascading of parallelised unit delay ($P_0 = 8$) and down-sampling ($P_1 = 4$, M = 2) (as induced by Fig. 31) shows that only those four PP components of the parallelised delay with even time offset (p = 0, 2, 4, 6) are transferred via the 4-branch SP-input interface of down-sampling ($2P_1 = 8$) to its PS-output interface with naturally ordered time offsets p = 0, 1, 2, 3 w.r.t. $P_1 = 4$.
Hence, only those retained 4 out of 8 PP components of odd time index p = 7, 1, 3, 5, being provided by the unit delay's SP-input interface and delayed by $z_0^{-1} = z_d^{-1/8}$, are transferred (mapped) to the $P_1 = 4$ up-samplers with timing offset p = 0, 1, 2, 3 of the 4-branch PS-output interface of the down-sampler. Fig. 33 shows the correspondingly rearranged signal flow graph representation of the stage 1 input section (ν = λ = 1). As a result, the upper branch of stage 1, $H_0(z_1) \to H_0^1(z_d)$, is fed by the even-indexed PP components of the high rate FDMUX input signal, whereas the lower branch $H_1(z_1) \to H_1^1(z_d)$ is provided with the delayed versions of the PP components of odd index, as depicted in Fig. 33. Hence, as in the original system of Fig. 32, the input sequence is completely fed into the parallelised system. This procedure is repeated with the input branching and blocking sections of the subsequent stages ν = 2, 3: The PP branch filters $H_0(z_\nu) \to H_0^\nu(z_d)$ parallelised by $P_\nu$, where $P_2 = 2$ and $P_3 = 1$ ($P_1 = 4$), are provided with the even-numbered PP components of the respective input signals with timing offsets in natural order. In contrast, the set of PP components of odd index is always delayed by $z_d^{-1/P_{\nu-1}}$ and fed into the filter blocks $H_1(z_\nu) \to H_1^\nu(z_d)$ in crossed manner (cf. input section λ = 1).
3. $P_\nu$-fold parallelisation of the PP branch filters $H_\lambda(z_\nu) \to H_\lambda^\nu(z_d)$, λ = 0, 1; ν = 1, 2, is achieved by systematic application of the procedure condensed in Fig. 29 (for details cf. [Göckler & Groth (2004); Groth (2003)]). To this end, $H_\lambda(z_\nu)$ is decomposed into $P_\nu$ PP components of correspondingly reduced order, which are arranged to a MIMO system by exploiting a multitude of multirate identities [Groth (2003); Groth & Göckler (2001)].
The resulting $P_\nu \times P_\nu$ MIMO filter transfer matrix $H_\lambda^\nu(z_d)$ contains each PP component of $H_\lambda(z_\nu)$ $P_\nu$ times: Thus, the amount of hardware is increased $P_\nu$ times whereas, as desired for feasibility, the operational clock rate is concurrently reduced by $P_\nu$. Hence, the overall expenditure, i.e. the number of operations times the respective operational clock rate [Göckler & Groth (2004)], is not changed.
4. Parallelisation of butterflies combining the output signals of associated PP filter blocks is straightforward: For each (time-interleaved) PP component of the respective signals a butterfly has to be provided, as shown in Fig. 33.
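The degrees of parallelisation used in this example follow directly from the stage sampling rates. The sketch below (hypothetical function name, integer rates in Hz) computes, for each stage ν, the rate $f_\nu = f_n/2^\nu$, the degree $P_\nu = f_\nu/f_d$ and the resulting operational clock; it assumes that $f_d$ divides every stage rate.

```python
def stage_plan(f_n_hz, stages, f_d_hz):
    """List (nu, f_nu, P_nu, clock) for nu = 0 .. stages.

    f_nu  : sampling rate at stage nu, f_n / 2**nu
    P_nu  : degree of parallelisation needed to run at clock f_d
    clock : resulting operational clock f_nu / P_nu
    """
    plan = []
    for nu in range(stages + 1):
        f_nu = f_n_hz // 2 ** nu
        if f_nu % f_d_hz:
            raise ValueError("f_d must divide the stage rate")
        P_nu = f_nu // f_d_hz
        plan.append((nu, f_nu, P_nu, f_nu // P_nu))
    return plan


if __name__ == "__main__":
    for row in stage_plan(2_400_000_000, 3, 300_000_000):
        print(row)
```

With $f_n$ = 2.4 GHz and $f_d$ = 300 MHz this gives the degrees 8, 4, 2 and 1 for ν = 0, 1, 2, 3, so every subsystem runs at 300 MHz. Since the hardware of stage ν is multiplied by $P_\nu$ while its clock is divided by $P_\nu$, their product stays the same.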

Conclusion: Parallelisation of multirate systems
In this Section 4, a general and systematic procedure for parallelisation of multirate systems, for instance as investigated in Sections 2 and 3, has been presented. Its application to the high rate decimating FDMUX front end of the tree-structured SBC-FDFMUX FB [Abdulazim & Göckler (2005)] has been described in detail. The stage ν degree of parallelisation $P_\nu$, ν = 0, 1, 2, 3, is diminished proportionally to the operational clock frequency $f_\nu$ of stage ν and is, thus, adapted to the actual sampling rate. As a result, after suitable decomposition of the high rate front end input signal by an input commutator into $P_0 = P_{max}$ polyphase components (as depicted for $P_{max} = 8$ in Fig. 33), all subsequent processing units are likewise operated at the same operational clock rate $f_d = f_n/P_0 = f_0/P_0$. Since the inherent parallelism of the original tree-structured FDMUX (Fig. 32) has attained $P_{max} = 8$ in the third stage, and the output signals of this stage represent the desired eight demultiplexed FDM subsignals, interleaving PS-output commutators are no longer required, as can be seen in Fig. 33. Finally, it should be noted that parallelisation does not change the overall expenditure; yet, by multiplying stage ν hardware by $P_\nu$, the operational clock rates are reduced by a factor of $P_\nu$ to a feasible order of magnitude, as desired. Applying the rules of multirate transposition (cf. Subsection 3.3.1 or [Göckler & Groth (2004)]) to the parallelised FDMUX front end, the high rate interpolating back end of the tree-structured SBC-FDFMUX FB is obtained likewise and exhibits the same properties as to expenditure and feasibility [Groth (2003)]. Hence, the versatile and efficient tree-structured filter bank (FDMUX, FMUX, SBC, wavelet, or any combination thereof) can be used in (ultra) wide-band applications, provided that the front- and back-end stages are parallelised accordingly.

Summary and conclusion
In Section 2 we have introduced and investigated a special class of real and complex FIR and IIR halfband bandpass filters with the particular set of centre frequencies defined by (1). As a result of the constraint (1), almost all filter coefficients are either real-valued or purely imaginary-valued, as opposed to fully complex-valued coefficients. Hence, this class of halfband filters requires only a small amount of computation. In Section 3, two different options to combine two of the above FIR halfband filters with different centre frequencies to form a directional filter (DF) have been investigated. As a result, one of these DF approaches is optimum w.r.t. computation (most efficient), whereas the other requires the least number of delay elements (minimum McMillan degree). The relation between separating DF and DF that combine two independent signals to an FDM signal via multirate transposition rules has been shown extensively. Finally, in Section 4, the above FIR directional filters (DF) have been combined to tree-structured multiplexing and demultiplexing filter banks. While this procedure is straightforward, the operating clock rates within the front- or back-ends may be too high for implementation. To cope with this issue, we have introduced and described to some extent the systematic, graphically induced procedure to parallelise multirate systems according to [Groth (2003)]. It has been applied to a three-stage demultiplexing tree-structured filter bank in such a manner that all operations throughout the overall system are performed at the operational output clock. As a result, parallelisation makes the system feasible but retains the computational load.