An LMS Adaptive Filter Using Distributed Arithmetic - Algorithms and Architectures

Kyo Takahashi; Naoki Honma; Yoshitaka Tsunekawa

doi:10.5772/16337

Author Information

Show +

Kyo Takahashi*
- Iwate Industrial Research Institute, Japan
Naoki Honma
- Iwate University, Japan
Yoshitaka Tsunekawa
- Iwate Industrial Research Institute, Japan

*Address all correspondence to:

1. Introduction

In recent years, adaptive filters are used in many applications, for example an echo canceller, a noise canceller, an adaptive equalizer and so on, and the necessity of theirimplementations is growing up in many fields.Adaptive filters require various performances of a high speed, lower power dissipation, good convergence properties, small output latency, and so on.The echo-canceller used in the videoconferencing requires a fast convergence property and a capability to track the time varying impulse response (Makino & Koizumi, 1988).Therefore, implementations of very high order adaptive filters are required.In order to satisfy these requirements, highly-efficient algorithms and architectures are desired.The adaptive filter is generally constructed by using the multipliers, adders and memories, and so on, whereas, the structure without multipliers has been proposed.

The LMS adaptive filter using distributed arithmetic can be realized by using adders and memories without multipliers, that is, it can be achieved with a small hardware.ADistributed Arithmetic (DA) is an efficient calculation method of an inner product of constant vectors, and it has been used in the DCT realization.Furthermore, it is suitable for time varying coefficient vector in the adaptive filter.Cowan and others proposed a Least Mean Square (LMS) adaptive filter using the DAonan offset binary coding (Cowan & Mavor, 1981; Cowan et al, 1983).However, it is found that the convergence speed of this method is extremely degraded (Tsunekawa et al, 1999).This degradation results from an offset bias added to an inputsignal coded onthe offset binary coding.To overcome this problem, an update algorithm generalized with 2’s complement representation has been proposed (Tsunekawa et al, 1999), and the convergence condition has been analyzed (Takahashi et al, 2002).The effective architectures for the LMS adaptive filter using the DA have been proposed (Tsunekawa et al, 1999; Takahashi et al, 2001).The LMS adaptive filter using distributed arithmetic is expressed by DA-ADF.The DA is applied to the output calculation, i.e., inner product of the input signal vector and coefficient vector.The output signal is obtained by the shift and addition of the partial-products specified with the bit patterns of the N-th order input signal vector.This process is performed from LSB to MSB direction at the everysampling instance, where the B indicates the word length.The B partial-products used to obtain the output signal are updated from LMB to MSB direction.There exist 2^N partial-products, and the set including all the partial-products is called Whole Adaptive Function Space (WAFS).Furthermore, the DA-ADF using multi-memory block structure that uses the divided WAFS (MDA-ADF) (Wei & Lou, 1986; Tsunekawa et al, 1999) and the MDA-ADF using half-memory algorithm based on the pseudo-odd symmetry property of the WAFS (HMDA-ADF) have been proposed (Takahashi et al, 2001).The divided WAFS is expressed by DWAFS.

In this chapter, the new algorithm and effective architecture of the MDA-ADF are discussed. The objectives are improvements of the MDA-ADF permitting the increase of an amount of hardware and power dissipation. The convergence properties of the new algorithm are evaluated by the computer simulations, and the efficiency of the proposed VLSI architecture is evaluated.

2. LMS adaptive filter

An N-tap input signal vector S(k) is represented as

S(k)=[s(k),s(k−1),⋯,s(k−N+1)]TE1

where, s(k) is an input signal at k time instance, and the T indicates a transpose of the vector.

The output signal of an adaptive filter is represented as

y(k)=ST(k)W(k)E2

where, W(k) is the N-th coefficient vector represented as

W(k)=[w0(k),w1(k),⋯,wN−1(k)]TE3

and the w_i(k) is an i-th tap coefficient of the adaptive filter.

The Widrow’s LMS algorithm (Widrow et al, 1975) is represented as

W(k+1)=W(k)+2μe(k) S(k)E4

where, the e(k), μ and d(k) are an error signal, a step-size parameter and the desired signal, respectively. The step-size parameter deterimines the convergence speed and the accuracy of the estimation. The error signal is obtained by

e(k)=d(k)−y(k)E5

The fundamental structure of the LMS adaptive filter is shown in Fig. 1.The filter input signal s(k) is fed into the delay-line, and shifted to the right direction every sampling instance. The taps of the delay-line provide the delayed input signal corresponding to the depth of delay elements.The tap outputs are multiplied with the corresponding coefficients, the sum of these products is an output of the LMS adaptive filter.The error signal is defined as the difference between the desired signal and the filter output signal.The tap coefficients are updated using the products of the input signals and the scaled error signal.

Figure 1.
Fundamental Structure of the 4-tap LMS adaptive filter.

3. LMS adaptive filter using distributed arithmetic

In the following discussions, the fundamentals of the DA on the 2’s complement representation and the derivation of the DA-ADF are explained.The degradation of the convergence property and the drastic increase of the amount of hardware in the DA-ADF are the serious problems for its higher order implementation. As the solutions to overcome the problems, the multi-memory block structure and the half-memory algorithm based on the pseudo-odd symmetry property of WAFS are explained.

3.1. Distributed arithmetic

The DA is an efficient calculation method of an inner product by a table lookupmethod (Peled &Liu, 1974).Now, let’s consider the inner product

y=aTv=∑i=1NaiviE6

of the N-th order constant vector

a=[a0,a1,⋯a(N−1)] TE7

and the variable vector

v=[v0,v1,⋯v(N−1)] TE8

In Eq.(8), the v_i is represented on B-bit fixed point and 2’s complement representation, that is,

−1≤vi<1E9

and

vi=−vi0+∑k=1B−1vik2−k,

i=0,1,⋯,N−1E10

In the Eq.(10), v_i^k indicates the k-th bit of v_i, i.e., 0 or 1.By substituting Eq.(10) for Eq.(6),

y=−Φ(v00,v10,⋯,vN−10)+∑k=1B−1Φ(v0k,v1k,⋯,vN−1k)2−kE11

is obtained. Thefunction Φwhich returns the partial-product corresponding argument is defined by

Φ(v0k,v1k,⋯,vN−1k)≡∑i=0N−1aivikE12

Eq.(11) indicates that the inner product of y is obtained as the weighted sum of the partial-products. The first term of the right side is weighted by -1, i.e., sign bit, and the following terms are weighted by the 2^-k. Fig.2 shows the fundamental structure of the FIR filter using the DA (DA-FIR).The function table is realized using the Read Only Memory (ROM), and the right-shift and addition operation is realized using an adder and register.The ROM previously includes the partial-products determined by the tap coefficient vector and the bit-pattern of the input signal vector.From above discussions, the operation time is only depended on the word length B, not on the number of the term N, fundamentally.This means that the output latency is only depended on the word length B. The FIR filter using the DA can be implemented without multipliers, that is, it is possible to reduce the amount of hardware.

Figure 2.
Fundamental structure of the FIR filter using distributed arithmetic.

3.2. Derivation of LMS adaptive algorithm using distributed arithmetic

The derivation of the LMS algorithm using the DA on 2’s complement representation is as follows. The N-th order input signal vector in Eq.(1) is defined as

S(k)≡A(k) FE13

Using this definition, the filter output signal is represented as

y(k)=ST(k)W(k)=FTAT(k)W(k)E14

In Eq.(13) and Eq(14), an address matrix which is determined by the bit pattern of the input signal vector is represented as

A(k)=[b0(k)b0(k−1)⋯b0(k−N+1)b1(k)b1(k−1)⋯b1(k−N+1)⋮⋮⋱⋮bB−1(k)bB−1(k−1)⋯bB−1(k−N+1)]TE15

and a scaling vector based on the 2’s complement representation is represented as

F=[−20,2−1,⋯,2B−1]TE16

where, b_i(k) is an i-th bit of the input signal s(k).In Eq.(14), A^T(k)W(k) is defined as

P(k)≡AT(k)W(k)E17

and the filter output is obtained as

y(k)=FTP(k)E18

The P(k) is called adaptive function space (AFS), and is a B-th order vector of

P(k)=[p0(k),p1(k),⋯,pB−1(k)]TE19

The P(k) is a subset of the WAFS including the elements specified by the row vectors (access vectors) of the address matrix. Now, multiplying both sides by A^T(k), Eq.(4) becomes

AT(k)W(k+1)=AT(k){W(k)+2μe(k)A(k)F}=AT(k)W(k)+2μe(k)AT(k)A(k)FE20

Furthermore, by using the definitions described as

P(k)≡AT(k)W(k)E21

and

P(k+1)≡AT(k)W(k+1)E22

the relation of them can be explained as

P(k+1)=P(k)+2μe(k)AT(k)A(k)FE23

It is impossible to perform the real-time processing because of the matrix multiplication of A^T(k)A(k) in Eq.(22).

To overcome this problem, the simplification of the term of A^T(k)A(k)F in Eq.(22) has been also achieved on the 2’s complement representation (Tsunekawa et al, 1999).By using the relation

E[AT(k)A(k)F]=0.25NFE24

the simplified algorithm becomes

P(k+1)=P(k)+0.5μNe(k)FE25

where, the operator E[] denotes an expectation. Eq.(24) can be performed by only shift and addition operations without multiplications using approximated μN with power of two, that is, the fast real-time processing can be realized.The fundamental structure is shown in Fig.3, and the timing chart is also shown in Fig.4.The calculation block can be used the fundamental structure of the DA-FIR, and the WAFS is realized by using a Random Access Memory (RAM).

Figure 3.
Fundamental structure of the DA-ADF.

3.3. Multi-memory block structure

The structure employing the DWAFS to guarantee the convergence speed and the small

Figure 5.
Fundamental structure of the MDA-ADF.

hardware for higher order filtering has been proposed (Wei & Lou, 1986; Tsunekawa et al, 1999).The DWAFS is defined for the divided address-line of R bit. For a division number M, the relation of N and R isrepresented as

R=N/ME26

The capacity of the individual DWAFS is 2^R words, and the total capacity becomes smaller2^RMwords than the DA’s 2^N words.For the smaller WAFS, the convergence of the algorithm can be achieved by smaller iterations. The R-th order coefficient vector and the related AFS is represented as

Wm(k)=[wm0(k),wm1(k),⋯,wm(R−1)(k)] TE27

Pm(k)=[pm0(k),pm1(k),⋯,pm(B−1)(k)] TE28

(m=0,1,⋯,M−1;

R=N/ME29

)

where, the AFS is defined as

Pm(k)≡AmT(k)Wm(k)E30

Therefore, the filter output signal is obtained by

y(k)=∑m=1MFTPm(k)E31

where, the address matrix is represented as

Am(k)=[bm0(k)bm0(k−1)⋯bm0(k−R+1)bm1(k)bm1(k−1)⋯bm1(k−R+1)⋮⋮⋱⋮bm(B−1)(k)bm(B−1)(k−1)⋯bm(B−1)(k−R+1)]TE32

The update formula of the MDA-ADF is represented as

Pm(k+1)=Pm(k)+0.5μRe(k)FE33

The fundamental structure and the timing chart are shown in Fig.5 and 6, respectively.

3.4. Half-memory algorithm using pseudo-odd symmetry property

It is known that there exists the odd symmetry property of the WAFS in the conventional DA-ADF on the offset binary coding (Cowan et al, 1983). Table 1 shows the example of the oddsymmetry property in case of R=3.The stored partial-product for the inverted address has an equal absolute values and a different sign. Using this property, the MDA-ADF is can be realized with half amount of capacity of the DWAFS.This property concerned with aproperty of the offset binary coding. However, the pseudo-odd symmetry property of WAFS on the 2’s complement representation has been found (Takahashi et al, 2001). The stored partial-product for the inverted address is a nearly equal absolute value and adifferent sign.The MDA algorithm using this property is called half-memory algorithm, andthe previous discussed MDA algorithm is called full-memory algorithm.The access method of the DWAFS is represented as follows(Takahashi et al, 2001).

Address	Stored partial-product
000	-0.5w₀ -0.5w₁-0.5w₂
001	-0.5w₀ -0.5w₁+0.5w₂
010	-0.5w₀ +0.5w₁-0.5w₂
011	-0.5w₀ +0.5w₁+0.5w₂
100	+0.5w₀ -0.5w₁-0.5w₂
101	+0.5w₀ -0.5w₁+0.5w₂
110	+0.5w₀ +0.5w₁-0.5w₂
111	+0.5w₀ +0.5w₁+0.5w₂

Table 1.

Example of the odd-symmetry property of the WAFS on the offset binary coding. This property is approximately achieved on the 2’s complement representation.

Read the partial-products

begin

for i:=1 to B do

begin

if address_MSB = 0 then

Read the partial-product using R-1 bits address;

if address_MSB = 1 then

Invert the R-1 bits of address;

Read the partial products using inverted R-1 bits address;

Obtain the negative read value;

end

Update the WAFS

begin

for i:=1 to B do

begin

if address_MSB = 0 then

Add the partial-product and update value;

if address_MSB = 1 then

Invert the R-1 bits of address;

Obtain the negative update value;

Add the partial-product and the negative update value;

endend

The expression of “addressMSB” indicates the MSB of the address.Fig.7 shows the difference of the access method between MDA and HMDA.The HMDA accesses the WAFS with the R-1 bits address-line without MSB, and the MSB is used to activate the 2’s complementors located both sides of the WAFS.Fig. 8 shows the comparison of the convergence properties of the LMS, MDA, and HMDA.Results are obtained by the computer simulations.The simulation conditions are shown in Table 2. Here, IRER represents an impulse response error ratio. The step-size parameters ofthe algorithms were adjusted so as to achieve a final IRER of -49.5 [dB]. It is found that both the MDA(R=1) and the HMDA(R=2) achieve a good convergence properties that is equivalent to the LMS’s one. Since both the MDA(R=1) and the HMDA(R=2) access the DWAFS with 1 bit address,the DWAFS is the smallest size and defined for every tap of the LMS.The convergence speed of the MDA is degraded by increasing R(Tsunekawa et al, 1999).This means that larger capacity of the DWAFS needs larger iteration for the convergence.Because of smaller capacity, the convergence speed of the HMDA(R=4) is faster than the MDA(R=4)’s one(Takahashi et al, 2001).The HMDA can improve the convergence speed and reduce the capacity of the WAFS, i.e., amount of hardware, simultaneously.

Figure 7.
Comparison of the access method for the WAFS. (a) Full-memory algorithm (b) Half-memory algorithm.

Figure 8.
Comparison of the Convergence properties.

Simulation Model	System identification problem
Unknown system	128 taps low-pass FIR filter
Method	LMS, MDA, and HMDA
Measurement	Impulse Response Error Ratio(IRER)
Number of taps	128
Number of address-line	1, 2, 4 for MDA, 2 and 4 for HMDA
Input signal	White Gaussian noise, variance=1.0, average=0.0
Observation noise	White Gaussian noise independent to the input signal, 45dB

Table 2.

Computer simulation conditions.

4.New algorithm and architecture

The new algorithm and effective architecture can be obtained by applying the following techniques and ideas. 1) In the DA algorithm based on 2’s complement representation, the pseudo-odd symmetry property of WAFS is applied to the new algorithm and architecture from different point of view on previously proposed half-memory algorithm. 2) A pipelined structure with separated output calculation and update procedure is applied. 3) The delayed update method (Long, G. Et al, 1989, 1992; Meyer & Agrawal, 1993; Wang, 1994) is applied. 4) To reduce the pitch of pipeline, two partial-products are pre-loaded before addition in update procedure. 5) The multi-memory block structure is applied to reduce an amount of hardware for higher order. 6) The output calculation procedure is performed from LSB to MSB, whereas, the update procedure is performed with reverse direction.

4.1. New algorithm with delayed coefficient adaptation

To achieve a high-speed processing, the parallel computing of the output calculation and theupdate in the MDA-ADF is considered.It is well known that the delayed update method enables the parallel computation in the LMS-ADF permitting the degradation of the convergence speed. This method updates the coefficients using previous error signal and input signal vector in the LMS-ADF.

Now, let’s apply this method to the MDA-ADF. In the MDA and the HMDA, both of the output calculation and the update are performed from LSB to MSB.However, in this new algorithm, the output calculation procedure is performed from LSB to MSB, whereas, the update procedure is performed with reverse direction. Here, four combinations of the direction for the two procedures exist.However, it is confirmed by the computer simulations that the combination mentioned above is the best for theconvergence property when the large step-size parameter close to the upper bound is selected.Examples of the convergence properties for the four combinations are shown in Fig.9. In these simulations, the step-size of 0.017 is used for the (a) and (c), and 0.051 is used for the (b) and (d) to achieve a final IRER of -38.9 [dB]. Both the (b) and (d) have a good convergence properties that is equivalent to the LMS’s one, whereas, the convergence speed of the (a) and (c) is degraded.This implies that the upper bound of the (a) and (c) becomes lower than the (b), (d) and LMS’s one.

Figure 9.
Comparison of the convergence characteristics for the different combinations of the direction on the output calculation and update. The step-size of 0.017 is used for the (a) and (c), and 0.051 is used for the (b) and (d).

Figure 10.
Relation of the timing between read and write of the DWAFS.

In the HMDA-ADF, the activation of the 2’s complementor is an exceptional processing for the algorithm, that is, the processing time increases. The new algorithm is performed without the activation of the 2’s complementor by use of the pseudo-odd symmetry property. This is realized by using the address having inverted MSB instead of the 2’s complementor. This new algorithm is called a simultaneous update algorithm, and the MDA-ADF using this algorithm is called SMDA-ADF. Fig. 10 shows the timing of the read and write of the DWAFS. The partial-product is read after writing the updated partial-products.

The SMDA algorithm is represented as follows.The filter output is obtained as the same manner in the MDA-ADF.The m-th output of the m-th DWAFS is

ym(k)= ∑i=0B−1F'B−1−iTPm,B−1−i(k)E34

The output signal is the sum of these M outputs, and this can be expressed as

y(k)=∑m=1Mym(k)E35

The scaling vectors are

F0'(k)=[20,0,⋯,0] T, F1'(k)=[0,2−1,⋯,0] T,⋯, FB−1'(k)=[0,0,⋯,2−B+1] TE36

The address matrix including the inverted MSB for the output calculation is representedas

Amout(k)=[b¯m0(k)b¯m0(k−1)⋯b¯m0(k−R+1)bm1(k)bm1(k−1)⋯bm1(k−R+1)⋮⋮⋱⋮bm(B−1)(k)bm(B−1)(k−1)⋯bm(B−1)(k−R+1)]TE37

This algorithm updates the two partial-products according to the address and itsinverted address, simultaneously. When the delays in Fig.10 is expressed by d, the addressmatrix for the update procedure is representedas

Amup(k)=[b¯m0(k)b¯m0(k−1)b¯m0(k−R+1)⋮⋮⋯⋮bm(B−1−d)(k)bm(B−1−d)(k−1)bm(B−1−d)(k−R+1)bm(B−d)(k−1)bm(B−d)(k−2)⋱bm(B−d)(k−R)⋮⋮⋮bm(B−1)(k−1)bm(B−1)(k−2)⋯bm(B−1)(k−R)]TE38

The update formulas are

Pm,i(k+1)=Pm,i(k)+0.5μRe(k−1)Fi'E39

P¯m,i(k+1)=P¯m,i(k)-0.5μRe(k−1)Fi'E40

(

m=1,2,⋯,M; i=0,1,⋯,B−dE41

)

and

Pm,i(k+1)=Pm,i(k)+0.5μRe(k−2)Fi'E42

P¯m,i(k+1)=P¯m,i(k)-0.5μRe(k−2)Fi'E43

m=1,2,⋯,M; i=B−d−1,⋯,B−1E44

The error signal is obtained by

e(k)=d(k)−y(k)E45

In Eq.(38) and Eq.(41), P_m,i(k) is the AFS specified by the inverted addresses.

4.2. Evaluation of convergence properties

The convergence properties are evaluated by the computer simulations.Table 3 shows the simulation conditions, and Fig.11 shows the simulation results.The step-size parameters of the algorithms were adjusted so as to achieve a final IRER of -49.8 [dB]. The SMDA and the HMDA(Takahashi et al, 2001) with R=2 achieve a good convergence properties that is equivalent to the LMS’s one.The convergence speed of the DLMS (LMS with 1 sample delayed update) degrades against the LMS’s one because of the delayed update with 1 sample delay, whereas, in spite of the delayed update with 1 and 2 sample delays, the SMDA with R=2 can achieve a fast convergencespeed.

4.3. Architecture

Fig.12 shows the block diagram of the SMDA-ADF. Examples of the sub-blocks are shown in Fig.13, Fig.14, and Fig.15. In Fig.12, the input signal register includes (2N+1)B shift-registers. The address matrix is provided to the DWAFSModule (DWAFSM) from the input register. The sum of the M-outputs obtained from M-DWAFSM is fed to the Shift-Adder.

After the shift and addition in B times, the filter output signal is obtained.The obtained two error signals, the e(k-1) and the - e(k-1), are scaled during reading the partial-products to be updated. In Fig.13, the DWAFSM includes the 2^R+2 B-bit register, 1 R-bit register, 2 decoders,5 selectors, and 2 adders. The decoder provides the select signal to the selectors. The twoelements of DWAFS are updated, simultaneously. Fig.16 shows the timing chart of the SMDA-ADF.The parallel computation of the output calculation and update procedure are realized by the delayed update method.

Simulation Model	System identification problem
Unknown system	128 taps low-pass FIR filter
Method	LMS, DLMS, SMDA, and HMDA
Measurement	Impulse Response Error Ratio(IRER)
Number of taps	128
Number of address-line	2 and 4 for DA method
Input signal	White Gaussian noise, variance=1.0, average=0.0
Observation noise	White Gaussian noise independent to the input signal, 45dB

Table 3.

Simulation conditions.

Figure 11.
Comparison of the convergence properties.

Figure 12.
Block Diagram of the SMDA-ADF.

Figure 13.
Example of the DWAFS module for R=2.

Figure 14.
Example of the multi-input register and shift-adder.

Figure 15.
Example of the input registers for B=4 and R=2.The (a) is for output calculation, and the (b) is for update.

Figure 16.
Timing chart of the SMDA-ADF. The sampling period is equal to the word length of the input signal. The current update procedure begins after delays.

4.4. VLSI evaluations

The VLSI architecture of the SMDA is evaluated with comparison to the previous proposed methods using the multipliers, the MDA (Tsunekawa et al, 1999), conventional LMS, pipelined DLMS (Meyer & Agrawal, 1993), pipelined LMS structure (Harada et al, 1998), and pipelined NCLMS (Takahashi et al, 2006). Table 4 shows the evaluation conditions. The result for the SMDA and MDA are shown in Table 5, and the others using the multipliers are shown in Table 6. These results were obtained by a VLSI design system PARTHENON (NTT DATA Corporation, 1990). It is found that the SMDA can achieve the high-sampling rate of 380% and small output latency of 67% against the MDA, whereas, the power dissipation and the area are increased. However, the improvement of the sampling rate and latency exceed the degradation of the power dissipation and the area.The methods in Table 6 need both of the very large amount of gates andthe area against the SMDA. From these results, it is found that the SMDA has advantages of small amount of hardware, a sampling rate close to the LMS.

Methods	LMS, Pipelined-DLMS, Pipelined-LMS,Pipelined-NCLMS, SMDA
Number of taps	128
Word length	16 bit
Division number	64
VLSI library	0.8 micron CMOS standard cell, 5V
Adder	Carry look-ahead adder
Multiplier	Booth’s encode algorithm, Wallace tree, Carry look-ahead adder

Table 4.

The condition of the VLSI evaluation.

	MDA	SMDA
Machine cycle [ns]	31	21
Sampling rate [MHz]	0.79	3.00
Latency [ns]	713	479
Power dissipation [W]	6.40	16.47
Area [mm²]	36	54
Number of gates	175,321	258,321

Table 5.

Comparison of the VLSI evaluations for the MDA and the SMDA.

	LMS	Pipe-DLMS	Pipe-LMS	Pipe-NCLMS
Machine cycle [ns]	297	63	131	61
Sampling rate [MHz]	3.37	15.87	7.63	16.39
Latency [ns]	214	8,064	131	61
Power dissipation [W]	6.23	25.79	27.33	18.20
Area [mm²]	297	205	429	187
Number of gates	1,570,084	997,760	2,082,304	916,893

Table 6.

Comparison of the VLSI evaluations for ADFs employing multipliers.

5. Conclusions

In this chapter, the new LMS algorithm using distributed arithmetic and its VLSI architecture have been presented. According the discussions, we conclude as follows:

The SMDA-ADF can achieve the good convergence speed, higher sampling rate and small output latency than the conventional MDA-ADF.
The small amount of hardware is the feature of the SMDA-ADF against the ADFsemploying the multipliers.
In spite of the delayed adaptation, the convergence speed is equivalent to the LMS’s one.
The convergence property depends on the combination of the direction of the output and update procedure.The output calculation from LSB to MSB and the update procedure with reverse direction is the best, when the step-size parameter is close to the upper bound.

References

1. WidrowB.GloverJ. R.Mc CoolJ. M.KaunitzJ.WilliamsC. S.HearnR. H.ZeidlerJ. R.DongE.GoodlinR. C.1975Adaptive noise cancelling: Principles and applications. Proc. IEEE, 6316921716
2. MakinoS.KoizumiN.1988Improvement on Adaptation of an Echo Canceller in a Room. IEICE Letter Fundamentals, Vol. J 71-A, 1222122214
3. TsunekawaY.TakahashiK.ToyodaS.MiuraM.1999High-performance VLSI Architecture of Multiplier-less LMS Adaptive Filter Using Distributed Arithmetic. IEICE Trans. Fundamentals, J82-A1015181528
4. TakahashiK.TsunekawaY.ToyodaS.MiuraM.2001High-performance Architecture of LMS Adaptive Filter Using Distributed Arithmetic Based on Half-Memory Algorithm. IEICE Trans. Fundamentals, J84-A6777787
5. TakahashiK.TsunekawaY.TayamaN.SekiK.2002Analysis of the Convergence Condition of LMS Adaptive Digital Filter Using Distributed Arithmetic. IEICE Trans. Fundamentals, E85-A612491256
6. TakahashiK.KannoD.TsunekawaY.2006High-Performance Pipelined Architecture of Adaptive Digital Filter Employing FIR Filter with Minimum Coefficients Delay and Output Latency. IEICE Trans. Fundamentals, J89-A1211301141
7. CowanC. F. N.MavorJ.1981New digital adaptive-filter implementation using distributed-arithmetic techniques. IEE Proc., 128Pt.F, 4225230
8. CowanC. F. N.SmithS. G.ElliottJ. H.1983A digital adaptive filter using a memory-accumulatorarchitecture:theory and realization. IEEE Trans. Acoust., Speech \& Signal Process., 313541549
9. PeledA.LiuB.1974A new hardware realization of digital filters. IEEE Trans. Acoust., Speech \& Signal Process., 2212456462
10. WeiC. H.LouJ. J.1986Multimemory block structure for implementing a digital adaptive filter using distributed arithmetic. IEE Proc., 133Pt.G, 11926
11. LongG.LingF.ProakisJ. G.1989The LMS algorithm with delayed coefficient adaptation. IEEE Trans. Acoust. Speech Signal Process., 37913971405
12. LongG.LingF.ProakisJ. G.1992Corrections to “The LMS algorithm with delayed coefficient adaptation”. IEEE Trans. Acoust. Speech Signal Process., 401230232
13. MeyerM. D.AgrawalD. P.1993A high sampling rate delayed LMS filter architecture. IEEE Trans. Circuits \&Syst. II, 4011727729
14. WangC. L.1994Bit-serial VLSI implementation of delayed LMS transversal adaptive filters. IEEE Trans. SignalProcessing, 42821692175
15. HaradaA.NishikawaK.KiyaH.1998Pipelined Architecture of the LMS Adaptive Digital Filter with the Minimum Output Latency. IEICE Trans. Fundamentals, E81-A815781585
16. NTT DATA Corporation1990PARTHENON User’s Manual. Japan.

[1] 1. WidrowB.GloverJ. R.Mc CoolJ. M.KaunitzJ.WilliamsC. S.HearnR. H.ZeidlerJ. R.DongE.GoodlinR. C.1975Adaptive noise cancelling: Principles and applications. Proc. IEEE, 6316921716

[2] 2. MakinoS.KoizumiN.1988Improvement on Adaptation of an Echo Canceller in a Room. IEICE Letter Fundamentals, Vol. J 71-A, 1222122214

[3] 3. TsunekawaY.TakahashiK.ToyodaS.MiuraM.1999High-performance VLSI Architecture of Multiplier-less LMS Adaptive Filter Using Distributed Arithmetic. IEICE Trans. Fundamentals, J82-A1015181528

[4] 4. TakahashiK.TsunekawaY.ToyodaS.MiuraM.2001High-performance Architecture of LMS Adaptive Filter Using Distributed Arithmetic Based on Half-Memory Algorithm. IEICE Trans. Fundamentals, J84-A6777787

[5] 5. TakahashiK.TsunekawaY.TayamaN.SekiK.2002Analysis of the Convergence Condition of LMS Adaptive Digital Filter Using Distributed Arithmetic. IEICE Trans. Fundamentals, E85-A612491256

[6] 6. TakahashiK.KannoD.TsunekawaY.2006High-Performance Pipelined Architecture of Adaptive Digital Filter Employing FIR Filter with Minimum Coefficients Delay and Output Latency. IEICE Trans. Fundamentals, J89-A1211301141

[7] 7. CowanC. F. N.MavorJ.1981New digital adaptive-filter implementation using distributed-arithmetic techniques. IEE Proc., 128Pt.F, 4225230

[8] 8. CowanC. F. N.SmithS. G.ElliottJ. H.1983A digital adaptive filter using a memory-accumulatorarchitecture:theory and realization. IEEE Trans. Acoust., Speech \& Signal Process., 313541549

[9] 9. PeledA.LiuB.1974A new hardware realization of digital filters. IEEE Trans. Acoust., Speech \& Signal Process., 2212456462

[10] 10. WeiC. H.LouJ. J.1986Multimemory block structure for implementing a digital adaptive filter using distributed arithmetic. IEE Proc., 133Pt.G, 11926

[11] 11. LongG.LingF.ProakisJ. G.1989The LMS algorithm with delayed coefficient adaptation. IEEE Trans. Acoust. Speech Signal Process., 37913971405

[12] 12. LongG.LingF.ProakisJ. G.1992Corrections to “The LMS algorithm with delayed coefficient adaptation”. IEEE Trans. Acoust. Speech Signal Process., 401230232

[13] 13. MeyerM. D.AgrawalD. P.1993A high sampling rate delayed LMS filter architecture. IEEE Trans. Circuits \&Syst. II, 4011727729

[14] 14. WangC. L.1994Bit-serial VLSI implementation of delayed LMS transversal adaptive filters. IEEE Trans. SignalProcessing, 42821692175

[15] 15. HaradaA.NishikawaK.KiyaH.1998Pipelined Architecture of the LMS Adaptive Digital Filter with the Minimum Output Latency. IEICE Trans. Fundamentals, E81-A815781585

[16] 16. NTT DATA Corporation1990PARTHENON User’s Manual. Japan.

An LMS Adaptive Filter Using Distributed Arithmetic - Algorithms and Architectures

Adaptive Filtering

Author Information

Kyo Takahashi*

Naoki Honma

Yoshitaka Tsunekawa

1. Introduction

2. LMS adaptive filter

Figure 1.

3. LMS adaptive filter using distributed arithmetic

3.1. Distributed arithmetic

Figure 2.

3.2. Derivation of LMS adaptive algorithm using distributed arithmetic

Figure 3.

Figure 4.

3.3. Multi-memory block structure

Figure 5.

Figure 6.

3.4. Half-memory algorithm using pseudo-odd symmetry property

Table 1.

Figure 7.

Figure 8.

Table 2.

4.New algorithm and architecture

4.1. New algorithm with delayed coefficient adaptation

Figure 9.

Figure 10.

4.2. Evaluation of convergence properties

4.3. Architecture

Table 3.

Figure 11.

Figure 12.

Figure 13.

Figure 14.

Figure 15.

Figure 16.

4.4. VLSI evaluations

Table 4.

Table 5.

Table 6.

5. Conclusions

References

Continue reading from the same book

Adaptive Filtering