Open access peer-reviewed chapter

# The Ultra High Speed LMS Algorithm Implemented on Parallel Architecture Suitable for Multidimensional Adaptive Filtering

By Marwan Jaber

Submitted: December 21st 2010 · Reviewed: May 5th 2011 · Published: September 6th 2011

DOI: 10.5772/25070

## 1. Introduction

Filters are devices or systems that process (or reshape) an input signal according to some specific rules to generate an output signal (Figure 1).

Filters can be classified into two categories, linear and non-linear. In a linear filter there is a linear relationship between the input and the output that satisfies the superposition property:

$$x_1 \rightarrow y_1,\quad x_2 \rightarrow y_2 \quad\Rightarrow\quad a\,x_1 + b\,x_2 \rightarrow a\,y_1 + b\,y_2 \qquad (1)$$

In a non-linear filter, by contrast, the relationship between the input and the output does not satisfy this property; for example:

$$x_1 \rightarrow y_1 = x_1^2,\quad x_2 \rightarrow y_2 = x_2^2 \quad\Rightarrow\quad x_1 + x_2 \rightarrow (x_1 + x_2)^2 \neq y_1 + y_2 \qquad (2)$$
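The distinction can be checked numerically. In this minimal sketch (the moving-average and squaring systems are illustrative choices, not from the text), a linear system passes the superposition test of equation (1) while the squarer of equation (2) fails it:

```python
import numpy as np

# A linear system: 3-point moving average (convolution is linear).
def moving_average(x):
    h = np.ones(3) / 3.0
    return np.convolve(x, h)

# A non-linear system: squaring the input.
def square(x):
    return x ** 2

rng = np.random.default_rng(0)
x1, x2 = rng.standard_normal(8), rng.standard_normal(8)
a, b = 2.0, -0.5

# Linear: T(a*x1 + b*x2) == a*T(x1) + b*T(x2)
lhs = moving_average(a * x1 + b * x2)
rhs = a * moving_average(x1) + b * moving_average(x2)
print(np.allclose(lhs, rhs))        # True

# Non-linear: the same superposition test fails for the squarer.
lhs_nl = square(x1 + x2)
rhs_nl = square(x1) + square(x2)
print(np.allclose(lhs_nl, rhs_nl))  # False
```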

In this chapter we focus on adaptive filters, which can automatically adjust (or adapt) to changing environments and changing system requirements by being trained to perform specific filtering or decision-making tasks. Such systems require an “adaptation algorithm” for adjusting the system’s parameters (Figure 2) [7].

Different filter structures could be implemented in the adaptive filter of Figure 2, such as:

• Transversal Filter Structure (FIR)

• IIR Filter Structure

• Lattice Filter Structure

• Non-Linear Filter Structure (Volterra Filter)

in which the adaptation approaches are based on:

• Wiener filter theory where the optimum coefficients of a linear filter are obtained by minimization of its mean-square error (MSE)

• Method of least squares, where the performance index is the sum of weighted error squares

The main objective of the adaptation algorithm is to set the filter parameters $\bar{\theta}(n)$ in such a way that the filter output minimizes a meaningful objective function F (called the cost function in some of the literature) involving the reference signal (or desired signal), which may be based on:

• Mean square error

• Least square error

• Weighted least squares

• Instantaneous square value

and the minimization algorithms are based on the following methods:

• Newton’s method

• Quasi-Newton method

• Steepest-descent method

## 2. Signals

Before pursuing the study of adaptive systems, it is important to refresh our memory with some useful definitions from the stochastic process theory. The representation of signals could fall into two categories:

• Deterministic Signals

• Random Signals

### 2.1. Deterministic signals

A deterministic discrete-time signal is characterized by a defined mathematical function of the time index n, with n= 0, ±1, ±2, ⋯, such as:

$$x(n) = e^{-\alpha n}\cos(\omega n) + u(n) \qquad (3)$$

where u(n) is the unit-step sequence. The response of a linear time-invariant filter to an input x(n) is given by:

$$y(n) = x(n) * h(n) = \sum_{k=-\infty}^{\infty} x(k)\,h(n-k) \qquad (4)$$

where h(n) is the impulse response of the filter. The Z-transform of a given sequence x(n) and its inverse are defined as:

$$X(z) = \mathcal{Z}\{x(n)\} = \sum_{n=-\infty}^{\infty} x(n)\,z^{-n}, \qquad x(n) = \frac{1}{2\pi j}\oint_C X(z)\,z^{\,n-1}\,dz \qquad (5)$$

where C is a counterclockwise closed contour in the region of convergence of X(z) encircling the origin of the z-plane. By taking the Z-transform of both sides of equation (4), we obtain:

$$Y(z) = H(z)\,X(z)$$

For finite-energy waveforms, it is convenient to use the discrete-time Fourier transform, defined as:

$$X(e^{j\omega}) = \mathcal{F}\{x(n)\} = \sum_{n=-\infty}^{\infty} x(n)\,e^{-j\omega n} \qquad (6)$$
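As a quick illustration of equation (4) and the transform relation Y(z) = H(z)X(z), the sketch below (the filter h and the decaying-cosine input are arbitrary example choices) computes the filter response in the time domain and checks it against multiplication of the transforms evaluated on an FFT grid:

```python
import numpy as np

# Impulse response of an example FIR filter and a finite-energy input.
h = np.array([1.0, 0.5, 0.25])
n = np.arange(32)
x = np.exp(-0.2 * n) * np.cos(0.3 * np.pi * n)   # decaying cosine, u(n) implicit

# Time domain: y(n) = sum_k x(k) h(n-k)
y = np.convolve(x, h)

# Frequency domain: Y = H * X on a grid large enough to hold the full output.
N = len(y)
Y = np.fft.fft(h, N) * np.fft.fft(x, N)
print(np.allclose(y, np.fft.ifft(Y).real))   # True
```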

### 2.2. Random signals

In many real-life situations, observations are made over a period of time and are influenced by random effects, not just at a single instant but throughout the entire interval of time or sequence of times. In a “rough” sense, a random process is a phenomenon that varies to some degree unpredictably as time goes on. If we observed an entire time-sequence of the process on several different occasions, under presumably “identical” conditions, the resulting observation sequences would, in general, be different. A random variable (RV) is a rule (or function) that assigns a real number to every outcome of a random experiment, while a stochastic random process is a rule (or function) that assigns a time function to every outcome of a random experiment. The elements of a stochastic process, {x(n)}, for different values of the time-index n, are in general complex-valued random variables that are characterized by their probability distribution functions. A stochastic random process is called stationary in the strict sense if all of its (single and joint) distribution functions are independent of a shift in the time origin.

In this subsection we will be reviewing some useful definitions in stochastic process:

Stochastic Average

$$m_x = E\big[x(n)\big] \qquad (7)$$

where E is the expected value of x(n).

Autocorrelation function for a stochastic process x(n)

$$\phi_{xx}(n,m) = E\big[x(n)\,x^*(m)\big] \qquad (8)$$

where x* denotes the complex conjugate of x, and the symmetry property of the correlation function is:

$$\phi_{xx}(k) = \phi^*_{xx}(-k) \qquad (9)$$

Furthermore, a stochastic random process is called stationary in the wide sense if $m_x$ and $\phi_{xx}(n,m)$ are independent of a shift in the time origin for any k, m, and n; therefore,

$$m_x(n) = m_x(n+k) = \text{constant for all } n \qquad (10)$$

and

$$\phi_{xx}(n,m) = \phi_{xx}(n+k,\,m+k) \qquad (11)$$

which means $\phi_{xx}(n,m)$ depends only on the difference n − m, or

$$\phi_{xx}(k) = E\big[x(n)\,x^*(n-k)\big] \qquad (12)$$

Special case

$$\phi_{xx}(0) = E\big[|x(n)|^2\big] = \text{mean square of } x(n) \qquad (13)$$

The Z-transform of $\phi_{xx}(k)$ is given by

$$\Phi_{xx}(z) = \sum_{k=-\infty}^{\infty} \phi_{xx}(k)\,z^{-k} \qquad (14)$$

where it can be easily shown that

$$\Phi_{xx}(z) = \Phi^*_{xx}\!\left(\frac{1}{z^*}\right) \qquad (15)$$

Equation (15) implies that if $\Phi_{xx}(z)$ is a rational function of z, then its poles and zeros must occur in complex-conjugate reciprocal pairs. Moreover, the points that belong to the region of convergence of $\Phi_{xx}(z)$ also occur in complex-conjugate reciprocal pairs, which suggests that the region of convergence of $\Phi_{xx}(z)$ must be of the form $|a| < |z| < 1/|a|$, which covers the unit circle $|z| = 1$. By assuming that $\Phi_{xx}(z)$ is convergent on the unit-circle contour C and by letting $z = e^{j\omega}$, then

$$\phi_{xx}(0) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \Phi_{xx}(e^{j\omega})\,d\omega \qquad (16)$$

and if mx = 0, we will have

$$\sigma_x^2 = E\big[|x(n)|^2\big] = \frac{1}{2\pi}\int_{-\pi}^{\pi} \Phi_{xx}(e^{j\omega})\,d\omega \qquad (17)$$

Cross-correlation function for two stochastic processes x(n) and y(n)

$$\phi_{xy}(n,m) = E\big[x(n)\,y^*(m)\big] \qquad (18)$$

and the symmetry property of the cross-correlation function is:

$$\phi_{xy}(k) = \phi^*_{yx}(-k) \qquad (19)$$

The Z-transform of $\phi_{xy}(k)$ is given by

$$\Phi_{xy}(z) = \sum_{k=-\infty}^{\infty} \phi_{xy}(k)\,z^{-k} \qquad (20)$$

where it can be easily shown that

$$\Phi_{xy}(z) = \Phi^*_{yx}\!\left(\frac{1}{z^*}\right) \qquad (21)$$

Auto-covariance function for stationary process is defined as:

$$\gamma_{xx}(k) = E\big[(x(n)-m_x)\,(x(n-k)-m_x)^*\big] = \phi_{xx}(k) - |m_x|^2 \qquad (22)$$

and the symmetry property of the auto-covariance function is:

$$\gamma_{xx}(k) = \gamma^*_{xx}(-k) \qquad (23)$$

Special case

$$\gamma_{xx}(0) = \sigma_x^2 = \text{variance of } x(n) \qquad (24)$$

Cross-covariance function is defined as:

$$\gamma_{xy}(k) = E\big[(x(n)-m_x)\,(y(n-k)-m_y)^*\big] = \phi_{xy}(k) - m_x\,m_y^* \qquad (25)$$

and the symmetry property of the cross-covariance function is:

$$\gamma_{xy}(k) = \gamma^*_{yx}(-k) \qquad (26)$$

#### 2.2.1. Power spectral density

Consider the wide-sense stationary random process {x(n)}, from which we take a window of 2N+1 elements:

$$x_N(n) = \begin{cases} x(n), & -N \le n \le N \\ 0, & \text{otherwise} \end{cases} \qquad (27)$$

The discrete-time Fourier transform of $x_N(n)$ is then computed as:

$$X_N(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x_N(n)\,e^{-j\omega n} = \sum_{n=-N}^{N} x(n)\,e^{-j\omega n} \qquad (28)$$

Multiplying both sides of equation (28) by the complex conjugate,

$$X^*_N(e^{j\omega}) = \sum_{m=-N}^{N} x^*(m)\,e^{j\omega m} \qquad (29)$$

we obtain

$$\big|X_N(e^{j\omega})\big|^2 = \sum_{n=-N}^{N}\sum_{m=-N}^{N} x(n)\,x^*(m)\,e^{-j\omega(n-m)} \qquad (30)$$

and by taking the expected value of equation (30), we have:

$$E\Big[\big|X_N(e^{j\omega})\big|^2\Big] = \sum_{n=-N}^{N}\sum_{m=-N}^{N} E\big[x(n)\,x^*(m)\big]\,e^{-j\omega(n-m)} \qquad (31)$$

which, after simplification by imposing k = n − m, yields:

$$\frac{1}{2N+1}\,E\Big[\big|X_N(e^{j\omega})\big|^2\Big] = \sum_{k=-2N}^{2N}\left(1 - \frac{|k|}{2N+1}\right)\phi_{xx}(k)\,e^{-j\omega k} \qquad (32)$$

By assuming that the summation on the right-hand side of equation (32) converges for large k, then

$$\lim_{N\to\infty}\left(\frac{1}{2N+1}\,E\Big[\big|X_N(e^{j\omega})\big|^2\Big]\right) = \sum_{k=-\infty}^{\infty}\phi_{xx}(k)\,e^{-j\omega k} = \Phi_{xx}(e^{j\omega}) \qquad (33)$$

The function $\Phi_{xx}(e^{j\omega})$ is called the power spectral density of the stochastic, wide-sense stationary process {x(n)}; it is always real and non-negative.
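Equation (33) suggests a practical estimator: average the scaled magnitude-squared transform of windowed realizations. In this hedged sketch (a one-sided length-N window and zero-mean white noise are illustrative assumptions), the estimate approaches the flat spectrum $\Phi_{xx}(e^{j\omega}) = \sigma^2$ of white noise:

```python
import numpy as np

# Average |X_N(e^{jw})|^2 / N over many realizations, in the spirit of
# equation (33) (the text uses a 2N+1-point window; a length-N one is used here).
rng = np.random.default_rng(1)
N, trials, sigma2 = 256, 2000, 1.0

acc = np.zeros(N)
for _ in range(trials):
    x = rng.normal(scale=np.sqrt(sigma2), size=N)
    X = np.fft.fft(x)
    acc += np.abs(X) ** 2 / N
psd = acc / trials

# For zero-mean white noise, phi_xx(k) = sigma^2 * delta(k), so the true
# power spectral density is flat and equal to sigma^2.
print(psd.mean())   # close to 1.0
```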

#### 2.2.2. Response of linear system to stochastic processes

Given the Linear Time-Invariant (LTI) system illustrated in figure 3, where we assume that $\phi_{xx}(k)$ is known, the Z-transform of $\phi_{xy}(k)$ can be elaborated as:

$$\Phi_{xy}(z) = \sum_{k=-\infty}^{\infty}\phi_{xy}(k)\,z^{-k} = \sum_{k=-\infty}^{\infty}E\big[x(n)\,y^*(n-k)\big]\,z^{-k} \qquad (34)$$

and

$$y(n) = \sum_{l=-\infty}^{\infty} h(l)\,x(n-l) \qquad (35)$$

Note that the following expressions can be easily derived:

$$\Phi_{yy}(z) = H(z)\,H^*\!\left(\frac{1}{z^*}\right)\Phi_{xx}(z) \qquad (36)$$

and

$$\Phi_{xy}(e^{j\omega}) = H^*(e^{j\omega})\,\Phi_{xx}(e^{j\omega}), \qquad \Phi_{yx}(e^{j\omega}) = H(e^{j\omega})\,\Phi_{xx}(e^{j\omega}), \qquad \Phi_{yy}(e^{j\omega}) = \big|H(e^{j\omega})\big|^2\,\Phi_{xx}(e^{j\omega}) \qquad (37)$$
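The output-spectrum relation of equation (37) can be verified numerically. In this sketch, white noise is passed through an arbitrary example FIR filter h, and the averaged periodogram of the output is compared against $|H(e^{j\omega})|^2$ times the input periodogram:

```python
import numpy as np
from numpy.fft import fft

rng = np.random.default_rng(2)
h = np.array([1.0, -0.6, 0.2])   # example filter, not from the text
N, trials = 512, 200

Pxx = np.zeros(N)
Pyy = np.zeros(N)
for _ in range(trials):
    x = rng.standard_normal(N)
    y = np.convolve(x, h)[:N]    # causal filtering, zero initial conditions
    Pxx += np.abs(fft(x)) ** 2 / N
    Pyy += np.abs(fft(y)) ** 2 / N
Pxx /= trials
Pyy /= trials

H = fft(h, N)
# Relative deviation between the measured output PSD and |H|^2 * input PSD;
# it is small apart from edge effects and estimation noise.
err = np.mean(np.abs(Pyy - np.abs(H) ** 2 * Pxx)) / np.mean(Pyy)
print(err)
```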

## 3. Regression

Data fitting is one of the oldest adaptive systems; it is a powerful tool that allows predictions of present or future events to be made based on information about past or present events. The two basic types of regression are linear regression and multiple regression.

### 3.1. Linear regression

The goal of linear regression is to adjust the values of slope and intercept to find the line that best predicts d from x (Figure 4); in other words, linear regression estimates how much d changes when x changes by one unit.

The slope quantifies the steepness of the line, which equals the change in d for each unit change in x. If the slope is positive, d increases as x increases. If the slope is negative, d decreases as x increases. The d-intercept is the d value of the line when x equals zero; it defines the elevation of the line.

The deviation from the straight line that represents the best linear fit of a set of data, as shown in figure 5, is expressed as:

d ≈ w × x + b

or more specifically,

d(n) = w × x(n) +b + e(n) = y(n) + e(n)

where e(n) is the instantaneous error that is added to y(n) (the linearly fitted value), w is the slope, and b is the y-intercept (or bias). More precisely, the goal of regression is to minimize the sum of the squares of the vertical distances of the points from the line. The problem can be solved by a linear system with only two free parameters, the slope w and the bias b.

Figure 6 is called the Linear Regression Processing Element (LRPE), which is built from two multipliers and one adder. The multiplier w scales the input, and the multiplier b is a simple bias, which can also be thought of as an extra input connected to the value +1. The parameters (b, w) have different functions in the solution.

#### 3.1.1. Least squares for linear model

Least squares solves the problem by finding the best-fit line to a set of data, for which the sum of the square deviations (or residuals) in the d direction is minimized (figure 7).

The goal is to find a systematic procedure to find the constants b and w which minimizes the error between the true value d(n) and the estimated value y(n) which is called the linear regression.

$$d(n) - \big(b + w\,x(n)\big) = d(n) - y(n) = e(n) \qquad (38)$$

The best-fit line to the data is obtained by minimizing the error e(n), measured by the mean square error (MSE), a widely utilized performance criterion:

$$\xi = \frac{1}{2N}\sum_{n=1}^{N} e(n)^2 \qquad (39)$$

where N is the number of observations and ξ is the mean square error.

Our goal is to minimize ξ analytically, which can be achieved by taking the partial derivatives of this quantity with respect to the unknowns and equating the resulting expressions to zero, i.e.

$$\frac{\partial\xi}{\partial b} = 0, \qquad \frac{\partial\xi}{\partial w} = 0 \qquad (40)$$

which yields, after some manipulation:

$$b = \frac{\sum_n x(n)^2 \sum_n d(n) - \sum_n x(n)\sum_n x(n)\,d(n)}{N\sum_n \big(x(n)-\bar{x}\big)^2}, \qquad w = \frac{\sum_n \big(x(n)-\bar{x}\big)\big(d(n)-\bar{d}\big)}{\sum_n \big(x(n)-\bar{x}\big)^2} \qquad (41)$$

where the bar over the variable means the mean value and the procedure to determine the coefficients of the line is called the least square method.
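A minimal numeric check of the closed-form expressions in equation (41), using synthetic data with a known slope and intercept (the values 2.5 and 1.0 are arbitrary choices for the illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 200
x = rng.uniform(0, 10, N)
d = 2.5 * x + 1.0 + rng.normal(scale=0.5, size=N)   # d = w*x + b + noise

xbar, dbar = x.mean(), d.mean()
# Slope and intercept from equation (41).
w = np.sum((x - xbar) * (d - dbar)) / np.sum((x - xbar) ** 2)
b = (np.sum(x**2) * np.sum(d) - np.sum(x) * np.sum(x * d)) / (N * np.sum((x - xbar) ** 2))

print(w, b)   # estimates close to the true slope 2.5 and intercept 1.0
```

Note that the intercept formula is algebraically equivalent to the more familiar $b = \bar{d} - w\bar{x}$.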

#### 3.1.2. Search procedure

The purpose of least squares is to find parameters (b, w1, w2, …, wp) that minimize the difference between the system output y(n) and the desired response d(n). So, regression is effectively computing the optimal parameters of an interpolating system which predicts the value of d from the value of x.

Figure 8 shows graphically the operation of adapting the parameters of the linear system, in which the system output y is always a linear combination of the input x with a certain bias b according to the equation y = wx + b. Changing b modifies the y-intercept, while changing w modifies the slope. The goal of linear regression is to adjust the position of the line such that the average square difference between the y values (on the line) and the cloud of points d(n), i.e. the error e(n), is minimized.

The key point is to recognize the information transmitted by the error, which can be used to optimally place the line; this can be achieved by including a subsystem that accepts the error and modifies the parameters of the system. Thus, the error e(n) is fed back to the system and indirectly affects the output through a change in the parameters (b, w). With the incorporation of this mechanism that automatically modifies the system parameters, a very powerful linear system can be built that will constantly seek optimal parameters. Such systems are called adaptive systems, and they are the focus of this chapter.

### 3.2. Multiple regression

With the same reasoning as above, the regression for multiple variables could be derived as:

$$e(i) = d(i) - \left(b + \sum_{k=1}^{p} w(k)\,x(i,k)\right) = d(i) - \sum_{k=0}^{p} w(k)\,x(i,k) \qquad (42)$$

where the bias b is absorbed as the weight w(0) applied to a constant input x(i,0) = 1, and the goal for this case is to find the coefficient vector W that minimizes the MSE of e(i) over the i samples (Figure 9).

The mean square error (MSE) becomes for this case:

$$\xi = \frac{1}{2N}\sum_n \left(d(n) - \sum_{k=0}^{p} w(k)\,x(n,k)\right)^2 \qquad (43)$$

where the solution of this equation can be found by taking the derivatives of ξ with respect to the unknowns w(k) and equating the result to zero, which yields the famous normal equations:

$$\sum_n x(n,j)\,d(n) = \sum_{k=0}^{p} w(k)\sum_n x(n,k)\,x(n,j) \qquad (44)$$

for j = 0, 1, …, p.

By defining:

$$R(k,j) = \frac{1}{N}\sum_n x(n,k)\,x(n,j) \qquad (45)$$

as the autocorrelation function,

$$P(j) = \frac{1}{N}\sum_n x(n,j)\,d(n) \qquad (46)$$

as the cross-correlation of the input x for index j and the desired response d, and substituting these definitions into Eq. (44), the set of normal equations can be written simply as:

$$P = R\,W^* \qquad \text{or} \qquad W^* = R^{-1}P \qquad (47)$$

where W is a vector of the p+1 weights $w_i$, and W* represents the value of the vector at the optimum (minimum) solution. The solution of the multiple-regression problem can thus be computed analytically as the product of the inverse of the autocorrelation matrix of the input samples and the cross-correlation vector of the input and the desired response.

All the concepts previously mentioned for linear regression can be extended to the multiple-regression case, where ξ in matrix notation is:

$$\xi = W^T R\,W - 2P^T W + \frac{1}{N}\sum_n d(n)^2 \qquad (48)$$

where T denotes the transpose, and the values of the coefficients that minimize the cost are obtained by setting the gradient to zero:

$$\nabla\xi = 0 = 2R\,W - 2P \qquad \text{or} \qquad W^* = R^{-1}P \qquad (49)$$
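The normal equations of equation (47) can be sketched directly in NumPy; here the bias is folded in as a constant input x(n,0) = 1 and the true weights are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 500
# p+1 = 3 inputs: a constant column for the bias plus two random regressors.
X = np.column_stack([np.ones(N), rng.standard_normal((N, 2))])
w_true = np.array([1.0, -2.0, 0.5])
d = X @ w_true + rng.normal(scale=0.1, size=N)

R = X.T @ X / N            # autocorrelation matrix, equation (45)
P = X.T @ d / N            # cross-correlation vector, equation (46)
W = np.linalg.solve(R, P)  # W* = R^{-1} P, equation (47)

print(np.allclose(W, w_true, atol=0.05))   # True
```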

## 4. Wiener filters

The Wiener filter is a filter proposed by Norbert Wiener during the 1940s and published in 1949 [8]. Its purpose was to reduce the amount of noise present in a signal by comparison with an estimate of the desired noiseless signal. This filter is an MSE-optimal stationary linear filter, mainly used for images degraded by additive noise and blurring. The optimization of the filter is achieved by minimizing the mean square error between the filter output and the desired response (Figure 10), which is known as the cost function, expressed as:

$$\xi = E\big[|e(n)|^2\big] \qquad (50)$$

In signal processing, a causal filter is a linear and time-invariant causal system. The word causal indicates that the filter output depends only on past and present inputs; a filter whose output also depends on future inputs is non-causal. As a result, two cases should be considered in the optimization of the cost function (equation 50):

• The filter W(z) is causal and/or FIR (Finite Impulse Response).

• The filter W(z) is non-causal and/or IIR (Infinite Impulse Response).

### 4.1. Wiener filter – the transversal filter

Let $\bar{W}$ be the vector of the transversal filter’s tap weights illustrated in figure 11, defined as:

$$\bar{W} = [w_0\;\; w_1\;\; \cdots\;\; w_{N-1}]^T \qquad (51)$$

where T denotes the vector transpose and

$$\bar{X} = [x_0\;\; x_1\;\; \cdots\;\; x_{N-1}]^T \qquad (52)$$

be the input signal vector, where two cases should be treated separately depending on the required application:

• Real valued input signal

• Complex valued input signal

#### 4.1.1. Wiener filter – the transversal filter - real-valued input signal

The output filter could be expressed as:

$$y(n) = \sum_{i=0}^{N-1} w_i\,x(n-i) = \bar{W}^T\bar{X}(n) = \bar{X}(n)^T\bar{W} \qquad (53)$$

where

$$e(n) = d(n) - y(n) = d(n) - \bar{X}(n)^T\bar{W} \qquad (54)$$

The performance or cost function expressed in equation (50) can be elaborated as:

$$\xi = E\big[e(n)^2\big] = E\Big[\big(d(n) - \bar{W}^T\bar{X}(n)\big)\big(d(n) - \bar{X}(n)^T\bar{W}\big)\Big] = E\big[d(n)^2\big] - \bar{W}^T E\big[\bar{X}(n)\,d(n)\big] - E\big[\bar{X}(n)^T d(n)\big]\bar{W} + \bar{W}^T E\big[\bar{X}(n)\bar{X}(n)^T\big]\bar{W} \qquad (55)$$

By examining equation (55), we notice that $E[\bar{X}(n)\,d(n)]$ is the cross-correlation vector, which we define as:

$$\bar{P} \triangleq E\big[\bar{X}(n)\,d(n)\big] = [p_0\;\; p_1\;\; \cdots\;\; p_{N-1}]^T \qquad (56)$$

and, with the same reasoning, the autocorrelation matrix $E[\bar{X}(n)\bar{X}(n)^T]$ can be expressed as:

$$R \triangleq E\big[\bar{X}(n)\bar{X}(n)^T\big] = \begin{bmatrix} r_{0,0} & r_{0,1} & \cdots & r_{0,N-1} \\ r_{1,0} & r_{1,1} & \cdots & r_{1,N-1} \\ \vdots & \vdots & \ddots & \vdots \\ r_{N-1,0} & r_{N-1,1} & \cdots & r_{N-1,N-1} \end{bmatrix} \qquad (57)$$

Knowing that $E[d(n)\bar{X}(n)^T] = \bar{P}^T$ and $\bar{W}^T\bar{P} = \bar{P}^T\bar{W}$, the cost function becomes:

$$\xi = E\big[d(n)^2\big] - 2\bar{W}^T\bar{P} + \bar{W}^T R\,\bar{W} \qquad (58)$$

The quadratic function of $\bar{W}$ expressed in equation (58) has a global minimum if and only if R is a positive definite matrix.

The gradient method is the most commonly used method to compute the tap weights that minimize the cost function; therefore,

$$\nabla\xi = \frac{\partial\xi}{\partial w_i} = \left[\frac{\partial\xi}{\partial w_0}\;\; \frac{\partial\xi}{\partial w_1}\;\; \cdots\;\; \frac{\partial\xi}{\partial w_{N-1}}\right]^T = 0 \qquad (59)$$

which yields, after expanding equation (59):

$$\frac{\partial\xi}{\partial w_i} = 0 \;\Rightarrow\; -2p_i + \sum_{l=0}^{N-1} w_l\,(r_{li} + r_{il}) = 0 \;\Rightarrow\; \sum_{l=0}^{N-1} w_l\,(r_{li} + r_{il}) = 2p_i \qquad (60)$$

where we have used the expanded form $\xi = E[d(n)^2] - 2\sum_{l=0}^{N-1} p_l w_l + \sum_{l=0}^{N-1}\sum_{m=0}^{N-1} w_l w_m r_{lm}$ together with:

$$\sum_{l=0}^{N-1}\sum_{m=0}^{N-1} w_l w_m r_{lm} = \sum_{\substack{l=0 \\ l\neq i}}^{N-1}\sum_{\substack{m=0 \\ m\neq i}}^{N-1} w_l w_m r_{lm} + w_i\sum_{\substack{m=0 \\ m\neq i}}^{N-1} w_m r_{im} + w_i\sum_{\substack{l=0 \\ l\neq i}}^{N-1} w_l r_{li} + w_i^2\,r_{ii} \qquad (61)$$

Knowing that $r_{li} = r_{il}$, due to the symmetry property of the autocorrelation function of a real-valued signal, equation (60) can be expressed as:

$$\sum_{l=0}^{N-1} w_l\,r_{il} = p_i \qquad (62)$$

for i = 0, 1, …, N – 1 which could be expressed in matrix notation as:

$$R\,\bar{W}_{op} = \bar{P} \qquad (63)$$

known as the Wiener–Hopf equation, whose solution for the optimal tap-weight vector $\bar{W}_{op}$, assuming that R has an inverse, is:

$$\bar{W}_{op} = R^{-1}\bar{P} \qquad (64)$$

Finally, the minimum value of the cost function can be expressed as:

$$\xi_{min} = E\big[d(n)^2\big] - \bar{W}_{op}^T\bar{P} = E\big[d(n)^2\big] - \bar{W}_{op}^T R\,\bar{W}_{op} = E\big[d(n)^2\big] - \bar{P}^T R^{-1}\bar{P} \qquad (65)$$
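The Wiener–Hopf solution of equation (64) and the minimum cost of equation (65) can be illustrated with estimated statistics. In this sketch the “plant” h_true is a hypothetical 4-tap FIR system, and R and $\bar{P}$ are estimated from a finite data record rather than known exactly:

```python
import numpy as np

rng = np.random.default_rng(5)
h_true = np.array([0.8, -0.4, 0.2, 0.1])   # hypothetical unknown system
N, L = 5000, 4

x = rng.standard_normal(N)
d = np.convolve(x, h_true)[:N] + rng.normal(scale=0.01, size=N)

# Tapped-delay-line data matrix: column i holds x delayed by i samples.
Xmat = np.column_stack([np.concatenate([np.zeros(i), x[:N - i]]) for i in range(L)])
R = Xmat.T @ Xmat / N        # estimate of E[X X^T]
P = Xmat.T @ d / N           # estimate of E[X d]

W_op = np.linalg.solve(R, P)               # Wiener-Hopf solution
print(np.allclose(W_op, h_true, atol=0.01))  # True

# Minimum cost E[d^2] - W_op^T P (equation 65): close to the noise variance.
xi_min = np.mean(d**2) - W_op @ P
print(xi_min)
```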

#### 4.1.2. Wiener filter – the transversal filter - complex-valued input signal

Baseband data transmission schemes such as QPSK and QAM involve complex-valued random signals, and the filter’s tap-weight vector is then also assumed to be complex; therefore, the cost function for this case is expressed as:

$$\xi = E\big[|e(n)|^2\big] = E\big[e(n)\,e^*(n)\big] \qquad (66)$$

and the gradient of the cost function with respect to $w_i$ would be:

$$\nabla_{w_i^c}\,\xi = E\big[e(n)\,\nabla_{w_i^c}e^*(n) + e^*(n)\,\nabla_{w_i^c}e(n)\big] \qquad (67)$$

By examining figure 11, we notice that the error signal can be formulated as:

$$e(n) = d(n) - \sum_{k=0}^{N-1} w_k^*\,x(n-k) \qquad (68)$$

and, since d(n) is independent of the tap weights,

$$\nabla_{w_i^c}\,e(n) = -x(n-i)\,\nabla_{w_i^c}\,w_i^*, \qquad \nabla_{w_i^c}\,e^*(n) = -x^*(n-i)\,\nabla_{w_i^c}\,w_i$$

$$\nabla_{w_i^c}\,w_i = \frac{\partial w_i}{\partial w_{i,R}} + j\frac{\partial w_i}{\partial w_{i,I}} = 1 + j(j) = 0, \qquad \nabla_{w_i^c}\,w_i^* = \frac{\partial w_i^*}{\partial w_{i,R}} + j\frac{\partial w_i^*}{\partial w_{i,I}} = 1 + j(-j) = 2 \qquad (69)$$

where the subscripts R and I refer to the real and imaginary parts of the complex number; therefore, the gradient of the cost function (equation 67) would be:

$$\nabla_{w_i^c}\,\xi = -2E\big[e^*(n)\,x(n-i)\big] \qquad (70)$$

The optimum filter’s tap weights are obtained by setting the complex gradient of equation (70) to zero, which gives the optimal error $e_o(n)$ satisfying:

$$E\big[e_o^*(n)\,x(n-i)\big] = 0 \qquad (71)$$

By defining

$$\bar{x}(n) \triangleq [x(n)\;\; x(n-1)\;\; \cdots\;\; x(n-N+1)]^T = \big[x^*(n)\;\; x^*(n-1)\;\; \cdots\;\; x^*(n-N+1)\big]^H$$

$$\bar{w} \triangleq [w_0\;\; w_1\;\; \cdots\;\; w_{N-1}]^T = \big[w_0^*\;\; w_1^*\;\; \cdots\;\; w_{N-1}^*\big]^H \qquad (72)$$

where H denotes the complex-conjugate transpose (Hermitian), and by rewriting equation (68) in vector form as:

$$e(n) = d(n) - \bar{w}^H\bar{x}(n) \qquad (73)$$

the optimality condition of equation (71) leads to:

$$R\,\bar{w}_o = \bar{p} \qquad (74)$$

where

$$R = E\big[\bar{x}(n)\,\bar{x}(n)^H\big], \qquad \bar{p} = E\big[\bar{x}(n)\,d^*(n)\big] \qquad (75)$$

Equation (74) is known as the Wiener–Hopf equation for complex-valued signals, and the minimum of the cost function will be:

$$\xi_{min} = E\big[|d(n)|^2\big] - \bar{w}_o^H R\,\bar{w}_o \qquad (76)$$

## 5. Least Mean Square algorithm (LMS algorithm)

The purpose of least squares is to find the optimal filter’s tap weights that minimize the difference between the system output y(n) and the desired response d(n), or in other words to minimize the cost function. Instead of solving the Wiener–Hopf equation to obtain the optimal tap weights as seen previously, there exist iterative methods that start with an arbitrary initial weight vector $\bar{W}(0)$ and employ a recursive search that may require many iterations to converge to the optimal tap weights $\bar{W}_o$. The most important are the gradient-based iterative methods listed below:

• Steepest Descent Method

• Newton’s Method

### 5.1. Steepest descent method

The steepest descent method (also known as the gradient method) is the simplest example of a gradient-based method that minimizes a function of several variables. The process employs an iterative search that starts with an arbitrary initial weight vector $\bar{W}(0)$; at the kth iteration, $\bar{W}(k)$ is updated according to:

$$\bar{W}(k+1) = \bar{W}(k) - \mu\,\nabla_k\xi \qquad (77)$$

where µ is a positive constant known as the step size and $\nabla_k\xi$ denotes the gradient vector $\nabla\xi = 2R\bar{W} - 2\bar{p}$ evaluated at the point $\bar{W} = \bar{W}(k)$. Therefore, using $\bar{p} = R\bar{W}_0$, equation (77) can be formulated as:

$$\bar{W}(k+1) = \bar{W}(k) - 2\mu\big(R\,\bar{W}(k) - \bar{p}\big) \quad\Rightarrow\quad \bar{W}(k+1) - \bar{W}_0 = \big(I - 2\mu R\big)\big(\bar{W}(k) - \bar{W}_0\big) \qquad (78)$$

Knowing that the auto-correlation matrix R may be diagonalized by using the unitary similarity decomposition:

$$R = Q\,\Lambda\,Q^T \qquad (79)$$

where $\Lambda$ is a diagonal matrix consisting of the eigenvalues $\lambda_0, \lambda_1, \ldots, \lambda_{N-1}$ of the autocorrelation matrix R, and the columns of Q contain the corresponding orthonormal eigenvectors, and by defining the vector $\bar{v}(k)$ as:

$$\bar{v}(k) = \bar{W}(k) - \bar{W}_0 \qquad (80)$$

equation (78) can then be expressed as:

$$\bar{v}(k+1) = \big(I - 2\mu\,Q\Lambda Q^T\big)\,\bar{v}(k) \qquad (81)$$

which, after mathematical manipulation and using the vector transformation $\bar{v}'(k) = Q^T\bar{v}(k)$, can be written as:

$$v_i'(k+1) = (1 - 2\mu\lambda_i)\,v_i'(k) \qquad (82)$$

for i = 0, 1, 2, …, N − 1, with $\bar{v}'(k) = [v_0'(k)\;\; v_1'(k)\;\; \cdots\;\; v_{N-1}'(k)]^T$, which yields:

$$v_i'(k) = (1 - 2\mu\lambda_i)^k\,v_i'(0) \qquad (83)$$

Equation (83) converges to zero if and only if the step-size parameter μ is selected so that:

$$|1 - 2\mu\lambda_i| < 1 \qquad (84)$$

which means

$$0 < \mu < \frac{1}{\lambda_i} \qquad (85)$$

for all i or equivalently

$$0 < \mu < \frac{1}{\lambda_{max}} \qquad (86)$$

where $\lambda_{max}$ is the maximum of the eigenvalues $\lambda_0, \lambda_1, \ldots, \lambda_{N-1}$.
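A small sketch of the steepest-descent recursion of equation (78), with the step size chosen inside the bound of equation (86) (the 2×2 matrix R and the optimum weights are arbitrary illustrative values):

```python
import numpy as np

# Quadratic problem with known optimum, so p = R @ W_opt by construction.
R = np.array([[2.0, 0.5],
              [0.5, 1.0]])
W_opt = np.array([1.0, -1.0])
p = R @ W_opt

lam_max = np.linalg.eigvalsh(R).max()
mu = 0.4 / lam_max          # well inside the bound 0 < mu < 1/lambda_max

W = np.zeros(2)
for _ in range(200):
    W = W - 2 * mu * (R @ W - p)   # steepest-descent update

print(np.allclose(W, W_opt))   # True
```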

### 5.2. Newton’s method

By replacing the scalar step-size μ with the matrix step-size $\mu R^{-1}$ in the steepest descent algorithm (equation 77) and using $\bar{p} = R\bar{W}_0$, the resulting algorithm is:

$$\bar{W}(k+1) = \bar{W}(k) - \mu R^{-1}\,\nabla_k\xi \qquad (87)$$

Substituting $\nabla\xi = 2R\bar{W} - 2\bar{p}$ into equation (87), we get:

$$\bar{W}(k+1) = \bar{W}(k) - 2\mu R^{-1}\big(R\,\bar{W}(k) - \bar{p}\big) = (1 - 2\mu)\,\bar{W}(k) + 2\mu R^{-1}\bar{p} = (1 - 2\mu)\,\bar{W}(k) + 2\mu\bar{W}_0 \qquad (88)$$

Subtracting $\bar{W}_0$ from both sides of equation (88) yields:

$$\bar{W}(k+1) - \bar{W}_0 = (1 - 2\mu)\big(\bar{W}(k) - \bar{W}_0\big) \qquad (89)$$

In an actual implementation of adaptive filters, the exact values of $\nabla_k\xi$ and $R^{-1}$ are not available and have to be estimated.
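Newton's recursion (equation 88) can be sketched the same way; note that the error shrinks by the factor (1 − 2μ) in every eigen-direction, even for a deliberately ill-conditioned R (the values below are illustrative):

```python
import numpy as np

R = np.array([[10.0, 0.0],
              [0.0, 0.1]])        # badly conditioned on purpose
W_opt = np.array([1.0, -1.0])
p = R @ W_opt

mu = 0.25
Rinv = np.linalg.inv(R)
W = np.zeros(2)
errs = []
for _ in range(30):
    W = W - 2 * mu * Rinv @ (R @ W - p)   # Newton update
    errs.append(np.linalg.norm(W - W_opt))

# Geometric decay with ratio |1 - 2*mu| = 0.5, independent of eigenvalue spread.
ratios = np.array(errs[1:]) / np.array(errs[:-1])
print(np.allclose(ratios, 0.5))   # True
```

Contrast this with steepest descent, whose slowest mode decays at the rate set by the smallest eigenvalue of R.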

### 5.3. Least Mean Square (LMS) algorithm

The LMS algorithm is a practical scheme for realizing Wiener filters without explicitly solving the Wiener–Hopf equation. It was achieved in the late 1960s by Widrow [2], who proposed an extremely elegant way to estimate the gradient that revolutionized the application of gradient-descent procedures: the instantaneous value of the gradient is used as an estimator of the true quantity, which amounts to replacing the cost function $\xi = E[e(n)^2]$ by its instantaneous coarse estimate $\hat{\xi} = e(n)^2$ in the steepest descent algorithm. By doing so, we obtain:

$$\bar{W}(n+1) = \bar{W}(n) - \mu\,\nabla e(n)^2 \qquad (90)$$

where $\bar{W}(n) = [w_0(n)\;\; w_1(n)\;\; \cdots\;\; w_{N-1}(n)]^T$ and

$$\nabla = \left[\frac{\partial}{\partial w_0}\;\; \frac{\partial}{\partial w_1}\;\; \cdots\;\; \frac{\partial}{\partial w_{N-1}}\right]^T \qquad (91)$$

The ith element of the gradient vector $\nabla e(n)^2$ is:

$$\frac{\partial e(n)^2}{\partial w_i} = 2e(n)\frac{\partial e(n)}{\partial w_i} = -2e(n)\frac{\partial y(n)}{\partial w_i} = -2e(n)\,x(n-i) \qquad (92)$$

so that

$$\nabla e(n)^2 = -2e(n)\,\bar{x}(n) \qquad (93)$$

which, substituted into equation (90), gives:

$$\bar{W}(n+1) = \bar{W}(n) + 2\mu\,e(n)\,\bar{x}(n) \qquad (94)$$

where

$$\bar{x}(n) = [x(n)\;\; x(n-1)\;\; \cdots\;\; x(n-N+1)]^T \qquad (95)$$

Equation (94) is known as the LMS recursion, and the LMS algorithm illustrated in figure 12 can be summarized as follows:

1. Input: tap-weight vector $\bar{W}(n)$; input vector $\bar{x}(n)$; desired output d(n)

2. Output: filter output y(n); tap-weight vector update $\bar{W}(n+1)$

3. Filtering: $y(n) = \bar{W}(n)^T\bar{x}(n)$

4. Error estimation: e(n) = d(n) − y(n)

5. Tap-weight adaptation: $\bar{W}(n+1) = \bar{W}(n) + 2\mu\,e(n)\,\bar{x}(n)$
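The steps above can be sketched as a working LMS loop in NumPy; here the desired response comes from a hypothetical unknown 4-tap FIR system, so the converged tap weights should approach that system's impulse response:

```python
import numpy as np

rng = np.random.default_rng(6)
h_true = np.array([0.5, -0.3, 0.2, 0.1])   # hypothetical unknown system
N, L, mu = 5000, 4, 0.01

x = rng.standard_normal(N)
d = np.convolve(x, h_true)[:N] + rng.normal(scale=0.001, size=N)

W = np.zeros(L)
for n in range(N):
    # Tap-delay-line vector [x(n), x(n-1), ..., x(n-L+1)], zero-padded at start.
    x_vec = x[max(0, n - L + 1):n + 1][::-1]
    x_vec = np.pad(x_vec, (0, L - len(x_vec)))
    y = W @ x_vec                       # filtering
    e = d[n] - y                        # error estimation
    W = W + 2 * mu * e * x_vec          # tap-weight adaptation (equation 94)

print(np.allclose(W, h_true, atol=0.05))   # True
```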

## 6. Classifying adaptive filtering applications

Various applications of adaptive filtering differ in the manner in which the desired response is extracted. In this context, we may distinguish four basic classes of adaptive filtering applications (depicted in Figures 13 to 16, which follow):

• Identification

• Inverse Modeling

• Prediction

• Interference Cancelling

### Table 1.

The following notations are used in Figures 13-16:

u = input applied to the adaptive filter

y = output of the adaptive filter

d = desired response

e = d – y = estimation error

The functions of the four basic classes of adaptive filtering applications are summarized as follows:

Identification (Figure 13).

The notion of a mathematical model is fundamental to sciences and engineering. In the class of applications dealing with identification, an adaptive filter is used to provide a linear model that represents the best fit to an unknown plant. The plant and the adaptive filter are driven by the same input. The plant output supplies the desired responses for the adaptive filter. If the plant is dynamic in nature, the model will be time varying.

Inverse Modeling (Figure 14).

In this second class of applications, the adaptive filter provides an inverse model representing the best fit to an unknown noisy plant. Ideally, the inverse model has a transfer function equal to the reciprocal of the plant’s transfer function. A delayed version of the plant input constitutes the desired response for the adaptive filter. In some applications, the plant input is used without delay as the desired response.

Prediction (Figure 15).

In this example, the adaptive filter provides the best prediction of the present value of a random signal. The present value of the signal serves the purpose of a desired response for the adaptive filter. Past values of the signal supply the input applied to the adaptive filter. Depending on the application of interest, the adaptive filter output or the estimation error may serve as the system output. In the first case, the system operates as a predictor; in the latter case, it operates as a prediction error filter.

Interference Cancelling (Figure 16).

In this final class of applications, the adaptive filter is used to cancel unknown interference contained in a primary signal, with the cancellation being optimized in some sense. The primary signal serves as the desired response for the adaptive filter. A reference signal is employed as the input to the adaptive filter. The reference signal is derived from a sensor or set of sensors located in relation to the sensor(s) supplying the primary signal in such a way that the information-bearing signal component is weak or essentially undetectable.
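The interference-cancelling configuration can be sketched with the LMS recursion of Section 5 (the sinusoidal "information-bearing" signal and the 5-tap noise path below are arbitrary illustrative choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(7)
N, L, mu = 10000, 5, 0.005

t = np.arange(N)
signal = np.sin(2 * np.pi * 0.01 * t)                 # information-bearing signal
noise = rng.standard_normal(N)                        # reference input
noise_path = np.array([0.9, 0.4, -0.2, 0.1, 0.05])    # hypothetical acoustic path
primary = signal + np.convolve(noise, noise_path)[:N] # desired response d

W = np.zeros(L)
out = np.zeros(N)
for n in range(N):
    x_vec = noise[max(0, n - L + 1):n + 1][::-1]
    x_vec = np.pad(x_vec, (0, L - len(x_vec)))
    y = W @ x_vec                # filter estimates the interference
    e = primary[n] - y           # e converges toward the clean signal
    out[n] = e
    W = W + 2 * mu * e * x_vec

# After convergence the residual tracks the sine wave, not the noise.
err = np.mean((out[-2000:] - signal[-2000:]) ** 2)
print(err)   # small compared with the interference power
```

The signal component acts as an uncorrelated disturbance to the adaptation, so the filter converges to the noise path rather than cancelling the signal itself.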

## 7. Active noise suppressor ANC

In this section we propose a new approach to noise control, named ANS: a stability-guaranteed algorithm for adaptive filters that can be derived on the basis of the strictly positive real property of the error model treated in adaptive system theory. It is important to assure the stability of the adaptive system, especially in the presence of unknown disturbances and mismatch in the order of the adaptive filter. Experimental results, performed on real mining noise, validate the effectiveness of the proposed stable algorithm [9].

### 7.1. Sounds and hearing

When sound waves from a point source strike a plane wall, they produce reflected circular wave fronts as if there were an "image" of the sound source at the same distance on the other side of the wall. If something obstructs the direct sound from the source from reaching your ear, then it may sound as if the entire sound is coming from the position of the "image" behind the wall. This kind of sound imaging follows the same laws of reflection as your image in a plane mirror (figure 17). The reflection of sound follows the law "angle of incidence equals angle of reflection", just as light and other waves do, and just as a billiard ball bounces off the bank of a table figure (18).

The main item of note about sound reflections off hard surfaces is the fact that they undergo a 180-degree phase change upon reflection. This can lead to resonance, such as standing waves in rooms. It also means that the sound intensity near a hard surface is enhanced because the reflected wave adds to the incident wave, giving a pressure amplitude that is twice as great in a thin "pressure zone" near the surface. This is used in pressure zone microphones to increase sensitivity; the doubling of pressure gives a 6-decibel increase in the signal picked up by the microphone figure (19). Since the reflected wave and the incident wave add to each other while moving in opposite directions, the appearance of propagation is lost and the resulting vibration is called a standing wave. The modes of vibration associated with resonance in extended objects like strings and air columns have characteristic patterns called standing waves. These standing wave modes arise from the combination of reflection and interference such that the reflected waves interfere constructively with the incident waves. An important part of the condition for this constructive interference is the fact that the waves change phase upon reflection from a fixed end. Under these conditions, the medium appears to vibrate in segments or regions, and the fact that these vibrations are made up of traveling waves is not apparent - hence the term "standing wave" figure (19).

Two traveling waves, which exist in the same medium, will interfere with each other figure (20). If their amplitudes add, the interference is said to be constructive interference, and destructive interference if they are "out of phase" and subtract figure (21). Patterns of destructive and constructive interference may lead to "dead spots" and "live spots" in auditorium acoustics.

Interference of incident and reflected waves is essential to the production of resonant standing waves figure (22).

The sound intensity from a point source of sound will obey the inverse square law if there are no reflections or reverberation figure (23). Any point source, which spreads its influence equally in all directions without a limit to its range, will obey the inverse square law. This comes from strictly geometrical considerations. The intensity of the influence at any given radius r is the source strength divided by the area of the sphere. Being strictly geometric in its origin, the inverse square law applies to diverse phenomena. Point sources of gravitational force, electric field, light, sound or radiation obey the inverse square law.

A plot of this intensity drop shows that it drops off rapidly figure (24). A plot of the drop of sound intensity according to the inverse square law emphasizes the rapid loss associated with the inverse square law. In an auditorium, such a rapid loss is unacceptable. The reverberation in a good auditorium mitigates it. This plot shows the points connected by straight lines but the actual drop is a smooth curve between the points.

Reverberation is the collection of reflected sounds from the surfaces in an enclosure like an auditorium. It is a desirable property of auditoriums to the extent that it helps to overcome the inverse square law drop-off of sound intensity in the enclosure. However, if it is excessive, it makes the sounds run together with loss of articulation - the sound becomes muddy, garbled figure (25).

### 7.2. ANC

In order to cancel unwanted noise, it is necessary to obtain an accurate estimate of the noise to be cancelled. In an open environment, where the noise source can be approximated as a point source, microphones can be spaced far apart as necessary and each will still receive a substantially similar estimate of the background noise. However in a confined environment containing reverberation noise caused by multiple sound reflections, the sound field is very complex and each point in the environment has a very different background noise signal. The further apart the microphones are, the more dissimilar the sound field. As a result, it is difficult to obtain an accurate estimate of the noise to be cancelled in a confined environment by using widely spaced microphones.

The proposed model is embodied in a dual microphone noise suppression system in which the echo between the two microphones is substantially cancelled or suppressed. Reverberations from one microphone to the other are cancelled by the use of first and second line echo cancellers. Each line echo canceller models the delay and transmission characteristics of the acoustic path between the first and second microphones Figure (26).

If the two microphones are moved closer together, the second microphone should provide a better estimate of the noise to be cancelled in the first microphone. However, if the two microphones are placed very close together, each microphone will cause an additional echo to strike the other microphone. That is, the first microphone will act like a speaker (a sound source) transmitting an echo of the sound field striking the second microphone. Similarly, the second microphone will act like a speaker (a sound source) transmitting an echo of the sound field striking the first microphone. Therefore, the signal from the first microphone contains the sum of the background noise plus a reflection of the background noise, which results in a poorer estimate of the background noise to be cancelled figures (27) and (28).

In a first embodiment, a noise suppression system in accordance with the proposed model acts as an ear protector, cancelling substantially all or most of the noise striking the dual microphones of the ear set (figure 29). In a second embodiment, a noise suppression system in accordance with the proposed model acts as a noise suppression communication system, suppressing background noise while allowing communication signals to be heard by the wearer (figures 30 and 31).

The conceptual key to this proposed model is that the signals received at two closely spaced microphones in a multi-path acoustic environment are each made up of a sum of echoes of the signal received at the other one. This leads to the conclusion that the difference between the two microphone signals is a sum of echoes of the acoustic source in the environment. In the absence of a speech source, the ANS scheme proposed by Jaber first attempts to isolate the difference signal at each of the microphones by subtracting from it an adaptively predicted version of the other microphone signal. It then attempts to adaptively cancel the two difference signals. When speech is present (as detected by some type of VAD-based strategy), the adaptive cancellation stage has its adaptivity turned off (i.e. the impulse responses of the two FIR filters, one for each microphone, are unchanged for the duration of the speech). The effect here is that the adaptive canceller does not end up cancelling the speech signal contained in the difference between the two microphone signals.
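The two-stage idea above can be sketched in code. The following is a minimal NumPy toy, not the chapter's Simulink model: mic 2 receives an echo of mic 1's field through a hypothetical acoustic path `h12` (invented for the example), a single LMS filter adapts to predict mic 2 from mic 1, and a stand-in VAD flag freezes adaptation whenever speech would be present.

```python
import numpy as np

def lms_step(w, x_buf, d, mu=0.01):
    """One LMS update: predict d from the tap buffer x_buf, return (error, new weights)."""
    e = d - np.dot(w, x_buf)
    return e, w + mu * e * x_buf

rng = np.random.default_rng(0)
N, taps = 4000, 8
noise = rng.standard_normal(N)
h12 = np.array([0.0, 0.6, 0.2])            # hypothetical acoustic path mic 1 -> mic 2
mic1 = noise
mic2 = np.convolve(noise, h12)[:N]         # mic 2 hears an echo of mic 1's field

w = np.zeros(taps)
err = np.zeros(N)
speech_present = np.zeros(N, dtype=bool)   # stand-in for a VAD decision (no speech here)
for n in range(taps, N):
    x_buf = mic1[n - taps:n][::-1]         # most recent sample first
    e, w_new = lms_step(w, x_buf, mic2[n])
    if not speech_present[n]:              # freeze adaptation while speech is detected
        w = w_new
    err[n] = e
```

After convergence the residual `err` falls far below the raw mic 2 power, i.e. the echo of the shared noise field is cancelled, while any samples flagged by the VAD would leave the filter weights untouched.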

The Simulink implementation of the noise suppression system illustrated in figure 29 is displayed in figure 32, where figures 33 and 34 display the noise captured by Mic. 1 and Mic. 2 respectively, while figure 35 shows the filter output.

The Simulink implementation of the Active Noise Suppressor illustrated in Figure 30 with no Voice Activity Detection (VAD) attached is sketched in figure 36; the noise of a plane taking off, captured by Mic. 1 and Mic. 2, is presented in figures 37 and 38 respectively, while figure 39 shows the system output.

With an accurate front-point and end-point detection VAD algorithm [10] implemented in our proposed models illustrated in figures 30-31, an active noise suppressor is obtained.

The simulation results have been obtained from 25 seconds of noisy voice conditions at Mic. 1 as shown in Fig. 40; the clean speech at our filter output is shown in figure 41, while figure 42 illustrates the clean speech on a reduced scale [11].

## 8. The ultra high speed LMS algorithm implemented on parallel architecture

There are many problems that require enormous computational capacity to solve, and the success of computational science in accurately describing and modelling the real world has helped to fuel the ever increasing demand for cheap computing power. Scientists are eager to find ways to test the limits of theories, using high performance computing to simulate more realistic systems in greater detail. Parallel computing offers a way to address these problems in a cost-effective manner: it deals with the development of programs in which multiple concurrent processes cooperate in the fulfilment of a common task. In this section we develop the theory of the parallel computation of the widely used least-mean-square (LMS) algorithm[1], so named by its originators, Widrow and Hoff (1960) [2].

### 8.1. The spatial radix-r factorization

This section is devoted to proving that a discrete signal can be decomposed into r partial signals whose statistical properties remain invariant. Therefore, given a discrete signal xn of size N

$$x_n = \begin{bmatrix} x_0 & x_1 & x_2 & \cdots & x_{N-1} \end{bmatrix} \tag{E96}$$

and the identity matrix Ir of size r

$$I_r = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = \left[ I(l,c) \right] = \begin{cases} 1 & \text{for } l = c \\ 0 & \text{elsewhere} \end{cases} \tag{E97}$$

for l = c = 0, 1,..., r – 1.

Based on what was proposed in [2]-[9]; we can conclude that for any given discrete signal x(n) we can write:

$$x(l,n) = \left[ I(l,c) \right] \times \begin{bmatrix} x(rn) \\ x(rn+1) \\ \vdots \\ x(rn+p) \end{bmatrix} \tag{E98}$$

is the product of the identity matrix of size r by r sets of vectors of size N/r (n = 0,1,.., N/r -1) where the lth element of the nth product is stored into the memory address location given by

l = (rn + p)

for p = 0, 1, …, r – 1.
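The index map l = rn + p simply interleaves the signal into r decimated sub-streams. A small NumPy illustration of this decomposition (the values are arbitrary, chosen only to make the index map visible):

```python
import numpy as np

r, N = 4, 16
x = np.arange(N)                       # stands in for x_n = [x_0, x_1, ..., x_{N-1}]

# Partial signal p collects the samples x(rn + p) for n = 0, ..., N/r - 1
partials = [x[p::r] for p in range(r)]
print(partials[1])                     # → [ 1  5  9 13]

# The same index map l = rn + p, written as a reshape
assert all((x.reshape(N // r, r)[:, p] == partials[p]).all() for p in range(r))
```

Each partial signal is just every r-th sample of x with a different phase offset p, which is why the reshape by rows reproduces the same partition.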

The mean (or Expected value) of xn is given as:

$$E[x] = \frac{\sum_{n=0}^{N-1} x_n}{N} \tag{E99}$$

which can be factorized as:

$$E[x] = \frac{\sum_{n=0}^{(N/r)-1} x(rn) + \cdots + \sum_{n=0}^{(N/r)-1} x(rn+(r-1))}{N} = \mu(x) = \frac{1}{r} \times \sum_{p=0}^{r-1} \frac{\sum_{n=0}^{(N/r)-1} x(rn+p)}{N/r} = \frac{\sum_{p=0}^{r-1} \mu\left( x(rn+p) \right)}{r} \tag{E100}$$

therefore, the mean of the signal xn is equal to the sum of the means of its r partial signals divided by r, for p = 0, 1, …, r – 1.

Similarly to the mean, the variance of the signal xn is equal to the sum of the variances of its r partial signals according to:

$$\mathrm{Var}(x) = \sigma_x^2 = E\left[ (x-\mu)^2 \right] = \sum_{n=0}^{N-1} \left( x(n)-\mu \right)^2 = \sum_{n=0}^{(N/r)-1} \left( x(rn)-\mu_{(rn)} \right)^2 + \cdots + \sum_{n=0}^{(N/r)-1} \left( x(rn+(r-1))-\mu_{(rn+(r-1))} \right)^2 = \sum_{p=0}^{r-1} \sigma_{x(rn+p)}^2 \tag{E101}$$
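Both decompositions are easy to check numerically. The sketch below verifies the mean identity of E100 exactly, and the split of the squared deviations taken about the common mean μ (E101 writes per-partial means, which coincide with μ for a stationary signal); the data are synthetic, invented for the example:

```python
import numpy as np

rng = np.random.default_rng(1)
r, N = 2, 10_000
x = rng.standard_normal(N)
partials = [x[p::r] for p in range(r)]   # the r partial signals x(rn + p)

# E100: the global mean is the average of the r partial means
assert np.isclose(x.mean(), sum(xp.mean() for xp in partials) / r)

# The sum of squared deviations about the common mean splits exactly across the partials
mu = x.mean()
ssd = ((x - mu) ** 2).sum()
assert np.isclose(ssd, sum(((xp - mu) ** 2).sum() for xp in partials))
```

The identities hold for any N divisible by r, since the partials form a disjoint partition of the samples.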

### 8.2. The parallel implementation of the least squares method

The method of least squares assumes that the best-fit curve of a given type is the curve that has the minimal sum of the squared deviations (least square error) from a given set of data. Suppose that the N data points are (x0, y0), (x1, y1), …, (x(N – 1), y(N – 1)), where x is the independent variable and y is the dependent variable. The fitting curve d has the deviation (error) σ from each data point, i.e., σ0 = d0 – y0, σ1 = d1 – y1, …, σ(N – 1) = d(N – 1) – y(N – 1), which can be re-ordered as:

$$\begin{aligned} &\sigma_0 = d_0 - y_0,\quad \sigma_r = d_r - y_r,\quad \sigma_{2r} = d_{2r} - y_{2r},\ \ldots,\ \sigma_{rn} = d_{rn} - y_{rn},\\ &\sigma_1 = d_1 - y_1,\ \ldots,\ \sigma_{rn+1} = d_{rn+1} - y_{rn+1},\ \ldots,\ \sigma_{rn+(r-1)} = d_{rn+(r-1)} - y_{rn+(r-1)} \end{aligned} \tag{E102}$$

for n = 0, 1, …, (N/r) – 1.

According to the method of least squares, the best fitting curve has the property that:

$$J = \sigma_0^2 + \cdots + \sigma_{rn}^2 + \sigma_1^2 + \cdots + \sigma_{rn+1}^2 + \cdots + \sigma_{rn+(r-1)}^2 = \sum_{j_0=0}^{r-1} \sum_{n=0}^{(N/r)-1} \left[ d(rn+j_0) - y(rn+j_0) \right]^2 = \sum_{j_0=0}^{r-1} \sum_{n=0}^{(N/r)-1} \sigma_{(rn+j_0)}^2 = \text{a minimum} \tag{E103}$$

The parallel implementation of the least squares for the linear case could be expressed as:

$$e(n) = d(n) - \left( b + w\,x(n) \right) = d(n) - y(n) = \left( I_r \times \left[ d(rn+j_0) - y(rn+j_0) \right] \right) = \left( I_r \times \left[ e(rn+j_0) \right] \right) \tag{E104}$$

for j0 = 0, 1, …, r – 1. In order to pick the line which best fits the data, we need a criterion to determine which linear estimator is the “best”. The sum of squared errors (also called the mean square error (MSE)) is a widely used performance criterion.

$$J = \frac{1}{2N} \sum_{n=0}^{N-1} e(n)^2 = \left( \frac{1}{r} \right) \sum_{j_0=0}^{r-1} \frac{1}{2\left( \frac{N}{r} \right)} \sum_{n=0}^{(N/r)-1} \left( I_r \times \left[ e(rn+j_0) \right] \right)^2 \tag{E105}$$

which, after simplification, yields:

$$J = \left( \frac{1}{r} \right) \sum_{j_0=0}^{r-1} \left( I_r \times \left[ \frac{1}{2\left( \frac{N}{r} \right)} \sum_{n=0}^{(N/r)-1} e(rn+j_0)^2 \right] \right) = \left( \frac{1}{r} \right) \sum_{j_0=0}^{r-1} \left( I_r \times \left[ J_{j_0} \right] \right) = \left( \frac{1}{r} \right) \sum_{j_0=0}^{r-1} J_{j_0} \tag{E106}$$

where J_{j_0} is the partial MSE applied on the subdivided data.
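The claim that the overall MSE is the average of the r partial MSEs (E105 and E106) can be verified directly; the error sequence below is synthetic, invented for the example:

```python
import numpy as np

rng = np.random.default_rng(2)
r, N = 2, 1000
e = rng.standard_normal(N)             # an arbitrary error sequence e(n)

J = (e ** 2).sum() / (2 * N)           # overall MSE
# Partial MSEs J_j0 on the subdivided errors, each normalized by 2(N/r)
J_part = [(e[j0::r] ** 2).sum() / (2 * (N // r)) for j0 in range(r)]

assert np.isclose(J, sum(J_part) / r)  # J equals the average of the partial MSEs
```

The identity is exact because each partial sum carries a factor r/(2N), and averaging the r partials restores the overall 1/(2N) normalization.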

Our goal is to minimize J analytically, which according to Gauss can be done by taking its partial derivative with respect to the unknowns and equating the resulting equations to zero:

$$\begin{cases} \dfrac{\partial J}{\partial b} = 0 \\[4pt] \dfrac{\partial J}{\partial w} = 0 \end{cases} \tag{E107}$$

which yields:

$$\begin{cases} \dfrac{\partial J}{\partial b} = \dfrac{\partial}{\partial b} \left( \dfrac{1}{r} \displaystyle\sum_{j_0=0}^{r-1} \left( I_r \times \left[ J_{j_0} \right] \right) \right) = \dfrac{1}{r} \displaystyle\sum_{j_0=0}^{r-1} \dfrac{\partial J_{j_0}}{\partial b} = 0 \\[10pt] \dfrac{\partial J}{\partial w} = \dfrac{\partial}{\partial w} \left( \dfrac{1}{r} \displaystyle\sum_{j_0=0}^{r-1} \left( I_r \times \left[ J_{j_0} \right] \right) \right) = \dfrac{1}{r} \displaystyle\sum_{j_0=0}^{r-1} \dfrac{\partial J_{j_0}}{\partial w} = 0 \end{cases} \tag{E108}$$

With the same reasoning as above the MSE could be obtained for multiple variables by:

$$J = \frac{1}{2N} \sum_{n} \left( d(n) - \sum_{k=0}^{p} w(k)\,x(n,k) \right)^2 = \frac{1}{r} \left( \sum_{j_0=0}^{r-1} \frac{1}{2\left( \frac{N}{r} \right)} \sum_{n=0}^{(N/r)-1} \left[ d(rn+j_0) - \sum_{k=0}^{p} w\left( (rn+j_0),k \right) x\left( (rn+j_0),k \right) \right]^2 \right) = \frac{1}{r} \sum_{j_0=0}^{r-1} J_{j_0} \tag{E109}$$

for j0 = 0, 1, …, r – 1, where J_{j_0} is the partial MSE applied on the subdivided data.

The solution to the extreme (minimum) of this equation can be found in exactly the same way as before, that is, by taking the derivatives of J_{j_0} with respect to the unknowns (wk), and equating the result to zero.
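Because the weights are shared, summing the partial gradients of E108 and equating them to zero gives the same normal equations as a full-data fit. A NumPy check for the linear case (the synthetic data, slope 1.5 and intercept 0.7, are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(3)
r, N = 2, 1000
x = rng.standard_normal(N)
d = 1.5 * x + 0.7 + 0.1 * rng.standard_normal(N)

# Accumulate the normal-equation sums over the r partial data streams
Sx = Sd = Sxx = Sxd = 0.0
for j0 in range(r):
    xj, dj = x[j0::r], d[j0::r]
    Sx += xj.sum(); Sd += dj.sum(); Sxx += (xj * xj).sum(); Sxd += (xj * dj).sum()

# Solve the 2x2 system coming from dJ/db = 0 and dJ/dw = 0
A = np.array([[N, Sx], [Sx, Sxx]])
rhs = np.array([Sd, Sxd])
b_hat, w_hat = np.linalg.solve(A, rhs)

# Matches an ordinary full-data least-squares line fit
w_ref, b_ref = np.polyfit(x, d, 1)
```

Each partial stream only contributes running sums, so the accumulation step is what makes the r-way parallel split possible without changing the solution.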

Instead of solving equations 110 and 111 analytically, a gradient adaptive system can be used, in which the derivative is estimated using the difference operator. This estimation is given by:

$$\nabla_w J = \frac{\delta J}{\delta w} \tag{E110}$$

where in this case the bias b is set to zero.

### 8.3. Search of the performance surface with steepest descent

The method of steepest descent (also known as the gradient method) is the simplest example of a gradient based method for minimizing a function of several variables [12]. In this section we will be elaborating the linear case.

Since the performance surfaces for the linear case implemented in parallel are r paraboloids, each of which has a single minimum, an alternative procedure for finding the best value of the coefficient w_{j_0}(k) is to search the performance surface in parallel instead of computing the best coefficient analytically by Eq. 110. The search for the minimum of a function can be done efficiently using a broad class of methods that use gradient information. The gradient has two main advantages for search:

• The gradient can be computed locally.

• The gradient always points in the direction of maximum change.

If the goal is to reach the minimum in each parallel segment, the search must be in the direction opposite to the gradient. So, the overall method of search can be stated in the following way:

Start the search with an arbitrary initial weight w_{j_0}(0), where the iteration is denoted by the index in parentheses (Fig. 43). Then compute the gradient of the performance surface at w_{j_0}(0), and modify the initial weight proportionally to the negative of the gradient at w_{j_0}(0). This changes the operating point to w_{j_0}(1). Then compute the gradient at the new position w_{j_0}(1), and apply the same procedure again, i.e.

$$w_{j_0}(k+1) = w_{j_0}(k) - \eta\,\nabla J_{j_0}(k) \tag{E111}$$

where η is a small constant and ∇J_{j_0}(k) denotes the gradient of the performance surface at the kth iteration of the j_0th parallel segment. η is used to maintain stability in the search by ensuring that the operating point does not move too far along the performance surface. This search procedure is called the steepest descent method (fig 43).

If one traces the path of the weights from iteration to iteration, intuitively we see that if the constant η is small, eventually the best value for the coefficient w* will be found. Whenever w>w*, we decrease w, and whenever w<w*, we increase w.
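The search procedure above can be sketched for a single weight on a quadratic performance surface; the target value w* = 3.0 and step size η = 0.1 are arbitrary choices for illustration:

```python
# Minimize J(w) = (w - w_star)^2 by steepest descent: w <- w - eta * dJ/dw
w_star, eta = 3.0, 0.1
w = 0.0                          # arbitrary initial weight w(0)
for k in range(200):
    grad = 2.0 * (w - w_star)    # gradient of the paraboloid at the current point
    w = w - eta * grad           # move against the gradient
```

Each iteration shrinks the distance to w* by the factor (1 − 2η), so for any η < 1 the weight converges geometrically to the minimum; too large an η would make the operating point overshoot and diverge.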

### 8.4. The radix- r parallel LMS algorithm

Based on what was proposed in [2], using the instantaneous value of the gradient as the estimator of the true quantity, which means dropping the summation in equation 108 and then taking the derivative with respect to w, yields:

$$\nabla J(k) = \nabla_w \left( \frac{1}{r} \times \left( I_r \times \left[ J_{j_0}(k) \right] \right) \right) \approx \nabla_w \frac{1}{2\left( \frac{N}{r} \right)} \sum_{j_0=0}^{r-1} \sum_{n=0}^{(N/r)-1} \left( I_r \times \left[ e(rn+j_0) \right] \right)^2 \approx -\frac{1}{r} \left( I_r \times \left[ e_{(rn+j_0)}(k)\, x_{(rn+j_0)}(k) \right] \right) \tag{E112}$$

What this equation tells us is that an instantaneous estimate of the gradient is simply the product of the input to the weight times the error at iteration k. This means that with one multiplication per weight the gradient can be estimated. This is the gradient estimate that leads to the famous Least Mean Square (LMS) algorithm (Fig 44).

If the estimator of Eq.114 is substituted in Eq.113, the steepest descent equation becomes

$$I_r \times \left[ w_{j_0}(k+1) \right] = I_r \times \left( w_{j_0}(k) + \left( \frac{1}{r} \right) \left( \eta_{j_0}\, e_{(rn+j_0)}(k)\, x_{(rn+j_0)}(k) \right) \right) \tag{E113}$$

for j0 = 0,1, …, r -1.

This equation is the r parallel LMS algorithm which, used as a predictive filter, is illustrated in Figure 45. The small constant η is called the step size or the learning rate.
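A minimal NumPy sketch of the idea behind E113 (not the chapter's Simulink implementation): each of the r branches runs the standard LMS update independently on its own sub-stream x(rn + j0), here as a one-step predictor. The test signal, a noisy sinusoid, and the step size are arbitrary choices for illustration.

```python
import numpy as np

def lms_predict(x, taps=4, mu=0.05):
    """Standard LMS one-step predictor; returns the per-sample prediction error."""
    w = np.zeros(taps)
    e = np.zeros(len(x))
    for n in range(taps, len(x)):
        buf = x[n - taps:n][::-1]     # most recent sample first
        e[n] = x[n] - np.dot(w, buf)
        w += mu * e[n] * buf          # LMS update: weight += mu * error * input
    return e

rng = np.random.default_rng(4)
N, r = 4000, 2
t = np.arange(N)
x = np.sin(2 * np.pi * 0.01 * t) + 0.05 * rng.standard_normal(N)

# Each of the r branches adapts independently on its sub-stream x(rn + j0)
errors = [lms_predict(x[j0::r]) for j0 in range(r)]
```

Because the branches never exchange data inside the adaptation loop, they can run on separate processors; after convergence each branch's late prediction error is far below the signal power, mirroring the stability of the r parallel filters reported in the simulations.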

### 8.5. Simulation results

The notion of a mathematical model is fundamental to science and engineering. In the class of applications dealing with identification, an adaptive filter is used to provide a linear model that represents the best fit (in some sense) to an unknown signal. The widely used LMS algorithm is an extremely simple and elegant algorithm that is able to minimize the external cost function by using local information available to the system parameters. Because of its computational burden, and in order to speed up the process, this chapter has presented an efficient way to compute the LMS algorithm in parallel, where it follows from the simulation results that the stability of our models relies on the stability of the r parallel adaptive filters. It follows from figures 47 and 48 that the stability of the r parallel LMS filters (in this case r = 2) has been achieved, and the convergence performance of the overall model is illustrated in figure 49. The complexity of the proposed method is reduced by a factor of r in comparison with the direct method illustrated in figure 46. Furthermore, the simulation result of the channel equalization is illustrated in figure 50, in which the blue curve represents our parallel implementation (2 LMS filters implemented in parallel) and the red curve the conventional method.

## Notes

• M Jaber “Method and apparatus for enhancing processing speed for performing a least mean square operation by parallel processing” US patent No. 7,533,140, 2009
