## 1. Introduction

Over the past decades a number of new adaptive filter algorithms have been elaborated and applied to meet demands for faster convergence and better tracking properties than earlier techniques could offer. The Filtered LMS algorithm is currently the most popular method for adapting a filter, due to its simplicity and robustness, which have made it widely adopted in many applications. Applications include adaptive channel equalization, adaptive predictive speech coding, Noise Suppression and on-line system identification. Recently, because of the progress of digital signal processors, a variety of selective coefficient update of gradient-based adaptive algorithms could be implemented in practice. Different types of adaptive algorithms have been developed and used in conventional adaptive filters such as, filtered LMS algorithms [1], [2], [3] and [4], filtered X-LMS algorithms [1], [2], [5], and [6], filtered NLMS algorithms and RLS algorithms [1] and [2]. As a result, this chapter surveys sequential filter adaptation techniques and some applications for transversal FIR filter.

In other words filters are devices or systems that processes (or reshapes) the input signal according to some specific rules to generate an output signal Figure 1.

Filters could be classified in two categories linear and non-linear filters where in linear filters there is a linear relationship between input and output of the filter that satisfy the following property:

meanwhile in non-linear filters; there is a nonlinearity between input and output of the filter that satisfy the following property:

In this chapter we will be focusing on adaptive filtering which could automatically adjust (or adapt) in the face of changing environments and changing system requirements by training them to perform specific filtering or decision-making tasks and in which they should be some “adaptation algorithm” for adjusting the system’s parameters figure (2) [7].

Different filter structures could be implemented in the adaptive filter of figure 2 such as:

Transversal Filter Structure (FIR)

IIR Filter Structure

Lattice Filter Structure

Non-Linear Filter Structure (Volterra Filter)

in which the adaptation approaches are based on:

Wiener filter theory where the optimum coefficients of a linear filter are obtained by minimization of its mean-square error (MSE)

Method of least squares where The performance index is the sum of weighted error squares

The main objective of the adaptation algorithm is to set the filter parameters,

and the minimization algorithms are based on the following methods:

## 2. Signals

Before pursuing the study of adaptive systems, it is important to refresh our memory with some useful definitions from the stochastic process theory. The representation of signals could fall into two categories:

### 2.1. Deterministic signals

A deterministic discrete-time signal is characterized by a defined mathematical function of the time index *n*, with *n=* 0, ±1, ±2*,* ⋯, such as:

where *u*_{(n)} is the unit-step sequence and The response of a linear time-invariant filter to an input *x*_{(n)} is given by:

where *h*_{(n)} is the impulse response of the filter. Knowing that The Z-transform and its inverse of a given sequence *x*_{(n)} is defined as:

where C is a counter clockwise closed contour in the region of convergence of *X(z)* and encircling the origin of the *z*-plane as a result; by taking the Z-transform of both sides of equation (4), we will obtain

*Y*(*Z*) = *H*(*Z*) × *X*(*Z*)

For finite finite-energy waveform, it is convenient to use the discrete-time Fourier transform defined as:

### 2.2. Random signals

In many real life situations, observations are made over a period of time and they are influenced by random effects, not just at a single instant but throughout the entire interval of time or sequence of times. In a “rough” sense, a random process is a phenomenon that varies to some degree unpredictably as time goes on. If we observed an entire time-sequence of the process on several different occasions, under presumably “identical” conditions, the resulting observation sequences, in general, would be different. A random variable (RV) is a rule (or function) that assigns a real number to every outcome of a random experiment, while a stochastic random process is a rule (or function) that assigns a time function to every outcome of a random experiment. The elements of a stochastic process, {*x*_{(n)}}, for different value of time-index *n*, are in general complex-valued random variable that are characterized by their probability distribution functions. A stochastic random process is called stationary in the strict sense if all of its (single and joint) distribution function is independent of a shift in the time origin.

In this subsection we will be reviewing some useful definitions in stochastic process:

Stochastic Average

where E is the expected value of *x*_{(n)}.

Autocorrelation function for a stochastic process *x*_{(n)}

where *x*^{*} denotes the complex conjugate of *x* and the symmetry properties of correlation function is:

Furthermore a stochastic random process is called stationary in the wide sense if *k, m,* and *n* therefore,

and

which means*n – m* or

Special case

The Z-transform of

where it can be easily shown that

Equation (16) implies that if*z*, then its poles and zeros must occur in complex-conjugate reciprocal pairs. Moreover, the points that belong to the region of convergence of*Z* = *e*^{jw} then,

and if *m*_{x} = 0, we will have

Cross-correlation function for two stochastic processes *x*_{(n)} and *y*_{(n)}

and the symmetry properties of the correlation function is:

The Z-transform of

where it can be easily shown that

Auto-covariance function for stationary process is defined as:

and the symmetry properties of auto-covariance function

Special case

Cross-covariance function is defined as:

and the symmetry properties of the cross-covariance function is:

#### 2.2.1. Power spectral density

Consider the wide-sense stationary random process {*x*_{(n)}} from which we will consider a window of 2*N+*1 elements of *x*_{(n)} such as

and the discrete-time Fourier transform of

By conjugating both sides of equation 29 with:

we will obtain

and by taking the expected value of equation 31, we will have:

which after simplification by imposing *k* = *n – m* will yield:

By assuming that the summation on the right hand side of equation 33 will converge for large *k* then

The function*x*_{(n)}} which is always real and non-negative.

#### 2.2.2. Response of linear system to stochastic processes

Given the Linear Time-Invariant System (LTI) illustrated in figure 3 where we will assume that

and

To be noted that the following expressions could be easily derived:

and

## 3. Regression

Data fitting is one of the oldest adaptive systems which are a powerful tool, allowing predictions of the present or future events to be made based on information about the past or present events. The two basic types of regression are linear regression and multiple regressions.

### 3.1. Linear regression

The goal of linear regression is to adjust the values of slope and intercept to find the line that best predicts *d* from *x* (Figure 4) or in other words linear regression estimates how much *d* changes when *x* changes by one unit.

The slope quantifies the steepness of the line which is equals to the change in *d* for each unit change in *x*. If the slope is positive, *d* increases as *d* increases. If the slope is negative, *d* decreases as *d* increases. The *d* intercept is the *d* value of the line when *x* equals zero. It defines the elevation of the line.

The deviation from the straight line which represents the best linear fits of a set of data as shown in figure 5 is expressed as:

*d* ≈ *w × x* + *b*

or more specifically,

*d*_{(n)} = *w*
*× x*_{(n)} +*b* + *e*_{(n)} = *y*_{(n)} + *e*_{(n)}

where *e*_{(n)} is the instantaneous error that is added to *y*_{(n)} (the linearly fitted value), *w* is the slope and *b* is the *y* intersect (or bias). More precisely, the goal of regression is to minimize the sum of the squares of the vertical distances of the points from the line. The problem can be solved by a linear system with only two free parameters, the slope *w* and the bias *b*.

Figure 6 is called the Linear Regression Processing Element (LRPE) which is built from two multipliers and one adder. The multiplier *w scales the input*, and the multiplier *b is a simple bias*, which can also be thought of as an extra input connected to the value +1.The parameters (*b, w*) have different functions in the solution.

#### 3.1.1. Least squares for linear model

Least squares solves the problem by finding the best fitted line to a set of data; for which the sum of the square deviations (or residuals) in the *d* direction are minimized (figure 7).

The goal is to find a systematic procedure to find the constants *b* and *w* which minimizes the error between the true value *d*_{(n)} and the estimated value *y*_{(n)} which is called the linear regression.

The best fitted line to the data is obtained by minimizing the error *e*_{(n)} which is computed by the mean square error (MSE) that is a widely utilized as performance criterion

where *N* in the number of observations and ξ is the mean square error.

Our goal is to minimize ξ analytically, which can be achieved by taking the partial derivative of this quantity with respect to the unknowns and equate the resulting equations to zero, i.e.

which yields after some manipulation to:

where the bar over the variable means the mean value and the procedure to determine the coefficients of the line is called the least square method.

#### 3.1.2. Search procedure

The purpose of least squares is to find parameters (*b, w*_{1}, *w*_{2}, …, *w*_{p}) that minimize the difference between the system output *y*_{(n)} and the desired response *d*_{(n)}. *So, regression is effectively computing the optimal parameters of an interpolating system* which predicts the value of *d* from the value of *x*.

Figure 8 shows graphically the operation of adapting the parameters of the linear system in which the system output *y* is always a linear combination of the input *x* with a certain bias *b* according to the equation *y= wx + b*. Changing *b* modifies the *y* intersect, while changing *w* modifies the slope. The goal of linear regression is to adjust the position of the line such that the average square difference between the *y* values (on the line) and the cloud of points *d*_{(n)} i.e. the error *e*_{(n)}, is minimized.

The key point is to recognize the information transmitted by the error which can be used to optimally place the line and this could be achieved by including a subsystem that accepts the error and modifies the parameters of the system. Thus, the *error*
*e*_{(n)}
*is fed-back to the system* and indirectly affects the output through a change in the parameters (*b,w*). With the incorporation of the mechanism that automatically modifies the system parameters, a very powerful linear system can be built that will constantly seek optimal parameters. Such systems are called Adaptive systems, and are the focus of this chapter.

### 3.2. Multiple regression

With the same reasoning as above, the regression for multiple variables could be derived as:

where for this case is to find the coefficient vector *W* that minimizes the MSE of *e*_{(i)} over the *i* samples (Figure 9).

The mean square error (MSE) becomes for this case:

where the solution of this equation can be found by taking the derivatives of ξ with respect to the unknowns (*w*_{(k)}), and equating the result to zero which will yield to the famous *normal matrix equation* expressed as:

for *j* = 0, 1, …, *p*.

By defining:

as the autocorrelation function,

as the cross-correlation of the input *x* for index *j* and the desired response *y* and substituting these definitions into Eq. (47), the set of normal equations can be written simply as:

where *W* is a vector with the *p+*1 weights *w*_{i} in which *W** represents the value of the vector for the optimum (minimum) solution. The solution of the multiple regression problems can be computed analytically as the product of the inverse of the autocorrelation of the input samples multiplied by the cross-correlation vector of the input and the desired response.

All the concepts previously mentioned for linear regression can be extended to the multiple regression case where *J* in matrix notation is illustrated as:

where *T* means the transpose and the values of the coefficients that minimize the solution are:

## 4. Wiener filters

The Wiener filter is a filter proposed by Norbert Wiener during the 1940s and published in 1949 [8]. Its purpose was to reduce the amount of noise present in a signal in comparison with an estimation of the desired noiseless signal. This filter is an MSE-optimal stationary linear filter which was mainly used for images degraded by additive noise and blurring. The optimization of the filter is achieved by minimizing mean square error defined as the difference between the output filter and the desired response (Figure 10) which is known as the cost function expressed as:

In signal processing, a causal filter is a linear and time-invariant causal system. The word *causal* indicates that the filter output depends only on past and present inputs. A filter whose output also depends on future inputs is non-causal. As a result two cases should be considered for the optimization of the cost function (equation 53).

The filter

*W*(*Z*) is causal and or FIR (Finite Impulse Response).The filter

*W*(Z) is non-causal and or IIR (Infinite Impulse Response Filter).

### 4.1. Wiener filter – the transversal filter

Letwhere *T* denotes the vector transpose and

be the input vector signal where two cases should be treated separately depending on the required application:

#### 4.1.1. Wiener filter – the transversal filter - real-valued input signal

The output filter could be expressed as:

Where

The performance or cost function expressed in equation 53 could be elaborated as:

(55) |

By examining equation 58 we could easily notice that

and with the same reasoning as above the autocorrelation function

Knowing that

where*R* is a positive definite Matrix.

The gradient method is the most commonly used method to compute the tap weights that minimizes the cost function therefore,

which yield after expanding equation 61 to:

where we have assumed in the expanded equation

Knowing that *r*_{li} = *r*_{il} due to the symmetry property of the autocorrelation function of real-valued signal equation 49 could be expressed as:

for *i* = 0, 1, …, *N* – 1 which could be expressed in matrix notation as:

known as Weiner-Hopf equation, whose solution for the optimal tap-weight vector*R* has an inverse matrix, is:

Finally, the minimum value of the cost function can be expressed as:

#### 4.1.2. Wiener filter – the transversal filter - complex-valued input signal

The data transmission of the basebands QPSK and the QAM is complex valued random signals where the filter’s tap weight vector is also assumed to be complex therefore, the cost function for this case could be expressed as:

and the gradient of the cost function would be:

By examining figure 11 we could easily notice that equation 57 could be formulated as:

and since *d* is independent of *w* and since

(69) |

Where the sub-indices *R* and *I* refers to the real part and imaginary part of the complex number therefore, the gradient of the cost function (Equation 70) would be:

The optimum filter’s tap-weights *e*_{0(n)} are obtained by setting the complex gradient of equation 73 to zero yielding:

By defining

where *H* denotes the complex-conjugate transpose or Hermitian and by re-writing equation 71 as:

whose solution is:

where

Equation 77 is known as Weiner-Hopf equation for the complex valued signals and the minimum of the cost function will be:

## 5. Least Mean Square algorithm (LMS algorithm)

The purpose of least squares is to is to find the optimal filter’s tap weights that that minimize the difference between the system output *y*_{(n)} and the desired response *d*_{(n)} or in other words to minimize the cost function. Instead of solving the Weiner-Hopf equation in order to obtain the optimal filter’s tap weights as seen previously, their exists other iterative methods which employ an iterative search method that starts with an* arbitrary initial weight* then a recursive search method that may require many iterations in order to converge to the optimal filter’s tap weights

### 5.1. Steepest descent method

The steepest descent method (also known as the gradient method) is the simplest example of a gradient based method that minimizes a function of several variables. This process is employing an iterative search method that starts with an arbitrary initial weight^{th} iteration

where µ is positive known as the step size and

Knowing that the auto-correlation matrix R may be diagonalized by using the unitary similarity decomposition:

where

As a result equation 81 could be expressed as:

which after mathematical manipulation and by using the vector transformation

For i = 0, 1, 2,..., N – 1 and

Equation 86 converges to zero if and only if the step-size parameter μ is selected so that:

which means

for all i or equivalently

where λ_{max} is the maximum of the eigenvalues

### 5.2. Newton’s method

By replacing the scalar step-size μ with a matrix step-size given by μR^{-1}in the steepest descent algorithm (equation 81) and by using

Substituting

and by subtracting

where in actual implementation of adaptive filters, the exact values of^{ – 1} are not available and have to be estimated.

### 5.3. Least Mean Square (LMS) algorithm

The LMS algorithm is a practical scheme for realizing Wiener filters, without explicitly solving the Wiener-Hopf equation and this was achieved in the late 1960’s by Widrow [2] who proposed an extremely elegant algorithm to estimate the gradient that revolutionized the application of gradient descent procedures by using the instantaneous value of the gradient as the estimator for the true quantity which means replacing the cost function

where

The i^{th} element of the gradient vector

which leads to:

that will be replaced in equation 93 to obtain:

where

.Equation 96 is known as the LMS recursion and the summary of the LMS algorithm illustrated on figure 12 will be as follow:

## 6. Classifying adaptive filtering applications

Various applications of adaptive filtering differ in the manner in which the desired response is extracted. In this context, we may distinguish four basic classes of adaptive filtering applications (depicted in Figures 13 to 16, which follow):

The following notations are used in Figures 11-15:

u = input applied to the adaptive filter

y = output of the adaptive filter

d = desired response

e = d – y = estimation error

The functions of the four basic classes of adaptive filtering applications are summarized as follow:

Identification (Figure 13).

The notion of a mathematical model is fundamental to sciences and engineering. In the class of applications dealing with identification, an adaptive filter is used to provide a linear model that represents the best fit to an unknown plant. The plant and the adaptive filter are driven by the same input. The plant output supplies the desired responses for the adaptive filter. If the plant is dynamic in nature, the model will be time varying.

Inverse Modeling (Figure 14).

In this second class of applications, the adaptive filter provides an inverse model representing the best fit to an unknown noisy plant. Ideally, the inverse model has a transfer function equal to the reciprocal of the plant’s transfer function. A delayed version of the plant input constitutes the desired response for the adaptive filter. In some applications, the plant input is used without delay as the desired response.

Prediction (Figure 15).

In this example, the adaptive filter provides the best prediction of the present value of a random signal. The present value of the signal serves the purpose of a desired response for the adaptive filter. Past values of the signal supply the input applied to the adaptive filter. Depending on the application of interest, the adaptive filter output or the estimation error may service as the system output. In the first case, the system operates as a predictor; in the latter case, it operates as a prediction error filter.

Interference Cancelling (Figure 16).

In this final class of applications, the adaptive filter is used to cancel unknown interference contained in a primary signal, with the cancellation being optimized in some sense. The primary signal serves as the desired response for the adaptive filter. A reference signal is employed as the input to the adaptive filter. The reference signal is derived from a sensor or set of sensors located in relation to the sensor(s) supplying the primary signal in such a way that the information-bearing signal component is weak or essentially undetectable.

## 7. Active noise suppressor ANC

In this subsection we propose a new approach of noise control named ANS, where we propose stability-guaranteed algorithm for an adaptive filters, which can be derived on a basis of the strictly positive real property of the error model treated in adaptive system theory. It is important to assure the stability of the adaptive system especially in presence of unknown disturbances and mismatch in the order of the adaptive filter. Experimental results, performed on real mining noise, validate the effectiveness of the proposed stable algorithm [9].

### 7.1. Sounds and hearing

When sound waves from a point source strike a plane wall, they produce reflected circular wave fronts as if there were an "image" of the sound source at the same distance on the other side of the wall. If something obstructs the direct sound from the source from reaching your ear, then it may sound as if the entire sound is coming from the position of the "image" behind the wall. This kind of sound imaging follows the same laws of reflection as your image in a plane mirror (figure 17). The reflection of sound follows the law "angle of incidence equals angle of reflection" as the light does and other waves or the bounce of a billiard ball off the bank of a table figure (18).

The main item of note about sound reflections off of hard surfaces is the fact that they undergo a 180-degree phase change upon reflection. This can lead to resonance such as standing waves in rooms. It also means that the sound intensity near a hard surface is enhanced because the reflected wave adds to the incident wave, giving pressure amplitude that is twice as great in a thin "pressure zone" near the surface. This is used in pressure zone microphones to increase sensitivity. The doubling of pressure gives a 6-decibel increase in the signal picked up by the microphone figure (19). Since the reflected wave and the incident wave add to each other while moving in opposite directions, the appearance of propagation is lost and the resulting vibration is called a standing wave. The modes of vibration associated with resonance in extended objects like strings and air columns have characteristic patterns called standing waves. These standing wave modes arise from the combination of reflection and interference such that the reflected waves interfere constructively with the incident waves. An important part of the condition for this constructive interference is the fact that the waves change phase upon reflection from a fixed end. Under these conditions, the medium appears to vibrate in segments or regions and the fact that these vibrations are made up of traveling waves is not apparent - hence the term "standing wave" figure (19).

Two traveling waves, which exist in the same medium, will interfere with each other figure (20). If their amplitudes add, the interference is said to be constructive interference, and destructive interference if they are "out of phase" and subtract figure (21). Patterns of destructive and constructive interference may lead to "dead spots" and "live spots" in auditorium acoustics.

Interference of incident and reflected waves is essential to the production of resonant standing waves figure (22).

The sound intensity from a point source of sound will obey the inverse square law if there are no reflections or reverberation figure (23). Any point source, which spreads its influence equally in all directions without a limit to its range, will obey the inverse square law. This comes from strictly geometrical considerations. The intensity of the influence at any given radius r is the source strength divided by the area of the sphere. Being strictly geometric in its origin, the inverse square law applies to diverse phenomena. Point sources of gravitational force, electric field, light, sound or radiation obey the inverse square law.

A plot of this intensity drop shows that it drops off rapidly figure (24). A plot of the drop of sound intensity according to the inverse square law emphasizes the rapid loss associated with the inverse square law. In an auditorium, such a rapid loss is unacceptable. The reverberation in a good auditorium mitigates it. This plot shows the points connected by straight lines but the actual drop is a smooth curve between the points.

Reverberation is the collection of reflected sounds from the surfaces in an enclosure like an auditorium. It is a desirable property of auditoriums to the extent that it helps to overcome the inverse square law drop-off of sound intensity in the enclosure. However, if it is excessive, it makes the sounds run together with loss of articulation - the sound becomes muddy, garbled figure (25).

### 7.2. ANC

In order to cancel unwanted noise, it is necessary to obtain an accurate estimate of the noise to be cancelled. In an open environment, where the noise source can be approximated as a point source, microphones can be spaced far apart as necessary and each will still receive a substantially similar estimate of the background noise. However in a confined environment containing reverberation noise caused by multiple sound reflections, the sound field is very complex and each point in the environment has a very different background noise signal. The further apart the microphones are, the more dissimilar the sound field. As a result, it is difficult to obtain an accurate estimate of the noise to be cancelled in a confined environment by using widely spaced microphones.

The proposed model is embodied in a dual microphone noise suppression system in which the echo between the two microphones is substantially cancelled or suppressed. Reverberations from one microphone to the other are cancelled by the use of first and second line echo cancellers. Each line echo canceller models the delay and transmission characteristics of the acoustic path between the first and second microphones Figure (26).

If the two microphones are moved closer together, the second microphone should provide a better estimate of the noise to be cancelled in the first microphone. However, if the two microphones are placed very close together, each microphone will cause an additional echo to strike the other microphone. That is, the first microphone will act like a speaker (a sound source) transmitting an echo of the sound field striking the second microphone. Similarly, the second microphone will act like a speaker (a sound source) transmitting an echo of the sound field striking the first microphone. Therefore, the signal from the first microphone contains the sum of the background noise plus a reflection of the background noise, which results in a poorer estimate of the background noise to be cancelled figures (27) and (28).

In a first embodiment, a noise suppression system in accordance with the proposed model acts as an ear protector, cancelling substantially all or most of the noise striking the dual microphones of the ear set figure (29). In a second embodiment, a noise suppression system in accordance with the present invention acts a noise suppression communication system, suppressing background noise while allowing communication signals to be heard by the wearer figures (30 and 31).

The conceptual key to this proposed model is that the signals received at two closely spaced microphones in a multi-path acoustic environment are each made up of a sum of echoes of the signal received at the other one. This leads to the conclusion that the difference between the two microphone signals is a sum of echoes of the acoustic source in the environment. In the absence of a speech source, the ANS scheme proposed by Jaber first attempts to isolate the difference signal at each of the microphones by subtracting from it an adaptively predicted version of the other microphone signal. It then attempts to adaptively cancel the two difference signals. When speech is present (as detected by some type of VAD-based strategy), the adaptive cancellation stage has its adaptivity turned off (i.e. the impulse responses of the two FIR filters, one for each microphone, are unchanged for the duration of the speech). The effect here is that the adaptive canceller does not end up cancelling the speech signal contained in the difference between the two microphone signals.

The Simulink implementation of the noise suppression system illustrated in figure 29 is displayed in figure 32 where figures 33 and 34 display the noise captured by Mic.1 and Mic. 2 respectively, meanwhile figure 35 shows the filter output.

The Simulink implementation of the Active Noise Suppressor illustrated in Figure 30 with no Voice Activity Detection (VAD) attached is sketched in 36 and the taking off plane’s noise captured by Mic.1 and Mic. 2 are presented in figures 37 and 38 respectively meanwhile figure 39 shows the system output.

With an accurate front-point and end-point detection VAD algorithm [10] implemented in our proposed models illustrated in figures 30-31, an active noise suppressor is obtained.

The simulation results have been obtained from 25 seconds of noisy voice conditions as shown in Fig. 40 at Mic. 1 and the clean speech of our output filter is shown in figure 41 mean while figure 42 will illustrate the clean speech on a reduced scale [11].

## 8. The ultra high speed LMS algorithm implemented on parallel architecture

There are many problems that require enormous computational capacity to solve, and therefore the success of computational science to accurately describe and model the real world has helped to fuel the ever increasing demand for cheap computing power. Scientists are eager to find ways to test the limits of theories, using high performance computing to allow them to simulate more realistic systems in greater detail. Parallel computing offers a way to address these problems in a cost effective manner. Parallel Computing deals with the development of programs where multiple concurrent processes cooperate in the fulfilment of a common task. Finally, in this section we will develop the theory of the parallel computation of the widely used algorithms named the least-mean-square (LMS) algorithm[1] - by its originators, Widrow and Hoff (1960) [2].

### 8.1. The spatial radix-r factorization

This section will be devoted in proving that discrete signals could be decomposed into r partial signals and whose statistical properties remain invariant therefore, given a discrete signal x_{n} of size N

and the identity matrix I_{r} of size r

for l = c = 0, 1,..., r – 1.

Based on what was proposed in [2]-[9]; we can conclude that for any given discrete signal x_{(n)} we can write:

is the product of the identity matrix of size r by r sets of vectors of size N/r (n = 0,1,.., N/r -1) where the l^{th} element of the n^{th} product is stored into the memory address location given by

l = (rn + p)

for p = 0, 1, …, r – 1.

The mean (or Expected value) of x_{n} is given as:

which could be factorizes as:

therefore, the mean of the signal x_{n} is equal to sum of the means of its r partial signals divided by r for p = 0, 1, …, r – 1.

Similarly to the mean, the variance of the signal x_{n} equal to sum of the variances of its r partial signals according to:

### 8.2. The parallel implementation of the least squares method

The method of least squares assumes that the best-fit curve of a given type is the curve that has the minimal sum of the deviations squared (least square error) from a given set of data. Suppose that the N data points are (x_{0}, y_{0}), (x_{1}, y_{1})… (x_{(n – 1)}, y_{(n – 1)}), where x is the independent variable and y is the dependent variable. The fitting curve d has the deviation (error) σ from each data point, i.e., σ_{0} = d_{0} – y_{0}, σ_{1} = d_{1} – y_{1}... σ_{(n – 1)} = d_{(n – 1)} – d_{(n – 1)} which could be re-ordered as:

(102) |

for n = 0, 1, …, (N/r) – 1.

According to the method of least squares, the best fitting curve has the property that:

(103) |

The parallel implementation of the least squares for the linear case could be expressed as:

for j_{0} = 1, …, r – 1 and in order to pick the line which best fits the data, we need a criterion to determine which linear estimator is the “best”. The sum of square errors (also called the mean square error (MSE)) is a widely utilized performance criterion.

which yields after simplification to:

where

Our goal is to minimize J analytically, which according to Gauss can be done by taking its partial derivative with respect to the unknowns and equating the resulting equations to zero:

(109)which yields:

(108) |

With the same reasoning as above the MSE could be obtained for multiple variables by:

(109) |

for j_{0} = 1, …, r – 1 and where

The solution to the extreme (minimum) of this equation can be found in exactly the same way as before, that is, by taking the derivatives of _{k}), and equating the result to zero.

Instead of solving equations 110 and 111 analytically, a gradient adaptive system can be used which is done by estimating the derivative using the difference operator. This estimation is given by:

where in this case the bias b is set to zero.

### 8.3. Search of the performance surface with steepest descent

The method of steepest descent (also known as the gradient method) is the simplest example of a gradient based method for minimizing a function of several variables [12]. In this section we will be elaborating the linear case.

Since the performance surface for the linear case implemented in parallel, are r paraboloids each of which has a single minimum, an alternate procedure to find the best value of the coefficient

The gradient can be computed locally.

The gradient always points in the direction of maximum change.

If the goal is to reach the minimum in each parallel segment, the search must be in the direction opposite to the gradient. So, the overall method of search can be stated in the following way:

Start the search with an arbitrary initial weight

where η is a small constant and ^{th} iteration of j_{0} parallel segment. η is used to maintain stability in the search by ensuring that the operating point does not move too far along the performance surface. This search procedure is called the steepest descent method (fig 43)

If one traces the path of the weights from iteration to iteration, intuitively we see that if the constant η is small, eventually the best value for the coefficient w* will be found. Whenever w>w*, we decrease w, and whenever w<w*, we increase w.

### 8.4. The radix- r parallel LMS algorithm

Based on what was proposed in [2] by using the instantaneous value of the gradient as the estimator for the true quantity which means by dropping the summation in equation 108 and then taking the derivative with respect to w yields:

(112) |

What this equation tells us is that an instantaneous estimate of the gradient is simply the product of the input to the weight times the error at iteration k. This means that with one multiplication per weight the gradient can be estimated. This is the gradient estimate that leads to the famous Least Means Square (LMS) algorithm (Fig 44).

If the estimator of Eq.114 is substituted in Eq.113, the steepest descent equation becomes

for j_{0} = 0,1, …, r -1.

This equation is the r Parallel LMS algorithm, which is used as predictive filter, is illustrated in Figure 45. The small constant η is called the step size or the learning rate.

### 8.5. Simulation results

The notion of a mathematical model is fundamental to sciences and engineering. In class of applications dealing with identification, an adaptive filter is used is used to provide a linear model that represents the best fit (in some sense) to an unknown signal. The LMS Algorithm which is widely used is an extremely simple and elegant algorithm that is able to minimize the external cost function by using local information available to the system parameters. Due to its computational burden and in order to speed up the process, this paper has presented an efficient way to compute the LMS algorithm in parallel where it follows from the simulation results that the stability of our models relies on the stability of our r parallel adaptive filters. It follows from figures 47 and 48 that the stability of r parallel LMS filters (in this case r = 2) has been achieved and the convergence performance of the overall model is illustrated in figure 49. The complexity of the proposed method will be reduced by a factor of r in comparison to the direct method illustrated in figure 46. Furthermore, the simulation result of the channel equalization is illustrated in figure 50 in which the blue curves represents our parallel implementation (2 LMS implemented in parallel) compared to the conventional method where the curve is in a red colour.