Open access

Perceptual Echo Control and Delay Estimation

Written By

Kirill Sakhnov, Ekaterina Verteletskaya and Boris Simak

Submitted: 14 October 2010 Published: 05 July 2011

DOI: 10.5772/16665

From the Edited Volume

Adaptive Filtering Applications

Edited by Lino Garcia

Chapter metrics overview

3,243 Chapter Downloads

View Full Metrics

1. Introduction

Echo phenomenon has been always existed in telecommunications networks. Generally it has been noticed on long international telephone calls. As technology advances and the data transmission methods tend more to packet-switching concepts, the traditional echo problem remained up-to-date. An important issue in echo analysis is a round-trip delay of the network. This is a time interval required for a signal from speaker’s mouth, across the communication network through the transmit path to the potential source of the echo, and then back across the network again on the receive path to the speaker’s ear. The main problem associated with IP-based networks is that the round-trip delay can be never reduced below its fundamental limit. There is always a delay of at least two to three packet sizes (50 to 80 ms) (Choi et al., 2004) that can make the existing network echo more audible (Gordy & Goubran, 2006). Therefore, all Voice over IP (VoIP) network terminals should employ echo cancellers to reduce the amplitude of returning echoes. A main parameter of each echo canceller is a length of its coverage. The coverage means the length of time that the echo canceller stores its approximation in memory. The adaptive filter should be long enough to model an unknown system properly, especially in case of VoIP applications (Nisar et al., 2009; Youhong et al., 2005). On the other hand, it is known that an active part of the network echo path is usually much smaller compared to the whole echo path that has to be covered by the adaptive filtering algorithm. That is why the knowledge of the echo delay is important for using echo cancellers in packet-switching networks. Today, there is a wide family of adaptive filtering algorithms that can exploit sparseness of the echo path to reduce high computational complexity associated with long echo paths (Dyba, 2008; Hongyang & Dyba, 2008; Khong & Naylor, 2006; Hongyang & Dyba, 2009). In this chapter, we discuss numerous methods used for estimation of echo delay. Algorithms based on cross-correlation function and adaptive filters are used in the art. We will consider both types of them, discuss their advantages and drawbacks. Afterwards, we will pay our attention to the adaptive filtering techniques. We provide a study on different partial, proportionate, sparseness-controlled time- and frequency-domain adaptive filters. The readers will get closer to an issue of echo cancellation, which is relevant in nowadays telecommunications networks. Ones will able to recognize important features and particular areas of implementation of various adaptive algorithms. Further, we are giving a short introduction to the issue of echo control for telecommunications networks. This description emphasises on two most important aspects of perceptual echo control, which are echo loudness and echo delay.

Advertisement

1.1. Echo control issue

In the very beginning of the telephone age, all calls were made through an analog pair of copper wires. The technology has progressively moved to digital circuit switched networks over the past several decades. Today most of the phone traffic is handled by the Public Switched Telephone Network (PSTN), which provides end-to-end dedicated circuits. During the last years a move to packet-switched networks has been initiated to support voice traffic over Internet Protocol (IP). The main reason for the move from circuit-switched voice networks to packet-switched networks is to enable convergence between data services and voice services. It is of economical interest to be able to use the same equipment for voice and data traffic. Reduced cost of placing a phone call is another reason, since the voice-packet is treated and routed much in the same way as any other data packet (note that Quality of Service plays a vital role in this process). Thus, conventional long distance tariffs have a tendency to be completely eliminated in Voice over IP (VoIP) networks as well.

Echo issue has long been recognized as a problem on telecommunications networks, though generally it has been noticed mostly on international telephone calls or when using speaker phones. As technology advances and the information transmission methods tend more to packet-switching concepts, the traditional echo problem should be reviewed and updated. Previously unconsidered factors now play an important part in the echo characteristics. This section describes the echo delay problem, which is often encountered in packet-switched networks. This problem is highlighted in relation to VoIP networks. More specific details on the process of locating and eliminating echoes are included in conclusion to the chapter.

Consider a simple voice telephone call, where an echo occurs when you hear your own voice repeated. An echo is the audible leak-through of your own voice into your own receive path. Every voice conversation has always at least two participants. From the perspective of each participant, there are two voice paths in every call:

Transmit path – The transmit path is usually depicted as Tx path. In a conversation, the transmit path is created when any person begins speaking. The sound is transmitted from the mouth of the speaker to the ear of the listener.

Receive path – The receive path is also called the return and depicted as Rx path. In a conversation, the receive path is created when a person hears the conversation coming from the mouth of another speaker.

Fig. 1 illustrates a simple diagram of a voice call between two persons A (Kirill) and B (Kate). From the user A’s perspective, the Tx path carries his voice to the user B’s ear, and the Rx path carries the user B’s voice to the user A’s ear.

Figure 1.

A simple telephone call scenario

There is one significant factor in the echo analysis, and especially for the packet-switching networks. It is a round-trip delay of the voice network. The round-trip delay is the length of time required for an utterance from the user A’s mouth, across the network on the Tx path to the source of the leak, and then back across the network again on the Rx path to the user A’s ear. Let’s define two important statements about echo nature, which are the following:

The louder the echo (echo amplitude), the more annoying it is,

The longer the round-trip delay (the “later” the echo), the more annoying it is.

Table 1 shows how time delay can affect the quality of a voice conversation.

One-Way Delay Range (ms)Effect on Voice Quality
0-25This is the expected range for national calls. There are no difficulties during conversation.
25-150This is the expected range for international calls using a terrestrial transport link and IP telephony, which includes only one instance of IP voice. This range is acceptable for most users, assuming the use of echo control devices.
150-400This is the expected range for a satellite link. Delays in this range can interrupt the normal flow of a conversation. A high-performance echo canceller must be used and careful network planning is necessary.
Greater than 400This is excessive delay and must be avoided by network planning.

Table 1.

Effect of Delay on Voice Quality

Fig. 2 shows how the echo disturbance influenced by the two parameters: delay and echo level. The metric called Talker Echo Loudness Rating (TELR) denotes the level difference between the voice and echo signals. The “acceptable” curve represents the limit for acceptable talker echo performance for all digital networks. The fact that the speaker A, in Fig. 1, hears an echo illustrates one of the basic characteristics of echo: perceived echo most likely indicates a problem at the other end of the call. The problem that is producing the echo that A hears, the leakage source, is somewhere on B’s side of the network. If the person B was experiencing echo, the problem would be on the user A’s side.

The perceived echo usually originates in the terminating side of the network for the following two reasons:

Leakage happens only in analog circuits. Voice traffic in the digital portions of the network does not pass from one path to another.

Echo arriving after very short time, about 25 milliseconds, is generally imperceptible, because it is masked by the physical and electrical side-tone signal.

A hybrid transformer is often main source of the electrical signal leakage. The typical analog telephone terminal is 2-wire device: a single pair of conductors is used to carry both the Tx and Rx signals. For analog trunk connections, known as 4-wire transmission, two pairs of conductors carry separate Tx and Rx signals. Digital trunks (T1/E1) can be virtual 4-wire links because they also carry separate Tx and Rx signals. A hybrid is a transformer that is used to interface 4-wire links to 2-wire links. Fig. 4 shows a hybrid transformer in an analog tail circuit. Because a hybrid transformer is a non-ideal physical device, a certain fraction of the 4-wire incoming (Rx) signal will be reflected into 4-wire outgoing (Tx) signal. A typical fraction for a properly terminated hybrid in a PBX is about -25 decibels (dB), meaning that the reflected signal (the echo) will be a version of the Rx signal attenuated by about 25 dB. For a PSTN POTS (Plain Old Telephone Service) termination, the expected value is between 12 and 15 dB. Echo strength is expressed in dB as a measurement called Echo Return Loss (ERL). Therefore, and ERL of 0 dB indicates that the echo is the same amplitude as the original source. A large ERL indicates a negligible echo. Remember that an echo must have both sufficient amplitude and sufficient delay to be perceived. For local calls with one-way delay from 0 to 25 ms, an echo of strength of -25 dB relative to the speech level of the talker is generally quiet enough to not be annoying. For a one-way delay in the range of 25 to 150 ms, the ERL should exceed 55 dB to eliminate the perception of echo from the end-user perspective, as recommended in ITU-T recommendation G.168 on echo cancellation (ITU-T G.168, 2002). In this case echo cancellation is required.

Figure 2.

Talker echo tolerance curves (ITU-T G.131, 2003)

Advertisement

2. Echo delay estimation using cross-correlation

The following section presents a study of cross-correlation-based Time Delay Estimation (TDE) algorithms. The main purpose is to analyze a number of methods, in order to find the most suitable one for real-time speech processing. As TDE is an important topic during transmission of voice signals over packet-switching telecommunication systems, it is vital to estimate the true time delay between Tx and Rx speech signals. We consider algorithms processing both in time- and frequency domains. An echo delay problem associated with IP-based transport networks is also included into the discussion. An experimental comparison of the performance of numerous methods based on cross-correlation, normalized cross-correlation and a generalized cross-correlation function is presented.

2.1. General scenario of delay estimation using cross-correlation functions

The known problem associated with IP-based networks is that the round-trip delay can be never reduced below its fundamental limit. There is always a delay of at least two to three packet sizes (50 to 80 ms) that can make the existing network echo more audible. Therefore, all Voice over IP (VoIP) network terminals should employ echo cancellers to reduce the amplitude of returning echoes. A main parameter of each echo canceller is a length of coverage. Echo canceller coverage specifies the length of time that the echo canceller stores its approximation in memory. The adaptive filter should be long enough to model an unknown system properly, especially in case of VoIP applications. On the other hand, it is known that the active part of the network echo path is usually much smaller compared to the whole echo path that has to be covered by the adaptive filtering algorithm inside the echo canceller. That is why the knowledge of the echo delay is important for using echo cancellers in packet-switching networks successfully. In general, every communications system includes a communications network and communications terminals on the both sides of the network. The communications terminals could be telephones, soft phones, and wireless voice communication devices. Fig. 3 illustrates how an echo assessment device can be arranged into the defined system. The echo delay estimator has to monitor two parallel channels. An outgoing voice channel transmits an original voice waveform from the first terminal through the communications network to the second terminal. An incoming voice channel receives an echo waveform of the original signal returning from the second terminal through the communications network back. This is a delayed and attenuated version of the original voice signal.

Figure 3.

Arrangement of echo assessment module in the network

Figure 4.

General block diagram of delay estimator

Fig. 4 illustrates a general block diagram of the echo delay estimator. The echo delay estimator computes correlation between two voice channels for different set of delays in parallel manner (Carter, 1976). The delay-shift with the largest cross-correlation coefficient is selected as the delay estimate. Fig. 5 illustrates, in a flowchart form, steps performed when implementing a method of echo delay estimation utilizing cross-correlation algorithms. Once started from block 1, block 2 calculates the cross-correlation function for a buffer of input samples of the Rx and Tx signals. Block 3 utilizes cross-correlation coefficients to compute the similarities between the transmitted signal and the received signal over a range of delays. For each particular delay, the similarity is obtained. Once the similarities have been determined for each delay within the range of delays, block 4 chooses a delay that produces the greatest similarity metric for the given input frames. Consequently, block 5 indicates that the estimation process is completed.

Figure 5.

Flowchart for estimating echo delay value

2.2. Algorithms proceeding in time-domain

Time domain implementation of Cross-Correlation Function (CCF) and Normalized CCF (NCCF) is presented. The cross-correlation function for a successive par of speech frames can be estimated by (Mueller, 1975)

RxyτMIN(m)=n=DD+L1x(n)y(nm)n=0,...,L1,m=0,...,L1RxyτMAX(m)=n=DD+L1x(n)y(nm)n=0,...,L1,m=0,...,L1E1

Here, x(n) simply denotes a frame of the outgoing signal, y(n) is related to a frame of the incoming signal. According to Fig. 4, the estimation of the CCF is done for a supposed range of delays. The time-shift, τ, which is always in range of [τmin; τmax] and causes the maximal peak value of the CCF is declared as an estimate of the true echo delay TD. Similarly to the CCF, an estimate of the NCCF is done (Buchner et al., 2006)

RxynτMIN(m)=n=DD+L1x(n)y(nm)ExEyn=0,...,L1,m=0,...,L1RxynτMAX(m)=n=DD+L1x(n)y(nm)ExEyn=0,...,L1,m=0,...,L1E2

Here, Ex and Ey denotes a short-term energy of the outgoing and the incoming frames. These values are calculated using the following equations

Ex=n=DD+L1x2(n)E3
Ey=n=DD+L1y2(nm)E4

Let us further consider generalized cross-correlation algorithms, which operate in the frequency domain (Youn et al., 1983; Zetterberg et al., 2005).

2.3. Algorithms proceeding in frequency-domain

More sophisticated way how to provide TDE is to compute the cross-correlation function in the frequency domain. This process in literature is called Generalized Cross-Correlation (GCC) (Hertz, 1986). The idea behind this method is to perform pre-filtering of the input signals before calculating CCF. It makes possible to improve the accuracy of delay estimation. Note that the filtering procedure is performed in the frequency domain. Let us describe this process in more details. It is well known, that the simple cross-correlation function, Rxy, between signals x(n) and y(n) is related to the cross-power density function (cross-power spectrum), Gxy, by the general inverse Fourier transform relationship, as

Rxy(m)=Gxy(f)ej2πfmdfE5

When x(n) and y(n) have been filtered with filters having transfer functions H1(f) and H2(f), the cross-power spectrum between the filter out-puts is given by

Gxyg(f)=H1(f)H2(f)Gxy(f)E6

Consequently, the Generalized Cross-Correlation Function (GCCF) between x(n) and y(n) is given by (Knapp & Carter, 1976)

Rxyg(m)=Ψg(f)Gxy(f)ej2πfmdfE7
Ψg(f)=H1(f)H2(f)E8

Here, Ψg, is a generalized weighting function. Table 2 represents weighting functions that were used for experiments with speech signals (Wilson & Darrell, 2006).

The parameter γxy denotes a complex coherence function. It can be calculated as (Tianshuang & Hongyu, 1996)

γxy(f)=Gxy(f)Gxx(f)Gyy(f)E9
Gxx(f)=Rxx(m)ej2πfmdmE10
Gyy(f)=Ryy(m)ej2πfmdmE11

Here, Gxx(f) and Gyy(f) are auto-power spectra of the outgoing and the incoming signal; Rxx(m) and Ryy(m) are auto-correlation functions of the same signals. Fig. 6 illustrates a block diagram of the implemented generalized cross-correlation algorithm, where the Fast Fourier Transform (FFT) is used for auto-spectra and cross-spectrum calculation. After the cross-power spectrum is estimated, it is multiplied by the corresponding GCC weighting function. The inverse FFT is used for obtaining the time domain generalized-cross correlation function. This operation is repeated for the specified range of possible delays. After the whole process has completed, the time shift with maximum corresponding peak value is declared as an estimation of the true delay.

Table 2.

Various GCC weighting functions

Figure 6.

Diagram of the implemented generalized cross-correlation algorithm

2.4. Discussion over experimental results

We used MATLAB software as a simulation environment. The time difference between time when the outgoing signal leaves the voice terminal and moment when the incoming signal containing the echo of the original signal arrives back from the network is referred to as a true echo delay. This value for the first three figures that are presented below equals 6ms (48 samples). For the purpose of TDE it is also necessary to specify time interval through which the value of the true delay is searched. To cover the 6ms delay we choose the interval between 0 and 10ms what corresponds to the maximum delay value of 60 samples. Afterwards we present the estimation results for a group of different delays. It helps to understand better performance of the algorithms. Unfortunately, because of the non-stationary nature of human speech, the CCF is not reliable for all situations. Its performance highly depends on numerous factors, i.e. signal strength, signal-to-noise ratio (SNR), etc (Chen et al., 2006). The NCCF is not so sensitive to the sudden changes in the signal’s amplitude. It outperforms the CCF when we work with low level signals. The advantages of the algorithms proceeding in the frequency domain compared to the algorithms operating in the time domain are accuracy and reduced computational complexity. Fig. 7 illustrates the outputs of the GCC algorithms, which were presented in Table 2.

Figure 7.

Time delay estimation using GCCF

Table 3 and 4 provides us along with the following results. The joint comparison was done in terms of the estimation accuracy of the algorithms. The group of delays was chosen for this experiment. Delay values are consistent with the ones referenced in the corresponding ITU-T recommendation G.131 (ITU-T G.131, 2003). Once the respective cross-correlation function was calculated, its maximum peak value is detected using the searching procedure described in Fig. 4. SCC is related to the Standard CC function.

[ms]SCCROTHSCOTPHATCPS-2HTECKARTHBWIENER
54,95,15,23,75,24,44,25,25,2
109,710,310,37,510,38,88,310,410,3
2019,520,620,715,020,717,716,720,820,7
3029,230,931,022,431,026,525,031,231,0
5048,751,451,637,451,644,241,752,151,7
10097,3102,9103,374,8103,388,583,3104,2103,4
200194,6205,7206,6149,6206,6177,0166,6208,3206,8
300292,0308,6309,9224,4309,9265,4249,9312,5310,3

Table 3.

Mean values of estimated delays

[ms]SCCROTHSCOTPHATCPS-2HTECKARTHBWIENER
54,95,15,23,75,24,44,25,25,2
109,710,310,37,510,38,88,310,410,3
2019,520,620,715,020,717,716,720,820,7
3029,230,931,022,431,026,525,031,231,0
5048,751,451,637,451,644,241,752,151,7
10097,3102,9103,374,8103,388,583,3104,2103,4
200194,6205,7206,6149,6206,6177,0166,6208,3206,8
300292,0308,6309,9224,4309,9265,4249,9312,5310,3

Table 4.

Root mean square deviation of estimated delays

The abscissa of the largest peak value is the estimated delay. Note that 50 trial speech records for each processor were evaluated to obtain the mean value and the Root Mean Square Deviation (RMSD) parameter (Anderson & Woessner, 1992). Not only different speech signals, but various hybrid impulse response models have been used. The results for delays from 5 to 300 ms are presented in the corresponding tables. Table 3 contents the mean values, whether Table 4 illustrates the estimated RMSD values.

Advertisement

3. Echo delay estimation using adaptive filters

In this section, we introduce methods for extracting an echo delay between speech signals using adaptive filtering algorithms. We know that time delay estimation is an initial step for many speech processing applications. Conventional techniques that estimate a time difference of arrival between two signals are based on the peak determination of the generalized cross-correlation between these signals. To achieve a good precision and stability in estimation, the input sequences have to be multiplied by an appropriate weighting function. Regularly, the weighting functions are dependent on the signals power spectra. The spectra are generally unknown and have to be estimated in advance. An implementation of the time delay estimation via the adaptive least mean squares is analogous to estimating the Roth generalized cross-correlation weighting function. The estimated parameters using the adaptive filter have a smaller variance, because it avoids the need for the spectrum estimation. In the following, we discuss proportionate and partial-update adaptive techniques and consider their performance in term of delay estimation.

Time delay estimation (TDE) has always been and remains a popular research topic. It is known, that the main problem associated with IP-based networks is the round-trip delay, which can never be reduced below its fundamental limit. A number of efforts were made in order to improve the TDE precision. Various methods based on the Generalized Cross-Correlation (GCC) were proposed. The GCC algorithms arrange a pre-filter to obtain the modified signal spectrum for optimal time delay estimation. To specify the filter’s characteristic, it requires a priori knowledge of the statistics of the received signals. However, the efficiency of the algorithms decreases considerably when little or no prior knowledge about the signal statistics is known. From the time when Widrow proposed an adaptive filtering technique based on Least Mean Squares (LMS) (Widrow, 2005; Haykin, 2001), an adaptive theory also found an application to delay estimation. An adaptive implementation of the time delay estimation via Widrow’s LMS algorithm is usually referred to as TDLMS. Comparing to the GCC algorithms, the adaptive filtering techniques do not require a priori information of the signal statistics, because the estimation of the signal spectrum is no longer needed. The adaptive filtering algorithms determine the time delay in an iterative manner. There are comparative studies, which provide comparison of the LMS versus the generalized cross-correlation (Zetterberg et al., 2005). Generally, the time domain implementation of any adaptive filter is associated with high computational complexity. It directly depends on the length of the adaptive filter. In order to reduce the computational load of the TDLMS (Emadzadeh et al., 2008), we offer using adaptive filtering algorithms with reduced computational complexity. They provide savings comparing to the conventional adaptive algorithms. In the following, we discuss each of the algorithms in greater details. First, a general scenario for the adaptive time delay estimation using a simple Normalized Least Mean Squares (NLMS) adaptive filtering algorithm is presented. Afterwards, we introduce the proportionate and partial-updated algorithms proceeding in the time domain. A new partial-updated proportionate NLMS algorithm is outlined. A comparison between the TDE algorithms is made in context of the network echo delay estimation.

3.1. General scenario of delay estimation using adaptive filters

Traditionally, the NLMS algorithm is used for the echo canceller implementation. It applies a Finite Impulse Response (FIR) adaptive filter with adjustable weights for modelling the unknown echo path’s impulse response. The NLMS algorithm minimizes the least mean squares difference between two signals: the reference incoming signal and the filtered original (outgoing) signal (see Fig. 7).

Figure 8.

Time delay estimation using adaptive filter

Basically, the NLMS algorithm is a simple extension of the Widrow’s LMS algorithm. According to Fig. 7, the adaptive filter weights are tuned iteratively using a special rule. The important parameter controlling the whole adaptation process is referred to as a step-size parameter. It should be varied in time in order the algorithm to be able to track non-stationary changes in the echo path’s impulse response. Unlike the LMS algorithm, the NLMS algorithm’s step-size is adjusted according to the instantaneous short-time energy of the input signal. The adaptive filtering process using the NLMS algorithm can be described by the following set of equations

w(n+1)=w(n)+1xT(n)x(n)μ0e(n)x(n)E12
e(n)=y(n)wT(n)x(n)=y(n)i=0L1w(n)x(ni)w(n)=[w1(n),w2(n),...,wL(n)]Tx(n)=[x(n),x(n1),...,x(nL+1)]Tμ(n)=[μ1(n),μ2(n),...,μL(n)]TE13

where w is a vector containing L coefficients of the adaptive filter; x is a vector consisting of L samples of the input signal x(n); e(n) is a difference between the reference signal, y(n), and the adaptive filter output during the nth iteration; μ0 is the fixed NLMS step-size parameter from the interval (0;1). Fig. 8 illustrates the basic principle of the NLMS adaptive algorithm. While the signal x(n) corresponding to the outgoing voice signal (that notation is used in the previous section) is referred to as the far-end signal, the signal y(n) corresponding to the incoming voice signal is referred to as the near-end signal. Basically, the near-end signal y(n) is the delayed and attenuated version of the far-end signal x(n).

Figure 9.

Principle of the normalized least mean squares algorithm

3.2. Time-domain adaptive algorithms

Knowing the adaptive theory, it is trivial that the delay estimation can be achieved by selecting the largest value from the adaptive filter weights vector, w. There is only one issue that has to be taken into account. The adaptive filter needs some time in order to converge to the optimal performance. The existing adaptive algorithms differ from each other with different convergence properties and computational memory requirements. The robust fast converging algorithms are primarily used in the acoustical echo cancellation applications. They take a lot of computational resources. In our case, it is not necessary to apply the complex algorithms, because the adaptive filter is not directly used for the purpose of echo cancellation, but for the delay estimation. Therefore, the reduced complexity adaptive filtering algorithms became the subject of our interest.

3.2.1. Proportionate adaptive filtering algorithms

In the following, we provide a reader along with a principal of the proportionate adaptive technique. The proportionate normalized least mean squares (PNLMS) algorithm proposed in (Duttweiler, 2000) has been developed for use especially in the telephone network environment. For hybrid echo cancellers, it is reasonable to assume that the echo path has a sparse character (i.e., many IR’s (Impulse Response) coefficients are close to zero). Although there are studies and research on the multiple reflection echo paths, a typical echo path impulse response in the practical communication networks has only one reflection, which means all the active coefficients are occupied in a continuous area of the whole echo span. Therefore, we do not need to update all the filter weights to achieve a nominal performance. One can adjust and operate only with non-zero active coefficients. Basically, this is a main idea behind the PNLMS algorithm and other subsequently derived proportionate adaptive filters (Hongyang & Doroslovacki, 2006; Paleologu et al., 2006). Proportionate approaches achieve their higher convergence rate by using the fact that the active part of network echo path is usually much smaller (4-8ms) compared to 64-128 ms of the whole echo path that has to be covered by the adaptive filter. In case of voice transmission over the packet-switching network, these numbers may be more considerable. In the PNLMS algorithm, the adaptive step-size parameters are assigned to all the filter coefficients. They are calculated from the last estimate of the filter weights in such a way that a larger coefficient receives a larger increment. As a result, the convergence rate can be increased the fact that the active taps are adjusted faster than non-active coefficients. Therefore for the sparse IR, the PNLMS algorithm converges much faster comparing to the NLMS. This feature is an advantage especially when it is necessary to estimate the long echo delays. The PNLMS algorithm can be described using the following equations:

w(n+1)=w(n)+1xT(n)G(n1)x(n)μ0G(n1)e(n)x(n)E14
G(n1)=diag{g0(n1),...,gL1(n1)}E15

where G(n-1) is a diagonal matrix adjusting the step-size parameters, μ0 is an overall step-size parameter. The diagonal elements of G(n) are estimated as follows:

γl(n)=max{ρmax[δp,|w0(n)|,...,|wL1(n)|],|wl(n)|}E16
gl(n)=γl(n)i=0L1γi(n),
0lL1E17

Parameters δp and ρ are positive numbers with typical values for δp = 0.01 and for ρ = 5/L. The first term in (17), ρ, prevents wl(n) from stalling when it is much smaller than the largest coefficient and δp regularizes the updating when all coefficients have zero values at initialization. In spite of the sparse system identification, which is a vital requirement for the fast converging adaptive filters, there is another requirement. It is directly addressed to the adaptive filter implementation. The algorithm should have reasonable power concerns. Unfortunately, the PNLMS algorithm has drawbacks. One of them is an increase in the computational complexity by 50% compared to the NLMS algorithm. Furthermore, the PNLMS algorithm shows the slow convergence rate after the fast initial start. It is because of the slow convergence rate dedicated to the small coefficients (Gay, 1998). The increased computational complexity can be reduced by the way of selective partial-updating. In turn, the slow convergence of the PNLMS in the stable state can be improved by switching from the PNLMS to NLMS equations after the fast initial convergence has been achieved (Benesty & Gay, 2002). Recently researchers proposed the partial-update techniques for using along with the adaptive algorithms. The partial-update algorithms reduce computational complexity by updating only a subset of the filter weights per iteration. They can be relevant for applications requiring fast real-time processing. Unfortunately, there is another side to the coin. The fewer coefficients you are going to update per iteration, there is more misalignment presented in the algorithm. Therefore, a certain trade-off should be made when selecting the number of coefficients to update. The following subsection presents several partial-update algorithms along with the proposed modification of the partial-update PNLMS algorithm.

3.2.2. Partial-update adaptive filtering algorithms

The partial-update algorithms can be seen to exploit the sparseness of the echo path in two different ways. It is known that when the unknown system’s impulse response is sparse, many of the adaptive filter’s weights can be approximated to zero. Alternatively, the sparseness may be present in the weight update vector as a consequence of the distribution of the input samples in the (Lx1) input vector, xn=[x(n), x(n-1), …, x(n-L+1)]T. In both these cases, exploiting the sparseness properties can reduce complexity and improve performance of the adaptive algorithm (Fevrier & Gelfand, 1999). Some of the first work on the partial-update algorithms was done by Douglas (Douglas, 1997). It presents the periodic and the sequential updating schemes for the Max-NLMS algorithm. However, these partial-update algorithms show slow convergence 2properties compared to the full-update algorithms. The reason is inconsistent updating schemes. More recently, the partial-updating concept was developed by Aboulnasr (Aboulnasr, 1999). It leads to the M-Max NLMS algorithm and supporting convergence analysis (Aboulnasr & Mayyas, 1998). Another block-updating scheme for the NLMS algorithm was studied by Schertler (Schertler, 1998). The latter work was published by Dogancay and Tanrikulu. They consider approaches for more robust Affine Projection Algorithm (APA) (Dogancay & Tarinkulu, 2001, 2002). Further, we give a summary of the general partial-update algorithms along with the proposed one.

M-Max NLMS

The algorithm selects a specified number of the coefficients providing the largest reduction in the mean squared error per iteration (Naylor & Sherliker, 2003). Only M out of the total L filter coefficients are updated. Those M coefficients are the ones associated with the M largest values within the following vector |x(n - i + 1)|; i = 1;...; L. The update equations for this algorithm are

wi(n+1)=wi(n)+1xT(n)x(n)μ0e(n)xi(n)E18
wi(n)={1,i{Mmaximaof|x(ni+1)|}0,otherwise,0iLE19

One of the features of the M-MAX-NLMS algorithm is that it reduces the complexity of the adaptive filter by selectively updating the coefficients while maintaining the closest performance to the full-update NLMS algorithm. We present misalignment curves for the algorithm in the up-following section. The misalignment value was calculated using the equation formula

Misalignment(n)=10log10(j=1L(hj(n)wj(n))21=jL(hj(n))2)E20

Here, wj is the j-th coefficient of the adaptive filter, while hj is the j-th coefficient of the simulated echo path’s IR.

Selective-partial-update NLMS

This algorithm opposed to the M-Max NLMS has a block structure. An objective behind the latter is the same: it reduces computational costs by updating a subset of the filter coefficients. But first, the vector x(n) and the coefficient vector w(n) are arranged into K blocks of length M = L/K, where L is an integer as in

x(n)=[x1T(n)x2T(n)...xKT(n)]TE21
w(n)=[w1T(n)w2T(n)...wKT(n)]TE22

The coefficient vector’s blocks w1(n), w2(n),…,wK(n) represent candidate subsets that can be updated during the current iteration. For a single-block updating scheme, the constrained minimization problem, which is solved by the NLMS algorithm, can be written as

wi(n+1)=wi(n)+1xi(n)2μ0e(n)xi(n)E23

The selection of the block that has to be updated is made by determining the block with the smallest squared-Euclidian-norm update. According to Eq. (24), that justification can be described by the following terms

i=argmin1/xj(n)21jM=argmaxxj(n)21jME24

Generalization from the single-block to the multiple-block updating scheme is done through the following. Suppose that only the B (B<K) blocks with the largest magnitudes are selected to be updated. Let a vector IB = {i1, i2,…, iB} denote the subset of B blocks out of {1, 2, …, K} to be updated. Thus, the equation for the B-block updating scheme is

wIB(n+1)=wIB(n)+1xIB(n)2μ0e(n)xIB(n)E25

where xIB and wIB are defined as follows

xIB(n)=[xi1T(n)xi2T(n)...xiBT(n)]TE26
wIB(n)=[wi1T(n)wi2T(n)...wiBT(n)]TE27

The computational and memory requirements of the selective-partial-update NLMS algorithm are almost identical to those of the selective-block-update algorithm proposed in (Aboulnasr & Mayyas, 1998). Nevertheless, simulation results illustrated in the next section shows that this approach does not lead to the reasonable trade-off between performance and simplicity. The algorithm’s efficiency is weaker than the one of the M-Max NLMS algorithm. As an alternative approach, a sparse-partial-update NLMS algorithm applies more relevant selection criterion.

Sparse-partial-update NLMS

This algorithm utilize a so-called sparse-partial (SP) weight selection criterion (Jinhong & Doroslovacki, 2008). The adaptive filter weights are updated based on the largest product of the multiplication of x(n) and w(n). The SP-NLMS single-block update equations are given by

wi(n+1)=wi(n)+1xT(n)x(n)μ0e(n)xi(n)E28
wi(n)={1,i{Mmaximaof|x(ni+1)|wi(n)}0,otherwise,0iLE29

Hongyang and Dyba recently suggested a generalization for updating B blocks out of K, i.e.

wIB(n+1)=wIB(n)+1xIBT(n)xIB(n)μ0e(n)xIB(n)E30
IB={i:isoneoftheBlargestamongwi1T(n)xi1(n),...,wiBT(n)xiB(n)}E31

Simple-partial-update PNLMS

This proposed algorithm exploits the sparseness of the communication channel to speed up the initial convergence and employs the partially updating scheme to reduce the computational complexity. A selection procedure is performed in accordance with the estimated magnitude of the channel’s impulse response. The S-PNLMS algorithm for single -block update is defined as follows. Arrange x(n) and w(n) into K blocks of length M = L/K in the same way as it is done in Eq. (22, 23). Then let Gi(n) denote the corresponding MxM block of the diagonal weighing matrix, G(n). The recursion for updating adaptive filter weights is given by

wi(n+1)=wi(n)+1xiT(n)Gi(n1)xi(n)μ0Gi(n1)e(n)xi(n)E32

where the block selection is done according to the following

i=argminGj(n)1jME33

It is different to

i=argminxjGj(n)xj1jME34

which is used with the SPU-PNLMS algorithm. It is apparently from the simulations that the S-PNLMS has similar performance to the SP-NLMS and outperforms the SPU-PNLMS algorithm. The S-PNLMS for updating B blocks out of M has these update equations

wIB(n+1)=wIB(n)+1xIBT(n)GIB(n1)xIB(n)μ0GIB(n1)e(n)xIB(n)E35
IB={i:Gi(n)isoneoftheBlargestamongG1(n),...,GM(n)}E36

Further, we provide the comparison results for the presented algorithms and demonstrate their performance while estimating the predefined echo delay. Table 5 illustrates the computational complexity of the full-update algorithms and shows saving achieved by the partially updating schemes.

ALG.MULT.ADD.DIV.
NLMS3L+13L-11
M-Max-NLMS3M+13M-11
SPU-NLMS3M*B+13M*B -11
SP-NLMS3M*B +13M*B -11
PNLMS6L+14L-2L+1
M-Max-PNLMS6M +14M -2M +1
SPU-PNLMS6M*B +14M*B -2M*B +1
SP-PNLMS6M*B +14M*B -2M*B +1
S-PNLMS6M*B +14M*B -2M*B +1

Table 5.

Comparison of computational complexity

3.3. Discussion over experimental results

To evaluate the performance of the adaptive filtering algorithms, we implemented them in MATLAB environment. The working scenario is as follows. The filters have to estimate the predefined echo path’s impulse responses specified in the ITU-T Recommendation G.168 (ITU-T G.168, 2002). They consist of 1024 taps. The overall step-size parameter, μ0, is chosen to be 0.1. The control parameters ρ and δp are chosen to be 0.001 and 0.01 respectively. In the first part of the experiment, we look at the misalignment curves of the M-Max-, SPU-, SP- and S-PNLMS algorithms. They are illustrated in Fig. 9 below. The SPU-updating scheme produces the worst results. The proposed S-criterion considerably outperforms it especially in terms of the initial convergence speed. The rest of the algorithms have nearly the same convergence and tracking performance.

Figure 10.

Misalignment curves of the partial-update algorithms

All the algorithms, except the M-Max-PNLMS, show poor results when the M value equals 64. It can be explained by the fact that the active part of the IR is approximately 16ms long. This value corresponds to 128 samples for sampling frequency of 8kHz, therefore, 64 samples are not enough to cover the active region completely. Regarding to the dissimilar selection criterion, the M-Max-PNLMS algorithm can deal relatively well with that problem. The Max-updating formula does not count with the sparse character of the IR. It performs selection according to the distribution of the values of the input vector. Otherwise, its drawback is lower initial convergence speed comparing to the SP-PNLMS algorithm. The second part of our experiment concerns the performance of the adaptive algorithms versus the ones based on the generalized cross-correlation function. They are compared in the context of the time delay estimation. The time difference between time when the outgoing signal leaves the voice terminal and the moment when the incoming signal containing the echo of the original signal arrives back from the network is referred to as a true echo delay. Table 6 concludes the results for the AF and GCC algorithms for different delay values. Fig. 10 shows the estimated echo delay diagram for the echo path that has a 60 ms pure delay. It can be observed that the outputs from the adaptive filter in the steady state have much smaller variance than the results obtained from the algorithm based on the generalized cross-correlation function.

Figure 11.

Estimated echo delay curves

[ms]ROTHSCOTCPS-2HBM-MaxSPUSPSsamples
55,15,25,25,25,15,15,05,040
1010,310,310,310,410,310,310,410,180
2019,619,720,020,020,520,520,120,3160
3029,329,630,130,030,830,830,130,2240
5049,049,749,449,851,351,350,350,1400
10098,199,498,899,6102,5102,5100,5100,6800
200196,2198,8197,6199,1205,1205,1201,3201,21600
300294,2298,1296,4298,7307,6307,6301,8301,92400

Table 6.

Mean values of estimated echo delays

Advertisement

4. Partial, proportionate and sparse control for multi-delay filters

Taking into the account the fact that the generalized cross-correlation algorithms operate in the frequency domain and use advantages of the fast Fourier transform, reasonable computational savings for adaptive filtering algorithms can be achieved as well. The basic operation underlying frequency domain adaptive filters is the transformation of the input signal into a more desirable wave form before its adaptive processing. This is accomplished by the Discrete Fourier Transform (DFT) whereby the input signal is transformed to the frequency domain as shown in Fig. 11.

Figure 12.

Frequency domain adaptive filter configuration

Frequency Domain (FD) adaptive filters have primarily two advantages compared to time domain implementations (Dohnal, 1995). The first advantage is the potentially large savings in the computational complexity. The Fast Fourier Transform (FFT) is an efficient implementation of the DFT, which provides these savings. A second advantage is that the DFT structures generate signals that are approximately uncorrelated (orthogonal) (Shynk, 1992). As a result, a time-varying step-size parameter is used for each adaptive weight, thereby allowing a more uniform convergence rate across the adaptive filter. Thus, the overall convergence rate of the algorithm may be improved, sometimes approaching that achievable with Recursive Least Squares (RLS) algorithms without a similar increase in the computational complexity (Widrow & Stearns, 1985). Many other attractive features and variation of the FD adaptive filters can be found in the literature (Khong et al., 2007).

4.1. Sparse partial-update proportionate multi-delay filter

The multi-delay frequency domain adaptation algorithm generates its individual step-size control factors that are mutually orthogonal. This advantage put the Multi-Delay block Frequency domain (MDF) algorithm closer to the analytical RLS algorithm. It differs from the algorithms processing in the time domain with better convergence properties and lower computational complexity. The MDF algorithm decreases the processing delay associated with frequency domain adaptive filters. The delay value is directly proportional to the number of the filter coefficients. The multi-delay approach solves this problem by partitioning the time-domain filter of length L into K sub-filters each of length N with L=KN. Consequently, the delay value is reduced by a factor of L/N compared to the full length approach. The diagram of the MDF algorithm is shown in Fig. 12. This algorithm uses the FFT of size NFFT. It equals to the smallest power of two integers larger than or equal to 2L/K=2KN/K=2N. The MDF algorithm works as follows. Let us first define m, as the frame index. The speech signals are operated on the frame-by-frame basis.

The first step of the MDF algorithm is to convert the most recent overlapped input samples to the frequency domain as follows

Xf(K,m)=diag{FFT[x0(m1),x1(m1),...,xN1(m1),x0(m),x1(m),...,xN1(m)]T}E37

The input vectors incoming to each of the sub-filter blocks can be obtained via the frame index shifting without invoking any computation as follows:

Figure 13.

Block diagram of the MDF algorithm

Xf(k,m)=Xf(k+1,m1),k=1,...,K1E38

The described approach suggests that only one FFT is needed per frame iteration in order to transform the input vector into the frequency domain. It implies a significant computation saving. The output and the error frequency-domain vectors can be expressed and calculated as

d(m)=lastNtermsof{FFT1[k=1KXf(k,m)Wf(k,m)]}E39
Ef(m)=FFT{0,0,...,0Nzeros,[y(m)d(m)Nterms]}E40

where Wf(k,m) is the kth coefficient vector and d(m) is the desired vector. The coefficient update equations to minimize the Mean Square Error (MSE) are the following

φt(k,m)=firsthalfof{FFT1[Xf(k,m)[SMDF(m)+δ]1Ef(m)]}E41
SMDF(m)=λSMDF(m1)+(1λ)Xf(k,m)Xf(k,m)E42
Φf(k,m)=FFT[φt(k,m),0,0,...,0Nzeros]TE43
Wf(k,m+1)=Wf(k,m)+KμkΦf(k,m)E44

where k=1,2,…, K and μk is the block step-size control parameter, 0<<λ<1 is the forgetting factor. Further, we examine how the partial update technique may be incorporated into the MDF algorithm. For frequency-domain selection, the known approach is to select frequency bins corresponding to the largest magnitude Fourier transform of the tap-input over all the sub-filter blocks k=1,2,…, K (Khong et al,2008). However, the selection is better to perform on the coefficient-block basis (Deng & Dyba, 2008). It is faster and takes less computation power. Therefore, first we need to calculate metrics for each of the sub-filter in order to perform selection procedure. The number of metrics equals the number of sub-filter blocks K. One particular frequency-domain metric, Qk, may, for example, represent the sum of the frequency bins in the corresponding sub-filter block

Qf(k,m)=j=12N|Xfj(k,m)|E45

There is a little bit different selection approach, which shows better convergence performance compared to the last one. Define the 2L x 1 vector χ(m), which consists of the concatenated Fourier transform of the input signal across all sub-filters, as

χ(m)=[XfT(0,m),...,XfT(K1,m)]=[χ1(m),...,χ2L(m)]TE46

Each element in the 2L x 2L diagonal selection matrix P(m)

pl(m)={1,iflcorrespondstoMfmaximaof|χl(m)|,l=0,...,2L10,otherwiseE47

The tap-selection routine is done by multiplying this matrix with the corresponding input in every kth sub-filter, as it is shown below

X˜f(k,m)=Pk(m)Xf(k,m)E48

Otherwise, if it is supposed that the algorithm models a sparse network impulse response, the sparse partial updating scheme should be involved. The control metrics are calculated as follows (k=1,2,…, K)

Qf(k,m)=Xf(k,m)Wf(k,m)E49

Here, we suggest using a new type of metric, in order to choose sub-filter blocks for updating. Note, that it should be calculated in the time-domain as follows

Qt(k,m)=i=0N1|wi(k,m)|Lj=0L1(wj(m))2+εE50

We have to remember that the minimum number, M, of the sub-filter blocks to be updated must equal or be greater than two (2≤M<K). This is because the pure delay of the echo path is never known. Thus, if only one sub-filter is chosen, it can not cover all the possibilities of the variation of echo path. Another important question for frequency domain adaptive filters is the FFT length? – From our point of view, it is reasonable using the FFT of 256 samples long. This is an optimal length, which can provide satisfactory results for the high range of MDF algorithms. We have tried to apply different values, but they did not bring sufficient improvements. For example, if the echo path consists of 1024 taps, the adaptive filter should be divided in 1024/(256/2) sub-filters. The number K of sub-filters will equal 8.

Before we present a proposed frequency domain algorithm, let us resume the following. The partial update frequency-domain algorithms may use the values of the calculated block metrics as selection criteria. Each type of metrics has its own features. In case of (13, 18, 19), only K (K<<2L) metrics are sorted in order to select M (2≤M<K) active subfilters. Since K is a small number compared to the entire number 2L of subfilters coefficients in (14, 15, 16, 17), the sorting overhead may be reduced by factor 2L/N. We have spoken about the features of sparse echo paths and the possibility of improving their convergence rate by assigning the individual step-size parameters proportionally to the coefficient magnitudes. In the art, there is a big family of various MDF algorithms. Researchers are trying to improve algorithms performance from one hand and to reach computational savings from the other hand. Further we present our approach how to deal with this contradictory problem. First, we describe the basic idea of sparseness measure, and show how to incorporate it into the MDF framework. The concept of sparseness refers to a representational scheme where only a few units (out of a large massive) are effectively used to represent typical data vectors. In effect, this implies most units taking values close to zero while only few take significantly non-zero values. Numerous sparseness measures have been proposed and used in the literature to date (Benesty et al, 2006). Such measures quantify how much energy of a vector is packed into only a few components. On a normalized scale, the sparsest possible vector (only a single component is non-zero) should have a sparseness of one, whereas a vector with all elements equal should have a sparseness of zero. In our approach, we use s sparseness measure based on the relationship between the L1 norm and the L2 norm:

ξ(m)=1wj(m)1Lwj(m)2+ε=1i=0L1|wj(m)|Lj=0L1(wj(m))2+εE51

Here, L is the length of the adaptive filter in the time domain. This function evaluates to unity if and only if w contains only a single non-zero component, and takes a value of zero if and only of all components are equal (up to signs), interpolating smoothly between the two extremes. We use this sparseness measure ξ of the time domain estimated filter coefficients to adaptively control the number of taps that need to be updated

Mt=L(1ξ(m))E52

Note that the total number of corresponding frequency domain coefficients is 2L for an L-long time domain adaptive filter, therefore

Mf=2Mt=2L(1ξ(m))E53

If selection is performed on the block-based basis, then the corresponding number of sub-filters in frequency domain equals

M=round{2L(1ξ(m))2N}=round{K(1ξ(m))}E54

Fig. 13 shows sparseness measure for sparse and dispersive impulse responses. It is also shown the number of adaptive filter coefficients that should be updated, if partial updating scheme is considered.

The convergence process, by itself, can be divided into two stages, i.e. before and after the convergence of the impulse response (or some time, before and after the convergence of the considerable/large coefficients). It has been showed in the time domain adaptive filtering that it is better using PNLMS/IPNLMS at the initial stage for fast convergence, and afterwards switching to NLMS for assuring better misalignment. Here, for recognizing between two stages in adaptation process, we propose to use the following parameter

η(m)=10log10(j=0L1(wj(m))2L0,001)E55

Figure 14.

Sparseness measure and its impaction on length of adaptive filter

The quantity in Eq. (56) represents an energy measure in [dBm] within an estimated impulse response. Fig. 14 demonstrates the curve for this parameter for two frequency domain algorithms. It can be observed that after 0.5 seconds estimation of IR energy measure for IPMDF stops fluctuating. Consequently, this fact can be used for switching between different adaptation schemes.

Figure 15.

Estimated misalignment and energy measure for MDF and IPMDF algorithms

Before representing the proposed MDF scheme, declare the following statements:

I. Utilization of sparseness measure, 0<ξ(m)<1 (for real IR)

ξ(m)<0.7

IR is considered dispersive

Mt = L, if fully updated scheme is chosen (mostly during initial period)

L/K ≤ Mt < L, if partial updated scheme is chosen

ξ(m)≥0.7

  • IR is considered sparse

  • Mt = L.(1- ξ(m))

II. Utilized updating schemas

  • non-partial

  • χ – based selection (coefficient-based), 2L-vector

  • μ - based selection (block-based), K-vector

III. Utilization of IR energy measure, η

  • |η(m) - η(m-1)| ≤ Δη, switching to the MDF algorithm

  • |η(m) - η(m-1)| > Δη, switching to the IPMDF (SC-IPMDF) algorithm

Finally, our proposed algorithm can be described as:

1st stage: |η(m) - η(m-1)| ≤ Δη

4.2. Discussion over experimental results

To compare the performance of the proposed algorithm with reference MDF, IPMDF, SC-IPMDF adaptive filtering algorithms, we implement all four adaptive filters in MATLAB environment. These MDF-based filters estimate the impulse response of the predefined echo paths, which are specified by the ITU-T Recommendation (ITU-T G.168, 2002). The tested impulse responses are 1024 taps long. All experiments are performed using pre-recorded real speech signals. The number K of sub-filters equals 8. The following values for control parameters are used: μinitial = 0.1, αinitial = -0.75, ε = 10-3, Δη = 0.05 dBm, Tη = 8. The performance of each algorithm is studied using the normalized misalignment parameter, which can be estimated as follows

MIS=10log10(hw(m)2h2)in[dB]E56

where h is a true impulse response of length L. Another criterion is Echo Return Loss Enhancement (ERLE), which is used in real-life environment to evaluate performance

ERLE=10log10(y(m)d(m)2y(m)2),[dB]E57

where y(m) is a desired signal (echo) and d(m) is adaptive filter’s output. Note that any reliable adaptive filter with disabled residual echo suppressor has to achieve ERLE of -15dB within 1 second after starting convergence process (ITU-T G.131, 2003). Fig. 15, 16, 17 illustrate an application of μ-based selection metric. This kind of metric is used for sub-filter selection, when estimated sparse measure parameter, ξ, equals or larger than 0.7. This value was defined experimentally during multiple trials for numerous types of echo path. We suggest using μ-based selection metric as an individual block step-size parameter. It helps accelerating a speed of convergence by allocating larger step-size values for currently updated sub-filters. If you look at the diagram illustrated in Fig. 17 carefully, you will notice that the energy, which is available for adaptation, is concentrated around the sparse region of the echo path. Thus, this fact can be used for selecting sub-filters to be updated along with setting the step-size parameter for these sub-filters. When estimation of sparse measure, ξ, is smaller than 0.7, we suggest switching to the χ – based selection metric.

Figure 16.

Sparse impulse response and estimated μ-based selection metrics

Figure 17.

Dispersive impulse response and estimated μ-based selection metrics

Fig. 18 demonstrates the normalized misalignment and ERLE parameters obtained for real speech signals. The proposed partially updated scheme for MDF shows the similar performance comparing to the other three fully updated frequency domain algorithms. During our future work we are going to enhance the above described algorithm and propose a new class of partial sparse-controlled robust algorithms, which will work reliably, even in double-talk situation. We will apply all the knowledge, which were presented within this particular chapter. Further to conclude the chapter, let us provide summary of material and make several contributions according to the proposed algorithms.

Figure 18.

Step-size parameter estimated using μ-based selection metric

Figure 19.

Misalignment and ERLE curves

Advertisement

5. Conclusion

The first section outlines a basic principle of echo control in packet-based networks. It explains why it is so important to provide monitoring during telephone conversations. When delivering the VoIP service in the packet-switching network, it is important to have the value of the echo delay under control. The increasing transmission delay associated with packet data transmission can make a negligible echo more annoying. Therefore, it is suggested using the echo assessment algorithm. It is purpose is to add an additional attenuation to a particular voice channel (which in terms means to activate an echo canceller), so as to remove the unwanted echo in time. In the second section, we consider an opportunity of using cross-correlation for estimating echo delays. That section provides readers along with up-to-date correlation-based TDE algorithms, which we use to estimate the echo path delays. The problem of long delays taken place in the packet-switching network is considered as a topic of interest. The experiments show that the algorithms precision decreases with increasing transmission delays. The generalized cross-correlation algorithms operating in the frequency domain provide more reliable result comparing to the standard cross-correlation and normalized cross-correlation algorithms. As an alternative to correlation-based methods, techniques, which use adaptive filtering algorithms, can be also applied. Therefore, the third section presents numerous partial-update algorithms and their application to delay estimation. The echo assessment is based on the reduced complexity partial-update adaptive filters. The experiments show a reliable performance of these algorithms. However, their precision suffers during the initial stage of convergence. According to the ITU-T Recommendation G.168, this period should not last more than one second. The Multi-Delay block Frequency domain (MDF) adaptive algorithm can easily outperform all existing time domain algorithms. Moreover, taking into the account the fact that the generalized cross-correlation algorithms operate in the frequency domain and use advantages of the fast Fourier transform, further computational savings for the adaptive filters are achieved in the frequency domain. Therefore, the fourth section deals with partial, proportionate, and sparse-controlled adaptive filtering algorithms working in the frequency domain. What we claimed, within this section, is: a new metric for performing partial updating; a new approach for designating transitions between MDF and IPMDF-based updating schemas; a method for estimating step-size control parameter; a new partially updated sparseness-controlled improved proportionate multi-delay filter; all the approaches are suitable for implementation whether in time or frequency domains. The proposed algorithm has both a performance compared to the IPMDF and SC-MDF algorithms and reduced computational complexity along with the adjustable step-size parameter. Although the preferred embodiments of the proposed algorithm have been described, it will be understood by those skilled in the art that various changes may be made thereto without departing from the main scope of the invention or the appended claims.

Advertisement

Acknowledgments

This work was supported by the Grant Agency of the Czech Technical University in Prague, grant No. SGS 10/275/OHK3/3T/13 and by Grant The Ministry of Education, Youth and Sports No. MSM6840770014.

References

  1. 1. ChoiB.K.MoonS.Zhi-LiZ. 2004 Analysis of Point-To-Point Packet Delay In an Operational Network, Proceedings of INFOCOM 2004, Twenty-third Annual Joint Conference of the IEEE Computer and Communications Societies, 3 17971807 , 0-78038-355-9 Kong, March 7-11, 2004
  2. 2. GordyJ. D.GoubranR. A. 2006 On the Perceptual Performance Limitations of Echo Cancellers in Wideband Telephony, IEEE Transactions on Audio, Speech, and Language Processing, 14 1 3342 , 1558-7916
  3. 3. NisarK.HasbullahH.SaidA. M. 2009 Internet Call Delay on Peer to Peer and Phone to Phone VoIP Network, Proceedings of ICCET ‘09, International Conference on Computer Engineering and Technology, 2 517520 , 978-1-42443-334-6 Singapore, Januar 22-24, 2009
  4. 4. DybaR. A. 2008 Parallel Structures for Fast Estimation of Echo Path Pure Delay and Their Applications to Sparse Echo Cancellers, Proceedings of CISS 2008, 42nd Annual Conference on Information Sciences and Systems, 241245 , 978-1-42442-246-3 Princeton, NJ, March 19-21, 2008
  5. 5. HongyangD.DybaR. A. 2008 Efficient Partial Update Algorithm Based on Coefficient Block for Sparse Impulse Response Identification, Proceedings of CISS 2008, 42nd Annual Conference on Information Sciences and Systems, 233236 , 978-1-42442-246-3 Princeton, NJ, March 19-21, 2008
  6. 6. KhongA. W. H.NaylorP. A. 2006 Efficient Use Of Sparse Adaptive Filters, Proceedings of ACSSC ‘06, Fortieth Asilomar Conference on Signals, Systems and Computers, 13751379 , 1-42440-784-2 Grove, CA, October 29-November 1, 2006
  7. 7. HongyangD.DybaR. A. 2009 Partial Update PNLMS Algorithm for Network Echo Cancellation, Proceedings of ICASSP 2009, IEEE International Conference on Acoustics, Speech and Signal Processing, 13291332 , 978-1-42442-353-8 Taipei, April 19-24, 2009
  8. 8. ITU-T Recommendation G.131 2003 Talker Echo and its Control
  9. 9. ITU-T Recommendation G.168 2002 Digital network echo cancellers
  10. 10. CarterG. C. 1976 Time Delay Estimation, Ph.D. dissertation, University of Connecticut, Storrs, CT, 6770
  11. 11. MuellerM. 1975 Signal Delay, IEEE Transactions on Communications, 23 11 13751378
  12. 12. BuchnerH.BenestyJ.GanslerT.KellermannW. 2006 Robust Extended Multidelay Filter and Double-talk Detector for Acoustic Echo Cancellation, IEEE Transactions on Audio, Speech, and Language Processing, 14 5 16331644 , 1558-7916
  13. 13. YounD. H.AhmedN.CarterG. C. 1983 On the Roth and SCOTH Algorithms: Time-Domain Implementations, In Proceedings of the IEEE, 71 4 536538 , 0018-9219
  14. 14. ZetterbergV.PetterssonM. I.ClaessonI. 2005 Comparison Between Whitened Generalized Cross-correlation and Adaptive Filter for Time Delay Estimation, Proceedings of MTS/IEEE, OCEANS, 3 0-93395-734-3 D.C., September 17-23, 2005
  15. 15. HertzD. 1986 Time Delay Estimation by Combining Efficient Algorithms and Generalized Cross-correlation Methods, IEEE Transactions on Acoustics, Speech and Signal Processing, 34 1 17 , 0096-3518
  16. 16. KnappC.CarterG. C. 1976 The Generalized Correlation Method for Estimation of Time Delay, IEEE Transactions on Acoustics, Speech and Signal Processing, 24 4 320327 , 0096-3518
  17. 17. WilsonK. W.DarrellT. 2006 Learning a Precedence Effect-Like Weighting Function for the Generalized Cross-Correlation Framework, IEEE Transactions on Audio, Speech, and Language Processing, 14 6 21562164 , 1558-7916
  18. 18. TianshuangQ.HongyuW. 1996 An Eckart-weighted adaptive time delay estimation method, IEEE Transactions on Signal Processing, 44 9 23322335 , 0105-3587X
  19. 19. ChenJ.BenestyJ.HuangY. A. 2006 The SCOT Weighted Adaptive Time Delay Estimation Algorithm Based on Minimum Dispersion Criterion, Journal of EURASIP on Applied Signal Processing, 2006 978-1-42447-047-1
  20. 20. AndersonM. P.WoessnerW. W. 1992 Applied Groundwater Modeling: Simulation of Flow and Advective Transport, Academic Press (2nd Edition ed.), 978-0-12059-485-6 USA
  21. 21. WidrowB. 2005 Thinking about thinking: the discovery of the LMS algorithm, IEEE Magazine on Signal Processing, 22 1 100106 , 1053-5888
  22. 22. HaykinS. 2001 Adaptive Filter Theory, Fourth Edition, Prentice-Hall, 0-13090-126-1
  23. 23. ZetterbergV.PetterssonM. I.ClaessonI. 2005 Comparison Between Whitened Generalized Cross-correlation and Adaptive Filter for Time Delay Estimation, Proceedings of MTS/IEEE, OCEANS, 3 0-93395-734-3 D.C., September 17-23, 2005
  24. 24. EmadzadehA. A.LopesC. G.SpeyerJ. L. 2008 Online time delay estimation of pulsar signals for relative navigation using adaptive filters, Proceedings of 2008 IEEE/ION, Position, Location and Navigation Symposium, 714719 , 978-1-42441-536-6 Monterey, CA, May 5-8, 2008
  25. 25. DuttweilerD. L. 2000 Proportionate normalized least-mean-squares adaptation in echo cancellers, IEEE Transactions on Speech and Audio Processing, 8 5 508518 , 1063-6676
  26. 26. HongyangD.DoroslovackiM. 2006 Proportionate adaptive algorithms for network echo cancellation, IEEE Transactions on Signal Processing, 54 5 17941803 , 0105-3587X
  27. 27. PaleologuC.BenestyJ.CiochinaS. 2010 An improved proportionate NLMS algorithm based on the l0 norm, Proceedings of ICASSP ‘10, 2010 IEEE International Conference on Acoustics Speech and Signal Processing, 309312 , 1520-6149 Dallas, TX, March 14-19, 2010
  28. 28. GayS. L. 1998 An efficient, fast converging adaptive filter for network echo cancellation, Proceedings of the Thirty-Second Asilomar Conference on Signals, Systems & Computers, 1 394398 , 0-78035-148-7 Grove, CA, November 1-4, 1998
  29. 29. BenestyJ.GayS. L. 2002 An improved PNLMS algorithm, Proceedings of ICASSP ‘02, 2002 IEEE International Conference on Acoustics Speech and Signal Processing, 2 18811884 , 1520-6149 Minneapolis, MN, USA, April 27-30, 2002
  30. 30. FevrierI. J.GelfandS. B.FitzM. P. 1999 Reduced complexity decision feedback equalization for multipath channels with large delay spreads, IEEE Transactions on Communications, 47 6 927937 , 0090-6778
  31. 31. DouglasS. C. 1997 Adaptive filters employing partial updates, IEEE Transactions on Circuits and Systems II, 44 3 209216 , 1057-7130
  32. 32. AboulnasrT.MayyasK. 1999 Complexity reduction of the NLMS algorithm via selective coefficient update, IEEE Transactions on Signal Processing, 47 5 14211424 , 0105-3587X
  33. 33. AboulnasrT.MayyasK. 1998 MSE analysis of the M-Max NLMS adaptive algorithm, Proceedings of IEEE International Conference on Acoustics Speech and Signal Processing, 3 article ID 10.1109/ICASSP.1998.681776, 16691672 , 0-78034-428-6 WA, May 12-15, 1998
  34. 34. SchertlerT. 1998 Selective block update of NLMS type algorithms, In Proceedings of IEEE International Conference on Acoustics Speech and Signal Processing, 3 17171720 , 0-78034-428-6 WA, May 12-15, 1998
  35. 35. DogancayK.TanrikuluO. 2001 Adaptive filtering algorithms with selective partial updates, IEEE Transactions on Circuits and Systems II, 48 8 762769 , 1057-7130
  36. 36. NaylorP. A.SherlikerW. 2003 A short-sort M-Max NLMS partial-update adaptive filter with applications to echo cancellation, Proceedings of ICASSP ‘03, 2003 IEEE International Conference on Acoustics Speech and Signal Processing, 5 373376 , 0-78037-663-3 Kong, April 7-10, 2003
  37. 37. JinhongW.DoroslovackiM. 2008 Partial update NLMS algorithm for sparse system identification with switching between coefficient-based and input-based selection, Proceedings of CISS 2008, 42nd Annual Conference on Information Sciences and Systems, 3 237240 , 978-1-42442-246-3 Princeton, NJ, March 19-21, 2008
  38. 38. ShynkJ. J. 1992 Frequency-Domain and Multirate Adaptive Filtering, IEEE Transactions on Signal Processing Magazine, 9 474475 , 1053-5888
  39. 39. WidrowB.StearnsS. D. 1985 Adaptive Signal Processing, Prentice-Hall, 0-13004-029-0
  40. 40. KhongA. W. H.NaylorP. A.BenestyJ. 2007 A low delay and fast converging improved proportionate algorithm for sparse system identification, EURASIP Journal on Audio, Speech, and Music Processing, 2007 Article ID 84376, 8 pages
  41. 41. KhongA. W. H.XiangL.DoroslovackiM.NaylorP. A. 2008 Frequency domain selective tap adaptive algorithms for sparse system identification, Proceedings of ICASSP 2008, International Conference on Acoustics, Speech and Signal processing, 229233 , 978-1-42441-483-3 Las Vegas, NV, March 31-April 4, 2008
  42. 42. DengH.DybaR. 2008 Efficient Partial Update Algorithm Based on Coefficient Block for Sparse Impulse Response Identification, Proceedings of CISS 2008, Conference on Information Sciences and Systems, 233236 , 978-1-42442-246-3 Princeton, NJ, March 19-21, 2008
  43. 43. BenestyJ.HuangY. A.ChenJ.NaylorP. A. 2006 Adaptive algorithms for the identification of sparse impulse responses, In: Selected Methods for the Cancellation of Acoustical Echoes, the Reduction of Background Noise, and Speech Processing, Hänsler, E.; Schmidt, G., Springer, 125153 , 978-3-54033-212-1 Berlin

Written By

Kirill Sakhnov, Ekaterina Verteletskaya and Boris Simak

Submitted: 14 October 2010 Published: 05 July 2011