Performance Evaluation of Adaptive Algorithms for Wave Field Analysis/Synthesis Using Sound Field Simulations



Introduction
The main objective of a multi-channel sound reproduction system is to give an optimal acoustical sensation to the listener. These systems are designed to produce sounds that are as natural as possible, so that the listener does not realize that they are generated by a loudspeaker system. For this purpose, knowledge of the audio temporal information alone is not sufficient: spatial information is also needed. Furthermore, since the invention of stereophony, it has been well known that at least two loudspeakers are needed in order to generate a virtual source that is not spatially localized at a speaker position. However, stereo reproduction is very limited because optimal source localization is restricted to one point, called the sweet spot. Starting from this observation, recent research efforts have focused on reproduction techniques that use a very high number of loudspeakers in order to reproduce not only a simple audio source but a complete sound field. Various techniques able to record and reproduce the entire sound field have been proposed in the literature (Berkhout et al., 1993; Daniel et al., 2003; Fazi et al., 2008). One of the most studied techniques is Wave Field Synthesis (WFS), which is directly based on Huygens' principle. It was introduced in the late '80s by Berkhout, who showed that audio reproduction can be linked to audio holography concepts (Berkhout, 1988). WFS is based on the Kirchhoff-Helmholtz integral, which permits the calculation of the pressure field inside a volume from the pressure and normal particle velocity on the enclosing surface. The underlying idea of WFS is to generate a sound field inside a volume bounded by an array of loudspeakers. In practice, the surface is reduced to a 2D curve positioned on the ear plane. The number of loudspeakers on this curve depends on the desired localization quality. Similarly, Wave Field Analysis (WFA) implements a sound field recording technique based on microphone arrays (Hulsebos et al., 2002). Therefore, this approach makes it possible to record the entire sound field in the recording room (WFA) and subsequently to reproduce it in the listening room (WFS), more or less accurately depending on the number of loudspeakers and microphones. As in traditional reproduction systems, digital algorithms based on adaptive processing have to be applied within WFS/WFA in order for these techniques to be used in real applications (music, cinema, theatre, etc.). Examples of adaptive algorithms that are very useful in real applications are Acoustic Echo Cancellation (AEC), Active Noise Control (ANC), room compensation, etc. (Haykin, 1996). A straightforward implementation of these algorithms in WFA/WFS systems is not feasible due to the dramatically high number of inputs/outputs, which causes an unreasonable computational complexity. This led to the introduction of Wave Domain Adaptive Filtering (WDAF) (Buchner, Spors & Rabenstein, 2004). WDAF is an extension of Frequency Domain Adaptive Filtering (FDAF) (Shynk, 1992): the filtering is performed not only in the temporal frequency domain (as in FDAF), but also in the angular frequency domain, which takes into account the spatial components of the acoustic field shape. Several WDAF applications can be found in the recent literature (Peretti et al., 2007; 2008; Spors et al., 2005). In order to evaluate the performance of WDAF-based algorithms, all the points of the sound field have to be analyzed so as to give a complete view of the acoustic scene at several time instants. The main topic of this chapter is a detailed explanation of the steps involved in numerical sound field simulations during the application of WDAF-based algorithms, in order to analyze their performance in terms of sound field reconstruction. In section 2 the theory of WFS and WFA is summarized and their discrete versions are derived. Then, the basic concepts of WDAF are reviewed and the involved transformations are described in section 3.
In section 4, the overall simulation processing is described. The discussion starts from the derivation of the sound field of a simple monopole source. Then, the sound field of a static source is virtually reproduced by an array of loudspeakers through the WFS algorithm. In the next step, starting from the simulation of sound field recording with an array of microphones, the PC simulation of the WDAF algorithm is performed. Variations of the whole set of involved parameters are taken into account. Finally, a WDAF application to the attenuation of undesired sources is derived from the well-known mono-channel approach. The results of the cancellation of the entire sound field of a virtual source through an adaptive algorithm based on the WDAF approach are shown. Conclusions are reported in section 5.

Wave field analysis/synthesis
WFA and WFS are techniques based on microphone and loudspeaker arrays, arranged so as to shape a closed curve. The process that obtains the sound field in the area enclosed by the microphone array, starting from the microphone signals, is called sound field extrapolation. Essentially, the sound field extrapolation and its subsequent reproduction can be done in three ways:
• In the reproduction room the loudspeakers are positioned in the same locations as the microphones in the recording room (Fig. 1(a)). This technique is called holophony (Nicol & Emerit, 1999);
• The sound field is extrapolated at arbitrary positions starting from the microphone signals. In this way the loudspeakers need not be placed at the microphone positions (Verheijen, 1997);
• The room impulse response is recorded with the microphone array. In the reproduction stage this response is convolved with the desired signal, previously recorded in an anechoic chamber. In this way, any signal can be reproduced with the acoustic characteristics of the recording room (Hulsebos, 2004).
The holophonic technique does not need intermediate processing between recording and reproduction signals. However, its main limitation is that the recorded sound field can only be reproduced through loudspeakers placed exactly at the microphone positions. For this reason, the holophony technique cannot be used in practical applications. On the other hand, the second and third approaches need processing of the recorded signals. In these cases the loudspeakers can be positioned in an arbitrary way and the number of loudspeakers does not have to equal the number of microphones. Even if their processing needs a higher computational load, these techniques are more flexible and can easily be used in real applications. Sound field extrapolation can be obtained by the combination of the WFA and WFS techniques. These approaches make it possible to record the entire sound field in the recording room (WFA) and subsequently to reproduce it in the listening room (WFS), more or less accurately depending on the number of loudspeakers and microphones (Fig. 1(b)).
It was found that circular microphone arrays permit very good sound field extrapolation and that the problem can be treated more easily in circular coordinates (Hulsebos et al., 2002); hence, in the following this geometry is considered.

Sound field reproduction
The basic idea of WFS derives from Huygens' principle (1678). It states that, at any time t, all the points on the wave front due to a point source can be taken as point sources for the production of secondary wavelets. Following this principle, the sound field of an arbitrary primary source p can be reproduced by a series of secondary sources s positioned on the primary wave front. These sources can be obtained by considering the local vibrations of the medium at s due to p. Kirchhoff's theorem is the generalization of Huygens' principle: considering a source-free volume V, the sound pressure inside the volume generated by external sources can be calculated if the pressure and the normal particle velocity on the enclosing surface S are known.
The Kirchhoff-Helmholtz integral is the mathematical formulation of Kirchhoff's theorem.
With reference to Fig. 2, considering sound propagation in a volume V enclosed by a surface S, the Kirchhoff-Helmholtz integral (1) is given in (Williams, 1999), where r0 and r denote the generic positions inside V and on S respectively, ω is the angular frequency, G_{r0}(r, ω) is the Green function which synthesizes the secondary sources located at r0, and P defines the sound pressure. It is possible to write (1) in terms of the sound pressure P and the particle velocity V_n normal to S by using Euler's equation, where c is the sound velocity, k = ω/c is the wave number, and ρ0 is the medium density; in this way (1) becomes (3). The Green function is not unique. The simplest Green functions are the monopole solutions (4) and (5) for a source at the position r0. By inserting (4) in (3), the 3D forward Kirchhoff-Helmholtz integral (6) is obtained, where ϕ is the angle between the vector normal to S and the vector r0 − r (Fig. 2). From (6) it can be seen that the sound field inside V is obtained by a distribution of monopole and dipole sources (Berkhout et al., 1993). Equation (6) represents the direct integral and indicates a direct propagation due to the presence of the term e^{−jk|r−r0|}. Similarly, by inserting (5) in (3), the 3D inverse Kirchhoff-Helmholtz integral is obtained, where the term e^{jk|r−r0|} indicates an inverse propagation (Yon et al., 2003).
In two dimensions, the Kirchhoff-Helmholtz integral (3) is still valid considering a closed curve L instead of the surface S (Hulsebos et al., 2002). In this case, the appropriate Green functions (8) and (9) are expressed through H_i^{(j)}, the Hankel function of kind j and order i. The Green function (8) leads to the 2D forward Kirchhoff-Helmholtz integral (10), while the Green function (9) leads to the 2D inverse Kirchhoff-Helmholtz integral (11). It is worth underlining that the forward Kirchhoff-Helmholtz integral is used for deriving the sound field generated by a source positioned outside S, while the inverse integral is used if the source is located inside S (or L in the 2D case).
In order to obtain the Kirchhoff-Helmholtz integrals relative to a circular geometry of radius R, (10) and (11) are simplified as in (Hulsebos et al., 2002), giving (12) and (13). The discrete forms (14) and (15), necessary for a practical implementation, are obtained through a sampling operation at the loudspeaker positions. Fig. 3 explains the meaning of r_i, r, ϕ and ∆θ. In section 4 it is shown how to implement (14) and (15) in an experimental configuration.
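Since the chapter's equations (12)-(15) are not reproduced here, the following is only a minimal numerical sketch of a discretized 2D Kirchhoff-Helmholtz extrapolation on a circular contour, in the spirit of (14). It assumes a standard e^{jωt} convention with the 2D free-space Green function G = −(j/4)H0^(2)(k|r − r_i|); the frequency, radius, sensor count and source position are illustrative choices, not values prescribed by the text.

```python
import numpy as np
from scipy.special import hankel2

# Illustrative parameters (assumptions): 500 Hz source, 48-element circular
# contour of radius 1 m, 2D line source outside the array.
c, f = 343.0, 500.0
k = 2 * np.pi * f / c
R, N = 1.0, 48
rs = np.array([0.0, -2.0])

theta = 2 * np.pi * np.arange(N) / N
pts = R * np.column_stack([np.cos(theta), np.sin(theta)])
nrm = pts / R                                # outward unit normals

def p_src(x):
    """Pressure of a 2D monopole (line source) at points x."""
    d = np.linalg.norm(x - rs, axis=-1)
    return -0.25j * hankel2(0, k * d)

# Pressure and its outward normal derivative on the contour (analytic here;
# in practice the normal derivative comes from the velocity microphones).
d_s = np.linalg.norm(pts - rs, axis=1)
cos_s = np.sum((pts - rs) / d_s[:, None] * nrm, axis=1)
dPdn = 0.25j * k * hankel2(1, k * d_s) * cos_s

def extrapolate(r0):
    """Field at an interior point r0, as a discrete sum over the contour."""
    d = np.linalg.norm(r0 - pts, axis=1)
    G = -0.25j * hankel2(0, k * d)
    cos_g = np.sum((pts - r0) / d[:, None] * nrm, axis=1)
    dGdn = 0.25j * k * hankel2(1, k * d) * cos_g
    # Riemann sum of (G dP/dn - P dG/dn) with arc element R*dtheta
    return np.sum(G * dPdn - p_src(pts) * dGdn) * (2 * np.pi * R / N)

r0 = np.array([0.3, 0.2])
err = abs(extrapolate(r0) - p_src(r0)) / abs(p_src(r0))
```

For a point well inside the array the discrete sum reproduces the analytic source field closely; accuracy degrades near the contour and above the spatial aliasing frequency discussed in section 4.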

Sound field recording
Initially, in (Berkhout et al., 1993) the sound field recording was done through 2D microphone matrices. However, this technique was complicated and its practical realization was very difficult due to the necessity of extremely high performance hardware. In order to reduce the number of signals to be processed, the advantages of Huygens' principle were exploited also in the recording stage, and microphone arrays enclosing the area of interest were used as an alternative to microphone matrices (Berkhout et al., 1997). In section 2.1 we have seen the Kirchhoff-Helmholtz integrals referred to the circular array configuration. The global sound field is given by the sum of P^{(1)}(r, ω) and P^{(2)}(r, ω), which represent the inverse and forward extrapolated sound fields, respectively (Hulsebos et al., 2002). By using the Kirchhoff-Helmholtz integrals, only the sound field within the closed curve can be extrapolated. To avoid this limitation, the sound field extrapolation through a circular microphone array can be performed by using the cylindrical harmonics decomposition (Hulsebos, 2004). Cylindrical harmonics of order k_θ are defined as solutions of the homogeneous wave equation in cylindrical coordinates (Blackstock, 2000; Williams, 1999), of the form H^{(1,2)}_{k_θ}(kr) e^{jk_θθ} e^{jk_zz} (16), where H^{(1)}_m and H^{(2)}_m are the Hankel functions of kind 1 and 2, respectively, and m represents their order. k_θ and k_z are the wave numbers along the azimuth and z components of the cylindrical coordinate system defined by (r, θ, z) (Fig. 3). The far field approximation (kr ≫ 1) of the Hankel functions is H^{(1,2)}_m(kr) ≈ √(2/(πkr)) e^{±j(kr − mπ/2 − π/4)}.
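The far field (large-argument) behaviour of the Hankel functions can be checked numerically. The snippet below compares H_m^(2)(x) against the standard asymptotic form √(2/(πx)) e^{−j(x − mπ/2 − π/4)}; the order m and argument x are arbitrary illustrative values, and the residual error is of order 1/x.

```python
import numpy as np
from scipy.special import hankel2

# Far-field check of the Hankel function of the second kind:
#   H_m^(2)(x) ~ sqrt(2/(pi*x)) * exp(-j(x - m*pi/2 - pi/4)),  x >> 1.
# m and x below are illustrative assumptions.
m, x = 2, 100.0
exact = hankel2(m, x)
approx = np.sqrt(2 / (np.pi * x)) * np.exp(-1j * (x - m * np.pi / 2 - np.pi / 4))
rel_err = abs(exact - approx) / abs(exact)   # O(1/x), around 2% here
```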
The objective is to decompose the entire sound field recorded by the circular microphone array into cylindrical harmonic components. It is necessary to know both the pressure and the normal particle velocity values. The radial velocity can be calculated starting from Newton's second law.

In order to obtain the normal velocity in terms of cylindrical components for a circular configuration, the ∇ operator can be reduced to the partial derivative with respect to the radial component: ∂P/∂r = jωρV_n. (23) The extrapolation procedure can then be summarized as follows: 1. Temporal Fourier transform of the pressure and normal velocity signals; 2. Spatial Fourier transform along the array, yielding P̃(k_θ, ω) and Ṽ(k_θ, ω); 3. Calculation of the expansion coefficients M^{(1)}(k_θ, ω) and M^{(2)}(k_θ, ω) in terms of cylindrical harmonics; 4. Sound field extrapolation at each point of the area through (36) and (37).
Moreover, it has been shown that the plane wave decomposition is a flexible format that can be used to recreate the sound field at any position (Hulsebos et al., 2002; Spors et al., 2005). The mathematical derivation of this decomposition exploits the far field approximations of the Hankel functions (18) and (19), where s^{(1)} and s^{(2)} are the incoming and outgoing parts of the wave field, respectively. The decomposition into incoming and outgoing waves can be used to distinguish between sources inside and outside the recording area.
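The spatial Fourier transform step of the decomposition (step 2 above) can be validated against a known closed form: for a plane wave sampled on a circle of radius R, the Jacobi-Anger expansion predicts the angular coefficients exactly. The sketch below is an illustration with assumed parameter values, not the chapter's own implementation.

```python
import numpy as np
from scipy.special import jv

# The spatial Fourier transform along a circular array extracts angular
# components exp(j*k_theta*theta). For the plane wave exp(-j*k*x) sampled on
# a circle of radius R, the Jacobi-Anger expansion gives the coefficient of
# exp(j*m*theta) as (-j)^m * J_m(kR). Parameter values are assumptions.
c, f, R, N = 343.0, 500.0, 1.0, 64
k = 2 * np.pi * f / c

theta = 2 * np.pi * np.arange(N) / N
p = np.exp(-1j * k * R * np.cos(theta))      # plane wave on the array

coeffs = np.fft.fft(p) / N                   # coefficient of exp(+j*m*theta)
m = 2
predicted = (-1j) ** m * jv(m, k * R)
```

With N = 64 samples the aliasing contribution comes from orders beyond ±32, where J_m(kR) is vanishingly small, so the DFT coefficient matches the analytic value to near machine precision.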

Wave domain adaptive filtering
A practical implementation of digital signal processing algorithms (room equalization, noise reduction, echo cancellation) in WFA/WFS needs efficient solutions with the aim of decreasing the high computational cost due to the extremely high number of microphones/loudspeakers. To this end, Wave Domain Adaptive Filtering (WDAF) has been introduced (Buchner, Spors & Kellermann, 2004) as an extension in the spatial domain of the Fast LMS (FLMS) algorithm (Haykin, 1996; Shynk, 1992). A more detailed explanation of this technique can also be found in (Buchner, 2008; Buchner & Spors, 2008; Spors, 2005).
It is known that the optimal performance of FLMS arises from the orthogonality property of the DFT basis functions. Therefore, in order to apply adaptive filtering to WFA/WFS systems, a proper set of orthogonal basis functions has been identified, allowing a combined spatio-temporal transform, denoted by T and T^{−1}, directly based on sound field extrapolation. FLMS has been derived from the time-domain block-LMS (Shynk, 1992). However, in the FLMS case, the adaptation process is performed in the temporal frequency domain. By operating in the transformed domain the computational load is particularly reduced because the convolution operation becomes a simple multiplication. Overlap and Save (OLS) and Overlap and Add (OLA) are used for avoiding circular convolution problems (Oppenheim et al., 1999). The general scheme of the FLMS algorithm is shown in Fig. 4(a). Lowercase letters represent time domain signals while uppercase letters represent frequency domain signals. u is the input signal, d is the reference signal and y is the output signal. W is the adaptive filter, whose coefficients are updated in real time so that the output signal y matches d as closely as possible. WDAF derives directly from FLMS: assuming the sound field is recorded with a circular array composed of M pressure and M particle velocity microphones (section 2.2), the basic scheme of WDAF is shown in Fig. 4(b), where u represents the 2M input signals, d represents the 2M desired signals, y represents the 2Q outputs relative to the microphone positions (in all three cases pressure and particle velocity signals are considered) and W represents the adaptive process composed of 2N filters based on the least mean square approach. In this case, the Fourier transform F_t has to be replaced by a wave domain transform T. This transform derives directly from the sound field extrapolation introduced in section 2.2. Therefore, starting from the pressure p(θ, t) and particle velocity v_n(θ, t) signals, the T-transform can be obtained by two Fourier transforms F_t, F_θ (in the temporal and spatial frequency domains) followed by the cylindrical harmonics decomposition M (Fig. 5(a)). The first Fourier transform is performed in the temporal frequency domain; the second Fourier transform and the cylindrical harmonics decomposition have already been introduced in (28), (29), (34) and (35).
As for the inverse transform T^{−1}, we can obtain the temporal-frequency-domain acoustic field at any point by applying M^{−1} (Fig. 5(b)), given by (36) and (37). Finally, an inverse Fourier transform F_t^{−1} is needed to come back to the time domain. As stated, the number of loudspeakers and their positions in the area are totally independent of the number and positions of the microphones, and the generalization is straightforward by using a T^{−1} transform at the loudspeaker locations.
In (Peretti et al., 2008) a frame-by-frame WDAF implementation for streaming scenarios has been proposed. Furthermore, the computational cost of the WDAF implementation can be reduced by taking advantage of filter bank theory. Moreover, the signal decomposition into subbands allows these adaptive algorithms to achieve a faster convergence rate and a lower mean-square error.
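The FLMS scheme that WDAF extends can be sketched for a single channel. The code below is an overlap-save frequency-domain block LMS identifying an unknown FIR system; the filter length, step size and per-bin power normalization are illustrative choices of this sketch, not parameters taken from the chapter.

```python
import numpy as np

# Single-channel overlap-save FLMS (frequency-domain block LMS): convolution
# and adaptation are done per DFT bin, with a gradient constraint to keep the
# filter linear-convolution consistent. All parameters are assumptions.
rng = np.random.default_rng(0)
L = 16                                    # filter / block length
h = rng.standard_normal(L)                # unknown system to identify
h /= np.linalg.norm(h)
n_blocks = 400
x = rng.standard_normal(n_blocks * L + L)
d_full = np.convolve(x, h)[: len(x)]      # reference (desired) signal

W = np.zeros(2 * L, dtype=complex)        # adaptive filter, frequency domain
mu = 0.5
for b in range(1, n_blocks):
    u = x[(b - 1) * L : (b + 1) * L]      # last 2L input samples
    U = np.fft.fft(u)
    y = np.fft.ifft(U * W).real[L:]       # overlap-save: keep the last L samples
    e = d_full[b * L : (b + 1) * L] - y   # block error
    E = np.fft.fft(np.concatenate([np.zeros(L), e]))
    P = np.abs(U) ** 2 + 1e-6             # per-bin power normalization
    grad = np.fft.ifft(np.conj(U) * E / P).real[:L]   # gradient constraint
    W += 2 * mu * np.fft.fft(np.concatenate([grad, np.zeros(L)]))

w_est = np.fft.ifft(W).real[:L]           # identified impulse response
```

In WDAF the temporal FFT of this loop is replaced by the spatio-temporal transform T, and W becomes a bank of filters acting on the wave-domain components.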

Numerical simulations
The performance of adaptive algorithms in WFA/WFS strictly depends on the sound field quality that can be obtained through the aforementioned techniques. In fact, the transformations T and T^{−1}, on which WDAF is based, have been directly derived from the sound field extrapolation steps. The subsequent adaptive processing is the same as in the mono-channel case, hence its performance depends on the numerical implementations of T and T^{−1}, and it is necessary to understand how the parameters introduced in the previous sections affect the sound field extrapolation. The reference is an ideal monopole source pulsing at a certain frequency f = ω/2π. Fig. 6 shows the sound field behaviour of an ideal 500 Hz monopole obtained with PC simulations. It is derived by evaluating the sound pressure through P(r, ω) = A e^{−jk|r−r0|}/|r−r0| (44) at the desired position r (virtual microphone) in the environment, where r0 is the monopole position and A is a fixed gain. A matrix of virtual microphones is then considered in order to obtain the monopole behaviour.
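The virtual-microphone matrix described above can be sketched as follows. The snippet evaluates the standard free-field monopole model on a 2D grid; the grid extent, resolution and source position are illustrative assumptions of this sketch.

```python
import numpy as np

# Reference sound field of an ideal monopole on a matrix of virtual
# microphones: P(r) = A * exp(-j*k*|r - r0|) / |r - r0|.
# Grid extent and resolution are illustrative assumptions.
c, f, A = 343.0, 500.0, 1.0
k = 2 * np.pi * f / c
r0 = np.array([0.0, -2.0])                # monopole position

xs = np.linspace(-2.0, 2.0, 81)
ys = np.linspace(-2.0, 2.0, 81)
X, Y = np.meshgrid(xs, ys)
d = np.sqrt((X - r0[0]) ** 2 + (Y - r0[1]) ** 2)
d = np.maximum(d, 1e-6)                   # avoid the singularity at the source
P = A * np.exp(-1j * k * d) / d           # complex pressure on the grid
snapshot = P.real                         # instantaneous field at t = 0
```

Plotting `snapshot` at successive phases reproduces the concentric wave fronts of Fig. 6; the 1/d amplitude decay is visible as the fading of the rings away from r0.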

Sound field reproduction
By considering (12) and (13) it is possible to obtain the sound field of Fig. 6 from the knowledge of the sound pressure and the normal particle velocity on the perimeter of a circle of radius R. The sound pressure can be easily evaluated through (44) at positions r = [R, θ] for each θ, while the particle velocity normal to the circumference can be obtained by the difference quotient between the sound pressure levels at two closely spaced radial positions. The results of the Kirchhoff-Helmholtz integral application are shown in Fig. 7(a) and Fig. 7(b). A circle of radius 1 m with 48 sound pressure sensors and 48 sound velocity sensors has been taken into account, and forward and inverse extrapolation have been performed. Inside the circle the sound field is reconstructed accurately. Good performance is obtained also considering a sound source inside the sensor array (Fig. 7(c) and Fig. 7(d)). As stated in section 2.1, correct sound fields can only be obtained inside the circular array.
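The difference-quotient estimate of the normal particle velocity can be sketched numerically. The snippet below uses the monopole pressure model and the convention ∂P/∂r = jωρV_n of (23); the finite-difference offset and all geometry values are illustrative assumptions, and the estimate is checked against the analytic radial derivative.

```python
import numpy as np

# Difference-quotient estimate of the particle velocity normal to the circle.
# Convention: dP/dr = j*omega*rho*Vn, as in (23). Values are assumptions.
c, f, rho, A = 343.0, 500.0, 1.2, 1.0
omega = 2 * np.pi * f
k = omega / c
r0 = np.array([0.0, -2.0])

def pressure(x):
    d = np.linalg.norm(x - r0)
    return A * np.exp(-1j * k * d) / d

R, th = 1.0, 0.7
x = R * np.array([np.cos(th), np.sin(th)])
n_hat = x / R                              # outward normal on the circle
delta = 1e-4                               # radial offset of the two samples
dPdn_fd = (pressure(x + delta * n_hat) - pressure(x - delta * n_hat)) / (2 * delta)
Vn = dPdn_fd / (1j * omega * rho)          # normal particle velocity estimate

# Analytic derivative for comparison: dP/dd = A*exp(-jkd)*(-jkd - 1)/d^2
d = np.linalg.norm(x - r0)
dPdd = A * np.exp(-1j * k * d) * (-1j * k * d - 1) / d ** 2
dPdn_an = dPdd * np.dot(n_hat, x - r0) / d
```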
In order to obtain suitable signals for loudspeakers, dipole sources have to be eliminated.
Recently, an analytic secondary source selection criterion has been introduced where the selection depends on the sound source position in the area of interest (Spors, 2007): if the local propagation direction of the signal to be reproduced coincides with the loudspeaker axis direction n, then the loudspeaker is turned on (Fig. 8). Fig. 9 shows the results of simulations considering only monopoles as secondary sources, for sound sources both outside and inside the array. As happens in time sampling, there is a maximum distance ∆d between two adjacent loudspeakers that prevents aliasing (in this case, spatial aliasing) (Spors & Rabenstein, 2006). Therefore, the maximum frequency reproducible without artefacts is c/∆d. Fig. 10 shows sound fields generated by a circular array that does not respect the spatial aliasing condition.
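The aliasing bound is easy to evaluate for a concrete geometry. The snippet uses the chapter's condition f_max = c/∆d (note that some authors state the stricter half-wavelength bound c/(2∆d)); the radius and loudspeaker count are the illustrative values used in the simulations above.

```python
import numpy as np

# Spatial aliasing bound for a circular loudspeaker array, per the chapter's
# condition f_max = c / delta_d. R and N are illustrative assumptions.
c, R, N = 343.0, 1.0, 48
dd = 2 * np.pi * R / N                    # spacing between adjacent loudspeakers
f_max = c / dd                            # maximum artefact-free frequency
```

For 48 loudspeakers on a 1 m circle the spacing is about 13 cm, giving f_max around 2.6 kHz; the 500 Hz source of the previous simulations is therefore safely below the bound, while Fig. 10 corresponds to frequencies above it.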

Wave Domain Transformations
The Wave Domain Transformations T and T^{−1} are based on sound field extrapolation, i.e., on cylindrical harmonics or plane wave components. The sound field reconstruction depends on the maximum order k_θ used in (36). Fig. 11 shows how the sound field of a 500 Hz monopole can be extrapolated depending on k_θ. The larger the area, the higher the required order, but there is a limit for k_θ beyond which the area does not increase anymore. Furthermore, the sound field near the origin of the circle cannot be extrapolated because of the considerable floating point error arising from multiplications between numbers with different orders of magnitude. It should be noted that, for equal k_θ, the width of the area is inversely proportional to the frequency (Fig. 12). By considering the plane wave decomposition, the floating point precision problem can be avoided and the sound field can be extrapolated also near the origin. The width of the extrapolated area depends on the temporal frequency ω and the spatial frequency k_θ, as in the cylindrical harmonics decomposition (Fig. 13).
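The saturation of the extrapolated area with the maximum order can be made concrete with a standard property of Bessel functions: the interior radial dependence of the cylindrical-harmonic expansion involves J_m(kr), which is negligible for orders m well above kr, so raising k_θ beyond roughly ⌈kR⌉ adds nothing. The values of f and R below are illustrative assumptions.

```python
import numpy as np
from scipy.special import jv

# Harmonic weights J_m(kR) at the array radius: they collapse for orders
# m >> kR, which is why the extrapolated area stops growing with k_theta.
# f and R are illustrative assumptions.
c, f, R = 343.0, 500.0, 1.0
k = 2 * np.pi * f / c                     # ~9.16 rad/m, so useful orders m <~ 10
orders = np.arange(41)
mag = np.abs(jv(orders, k * R))           # weight of each angular order at r = R
```

This also explains the frequency dependence noted above: halving the frequency halves kR, so the same maximum order covers a proportionally larger radius.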

Application example: Adaptive source cancellation
An application example of a WDAF algorithm can be found in (Peretti et al., 2007), with a novel approach to adaptive source cancellation. In a realistic scenario, undesired noise sources can be present in the recording room. Therefore, it is desirable to reduce the negative effect they have on the listener's acoustic experience. The algorithm for WFA/WFS systems can be obtained by using the spatio-temporal transforms T and T^{−1} presented in the previous section. Let us first consider an adaptive noise cancellation scheme used in traditional systems.
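The traditional mono-channel scheme referred to above can be sketched as follows: a reference microphone picks up the noise source, an LMS filter models the unknown acoustic path to the primary microphone, and the filter output is subtracted from the primary signal so that the error converges to the desired signal. The path coefficients, filter length and step size are illustrative assumptions of this sketch.

```python
import numpy as np

# Mono-channel adaptive noise cancellation with LMS. The WDAF version applies
# the same adaptation per wave-domain component. All values are assumptions.
rng = np.random.default_rng(1)
n = 20000
s = np.sin(2 * np.pi * 0.01 * np.arange(n))      # desired signal
v = rng.standard_normal(n)                       # noise reference signal
g = np.array([0.5, -0.3, 0.2])                   # unknown noise path
d = s + np.convolve(v, g)[:n]                    # primary microphone signal

L, mu = 8, 0.002
w = np.zeros(L)
e = np.zeros(n)
for i in range(L, n):
    u = v[i - L + 1 : i + 1][::-1]               # newest reference sample first
    e[i] = d[i] - w @ u                          # canceller output, converges to s
    w += 2 * mu * e[i] * u                       # LMS update
```

After convergence the filter approximates the noise path g, and the error signal e is essentially the clean signal s plus a small residual set by the step size.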

Conclusions
Wave Field Analysis and Synthesis are two techniques that permit direct sound field recording and reproduction using microphone and loudspeaker arrays. In order to use these techniques in real world applications (e.g., cinema, home theatre, teleconferencing), it is necessary to apply multi-channel digital signal processing algorithms already developed for traditional systems. A straightforward implementation requires an extremely high computational complexity, hence Wave Domain Adaptive Filtering (WDAF) has been introduced, extending the well-known Fast LMS algorithm. Since WDAF is derived from transforms related to WFA/WFS, it inherits their strengths: the extension of the optimal listening area and a better source localization in the environment. It is based on the cylindrical harmonics decomposition. In this chapter, the numerical implementation of the WDAF transformations has been described and the results of simulations have been presented. In particular, considering circular arrays, the area over which the sound field is correctly extrapolated becomes larger when increasing the maximum order of the cylindrical harmonics and decreasing the reproduced frequency.
With the cylindrical harmonics decomposition alone, numerical problems can arise near the origin of the circular array. Therefore, the plane wave decomposition, based on far field approximations, can be used to overcome this problem. Numerical simulations of the application of WDAF to adaptive noise cancellation show the effectiveness of the algorithm.

Fig. 3. Geometry used for the driving function calculation relative to the circular array.

Fig. 4. Typical configuration of adaptive algorithms. (a) FLMS. (b) WDAF.

Fig. 7. Sound fields obtained through forward and inverse Kirchhoff-Helmholtz integrals, taking into account a sensor array of 48 elements and radius 1 m. (a) and (b): forward and inverse sound field for a 500 Hz sound source positioned outside the array at [0, −2]. (c) and (d): forward and inverse sound field for a 500 Hz sound source positioned inside the array at [0.5, 0.5].