Open access peer-reviewed chapter - ONLINE FIRST

Modeling and Testing of Loudspeakers Used in Sound-Field Control

Written By

Wolfgang Klippel

Submitted: February 25th, 2021 Reviewed: December 15th, 2021 Published: February 16th, 2022

DOI: 10.5772/intechopen.102029

Advances in Fundamental and Applied Research on Spatial Audio Edited by Brian FG Katz

From the Edited Volume

Advances in Fundamental and Applied Research on Spatial Audio [Working Title]

Dr. Brian FG Katz and Dr. Piotr Majdak



This chapter describes the physical modeling and output-based measurement of loudspeakers, essential hardware components in sound-field control. A gray box model represents linear, time-variant, nonlinear, and non-deterministic signal distortions. Each distortion component requires a particular measurement technique that includes test stimulus generation, sound pressure measurement at selected points in 3D space, and signal analysis for generating meaningful metrics. Near-field scanning measures all signal components at a large signal-to-noise ratio with minor errors caused by loudspeaker positioning, air temperature, room reflections, and ambient noise. Holographic postprocessing based on a spherical wave expansion separates the direct sound from room reflections to assess the linear output and signal distortion. New metrics are presented that simplify the interpretation of the loudspeaker properties at single points, sound zones, and over the entire sound-field.


  • loudspeaker directivity
  • near-field scanning
  • signal distortion
  • nonlinear loudspeaker modeling
  • sound-field control
  • spatial sound application

1. Introduction

Loudspeakers play an essential role in spatial sound applications, such as conventional multi-channel sound reproduction, beam steering [1], wave-field reconstruction [2], higher-order ambisonics [3], immersive audio [4], and multi-zone contrast control [5]. Those techniques require many loudspeakers arranged in linear, planar, circular, and spherical arrays [6] to satisfy the spatial sampling theorem at higher frequencies and provide desired directivity, sufficient sound power output, and audio quality. Cost, size, weight, and energy consumption are critical factors limiting the practical application.

Sound-field control techniques can use model-based or data-based methods to calculate the individual driving signals for the loudspeakers. Both approaches prefer an idealized loudspeaker model, usually assuming a linear, time-invariant transfer behavior and omnidirectional radiation while ignoring undesired properties (e.g., distortion) and physical limitations of the loudspeaker.

Loudspeakers are not always omnidirectional, especially at high frequencies. Various theories [7, 8, 9] consider and exploit the loudspeaker directivity in sound-field control. There are exciting opportunities for loudspeaker arrays exploiting a higher-order spherical wave model used in reverberant rooms [10].

Standard characteristics describe the loudspeaker directivity in the far-field [11]. Still, this information is less relevant in applications for home, automotive, or public address systems where either the radiating surface is large (e.g., arrays, flat panel) or the distance to the listener is small. Choi et al. [12] showed that active control could cope with those conditions if the near-field properties of the loudspeaker are considered.

Xiaohui et al. [13] showed that loudspeaker nonlinearities degrade the performance of spatial sound control, as nonlinear distortions limit the acoustic contrast between “bright” and “dark” sound zones. Cobianchi et al. [14] proposed a method for measuring the directivity of the nonlinear distortion in the far-field by using sinusoidal and multi-tone stimuli. Such tests performed in the near and far-field generate a significant test effort and a high amount of data that can be difficult to interpret.

Olsen and Møller [15] showed that typical ambient temperature variations in automotive applications change the loudspeaker properties in ways that compromise the sound zone performance significantly. Production variability, heating of the voice coil, fatigue, and aging of the suspension and other soft parts (cone) can change the loudspeaker properties over time and degrade the performance in a non-adaptive control solution.

This chapter presents models and measurement techniques to assess the loudspeaker transfer behavior from the input to the sound pressure at any point in the sound-field. The objective is to generate comprehensive information for selecting loudspeakers for spatial sound applications, simulating the performance, including room interaction, and maintaining sound quality over product life.

Such measurements are intended to provide meaningful characteristics that describe the sound pressure at a local point, over a listening zone, or in all directions, simplifying loudspeaker diagnostics.


2. General loudspeaker modeling

A single loudspeaker system used in spatial audio applications can be modeled by a multiple-input-multiple-output system (MIMO), as shown in Figure 1.

Figure 1.

Modeling a loudspeaker system with multiple channels in spatial sound applications.

The loudspeaker input signals ui are generated by sound-field control or other DSP algorithms fDSP applied to audio signals wm. The input signal ui can be an analog voltage at the loudspeaker terminals or a digital data stream using other electrical, optical, or wireless transmission means. For each input signal ui, the loudspeaker system uses at least one electro-acoustical transducer (woofer, tweeter, full-band driver) that generates a sound pressure pi(r) at an evaluation point r under free-field conditions. In modern loudspeaker systems, the transduction block fSP,i(ui,r) also performs amplification, equalization, active speaker protection against mechanical and thermal overload [16], and adaptive nonlinear control to cancel undesired signal distortion [17]. The total sound pressure output pT(r) is a linear superposition of the contributions pi(r) from all transduction blocks described as

$$p_T(\mathbf{r}) = \sum_i p_i(\mathbf{r}) = \sum_i f_{SP,i}(u_i, \mathbf{r}) \tag{2}$$

while assuming a negligible coupling between the loudspeaker channels in the electrical, mechanical, or acoustical domain. This assumption is valid for transducers radiating sound independently into the free-field but not for multiple transducers mounted in one enclosure and working on the same air volume.

The function fSP,i(ui, r) describes the nonlinear and time-variant relationship between the input signal ui and the output signal pi(r).

The remainder of this chapter describes the modeling, measurement, and quality assessment of a single loudspeaker channel, omitting the subscript i in the input voltage u (with uj = 0 for j ≠ i) and the sound pressure output p(r).

Figure 2 shows a gray box model representing the nonlinear, time-variant function fSP(u, r) under free-field conditions. At small input signal amplitudes, the linear spatial transfer function HL(f,r) describes the loudspeaker behavior, assuming that other signal distortions are negligible. Still, additional noise n(r) generated by electronics or external sources can corrupt the sound pressure output.

Figure 2.

Gray box model of a single loudspeaker channel describing the relationship between the input signal u and sound pressure output p(r) at an evaluation point r in the free-field.

The time-variant transfer function HV(f,t) represents reversible and nonreversible changes in the loudspeaker properties caused by the stimulus, climate [15], heating [18], aging, fatigue [19], and other external influences. The function HV(f,t) is independent of the evaluation point r because the dominant time-variant processes are in the electrical and mechanical domains. For example, the voice coil resistance [18], the natural frequencies, and loss factors of the modal vibrations [20] affect the sound-field in the same way. Variations of the mode shape, box geometry, and other boundaries can change the loudspeaker directivity but are neglected in the modeling. The HV(f,t) variation can be monitored by endurance, environmental or accelerated-life testing defined in various loudspeaker standards [11, 21].

The nonlinear subsystems NI and ND generate harmonic and intermodulation distortions at higher amplitudes. The first nonlinear subsystem NI in the feedback loop in Figure 2 represents the dominant nonlinearities [22] in the transduction and the mechanical suspension, such as the force factor, voice coil inductance, and stiffness of a moving coil speaker [23]. A network with lumped parameters models the nonlinear dynamics by generating an equivalent input distortion uI that is added to the input signal u and transferred via the linear transfer path to any point r in the sound-field [11].

The second nonlinear subsystem ND(r) in Figure 2 represents nonlinearities in the cone, diaphragm, surround, horn, port, and other acoustic elements and generates distributed distortion pD(r). The distributed distortion pD(r) depends on the point r and cannot be represented by equivalent input distortion.

The nonlinear distortions uI and pD(r) are considered in loudspeaker design because they affect the maximum output, audio quality, size, cost, and reliability. Finally, the distortions accepted as regular properties give the best performance-cost ratio for the end-user.

Imperfections in the design, manufacturing problems, overload, and other malfunctions (“rub&buzz”) generate irregular dynamics perceived as abnormal distortion pID(r) that is partly non-deterministic and unpredictable.


3. Acoustical loudspeaker measurements

The free model parameters and other signal-dependent characteristics introduced in the gray box model presented in Section 2 can be identified by acoustic measurements.

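The sound pressure can be modeled as a superposition of desired and undesired signal components in the time domain as

$$p(t,\mathbf{r}) = p_L(t,\mathbf{r}) + p_V(t,\mathbf{r}) + p_N(t,\mathbf{r}) + p_{ID}(t,\mathbf{r}) + n(t,\mathbf{r}) \tag{3}$$

and in the frequency domain as a corresponding Fourier spectrum:

$$P(f,\mathbf{r}) = P_L(f,\mathbf{r}) + P_V(f,\mathbf{r}) + P_N(f,\mathbf{r}) + P_{ID}(f,\mathbf{r}) + N(f,\mathbf{r}) \tag{4}$$

The component pL represents the desired linear output, separated from the signal distortion components pV, pN, pID, and n, which correspond to the time-variant properties, the regular loudspeaker nonlinearities, the abnormal distortion generated by irregular vibration, and the measurement noise, respectively.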

New output-based measurement techniques compliant with IEC 60268–21 [11] provide accurate data with sufficient spatial resolution in a non-anechoic environment with minimum test effort (time, equipment).

The following sections will discuss those signal components in greater detail.

3.1 Loudspeaker positioning

The positioning of the loudspeaker in 3D space is defined by IEC 60268–21 [11] in a spherical coordinate system with the polar angle θ, azimuthal angle ϕ, and distance r. The origin O is placed at a convenient reference point rref, usually on the radiator’s surface, grill, or enclosure, close to the supposed acoustical center. A reference axis nref is orthogonal to the radiator’s surface, and the orientation vector oref usually points upwards in a vertical direction.

3.2 Test environment

To ensure the reproducibility of the test result, it is common practice to measure loudspeakers under free-field conditions using a full-space (4π) or half-space (2π) environment. A half-space anechoic room with a solid ground floor is convenient for moving large and heavy loudspeaker systems and measuring loudspeakers mounted in or placed at a short distance from walls. The IEC standard [11] defines various methods of testing and postprocessing to generate simulated free-field conditions in a non-anechoic environment.

3.3 Far-field measurement

The traditional way to assess the loudspeaker directivity is the measurement of the spatial transfer function HL(f,rD,θ) between the input u and the sound pressure output p(rD,θ) under far-field conditions [11]. The distance rD between the loudspeaker and microphone should be much larger than the size of the speaker and the acoustic wavelength. The 1/r law valid in the far-field allows extrapolating the complex transfer function to other distances r as

$$H_L(f, r, \theta) = H_L(f, r_D, \theta)\,\frac{r_D}{r}\,e^{-jk(r - r_D)} \tag{5}$$

using the wavenumber k = 2πf/c0 and the speed of sound c0. Large loudspeakers such as loudspeaker arrays, soundbars, flat-panel speakers, and horn loudspeakers require a large measurement distance rD and a sizeable anechoic room with good air conditioning to keep the variance of the temperature field sufficiently small.
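The far-field extrapolation can be illustrated with a minimal sketch. It assumes the amplitude scales with rD/r and the phase follows the extra propagation path with an exp(-jk(r - rD)) factor (the sign convention that matches an outgoing wave described by a Hankel function of the second kind). The function name, the example transfer value, and the default speed of sound are hypothetical choices for illustration.

```python
import cmath
import math


def extrapolate_far_field(h_ref: complex, f_hz: float, r_ref: float,
                          r_new: float, c0: float = 343.0) -> complex:
    """Move a far-field transfer function from distance r_ref to r_new (1/r law)."""
    k = 2.0 * math.pi * f_hz / c0                   # wavenumber in rad/m
    gain = r_ref / r_new                            # amplitude decays with 1/r
    phase = cmath.exp(-1j * k * (r_new - r_ref))    # additional propagation phase
    return h_ref * gain * phase


# Hypothetical transfer value measured at 5 m (Pa/V), extrapolated to 10 m.
h_5m = 0.02 * cmath.exp(1j * 0.3)
h_10m = extrapolate_far_field(h_5m, f_hz=1000.0, r_ref=5.0, r_new=10.0)
level_change_db = 20.0 * math.log10(abs(h_10m) / abs(h_5m))
print(f"level change for doubling the distance: {level_change_db:.2f} dB")
```

This only holds where the 1/r law applies; it is not meant for points in the near-field.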

The choice of measured directions determines the angular resolution of the directional gain [11], the accuracy of coverage angle [11], and other derived far-field characteristics. A 2-degree angular resolution, needed for some professional loudspeakers, requires about 16,000 measurement points. Rotating a large and heavy loudspeaker over all combinations of the two angles requires robust and accurate robotics with speed ramps to accelerate and decelerate the mass. A microphone array speeds up the test by simultaneously measuring the sound pressure at multiple points without moving the loudspeaker.

Common far-field measurements usually provide no information about the accuracy of the measured data. They cannot indicate errors related to the positioning of loudspeakers or microphones, insufficient sampling of complex directivity patterns, or acoustical disturbances due to wind, air temperature, static sound pressure, or ambient noise [15].

Minor positioning errors and normal variations of the speed of sound, which are usually not critical for the amplitude response, can cause significant errors in the phase response and degrade the performance of 3D sound applications. For example, a deviation of the room temperature by 2 K during the test changes the speed of sound by 1.2 m/s and the acoustic propagation time by 50 μs at a measurement distance r = 5 m, which is required to ensure far-field conditions for large loudspeakers. This time delay corresponds to a positioning error of 17 mm and generates a phase error of 36 degrees at 2 kHz, increasing linearly with frequency and reaching 180 degrees at 10 kHz.
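The numbers in this example can be checked with a short calculation. The sketch below assumes a baseline temperature of 20 °C (the text only states the 2 K deviation) and uses the common dry-air approximation c = 331.3·sqrt(1 + T/273.15) m/s; both are assumptions made for illustration.

```python
import math


def speed_of_sound(temp_c: float) -> float:
    """Approximate speed of sound in dry air (m/s) for a temperature in deg C."""
    return 331.3 * math.sqrt(1.0 + temp_c / 273.15)


def phase_error_deg(f_hz: float, distance_m: float,
                    temp_ref_c: float, temp_c: float) -> float:
    """Phase error caused by the propagation-time change between two temperatures."""
    dt = distance_m / speed_of_sound(temp_ref_c) - distance_m / speed_of_sound(temp_c)
    return 360.0 * f_hz * dt


# Assumed scenario: 5 m measurement distance, 20 deg C reference, 22 deg C during the test.
delta_t = 5.0 / speed_of_sound(20.0) - 5.0 / speed_of_sound(22.0)
equivalent_shift_mm = 1000.0 * delta_t * speed_of_sound(20.0)
err_2k = phase_error_deg(2000.0, 5.0, 20.0, 22.0)
err_10k = phase_error_deg(10000.0, 5.0, 20.0, 22.0)
print(f"delay change: {delta_t * 1e6:.1f} us, equivalent shift: {equivalent_shift_mm:.1f} mm")
print(f"phase error: {err_2k:.1f} deg at 2 kHz, {err_10k:.1f} deg at 10 kHz")
```

Because the phase error grows linearly with frequency, the same temperature drift that is harmless at low frequencies becomes a polarity-level error near 10 kHz.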

3.4 Near-field measurement

The IEC standard 60268–21 [11] recommends measurements in the near-field, which overcome the restrictions and problems faced in the far-field. However, the 1/r law in Eq. (5) is not applicable, and a holographic measurement technique that scans the sound pressure and fits a spherical wave model to measured data is required.

Figure 3 shows a scanning system used for measuring the sound pressure generated by a loudspeaker placed at a fixed position on a post. The microphone moves in three axes in cylindrical coordinates (r,φ,z) to multiple test points rk ∈ Sr distributed on a double layer grid Sr close to the speaker’s surface [24]. Moving a lighter microphone instead of rotating the heavier loudspeaker simplifies the robotics, allows faster speed ramps, and reduces the positioning error. Those opportunities make it possible to generate redundancy in the collected data and check the measurement’s accuracy.

Figure 3.

Nearfield measurement by placing the loudspeaker at a fixed position and moving a microphone with robotics over the scanning grid close to the speaker surface.

The scanning points are distributed on two concentric layers, as shown in Figure 3, to measure the local derivative of the sound pressure like a sound intensity probe. That is the basis for separating the outgoing wave comprising direct sound radiated by the loudspeaker (e.g., diaphragm) from the incoming wave generated by reflections on the positioning arm of the robotics, ground floor, and room walls. The close distance to the sound source increases the direct sound, which increases the signal-to-noise ratio (SNR) by more than 20 dB and significantly reduces the phase error caused by varying air properties in far-field measurements.


4. Spatial transfer function

The spatial transfer function HL(f,r) describes the linear relationship between the input spectrum U(f) and the sound pressure spectrum PL(f,r) generated by the loudspeaker at any point r under free-field conditions as a spherical wave expansion in Eq. (6), using general solutions Bout(f,r) of the Helmholtz equation weighted by complex coefficients in vector CL(f) [25]:

$$H_L(f,\mathbf{r}) = \frac{P_L(f,\mathbf{r})}{U(f)} = \sum_{n=0}^{N}\sum_{m=-n}^{n} c_{n}^{m}(f)\, h_n^{(2)}(kr)\, Y_n^m(\theta,\phi) = \mathbf{C}_L(f)\,\mathbf{B}_{out}(f,\mathbf{r}) \tag{6}$$

The spherical coordinates allow a separation of the angular dependency, described by the spherical harmonics Y_n^m(θ,ϕ), from the radial dependency, described by the Hankel function of the second kind h_n^(2)(kr). The spherical harmonics are orthonormal and represent a monopole (n = 0), dipoles (n = 1), quadrupoles (n = 2), and more complex sources with increasing order n.

Figure 4 illustrates the expansion for a woofer operated in a sealed enclosure at 200 Hz. The measured directivity pattern is presented as a target on the lower left-hand side and compared with the wave model for rising maximum order N. The expansion can be truncated at N = 3 because 16 coefficients weighting the spherical harmonics provide sufficient accuracy. Higher-order terms can be ignored at 200 Hz because they are 50 dB below the total sound power. The contribution of the higher-order terms rises with frequency and is required to explain the directivity pattern at 1 kHz, as shown in the upper diagram on the right-hand side.

Figure 4.

Modeling the total sound power frequency response (upper right) and directivity pattern at 200 Hz (below) of a loudspeaker by spherical wave model (upper left).

The Hankel function h_n^(2)(kr) in Eq. (6) models the decay of the sound pressure with rising distance r from the expansion point re of the spherical wave expansion. In the near-field (r < rfar), the 1/r law is no longer valid because sound pressure and particle velocity are not in phase, which increases the apparent power at shorter distances [24]. In the far-field (r >> rfar), the sound pressure decreases inversely with the distance r, giving 6 dB less output for each doubling of the distance. Thus, the apparent sound power radiated from the loudspeaker is constant and corresponds to the real power.

Figure 5 shows the power Πn(r) contributed by spherical waves of order n to the total apparent power Πa(r). Only the order n = 0 (monopole) generates a constant power output for all distances, while the steepness of the power curve Πn(r) in the near-field increases with the order n of the waves.

Figure 5.

Total apparent sound power Πa(r) (thick line) generated by a loudspeaker versus radial distance r and the contribution Πn(r) of the spherical waves of order n (thin lines).

4.1 Parameters of the linear model

The optimum coefficients CL(f) in the spherical wave model in Eq. (6) can be calculated by minimizing the mean squared error between the responses HL(f,rk) measured at the scanning points rk ∈ Sr and the modeled responses as

$$\mathbf{C}_L(f) = \arg\min_{\mathbf{C}} \sum_{\mathbf{r}_k \in S_r} \left| H_L(f,\mathbf{r}_k) - \mathbf{C}\,\mathbf{B}_{out}(f,\mathbf{r}_k) \right|^2 \tag{7}$$

Normalizing the mean squared error in Eq. (7) with the total output power gives a valuable criterion e for checking the accuracy of the measurement and of the spherical wave expansion [24].
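The least-squares fit and the normalized fitting error can be sketched with synthetic data. The example below is an illustration only: it assumes NumPy is available, truncates the expansion at N = 1 (monopole plus three dipoles, written out in closed form), uses hypothetical scanning points and coefficients, and defines the error e as the residual energy divided by the energy of the measured responses, which is one plausible reading of the normalization.

```python
import numpy as np


def basis_order1(k: float, pts: np.ndarray) -> np.ndarray:
    """Outgoing-wave basis up to N = 1; pts holds rows of (r, theta, phi)."""
    r, theta, phi = pts[:, 0], pts[:, 1], pts[:, 2]
    x = k * r
    h0 = 1j * np.exp(-1j * x) / x                    # h_0^(2)(x)
    h1 = -np.exp(-1j * x) * (1.0 - 1j / x) / x       # h_1^(2)(x)
    y00 = np.full_like(theta, 1.0 / np.sqrt(4.0 * np.pi))
    y1m1 = np.sqrt(3.0 / (8.0 * np.pi)) * np.sin(theta) * np.exp(-1j * phi)
    y10 = np.sqrt(3.0 / (4.0 * np.pi)) * np.cos(theta)
    y11 = -np.sqrt(3.0 / (8.0 * np.pi)) * np.sin(theta) * np.exp(1j * phi)
    return np.column_stack([h0 * y00, h1 * y1m1, h1 * y10, h1 * y11])


def normalized_error(h: np.ndarray, h_model: np.ndarray) -> float:
    """Residual energy relative to the energy of the measured responses."""
    return float(np.sum(np.abs(h - h_model) ** 2) / np.sum(np.abs(h) ** 2))


# Hypothetical double-layer scan: 40 points on radii 0.30 m and 0.35 m at 200 Hz.
rng = np.random.default_rng(0)
n_pts = 40
pts = np.column_stack([
    np.where(np.arange(n_pts) % 2 == 0, 0.30, 0.35),
    np.arccos(rng.uniform(-1.0, 1.0, n_pts)),
    rng.uniform(0.0, 2.0 * np.pi, n_pts),
])
k = 2.0 * np.pi * 200.0 / 343.0
c_true = np.array([1.0 + 0.2j, 0.05j, 0.3 - 0.1j, -0.02 + 0.0j])  # synthetic coefficients

B = basis_order1(k, pts)
h_meas = B @ c_true                                   # noise-free synthetic "measurement"
c_fit, *_ = np.linalg.lstsq(B, h_meas, rcond=None)    # full N = 1 fit
c_mono, *_ = np.linalg.lstsq(B[:, :1], h_meas, rcond=None)  # monopole only (N = 0)

e_full = normalized_error(h_meas, B @ c_fit)
e_mono = normalized_error(h_meas, B[:, :1] @ c_mono)
print(f"e (N=0): {e_mono:.3e}, e (N=1): {e_full:.3e}")
```

An expansion of order N has (N + 1)² coefficients, so at least that many scanning points are needed; using more points than coefficients provides the redundancy that makes the error e meaningful.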

Figure 6 shows the normalized fitting error e in the wave expansion with rising total order N. A single monopole expansion (N = 0) already gives an error reduction of 10 dB at 100 Hz. Considering the monopole and the three dipoles (N = 1) can reduce the error to −20 dB at 100 Hz, which means the model can explain 99% of the output power. A wave expansion of order N = 5 requiring at least 36 measurement points describes the sound output of the woofer channel below 1 kHz with sufficient accuracy (e < 1%). The increase of the fitting error at higher frequencies indicates that higher-order terms are required in the expansion to model the directivity at higher frequencies.

Figure 6.

Normalized fitting error e versus frequency f of the spherical wave expansion truncated at maximum order N (above) and corresponding identified directivity pattern shown as a balloon-plot for the corresponding order N compared with the measured target response (left-hand side below).

This example shows that the loudspeaker properties determine the maximum order N of the expansion, the number of measurement points Kr required to identify the coefficients Ci(f), and the total scanning time.

For acoustic, esthetic, or technical reasons, most loudspeakers have a natural symmetry in the diaphragm’s shape, the cone placement on the front side of the cabinet, and the enclosure’s geometry. Symmetry factors [24] calculated from identified coefficients CL(f) during the scanning process reveal the loudspeaker’s left/right or top/bottom single-plane, dual-plane or rotational symmetry. This information can be used to align the loudspeaker position and orientation with spherical harmonics to reduce the number of measurement points required to fit the wave expansion. As illustrated in Figure 7, considering the rotational symmetry can reduce the number of measurement points to 4%, significantly speeding up the scanning process.

Figure 7.

Exploiting symmetry in the loudspeaker geometry to reduce the number of measurement points required for the spherical wave expansion.

4.2 Simulated free-field condition

The measurement of the spatial transfer function requires free-field conditions or at least simulated free-field conditions as defined in IEC standard 60268–21 [11].

The absorption of the lined walls in “anechoic” rooms is usually imperfect at low frequencies where the wavelength of the standing waves exceeds the thickness of the lining. Gating the sound pressure signal and windowing the impulse response provide good results at higher frequencies but degrade the frequency resolution at low frequencies.

The wave separation technique based on near-field scanning on two surfaces [25] can be used to separate the direct sound from the room reflections at low and middle frequencies and complements the windowing technique at higher frequencies. The measured transfer function H′L(f,rk) with rk ∈ Sr corrupted by room reflections can be modeled by a spherical wave expansion [26]

$$H'_L(f,\mathbf{r}_k) = \mathbf{C}_L(f)\,\mathbf{B}_{out}(f,\mathbf{r}_k) + \mathbf{C}_{SR}(f)\,\mathbf{B}_{SR}(f,\mathbf{r}_k) \tag{8}$$

considering the outgoing wave Bout(f,rk) radiated by the loudspeaker as used in Eq. (6) and the reflected waves BSR(f,rk) represented by Bessel functions of the first kind Jn(kr). The optimal coefficients CL and CSR minimizing the mean squared error between measured and modeled response can be estimated by

$$\left[\mathbf{C}_L(f), \mathbf{C}_{SR}(f)\right] = \arg\min_{\mathbf{C}_L,\,\mathbf{C}_{SR}} \sum_{\mathbf{r}_k \in S_r} \left| H'_L(f,\mathbf{r}_k) - \mathbf{C}_L\,\mathbf{B}_{out}(f,\mathbf{r}_k) - \mathbf{C}_{SR}\,\mathbf{B}_{SR}(f,\mathbf{r}_k) \right|^2 \tag{9}$$

The coefficients CSR(f) provide the SPL response of the sound reflections, shown as a dashed curve in Figure 8, which corrupt the measurement and cause a significant error below 1 kHz in the measured SPL response (thin green solid line). The coefficients CL(f) represent the SPL of the direct sound (thick blue solid line) measured under simulated free-field conditions.
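The principle of the double-layer separation can be shown in its simplest form. The sketch below keeps only the order-0 terms: the outgoing wave is modeled with the spherical Hankel function of the second kind h_0^(2)(kr), and the reflected field with the spherical Bessel function j_0(kr) (using the spherical form is an assumption of this example). Two pressure values at two radii then give a 2×2 system for the two coefficients. All numbers are synthetic and the function names are hypothetical.

```python
import cmath
import math


def h0_out(x: float) -> complex:
    """Spherical Hankel function of the second kind, order 0 (outgoing wave)."""
    return 1j * cmath.exp(-1j * x) / x


def j0_standing(x: float) -> float:
    """Spherical Bessel function of the first kind, order 0 (regular at the origin)."""
    return math.sin(x) / x


def separate_order0(p1: complex, p2: complex, k: float, r1: float, r2: float):
    """Solve the 2x2 system for (outgoing, reflected) coefficients by Cramer's rule."""
    a11, a12 = h0_out(k * r1), j0_standing(k * r1)
    a21, a22 = h0_out(k * r2), j0_standing(k * r2)
    det = a11 * a22 - a12 * a21
    c_out = (p1 * a22 - a12 * p2) / det
    c_refl = (a11 * p2 - a21 * p1) / det
    return c_out, c_refl


# Synthetic test case at 200 Hz with scanning layers at 0.30 m and 0.35 m.
k = 2.0 * math.pi * 200.0 / 343.0
r1, r2 = 0.30, 0.35
c_out_true, c_refl_true = 1.0 + 0.5j, 0.2 - 0.1j
p1 = c_out_true * h0_out(k * r1) + c_refl_true * j0_standing(k * r1)
p2 = c_out_true * h0_out(k * r2) + c_refl_true * j0_standing(k * r2)

c_out_est, c_refl_est = separate_order0(p1, p2, k, r1, r2)
print("direct sound coefficient:", c_out_est)
print("reflection coefficient:  ", c_refl_est)
```

A practical scan uses many points and higher orders, so the same idea becomes an overdetermined least-squares problem rather than an exact 2×2 solve.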

Figure 8.

Generating simulated free-field conditions at low frequencies by separating direct sound (solid line) from the room reflections (dashed line) in the measured SPL frequency response (thin line).

4.3 Interpretation of the spatial transfer function

The interpretation of the spatial transfer function HL(f,r) can be simplified by calculating the SPL frequency response at point r in decibels as

$$L_{SP}(f,\mathbf{r}) = 20\log_{10}\!\left(\frac{|H_L(f,\mathbf{r})|\,\tilde{u}}{p_{ref}}\right)\ \mathrm{dB} \tag{10}$$

using a fixed RMS value ũ of the input signal u(t) and the reference sound pressure pref = 20 μPa. The SPL frequency response displayed in 2D or 3D plots (polar, balloon, contour) shows the directional dependency versus the angles θ and ϕ in the far-field r > rfar, as shown in Figure 9, and the local dependency versus the Cartesian coordinates x, y, z in the near and far-field, as shown in Figure 10.
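As a quick numerical illustration, the SPL follows from the magnitude of the transfer function (in Pa/V), the RMS input voltage, and the 20 μPa reference. The function name and the example values are hypothetical.

```python
import math

P_REF = 20e-6  # reference sound pressure in Pa


def spl_db(h_mag_pa_per_v: float, u_rms_v: float) -> float:
    """SPL in dB for a transfer magnitude |H_L| (Pa/V) and an RMS input voltage (V)."""
    return 20.0 * math.log10(h_mag_pa_per_v * u_rms_v / P_REF)


spl_1v = spl_db(1.0, 1.0)   # 1 Pa RMS at the evaluation point
spl_2v = spl_db(1.0, 2.0)   # doubling the input voltage of a linear system
print(f"{spl_1v:.2f} dB at 1 V, {spl_2v:.2f} dB at 2 V")
```

Doubling the voltage raises the predicted level by about 6 dB, which only holds while the loudspeaker behaves linearly.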

Figure 9.

Visualization styles for the far-field directional SPL response LSP(f,r,θ, ϕ) in spherical coordinates.

Figure 10.

Visualization of the SPL of the direct sound-field LSP(f,x,y,z) generated by a loudspeaker at 2 kHz outside the scanning surface.

The phase response at point r calculated as

$$\varphi(f,\mathbf{r}) = \arg\big(H_L(f,\mathbf{r})\big) \tag{11}$$

provides essential information for combining multiple loudspeaker channels in systems and arrays and applying DSP processing to control the sound-field. The total phase response φ(f,r) can be decomposed into three parts. The minimum phase φM(f,r) corresponds to the amplitude response |HL(f,r)| via the Hilbert transform. The all-pass phase φA(f,r) reveals the polarity and other loudspeaker properties. The third, critical part is the total time delay

$$\tau(\mathbf{r}) = \tau_{DSP} + \frac{|\mathbf{r}-\mathbf{r}_e|}{c_0} \tag{12}$$

comprising the latency τDSP [11] in DSP processing and the acoustical delay depending on the distance |r−re| and the local speed of sound c0, which is a function of the temperature field TA(r) and the static sound pressure P0.
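The contribution of the time delay to the phase can be sketched as follows, assuming a uniform speed of sound along the path and a phase lag of −360·f·τ degrees. The latency and distance values are hypothetical.

```python
def total_delay_s(latency_s: float, distance_m: float, c0: float = 343.0) -> float:
    """DSP latency plus acoustic propagation time over the given distance."""
    return latency_s + distance_m / c0


def delay_phase_deg(f_hz: float, delay_s: float) -> float:
    """Linear phase lag (degrees) produced by a pure time delay."""
    return -360.0 * f_hz * delay_s


tau = total_delay_s(latency_s=1e-3, distance_m=1.0)   # assumed 1 ms latency, 1 m path
phase_100 = delay_phase_deg(100.0, tau)
print(f"delay: {tau * 1e3:.3f} ms, phase at 100 Hz: {phase_100:.1f} deg")
```

Removing this linear term from the measured phase leaves the minimum-phase and all-pass parts, which are easier to compare between loudspeaker channels.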

The (real) sound power ΠL(f) radiated by the loudspeaker into the far-field can be calculated by multiplying the wave coefficients CL(f) with its Hermitian transpose:


This sound power ΠL(f) is a valuable metric for describing the global acoustic output of the loudspeaker by a single value. It is also a convenient basis for estimating the mean sound pressure of the diffuse sound generated in a non-anechoic room if the reverberation time is known [11].


5. Time-variant distortion

The gray box model from Figure 2 describes the time-variant distortion spectrum PV(f,r|t) at any point r in the sound-field as

$$P_V(f,\mathbf{r}|t) = H_L(f,\mathbf{r})\,\big[H_V(f|t) - 1\big]\,U(f) \tag{14}$$

using the spatial transfer function HL(f,r), the input spectrum U(f), and the time-variant transfer function HV(f|t), which can be identified as the ratio

$$H_V(f|t) = \frac{H(f,\mathbf{r}|t)}{H(f,\mathbf{r}|t_0)} \tag{15}$$

of two spatial transfer functions H(f,r|t0) and H(f,r|t) measured on the same loudspeaker unit under identical measurement conditions (environment, evaluation point r) at a reference time t0 and a later evaluation time t. The reference measurement at t0 assesses the loudspeaker under climatized standard conditions using a small stimulus generating negligible heating and nonlinear distortion. The subsequent measurement at time t can be performed with any stimulus providing sufficient excitation of the loudspeaker. This measurement requires no scanning process, and the calculated time-variant transfer function HV(f|t) is independent of the choice of the evaluation point r. Placing the microphone in the near-field ensures a good SNR.

This model predicts the amplitude compression at any point r in the sound-field, defined in agreement with IEC standard 60268–21 [11] in decibels as


and the phase deviation:


The voice coil heating in professional stage loudspeakers can cause significant amplitude compression (up to 6 dB) in the output signal. Fatigue and climate changes can also shift the resonance frequencies of modal cone vibrations, causing more than 90-degree phase deviation. Those variations can impair the intended superposition of multiple loudspeakers’ output in spatial sound applications.
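A minimal sketch of these two metrics, computed from a reference and a later transfer-function measurement at the same point, is given below. It assumes that the amplitude compression is reported as a positive number when the output drops (−20·log10 of the magnitude ratio) and that the phase deviation is the argument of the ratio; the exact sign conventions of the standard should be checked against IEC 60268–21. The measurement values are synthetic.

```python
import cmath
import math


def time_variant_tf(h_ref: complex, h_now: complex) -> complex:
    """Ratio of the later measurement to the reference measurement."""
    return h_now / h_ref


def amplitude_compression_db(h_v: complex) -> float:
    """Positive when the output at time t is lower than at the reference time (assumed sign)."""
    return -20.0 * math.log10(abs(h_v))


def phase_deviation_deg(h_v: complex) -> float:
    """Phase change between the two measurements in degrees."""
    return math.degrees(cmath.phase(h_v))


# Synthetic values at one frequency: output halves and the phase lags by 45 degrees.
h_t0 = 2.0 * cmath.exp(1j * 0.5)
h_t = 1.0 * cmath.exp(1j * (0.5 - math.pi / 4.0))
h_v = time_variant_tf(h_t0, h_t)
compression = amplitude_compression_db(h_v)
deviation = phase_deviation_deg(h_v)
print(f"compression: {compression:.2f} dB, phase deviation: {deviation:.1f} deg")
```

Because the ratio does not depend on the evaluation point, a single near-field microphone position is enough to track both quantities over time.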


6. Nonlinear distortions

The regular nonlinear distortions found in the sound pressure output pN(t,r) are symptoms of loudspeaker nonlinearities modeled by subsystems NI and ND(r) shown in Figure 2. The input signal u strongly influences the generation process and the spectral and temporal properties of the nonlinear distortion [22].

A typical audio signal (e.g., music) has a dense excitation spectrum, as shown in Figure 11, which makes separating the nonlinear distortion pN in the sound pressure output p more difficult. An adaptive linear filter can model the linear and time-variant components pL + pV in the output [27]. The difference signal e(t) between the measured and the modeled signal comprises nonlinear distortion and noise.

Figure 11.

Spectra of reproduced test stimuli used for nonlinear distortion measurement.

As shown in Figure 11, a sparse multi-tone complex is a stimulus able to represent typical program material such as music and speech by having similar properties such as spectral distribution and crest factor. This stimulus has pseudo-random properties generated by a standardized algorithm [11] to ensure reproducible and comparable test results. The excitation tones are not dense but sufficiently activate harmonics, intermodulation, and other nonlinear distortion components, which can easily be detected and separated from the fundamental response in the spectrum.
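The idea of a sparse multi-tone stimulus can be illustrated with a small sketch. This is not the standardized IEC algorithm: the logarithmic tone spacing, the seeded random phases, and all parameter values are assumptions chosen for illustration. Tones sit on exact FFT bins, so every bin that is not excited is free to reveal distortion components.

```python
import math
import random


def sparse_multitone(n_samples: int, n_tones: int, bin_lo: int, bin_hi: int, seed: int = 1):
    """Return (bins, signal): log-spaced tones on integer bins with pseudo-random phases."""
    rng = random.Random(seed)                       # fixed seed gives a reproducible stimulus
    ratio = (bin_hi / bin_lo) ** (1.0 / (n_tones - 1))
    bins = sorted({round(bin_lo * ratio ** i) for i in range(n_tones)})
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in bins]
    signal = [
        sum(math.cos(2.0 * math.pi * b * n / n_samples + ph) for b, ph in zip(bins, phases))
        for n in range(n_samples)
    ]
    return bins, signal


def rms(signal) -> float:
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def crest_factor_db(signal) -> float:
    """Peak-to-RMS ratio in dB."""
    return 20.0 * math.log10(max(abs(s) for s in signal) / rms(signal))


bins, x = sparse_multitone(n_samples=2048, n_tones=16, bin_lo=4, bin_hi=512)
excited = set(bins)
distortion_bins = [b for b in range(1, 1024) if b not in excited]  # bins left for distortion
print(f"{len(bins)} excited bins, {len(distortion_bins)} free bins, "
      f"crest factor {crest_factor_db(x):.1f} dB")
```

In an analysis, the spectrum at the excited bins gives the fundamental response, while energy found in the free bins above the noise floor indicates harmonic and intermodulation distortion.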

The prevalent measurement technique uses a single tone stimulus with a constant or varying excitation frequency fe (e.g., sinusoidal chirp [11]). The harmonic components generated at multiple frequencies nfe with n = 2, 3, 4 can be easily separated from the fundamental part at fe. This measurement technique has a long tradition and is simple but has a significant drawback: It does not consider the intermodulation distortion generated by multiple tones and music.

The measurement technique presented in the following section can also be applied to a burst signal, two-tone signal, white or pink noise, and other input signals.

6.1 Nonlinear distortion in 3D space

A comprehensive measurement of the nonlinear distortion in the 3D space requires near-field scanning providing the distortion spectrum PN(f,rk) at the grid points rk∈ Sr. The small distance between the microphone and loudspeaker ensures sufficient SNR to cope with noise. The measurement performed at high amplitudes can be integrated into the scanning process for spatial transfer function HL(f,r) measured at low amplitude (see Section 4).

Applying the spherical wave expansion to the measured distortion spectrum PN(f,rk) gives the optimal coefficients

$$\mathbf{C}_N(f) = \arg\min_{\mathbf{C}} \sum_{\mathbf{r}_k \in S_r} \left| P_N(f,\mathbf{r}_k) - \mathbf{C}\,\mathbf{B}_{out}(f,\mathbf{r}_k) \right|^2 \tag{18}$$

The coefficients in vector CN(f) allow extrapolation of the distortion to any point r outside the scanning surface:

$$P_N(f,\mathbf{r}) = \mathbf{C}_N(f)\,\mathbf{B}_{out}(f,\mathbf{r}) \tag{19}$$

However, there is a significant difference between the nonlinear coefficients CN(f) and the linear coefficients CL(f) discussed in Section 4. The linear coefficients CL(f) are parameters of a linear system. They can be identified with any broad-band stimulus and used to transfer another input signal into the sound-field, including music and speech. The nonlinear coefficients CN(f) describe the results (distortion) of loudspeaker nonlinearities, which depend on the particular stimulus [22].

The sound power spectrum calculated as


is a valuable global metric to assess the nonlinear distortion radiated by the loudspeaker in all directions.

6.2 Equivalent input distortion

The standard IEC 60268–21 calculates the equivalent input distortion (EID) for a single-point measurement at rk by a simple approximation [28]

$$U'(f,\mathbf{r}_k) = \frac{P_N(f,\mathbf{r}_k)}{H_L(f,\mathbf{r}_k)\,H_V(f|t)} \tag{21}$$

using the time-variant transfer function HV(f|t) and the spatial transfer function HL(f,r). This inverse filtering transforms the sound pressure distortion pN(rk) into a virtual input signal u′(rk), as illustrated in Figure 12.

Figure 12.

Block diagram illustrating the calculation of equivalent input distortion (EID) by applying inverse filtering (right) or optimal estimation (left) based on three sound pressure measurements in the near-field (middle).

The lower middle panel in Figure 12 shows the total harmonic distortion as an absolute SPL frequency response LTH,N(fe,r) measured at three different distances rk in an office room (in-situ). The near-field measurement at 2 cm provides a relatively smooth curve, while the 30 and 60 cm measurements have a lower SPL and are affected by room reflections. The filtering of the sound pressure signals p(rk) with the inverse transfer function H(f,rk)^−1 generates a voltage signal u′(rk) with the total harmonics level LTH,I+D(fe,rk) shown on the lower right-hand side in Figure 12. This filtering removes the peaky curve shape caused by the room reflections, and the three curves become virtually identical between 100 Hz and 1 kHz. However, noise corrupts the measurement at low frequencies, and the distributed distortion pD causes minor deviations above 800 Hz.

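Those artifacts in the equivalent input distortion (EID) can be removed by minimizing the mean squared error between the estimated and the measured nonlinear distortion spectrum at the scanning points rk with k = 1,.., Kr and Kr ≥ 1:

$$U_I(f) = \arg\min_{U} \sum_{k=1}^{K_r} \left| P_N(f,\mathbf{r}_k) - H_L(f,\mathbf{r}_k)\,H_V(f|t)\,U \right|^2 \tag{22}$$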

This fitting provides the voltage level response LTH,I(f) on the left-hand side in Figure 12, representing the EID.
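The difference between per-point inverse filtering and the optimal estimate can be sketched at a single frequency. The example assumes that the product HL(f,rk)·HV(f|t) is known at each point and that the optimal estimate is the scalar least-squares solution; the transfer values and disturbances are synthetic, and the function names are hypothetical.

```python
def eid_inverse_filter(p_n, g):
    """Per-point EID estimate: distortion spectrum divided by the transfer path."""
    return [p / gk for p, gk in zip(p_n, g)]


def eid_least_squares(p_n, g):
    """Single EID value u minimizing sum_k |p_n[k] - g[k] * u|^2."""
    num = sum(gk.conjugate() * p for p, gk in zip(p_n, g))
    den = sum(abs(gk) ** 2 for gk in g)
    return num / den


u_true = 0.01 + 0.02j                         # synthetic EID at one frequency (V)
g = [5.0 + 1.0j, 0.8 - 0.3j, 0.3 + 0.2j]      # assumed H_L(r_k)*H_V at three distances
p_clean = [gk * u_true for gk in g]
# Add small disturbances at the two distant points (noise or distributed distortion).
p_meas = [p_clean[0], p_clean[1] + 0.002, p_clean[2] - 0.001j]

u_points = eid_inverse_filter(p_meas, g)      # three estimates that disagree
u_ls = eid_least_squares(p_meas, g)           # one estimate weighted toward high-SNR points
print("per-point estimates:", u_points)
print("least-squares estimate:", u_ls)
```

The least-squares solution weights each point by the squared magnitude of its transfer path, so the near-field point with the strongest direct sound dominates the estimate.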

Figure 13 shows the equivalent input distortion spectrum UI(f) generated by multi-tone stimuli with a different spectral shaping to represent typical test signals and selected audio material. All the stimuli have the same RMS value. Cello music provides the highest low-frequency components, generating the highest voice coil displacement and harmonic components at 500 Hz. Pink noise and IEC noise [11], representing typical program material, cause harmonic and intermodulation distortion at the same SPL over a wide frequency band. For voice and white noise stimuli, the nonlinear distortion rises toward higher frequencies.

Figure 13.

Relative equivalent input distortion LI(f) measured with various broad-band stimuli at the same RMS input voltage.

The EID spectrum UI(f) at the input of the loudspeaker can also be easily transferred to any point r in the 3D space by applying linear filtering:


The sound power spectrum ΠI(f) of the equivalent input distortion radiated into the far-field can be similarly calculated as the linear power ΠL(f) in Eq. (13) by using the same wave coefficients CL(f) of the linear wave modeling:


The transfer functions HL(f,r)Hv(f|t) shape the spectral components of the equivalent input distortion and the input stimulus in the same way. Thus, the ratio between the distortion and the linear signal part is identical in the voltage, the sound pressure at any point r, and the power output:


This fact simplifies the distortion measurement and motivates the definition of relative distortion metrics discussed in Section 6.4. Furthermore, nonlinear control techniques [17] that cancel the EID at the loudspeaker input with a synthesized compensation signal can reduce the sound pressure distortion PI(f,r) everywhere in the 3D space.
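A small numerical check illustrates why the ratio is independent of the observation point. It assumes that the same complex transfer function (a hypothetical stand-in for the combined linear and time-variant path) multiplies both the stimulus and the EID at a given frequency; all values are made up for illustration.

```python
import math

def ratio_db(distortion, linear):
    """Level difference between a distortion and a linear component in dB."""
    return 20.0 * math.log10(abs(distortion) / abs(linear))

u = 2.0 + 0.0j        # stimulus component U(f) in volts (example value)
u_i = 0.02 - 0.01j    # equivalent input distortion U_I(f) in volts (example value)

# Hypothetical combined transfer functions at three points r (near to far).
transfer = [0.8 + 0.2j, 0.05 - 0.4j, 0.001 + 0.003j]

voltage_ratio = ratio_db(u_i, u)
# The same factor h scales numerator and denominator, so it cancels.
pressure_ratios = [ratio_db(h * u_i, h * u) for h in transfer]
```

Here every entry of `pressure_ratios` equals `voltage_ratio` (about -39 dB), regardless of the magnitude or phase of the transfer function.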

6.3 Distributed nonlinear distortion

The distributed nonlinear distortion pD(r) introduced in Section 2 is the remaining distortion part in the sound-field that EID cannot represent:


Eq. (26) uses the basic functions BOUT(f, r) from Eq. (6) for the spherical wave expansion but determines the coefficients CD(f) as:


The residual error in Eq. (27) can be used to find the maximum order N of the wave expansion, as discussed in Section 4. The symmetry properties of the particular loudspeaker are also valuable for minimizing the scanning effort.

The coefficients CD(f) provide the sound power spectrum ΠD(f) of the distributed nonlinear distortion radiated into the far-field as:


The distributed distortion can be ignored if the sound power ΠD(f) is smaller than one-tenth of the EID sound power ΠI(f). Then a single test in the near-field of the loudspeaker is sufficient to measure the dominant EID and predict the total distortion pN in the 3D space.
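The one-tenth criterion corresponds to a 10 dB margin in sound power level. A minimal sketch of the check, assuming the criterion is applied to every frequency bin (the text does not state whether it is meant per bin or in total) and using hypothetical function names and example spectra:

```python
import math

def distributed_to_eid_db(power_d, power_i):
    """Level difference 10*log10(Pi_D / Pi_I) in dB for one frequency bin."""
    return 10.0 * math.log10(power_d / power_i)

def distributed_distortion_negligible(power_d_spectrum, power_i_spectrum):
    """True if Pi_D(f) < Pi_I(f) / 10 in every frequency bin (assumed rule)."""
    return all(pd < pi / 10.0
               for pd, pi in zip(power_d_spectrum, power_i_spectrum))

# Hypothetical sound power spectra in watts at three frequencies.
power_i = [1e-6, 4e-6, 2e-6]
power_d = [5e-8, 1e-7, 1.5e-7]

negligible = distributed_distortion_negligible(power_d, power_i)
margins_db = [distributed_to_eid_db(pd, pi) for pd, pi in zip(power_d, power_i)]
```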

6.4 Relative distortion metrics

This section introduces metrics that simplify the interpretation of the distortion components. These equations use a symbol # as a placeholder for N, I, or D representing the total, equivalent input, or distributed distortion.

Comparing the spectral components at frequency f in the nonlinear distortion P#(f,r) with the linear output signal PL(f,r) from Eq. (6) at the same point r leads to a spectral nonlinear distortion ratio (SNDR) defined in decibel as:


The SNDR is usually negative and describes the SPL difference between the distortion and the linear component at the same spectral frequency f.

It is a proper physical metric for broad-band stimuli such as typical audio signals, noise, and other artificial test stimuli. It also applies to sparse multi-tone stimuli with a resolution smaller than one-third octave by using P#(fi,r) in the numerator of Eq. (29) and the fundamental component PL(fj,r) in the denominator with the smallest frequency difference |fi − fj| for each spectral distortion component.

However, SNDR is less useful for sinusoidal stimuli generating only a single tone with constant or varying excitation frequency (e.g., chirp) because the harmonics have a significant spectral distance to the fundamental.
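The nearest-fundamental matching described above for sparse multi-tone stimuli can be sketched as follows. The sketch assumes the usual 20·log10 amplitude ratio for a level difference in decibel; the function names, the dictionary layout, and the example amplitudes are hypothetical.

```python
import math

def sndr_db(p_dist, p_lin):
    """Level difference between distortion and linear component in dB."""
    return 20.0 * math.log10(abs(p_dist) / abs(p_lin))

def sndr_multitone(distortion, fundamentals):
    """Refer each distortion component to the nearest fundamental.

    distortion, fundamentals: dicts mapping frequency in Hz to amplitude.
    The nearest fundamental minimizes |f_i - f_j|, as in the text above.
    """
    result = {}
    for f_i, p_d in distortion.items():
        f_j = min(fundamentals, key=lambda f: abs(f - f_i))
        result[f_i] = sndr_db(p_d, fundamentals[f_j])
    return result

# Example: three excited tones and two distortion components between them.
fundamentals = {100.0: 1.0, 200.0: 0.5, 400.0: 1.0}
distortion = {130.0: 0.01, 310.0: 0.005}
sndr = sndr_multitone(distortion, fundamentals)
```

In this example the component at 130 Hz is referred to the 100 Hz tone and the component at 310 Hz to the 400 Hz tone.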

An alternative approach considers the total energy ratio between the nonlinear distortion P# and the linear output signal PL for a particular stimulus. It leads to the total distortion ratio (TDR) defined in percent as:


This metric can be applied to all kinds of stimuli but is most commonly used for the total harmonic distortion (THD) measured with a single tone and plotted versus the excitation frequency fe. This metric does not reveal the spectral distribution of the nonlinear distortion (second, third, and higher-order harmonics).
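A minimal sketch of such a total ratio, assuming the conventional THD-style definition (square root of the energy ratio, expressed in percent); the exact form of Eq. (30) is not reproduced here, and the function name and amplitudes are illustrative.

```python
import math

def tdr_percent(p_dist, p_lin):
    """Total distortion ratio in percent (assumed amplitude-ratio form).

    p_dist: amplitudes of all distortion components
    p_lin:  amplitudes of all linear (fundamental) components
    """
    energy_dist = sum(abs(x) ** 2 for x in p_dist)
    energy_lin = sum(abs(x) ** 2 for x in p_lin)
    return 100.0 * math.sqrt(energy_dist / energy_lin)

# Single-tone example: fundamental of 1.0 with 2nd and 3rd harmonics.
harmonics = [0.03, 0.04]
thd = tdr_percent(harmonics, [1.0])
```

The two harmonics combine to a single value of about 5 %, which illustrates that the individual harmonic orders can no longer be distinguished.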

Referring the nonlinear sound power spectrum Π#(f) to the linear sound power ΠL(f) in Eq. (13) provides a sound power distortion ratio (SPDR):


For a multi-tone stimulus representing typical program material (IEC 60268–21), the SPDR becomes an essential, single-value characteristic for the assessment of the audio quality in a global sense.

The spectral equivalent input distortion ratio (SEIDR) defined in decibel as


compares the spectral components of distortion UI(f) with the input signal U(f). The metric LI(f) is identical with the metric LI(f,r), assessing the EID at any point r in the sound-field. It is a valid approximation for the total distortion metric LN(f,r) if the distributed distortion PD(f,r) is negligible.


7. Abnormal distortion

Loudspeaker defects such as voice coil rubbing, mechanical vibrations of loose parts, air turbulence, and other irregular nonlinear dynamics that are neither intended nor considered in the design can generate particular distortion that can significantly degrade the audio quality. A loudspeaker generating abnormal distortion, usually called “rub & buzz,” should not be shipped to a customer!

Modern measurement techniques exploit unique features of abnormal distortion. Time-analysis applied to a distorted single-tone stimulus reveals a complex fine structure comprising spikes, transients, and noise-like patterns [29]. Contrary to the harmonic and intermodulation distortion discussed in Section 6, the abnormal distortions cover the entire audio band. However, they have a low RMS value, are usually close to the noise floor, and thus require a near-field measurement. Spherical wave expansion or averaging over multiple periods removes the random features of the abnormal distortion.

The IEC standard 60268–21 [11] recommends a chirp stimulus at varying excitation frequency fe and a high-pass tracking filter with a cut-off frequency fc > nco·fe to separate the abnormal distortion in the measured sound pressure signal p(t). The factor nco for the cut-off frequency fc (typical value nco = 10) depends on the excitation frequency fe, the transducer type, and properties of potential defects. The optimal value for nco can be determined by maximizing the crest factor CID(r) defined according to IEC 60268–21 [11] as the ratio between peak and RMS values of the high-pass filtered signal pID as:


The crest factor CID(r) is independent of the spectral energy but describes the impulsiveness of the abnormal distortion considering the phase relationship between the spectral components. A high crest factor is a unique symptom of abnormal distortion, while the crest factor of the fundamental, regular nonlinear distortions or electronic noise is typically below 12 dB.
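The difference between regular and impulsive signals can be illustrated with a short sketch. It assumes that the signal has already been high-pass filtered and that the crest factor is the peak-to-RMS ratio expressed as 20·log10 in decibel; the synthetic signals and the function name are hypothetical.

```python
import math

def crest_factor_db(samples):
    """Peak-to-RMS ratio of a sampled signal in dB."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(peak / rms)

n = 4800  # number of samples covering 10 full periods

# Regular signal: a pure sine has a crest factor of about 3 dB.
sine = [math.sin(2.0 * math.pi * 10 * k / n) for k in range(n)]

# Impulsive signal: a weak sine plus one click per period (synthetic defect).
clicks = [0.01 * math.sin(2.0 * math.pi * 10 * k / n)
          + (1.0 if k % 480 == 0 else 0.0)
          for k in range(n)]

cf_sine = crest_factor_db(sine)
cf_clicks = crest_factor_db(clicks)
```

The sparse clicks carry little energy, yet they raise the crest factor well above the 12 dB threshold mentioned above.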

This fact initiated the measurement of the impulsive distortion (ID) defined in IEC 60268–21 as a peak level in decibel as


using a peak found over a period length T in the numerator in Eq. (33) and normalized by reference sound pressure pref. This peak level LID(fe,rk) is a helpful metric for finding the most critical excitation frequency fID and a scanning point rID ∈ Sr at the nearest position to the source (e.g., rattling), generating impulsive distortion with CID(f) > 12 dB. The maximum value found under the condition


is the basis for calculating the maximum impulsive distortion ratio (IDR) defined according to IEC 60268–21 [11] as


using a reference sound pressure level LREF measured at the standard evaluation point (on axis, r = 1 m) or a scanning point rk generating the largest SPL value:


Those metrics compared with meaningful limits for passing or failure are essential for the quality control of loudspeakers in manufacturing and maintenance.
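A pass/fail check along these lines might look as follows. This is only a sketch under stated assumptions: the peak level is taken as 20·log10(peak/pref), the ratio is taken as the level difference to LREF in decibel, and the -40 dB limit, the data layout, and all function names are hypothetical. The normative definitions are those of IEC 60268–21.

```python
import math

P_REF = 20e-6  # reference sound pressure in Pa

def crest_factor_db(samples):
    """Peak-to-RMS ratio of the high-pass filtered signal in dB."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(peak / rms)

def peak_level_db(samples, p_ref=P_REF):
    """Peak level of the high-pass filtered signal in dB re p_ref."""
    return 20.0 * math.log10(max(abs(s) for s in samples) / p_ref)

def max_impulsive_distortion(measurements, crest_limit_db=12.0):
    """Return (key, peak level) of the worst case with crest factor above the limit."""
    best = None
    for key, samples in measurements.items():
        if crest_factor_db(samples) <= crest_limit_db:
            continue  # not impulsive: regular distortion or noise
        level = peak_level_db(samples)
        if best is None or level > best[1]:
            best = (key, level)
    return best

# Keys are (excitation frequency in Hz, scanning point label); values in Pa.
measurements = {
    (1000.0, "r1"): [0.02 * math.sin(2.0 * math.pi * k / 100) for k in range(100)],
    (200.0, "r2"): [0.0] * 99 + [0.002],  # single click per period
}
l_ref = 90.0                      # assumed reference SPL in dB
worst = max_impulsive_distortion(measurements)
idr_db = worst[1] - l_ref         # assumed form of the ratio in dB
passed = idr_db < -40.0           # hypothetical production limit
```

The louder sinusoidal residue at 1 kHz is ignored because its crest factor stays near 3 dB, so the click at 200 Hz determines the result.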


8. External noise

The SNR in decibel is defined as


using reference SPL LREF from Eq. (37) and a noise SPL LN. The stationary noise caused by the microphone and other electronic parts can be measured with a muted stimulus in a single test at any point r. The instantaneous SNR can be used to validate the distortion ratios TDR in Eq. (30) and IDR in Eq. (36) to remove invalid data.
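One way to use the SNR for such a validation is sketched below. It assumes that the SNR is the level difference between LREF and the noise SPL and that a distortion ratio expressed in decibel relative to LREF is trusted only if it lies a safety margin above the noise floor; the 6 dB margin, the function names, and the example values are assumptions, not part of the standard.

```python
def snr_db(l_ref, l_noise):
    """Signal-to-noise ratio as a level difference in dB (assumed form)."""
    return l_ref - l_noise

def keep_valid(ratios_db, snr_values_db, margin_db=6.0):
    """Replace distortion ratios too close to the noise floor by None.

    The noise floor sits at -SNR relative to L_REF, so a ratio is kept
    only if it is at least margin_db above that floor.
    """
    return [r if r >= -snr + margin_db else None
            for r, snr in zip(ratios_db, snr_values_db)]

l_ref = 90.0
noise_levels = [30.0, 45.0, 38.0]          # noise SPL per test condition in dB
snr_values = [snr_db(l_ref, ln) for ln in noise_levels]
idr_values = [-50.0, -42.0, -49.0]         # measured ratios in dB (examples)
validated = keep_valid(idr_values, snr_values)
```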


9. Metrics for sound zones

Audio quality assessment, loudspeaker diagnostics, and active sound-field control require metrics that assess the properties of the sound-field at a specific listening point described by a probability fL(r) of the ear position. The mean sound power found in such a listening zone is a less suitable metric because the listener evaluates the local sound pressure. It is more appropriate to assess the mean and the variance of the perceptual attributes (e.g., loudness) or related physical metrics (e.g., SPL) over the listening zone [30] considering the probability of the ear positioning as a weighting function fL(r). This approach is used in IEC 60268–21 [11] for defining a mean SPL over an acoustical zone, but it can easily be applied to the nonlinear distortion metrics in Eqs. (29) and (30). The variance and the maximum deviation from the mean value are also valuable characteristics of the sound zone.
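The zone statistics can be sketched as a weighted mean and variance over sampled points. The sketch assumes that levels are averaged directly in decibel with the ear-position probability as weight; whether the standard averages levels or squared pressures is not stated here, and the function name and sample values are hypothetical.

```python
def zone_statistics(levels_db, weights):
    """Weighted mean, variance, and maximum deviation of levels in a zone.

    levels_db: metric (e.g., SPL or SNDR in dB) at each sampled point
    weights:   probability of the ear position at each point (any positive scale)
    """
    total = sum(weights)
    w = [x / total for x in weights]  # normalize so the weights sum to one
    mean = sum(wi * li for wi, li in zip(w, levels_db))
    variance = sum(wi * (li - mean) ** 2 for wi, li in zip(w, levels_db))
    max_dev = max(abs(li - mean) for li, wi in zip(levels_db, w) if wi > 0.0)
    return mean, variance, max_dev

# Four sampled points in a listening zone with their ear-position probabilities.
spl = [80.0, 82.0, 78.0, 84.0]
f_l = [0.4, 0.3, 0.2, 0.1]
mean_spl, var_spl, max_dev = zone_statistics(spl, f_l)
```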


10. Maximum SPL output

The maximum sound pressure output (max SPL) rated according to IEC standard 60268–21 [11] plays a primary role in adjusting the amplitude of the test stimulus in output-based testing. The max SPL can be used to calibrate any input channel (digital, analog) in passive and active systems and provides a maximum input RMS value umax, depending on the selected input channel, gain control, amplification, and applied signal processing. The amplitude compression CAC(f) from Eq. (16), the sound power distortion ratio RΠN from Eq. (31), and the maximum impulsive distortion ratio RIDR from Eq. (36) are essential criteria for rating max SPL considering the particularities of the target applications.

11. Conclusions

Acoustical measurement in the near-field of the loudspeaker can provide much of the relevant information required for designing and assessing spatial sound control applications. The spatial transfer function HL(f,r) expressed as a spherical wave expansion provides accurate sound pressure amplitude and phase information at any point r in the near and far-field. The spatial scanning effort depends on the particular loudspeaker and can be significantly reduced by considering the symmetry of the loudspeaker. In practice, the spatial transfer function HL(f,r) scanned on a prototype can be applied to other units of the same type as long as the loudspeaker geometry does not change much.

The time-variant transfer function Hv(f|t) represents changes in the material caused by heating, aging, fatigue, and production variability. No scanning is required to measure the transfer function Hv(f|t) and the equivalent input distortion UI(f), ignoring the distributed nonlinear distortion pD. Such an approximation is valid for most loudspeakers used in spatial sound applications and can be verified by scanning the nonlinear distortion in the near-field of the loudspeaker. All time-variant and nonlinear signal distortion can be extrapolated to any point in the 3D space using spherical wave expansions.

The multi-tone complex is a valuable artificial stimulus that can simplify the interpretation of the amplitude compression and the nonlinear distortion. The sinusoidal chirp is required to measure the impulsive distortion ratio, a sensitive characteristic for detecting loudspeaker defects and abnormal behavior degrading the audio quality.

An anechoic room is usually not required for performing the essential loudspeaker measurements at superior accuracy.

The methods for measuring loudspeaker characteristics presented in this chapter are compliant with modern international loudspeaker standards. They are the basis for simplifying the numerical simulation of sound-field control and selecting optimal hardware components offering a maximum performance-cost ratio.


  1. Van Veen BD, Buckley KM. Beamforming: A versatile approach to spatial filtering. IEEE ASSP Magazine. 1988;5(2):4-24
  2. Berkhout AJ, Vries DD, Vogel P. Acoustical control by wave field synthesis. Journal of the Acoustical Society of America. 1993;93:2764-2778
  3. Gerzon MA. Ambisonics in multi-channel broadcasting and video. Journal of the Audio Engineering Society. 1985;33(11):859-871
  4. Poletti M. Three-dimensional surround sound systems based on spherical harmonics. Journal of the Audio Engineering Society. 2005;53(11):1004-1025
  5. Betlehem T, Zhang W, Poletti M, Abhayapala T. Personal sound zones: Delivering interface-free audio to multiple listeners. IEEE Signal Processing Magazine. 2015;32:81-91
  6. Zotter F. Analysis and Synthesis of Sound Radiation with Spherical Arrays [Dissertation]. Austria: University of Music and Performing Arts; 2009
  7. Vries DD. Sound reinforcement by wave field synthesis: Adaptation of the synthesis operator to the loudspeaker directivity characteristics. Journal of the Audio Engineering Society. 1996;44(12):1120-1131
  8. Ahrens J, Spors S. An analytical approach to 2.5 D sound field reproduction employing linear distributions of non-omnidirectional loudspeakers. In: Proc. IEEE Int. Conf. Acoust. Speech and Signal Process. (ICASSP). 2010. pp. 105-108. DOI: 10.1109/ICASSP15600.2010
  9. Koyama S, Furuya K, Hiwasaki Y, Haneda Y. Sound field reproduction method in spatio-temporal frequency domain considering directivity of loudspeakers. In: 132nd Convention of the Audio Eng. Soc., Budapest, Paper 8664. 2012. Available from:
  10. Poletti MAA, Betlehem T, Abhayapala THD. Higher-order loudspeakers and active compensation for improved 2D sound field reproduction in rooms. Journal of the Audio Engineering Society. 2015;63(1/2):31-45. DOI: 10.17743/jaes.2015.0003
  11. Sound System Equipment – Part 21. Acoustical (Output-Based) Measurements. Standard of International Electrotechnical Commission IEC 60268–21; 2018
  12. Choi J, Kim Y, Ko S. Near and far-field control of focused sound radiation using a loudspeaker array. In: 129th Convention of the Audio Eng. Soc., San Francisco, Paper 8198. 2010. Available from:
  13. Ma X et al. Nonlinear distortion reduction in sound zones by constraining individual loudspeaker control effort. Journal of the Audio Engineering Society. 2019;67(9):641-654
  14. Cobianchi M, Mizzoni F, Uncini A. Polar measurements of harmonic and multitone distortion of direct radiating and horn loaded transducers. In: 134th Convention of the Audio Eng. Soc., Rome, Paper 8915. 2013. Available from:
  15. Olsen M, Møller MB. Sound zones: On the effect of ambient temperature variations in feed-forward systems. In: 142nd Convention of the Audio Eng. Soc., Berlin, Paper 9806. 2017. Available from:
  16. Pedersen KM. Thermal overload protection of high-frequency loudspeakers [Rep. of final year dissertation]. UK: Salford University; 2002
  17. Klippel W. Loudspeaker and headphone design approaches enabled by adaptive nonlinear control. Journal of the Audio Engineering Society. 2020;68(6):454-464. DOI: 10.17743/jaes.2020.0037
  18. Klippel W. Nonlinear modeling of the heat transfer in loudspeakers. Journal of the Audio Engineering Society. 2004;52(1/2):3-25
  19. Klippel W. Mechanical fatigue and load-induced aging of loudspeaker suspension. In: 131st Convention of Audio Eng. Soc., New York, Paper 8474. 2011. Available from:
  20. Klippel W, Schlechter J. Distributed mechanical parameters of loudspeakers, part 1: Measurements. Journal of the Audio Engineering Society. 2009;57(7/8):500-511
  21. Sound System Equipment – Part 22. Electrical and Mechanical Measurements on Transducers. Standard of International Electrotechnical Commission, IEC 60268–22; 2020
  22. Klippel W. Loudspeaker nonlinearities – Causes, parameters, symptoms. Journal of the Audio Engineering Society. Oct 2006;54(10):907
  23. Sound System Equipment – Electro-acoustical Transducers – Measurement of Large Signal Parameters, Standard of International Electrotechnical Commission, IEC 62458; 2010
  24. Klippel W, Bellmann C. Holographic nearfield measurement of loudspeaker directivity. In: 141st Convention of the Audio Eng. Soc., Los Angeles, Paper 9598. 2016. Available from:
  25. Williams EG. Fourier Acoustics – Sound Radiation and Nearfield Acoustical Holography. London: Academic Press; 1999
  26. Melon M et al. Comparison of four subwoofer measurement techniques. Journal of the Audio Engineering Society. 2007;55(12):1077-1091
  27. Klippel W, Irrgang S. Audio system evaluation with music signals. In: AES International Conference on Automotive Audio, San Francisco, Paper P4–2. 2017. Available from:
  28. Klippel W. Measurement and application of equivalent input distortion. Journal of the Audio Engineering Society. 2004;52(9):931-947
  29. Klippel W, Seidel U. Measurement of impulsive distortion, rub and buzz and other disturbances. In: 114th Convention of Audio Eng. Soc, Paper 5734. 2003. Available from:
  30. IEC 62777. Quality Evaluation Method for the Sound Field of Directional Loudspeaker Array System. Standard of the International Electrotechnical Commission. 2016
