Open access peer-reviewed chapter

Spatial Principal Component Analysis of Head-Related Transfer Functions and Its Domain Dependency

Written By

Shouichi Takane

Reviewed: 10 March 2022 Published: 21 April 2022

DOI: 10.5772/intechopen.104449

From the Edited Volume

Advances in Principal Component Analysis

Edited by Fausto Pedro García Márquez

Abstract

In this chapter, Principal Component Analysis (PCA) is applied to the spatial variation of the Head-Related Transfer Function (HRTF), or of its inverse Fourier transform, the Head-Related Impulse Response (HRIR), in order to represent that variation compactly. This is called the Spatial PCA (SPCA). The SPCA was carried out on a database of HRTFs in all directions, with the domain selected as one of the following: the HRIRs, the complex HRTFs, the frequency amplitudes of the HRTFs, the log-amplitudes of the HRTFs, and the complex logarithm of the HRTFs. The minimum-phase approximation was incorporated for the frequency amplitudes and log-amplitudes of the HRTFs. A comparison of the accuracies in both the time and frequency domains, taking into account their influence on subjective evaluation, showed that the log-amplitudes and the complex logarithm of the HRTFs are suitable for the SPCA of HRTFs.

Keywords

  • spatial principal component analysis
  • head-related transfer function
  • head-related impulse response
  • domain
  • compact representation

1. Introduction

1.1 Head-related transfer function (HRTF)

The Head-Related Transfer Function (HRTF) is defined as the acoustic transfer function from the sound acquired at a center point when a listener is absent to that acquired at the listener’s ear [1] in a free field (a field without any reflection). An illustration is shown in Figure 1. As in Figure 1(a), a microphone is located at the position of the center of a subject’s head with the subject absent. The output $Y_A(z)$ is obtained as the response to the input $X(z)$, expressed with the z-transform as follows:

Figure 1.

Definition of head-related transfer function. (a) Obtaining the response $Y_A(z)$ at the center of a subject’s head with the subject absent. (b) Obtaining the response $Y_E(z)$ at the ear.

$$Y_A(z) = M(z)\,H_A(z)\,S(z)\,X(z), \tag{1}$$

where $M(z)$ and $S(z)$ are the system functions of the microphone and the loudspeaker, respectively. As in Figure 1(b), the same microphone is then located at the subject’s ear. The output $Y_E(z)$ is obtained as the response to the same input $X(z)$ fed to the same loudspeaker as follows:

$$Y_E(z) = M(z)\,H_E(z)\,S(z)\,X(z). \tag{2}$$

The z-transform of the HRTF, $H(z)$, is acquired from $Y_A(z)$ and $Y_E(z)$ as follows:

$$H(z) = \frac{Y_E(z)}{Y_A(z)} = \frac{H_E(z)}{H_A(z)}. \tag{3}$$

Computing Eq. (3) eliminates the system functions $M(z)$ and $S(z)$ when the same microphone and loudspeaker are used for both acquisitions, except when either of these system functions has zeros. The HRTF is obtained as $H(z)\big|_{z=\exp(j\omega)}$, where $j$ is the imaginary unit, $\omega = 2\pi f$ is the angular frequency and $f$ is the frequency. The time-domain representation (impulse response) corresponding to $H(z)$ is called the Head-Related Impulse Response (HRIR).
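The division in Eq. (3) can be sketched numerically as follows. This is a minimal illustration, not the measurement procedure of any cited study: the function name `hrtf_from_responses`, the tolerance `eps`, and the synthetic test signals are assumptions introduced here. The two responses are transformed with the FFT, divided bin by bin, and transformed back to obtain the HRIR.

```python
import numpy as np

def hrtf_from_responses(y_ear, y_center, n_fft=None, eps=1e-12):
    """Estimate the HRTF (Eq. (3)) and the HRIR from two measured responses.

    y_ear    : response recorded at the ear, y_E
    y_center : response recorded at the center point without the listener, y_A
    """
    n_fft = n_fft or max(len(y_ear), len(y_center))
    Y_E = np.fft.fft(y_ear, n_fft)
    Y_A = np.fft.fft(y_center, n_fft)
    # The ratio is undefined where the reference spectrum vanishes.
    if np.any(np.abs(Y_A) < eps):
        raise ValueError("reference spectrum has (near-)zero bins")
    H = Y_E / Y_A                      # microphone and loudspeaker terms cancel
    h = np.real(np.fft.ifft(H))        # HRIR as the inverse transform of the HRTF
    return H, h

# Synthetic check (made-up signals): a common system c stands in for M(z)S(z)H_A(z).
c = np.array([1.0, 0.3])
h_true = np.array([0.5, 0.25, -0.1])
y_center = c
y_ear = np.convolve(c, h_true)
H, h = hrtf_from_responses(y_ear, y_center, n_fft=64)
```

In this toy case the common factor `c` cancels and `h` recovers `h_true` followed by zeros, because the FFT length is long enough to hold the full responses.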

The HRTF varies with the sound source position and shows strong individuality in both objective and subjective senses. Therefore, a set of HRTFs is ideally acquired for each individual in all sound source directions. Although an efficient sampling scheme for HRTF measurement has been studied [2], the data size of such a set of HRTFs may become very large. There also exist many datasets containing the HRTFs (HRIRs) of multiple subjects in multiple sound source directions [3, 4, 5, 6], but individualization using these datasets seems difficult.

1.2 Virtual auditory display (VAD) utilizing head-related transfer functions

Virtual Auditory Displays (VADs), devices or equipment that present to a listener the auditory experience of a certain sound field, have been developed since the 1990s [7]. The primitive form of the VAD, however, was proposed in the 1960s [1], and Morimoto et al. put the theory into practice in 1980 [8]. Some VADs are based on the synthesis of transfer functions involving the Head-Related Transfer Functions (HRTFs). They require real-time processing of the variation of these functions due to the movement of the listener and/or the sound sources. Takane et al. proposed a theory of VAD named ADVISE (Auditory Display based on VIrtual SpherE model) [9], and reported an elemental implementation of a VAD based on ADVISE [10]. Ideally, the listener’s own HRTFs in all directions are required in order to carry out the synthesis. Moreover, various implementations of VADs exist that are based on the synthesis of binaural sound signals using HRTFs [7, 11, 12, 13]. For a set of HRTFs acquired for an individual in all directions, the data size must be as compact as possible while the synthesis accuracy is maintained to some extent.

A possible approach to the compact representation of the spatial variation of an individual’s HRTFs is modeling. Haneda et al. proposed the Common Acoustical-Pole and Zero (CAPZ) model [14]. In the CAPZ model, it is assumed that the poles of the HRTFs are independent of the sound source position while their zeros depend on it. They indicated that the spatial variation of the HRTFs of a dummy head was modeled with acceptable accuracy. Based on this model, Watanabe et al. proposed an interpolation method that showed good interpolation accuracy [15]. The CAPZ model is useful for the compact representation of HRTFs since the source-position-independent poles reduce the total number of coefficients needed to represent the HRTFs together with their spatial variation. The reduction in data amount achieved by the CAPZ model, however, is at most 50% relative to the case where the HRTFs in all directions are represented by FIR filters of fixed length.

1.3 Head-related transfer functions and principal component analysis

Another promising method for the compact representation of HRTFs is Principal Component Analysis (PCA) [16, 17]. In some studies it goes by an alternative name, the spatial feature extraction method [18, 19, 20]. Both have their theoretical basis in the PCA or the Singular Value Decomposition (SVD). In these studies, the spatial variation of HRTFs is modeled by using a small number of principal components or eigenvectors. Xie called the PCA applied to dataset(s) of HRTFs the Spatial PCA (SPCA) of HRTFs [19], and the author follows this naming in this chapter. As a result of the SPCA, the HRTF in a certain direction is represented as a linear combination of a relatively small number of fixed Principal Components (PCs), meaning that these components do not change with the sound source position relative to the listener. The coefficients of the PCs represent that variation. This property has potential for effective real-time processing of the spatial variation caused by dynamic factors. VADs that can synthesize the HRTFs for multiple sound sources in real time are currently available, for example by using the computational power of Graphics Processing Units (GPUs) [21].

Many studies have been carried out on the SPCA of HRTFs [16, 17, 18, 19, 20], but there are some differences among them. One of the obvious differences is the domain in which the SPCA is executed. Kistler et al. applied the SPCA to the log-amplitude of the HRTF [17], Chen et al. to the complex-valued frequency spectrum [18], and Xie to the amplitude of the HRTF under the minimum-phase approximation [19]. On the other hand, Wu et al. applied it to the HRIRs [20]. Xie surveyed and summarized those results in his book [22]. These studies indicate that the SPCA can be applied successfully in any of these domains. However, the use of different domains may lead to different properties in the results of the SPCA. If the HRTF/HRIR can be reconstructed with the smallest number of PCs in a certain domain, the SPCA in that domain may yield the most compact representation. There exists a study with a similar purpose: Liang et al. compared the SPCA of the linear and logarithmic magnitudes of the HRTFs [23]. They concluded that the SPCA of the linear magnitudes of the HRTFs was better than that of their logarithmic magnitudes in terms of the reconstruction accuracy of the monaural loudness spectra. However, the HRTFs they used were limited to the horizontal plane, and they dealt with only two domains, both under the minimum-phase approximation. Furthermore, Takane proposed a new domain for the SPCA, the complex logarithm of the HRTFs [24].

In this chapter, all the domains dealt with in the previous studies are brought together, and the compactness achieved by the SPCA in each domain is compared.

2. SPCA of HRTFs/HRIRs

2.1 Outline

The SPCA of HRTFs/HRIRs is outlined in this section. As its name suggests, it follows the standard PCA procedure:

  1. The spatial average of a certain set of $M$ vectors $\{\mathbf{g}_m\}_{m=1}^{M}$ is calculated as follows:

    $$\mathbf{g}_{\mathrm{av}} = \frac{1}{M}\sum_{m=1}^{M}\mathbf{g}_m. \tag{4}$$

  2. Covariance matrix, denoted as R, is obtained by calculating the following equation:

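    $$\mathbf{R} = \frac{1}{M}\sum_{m=1}^{M}\left(\mathbf{g}_m-\mathbf{g}_{\mathrm{av}}\right)\left(\mathbf{g}_m-\mathbf{g}_{\mathrm{av}}\right)^{H}. \tag{5}$$

    Here the superscript $H$ denotes the Hermitian transpose. The size of the matrix $\mathbf{R}$ is $N\times N$, where $N$ is the length of the vector $\mathbf{g}_m$.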

  3. The computed matrix R is decomposed into N pairs of PCs (eigenvectors) and eigenvalues by solving the following eigenvalue problem:

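    $$\mathbf{R}\,\mathbf{q}_k = \lambda_k\,\mathbf{q}_k. \tag{6}$$

    As a result, a set of eigenvalues and principal components (PCs), $\{\lambda_k\}_{k=1}^{N}$ and $\{\mathbf{q}_k\}_{k=1}^{N}$, is obtained. Note that the $\lambda_k$ are sorted from largest to smallest, i.e., $\lambda_1 \ge \lambda_2 \ge \lambda_3 \ge \cdots \ge \lambda_N$, and the PCs are arranged in the order of their corresponding eigenvalues.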

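  4. By using the matrix $\mathbf{Q}$, whose column vectors are the $\mathbf{q}_k$, the weighting vector $\mathbf{w}_m$ corresponding to the $m$-th vector $\mathbf{g}_m$ is calculated as follows:

    $$\mathbf{w}_m = \mathbf{Q}^{H}\left(\mathbf{g}_m-\mathbf{g}_{\mathrm{av}}\right). \tag{7}$$

As a result of the SPCA, the weight $\mathbf{w}_m$ is approximated by using only $\mathbf{q}_1, \ldots, \mathbf{q}_K$ $(1 \le K \le N)$ as follows:

$$\mathbf{w}_m^{(K)} = \mathbf{Q}_K^{H}\left(\mathbf{g}_m-\mathbf{g}_{\mathrm{av}}\right), \tag{8}$$

where $\mathbf{Q}_K$ is the matrix whose column vectors are $\mathbf{q}_1, \ldots, \mathbf{q}_K$. The length of the vector $\mathbf{w}_m^{(K)}$ is $K$.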

In the above-mentioned procedure, the vectors and matrices are assumed to have complex values. If their values are real, the Hermitian transpose $H$ is replaced by the simple transpose $T$. The index $m$ reflects the sound source position, and also the individual if the HRTFs/HRIRs of multiple individuals are used for assembling the covariance matrix.

The $m$-th vector, $\mathbf{g}_m$, is reconstructed by using the PCs as follows:

$$\mathbf{g}_m^{(K)} = \mathbf{Q}_K\,\mathbf{w}_m^{(K)} + \mathbf{g}_{\mathrm{av}}. \tag{9}$$

The computed vector $\mathbf{g}_m^{(K)}$ in Eq. (9) is only an approximation of $\mathbf{g}_m$ when $K<N$, but in principle it has acceptable accuracy when the Cumulative Proportion of Variance (CPV), $R^2(K)$, is close to 1.0. The CPV is defined by using the eigenvalues of the covariance matrix, $\{\lambda_k\}_{k=1}^{N}$, as follows:

$$R^2(K) = \frac{\sum_{k=1}^{K}\lambda_k}{\sum_{k=1}^{N}\lambda_k}, \tag{10}$$

where $N$ is the total number of components, equal to the length of the vectors $\{\mathbf{g}_m\}_{m=1}^{M}$.
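The procedure of Eqs. (4)–(10) can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions rather than the implementation used in the chapter: the function names (`spca_fit`, `spca_weights`, `spca_reconstruct`, `cpv`) are hypothetical, the vectors $\mathbf{g}_m$ are stored as the rows of an array `G`, and the data are synthetic random vectors instead of HRTFs.

```python
import numpy as np

def spca_fit(G):
    """Fit the SPCA to the rows of G (shape M x N, one vector g_m per row)."""
    g_av = G.mean(axis=0)                          # Eq. (4): spatial average
    D = G - g_av                                   # centered vectors
    R = (D.T @ D.conj()) / G.shape[0]              # Eq. (5): N x N covariance
    lam, Q = np.linalg.eigh(R)                     # Eq. (6): Hermitian eigenproblem
    order = np.argsort(lam)[::-1]                  # sort from largest to smallest
    return g_av, lam[order], Q[:, order]

def spca_weights(G, g_av, Q, K):
    """Eq. (8): row m holds w_m^(K) = Q_K^H (g_m - g_av)."""
    return (G - g_av) @ Q[:, :K].conj()

def spca_reconstruct(W, g_av, Q):
    """Eq. (9): row m holds g_m^(K) = Q_K w_m^(K) + g_av."""
    K = W.shape[1]
    return W @ Q[:, :K].T + g_av

def cpv(lam):
    """Eq. (10): cumulative proportion of variance for K = 1..N."""
    return np.cumsum(lam) / np.sum(lam)

# Synthetic stand-in data (not HRTFs): M = 50 complex vectors of length N = 8.
rng = np.random.default_rng(0)
G = rng.standard_normal((50, 8)) + 1j * rng.standard_normal((50, 8))
g_av, lam, Q = spca_fit(G)
G_full = spca_reconstruct(spca_weights(G, g_av, Q, 8), g_av, Q)   # K = N
G_k3 = spca_reconstruct(spca_weights(G, g_av, Q, 3), g_av, Q)     # K = 3
```

With $K = N$ the reconstruction is exact up to rounding. With $K < N$ the mean squared reconstruction error over the $M$ vectors equals the sum of the discarded eigenvalues, which is why the CPV is a natural indicator of accuracy within one domain.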

2.2 Domains used for the SPCA of the HRTFs/HRIRs

Five domains were used for assembling the covariance matrix, based on the previous studies. Kistler et al. used the log-amplitudes of the HRTF [17], and Xie used the amplitudes of the HRTF [19]. The minimum-phase approximation was assumed in these studies. Chen et al. used the complex HRTF spectrum [18], and Takane proposed the use of the complex logarithm of the HRTF [24]. Finally, Wu et al. used the time-domain representation of the HRTFs, i.e., the HRIRs [20]. The representations used in these studies are treated as the modeling “domains” in this chapter. The domain “I” corresponds to the application of the SPCA to the HRIR, the domain “C” to the complex HRTF, the domain “F” to the amplitude of the HRTF, the domain “L” to the logarithm of the HRTF amplitude, and the domain “CL” to the complex logarithm of the HRTF. This is summarized in Table 1.

| Domain | Name |
| --- | --- |
| HRIR | I |
| HRTF | C |
| Amplitude of HRTF | F |
| Log-amplitude of HRTF | L |
| Complex logarithm of HRTF | CL |

Table 1.

Names of five domains expressing modeling conditions for the SPCA.

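The $m$-th HRIR and HRTF are expressed as $\mathbf{h}_m$ and $\mathbf{H}_m$, respectively, and $\mathbf{H}_m$ is further decomposed into its amplitude and phase components as follows:

$$\mathbf{H}_m = \mathbf{A}_m \exp\left(j\boldsymbol{\Theta}_m\right), \tag{11}$$

where $j$ is the imaginary unit. The complex logarithm of $\mathbf{H}_m$ can be written as follows:

$$\log\mathbf{H}_m = \log\mathbf{A}_m + j\,\mathrm{unwrap}\left(\boldsymbol{\Theta}_m\right) = \mathbf{L}_m + j\,\mathrm{unwrap}\left(\boldsymbol{\Theta}_m\right). \tag{12}$$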

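Here the imaginary part of $\log\mathbf{H}_m$, equal to the phase of the HRTF, is assumed to be unwrapped [24]. The logarithm of the HRTF amplitude vector is defined as $\mathbf{L}_m$, i.e., $\mathbf{L}_m \equiv \log\mathbf{A}_m$. In the domains C, F, L and CL, the frequency spectrum obviously has the following symmetry relations:

$$H_m[k] = H_m^{*}[N-k],\qquad \log H_m[k] = \left(\log H_m[N-k]\right)^{*}, \tag{13}$$
$$A_m[k] = A_m[N-k],\qquad L_m[k] = L_m[N-k]\qquad (k = 1, \ldots, N), \tag{14}$$

where $H_m[k]$, $A_m[k]$ and $L_m[k]$ are the $k$-th components of the vectors $\mathbf{H}_m$, $\mathbf{A}_m$ and $\mathbf{L}_m$, respectively, and $^{*}$ denotes the complex conjugate. The relations in Eqs. (13) and (14) indicate that the vector lengths can be almost halved in these domains. When the covariance matrices assembled in the domains I, C, F, L and CL are denoted as $\mathbf{R}_{\mathrm{I}}$, $\mathbf{R}_{\mathrm{C}}$, $\mathbf{R}_{\mathrm{F}}$, $\mathbf{R}_{\mathrm{L}}$ and $\mathbf{R}_{\mathrm{CL}}$, respectively, the size of $\mathbf{R}_{\mathrm{I}}$ is $N\times N$, while the sizes of $\mathbf{R}_{\mathrm{C}}$, $\mathbf{R}_{\mathrm{F}}$, $\mathbf{R}_{\mathrm{L}}$ and $\mathbf{R}_{\mathrm{CL}}$ are $(N/2+1)\times(N/2+1)$. In this respect, the domains C, F, L and CL have an advantage in compactness over the domain I. On the other hand, the components of $\mathbf{R}_{\mathrm{C}}$ and $\mathbf{R}_{\mathrm{CL}}$ are complex, while those of the covariance matrices in the other domains are real.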
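To make the five domains and the symmetry of Eqs. (13) and (14) concrete, the following sketch builds the five vectors from one real impulse response. The function name `domain_vectors` and the synthetic decaying-noise signal are assumptions for illustration; `np.fft.rfft` is used here because it returns exactly the $N/2+1$ non-redundant bins.

```python
import numpy as np

def domain_vectors(h):
    """Map one real HRIR h (even length N) to the five domains of Table 1.

    Assumes no frequency bin is exactly zero, so the logarithms are finite.
    """
    H = np.fft.rfft(h)                         # bins 0..N/2, length N/2 + 1
    A = np.abs(H)                              # amplitude (domain F)
    L = np.log(A)                              # log-amplitude (domain L)
    CL = L + 1j * np.unwrap(np.angle(H))       # Eq. (12) (domain CL)
    return {"I": np.asarray(h, dtype=float), "C": H, "F": A, "L": L, "CL": CL}

# Synthetic decaying noise as a stand-in for a measured HRIR (N = 256).
rng = np.random.default_rng(1)
N = 256
h = rng.standard_normal(N) * np.exp(-np.arange(N) / 20.0)
d = domain_vectors(h)

H_full = np.fft.fft(h)                         # full-length spectrum
h_back = np.fft.irfft(np.exp(d["CL"]), n=N)    # domain CL -> C -> HRIR
size_ratio = (N // 2 + 1) ** 2 / N ** 2        # covariance size relative to domain I
```

For $N = 256$ the half-spectrum has 129 bins, so a covariance matrix in the domains C, F, L or CL holds $129^2/256^2 \approx 0.25$ of the entries of the one in the domain I. Exponentiating the complex logarithm returns the complex HRTF exactly, since unwrapping only adds multiples of $2\pi$ to the phase.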

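The domains I, C, F, L and CL mean that $\mathbf{h}_m$, $\mathbf{H}_m$, $\mathbf{A}_m$, $\mathbf{L}_m$ and $\log\mathbf{H}_m$, respectively, are the vectors subjected to the SPCA. Their approximations obtained by using the first $K$ PCs are denoted as $\mathbf{h}_m^{(\mathrm{I},K)}$, $\mathbf{H}_m^{(\mathrm{C},K)}$, $\mathbf{A}_m^{(\mathrm{F},K)}$, $\mathbf{L}_m^{(\mathrm{L},K)}$ and $\log\mathbf{H}_m^{(\mathrm{CL},K)}$, respectively. The vectors concerning the HRIR or the HRTF are calculated from the ones estimated via the SPCA in each domain:

Domain I: From $\mathbf{h}_m^{(\mathrm{I},K)}$, $\mathbf{H}_m^{(\mathrm{I},K)}$ is obtained by using the Fast Fourier Transform (FFT), and $\mathbf{A}_m^{(\mathrm{I},K)}$ is the amplitude corresponding to $\mathbf{H}_m^{(\mathrm{I},K)}$.

Domain C: From $\mathbf{H}_m^{(\mathrm{C},K)}$, $\mathbf{A}_m^{(\mathrm{C},K)}$ is obtained by computing the corresponding amplitude. After the vector $\mathbf{H}_m^{(\mathrm{C},K)}$ is extended to full length by applying the relation of Eq. (13), $\mathbf{h}_m^{(\mathrm{C},K)}$ is obtained by using the inverse FFT (IFFT).

Domain F: From $\mathbf{A}_m^{(\mathrm{F},K)}$, $\mathbf{H}_m^{(\mathrm{F},K)}$ is estimated by computing the following equation, with its minimum-phase components $\boldsymbol{\Theta}_m^{(\mathrm{F},K)}$ calculated using the Hilbert transform [25]:

$$\mathbf{H}_m^{(\mathrm{F},K)} = \mathbf{A}_m^{(\mathrm{F},K)}\exp\left(j\boldsymbol{\Theta}_m^{(\mathrm{F},K)}\right). \tag{15}$$

After the vector $\mathbf{H}_m^{(\mathrm{F},K)}$ is extended to full length by applying the relation of Eq. (14), the IFFT of $\mathbf{H}_m^{(\mathrm{F},K)}$ gives the estimate of the HRIR, denoted as $\mathbf{h}_m^{(\mathrm{F},K)}$.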

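Domain L: From $\mathbf{L}_m^{(\mathrm{L},K)}$, $\mathbf{A}_m^{(\mathrm{L},K)}$ is obtained by calculating the following equation:

$$\mathbf{A}_m^{(\mathrm{L},K)} = \exp\left(\mathbf{L}_m^{(\mathrm{L},K)}\right). \tag{16}$$

Then, as in the domain F, $\mathbf{H}_m^{(\mathrm{L},K)}$ is estimated by computing its minimum-phase components $\boldsymbol{\Theta}_m^{(\mathrm{L},K)}$ using the Hilbert transform as follows:

$$\mathbf{H}_m^{(\mathrm{L},K)} = \mathbf{A}_m^{(\mathrm{L},K)}\exp\left(j\boldsymbol{\Theta}_m^{(\mathrm{L},K)}\right). \tag{17}$$

After the vector $\mathbf{H}_m^{(\mathrm{L},K)}$ is extended to full length by applying the relation of Eq. (14), the IFFT of $\mathbf{H}_m^{(\mathrm{L},K)}$ gives the estimate of the HRIR, denoted as $\mathbf{h}_m^{(\mathrm{L},K)}$.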

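Domain CL: From $\log\mathbf{H}_m^{(\mathrm{CL},K)}$, $\mathbf{H}_m^{(\mathrm{CL},K)}$ is obtained by calculating the following equation:

$$\mathbf{H}_m^{(\mathrm{CL},K)} = \exp\left(\log\mathbf{H}_m^{(\mathrm{CL},K)}\right). \tag{18}$$

The subsequent procedure is the same as in the domain C. $\mathbf{A}_m^{(\mathrm{CL},K)}$ is obtained by computing the corresponding amplitude. After the vector $\mathbf{H}_m^{(\mathrm{CL},K)}$ is extended to full length by applying the relation of Eq. (13), $\mathbf{h}_m^{(\mathrm{CL},K)}$ is obtained by using the inverse FFT (IFFT).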
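The minimum-phase step used in the domains F and L can be sketched as follows. Note the swapped-in technique: instead of calling a Hilbert-transform routine directly, this sketch folds the real cepstrum, which is a common discrete way of realizing the same amplitude-to-minimum-phase relation. The function name `minimum_phase_spectrum` and the short test filter are assumptions for illustration, not taken from the chapter or from [25].

```python
import numpy as np

def minimum_phase_spectrum(A_half, n):
    """Build a full-length minimum-phase spectrum from a half amplitude spectrum.

    A_half : amplitudes at bins 0..n/2 (length n/2 + 1, strictly positive)
    n      : full FFT length (even)
    """
    # Extend to full length with the even symmetry of Eq. (14).
    A_full = np.concatenate([A_half, A_half[-2:0:-1]])
    # Real cepstrum of the log-amplitude.
    c = np.real(np.fft.ifft(np.log(A_full)))
    # Fold the cepstrum onto non-negative quefrencies (causal complex cepstrum).
    fold = np.zeros(n)
    fold[0] = 1.0
    fold[1:n // 2] = 2.0
    fold[n // 2] = 1.0
    # Exponentiate: the real part restores log-amplitude, the imaginary part
    # is the minimum phase.
    return np.exp(np.fft.fft(c * fold))

# Check with a short filter that is already minimum phase (made-up coefficients).
n = 256
h_min_true = np.zeros(n)
h_min_true[:3] = [1.0, 0.5, 0.2]
A_half = np.abs(np.fft.rfft(h_min_true))
H_min = minimum_phase_spectrum(A_half, n)
h_min = np.real(np.fft.ifft(H_min))            # estimated impulse response
```

Because the test filter has all its zeros inside the unit circle, the amplitude alone is enough to recover it: `h_min` matches `h_min_true` to within the small aliasing error of the finite-length cepstrum.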

3. Relation between number of PCs and accuracy

3.1 Conditions of analysis

A database of HRIRs of the KEMAR HATS (Head And Torso Simulator) provided by the MIT Media Lab [26] was used. Liang et al. used the same data in their study, whose purpose was similar to that of this chapter. While they used the HRTFs only in the horizontal plane [23], all data in this database, comprising 710 pairs of HRIRs (1420 in total) with a sampling frequency of 44.1 kHz, were used for the investigation in this chapter. The number of HRIRs, 1420, corresponds to $M$ in Eqs. (4) and (5).

The initial delay in each response was extracted, and 256 sample points were then taken as the data for the analysis, windowed with the latter half of a 512-point Blackman-Harris window whose peak was aligned with that of the HRIR. The SPCA was executed by constructing the covariance matrices from the HRIRs (domain I), the HRTFs (domain C), the amplitudes of the HRTFs (domain F), the log-amplitudes of the HRTFs (domain L), and the complex logarithms of the HRTFs (domain CL). The HRTFs/HRIRs in all directions (710 directions × 2 ears) were used, and the average vector (Eq. (4)) and the covariance matrix (Eq. (5)) were calculated in each case.

3.2 Cumulative proportion of variance (CPV)

When the PCA is applied to data in general, the Cumulative Proportion of Variance (CPV), $R^2(K)$, defined in Eq. (10), is used as a reference indicating how much of the variance is covered by the first $K$ PCs. The change of the CPV with the number of PCs in each domain is plotted in Figure 2. This figure shows that, in all domains, the CPV increases monotonically and converges to 1.0 as the number of components increases. Among the five domains, the domain CL has the largest CPV value for the first PC. The domain C shows the fastest increase of the CPV with the number of PCs, and its CPV value is almost the same as that of the domain CL when the number of components is more than 7. In contrast, the domain L shows the slowest increase, especially when the number of components is more than 15. This means that a relatively large number of PCs is required in the domain L to cover a certain proportion of the variance in the data.

Figure 2.

Change in cumulative proportion of variance (CPV) with number of components in SPCA for five domains.

The reference value of the CPV beyond which the remaining PCs are discarded varied among the previous studies. Kistler et al. set this value to 0.90 [17], Chen et al. and Wu et al. set it to 0.999 [18, 20], and Xie set it to about 0.98 [19]. These values cannot be compared directly since the amount of data and the purposes of analysis differed among those studies, but all of them are 0.9 or more. Therefore, the least numbers of components needed to cover four values of the CPV, 0.90, 0.95, 0.99 and 0.999, in the five domains are listed in Table 2. According to Table 2, the domain CL has the smallest values among the five domains in all of the CPV values, and the domain C has almost the same property. This means that the variance in the spatial variation of HRTFs can be covered by a relatively small number of PCs in these domains. The domains F and L also need a small number of PCs when the target CPV value is small. In these domains, the major PCs with large corresponding eigenvalues cover the major part of the variance in the data. The required number of PCs increases in the domain L when the target CPV value is large. When the CPV value is varied from 0.90 to 0.999, the required number of PCs becomes five to six times larger in the domains I, C, F and CL, while it becomes more than ten times larger in the domain L.

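| Domain | CPV 0.90 | CPV 0.95 | CPV 0.99 | CPV 0.999 |
| --- | --- | --- | --- | --- |
| I | 8 | 10 | 20 | 39 |
| C | 4 | 6 | 11 | 20 |
| F | 5 | 7 | 14 | 31 |
| L | 6 | 11 | 32 | 78 |
| CL | 2 | 4 | 12 | 39 |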

Table 2.

The least number of PCs to cover the CPV in each case.
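Entries such as those in Table 2 follow directly from the eigenvalues through Eq. (10). The sketch below shows one way to find the least number of PCs for a target CPV; the function name `least_k_for_cpv` and the toy eigenvalues are assumptions for illustration and are unrelated to the values in the table.

```python
import numpy as np

def least_k_for_cpv(eigenvalues, target):
    """Return the smallest K whose CPV (Eq. (10)) reaches the target value."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]   # largest first
    cpv = np.cumsum(lam) / lam.sum()
    hits = np.nonzero(cpv >= target)[0]
    # Fall back to all components if rounding keeps the last CPV just below target.
    return int(hits[0]) + 1 if hits.size else len(lam)

# Toy eigenvalues (made up): the CPV is 0.5, 0.8, 0.9, 1.0 for K = 1..4.
toy_eigenvalues = [5.0, 3.0, 1.0, 1.0]
k_075 = least_k_for_cpv(toy_eigenvalues, 0.75)
k_085 = least_k_for_cpv(toy_eigenvalues, 0.85)
k_095 = least_k_for_cpv(toy_eigenvalues, 0.95)
```

For the toy eigenvalues the three targets are first reached at 2, 3 and 4 components, respectively.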

3.3 Reconstruction accuracy in time and frequency domains

The CPV is known to be an effective criterion for the coverage of variance with a certain number of PCs. However, the CPVs cannot be compared among the five domains since the covariance matrices subjected to the PCA differ from each other. Therefore, the reconstruction accuracy, defined as the agreement between the original HRTFs/HRIRs and the ones reconstructed with a certain number of PCs, was evaluated in the five domains. In this chapter, the following two measures were computed in order to evaluate the reconstruction accuracy of the SPCA in the five domains, in both the time and frequency domains:

Signal-to-Deviation Ratio (SDR): The SDR is defined as the level difference between the energy (Euclidean norm) of the original impulse response and that of the deviation:

$$\mathrm{SDR}\left(\mathbf{h},\hat{\mathbf{h}}\right) = 10\log_{10}\frac{\|\mathbf{h}\|}{\|\mathbf{h}-\hat{\mathbf{h}}\|}\ \ [\mathrm{dB}], \tag{19}$$

where $\mathbf{h}$ and $\hat{\mathbf{h}}$ respectively indicate the original and the reconstructed HRIRs, and $\|\cdot\|$ indicates the Euclidean norm of a vector. A larger SDR means that $\hat{\mathbf{h}}$ is closer to $\mathbf{h}$.

Spectral Distortion (SD): The SD is defined as the standard deviation in the log-amplitudes of two frequency spectra, as follows:

$$\mathrm{SD}\left(\mathbf{A},\hat{\mathbf{A}}\right) = \sqrt{\frac{1}{N_f}\sum_{k=0}^{N_f-1}\left(20\log_{10}\frac{A_k}{\hat{A}_k}\right)^{2}}\ \ [\mathrm{dB}], \tag{20}$$

where $\mathbf{A}$ and $\hat{\mathbf{A}}$ are the frequency amplitude spectra of the original and the reconstructed responses, respectively, and $A_k$ and $\hat{A}_k$ are the $k$-th components of the vectors $\mathbf{A}$ and $\hat{\mathbf{A}}$, respectively. The value of $N_f$ is the number of the frequency bin closest to 20 kHz. A smaller SD means that $\hat{\mathbf{A}}$ is closer to $\mathbf{A}$.

When the SDRs are calculated for the domains F and L, the corresponding original impulse responses are the ones constructed with the minimum-phase approximation, which differ from the originals used in the domains I, C and CL. In other words, the SDR in each domain measures how much the reconstructed impulse response differs from the one that the domain is meant to reproduce. No such treatment is needed for the SD, since the SD is defined by using only the magnitudes of the original and the reconstructed HRTFs.
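The two measures can be sketched as follows. Both functions rest on assumptions about the exact form of the definitions: `sdr_db` takes the ratio of Euclidean norms inside $10\log_{10}$, and `sd_db` assumes the root-mean-square form suggested by the phrase "standard deviation in log-amplitudes". The function names and the toy inputs are hypothetical.

```python
import numpy as np

def sdr_db(h, h_hat):
    """Signal-to-Deviation Ratio in dB (assumed form: ratio of Euclidean norms)."""
    h = np.asarray(h, dtype=float)
    h_hat = np.asarray(h_hat, dtype=float)
    return 10.0 * np.log10(np.linalg.norm(h) / np.linalg.norm(h - h_hat))

def sd_db(A, A_hat, n_f):
    """Spectral Distortion in dB over bins 0..n_f-1 (assumed RMS form)."""
    A = np.asarray(A, dtype=float)[:n_f]
    A_hat = np.asarray(A_hat, dtype=float)[:n_f]
    diff_db = 20.0 * np.log10(A / A_hat)        # log-amplitude difference per bin
    return np.sqrt(np.mean(diff_db ** 2))

# Bin closest to 20 kHz for 256-point responses sampled at 44.1 kHz.
n_f = round(20e3 * 256 / 44.1e3)

# Toy inputs (made up): a 10% amplitude error and a uniform 1 dB spectral offset.
sdr_example = sdr_db([1.0, 0.0, 0.0, 0.0], [0.9, 0.0, 0.0, 0.0])
sd_example = sd_db(np.ones(129), np.full(129, 10.0 ** (1.0 / 20.0)), n_f)
```

In the toy case the deviation norm is one tenth of the signal norm and every bin is off by exactly 1 dB, which makes the two functions easy to check by hand.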

3.3.1 Changes of SDR and SD with source direction

Examples of the changes of the SDR and the SD with the source direction are plotted in Figures 3–6. Figures 3 and 4 show the SDR, and Figures 5 and 6 show the SD. Elevation angles of 0°, 90° and −90° respectively correspond to the horizontal plane, above the subject and below the subject. The lines in these figures are plotted at 20° intervals from −40° to 80°, namely 6 lines in each figure. The azimuth angles are arranged as in the original data [26], i.e., 0°, 90°, 180° and 270° respectively correspond to the front, right, back and left of the subject. In each figure, the number of components in each domain was set to the least value satisfying one of two CPV values in Table 2. The first value is 0.95 in Figures 3 and 5, and the second is 0.999 in Figures 4 and 6.

Figure 3.

Change of SDR with source azimuth at various elevations in each case, with the number of PCs set to the least value achieving a cumulative proportion of variance of 0.95 in Table 2. (a) Domain I (No. of PCs = 10). (b) Domain C (No. of PCs = 6). (c) Domain F (No. of PCs = 7). (d) Domain L (No. of PCs = 11). (e) Domain CL (No. of PCs = 4).

Figure 4.

Change of SDR with source azimuth at various elevations in each case, with the number of PCs set to the least value achieving a cumulative proportion of variance of 0.999 in Table 2. (a) Domain I (No. of PCs = 39). (b) Domain C (No. of PCs = 20). (c) Domain F (No. of PCs = 31). (d) Domain L (No. of PCs = 78). (e) Domain CL (No. of PCs = 39).

Figure 5.

Change of SD with source azimuth at various elevations in each case, with the number of PCs set to the least value achieving a cumulative proportion of variance of 0.95 in Table 2. (a) Domain I (No. of PCs = 10). (b) Domain C (No. of PCs = 6). (c) Domain F (No. of PCs = 7). (d) Domain L (No. of PCs = 11). (e) Domain CL (No. of PCs = 4).

Figure 6.

Change of SD with source azimuth at various elevations in each case, with the number of PCs set to the least value achieving a cumulative proportion of variance of 0.999 in Table 2. (a) Domain I (No. of PCs = 39). (b) Domain C (No. of PCs = 20). (c) Domain F (No. of PCs = 31). (d) Domain L (No. of PCs = 78). (e) Domain CL (No. of PCs = 39).

In these figures, the macroscopic tendency in the change of the SDR and the SD with the number of PCs is similar: a larger CPV value brings about a larger SDR and a smaller SD, and these values are roughly the same among the five domains when the CPV value is set equal. In contrast, it should be emphasized that the values of the SD for the domains L and CL are smaller than those in the other domains, as shown in Figures 5(d), (e) and 6(d), (e). The domains using the real and complex logarithms may give relatively small distortions in the frequency domain.

Regarding the properties in the time domain shown in Figures 3 and 4, the values of the SDR become higher as the CPV value becomes larger. When the azimuth corresponds to the contralateral side (around 250°–300°), and especially at the lower elevation angles (less than 0°), relatively small SDR values are found commonly in all domains. At these azimuths and elevation angles, relatively large SD values are also observed, as shown in Figures 5 and 6. Xie stated the same points in his works [19, 22]. In that range of directions, the energy of the HRIRs is very small because the subject’s head casts a “shadow” that makes it hard for the sound from the source to reach the ear, especially in the high frequency range [1]. As a result, the HRIRs in those directions are relatively difficult to reconstruct with a small number of PCs.

3.3.2 Spatial average of SDR and SD

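In order to show the macroscopic tendencies of the relation between the reconstruction accuracies and the domains, the overall averages of the SDR and the SD, with the number of PCs set to $K$ in a certain domain $X$ ($X$ = I, C, F, L, CL), were calculated by using the following equations:

$$\mathrm{AvSDR}_X(K) = 10\log_{10}\left(\frac{1}{M}\sum_{m=1}^{M}10^{\mathrm{SDR}\left(\mathbf{h}_m,\,\mathbf{h}_m^{(X,K)}\right)/10}\right), \tag{21}$$
$$\mathrm{AvSD}_X(K) = \sqrt{\frac{1}{M}\sum_{m=1}^{M}\mathrm{SD}\left(\mathbf{A}_m,\,\mathbf{A}_m^{(X,K)}\right)^{2}}. \tag{22}$$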

Changes of the average SDR and the average SD with the number of components $K$ in each case are plotted in Figure 7(a) and (b), respectively. These figures clearly show that the reconstruction accuracy improves (larger SDR and smaller SD) commonly in all domains as the number of PCs increases. This means that the CPV corresponds to the tendency of the average accuracies in both the time and frequency domains. However, these values differ among the five domains. In Figure 7(a), the largest average SDR is achieved in the domain C for most numbers of PCs. The domains I and F show a similar tendency to each other, and the domains L and CL have the lowest SDR values when the number of PCs is more than 5. On the other hand, as shown in Figure 7(b), the domains L and CL have by far the lowest SD for almost all numbers of PCs; that is, the domains L and CL have the best accuracy in the frequency domain. The third lowest average SD is obtained in the domain C, and its value is almost the same as in the domains L and CL when the number of PCs is relatively large (≥35).

Figure 7.

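Changes of (a) average SDR and (b) average SD with the number of components in the five domains. For the SDR, the five lines respectively correspond to $\mathrm{AvSDR}_{\mathrm{I}}(K)$, $\mathrm{AvSDR}_{\mathrm{C}}(K)$, $\mathrm{AvSDR}_{\mathrm{F}}(K)$, $\mathrm{AvSDR}_{\mathrm{L}}(K)$ and $\mathrm{AvSDR}_{\mathrm{CL}}(K)$, and for the SD, they respectively correspond to $\mathrm{AvSD}_{\mathrm{I}}(K)$, $\mathrm{AvSD}_{\mathrm{C}}(K)$, $\mathrm{AvSD}_{\mathrm{F}}(K)$, $\mathrm{AvSD}_{\mathrm{L}}(K)$ and $\mathrm{AvSD}_{\mathrm{CL}}(K)$ $(K = 1, \ldots, 40)$. Note that the average SDR is calculated with reference to the minimum-phase HRIRs in the domains F and L.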
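The spatial averages of Eqs. (21) and (22) can be sketched as below, assuming that the per-direction SDR and SD values are already available in dB and that Eq. (22) is a root-mean-square average. The function names and the sample values are hypothetical.

```python
import numpy as np

def average_sdr_db(sdr_values_db):
    """Eq. (21): average the SDRs as linear ratios, then convert back to dB."""
    ratios = 10.0 ** (np.asarray(sdr_values_db, dtype=float) / 10.0)
    return 10.0 * np.log10(np.mean(ratios))

def average_sd_db(sd_values_db):
    """Eq. (22): root-mean-square of the per-direction SD values (assumed form)."""
    sd = np.asarray(sd_values_db, dtype=float)
    return np.sqrt(np.mean(sd ** 2))

# Made-up per-direction values for two directions.
av_sdr = average_sdr_db([10.0, 20.0])
av_sd = average_sd_db([3.0, 4.0])
```

Averaging the SDR in the linear domain means that directions with high SDR dominate the result: the two sample values of 10 dB and 20 dB average to $10\log_{10}55$ dB rather than to 15 dB.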

4. Discussion

In the previous section, the SDR and the SD computed in the five domains were compared. It was shown that a larger number of PCs brings about more accurate reconstruction of the HRTFs and HRIRs in all domains. As for the differences among the domains in the SPCA of HRTFs and HRIRs, the average SDR has the largest value in the domain C, and the average SD has the smallest value in the domains L and CL, for almost any given number of components. Since the HRTFs and HRIRs are used to produce the sound signals at both of the listener’s ears, it is essential to take their subjective evaluation into account. Most of the previous studies dealt with both objective and subjective evaluation [17, 19, 27, 28]. However, three investigations can be found in which the relation of the subjective evaluation to the SDR and SD values is addressed. Hanazawa et al. reported the results of a hearing experiment on the relation between the accuracy of the HRIRs interpolated with their proposed method and the sound localization performance when sound stimuli convolved with them were presented [29]. They showed that, at an SDR of around 6 dB, the performance did not differ significantly from that obtained with sound stimuli convolved with the original HRIRs. Takane et al. also carried out an experiment to evaluate their proposed method for estimating HRIRs from impulse responses obtained in an ordinary room with reflections [30]. The results showed that the subjects did not detect a significant difference between the stimuli synthesized from the estimated and the original HRIRs when the SDR between them was more than about 20 dB. Nishino et al. investigated the interpolation accuracy of HRTFs in the median plane, although they did not carry out a subjective evaluation. The least (best) average SD of their interpolation method is 2 dB [31].

Considering these studies, the least numbers of PCs needed to achieve an average SDR of more than 20 dB and an average SD of less than 2 dB were determined from the results of the SDR and SD computation. The results are shown in Table 3. Table 3 shows that the numbers of PCs satisfying the two conditions have contrasting features. The domains I, C and F have relatively small numbers satisfying the condition of average SDR > 20 dB, meaning that these domains can reconstruct the HRIRs with relatively high accuracy by using small numbers of PCs. On the contrary, the domains L and CL have relatively small numbers satisfying the condition of average SD < 2 dB, meaning that these domains can reconstruct the amplitudes of the HRTFs with relatively high accuracy by using small numbers of PCs. The domain C has a balanced property in both time- and frequency-domain accuracies.

| Domain | Average SDR > 20 dB | Average SD < 2 dB |
| --- | --- | --- |
| I | 19 | 44 |
| C | 7 | 22 |
| F | 19 | 39 |
| L | 25 | 14 |
| CL | 25 | 14 |

Table 3.

The least number of components to achieve average SDR > 20 dB and average SD < 2 dB in each domain.
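Given curves of the average SDR or SD against the number of PCs, the entries of a table like Table 3 can be read off with a simple scan. The sketch below is illustrative only: the function name `least_k_meeting` and the two short curves are made up and do not reproduce the chapter's data.

```python
def least_k_meeting(values, threshold, higher_is_better=True):
    """Return the smallest K (1-based) whose value passes the threshold.

    values[K - 1] is the average measure obtained with K PCs.
    Returns None if no K passes.
    """
    for k, v in enumerate(values, start=1):
        passed = v > threshold if higher_is_better else v < threshold
        if passed:
            return k
    return None

# Made-up curves for K = 1..5 (not the values plotted in Figure 7).
toy_av_sdr = [5.0, 12.0, 19.0, 21.0, 25.0]
toy_av_sd = [6.0, 4.0, 1.9, 1.5, 1.2]
k_sdr = least_k_meeting(toy_av_sdr, 20.0, higher_is_better=True)
k_sd = least_k_meeting(toy_av_sd, 2.0, higher_is_better=False)
```

The scan returns the first K that passes, which matches the "least number of components" reading only if the curve stays on the passing side afterwards, as the monotonic tendencies reported for Figure 7 suggest.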

It is known that the spectral features of the HRTFs in the frequency domain are important for sound localization, especially in the median plane [1, 32]. Iida et al. proposed a parametric model of the HRTF focused on the peaks and notches in the frequency domain, and showed that the first and second lowest notches (called N1 and N2, respectively) in the frequency spectra contribute to the elevation perceived by the subjects [33]. These results suggest that local features in the frequency domain may cause differences in the listener’s perceived direction, and that the reconstruction accuracy in the frequency domain must be taken well into account to avoid such potential differences. These studies support the view that the accuracy in the frequency domain is more important than that in the time domain. Based on such subjective properties together with the objective properties shown in this chapter, the domains L and CL are more suitable for the SPCA of the HRTFs than the other domains. On the other hand, the domains L and CL require a relatively large number of PCs to achieve the target CPV when that value is close to 1. The previous studies indicate that reconstruction with a relatively small number of PCs can produce HRTFs/HRIRs without audible difference [17, 19, 27, 28]; therefore, it is expected that the domains L and CL can yield acceptable subjective evaluation with a small number of PCs. The differences among the domains for the SPCA must be investigated in more detail, especially on the basis of subjective evaluation, which is left as a future study.

5. Summary

In this chapter, the SPCA of the HRTFs was introduced, and its dependency on the domain in which the covariance matrix is calculated was investigated. The findings of this chapter are summarized as follows:

  • The SPCA can be carried out commonly for all domains, i. e., the HRIRs (domain I), the (complex) HRTFs (domain C), the amplitude spectrum of HRTFs (domain F), logarithm of the amplitude spectrum of HRTFs (domain L), and the complex logarithm of HRTFs (domain CL).

  • For the domains other than the domain I, the covariance matrix can be reduced to about 1/4 of the size of the one assembled for the domain I, owing to the symmetry of the frequency spectrum.

  • The domains I, C and F require relatively small numbers of PCs to achieve high accuracy in the time domain.

  • The domains L and CL require relatively small numbers of PCs to achieve high accuracy in the frequency domain.

  • Considering the influence on the subjective evaluation of the HRTFs/HRIRs reconstructed via the SPCA, the domains L and CL, which bring about relatively high accuracy in the frequency domain, are more suitable for the SPCA of the HRTFs.

References

  1. Blauert J. Spatial Hearing. Cambridge, Massachusetts, USA: MIT Press. 1983
  2. Zhang W, Zhang M, Kennedy RA, Abhayapala TD. On high-resolution head-related transfer function measurements: An efficient sampling scheme. IEEE Transactions on Audio, Speech & Language Processing. 2012;20(2):575-584
  3. Algazi VR, Duda RO, Thompson DM. The CIPIC HRTF database. In: Proceedings of the 2001 IEEE Workshop of the Applications of Signal Processing to Audio and Acoustics; New Paltz. 2001 Cat No. 01TH8575 (5 pages)
  4. Watanabe K, Iwaya Y, Suzuki Y, Takane S, Sato S. Dataset of head-related transfer functions measured with a circular loudspeaker array. Acoustical Science & Technology. 2014;35(3):159-165
  5. Bomhardt R, Klein MF, Fels J. A high-resolution head-related transfer function and three-dimensional ear model database. Proceedings of Meetings on Acoustics. 2016;29:050002
  6. Brinkmann F, Dinakaran M, Pelzer R, Grosche P, Voss D, Weinzierl S. A cross-evaluated database of measured and simulated HRTFs including 3D head meshes, anthropometric features, and headphone impulse responses. Journal of the Audio Engineering Society. 2019;67(9):705-718
  7. Begault DR. 3-D Sound for Virtual Reality and Multimedia. Cambridge, Massachusetts, USA: AP Professional. 1994
  8. Morimoto M, Ando Y. On the simulation of sound localization. Journal of the Acoustical Society of Japan (E). 1980;1(3):167-174
  9. Takane S, Suzuki Y, Miyajima T, Sone T. A new theory for high definition virtual acoustic display named ADVISE. Acoustical Science & Technology. 2003;24(5):276-283
  10. Takane S, Takahashi S, Suzuki Y, Miyajima T. Elementary real-time implementation of a virtual acoustic display based on ADVISE. Acoustical Science & Technology. 2003;24(5):304-310
  11. Otani M, Hirahara T. A dynamic auditory display: Its design, performance, and problems in HRTF switching. In: Proceedings of Japan-China Joint Conference on Acoustics; 4–6 June 2007; Sendai: Acoustical Society of Japan/Acoustical Society of China. 2007. SS-1-3
  12. Yairi S, Iwaya Y, Suzuki Y. Estimation of detection threshold of system latency of virtual auditory display. Applied Acoustics. 2007;68(8):851-863
  13. Miller JD. Slab: A software-based real-time virtual acoustic environment rendering system. In: Proceedings of the 7th International Conference on Auditory Display; 29 July-1 August; Espoo: International Conference on Auditory Display. 2001. pp. 279-280. https://smartech.gatech.edu/handle/1853/50648
  14. Haneda Y, Makino S, Kaneda Y, Kitawaki N. Common acoustical-pole and zero modeling of head-related transfer functions. IEEE Transactions on Speech and Audio Processing. 1999;7:188-196
  15. Watanabe K, Takane S, Suzuki Y. A novel interpolation method of HRTFs based on the common-acoustical-pole and zero model. Acta Acustica united with Acustica. 2005;91:958-966
  16. Martens WL. Principal components analysis and resynthesis of spectral cues to perceived direction. In: Proceedings of the International Computer Music Conference; Champaign/Urbana: International Computer Music Conference. 1987. pp. 274-281. https://hdl.handle.net/2027/spo.bbp2372.1987.040
  17. Kistler DJ, Wightman FL. A model of head-related transfer functions based on principal components analysis and minimum-phase reconstruction. Journal of the Acoustical Society of America. 1992;91(3):1637-1647
  18. Chen J, Van Veen BD, Hecox KE. A spatial feature extraction and regularization model for the head-related transfer functions. Journal of the Acoustical Society of America. 1995;97(1):439-452
  19. Xie B. Recovery of individual head-related transfer functions from a small set of measurements. Journal of the Acoustical Society of America. 2012;132(1):282-294
  20. Wu Z, Chan FHY, Lam FK, Chan JCK. A time domain binaural model based on spatial feature extraction for the head-related transfer functions. Journal of the Acoustical Society of America. 1998;102(4):2211-2218
  21. Watanabe K, Oikawa Y, Sato S, Takane S, Abe K. Development and performance evaluation of virtual auditory display system to synthesize sound from multiple sound sources using graphics processing unit. In: Proceedings of the 21st International Congress on Acoustics; Montreal: International Commission for Acoustics. 2013 2pEAba12 (7 pages in CD-ROM)
  22. Xie B. Head-Related Transfer Function and Virtual Auditory Display. Second ed. Plantation, Florida, USA: J. Ross Pub. 2013
  23. Liang Z, Xie B, Zhong X. Comparison of principal components analysis of linear and logarithmic magnitude of head-related transfer functions. In: Proceedings of the 2nd IEEE International Congress on Image and Signal Processing; Tianjin: IEEE. 2009. pp. 1-5. https://doi.org/10.1109/CISP.2009.530427
  24. Takane S. Spatial principal component analysis of head-related transfer functions using their complex logarithm with unwrapping of phase. In: Proceedings of the 23rd International Congress on Acoustics; 9–13 September; Aachen. 2019. pp. 3048-3055
  25. Oppenheim AV, Schafer RW. Discrete-Time Signal Processing. Third ed. Lebanon, Indiana, USA: Prentice Hall. 2010
  26. Gardner WG, Martin KD. HRTF measurements of a KEMAR. Journal of the Acoustical Society of America. 1995;97:3907-3908
  27. Matsui K, Ando A. Estimation of individualized head-related transfer function based on principal component analysis. Acoustical Science & Technology. 2009;30(5):338-347
  28. Fink KJ, Ray L. Individualization of head-related transfer functions using principal component analysis. Applied Acoustics. 2015;87:162-173
  29. Hanazawa K, Yanagawa H, Matsumoto M. Subjective evaluations of interpolated binaural impulse responses and their interpolation accuracies. In: Proceedings of the Spring Research Meeting of the Acoustical Society of Japan; Tokyo: Acoustical Society of Japan. 2006. pp. 677-678 (in Japanese)
  30. Takane S, Nabatame S, Abe K, Watanabe K, Sato S. Subjective evaluation of HRIRs linearly predicted from impulse responses measured in ordinary sound field. In: Proceedings of the 39th Audio Engineering Society Japan Regional Conference; Osaka: Audio Engineering Society. 2008. pp. 1-8 (in CD-ROM)
  31. Nishino T, Kajita S, Takeda K, Itakura F. Interpolating head related transfer functions in median plane. In: Proceedings of the 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics; New Paltz: IEEE. 1999. pp. 17-20. https://doi.org/10.1109/ASPAA.1999.810876
  32. Asano F, Suzuki Y, Sone T. Role of spectral cues in median plane localization. Journal of the Acoustical Society of America. 1990;88(1):159-168
  33. Iida K, Itoh M, Itagaki A, Morimoto M. Median plane localization using a parametric model of the head-related transfer function based on spectral cues. Applied Acoustics. 2007;68:835-850
