List of the monosyllable terms.
The influence of indoor speech intelligibility and apparent source width (ASW) on cortical brainwave responses was studied using two variables: the time gap between the direct sound and the first reflection (Δt1, ms) and the initial (<80 ms) interaural cross-correlation function (IACCE3). Comparisons were based on the autocorrelation function (ACF) of continuous brainwaves (CBW) and on slow vertex responses (SVR). The results are as follows: (1) The effective delay time of the ACF (τe) of β-waves (13–30 Hz) in the left hemisphere under changes in Δt1 was significantly and positively correlated with speech intelligibility (p < 0.001). (2) As ASW increased, the relative amplitude A(P2-N2) of SVRs in the left hemisphere tended to decrease (p < 0.05), while N2 latency tended to increase (p < 0.05); the lateral lemniscus in the auditory pathway was suggested to be the reactive site. (3) With regard to hemispheric specialization in the brain, speech intelligibility, the main temporal factor, was found to be controlled by the left hemisphere. For ASW, a subjective spatial factor, the relative amplitude of SVR was also found to decrease in the left hemisphere; nevertheless, the two hemispheres responded coherently in that the N2 latency of SVR was significantly prolonged in both the left and right hemispheres under changes in IACCE3.
- apparent source width
- speech intelligibility
- subjective diffuseness
- hemispheric specialization
In human speech cognition, speech intelligibility integrates short-term memory and cerebral feedback. However, the factors that constitute the spatial impression of sound also include related evaluation indicators, such as the listener’s judgment of sound source direction (sense of direction) and distance (sense of proximity), apparent source width (ASW), and listener envelopment (LEV). As suggested by Ando and Beranek, the composition of such spatial impressions depends mainly on fluctuations in the magnitude of the interaural cross-correlation (IACC) and is especially affected by the degree of subjective diffusion of the sound field. However, listeners differ in their needs and perceptions regarding subjective diffusion and ASW.
With regard to neuropsychology, Sperry discovered the phenomenon of hemispheric disconnection. The cerebral specialization theory distinguishes between “speech functions” and “non-speech functions.” Certain symbols in architectural design belong to non-speech functions. For instance, the range of non-speech functions includes aesthetic perception and the feeling of balance. In particular, many non-speech symbols can be observed in environmental design. Earlier research on audio and cerebral correlations found that such common medical problems as aphasia and disturbances in tone judgment originate in the left cerebral hemisphere. Therefore, this study suggested that cerebral responses to speech and non-speech symbols in the physical environment can effectively substitute for the semantic differences (SD) caused by age-related and cultural differences. Cerebral responses to communication stimuli are a direct cross-cultural and cross-age reference indicator, similar in principle to the polygraph tests performed by police to examine physiological responses.
This study suggested that cerebral responses can be used to clearly and consistently examine responses to changes in the “speech functions” of the physical environment, that is, speech intelligibility, when designing a sound field. Ando considered “speech functions” to be an important temporal factor and the result of autocorrelation function (ACF) evaluations in the brain. Therefore, the environmental effects of temporal factors were examined in this study based on the influence of speech intelligibility on the correlation between “subjective perceptions” and cerebral responses, which served as the basis for the objective design of an acoustic environment. Akita et al. indicated that when the sensory information received by listeners is analyzed through brainwaves, the result does not represent their direct experience of changes in the environment, but rather the interaction between physiology and the environment. This phenomenon is common in daily life. The intensity of cerebral evoked responses is the optimal evaluation tool. Soeta et al. studied the effects of sound source features on subjective psychological responses and on cerebral responses measured by magnetoencephalography (MEG). They reported that, at different delay times of reflected sound (Δt1 = 0, 5, 20, 60, and 100 ms) and 50 alternations, the ACF effective delay time of α-waves recorded by MEG indicated subjective preferences regarding sound fields. The methods used in this study can be summarized as follows:
The first reflection delay (Δt1) was varied to change speech intelligibility. The degree (or process) of subjective recognition of Chinese monosyllables was determined by comparing the ACF calculation results for α-waves and β-waves in cerebral continuous brainwaves (CBW).
The IACCE3 was varied to change the subjective ASW. Changes in the waveforms of auditory evoked potentials (AEPs) during listeners’ perceptions of spatial ASW were analyzed.
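For orientation, the IACC that underlies IACCE3 is conventionally taken as the maximum magnitude of the normalized cross-correlation between the left-ear and right-ear signals within an interaural delay of ±1 ms. The following is a minimal sketch of that definition only, not the procedure used in this study: the function name `iacc` and its parameters are illustrative assumptions, the input is assumed to be a pair of binaural impulse responses starting at the arrival of the direct sound, and the octave-band filtering and averaging implied by the "E3" subscript are omitted.

```python
import math

def iacc(left, right, fs, max_lag_ms=1.0, window_ms=80.0):
    """Maximum |normalized interaural cross-correlation| within +/- max_lag_ms.

    left, right: binaural impulse responses (lists of floats), assumed to
    start at the direct-sound arrival. Only the first window_ms are used,
    which corresponds to the "early" (<80 ms) part of the response.
    """
    n = min(len(left), len(right), int(round(fs * window_ms / 1000.0)))
    l, r = left[:n], right[:n]
    norm = math.sqrt(sum(x * x for x in l) * sum(x * x for x in r))
    if norm == 0.0:
        raise ValueError("silent input")
    max_lag = int(round(fs * max_lag_ms / 1000.0))
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        # Correlate l[t] with r[t + lag], keeping both indices inside the window.
        acc = 0.0
        for t in range(max(0, -lag), min(n, n - lag)):
            acc += l[t] * r[t + lag]
        best = max(best, abs(acc) / norm)
    return best
```

Identical ear signals give a value of 1, whereas ear signals that share no common component within ±1 ms give a value near 0, which corresponds to a more diffuse sound field and a wider ASW.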
2. Empirical methods
2.1 Psychological test of intelligibility
This study used monosyllabic speech sound articulation and IACCE3 to quantify changes in two subjective experiences, namely, speech intelligibility and ASW. With regard to speech intelligibility, the fifth group of common Chinese monosyllabic speech sounds used in Taiwan (female voice, Table 1) was used. Test results for this group of monosyllabic sounds are characterized by the largest disparity in error rates because most of the sounds are “fricative sounds” (i.e., apical vowels, such as “zh,” “ch,” “sh,” “r,” “z,” “ci,” and “si” in the Bopomofo system). The numbers of fricative and non-fricative rhymes are balanced (eight versus ten, respectively). The sound structure of Mandarin differs from that of other languages. In Mandarin, each character is pronounced as a monosyllable with one of five tones (i.e., types of pitch contour). Each of these tones (0–4), when used with a given monosyllable, gives the monosyllable a meaning distinct from those it conveys with the other four tones. Utterance lengths in the experiment were set to 400–500 ms. Monosyllable presentations were separated by 2.5 s. The experiment followed the arrangement used in the study by Chen et al.
The experiment was conducted in front of two loudspeakers, one stacked above the other, in a semi-anechoic room (4 × 3 m, 4 m in height) at Chaoyang University of Technology. The loudspeakers (Fostex NF-1A) were located 1.5 m directly in front of the center of the listener’s head. The upper loudspeaker (η = 15°) emitted the first reflected sound, while the other (η = 0°) emitted the direct sound. To vary speech intelligibility, the speech signal was assumed to be emitted from a stage as a direct sound together with one reflection from the ceiling of the stage. The listening level was adjusted to a usual communicative sound volume of 62 dB(A) at the center of the room. The level of background noise in the semi-anechoic room was 32–42 dB(A), so the S/N ratio was approximately 30–40 dB. The instrumental setup for the EEG recordings is shown in Figure 1, since it was the same as that used in the spatial ASW experiment described below. The settings of the physical parameters used in the experiment are shown in Table 2. Figure 2 shows the results of the percentage syllable articulation (PSA) tests, in which 62 listeners were able to distinguish the sounds significantly. To determine PSA, the syllables written down by the listeners are compared with the original syllables to find the percentage of syllables written correctly.
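The PSA score described above can be sketched as a simple per-listener count. This is a minimal illustration, not the scoring script used in the study: the function name is an assumption, and the response sheet in the usage example is hypothetical (only the four syllable labels are taken from the text).

```python
def percentage_syllable_articulation(presented, written):
    """Percentage of presented syllables that the listener wrote down correctly."""
    if len(presented) != len(written):
        raise ValueError("each presented syllable needs exactly one response")
    if not presented:
        raise ValueError("no syllables presented")
    correct = sum(1 for p, w in zip(presented, written) if p == w)
    return 100.0 * correct / len(presented)


# Hypothetical response sheet: the third syllable is written with the wrong tone.
presented = ["yu2", "he4", "ian1", "tzuen1"]
written = ["yu2", "he4", "ian2", "tzuen1"]
score = percentage_syllable_articulation(presented, written)  # 3 of 4 correct
```

Because each Mandarin tone changes the meaning of a monosyllable, the comparison treats a correct syllable with a wrong tone as an error.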
|Item|Conditions of experiments|
|---|---|
|Δt1 (ms)|Delay gap: 0 ms, 35 ms, 100 ms, 150 ms, 200 ms|
|SPL of individual loudspeakers|Direct sound: 60 dB(A); first reflection, Δt1: 55 dB(A)|
|Reverberation times|RT ≑ 0.1 s|
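As a rough plausibility check on the loudspeaker levels in Table 2, the combined level of the direct sound and the first reflection can be estimated by energy summation. This is a sketch under the assumption that the two components add incoherently, which is only an approximation for a signal and its delayed copy; the function name is illustrative.

```python
import math

def total_spl(levels_db):
    """Combined level (dB) of incoherent sources, by summing their energies."""
    if not levels_db:
        raise ValueError("at least one level is required")
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))


# Direct sound at 60 dB(A) plus first reflection at 55 dB(A), as in Table 2.
combined = total_spl([60.0, 55.0])
```

Under this assumption the combined level is about 61.2 dB(A), which is of the same order as the 62 dB(A) listening level reported for the center of the room.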
2.2 Psychological quantification test of ASW
The paired-comparison method was used in the psychological quantification test of subjective ASW. The experiment was conducted in the same venue as the first experiment. Three loudspeakers (one for direct sound and two for reflected sounds) were located 1.5 m from the center of the listener’s head; the incidence combinations (ξ, η) were (0°, ±15°), (0°, ±55°), (0°, ±90°), and (0°, +15°, −55°) on the horizontal plane. 2 kHz pure-tone (1 ms) sounds were produced. The IACCE3 (0.35, 0.57, 0.68, and 0.81) [12, 13] of the sound field was varied by changing the angles of incidence stated above and the sound pressure level. As a result, different subjective ASWs were generated (Table 3). The instrumental setup for testing spatial ASW and the process of recording AEPs are illustrated in Figure 1. The participants (80 students) judged ASWs using paired comparisons. The interval between sound prompts within one group was 2 s and the interval between groups was 10 s; in total, six groups were used. The participants were asked to immediately judge and record the relative ASW within each pair. Each questionnaire took 1 min. The psychological scale values of ASW, calculated using Thurstone’s Case V, are shown in Figure 3. A non-linear correlation with IACCE3 was observed.
|IACCE3 (setup values)|Amplitude of direct sound (A0)|I-1, SPL/dB(A)|Amplitude of first reflection (A1)|I-2, SPL/dB(A)|Amplitude of second reflection (A2)|I-3, SPL/dB(A)|ΣL, total SPL/dB(A)|
|---|---|---|---|---|---|---|---|
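The Thurstone Case V scaling applied to the paired-comparison judgments can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code of the study: the function name, the clipping of extreme proportions, and the judgment counts in the usage example are all hypothetical. Four sound fields yield six pairs, which is consistent with the six groups presented to the participants.

```python
from statistics import NormalDist

def thurstone_case_v(wins, clip=0.01):
    """Case V scale values from paired-comparison counts.

    wins[i][j] is the number of times stimulus i was judged to have the wider
    ASW than stimulus j. Proportions are clipped to [clip, 1 - clip] so that
    the inverse normal transform stays finite (an assumption of this sketch).
    """
    n = len(wins)
    inverse_normal = NormalDist().inv_cdf
    scale = []
    for i in range(n):
        z_sum = 0.0
        for j in range(n):
            if i == j:
                continue  # the diagonal contributes z = 0
            total = wins[i][j] + wins[j][i]
            if total == 0:
                raise ValueError("pair (%d, %d) was never compared" % (i, j))
            p = min(max(wins[i][j] / total, clip), 1.0 - clip)
            z_sum += inverse_normal(p)
        scale.append(z_sum / n)
    mean = sum(scale) / n
    return [s - mean for s in scale]  # zero-mean interval scale


# Hypothetical counts for four sound fields judged by 80 listeners.
wins = [
    [0, 30, 20, 10],
    [50, 0, 30, 20],
    [60, 50, 0, 30],
    [70, 60, 50, 0],
]
scale_values = thurstone_case_v(wins)
```

The resulting values form an interval scale with an arbitrary origin, so only differences between sound fields are meaningful; this is why scale values such as ASW (−0.16), ASW (0.03), and ASW (0.45) can be negative.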
2.3 Brainwave physiological experiment methods
2.3.1 Brainwave analysis method
After the fast Fourier transform (FFT) was applied to the brainwaves, the ACF of the CBW was calculated for the α-waves (8–13 Hz) and β-waves (13–30 Hz) of the left and right hemispheres. In the earlier study by Chen and Ando, α-waves were sampled at 100 Hz and β-waves at 500 Hz in accordance with the sampling theorem and, after A/D conversion (16 bits), input into a computer to calculate the effective duration (τe) of the ACF of the CBW (Figure 4). For ACF calculations of τe values, the study by Chen and Chan suggested that an integration time (2T) of 0.3 s was the most effective for monosyllabic speech sounds. However, the monosyllabic signals played in this study included the simulated first reflection delay. Therefore, the integration time (2T) of the ACF of the continuous brainwaves (CBW) used in the calculation was adjusted to 0.5 s. As shown in Figure 4, substantial differences were observed between the ACF waveforms of α-waves and β-waves under the same first delay time settings.
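The calculation of τe from the ACF can be sketched as follows. The sketch assumes the common definition of τe as the delay at which the envelope of the normalized ACF has decayed to −10 dB, estimated by fitting a straight line to the initial peaks of the ACF on a decibel scale and extrapolating. The function names, the −5 dB fitting range, and the synthetic test ACF are assumptions of this illustration, and the input is assumed to be already band-limited to the α or β band.

```python
import math

def normalized_acf(x, window, max_lag):
    """Normalized ACF phi(tau) for lags 0..max_lag over a fixed window (2T) of samples."""
    if len(x) < window + max_lag:
        raise ValueError("signal too short for the requested window and lag")
    phi0 = sum(x[t] * x[t] for t in range(window))
    if phi0 == 0.0:
        raise ValueError("silent signal")
    return [sum(x[t] * x[t + lag] for t in range(window)) / phi0
            for lag in range(max_lag + 1)]

def effective_duration(phi, fs, fit_floor_db=-5.0):
    """Estimate tau_e in seconds from a normalized ACF sampled at fs Hz."""
    level = [10.0 * math.log10(max(abs(v), 1e-12)) for v in phi]
    points = [(0.0, 0.0)]  # phi(0) = 1, i.e. 0 dB at zero delay
    for k in range(1, len(level) - 1):
        if level[k] >= level[k - 1] and level[k] > level[k + 1]:
            if level[k] < fit_floor_db:
                break  # only the initial decay is used for the fit
            points.append((k / fs, level[k]))
    if len(points) < 2:
        raise ValueError("no ACF peaks above the fit floor")
    mean_t = sum(t for t, _ in points) / len(points)
    mean_l = sum(l for _, l in points) / len(points)
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in points)
    slope = sxy / sxx  # dB per second
    if slope >= 0.0:
        raise ValueError("ACF envelope does not decay")
    intercept = mean_l - slope * mean_t
    return (-10.0 - intercept) / slope  # delay at which the fitted line reaches -10 dB


# Synthetic check: a 10 Hz ACF whose envelope decays as exp(-tau / 0.1 s).
fs = 500
phi = [math.exp(-(k / fs) / 0.1) * math.cos(2.0 * math.pi * 10.0 * k / fs)
       for k in range(150)]
tau_e = effective_duration(phi, fs)
```

For the synthetic ACF, the envelope reaches −10 dB at 0.1 × ln 10 ≈ 0.23 s, and the estimate lies close to that value. With 500 Hz sampling, an integration time of 2T = 0.5 s corresponds to `window=250` in `normalized_acf`.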
To explore the changes in subjective perceptions of ASW, the AEPs of nine participants were induced, recorded, and analyzed as in the psychological intelligibility experiment. However, the spatial impression of a sound signal is a short-term memory phenomenon. Therefore, AEP waveforms are normally used to observe changes in responses carried by weak brainwave signals (about 10–100 μV in amplitude when measured from the scalp). Clear and consistent brain waveforms are usually obtained by applying the signal averaging method to responses that occur within 500 ms after auditory stimulation (Figure 5). In this study, the responses were averaged 180 times, which was sufficient to obtain clear slow vertex response (SVR) waveforms. The latencies of waveform peaks and troughs and their relative amplitudes can reflect the activation of different parts of the auditory pathway [19, 20]. As shown in Figure 5, this study changed ASW perceptions by changing the direction of arrival and the energy of the sound while keeping the reflection delay and reverberation time fixed (Δt1 < 15.5 ms, RT ≑ 0.1 s) (Table 3). This study suggested that brain waveforms could be observed using SVR because the preset reaction time exceeded 10 ms. In this way, AEPs were obtained. Consequently, when the potentials P1, P2, N1, and N2 of the later waveforms (30–200 ms) of the AEPs were generated by different sound stimuli, the relative amplitudes of P1-N1 and P2-N2 and the latencies of P1, P2, N1, and N2 were observed.
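The signal averaging step and the subsequent reading of peak latency and relative amplitude can be sketched as follows. This is an illustration under stated assumptions rather than the LabVIEW processing used in the study: the function names, the search windows, and the synthetic recording are hypothetical, and N components are assumed to be negative-going and P components positive-going.

```python
def average_epochs(recording, onsets, epoch_len):
    """Average stimulus-locked epochs so that activity unrelated to the stimulus cancels."""
    if not onsets:
        raise ValueError("no stimulus onsets given")
    for start in onsets:
        if start < 0 or start + epoch_len > len(recording):
            raise ValueError("epoch runs outside the recording")
    n = len(onsets)
    return [sum(recording[start + k] for start in onsets) / n
            for k in range(epoch_len)]

def find_peak(avg, fs, start_ms, end_ms, positive=True):
    """Return (latency in ms, amplitude) of the extreme value in a search window."""
    lo = max(0, int(start_ms * fs / 1000.0))
    hi = min(len(avg), int(end_ms * fs / 1000.0) + 1)
    if lo >= hi:
        raise ValueError("empty search window")
    sign = 1.0 if positive else -1.0
    idx = max(range(lo, hi), key=lambda k: sign * avg[k])
    return 1000.0 * idx / fs, avg[idx]


# Synthetic recording: 180 repetitions of one response plus alternating offsets
# that stand in for background activity unrelated to the stimulus.
template = [0.0, 1.0, 3.0, 1.0, 0.0, -2.0, -1.0, 0.0]
recording, onsets = [], []
for i in range(180):
    onsets.append(len(recording))
    offset = 0.5 if i % 2 == 0 else -0.5
    recording.extend(v + offset for v in template)

fs = 40  # hypothetical sampling rate in Hz (25 ms per sample)
avg = average_epochs(recording, onsets, len(template))
p_latency, p_amp = find_peak(avg, fs, 0, 175, positive=True)
n_latency, n_amp = find_peak(avg, fs, 0, 175, positive=False)
relative_amplitude = p_amp - n_amp  # peak-to-trough amplitude, e.g. A(P2-N2)
```

Averaging over many repetitions keeps the stimulus-locked response while unrelated activity tends to cancel, which is why weak evoked potentials become visible after a sufficient number of repetitions.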
2.3.2 Brainwave recording method
With regard to brainwave recording, eight participants (and another nine in the AEP experiments) sat on comfortable office chairs in the semi-anechoic room at Chaoyang University of Technology while their brainwaves were induced and recorded. The room temperature was maintained at 22 ± 2°C. All subjects were prohibited from drinking any alcohol for 3 days before the brainwave recordings, and they refrained from smoking for 1 h before both experiments. They were instructed to concentrate on listening to the signals during the presentation. All participants were male students aged 22–24 years with normal hearing ability, as confirmed by an audiometry test, and were right-handed, as confirmed by a self-administered handedness test. The audiometry test detects sensorineural hearing loss (damage to the nerve or cochlea) and conductive hearing loss (damage to the eardrum or the tiny ossicle bones). Pure-tone subjective audiometry was applied, in which air conduction hearing thresholds in decibels (dB) over a frequency range of 250–8000 Hz are plotted on an audiogram for each ear independently. All of the subjects had to qualify as normal on a pure-tone audiogram (less than 25 dB) for both ears prior to the brainwave experiments and questionnaires.
The electrodes used to record brainwaves were positioned at the participants’ T3 and T4 head points according to the international 10–20 system. Reference potentials were taken from the left and right earlobes. Unipolar recording of continuous brainwaves in the left and right hemispheres was performed. The G2 electrode was attached between the eyebrows as an eye movement reference. The electrode system was grounded each time the brainwaves were recorded in order to avoid external electrical interference. The settings of the simulated sound field were similar to those in the aforementioned psychological experiment. The collected brainwave data were analyzed and processed with NI LabVIEW software. The instrumental setup is shown in Figure 1. During the brainwave experiments, the subjects had to be relaxed while paying close attention to the sound stimuli. Brainwaves are extremely sensitive to any incoming stimuli or stress. For the purpose of this study, a state that was relaxed but also focused on environmental variations was considered the best condition for the subjects during the brainwave recording process. Periods of blinking had to be excluded from the recordings. Thus, a monitor was set up in the semi-anechoic room to identify these periods, and these sections were later removed from the recordings.
3. Empirical results and discussion
3.1 Influence of speech intelligibility on brainwaves
Monosyllabic speech sounds had a major effect on both α-waves (F(9, 2488) = 12.96, p < 0.001) and β-waves (F(9, 2488) = 5.21, p < 0.001) at different first reflection delay times. As shown in Figures 6 and 7, the ACF of β-waves recorded in the left hemisphere was positively correlated with subjective perceptions of speech intelligibility. With regard to α-waves in the left hemisphere, brainwave responses tended to increase at 100 ms for all sounds apart from “tzuen1.” It is not clear whether this result was related to the nasal sound “uen.” The psychological experiment results (Figure 2) showed that the lowest articulation rates were observed for four sounds at 100 ms reflection delays. An opposite tendency was detected in the articulation rate results for “yu2,” “he4,” and “ian1.” The 100 ms delay is close to the 135 ms slow response delay proposed in Ando’s study on sound field preferences (echo disturbance). The unfavorable response of α-waves to the delay time of reflection requires further investigation.
Figure 7 shows changes in β-waves. Consistent results were obtained with regard to the influence of the delay time of reflection on the left hemisphere (F(9, 2488) = 5.21, p < 0.001). However, no significant differences were observed in psychological reactions to speech intelligibility. A significant relation between articulation rates and the order of reactions was detected in the mean values for the right hemisphere. Thus, the left hemisphere showed changes in β-waves that followed the order of the reflection delay times but not the speech intelligibility reactions.
3.2 Changes in subjective perception of ASW and brainwaves
The SVR findings from the evoked potentials of nine participants are shown in Figure 8. With regard to the left hemisphere, the relative amplitude of SVR was consistently and inversely related to the quantified psychological scale values (F(1, 16) = 4.90, p < 0.05). However, clear results were difficult to obtain due to the small difference between ASW (−0.16) and ASW (0.03).
Latency changes in the left and right hemispheres indicated a significant difference between ASW (0.03) and ASW (0.45) only at N2 in the left hemisphere (F(1, 16) = 11.09, p < 0.05). The tendency of the ASW (0.45) latency to be smaller than the ASW (0.03) latency in the left hemisphere can be seen in Figure 9, whereas in the right hemisphere, the ASW (0.45) latency was consistently larger than the ASW (0.03) latency. The results showed that changes in relative amplitude in the left hemisphere were caused by subjective perceptions of ASW, which influenced the participants’ preference toward a sound field. The consistency of latency at N2 was due to the activation of neural sites, which was clearly observed between ASW (0.03) and ASW (0.45), as well as at IACCE3 of 0.56–0.68. Thus, the brain did not show a major response to the corresponding changes at the extreme IACCE3 values of 0.35 and 0.81, which corresponded to the psychological reaction results presented in Figure 3.
The arrangements and results of the aforementioned brainwave experiments indicated that when simple physical changes in a sound field and complex psychological feedback both affect cerebral brainwave reactions, the correspondence between the cerebral specialization theory and the results becomes very complicated. In general, in this study, the left hemisphere tended to be activated by both the temporal and the spatial aspects of the sound field. When the participants’ brainwaves were recorded during the judgment task, brain activation in the right hemisphere tended to reflect the discriminated object more closely. When CBW were observed during the research on speech intelligibility, the left hemisphere showed clear reactions to the first reflection delay time of the sound field (Figure 7). However, the degree of speech intelligibility reflects the complex thinking process that occurs in the right hemisphere (cerebral feedback). This phenomenon was supported by the subjective ASW experiment. With regard to changes in spatial factors, the left hemisphere received information about sound field changes when the IACCE3 value changed. Changes between ASW (0.03) and ASW (0.45), which were more evident in the right hemisphere, affected both the right and left hemispheres. This is consistent with the finding of Ando et al. that the N2 latency of SVR is significantly prolonged in both the left and right hemispheres under changes of subjective diffuseness in IACCE3. Different sites are activated during focused and ambient use of the brain.
Cerebral specialization has been reported to be determined by focused conscious decisions. For instance, Floel et al. conducted a spatial-visual focus experiment and used a Doppler ultrasound system and magnetic resonance imaging (MRI) equipment to observe the brain reactions of right-handed participants; the researchers found that both spatial recognition and speech functions were activated in the right hemisphere, which corresponded to clinical experiment results.
Nevertheless, for CBW research, we conclude that α-waves (8–13 Hz) mainly respond to emotional reactions, whereas β-waves (13–30 Hz) react to drifts in the auditory material (Figures 6 and 7). The left hemisphere leads in focusing attention on varying situational conditions (Figures 8 and 9), whereas the right hemisphere blends in imagined feeling and experience. Studies of hemispheric specialization must therefore attend to conditioned responses and set up each brainwave experiment with conscientious and careful attention to detail.
I would like to express my thanks to the graduate students Yong-Shang Chen and Qi-Wen Lin, who helped me with the brainwave experiments in the course of preparing this study. In particular, I wish to thank Professor Em. Yoichi Ando, who kindly gave me directions on my brain research and data analysis methodology. I am also indebted to the Ministry of Science and Technology, Taiwan, for two years (2007 and 2012) of financial support to complete this research. The application conformed to academic ethics for the conduct of research. Moreover, during the brainwave experiments, the subjects were assured of their safety and told that the procedure was non-invasive. Special thanks are due to my many colleagues for their participation in the experiments involving the subjective judgments and the brainwave recordings.
Glossary of symbols
ASW: apparent source width, a sound perception of subjective diffuseness occurring from the beginning to 80 ms of the stimulus
Δt1: delay gap between the direct sound and the first reflection in a diffuse sound field
IACCE3: binaural initial (<80 ms) interaural cross-correlation function
ACF: autocorrelation function
τe: effective delay of the autocorrelation function (ACF)
CBW: continuous brainwave, a term used to distinguish it from an evoked potential (EP) or evoked response within EEG
SVR: slow vertex response, an evoked potential that is the direct result of a specific sensory stimulus in the period of 10–500 ms
LEV: listener envelopment, a sound perception of subjective diffuseness occurring after 80 ms of the stimulus
SD: semantic differences, a questionnaire method that employs a scale of responses caused by a psychological affection
example of a monosyllable in Taiwanese daily speech
η: vertical angle in the median plane, with 0° at the front of the head at ear height
ξ: clockwise angle in the horizontal plane, with 0° at the front of the head at ear height
SPL: sound pressure level measured by a sound level meter in fast time-weighting mode
AEP: auditory evoked potential
PSA: percentage syllable articulation