Improvement on Sound Quality of the Body Conducted Speech from Optical Fiber Bragg Grating Microphone

© 2012 Nakayama et al., licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

between a patient and an operator in a Magnetic Resonance Imaging (MRI) room, which is a noisy sound environment with a strong magnetic field (A. Moelker et al., 2005). Conventional microphones, such as accelerometers composed of magnetic materials, are not allowed in this environment, which requires a special microphone made of non-magnetic material.
For this environment, the authors proposed a speech communication system that uses a body-conducted speech (BCS) microphone based on an optical fiber Bragg grating (OFBG microphone) (M. Nakayama et al., 2011). It is composed only of non-magnetic materials, is suitable for the environment, and should provide clear signals when used with our retrieval method. Previous research using an OFBG microphone demonstrated the effectiveness and performance of signal extraction in an MRI room. Its speech recognition performance was evaluated using an acoustic model constructed from the normal speech of unspecified speakers (M. Nakayama et al., 2011). It was concluded that an OFBG microphone can produce a clear signal, with improved recognition performance against an acoustic model built from the speech of unspecified speakers. The original signal from the OFBG microphone made conversation possible; however, listeners felt some stress because its sound quality was low. Therefore, one of the research aims is to improve the quality with our retrieval method, which uses differential acceleration and noise reduction.
This chapter presents experiments and discussions on body-conducted speech that is measured with an accelerometer and an OFBG microphone and processed with the method, as a state-of-the-art topic in the research field of signal extraction under noisy environments. In particular, it investigates the evaluation of the microphones, signal retrieval with the method, and the application of the method to sentence-length signals for estimating and recovering sound quality.

Conventional body-conducted speech microphone
Speech, as air-conducted sound, is easily affected by surrounding noise. In contrast, body-conducted speech is solid-propagated sound and is thus less affected by noise. A word is uttered by a 20-year-old male in a quiet room. Table 1 details the recording environments for the microphone and accelerometer employed in this research. Speech is measured 30 cm from the mouth using a microphone, and body-conducted speech is extracted from the upper lip using the accelerometer, which serves as the conventional microphone and is shown in Figure 1. This microphone position is the one commonly used for the speech input of a car navigation system. The upper lip, as a signal-extraction position, provides the best cepstral coefficients as feature parameters for speech recognition (S. Ishimitsu et al., 2004). Figures 2 and 3 show the word "Asahi" uttered in the quiet room; the word is taken from the JEIDA database, which contains 100 local place names (S. Itahashi, 1991). The speech signal has clear frequency characteristics, whereas body-conducted speech lacks high-frequency components above 2 kHz. Recognition performance is therefore reduced when the body-conducted signal is used for recognition directly.
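The loss of components above 2 kHz can be quantified as the fraction of spectral energy at or above that frequency. The following is a minimal sketch using only the Python standard library; the function name, the direct DFT, and the 16 kHz sampling rate in the usage note are illustrative assumptions and are not taken from Table 1.

```python
import cmath
import math


def high_band_energy_ratio(signal, sample_rate_hz, cutoff_hz=2000.0):
    """Return the fraction of spectral energy at or above cutoff_hz.

    A direct DFT over the non-negative frequency bins is used. It is slow
    (O(N^2)) but needs only the standard library, so it suits short frames.
    """
    n = len(signal)
    total = 0.0
    high = 0.0
    for k in range(n // 2 + 1):
        # k-th DFT coefficient of the real-valued input frame.
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        power = abs(coeff) ** 2
        total += power
        # Bin k corresponds to the frequency k * sample_rate_hz / n.
        if k * sample_rate_hz / n >= cutoff_hz:
            high += power
    return high / total if total > 0.0 else 0.0
```

With an assumed sampling rate of 16 kHz, a pure 1 kHz tone gives a ratio close to 0 and a pure 3 kHz tone gives a ratio close to 1. A body-conducted signal with weak components above 2 kHz would be expected to fall near the lower end of this range.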

Optical Fiber Bragg Grating microphone
To extend testing to scenarios in which noise is generated together with a strong magnetic field, such as communication between a patient and an operator in an MRI room, an OFBG microphone is employed to record body-conducted speech, because it can measure a clearer signal than an accelerometer and can be used in a strong magnetic field. The effectiveness of the microphone is examined in an MRI room in which the magnetic field is produced by an open-type magnetic resonance imaging system. Tables 2 and 3 detail the recording environments for the OFBG microphone, which is shown in Figure 4. The noise level in the room could not be measured at the recording point, that is, at the mouth of the speaker, because a sound-level meter is composed of magnetic materials and was not permitted in the room. Therefore, the noise level was measured at the entrance of the room and may consequently be higher than the noise level at the signal recording point; the noise level is given in Table 2. Owing to patient discomfort during the recordings, only 20 words and 5 sentences were recorded in the room; a recording scene is shown in Figure 5. Figure 6 shows the body-conducted speech recorded by the OFBG microphone in the room while the MRI was operating. Compared with conventional BCS measured by an accelerometer, this signal is clearer because frequency components above 2 kHz can be found.

Speech recognition with OFBG microphone
The quality of the signal recorded with the OFBG microphone is higher than the quality of BCS recorded with an accelerometer. Generally, the quality of speech sound is evaluated by the mean opinion score, on a scale from 1 to 5; however, this requires a large amount of evaluation data to achieve adequate significance levels. For this reason, the sound quality is evaluated through speech recognition, using acoustic models estimated from the speech of unspecified speakers and taking the recognition performance as the measure. In speech recognition, the best candidate is chosen according to likelihoods derived from acoustic models and feature parameters such as cepstral parameters, which are calculated from the recorded speech (D. Li and D. O'Shaughnessy, 2003) (L. Rabiner, 1993). As a result, the recognition performances and likelihoods are statistical results, since human errors and other such factors do not enter into them.
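The selection of the best candidate and the resulting recognition rate can be sketched as follows. This is only an illustration of the principle, not the decoder used in the experiments; the function names are hypothetical, and the log-likelihood values are assumed to be supplied by an acoustic model.

```python
def best_candidate(log_likelihoods):
    """Return the candidate word with the highest log-likelihood.

    log_likelihoods maps each candidate word to the log-likelihood that an
    acoustic model assigns to the observed feature parameters.
    """
    return max(log_likelihoods, key=log_likelihoods.get)


def recognition_rate(results):
    """Percentage of (reference, recognized) word pairs that match."""
    if not results:
        return 0.0
    correct = sum(1 for reference, recognized in results if reference == recognized)
    return 100.0 * correct / len(results)
```

A clearer signal is expected to give the correct word a higher likelihood than its competitors, which raises the recognition rate. This is the sense in which recognition performance serves as a proxy for sound quality.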

Experimental conditions
Table 4 shows the experimental conditions for isolated word recognition. The experiment employs the Julius speech recognition decoder, a large-vocabulary continuous-speech recognition system for the Japanese language (T. Kawahara et al., 1999) (A. Lee et al., 2001). The decoder requires a dictionary, acoustic models and language models. The dictionary describes each word as a sequence of sub-words, such as phonemes and syllables, which are the units of the acoustic models. Language models give the probability of the present word given the preceding word in corpora. The purpose of the experiment is only to evaluate the clarity of the signals, that is, their similarity to the acoustic models. Since language models are not required in this experiment, Julian version 3.4.2 is used for isolated-word recognition. The experiments therefore use the same acoustic models, estimated by HTK with JNAS, to evaluate the closeness of the signals to the models, which is greatest when the highest recognition performance is achieved (S. Young et al., 2000) (K. Itou et al., 1999).

Experimental results
Table 5 shows the isolated word recognition results for each data set, and

Improvement on sound quality of body-conducted speech in word unit
The OFBG microphone can measure a higher-quality signal than the BCS from an accelerometer. To realize conversations without stress, signals with improved sound quality are required. Consequently, one aim of this research is to devise and examine a method for improving sound quality. Many of the studies introduced in the introduction of this chapter do not take into account that a BCS lacks frequency components at 2 kHz and higher. Conventional retrieval methods for BCS that are mindful of this condition have been proposed and investigated, but they need the speech and its parameters, and speech is not easily measured in noisy environments. A signal retrieval method that works from the BCS alone is therefore needed. To realize this idea, a signal retrieval method that requires neither speech nor other parameters was devised, because effective frequency components above 2 kHz are present in the signals, although with very low gain.

Differential acceleration
Formula (1) gives the equation for estimating the differential acceleration from the original BCS.

  () ( 1 ) ()
x_differential(i) is the differential acceleration signal calculated from each frame of a BCS. Because its amplitude is low, the gain needs to be adjusted to a level suitable for hearing or processing. Figure 7 shows the differential acceleration estimated from Figure 6 using Formula (1), with the gain adjusted. The differential acceleration signal appears to be composed of speech mixed with stationary noise, so the noise was expected to be completely removable with a noise reduction method because the signal has a high SNR compared to the original signal. Consequently, a signal estimation method using differential acceleration and a conventional noise reduction method was proposed (M. Nakayama et al., 2011).
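As a minimal sketch, and assuming that the differential acceleration is the first-order difference between successive samples of the BCS, the estimation and the subsequent gain adjustment could be written as follows. The function names and the peak-normalization rule are illustrative assumptions; the source does not state how the gain was adjusted.

```python
def differential_acceleration(x):
    """First-order difference x(i+1) - x(i) of a sequence of samples."""
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]


def normalize_peak(signal, target_peak=1.0):
    """Scale the signal so that its largest absolute sample equals target_peak.

    Peak normalization is an assumed gain rule, used here only because the
    differenced signal has a low amplitude.
    """
    peak = max((abs(s) for s in signal), default=0.0)
    if peak == 0.0:
        return list(signal)  # silent input: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in signal]
```

Differencing acts as a simple high-pass operation, which is consistent with the observation that it emphasizes the weak components above 2 kHz while lowering the overall amplitude.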

Noise reduction method
As a first approach to noise reduction, the effectiveness of a spectral subtraction method for the reduction of stationary noise was examined. However, the improvement in the frequency components was inadequate with this approach. A spectral subtraction method simply subtracts the noise spectrum, whereas a Wiener-filtering method is expected to estimate the spectral envelope of speech using linear prediction coefficients. Therefore, extraction of a clear signal was attempted using the Wiener-filtering method, which can estimate and obtain the effective frequency components from noisy speech. Formula (2) shows the equation used for the Wiener-filtering method.
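A common frequency-domain form of the Wiener filter, given here as an assumption about the form that Formula (2) takes, expresses the estimated spectrum as the ratio of the speech spectrum to the sum of the speech and noise spectra:

```latex
H_{\mathrm{Estimate}}(\omega) =
  \frac{H_{\mathrm{Speech}}(\omega)}
       {H_{\mathrm{Speech}}(\omega) + H_{\mathrm{Noise}}(\omega)}
  \qquad (2)
```

In this form the gain approaches 1 at frequencies where speech dominates and approaches 0 where noise dominates, so that noisy components are attenuated rather than simply subtracted.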
The estimated spectrum H_Estimate(ω) can be converted to a retrieval signal from the differential acceleration signal. It is calculated from the speech spectrum H_Speech(ω) and the noise spectrum H_Noise(ω). In particular, H_Speech(ω) is calculated with autocorrelation functions and linear prediction coefficients using the Levinson-Durbin algorithm (J. Durbin, 1960), and H_Noise(ω) is then estimated using autocorrelation functions.
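The estimation of a speech spectrum envelope from autocorrelation functions and linear prediction coefficients can be sketched as follows. This is a textbook Levinson-Durbin recursion written with the Python standard library, not the authors' implementation; the function names and the unnormalized autocorrelation are assumptions made for illustration.

```python
import cmath


def autocorrelation(frame, max_lag):
    """Unnormalized autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [
        sum(frame[t] * frame[t + lag] for t in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for linear prediction coefficients.

    Returns (a, error), where a[0] == 1.0, the prediction error filter is
    A(z) = sum_k a[k] z^-k, and error is the final prediction error power.
    """
    a = [1.0] + [0.0] * order
    error = r[0]
    for m in range(1, order + 1):
        if error <= 0.0:
            break  # degenerate frame, for example all zeros
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        reflection = -acc / error
        prev = a[:]
        for k in range(1, m):
            a[k] = prev[k] + reflection * prev[m - k]
        a[m] = reflection
        error *= 1.0 - reflection * reflection
    return a, error


def lpc_power_spectrum(a, error, omega):
    """All-pole power spectrum error / |A(e^{j omega})|^2 at omega (radians/sample)."""
    denom = sum(coef * cmath.exp(-1j * omega * k) for k, coef in enumerate(a))
    return error / abs(denom) ** 2
```

For an autocorrelation sequence of 1.0 and 0.5 at lags 0 and 1, a first-order recursion gives the coefficients 1.0 and -0.5 with an error power of 0.75, and the resulting envelope at zero frequency is 3.0. A low order gives a very smooth envelope, which captures the overall spectral tilt rather than individual formants.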

Evaluations
Signal retrieval for a signal measured by an OFBG microphone is performed using the same parameters of the method, because the propagation path of body-conducted speech in the human body is not affected by whether the environment is quiet or noisy. Figure 8 shows the signal retrieved from Figure 7 using the Wiener-filtering method, where the linear prediction coefficients and autocorrelation functions are 1 and the frame width is 764 samples. The procedure was repeated five times on the signal to remove stationary noise. In the retrieved signal, high-frequency components at 2 kHz and above were recovered with these settings. The proposed method can therefore also be applied to obtain a clear signal from body-conducted speech measured with an OFBG microphone in an environment with loud noise and a high magnetic field.
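The repeated, frame-wise application of the noise reduction can be sketched as follows. The frame width of 764 samples and the five passes follow the settings stated above; the non-overlapping framing and the `process_frame` callback, which stands in for the Wiener-filtering step, are assumptions made for illustration.

```python
def split_frames(signal, frame_width=764):
    """Split a signal into consecutive, non-overlapping frames."""
    return [signal[i:i + frame_width] for i in range(0, len(signal), frame_width)]


def repeat_framewise(signal, process_frame, frame_width=764, passes=5):
    """Apply process_frame to every frame, then repeat on the result.

    process_frame is a placeholder for the noise reduction step; it takes a
    list of samples and returns the processed list of samples.
    """
    current = list(signal)
    for _ in range(passes):
        output = []
        for frame in split_frames(current, frame_width):
            output.extend(process_frame(frame))
        current = output
    return current
```

Each pass operates on the output of the previous one, so residual stationary noise left after one pass can be reduced further in the next.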

Improvement on sound quality of body-conducted speech in sentence unit
The effectiveness of signal retrieval for word-unit body-conducted speech measured by an accelerometer and an OFBG microphone has been demonstrated in the preceding sections. Although the effectiveness for word units has been shown, signals in sentence units need to be examined for practical use, such as conversations in the noisy environment. The investigation of sentence units is an important evaluation because it could revolutionize speech communication in such an environment. As a first step in signal retrieval for sentence units, the method is applied with the settings used for word units, because the transfer function between the microphone and the sound source seems to change little between word and sentence units, and body-conducted speech in sentence units measured directly by an accelerometer and an OFBG microphone is examined.

Body-conducted speech from an accelerometer
In experiments on signal retrieval using an accelerometer, speech and body-conducted speech were measured in a quiet room of our laboratory and in the engine room of the training ship at the Oshima National College of Maritime Technology, a noisy environment in which a main engine and two generators are working; these are shown in Figures 9 (a) and (b). The recording environment is again that of Table 1; however, the speaker differs from the speaker in the former section. Noise levels within the engine room under the two conditions of anchorage and cruising were 93 and 98 dB SPL, respectively, and the SNRs measured from the microphone were -20 and -25 dB, respectively. In this research, signals recorded under the cruising condition are used to estimate retrieval signals.
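The relation between the noise levels and the SNR values can be checked with simple decibel arithmetic. The sketch below assumes that the SNR is the difference between the speech level and the noise level at the same measurement point; the function names are illustrative.

```python
import math


def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels from two mean-square powers."""
    return 10.0 * math.log10(signal_power / noise_power)


def implied_speech_level_db(noise_level_db_spl, snr_in_db):
    """Speech level implied by a noise level and an SNR, both in decibels."""
    return noise_level_db_spl + snr_in_db
```

Under this assumption, 93 dB SPL with an SNR of -20 dB and 98 dB SPL with an SNR of -25 dB both imply a speech level of 73 dB SPL, so the two conditions are consistent with a speaker talking at the same level in both.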
A 22-year-old male uttered sentence A01 from the ATR503 sentence database, a sentence commonly used in speech recognition and its applications (M. Abe et al., 1991). The sentence is composed of the following sub-words in units of mora. Figures 10 and 11 show the speech and the body-conducted speech in sentence units measured by a conventional microphone and an accelerometer in the quiet room when the speaker uttered the sentence. Although the accelerometer was held with the fingers, the sounds were measured clearly because it was held firmly against the upper lip with a suitable pressure. Figure 12 shows the differential acceleration obtained from Figure 11, which is a clear signal with little noise because the BCS has a high SNR.
Figures 13 and 14 show the speech and the body-conducted speech in sentence units in the noisy environment. Speech is completely swamped by the intense noise from the engine and generators. On the other hand, the body-conducted speech in Figure 14 is slightly affected by the noise but can still be measured. Because the signal in Figure 14 has a low SNR, the performance of signal retrieval from the differential acceleration in Figure 15 is expected to be reduced.
Figure 16 shows that signal retrieval from the differential acceleration works well when the method is applied four times, since the performance is sufficient to recover the frequency characteristics. As a result, it is concluded that the body-conducted speech should be as clear as possible, without noise disturbance, for the retrieval to work well.

Body-conducted speech from OFBG microphone
The quality of the signal measured by the OFBG microphone in the noisy environment of an MRI room was investigated here. A speaker uttered sentence A01 during the operation of the MRI devices, in a noise environment of 81 dB SPL. Because a sound-level meter was not permitted in the room, the noise level was measured in front of the door of the room. Figure 17 shows the signal of the uttered sentence recorded by the OFBG microphone in the MRI room while the MRI equipment was in operation. Since the signal is clear, it is expected that its frequency characteristics can be recovered with the signal retrieval method. Figures 18 and 19 show the differential acceleration and the retrieved signal from the OFBG microphone in the MRI room while the MRI equipment was in operation, with the method applied three times. These figures confirm the improvement in the sound quality of BCS in sentence units, and it is also concluded that retrieval works best when the SNR of the BCS is high.

Conclusions and future works
This section presents improvements in the sound quality of body-conducted speech measured with an accelerometer and an OFBG microphone. An MRI room, in particular, is an environment with loud noise and a high magnetic field. An accelerometer, the conventional body-conducted speech microphone, cannot be brought into this environment because it is made from magnetic materials. For conversations and communications between a patient and an operator in the room, an OFBG microphone is proposed, which can measure clearer signals than an accelerometer.
The performance of the signal retrieval method for sentences recorded with an accelerometer and an OFBG microphone was then evaluated, and its effectiveness was confirmed with time-frequency analysis and speech recognition. Against this background, the estimation of clear body-conducted speech in sentence units from an OFBG microphone was investigated with our signal retrieval method, which combines differential acceleration and noise reduction. Applying the method to the measured signal recovered its sound quality, which was evaluated using time-frequency analysis. The retrieval method can thus also be applied to a signal measured by an OFBG microphone with the same settings, because the conduction path is not affected by noise in the air. The signals were measured in quiet and noisy rooms, specifically an engine room and an MRI room. As a first step, clear signals were obtained by employing the signal retrieval method with the same settings as used for word units. To obtain a clearer signal with the signal retrieval method, the pressure with which the microphone is held is important, as is a high SNR in the original BCS.
As future work, the signal retrieval method needs to be extended for practical use and its algorithm improved.

Table 1. Recording environments for microphone and accelerometer

Table 4. Experimental conditions for isolated word recognition

Table 6 gives the averages of the recognition results for each speaker. The recognition results for the OFBG microphone are found to be superior to those for the conventional BCS microphone. The differences in isolated-word recognition rates are about 15% to 35%. These results show the effectiveness of the OFBG microphone when signals are measured clearly with it.

Table 5. Recognition results of isolated word recognition in each data set