Wavelet Based Speech Strategy in Cochlear Implant

A significant percentage of the population in developed countries experiences hearing impairment. The cochlear implant was developed to improve the hearing of these people. In recent years, both adults and children have benefited from cochlear implants, and users have gained from improvements in implant techniques. Although these devices have improved performance, a significant gap in speech recognition still remains between cochlear implant users and people with normal hearing.


Introduction
The cochlear implant prosthesis works by directly stimulating the auditory nerve cells of deaf people whose receptor cells in the cochlea have been destroyed (House & Berliner, 1982; House & Urban, 1979; Christiansen & Leigh, 2002; Loeb, 1990; Parkins & Anderson, 1983; Wilson, 1993). The system basically consists of the following sections: microphone, speech processor, transmitter, receiver and electrode array. Improvements to this prosthesis have mainly concerned implant stimulation systems, signal processing strategies, and techniques for fitting the device to individual patients. New developments in this area continue, especially in signal processing strategies.
Speech processing techniques are very important in increasing the users' hearing potential (Eddington, 1980; Wilson et al., 1991; Dorman et al., 1997; Moore & Teagle, 2002). A variety of strategies have been developed in recent years with the aim of improving the hearing abilities of deaf people and bringing them closer to those of people with natural hearing. In particular, various speech processing strategies have been developed for multi-channel cochlear implants (Derbel et al., 1994; Cheikhrouhou et al., 2004; Gopalakrishna et al., 2010; Millar et al., 1984). These strategies can be classified into three main groups: waveform strategies, feature extraction strategies and hybrid strategies (the N-of-M strategy).
The wavelet transform is a fundamental tool for noise filtering, compression and analysis of nonstationary signals and images. It is well suited to semistationary signals and provides good resolution in both time and frequency. Several studies have applied the wavelet transform to speech processing, and in speech enhancement it has been found to give better results than traditional methods. The wavelet packet transform is a generalization of the discrete wavelet transform in which the detail (high-frequency) subbands are decomposed further as well, so subband analysis is not restricted to the low-frequency branch. Basically, the wavelet packet

Strategies for cochlear implants
Multi-channel implants provide electrical stimulation in the cochlea using an array of electrodes, so that different auditory nerve fibers can be stimulated at different places in the cochlea. Each electrode is associated with a frequency region of the signal, following the frequency-to-place arrangement of the hair cells: electrodes near the base of the cochlea are stimulated with high-frequency signals, while electrodes near the apex are stimulated with low-frequency signals.
The various signal processing strategies developed for multi-channel cochlear implants can be examined under two main categories: waveform strategies and feature extraction strategies. These strategies extract information from the speech signal and deliver it to the electrodes. Waveform strategies use some type of waveform (in analog or pulsatile form) derived by filtering the speech signal into different frequency bands. Feature extraction strategies use spectral features, such as formants, derived by feature extraction algorithms.
Several parameters determine how the acoustic signal information is presented to the electrodes in these strategies. The first is the number of electrodes used for stimulation, which determines the frequency resolution. Typically, 16 to 22 electrodes are used; the useful number also depends on the individual implant recipient, in particular on the distribution of the surviving neuron population. The second parameter is the electrode configuration, which is chosen to control how the electric current spreads around the electrodes. The most important parameter is the amplitude of the electric current, which is derived from the filtered waveform by an envelope detection algorithm. This amplitude is constrained by the range of stimulation loudness levels the patient can perceive. The current amplitudes carry spectral information through the time-varying levels on each electrode and the relative levels across the electrodes stimulated in the same cycle.
The main strategies are discussed below.

Compressed analog approach
The compressed analog approach was developed mainly by the Symbion company (Eddington, 1980). In this system, the audio signal is first compressed by an automatic gain control unit. The signal is then filtered into four contiguous frequency bands with center frequencies of 0.5, 1, 2 and 3.4 kHz. Finally, the filtered waveforms are converted into stimulation format and sent to four electrodes placed in the cochlea by surgery (Dorman et al., 1989).
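The band-splitting stage of this approach can be sketched as a bank of four FIR bandpass filters. This is a minimal illustration, not the Symbion implementation: the automatic gain control is omitted, and because the text gives only center frequencies, we assume (hypothetically) that adjacent bands meet at the geometric mean of their centers. The windowed-sinc design, the 255-tap length and the 16 kHz sampling rate are our own choices.

```python
import numpy as np

CENTERS_HZ = (500.0, 1000.0, 2000.0, 3400.0)  # center frequencies from the text

def band_edges(centers):
    """Hypothetical band edges: adjacent bands meet at the geometric mean of their
    centers, and the outer edges are placed symmetrically on a log-frequency axis."""
    c = np.asarray(centers, dtype=float)
    inner = np.sqrt(c[:-1] * c[1:])
    lower = np.concatenate(([c[0] ** 2 / inner[0]], inner))
    upper = np.concatenate((inner, [c[-1] ** 2 / inner[-1]]))
    return lower, upper

def bandpass_fir(f_lo, f_hi, fs, num_taps=255):
    """Windowed-sinc bandpass: difference of two ideal lowpass responses, Hamming window."""
    n = np.arange(num_taps) - (num_taps - 1) / 2.0
    h = (2 * f_hi / fs) * np.sinc(2 * f_hi * n / fs) - (2 * f_lo / fs) * np.sinc(2 * f_lo * n / fs)
    return h * np.hamming(num_taps)

def analog_filter_bank(x, fs, centers=CENTERS_HZ):
    """Return one bandpass waveform per electrode, shape (len(centers), len(x))."""
    x = np.asarray(x, dtype=float)
    lower, upper = band_edges(centers)
    return np.stack([np.convolve(x, bandpass_fir(lo, hi, fs), mode="same")
                     for lo, hi in zip(lower, upper)])

fs = 16000.0
t = np.arange(1600) / fs
bands = analog_filter_bank(np.sin(2 * np.pi * 1000.0 * t), fs)
energy = np.sum(bands ** 2, axis=1)
print(np.argmax(energy))  # 1: the 1 kHz tone falls in the second band
```

In the analog strategy all four outputs would then be delivered to their electrodes at the same time, which is exactly what produces the channel interaction discussed next.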
The compressed analog approach delivers useful spectral information to the electrodes, but it suffers from channel interaction. Because the stimulation is analog, the stimuli are delivered continuously and simultaneously to all four electrodes. This simultaneous stimulation causes channel interaction, which can degrade the performance of the device.

Continuous interleaved sampling
The continuous interleaved sampling (CIS) approach was developed by researchers at the Research Triangle Institute (Wilson et al., 1991). In this strategy, the electrodes are not stimulated simultaneously; instead, they receive interleaved, non-overlapping pulses. The pulse amplitude on each channel is derived from the envelope of the corresponding bandpass filter output. This envelope is compressed and used to modulate biphasic pulses, which electrically excite the patient's auditory nerve. A nonlinear compression function (e.g., a logarithmic function) is used so that the envelope output fits the patient's dynamic range. A block diagram of the continuous interleaved sampling strategy is shown in Figure 1 (Loizou, 1998).
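The three CIS stages named above (envelope extraction, logarithmic compression into the patient's dynamic range, and interleaved pulse timing) can be sketched as follows. This is an illustrative sketch under stated assumptions: the envelope detector (full-wave rectification followed by a 200 Hz FIR lowpass), the threshold and comfort levels `thr`/`mcl` in arbitrary current units, and the 800 pulses/s per-channel rate are hypothetical values, not taken from Wilson et al. (1991).

```python
import numpy as np

def envelope(band_signal, fs, cutoff_hz=200.0, num_taps=101):
    """Envelope by full-wave rectification and a windowed-sinc lowpass (unity DC gain)."""
    rectified = np.abs(np.asarray(band_signal, dtype=float))
    n = np.arange(num_taps) - (num_taps - 1) / 2.0
    lp = np.sinc(2 * cutoff_hz * n / fs) * np.hamming(num_taps)
    lp /= lp.sum()
    return np.convolve(rectified, lp, mode="same")

def log_compress(env, env_min, env_max, thr, mcl):
    """Map envelope values in [env_min, env_max] logarithmically onto the
    electrical dynamic range [thr, mcl] (threshold to most comfortable level)."""
    env = np.clip(env, env_min, env_max)
    frac = np.log(env / env_min) / np.log(env_max / env_min)
    return thr + frac * (mcl - thr)

def interleaved_pulse_times(num_channels, rate_hz, duration_s):
    """(time, channel) pairs: each channel fires once per period, offset by
    period / num_channels, so no two channels are ever stimulated at once."""
    period = 1.0 / rate_hz
    slot = period / num_channels
    return [(k * period + ch * slot, ch)
            for k in range(int(duration_s * rate_hz))
            for ch in range(num_channels)]

fs = 16000.0
t = np.arange(1600) / fs
env = envelope(np.sin(2 * np.pi * 1000.0 * t), fs)  # close to 2/pi away from the edges
levels = log_compress(env, env_min=1e-3, env_max=1.0, thr=100.0, mcl=200.0)
events = interleaved_pulse_times(num_channels=6, rate_hz=800.0, duration_s=0.01)
```

In a full processor, the biphasic pulse fired at each `(time, channel)` event would take its amplitude from that channel's compressed envelope sampled at the event time.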

N-of-M (N-M) speech processing
In the N-M strategy, the audio signal is divided into M frequency bands, and the processor selects the N envelope outputs with the highest energy (see Figure 2). Only the electrodes corresponding to the N selected outputs are stimulated in each cycle (Nogueira et al., 2006). For example, in a 6-of-22 strategy, only 6 of the 22 channel outputs are selected, so only these 6 channels are stimulated. The N-M strategy is considered a hybrid strategy because it also includes a feature representation. A general block diagram of feature extraction in the N-M system is given in Figure 3 (Loizou, 1998).
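The per-cycle selection step can be sketched in a few lines. The sketch assumes the M channel envelopes for each stimulation cycle are already available as the columns of an array; the function name `select_n_of_m` is ours, and ties are broken arbitrarily by the sort.

```python
import numpy as np

def select_n_of_m(envelopes, n):
    """envelopes: array of shape (M, cycles). For every cycle keep the n channels
    with the largest envelope and zero the rest. Returns (selected, mask)."""
    env = np.asarray(envelopes, dtype=float)
    top = np.argsort(env, axis=0)[-n:, :]  # row indices of the n largest in each column
    mask = np.zeros(env.shape, dtype=bool)
    np.put_along_axis(mask, top, True, axis=0)
    return np.where(mask, env, 0.0), mask

# 6-of-22 example with one cycle: channel k has envelope k, so channels 16..21 win.
env = np.arange(22, dtype=float)[:, None]
selected, mask = select_n_of_m(env, n=6)
print(np.flatnonzero(mask[:, 0]))  # [16 17 18 19 20 21]
```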

Wavelet transform
A time signal can be represented by a series of coefficients based on an analysis function. For example, a signal can be transformed from the time domain to the frequency domain; the oldest and best-known method for this is the Fourier transform. Joseph Fourier developed his method, which represents the content of a signal using basis functions, in 1807. Building on this work, Alfred Haar introduced the first wavelet basis in 1909. In the 1930s, Paul Levy improved the Haar basis function by varying its scale. In 1981, Jean Morlet and Alex Grossmann developed a transform that decomposes a signal into wavelet coefficients and reconstructs the original signal from these coefficients. Stephane Mallat and Yves Meyer then derived the multiresolution decomposition using wavelets. Later, Ingrid Daubechies used multiresolution theory to construct her own family of wavelets. The set of orthonormal wavelet basis functions based on Daubechies' work is the milestone of today's wavelet applications. With these developments, theoretical investigations of wavelet analysis multiplied (Merry, 2005).
Generally, the Fourier transform is efficient for stationary and pseudo-stationary signals, but it is not suitable for nonstationary signals such as noisy and aperiodic signals. Such signals can be analyzed with local transformation methods: the short-time Fourier transform, time-frequency distributions and wavelets. All of these techniques analyze the signal through the correlation between the original signal and an analysis function.
The wavelet transform can be classified as the continuous wavelet transform, the discrete wavelet transform and the fast wavelet transform (Holschneider, 1989; Mallat, 1998; Meyer, 1992). It has been applied in a wide range of fields, including signal and image processing and biomedical signal processing. Its most important advantage is that it allows local analysis of the signal. Wavelet analysis can also reveal features such as discontinuities and corruptions in a signal.
A wavelet function ψ(t) is a small wave: it must decay to zero as quickly as possible while still oscillating. Therefore, it contains a range of different frequencies; see Figure 4 (Misiti et al., 2000).
The wavelet basis corresponds to a grid decomposition of the phase (time-frequency) plane, as shown in Figure 5 (Pereyra & Mohlenkamp, 2004).
A wavelet must have finite energy:
$$E = \int_{-\infty}^{\infty} |\psi(t)|^2\, dt < \infty,$$
where E is the energy of the wavelet function. That is, the energy of the analyzing function, equal to the integrated squared magnitude of ψ(t), must be finite.
If Ψ(ω) is the Fourier transform of ψ(t), then ψ(t) must also satisfy the so-called admissibility condition:
$$C_\psi = \int_{0}^{\infty} \frac{|\Psi(\omega)|^2}{\omega}\, d\omega < \infty.$$
This condition implies that the wavelet has no zero-frequency component, Ψ(0) = 0; equivalently, the mean of the wavelet must equal zero.
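As a numerical illustration of the finite-energy and admissibility conditions, the sketch below uses the Mexican hat wavelet ψ(t) = (1 − t²)e^{−t²/2} (our choice for illustration; the text does not name a wavelet here), whose Fourier transform is Ψ(ω) = √(2π) ω² e^{−ω²/2} under the convention Ψ(ω) = ∫ψ(t)e^{−iωt}dt. It approximates the integrals by simple Riemann sums on fine grids.

```python
import numpy as np

def mexican_hat(t):
    """Unnormalized Mexican hat wavelet (negative second derivative of a Gaussian)."""
    return (1.0 - t**2) * np.exp(-t**2 / 2.0)

def mexican_hat_ft(w):
    """Its Fourier transform, Psi(w) = sqrt(2*pi) * w**2 * exp(-w**2 / 2)."""
    return np.sqrt(2.0 * np.pi) * w**2 * np.exp(-w**2 / 2.0)

# Time grid wide enough that the Gaussian tail is negligible.
t = np.linspace(-20.0, 20.0, 400_001)
dt = t[1] - t[0]
psi = mexican_hat(t)

energy = np.sum(psi**2) * dt  # finite; the exact value is (3/4) * sqrt(pi)
mean = np.sum(psi) * dt       # zero mean, i.e. Psi(0) = 0

# Admissibility constant: integral over (0, inf) of |Psi(w)|^2 / w; the exact value is pi.
w = np.linspace(1e-8, 20.0, 200_001)
dw = w[1] - w[0]
C_psi = np.sum(np.abs(mexican_hat_ft(w))**2 / w) * dw

print(f"E = {energy:.6f}, mean = {mean:.2e}, C_psi = {C_psi:.6f}")
```

Note that Ψ(ω) vanishes like ω² at the origin, which is what keeps |Ψ(ω)|²/ω integrable near ω = 0.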

Continuous wavelet transform
As a family of functions, wavelets can be constructed by translating and dilating a single wavelet ψ(t), called the mother wavelet:
$$\psi_{a,b}(t) = \frac{1}{\sqrt{|a|}}\,\psi\!\left(\frac{t-b}{a}\right), \qquad a \neq 0,$$
where b is the translation parameter and a is the scale (dilation) parameter. The integral transform of a signal f(t) is given by (Polikar, 1999)
$$X(a,b) = \int_{-\infty}^{\infty} f(t)\,\psi_{a,b}^{*}(t)\,dt,$$
where * denotes the complex conjugate, which matters when a complex wavelet is used. X(a,b) is called the continuous wavelet transform of f(t). The factor 1/√|a| normalizes the energy so that the wavelet has the same energy at every scale. When the scale parameter changes, both the center frequency of the wavelet and its window length change; scale is therefore used instead of frequency in wavelet analysis. The translation parameter identifies the location of the wavelet in time, and changing it shifts the wavelet along the signal. In the time-scale plane, each row corresponds to a constant scale and varying translation, and each column to a constant translation and varying scale. The coefficients X(a,b) are called wavelet coefficients; each is associated with a scale (related to frequency) and a position in time.
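The definition above can be evaluated directly on a sampled signal. The sketch below is a minimal illustration, assuming a real, even Mexican hat mother wavelet (so the conjugate has no effect and the correlation integral becomes an ordinary convolution) and approximating the integral by a Riemann sum. The function name `cwt_mexh`, the truncation of the wavelet at |t| = 5a and the test tones are our own choices, not part of the described method.

```python
import numpy as np

def mexican_hat(t):
    """Real, even mother wavelet psi(t) = (1 - t^2) exp(-t^2 / 2)."""
    return (1.0 - t**2) * np.exp(-t**2 / 2.0)

def cwt_mexh(x, fs, scales):
    """Riemann-sum approximation of X(a, b) = integral of x(t) psi_{a,b}(t) dt,
    with psi_{a,b}(t) = |a|^(-1/2) psi((t - b) / a), evaluated at every sample b.
    Scales are in seconds. Returns an array of shape (len(scales), len(x))."""
    x = np.asarray(x, dtype=float)
    dt = 1.0 / fs
    coefs = np.empty((len(scales), len(x)))
    for i, a in enumerate(scales):
        half = int(np.ceil(5.0 * a * fs))  # wavelet is negligible beyond |t| = 5a
        if 2 * half + 1 > len(x):
            raise ValueError("scale too large for this signal length")
        tau = np.arange(-half, half + 1) * dt
        kernel = mexican_hat(tau / a) / np.sqrt(a)  # 1/sqrt(a): equal energy at every scale
        # psi is even, so correlating with psi_{a,b} equals convolving with it
        coefs[i] = np.convolve(x, kernel, mode="same") * dt
    return coefs

fs = 1000.0
t = np.arange(2000) / fs
scales = np.geomspace(0.002, 0.1, 40)

def dominant_scale(freq_hz):
    """Scale with the largest coefficient energy (edges excluded) for a pure tone."""
    c = cwt_mexh(np.sin(2 * np.pi * freq_hz * t), fs, scales)
    return scales[np.argmax(np.sum(c[:, 500:-500] ** 2, axis=1))]

print(dominant_scale(5.0), dominant_scale(40.0))  # the lower tone peaks at a larger scale
```

The result illustrates the inverse relation between scale and frequency discussed below: the 5 Hz tone concentrates its energy at a scale several times larger than the 40 Hz tone.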
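The inverse continuous wavelet transform can also be defined; for a real wavelet and positive scales it reads
$$f(t) = \frac{1}{C_\psi}\int_{0}^{\infty}\int_{-\infty}^{\infty} X(a,b)\,\psi_{a,b}(t)\,db\,\frac{da}{a^{2}}.$$
Note that the admissibility constant C_ψ must satisfy the second wavelet condition, i.e. it must be finite.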
A wavelet function has a center frequency, and its scale varies inversely with this frequency. A small scale corresponds to high frequencies and captures detailed information about the signal; a large scale corresponds to low frequencies and gives coarser information. The wavelet transform is subject to the Heisenberg inequality, so the time-frequency product ΔtΔω is bounded below and stays constant across scales. When the scale decreases, Δt decreases (finer time resolution) while Δω increases, so that Δω is proportional to the center frequency ω. Consequently, the wavelet transform has a constant relative frequency resolution (Debnath, 2002).
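The constant relative resolution follows from how dilation acts in the frequency domain. The short derivation below assumes a > 0, the Fourier convention Ψ(ω) = ∫ψ(t)e^{−iωt}dt, and that Δt and Δω denote the spreads (standard deviations) of |ψ(t)|² and |Ψ(ω)|² around a time center and a center frequency ω_c (for a real wavelet, ω_c is taken over ω > 0).

```latex
\Psi_{a,b}(\omega) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{a}}\,
  \psi\!\left(\frac{t-b}{a}\right) e^{-i\omega t}\,dt
  = \sqrt{a}\,\Psi(a\omega)\,e^{-i\omega b}, \qquad a > 0

% substituting u = (t-b)/a: time spread scales by a, frequency center and spread by 1/a
\Delta t_{a} = a\,\Delta t, \qquad \omega_{a} = \frac{\omega_c}{a}, \qquad \Delta\omega_{a} = \frac{\Delta\omega}{a}

\Delta t_{a}\,\Delta\omega_{a} = \Delta t\,\Delta\omega, \qquad
\frac{\Delta\omega_{a}}{\omega_{a}} = \frac{\Delta\omega}{\omega_c}
```

Both right-hand sides are independent of a: the time-frequency product is fixed by the mother wavelet (and bounded below by the Heisenberg inequality), and the bandwidth-to-center-frequency ratio is the same at every scale.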
Like the Fourier transform, the continuous wavelet transform is a linear transformation. When it is evaluated only at discrete values of the scale and translation parameters, the resulting coefficients are called a wavelet series. The discretization can in principle be chosen arbitrarily, but it must still permit reconstruction of the signal.

Discrete wavelet transform
There are two ways to introduce the discrete wavelet. One is the discretization of the continuous wavelet transform and the other is through multiresolution analysis.
In many applications the data are represented by a finite sequence, so it is important to analyze a discrete form of the continuous wavelet transform. Mathematically, the two continuous parameters a and b are converted into discrete ones by introducing two positive constants a_0 > 1 and b_0 > 0 (Strang & Nguyen, 1997):
$$a = a_0^{m}, \qquad b = n\,b_0\,a_0^{m}, \qquad m, n \in \mathbb{Z},$$
which gives the discrete wavelets
$$\psi_{m,n}(t) = a_0^{-m/2}\,\psi\!\left(a_0^{-m}t - n\,b_0\right).$$
The discrete wavelet transform can thus be derived directly from the continuous version by substituting a = a_0^m and b = n b_0 a_0^m:
$$X_{m,n} = \int_{-\infty}^{\infty} f(t)\,\psi_{m,n}^{*}(t)\,dt.$$
The discrete wavelet transform therefore represents the function by a discrete set of wavelet coefficients in the time-scale domain, indexed by m and n. The common dyadic choice a_0 = 2, b_0 = 1 gives ψ_{m,n}(t) = 2^{−m/2}ψ(2^{−m}t − n).
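The aim of multiresolution analysis is to represent a signal as the limit of successive approximations, each corresponding to a different level of resolution. In multiresolution analysis, orthogonal wavelet bases are constructed by a definite set of rules: a multiresolution analysis is a sequence {V_j}, j ∈ ℤ, of closed subspaces of L²(ℝ) for which the following conditions hold.

1. Nesting: ⋯ ⊂ V_2 ⊂ V_1 ⊂ V_0 ⊂ V_{−1} ⊂ ⋯
2. Completeness: the intersection of all V_j is {0}, and their union is dense in L²(ℝ).
3. Dilation: f(t) ∈ V_j if and only if f(t/2) ∈ V_{j+1}.
4. Translation: f(t) ∈ V_j if and only if f(t − 2^j k) ∈ V_j for every k ∈ ℤ.
5. There exists a scaling function φ ∈ V_0 such that {φ(t − k)}_{k∈ℤ} is an orthonormal basis of V_0.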
The function φ is called the scaling function. If {V_j} is a multiresolution analysis of L²(ℝ) and V_0 is the closed subspace spanned by {φ(t − k)}, then φ generates the multiresolution analysis. The scaling function is therefore used to approximate the signal up to a particular level of detail.
A family of scaling functions is constructed from the mother scaling function by integer shifts and power-of-two stretches:
$$\phi_{j,k}(t) = 2^{-j/2}\,\phi\!\left(2^{-j}t - k\right), \qquad j, k \in \mathbb{Z}.$$
Wavelets are defined in the same way:
$$\psi_{j,k}(t) = 2^{-j/2}\,\psi\!\left(2^{-j}t - k\right), \qquad j, k \in \mathbb{Z}.$$
In multiresolution analysis, the scaling function φ acts as a lowpass filter and the wavelet ψ as a highpass filter; together they provide localization in time and frequency.
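The lowpass and highpass roles become explicit in the two-scale (refinement) relations that link each level to the next. The block below states them in Mallat's convention (V_{j+1} ⊂ V_j), with filters h[n] and g[n] related by the conjugate mirror choice g[n] = (−1)^{1−n}h[1−n], and checks them on the Haar scaling function as a worked example; the Haar case is our illustration, not the wavelet used later in the chapter.

```latex
% two-scale relations
\frac{1}{\sqrt{2}}\,\phi\!\left(\frac{t}{2}\right) = \sum_{n\in\mathbb{Z}} h[n]\,\phi(t-n),
\qquad
\frac{1}{\sqrt{2}}\,\psi\!\left(\frac{t}{2}\right) = \sum_{n\in\mathbb{Z}} g[n]\,\phi(t-n),
\qquad g[n] = (-1)^{1-n}\,h[1-n]

% Haar: \phi = \mathbf{1}_{[0,1)},\ h[0] = h[1] = 1/\sqrt{2}\ \Rightarrow\ g[0] = -1/\sqrt{2},\ g[1] = 1/\sqrt{2}
\phi\!\left(\frac{t}{2}\right) = \phi(t) + \phi(t-1),
\qquad
\psi\!\left(\frac{t}{2}\right) = \phi(t-1) - \phi(t)
\;\Longrightarrow\;
\psi(t) = \begin{cases} -1, & 0 \le t < \tfrac{1}{2},\\ 1, & \tfrac{1}{2} \le t < 1,\\ 0, & \text{otherwise.} \end{cases}
```

The first relation says a coarse box is the sum of two fine boxes (a local average, i.e. lowpass); the second says the wavelet is their difference (a local difference, i.e. highpass).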
The multiresolution formulation of the discrete wavelet transform was first given by Mallat; for detailed information, the reader is referred to Mallat (1989).

Fast wavelet transform
The fast wavelet transform is a method for computing the discrete wavelet transform, just as the fast Fourier transform computes the discrete Fourier transform.
In the discrete Fourier transform, the computation can be made fast because the transform matrix can be factored into a product of sparse elementary matrices. Moreover, the (normalized) transform matrix is unitary, so its inverse equals its conjugate transpose; this follows from choosing the unit vectors as the basis in the time domain and the complex exponentials as the basis in the frequency domain. A similar observation holds for the fast wavelet transform derived from the discrete wavelet transform: for an orthogonal wavelet the transform matrix is orthogonal, and it factors into cascaded filtering and downsampling stages, so for a fixed filter length the whole transform costs O(N) operations for a signal of length N. Detailed information can be found in Mallat's book (Mallat, 1998).
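The pyramid structure can be illustrated with the Haar filters, chosen here only because they keep the code short (the strategy in this chapter uses Daubechies 10). Each level splits the current approximation into a half-length approximation and a half-length detail, so the total work is N/2 + N/4 + … < N filter evaluations; the orthogonality mentioned above shows up as exact energy preservation and perfect reconstruction. The function names are our own.

```python
import numpy as np

S = 1.0 / np.sqrt(2.0)  # Haar filter taps: h = [S, S], g = [-S, S]

def haar_fwt(x, levels):
    """Mallat pyramid with Haar filters. Returns [d_1, d_2, ..., d_J, a_J]."""
    a = np.asarray(x, dtype=float)
    if len(a) % (2 ** levels):
        raise ValueError("length must be divisible by 2**levels")
    coeffs = []
    for _ in range(levels):
        even, odd = a[0::2], a[1::2]
        coeffs.append(S * (odd - even))  # detail: highpass filter, then downsample
        a = S * (even + odd)             # approximation: lowpass filter, then downsample
    coeffs.append(a)
    return coeffs

def haar_ifwt(coeffs):
    """Inverse pyramid: upsample and merge approximation and detail level by level."""
    a = coeffs[-1]
    for d in reversed(coeffs[:-1]):
        x = np.empty(2 * len(a))
        x[0::2] = S * (a - d)
        x[1::2] = S * (a + d)
        a = x
    return a

x = np.random.default_rng(0).standard_normal(64)
coeffs = haar_fwt(x, levels=3)
print([len(c) for c in coeffs])  # [32, 16, 8, 8]
```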

Wavelet packet transform
The wavelet packet transform extends the traditional discrete wavelet transform by decomposing the detail coefficients as well as the approximation coefficients. With this method, better time and frequency resolution can be obtained depending on the content of the data. It was first proposed by Coifman, Meyer and Wickerhauser.
The wavelet packet transform is signal-adaptive, since the signal is decomposed by selecting the best set of functions from a library of bases. The basis is determined by selecting a subtree of the two-channel filter bank tree, from which the transform coefficients are obtained (Coifman & Wickerhauser, 1990). This selection is simple and inexpensive to compute. Wavelet packet analysis of a time series can be summarized as follows (Mallat, 1989).

A space V_j of a multiresolution analysis in L²(ℝ) is decomposed into a lower-resolution space V_{j+1} plus a detail space W_{j+1}, so that V_j = V_{j+1} ⊕ W_{j+1}. This divides the orthogonal basis {φ_j(t − 2^j n)}_{n∈ℤ} of V_j into two new orthogonal bases: {φ_{j+1}(t − 2^{j+1} n)}_{n∈ℤ} of V_{j+1} and {ψ_{j+1}(t − 2^{j+1} n)}_{n∈ℤ} of W_{j+1}. The decompositions of φ_{j+1} and ψ_{j+1} in the basis of V_j are specified by a pair of conjugate mirror filters h[n] and g[n], which are related by (Herley & Vetterli, 1994)
$$g[n] = (-1)^{1-n}\,h[1-n].$$
Each node of the binary tree is labeled by (j, k), where j − L ≥ 0 is the depth of the node in the tree and k is the number of nodes to its left at the same depth. Going down the tree, each node (j, k) is associated with a space W_j^k that admits the orthonormal basis {ψ_j^k(t − 2^j n)}_{n∈ℤ}. At the root, W_L^0 = V_L and ψ_L^0 = φ_L. The wavelet packet orthogonal bases at the children nodes are defined by
$$\psi_{j+1}^{2k}(t) = \sum_{n=-\infty}^{\infty} h[n]\,\psi_j^{k}(t - 2^{j}n), \qquad \psi_{j+1}^{2k+1}(t) = \sum_{n=-\infty}^{\infty} g[n]\,\psi_j^{k}(t - 2^{j}n),$$
where {ψ_j^k(t − 2^j n)}_{n∈ℤ} is orthonormal.
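The conjugate mirror relation can be checked numerically. The sketch below takes the 4-tap Daubechies lowpass filter (db2, a standard textbook filter used here only as an example), builds g[n] = (−1)^{1−n} h[1−n], and verifies the properties that make {h, g} an orthonormal two-channel filter bank: Σh = √2, unit energy, orthogonality to even shifts, and Σg = 0.

```python
import numpy as np

SQ3 = np.sqrt(3.0)
# db2 lowpass filter h[0..3]
h = np.array([1 + SQ3, 3 + SQ3, 3 - SQ3, 1 - SQ3]) / (4.0 * np.sqrt(2.0))

def mirror_highpass(h):
    """g[n] = (-1)**(1 - n) * h[1 - n]. For h supported on 0..K-1,
    g is supported on n = 2-K .. 1; returns (support, values)."""
    K = len(h)
    support = list(range(2 - K, 2))
    g = np.array([(-1) ** (1 - n) * h[1 - n] for n in support])
    return support, g

support, g = mirror_highpass(h)

checks = {
    "sum h = sqrt(2)": np.isclose(h.sum(), np.sqrt(2.0)),
    "sum h^2 = 1": np.isclose(np.sum(h ** 2), 1.0),
    "h orthogonal to its shift by 2": np.isclose(h[0] * h[2] + h[1] * h[3], 0.0),
    "sum g = 0 (highpass)": np.isclose(g.sum(), 0.0),
    "sum g^2 = 1": np.isclose(np.sum(g ** 2), 1.0),
    # g lives on -2..1, so aligning it with h on 0..3 gives sum_n h[n] g[n-2]
    "h orthogonal to g shifted by 2": np.isclose(np.dot(h, g), 0.0),
}
print(support, all(checks.values()))  # [-2, -1, 0, 1] True
```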
Because the detail spaces can also be split, the phase (time-frequency) plane can be partitioned in many different ways, and each choice of splits gives a wavelet packet decomposition. This procedure is shown in Figure 6, and the corresponding phase plane in Figure 7. The decomposition and reconstruction coefficients are
$$d_{j+1}^{2k}[n] = \left(d_j^{k} \star \bar{h}\right)[2n], \qquad d_{j+1}^{2k+1}[n] = \left(d_j^{k} \star \bar{g}\right)[2n],$$
$$d_j^{k}[n] = \left(\check{d}_{j+1}^{2k} \star h\right)[n] + \left(\check{d}_{j+1}^{2k+1} \star g\right)[n],$$
where h̄[n] = h[−n], ḡ[n] = g[−n], ⋆ denotes convolution, and ď is obtained by inserting a zero between consecutive samples of d. That is, the coefficients at the children nodes are obtained by subsampling the convolution of d_j^k with h̄ and ḡ. By iterating these equations, the wavelet packet coefficients of all branches of the tree are computed, as illustrated in Figure 8 and Figure 9. The best tree function is a one- or two-dimensional wavelet packet analysis function that computes the optimal subtree of an initial tree with respect to an entropy-type criterion (Coifman & Wickerhauser, 1992); the resulting tree may be much smaller than the initial one. Given the organization of the wavelet packet library, it is natural to count the decompositions obtainable from a given orthogonal wavelet. A signal of length N = 2^L can be expanded in α different ways, where α is the number of binary subtrees of a complete binary tree of depth L, and α ≥ 2^{N/2}. Since this number can be very large and explicit enumeration is generally intractable, it is useful to find an optimal decomposition with respect to a convenient criterion that can be computed by an efficient algorithm (Donoho, 1995; Guo et al., 2000; Johnstone, 1997).
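The best-tree search can be sketched as a recursive comparison: a node is kept when its cost does not exceed the summed best cost of its two children, which finds the cheapest of all the α subtrees without enumerating them. This is a minimal sketch in the spirit of Coifman and Wickerhauser, with two assumptions of ours: Haar filters stand in for Daubechies 10 to keep it short, and the entropy cost is −Σ p log p with p = c²/‖x‖² (normalized by the total signal energy so it stays additive across nodes). `count_bases` evaluates the recurrence B_{J+1} = B_J² + 1 for the number of subtrees.

```python
import numpy as np

S = 1.0 / np.sqrt(2.0)

def haar_split(x):
    """One Haar wavelet packet split into (lowpass, highpass) halves."""
    even, odd = x[0::2], x[1::2]
    return S * (even + odd), S * (odd - even)

def shannon_cost(c, total_energy):
    """Additive entropy cost -sum(p log p), with p = c**2 / total signal energy."""
    p = np.asarray(c, dtype=float) ** 2 / total_energy
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def best_basis(x, max_depth):
    """Keep a node if its cost does not exceed the best summed cost of its two
    children (searched recursively down to max_depth). Returns (cost, leaves),
    where leaves are (depth, index, coefficients) in natural order."""
    x = np.asarray(x, dtype=float)
    if len(x) % (2 ** max_depth):
        raise ValueError("length must be divisible by 2**max_depth")
    total = float(np.sum(x ** 2))
    if total == 0.0:
        raise ValueError("signal has zero energy")

    def search(c, depth, index):
        own = shannon_cost(c, total)
        if depth == max_depth:
            return own, [(depth, index, c)]
        lo, hi = haar_split(c)
        cost_lo, leaves_lo = search(lo, depth + 1, 2 * index)
        cost_hi, leaves_hi = search(hi, depth + 1, 2 * index + 1)
        if own <= cost_lo + cost_hi:
            return own, [(depth, index, c)]
        return cost_lo + cost_hi, leaves_lo + leaves_hi

    return search(x, 0, 0)

def count_bases(depth):
    """Number of wavelet packet bases in a full tree: B_0 = 1, B_{J+1} = B_J**2 + 1."""
    b = 1
    for _ in range(depth):
        b = b * b + 1
    return b

x = np.random.default_rng(0).standard_normal(256)
cost, leaves = best_basis(x, max_depth=4)
print([count_bases(j) for j in range(5)])  # [1, 2, 5, 26, 677]
print(len(leaves), round(cost, 3))
```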
The difference between the discrete wavelet transform and the wavelet packet transform lies in the decomposition of the detail space: the wavelet packet transform decomposes not only the approximation space but also the detail space. As a result, it can split the frequency band into uniform subbands. For comparison, Figure 10 and Figure 11 show the two-level analysis and synthesis stages of the discrete wavelet transform.
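The uniform band split can be checked with a full wavelet packet tree. In the sketch below (Haar filters, our choice for brevity), a depth-3 tree produces 8 leaves, whereas a DWT of the same depth would produce only 4 bands. One practical detail, which the sketch demonstrates: leaves in natural tree order are not in frequency order, because each highpass split followed by downsampling mirrors the spectrum; the leaf covering frequency band k is the Gray code k XOR (k >> 1). Each test tone is placed at the center of one of the 8 uniform bands on [0, π].

```python
import numpy as np

S = 1.0 / np.sqrt(2.0)

def haar_split(x):
    """One Haar wavelet packet split into (lowpass, highpass) halves."""
    even, odd = x[0::2], x[1::2]
    return S * (even + odd), S * (odd - even)

def full_packet_tree(x, depth):
    """All 2**depth leaves of a full wavelet packet tree in natural (Paley) order:
    the children of node k are 2k (lowpass) and 2k + 1 (highpass).
    A DWT would split only the lowpass branch, giving depth + 1 bands instead."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(depth):
        nodes = [child for node in nodes for child in haar_split(node)]
    return nodes

def leaf_for_band(k):
    """Natural-order index of the leaf covering the k-th frequency band (Gray code)."""
    return k ^ (k >> 1)

depth, N = 3, 1024
n = np.arange(N)
for k in range(2 ** depth):
    omega = (k + 0.5) * np.pi / 2 ** depth  # center of uniform band k on [0, pi]
    leaves = full_packet_tree(np.cos(omega * n), depth)
    energies = [np.sum(leaf ** 2) for leaf in leaves]
    print(k, int(np.argmax(energies)), leaf_for_band(k))  # last two columns agree
```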

Wavelet based speech strategy
The proposed speech processing strategy is based on the wavelet packet transform and consists of five basic parts. Because electrode selection rests on frequency selection, it is improved by using the wavelet packet transform. The mother wavelet, Daubechies 10, was selected experimentally, and the signal is decomposed down to level 8. A Hanning window is applied to each frame to suppress abrupt, short-term changes of the signal at the frame boundaries. A block diagram of the proposed strategy is given in Figure 12. In the proposed method, the best tree is determined in a block that is structured to eliminate noise and unwanted components of the sound signal. Thus, the output electrodes are determined with less error than in the N-of-M strategy. Moreover, better transmission is obtained by minimizing the interference between neighboring channels.
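The framing and windowing step can be sketched as below. The frame length of 512 samples and the 50% hop are assumptions (the text does not state them); a frame length divisible by 2^8 = 256 is convenient so that a level-8 wavelet packet decomposition splits each frame evenly. `hann_frames` is a hypothetical helper name; NumPy's `np.hanning` provides the (symmetric) Hann window.

```python
import numpy as np

def hann_frames(x, frame_len=512, hop=256):
    """Split x into overlapping frames (one per row) and taper each with a Hann window."""
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    starts = hop * np.arange(n_frames)
    idx = starts[:, None] + np.arange(frame_len)[None, :]  # (n_frames, frame_len) indices
    return x[idx] * np.hanning(frame_len)

frames = hann_frames(np.ones(2048))
print(frames.shape)  # (7, 512)
```

Because the window falls to zero at both ends, each frame starts and ends smoothly, which is what limits the spurious high-frequency content that abrupt frame edges would otherwise add to the wavelet packet coefficients.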
The channel outputs of the matching function are determined by finding the best tree in the matching block. In the matching function used, E_k denotes the output of electrode k, M denotes the matching function, and its arguments are the wavelet packet coefficients. The matching function thus describes the relationship between the electrodes and the output nodes of the wavelet packet transform.
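The exact form of the matching function is not specified here, so the sketch below is a hypothetical stand-in rather than the authors' function: it assigns each frequency-ordered terminal wavelet packet node to the electrode whose analysis band contains the node's center frequency, and takes E_k as the summed energy of the nodes assigned to electrode k. At decomposition level 8 with sampling rate f_s, the terminal nodes would be 256 bands of width f_s/512 centered at (k + 1/2)f_s/512.

```python
import numpy as np

def match_to_electrodes(band_energies, band_centers_hz, electrode_edges_hz):
    """Hypothetical matching: node i goes to electrode k if
    edges[k] <= center_i < edges[k + 1]; E_k is the summed energy of its nodes.
    Nodes outside the electrode range are ignored."""
    energies = np.asarray(band_energies, dtype=float)
    centers = np.asarray(band_centers_hz, dtype=float)
    edges = np.asarray(electrode_edges_hz, dtype=float)
    electrode = np.searchsorted(edges, centers, side="right") - 1
    valid = (electrode >= 0) & (electrode < len(edges) - 1)
    out = np.zeros(len(edges) - 1)
    np.add.at(out, electrode[valid], energies[valid])  # accumulates repeated indices
    return out

E = match_to_electrodes([1, 2, 3, 4, 5], [50, 150, 250, 350, 500], [0, 100, 200, 400])
print(E)  # [1. 2. 7.]
```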
In this study, 22 electrodes are used for stimulation, and the channel frequency bands for the cochlea are calculated with the frequency-position function (Greenwood, 1990)
$$f = A\left(10^{a x} - 1\right),$$
where f is the frequency in Hz and x is the fractional distance along the cochlea, from the apex (x = 0) to the base (x = 1). A and a are constants, with values 165.4 and 2.1, respectively. Electrode selection is the same as in the N-M model: 6 of the 22 channel electrodes are selected, and only these 6 electrodes, which have the highest amplitudes, are analyzed using the wavelet packet transform.
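The frequency-position function can be used to derive per-electrode analysis bands. In the sketch below, A = 165.4 and a = 2.1 are Greenwood's published human constants in the form f = A(10^{ax} − 1); the 200 Hz to 8 kHz analysis range and the equal spacing of the 22 bands along the cochlea are hypothetical choices for illustration, not values from this study.

```python
import numpy as np

A, a = 165.4, 2.1  # Greenwood (1990) constants for the human cochlea

def greenwood(x):
    """Characteristic frequency (Hz) at fractional distance x from the apex (0) to the base (1)."""
    return A * (10.0 ** (a * np.asarray(x, dtype=float)) - 1.0)

def greenwood_position(f_hz):
    """Inverse map: fractional cochlear position for a frequency in Hz."""
    return np.log10(np.asarray(f_hz, dtype=float) / A + 1.0) / a

def electrode_band_edges(n_electrodes=22, f_low=200.0, f_high=8000.0):
    """Hypothetical band edges: n_electrodes bands of equal length along the cochlea
    between the places of f_low and f_high. Returns n_electrodes + 1 edges in Hz."""
    x = np.linspace(greenwood_position(f_low), greenwood_position(f_high), n_electrodes + 1)
    return greenwood(x)

edges = electrode_band_edges()
print(round(float(greenwood(1.0))))  # 20657, the upper end of the human range
```

Equal spacing in cochlear position gives bands that are narrow at low frequencies and wide at high frequencies, mirroring the tonotopic layout described at the start of the chapter.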
The proposed strategy is essentially based on the N-M strategy. The input waveform is given in Figure 13, and the output waveforms of the proposed strategy and the N-M strategy are given in Figure 14 and Figure 15, showing that the input and output waveforms are very similar.
As the graphs show, the traditional N-M strategy removes some high-frequency components between 25 ms and 75 ms in the wideband spectrogram. High-frequency components are very important for intelligibility and for recognizing consonants such as 's', 'ş' and 'f'. The new method preserves these components because the wavelet packet transform analyzes high-frequency components as well as low-frequency ones. The choice of mother wavelet also matters: Daubechies 10 was found experimentally to be more effective for high-frequency analysis. The 'Determine optimum tree' block eliminates noise and unnecessary components in the speech signal, so it gives better electrode selection than the N-M strategy. The output channels of the new strategy are more accurate than those of the N-M strategy and help reduce interaction between neighboring channels.

Experimental study
The experimental study was carried out on 20 healthy subjects aged between 23 and 30. All subjects are native speakers of Turkish and have bilateral air-conduction thresholds better than 20 dB at octave frequencies from 250 to 6000 Hz.
Since each subject listened to words processed by both algorithms, and subjects could otherwise memorize the words and their order, separate word lists were prepared for the new strategy and for the N-of-M strategy. The lists were prepared in consultation with a Turkish language specialist and with reference to the Turkish Language Dictionary of the Turkish Language Association. The words in both lists were balanced in terms of phonetic content and degree of difficulty, and the usage frequencies of vowels and consonants in the lists were determined according to Turkish grammar. The lists were recorded in the quiet rooms of the Department of Otology at Ege University, and the doctors of the Department approved the reader's sound levels.
The subjects listened to the lists through headphones. While listening, they were asked to write down the words they heard, in listening order, in a table, leaving a blank for any word they could not understand. The intelligibility percentage is calculated as
$$\text{Intelligibility}\ (\%) = \frac{\text{number of correctly written words}}{\text{total number of words}} \times 100.$$
The results of the intelligibility test are given in Table 1 for both the N-of-M strategy and the proposed speech processing strategy. The intelligibility percentage of the proposed wavelet packet transform based strategy is higher than that of the N-of-M strategy. The average intelligibility percentages by gender are given in Figure 16. To test the noise robustness of the proposed strategy, an SNR improvement test was applied. Pink noise, F-16 noise, Volvo noise and factory noise were each added to the recorded data at an SNR of 5 dB; all of these are real recorded noises, as their names indicate. The proposed method and the N-of-M method were then applied separately to these data, and the results were evaluated in terms of the SNR values given in Figure 17. The proposed method gives better results for each type of added noise.
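The two measures used in this section can be sketched as follows. The SNR definition (clean-signal energy over residual energy, in dB), the scaling of the noise so that the mixture has a target SNR, and SNR improvement as output SNR minus input SNR are standard choices that we assume here, since the text does not spell out the authors' exact formulas; the signals below are synthetic placeholders rather than the recorded speech and noises.

```python
import numpy as np

def intelligibility_percent(n_correct, n_total):
    """Percentage of words written down correctly."""
    return 100.0 * n_correct / n_total

def snr_db(clean, test):
    """SNR of `test` relative to the clean reference, in dB."""
    clean = np.asarray(clean, dtype=float)
    residual = np.asarray(test, dtype=float) - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(residual ** 2))

def add_noise_at_snr(clean, noise, target_snr_db):
    """Scale `noise` so that clean + scaled noise has the requested SNR."""
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(clean)]
    gain = np.sqrt(np.sum(clean ** 2) / (np.sum(noise ** 2) * 10.0 ** (target_snr_db / 10.0)))
    return clean + gain * noise

def snr_improvement_db(clean, noisy, processed):
    """Output SNR minus input SNR."""
    return snr_db(clean, processed) - snr_db(clean, noisy)

fs = 16000.0
t = np.arange(16000) / fs
clean = np.sin(2 * np.pi * 440.0 * t)                      # placeholder "speech"
noise = np.random.default_rng(0).standard_normal(len(t))  # placeholder noise
noisy = add_noise_at_snr(clean, noise, 5.0)
print(round(snr_db(clean, noisy), 6), intelligibility_percent(45, 50))  # 5.0 90.0
```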

Conclusion
In this study, a new speech processing strategy based on the wavelet packet transform is proposed for cochlear implant applications. The system first obtains the highest-energy coefficients and then stimulates the linked electrodes, with the aim of improving the deaf patient's hearing ability.
The core of the system is the wavelet packet transform, combined with the energy of the wavelet coefficients. The intelligibility and noise robustness of the suggested speech processing method were investigated through various tests. A new electrode selection algorithm based on the wavelet entropy distribution was also presented; the proposed electrode selection improves noise performance and intelligibility. Additionally, the proposed method performs better than the traditional and recently published methods.
In this study, the system was tested in terms of intelligibility, and its SNR results were compared with those of the N-of-M strategy. The proposed method was observed to increase performance in terms of intelligibility.
Within the wavelet packet transform stage, determining the optimum tree with the best tree function is a significant part of this study, because this step eliminates noise and unnecessary components from the speech signal. It also helps to improve speech intelligibility in noisy environments.
Only people with normal hearing took part in the listening experiments, so all results are based on normal-hearing subjects. A further study with cochlear implant users would give more accurate results for intelligibility.
For future work, a hybrid mother wavelet is recommended for the wavelet decomposition process. In such a hybrid model, the Daubechies family could be used for the low-pass filter decomposition and the Symlet family for the high-pass filter decomposition. Moreover, the wavelet family could be selected in real time according to the characteristics of the speech signal, which might give better speech intelligibility. Additionally, the bionic wavelet transform could be used instead of the wavelet packet transform throughout the speech processing chain (Yuan, 2003).

Acknowledgment
We would like to specially thank the Department of Otology in Ege University and Yahya Ozturk for their assistance in this work.