Sensitivity and positive predictive values of the SRBF and SSE methods applied to the clinical sound set, without and with additive Gaussian noise.
The advancement of technology has paved the way for signal processing methods to be implemented in many simple tools useful in everyday life. This is most notable in the medical technology field, where intelligent applications have boosted the quality of diagnosis. Proposing objective signal processing methods able to extract relevant information from biosignals is a great challenge in the telemedicine and auto-diagnosis fields.
For the cardiac system, many signals can be processed and monitored: the ElectroCardioGram (ECG), the PhonoCardioGram (PCG), Echo/Doppler and pressure monitoring (see Figure 1).
This book chapter focuses on the PCG signal. PCG and auscultation are noninvasive, low-cost and accurate for diagnosing some heart diseases.
The PCG signal confirms and, above all, refines the auscultation data. It provides further information about the acoustic activity, in particular the chronology of the pathological signs within the cardiac cycle, by locating them with respect to the normal heart sounds. Cardiac sounds are by definition non-stationary signals located in the low frequency range, approximately between 10 and 750 Hz.
Analysis of the cardiac sounds based solely on the human ear remains insufficient for a reliable diagnosis of cardiac pathologies, and does not give the clinician all the qualitative and quantitative information about cardiac activity, especially regarding time intervals.
Information such as the temporal localization of the heart sounds, the number of their internal components, their frequency content, and the significance of diastolic and systolic murmurs can all be studied directly on the PCG signal. Recognizing and classifying cardiovascular pathologies requires advanced signal processing and artificial intelligence methods.
For that purpose, different approaches can be considered to improve the electronic stethoscope:
Tool with embedded autonomous analysis, simple for home use by the general public for the purpose of auto-diagnosis, monitoring and warning in case of necessity.
Tool with sophisticated analysis (coupled to a PC via a Bluetooth link) for professionals, to make an in-depth medical diagnosis and to train medical students.
Whatever the approach, one of the first and most important phases in the analysis of heart sounds is their segmentation. Heart sound segmentation partitions the PCG signal into cardiac cycles, and each cycle further into S1 (first heart sound), systole, S2 (second heart sound) and diastole.
The first step in this challenge is to identify the two phases of the cardiac cycle and the heart sounds, with robust differentiation between S1 and S2 even in the presence of additional heart sounds and/or murmurs. S1 and S2 must then be measured accurately, allowing progression towards automatic diagnosis of heart murmurs and the distinction between ejection and regurgitation murmurs.
This autonomous detection phase, performed without the help of the ECG, relies on signal processing tools such as Shannon energy, the Hilbert transform, higher-order statistics, hidden Markov models…
In this chapter we present a new module for heart sound segmentation based on time-frequency analysis (the S-transform). The goal of this study is to develop a generic tool, suitable for clinical and home monitoring use, robust to noise, and applicable to diverse pathological and normal heart sound signals without requiring any prior information about the subject. The proposed segmentation module can be divided into three main blocks: localization of the heart sounds, detection of the boundaries of the localized heart sounds, and a classification block to distinguish between S1 and S2.
The proposed methods are evaluated on a database of 80 subjects (40 pathological). This study was carried out under the supervision of an experienced cardiologist, with the aim of validating the results of each method.
This chapter is organized as follows: Section 2 describes the database used in this study. Section 3 describes the different methods proposed for the segmentation module (localization, boundaries detection and classification). The results and discussion are presented in Section 4, and Sections 5 and 6 present future research and the conclusion.
2. Database
Several factors affect the quality of the acquired signal, above all the type of electronic stethoscope, its mode of use, the patient’s position during auscultation, and the surrounding noise. According to the cardiologist’s experience, it is preferable for the recorded signals to remain unprocessed; filtering is only applied afterwards, for the purpose of signal analysis. For this reason we used prototype stethoscopes produced by Infral Corporation, comprising an acoustic chamber in which a sound sensor is inserted. The signal conditioning and amplification electronics are housed in a case along with a standard Bluetooth communication module.
Different cardiologists equipped with a prototype electronic stethoscope contributed to a measurement campaign at the Hospital of Strasbourg. In parallel, two prototypes were dedicated to the MARS500 project promoted by ESA, in order to collect signals from 6 volunteers (astronauts). The use of prototype electronic stethoscopes by different cardiologists makes the database rich in terms of the qualitative diversity of the collected sounds, which in turn makes the evaluation of heart sound localization more realistic.
The sounds are recorded in WAVE format with 16-bit resolution and an 8000 Hz sampling frequency, using the “Stetho” software developed under Alcatel-Lucent license.
The dataset contains 80 subjects, including 40 with cardiac pathologies whose sounds contain different systolic murmurs. Each subject corresponds to one recording, and each recording lasts 8 seconds.
First, the original signal is decimated by a factor of 4, from an 8000 Hz to a 2000 Hz sampling frequency, and then high-pass filtered with a cut-off frequency of 30 Hz to eliminate the noise collected by the prototype stethoscope. The filtered signal is filtered again in the reverse direction so that the resulting signal has no time delay (zero-phase filtering). Finally, the signal is normalized by setting its variance to 1. The resulting signal is expressed by:
$$x_{norm}(t) = \frac{x(t)}{\sigma_x}$$
where x(t) is the filtered signal and σ_x its standard deviation.
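The preprocessing chain described above can be sketched in Python with NumPy/SciPy as follows. The text does not give the filter type or order, so a 4th-order Butterworth high-pass filter is assumed here, and SciPy's default anti-aliasing filter inside `decimate` is used; the function name `preprocess_pcg` is hypothetical.

```python
import numpy as np
from scipy.signal import butter, decimate, filtfilt

def preprocess_pcg(x, fs=8000, factor=4, fc=30.0, order=4):
    """Decimate, zero-phase high-pass filter and normalise a PCG recording (sketch)."""
    x = np.asarray(x, dtype=float)
    y = decimate(x, factor)              # anti-aliased downsampling, e.g. 8000 Hz -> 2000 Hz
    fs_new = fs / factor
    # Butterworth high-pass; filter type and order are assumptions
    b, a = butter(order, fc, btype="highpass", fs=fs_new)
    y = filtfilt(b, a, y)                # forward + reverse pass: no phase delay
    y = y / np.std(y)                    # unit variance: x(t) / sigma_x
    return y, fs_new
```

For an 8 s recording sampled at 8000 Hz, this returns 16000 samples at 2000 Hz with unit variance.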
3.2. Localization of heart sounds
The localization algorithms operating on PCG data try to emphasize heart sound occurrences with an initial transformation that can be classified into three main categories: frequency-based transformations, morphological transformations and complexity-based transformations. The transformation tries to maximize the distance between the heart sounds and the background noise; the result is then smoothed and thresholded in order to apply a peak detection algorithm. Note that the main goal of heart sound localization is to locate the first and second heart sounds, without distinguishing them from each other and without detecting the boundaries of the located sounds.
3.3. SRBF localization method
We previously proposed the RBF method as a transformation to emphasize heart sounds, and it was shown to perform well on signals with a low noise level. However, in the presence of a high level of noise, the performance of the RBF method decreases. This is not surprising, because the method operates directly on the heart sound without any feature extraction step. To deal with this problem, we proposed a heart sound localization method named SRBF. This method extracts the envelope of the signal by feeding features extracted from the S-transform matrix of the heart sound signal to a radial basis function (RBF) neural network. Compared with other existing methods for heart sound localization, SRBF was shown to significantly improve sensitivity and positive predictive value, and to be robust against additive white Gaussian noise.
We will briefly explain the different steps of the SRBF method:
The S-transform of the heart sound is calculated. A frequency range of 0-100 Hz is used to cover the main frequency band of S1 and S2 and to avoid murmurs, which generally have their spectral energy above 100 Hz.
A sliding window of 50 ms (100 samples at 2000 Hz) with 75% overlap is applied to the S-matrix. Feature extraction uses standard statistical measures computed on each column of the S-matrix: the Root Mean Square (RMS), the maximum and the average. Each resulting feature array (100 samples) is divided into 5 segments, and the mean of each segment is taken as an input to the classifier. Thus, at each step a 100 by 100 matrix yields 15 descriptors (3 features × 5 segments).
An RBF neural network classifier is trained on two heart sound samples (S1 and S2) and two non-heart-sound samples (systole and diastole) selected randomly from the database. The target is set to 1 for S1 or S2 and to 0 for the other components. The envelope of the signal is then formed by the output of the RBF neural network.
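The SRBF steps above can be sketched as follows. This is an illustration, not the authors' code: the input is assumed to be the magnitude of the 0-100 Hz S-matrix (frequency rows, time columns), and the RBF network is a hypothetical minimal one, with one Gaussian unit centred on each training vector, an assumed width `sigma`, and output weights solved by least squares. The names `srbf_descriptors` and `SimpleRBF` are hypothetical.

```python
import numpy as np

def srbf_descriptors(S_mag, win=100, overlap=0.75, n_seg=5):
    """15 descriptors per sliding window (3 column features x 5 segments), sketch."""
    S_mag = np.abs(S_mag)
    step = int(round(win * (1 - overlap)))      # 25 columns for a 100-column window, 75 % overlap
    rows = []
    for start in range(0, S_mag.shape[1] - win + 1, step):
        block = S_mag[:, start:start + win]     # frequency rows x 100 time columns
        rms = np.sqrt(np.mean(block ** 2, axis=0))
        mx = np.max(block, axis=0)
        avg = np.mean(block, axis=0)
        desc = []
        for feat in (rms, mx, avg):             # 3 features ...
            desc.extend(seg.mean() for seg in np.array_split(feat, n_seg))  # ... x 5 segments
        rows.append(desc)
    return np.array(rows)

class SimpleRBF:
    """Gaussian RBF network with one unit per training vector (hypothetical)."""

    def __init__(self, sigma=1.0):
        self.sigma = sigma

    def _phi(self, X):
        d2 = np.sum((X[:, None, :] - self.centers[None, :, :]) ** 2, axis=2)
        return np.exp(-d2 / (2.0 * self.sigma ** 2))

    def fit(self, X, y):
        self.centers = np.asarray(X, dtype=float)
        # least-squares output weights (exact interpolation for distinct centres)
        self.w = np.linalg.pinv(self._phi(self.centers)) @ np.asarray(y, dtype=float)
        return self

    def predict(self, X):
        return self._phi(np.asarray(X, dtype=float)) @ self.w
```

After fitting on the four labelled descriptor vectors (targets 1, 1, 0, 0), `net.predict(srbf_descriptors(S_mag))` gives one envelope value per window position.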
3.4. SSE localization method
A new method for the localization of heart sounds, named SSE, is proposed in this study. Like the SRBF method, it uses the S-matrix (0-100 Hz), and it calculates the Shannon energy (SE) of the local spectrum given by the S-transform at each sample of the signal.
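The S-transform of a time series x(t), as proposed by Stockwell et al., is:
$$S(\tau, f) = \int_{-\infty}^{+\infty} x(t)\, w(\tau - t, f)\, e^{-i 2\pi f t}\, dt$$
where the window function is the Gaussian:
$$w(\tau - t, f) = \frac{|f|}{\sqrt{2\pi}}\, e^{-\frac{(\tau - t)^2 f^2}{2}}$$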
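In discrete form, the S-transform is commonly computed in the frequency domain, with one inverse FFT per frequency row. The sketch below follows that standard FFT-based formulation; it is an illustration, not the authors' implementation, and the function name `s_transform` is hypothetical. Setting `fmax=100` gives the 0-100 Hz S-matrix used by the SRBF and SSE methods; the frequency step is fs/N, so it depends on the signal length.

```python
import numpy as np

def s_transform(x, fs, fmax):
    """Discrete S-transform computed via the FFT, rows 0..fmax Hz (sketch)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    X = np.fft.fft(x)
    m = np.fft.fftfreq(N) * N                 # frequency offsets as integers -N/2..N/2
    n_max = int(fmax * N / fs)                # highest frequency index kept
    S = np.zeros((n_max + 1, N), dtype=complex)
    S[0, :] = np.mean(x)                      # zero-frequency row: signal mean
    for n in range(1, n_max + 1):
        gauss = np.exp(-2.0 * np.pi ** 2 * m ** 2 / n ** 2)   # Gaussian window, width ~ 1/f
        S[n, :] = np.fft.ifft(np.roll(X, -n) * gauss)         # spectrum shifted by n, windowed
    freqs = np.arange(n_max + 1) * fs / N
    return S, freqs
```

For a 50 Hz sine sampled at 1000 Hz for 1 s, the row at 50 Hz carries the largest mean magnitude (0.5, half the sine amplitude).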
The proposed SSE method calculates the Shannon energy of each column of the extracted S-matrix as follows:
Each column of the S-matrix represents the local spectrum at a specific sample. The advantage of the Shannon energy transformation is its capacity to emphasize medium intensities and to attenuate low intensities of the signal, which, in the case of the SSE method, is the local spectrum. The main difference between the SSE and SRBF methods is that SSE needs no training phase, whereas SRBF requires one for its RBF module. The RBF neural network in the SRBF method can be considered as a non-linear filter, which is replaced with a simple averaging filter in the SSE method.
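The SSE envelope can be sketched as below. The Shannon energy equation is not reproduced in this section, so the form used here is an assumption: magnitudes are scaled to [0, 1] by the S-matrix maximum, e = s², and each column contributes -Σ e·log(e). Because -e·log(e) vanishes at e = 0 and e = 1 and peaks in between, medium intensities are emphasized and low ones attenuated, as described above. The 20 ms averaging-filter length and the name `sse_envelope` are also assumptions.

```python
import numpy as np

def sse_envelope(S_mag, fs, smooth_ms=20.0, eps=1e-12):
    """Shannon energy of each S-matrix column, smoothed by a moving average (sketch)."""
    s = np.abs(S_mag) / (np.max(np.abs(S_mag)) + eps)   # scale magnitudes to [0, 1]
    e = s ** 2                                           # local spectral energy
    se = -np.sum(e * np.log(e + eps), axis=0)            # Shannon energy per column (per sample)
    L = max(1, int(round(smooth_ms * 1e-3 * fs)))        # averaging filter length in samples
    env = np.convolve(se, np.ones(L) / L, mode="same")   # simple average filter
    return env / (np.max(env) + eps)                     # normalised envelope
```

Peaks of this envelope are the candidate S1/S2 locations passed to the peak detector.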
3.5. Boundaries detection algorithm: An optimized S-transform approach
The boundaries detection algorithm aims at estimating the onset and the endpoint of the located heart sounds. Accurate boundaries estimation is a very important step in the heart sound segmentation module, and it is essential for extracting meaningful features from each part of the heart cycle in order to perform auto-diagnosis.
3.5.1. Overview of existing methods
Different boundaries detection algorithms exist in the literature. In one of them, the boundaries are estimated by applying a threshold to the extracted envelope of the signal; this may not be accurate for some cardiac cycles, because the threshold level is based on the average value over the whole recording. The same authors propose another algorithm that employs the STFT (Short Time Fourier Transform) to explore the time-frequency domain of the signal. They quantize the spectrogram of each segment into two values by applying a threshold that preserves 60% of the signal energy; however, it is not clear how the energy of the signal is calculated, and the accuracy of the algorithm is not reported. Another study uses biomedical features of the heart sounds (S1 and S2), such as the maximum duration of S1 and S2, to limit the estimated boundaries; the disadvantage of this method is that the signal energy is estimated in the time domain only, so its performance decreases dramatically in the presence of a high level of noise.
3.5.2. The OSSE algorithm
In this chapter, we propose a new algorithm to estimate the heart sounds boundaries. The proposed algorithm tries to optimize the energy concentration of the S-transform at each located sound by using a window width optimization method. The envelope of the optimized S-transform is then recalculated by using the SSE approach and an adaptive threshold is applied to determine the onset and the ending of each located heart sound. Let us assume that
The block diagram of the proposed algorithm (OSSE) is shown below (Figure 4).
Estimate the boundaries limit
The boundary limits are estimated based on the fact that the maximum duration of S1 and S2 is 150 ms. A 150 ms window is therefore applied around each detected S1 and S2 peak, covering 75 ms before the peak and 75 ms after it.
Many studies have tried to improve the TF representation of the S-transform [12-14]. The main study in the literature on optimizing the energy concentration of the S-transform in the TF domain aimed to minimize the spread of energy beyond the actual signal components. As is well known, an ideal time-frequency transformation would distribute energy only along the frequencies of the signal components, and only for their duration. Neighboring frequencies would then contain no energy, and the energy contribution of each component would not exceed its duration.
The energy concentration in the time-frequency (TF) domain is a very important parameter for algorithms that aim to detect the duration of events in a signal. It is therefore equally important for a heart sound boundaries detection algorithm based on time-frequency features. However, in some cases the S-transform suffers from poor energy concentration in the TF domain, hence the need for an energy concentration optimization process to improve the boundaries estimation of the heart sounds.
The main approach is to optimize the width of the window used in the S-transform. The width of the Gaussian window can be controlled in several ways, by adding a new parameter to the window equation. We use in this study the parameter
Note that in this study, when α varies, p is fixed to 1, and when p varies, α is fixed to 1. The optimal value can be calculated in two ways. The first calculates one global parameter, which is recommended for signals with constant or very slowly varying frequency components. The second calculates a time-varying parameter, which is more suitable for signals with fast-varying frequency components. The disadvantage of the second approach is its high computational complexity, which makes it unsuitable for applications where time is an important factor.
Based on the first approach, the optimization algorithm is applied on both parameters
where the normalized energy of the S-transform for each α is given by:
The CM (
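The global-parameter optimization can be sketched as a grid search. Because the window equation and the CM definition are not reproduced in this section, the sketch makes two explicit assumptions: a window width σ(f) = α/|f|^p in seconds (which reduces to the standard S-transform for α = p = 1), and a concentration measure CM = 1 / ΣΣ|S̄(t,f)|, where S̄ is the S-transform normalized to unit energy, so that a more concentrated representation gives a larger CM, consistent with maximizing CM. The function names and parameter grids are hypothetical.

```python
import numpy as np

def generalized_st(x, fs, fmax, alpha=1.0, p=1.0):
    """S-transform with an assumed window width sigma(f) = alpha / f**p (seconds), rows 1..fmax Hz."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    X = np.fft.fft(x)
    nu = np.fft.fftfreq(N, d=1.0 / fs)            # frequency offsets in Hz
    n_max = int(fmax * N / fs)
    S = np.zeros((n_max, N), dtype=complex)
    for n in range(1, n_max + 1):
        f = n * fs / N
        sigma_t = alpha / f ** p                  # alpha = p = 1 gives the standard S-transform
        G = np.exp(-2.0 * np.pi ** 2 * sigma_t ** 2 * nu ** 2)   # Gaussian window in frequency
        S[n - 1, :] = np.fft.ifft(np.roll(X, -n) * G)
    return S

def concentration(S):
    """Assumed concentration measure: 1 / sum|S_bar|, with S_bar normalised to unit energy."""
    mag = np.abs(S)
    S_bar = mag / np.sqrt(np.sum(mag ** 2))
    return 1.0 / np.sum(S_bar)

def optimize_width(x, fs, fmax, alphas, ps):
    """Global grid search: vary alpha with p = 1, then p with alpha = 1; keep the CM maxima."""
    cm_alpha = [concentration(generalized_st(x, fs, fmax, alpha=a, p=1.0)) for a in alphas]
    cm_p = [concentration(generalized_st(x, fs, fmax, alpha=1.0, p=q)) for q in ps]
    return alphas[int(np.argmax(cm_alpha))], ps[int(np.argmax(cm_p))]
```

For a stationary test tone, wider windows concentrate the energy better, so the search picks the largest α and the smallest p offered in the grids.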
The Adaptive threshold
Performing an optimized S-transform before calculating the SSE envelope makes the choice of threshold less sensitive to variations between heart sounds. In this study, a threshold equal to 10% of the maximum value of the SSE envelope is applied to refine the estimated boundaries.
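Combining the 150 ms search window with the 10% threshold, boundary estimation for one located sound can be sketched as follows. Assumptions: the window is centred on the detected peak (75 ms on each side), the 10% threshold is taken relative to the envelope maximum inside that window, and the onset and endpoint are found by walking outwards from the peak until the envelope drops below the threshold. `detect_boundaries` is a hypothetical name.

```python
import numpy as np

def detect_boundaries(env, peak, fs, win_ms=150.0, thr_ratio=0.10):
    """Onset and endpoint (sample indices) of the sound located at `peak` (sketch)."""
    half = int(round(win_ms / 2 * 1e-3 * fs))     # 75 ms on each side of the peak
    lo = max(0, peak - half)
    hi = min(len(env) - 1, peak + half)
    seg = np.asarray(env[lo:hi + 1], dtype=float)
    thr = thr_ratio * np.max(seg)                 # 10 % of the envelope maximum in the window
    above = seg >= thr
    p = peak - lo
    start = p
    while start > 0 and above[start - 1]:         # walk backwards while above threshold
        start -= 1
    end = p
    while end < len(seg) - 1 and above[end + 1]:  # walk forwards while above threshold
        end += 1
    return lo + start, lo + end
```

For a Gaussian-shaped envelope of standard deviation 20 samples centred at sample 1000 (fs = 2000 Hz), this returns (958, 1042).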
3.6. Distinguishing S1 and S2
Most of the existing methods for the segmentation of heart sounds use the durations of systole and diastole to classify the first heart sound (S1) and the second heart sound (S2) [1,17-18], relying on diastole being longer than systole. These time intervals can become problematic and useless in several real-life clinical settings, particularly severe tachycardia or tachyarrhythmia (Figure 5), where diastole shortens until the two intervals become comparable.
Consequently, with the objective of developing a robust generic module for heart sound segmentation, we present in this chapter two feature extraction methods to classify S1 and S2: one based on the Singular Value Decomposition (SVD) technique applied to the S-matrix, and one based on the Empirical Mode Decomposition (EMD). We also investigate the ability of new individual features, based on the width of the optimized Gaussian window of the S-transform, to discriminate between S1 and S2.
3.6.1. Feature extraction based on the S-Transform
The SVD is a powerful tool that provides a compact representation of the significant information contained in a single signal. Several approaches in the literature use the SVD technique to represent a time-frequency matrix compactly. In one of them, the authors extracted the eigenvalues of the time-frequency matrix. Another work extended the method to also incorporate information from the eigenvectors to classify EEG seizures, and this latter technique has been applied to the S-matrix to extract features for systolic heart murmur classification. Following this approach, this study proposes a feature extraction method for S1 and S2 classification.
The time-frequency analysis is performed by the S-transform. The S-matrix S_i of the extracted heart sound H_i is decomposed by the SVD technique as follows:
$$S_i = U_i \Sigma_i V_i^{H}$$
where the columns of U_i and V_i are the left and right singular vectors, Σ_i is the diagonal matrix of singular values, and ^H denotes the conjugate transpose.
Based on our experience, the first left eigenvector and the first right eigenvector, which correspond to the largest singular value, are used in this study for the feature extraction process. The histogram (10 bins) of each related distribution function is calculated based on the density function. Five feature vectors obtained by this method are tested in the classification process; the eigentime histogram vector
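The SVD-based features can be sketched as follows with NumPy's `np.linalg.svd` and `np.histogram(..., density=True)`. Working on the magnitude of the S-matrix, taking absolute values to remove the sign ambiguity of singular vectors, and the names of the returned vectors are assumptions. Since the rows of the S-matrix are frequencies and its columns are time samples, the first right singular vector carries the time profile (eigentime) and the first left singular vector the frequency profile.

```python
import numpy as np

def svd_features(S, n_bins=10):
    """Histogram features from the dominant singular vectors of an S-matrix (sketch)."""
    A = np.abs(S)                                        # magnitude S-matrix (assumption)
    U, sv, Vh = np.linalg.svd(A, full_matrices=False)    # A = U diag(sv) Vh
    u1 = np.abs(U[:, 0])     # first left singular vector: frequency profile
    v1 = np.abs(Vh[0, :])    # first right singular vector: time profile (eigentime)
    h_freq, _ = np.histogram(u1, bins=n_bins, density=True)
    h_time, _ = np.histogram(v1, bins=n_bins, density=True)
    return {
        "time": h_time,                                  # eigentime histogram
        "freq": h_freq,                                  # frequency-profile histogram
        "sv": sv,                                        # singular values, descending
        "tf": np.concatenate([h_time, h_freq]),          # combined time-frequency vector
    }
```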
3.6.2. Feature extraction using the EMD
In the last few years, the Empirical Mode Decomposition (EMD) has been applied in many fields, including biomedical signal analysis, for example emotion classification in natural speech and the analysis of gastroesophageal information. EMD has also been applied to simulated heart sounds, where it was shown to provide clear information about the components of S1 and S2 and their instantaneous frequency behaviour. Another study presented a feature analysis approach for heart sounds based on an improved Hilbert-Huang Transform (HHT), applying Hilbert spectrum analysis to various cases of heart sounds. In this study, a new feature extraction method based on the EMD technique and Shannon energy is proposed for S1 and S2 classification.
As an alternative to the binomial TF transforms, EMD performs a multi-resolution analysis of non-stationary and nonlinear signals without the use of kernels or mother waveforms. To calculate the Intrinsic Mode Functions (IMFs), the local maxima and minima of extracted heart sound
The initial signal
For each IMF vector, the Shannon Energy is calculated as:
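The EMD-plus-Shannon-energy feature can be sketched as below. This is a simplified EMD written for illustration: cubic-spline envelopes through the local extrema, a fixed number of sifting passes as the stopping rule, and envelopes pinned to the signal edges. These choices, the per-IMF normalization before the Shannon energy, and all function names are assumptions, since the chapter's sifting criterion and Shannon energy formula are not reproduced in this section.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def _mean_envelope(h):
    """Mean of upper and lower spline envelopes, or None if h has too few extrema."""
    t = np.arange(len(h))
    imax = argrelextrema(h, np.greater)[0]
    imin = argrelextrema(h, np.less)[0]
    if len(imax) < 2 or len(imin) < 2:
        return None
    # pin both envelopes to the end samples (simple boundary treatment)
    imax = np.concatenate(([0], imax, [len(h) - 1]))
    imin = np.concatenate(([0], imin, [len(h) - 1]))
    upper = CubicSpline(imax, h[imax])(t)
    lower = CubicSpline(imin, h[imin])(t)
    return 0.5 * (upper + lower)

def emd(x, max_imfs=6, n_sift=10):
    """Simplified EMD: returns (IMFs as rows, final residue)."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        if _mean_envelope(residue) is None:
            break                              # residue is (nearly) monotonic
        h = residue.copy()
        for _ in range(n_sift):                # fixed number of sifting passes
            m = _mean_envelope(h)
            if m is None:
                break
            h = h - m
        imfs.append(h)
        residue = residue - h
    return np.array(imfs), residue

def imf_shannon_energy(imfs, eps=1e-12):
    """One Shannon energy value per IMF (the feature vector)."""
    feats = []
    for imf in imfs:
        xn = imf / (np.max(np.abs(imf)) + eps)   # normalise to [-1, 1]
        e = xn ** 2
        feats.append(-np.mean(e * np.log(e + eps)))
    return np.array(feats)
```

By construction the IMFs plus the residue add back up to the input signal, which is a convenient sanity check. For real use, a dedicated EMD implementation with a proper stopping criterion would be preferable.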
3.6.3. New individual features
4. Results and discussion
4.1. Localization methods
The performance of the SRBF and SSE methods was measured as their capacity to locate S1 and S2 correctly, using the sensitivity:
$$Se = \frac{TP}{TP + FN}$$
and the positive predictive value:
$$PPV = \frac{TP}{TP + FP}$$
A sound is a true positive (TP) if it is correctly located; all other detected sounds are considered false positives (FP), and all missed sounds are considered false negatives (FN).
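Counting TP, FP and FN requires deciding when a detection "correctly locates" a sound. The sketch below assumes that a detection matches a reference sound if it lies within a tolerance `tol` (in samples) of it, with each detection used at most once; the tolerance, the greedy nearest-match rule and the function name `localization_scores` are assumptions.

```python
def localization_scores(detected, reference, tol):
    """TP, FP, FN, sensitivity and PPV for detected vs. reference sound positions (sketch)."""
    detected = sorted(detected)
    used = set()
    tp = 0
    for r in reference:
        # nearest still-unmatched detection within the tolerance
        cands = [(abs(d - r), i) for i, d in enumerate(detected)
                 if i not in used and abs(d - r) <= tol]
        if cands:
            used.add(min(cands)[1])
            tp += 1
    fn = len(reference) - tp          # missed sounds
    fp = len(detected) - tp           # all other detections
    se = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return tp, fp, fn, se, ppv
```

For example, `localization_scores([102, 505, 700], [100, 500, 900], tol=10)` gives TP = 2, FP = 1, FN = 1, so Se = PPV = 2/3.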
Results in Table 1 show that the SRBF method reaches a higher PPV (98%) than the SSE method on the clinical signals without additive noise, whereas SSE reaches a higher sensitivity (96%) than SRBF (92%). The supervised approach of the RBF block in the SRBF method makes the extracted envelope more discriminative between the different parts of the signal than the unsupervised SSE method. It is therefore not surprising that SRBF produces fewer false detections than SSE, which explains the PPV results. The same reason accounts for the false negatives, which are more frequent with SRBF than with SSE and give SSE its higher sensitivity. In the presence of additive white Gaussian noise, the SSE method performs better, with 93% sensitivity and 94% PPV. Both methods are notably robust against noise, which can be attributed to the time-frequency analysis they perform. Figure 8 shows the envelopes extracted by the SSE and SRBF methods for a pathological sound with a systolic murmur, and Figure 9 shows the robustness of each method against additive white noise.
| Method | Sensitivity | PPV | Sensitivity (noise) | PPV (noise) |
4.2. Boundaries detection
The performance measure obtained with each parameter is compared in Table 2. The values of
| Heart sounds | Optimal α | CM(α) | Optimal p | CM(p) | CM(α = 1, p = 1) |
The optimal α is reached when CM(α) is maximized, and the optimal
The “Reference” row in Table 3 represents the manual measurements made by the cardiologists using the “Stetho” software developed under Alcatel-Lucent license. The results show the benefit of optimizing the energy concentration of the S-transform to estimate more realistic boundaries for S1 and S2. The measurements obtained by the SSE algorithm (without optimizing the S-transform) are always higher than those given by the OSSE algorithm, in which an optimization process is performed. This is expected, since the OSSE algorithm has a better energy concentration in the TF domain, which minimizes the spread of energy beyond S1 and S2. Figure 11 shows the boundaries detection results, with and without optimization of the S-transform, for an S2 example, and Figure 12 shows the OSSE results applied to whole heart sound recordings (normal and pathological).
4.3. Feature extraction for S1 and S2 classification
4.3.1. Evaluating the feature vectors obtained by the SVD technique
The localization of heart sounds is established by using the SSE method. The boundaries of the heart sounds are determined by the OSSE algorithm. The results were visually inspected by a cardiologist and erroneously extracted heart sounds were excluded from the study. The feature extraction process extracts a feature vector per extracted sound
A 3-nearest neighbor (KNN) classifier is used to evaluate the performance of the four feature vectors obtained by the two methods, with 5-fold cross-validation. The KNN classifier was chosen for its simplicity and its robustness to noisy training data.
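A minimal NumPy sketch of this evaluation protocol (3-NN with Euclidean distance and majority vote, 5-fold cross-validation over a random split) is shown below. The distance metric, the random fold assignment and the function names are assumptions; scikit-learn's `KNeighborsClassifier` with `cross_val_score` would serve the same purpose.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nn = np.argsort(d, axis=1)[:, :k]               # indices of the k nearest neighbours
    votes = y_train[nn]                             # their labels: 0 = S1, 1 = S2
    return (votes.mean(axis=1) > 0.5).astype(int)   # odd k: no ties for two classes

def kfold_accuracy(X, y, k=3, n_folds=5, seed=0):
    """Classification rate of k-NN estimated by n-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    correct = 0
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        pred = knn_predict(X[train], y[train], X[test], k)
        correct += int(np.sum(pred == y[test]))
    return correct / len(y)
```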
The time-domain feature vector reaches a 92% classification rate, whereas the frequency feature vector reaches 85% (81% sensitivity and 88% specificity). The time-frequency vector (TF Features) reaches the highest classification rate, with 95% sensitivity and 97% specificity. The singular values of S1 and S2 are almost indistinguishable, as reflected by the low classification rate of the SV features. For the EMD-based method, the FV feature vector reaches a high classification rate, with 94% sensitivity and 97% specificity (Table 4).
| F-Features | SV Features | TF Features | FV1 | FV2 | FV3 | FV4 | FV |
In most cases seen in clinical practice, S2 has a higher frequency content than S1. This is because S2 is associated with the closure of the aortic valve in a context of high left ventricular pressure, whereas the mitral closure (S1) occurs at low left ventricular pressure. However, this criterion cannot be generalized to all real-life cases, because some medical conditions are characterized by S2 frequency content lower than that of S1. Hence the importance of time-frequency and multi-resolution features, especially in a generic module; this may explain the high performance obtained with the TF and FV feature vectors.
4.3.2. Evaluating α and p to discriminate S1 and S2
The parameters used in the optimization process (Section 3.5.2) to determine the boundaries of each extracted sound
The main objective is to investigate the ability of these features to discriminate between S1 and S2. For each feature, a statistical test is used to assess whether the two groups (S1 and S2) come from distributions with different medians (
The results are presented in Table 5. Significant differences between the groups, at the 95% confidence level, are found for both features
The classification results are promising for the parameter
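The name of the test does not survive in this section; a standard nonparametric choice for comparing the medians of two independent groups is the Wilcoxon rank-sum (Mann-Whitney U) test, assumed in the sketch below via `scipy.stats.mannwhitneyu`. The α values are synthetic, for illustration only, and `median_difference_test` is a hypothetical name. Note that the p-value is the probability of observing a difference at least this large if the two groups came from the same distribution, not the probability that their medians differ.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def median_difference_test(feat_s1, feat_s2, level=0.05):
    """Two-sided Mann-Whitney U test between S1 and S2 feature values (sketch)."""
    res = mannwhitneyu(feat_s1, feat_s2, alternative="two-sided")
    return res.pvalue, res.pvalue < level

# synthetic optimal-alpha values for S1 and S2 sounds (illustration only)
rng = np.random.default_rng(2)
alpha_s1 = rng.normal(1.0, 0.1, 30)
alpha_s2 = rng.normal(1.5, 0.1, 30)
p_value, significant = median_difference_test(alpha_s1, alpha_s2)
```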
5. Future research
5.1. Classification of heart sounds
A new time-frequency-based feature is proposed and validated to distinguish between S1 and S2 (Section 4.3.2). Another parameter could be tested by using another window type in the S-transform, such as the arbitrary and varying shape window. A combination of several features could also be used to classify S1 and S2 more accurately. This can be performed by combining the
On the other hand, the classification of normal and pathological heart sounds is the final objective of any heart sound auto-diagnosis framework. The classification rate will depend first on the segmentation results, which were the main objective of this book chapter. The classic steps of feature extraction, feature selection, and classifier design and testing will then be needed to complete the classification process.
5.2. Real time application
One of the objectives of this study is to develop a real-time auto-diagnosis tool for the various situations encountered in cardiology. However, the S-transform, which can be considered the heart of the proposed segmentation framework, has a high computational burden, so implementing a fast S-transform algorithm on an FPGA or a GPU card will be necessary.
5.3. Sociological and psychological aspect
Introducing a smart stethoscope as a monitoring tool for home use raises new problems related to the sociological and psychological aspects of the user (patient). A smart stethoscope is a tool to facilitate the diagnosis process and make it more objective; it will never replace the cardiologist or other advanced cardiology techniques. This should be taken into consideration when deploying it, for example in a telemedicine framework. The ergonomics of the measuring instrument and the way the data are displayed and transmitted will be essential elements of any future tool intended for home use by the general public for auto-diagnosis, monitoring and warning when necessary.
6. Conclusion
In this book chapter, a robust module for heart sound segmentation has been proposed. The module is divided into three blocks: localization, boundaries detection, and classification of heart sounds (S1 and S2). Several methods were proposed in this study:
A heart sound localization method based on the S-transform and Shannon energy, named SSE, is proposed, and its robustness to additive white Gaussian noise is evaluated.
A method for boundaries detection named OSSE is proposed. It is based on an optimization process for the energy concentration in the TF domain provided by the S-transform.
Feature extraction methods based on the Singular Value Decomposition (SVD) and the Empirical Mode Decomposition (EMD) are examined to distinguish between S1 and S2. The parameters used in the time-frequency optimization process to determine the boundaries of each extracted sound are also investigated and validated as discriminative features between S1 and S2.
Dividing the proposed segmentation method into three separate blocks enables targeted optimization at each level. This makes the proposed module robust, which is essential for any auto-diagnosis module applicable in real-life conditions.
The main objective of this study is to present a robust and generic PCG segmentation method useful in real-life conditions (clinical use, home care, professional use…). The methods in the proposed framework are evaluated on real data (80 subjects) with different noise levels and validated by a cardiologist.
Further robustness tests on noisy signals, analysis of algorithmic complexity and ease of implementation, and evaluation on more signals would help to optimize the proposed module.