Parameters of VAE model.
Subtle distortions in the electrocardiogram (ECG) can help doctors diagnose serious latent heart diseases in their patients. However, these distortions are difficult to find manually because of disturbing factors such as baseline wander and high-frequency noise. In this chapter, we propose a method based on the variational autoencoder (VAE) to identify these distortions automatically and efficiently. We test our method on three ECG datasets from PhysioNet after adding tiny artificial distortions to them. Compared with other autoencoder-based approaches [e.g., the contractive autoencoder and the denoising autoencoder (DAE)], our experimental results show that our method improves performance in detecting these distortions on publicly available ECG data.
- variational autoencoder
- variational inference
- ECG enhancement
- deep learning
Automatic electrocardiogram (ECG) recognition is of great help to doctors in diagnosing and treating heart disease. As the number of portable ECG devices increases, more and more ECG records are becoming available. However, these ECG data are inevitably contaminated by different kinds of noise caused by interference such as baseline wandering, muscle tremor, and electrode movement [13, 14]. Given the level and complexity of this noise, and especially the components that may cause subtle deformations of ECG waveforms, these factors may decrease the accuracy of ECG recognition. In addition, far more unlabeled ECG data (i.e., data without any type information) are stored in many databases. Therefore, it is necessary to improve the performance of automatic ECG classification in an unsupervised context by choosing proper models and algorithms.
To prevent noise interference, many approaches to ECG preprocessing or enhancement have been successfully employed to remove contamination. Traditionally, most of these approaches are based on frequency-domain filtering. Ziarani et al. and Konrad eliminated power line noise by extracting a specified component of the signal and tracking its variations over time. Alfaouri et al. and Dewangan et al. employed the wavelet transform to isolate baseline wander and to detect and suppress power line interference in ECG effectively. Although these filters help suppress high-frequency interference, they may also remove useful information about heart illness, because the ECG frequency spectrum spreads over both the low and the high bands. To overcome these drawbacks of filtering-based methods, some adaptive methods have been proposed. Abdelmounim et al. applied an adaptive algorithm, combined with wavelets selected by proper thresholding, to remove noise. However, the authors also reported a disadvantage: the method could not remove baseline wandering smoothly and effectively. Other techniques such as the Fourier transform (FT) and empirical mode decomposition (EMD) have also been employed for ECG preprocessing [19, 20]. FT represents the signal in the frequency domain, where high-frequency components can be separated from low-frequency ones. Similarly, EMD separates different ECG components into proper intrinsic mode functions.
Feature extraction is another important step in ECG recognition. ECG features consist of amplitudes, intervals, and segments, as shown in Figure 1. Each feature reflects certain activities of the heart. For example, the P wave represents atrial depolarization, which causes both atria to contract and pump blood into the ventricles. Any distortion of the P wave may therefore indicate atrial malfunction.
Traditionally, the goal of ECG feature extraction is to extract all of the features above. Because the amplitude of the R wave is much larger than that of any other component, many approaches based on QRS complex detection have been proposed. Chan et al. used a specific template to match the desired ECG signals by computing the correlation between them. Krasteva and Jekova successfully applied this method to evaluate heart rhythm. Nevertheless, these approaches depend heavily on prior knowledge about ECG and related areas [23, 25], which makes further applications more difficult. By comparison, approaches based on kernel functions are more popular and widely used because of their simplicity and sensitivity. Martis et al. studied several methods [principal component analysis (PCA), linear discriminant analysis (LDA), independent component analysis (ICA), and discrete wavelet transform (DWT)] and compared them for feature extraction in arrhythmia ECG classification. Banerjee et al. focused on two specific regions of the ECG waveform (the QRS complex and the T-wave region) and distinguished normal from abnormal ECG signals using the wavelet cross spectrum and wavelet coherence. Kærgaard et al. proposed two hybrid signal processing schemes [ensemble empirical mode decomposition (EEMD) and discrete wavelet transform (DWT)] for ECG feature extraction, implemented by combining a neural network with the wavelet transform. Nazarahari et al. used wavelet functions (WFs) for ECG classification and proposed a design criterion for choosing the wavelet function. Houssein et al. classified ECGs with a modified water wave optimization (WWO) algorithm and achieved an average accuracy above 93%.
Although conventional kernel-based methods have made many important contributions to ECG feature extraction, their accuracy and efficiency rarely meet all application requirements, especially in the presence of noise. Unlike kernel methods, neural networks in the deep learning context can extract ECG features automatically through their hierarchical structure, an approach known as representation learning. Yan et al. used a restricted Boltzmann machine (RBM) for ECG classification. Xiong et al. [9, 10] employed a denoising autoencoder (DAE) and a stacked contractive denoising autoencoder, respectively, for ECG denoising. Zhou et al. used a stacked sparse autoencoder (SAE) to extract ECG features for classification, and the accuracy achieved in that work shows benefits over traditional methods that require the wavelet transform to perform ECG classification.
For the automatic diagnosis of heart illness assisted by ECG recognition, some of the works mentioned above do not meet the necessary requirements, because most studies focused on distinguishing arrhythmias. However, many heart diseases are closely related not only to the heart rhythm but also to other features of the ECG waveform, such as the length of the ST segment and the amplitude of the P wave. In addition, generative models have rarely been used for ECG recognition. The contributions of this chapter are twofold: (1) instead of using ECG signals over a cardiac period between two P-wave start points, we propose a new method that extracts ECG segments between two adjacent R peaks; and (2) we use the variational autoencoder (VAE) as an analysis tool to recognize different ECG signals by focusing on tiny distortions.
This chapter is organized as follows. Section 2 briefly describes the autoencoder and its variants. Section 3 introduces variational inference and the variational autoencoder in detail. Section 4 proposes our ECG preprocessing and classification scheme. Section 5 presents our experimental results and discussion. Finally, Section 6 concludes.
2. Autoencoders and variants
The variational autoencoder is closely related to the autoencoder. An autoencoder is a neural network that consists of an encoder and a decoder. The encoder maps its input to a representation, and the decoder reconstructs the input from that representation. Because the reconstruction can only be approximate, the autoencoder is forced to prioritize the aspects of the input that are most helpful for reconstruction and to discard the others. In this way, the autoencoder learns useful properties of the training data. The VAE shares these characteristics with the autoencoder (AE) while adding some properties of its own.
2.1. Autoencoder and regularized variants
An autoencoder can be used to obtain useful features from the encoder output. In terms of feature dimension, autoencoders fall into two categories: undercomplete and overcomplete. Undercomplete means that the feature dimension is smaller than the input dimension, a setting in which the most salient features can be learned. Conversely, in the overcomplete case the feature dimension is greater than the input dimension, and sparser features may be obtained. The objective function is another core topic for an autoencoder. It can give the autoencoder capabilities similar to those of linear or logistic regression, and it constrains the model to learn useful properties of the training data. The general form of the objective function is:

$$\tilde{J}(\theta; X) = J(\theta; X) + \alpha\,\Omega(\theta)$$

where $X$ is the training data for a given autoencoder, $\theta$ are the parameters of the model, and $\alpha$ is a nonnegative hyperparameter that weights the contribution of the penalty term $\Omega$ relative to the standard objective function $J$. Setting $\alpha$ to 0 means no regularization, and larger values of $\alpha$ result in more regularization. Autoencoders with a penalty term are usually called regularized autoencoders. Depending on the penalty, they may be encouraged, for example, to have small derivatives of the representation, which can make convergence during training faster than without any regularization.
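To make the role of the penalty weight concrete, the following minimal NumPy sketch evaluates a regularized objective of the form $\tilde{J}(\theta; X) = J(\theta; X) + \alpha\,\Omega(\theta)$ for a toy linear autoencoder. Several choices here are illustrative assumptions and not the setup used in this chapter: $J$ is the mean squared reconstruction error, $\Omega$ is a squared-L2 weight penalty, the function is named `regularized_objective`, and the data are random.

```python
import numpy as np


def regularized_objective(X, W_enc, W_dec, alpha):
    """Return (J + alpha * Omega, J, Omega) for a toy linear autoencoder.

    Illustrative assumptions: J is the mean squared reconstruction error and
    Omega is a squared-L2 (weight-decay) penalty on both weight matrices.
    """
    H = X @ W_enc                                     # encoder: representation
    X_hat = H @ W_dec                                 # decoder: reconstruction
    J = np.mean(np.sum((X - X_hat) ** 2, axis=1))     # standard objective J
    Omega = np.sum(W_enc ** 2) + np.sum(W_dec ** 2)   # penalty term Omega
    return J + alpha * Omega, J, Omega


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))                  # 50 toy samples with 8 features
W_enc = rng.normal(scale=0.1, size=(8, 3))    # undercomplete: 3 < 8
W_dec = rng.normal(scale=0.1, size=(3, 8))

for alpha in (0.0, 0.1, 1.0):
    total, J, Omega = regularized_objective(X, W_enc, W_dec, alpha)
    print(f"alpha={alpha}: J={J:.3f}  Omega={Omega:.3f}  total={total:.3f}")
```

With `alpha=0.0` the total equals the unregularized objective $J$, and the total grows as `alpha` increases.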
Different forms of the regularization term give the autoencoder different properties and lead to different variants of regularized autoencoder. These variants primarily include the sparse autoencoder (SAE), the denoising autoencoder (DAE), the contractive autoencoder (CAE), and the variational autoencoder (VAE). Theoretically, the VAE combines variational inference (VI) and neural networks. As a generative model, one of the prominent successes of the VAE is that it makes random sampling compatible with training by back-propagation (BP). This will be described in detail in Section 3.
Unlike the VAE, the SAE makes most of the neurons in its hidden layers inactive, because the activation functions of these neurons are saturated for most inputs. This results in sparse features, in which many elements are zero (or close to zero). In mathematical terms, the sparsity of the SAE is achieved by the penalty term $\sum_j \mathrm{KL}(\rho \,\|\, \hat{\rho}_j)$, where $\rho$ is the given sparsity value and $\hat{\rho}_j$ is the average activation of hidden unit $j$. During training, $\hat{\rho}_j$ is gradually driven toward $\rho$ to achieve satisfactory sparsity. Analogous to the SAE, the CAE [4, 26] obtains its contractive property from a penalty term: the squared Frobenius norm of the Jacobian matrix of partial derivatives of the encoder activations with respect to the input vector. As a result, the representation resists input perturbations during training, and a neighborhood of points in the sample space is encouraged to map into a smaller region. This can be thought of as the contracting capability of the CAE. The motivation of the DAE is to be insensitive to noise. Instead of adding a penalty term to the objective function, the DAE is trained on noise-corrupted data ($\tilde{x}$) [27, 30]. The DAE has been very successful in many cases, especially under the manifold assumption. Because corrupted data lie farther from the data manifold than uncorrupted data, the DAE learns to move points that are far from the manifold back toward it. The larger the distance from the manifold, the bigger the step the DAE takes toward it.
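The three regularization mechanisms can be sketched for a single sigmoid encoder layer $h = \sigma(xW + b)$. This is an illustrative NumPy sketch under that assumption. The helper names (`sparsity_penalty`, `contractive_penalty`, `corrupt`), the target sparsity `rho=0.05`, and the noise level are hypothetical. The CAE penalty uses the closed form of the Jacobian of a sigmoid layer, $\partial h_j / \partial x_i = h_j(1 - h_j)W_{ij}$.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def sparsity_penalty(H, rho=0.05, eps=1e-8):
    """SAE: sum_j KL(rho || rho_hat_j), rho_hat_j = mean activation of unit j."""
    rho_hat = np.clip(H.mean(axis=0), eps, 1 - eps)
    return float(np.sum(rho * np.log(rho / rho_hat)
                        + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))))


def contractive_penalty(H, W):
    """CAE: squared Frobenius norm of the encoder Jacobian dh/dx, averaged over
    samples. For h = sigmoid(x @ W + b), dh_j/dx_i = h_j * (1 - h_j) * W[i, j]."""
    dsig_sq = (H * (1 - H)) ** 2          # (n_samples, n_hidden)
    w_col_sq = np.sum(W ** 2, axis=0)     # sum_i W[i, j]**2, shape (n_hidden,)
    return float(np.mean(dsig_sq @ w_col_sq))


def corrupt(X, noise_std, rng):
    """DAE: the network sees x_tilde = x + noise but is trained to output x."""
    return X + rng.normal(scale=noise_std, size=X.shape)


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))            # toy inputs
W = rng.normal(scale=0.1, size=(20, 5))   # toy encoder weights
b = np.zeros(5)
H = sigmoid(X @ W + b)                    # hidden activations
X_tilde = corrupt(X, noise_std=0.1, rng=rng)
print(sparsity_penalty(H), contractive_penalty(H, W))
```

Only the SAE and CAE terms are added to the objective. The DAE instead changes the training pairs to (`X_tilde`, `X`).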
These autoencoders share some properties. The DAE and the CAE are both able to learn the manifold structure of the samples, while the SAE and the CAE have similar sparsity characteristics in their representations. Nevertheless, their implementations are quite different. For example, the DAE reaches its goal by training on noise-corrupted data, learning parameters that reconstruct the original samples without noise. The CAE, in contrast, includes the Jacobian matrix in the loss function and encourages a robust representation by contracting the samples during training.
3. Variational inference and variational autoencoder
Posterior distribution computation is the central problem in inference analysis, and it faces two computational challenges: computing the marginal likelihood and computing the predictive distribution. Both are often intractable because they require high-dimensional integrals. Therefore, approximate inference approaches such as Gibbs sampling, based on the Markov chain Monte Carlo (MCMC) principle, are appealing. However, Gibbs sampling and its variants are often unsuitable for some applications because of their inefficiency, especially in high-dimensional scenarios. This situation did not change until the VAE was proposed. To understand the VAE, we first review the relevant basics: variational inference (VI), the evidence lower bound (ELBO), the mean field assumption, and the Kullback–Leibler (KL) divergence.
To describe the problem mathematically, let $x = x_{1:N}$ be a set of $N$ observations and $z = z_{1:m}$ be the $m$ latent variables. $p(z, x \mid \theta)$ denotes the joint distribution of $z$ and $x$ given the model parameter $\theta$. $p(x \mid \theta)$ and $p(z \mid x, \theta)$ are called the likelihood of $x$ and the posterior distribution of $z$, respectively.
3.1. Variational inference
Theoretically, the motivation of variational inference [33, 35] is to find a tractable distribution that approximates the desired posterior distribution, which is intractable. The Kullback–Leibler (KL) divergence is introduced to measure how close these two distributions are. Let $Q(z)$ and $P(z)$ be two distributions of the continuous random variable $z$. Their KL divergence is defined as:

$$\mathrm{KL}(Q \,\|\, P) = \int Q(z) \log \frac{Q(z)}{P(z)}\, dz$$
The KL divergence is nonnegative and decreases as the two distributions become more similar: the more similar they are, the smaller the KL divergence, and it equals zero when $Q$ is the same as $P$. However, the KL divergence is not symmetric: in general, $\mathrm{KL}(Q \,\|\, P) \neq \mathrm{KL}(P \,\|\, Q)$. The definition implies two further properties. The integrand contributes zero wherever $Q(z)$ tends to zero, regardless of $P(z)$. It grows without bound where $P(z)$ tends to zero while $Q(z) > 0$. Hence, we can approximate the distribution $P(z)$ with $Q(z)$ by minimizing $\mathrm{KL}(Q \,\|\, P)$.
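The nonnegativity, asymmetry, and zero/infinity behavior of the KL divergence can be checked numerically on discrete distributions, where the integral becomes a sum. The function name `kl_divergence` and the example distributions are illustrative.

```python
import numpy as np


def kl_divergence(q, p):
    """KL(Q || P) = sum_z Q(z) log(Q(z) / P(z)) for discrete distributions.
    Terms with Q(z) = 0 contribute 0; Q(z) > 0 where P(z) = 0 gives +inf."""
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    mask = q > 0
    with np.errstate(divide="ignore"):
        return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))


q = [0.5, 0.4, 0.1]
p = [0.3, 0.3, 0.4]
print(kl_divergence(q, p), kl_divergence(p, q))           # both > 0, and not equal
print(kl_divergence(q, q))                                # 0.0
print(kl_divergence([0.5, 0.5, 0.0], [0.5, 0.0, 0.5]))    # inf
```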
3.1.1. Evidence lower bound
In Bayesian statistics, “evidence” is an alternative term for the marginal likelihood of the observations, $p(x)$. Formula (3) relates the KL divergence to the logarithm of the evidence, $\log p(x)$. Their difference equals the expectation $\mathbb{E}_{q}[\log p(z, x) - \log q(z)]$, which is called the evidence lower bound (ELBO). Because the KL divergence is nonnegative, the ELBO is a lower bound on the log evidence, as formula (3) shows (the dependence on $\theta$ is dropped for brevity):

$$\log p(x) = \mathrm{ELBO}(q) + \mathrm{KL}\big(q(z) \,\|\, p(z \mid x)\big) \ge \mathrm{ELBO}(q) \tag{3}$$

Jordan et al. originally obtained the same result using Jensen’s inequality. Formula (3) also literally explains the name ELBO. We define the ELBO as a function of the distribution $q$ of $z$:

$$\mathrm{ELBO}(q) = \mathbb{E}_{q}\big[\log p(z, x)\big] - \mathbb{E}_{q}\big[\log q(z)\big] \tag{4}$$
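The decomposition in formula (3) can be verified exactly on a small discrete model. In this sketch the prior, likelihood, and variational distribution are made-up numbers chosen only for illustration. The latent variable takes three values, so every expectation is a finite sum.

```python
import numpy as np

# Hypothetical toy model: one observed x and a discrete latent z in {0, 1, 2}.
prior = np.array([0.5, 0.3, 0.2])     # p(z)
lik = np.array([0.10, 0.40, 0.70])    # p(x | z) for the observed x
joint = prior * lik                   # p(z, x)
evidence = joint.sum()                # p(x) = sum_z p(z, x)
posterior = joint / evidence          # p(z | x)


def elbo(q):
    """ELBO(q) = E_q[log p(z, x)] - E_q[log q(z)], as in formula (4)."""
    return float(np.sum(q * (np.log(joint) - np.log(q))))


def kl(q, p):
    return float(np.sum(q * np.log(q / p)))


q = np.array([0.2, 0.3, 0.5])         # an arbitrary variational distribution
print(np.log(evidence), elbo(q) + kl(q, posterior))    # equal, as in formula (3)
print(elbo(q) <= np.log(evidence))                     # True: ELBO is a lower bound
print(np.isclose(elbo(posterior), np.log(evidence)))   # True: tight when q = p(z | x)
```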
Because $\log p(x)$ does not depend on $q$, maximizing the ELBO is equivalent to minimizing the KL divergence. As $\mathrm{KL}(q(z) \,\|\, p(z \mid x))$ decreases to zero, $q(z)$ approaches the posterior distribution $p(z \mid x)$. Hence, we can use $q(z)$ to approximate the posterior by maximizing the ELBO of formula (4), that is, by finding the optimal distribution $q^{*} = \arg\max_{q \in \mathcal{Q}} \mathrm{ELBO}(q)$ within a specified family $\mathcal{Q}$ of densities over the latent variables. The expectation maximization (EM) algorithm is one successful approach designed for this kind of problem. It alternates between two steps. In the expectation step (E-step), the posterior distribution $p(z \mid x, \theta^{(t)})$ is computed. In the maximization step (M-step), the expectation of the complete-data log-likelihood with respect to that posterior is maximized over the parameters $\theta$. The parameters are then updated as:

$$\theta^{(t+1)} = \arg\max_{\theta}\ \mathbb{E}_{p(z \mid x, \theta^{(t)})}\big[\log p(z, x \mid \theta)\big]$$
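The following sketch is a concrete and standard instance of the E-step/M-step alternation: it fits a two-component one-dimensional Gaussian mixture with EM. The mixture model, initialization, and iteration count are assumptions made for illustration. This code is not part of the chapter's ECG pipeline.

```python
import numpy as np


def em_gmm_1d(x, n_iter=100):
    """EM for a two-component 1-D Gaussian mixture.

    theta = (weights pi, means mu, variances var).
    E-step: responsibilities r[n, k] = p(z_n = k | x_n, theta_old).
    M-step: closed-form maximizer of E_r[log p(x, z | theta)].
    """
    pi = np.array([0.5, 0.5])
    mu = np.array([x.min(), x.max()])      # crude initialization
    var = np.array([x.var(), x.var()])
    for _ in range(n_iter):
        # E-step: posterior over the latent component of each point
        dens = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate theta from the expected complete-data log-likelihood
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var


rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 700)])
pi, mu, var = em_gmm_1d(x)
# Generating values: weights 0.3 / 0.7, means -2 / 3, variances 0.25 / 1.
print(pi.round(2), mu.round(2), var.round(2))
```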
3.1.2. Mean field
To simplify the ELBO optimization problem, we must make an assumption about the family $\mathcal{Q}$, because the choice of family strongly affects the complexity of the optimization algorithm. The mean field assumption factorizes $q(z)$ as:

$$q(z) = \prod_{j=1}^{m} q_j(z_j)$$
where the individual factors $q_j(z_j)$ are mutually independent over the latent variables of the model. By the chain rule of probability, the joint distribution can be decomposed as:

$$p(z_{1:m}, x_{1:N}) = p(x_{1:N}) \prod_{j=1}^{m} p(z_j \mid z_{1:(j-1)}, x_{1:N})$$
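Then, the ELBO can be written as Eq. (7):

$$\mathrm{ELBO}(q) = \log p(x_{1:N}) + \sum_{j=1}^{m} \Big( \mathbb{E}_{q}\big[\log p(z_j \mid z_{1:(j-1)}, x_{1:N})\big] - \mathbb{E}_{q_j}\big[\log q_j(z_j)\big] \Big) \tag{7}$$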
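where $\log p(x_{1:N})$ is constant with respect to $q$. Hence, maximizing the ELBO is equivalent to maximizing the last summation term. Furthermore, the optimal solution for each factor can be derived with the Lagrange multiplier method:

$$q_j^{*}(z_j) \propto \exp\Big\{ \mathbb{E}_{-j}\big[\log p(z_j, z_{-j}, x)\big] \Big\} \tag{8}$$

where $z_{-j}$ denotes all latent variables except $z_j$, and $\mathbb{E}_{-j}$ is the expectation with respect to $\prod_{i \neq j} q_i(z_i)$.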
Formula (8) indicates that each optimal factor is proportional to the exponentiated expected log of the joint distribution, where the expectation is taken over all variational factors except the one being updated. This is also the gist of coordinate ascent variational inference (CAVI). However, because the ELBO is generally not a concave function, there is no guarantee that the solution is a global optimum.
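Coordinate ascent with formula (8) can be illustrated on a textbook case: approximating a correlated two-dimensional Gaussian $p(z) = \mathcal{N}(\mu, \Lambda^{-1})$ with a mean-field $q(z) = q_1(z_1)\,q_2(z_2)$. For this target, each optimal factor is Gaussian with variance $1/\Lambda_{jj}$ and a mean that depends on the other factor's mean, so CAVI reduces to alternating two closed-form updates. The specific numbers are arbitrary.

```python
import numpy as np

# Target: correlated 2-D Gaussian p(z) = N(mu, inv(Lam)), Lam = precision matrix.
mu = np.array([1.0, -1.0])
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])
Lam = np.linalg.inv(cov)

# Mean-field family q(z) = q1(z1) q2(z2). Applying formula (8) to this target
# gives Gaussian factors q_j = N(m_j, 1 / Lam[j, j]) with coupled means.
m = np.array([5.0, 5.0])               # arbitrary initialization of the factor means
for _ in range(200):                   # coordinate ascent: update one factor at a time
    m[0] = mu[0] - Lam[0, 1] / Lam[0, 0] * (m[1] - mu[1])
    m[1] = mu[1] - Lam[1, 0] / Lam[1, 1] * (m[0] - mu[0])

var_q = 1.0 / np.diag(Lam)
print(m)       # approaches the true mean [1, -1]
print(var_q)   # about 0.19 each, smaller than the true marginal variance 1.0
```

The result shows a well-known property of this mean-field approximation. It recovers the mean but underestimates the marginal variances when the latent variables are correlated.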
3.2. Variational autoencoder
A general regularized autoencoder is a deterministic model: it knows nothing about how to create a latent vector until a sample is input. Conversely, the variational autoencoder (VAE) is a generative model and a successful example of combining variational inference with neural networks. The VAE forces the latent vector to follow a prescribed distribution. This not only preserves the properties of general regularized autoencoders but also adds new ones. For example, a VAE can generate data points without any input to encode, which distinguishes it from the other regularized autoencoders. To explore the VAE further, we examine three aspects: the neural network structure, the loss function, and the optimization algorithm.
In terms of hierarchy, the neural network of the VAE is composed of three main parts. The first part is the encoder, which encodes the signals from the input layer. The second part is the decoder, located on the right side in Figure 2. The third part is the sampling unit, located between the other two. The encoder and the decoder are similar to those of a traditional autoencoder, while the additional sampling unit is responsible for sampling from the latent variable space.
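The three parts can be sketched as a NumPy forward pass. Layer sizes follow Table 1: input 400, encoder layers of 100 and 10, a 2-dimensional latent mean and variance, and a batch size of 100. Several choices are assumptions for illustration: ReLU activations, a mirrored decoder, a log-variance output instead of the variance, and random weights. No training is performed.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(n_in, n_out):
    """Randomly initialized fully connected layer, returned as (weights, bias)."""
    return rng.normal(scale=np.sqrt(1.0 / n_in), size=(n_in, n_out)), np.zeros(n_out)


def relu(a):
    return np.maximum(a, 0.0)


def linear(layer, a):
    W, b = layer
    return a @ W + b


# Sizes from Table 1; activations and the mirrored decoder are assumptions.
enc1, enc2 = dense(400, 100), dense(100, 10)           # encoder layers h1, h2
mean_head, logvar_head = dense(10, 2), dense(10, 2)    # z-mean, z-variance (as log-variance)
dec1, dec2, dec_out = dense(2, 10), dense(10, 100), dense(100, 400)


def encoder(x):
    h = relu(linear(enc2, relu(linear(enc1, x))))
    return linear(mean_head, h), linear(logvar_head, h)


def sampler(mu, logvar):
    """Sampling unit: draw z ~ N(mu, diag(exp(logvar)))."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)


def decoder(z):
    return linear(dec_out, relu(linear(dec2, relu(linear(dec1, z)))))


x = rng.normal(size=(100, 400))                 # one batch of 100 toy 400-sample segments
mu, logvar = encoder(x)
x_hat = decoder(sampler(mu, logvar))
x_new = decoder(rng.standard_normal((5, 2)))    # generation from the prior, no input
print(mu.shape, logvar.shape, x_hat.shape, x_new.shape)   # (100, 2) (100, 2) (100, 400) (5, 400)
```

The last decoder call shows the generative use mentioned above: latent vectors drawn directly from the prior are decoded without any encoder input.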
The second aspect is the loss function used to train the network, shown in formula (9). It is essentially the negative of the ELBO in formula (7). From a training point of view, the loss of a VAE has two parts. The first part, the reconstruction term, measures how much the reconstructed data differ from the original input. It encourages the decoder to learn to reconstruct the input; if the decoder fails to do so, this term grows and increases the total loss. The second part is the KL divergence, which measures how close the encoder's distribution is to the prior distribution of the latent variables. This part acts as a regularizer, as in traditional regularized autoencoders: minimizing it pushes the encoder's distribution as close as possible to the prior. In other words, if the encoder's output representations deviate from the specified distribution, the regularizer penalizes the loss; otherwise, the penalty vanishes:

$$\mathcal{L}(X) = -\mathbb{E}_{z \sim Q(z \mid X)}\big[\log P(X \mid z)\big] + \mathrm{KL}\big(Q(z \mid X) \,\|\, P(z)\big) \tag{9}$$
The last aspect of the VAE is how to minimize the loss function of Eq. (9) for neural networks, for which algorithms based on gradient descent are widely adopted. The first term in Eq. (9) is relatively easy to compute. The expectation measures the reconstruction difference, which we can calculate as the mean squared error between the input and the decoder output, as in traditional autoencoders. The KL divergence in the second term is harder to compute directly, because the distributions involved are in general intractable. Fortunately, Kingma et al. proposed an effective solution based on the assumption that $Q(z \mid X)$ follows a normal distribution $\mathcal{N}(\mu(X), \Sigma(X))$, where $\mu(X)$ and $\Sigma(X)$ are the mean and the covariance produced by the encoder. For simplicity, we assume $P(z) = \mathcal{N}(0, I)$, where $I$ is the identity matrix. This choice makes the KL divergence manageable, because it can be computed in closed form:

$$\mathrm{KL}\big(\mathcal{N}(\mu(X), \Sigma(X)) \,\|\, \mathcal{N}(0, I)\big) = \frac{1}{2}\Big(\operatorname{tr}\big(\Sigma(X)\big) + \mu(X)^{\top}\mu(X) - k - \log\det\Sigma(X)\Big)$$

where $k$ is a constant equal to the dimensionality of the distribution.
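Below is a minimal NumPy version of the loss in formula (9). It assumes a diagonal covariance $\Sigma(X) = \mathrm{diag}(\exp(\text{logvar}))$ and a squared-error reconstruction term. The closed-form KL term is also compared against a Monte Carlo estimate as a sanity check. The function names are hypothetical.

```python
import numpy as np


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I)) per sample:
    0.5 * (tr(Sigma) + mu^T mu - k - log det Sigma)."""
    k = mu.shape[-1]
    return 0.5 * (np.sum(np.exp(logvar), axis=-1) + np.sum(mu ** 2, axis=-1)
                  - k - np.sum(logvar, axis=-1))


def vae_loss(x, x_hat, mu, logvar):
    """Batch mean of formula (9), with the reconstruction term replaced by the
    squared error (an assumption matching the MSE choice described above)."""
    recon = np.sum((x - x_hat) ** 2, axis=-1)
    return float(np.mean(recon + kl_to_standard_normal(mu, logvar)))


# Sanity check: compare the closed form with a Monte Carlo estimate of
# E_Q[log Q(z) - log P(z)] for one arbitrary 2-D Gaussian.
rng = np.random.default_rng(0)
mu = np.array([0.5, -1.0])
logvar = np.array([-0.5, 0.3])
sigma = np.exp(0.5 * logvar)
z = mu + sigma * rng.standard_normal((200_000, 2))
log_q = np.sum(-0.5 * ((z - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi), axis=1)
log_p = np.sum(-0.5 * z ** 2 - 0.5 * np.log(2 * np.pi), axis=1)
print(kl_to_standard_normal(mu, logvar), np.mean(log_q - log_p))   # should nearly agree
```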
In addition, to train a VAE, we must consider how gradient descent proceeds when the error is back-propagated through the sampling layer. We cannot differentiate the loss function through the sampling step directly, because sampling from a distribution is a non-continuous operation and has no gradient. To clarify the problem, suppose we take the derivative of the reconstruction term with respect to the encoder parameters $\phi$. The gradient is then:

$$\nabla_{\phi}\, \mathbb{E}_{z \sim Q_{\phi}(z \mid X)}\big[\log P(X \mid z)\big] = \int \log P(X \mid z)\, \nabla_{\phi} Q_{\phi}(z \mid X)\, dz$$
Clearly, the gradient depends not only on the decoder's distribution $P(X \mid z)$ but also on the encoder's distribution $Q(z \mid X)$. Besides the non-continuity of sampling from the encoder's distribution, an ordinary neural network has no stochastic unit through which the gradient could flow. Kingma et al. solved this problem with a method named the “reparameterization trick”. Instead of drawing $z$ from the encoder's distribution directly, the sampling unit first computes $\mu(X)$ and $\Sigma(X)$ from the input $X$. Given $\mu(X)$ and $\Sigma(X)$, we sample $\epsilon \sim \mathcal{N}(0, I)$ and then compute $z = \mu(X) + \Sigma^{1/2}(X)\,\epsilon$. Consequently, for fixed $X$ and $\epsilon$, the loss becomes a continuous and deterministic function of the parameters of $P$ and $Q$, so its derivative with respect to the parameters of $Q$ is computable. Algorithms based on gradient descent (GD) can then be applied to VAE neural networks. Compared with time-consuming Gibbs sampling methods, GD-based algorithms are much more efficient.
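The reparameterization trick can be illustrated without a neural network. Below, $f(z) = z^2$ is a stand-in for the per-sample loss (an assumption for illustration). For this loss, $\mathbb{E}[f(z)] = \mu^2 + \sigma^2$ has the exact gradients $2\mu$ and $2\sigma$. Writing $z = \mu + \sigma\epsilon$ with $\epsilon \sim \mathcal{N}(0, 1)$ lets the chain rule pass through the sample, and averaging the per-sample derivatives recovers these gradients.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.8
eps = rng.standard_normal(100_000)     # eps ~ N(0, 1), does not depend on mu or sigma
z = mu + sigma * eps                   # reparameterized samples: z ~ N(mu, sigma**2)

# Stand-in loss f(z) = z**2, so E[f(z)] = mu**2 + sigma**2.
# For fixed eps, z is a deterministic, differentiable function of (mu, sigma):
# dz/dmu = 1 and dz/dsigma = eps, and f'(z) = 2 z (chain rule).
grad_mu = np.mean(2 * z * 1.0)
grad_sigma = np.mean(2 * z * eps)
print(grad_mu, 2 * mu)         # Monte Carlo estimate vs. exact value 3.0
print(grad_sigma, 2 * sigma)   # Monte Carlo estimate vs. exact value 1.6
```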
4. ECG preprocessing and enhancement
In this section, we introduce our method for ECG preprocessing and enhancement. The task of this procedure is to split the ECG waves into segments according to the cardiac cycle and then use them as data points for training our models. The QRS complex corresponds to ventricular depolarization. Morphologically, it has a higher amplitude and a sharper peak than other components such as the P wave and the T wave. Therefore, it is much more convenient to detect and locate the peaks of the QRS complex (the R peak, or the Q and S points) than any other component in the ECG segments. Algorithm 1 describes in detail how the ECG waveforms are split. The templates used in Algorithm 1 are derived from the typical contour of ECG R wave peaks.
The critical step in Algorithm 1 is evaluating the similarity between the selected area of the ECG waveform and the given template. The mean squared error (MSE) is commonly adopted in ECG recognition applications. Its main disadvantage, however, is that aligning the selected area with the template is time-consuming. For example, for two identical curves, the MSE is tiny if the template is perfectly aligned, but very large if the curves are shifted so that they do not overlap at all. The correlation coefficient, another reasonable approach, is currently in use [21, 26]. Instead of directly computing the difference between the ECG waveform and the template as the MSE method does, it solves an optimization problem that minimizes the sum of the squared offsets of the selected ECG data points from the corresponding points on the template.
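A small NumPy sketch shows the difference between the two similarity measures. The Pearson correlation coefficient (as computed by `np.corrcoef`) is used here as our reading of the correlation-based measure. Its square is the fraction of variance explained by the best least-squares linear fit, so it is insensitive to gain and baseline offset, whereas the MSE is not. The Gaussian-shaped template is a hypothetical stand-in for an R wave.

```python
import numpy as np


def mse(window, template):
    return float(np.mean((window - template) ** 2))


def correlation(window, template):
    """Pearson correlation coefficient between an ECG window and the template."""
    return float(np.corrcoef(window, template)[0, 1])


t = np.linspace(-1.0, 1.0, 41)
template = np.exp(-(t / 0.1) ** 2)     # hypothetical R-wave-shaped template
scaled = 1.8 * template + 0.3          # same shape, different gain and baseline
shifted = np.roll(template, 15)        # same shape, but misaligned

print(mse(scaled, template), correlation(scaled, template))    # MSE > 0, correlation ~ 1
print(mse(shifted, template), correlation(shifted, template))  # correlation drops sharply
```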
We introduce a parameter $L$ for the length of an ECG waveform segment. $L$ must lie in a proper range; otherwise, a segment may contain more than one R peak or none at all. To avoid these situations, we let $L$ be proportional to, and somewhat smaller than, the distance $d$ between two adjacent R peaks, that is, $L < d$. For instance, if the sampling rate is 250 Hz and the heart rate is 75 beats per minute, then $d = 250 \times 60 / 75 = 200$ samples. Because the heart rate is not constant during recording, $d$ cannot be fixed in advance. For this reason, in all of our experiments, $d$ is set empirically to the average length of the previous three cardiac periods. The searching step $s$ can be initialized to a constant value, because there is no variation in the vertical direction; we keep $s = 1$ in this chapter.
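The arithmetic above and the empirical rule of averaging the previous three cardiac periods can be written as two small helpers. The function names and the ratio `0.9` between $L$ and $d$ are hypothetical; the chapter only requires $L$ to be somewhat smaller than $d$.

```python
def rr_distance(fs_hz, heart_rate_bpm):
    """Samples between adjacent R peaks: d = fs * 60 / HR."""
    return fs_hz * 60.0 / heart_rate_bpm


def adaptive_segment_length(r_peaks, ratio=0.9):
    """Hypothetical helper: d = mean of the previous three R-R intervals, and
    L = ratio * d with ratio < 1, so that a segment holds at most one R peak."""
    last = r_peaks[-4:]                                 # four peaks -> three intervals
    intervals = [b - a for a, b in zip(last, last[1:])]
    d = sum(intervals) / len(intervals)
    return round(ratio * d)


print(rr_distance(250, 75))                           # 200.0 samples
print(adaptive_segment_length([0, 200, 390, 600]))    # d = 200.0, so L = 180
```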
Algorithm 1. ECG R wave peak location algorithm.
input: ECG data file name pa
initialize: set the segment length L and the searching step s; create an empty ECG data buffer D and an empty R wave peak array R
read the ECG data from the ECG data file pa into the buffer D
calculate the number of segments n = ⌊length(D) / L⌋
for each segment in D do
  set the search range in the vertical direction to the start position of the segment
  while the target has not been found and the search range lies within the segment do
    look for an R wave peak in a small area of the segment within the search range, using the template
    if the similarity to the template is high enough then // decide whether the target has been found
      save the result to R
      update the search range for the next iteration
    end if
  end while
  update the segment position and the segment length L, respectively
end for
return the ECG data array D and the R wave peak array R
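A runnable Python sketch in the spirit of Algorithm 1 is given below. It slides a template through successive search windows of length $L$ with step $s$ and keeps the best correlation in each window. If that correlation exceeds a threshold, the position is accepted as an R peak, and the next window starts just past the matched template. The function name, the threshold of 0.8, the restart rule, and the synthetic test signal are our assumptions, not details taken from the chapter.

```python
import numpy as np


def locate_r_peaks(ecg, template, seg_len, step=1, threshold=0.8):
    """Template-matching R-peak search in the spirit of Algorithm 1 (a sketch).

    seg_len plays the role of the segment length L and step the searching
    step s; threshold (minimum accepted correlation) is a hypothetical value.
    """
    w = len(template)
    offset = int(np.argmax(template))          # position of the R peak inside the template
    peaks, start = [], 0
    while start + w <= len(ecg):
        stop = min(start + seg_len, len(ecg) - w + 1)
        best_corr, best_pos = -np.inf, start
        for pos in range(start, stop, step):   # slide the template over the window
            corr = np.corrcoef(ecg[pos:pos + w], template)[0, 1]
            if corr > best_corr:
                best_corr, best_pos = corr, pos
        if best_corr >= threshold:             # target found: save it, restart past it
            peaks.append(best_pos + offset)
            start = best_pos + w
        else:                                  # nothing found: move to the next window
            start += seg_len
    return np.array(peaks)


# Synthetic test signal: R-like bumps every 200 samples (75 bpm at 250 Hz) plus noise.
rng = np.random.default_rng(0)
n = 2000
true_peaks = np.arange(100, n - 50, 200)
template = np.exp(-(np.arange(-10, 11) / 3.0) ** 2)
ecg = 0.02 * rng.standard_normal(n)
for p in true_peaks:
    ecg[p - 10:p + 11] += template
found = locate_r_peaks(ecg, template, seg_len=180)
print(true_peaks)
print(found)   # the first ten detections are expected to coincide with true_peaks
```

Because correlation ignores amplitude, a real implementation would likely also need an amplitude check to reject noise-only windows. The sketch leaves that out.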
Figure 3 shows an ECG waveform (top) and the detected and located R wave peaks (bottom). The ECG data are taken from the American Heart Association (AHA) database on the PhysioNet website. This database consists of 80 two-channel ECG recordings digitized at 250 Hz with 12-bit resolution over a 10-mV range. The recordings in the database are divided into eight classes according to the highest level of ventricular ectopy present.
5. Experimental results and discussion
In this section, we evaluate the performance of the VAE and the other autoencoder variants described in Section 2.
5.1. ECG signals for multi-classification
To demonstrate how our models deal with ECG signals, it is necessary to extract an intact ECG signal over a cardiac period, which contains features such as the P wave, the QRS complex, and the T wave, as described in Section 4. Detecting and locating the P wave then becomes a critical step, because every cardiac period of the ECG signal starts at the P wave. However, the amplitude of the P wave is smaller than that of the QRS complex, and ECG signals contain many kinds of noise. These factors make it more difficult to extract ECG signals over a cardiac period.
Our solution to this problem exploits the fact that R peaks are easier to locate than the start of a P wave. Instead of focusing on the cardiac period, we split each cardiac period into two semi-periods at the R peak. We then join two adjacent parts to form a new period of ECG signal, consisting of the second part of the previous cardiac period and the first part of the next one. Figure 4(a) shows an example of an ECG signal composed of two adjacent semi-periods. From an information point of view, no feature is lost by this separation.
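Cutting a record at adjacent R peaks can be sketched as follows. Because R-R intervals vary, each segment is linearly resampled to 400 points to match the input size in Table 1. That resampling step and the helper name `rr_segments` are assumptions, since the chapter does not state how segments of different lengths were brought to a common size.

```python
import numpy as np


def rr_segments(ecg, r_peaks, out_len=400):
    """Cut the record between each pair of adjacent R peaks (second half of one
    cardiac cycle + first half of the next) and linearly resample every piece
    to out_len samples (the resampling is an assumption)."""
    segments = []
    for a, b in zip(r_peaks[:-1], r_peaks[1:]):
        piece = ecg[a:b + 1]                            # both R peaks included
        old_grid = np.linspace(0.0, 1.0, len(piece))
        new_grid = np.linspace(0.0, 1.0, out_len)
        segments.append(np.interp(new_grid, old_grid, piece))
    return np.stack(segments)


ecg = np.sin(np.linspace(0.0, 20.0, 1000))    # toy signal
r_peaks = [10, 210, 395]                      # hypothetical R-peak indices
segs = rr_segments(ecg, r_peaks)
print(segs.shape)                             # (2, 400)
```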
Each original recording in the ECG databases contains several hours of ECG data, so it is impractical to train our models on the original recordings directly. To train our models, 30,000 complete ECG signals are extracted from three different ECG databases: the AHA ECG database, the APNEA ECG database, and the CHFDB ECG database. For ECG data augmentation, these ECG data are divided into three groups according to their source databases, and each group has 10,000 ECG signals. On this basis, we augment the ECG data by zeroing a small segment of each ECG signal, and different zeroed positions correspond to different class labels. Figure 4(b)–(d) shows three examples of our augmentation; their labels are 3, 4, and 5, respectively. (We use the numbers 1–8 as labels for eight classes of ECG signals in all of our experiments. The labels are not used to train our models; they only simplify evaluating the accuracy of our models during testing.)
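The label-dependent zeroing augmentation can be sketched as below. The chapter does not list the zeroed positions or widths. The mapping `ZERO_START`, the width of 20 samples, and treating label 1 as the unmodified signal are therefore hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical mapping from class label to the start index of the zeroed window
# (in a 400-sample segment); the actual positions are not given in the text.
ZERO_START = {label: 40 * (label - 1) for label in range(2, 9)}
ZERO_WIDTH = 20


def augment(segment, label):
    """Return a copy of the segment with a short window zeroed (label 1 = unmodified)."""
    out = np.array(segment, dtype=float, copy=True)
    if label != 1:
        start = ZERO_START[label]
        out[start:start + ZERO_WIDTH] = 0.0
    return out


seg = np.ones(400)
aug = augment(seg, 3)                  # zeroes samples 80..99
print(int(np.sum(aug == 0)))           # 20
```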
To evaluate how well our models handle noisy ECG signals, noise of different types and levels is added to the original ECG records. This noise includes Gaussian noise, salt-and-pepper noise, and Poisson noise. Moreover, to imitate baseline wandering, sinusoidal signals of different amplitudes are superimposed on the original ECG signals; the coefficients of the sinusoidal signal are 0.01, 0.05, and 0.1 in all of our experiments. Figure 5 shows ECG signals polluted by different noises. Figure 5(a) and (c) show augmented ECG signals without added noise, apart from the noise introduced during sampling. Figure 5(b) shows an ECG signal polluted by sinusoidal noise and Gaussian noise, both with a coefficient of 0.01. In Figure 5(d), by contrast, the coefficients for the sinusoidal and the Gaussian noise are 0.05 and 1, respectively. The mean and variance of the Gaussian noise are 0 and 0.01, respectively.
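The noise models described above can be sketched with NumPy as follows. Several details are assumptions: the sinusoid frequency (0.3 Hz), using the coefficient directly as the sinusoid amplitude, the salt-and-pepper fraction, and the Poisson scale. The example call uses the settings reported for Figure 7: a sinusoidal coefficient of 0.05 and Gaussian noise with mean 0 and variance 0.05.

```python
import numpy as np

rng = np.random.default_rng(0)


def add_baseline_wander(x, coeff, fs=250.0, freq_hz=0.3):
    """Superimpose a slow sinusoid of amplitude coeff (0.01, 0.05 or 0.1 in the text)."""
    t = np.arange(len(x)) / fs
    return x + coeff * np.sin(2 * np.pi * freq_hz * t)


def add_gaussian(x, mean=0.0, var=0.01):
    return x + rng.normal(mean, np.sqrt(var), size=x.shape)


def add_salt_pepper(x, fraction=0.01):
    """Replace a random fraction of samples by the signal's minimum or maximum."""
    out = x.copy()
    idx = rng.choice(len(x), size=int(fraction * len(x)), replace=False)
    out[idx] = rng.choice([x.min(), x.max()], size=len(idx))
    return out


def add_poisson(x, scale=100.0):
    """Poisson noise on a shifted, non-negative copy of the signal."""
    shift = x.min()
    return rng.poisson((x - shift) * scale) / scale + shift


fs = 250.0
t = np.arange(400) / fs
clean = np.sin(2 * np.pi * 1.25 * t)          # toy stand-in for an ECG segment
noisy = add_gaussian(add_baseline_wander(clean, coeff=0.05, fs=fs), mean=0.0, var=0.05)
print(noisy.shape)
```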
5.2. Recognition of ECG signals
After the ECG signals have been extracted with the methods described in Section 5.1, they are used to train the VAE model. To examine how the complexity of the ECG data affects our model, all ECG data are divided into two groups. The first group contains only two classes of ECG records, normal and abnormal (we call this group the BI dataset). Normal ECG records are those in which all features are normal, as shown in Figures 4 and 5. Abnormal ECG records in the BI dataset contain at least one abnormal feature, such as a prolonged PR interval, an enlarged P wave, or an absent T wave. The second group contains eight classes of ECG records, each produced by zeroing a small segment of ECG data as described in Section 5.1 (we call this group the MI dataset). The parameters of the VAE model used in these experiments are listed in Table 1. Table 2 shows the performance of the VAE model in recognizing ECG signals from both the BI and MI datasets. The results show that the recognition accuracy is higher than 95% for MI records and exceeds 97% for BI records. Given the data complexity, this result is reasonable, because MI is much more complex than BI.
|Parameter|Value|Description|
|---|---|---|
|Input size|400|Equal to the length of the signal|
|h1|100|First layer of the encoder|
|h2|10|Second layer of the encoder|
|z-mean|2|Mean of the sampler|
|z-variance|2|Variance of the sampler|
|Batch size|100|Samples randomly selected from the dataset|

|DB|Record|ECG no.|Sample no. (10³)|Class no.|Precision (%)|Error (%)|
|---|---|---|---|---|---|---|
The advantages of the VAE model in recognizing ECG signals are further shown by comparison with the other autoencoders mentioned in Section 2, namely the CAE, the DAE, and the SAE. To make the comparison fair, all models use the same parameters except for the sampler of the VAE model (the parameter values are listed in Table 1). Moreover, the BI and MI ECG records from the ahadb database are used to train and test all the models. Figure 6 shows the accuracy of the models in recognizing the ECG records. Both panels (a) and (b) of Figure 6 use the rate (the ratio of the representation size to the input size) as the variable. Figure 6(a) uses the BI ECG records from ahadb as the data source, while Figure 6(b) uses the MI records from the same database. The accuracy of the VAE model is clearly higher than that of the other models on both BI and MI records: it is at least 95% on BI records and no more than 90% on MI records. Both figures also indicate that, under the same conditions, the best accuracy is obtained at a rate of 1. The accuracy is near 80% when the rate is 0.5, and it likewise drops sharply as the rate rises above 1. Therefore, there is no need to compress (rate < 1) or stretch (rate > 1) the representation of ECG signals.
Figure 7 demonstrates the performance of the VAE model in denoising ECG records. The method of adding noise to the ECG records in our experiment is described in Section 5.1. The coefficient of the sinusoidal noise is 0.05, and the mean and variance of the Gaussian noise are 0 and 0.05, respectively. For comparison, we use four groups of ECG records (BI, noisy BI, MI, and noisy MI) as datasets for the VAE model.
The results show that the accuracy under noisy conditions is similar to that without noise on the same dataset. This means that the performance of the VAE model in ECG recognition is robust to some kinds of noise.
In this chapter, we develop a VAE model to recognize tiny distortions in ECG signals. First, we analyze the characteristics of ECG features, which are closely related to ECG components such as the P wave, the QRS complex, and the T wave. Second, we present an algorithm for locating R peaks. Based on this algorithm, we extract segments of ECG signal between two adjacent R peaks from three real-life ECG databases. Finally, we train our models on the selected ECG signals. Our experimental results demonstrate that the proposed VAE model can be used as an effective tool for automatically recognizing ECG signals. In particular, the model is robust to some kinds of noise that are usually produced during sampling. Furthermore, the VAE is a recently established generative model based on neural networks, and an important characteristic of the model is that it can be used for unsupervised learning. Large amounts of unlabeled ECG records are now emerging, and real-time diagnosis of heart illness requires automatic ECG recognition. The method in this chapter can offer a solution to these problems.
From a clinical perspective, future work should focus on establishing the set of ECG features and, especially, the relationship between these features and heart diseases. In addition, because of the physiological characteristics of the heart, a single ECG lead may not accurately represent the state of the entire heart. It is therefore desirable to obtain ECG signals from all 12 or 18 leads. For example, in an anterior wall myocardial infarction, ST-segment elevation changes appear on the ECGs from leads I, aVL, and V1–V5. Therefore, applying the VAE model to such clinical situations warrants further study.