Open access peer-reviewed chapter

Electrocardiogram Recognization Based on Variational AutoEncoder

By Shaojie Chen, Zhaopeng Meng and Qing Zhao

Submitted: October 26th 2017. Reviewed: March 12th 2018. Published: August 29th 2018.

DOI: 10.5772/intechopen.76434



Subtle distortions in the electrocardiogram (ECG) can help doctors diagnose some serious latent heart diseases in their patients. However, these distortions are difficult to find manually because of disturbing factors such as baseline wander and high-frequency noise. In this chapter, we propose a method based on the variational autoencoder to distinguish these distortions automatically and efficiently. We test our method on three ECG datasets from PhysioNet to which small artificial distortions have been added. Compared with other autoencoder-based approaches [e.g., contractive autoencoder, denoising autoencoder (DAE)], the results of our experiments show that our method improves the recognition of these distortions on publicly available ECG data.


  • electrocardiogram
  • variational autoencoder
  • variational inference
  • ECG enhancement
  • deep learning

1. Introduction

Automatic electrocardiogram (ECG) recognition [29] greatly helps doctors in the diagnosis and treatment of heart disease. As the number of portable ECG devices increases, more and more ECG records become available. However, these ECG data are inevitably contaminated by different kinds of noise caused by interference such as baseline wandering, muscle shaking, and electrode movement [13, 14]. Given the level and complexity of this noise, especially the components that cause subtle deformations of ECG waveforms, the accuracy of ECG recognition may decrease. Additionally, far more unlabeled ECG data (i.e., data without any type information) are stored in many databases. Therefore, it is necessary to improve the performance of automatic ECG classification in an unsupervised context by choosing proper models and algorithms.

To prevent inference on noisy data, many approaches to ECG preprocessing or enhancement have been successfully employed to remove the contamination. Traditionally, most of these approaches are based on filtering in the frequency domain. Ziarani and Konrad [15] eliminated power line noise by extracting a specified component of a signal and tracking its variations over time. Alfaouri [16] and Dewangan et al. [17] employed the wavelet transform to isolate baseline wander and to detect and suppress power line interference in ECG. Although these filters help suppress high-frequency interference, they may simultaneously drop useful information about heart illness, because the spectrum of the ECG spreads over not only the low band but also the high band. To overcome these drawbacks of filtering-based methods, some adaptive methods have been proposed. Abdelmounim et al. [18] applied an adaptive algorithm to remove noise that subsequently adapts to the wavelets selected by proper thresholding. However, the authors also reported that this method could not remove baseline wandering smoothly and effectively. Additionally, other techniques such as the Fourier transform (FT) and empirical mode decomposition (EMD) have been employed for ECG preprocessing [19, 20]. FT maps the higher frequency components into the low area. Similarly, EMD separates different ECG components by proper intrinsic mode functions.

Feature extraction is another important procedure in ECG recognition. ECG features consist of amplitudes, intervals, and segments, which are shown in Figure 1. Each feature indicates certain activities of the heart. For example, the P wave represents atrial depolarization, which causes both atria to contract and pump blood into the ventricles. Any distortion of the P wave indicates an atrial malfunction.

Figure 1.

An ECG waveform with two cardiac periods. It consists of the P wave, QRS complex, and T wave. Additionally, there are two intervals: the PR interval (3) and the QT interval (5). The three segments are the PR segment (2), ST segment (4), and TP segment (6). The RR interval (7) is the duration between two adjacent R-wave peaks.

Traditionally, the goal of ECG feature extraction is to extract all of the abovementioned features. As the amplitude of the R wave is much larger than that of any other wave, many approaches based on QRS complex detection have been proposed. Chan et al. [21] used a specific template to match the preferred ECG signals by computing the correlation between them. Krasteva and Jekova [22] successfully implemented this method to evaluate the heart rhythm. Nevertheless, these approaches depend heavily on prior knowledge about ECG and the relevant areas [23, 25], which makes further applications more difficult. Comparatively, some other approaches based on kernel functions are more popular and widely used because of their simplicity and sensitivity. Martis et al. [3] studied several methods [principal component analysis (PCA), linear discriminant analysis (LDA), independent component analysis (ICA), and discrete wavelet transform (DWT)] and compared them in feature extraction for classifying arrhythmia ECGs. Banerjee et al. [5] focused on two specific regions of the ECG waveform (the QRS complex area and the T-wave region) to distinguish between normal and abnormal ECG signals by means of the wavelet cross spectrum and wavelet coherence. Kærgaard et al. [6] proposed two hybrid signal processing schemes [ensemble empirical mode decomposition (EEMD) and discrete wavelet transform (DWT)] for ECG feature extraction. These schemes were implemented in combination with a neural network and the wavelet transform. Nazarahari et al. [8] chose wavelet functions (WFs) as a means of classifying ECG and proposed a wavelet design criterion for choosing the wavelet function. Houssein et al. [4] classified ECG with modified water wave optimization (WWO) algorithms and achieved over 93% average accuracy.

Although conventional methods based on kernel techniques have made many important contributions to ECG feature extraction, their accuracy and efficiency can rarely meet all the requirements of applications, especially in the presence of noise. Fortunately, unlike the kernel methods, neural networks can extract ECG features automatically through their hierarchical structure in the context of deep learning, an approach known as representation learning. Yan et al. [12] used a restricted Boltzmann machine (RBM) for ECG classification. Xiong et al. [9, 10] employed a denoising autoencoder (DAE) and a stacked contractive denoising autoencoder, respectively, for ECG denoising [8]. Zhou et al. [11] chose a stacked sparse autoencoder (SAE) to extract ECG features for classification, and the accuracy achieved by this work shows benefits over the traditional methods that require the wavelet transform to perform ECG classification.

For automatic diagnosis of heart illness assisted by ECG recognition, some of the works mentioned above do not meet the necessary requirements, because most studies focused on the problem of distinguishing arrhythmia. Nevertheless, many heart diseases are closely related not only to the rhythm itself but also to other features of the ECG waveform, such as the length of the ST segment and the amplitude of the P wave. Additionally, generative models have rarely been used for ECG recognition. The contributions of this chapter are twofold: (1) instead of using the ECG signal over a cardiac period between two P-wave start points, we propose a new method that intercepts ECG segments between two adjacent R peaks and (2) we use the variational autoencoder (VAE) model as an analysis tool to recognize different ECG signals by focusing on the variation of tiny distortions.

This chapter is organized as follows. Section 2 briefly describes the autoencoder and its variants. Section 3 introduces variational inference and the variational autoencoder in detail. The ECG preprocessing and classification scheme is proposed in Section 4. Our experimental results and discussion are presented in Section 5. Finally, Section 6 concludes.

2. Autoencoders and variants

The variational autoencoder is closely related to the autoencoder. An autoencoder is a neural network that consists of an encoder and a decoder. The encoder maps its input into a representation, and the decoder reconstructs the input from that representation. Because a well-trained autoencoder can only reproduce the training data approximately, it is forced to prioritize the aspects of the input that help reconstruction and to discard the others. In this way, the autoencoder learns useful properties of the training data. The VAE shares this character with the AE, in addition to some specialties of its own.

2.1. Autoencoder and regularized variants

An autoencoder can be used to obtain useful features from the encoder output. In terms of the feature dimension, autoencoders fall into two categories: undercomplete and overcomplete. Undercomplete means that the dimension of the feature is less than that of the input, and the most salient features can be learned well in this scenario. Conversely, in the overcomplete case, the dimension of the feature is greater than that of the input, and sparser features may be obtained. The objective function is another core topic for an autoencoder. It is designed to give the autoencoder capabilities such as linear regression or logistic regression, which limit the model to some useful properties of the training data. The general form of the objective function can be written as follows:


$$\tilde{J}(X;\theta) = J(X;\theta) + \alpha\,\Omega(\theta) \tag{1}$$

where $X$ is the training data for a given autoencoder, $\theta = \{W_e, b_e, W_d, b_d\}$ are the parameters of the model, and $\alpha$ is a nonnegative hyperparameter that weights the contribution of the penalty term $\Omega$ relative to the standard objective function $J$. Setting $\alpha$ to 0 means no regularization, and larger values of $\alpha$ result in more regularization. An autoencoder with a penalty term is usually called a regularized autoencoder. It is encouraged to have small derivatives of the representation, which leads to faster convergence during training than without any regularization.

Different forms of the regularizer term give the autoencoder different properties and lead to different variants of the regularized autoencoder. These variants primarily include the sparse autoencoder (SAE), denoising autoencoder (DAE) [3], contractive autoencoder (CAE), and variational autoencoder (VAE). Theoretically, the VAE combines variational inference (VI) and neural networks. As a generative model, one of its prominent successes is that it realizes effective random sampling while still being trained with back-propagation (BP). This will be described in detail in Section 3.

Unlike the VAE, the SAE keeps the majority of the neurons in its hidden layers inactive, since the activation functions of these neurons are saturated for most inputs. This results in sparse features, in which many elements are zero (or close to zero). Mathematically, the sparsity of the SAE is accomplished by the penalty term $KL(\tilde{p}\,\|\,p)$, where $\tilde{p}$ is the given sparsity value. The parameter $p$ is adjusted gradually toward $\tilde{p}$ during training until satisfactory sparsity is achieved. Analogous to the SAE, the CAE [4, 26] obtains its contractive properties from its penalty term, which is built from a Jacobian matrix consisting of the partial derivatives of the encoder activation functions with respect to the input vectors. This lets the model resist input perturbations during training. Consequently, a neighborhood of sample points is encouraged to map into a smaller area, which can be thought of as the contracting capability of the CAE. The motivation of the DAE is to be insensitive to noise. Instead of adding a penalty term to the objective function, the DAE is trained on noise-corrupted data $\tilde{x}$ ($\tilde{x} = x + \beta\tau$) [27, 30]. The DAE has been very successful in many cases, especially under the manifold assumption. As the corrupted data $\tilde{x}$ lie farther from the manifold than the uncorrupted data, the DAE learns to pull points that are far from the manifold back toward it. The larger the distance from the manifold, the bigger the step the DAE takes toward it.
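The corruption step used to train a DAE can be sketched as follows. This is only an illustration: the chapter does not say what β and τ are, so the sketch assumes that τ is standard Gaussian noise and β a scalar noise level. The function name `corrupt` and the sine-wave stand-in signal are hypothetical.

```python
import numpy as np


def corrupt(x, beta, rng):
    """Return x + beta * tau, with tau drawn as standard Gaussian noise (assumed)."""
    tau = rng.standard_normal(x.shape)
    return x + beta * tau


rng = np.random.default_rng(0)

# Stand-in for one clean signal; a real ECG segment would be used instead.
x = np.sin(np.linspace(0.0, 2.0 * np.pi, 400))

# The DAE would receive x_tilde as input and be trained to reconstruct x.
x_tilde = corrupt(x, beta=0.1, rng=rng)
```

With `beta=0` the input is returned unchanged, so the same training loop would reduce to a plain autoencoder.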

Generally, these autoencoders share some properties. DAE and CAE are able to learn the manifold structure of the samples, while SAE and CAE have a similar sparsity in their representations. Nevertheless, the implementations of these autoencoders are quite different. For example, DAE is trained on noise-corrupted data to learn parameters that reconstruct the original, noise-free samples. In comparison, CAE takes the Jacobian matrix as part of the loss function and encourages robustness of the representation by contracting the samples during training.

3. Variational inference and variational autoencoder

As the central problem in inference, computing the posterior distribution faces two challenges: computing the marginal likelihood and computing the predictive distribution. Both are intractable, since they often require high-dimensional integrals. Therefore, approximate inference approaches such as Gibbs sampling, based on the Markov chain Monte Carlo (MCMC) principle, are appealing. However, Gibbs sampling and its variants are often ruled out in applications by their inefficiency, especially in high-dimensional scenarios. This situation did not change until the VAE was proposed [36]. To understand the VAE, we first start from the relevant basics, including variational inference (VI), the evidence lower boundary (ELBO), mean field, and Kullback–Leibler (KL) divergence.

To describe the problem mathematically, let $X = \{x_1, x_2, \ldots, x_N\}$ be a set of $N$ observations and $Z = \{z_1, z_2, \ldots, z_m\}$ be the $m$ latent variables. $P(Z, X; \theta)$ denotes the joint distribution of $X$ and $Z$ given the parameter $\theta$ of the model. $P(X \mid Z)$ is called the likelihood and $P(Z \mid X)$ the posterior distribution of $Z$ given $X$.

3.1. Variational inference

The motivation of variational inference [33, 35] is to find a tractable distribution that approximates the desired but intractable posterior distribution. To measure how close these two distributions are, the Kullback–Leibler (KL) divergence [34] is introduced. Let $P(X)$ and $Q(X)$ denote two different distributions of the continuous random variable $X$; their KL divergence is defined as:


$$KL\big(Q(X)\,\|\,P(X)\big) = \int Q(X)\,\log\frac{Q(X)}{P(X)}\,dX \tag{2}$$

Intuitively, the KL divergence is nonnegative and decreases monotonically with the similarity of the distributions: the more similar the two distributions, the smaller the KL divergence. It equals zero when $Q(X)$ is the same as $P(X)$. However, the KL divergence is not symmetric, as $KL(Q\,\|\,P) \ne KL(P\,\|\,Q)$ in general. The definition implies two further properties: the integrand vanishes where $Q(X)$ goes to zero, regardless of $P(X)$, and grows without bound where $P(X)$ approaches zero while $Q(X)$ remains positive. Hence, we can approximate the distribution $P(X)$ by $Q(X)$ by minimizing $KL\big(Q(X)\,\|\,P(X)\big)$.
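The nonnegativity and asymmetry of the KL divergence are easy to check numerically. The sketch below uses discrete distributions instead of the continuous ones in the text, which is a simplification made only for illustration; the function name `kl_divergence` and the two example distributions are hypothetical.

```python
import numpy as np


def kl_divergence(q, p):
    """Discrete KL(q || p) = sum_i q_i * log(q_i / p_i).

    Terms with q_i == 0 contribute nothing, matching the limit q*log(q) -> 0.
    """
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))


p = np.array([0.5, 0.4, 0.1])
q = np.array([0.3, 0.3, 0.4])

kl_qp = kl_divergence(q, p)  # KL(Q || P)
kl_pq = kl_divergence(p, q)  # KL(P || Q), a different value in general
kl_pp = kl_divergence(p, p)  # zero for identical distributions
```

Both `kl_qp` and `kl_pq` are positive but differ from each other, which is why the direction of the divergence has to be stated when it is used as an objective.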

3.1.1. Evidence lower boundary

In the context of Bayesian statistics, "evidence" is an alternative term for the marginal likelihood of the observations. Formula (3) reveals the relationship between the KL divergence and the logarithm of the evidence $P(X)$. The difference between them equals the expectation of $\log P(X,Z) - \log Q(Z)$, which is called the evidence lower boundary (ELBO). As the KL divergence is nonnegative, the ELBO is a lower bound on the log evidence, which is where its name comes from. Jordan et al. [1] originally obtained the same result using Jensen's inequality. We may define the expectation of $\log P(X,Z) - \log Q(Z)$ as $L(Q)$, a function of the distribution $Q(Z)$:


$$\log P(X) = L(Q) + KL\big(Q(Z)\,\|\,P(Z \mid X)\big) \ge L(Q), \qquad L(Q) = E_{Q(Z)}\big[\log P(X,Z) - \log Q(Z)\big] \tag{3}$$

Intuitively, maximizing the ELBO is equivalent to minimizing the KL divergence. As $KL\big(Q(Z)\,\|\,P(Z \mid X)\big)$ decreases to zero, $Q(Z)$ comes to share the same distribution as the posterior $P(Z \mid X)$. Hence, we can use $Q(Z)$ to approximate the posterior distribution $P(Z \mid X)$ by maximizing the ELBO, that is, by optimizing the objective $L(Q)$ as in formula (4) and finding an optimal distribution $Q(Z)$ within a specified family $\mathcal{Q}$ of densities over the latent variables. The expectation maximization (EM) algorithm [2] is one of the successful approaches designed for finding the optimal solution $Q(Z)$ within the family $\mathcal{Q}$. It alternates iteratively between an expectation step (E-step), in which the posterior distribution $P(Z \mid X; \theta^{old})$ is calculated, and a maximization step (M-step), in which the expectation of the complete-data log-likelihood with respect to the posterior distribution $P(Z \mid X; \theta^{old})$ is maximized by optimizing the parameters $\theta^{new}$. The parameters $\theta^{old}$ are then updated with $\theta^{new}$:

$$\theta^{new} = \arg\max_{\theta}\; E_{P(Z \mid X;\,\theta^{old})}\big[\log P(X,Z;\theta)\big], \qquad \theta^{old} \leftarrow \theta^{new} \tag{4}$$


3.1.2. Mean field

To simplify the optimization of the ELBO, it is necessary to make an assumption about the family $\mathcal{Q}$, as the choice of family strongly affects the complexity of the optimization algorithm. The mean-field assumption concerns how to factorize $Q(Z)$:


$$Q(Z) = \prod_{i=1}^{m} Q_i(Z_i) \tag{5}$$

where $Q_i(Z_i)$ denotes the individual factors, which are mutually independent over the latent variables of the model. According to the chain rule of probability, the joint distribution $P(X,Z)$ can be decomposed as:


$$P(X,Z) = P(X)\prod_{i=1}^{m} P\big(Z_i \mid Z_{1:(i-1)}, X\big) \tag{6}$$

Then, the ELBO can be written as Eq. (7):


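$$L(Q) = \log P(X) + \sum_{i=1}^{m}\Big\{E_{Q}\big[\log P\big(Z_i \mid Z_{1:(i-1)}, X\big)\big] - E_{Q_i}\big[\log Q_i(Z_i)\big]\Big\} \tag{7}$$

where $\log P(X)$ is constant with respect to $Q(Z)$. Maximizing the ELBO is therefore equivalent to maximizing the last summation term. Furthermore, we can derive the optimal solution $Q$ by the Lagrange multiplier method: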


$$Q_i^{*}(Z_i) \propto \exp\Big\{E_{-i}\big[\log P(Z_i, Z_{-i}, X)\big]\Big\} \tag{8}$$

Formula (8) indicates that the optimal $i$th factor is proportional to the exponentiated expected log of the joint distribution, where the expectation $E_{-i}$ is taken over all variational factors except the $i$th. This is also the gist of coordinate ascent variational inference (CAVI) [37]. However, as the ELBO is in general a non-convex objective, there is no guarantee that the solution $Q$ is a global optimum.

3.2. Variational autoencoder

As a deterministic model, a general regularized autoencoder knows nothing about how to create a latent vector until a sample is input. Conversely, as a generative model, the variational autoencoder (VAE) [36] is a successful example of combining variational inference with neural networks. The VAE forces the latent vector to follow a specified distribution. This not only preserves the properties of general regularized autoencoders but also adds new ones. For example, a VAE can generate data points without any input to the encoder, which distinguishes it from the other regularized autoencoders. To explore the VAE further, it is necessary to understand its neural network structure, its loss function, and its optimization algorithm.

In terms of hierarchy, the neural network structure of the VAE is composed of three main parts. The first part is the encoder, which encodes the signals from the input layer. The second part is the decoder, which is located on the right side in Figure 2. The third part is the sampling unit, located between the other two. The encoder and the decoder are similar to those of the traditional autoencoder, while the additional sampling unit is responsible for sampling from the latent variable space.

Training the structure requires a loss function, shown in formula (9), which is essentially the negative of $L(Q)$ in formula (7). The loss of a VAE has two parts. The first part comes from the neural network and measures the difference between the reconstructed data and the original input. It encourages the decoder to learn to reconstruct the input; if it does not, this part grows and increases the total loss. The second part is the KL divergence, which indicates how close the encoder's distribution $Q(Z \mid X_i)$ is to the distribution of the latent variables. This part acts as a regularizer, like that of the traditional autoencoder. It forces the encoder's distribution $Q(Z \mid X_i)$ to be as close as possible to the latent variable distribution $P(Z)$ by minimizing the KL divergence between them. In other words, if the representations output by the encoder differ from the specified distribution, the regularizer term penalizes the loss function; otherwise, the penalty vanishes:


$$J_{VAE}(X_i) = -E_{Q(Z \mid X_i)}\big[\log P(X_i \mid Z)\big] + KL\big(Q(Z \mid X_i)\,\|\,P(Z)\big) \tag{9}$$

The last issue for the VAE is how to minimize the loss function of Eq. (9) on a neural network, where algorithms based on gradient descent are commonly adopted. The first term in Eq. (9) is feasible to compute: the expectation expresses the reconstruction difference, and we can calculate it as the mean squared error between the input and the output of the decoder, as in traditional autoencoders. However, it is more difficult to compute the KL divergence in the second term directly, as $P(Z)$ and $P(X_i \mid Z)$ are both intractable. Fortunately, an effective solution was proposed by Kingma et al. [36] on the assumption that $Q(Z \mid X_i)$ follows a normal distribution, $Q(Z \mid X_i) \sim \mathcal{N}(Z; \theta)$, where $\theta = \{\mu_1, \Sigma_1\}$ and $\mu_1$ and $\Sigma_1$ are the mean and the variance, respectively. For simplicity, we assume $P(Z) = \mathcal{N}(Z; 0, I)$, where $I$ is the identity matrix. This choice makes the computation of the KL divergence manageable. We can compute it in closed form as:


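$$KL\big(Q(Z \mid X_i)\,\|\,P(Z)\big) = \frac{1}{2}\Big[\operatorname{tr}(\Sigma_1) + \mu_1^{T}\mu_1 - D - \log\det(\Sigma_1)\Big] \tag{10}$$

where $D$ is a constant that depends only on the dimensionality of the distribution.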
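As a numerical illustration of this closed form, the sketch below evaluates the KL divergence between a Gaussian with diagonal covariance and the standard normal prior. Two things are assumptions made for the example rather than details taken from the chapter: the covariance is diagonal, and it is parameterized by its log-variance. The function name `kl_to_standard_normal` is hypothetical.

```python
import numpy as np


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ).

    For a diagonal covariance this is
    0.5 * sum_d (sigma_d^2 + mu_d^2 - 1 - log sigma_d^2).
    """
    mu = np.asarray(mu, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    return 0.5 * float(np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var))


# Encoder output equal to the prior: the penalty vanishes.
kl_zero = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])

# Mean shifted by one unit in a single dimension, unit variance.
kl_shift = kl_to_standard_normal([1.0, 0.0], [0.0, 0.0])
```

Here `kl_zero` is 0 and `kl_shift` is 0.5, which shows the regularizer growing as the encoder's distribution moves away from the prior.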

Additionally, to train a VAE, attention must be paid to gradient descent when the error back-propagates through the sampling layer. We cannot differentiate the loss function with respect to the distribution $Q(Z \mid X_i)$ directly, as sampling from the distribution is a non-continuous operation and has no gradient. To clarify the problem, suppose that we could take the derivative of $J_{VAE}$ with respect to $Q(Z \mid X_i)$; we would then get the following gradient expression:


It is clear that the gradient depends not only on the decoder's distribution $P(X_i \mid Z)$ but also on the encoder's distribution $Q(Z \mid X_i)$. Besides the non-continuity of the encoder's distribution, a neural network has no stochastic unit. Kingma et al. [36] presented a method named the "reparameterization trick" that solves this problem. Instead of drawing from the encoder's representations directly, the sampling unit first generates $\mu$ and $\sigma$ from the input $X$. Given $\mu(X)$ and $\sigma(X)$, we can sample from $\mathcal{N}\big(\mu(X), \sigma^2(X)\big)$ by drawing $\varepsilon \sim \mathcal{N}(0, I)$ and then computing $Z = \mu(X) + \sigma(X)\,\varepsilon$. Consequently, given a fixed $X$ and $\varepsilon$, $J_{VAE}$ becomes continuous and deterministic in $P$ and $Q$, which means that its derivative with respect to $Q$ is computable. Algorithms based on gradient descent (GD) can then be applied to VAE neural networks. Compared with the time-consuming Gibbs sampling methods, algorithms based on GD are much more effective and efficient.
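The sampling step of the reparameterization trick can be sketched in a few lines of NumPy. The values of `mu` and `sigma` below are made up; in a VAE they would be produced by the mean and standard deviation generators of the sampling unit. The function name `reparameterize` is hypothetical.

```python
import numpy as np


def reparameterize(mu, sigma, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I).

    All randomness sits in eps, so z is a deterministic, differentiable
    function of mu and sigma: dz/dmu = 1 and dz/dsigma = eps.
    """
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps


rng = np.random.default_rng(0)
mu = np.array([0.5, -1.0])    # made-up output of the mean generator
sigma = np.array([0.2, 1.5])  # made-up output of the standard deviation generator

# Many draws recover the requested mean and standard deviation.
samples = np.stack([reparameterize(mu, sigma, rng) for _ in range(20000)])
```

The empirical mean and standard deviation of `samples` approach `mu` and `sigma`, so the draws follow the intended Gaussian even though only `eps` is sampled.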

4. ECG preprocessing and enhancement

In this section, we introduce our method for ECG preprocessing and enhancement. The task is to split the ECG waveform into segments according to the cardiac cycle [28] and then take these segments as data points for training our models. As described in Section 1, the QRS complex, which reflects ventricular depolarization, has a morphologically higher amplitude and sharper peak than other components such as the P wave and T wave. Therefore, it is much more convenient to detect and locate Q peaks (or R, S peaks) than any other component of these ECG segments. Algorithm 1 describes in detail how to split ECG waveforms. The templates used in Algorithm 1 are derived from the contours of most ECG R-wave peaks.

The critical step in Algorithm 1 is evaluating the similarity between the selected area of the ECG waveform and the given template. The mean squared error (MSE) is commonly adopted in ECG recognition applications. However, its main disadvantage is that aligning the selected area with the given template is time-consuming. For example, for two pictures of the same curve, the MSE between them is tiny if the template aligns well but very large if the two do not overlap at all. Another reasonable approach, the correlation coefficient, is currently in use [21, 26]. Instead of directly computing the difference between the ECG waveform and the template as the MSE method does, it solves an optimization problem that minimizes the sum of the squares of the offsets of the selected ECG data points from the corresponding points on the template.
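The difference between the two similarity measures can be illustrated with a small sketch. The chapter does not give the exact formula it uses, so the sketch assumes the Pearson correlation coefficient; the Gaussian-shaped `template` is a made-up stand-in for an R-wave template, and `window` is the same shape with a different amplitude and baseline.

```python
import numpy as np


def mse(a, b):
    """Mean squared error between two equally long windows."""
    return float(np.mean((a - b) ** 2))


def correlation(a, b):
    """Pearson correlation coefficient between two equally long windows."""
    return float(np.corrcoef(a, b)[0, 1])


t = np.linspace(-1.0, 1.0, 41)
template = np.exp(-(t / 0.15) ** 2)  # made-up R-peak-shaped template
window = 0.8 * template + 0.3        # same shape, scaled and on a shifted baseline

mse_value = mse(window, template)            # penalizes the offset and scaling
corr_value = correlation(window, template)   # unaffected by offset and scaling
```

The correlation stays at 1 because it is invariant to a positive rescaling and a constant offset, whereas the MSE reports a clear mismatch for the same pair. This makes the correlation the more forgiving measure when the baseline wanders.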

Figure 2.

Neural network structure of the VAE. It consists of three parts: the encoder, the decoder, and the sampling unit. The encoder (indicated by number 2) and the decoder (indicated by number 6) are both fully connected multilayer neural networks. The sampling unit consists of the mean generator (indicated by number 3), the standard deviation generator (indicated by number 4), and the latent vector generator (indicated by number 5). The structure of the sampling unit rests on the assumption that $Z \sim \mathcal{N}(\mu, \sigma^2)$.

We introduce a parameter $h_{step}$ for the length of a segment of the ECG waveform. It is important to keep $h_{step}$ in a proper range; otherwise, a segment contains more than one R peak or none. To avoid this, we let $h_{step}$ be proportional to the distance between two adjacent peaks and slightly less than it, that is, $h_{step} \le 60 \times \text{sampling rate} / \text{heart rate}$, with the heart rate given in beats per minute. For instance, if the sampling rate is 250 Hz and the heart rate is 75 beats per minute, then $h_{step} \le 200$. As the heart rate is not constant during sampling, the distance given by this inequality varies from beat to beat. For this reason, in all of our experiments, the distance is set empirically to the average over the previous three cardiac periods. The searching step can be initialized to a constant value, as there is no variation in the vertical direction. We keep $v_{step}$ equal to 1 in this chapter.
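The arithmetic behind the segment length can be written out as a short sketch. It assumes that the heart rate is given in beats per minute, which is what makes the chapter's example come out at 200 samples; the function names and the list of peak positions are hypothetical.

```python
def max_segment_length(sampling_rate_hz, heart_rate_bpm):
    """Number of samples between two adjacent R peaks: 60 * fs / heart rate."""
    return 60.0 * sampling_rate_hz / heart_rate_bpm


def average_rr_distance(r_peaks, n_periods=3):
    """Mean peak-to-peak distance over the most recent n_periods cardiac periods."""
    recent = r_peaks[-(n_periods + 1):]
    gaps = [later - earlier for earlier, later in zip(recent[:-1], recent[1:])]
    return sum(gaps) / len(gaps)


# The example from the text: 250 Hz sampling rate and 75 beats per minute.
bound = max_segment_length(250, 75)

# Made-up R-peak sample indices; the gaps are 198, 202, and 203 samples.
avg = average_rr_distance([0, 198, 400, 603])
```

Here `bound` evaluates to 200.0 and `avg` to 201.0. Averaging over the last three periods lets the segment length follow a slowly changing heart rate instead of relying on one nominal value.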

Algorithm 1. ECG R wave peak location algorithm.

input: ECG data file name pa
initial: set segment length hstep and searching step vstep, empty ECG data buffer ecg_v[M] and R wave peaks array ecg_pos[m];
read ECG data into ECG data buffer ecg_v from ECG data file ecg_data_file;
calculate segment number N = L / hstep, where L = length(ecg_v);
for each segment s in N
    let search range in vertical direction equal start position;
    while not bfind and tp > 0 and bp > 0 do
        look for R wave peak in small area of [rp, lp] in range of [tp, bp] using template;
        if isfindpeak    // decide whether the target has been found
            save the result to ecg_pos;
            break;
        else
            update range of [tp, bp] for next iteration;
        end if
    end while
    update rp and lp respectively;
end for
return ECG data array ecg_v, R wave peak array ecg_pos;

Figure 3 shows an ECG waveform (top) and the detection and location of its R-wave peaks (bottom). The ECG data are taken from the American Heart Association (AHA) database on the PhysioNet website [24], which consists of 80 two-channel ECG recordings digitized at 250 Hz with 12-bit resolution over a 10-mV range. The recordings in the database are divided into eight classes according to the highest level of ventricular ectopy present.

Figure 3.

ECG waveform and R-wave peak location, taken from the AHA database. The top picture shows the ECG waveform; the bottom picture shows the result of R-peak detection and location for that waveform.

5. Experimental results and discussion

In this section, we evaluate the performance of VAE and other autoencoder variants described in Section 2.

5.1. ECG signals for multi-classification

To demonstrate the performance of our models on ECG signals, it is necessary to extract an intact ECG signal over one cardiac period, which contains features such as the P wave, QRS complex, and T wave, as described in Section 4. Detecting and locating the P wave then becomes a critical step, as every cardiac period of an ECG signal starts at the P wave. However, the amplitude of the P wave is smaller than that of the QRS complex, and ECG signals carry many kinds of noise. These factors make it more difficult to extract the ECG signal of one cardiac period.

Our solution rests on the fact that it is more feasible to locate R peaks than to locate the start of a P wave. Instead of focusing on the cardiac period, we split each cardiac period into two semi-periods at the R peak and then join two adjacent parts to form a new one-period ECG signal, which consists of the second part of the previous cardiac period and the first part of the next one. Figure 4(a) shows an example of an ECG signal composed of two adjacent semi-periods. In terms of information, no feature is lost in this separation.
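Cutting the signal between adjacent R peaks can be sketched as follows. The peak positions are made up and stand in for the output of Algorithm 1. The chapter does not say how segments of different lengths are brought to the common input size of 400 samples listed in Table 1, so the linear resampling in `resample_to_length` is purely an assumption of this sketch.

```python
import numpy as np


def split_at_r_peaks(ecg, r_peaks):
    """Return one segment per pair of adjacent R peaks (start inclusive, end exclusive)."""
    return [ecg[a:b] for a, b in zip(r_peaks[:-1], r_peaks[1:])]


def resample_to_length(segment, n):
    """Linearly interpolate a segment onto n evenly spaced points (assumed step)."""
    old_positions = np.arange(len(segment))
    new_positions = np.linspace(0, len(segment) - 1, n)
    return np.interp(new_positions, old_positions, segment)


ecg = np.arange(1000, dtype=float)  # placeholder samples instead of a real recording
r_peaks = [100, 300, 510, 705]      # hypothetical output of the peak locator

segments = split_at_r_peaks(ecg, r_peaks)
fixed = resample_to_length(segments[1], 400)
```

Each segment starts at one R peak and ends just before the next, so it holds the second half of one cardiac period followed by the first half of the next.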

Figure 4.

An example of an ECG signal composed of two adjacent semi-periods. (a) Single-period ECG signal between adjacent R peaks, derived by Algorithm 1. (b)–(d) ECG signals of different classes, derived by setting a small segment of the same ECG signal to zero at different positions. The difference is marked by the red rectangle.

The original recordings in the ECG databases contain several hours of ECG data, and it is infeasible to train our models on these data directly. To train our models well, 30,000 ECG signals are extracted from three different ECG databases: the AHA ECG database, the APNEA ECG database [24], and the CHFDB ECG database [24]. For ECG data augmentation [32], these signals are divided into three groups according to their source database, and each group has 10,000 ECG signals. On this basis, we augment the ECG data by zeroing a small segment of each ECG signal; the different positions selected for zeroing correspond to different class labels. Figure 4(b)–(d) show three examples of our augmentation, whose labels are 3, 4, and 5, respectively. (We use the numbers 1–8 as labels for the eight classes of ECG signals in all of our experiments. The labels are not used for training our models but to simplify the evaluation of accuracy in the testing process.)
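The zeroing augmentation can be sketched as below. The start positions, the segment length of 20 samples, and the mapping from labels to positions are hypothetical, since the chapter does not list them; only the idea that each zeroed position corresponds to a class label comes from the text.

```python
import numpy as np


def zero_segment(signal, start, length):
    """Return a copy of the signal with `length` samples set to zero from `start`."""
    out = np.array(signal, dtype=float, copy=True)
    out[start:start + length] = 0.0
    return out


signal = np.ones(400)  # placeholder for one single-period ECG signal

# Hypothetical mapping from class label to the start of the zeroed segment.
label_to_start = {3: 100, 4: 200, 5: 300}

augmented = {
    label: zero_segment(signal, start, length=20)
    for label, start in label_to_start.items()
}
```

Because `zero_segment` works on a copy, the original signal stays available for the next class and every label receives its own distorted version.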

To evaluate the denoising properties of our models on ECG signals, different types of noise at different levels are added to the original ECG records. These include Gaussian noise, salt-and-pepper noise, and Poisson noise. Moreover, to imitate baseline wander, sinusoidal signals of different amplitudes are superimposed on the original ECG signals. The coefficients of the sinusoidal signal are 0.01, 0.05, and 0.1 in all of our experiments. Figure 5 shows ECG signals polluted by different noises. Figure 5(a) and (c) show the augmented ECG signals without added noise, apart from any noise picked up during sampling. Figure 5(b) shows an ECG signal polluted by sinusoidal and Gaussian noise, where the coefficients for both are 0.01. In Figure 5(d), by contrast, the coefficients for the sinusoidal and the Gaussian noise are 0.05 and 1, respectively. The mean and variance of the Gaussian noise are 0 and 0.01, respectively.
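A minimal sketch of this noise model is given below, covering only the sinusoidal and Gaussian components. The coefficients, mean, and variance follow the values quoted in the text. The frequency of the sinusoid (`wander_cycles`, here one cycle per segment) is an assumption, as the chapter does not state it, and the function name `add_noise` is hypothetical. Salt-and-pepper and Poisson noise are left out.

```python
import numpy as np


def add_noise(signal, sine_coef, gauss_coef, rng, wander_cycles=1.0):
    """Superimpose a sinusoidal baseline wander and scaled Gaussian noise.

    The Gaussian noise has mean 0 and variance 0.01 and is multiplied by
    gauss_coef; the sinusoid has amplitude sine_coef.
    """
    n = signal.shape[0]
    t = np.arange(n) / n
    wander = sine_coef * np.sin(2.0 * np.pi * wander_cycles * t)
    gauss = gauss_coef * rng.normal(0.0, np.sqrt(0.01), size=n)
    return signal + wander + gauss


rng = np.random.default_rng(0)
clean = np.zeros(400)  # placeholder for one single-period ECG signal

noisy_b = add_noise(clean, sine_coef=0.01, gauss_coef=0.01, rng=rng)  # as in Figure 5(b)
noisy_d = add_noise(clean, sine_coef=0.05, gauss_coef=1.0, rng=rng)   # as in Figure 5(d)
```

With these coefficients, the second signal is disturbed far more strongly than the first, in line with the two noise levels described for Figure 5(b) and (d).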

Figure 5.

Single-period ECG signal polluted by different noises. (a) Original ECG signal without added noise. (b) ECG signal of (a) with Gaussian noise. (c) Original ECG signal with a flattened segment. (d) ECG signal of (c) contaminated by Gaussian and sine-wave noise imitating baseline wander.

5.2. Recognition of ECG signals

After the ECG signals have been extracted by the methods described in Section 5.1, they are used to train the VAE model. To examine the effect of the complexity of the ECG data on our model, all ECG data are divided into two groups. The first group contains only two classes of ECG records, normal and abnormal (we call this group the BI dataset). The normal ECG records are those that contain all normal features, as shown in Figures 4 and 5. The abnormal ECG records in the BI dataset contain at least one abnormal feature, such as a prolonged PR interval, an enlarged P wave, or an absent T wave. The second group contains eight classes of ECG records, each produced by zeroing a small segment of the ECG data as described in Section 5.1 (we call this group the MI dataset). The parameters of the VAE model are listed in Table 1. Table 2 shows the performance of the VAE model in recognizing ECG signals from both the BI and MI datasets. The results show that the recognition accuracy is higher than 95% for MI records and higher than 97% for BI records. This is reasonable, because the complexity of MI is much higher than that of BI.

Parameter name | Value | Comment
Input size | 400 | Equal the length of signal
h1 | 100 | First layer of the encoder
h2 | 10 | Second layer of the encoder
z-mean | 2 | Mean of the sampler
z-variance | 2 | Variance of sampler
Learning rate | 0.01 |
Function | Log-sigma | Logarithmic sigma
Batch size | 100 | Randomly select samples from the dataset

Table 1.

Parameters of VAE model.
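The sampler rows of Table 1 (a two-dimensional mean and a two-dimensional log-sigma) correspond to the usual reparameterized sampling step of a VAE. The following is a minimal sketch of that step and of the matching KL regularization term, using only the standard library. It assumes that "log-sigma" denotes the logarithm of the standard deviation (some implementations use the log-variance instead); the function names are hypothetical.

```python
import math
import random


def sample_latent(z_mean, z_log_sigma, rng=random):
    """Reparameterization trick: z = mean + exp(log_sigma) * eps, eps ~ N(0, 1)."""
    return [m + math.exp(s) * rng.gauss(0.0, 1.0)
            for m, s in zip(z_mean, z_log_sigma)]


def kl_divergence(z_mean, z_log_sigma):
    """KL(N(mean, sigma^2) || N(0, 1)), summed over the latent dimensions.

    Assumes z_log_sigma holds log(sigma), so sigma^2 = exp(2 * log_sigma).
    """
    return sum(0.5 * (math.exp(2.0 * s) + m * m - 1.0 - 2.0 * s)
               for m, s in zip(z_mean, z_log_sigma))
```

In a full model, the encoder (400 inputs, then layers of 100 and 10 units) would output z_mean and z_log_sigma, and the KL term would be added to the reconstruction loss during training.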

DB | Record | ECG no. | Sample no. (10^3) | Class no. | Precision (%) | Error (%)

Table 2.

Performance evaluation of VAE model on three ECG databases.

The advantages of the VAE model in recognizing ECG signals can be further shown by comparison with other autoencoders, such as the CAE, DAE, and SAE mentioned in Section 2. To make the comparison fair and reasonable, all model parameters are kept the same except for those of the sampler in the VAE model (the parameter values are given in Table 1). Moreover, the BI and MI ECG records from the ahadb database are used to train and test all the models. Figure 6 shows the accuracy of the models in recognizing the ECG records. Both (a) and (b) in Figure 6 take the ratio of representation size to input size (the rate) as the variable. Figure 6(a) uses the BI ECG records from ahadb as the data source for the models, whereas Figure 6(b) uses the MI records from the same database. It is clear that the accuracy of the VAE model is higher than that of the other models on both BI and MI ECG records: it is at least 95% on BI records and no more than 90% on MI records. Both panels also indicate that, under the same conditions, accuracy is best at a rate of 1. The accuracy is near 80% when the rate falls to 0.5. Similarly, the accuracy drops sharply as the rate rises above 1. Therefore, there is no need for the representation of ECG signals to be compressed (rate < 1) or stretched (rate > 1).

Figure 6.

Accuracy of different models in recognizing ECG signals from the ahadb database. (a) Accuracy of the models in recognizing ECG signals from the BI dataset of the ahadb ECG database. (b) Accuracy of the models in recognizing ECG signals from the MI dataset of the ahadb ECG database.

Figure 7 demonstrates the denoising performance of the VAE model on ECG records. The method of adding noise to the ECG records in our experiment is described in Section 5.1. The coefficient of the sinusoid is 0.05, and the mean and variance of the Gaussian noise are 0 and 0.05, respectively. For comparison, we take four groups of ECG records (BI, noisy BI, MI, and noisy MI) as datasets for the VAE model.

Figure 7.

The performance of the VAE model on denoising for ECG records.

The results show that the accuracy under noisy conditions is similar to that without noise on the same dataset. This means that the performance of the VAE model on ECG recognition is robust to some kinds of noise.

6. Conclusions

In this chapter, we develop a VAE model to recognize tiny distortions in ECG signals. First, we analyze the characteristic features of ECG signals, which are closely related to ECG components such as the P-wave, QRS complex, and T-wave. Second, we explain an algorithm that locates R peaks. On the basis of this algorithm, we extract the segment of ECG signal between two adjacent R peaks from three real-life ECG databases. Finally, we train our models using the selected ECG signals. The results of our experiments demonstrate that the proposed VAE model can be used as an effective tool to automatically recognize ECG signals. In particular, the model is robust to some kinds of noise that are commonly produced during sampling. Furthermore, VAE is a recently established generative model based on neural networks. An important characteristic of the model is that it can be used in unsupervised learning scenarios [31]. Given the emergence of large amounts of unlabeled ECG records and the need for real-time diagnosis of heart illness through automatic ECG recognition, the method in this chapter can offer a solution to these problems.

From a clinical point of view, future work should focus on establishing the set of ECG signal features and, especially, the relationship between these features and heart diseases. Additionally, because of the physiological characteristics of the heart, a single ECG wave may not accurately represent the overall condition of the heart; it is therefore desirable to obtain ECG signals from all 12 or 18 leads. For example, when an anterior wall myocardial infarction occurs, the ST-segment elevation feature changes reciprocally on the ECGs from leads I, aVL, and V1–V5. Therefore, the general application of the VAE model to such clinical situations warrants further study.

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite and reference

Shaojie Chen, Zhaopeng Meng and Qing Zhao (August 29th 2018). Electrocardiogram Recognization Based on Variational AutoEncoder, Machine Learning and Biometrics, Jucheng Yang, Dong Sun Park, Sook Yoon, Yarui Chen and Chuanlei Zhang, IntechOpen, DOI: 10.5772/intechopen.76434. Available from:
