Recognition of Cardiac Arrhythmia by Means of Beat Clustering on ECG-Holter Recordings



Introduction
The development of bio-signal analysis systems has become a major research field due to technological progress in signal processing. Electrocardiography (ECG) has been among the most studied bio-signals for several decades, and research on this type of signal has become an important tool for the diagnosis of cardiac disorders. Because of its simplicity, low cost and non-invasive nature, it is still widely used despite newer available techniques. This chapter deals with the analysis of long-term ECG signals from Holter recordings. The motivation for studying this issue is the development of methods for cardiac arrhythmia analysis that identify particular events occurring at specific periods of time. Such events are associated with cardiac disorders that may be potentially harmful to the patient. The developed methods are aimed at building specialized equipment that provides clinical monitoring for both the patient and the specialist, as well as support for real-time diagnosis. This will help decrease mortality rates from heart problems, especially for people living in rural areas, by giving them access to quicker and more efficient specialized medical diagnostics. This chapter focuses on two major aspects of Holter recordings. The first is the large amount of data stored in such recordings, reaching up to 100,000 heartbeats per evaluation, which makes it a hard task for the specialist to assess the information and decide which heartbeats are important for a given analysis. There are cases where only a few beats allow a certain pathology to be identified or a deadly disease to be prevented; therefore, a detailed analysis of the complete record is needed. The second aspect is the intrinsic characteristics of the signal, such as heart rate variability and morphological variety, among others.
They may result from problems in the cardiac system or from the patient's physical and physiological characteristics. In addition, the electrical nature of ECG signals and their transmission to electronic devices increase noise sensitivity, which can completely alter the diagnostic information contained in the signal and affect the training processes used in the identification of cardiac pathologies.

Cardiac arrhythmias
In general, the pathologies observed using the ECG are divided into three categories:
1. Heart rhythm disturbances, or arrhythmias.
2. Dysfunctions of blood perfusion in the myocardium, or cardiac ischemia.
3. Chronic disorders of the mechanical structure of the heart, such as left ventricular hypertrophy.
We will describe the characterization and identification of the first type of pathology mentioned above. The methods are developed over entire QRS complexes, which are associated with ventricular electrical activity. They contain clinically important information; for example, their morphology changes significantly in abnormal ventricular heartbeats. QRS complexes are also present in most heartbeats, and their signal-to-noise ratio is the highest among all waves present in the signal.

Not imminently life-threatening cardiac arrhythmias
Broadly speaking, arrhythmias can be divided into two groups. The first group includes ventricular fibrillation and tachycardia, which are life-threatening disorders that require immediate therapy with a defibrillator. Successful detectors with high sensitivity and specificity have been developed for these arrhythmias. This study, however, analyzes only the second group, which includes arrhythmias that are not imminently life-threatening but may require therapy to prevent further problems. According to the AAMI standard (ANSI/AAMI EC57:1998/(R)2003), as described in De-Chazal et al. (2004), the following arrhythmia groups, shown in Table 1, are of interest: normal-labeled heartbeats (N), supraventricular ectopic beats (Sv), ventricular ectopic beats (V), fusion beats (F), and the unknown beat class (Q). One or more of these arrhythmia classes can be present during Holter analysis. The MIT/BIH arrhythmia database Moody & Mark (1982) is one of the most representative, at a scientific level, for evaluating the design of cardiac arrhythmia analysis algorithms. The database contains several types of beats within each group of arrhythmias recommended by the AAMI; for example, in the normal group we can find the following beat types: Left Bundle Branch Block (LBBB), Right Bundle Branch Block (RBBB), Atrial Escape (AE) and junctional (Nodal) Escape (NE). Table 1 shows the classification of the arrhythmias previously mentioned.

Group of arrhythmias N
This group corresponds to any beat that does not belong to the Sv, V, F or Q classes (Table 1), as shown in Figure 1. Bundle Branch Block (BBB) is a disorder in the conduction of electrical impulses to the ventricles Braunwald (1993). Electrical impulse conduction to the ventricles is carried out via the His bundle and its divisions: the right and left bundle branches. When one of these branches is altered, the electrical impulse spreads through the ventricular muscle itself rather than through the Purkinje system. This reduces the conduction velocity: when there is a blockage in one of the branches, the complex takes more time than normal Guyton & Hall (n.d.). Branch blocks also originate morphological changes (R-prime) within the QRS complex. In LBBB, cardiac depolarization spreads much faster in the right ventricle than in the left one; therefore, the left ventricle remains polarized longer than the right one. This is observed in the left precordial leads (V5 and V6) as a widening and a morphological change (RR') of the QRS. Conversely, in RBBB, impulse conduction through the right ventricle is delayed with respect to the left one; the QRS is thus prolonged and generates a morphology known as rsR', observed in the right precordial leads (V1 and V2). BBB does not necessarily indicate heart disease, since it can also occur in healthy patients; it may have a good prognosis and may not progress to a higher-degree block Micó & Ibor (2004). However, some studies Brugada et al. (1998); Ginsburg et al. (2006); Pabón (2001) have found that the presence of RBBB is correlated with arterial hypertension, heart failure, coronary disease, pulmonary embolism and increased mortality, and that the presence of LBBB increases the risk of coronary heart disease, mortality and ventricular myocardial infarction Balaguer (n.d.); Li et al. (n.d.). Thus, it is necessary to detect such arrhythmias because of their prognostic value.
AE beats are characterized by their occasional appearance, interrupting the pace of the base rate. The most common are those occurring ahead of that cadence (extrasystoles) and those that are delayed (escape heartbeats). Depending on the morphology of the waves, it is possible to determine the origin of the heartbeats (atrial, nodal or ventricular) and the type of existing AtrioVentricular (AV) conduction.

Group of arrhythmias type Sv
This group includes both atrial and supraventricular premature beats, as well as their variants. An example is illustrated in Figure 2. An Atrial Premature Beat (APB) is also called an Atrial Ectopic Beat (AEB) or Premature Atrial Contraction (PAC). It is an extra heartbeat caused by electrical activation of the atrium from an abnormal site before a normal heartbeat happens. Generally, APBs occur in healthy people, who rarely have symptoms. They are common among people who have lung problems, especially adults rather than young people. Recent studies on risk factors for stroke have shown that frequent APBs are an independent risk factor for suffering a stroke Rodríguez-Sotelo et al. Although APBs are often considered a benign disorder, it has been shown in clinical practice that frequent APBs could be an early symptom of heart failure and may precede atrial fibrillation. Frequent APBs can also be an indicator of other risk factors, such as severe hypertension, asymptomatic atherosclerosis, structural abnormalities causing stroke, a calcified mitral valve or enlargement of the left atrium. These risk factors might increase the formation of thromboembolism Engström et al. (2000). Experts usually analyse Holter recordings to detect APB beats because of their frequency, and they have found that detecting them is troublesome owing to their nature: they show morphological characteristics similar to normal heartbeats, which account for the majority of beats. In particular, ventricular depolarization and repolarization display similar morphology between QRS complexes and T waves. Atrial depolarization has also been used for identifying such beats, by analysing PR intervals and P waves. Nevertheless, there may be beats without a visible P wave, since the P wave can overlap with the previous T wave, resulting in a slight increase of its amplitude. Heart rate variability (HRV) is another, more effective technique used to detect APB heartbeats.
From a physiological point of view, before ventricular repolarization is complete, a premature excitation arises in an atrial area different from the sinus node. This results in a premature beat. There will also be a delay in the activation of the sinus node for the next cardiac cycle, triggering first an increase and then a decrease of the heart rate. The drawback of HRV is that if there are continuous premature beats, the pattern just described disappears; in some cases this is interpreted as a normal pace, reducing the chances of successfully detecting APB beats with this technique.

Group of arrhythmias type V
A ventricular premature beat (ventricular ectopic beat, premature ventricular contraction) is an extra heartbeat resulting from abnormal electrical activation originating in the ventricles before a normal heartbeat occurs. See Figure 3. The main symptom is the perception of a skipped heartbeat. The ECG is used to diagnose this condition. In some cases, avoiding stress, caffeine and alcohol may be enough to treat it. Ventricular premature beats are more common in adults. This arrhythmia may also be caused by physical or emotional stress, intake of caffeine (in beverages and foods) or alcohol, or use of cold or fever remedies containing drugs that stimulate the heart, such as pseudoephedrine. Other causes include coronary artery disease (especially during or shortly after a heart attack) and disorders that enlarge the ventricles, such as heart failure and heart valve disorders. VE beats are hardly found in 12-lead ECGs; therefore, Holter recordings are used for their detection Holter (n.d.). VEs can be identified following certain criteria based on morphological features of the ECG Dave et al. (2005); Friedman (1989):
• QRS duration: it is longer than the average dominant QRS, owing to an abnormal activation of the ventricle.
• Different morphologies of the QRS complexes are present: premature beats are not preceded by P waves, and the T wave is often found in the opposite direction to the R wave. If the heartbeats originate from a single focus, all the VPCs have the same morphology, although different from the normal one.
• RR intervals: they are shorter than the RR average, and afterwards a complete compensatory pause can be observed in the heartbeat sequence.
• VEs originating in the left ventricle normally produce RBBB-like heartbeat patterns, and those originating in the right ventricle normally produce patterns associated with LBBB.
A ventricular escape beat is another type of ventricular extrasystole. It is a self-generated electrical discharge initiated by the ventricles that causes their contraction. Normally, the heart rhythm begins in the atria and is subsequently transmitted to the ventricles. The ventricular escape beat follows a long pause in the ventricular rhythm and prevents a possible cardiac arrest. It indicates a failure of the electrical conduction system of the heart to stimulate the ventricles (which would lead to the absence of heartbeats unless ventricular escape beats occurred). Ventricular escape beats happen when the ventricles generate their own electrical discharge, which in turn alters the base rate. An escape beat usually occurs around 2-3 s after an electrical impulse has failed to reach the ventricles.

Group of arrhythmias type F
Fusion heartbeats develop when either the atria or the ventricles are activated by two simultaneously invading impulses, and they can be observed in the P wave or QRS complex of the ECG. An atrial fusion beat results when the sinus beat coincides with an atrial ectopic beat, when two atrial ectopic beats coincide, or when an atrial or sinus beat coincides with retrograde conduction from a junctional focus. A ventricular fusion beat results when a ventricular beat coincides with either a sinus beat, another ventricular ectopic beat, or a junctional beat. A couple of examples are shown in Figure 4.

Group of arrhythmias type Q
Unclassified heartbeats (Q heartbeats) correspond to heartbeats that do not contain relevant medical information, mainly due to external conditions such as artifacts, electrode disconnection, saturation of the acquisition system, or pacemaker-driven beats. In some systems, it is necessary to isolate this kind of heartbeat from the training space in order to give an adequate diagnosis. Normally, these heartbeats are considered outliers because of their low importance for the diagnosis. Figure 5 shows two types of Q heartbeats: a paced beat and an unclassified beat.

Ambulatory electrocardiography
During the last two decades, acquisition systems for physiological signals have been developed and improved: they have become lighter, smaller and capable of recording multiple signals for up to 48 hours. These systems, also called ambulatory recording systems, are used in ECG analysis to detect infrequent arrhythmias or transient abnormalities in heart function, often associated with everyday-life stress, as well as transient ischemic events or silent myocardial ischemia. These disorders cannot be detected in short-time or 12-lead ECG recordings; Holter recorders have been used to detect them. Nowadays, signals are recorded in flash-type semiconductor memories and can be transferred to a workstation for further analysis J. Segura-Juárez et al. (2004). On the other hand, the increase in health costs makes it urgent to develop ambulatory systems that reduce the number of patients going to hospitals. It is therefore necessary to design a portable, low-cost, high-performance and simple system that allows automated analysis and diagnosis. Such a system has to integrate various data analysis techniques, for instance: signal processing, pattern recognition, decision making and human-machine interaction. Existing portable devices have improved in size and performance, but the need to record the signal over a specific period of time is constrained by the storage capacity of the devices. For example, a typical 24-hour signal consists of approximately 100,000 heartbeats, which can be morphologically grouped (clustered) into a much smaller number of classes. For most classes, where the heartbeats have a typical pattern, it is enough to know the number of heartbeats and a representative template of the morphology; but in the time spans where cardiac activity presents anomalies or symptoms of illness, the whole recording is needed.
This is only possible if the portable analysis device is able both to record the signal and to process it. Some technical issues regarding ECG processing have been discussed, such as the wide variability in signal morphology, not only among patients but also due to patient movements, electrical conduction changes and body characteristics, among others. In addition, the ECG signal is contaminated by several noise sources, both external (power line interference, movement of the electrodes) and biological (muscle movement causing high-frequency interference, and breathing causing baseline drift). Because of this, it is not possible to have a general training set that takes into account all cases of interest. This kind of analysis therefore requires special care in choosing appropriate techniques for signal conditioning (preprocessing), since the quality of the input signal has a direct impact on the performance of the subsequent classification. Figure 6 depicts the proposed methodology for Holter arrhythmia analysis, which considers the following stages: a) preprocessing, b) feature extraction, c) analysis of relevance, and d) clustering. As input data, Holter recordings are initially preprocessed to reduce the influence of interferences and artifacts. Next, recordings are segmented based on the estimated fiducial points of the QRS complexes. Heartbeat features, extracted using variability, prematurity and morphology measurements as well as representations of the heart rate variability, are computed by weighted linear projection. After that, the projected data are grouped by a soft clustering algorithm. The restrictions for reducing the computational load lead to framing the input data along the time axis into an equal number (N_s in Figure 6) of successive divisions of the Holter recording, where each frame is processed separately.
Then, according to the assumed criterion of homogeneity between two given consecutive frame divisions, the resulting clusters can be either merged or split. Finally, such clusters, which represent the different types of arrhythmia present in the recording, are analyzed by the specialists and serve as a supporting tool for the medical diagnosis.
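As a minimal illustration of the framing step, the sketch below splits the beat indices of a long recording into N_s successive, near-equal frames; the helper name is hypothetical, and the subsequent merge/split of clusters across frame boundaries is not modelled here.

```python
import numpy as np

def frame_indices(n_beats, n_s):
    """Split the beat indices of a recording into N_s successive,
    near-equal frames, each to be clustered separately."""
    return np.array_split(np.arange(n_beats), n_s)

# a 24-hour recording of ~100,000 beats split into 8 frames
frames = frame_indices(100000, 8)
print(len(frames), len(frames[0]))  # 8 12500
```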

Preprocessing and segmentation
The heartbeat set from recorded Holter ECG signals is to be processed. Let s(t) be the acquired signal, subject to the discrete-time transformation s = {s_k}, where s_k = s[kT_s], k ∈ N, and T_s is the sampling period. At the beginning, recordings are amplitude-normalized to prevent biasing, i.e., s_0 = (s − E{s})/|max{s}|, where E{·} stands for the expectation operator. Then, the unbiased vector s_0 is filtered to reduce signal disturbances and artifacts. Specifically, power line interference is reduced using an algorithm based on an adaptive sinusoidal interference canceller that provides a significant signal-to-noise ratio improvement Martens et al. (2006). The baseline wander is cancelled out by the method described in Roddy (1991), which is based on a two-pole, phase-compensated filter developed for real-time processing of long ECG segments. Although the signal is thereby also partially filtered, this preprocessing is assumed not to affect the separability among the underlying heartbeat groups. R-peak locations are estimated beforehand according to the procedure given in Laguna & Sörnmo (2005), since the analysis of the arrhythmias under consideration relies on changes of both the QRS complex and the HRV. The following sequential procedures are included: band-pass filtering, R-peak enhancement and adaptive thresholding. Furthermore, segmentation is carried out with a fixed window length to avoid analysis over QRS complexes of different length; that is, each j-th complex d_j is obtained as d_j = {s_0k}, ∀k ∈ [l_j − aF_s, l_j + bF_s], where l_j is the R-peak time location of the j-th heartbeat and F_s = 1/T_s is the sampling frequency. Nonetheless, it must be noted that some morphologies, such as VE, might exhibit S-waves lasting exceptionally longer than usual, which can therefore be truncated by such a short processing window. The QRS width is fixed at 200 ms, i.e., a = b = 0.1.
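The normalization and fixed-window segmentation above can be sketched as follows. This is a simplified numpy version that assumes the filtering and R-peak detection stages have already been performed; the function names and the toy signal are illustrative only.

```python
import numpy as np

def normalize(s):
    """Amplitude normalization s0 = (s - E{s}) / |max{s}| from the text."""
    return (s - np.mean(s)) / np.abs(np.max(s))

def segment_beats(s0, r_peaks, fs, a=0.1, b=0.1):
    """Cut a fixed window [l_j - a*Fs, l_j + b*Fs] around each R-peak l_j,
    giving equal-length QRS segments d_j (200 ms when a = b = 0.1)."""
    pre, post = int(a * fs), int(b * fs)
    return np.array([s0[l - pre:l + post] for l in r_peaks
                     if l - pre >= 0 and l + post <= len(s0)])

# toy signal at 360 Hz with two assumed R-peak locations
rng = np.random.default_rng(0)
fs = 360
s = rng.normal(size=3 * fs)
d = segment_beats(normalize(s), [fs, 2 * fs], fs)
print(d.shape)  # (2, 72): two beats of 0.2 s * 360 Hz = 72 samples
```

Beats whose window would run past either end of the recording are simply dropped here; a production implementation would pad or handle them explicitly.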

Feature extraction
Heartbeat characterization is achieved by taking into consideration the wide set of features previously proposed for arrhythmia analysis over Holter ECG recordings Cuesta et al. (2007); Cvetkovic et al. (2008); Lagerholm et al. (2000); Rodríguez-Sotelo et al. The whole set of studied features can be divided into the following groups, as shown in Table 2:

Prematurity and variability based features
When considering Sv-labelled arrhythmias, their morphology is highly similar to the normal heartbeat shape. Therefore, the following set of features, extracted from the variability of the cardiac rhythm, is mainly considered Rodríguez-Sotelo et al.:
• Difference between RR and pre-RR intervals.
• Difference between post-RR and RR intervals.
It should be noted that atrial (S) and ventricular (V) ectopic beats manifest abrupt changes in the fiducial point intervals, which in turn affect the respective values of the heartbeat interval features.
- Prematurity features (x4, x5, x6): the parameters x4 = x1 − x2 and x5 = x3 − x1 are assumed to be relevant, since they make possible the identification of S-type arrhythmias by reflecting the increase or decrease of the heart rate. Besides, if a heartbeat occurs after another S-labeled event, it is regarded as normal, and the above-mentioned features change sign. Feature x6 accounts for the number of consecutive S beats and is also sensitive to an increase of the heart rate exceeding the normal range set for x4. The parameter x6 is expressed in Eq. (1). The first and second squared terms in Eq. (1) are sensitive to abrupt changes of heart rate, whereas the last addend can be read as an unnormalized Shannon entropy, which increases the value of x6 whenever the heart rate is steadily increasing.
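Under the assumption that x1, x2 and x3 denote the current, preceding and following RR intervals (a labelling inferred from the interval differences listed above, not stated explicitly in the text), x4 and x5 can be sketched as:

```python
import numpy as np

def prematurity_features(r_peaks, fs):
    """Sketch of x4 = x1 - x2 and x5 = x3 - x1, assuming x1 = current RR,
    x2 = preceding RR and x3 = following RR (labelling inferred)."""
    rr = np.diff(np.asarray(r_peaks)) / fs   # RR intervals in seconds
    x1, x2, x3 = rr[1:-1], rr[:-2], rr[2:]   # current, preceding, following
    return x1 - x2, x3 - x1

# steady 1 s rhythm at 360 Hz with one premature (early) beat
peaks = [0, 360, 720, 980, 1440, 1800]
x4, x5 = prematurity_features(peaks, 360)
# the premature beat yields a negative x4 followed by a positive one
print(np.round(x4, 3))
```

The sign change around the premature beat is exactly the behaviour the text attributes to these features.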
Since most of the analyzed arrhythmias change the shape of the QRS complexes, their characterization can be achieved by commonly used time- and spectral-based techniques Cuesta et al. (2003). Regarding the former, the following features, sensitive to abnormal QRS complexes, are worth considering:
- x7: it computes a morphological dissimilarity, by means of the Dynamic Time Warping (DTW) approach, between the current QRS complex and the linearly averaged QRS complex of the last n heartbeats Cuesta et al. (2003).
- x8 = max{d_j}/min{d_j}, with d_j the current QRS complex. This feature is sensitive to ventricular arrhythmias that exhibit abnormal QRS complexes, such as ventricular extrasystoles or branch blocks Cuesta et al. (2007). Because of the noticeable morphological characteristics of branch block heartbeats, the QRS energy, a straightforward feature for detecting this type of heartbeat, is also estimated. On the other hand, spectral-based representation features used in the field of signal compression are also taken into account, since only a few coefficients are needed to reconstruct the signal Lagerholm et al. (2000). In this line of analysis, the Hermite coefficient h_i related to the i-th order basis function is computed as the inner product ⟨·,·⟩ between the beat and the basis function, where H_i is a Hermite polynomial of degree i and σ is a parameter determining the window length. Wavelet decomposition coefficients are also studied; specifically, 4-level coefficients of the Daubechies-2 class (dB2) are computed, which have been proved to describe properly different heartbeat morphologies, as discussed in Cvetkovic et al. (2008). The following statistical descriptors are extracted from the decomposition coefficients: the mean value, variance and maximum value. As a result, for a given i-th observed heartbeat, the respective feature vector {x_i ∈ R^p : i = 1, ..., n}, p = 100, is assumed to form the input training space for the arrhythmia classification stage.
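A rough sketch of two of the morphology features follows: x8 exactly as defined above, and a plain DTW cost standing in for the x7 dissimilarity (the cited works may use a refined DTW variant; the beat and template here are toy vectors, not real ECG data).

```python
import numpy as np

def dtw_distance(a, b):
    """Plain O(n*m) Dynamic Time Warping cost between two morphologies,
    used here as a stand-in for the x7 dissimilarity measure."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = abs(a[i - 1] - b[j - 1])
            D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def morphology_features(d_j, template):
    """x7: DTW dissimilarity to an averaged template; x8 = max(d_j)/min(d_j)."""
    return dtw_distance(d_j, template), np.max(d_j) / np.min(d_j)

beat = np.array([0.0, 0.2, 1.0, -0.5, 0.1])   # hypothetical QRS segment
tmpl = np.array([0.0, 0.1, 0.9, -0.4, 0.0])   # averaged template
x7, x8 = morphology_features(beat, tmpl)
print(round(x7, 2), round(x8, 2))
```

Note that x8 can be negative when the S-wave minimum is below the baseline, as in this toy beat; it is the magnitude of the ratio that separates abnormal complexes.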

Analysis of relevance
Since a huge amount of information is stored during Holter monitoring, the classification of heartbeats usually becomes quite time-consuming; that is why any automated processing of the recording would be of benefit, and in particular a dimensionality reduction procedure can be considered. In this sense, and based on the multivariate representation of the input data, a direct approach is the use of linear decomposition methods to decrease the dimensionality of the feature space resulting from heartbeat characterization. Among linear decomposition methods, PCA and its variations have shown to be a good alternative for this aim Perlibakas (2004). Moreover, its non-parametric nature, ease of implementation and versatility are some advantages of PCA. Nonetheless, Holter monitoring of cardiac arrhythmias is an application where conventional PCA might not be recommended, because it gives the same importance to all observations and is thus sensitive to the presence of outliers and noise in the data. In fact, the strong asymmetry among class observations requires a proper selection of heartbeat features to provide convenient separability among heartbeat types Cuesta et al. (2007); Sotelo, Frau, Ordónez, Domínguez & Novak (2009). To that end, a weighted version of PCA (termed WPCA) is used, where the introduced weights are given by variable-wise relevance criteria; this in turn makes it possible to assess the relative importance of each feature (variable) in the original data representation by means of a weighting factor. Two linear projection methods to estimate the feature-wise weighting factor, namely the Minimum Square Error (MSE) and the M-inner product, are described in the next paragraphs. Given a set of p-dimensional vectors {x_i}, assumed centered, i.e., E{x_i} = 0, ∀i, all n observations can be aligned in the input matrix X = [x_1 | ··· | x_n]^T ∈ R^{n×p}; the respective linear projection is then Y = XV, Y ∈ R^{n×p}.
Generally, the orthonormal projection is performed onto a q-dimensional space (q < p), with V ∈ R^{p×p} an orthogonal matrix, and the representation quality of X is measured by a given error function ε between the original data and the truncated orthonormal projection V ∈ R^{p×q}. This can be expressed as a distance measure, ε = d(X, X̂), where X̂ = YV^T, with X̂ ∈ R^{n×p} the reconstructed (truncated) input matrix. There are several alternatives for calculating this distance, such as the Minkowski distance (L_p metrics), the squared Euclidean distance, angle-based distances or the Mahalanobis distance, among others, as discussed in Perlibakas (2004). Commonly, relevance analysis methods aim to minimize ε. Denoting X̃ = XW as the weighted data matrix, we can estimate the set of its q most relevant eigenvalues. The weighted relevance (weighted covariance) matrix is introduced as follows Yue & Tomoyasu (2004):

Σ_X̃ = W^T Σ_X W,    (3)

where W ∈ R^{p×p} is a diagonal weighting matrix.
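A minimal numpy sketch of the weighted projection follows, assuming a diagonal W built from externally supplied relevance weights (the weight values below are hypothetical; how to estimate them is the subject of the next two subsections).

```python
import numpy as np

def wpca_project(X, w, q):
    """WPCA sketch: with W = diag(w), form the weighted data X~ = X W and
    project onto the q leading eigenvectors of the weighted covariance
    W' Sigma_X W (the quadratic form of the weighted covariance matrix)."""
    Xc = X - X.mean(axis=0)          # center, so E{x_i} = 0
    Xw = Xc * w                      # X~ = X W, column-wise weighting
    S = Xw.T @ Xw / (len(X) - 1)     # weighted covariance matrix
    vals, vecs = np.linalg.eigh(S)   # eigh returns ascending eigenvalues
    V = vecs[:, np.argsort(vals)[::-1][:q]]
    return Xw @ V                    # q-dimensional projection

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
w = np.array([1.0, 1.0, 0.5, 0.5, 0.1])   # hypothetical relevance weights
Y = wpca_project(X, w, q=2)
print(Y.shape)  # (200, 2)
```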

MSE-based approach
The main goal of conventional PCA is to find the optimal fit for given data in the least-squares sense. This technique is considered the simplest eigenvector-based multivariate analysis, in which the linear decomposition of the matrix X is computed by singular value decomposition, X = UΣV^T, where Σ is the singular value matrix, U ∈ R^{n×n} holds the eigenvectors of XX^T, and V holds the eigenvectors of Σ_X when W = diag(1_p), with 1_p a p-dimensional all-ones vector. The minimum square error (MSE) distance is then used to assess the representation quality, which yields the following minimization problem. Let x^(l) ∈ R^{n×1}, l = 1, ..., p, be the l-th feature of the input matrix X, approximated by its truncated version in a q-dimensional orthonormal space through the linear combination of Eq. (5); the MSE between the original and the reconstructed features is then estimated as in Eq. (6), which can be minimized by maximizing its complement, leading to Eq. (7). The coefficients c_i^(l) of the linear combination in Eq. (5) are given by the i-th elements of the columns of the matrix V for the respective l-th feature. Replacing c_i^(l), expression (7) can be rewritten in terms of λ = [λ_1, ..., λ_p], the vector of eigenvalues of Σ_X. As a result, generalizing the last expression to the p features, and taking the simple average as an estimate of the expectation operator, the relevance measure is

ρ = Σ_{i=1}^{q} λ_i ν_i,

where ν_i is the vector composed of the squares of the elements of v_i. It should be remarked that the vector ρ yields a relevance index measuring the accumulated variance of the eigenvalues and eigenvectors, and this vector is used as a weighting factor. Then, according to the quadratic form of the generalized covariance matrix (see Eq. (3)), the weighting matrix can be obtained as W = diag(√ρ).
Lastly, the commonly used explained-variance criterion is applied to find q, rejecting the components that do not significantly contribute to the accumulated variance of the data set. In addition, since the first principal component holds most of the explained variance, the particular case q = 1 is also analyzed throughout this section.
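The relevance index just derived can be sketched directly from the eigendecomposition of the sample covariance: ρ is the eigenvalue-weighted sum of squared eigenvector entries, and W = diag(√ρ). The toy data below are synthetic; the variance scaling is only there to make one feature obviously dominant.

```python
import numpy as np

def mse_relevance(X, q):
    """Relevance index rho = sum_{i<=q} lambda_i * (v_i ** 2): the variance
    each original feature accumulates through the q leading eigenvectors.
    Returns rho and the diagonal weights sqrt(rho) for W = diag(sqrt(rho))."""
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / (len(X) - 1)          # sample covariance Sigma_X
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1][:q]    # q largest eigenvalues
    lam, V = vals[order], vecs[:, order]
    rho = (V ** 2) @ lam                  # one relevance value per feature
    return rho, np.sqrt(rho)

rng = np.random.default_rng(2)
Z = rng.normal(size=(300, 4)) * np.array([3.0, 1.0, 1.0, 0.1])
rho, w = mse_relevance(Z, q=2)
print(int(np.argmax(rho)))  # 0: the high-variance feature is most relevant
```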

M-inner product approach
This case considers the M-inner product as an error measure between the original variable and its orthonormal projection. Let U_p ∈ R^{p×p} be an arbitrary orthonormal matrix, and x̂^(l) = u_p^(l)T X the linear combination estimating the l-th feature. Then, the error measure for each feature is given by Eq. (10), where ⟨·,·⟩_A is the M-inner product regarding a symmetric positive definite matrix A ∈ R^{n×n}, which is related to the inner product between observations, i.e., A = ∑_{i=1}^{p} x_i x_i^T. If the definition of the l-th estimated feature x̂^(l), given by Eq. (5), is replaced in Eq. (10), an expression holds that can be minimized by maximizing its complement, i.e., x^(l)T A x^(l). Thus, replacing A and x̂^(l) as given in Eq. (5), and generalizing for all variables, Eq. (12) yields the sought value, where λ are the eigenvalues of XX^T. Furthermore, since the eigenvalues of X^T X are the first p eigenvalues of XX^T, maximizing Eq. (12) is equivalent to maximizing the corresponding expression in terms of X^T X. Next, a weighted relevance matrix is established as A_α = XWWX^T, where W = diag(√α) and α ∈ R^{p×1} is a weighting vector. Assuming the orthonormal invariance criterion of Yu & Shi (2003), the optimization problem can be rewritten with a matrix Q ∈ R^{n×n}, an arbitrary orthonormal matrix that will be explained in detail further on. Besides, the weighting vector is adjusted to √α so that the optimization problem in hand becomes bilinear regarding α; thus, X̃ = X diag(√α). The weighting vector α and the orthonormal matrix Q are determined at the maximal point of the optimization problem. Finally, the objective function can be rewritten as the quadratic form of Eq. (15), α^T G α, built from the features m_i, i = 1,...,p, holding M = X^T. As a consequence, this becomes the objective function used in the unsupervised Q−α algorithm, as described in Wolf & Shashua (2005). Since matrix G is obtained from an arbitrary orthonormal transformation, an iterative method is necessary to tune the matrix Q and the weighting vector α. From the optimization problem described by Eq. (15), it can be observed that vector α points in the direction of the most relevant features, whereas matrix Q accounts for their rotation. Therefore, the adjustment of these parameters is mutually dependent and must be achieved in an alternating way, as shown in Algorithm 1, whose main steps include computing α as the eigenvector associated with the largest eigenvalue of G, computing an auxiliary matrix, and iterating (step 7: make r ← r + 1 and return to step 2). In steps 5 and 6, an auxiliary orthonormal projection of C_α and a QR decomposition, respectively, refine matrix Q at each iteration. Then, the q most relevant features are those elements of M satisfying ∑_{i=1}^{q} α_i² ≈ σ_e/100, for a given percentage fraction σ_e of explained variance. Convergence of Algorithm 1 is discussed in detail in Wolf & Shashua (2005) (with r < 5 iterations). However, an indicator of convergence can be the change of vector α, i.e., the difference between the current and preceding vectors, χ[r] ≤ δ, where δ ≥ 0 stands for the required accuracy and χ[r] is the value of χ achieved at the r-th iteration.
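The alternating scheme of Algorithm 1 can be sketched as follows, closely following Wolf & Shashua (2005); the exact construction of G (as the Hadamard product of the Gram matrices with and without the Q-projection) and the stopping rule based on χ[r] = ‖α[r] − α[r−1]‖ are assumptions of this sketch, not the authors' verbatim implementation:

```python
import numpy as np

def q_alpha(X, q, max_iter=5, delta=1e-6, seed=0):
    """Alternating Q-alpha sketch: X is n x p, columns m_i are features."""
    n, p = X.shape
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n, q)))  # arbitrary orthonormal Q
    alpha = np.full(p, 1.0 / np.sqrt(p))
    for _ in range(max_iter):
        K = X.T @ X                                   # K_ij = m_i^T m_j
        P = X.T @ Q @ Q.T @ X                         # P_ij = m_i^T Q Q^T m_j
        G = K * P                                     # Hadamard product (symmetric)
        _, V = np.linalg.eigh(G)
        new_alpha = np.abs(V[:, -1])                  # leading eigenvector of G
        new_alpha /= np.linalg.norm(new_alpha)
        A = (X * new_alpha) @ X.T                     # A_alpha = sum_i alpha_i m_i m_i^T
        Q, _ = np.linalg.qr(A @ Q)                    # refine Q via QR step
        chi = np.linalg.norm(new_alpha - alpha)       # convergence indicator chi[r]
        alpha = new_alpha
        if chi <= delta:
            break
    return alpha, Q
```

In practice r < 5 iterations suffice, in agreement with the convergence discussion above.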
The procedure for computing the weighting vector α is refined iteratively over the whole data set, with the orthonormal matrix updated per iteration to obtain the subset of relevant features. As a result, the computational load may increase. Nonetheless, based on the variance criterion, it can be inferred that the first q components of x̂^(l) hold the most informative directions of the weighted data; thus, the directions l (q + 1 ≤ l ≤ p) do not contribute significantly to the explained variance. The computation time of vector α can be diminished to just one iteration with no significant decrease of accuracy (Wolf & Shashua, 2005). With this in mind, feature relevance may be preserved by optimizing either the p original variables or the first q variables. Indeed, maximizing tr(Q^T A_α A_α Q) is equivalent to maximizing tr(A_α A_α) = tr(X diag(α) X^T X diag(α) X^T). Since this expression is bilinear regarding α, the objective function can be rewritten as α^T H α, with H = (X^T X)^{·2}. Accordingly, it can be inferred that the approximate relevance vector α is the eigenvector corresponding to the largest eigenvalue of (X^T X)^{·2}, where the notation (χ)^{·2} stands for the element-wise square of matrix χ.

Algorithm 2 Projection of weighted data
1. (Initialization): Normalize X
2. Choose a method to find the weighting vector w:
   (a) w ← √α, the eigenvector corresponding to the largest eigenvalue of G (Algorithm 1, r ← last iteration)
   (b) w ← √α, the eigenvector corresponding to the largest eigenvalue of (X^T X)^{·2}
3. Weight the original data: X̃ = X diag(w)
4. Compute the principal components Ṽ of X̃
5. Project the data: Y = X̃Ṽ

Projection of weighted data
As described above, the data are weighted by the diagonal matrix W = diag(w), where w is the weighting vector calculated using either the MSE-based or the M-inner-product-based approach previously explained. The weighted data X̃ = XW are therefore linearly projected as Y = X̃Ṽ, where Ṽ holds the principal components of X̃; note that Ṽ = V if W = diag(1_p). The whole procedure for relevance analysis and rotation of weighted data based on the described methods is summarized in Algorithm 2.
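A minimal sketch of Algorithm 2 in Python, under the assumption that the principal components of X̃ are obtained from the SVD of the centered weighted matrix; function names and the choice of option (b) for w are illustrative only:

```python
import numpy as np

def weight_option_b(X):
    """Option (b): w = sqrt(alpha), alpha the leading eigenvector of (X^T X)^.2."""
    H = (X.T @ X) ** 2                    # element-wise square of the Gram matrix
    _, V = np.linalg.eigh(H)
    return np.sqrt(np.abs(V[:, -1]))      # fix sign ambiguity of the eigenvector

def project_weighted_data(X, w, q):
    """Steps 3-5 of Algorithm 2: weight, compute PCs of the weighted data, project."""
    Xw = X * w                            # step 3: X~ = X diag(w)
    Xc = Xw - Xw.mean(axis=0)             # center before PCA
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:q].T                          # step 4: q principal components of X~
    return Xw @ V                         # step 5: Y = X~ V
```

Option (a) would replace `weight_option_b` with the last-iteration α of the Q−α procedure.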

Clustering of cardiac arrhythmias
The projected weighted data Y are clustered in three stages. Firstly, the number of groups is estimated by means of spectral analysis of an affinity measure, as described in Ng et al. (2001). Secondly, center initialization is achieved based on the J-H-means clustering algorithm (Hansen & Mladenovic, 2001), whose objective function corresponds to the Minimum Sum-of-Squares (MSS), as suggested in Rodríguez-Sotelo et al. (2009). To carry out this process, a random initialization is performed in which every point y_i lying outside a sphere of radius ǫ centered at ζ_j, j = 1,...,k (ζ_j being the j-th of the k centers), is considered a center candidate; such a y_i replaces a current center ζ_j. After updating the centers and computing the MSS, the process is repeated as long as the MSS improves, and stops whenever no further MSS optimization is possible. This initialization aims to avoid local minima. Lastly, the third clustering stage computes the final partition based on the Gaussian Expectation Maximization Clustering (GEMC) algorithm (Cvetkovic et al., 2008), whose objective function is a linear combination of Gaussian distributions centered at each centroid (Eq. (16)), where p(y_i | ζ_j), assumed as a Gaussian distribution centered at ζ_j, is the probability of y_i, and p(ζ_j) is the a priori probability of the j-th cluster. For the sake of simplicity, the log function is used, while the minus sign accounts for minimization. Besides, GEMC employs a soft membership function, f_m(·), assigning a membership level of y_i to every cluster, as described in Hamerly & Elkan (2002). A further decrease of the computational load can be reached by sectioning the whole input recording into divisions for localized processing. At the beginning, a proper length of the frame divisions to be clustered is estimated, under the assumption that the validity measures provide a performance equivalent to full-length processing of the input recordings.
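The GEMC objective described above can be sketched as the negative log-likelihood of a Gaussian mixture evaluated at the current centroids; the full-covariance parameterization is an assumption made for the sketch:

```python
import numpy as np

def gaussian_pdf(Y, mean, cov):
    """Multivariate Gaussian density evaluated at each row of Y."""
    d = Y.shape[1]
    diff = Y - mean
    inv = np.linalg.inv(cov)
    norm = 1.0 / np.sqrt(((2 * np.pi) ** d) * np.linalg.det(cov))
    expo = -0.5 * np.einsum('ij,jk,ik->i', diff, inv, diff)
    return norm * np.exp(expo)

def gemc_objective(Y, centroids, covariances, priors):
    """-sum_i log sum_j p(zeta_j) * p(y_i | zeta_j), as in Eq. (16)."""
    mix = np.zeros(len(Y))
    for zeta, S, pj in zip(centroids, covariances, priors):
        mix += pj * gaussian_pdf(Y, zeta, S)
    return -np.sum(np.log(mix))
```

The EM iterations of GEMC decrease this value; the soft membership f_m corresponds to the posterior responsibilities of each mixture component.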
The selection of a proper number of localized clustering segments is constrained by the following restrictions: twice the number of features must exceed the amount of heartbeats per segment, and the minimum computational cost should be reached. At the end of the grouping step, the combination of clustered segments is carried out based on the estimated proximities between each chosen cluster and the remaining ones. In this regard, the DTW algorithm, noted as dtw(·,·), is used as a dissimilarity measure among heartbeats related to the set of centroids of a given cluster, as detailed in Cuesta et al. (2007). Thus, considering P_i = {J_1^i,...,J_{k_i}^i} as the partition estimated for the i-th segment, where J_j^i is the j-th cluster associated with the i-th segment and k_i is the number of assumed groups for that partition, {ζ_1^i,...,ζ_{k_i}^i} are the centroids of the i-th partition, and Y(ζ) stands for the projected data related to centroid ζ, the combination of clusters follows the next rule: if the estimated measure υ(j_i, j_{i−1}) lies within the assumed proximity interval, both chosen clusters are combined; otherwise, the following cluster comparison is carried out. Nonetheless, if any cluster does not fulfill the proximity measure during the current i-th iteration, it is not discarded but considered later during the next iterations. Therefore, incorrect clustering of minority classes is avoided while the computational load is decreased.
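The dissimilarity-based merging can be illustrated with a textbook dynamic-programming DTW on 1-D centroid waveforms; the proximity interval [lo, hi] is a hypothetical parameterization of the rule above:

```python
import numpy as np

def dtw(a, b):
    """Classic DP formulation of the DTW distance between two 1-D sequences
    (e.g., heartbeat centroid waveforms); a simplified stand-in for dtw(.,.)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def should_merge(centroid_a, centroid_b, lo, hi):
    """Merge rule sketch: combine two clusters when the centroid dissimilarity
    lies within the assumed proximity interval [lo, hi]."""
    return lo <= dtw(centroid_a, centroid_b) <= hi
```

Clusters failing the test are kept for comparison in later iterations, as the rule above prescribes.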

Performance measures
The clustering index, as a validity measure, is expressed as the relationship between the expected value of the GEMC objective function, given in Eq. (16), assessed for an ideal partition (θ_2), and the current value estimated for the final partition (θ_1), i.e., θ_1/θ_2, as introduced in Sotelo, Peluffo, Frau, Ordónez & Domínguez (2009). Since θ_2 ≥ θ_1, one might infer that the index indicates a proper clustering when its value lies close to 1. It must be noted that the measure proposed above is not sensitive to the assumed number of clusters.
Another cluster validity measure that can be considered is the clustering quality developed for spectral graph partitioning (Yu & Shi, 2003), where a proper clustering means that tight connections are reached within partitions and loose connections between partitions. Thus, the cluster coherence is computed from B = [B_1,...,B_k], B ∈ R^{n×k}, a binary matrix comprised of the membership values of all elements to each cluster: b_ij = ⌊j = arg max_{j'} f_m(y_i | ζ_{j'})⌋, j = 1,...,k, where ⌊·⌋ is 1 if its argument is true and 0 otherwise; A is the affinity matrix, and D ∈ R^{n×n} is the degree matrix of A. Due to the normalization with respect to the affinity matrix, the maximum value of the coherence ǫ_B is 1, so a good clustering is evidenced when its value is close to 1. Clustering is penalized when there is a large set of groups. In addition, supervised measures are computed to allow contrast with other related references, taking advantage of the recording labels, as described next. Particularly, each assembled cluster can be split into two classes: one holding the majority of heartbeats regarding the class of interest (MC), and another holding the minority of beats belonging to different classes (OC). The following quantitative measures are defined: true positives (TP), MC heartbeats classified correctly; true negatives (TN), OC heartbeats classified correctly; false positives (FP), OC heartbeats classified as MC; and false negatives (FN), MC heartbeats classified as OC. From these counts, the values of sensitivity (Se), specificity (Sp) and clustering performance (CP) are estimated. Since there is no ideal partition, more clusters than classes are expected.
Therefore, the partition might be penalized when holding a relatively large number of clusters, for instance, by means of a factor such as e^{ηk_r/k_a}, where k_r is the number of groups resulting from the clustering, k_a is the admissible number of groups, and η, 0 < η ≤ 1, is an adjusting value. In this way, the measure φ, which can be Se, Sp or CP, is weighted accordingly.

242 Advances in Electrocardiograms - Methods and Analysis www.intechopen.com

Fig. 7. First 3 principal components after weighting the data matrix for recording 217 with different w (Algorithm 2).
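The supervised measures and the cluster-count penalization can be sketched as follows; the definition of CP as overall accuracy and the division by e^{ηk_r/k_a} are assumptions of this sketch, since the chapter does not reproduce the exact expressions here:

```python
import math

def supervised_measures(tp, tn, fp, fn):
    """Sensitivity, specificity, and a hypothetical clustering-performance
    measure taken as overall accuracy."""
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    cp = (tp + tn) / (tp + tn + fp + fn)
    return se, sp, cp

def penalized(phi, k_r, k_a, eta=1.0):
    """Hypothetical weighting of phi (Se, Sp or CP) by the factor
    exp(eta * k_r / k_a), penalizing a large number of resulting clusters."""
    return phi / math.exp(eta * k_r / k_a)
```

For example, a partition producing twice the admissible number of groups (k_r = 2 k_a) sees its score shrink by a factor of e^{2η}.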

Relevant experiments and discussion
Training is carried out over a set of ECG recordings from the MIT-BIH database, which holds different types of arrhythmia, as previously discussed. The analysis is carried out over the whole MIT/BIH arrhythmia database, which holds 48 recordings, each about 30 minutes long. It is important to note that the recordings are analyzed one by one, and some recordings exhibit a strongly unbalanced number of observations per class. Namely, some recordings hold just one or two heartbeats of class F and a few of class S (fewer than 10), whereas the number of normal heartbeats may be very large (more than 3000). Figure 7 shows an example of the relevance analysis stage using the proposed scheme, taking into account the last 5 minutes of the recording numbered 217. It can be observed that there is little separation of the first 3 principal components. The remaining subfigures show the transformed data using the studied methods, where a better separability can be noted when using w = √ρ and w = √α. Particularly, in the case of ρ, the eigenvectors ignored when computing the relevance (see Eq. (9)) generate a homogeneous weighting of the analyzed feature set, resulting in a lower selectivity, i.e., w = √ρ having a separability similar to w = 1_p (i.e., conventional PCA).

Analysis of relevance results
The variable weighting obtained with the analyzed methods is shown in Fig. 8, which exhibits a similarity between w = √α and w = √ρ. The weighting obtained from the iterative Q−α algorithm stands out, mainly due to the quadratic nature of the objective function to be maximized, which employs the M-inner product as distance measure. Although it is not possible to generalize the results to all recordings because of the ECG signal variability, this behavior is observed in most cases. Figure 9 shows the dynamics of the calculated relevance of the variables according to the morphology type of each recording. Three segments of recording 207 are analyzed. The first segment corresponds to the first 5 minutes of the recording, which contains beats of types L, R and V. The second one corresponds to the period between minutes 20 and 25, which only has beats of types L and V. The last one contains beats of types A and E, corresponding to the last 5 minutes of the recording. It can be seen that in the first segment the relevant variables correspond to the WT features (table 2), while in the second one the Hermite coefficients have higher weights, since these coefficients appropriately characterize the morphology of type L and V beats. Finally, in the last analyzed segment, the weight of each of the first 3 variables (HRV features) is increased. Accordingly, it can be concluded that segment analysis allows a local analysis of relevance and achieves good performance after the final partition, as discussed in the following section.

 w    θ1/θ2         ǫB            k_end        q            Time [s]
 α    0.95 − 0.08   0.86 − 0.12   9 − 2.43     6 − 0.86     37.20 − 7.65
 ρ    0.94 − 0.10   0.82 − 0.14   11 − 2.67    10 − 1.78    38.52 − 7.52
 ρ    0.94 − 0.10   0.83 − 0.14   11 − 2.66    9 − 1.77     37.47 − 7.66
 α    0.95 − 0.08   0.84 − 0.12   9 − 2.44     6 − 1.16     36.20 − 7.54

Table 3. Clustering performance (cells report μ − σ²) using non-supervised indices (θ1/θ2, ǫB), number of resultant groups (k_end), number of relevant features (q) and processing time [s].
It is worth highlighting that the variability features (HRV, table 2) are essential to discriminate between normal heartbeats and supraventricular ectopic beats, both having similar morphology. The most important morphological features are those based on the WT, which discriminate between type V and Q heartbeats.

Computing clustering performance
Clustering is accomplished by framing each recording into 6 divisions, and the resulting clusters are merged as described in section 4.4. The number of segments is determined experimentally, balancing the trade-off among the number of segments, the computational cost and the quality of the partition. Thus, the segment analysis enhances performance compared to clustering the whole data; in fact, it reduces the probability that a minority-class heartbeat is clustered wrongly. Furthermore, in most cases, the sum of the processing times over all segments turns out to be considerably shorter than the time needed to analyze the whole recording data in one iteration. As a result, the introduced framing approach significantly reduces the computational cost. Table 3 depicts the whole system performance using the non-supervised indices discussed in section 4.4; parameters regarding computational cost and the number of resulting groups are displayed as well. The first column refers to the index θ1/θ2, the second one to ǫB, the third one corresponds to the resulting clusters (k_end) after processing all segments of each recording, the fourth one is the number of selected features before projecting the data, and the last one is the time elapsed from the filtering to the clustering stages. The rows show the methods considered for weighting the variables. The quality of the partition is evaluated by the index θ1/θ2 introduced in Sotelo, Peluffo, Frau, Ordónez & Domínguez (2009), where a maximum index value of θ1/θ2 = 0.98 was achieved over only 14 recordings of the entire MIT/BIH database. In this chapter, the maximum index achieved is close to 0.96, over the whole recording set when providing framed division analysis, which points to a better generalization ability.
If the number of groups k increases, the index θ1/θ2 tends to 1. Nonetheless, a very high value of k leads to a more difficult evaluation by specialists. Specifically, in Lagerholm et al. (2000), the total number of groups representing the entire MIT/BIH database is assumed to be 25, while in Rodríguez-Sotelo et al. (2009), when considering only some recordings from the MIT/BIH database, the average number of groups diminishes to 15. In this chapter, for the entire MIT/BIH database, the value of k ranges within 9 ≤ k ≤ 11, still showing a better performance. As discussed above, the non-supervised measure ǫB penalizes the number of groups after the final partition. In fact, if the clustering procedure is carried out for a value less than or equal to the proper value of k, then ǫB tends to 1; otherwise, this value moves away from 1 as k increases, due to the upper-bound monotonicity theorem of Yu & Shi (2003). Concretely, a value of k less than 6 can be taken as an admissible number of groups, because 5 classes are considered in the present arrhythmia analysis. Table 3 shows that the achieved values for ǫB range within the interval 0.82 ≤ ǫB ≤ 0.86, which can be considered a realistic outcome to quantify the resultant partition. Still, this measure does not reach the value 1, as it would if grouping were absolutely correct, because of the penalization regarding the number of groups and the sensitivity to the affinity matrix. Besides, the average number of relevant features q before the WPCA projection (see section 4.3) is also shown; most of them contribute to the clustering process. As seen, q ranges within 6 ≤ q ≤ 10, showing admissible values with respect to the total number of features. Additionally, the average computing time needed to process a whole recording is 37 s, a reasonable time for each considered recording.
Table 4 shows the arithmetic mean (μ) and variance (σ²) of the supervised measures discussed in section 4.4, over the entire database for each arrhythmia group, using the proposed weighting methods (see Algorithm 2). In Rodríguez-Sotelo et al. (2009), the clustering performance is evaluated considering only three arrhythmia groups, namely, N, S and V (see table 1). Nonetheless, the performance measures there are calculated for pairs of arrhythmias and, as a consequence, their values tend to increase. The reported results are Se_S = 93.3% and Sp_S = 99.5%, which can be compared with the second column of table 4, where the maximum values are Se_S = 91.3% and Sp_S = 99.5%. Although the sensitivity is lower than that reported in Rodríguez-Sotelo et al. (2009), it should be noted that in this work all recordings from the database are analyzed, taking the S-type arrhythmias as the class of interest and the remaining groups as another class. Thus, the proposed method provides more robustness when considering other types of arrhythmias. For some of the considered arrhythmia groups, the performance results in table 4 are remarkable, e.g., Se_N = 99.25% and Se_V = 96.5%. In other cases, e.g., the F group, the performance values decrease (Se_F = 70.7%) due to the low number of representative heartbeats of some classes. Still, one can infer that the best performance is achieved when the data are weighted using α. Finally, Figure 10 presents an example of clustered heartbeats using recording 217 of the MIT/BIH arrhythmia database, which contains heartbeats of types N, V, f and P.

Conclusions
The proposed methodology for unsupervised Holter monitoring of cardiac arrhythmias, based on variable-wise relevance analysis, leads to an improvement in the clustering of the heartbeat types recommended by the AAMI. Because of the strong asymmetry among class observations, the heartbeat-derived features should be properly selected through their weighted projection. This makes it possible to assess the relative importance of each feature immersed in the original data representation. In addition, to reduce the computational load, the proposed methodology is carried out by successive division analysis along time, where each recording is processed separately, thus significantly reducing the processing time. It must be noted that the relevance analysis provides enough generalization capability, mainly because the most informative features are weighted and projected. In general, in this work, the M-inner-product-based approach showed better performance than the MSE-based approach and, although its iterative nature leads to a higher computational cost, the segment analysis compensates for this effect, making its implementation feasible for real-time applications. Besides, the assumed grouping procedure, which includes initial parameter estimation (the number of groups and the center initialization) based on spectral techniques and soft partitional clustering, generates a proper final partition. The methodology provides a useful tool to analyze cardiac arrhythmias with suitable quality, since it is based on non-supervised training; that is, there is no need for labelling of recordings, which is mostly unfeasible for Holter monitoring. Testing the considered methodology with the introduced cluster validity measures shows a performance comparable to other referenced works based on either supervised or unsupervised training and carried out on the MIT/BIH database.
As future work, additional spectral clustering stages should be explored, with the possibility of unifying the feature selection and clustering stages, in order to improve the accuracy and reduce the computational load of the system.