Open access peer-reviewed chapter

# Application of Principal Component Analysis to Image Compression

By Wilmar Hernandez and Alfredo Mendez

Submitted: November 18th 2017; Reviewed: February 6th 2018; Published: March 16th 2018

DOI: 10.5772/intechopen.75007

## Abstract

In this chapter, an introduction to the basics of principal component analysis (PCA) is given, aimed at presenting PCA applications to image compression. Here, the concepts of linear algebra used in PCA are introduced, and the theoretical foundations of PCA are explained in connection with those concepts. Next, an image is compressed by using different numbers of principal components, and concepts such as image dimension reduction and image reconstruction quality are explained. Also, using the almost periodicity of the first principal component, a comparative analysis of the quality of an image compressed with two and with eight principal components is carried out. Finally, a novel construction of principal components based on their periodicity is included, which reduces the computational cost of their calculation at the price of some accuracy.

### Keywords

• principal component analysis
• population principal components
• sample principal components
• image compression
• image dimension reduction
• image reconstruction quality

## 1. Introduction

Principal component analysis, also known as the Hotelling transform or the Karhunen-Loeve transform, is a statistical technique that was proposed by Karl Pearson (1901) as part of factor analysis; however, its first theoretical development appeared in 1933 in a paper written by Hotelling [1, 2, 3, 4, 5, 6, 7, 8]. The complexity of the calculations involved in this technique delayed its development until the birth of computers, and its effective use started in the second half of the twentieth century. Because methods based on principal components developed relatively recently, they are still little used by many non-statistician researchers. The purposes of these notes are to explain the nature of principal component analysis and to show some of its possible applications.

Principal component analysis refers to the explanation of the structure of variances and covariances through a few linear combinations of the original variables, without losing a significant part of the original information. In other words, it is about finding a new set of orthogonal axes in which the variance of the data is maximum. Its objectives are to reduce the dimensionality of the problem and, once the transformation has been carried out, to facilitate its interpretation.

When p variables are collected on the units analyzed, all of them are required to reproduce the total variability of the system, but often most of this variability can be found in a small number, k, of principal components. The technique exploits the redundancy that frequently exists between different variables: redundancy is data, not information. The k principal components can then replace the p initial variables, so that the original data set, consisting of n measures of p variables, is reduced to n measures of k principal components.

The objective pursued by the analysis of principal components is the representation of the numerical measurements of several variables in a space of few dimensions, where our senses can perceive relationships that would otherwise remain hidden in higher dimensions. The abovementioned representation must be such that, when discarding higher dimensions, the loss of information is minimal. A simile could illustrate the idea: imagine a large rectangular plate that is a three-dimensional object, but that for practical purposes, we consider it as a flat two-dimensional object. When carrying out this reduction in dimensionality, a certain amount of information is lost since, for example, opposite points located on the two sides of the rectangular plate will appear confused in a single one. However, the loss of information is largely compensated by the simplification made, since many relationships, such as the neighborhood between points, are more evident when they are drawn on a plane than when done by a three-dimensional figure that must necessarily be drawn in perspective.

The analysis of principal components can reveal relationships between variables that are not evident at first sight, facilitating the analysis of the dispersion of the observations, highlighting possible groupings, and detecting the variables that are responsible for the dispersion.

## 2. Preliminaries

The study of multivariate methods is greatly facilitated by matrix algebra [9, 10, 11]. Next, we introduce some basic concepts that are essential for the explanation of the statistical techniques, as well as for their geometric interpretations. In addition, relationships that can be expressed in terms of matrices are easily programmable on computers, so calculation routines can be applied to obtain other quantities of interest. What follows is a basic introduction to these concepts and relationships.

### 2.1. The vector of means and the covariance matrix

Let $X=(X_1,\dots,X_p)^t$ be a random column vector of dimension $p$. Each component, $X_i$, is a random variable (r.v.) with mean $E[X_i]=\mu_i$ and variance $V[X_i]=E[(X_i-\mu_i)^2]=\sigma_{ii}$. Given two r.v.s, $X_i$ and $X_j$, we define the covariance between them as $\mathrm{Cov}(X_i,X_j)=E[(X_i-\mu_i)(X_j-\mu_j)]=\sigma_{ij}$. The expected values, variances, and covariances can be grouped into vectors and matrices that we will call the population mean vector, $\mu$, and the population covariance matrix, $\Sigma$:

$$\mu=E[X]=\begin{pmatrix}\mu_1\\ \vdots\\ \mu_p\end{pmatrix},\qquad \Sigma=\mathrm{Cov}(X)=E\left[(X-\mu)(X-\mu)^t\right]=\begin{pmatrix}\sigma_{11}&\cdots&\sigma_{1p}\\ \vdots&\ddots&\vdots\\ \sigma_{p1}&\cdots&\sigma_{pp}\end{pmatrix}\tag{1}$$

The population correlation matrix is given by $\rho=(\rho_{ij})$, where $\rho_{ij}=\dfrac{\sigma_{ij}}{\sqrt{\sigma_{ii}\sigma_{jj}}}$.

In the case of having n values of the r.v.s, we will consider estimators of the previous population quantities, which we will call sample estimators.

Definition 2.1: Let $X=\begin{pmatrix}x_{11}&\cdots&x_{1n}\\ \vdots&&\vdots\\ x_{p1}&\cdots&x_{pn}\end{pmatrix}$ be a simple random sample of a p-dimensional r.v. ordered in the data matrix, with one observation of the p-dimensional r.v. in each column. The p-dimensional sample mean column vector is $\bar{X}=(\bar{x}_i)$, where $\bar{x}_i=\frac{1}{n}\sum_{m=1}^{n}x_{im}$. The sample covariance matrix is $S=(s_{ij})=\frac{n}{n-1}S_n=\frac{1}{n-1}(X-\bar{X})(X-\bar{X})^t$. The generalized sample variance is the determinant of $S$, $|S|$. The sample correlation matrix is $R=(r_{ij})$, where $r_{ij}=\frac{s_{ij}}{\sqrt{s_{ii}s_{jj}}}$, with $i,j=1,\dots,p$.

Proposition 2.1: Let $X_1,\dots,X_n$ be a simple random sample of a p-dimensional r.v. $X$ with mean vector $\mu$ and covariance matrix $\Sigma$. The unbiased estimators of $\mu$ and $\Sigma$ are $\bar{X}$ and $S$.
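The quantities in Definition 2.1 are straightforward to compute numerically. The following sketch uses a hypothetical data set (with observations stored in rows rather than in columns, for convenience) and checks the unbiased sample covariance and the sample correlation matrix against NumPy's built-in estimators:

```python
import numpy as np

# Hypothetical data matrix: n = 5 observations (rows) of p = 3 variables.
X = np.array([
    [2.0, 4.0, 1.0],
    [3.0, 5.5, 0.5],
    [2.5, 4.2, 1.2],
    [4.0, 6.8, 0.2],
    [3.5, 6.1, 0.4],
])
n, p = X.shape

x_bar = X.mean(axis=0)                      # sample mean vector
S = (X - x_bar).T @ (X - x_bar) / (n - 1)   # unbiased sample covariance matrix
gen_var = np.linalg.det(S)                  # generalized sample variance |S|
D = np.diag(1.0 / np.sqrt(np.diag(S)))
R = D @ S @ D                               # sample correlation matrix

assert np.allclose(S, np.cov(X, rowvar=False))       # agrees with numpy's estimator
assert np.allclose(R, np.corrcoef(X, rowvar=False))  # and with its correlation matrix
```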

### 2.2. Eigenvalues and eigenvectors

One of the problems that linear algebra deals with is the simplification of matrices through methods that produce diagonal or triangular matrices, which are widely used in the resolution of linear systems of the form $Ax=b$.

Definition 2.2: Let $A$ be a square matrix. If $v^tAv\ge 0$ for any vector $v$, $A$ is a nonnegative definite matrix. If $Av=\lambda v$, with $v\ne 0$, $\lambda$ is an eigenvalue associated with the eigenvector $v$.

Proposition 2.2: Let $A$ be a symmetric p by p matrix with real-valued entries. $A$ has p pairs of eigenvalues and eigenvectors, $(\lambda_1,e_1),\dots,(\lambda_p,e_p)$, such that:

1. All the eigenvalues are real. Also,

    1. $A$ is positive definite if all the eigenvalues are positive.

    2. $A$ is nonnegative definite if all the eigenvalues are nonnegative.

2. The eigenvectors can be chosen with 2-norm equal to 1.

3. The eigenvectors are mutually perpendicular.

4. The eigenvectors are unique unless two or more eigenvalues are equal.

5. The spectral decomposition of $A$ is $A=\lambda_1e_1e_1^t+\cdots+\lambda_pe_pe_p^t$.

6. If $P=(e_1\ \cdots\ e_p)$ is an orthogonal matrix and $\Lambda$ is a diagonal matrix with main diagonal entries $\lambda_1,\dots,\lambda_p$, the spectral decomposition of $A$ can be given by $A=P\Lambda P^t$. Therefore, $A^{-1}=P\Lambda^{-1}P^t=\sum_{i=1}^{p}\frac{1}{\lambda_i}e_ie_i^t$.
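Items 5 and 6 of Proposition 2.2 can be illustrated numerically. The sketch below, for an assumed symmetric matrix $A$, verifies the spectral decomposition $A=P\Lambda P^t$ and the resulting expression for $A^{-1}$:

```python
import numpy as np

# A small symmetric matrix (an assumed example, not taken from the chapter).
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

lam, P = np.linalg.eigh(A)   # eigh: real eigenvalues, orthonormal eigenvectors in columns
Lam = np.diag(lam)

# A = P Λ P^t and, since no eigenvalue is zero here,
# A^{-1} = P Λ^{-1} P^t = Σ_i (1/λ_i) e_i e_i^t.
assert np.allclose(P @ Lam @ P.T, A)
A_inv = sum((1.0 / lam[i]) * np.outer(P[:, i], P[:, i]) for i in range(3))
assert np.allclose(A_inv, np.linalg.inv(A))
assert np.allclose(P.T @ P, np.eye(3))   # eigenvectors are orthonormal
```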

Remark 2.1: Let $X$ be a matrix with the values of a simple random sample of a p-dimensional r.v. in its columns, and let $y_i^t=(x_{i1},\dots,x_{in})$, with $i=1,\dots,p$, be the ith row of $X$. Let $1_n^t=(1,\dots,1)$ be the n by one vector with all its coordinates equal to 1. It can be proven that:

1. The projection of the vector $y_i$ on the vector $1_n$ is the vector $\bar{x}_i1_n$, whose 2-norm is equal to $\sqrt{n}\,|\bar{x}_i|$.

2. Matrix $S_n$ is obtained from the residuals $e_i=y_i-\bar{x}_i1_n$; the squared 2-norm of $e_i$ is equal to $(n-1)s_{ii}$, and the scalar product of $e_i$ and $e_j$ is equal to $(n-1)s_{ij}$.

3. The sample correlation coefficient $r_{ij}$ is the cosine of the angle between $e_i$ and $e_j$.

4. If $U$ is the volume generated by the vectors $e_i$, with $i=1,\dots,p$, then $|S|=\frac{U^2}{(n-1)^p}$. Therefore, the generalized sample variance is proportional to the square of the volume generated by the deviation vectors. The volume will increase if the norm of some $e_i$ is increased.

### 2.3. Distances

Many techniques of multivariate statistical analysis are based on the concept of distance. Let $Q=(x_1,x_2)$ be a point in the plane. The Euclidean distance from $Q$ to the origin, $O$, is $d(Q,O)=\sqrt{x_1^2+x_2^2}$. If $Q=(x_1,\dots,x_p)$ and $R=(y_1,\dots,y_p)$, the Euclidean distance between these two points of $\mathbb{R}^p$ is $d(Q,R)=\sqrt{(x_1-y_1)^2+\cdots+(x_p-y_p)^2}$. All points $(x_1,\dots,x_p)$ whose squared distance to the origin is a fixed quantity, for example, $x_1^2+\cdots+x_p^2=c^2$, are the points of the p-dimensional sphere of radius $c$.

For many statistical purposes, the Euclidean distance is unsatisfactory, since each coordinate contributes in the same way to the calculation of such a distance. When the coordinates represent measures subject to random changes, it is desirable to assign weights to the coordinates depending on how high or low the variability of the measurements is. This suggests a measure of distance that is different from the Euclidean.

Next, we introduce a statistical distance that will take into account the different variabilities and correlations. Therefore, it will depend on the variances and covariances, and this distance is fundamental in multivariate analysis.

Suppose we have a fixed set of observations in $\mathbb{R}^p$ and, to illustrate the situation, consider n pairs of measures of two variables, $x_1$ and $x_2$. Suppose that the measurements of $x_1$ vary independently of $x_2$ and that the variability of the measures of $x_1$ is much greater than that of $x_2$. This situation is shown in Figure 1, and our first objective is to define a distance from the points to the origin.

In Figure 1, we see that the values that have a given deviation from the origin are farther from the origin in the $x_1$ direction than in the $x_2$ direction, due to the greater variability inherent in the direction of $x_1$. Therefore, it seems reasonable to give more weight to the coordinate $x_2$ than to $x_1$. One way to obtain these weights is to standardize the coordinates, that is, $x_1^*=x_1/\sqrt{s_{11}}$ and $x_2^*=x_2/\sqrt{s_{22}}$, where $s_{ii}$ is the sample variance of the variable $x_i$. Thus, the statistical distance from a point $Q=(x_1,x_2)$ to the origin is $d(Q,O)=\sqrt{\frac{x_1^2}{s_{11}}+\frac{x_2^2}{s_{22}}}$. Therefore, the points that are equidistant from the origin at a constant distance $c$ are on an ellipse centered at the origin, whose major axis coincides with the coordinate that has the greatest variability. In the case that the variability of one variable is analogous to that of the other and that the coordinates are independent, the Euclidean distance is proportional to the statistical distance.

If $Q=(x_1,\dots,x_p)$ and $R=(y_1,\dots,y_p)$ are two points of $\mathbb{R}^p$, the statistical distance between them is $d(Q,R)=\sqrt{\frac{(x_1-y_1)^2}{s_{11}}+\cdots+\frac{(x_p-y_p)^2}{s_{pp}}}$, with $s_{ii}$ being the sample variance of the variable $x_i$. The statistical distance defined so far does not include most of the important cases, where the variables are not independent. Figure 2 shows a situation where the pairs $(x_1,x_2)$ seem to have an increasing trend, so the sample correlation coefficient will be positive. In Figure 2, we see that if we make a rotation of amplitude $\alpha$ and consider the axes $(g_1,g_2)$, we are in conditions analogous to those of Figure 1 (a). Therefore, the distance from the point $Q=(g_1,g_2)$ to the origin will be $d(Q,O)=\sqrt{\frac{g_1^2}{\tilde{s}_{11}}+\frac{g_2^2}{\tilde{s}_{22}}}$, where $\tilde{s}_{ii}$ is the sample variance of the variable $g_i$.

The relationships between the original coordinates and the new coordinates can be expressed as

$$g_1=x_1\cos\alpha+x_2\sin\alpha,\qquad g_2=-x_1\sin\alpha+x_2\cos\alpha\tag{2}$$

and, after some algebraic manipulations, $d(Q,O)=\sqrt{a_{11}x_1^2+2a_{12}x_1x_2+a_{22}x_2^2}$, where the $a_{ij}$ are values that depend on the angle and the dispersions and must also meet the condition that the distance between any two points must be positive.

The distance from a point $Q=(x_1,x_2)$ to a fixed point $R=(y_1,y_2)$ in situations where there is a positive correlation is $d(Q,R)=\sqrt{a_{11}(x_1-y_1)^2+2a_{12}(x_1-y_1)(x_2-y_2)+a_{22}(x_2-y_2)^2}$. So, in this case, the coordinates of all points $Q=(x_1,x_2)$ at a constant distance $c$ from $R$ verify the equation $a_{11}(x_1-y_1)^2+2a_{12}(x_1-y_1)(x_2-y_2)+a_{22}(x_2-y_2)^2=c^2$, which is the equation of an ellipse with center $R=(y_1,y_2)$ and with axes parallel to $(g_1,g_2)$. Figure 3 shows ellipses with constant statistical distances.

This distance can be generalized to $\mathbb{R}^p$ if $a_{11},\dots,a_{pp},a_{12},\dots,a_{p-1,p}$ are values such that the distance from $Q$ to $R$ is given by

$$d(Q,R)=\sqrt{A+B},\quad\text{where}\quad \begin{aligned}A&=a_{11}(x_1-y_1)^2+\cdots+a_{pp}(x_p-y_p)^2\\ B&=2a_{12}(x_1-y_1)(x_2-y_2)+\cdots+2a_{p-1,p}(x_{p-1}-y_{p-1})(x_p-y_p)\end{aligned}\tag{3}$$

This distance, therefore, is completely determined by the coefficients $a_{ij}$, with $i,j\in\{1,\dots,p\}$, which can be arranged in a matrix given by

$$A=\begin{pmatrix}a_{11}&\cdots&a_{1p}\\ \vdots&&\vdots\\ a_{p1}&\cdots&a_{pp}\end{pmatrix}\tag{4}$$

The elements of Eq. (4) cannot be arbitrary. In order to define a distance over a vector space, Eq. (4) must be a square, symmetric, positive definite matrix. Therefore, the sample covariance matrix of a data matrix, $S$, is a candidate to define a statistical distance.
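As an illustration, the statistical distance of Eq. (3), with the inverse sample covariance matrix playing the role of the coefficients $a_{ij}$ (i.e., the Mahalanobis distance), can be sketched as follows on an assumed two-dimensional sample:

```python
import numpy as np

# Assumed sample: coordinate 1 has much larger variability than coordinate 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0],
                                          [1.5, 0.5]])

S = np.cov(X, rowvar=False)        # plays the role of the matrix in Eq. (4)
S_inv = np.linalg.inv(S)
x_bar = X.mean(axis=0)

def statistical_distance(q, r):
    """d(Q, R) = sqrt((q - r)^t S^{-1} (q - r)), cf. Eq. (3)."""
    d = np.asarray(q) - np.asarray(r)
    return float(np.sqrt(d @ S_inv @ d))

# Two points at the same Euclidean distance from the center: the one lying
# in the low-variability direction is statistically much farther away.
q1 = x_bar + np.array([3.0, 0.0])   # along the high-variability direction
q2 = x_bar + np.array([0.0, 3.0])   # along the low-variability direction
assert statistical_distance(q2, x_bar) > statistical_distance(q1, x_bar)
```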

Figure 4 shows a cloud of points with center of gravity, $(\bar{x}_1,\bar{x}_2)$, at point $R$. At first glance, it can be seen that the Euclidean distance from point $R$ to point $Q$ is greater than the Euclidean distance from point $R$ to the origin; however, $Q$ seems to have more to do with the cloud of points than the origin does. If we take into account the variability of the points in the cloud and use the statistical distance, then $Q$ will be closer to $R$ than the origin.

The explanation given above illustrates the need to consider distances other than the Euclidean one.

## 3. Population principal components

Principal components are a particular case of linear combinations of p r.v.s, $X_1,\dots,X_p$. These linear combinations represent, geometrically, a new coordinate system that is obtained by rotating the original reference system that has $X_1,\dots,X_p$ as coordinate axes. The new axes represent the directions with maximum variability and provide a simple description of the structure of the covariance.

Principal components depend only on the covariance matrix $\Sigma$ (or on the correlation matrix $\rho$) of $X_1,\dots,X_p$, and it is not necessary to assume that the r.v.s follow an approximately normal distribution. In the case of a multivariate normal distribution, we will have interpretations in terms of ellipsoids of constant density, if we consider the distance defined by the matrix $\Sigma$, and inferences can be made from the population components.

Let $X=(X_1,\dots,X_p)^t$ be a p-dimensional random vector with covariance matrix $\Sigma$ and eigenvalues $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_p\ge 0$. Let us consider the following p linear combinations:

$$\begin{aligned}Y_1&=l_1^tX=l_{11}X_1+\cdots+l_{p1}X_p\\ &\ \ \vdots\\ Y_p&=l_p^tX=l_{1p}X_1+\cdots+l_{pp}X_p\end{aligned}\tag{5}$$

These new r.v.s verify the following equalities:

$$V(Y_i)=l_i^t\Sigma l_i,\quad i=1,\dots,p;\qquad \mathrm{Cov}(Y_i,Y_j)=l_i^t\Sigma l_j,\quad i,j=1,\dots,p,\ i\ne j\tag{6}$$

Principal components are those linear combinations that, being uncorrelated among them, have the greatest possible variance. Thus, the first principal component is the linear combination with the greatest variance, that is, the one for which $V(Y_1)=l_1^t\Sigma l_1$ is maximum. Since multiplying $l_1$ by a constant makes this variance grow, we will restrict our attention to vectors of norm one, with which the aforementioned indeterminacy disappears. The second principal component is the linear combination that maximizes the variance, is uncorrelated with the first one, and has a coefficient vector of norm equal to 1.

Proposition 3.1: Let $\Sigma$ be the covariance matrix of the random vector $X=(X_1,\dots,X_p)^t$. Let us assume that $\Sigma$ has p pairs of eigenvalues and eigenvectors, $(\lambda_1,e_1),\dots,(\lambda_p,e_p)$, with $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_p\ge 0$. Then, the ith principal component is given by

$$Y_i=e_i^tX=e_{1i}X_1+\cdots+e_{pi}X_p,\quad i=1,\dots,p\tag{7}$$

In addition, with this choice it is verified that:

1. $V(Y_i)=e_i^t\Sigma e_i=\lambda_i$, $i=1,\dots,p$.

2. $\mathrm{Cov}(Y_i,Y_j)=0$, $i,j=1,\dots,p$, $i\ne j$.

3. If any of the eigenvalues are equal, the choice of the corresponding eigenvectors as vectors of coefficients is not unique.

4. $\sigma_{11}+\cdots+\sigma_{pp}=\sum_{i=1}^{p}V(X_i)=\lambda_1+\cdots+\lambda_p=\sum_{j=1}^{p}V(Y_j)$.

Remark 3.1: The demonstration of these results uses expressions for the maxima of quadratic forms over vectors of fixed norm, $\max_{l\ne 0}\frac{l^t\Sigma l}{l^tl}=\lambda_1$, together with the corresponding expressions when the maximum is subject to orthogonality conditions (the Lagrange multipliers method can also be used) and properties of the trace of a matrix (if $\Sigma=P\Lambda P^t$, then $\mathrm{tr}(\Sigma)=\mathrm{tr}(P\Lambda P^t)=\mathrm{tr}(\Lambda)$).

Due to the previous result, principal components are uncorrelated among them, with variances equal to the eigenvalues of $\Sigma$, and the proportion of the population variance due to the ith principal component is given by $\frac{\lambda_i}{\lambda_1+\cdots+\lambda_p}$.

If a high percentage of the population variance, for example, 90%, of a p-dimensional r.v., with large p, can be attributed to, for example, the first five principal components, then we can replace all the r.v.s by those five components without a great loss of information.
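Proposition 3.1 translates directly into an eigendecomposition. The following sketch, for an assumed covariance matrix, extracts the component coefficient vectors, the variances $\lambda_i$, and the proportion of variance explained by each component:

```python
import numpy as np

def principal_components(Sigma):
    """Return eigenvalues (descending) and matching unit eigenvectors of Sigma.

    Each column of E is a coefficient vector e_i: the ith component is Y_i = e_i^t X.
    """
    lam, E = np.linalg.eigh(Sigma)        # eigh returns ascending order
    order = np.argsort(lam)[::-1]         # reorder so that λ_1 ≥ λ_2 ≥ …
    return lam[order], E[:, order]

# Assumed covariance matrix, for illustration only.
Sigma = np.array([[6.0, 2.0, 1.0],
                  [2.0, 3.0, 0.5],
                  [1.0, 0.5, 1.0]])

lam, E = principal_components(Sigma)
explained = lam / lam.sum()               # proportion λ_i / (λ_1 + … + λ_p)

# Total variance is preserved: Σ σ_ii = Σ λ_i (Proposition 3.1, item 4).
assert np.isclose(np.trace(Sigma), lam.sum())
```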

Each entry of the coefficient vector $e_i^t=(e_{1i},\dots,e_{pi})$, namely $e_{ki}$, also deserves our attention, since it is a measure of the relationship between the r.v.s $X_k$ and $Y_i$.

Proposition 3.2: If $Y_1=e_1^tX,\dots,Y_p=e_p^tX$ are the principal components obtained from the covariance matrix $\Sigma$, with pairs of eigenvalues and eigenvectors $(\lambda_1,e_1),\dots,(\lambda_p,e_p)$, then the linear correlation coefficients between the variables $X_k$ and the components $Y_i$ are given by

$$\rho_{X_k,Y_i}=\frac{e_{ki}\sqrt{\lambda_i}}{\sqrt{\sigma_{kk}}},\quad i,k=1,\dots,p\tag{8}$$

Therefore, $e_{ki}$ is proportional to the correlation coefficient between $X_k$ and $Y_i$.

In the particular case that $X$ has a normal p-dimensional distribution, $N_p(\mu,\Sigma)$, the density of $X$ is constant on the ellipsoids with center at $\mu$ given by $(X-\mu)^t\Sigma^{-1}(X-\mu)=c^2$, which have axes $\pm c\sqrt{\lambda_i}\,e_i$, $i=1,\dots,p$, where $(\lambda_i,e_i)$ are the pairs of eigenvalues and eigenvectors of $\Sigma$.

If the covariance matrix, $\Sigma$, can be decomposed into $\Sigma=P\Lambda P^t$, where $P$ is orthogonal and $\Lambda$ diagonal, it can be shown that $\Sigma^{-1}=P\Lambda^{-1}P^t=\sum_{i=1}^{p}\frac{1}{\lambda_i}e_ie_i^t$. Also, if it can be assumed that $\mu=0$, to simplify the expressions, then

$$c^2=x^t\Sigma^{-1}x=\frac{1}{\lambda_1}(e_1^tx)^2+\frac{1}{\lambda_2}(e_2^tx)^2+\cdots+\frac{1}{\lambda_p}(e_p^tx)^2\tag{9}$$

If the principal components $y_1=e_1^tx,\dots,y_p=e_p^tx$ are considered, the equation of the constant density ellipsoid is given by

$$c^2=\frac{1}{\lambda_1}y_1^2+\frac{1}{\lambda_2}y_2^2+\cdots+\frac{1}{\lambda_p}y_p^2\tag{10}$$

Therefore, the axes of the ellipsoid have the directions of the principal components.

Example 3.1: Let $X_1,X_2,X_3$ be three one-dimensional r.v.s and $X=(X_1,X_2,X_3)^t$, with covariance matrix

$$\Sigma=\begin{pmatrix}2&0&0\\ 0&8&3\\ 0&3&2\end{pmatrix}\tag{11}$$

It can be verified that the pairs of eigenvalues and eigenvectors are $\lambda_1=9.243$, $e_1^t=(0,0.924,0.383)$; $\lambda_2=2$, $e_2^t=(1,0,0)$; and $\lambda_3=0.757$, $e_3^t=(0,-0.383,0.924)$. Therefore, the principal components are the following:

$$\begin{aligned}Y_1&=e_1^tX=0.924X_2+0.383X_3\\ Y_2&=e_2^tX=X_1\\ Y_3&=e_3^tX=-0.383X_2+0.924X_3\end{aligned}\tag{12}$$

The norm of all the eigenvectors is equal to 1, and, in addition, the variable $X_1$ is the second principal component, because $X_1$ is uncorrelated with the other two variables.

The results of Proposition 3.1 can be verified for these data; for example, $V(Y_1)=9.243$ and $\mathrm{Cov}(Y_1,Y_2)=0$. Also, $\sum_{i=1}^{3}V(X_i)=2+8+2=12=9.243+2+0.757=\sum_{j=1}^{3}V(Y_j)$. Thus, the proportion of the total variance explained by the first component is $\lambda_1/12=77\%$, and the one explained by the first two is $(\lambda_1+\lambda_2)/12=93.69\%$, so that the components $Y_1$ and $Y_2$ can replace the original variables with a small loss of information.
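The numbers of Example 3.1 can be reproduced with a few lines of NumPy:

```python
import numpy as np

# Covariance matrix of Example 3.1.
Sigma = np.array([[2.0, 0.0, 0.0],
                  [0.0, 8.0, 3.0],
                  [0.0, 3.0, 2.0]])

lam, E = np.linalg.eigh(Sigma)
lam, E = lam[::-1], E[:, ::-1]           # descending: λ1 ≥ λ2 ≥ λ3

assert np.allclose(lam, [9.243, 2.0, 0.757], atol=1e-3)
assert np.isclose(lam.sum(), np.trace(Sigma))           # 12 = 9.243 + 2 + 0.757
assert np.isclose(lam[0] / lam.sum(), 0.77, atol=5e-3)  # first component: about 77%
assert np.isclose(lam[:2].sum() / lam.sum(), 0.9369, atol=1e-3)  # first two: 93.69%
```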

The correlation coefficients between the principal components and the variables are the following:

$$\begin{aligned}&\rho_{X_1,Y_1}=0,&&\rho_{X_2,Y_1}=0.993,&&\rho_{X_3,Y_1}=0.823\\ &\rho_{X_1,Y_2}=1,&&\rho_{X_2,Y_2}=0,&&\rho_{X_3,Y_2}=0\\ &\rho_{X_1,Y_3}=0,&&\rho_{X_2,Y_3}=-0.118,&&\rho_{X_3,Y_3}=0.568\end{aligned}\tag{13}$$

In view of these values, it can be concluded that $X_2$ and $X_3$ individually are practically equally important with respect to the first principal component, although this is not the case with respect to the third component. If, in addition, it is assumed that the distribution of $X$ is normal, $N_3(\mu,\Sigma)$, with a null mean vector, ellipsoids of constant density $x^t\Sigma^{-1}x=c^2$ can be considered. An ellipsoid of constant statistical distance and its projections are shown in Figure 5.

The ellipsoid with $c^2=8$ has been represented in Figure 5 (a), together with its axes and the ellipsoid projections on planes parallel to the coordinate axes. The aforementioned projections are the ellipses in red, green, and blue that are reproduced in Figure 5 (b). Also, in this figure, the black ellipse obtained by projecting the ellipsoid on the plane determined by the first two principal components has been represented. The equation of this ellipse is $\frac{y_1^2}{a^2}+\frac{y_2^2}{b^2}=8$, where $a=1/\sqrt{\eta_1}$ and $b=1/\sqrt{\eta_2}$, with $\eta_1$ and $\eta_2$ being the two smallest eigenvalues of $\Sigma^{-1}$, and the axes are determined by $Y_1$ and $Y_2$. As can be seen, the diameters of the ellipse determined by the first two components are larger than the others. Therefore, the area enclosed by this ellipse is the largest of all, indicating that it is the one that gathers the greatest variability.

### 3.1. Principal components with respect to standardized variables

The principal components of the normalized variables $Z_1=\frac{X_1-\mu_1}{\sqrt{\sigma_{11}}},\dots,Z_p=\frac{X_p-\mu_p}{\sqrt{\sigma_{pp}}}$ can also be considered, which in matrix notation is $Z=V(X-\mu)$, where $V$ is the diagonal matrix whose elements are $\frac{1}{\sqrt{\sigma_{11}}},\dots,\frac{1}{\sqrt{\sigma_{pp}}}$. It is easily verified that the r.v. $Z$ verifies $E[Z]=0$ and $\mathrm{Cov}(Z)=V\Sigma V=\rho$, where $\rho$ is the correlation matrix of $X$.

Principal components of $Z$ are obtained from the eigenvalues and eigenvectors of the correlation matrix, $\rho$, of $X$. Furthermore, with some simplification, the previous results can be applied, since the variance of each $Z_i$ is equal to 1.

Let $W_1,\dots,W_p$ be the principal components of $Z$ and $(v_i,u_i)$, $i=1,\dots,p$, the pairs of eigenvalues and eigenvectors of $\rho$; a different notation is used because they do not have to be the same as those of $\Sigma$.

Proposition 3.3: Let $Z=(Z_1,\dots,Z_p)^t$ be a random vector with covariance matrix $\rho$. Let $(v_1,u_1),\dots,(v_p,u_p)$ be the pairs of eigenvalues and eigenvectors of $\rho$, with $v_1\ge\cdots\ge v_p\ge 0$. Then, the ith principal component is given by $W_i=u_i^tV(X-\mu)$, $i=1,\dots,p$. In addition, with this choice it is verified that:

1. $V(W_i)=v_i$, $i=1,\dots,p$.

2. $\mathrm{Cov}(W_i,W_j)=0$, $i,j=1,\dots,p$, $i\ne j$.

3. If any of the eigenvalues are equal, the choice of the corresponding eigenvectors as vectors of coefficients is not unique.

4. $\sum_{i=1}^{p}V(W_i)=v_1+\cdots+v_p=\sum_{j=1}^{p}V(Z_j)=p$.

5. The linear correlation coefficients between the variables $Z_k$ and the principal components $W_i$ are $\rho_{Z_k,W_i}=u_{ki}\sqrt{v_i}$, $i,k=1,\dots,p$.

These results are a consequence of those obtained in Proposition 3.1 and Proposition 3.2 applied to $Z$ and $\rho$ instead of $X$ and $\Sigma$.

The total population variance of the normalized variables is the sum of the elements of the diagonal of $\rho$, that is, p. Therefore, the proportion of the total variability explained by the ith principal component is $\frac{v_i}{p}$, $i=1,\dots,p$.
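Proposition 3.3 can be checked numerically. For an assumed 2 by 2 correlation matrix, the eigenvalues sum to $p=2$, and each component explains $v_i/p$ of the total variability:

```python
import numpy as np

# Assumed correlation matrix of two standardized variables.
rho = np.array([[1.0, 0.5],
                [0.5, 1.0]])

v, U = np.linalg.eigh(rho)
v, U = v[::-1], U[:, ::-1]           # descending: v_1 ≥ v_2

assert np.allclose(v, [1.5, 0.5])    # eigenvalues of rho
assert np.isclose(v.sum(), 2.0)      # total variance is p (Proposition 3.3, item 4)
assert np.isclose(v[0] / 2.0, 0.75)  # first component explains v_1 / p of the variance
```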

Example 3.2: Let $X_1$ and $X_2$ be two one-dimensional r.v.s and $X=(X_1,X_2)^t$, with the covariance matrix, $\Sigma$, and correlation matrix, $\rho$, given by

$$\Sigma=\begin{pmatrix}1&2\\ 2&100\end{pmatrix},\qquad \rho=\begin{pmatrix}1&0.2\\ 0.2&1\end{pmatrix}\tag{14}$$

It can be verified that the pairs of eigenvalues and eigenvectors for $\Sigma$ are $\lambda_1=100.04$, $e_1^t=(0.02,0.999)$, and $\lambda_2=0.96$, $e_2^t=(0.999,-0.02)$. Therefore, the principal components are the following:

$$Y_1=e_1^tX=0.02X_1+0.999X_2,\qquad Y_2=e_2^tX=0.999X_1-0.02X_2\tag{15}$$

Furthermore, the eigenvalues and eigenvectors of $\rho$ are $v_1=1.2$, $u_1^t=(0.707,0.707)$, and $v_2=0.8$, $u_2^t=(0.707,-0.707)$; hence, the principal components of the normalized variables are the following:

$$\begin{aligned}W_1&=u_1^tZ=0.707Z_1+0.707Z_2=0.707(X_1-\mu_1)+0.0707(X_2-\mu_2)\\ W_2&=u_2^tZ=0.707Z_1-0.707Z_2=0.707(X_1-\mu_1)-0.0707(X_2-\mu_2)\end{aligned}\tag{16}$$

Because the variance of $X_2$ is much greater than that of $X_1$, the first principal component for $\Sigma$ is determined by $X_2$, and the proportion of variability explained by that first component is $\frac{\lambda_1}{\lambda_1+\lambda_2}=0.99$.

When considering the normalized variables, each variable contributes to the components determined by $\rho$, and the dependencies between the normalized variables and their first component are $\rho_{Z_1,W_1}=u_{11}\sqrt{v_1}=0.707\sqrt{1.2}=0.774$ and $\rho_{Z_2,W_1}=u_{21}\sqrt{v_1}=0.707\sqrt{1.2}=0.774$. The proportion of the total variability explained by the first component is $\frac{v_1}{p}=0.6$.

Therefore, the importance of the first component is strongly affected by normalization. In fact, the weights, in terms of the $X_i$, are $0.707$ and $0.0707$ for $\rho$, as opposed to $0.02$ and $0.999$ for $\Sigma$.
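The eigenvalues quoted in Example 3.2 are consistent with the covariance matrix $\Sigma=\begin{pmatrix}1&2\\ 2&100\end{pmatrix}$, whose associated correlation matrix has $\rho_{12}=0.2$; this can be checked numerically:

```python
import numpy as np

# Covariance matrix consistent with the eigenvalues quoted in Example 3.2.
Sigma = np.array([[1.0, 2.0],
                  [2.0, 100.0]])
V = np.diag(1.0 / np.sqrt(np.diag(Sigma)))
rho = V @ Sigma @ V                        # [[1, 0.2], [0.2, 1]]

lam = np.linalg.eigvalsh(Sigma)[::-1]      # eigenvalues of Σ, descending
v = np.linalg.eigvalsh(rho)[::-1]          # eigenvalues of ρ, descending

assert np.allclose(lam, [100.04, 0.96], atol=1e-2)
assert np.allclose(v, [1.2, 0.8])
assert np.isclose(lam[0] / lam.sum(), 0.99, atol=1e-3)  # Σ: first PC explains ~99%
assert np.isclose(v[0] / 2.0, 0.6)                      # ρ: first PC explains 60%
```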

Remark 3.2: The above example shows that the principal components deduced from the original variables are, in general, different from those derived from the normalized variables. So, normalization has important consequences.

When the units in which the different one-dimensional random variables are measured are very different, and when one of the variances strongly dominates the others, the first principal component with respect to the original variables will be determined by the variable whose variance is dominant. On the other hand, if the variables are normalized, their relationship with the first components will be more balanced.

Principal components can be expressed in particular ways if the covariance matrix, or the correlation matrix, has special structures, such as diagonal ones, or structures of the form $\Sigma=\sigma^2A$.

## 4. Sample principal components

Once we have the theoretical framework, we can now address the problem of summarizing the variation of n measurements made on p variables.

Let $x_1,\dots,x_n$ be a sample of a p-dimensional r.v. $X$ with mean vector $\mu$ and covariance matrix $\Sigma$. These data have a vector of sample means, $\bar{x}$, a sample covariance matrix, $S$, and a sample correlation matrix, $R$.

This section is aimed at constructing uncorrelated linear combinations of the measured characteristics that contain the greatest amount of the variability contained in the sample. These linear combinations are called sample principal components.

Given the n values of any linear combination $l_1^tx_j=l_{11}x_{1j}+\cdots+l_{p1}x_{pj}$, $j=1,\dots,n$, its sample mean is $l_1^t\bar{x}$, and its sample variance is $l_1^tSl_1$. If we consider two linear combinations, $l_1^tx_j$ and $l_2^tx_j$, their sample covariance is $l_1^tSl_2$.

The first principal component will be the linear combination, $l_1^tx_j$, that maximizes the sample variance, subject to the condition $l_1^tl_1=1$. The second component will be the linear combination, $l_2^tx_j$, that maximizes the sample variance, subject to the conditions that $l_2^tl_2=1$ and that the sample covariance of the pairs $(l_1^tx_j,l_2^tx_j)$ is equal to zero. This procedure is continued until the p principal components are completed.

Proposition 4.1: Let $S=(s_{ik})$ be the p by p matrix of sample covariances, whose pairs of eigenvalues and eigenvectors are $(\hat{\lambda}_1,\hat{e}_1),\dots,(\hat{\lambda}_p,\hat{e}_p)$, with $\hat{\lambda}_1\ge\hat{\lambda}_2\ge\cdots\ge\hat{\lambda}_p\ge 0$. Let $x$ be an observation of the p-dimensional random variable $X$; then:

1. The ith principal component is given by $\hat{y}_i=\hat{e}_i^tx=\hat{e}_{1i}x_1+\cdots+\hat{e}_{pi}x_p$, $i=1,\dots,p$.

2. The sample variance of $\hat{y}_k$ is $\hat{\lambda}_k$, $k=1,\dots,p$.

3. The sample covariance of $(\hat{y}_i,\hat{y}_k)$, $i\ne k$, is equal to 0.

4. The total sample variance is $\sum_{i=1}^{p}s_{ii}=\hat{\lambda}_1+\cdots+\hat{\lambda}_p$.

5. The sample correlation coefficients between $x_k$ and $\hat{y}_i$ are $r_{x_k,\hat{y}_i}=\frac{\hat{e}_{ki}\sqrt{\hat{\lambda}_i}}{\sqrt{s_{kk}}}$, $i,k=1,\dots,p$.

In the case that the random variables have a normal distribution, the principal components can be obtained from the maximum likelihood estimate $\hat{\Sigma}=S_n$, and, in this case, the sample principal components can be considered as maximum likelihood estimates of the population principal components. Although the eigenvalues of $S$ and $\hat{\Sigma}$ are different, they are proportional, with a fixed constant of proportionality, so the proportion of variability they explain is the same. The sample correlation matrix is the same for $S$ and $\hat{\Sigma}$. We do not treat the particular case of normally distributed variables further, so as not to have to include hypotheses that would need to be verified for the data under study.
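Proposition 4.1 can be sketched end to end on an assumed data matrix: the scores $\hat{y}_i=\hat{e}_i^t(x-\bar{x})$ have variances $\hat{\lambda}_i$, are uncorrelated, and preserve the total sample variance:

```python
import numpy as np

rng = np.random.default_rng(1)
# Assumed data: n = 300 observations of a 3-dimensional r.v. (rows are observations).
X = rng.normal(size=(300, 3)) @ np.array([[2.0, 0.5, 0.0],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.2]])

x_bar = X.mean(axis=0)
S = np.cov(X, rowvar=False)                    # sample covariance matrix
lam_hat, E_hat = np.linalg.eigh(S)
lam_hat, E_hat = lam_hat[::-1], E_hat[:, ::-1] # descending eigenvalues

Y_hat = (X - x_bar) @ E_hat                    # sample principal components ŷ_i = ê_i^t (x - x̄)

# Proposition 4.1: component variances are λ̂_i, components are uncorrelated,
# and the total sample variance is preserved.
assert np.allclose(np.cov(Y_hat, rowvar=False), np.diag(lam_hat))
assert np.isclose(np.trace(S), lam_hat.sum())
```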

Sometimes, the observations $x$ are centered by subtracting the mean $\bar{x}$. This operation does not affect the covariance matrix and produces principal components of the form $\hat{y}_i=\hat{e}_i^t(x-\bar{x})$; in this case $\bar{\hat{y}}_i=0$ for every component, while the sample variances remain $\hat{\lambda}_1,\dots,\hat{\lambda}_p$.

When trying to interpret the principal components, the correlation coefficients $r_{x_k,\hat{y}_i}$ are more reliable guides than the coefficients $\hat{e}_{ki}$, since they avoid the interpretive problems caused by the different scales in which the variables are measured.

### 4.1. Interpretations of the principal sample components

Principal sample components have several interpretations. If the distribution of $X$ is close to $N_p(\mu,\Sigma)$, then the components $\hat{y}_i=\hat{e}_i^t(x-\bar{x})$ are realizations of the population principal components $Y_i=e_i^t(X-\mu)$, which have distribution $N_p(0,\Lambda)$, where $\Lambda$ is the diagonal matrix whose elements are the eigenvalues, ordered from largest to smallest, of the sample covariance matrix. Keeping in mind the hypothesis of normality, contours of constant density, $E_p=\{x\in\mathbb{R}^p:(x-\bar{x})^tS^{-1}(x-\bar{x})=c^2\}$, can be estimated, and inferences can be made from them.

Even when it is not possible to assume normality of the data, geometrically the data are n points in $\mathbb{R}^p$, and the principal components represent an orthogonal transformation whose coordinate axes are the axes of the ellipsoid $E_p$, with lengths proportional to $\sqrt{\hat{\lambda}_i}$, with $\hat{\lambda}_i$ being the eigenvalues of $S$. Since all eigenvectors have been chosen such that their norm is equal to 1, the absolute value of the ith component $\hat{y}_i=\hat{e}_i^t(x-\bar{x})$ is the length of the projection of the vector $x-\bar{x}$ on the vector $\hat{e}_i$. Therefore, the principal components can be seen as a translation of the origin to the point $\bar{x}$ and a rotation of the axes until they pass through the directions with the greatest variability.

When there is a high positive correlation between all the variables and a principal component has all its coordinates of the same sign, this component can be considered as a weighted average of all the variables, or as an index of the size of what the component measures. Components with coordinates of different signs set one subset of variables against another, being weighted averages of two groups of variables.

The interpretation of the results is simplified assuming that the small coefficients are zero and rounding the rest to express the component as sums, differences, or quotients of variables.

The interpretation of the principal components can be facilitated by graphic representations in two dimensions. A usual graph is to represent two components as coordinate axes and project all points on those axes. These representations also help to test hypotheses of normality and to detect anomalous observations. If there is an observation that is atypical in the first variable, the variability of that first variable will grow and its covariance with the other variables will decrease in absolute value. Consequently, the first component will be strongly influenced by the first variable, distorting the analysis.

Sometimes, it is necessary to verify that the first components are approximately normal, although it is not reasonable to expect this result from a linear combination of variables that do not have to be normal.

The last components can help detect suspicious observations. Each observation $x_j$ can be expressed as a linear combination of the eigenvectors of $S$, $x_j=\hat{y}_{1j}\hat{e}_1+\cdots+\hat{y}_{pj}\hat{e}_p$, with which the difference between the first q components, $\hat{y}_{1j}\hat{e}_1+\cdots+\hat{y}_{qj}\hat{e}_q$, and the observation $x_j$ is $\hat{y}_{q+1,j}\hat{e}_{q+1}+\cdots+\hat{y}_{pj}\hat{e}_p$, which is a vector with squared norm $\hat{y}_{q+1,j}^2+\cdots+\hat{y}_{pj}^2$, and we will be suspicious of observations that make a large contribution to this squared norm.
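A minimal sketch of this residual-based check, on assumed data in which one observation is planted off the plane where the rest of the cloud lies:

```python
import numpy as np

rng = np.random.default_rng(2)
# Assumed data: cloud essentially lying in a 2-D plane inside R^3.
X = rng.normal(size=(100, 3)) @ np.array([[2.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.0, 0.0, 0.1]])
X[0] += np.array([0.0, 0.0, 5.0])       # plant an observation off the main plane

x_bar = X.mean(axis=0)
S = np.cov(X, rowvar=False)
lam, E = np.linalg.eigh(S)
lam, E = lam[::-1], E[:, ::-1]          # descending eigenvalues

q = 2                                   # keep the first q components
Y = (X - x_bar) @ E                     # scores ŷ_ij
resid_sq = (Y[:, q:] ** 2).sum(axis=1)  # ŷ_{q+1,j}² + … + ŷ_{pj}² per observation

# The planted point dominates the residual norm and stands out as suspicious.
assert resid_sq.argmax() == 0
```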

An especially small value of the last eigenvalue of the covariance matrix, or correlation matrix, can indicate a linear dependence between the variables that has not been taken into account. In this case, some variable is redundant and should be removed from the analysis. If we have four variables and the fourth is the sum of the other three, then the last eigenvalue will be close to zero, differing only by rounding errors, in which case we should suspect some dependence. In general, eigenvalues close to zero should not be ignored; the eigenvectors associated with these eigenvalues can indicate linear dependencies in the data and cause distortions in the interpretations, calculations, and subsequent analysis.

### 4.2. Standardized sample principal components

In general, principal components are not invariant under changes of scale in the original variables, as has been mentioned when referring to the normalized population principal components. Normalizing, or standardizing, the variables consists of performing the following transformation: $z_j=D(x_j-\bar{x})=\left(\frac{x_{1j}-\bar{x}_1}{\sqrt{s_{11}}},\dots,\frac{x_{pj}-\bar{x}_p}{\sqrt{s_{pp}}}\right)^t$, $j=1,\dots,n$. If the matrix $Z$ is the p by n matrix whose columns are the $z_j$, it can be shown that its sample mean vector is the null vector and that its covariance matrix is the sample correlation matrix, $R$, of the original variables.

Remark 4.1: Applying the fact that the principal components of the normalized variables are those obtained for the sample observations with the matrix $S$ replaced by $R$, we can establish that if $z_1,\dots,z_n$ are the normalized observations, with covariance matrix $R=(r_{ik})$, where $r_{ik}$ is the sample correlation coefficient between the observations of $x_i$ and $x_k$, and if the pairs of eigenvalues and eigenvectors of $R$ are $(\hat{v}_1,\hat{u}_1),\dots,(\hat{v}_p,\hat{u}_p)$, with $\hat{v}_1\ge\cdots\ge\hat{v}_p\ge 0$, then:

1. The $i$th principal component is given by $\hat{\omega}_i = \hat{u}_i^t z = \hat{u}_{1i} z_1 + \cdots + \hat{u}_{pi} z_p$, $i = 1, \ldots, p$.

2. The sample variance of $\hat{\omega}_k$ is $\hat{v}_k$, $k = 1, \ldots, p$.

3. The sample covariance of $\hat{\omega}_i$ and $\hat{\omega}_k$, $i \neq k$, is equal to 0.

4. The total sample variance is $\operatorname{tr}(R) = p = \hat{v}_1 + \cdots + \hat{v}_p$.

5. The sample correlation coefficients between $z_k$ and $\hat{\omega}_i$ are $r_{z_k, \hat{\omega}_i} = \hat{u}_{ki}\sqrt{\hat{v}_i}$, $i, k = 1, \ldots, p$.

6. The proportion of the total sample variance explained by the $i$th principal component is $\hat{v}_i / p$.
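Properties 2-4 above can be checked numerically. The following sketch (using NumPy and assumed random data with mixed scales, not material from the chapter) standardizes the variables, diagonalizes the sample correlation matrix $R$, and forms the components $\hat{\omega} = \hat{u}^t z$:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4)) * np.array([1.0, 10.0, 0.1, 5.0])  # mixed scales

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # normalized observations
R = np.corrcoef(X, rowvar=False)                  # sample correlation matrix
v, Uhat = np.linalg.eigh(R)
v, Uhat = v[::-1], Uhat[:, ::-1]                  # order so v_1 >= ... >= v_p

W = Z @ Uhat                  # columns are the components omega_1, ..., omega_p
total_variance = v.sum()      # property 4: equals tr(R) = p
component_variances = W.var(axis=0, ddof=1)       # property 2: equals v_k
cross_cov = np.cov(W, rowvar=False)               # property 3: diagonal matrix
```

Up to floating-point error, `total_variance` equals $p = 4$, the component variances match the eigenvalues, and the off-diagonal covariances vanish.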

### 4.3. Criteria for reducing the dimension

The eigenvalues and eigenvectors of the covariance matrix, or correlation matrix, are the essence of principal component analysis, since the eigenvectors determine the directions of maximum variability and the eigenvalues give the variances along those directions. If a few eigenvalues are much larger than the rest, most of the variance can be explained with fewer than $p$ variables.

In practice, decisions about the number of components to be considered must be made in terms of the pairs of eigenvalues and eigenvectors of the covariance matrix, or correlation matrix, and different rules have been suggested:

1. When plotting the graph $(i, \hat{\lambda}_i)$, it has been empirically verified that the first values decrease with a fairly steep, roughly linear slope and that from a certain eigenvalue onward this decrease stabilizes; that is, there is a point from which the eigenvalues are very similar. The criterion consists of keeping the components that come before this point and excluding those associated with the small, approximately equal eigenvalues.

2. Select components until a preset proportion of the variance is explained (e.g., 80%). This rule should be applied with care, since components that reflect nuances useful for the interpretation of the analysis could be excluded.

3. A rule that does not have great theoretical support, and which must be applied carefully so as not to discard any component valid for the analysis, but which has given good empirical results, is to retain those components with variances, $\hat{\lambda}_i$, above a certain threshold. If the working matrix is the correlation matrix, in which case the average value of the eigenvalues is one, the criterion is to keep the components associated with eigenvalues greater than unity and discard the rest.
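Rule 2 above can be sketched in a few lines. The helper below (an illustration in NumPy; the function name and the toy eigenvalues are our own, not from the chapter) returns the smallest number of components whose eigenvalues explain at least a preset share of the total variance:

```python
import numpy as np

def n_components(eigenvalues, share=0.80):
    """Smallest k such that the k largest eigenvalues explain `share`
    of the total variance."""
    ev = np.sort(eigenvalues)[::-1]            # largest eigenvalue first
    cumulative = np.cumsum(ev) / ev.sum()      # explained-variance proportions
    return int(np.searchsorted(cumulative, share) + 1)

ev = np.array([6.0, 2.0, 1.0, 0.5, 0.5])       # toy eigenvalues, total = 10
k80 = n_components(ev, 0.80)                   # 6 + 2 = 8 of 10, so k = 2
k95 = n_components(ev, 0.95)                   # needs the first 4 eigenvalues
```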

## 5. Application to image compression

We are going to illustrate the use of principal components to compress images. To this end, the image of Lena was considered. This photograph has been used by engineers, researchers, and students for experiments related to image processing.

### 5.1. Black and white photography

The black and white photograph shown in Figure 6 was considered. First, the image in .jpg format was converted into the numerical matrix Image of dimension $512 \times 512$ (i.e., $2^9 \times 2^9$). Second, to obtain the observation vectors, the matrix was divided into blocks $A_{ij}$ of dimension $2^3 \times 2^3$, with which 4096 blocks were obtained, and each of them was a vector of observations.

$$\text{Image} = \begin{pmatrix} A_{1,1} & \cdots & A_{1,64} \\ \vdots & \ddots & \vdots \\ A_{64,1} & \cdots & A_{64,64} \end{pmatrix} \tag{17}$$

Third, each matrix $A_{ij}$ was stored in a vector $x$ of dimension 64, which contained the elements of the matrix by rows, that is, $x = (a_{i,1}, \ldots, a_{i,8}, a_{i+1,1}, \ldots, a_{i+1,8}, \ldots, a_{i+7,1}, \ldots, a_{i+7,8})^t$. This way, we had the observations $\{x_k \in \mathbb{R}^{64}\}_{k=1}^{4096}$, which were grouped in the observation matrix $x = (x_{ij}) \in M_{4096,64}$.

Fourth, the average of each column was calculated, obtaining the vector of means $\bar{x} = (\bar{x}_1, \ldots, \bar{x}_{64})$, and from each observation $x_{ij}$, its corresponding mean $\bar{x}_j$ was subtracted. Thus, the matrix of centered observations $U$ was obtained. The covariance matrix of $x$ was $S = U^t U \in M_{64,64}$.

Fifth, the 64 pairs of eigenvalues and eigenvectors of $S$, $(\hat{\lambda}_i, \hat{e}_i)$, were found and ordered from the highest eigenvalue to the lowest. The 8 largest eigenvalues are drawn in Figure 7. As can be seen, the first eigenvalue is much larger than the rest; thus, the first principal component completely dominates the total variability.

Sixth, with the theoretical results and the calculations previously made, the 64 principal components $\hat{y}_j = \hat{e}_j^t x = \hat{e}_{1,j} x_1 + \cdots + \hat{e}_{64,j} x_{64}$, $j = 1, \ldots, 64$, were built. The first principal component was $\hat{y}_1 = 0.1167 x_1 + \cdots + 0.1166 x_{64}$. Therefore, an orthonormal basis of $\mathbb{R}^{64}$ was built.
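The first six steps can be sketched as follows in NumPy (an illustration, not the authors' code; a random matrix stands in for the Lena photograph, and $S = U^t U$ is taken exactly as defined in the text):

```python
import numpy as np

rng = np.random.default_rng(2)
image = rng.integers(0, 256, size=(512, 512)).astype(float)

# 4096 observation vectors of dimension 64, one per 8 x 8 block, read by rows.
blocks = image.reshape(64, 8, 64, 8).transpose(0, 2, 1, 3).reshape(4096, 64)

x_bar = blocks.mean(axis=0)        # vector of column means
U = blocks - x_bar                 # matrix of centered observations
S = U.T @ U                        # covariance matrix as defined in the text

eigval, eigvec = np.linalg.eigh(S)
order = np.argsort(eigval)[::-1]   # order eigenvalues from highest to lowest
eigval, eigvec = eigval[order], eigvec[:, order]
```

The columns of `eigvec` are the orthonormal basis $\hat{e}_1, \ldots, \hat{e}_{64}$, and each can be regrouped into an $8 \times 8$ matrix as in step seven.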

Seventh, each vector $\hat{e}_j = (\hat{e}_{1,j}, \ldots, \hat{e}_{64,j})^t$ was grouped by rows into a matrix $\hat{E}_j \in M_{8,8}$:

$$\hat{E}_j = \begin{pmatrix} \hat{e}_{1,j} & \cdots & \hat{e}_{8,j} \\ \vdots & \ddots & \vdots \\ \hat{e}_{57,j} & \cdots & \hat{e}_{64,j} \end{pmatrix} \tag{18}$$

Each of the 64 matrices $\hat{E}_j$ was converted into an image. The images of the first three principal components are shown in Figure 8.

At this point, it is important to mention that the data matrix $x$ has been assumed to be formed by 4096 vectors of $\mathbb{R}^{64}$ expressed in the canonical basis, $B$. Also, the basis whose vectors were the eigenvectors of $S$, $B' = \{\hat{e}_1, \ldots, \hat{e}_{64}\}$, was considered. The coordinates with respect to the canonical basis of the vectors of $B'$ were the columns of the matrix $PC = (\hat{e}_1 \cdots \hat{e}_{64})$. Then, given a vector $v$ that had coordinates $(x_1, \ldots, x_{64})$ with respect to the canonical basis and coordinates $(y_1, \ldots, y_{64})$ with respect to the basis $B'$, the relation between them was $(x_1, \ldots, x_{64})^t = PC \, (y_1, \ldots, y_{64})^t$. Also, as $PC$ is an orthogonal matrix, $(y_1, \ldots, y_{64}) = (x_1, \ldots, x_{64}) \, PC$. Thus, with respect to the new basis, the coordinates of the 4096 vectors that formed the observation matrix were the rows of the matrix of dimension $4096 \times 64$ given by $y = x \cdot PC$.

Eighth, in order to reduce the dimension, it was taken into consideration that if we keep all the vectors of the eigenvector basis, we can perfectly reconstruct the data matrix, because $y = x \cdot PC \Rightarrow x = y \cdot PC^{-1} = y \cdot PC^t$. Additionally, for the case under study, to reduce the dimension, if we use the slope change rule, we can consider the first two principal components; five components if we want to explain 97% of the variability, because $\sum_{i=1}^{5} \hat{\lambda}_i / \sum_{j=1}^{64} \hat{\lambda}_j = 0.97$; or eight components if we want to explain 98% of the total variability.

In order to compress the image, the first vectors of the eigenvector basis were used. Moreover, supposing that we kept $M$ of them, with $M < 64$, the matrix $T_M$ given by Eq. (19) was defined:

$$T_M = \begin{pmatrix} I_{M \times M} & 0_{M \times (64-M)} \\ 0_{(64-M) \times M} & 0_{(64-M) \times (64-M)} \end{pmatrix} \tag{19}$$

Therefore, the dimension of $y_M = y \cdot T_M$ was $4096 \times 64$.

Ninth, to reconstruct the compressed image, each row of $y_M$ was regrouped into an $8 \times 8$ matrix. The $i$th row of $y_M$, denoted by $y_M^i = (b_{i,1}, \ldots, b_{i,8}, b_{i,9}, \ldots, b_{i,16}, \ldots, b_{i,64})$, was transformed into the matrix $B_i$ given by Eq. (20), and the matrix Compressed_image given by Eq. (21) was built:

$$B_i = \begin{pmatrix} b_{i,1} & \cdots & b_{i,8} \\ b_{i,9} & \cdots & b_{i,16} \\ \vdots & \ddots & \vdots \\ b_{i,57} & \cdots & b_{i,64} \end{pmatrix}, \quad i = 1, \ldots, 4096 \tag{20}$$

$$\text{Compressed\_image} = \begin{pmatrix} B_1 & \cdots & B_{64} \\ B_{65} & \cdots & B_{128} \\ \vdots & \ddots & \vdots \\ B_{4033} & \cdots & B_{4096} \end{pmatrix} \tag{21}$$

Tenth and finally, Eq. (21) was converted into a .jpg file. Figure 9 shows the original image and compressed images with two, five, and eight principal components.
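The compression and reconstruction steps can be sketched as follows (an illustration in NumPy under assumptions of our own: a random image replaces the photograph, `np.cov` replaces the text's $U^t U$, centering is omitted for brevity, and zeroing columns replaces the explicit multiplication by $T_M$):

```python
import numpy as np

rng = np.random.default_rng(3)
image = rng.integers(0, 256, size=(512, 512)).astype(float)
X = image.reshape(64, 8, 64, 8).transpose(0, 2, 1, 3).reshape(4096, 64)

S = np.cov(X, rowvar=False)
eigval, PC = np.linalg.eigh(S)
PC = PC[:, np.argsort(eigval)[::-1]]   # columns e_1, ..., e_64, largest first

M = 8
Y = X @ PC                             # coordinates in the eigenvector basis
Y_M = Y.copy()
Y_M[:, M:] = 0.0                       # same effect as multiplying by T_M
X_M = Y_M @ PC.T                       # back to pixel coordinates

# Regroup each 64-vector into an 8 x 8 block and tile the 512 x 512 result.
compressed = (X_M.reshape(64, 64, 8, 8)
              .transpose(0, 2, 1, 3).reshape(512, 512))
```

With $M = 64$ the reconstruction is exact, since $y \cdot PC^t = x$; smaller $M$ trades fidelity for compression.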

By increasing the number of principal components, the percentage of the variability explained increases by very small amounts; nevertheless, sufficiently remarkable nuances are added to the photo, since they make it sharper, smooth the contours, and mark the tones more precisely.

#### 5.1.1. Objective measures of the quality of reconstructions

The two methods that we will use are the peak signal-to-noise ratio (PSNR) and the entropy of the error image. The PSNR measure evaluates the quality in terms of deviations between the processed and the original image, and the entropy of an image is a measure of the information content contained in that image.

Definition 5.1: Let $N$ be the number of rows times the number of columns in the image. Let $\{x_n\}_{n=1}^{N}$ be the set of pixels of the original image, $\{y_n\}_{n=1}^{N}$ the set of pixels of the reconstruction, and $\{r_n = x_n - y_n\}_{n=1}^{N}$ the errors. The mean square error (MSE) is

$$\mathrm{MSE} = \frac{1}{N}\sum_{n=1}^{N} r_n^2 \tag{22}$$

Definition 5.2: Let the images under study be 8-bit images. The peak signal-to-noise ratio of the reconstruction is

$$\mathrm{PSNR} = 10 \log_{10} \frac{(2^8 - 1)^2}{\mathrm{MSE}} \tag{23}$$
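Equations (22) and (23) translate directly into code. The sketch below (NumPy, with assumed toy $2 \times 2$ arrays in place of the real image and its reconstruction) computes the PSNR in decibels:

```python
import numpy as np

def psnr(original, reconstruction):
    """Peak signal-to-noise ratio, in dB, for 8-bit images."""
    r = original.astype(float) - reconstruction.astype(float)
    mse = np.mean(r ** 2)                            # Eq. (22)
    return 10.0 * np.log10((2 ** 8 - 1) ** 2 / mse)  # Eq. (23)

x = np.array([[52, 55], [61, 59]], dtype=np.uint8)  # original pixels
y = np.array([[54, 55], [60, 59]], dtype=np.uint8)  # reconstruction pixels
# errors are (-2, 0, 1, 0), so MSE = 5/4 = 1.25
value = psnr(x, y)
```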

Figure 10(a) shows the PSNR of the reconstructions of the image versus the number of principal components used for the reconstruction, together with the regression line that fits the cloud of points. Figure 10(b) shows the values of the PSNR when we use three quarters (black), half (red), a quarter (blue), an eighth (green), a sixteenth (brown), and a thirty-second (yellow) of the components, which means a corresponding reduction in compression. A behavior close to linearity with a slope of approximately 0.2 can be seen. With the reductions considered, the PSNR varies between 27 and 63.

If the entropy is high, the variability of the pixels is very high, and there is little redundancy; thus, if we exceed a certain threshold in compression, the original image cannot be recovered exactly. If the entropy is small, then the variability will be smaller; the redundancy of a pixel with respect to the pixels of its surroundings is high, and randomness is lost.

Definition 5.3: Let $I$ be an 8-bit image whose pixels can take the values $0, \ldots, 255$. Let $p_i$ be the frequency with which the value $i \in \{0, \ldots, 255\}$ appears. Then, the entropy is

$$H(I) = -\sum_{i=0}^{255} p_i \log_2 p_i \tag{24}$$
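Equation (24) can be sketched as follows (NumPy, with assumed toy images: a constant image has zero entropy, and an image using two gray levels equally has entropy one bit; terms with $p_i = 0$ are taken as 0):

```python
import numpy as np

def entropy(img):
    """Shannon entropy, in bits, of an 8-bit image; Eq. (24)."""
    counts = np.bincount(img.ravel(), minlength=256)
    p = counts / counts.sum()          # frequency p_i of each gray level
    p = p[p > 0]                       # 0 * log2(0) is taken as 0
    return float(-np.sum(p * np.log2(p)))

flat = np.zeros((8, 8), dtype=np.uint8)                          # one level
half = np.repeat(np.array([0, 255], dtype=np.uint8), 32).reshape(8, 8)
```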

Figure 11(a) shows the entropy of the reconstructions from 1 to 256 components. As can be seen, the entropy increases over roughly the first 10 components and then flattens, tending asymptotically to the entropy of the image (7.4452). It can be seen that the difference with more than 170 components is insignificant. Figure 11(b) shows the entropy of the reconstructions using 8 components (black), 16 components (brown), 32 components (green), 64 components (blue), and 128 components (red), respectively.

Finally, we consider the entropy of the images of the errors. Given an image $I$, the value of each of its pixels is an element of the set $\{0, \ldots, 255\}$; if we have a reconstruction, $\hat{I}$, and consider the error, $E = I - \hat{I}$, then the value of its pixels will be an element of the set $\{-255, \ldots, 255\}$. Therefore, $E$ cannot be considered as an image. Since a pixel of value $e_{ij}$ in $E$ is an error of the same size as $|e_{ij}|$, to work with images we call the image of the error $\mathrm{Im}_E = (|e_{ij}|)$, where $E = (e_{ij})$.

Figure 12(a) shows the entropy of the error image versus the number of principal components used for the reconstruction, together with a fitted line of slope −0.02. Figure 12(b) shows the entropy when we use 8 components (black), 16 components (brown), 32 components (green), 64 components (blue), and 128 components (red), respectively. With more than 200 principal components, the entropy of the errors is zero, which means that the errors have very little variability; with fewer components, the decrease appears linear with slope −0.02.

### 5.2. Coordinates of the first principal component

In this section, we will consider the coordinates of the first vectors that form the principal components. If the vectors have been obtained from blocks of dimension $2^3 \times 2^3$, they will have 64 coordinates. Figure 13 shows the coordinates of the first six principal components with respect to the canonical basis.

As can be seen from Figure 13, all coordinates seem to have some component with period 8. This suggests that there may be some relationship with the shape of the chosen blocks and that most vectors are close to being periodic with period 8, because when we consider each of the 4096 vectors of 64 components, the first 8 pixels are adjacent to the next 8 pixels, and these are adjacent to the next 8 pixels, and so on, up to 8 times.

Since the first principal components collect a large part of the characteristics of the vectors, it is plausible that they also reflect the periodicity of the vectors. Recall that principal components are linear combinations of vectors and that if all of them had all their periodic coordinates with the same period, then all components would be periodic as well.

In Figure 14, the coordinates of the first three principal components are shown when the vectors are constructed from blocks of $2^2 \times 2^2$ (see Figure 14(a-c)) and from blocks of $2^4 \times 2^4$ (see Figure 14(d-f)). As can be seen, the periodicity of the first components is again appreciated.

### 5.3. Reduction of the first principal component by periodicity

Using the almost periodicity of the first principal component, we can use less information to obtain acceptable reconstructions of the image. If in the first principal component of dimension 64 we repeat the first eight values periodically and use $k$ principal components to reconstruct the image, we go from a reduction of $k/64$ to one of $(k - 1 + 8/64)/64$. Figure 15 shows both the reconstruction of the image with 2 and 8 original principal components and the reconstruction of the image with 2 and 8 principal components in which the first component is replaced by a vector whose coordinates have period 8; we call these components componentsper.
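The construction of the periodic replacement can be sketched as follows (NumPy; a random unit vector stands in for the true first eigenvector, which is an assumption of this illustration): only the first eight coordinates are kept and tiled with period 8, so 8 values are stored instead of 64.

```python
import numpy as np

rng = np.random.default_rng(4)
e1 = rng.normal(size=64)
e1 /= np.linalg.norm(e1)           # stands in for the first eigenvector

e1_per = np.tile(e1[:8], 8)        # coordinates now repeat with period 8
e1_per /= np.linalg.norm(e1_per)   # renormalize to unit length
```

The resulting `e1_per` then replaces the first column of the matrix $PC$ before reconstruction; since it is only an approximation of the true component, the reconstruction is no longer exact.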

The first component of componentsper is not the true one; therefore, reconstructions from this set cannot be made with total precision. If we use the 1-norm, 2-norm, and ∞-norm to compare the image with the corresponding reconstruction, both with the original principal components and with the principal components using their periodicity, we obtain, varying the number of principal components used, the results shown in Figure 16.

With the original principal components (blue), the original image can be completely reconstructed, while if we use only a few components, in this case 10 or fewer, approximations similar to the original ones are obtained with componentsper (green).

## 6. Conclusions

This chapter has been devoted to giving a short but comprehensive introduction to the basics of the statistical technique known as principal component analysis, aimed at its application to image compression. The first part of the chapter was focused on preliminaries: mean vector, covariance matrix, eigenvectors, eigenvalues, and distances. That part finished by bringing up the problems that the Euclidean distance presents and highlighting the importance of using a statistical distance that takes into account the different variabilities and correlations. To that end, a brief introduction was made to a distance that depends on variances and covariances.

Next, in the second part of the chapter, principal components were introduced and connected with the previously explained concepts. Here, principal components were presented as a particular case of linear combinations of random variables, but with the peculiarity that those linear combinations represent a new coordinate system that is obtained by rotating the original reference system, which has the aforementioned random variables as coordinate axes. The new axes represent the directions with maximum variability and provide a simple description of the structure of the covariance.

Then, the third part of the chapter was devoted to showing an application of principal component analysis to image compression. An original image was taken and compressed by using different numbers of principal components. The importance of carrying out objective measures of reconstruction quality was highlighted. Also, a novel contribution of this chapter was the introduction to the study of the periodicity of the principal components and to the importance of the reduction of the first principal component by periodicity. In short, a novel construction of principal components by periodicity has been included, in order to reduce the computational cost of their calculation, although at the expense of some accuracy. It can be said that, using the almost periodicity of the first principal component, less information can be used to obtain acceptable reconstructions of the image.

Finally, we would not like to finish this chapter without saying that a few pages cannot gather the wide range of applications that this statistical technique has found in solving real-life problems. There is a countless number of applications of principal component analysis to problems that both scientists and engineers have to face in real-life situations. However, in order to be practical, it was decided to choose and develop, step by step, an application example that could be of interest to a wide range of readers. Accordingly, we thought that such an example could be one related to data compression, because with the advancement of information and communication technologies, both scientists and engineers need to either store or transmit more information at lower cost, faster, and over greater distances with higher quality. In this sense, one example is image compression using statistical techniques, and this is the reason why, in this chapter, it was decided to take advantage of the statistical properties of an image to present a practical application of principal component analysis to image compression.


© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## How to cite and reference

### Cite this chapter

Wilmar Hernandez and Alfredo Mendez (March 16th, 2018). Application of Principal Component Analysis to Image Compression. In: Türkmen Göksel (Ed.), Statistics - Growing Data Sets and Growing Demand for Statistics. IntechOpen. DOI: 10.5772/intechopen.75007.
