In this chapter, an introduction to the basics of principal component analysis (PCA) is given, aimed at presenting PCA applications to image compression. Here, concepts of linear algebra used in PCA are introduced, and PCA theoretical foundations are explained in connection with those concepts. Next, an image is compressed by using different principal components, and concepts such as image dimension reduction and image reconstruction quality are explained. Also, using the almost periodicity of the first principal component, a quality comparative analysis of a compressed image using two and eight principal components is carried out. Finally, a novel construction of principal components by periodicity of principal components has been included, in order to reduce the computational cost for their calculation, although decreasing the accuracy.
- principal component analysis
- population principal components
- sample principal components
- image compression
- image dimension reduction
- image reconstruction quality
Principal component analysis, also known as the Hotelling transform or Karhunen-Loeve transform, is a statistical technique that was proposed by Karl Pearson (1901) as part of factorial analysis; however, its first theoretical development appeared in 1933 in a paper written by Hotelling [1, 2, 3, 4, 5, 6, 7, 8]. The complexity of the calculations involved in this technique delayed its development until the birth of computers, and its effective use started in the second half of the twentieth century. The relatively recent development of methods based on principal components makes them little used by a large number of non-statistician researchers. The purposes of these notes are to disclose the nature of the principal component analysis and show some of its possible applications.
Principal component analysis refers to the explanation of the structure of variances and covariances through a few linear combinations of the original variables, without losing a significant part of the original information. In other words, it is about finding a new set of orthogonal axes in which the variance of the data is maximum. Its objectives are to reduce the dimensionality of the problem and, once the transformation has been carried out, to facilitate its interpretation.
By having p variables collected on the units analyzed, all are required to reproduce the total variability of the system, and sometimes the majority of this variability can be found in a small number, k, of principal components. Its origin lies in the redundancy that there exists many times between different variables, so the redundancy is data, not information. The k principal components can replace the p initial variables, so that the original set of data, consisting of n measures of p variables, is reduced to n measures of k principal components.
The objective pursued by the analysis of principal components is the representation of the numerical measurements of several variables in a space of few dimensions, where our senses can perceive relationships that would otherwise remain hidden in higher dimensions. The abovementioned representation must be such that, when discarding higher dimensions, the loss of information is minimal. A simile could illustrate the idea: imagine a large rectangular plate that is a three-dimensional object, but that for practical purposes, we consider it as a flat two-dimensional object. When carrying out this reduction in dimensionality, a certain amount of information is lost since, for example, opposite points located on the two sides of the rectangular plate will appear confused in a single one. However, the loss of information is largely compensated by the simplification made, since many relationships, such as the neighborhood between points, are more evident when they are drawn on a plane than when done by a three-dimensional figure that must necessarily be drawn in perspective.
The analysis of principal components can reveal relationships between variables that are not evident at the first sight, which facilitates the analysis of the dispersion of observations, highlighting possible groupings and detecting the variables that are responsible for the dispersion.
The study of multivariate methods is greatly facilitated by means of matrix algebra [9, 10, 11]. Next, we introduce some basic concepts that are essential for the explanation of statistical techniques, as well as for geometric interpretations. In addition, the relationships that can be expressed in terms of matrices are easily programmable on computers, so we can apply calculation routines to obtain other quantities of interest. It is a basic introduction about concepts and relationships.
2.1. The vector of means and the covariance matrix
Let be a random column vector of dimension p. Each component, , is a random variable (r.v.) with mean and variance . Given two r.v., and , we define the covariance between them as . The expected values, variances, and covariances can be grouped into vectors and matrices that we will call population mean vector, , and population covariance matrix, :
The population correlation matrix is given by , where .
In the case of having n values of the r.v.s, we will consider estimators of the previous population quantities, which we will call sample estimators.
Definition 2.1: Let be a simple random sample of a p-dimensional r.v. ordered in the data matrix, with the values of the r.v.s in each column. The p-dimensional sample mean column vector is , where . The sample covariance matrix is . The generalized sample variance is the determinant of , The sample correlation matrix is , where with .
Proposition 2.1: Let be a simple random sample of a p-dimensional r.v. with mean vector and covariance matrix . The unbiased estimators of and are and .
2.2. Eigenvalues and eigenvectors
One of the problems that linear algebra deals with is the simplification of matrices through methods that produce diagonal or triangular matrices, which are widely used in the resolution of linear systems of the form
Definition 2.2: Let be a square matrix. If for any vector , is a nonnegative definite matrix. If , with , is an eigenvalue associated with the eigenvector .
Proposition 2.2: Let be a symmetric p by p matrix with real-valued entries. has p pairs of eigenvalues and eigenvectors, , such that:
All the eigenvalues are real. Also,
is positive definite if all the eigenvalues are positive.
is nonnegative definite if all the eigenvalues are nonnegative.
The eigenvectors can be chosen with 2-norm equal to 1.
The eigenvectors are mutually perpendicular.
The eigenvectors are unique unless two or more eigenvalues are equal.
The spectral decomposition of is .
If is an orthogonal matrix and is a diagonal matrix with main diagonal entries , the spectral decomposition of can be given by . Therefore, .
Remark 2.1: Let be a matrix with the values of a simple random sample in each column of a p-dimensional r.v., and let , with , be the ith row of . Let be the n by one vector with all its coordinates equal to 1. It can be proven that:
The projection of the vector on the vector is the vector , whose 2-norm is equal to .
Matrix is obtained from the residuals , the squared 2-norm of is equal to , and the scalar product of and is equal to .
The sample correlation coefficient is the cosine of the angle between and .
If is the volume generated by the vectors , with , then . Therefore, the generalized sample variance is proportional to the square of the volume generated by deviation vectors. The volume will increase if the norm of some is increased.
Many techniques of multivariate statistical analysis are based on the concept of distance. Let be a point in the plane. The Euclidean distance from to the origin, , is . If and , the Euclidean distance between these two points of ℜp is . All points whose square distance to the origin is a fixed quantity, for example, , are the points of the p-dimensional sphere of radius .
For many statistical purposes, the Euclidean distance is unsatisfactory, since each coordinate contributes in the same way to the calculation of such a distance. When the coordinates represent measures subject to random changes, it is desirable to assign weights to the coordinates depending on how high or low the variability of the measurements is. This suggests a measure of distance that is different from the Euclidean.
Next, we introduce a statistical distance that will take into account the different variabilities and correlations. Therefore, it will depend on the variances and covariances, and this distance is fundamental in multivariate analysis.
Suppose we have a fixed set of observations in ℜp, and, to illustrate the situation, consider n pairs of measures of two variables, and . Suppose that the measurements of vary independently of and that the variability of the measures of are much greater than those of . This situation is shown in Figure 1, and our first objective is to define a distance from the points to the origin.
In Figure 1, we see that the values that have a given deviation from the origin are farther from the origin in the direction than in the direction, due to the greater variability inherent in the direction of . Therefore, it seems reasonable to give more weight in the coordinate than in the . One way to obtain these weights is to standardize the coordinates, that is, and , where is the sample variance of the variable . Thus, the statistical distance from a point to the origin is . Therefore, the points that are equidistant from the origin of a constant distance are on an ellipse centered at the origin, whose major axis coincides with the coordinate that has the greatest variability. In the case that the variability of one variable is analogous to that of the other and that the coordinates are independent, the Euclidean distance is proportional to the statistical distance.
If and are two points of ℜp, the statistical distance between them is , with being the sample variance of the variable . The statistical distance defined so far does not include most of the important cases where the variables are not independent. Figure 2 shows a situation where the pairs seem to have an increasing trend, so the sample correlation coefficient will be positive. In Figure 2, we see that if we make a rotation of amplitude and consider the axes we are in conditions analogous to those of Figure 1 (a). Therefore, the distance from the point to the origin will be , where is the sample variance of the variable .
The relationships between the original coordinates and the new coordinates can be expressed as
and, after some algebraic manipulations, , where are values that depend on the angle and the dispersions, and also must meet the condition that the distance between any two points must be positive.
The distance from a point to a fixed point in situations where there is a positive correlation is . So, in this case, the coordinates of all points verify the equation , which is the equation of an ellipse of center and with axes parallel to . Figure 3 shows ellipses with constant statistical distances.
This distance can be generalized to ℜp if are values such that the distance from to is given by.
This distance, therefore, is completely determined by the coefficient , with , which can be arranged in a matrix given by
The elements of Eq. (4) cannot be arbitraries. In order to define a distance over a vector space, Eq. (4) must be a square, symmetric, positive definite matrix. Therefore, the sample covariance matrix of a data matrix, , is a candidate to define a statistical distance.
Figure 4 shows a cloud of points with center of gravity, , at point . At the first glance, it can be seen that the Euclidean distance from point to point is greater than the Euclidean distance from point to the origin; however, seems to have more to do with the cloud of points than the origin. If we take into account the variability of the points in the cloud and take the statistical measure, then will be closer to than the origin.
The above given explanation has tried to be an illustration of the need to consider distances other than the Euclidean.
3. Population principal components
Principal components are a particular case of linear combinations of p r.v.s, . These linear combinations represent, geometrically, a new coordinate system that is obtained by rotating the original reference system that has as coordinate axes. The new axes represent the directions with maximum variability and provide a simple description of the structure of the covariance.
Principal components depend only on the variance/covariance matrix (or on the correlation matrix ) of , and it is not necessary to assume that the r.v.s follows an approximately normal distribution. In case of having a normal multivariate distribution, we will have interpretations in terms of ellipsoids of constant density, if we consider the distance that defines the matrix, and the inferences can be made from the population components.
Let be a p-dimensional random vector with covariance matrix and eigenvalues . Let us consider the following p linear combinations:
These new r.v.s verify the following equalities:
Principal components are those linear combinations that, being uncorrelated among them, have the greatest possible variance. Thus, the first principal component is the linear combination with the greatest variance, that is, is maximum. Since if we multiply by some constant the previous variance grows, we will restrict our attention to vectors of norm one, with which the aforementioned indeterminacy disappears. The second principal component is the linear combination that maximizes the variance and is uncorrelated with the first one, and the norm of the coefficient vector is equal to 1.
Proposition 3.1: Let be the covariance matrix of the random vector . Let us assume that has p pairs of eigenvalues and eigenvectors, , with . Then, the ith principal component is given by
In addition, with this choice it is verified that:
If any of the eigenvalues are equal, the choice of the corresponding eigenvectors as vectors of coefficients is not unique.
Remark 3.1: For the demonstration of these results, expressions are used on maximums of quadratic forms between vectors of fixed norm . Also, the Lagrange multipliers method can be used, expressions when the abovementioned maximum is subject to orthogonality conditions and properties on the trace of a matrix (if , then ).
Due to the previous result, principal components are uncorrelated among them, with variances equal to the eigenvalues of , and the proportion of the population variance due to the ith principal component is given by .
If a high percentage of the population variance, for example, the , of a p-dimensional r.v., with large p, can be attributed to, for example, the five first principal components, then we can replace all the r.v.s by those five components without a great loss of information.
Each component of the coefficient vector , , also deserves our attention, since it is a measure of the relationship between the r.v.s and .
Proposition 3.2: If are the principal components obtained from the covariance matrix , with pairs of eigenvalues and eigenvectors , then the linear correlation coefficients between the variables and the components are given by
Therefore, is proportional to the correlation coefficient between and .
In the particular case that has a normal p-dimensional distribution, , the density of is constant in the ellipsoids with the center at given by that have axes and , where are the pairs of eigenvalues and eigenvectors of .
If the covariance matrix, , can be decomposed into , where is orthogonal and diagonal, it can be shown that . Also, if it can be assumed that , to simplify the expressions, then
If the principal components are considered, the equation of the constant density ellipsoid is given by
Therefore, the axes of the ellipsoid have the directions of the principal components.
Example 3.1: Let be the three-unidimensional r.v.s and , with covariance matrix
It can be verified that the pairs of eigenvalues and eigenvectors are , , and . Therefore, the principal components are the following:
The norm of all the eigenvectors is equal to 1, and, in addition, the variable is the second principal component, because is uncorrelated with the other two variables.
The results of Proposition 3.1 can be verified for this data, for example, and . Also, . Thus, the proportion of the total variance explained by the first component is , and the one explained by the first two is , so that the components and can replace the original variables with a small loss of information.
The correlation coefficients between the principal components and the variables are the following:
In view of these values, it can be concluded that and individually are practically equally important with respect to the first principal component, although this is not the case with respect to the third component. If, in addition, it is assumed that the distribution of is normal, , with a null mean vector, ellipsoids of constant density can be considered. An ellipsoid of constant statistical distance and projections is shown in Figure 5.
The ellipsoid with has been represented in Figure 5 (a), together with its axes and the ellipsoid projections on planes parallel to the coordinate axes. The aforementioned projections are ellipses of red, green, and blue colors that are reproduced in Figure 5 (b). Also, in this figure, the black ellipse obtained by projecting the ellipsoid on the plane determined by the first two main components has been represented. The equation of this ellipse is , where and , with and being the two smallest eigenvalues of , and the axes are determined by and . As can be seen, the diameters of the ellipse determined by the first two components are larger than the others. Therefore, the area enclosed by this ellipse is the largest of all, indicating that it is the one that gathers the greatest variability.
3.1. Principal components with respect to standardized variables
The principal components of the normalized variables can also be considered, which in matrix notation is , where is the diagonal matrix whose elements are . It is easily verified that the r.v. verifies and , where is the correlation matrix of .
Principal components of are obtained by the eigenvalues and eigenvectors of the correlation matrix, , of . Furthermore, with some simplification, the previous results can be applied, since the variance of each is equal to 1.
Let be the principal components of and , , the pairs of eigenvalues and eigenvectors of , since they do not have to be the same.
Proposition 3.3: Let be a random vector with covariance matrix . Let be the pairs of eigenvalues and eigenvectors of , with . Then, the ith principal component is given by , . In addition, with this choice it is verified that:
, , .
If any of the eigenvalues are equal, the choice of the corresponding eigenvectors as vectors of coefficients is not unique.
The linear correlation coefficients between the variables and the principal components are and .
These results are a consequence of those obtained in Proposition 3.1 and Proposition 3.2 applied to and instead of and .
The total population variance of the normalized variables is the sum of the elements of the diagonal of , that is, . Therefore, the proportion of the total variability explained by the ith principal component is , .
Example 3.2: Let and be the two-unidimensional r.v.s and with the covariance matrix, , and correlation matrix, , given by
It can be verified that the pairs of eigenvalues and eigenvectors for S are and . Therefore, the principal components are the following:
Furthermore, the eigenvalues and eigenvectors of are and ; hence, the principal components of the normalized variables are the following:
Because the variance of is much greater than that of , the first principal component for is determined by , and the proportion of variability explained by that first component is .
When considering the normalized variables, each variable also contributes to the components determined by , and the dependencies between the normalized variables and their first component are and . The proportion of the total variability explained by the first component is .
Therefore, the importance of the first component is strongly affected by normalization. In fact, the weights, in terms of are and for , as opposed to and for .
Remark 3.2: The above example shows that the principal components deduced from the original variables are, in general, different from those derived from the normalized variables. So, normalization has important consequences.
When the units in which the different one-dimensional random variables are given are very different and in the case that one of the variances is very dominant compared to the others, the first principal component, with respect to the original variables, will be determined by the variable whose variance is the dominant one. On the other hand, if the variables are normalized, their relationship with the first components will be more balanced.
Principal components can be expressed in particular ways if the covariance matrix, or the correlation matrix, has special structures, such as diagonal ones, or structures of the form .
4. Sample principal components
Once we have the theoretical framework, we can now address the problem of summarizing the variation of n measurements made on p variables.
Let be a sample of a p-dimensional r.v. with mean vector and covariance matrix . These data have a vector of sample means , covariance matrix , and correlation matrix .
This section is aimed at constructing linear uncorrelated combinations of the measured characteristics that contain the greatest amount of variability contained in the sample. These linear combinations are called principal sample components.
Given values of any linear combination , , its sample mean is , and its sample variance is . If we consider two linear combinations, and , their sample covariance is .
The first principal component will be the linear combination, , which maximizes the sample variance, subject to the condition . The second component will be the linear combination, , which maximizes the sample variance, subject to the condition that and that the sample covariance of the pairs is equal to zero. This procedure is continued until the principal components are completed.
Proposition 4.1: Let be the by matrix of sample covariances, whose pairs of eigenvalues and eigenvectors are , with . Let be an observation of the p-dimensional random variable , then:
The ith principal component is given by , .
The sample variance of is , .
The sample covariance of , , is equal to 0.
The total sample variance is .
The sample correlation coefficients between and are , .
In the case that the random variables have a normal distribution, the principal components can be obtained from a maximum likelihood estimation , and, in this case, the sampling principal components can be considered as maximum likelihood estimates of the population principal components. Although the eigenvalues of and are different but proportional, with constant proportionality fixed, the proportion of variability they explain is the same. The sample correlation matrix is the same for and . We still do not consider the particular case of normal distribution of the variables, so as not to have to include hypotheses that should be verified for the data under study.
Sometimes, the observations are centered by subtracting the mean . This operation does not affect the covariance matrix and produces principal components of the form , and in this case for any component, while the sample variances remain .
When trying to interpret the principal components, the correlation coefficients are more reliable guides than the coefficients , since they avoid interpretive problems caused by the different scales in which the variables are measured.
4.1. Interpretations of the principal sample components
Principal sample components have several interpretations. If the distribution of is close to , then components are realizations of the main population components , which will have distribution , where is the diagonal matrix whose elements are the eigenvalues, ordered from major to minor, from the sample covariance matrix. Keeping in mind the hypothesis of normality, contours of constant density, , can be estimated and make inferences from them.
Although it is not possible to assume normality in the data, geometrically the data are points , and the principal components represent an orthogonal transformation whose coordinate axes are the axes of the ellipsoid and with lengths proportional to , with being the eigenvalues of . Since all eigenvectors have been chosen such that their norm is equal to 1, the absolute value of the ith component is the length of the projection of the vector on the vector . Therefore, the principal components can be seen as a translation of the origin to the point and a rotation of the axes until they pass through the directions with greater variability.
When there is a high positive correlation between all the variables and a principal component with all its coordinates of the same sign, this component can be considered as a weighted average of all the variables or the size of the index that forms that component. The components that have coordinates of different signs oppose a subset of variables against another, being a weighted average of two groups of variables.
The interpretation of the results is simplified assuming that the small coefficients are zero and rounding the rest to express the component as sums, differences, or quotients of variables.
The interpretation of the principal components can be facilitated by graphic representations in two dimensions. A usual graph is to represent two components as coordinate axes and project all points on those axes. These representations also help to test hypotheses of normality and to detect anomalous observations. If there is an observation that is atypical in the first variable, we will have that the variability in that first variable will grow and that the covariance with the other variables will decrease, in absolute value. Consequently, the first component will be strongly influenced by the first variable, distorting the analysis.
Sometimes, it is necessary to verify that the first components are approximately normal, although it is not reasonable to expect this result from a linear combination of variables that do not have to be normal.
The last component can help detect suspicious observations. Each observation can be expressed as a linear combination of the eigenvectors of , , with which the difference between the first components and the observation is , which is a vector with square of the norm , and we will suspect of observations that have a large contribution to the square of the aforementioned norm.
An especially small value of the last eigenvalue of the covariance matrix, or correlation matrix, can indicate a linear dependence between the variables that have not been taken into account. In this case, some variable is redundant and should be removed from the analysis. If we have four variables and the fourth is the sum of the other three, then the last eigenvalue will be close to zero due to rounding errors, in which case we should suspect some dependence. In general, eigenvalues close to zero should not be ignored, and eigenvalues associated with these eigenvalues can indicate linear dependencies in the data and cause deformations in the interpretations, calculations, and consequent analysis.
4.2. Standardized sample principal components
In general, principal components are not invariant against changes of scale in the original variables, as has been mentioned when referring to the normalized population principal components. Normalizing, or standardizing, the variables consists of performing the following transformation , . If the matrix is the by matrix whose columns are , it can be shown that its sample mean vector is the null vector and that its correlation matrix is the sample correlation matrix, , of the original variables.
Remark 4.1: Applying that the principal components of the normalized variables are those obtained for the sample observations but substituting the matrix for , we can establish that if are the normalized observations, with covariance matrix , where is the sample correlation coefficient between observations and , and if the pairs of eigenvalues and eigenvectors of are , with , then
The ith principal component is given by , .
The sample variance of is , .
The sample covariance of , , is equal to 0.
The total sample variance is .
The sample correlation coefficients between and are , .
The proportion of the total sample variance explained by the ith principal component is .
4.3. Criteria for reducing the dimension
The eigenvalues and eigenvectors of the covariance matrix, or correlation matrix, are the essence of the analysis of principal components, since the eigenvalues indicate the directions of maximum variability and the eigenvectors determine the variances. If a few eigenvalues are much larger than the rest, most of the variance can be explained with less than variables.
In practice, decisions about the number of components to be considered must be made in terms of the pairs of eigenvalues and eigenvectors of the covariance matrix, or correlation matrix, and different rules have been suggested:
When performing the graph , it has been empirically verified that with the first values there is a decrease with a linear tendency of quite steep slope and that from a certain eigenvalue this decrease is stabilized. That is, there is a point from which the eigenvalues are very similar. The criterion consists of staying with the components that exclude the small eigenvalues and that are approximately equal.
Select components until obtaining a proportion of the preset variance (e.g., ). This rule should be applied with care, since components that are interesting to reflect certain nuances suitable for the interpretation of the analysis could be excluded.
A rule that does not have a great theoretical support, which must be applied carefully so as not to discard any valid component for the analysis, but which has given good empirical results, is to retain those components with variances, , above a certain threshold. If the work matrix is the correlation matrix, in which case the average value of the eigenvalues is one, the criterion is to keep the components associated with eigenvalues greater than unity and discard the rest.
5. Application to image compression
We are going to illustrate the use of principal components to compress images. To this end, the image of Lena was considered. This photograph has been used by engineers, researchers, and students for experiments related to image processing.
5.1. Black and white photography
The black and white photograph shown in Figure 6 was considered. First, the image in .jpg format was converted into the numerical matrix Image of dimension by (i.e., x ). Second, to obtain the observation vectors, the matrix was divided into blocks of dimension x , , with which blocks were obtained, and each of them was a vector of observations.
Third, each matrix was stored in a vector of dimension , , which contained the elements of the matrix by rows, that is, . This way, we had the observations , which were grouped in the observation matrix .
Fourth, the average of each column, , was calculated obtaining the vector of means, and from each observation , its corresponding mean was subtracted. Thus, the matrix of centered observations was obtained. The covariance matrix of was .
Fifth, the pairs of eigenvalues and eigenvectors of , , were found, and they were ordered according to the eigenvalues from highest to lowest. The largest eigenvalues are drawn in Figure 7. As can be seen, the first eigenvalue is much larger than the rest. Thus, the first principal component completely dominates the total variability.
Sixth, with the theoretical results and the calculations previously made, the principal components , , were built. The first principal component was . Therefore, an orthonormal basis of was built.
Seventh, each vector was grouped by rows in a matrix :
Each of the matrices was converted into an image. The images of the first three principal components are shown in Figure 8.
At this point, it is important to mention that the data matrix has been assumed to be formed by vectors of expressed in the canonical base, . Also, the base whose vectors were the eigenvectors of , , was considered. The coordinates with respect to the canonical basis of the vectors of were the columns of the matrix . Then, given a vector that with respect to the canonical base had coordinates and with respect to the base had coordinates , the relation between them was . Also, as is an orthogonal matrix, . Thus, the coordinates of the vectors that formed the observations matrix had as coordinates, with respect to the new base, the rows of the matrix of dimension x given by .
Eight, in order to reduce the dimension, it was taken into consideration that if we keep all the vectors of , we can perfectly reconstruct our data matrix, because . Additionally, for the case under study, to reduce the dimension, if we use the slope change rule, we can consider the first two principal components; five components if we want to explain of the variability, because ; or eight components if we want to explain of the total variability.
In order to compress the image, the first vectors of the base were used. Moreover, supposing that we were left with , , the matrix given by Eq. (19) was defined:
Therefore, the dimension of was × .
Ninth, to reconstruct the compressed image, each row of was regrouped in an x matrix. The ith row of , denoted by , was transformed into the matrix given by Eq. (20), and the matrix given by Eq. (21) was built:
By increasing the number of principal components, the percentage of the variability explained is increased by very small percentages, but, nevertheless, nuances are added to the photo sufficiently remarkable, since they make it sharper, smooth out the contours, and mark the tones more precisely.
5.1.1. Objective measures of the quality of reconstructions
The two methods that we will use are the peak signal-to-noise ratio () and the entropy of the error image. The PSNR measure evaluates the quality in terms of deviations between the processed and the original image, and the entropy of an image is a measure of the information content contained in that image.
Definition 5.1: Let be the number of rows by the number of columns in the image. Let be the set of pixels of the original image. Let be the set of reconstruction pixels. Let be the error. The mean square error () is
Definition 5.2: Let the images under study be the bit images. The peak signal-to-noise ratio of the reconstruction is
Figure 10 (a) shows of the reconstructions of the image versus the number of principal components used for the reconstruction, together with the regression line that adjusts the said cloud of points. Figure 10 (b) shows the values of the PSNR when we use three quarters (black), half (red), quarter (blue), eighth (green), sixteenth (brown), and the thirty-second part (yellow) of the components, which means a corresponding reduction in compression. A behavior close to linearity with a slope of approximately can be seen. With the reductions considered, the varies between and .
If the entropy is high, the variability of the pixels is very high, and there is little redundancy. Thus, if we exceed a certain threshold in compression, the original image cannot be recovered exactly. If the entropy is small, then the variability will be smaller. Therefore, the information of a pixel with respect to the pixels of its surroundings is high and, therefore, randomness is lost.
Definition 5.3: Let be an bit image that can take the values . Let be the frequency with which the value appears. Then, the entropy is
Figure 11 (a) shows the entropy of the reconstructions from to components. As can be seen, the entropy is increasing until the first 10 components, and then it becomes damped tending asymptotically to the value of the entropy of the image (). It can be seen that the difference with more than components is insignificant. Figure 11 (b) shows the entropy of the reconstructions using components (black), components (brown), components (green), components (blue), and components (red), respectively.
Finally, we consider the entropy of the images of the errors. Given an image, , the value of each of its pixels is an element of the set , and if we have a reconstruction, , and consider the error, , then the value of its pixels will be an element of the set . Therefore, cannot be considered as an image. Since a pixel of value in is an error of the same size as , to consider images we denominate image of the error to , being .
Figure 12 (a) shows the entropy of the error image versus the number of principal components used for the reconstruction, together with an adjusted line of slope − 0.02. Figure 12(b) shows the entropy when we use 8 components (black), 16 components (brown), 32 components (green), 64 components (blue), and 128 components (red), respectively. With more than principal components, the entropy of the errors is zero, which means that the errors have very little variability, and with fewer components, the decrease seems linear with slope .
5.2. Coordinates of the first principal component
In this section, we will consider the coordinates of the first vectors that form the principal components. If we consider that the vectors have been obtained as x dimension blocks, vectors will have coordinates. Figure 13 shows the coordinates of the first six principal components with respect to the canonical base.
As can be seen from Figure 13, all coordinates seem to have some component with period . This suggests that there may be some relationship with the shape of the blocks chosen and that most vectors are close to being periodic with period 8, because when we consider each of the vectors of components, the first pixels are adjacent to the next pixels, and these are adjacent to the next pixels, and so on, up to times.
Since the first principal components collect a large part of the characteristics of the vectors, it is plausible that they also reflect the periodicity of the vectors. Recall that principal components are linear combinations of vectors and that if all of them had all their periodic coordinates with the same period, then all components would be periodic as well.
In Figure 14, the coordinates of the first three principal components are shown when the vectors are constructed from blocks of x (see Figure 14 (a-c)) and from blocks of x (see Figure 14 (d-f)). As can be seen, the periodicity in the first components is again appreciated.
5.3. Reduction of the first principal component by periodicity
Using the almost periodicity of the first principal component, we can use less information to obtain acceptable reconstructions of the image. If in the first principal component of dimension we repeat the first eight values periodically and use principal components to reconstruct the image, we go from a reduction of to another of . Figure 15 shows both the reconstruction of the image with and original principal components and the reconstruction of the image with and principal components, but with the first component replaced by a vector whose coordinates have period , we call this .
The first component is not the true one. Therefore, reconstructions from this set cannot be made with total precision. If we use to compare the -norm, -norm, and -norm of the image and the corresponding reconstruction, with the original principal components and the principal components using their periodicity, we obtain, by varying the number of used principal components, the results shown in Figure 16.
With the original principal components (blue), the original image can be completely reconstructed, while if we use only a few components, in this case or less, approximations similar to the original ones are obtained with (green).
This chapter has been devoted to give a short but comprehensive introduction to the basics of the statistical technique known as principal component analysis, aimed at its application to image compression. The first part of the chapter was focused on preliminaries, mean vector, covariance matrix, eigenvectors, eigenvalues, and distances. That part finished bringing up the problems that the Euclidean distance presents and highlights the importance of using a statistical distance that takes into account the different variabilities and correlations. To that end, a brief introduction was made to a distance that depends on variances and covariances.
Next, in the second part of the chapter, principal components were introduced and connected with the previously explained concepts. Here, principal components were presented as a particular case of linear combinations of random variables, but with the peculiarity that those linear combinations represent a new coordinate system that is obtained by rotating the original reference system, which has the aforementioned random variables as coordinate axes. The new axes represent the directions with maximum variability and provide a simple description of the structure of the covariance.
Then, the third part of the chapter was devoted to show an application of principal component analysis to image compression. An original image was taken and compressed by using different principal components. The importance of carrying out objective measures of quality reconstructions was highlighted. Also, a novel contribution of this chapter was the introduction to the study of the periodicity of the principal components and to the importance of the reduction of the first principal component by periodicity. In short, a novel construction of principal components by periodicity of principal components has been included, in order to reduce the computational cost for their calculation, although decreasing the accuracy. It can be said that using the almost periodicity of the first principal component, less information to obtain acceptable reconstructions of the image can be used.
Finally, we would not like to finish this chapter without saying that few pages cannot gather the wide range of applications that this statistical technique has found in solving real-life problems. There is a countless number of applications of principal component analysis to solve problems that both scientists and engineers have to face in real-life situations. However, in order to be practical, it was decided to choose and develop step by step an application example that could be of interest for a wide range of readers. Accordingly, we thought that such an example could be one related to data compression, because with the advancements of information and communication technologies both scientists and engineers need to either store or transmit more information at lower costs, faster, and at greater distances with higher quality. In this sense, one example is image compression by using statistical techniques, and this is the reason why, in this chapter, it was decided to take advantage of statistical properties of an image to present a practical application of principal component analysis to image compression.
This work was supported by the Universidad de Las Americas, Ecuador, and the Universidad Politecnica de Madrid, Spain.