Two-Dimensional Principal Component Analysis and Its Extensions

Normally in Principal Component Analysis (PCA) (Sirovich & Kirby, 1987; Turk & Pentland, 1991), the 2D image matrices are first transformed into 1D image vectors by vectorization. The vectorization of a matrix is the column vector obtained by stacking the columns of the matrix on top of one another. The covariance (or scatter) matrix is then formulated from these image vectors. This covariance matrix is well estimated only if the number of available training samples is not far smaller than its dimension. In practice, it is very hard to collect that many samples. Consequently, in 1D subspace analysis, the estimated covariance matrix is usually poorly estimated and not of full rank.


Introduction
Normally in Principal Component Analysis (PCA) (Sirovich & Kirby, 1987; Turk & Pentland, 1991), the 2D image matrices are first transformed into 1D image vectors by vectorization. The vectorization of a matrix is the column vector obtained by stacking the columns of the matrix on top of one another. The covariance (or scatter) matrix is then formulated from these image vectors. This covariance matrix is well estimated only if the number of available training samples is not far smaller than its dimension. In practice, it is very hard to collect that many samples, so in 1D subspace analysis the estimated covariance matrix is usually poorly estimated and not of full rank. Two-Dimensional Principal Component Analysis (2DPCA) was proposed by Yang et al. (2004) for face recognition and representation. The experimental results in Kong et al. (2005); Yang & Yang (2002); Yang et al. (2004); Zhang & Zhou (2005) show the improvement of 2DPCA over PCA on several face databases. Unlike PCA, the image covariance matrix is computed directly on the image matrices, so the spatial structure information can be preserved. This yields a covariance matrix whose dimension equals the width of the face image, which is far smaller than the size of the covariance matrix in PCA. Therefore, the image covariance matrix can be better estimated and will usually be of full rank. This means the curse of dimensionality and the Small Sample Size (SSS) problem can be avoided.
In this chapter, 2DPCA's extensions are presented in detail as follows: the bilateral projection scheme, the kernel version, the supervised framework, the variation of image alignment, and the random approaches.
For the first extension, many techniques were proposed in the bilateral projection scheme, such as 2D²PCA (Zhang & Zhou, 2005), Bilateral 2DPCA (B2DPCA) (Kong et al., 2005), Generalized Low-Rank Approximations of Matrices (GLRAM) (Liu & Chen, 2006; Liu et al., 2010; Ye, 2004), Bi-Directional PCA (BDPCA) (Zuo et al., 2005) and Coupled Subspace Analysis (CSA) (Xu et al., 2004). The left and right projections are determined by solving two eigenvalue problems per iteration: one corresponds to the column direction and the other to the row direction of the image. In this way, the image is considered in both directions, and the feature matrix is reduced to a size smaller than in the original 2DPCA.
Following the success of the kernel method in kernel PCA (KPCA), a kernel-based 2DPCA was proposed as Kernel 2DPCA (K2DPCA) in Kong et al. (2005). This means a nonlinear mapping can be utilized to improve the feature extraction of 2DPCA. Since 2DPCA is an unsupervised projection method, the class information is ignored. To embed this information in feature extraction, Linear Discriminant Analysis (LDA) is applied in Yang et al. (2004). Moreover, 2DLDA was proposed and then applied together with 2DPCA in Sanguansat et al. (2006b). Another method, proposed in Sanguansat et al. (2006a), is based on class-specific subspaces, in which each subspace is constructed only from the training samples of its own class, whereas only one subspace is considered in the conventional 2DPCA. In this way, the representation can provide the minimum reconstruction error.
The image covariance matrix is the key of 2DPCA, and it depends on the alignment of pixels in the image: a different image covariance matrix captures different information. Alternative versions of the image covariance matrix can be produced by rearranging the pixels. The diagonal-alignment 2DPCA and the generalized-alignment 2DPCA were proposed in Zhang et al. (2006) and Sanguansat et al. (2007a), respectively.
Finally, random subspace based 2DPCA was proposed by randomly selecting subsets of the eigenvectors of the image covariance matrix, as in Nguyen et al. (2007); Sanguansat et al. (2007b; n.d.), to build new projection matrices. The experimental results show that some subsets of eigenvectors perform better than others, but this cannot be predicted from their eigenvalues. However, mutual information can be used in a filter strategy for selecting these subsets, as shown in Sanguansat (2008).

Two-dimensional principal component analysis
Let each image be represented by an m by n matrix A of its pixels' gray intensities. We consider a linear projection of the form

y = Ax, (1)

where x is an n-dimensional projection axis and y is the projected feature of this image on x, called the principal component vector. In the original algorithm of 2DPCA (Yang et al., 2004), like PCA, 2DPCA searches for the optimal projection by maximizing the total scatter of the projected data. Instead of using the criterion as in PCA, the total scatter of the projected samples can be characterized by the trace of the covariance matrix of the projected feature vectors. From this point of view, the following criterion was adopted:

J(x) = tr(S_x), (2)

where S_x is the covariance matrix of the projected feature vectors,

S_x = E[(y − Ey)(y − Ey)^T] = E[((A − EA)x)((A − EA)x)^T]. (3)

Since the total power equals the sum of the diagonal elements, i.e. the trace, of the covariance matrix, the trace of S_x can be rewritten as

tr(S_x) = x^T E[(A − EA)^T (A − EA)] x = x^T G x, (4)

where

G = E[(A − EA)^T (A − EA)]. (5)

This matrix G is called the image covariance matrix. Therefore, the alternative criterion can be expressed by

J(x) = tr(x^T G x), (6)

where the image covariance matrix G is computed in a straightforward manner by

G = (1/M) Σ_{k=1}^{M} (A_k − Ā)^T (A_k − Ā), (7)

where Ā denotes the average image,

Ā = (1/M) Σ_{k=1}^{M} A_k. (8)

It can be shown that the vector x maximizing Eq. (4) corresponds to the largest eigenvalue of G (Yang & Yang, 2002). This can be done, for example, by using the eigenvalue decomposition or the Singular Value Decomposition (SVD) algorithm. However, one projection axis is usually not enough to accurately represent the data, so several eigenvectors of G are needed. The number of eigenvectors (d) can be chosen according to a predefined threshold (θ). Let λ_1 ≥ λ_2 ≥ ... ≥ λ_n be the eigenvalues of G sorted in non-increasing order. We select the first d eigenvectors such that their corresponding eigenvalues satisfy

(Σ_{i=1}^{d} λ_i) / (Σ_{i=1}^{n} λ_i) ≥ θ. (9)

For feature extraction, let x_1, ..., x_d be the d selected largest eigenvectors of G. Each image A is projected onto this d-dimensional subspace according to Eq. (1). The projected image Y = [y_1, ..., y_d] is then an m by d matrix given by

Y = AX, (10)

where X = [x_1, ..., x_d] is an n by d projection matrix.
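The procedure above can be sketched in a few lines of NumPy (a minimal illustration; the function name `two_d_pca` and the toy data are assumptions for demonstration, not part of the original algorithm's description):

```python
import numpy as np

def two_d_pca(images, theta=0.95):
    """2DPCA sketch: `images` is an (M, m, n) array of 2D samples."""
    A = np.asarray(images, dtype=float)
    A_bar = A.mean(axis=0)                      # average image, Eq. (8)
    D = A - A_bar
    # Image covariance matrix G of Eq. (7), shape (n, n)
    G = np.einsum('kij,kil->jl', D, D) / A.shape[0]
    w, V = np.linalg.eigh(G)                    # eigh: G is symmetric PSD
    w, V = w[::-1], V[:, ::-1]                  # sort non-increasing
    # smallest d whose cumulative eigenvalue ratio reaches theta, Eq. (9)
    d = int(np.searchsorted(np.cumsum(w) / w.sum(), theta)) + 1
    return V[:, :d], A_bar                      # X is (n, d)

rng = np.random.default_rng(0)
imgs = rng.standard_normal((20, 8, 6))          # 20 toy 8x6 "images"
X, A_bar = two_d_pca(imgs, theta=0.9)
Y = imgs[0] @ X                                 # projected image, Eq. (10)
print(X.shape, Y.shape)
```

Here `np.linalg.eigh` is used because G is symmetric, and the columns of X are the eigenvectors of G sorted by non-increasing eigenvalue.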

Column-based 2DPCA
The original 2DPCA can be called row-based 2DPCA. An alternative is to use the columns instead of the rows: column-based 2DPCA (Zhang & Zhou, 2005).
This method can be considered the same as the original 2DPCA, but with the input images transposed beforehand. From Eq. (7), replacing the image A with the transposed image A^T gives the column-based image covariance matrix H:

H = (1/M) Σ_{k=1}^{M} (A_k − Ā)(A_k − Ā)^T. (11)

Similarly to Eq. (10), the column-based optimal projection matrix can be obtained by computing the eigenvectors z of H corresponding to the q largest eigenvalues:

V = Z^T A, (12)

where Z = [z_1, ..., z_q] is an m by q column-based optimal projection matrix. The value of q can also be controlled by setting a threshold as in Eq. (9).
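Since the method is just 2DPCA on transposed images, a sketch only needs to transpose every sample before forming the covariance (the function name and toy data below are illustrative assumptions):

```python
import numpy as np

def column_based_2dpca(images, q):
    """Column-based 2DPCA: identical to 2DPCA applied to transposed images."""
    A = np.asarray(images, dtype=float)
    At = A.transpose(0, 2, 1)                   # transpose every image
    D = At - At.mean(axis=0)
    # H = (1/M) sum_k (A_k - Abar)(A_k - Abar)^T, shape (m, m)
    H = np.einsum('kij,kil->jl', D, D) / A.shape[0]
    w, V = np.linalg.eigh(H)
    return V[:, ::-1][:, :q]                    # m x q projection matrix Z

rng = np.random.default_rng(1)
imgs = rng.standard_normal((15, 8, 6))          # 15 toy 8x6 images
Z = column_based_2dpca(imgs, q=3)
V = Z.T @ imgs[0]                               # column-based feature, q x n
print(Z.shape, V.shape)
```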

The relation of 2DPCA and PCA
As noted in Kong et al. (2005), 2DPCA performed on 2D images is essentially PCA performed on the rows of the images if each row is viewed as a computational unit. That means the 2DPCA of an image can be viewed as the PCA of the set of rows of that image. This relation between 2DPCA and PCA can be shown by rewriting the image covariance matrix G in terms of an ordinary covariance matrix:

G = (1/M) Σ_{k=1}^{M} A_k^T A_k − f(Ā), (13)

where f(Ā) can be neglected if the data were previously centralized. Thus,

G = (1/M) C C^T − f(Ā) = m [ (1/(mM)) C C^T ] − f(Ā), (14)

where

C = [ (a_1^1)^T, (a_2^1)^T, ..., (a_m^M)^T ] (15)

concatenates the transposed rows a_i^k of all M training images as its columns, and the term (1/(mM)) C C^T is the covariance matrix of the rows of all images.

Bilateral projection frameworks
There are two major different techniques in this framework, i.e. non-iterative and iterative. All these methods use two projection matrices, one for the rows and one for the columns. The former computes these projections separately, while the latter computes them simultaneously via an iterative process.

Non-iterative method
The non-iterative bilateral projection scheme was applied to 2DPCA via left and right multiplying projection matrices (Xu et al., 2006; Zhang & Zhou, 2005; Zuo et al., 2005) as follows:

B = Z^T A X, (16)

where B is the feature matrix extracted from image A and Z is the left multiplying projection matrix. Similar to the right multiplying projection matrix X in Section 2, the matrix Z is an m by q projection matrix obtained by choosing the eigenvectors of the image covariance matrix H corresponding to the q largest eigenvalues. Therefore, the dimension of the feature matrix decreases from m × n to q × d (q < m and d < n). In this way, the computation time is also reduced. Moreover, the recognition accuracy of B2DPCA is often better than that of 2DPCA, as shown by the experimental results in Liu & Chen (2006); Zhang & Zhou (2005); Zuo et al. (2005).
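The non-iterative scheme can be sketched by computing the two eigenproblems independently and then applying both projections (a minimal sketch; the helper name and toy data are assumptions):

```python
import numpy as np

def bilateral_features(images, q, d):
    """Non-iterative bilateral projection: Z from H, X from G, computed separately."""
    A = np.asarray(images, dtype=float)
    D = A - A.mean(axis=0)
    M = A.shape[0]
    G = np.einsum('kij,kil->jl', D, D) / M       # (n, n) row-direction covariance
    H = np.einsum('kij,klj->il', D, D) / M       # (m, m) column-direction covariance
    X = np.linalg.eigh(G)[1][:, ::-1][:, :d]     # right projection (n, d)
    Z = np.linalg.eigh(H)[1][:, ::-1][:, :q]     # left projection (m, q)
    return np.array([Z.T @ Ak @ X for Ak in A])  # feature matrices B_k = Z^T A_k X

rng = np.random.default_rng(2)
B = bilateral_features(rng.standard_normal((10, 8, 6)), q=4, d=3)
print(B.shape)
```

The feature matrices shrink from m × n to q × d, as stated above.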

Iterative method
The bilateral projection scheme of 2DPCA with an iterative algorithm was proposed in Kong et al. (2005); Liu et al. (2010); Xu et al. (2004); Ye (2004). Let Z ∈ R^{m×q} and X ∈ R^{n×d} be the left and right multiplying projection matrices, respectively. For an m × n image A_k and its q × d projected image B_k, the bilateral projection is formulated as follows:

B_k = Z^T A_k X, (17)

where B_k is the extracted feature matrix for image A_k.
The optimal projection matrices Z and X in Eq. (17) can be computed by solving the following minimization criterion, so that the reconstructed image Z B_k X^T gives the best approximation of A_k:

min_{Z, X} Σ_{k=1}^{M} || A_k − Z B_k X^T ||_F^2, (18)

where M is the number of data samples and ||·||_F is the Frobenius norm of a matrix.
The detailed iterative scheme designed to compute the optimal projection matrices Z and X is listed in Table 1. The obtained solutions are only locally optimal because they depend on the initialization Z_0. In Kong et al. (2005), Z_0 is set to the m × m identity matrix I_m, while in Ye (2004) it is set to (I_q, 0)^T, where I_q is the q × q identity matrix.
Alternatively, note that the criterion in Eq. (18) is biquadratic and has no closed-form solution. Therefore, an iterative procedure to obtain a locally optimal solution was proposed in Xu et al. (2004). For a fixed Z, the criterion in Eq. (18) with respect to X ∈ R^{n×d} reduces to the maximization

max_X tr( X^T G^X X ), (19)

where

G^X = Σ_{k=1}^{M} A_k^T Z Z^T A_k. (20)

The solution of Eq. (19) is given by the eigenvalue decomposition of this image covariance matrix, i.e. the d eigenvectors of G^X corresponding to the d largest eigenvalues. Similarly, for a fixed X, the criterion in Eq. (18) with respect to Z ∈ R^{m×q} is changed to

max_Z tr( Z^T G^Z Z ), (21)

where

G^Z = Σ_{k=1}^{M} A_k X X^T A_k^T. (22)

Again, the solution of Eq. (21) is given by the q eigenvectors of G^Z corresponding to the q largest eigenvalues. By iteratively optimizing the objective function with respect to Z and X, respectively, we can obtain a local optimum of the solution. The whole procedure, namely Coupled Subspace Analysis (CSA) (Xu et al., 2004), is shown in Table 2.
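The alternating procedure can be sketched as follows (a minimal NumPy illustration of the GLRAM/CSA-style iteration; the function name, the fixed iteration count, and the toy data are assumptions, not details of the cited papers):

```python
import numpy as np

def csa(images, q, d, iters=10):
    """Alternating left/right projection sketch: one eigenproblem per
    direction per pass; converges to a local optimum of Eq. (18)."""
    A = np.asarray(images, dtype=float)
    M, m, n = A.shape
    Z = np.eye(m, q)                                 # Z0 = (I_q, 0)^T initialization
    for _ in range(iters):
        GX = sum(Ak.T @ Z @ Z.T @ Ak for Ak in A)    # Eq. (20), Z fixed
        X = np.linalg.eigh(GX)[1][:, ::-1][:, :d]
        GZ = sum(Ak @ X @ X.T @ Ak.T for Ak in A)    # Eq. (22), X fixed
        Z = np.linalg.eigh(GZ)[1][:, ::-1][:, :q]
    B = np.array([Z.T @ Ak @ X for Ak in A])         # features, Eq. (17)
    err = sum(np.linalg.norm(Ak - Z @ Bk @ X.T)**2 for Ak, Bk in zip(A, B))
    return Z, X, B, err

rng = np.random.default_rng(2)
imgs = rng.standard_normal((12, 8, 6))
Z, X, B, err = csa(imgs, q=4, d=3)
print(B.shape, err >= 0)
```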

Kernel based frameworks
From Section 2.2, 2DPCA performed on 2D images is basically PCA performed on the rows of the images if each row is viewed as a computational unit.
Similar to 2DPCA, kernel-based 2DPCA (K2DPCA) can be processed by traditional kernel PCA (KPCA) in the same manner. Let a_i^k be the i-th row of the k-th image, so that the k-th image can be rewritten as

A_k = [ (a_1^k)^T, (a_2^k)^T, ..., (a_m^k)^T ]^T. (23)

From Eq. (15), the covariance matrix C can be constructed by concatenating all rows of all training images together. Let ϕ : R^n → R^{m′}, n < m′, be the mapping function that maps the row vectors into a feature space of higher dimension in which the classes can be linearly separated. The elements of the kernel matrix K can then be computed by

K(ik, jl) = ϕ(a_i^k) · ϕ(a_j^l) = k(a_i^k, a_j^l), (24)

which gives an mM-by-mM matrix. Unfortunately, there is a critical implementation problem concerning the dimension of this kernel matrix. The kernel matrix is an M × M matrix in KPCA, where M is the number of training samples, while it is an mM × mM matrix in K2DPCA, where m is the number of rows of each image. Thus, the K2DPCA kernel matrix is m² times the size of the KPCA kernel matrix. For example, if the training set has 200 images with dimensions of 100 × 100, then the dimension of the kernel matrix is 20000 × 20000, which is far too big to fit in memory.
After that, the projection can be formed from the eigenvectors of this kernel matrix, in the same way as in traditional KPCA.
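The memory bottleneck is easy to see in code: the kernel matrix is built over every row of every image. Below is a small sketch using an RBF kernel (the kernel choice, `gamma`, and the toy data are assumptions for illustration):

```python
import numpy as np

def k2dpca_kernel(images, gamma=0.1):
    """RBF kernel matrix over all rows of all images: shape (m*M, m*M).
    Memory grows as (m*M)^2, the practical bottleneck noted above."""
    A = np.asarray(images, dtype=float)
    M, m, n = A.shape
    R = A.reshape(M * m, n)                       # stack every row of every image
    sq = np.sum(R**2, axis=1)
    # ||r_i - r_j||^2 = |r_i|^2 + |r_j|^2 - 2 r_i . r_j
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * R @ R.T))
    return K

K = k2dpca_kernel(np.random.default_rng(3).standard_normal((5, 8, 6)))
print(K.shape)
```

Even this toy case (M = 5, m = 8) gives a 40 × 40 kernel matrix; with M = 200 images of height m = 100 it would be 20000 × 20000, matching the example above.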

Supervised frameworks
Since 2DPCA is an unsupervised technique, class information is neglected. This section presents two methods which can be used to embed class information in 2DPCA. First, Linear Discriminant Analysis (LDA) is implemented in the 2D framework. Second, a 2DPCA is performed for each class in a class-specific subspace.

Two-dimensional linear discriminant analysis of principal component vectors
PCA's criterion chooses the subspace as a function of the data distribution, while Linear Discriminant Analysis (LDA) chooses the subspace which yields maximal inter-class distance while, at the same time, keeping the intra-class distance small. In general, LDA extracts features which are better suited for classification tasks. However, when the available number of training samples is small compared to the feature dimension, the covariance matrix estimated from these features will be singular and thus cannot be inverted. This is called the singularity problem or the Small Sample Size (SSS) problem (Fukunaga, 1990).
Zhao, Chellappa & Krishnaswamy (1998) demonstrate a significant improvement when applying LDA to principal components from the PCA-based subspace, since PCA and LDA can overcome the drawbacks of each other. PCA is constructed around the criterion of preserving the data distribution. Hence, it is suited for representation and reconstruction from the projected features. However, in classification tasks, PCA only normalizes the input data according to their variance. This is not efficient, since the between-class relationship is neglected. In general, the discriminant power depends on both the within-class and between-class relationships. LDA considers these relationships via the analysis of the within-class and between-class scatter matrices. Taking this information into account, LDA allows further improvement, especially when there are prominent variations in lighting conditions and expression. Nevertheless, in all of the above techniques, the spatial structure information is still not employed. Two-Dimensional Linear Discriminant Analysis (2DLDA) was proposed in Ye et al. (2005) to overcome the SSS problem in classical LDA by working with images in matrix representation, as in 2DPCA. In particular, a bilateral projection scheme was applied there via left and right multiplying projection matrices. In this way, the eigenvalue problem is solved twice per iteration: one problem corresponds to the column direction and the other to the row direction of the image.
Like PCA, 2DPCA is more suitable for face representation than for face recognition, so for better performance in recognition tasks, LDA is still necessary. Unfortunately, the linear transformation of 2DPCA reduces the input image to a feature whose dimension equals the number of rows, i.e. the height, of the input image. Thus, the SSS problem may still occur when LDA is performed directly after 2DPCA. To overcome this problem, a simplified version of 2DLDA with only a unilateral projection scheme, based on the 2DPCA concept, was applied (Sanguansat et al., 2006b;c). Applying 2DLDA to 2DPCA not only solves the SSS problem and the curse-of-dimensionality dilemma but also allows us to work directly on the image matrix in all projections. Hence, spatial structure information is maintained and the size of all scatter matrices cannot be greater than the width of the face image. Furthermore, when computing with this dimension, the face images do not need to be resized, so all information is preserved.

Two-dimensional linear discriminant analysis (2DLDA)
Let z be an n-dimensional vector. A matrix A is projected onto this vector via a transformation similar to Eq. (1):

v = Az. (25)

This projection yields an m-dimensional feature vector. 2DLDA searches for the projection axis z that maximizes Fisher's discriminant criterion (Belhumeur et al., 1997; Fukunaga, 1990):

J(z) = tr(S_b) / tr(S_w), (26)

where S_w is the within-class scatter matrix and S_b is the between-class scatter matrix. In particular, the within-class scatter matrix describes how the data are scattered around the means of their respective classes, and is given by

S_w = Σ_{i=1}^{K} Pr(ω_i) E[ (Hz)(Hz)^T | ω_i ], (27)

where K is the number of classes, Pr(ω_i) is the prior probability of each class, and H = A − EA. The between-class scatter matrix describes how the different classes, represented by their expected values, are scattered around the mixture mean:

S_b = Σ_{i=1}^{K} Pr(ω_i) ( E[A | ω_i]z − E[A]z )( E[A | ω_i]z − E[A]z )^T. (28)

With the linearity properties of both the trace function and the expectation, J(z) may be rewritten as

J(z) = (z^T S̃_b z) / (z^T S̃_w z), (29)

where S̃_b and S̃_w can be evaluated as follows:

S̃_b = Σ_{i=1}^{K} n_i (Ā_i − Ā)^T (Ā_i − Ā), (30)

S̃_w = Σ_{i=1}^{K} Σ_{A_k ∈ ω_i} (A_k − Ā_i)^T (A_k − Ā_i), (31)

where n_i and Ā_i are the number of elements and the expected value of class ω_i, respectively, and Ā denotes the overall mean.
The optimal projection vector can then be found by solving the following generalized eigenvalue problem:

S̃_b z = λ S̃_w z. (32)

Again, the SVD algorithm can be applied to solve this eigenvalue problem on the matrix S̃_w^{−1} S̃_b. Note that the scatter matrices involved in this eigenvalue decomposition are also only n by n. Thus, with a limited training set, this decomposition is more reliable than the eigenvalue decomposition based on the classical covariance matrix.
The number of projection vectors is then selected by the same procedure as in Eq. (9). Let Z = [z_1, ..., z_q] be the projection matrix composed of the eigenvectors corresponding to the q largest eigenvalues for 2DLDA. Given an m by n matrix A, its projection onto the subspace spanned by the z_i is then given by

V = AZ. (33)

The result of this projection, V, is another matrix of size m by q. Like 2DPCA, this procedure takes a matrix as input and outputs another matrix. These two techniques can therefore be combined; their combination is explained in the next section.
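The 2D scatter matrices and the generalized eigenproblem can be sketched as follows (an illustration only: the function name, the small ridge added to keep S̃_w invertible, and the toy data are my assumptions, not part of the cited method):

```python
import numpy as np

def two_d_lda(images, labels, q):
    """2DLDA sketch using the 2D scatter matrices of Eqs. (30)-(31)."""
    A = np.asarray(images, dtype=float)
    labels = np.asarray(labels)
    A_bar = A.mean(axis=0)
    n = A.shape[2]
    Sw, Sb = np.zeros((n, n)), np.zeros((n, n))
    for c in np.unique(labels):
        Ac = A[labels == c]
        Ac_bar = Ac.mean(axis=0)
        Sb += len(Ac) * (Ac_bar - A_bar).T @ (Ac_bar - A_bar)
        for Ak in Ac:
            Sw += (Ak - Ac_bar).T @ (Ak - Ac_bar)
    # generalized eigenproblem Sb z = lambda Sw z; ridge keeps Sw invertible
    w, V = np.linalg.eig(np.linalg.solve(Sw + 1e-6 * np.eye(n), Sb))
    order = np.argsort(w.real)[::-1]
    return V[:, order[:q]].real                  # n x q projection matrix Z

rng = np.random.default_rng(4)
imgs = rng.standard_normal((30, 8, 6))
Z = two_d_lda(imgs, rng.integers(0, 3, 30), q=2)
print(Z.shape)
```

Each image then maps to V = AZ, an m × q feature matrix, as in Eq. (33).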

2DPCA+2DLDA
In this section, we apply 2DLDA within a well-known framework for face recognition, the LDA of PCA-based features (Zhao, Chellappa & Krishnaswamy, 1998). This framework consists of 2DPCA and 2DLDA steps, namely 2DPCA+2DLDA. From Section 2, we obtain a linear transformation matrix X onto which each input face image A is projected. At the 2DPCA step, a feature matrix Y is obtained. The matrix Y is then used as the input for the 2DLDA step. Thus, the evaluation of the within and between-class scatter matrices in this step is slightly changed: in Eqs. (30) and (31), the image matrix A_k is replaced by the 2DPCA feature matrix Y_k as follows:

S̃_b = Σ_{i=1}^{K} n_i (Ȳ_i − Ȳ)^T (Ȳ_i − Ȳ), (34)

S̃_w = Σ_{i=1}^{K} Σ_{Y_k ∈ ω_i} (Y_k − Ȳ_i)^T (Y_k − Ȳ_i), (35)

where Y_k is the feature matrix of the k-th image matrix A_k, Ȳ_i is the average of the Y_k belonging to class ω_i, and Ȳ denotes the overall mean of the Y_k. The 2DLDA optimal projection matrix Z can be obtained by solving the eigenvalue problem in Eq. (32). Finally, the composite linear transformation matrix L = XZ is used to map the face image space into the classification space by

D = AL. (36)

The matrix D is the 2DPCA+2DLDA feature matrix of image A, with dimension m by q. However, the number of 2DLDA feature vectors q cannot exceed the number of principal component vectors d. In the general case (q < d), the dimension of D is less than that of Y in Section 2. Thus, 2DPCA+2DLDA can reduce the classification time compared to 2DPCA.
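The whole pipeline, 2DPCA followed by 2DLDA on the feature matrices, can be sketched compactly (the function name, the ridge regularization, and the toy data are illustrative assumptions):

```python
import numpy as np

def pca_then_lda(images, labels, d, q):
    """2DPCA+2DLDA sketch: L = X Z maps each image A to D = A X Z (m x q)."""
    A = np.asarray(images, dtype=float)
    labels = np.asarray(labels)
    M = A.shape[0]
    D0 = A - A.mean(axis=0)
    G = np.einsum('kij,kil->jl', D0, D0) / M           # Eq. (7)
    X = np.linalg.eigh(G)[1][:, ::-1][:, :d]           # 2DPCA step, (n, d)
    Y = A @ X                                          # feature matrices (M, m, d)
    Y_bar = Y.mean(axis=0)
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(labels):                        # Eqs. (34)-(35)
        Yc = Y[labels == c]
        Yc_bar = Yc.mean(axis=0)
        Sb += len(Yc) * (Yc_bar - Y_bar).T @ (Yc_bar - Y_bar)
        Sw += sum((Yk - Yc_bar).T @ (Yk - Yc_bar) for Yk in Yc)
    w, V = np.linalg.eig(np.linalg.solve(Sw + 1e-6 * np.eye(d), Sb))
    Z = V[:, np.argsort(w.real)[::-1][:q]].real        # 2DLDA step, (d, q)
    return X @ Z                                       # composite L = XZ, (n, q)

rng = np.random.default_rng(5)
L = pca_then_lda(rng.standard_normal((40, 8, 6)), rng.integers(0, 4, 40), d=4, q=2)
print(L.shape)
```

Note that q ≤ d is respected, so the final feature D = AL is m × q, smaller than the 2DPCA feature Y.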

Class-specific subspace-based two-dimensional principal component analysis
2DPCA is an unsupervised technique: no information about class labels is considered. Therefore, the directions that maximize the scatter of the data over all training samples might not be adequate to discriminate between classes. In recognition tasks, a projection that emphasizes the discrimination between classes is more important. An extension of the PCA-based Eigenface method was proposed which uses an alternative representation obtained by projecting onto a Class-Specific Subspace (CSS) (Shan et al., 2003). In the conventional PCA method, the images are analyzed via features extracted in a low-dimensional space learned from all training samples of all classes, while each subspace of CSS is learned from the training samples of one class only. In this way, the CSS representation can provide a minimum reconstruction error.
The reconstruction error is used to classify the input data via the Distance From CSS (DFCSS).
A smaller DFCSS means a higher probability that the input data belongs to the corresponding class.
This extension was proposed in Sanguansat et al. (2006a). Let G_k be the image covariance matrix of the k-th CSS. Then G_k can be evaluated by

G_k = (1/n_k) Σ_{A ∈ ω_k} (A − Ā_k)^T (A − Ā_k), (37)

where Ā_k is the average image of class ω_k and n_k is the number of its training samples. The k-th projection matrix X_k is an n by d_k projection matrix composed of the eigenvectors of G_k corresponding to the d_k largest eigenvalues.
The k-th CSS of 2DPCA is represented as a 3-tuple consisting of the projection matrix X_k, the class mean Ā_k and the subspace dimension d_k. Let S be an input sample and U_k be its feature matrix projected onto the k-th CSS:

U_k = W_k X_k, (38)

where W_k = S − Ā_k. The reconstructed image W_k^r can then be evaluated by

W_k^r = U_k X_k^T. (39)

Therefore, the DFCSS is defined by the reconstruction error as follows:

DFCSS_k = || W_k − W_k^r ||. (40)

For illustration, assume that there are 4 classes, as shown in Fig. 1. The input image is first normalized with the average images of all 4 classes, and then projected onto the 2DPCA subspace of each class. After that, the image is reconstructed by the projection matrix (X_k) of each class.
The DFCSS is then used to measure the similarity between the reconstructed image and the normalized original image in each CSS. In Fig. 1, the DFCSS of the first class is minimal, so we decide that this input image belongs to the first class.
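The CSS training and the DFCSS decision rule can be sketched as follows (the function names, the Frobenius norm for the reconstruction error, and the strongly separated toy classes are assumptions for illustration):

```python
import numpy as np

def fit_css(images, labels, d):
    """Build one 2DPCA subspace per class: a (X_k, Abar_k) pair per class."""
    A = np.asarray(images, dtype=float)
    labels = np.asarray(labels)
    css = {}
    for c in np.unique(labels):
        Ac = A[labels == c]
        Ac_bar = Ac.mean(axis=0)
        D = Ac - Ac_bar
        G = np.einsum('kij,kil->jl', D, D) / len(Ac)   # class covariance, Eq. (37)
        css[c] = (np.linalg.eigh(G)[1][:, ::-1][:, :d], Ac_bar)
    return css

def classify_dfcss(S, css):
    """Assign S to the class whose subspace reconstructs it best (min DFCSS)."""
    best, best_err = None, np.inf
    for c, (Xk, Ak_bar) in css.items():
        W = S - Ak_bar                                 # normalize by class mean
        err = np.linalg.norm(W - W @ Xk @ Xk.T)        # reconstruction error
        if err < best_err:
            best, best_err = c, err
    return best

rng = np.random.default_rng(7)
imgs = np.concatenate([rng.standard_normal((10, 6, 5)),        # class 0
                       rng.standard_normal((10, 6, 5)) + 8.0]) # class 1
labels = np.array([0] * 10 + [1] * 10)
css = fit_css(imgs, labels, d=2)
probe = rng.standard_normal((6, 5))                    # resembles class 0
print(classify_dfcss(probe, css))
```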

Alignment based frameworks
Since 2DPCA can be viewed as row-based PCA, its information is contained only in the row direction. Although combining it with column-based 2DPCA considers the information in both the row and column directions, there are still other directions which should be considered.

Diagonal-based 2DPCA (DiaPCA)
The motivation for developing the DiaPCA method originates from an essential observation on the 2DPCA method (Yang et al., 2004). In contrast to 2DPCA, DiaPCA seeks the optimal projective vectors from diagonal face images, so the correlations between variations of rows and those of columns of images can be kept. The original face images are transformed into corresponding diagonal face images, as shown in Fig. 2 and Fig. 3. Because the rows (columns) of the transformed diagonal face images simultaneously integrate the information of rows and columns of the original images, they reflect both the information between rows and that between columns. Through this entanglement of row and column information, it is expected that DiaPCA may find useful block or structure information for recognition in the original images. Sample diagonal face images from the Yale database are displayed in Fig. 4.
Experimental results on a subset of the FERET database (Zhang et al., 2006) show that DiaPCA is more accurate than both PCA and 2DPCA. Furthermore, it is shown that the accuracy can be further improved by combining DiaPCA and 2DPCA.
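One plausible construction of a diagonal image, circularly shifting each row by its index so that every row mixes row and column positions of the original, can be sketched as below. This is an assumption for illustration, not necessarily the exact recipe of Zhang et al. (2006); DiaPCA is then just 2DPCA applied to the transformed images.

```python
import numpy as np

def diagonal_image(A):
    # Hypothetical diagonal transform: row i is circularly shifted left by i,
    # so each row of the result traverses a diagonal of the original image.
    A = np.asarray(A)
    return np.stack([np.roll(row, -i) for i, row in enumerate(A)])

A = np.arange(12).reshape(3, 4)
print(diagonal_image(A))
```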

Image cross-covariance analysis
In PCA, the covariance matrix provides a measure of the strength of the correlation of all pixel pairs. Because of the limited number of training samples, this covariance cannot be well estimated. The performance of 2DPCA is better than that of PCA, even though not all of the correlation information of pixel pairs is employed for estimating the image covariance matrix. Nevertheless, the disregarded information may include useful information. Sanguansat et al. (2007a) proposed a framework for investigating the information neglected by the original 2DPCA technique, called Image Cross-Covariance Analysis (ICCA).
To achieve this, the image cross-covariance matrix is defined on two variables: the first is the original image and the second is a shifted version of the former. With the shifting algorithm, many image cross-covariance matrices are formulated to cover all of the information. The Singular Value Decomposition (SVD) is applied to the image cross-covariance matrix to obtain the optimal projection matrices. These matrices can be considered as orthogonally rotated projection matrices of traditional 2DPCA. ICCA differs from the original 2DPCA in that its transformations are generalized transformations of the original 2DPCA.
First of all, the relationship between 2DPCA's image covariance matrix G in Eq. (5) and PCA's covariance matrix C (computed on column-stacked image vectors) can be written as

G(i, j) = Σ_{k=1}^{m} C((i−1)m + k, (j−1)m + k), (43)

where G(i, j) and C(i, j) are the elements in the i-th row and j-th column of the matrices G and C, respectively, and m is the height of the image.
For illustration, let the dimension of all training images be 3 by 3. Then the covariance matrix of these images is a 9 by 9 matrix, while the dimension of the image covariance matrix is only 3 by 3, as shown in Fig. 5.
From Eq. (43), each element of G is the sum of all the same-labeled elements of C; for example,

G(1, 1) = C(1, 1) + C(2, 2) + C(3, 3). (44)

It should be noted that the total powers of the image covariance matrix G and of the traditional covariance matrix C are identical:

tr(G) = tr(C). (45)

From this point of view on Eq. (43), we can see that the image covariance matrix collects only 1/m of all the classification information collected in the traditional covariance matrix, while the other (m − 1)/m of the elements of the covariance matrix are not considered, as confirmed by the experimental results in Sanguansat et al. (2007a). To investigate how rich the information retained in the 2D subspace is for classification, a new matrix G_L is derived from PCA's covariance matrix by summing a different set of same-offset elements of C, indexed by a shift parameter L with 1 ≤ L ≤ mn.
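The relation in Eq. (43) and the trace identity in Eq. (45) are easy to verify numerically (a sketch on random toy data; it assumes column-stacked vectorization, as defined in the introduction):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((50, 3, 3))            # 50 toy 3x3 "images"
M, m, n = A.shape
D = A - A.mean(axis=0)

# 2DPCA image covariance matrix G (n x n), Eq. (7)
G = np.einsum('kij,kil->jl', D, D) / M
# PCA covariance matrix C (mn x mn) on column-stacked image vectors
V = D.transpose(0, 2, 1).reshape(M, -1)        # vec() by stacking columns
C = V.T @ V / M

# Eq. (43): G(i, j) sums the m same-offset elements of C
C4 = C.reshape(n, m, n, m)                     # C4[i, a, j, b] = C[i*m+a, j*m+b]
G2 = np.einsum('iaja->ij', C4)
print(np.allclose(G, G2), np.isclose(np.trace(G), np.trace(C)))
```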
The G_L can also be determined by applying a shift to each image instead of averaging certain elements of the covariance matrix. Therefore, G_L can alternatively be interpreted as the image cross-covariance matrix between each image A_k and its shifted version B_k^L, where B_L is the L-th shifted version of image A that can be created via the algorithm in Table 3. Samples of shifted images B_L are presented in Fig. 6.
In 2DPCA, the columns of the projection matrix X are obtained by selecting the eigenvectors corresponding to the d largest eigenvalues of the image covariance matrix in Eq. (5). To understand the relationship between the ICCA projection matrix and the 2DPCA projection matrix, we investigate the simplest case, i.e. when there is only one training image.

Random frameworks
In feature selection, the random subspace method can improve performance by combining many classifiers, each corresponding to a random feature subset. In this section, the random method is applied to 2DPCA in various ways to improve its performance.
Two-dimensional random subspace analysis (2DRSA)
The main disadvantage of 2DPCA is that it needs many more coefficients for image representation than PCA. Many works have tried to solve this problem. In Yang et al. (2004), PCA is used after 2DPCA for further dimensionality reduction, but it is still unclear how the dimension of 2DPCA could be reduced directly. Several methods to overcome this problem apply the bilateral projection scheme to 2DPCA. In Zhang & Zhou (2005); Zuo et al. (2005), the right and left multiplying projection matrices are calculated independently, while an iterative algorithm is applied to obtain the optimal solution of these projection matrices in Kong et al. (2005); Ye (2004). A non-iterative algorithm for the optimization was proposed in Liu & Chen (2006). In Xu et al. (2004), an iterative procedure was proposed in which the right projection is calculated from the reconstructed images of the left projection, and the left projection is calculated from the reconstructed images of the right projection. Nevertheless, all of the above methods obtain only a locally optimal solution.
Another method for dealing with high-dimensional spaces was proposed in Ho (1998b), called the Random Subspace Method (RSM). This method is one of the ensemble classification methods, like Bagging (Breiman, 1996) and Boosting (Freund & Schapire, 1995). However, Bagging and Boosting do not reduce the high dimensionality: Bagging randomly selects a number of samples from the original training set to learn each individual classifier, while Boosting specifically weights each training sample. The RSM, in contrast, can effectively exploit the high dimensionality of the data. It constructs an ensemble of classifiers on independently selected feature subsets, and combines them using a heuristic such as majority voting, the sum rule, etc.
There are many reasons why the Random Subspace Method is suitable for the face recognition task. First, this method can take advantage of high dimensionality and stays far away from the curse of dimensionality (Ho, 1998b). Second, the random subspace method is useful for critical training sample sizes (Skurichina & Duin, 2002). Normally in face recognition, the dimension of the features is extremely large compared to the available number of training samples, so applying RSM can avoid both the curse of dimensionality and the SSS problem. Third, the nearest neighbor classifier, a popular choice in the 2D face-recognition domain (Kong et al., 2005; Liu & Chen, 2006; Yang et al., 2004; Ye, 2004; Zhang & Zhou, 2005; Zuo et al., 2005), can be very sensitive to the sparsity of the high-dimensional space; its accuracy is often far from optimal because of the lack of samples in that space. The RSM brings significant performance improvements compared to a single classifier (Ho, 1998a; Skurichina & Duin, 2002). Finally, since there is no hill climbing in RSM, there is no danger of being trapped in local optima (Ho, 1998b).
The RSM was applied to PCA for face recognition in Chawla & Bowyer (2005), by applying random selection directly to the PCA feature vector to construct multiple subspaces. Nevertheless, the information contained in each element of the PCA feature vector is not equivalent: the elements corresponding to larger eigenvalues contain more useful information. Therefore, applying RSM to the PCA feature vector is seldom appropriate.
Table 4. Two-Dimensional Random Subspace Analysis Algorithm
Different from PCA, the 2DPCA feature is in matrix form. Thus, RSM is more suitable for 2DPCA, because the column direction does not depend on the eigenvalues.
A framework of Two-Dimensional Random Subspace Analysis (2DRSA) (Sanguansat et al., n.d.) was proposed to extend the original 2DPCA. The RSM is applied to the feature space of 2DPCA to generate a vast number of feature subspaces, constructed by an autonomous, pseudorandom procedure that selects a small number of dimensions from the original feature space. For an m by n feature matrix, there are 2^m such selections that can be made, and with each selection a feature subspace can be constructed. Individual classifiers are then created based only on the attributes in the chosen feature subspace. The outputs of the different individual classifiers are combined by uniform majority voting to give the final prediction.
The Two-Dimensional Random Subspace Analysis consists of two parts, 2DPCA and RSM. After the data samples are projected into the 2D feature space via 2DPCA, the RSM is applied, taking advantage of the high dimensionality of this space to obtain multiple lower-dimensional subspaces. A classifier is then constructed on each of those subspaces, and a combination rule is applied at the end for prediction on the test sample. In the 2DRSA algorithm, listed in Table 4, the image matrix A is projected into the feature space by the 2DPCA projection in Eq. (10). This feature space contains the data samples in matrix form, the m × d feature matrices Y of Eq. (10). The dimensions of the feature matrix Y depend on the height of the image (m) and the number of selected eigenvectors of the image covariance matrix G (d). The elements within each row of Y are sorted by eigenvalue, but the rows themselves are not ordered. This means the method can randomly pick some rows of the feature matrix Y to construct a new feature matrix Z. The dimension of Z is r × d, where normally r should be less than m. The results in Ho (1998b) show that, for a variety of data sets, adopting half of the feature components usually yields good performance.
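The row-sampling and voting steps can be sketched as follows (the function name, the nearest-neighbour base classifier, and the toy feature matrices are assumptions for illustration):

```python
import numpy as np
from collections import Counter

def rsa_predict(train_feats, train_labels, test_feat, r, n_subspaces, seed=0):
    """2DRSA sketch: random row subsets of the 2DPCA feature matrices, one
    nearest-neighbour classifier per subset, uniform majority voting."""
    rng = np.random.default_rng(seed)
    m = train_feats.shape[1]
    votes = []
    for _ in range(n_subspaces):
        rows = rng.choice(m, size=r, replace=False)     # pick r of m feature rows
        dists = [np.linalg.norm(Y[rows] - test_feat[rows]) for Y in train_feats]
        votes.append(train_labels[int(np.argmin(dists))])
    return Counter(votes).most_common(1)[0][0]

rng = np.random.default_rng(8)
train = np.concatenate([rng.standard_normal((10, 8, 4)),         # class 0
                        rng.standard_normal((10, 8, 4)) + 6.0])  # class 1
labels = np.array([0] * 10 + [1] * 10)
probe = rng.standard_normal((8, 4)) + 6.0                # resembles class 1
print(rsa_predict(train, labels, probe, r=4, n_subspaces=9))
```

Following the observation from Ho (1998b) quoted above, r is chosen here as half of m.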

Two-dimensional diagonal random subspace analysis (2D²RSA)
The extension of 2DRSA was proposed in Sanguansat et al. (2007b), namely the Two-Dimensional Diagonal Random Subspace Analysis (2D²RSA). It consists of two parts, i.e. DiaPCA and the RSM. Firstly, all images are transformed into diagonal face images as in Section 6.1. After the transformed image samples are projected to the 2D feature space via DiaPCA, the RSM is applied, taking advantage of the high dimensionality of this space to obtain multiple lower-dimensional subspaces. A classifier is then constructed on each of those subspaces, and a combination rule is applied in the end for prediction on the test sample. Similar to 2DRSA, the 2D²RSA algorithm is listed in Table 5.
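The only step that differs from 2DRSA is the initial diagonal-face transform. One common construction, sketched below as an assumption (the exact scheme is the one defined in Section 6.1, which may differ in shift direction and in the case where the image is wider than it is tall), circularly shifts row k of the image by k columns so that the diagonals of the original image become rows of the transformed one:

```python
import numpy as np

def diagonal_image(A):
    """Illustrative diagonal-face transform: circularly shift row k of A
    by k columns, so each diagonal of A becomes a row of the result.
    The precise derivation follows Section 6.1 and is assumed here."""
    return np.stack([np.roll(row, -k) for k, row in enumerate(A)])
```

After this transform, the 2DRSA pipeline (projection by Eq. (10), random row selection, nearest-neighbour classifiers, majority voting) is applied unchanged.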

Random subspace method-based image cross-covariance analysis
As discussed in Section 6.2, not all elements of the covariance matrix are used in 2DPCA. Although the image cross-covariance matrix can switch among these elements to formulate many versions of the image cross-covariance matrix, (m − 1)/m of the elements of the covariance matrix are still not considered at the same time. To integrate this information, the Random Subspace Method (RSM) can be used here by randomly selecting the number of shifts L to construct a set of multiple subspaces. This means each subspace is formulated from a different version of the image cross-covariance matrix. Individual classifiers are then created based only on the attributes in the chosen feature subspace. The outputs of the individual classifiers are combined by uniform majority voting to give the final prediction. Moreover, the RSM can be used again to construct subspaces corresponding to different numbers of basis vectors d. Consequently, the number of all random subspaces of ICCA reaches d × L. This means that applying the RSM to ICCA can construct more subspaces than 2DRSA. As a result, RSM-based ICCA can alternatively be regarded as a generalized 2DRSA.
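The random sampling of subspace parameters can be sketched as follows. This is an illustration under stated assumptions: the function names are invented, and `cross_cov` stands in for the ICCA image cross-covariance matrix G_L, here approximated as the mean cross-product between each centred image and its copy circularly shifted by L columns (the exact shifting scheme is the one in the ICCA definition).

```python
import numpy as np

rng = np.random.default_rng(1)

def cross_cov(images, L):
    """Assumed form of the image cross-covariance matrix G_L: mean
    cross-product of each centred image with its L-column shift."""
    mean = images.mean(axis=0)
    G = np.zeros((images.shape[2], images.shape[2]))
    for A in images:
        Ac = A - mean
        G += Ac.T @ np.roll(Ac, L, axis=1)  # shift columns by L
    return G / len(images)

def rsm_icca_subspaces(images, n_subspaces, d_max, L_max):
    """Each subspace pairs a random shift L with a random number of
    basis vectors d, so up to d_max * L_max distinct subspaces exist."""
    out = []
    for _ in range(n_subspaces):
        L = int(rng.integers(0, L_max + 1))
        d = int(rng.integers(1, d_max + 1))
        # G_L can have complex eigenvalues, so take an SVD instead of an EVD
        U, s, Vt = np.linalg.svd(cross_cov(images, L))
        out.append((L, d, U[:, :d]))  # basis: d leading singular vectors
    return out
```

A classifier would then be trained on each `(L, d)` subspace and the predictions combined by majority voting, exactly as in 2DRSA.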

Conclusions
This chapter presents the extensions of 2DPCA in several frameworks, i.e. bilateral projection, kernel methods, supervised approaches, alignment-based approaches and random approaches. All of these methods can improve the performance of traditional 2DPCA for image recognition tasks. The bilateral projection obtains the smallest feature matrix compared to the others. The class information can be embedded in the projection matrix by the supervised frameworks, which means the discriminant power should be increased. The alternate alignment of pixels in the image can reveal latent information which is useful for the classifier. The kernel-based 2DPCA can achieve the highest performance, but appropriate kernel parameters and a huge amount of memory are required to manipulate the kernel matrix, while the random subspace method is good for robustness.


Fig. 2. Illustration of the ways for deriving the diagonal face images: if the number of columns is more than the number of rows.

Fig. 5. The relationship of covariance and image covariance matrix.
While in ICCA, the eigenvalues of the image cross-covariance matrix, G_L, are complex numbers with non-zero imaginary parts. The Singular Value Decomposition (SVD) is therefore applied to this matrix instead of eigenvalue decomposition. Thus, the ICCA projection matrix contains a set of orthogonal basis vectors corresponding to the d largest singular values of the image cross-covariance matrix.
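In code, this SVD-based projection is a one-liner; the sketch below (with an invented function name) shows why it sidesteps the complex-eigenvalue issue: the singular vectors of an asymmetric real matrix are always real and orthonormal.

```python
import numpy as np

def icca_projection(G_L, d):
    """Orthogonal basis corresponding to the d largest singular values
    of the image cross-covariance matrix G_L. SVD is used because the
    asymmetric G_L can have complex eigenvalues, while its singular
    vectors are always real and orthogonal."""
    U, s, Vt = np.linalg.svd(G_L)  # singular values are sorted descending
    return U[:, :d]
```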

Fig. 6. The samples of shifted images on the ORL database.

Table 4. Two-Dimensional Random Subspace Analysis Algorithm.
S1: Project image, A, by Eq. (10).
S2: For i = 1 to the number of classifiers
S3: Randomly select an r-dimensional random subspace, Z_i^r, from Y (r < m).
S4: Construct the nearest neighbor classifier, C_i^r.
S5: End For
S6: Combine the outputs of the classifiers by majority voting.

Table 5. Two-Dimensional Diagonal Random Subspace Analysis Algorithm.
S1: Transform images into diagonal images.
S2: Project image, A, by Eq. (10).
S3: For i = 1 to the number of classifiers
S4: Randomly select an r-dimensional random subspace, Z_i^r, from Y (r < m).
S5: Construct the nearest neighbor classifier, C_i^r.
S6: End For
S7: Combine the outputs of the classifiers by majority voting.