MSTAR images comprising training set
Recently, super-resolution reconstruction (SRR) in low-dimensional face subspaces has been proposed for face recognition. The face subspace, also known as the eigenface subspace, is extracted using principal component analysis (PCA). One disadvantage of the features reconstructed from the super-resolution face subspace is that they carry no class information. To remedy this problem, this chapter first discusses two novel methods for super-resolution reconstruction of discriminative features, namely class-specific subspaces and discriminant analysis of principal components, both of which aim to improve the discriminant power of recognition systems. Next, we discuss two-dimensional principal component analysis (2DPCA), also referred to as image PCA. We suggest a new reconstruction algorithm that replaces PCA with 2DPCA when extracting the super-resolution subspace for face and automatic target recognition. Our experimental results on the Yale and ORL face databases are very encouraging. Furthermore, the performance of the proposed approach is also tested on the MSTAR database.
In general, the fidelity of data, feature extraction, discriminant analysis, and the classification rule are the four basic elements of face and target recognition systems. One way to improve the efficacy of a recognition system is to enhance the fidelity of the noisy, blurred, and undersampled images captured by surveillance imagers. Regarding the fidelity of data, when the resolution of the captured image is too low, the detail information becomes too limited, leading to severely poor decisions in most existing recognition systems. Fortunately, super-resolution reconstruction algorithms (Park et al., 2003) can reconstruct a high-resolution (HR) image from an undersampled image sequence of the original scene with pixel displacements among the images. This HR image is then used as the input to the recognition system in order to improve recognition performance. In fact, super-resolution can be considered the numerical and regularization study of the ill-conditioned, large-scale problem that describes the relationship between low-resolution (LR) and HR pixels (Nguyen et al., 2001).
On the one hand, feature extraction aims at reducing the dimensionality of the face or target image so that the extracted feature is as representative as possible. On the other hand, super-resolution aims at visually increasing the dimensionality of the face or target image. When super-resolution methods are applied in the pixel domain (Lin et al., 2005; Wagner et al., 2004), the performance of face and target recognition improves. However, with the emphasis on reducing computational complexity and improving robustness to registration error and noise, current research in face recognition is focusing on eigenface super-resolution (Gunturk et al., 2003; Jia & Gong, 2005; Sezer et al., 2006).
The essential idea of eigen-domain based super-resolution using the 2D eigenface instead of the conventional 1D eigenface is to overcome three major problems in face recognition systems: the curse of dimensionality, the prohibitive computational cost of the singular value decomposition for visually improved high-quality images, and the breaking of the natural structure and correlation in the original data.
In Section 2, the basics of super-resolution in the low-dimensional framework are briefly explained. Discriminant approaches are then detailed in Section 3 with the purpose of increasing the discrimination power of eigen-domain based super-resolution. In Section 4, the implementation of two-dimensional eigen-domain based super-resolution is addressed.
We also discuss the possible extension of two-dimensional eigen-domain based super-resolution with discriminant information in Section 5. Finally, Section 6 provides the experimental results on the Yale and ORL face databases and the MSTAR non-face database.
2. Eigenface-domain super-resolution
The fundamentals of super-resolution in the low-dimensional face subspace are formulated here. The image super-resolution model and its eigenface-domain based reconstruction are important because they underpin the practical extensions to one- and two-dimensional super-resolved discriminant face subspaces in the following sections.
2.1. Image super-resolution model
According to the numerical SRR framework (Nguyen et al., 2001), the relationship between an HR image and a set of LR images can be formulated in matrix form as follows:
$$y_k = D B E_k x + n_k, \qquad k = 1, \ldots, p \qquad (1)$$
where $p$ is the number of available frames, $y_k$ and $x$ are vectors extracted from the $k$th LR image frame and the HR image in lexicographical order, respectively, $D$ is the down-sampling operator, $B$ is the blurring or averaging operator, $E_k$ is the affine transform, and $n_k$ is the noise of frame $k$.
Thus, we can reformulate (Eq. 1) as
The above equation can be solved as an inverse problem with a regularization term, or
It should be noted that the matrix H is a very large sparse matrix. As a result, an analytic solution for x is very hard to find. A popular method for solving this kind of inverse problem is the conjugate gradient method.
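As a minimal sketch of such a solve, assume the stacked model $y = Hx + n$ and a Tikhonov regularizer $\lambda \|x\|^2$ (both are assumptions made here for illustration), so that the estimate satisfies the normal equations $(H^T H + \lambda I)x = H^T y$. The hand-written conjugate gradient loop below only needs matrix-vector products, which is what makes it practical for a large sparse $H$. The function names and the small dense toy matrix are hypothetical.

```python
import numpy as np

def conjugate_gradient(apply_A, b, n_iter=200, tol=1e-10):
    """Solve A x = b for a symmetric positive-definite A, given only v -> A v."""
    x = np.zeros_like(b)
    r = b.copy()            # residual b - A x for the zero initial guess
    d = r.copy()            # current search direction
    rs_old = r @ r
    for _ in range(n_iter):
        if np.sqrt(rs_old) < tol:
            break
        Ad = apply_A(d)
        alpha = rs_old / (d @ Ad)
        x = x + alpha * d
        r = r - alpha * Ad
        rs_new = r @ r
        d = r + (rs_new / rs_old) * d
        rs_old = rs_new
    return x

def solve_regularized(H, y, lam):
    """Minimize ||y - H x||^2 + lam * ||x||^2 through the normal equations."""
    apply_A = lambda v: H.T @ (H @ v) + lam * v   # never forms H^T H explicitly
    return conjugate_gradient(apply_A, H.T @ y)

# Toy problem: a small dense H stands in for the large sparse operator.
rng = np.random.default_rng(0)
H = rng.standard_normal((40, 16))
x_true = rng.standard_normal(16)
y = H @ x_true
x_hat = solve_regularized(H, y, lam=1e-3)
```

In a real setting H would be stored in a sparse format and `apply_A` would wrap its sparse products; the loop itself would not change.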
2.2. Reconstruction algorithm
A common preprocessing step in pattern recognition and in compression schemes is dimensionality reduction of the data. In image analysis, PCA is one of the most popular methods for dimensionality reduction. Let $\Phi$ be the matrix of optimal eigenfaces, which removes redundancy by decorrelating the image data x. The eigenfaces are stored in its columns, and the face image x is assumed to be vectorized. Thus, the optimal image representation of x can be written as
$$x = \Phi a + e_x$$
where a is the low-dimensional feature vector that represents x, and e_x is its representation error. Let $\Psi_k$ be the matrix that contains the eigenfaces of the kth LR image frame, where the scaling resolution factor lies in the range 0 to 1 and N is the total number of face image pixels. We can formulate the low-resolution image representation as
It is easy to derive the following equation
By considering the second and third terms as the observation noise with Gaussian distribution (Gunturk et al., 2003), we can obtain
Without loss of generality, we can numerically solve for the true super-resolution feature vector at the eigen-domain level as in (Eq. 5), or
where $\lambda$ is the regularization term. In particular, we introduce the notation p in (Eq. 14) in order to differentiate the PCA-domain based super-resolution approach (Gunturk et al., 2003) from our proposed approaches, which are presented in the upcoming sections.
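The eigen-domain reconstruction can be sketched as follows, under assumptions that are not taken from the text: the HR eigenface matrix is called `Phi`, the LR eigenface matrix of frame k is `Psis[k]`, the observation matrix of frame k is `Hs[k]`, and the regularizer is $\lambda \|a\|^2$. Each LR frame is projected to its feature $a_k = \Psi_k^T y_k$, the HR feature is related to it through $\Psi_k^T H_k \Phi$, and the remaining terms are treated as noise, so only a small system with as many unknowns as eigenfaces has to be solved.

```python
import numpy as np

def eigen_domain_sr(Phi, Psis, Hs, ys, lam=1e-8):
    """Estimate the HR feature vector a from p LR frames.

    Phi  : (N, L) HR eigenfaces with orthonormal columns
    Psis : list of (n, L_k) LR eigenface matrices, one per frame
    Hs   : list of (n, N) observation matrices, one per frame
    ys   : list of (n,) LR frames
    """
    L = Phi.shape[1]
    lhs = lam * np.eye(L)
    rhs = np.zeros(L)
    for Psi_k, H_k, y_k in zip(Psis, Hs, ys):
        a_k = Psi_k.T @ y_k            # LR feature of frame k
        M_k = Psi_k.T @ H_k @ Phi      # maps the HR feature to the LR feature
        lhs += M_k.T @ M_k
        rhs += M_k.T @ a_k
    return np.linalg.solve(lhs, rhs)   # only an L x L system

# Toy check with synthetic subspaces (not real face data).
rng = np.random.default_rng(1)
N, n, L, p = 64, 16, 5, 4
Phi, _ = np.linalg.qr(rng.standard_normal((N, L)))
a_true = rng.standard_normal(L)
Hs = [rng.standard_normal((n, N)) for _ in range(p)]
Psis = [np.linalg.qr(H_k @ Phi)[0] for H_k in Hs]   # LR bases spanning H_k Phi
ys = [H_k @ Phi @ a_true for H_k in Hs]
a_hat = eigen_domain_sr(Phi, Psis, Hs, ys)
```

The toy data are noise-free and the LR bases are built to span the projected HR subspace exactly, so the recovered feature matches the true one; with real eigenfaces the representation errors act as the observation noise discussed above.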
3. Discriminant face subspaces
PCA and its eigenface extension are constructed around the criterion of preserving the data distribution. Hence, they are well suited for face representation and for reconstruction from the projected face feature. However, PCA is not an efficient classification method because the between-class relationship is neglected. Here, we discuss how discriminant information can be embedded into eigenface-domain based super-resolution.
3.1. Face-specific subspace super-resolution
As is widely known, eigen-domain based face recognition methods use subspace projections that do not consider class label information. The eigenface criterion chooses the face subspace (coordinates) as a function of the data distribution that yields the maximum covariance over all sample data. However, the coordinates that maximize the scatter of the data from all training samples might not be adequate for discriminating between classes. In a recognition task, a projection that includes discriminant information between classes is always preferred. One extension of the eigenface, called the face-specific subspace (FSS) (Shan, 2003), was proposed as an alternative feature extraction method that includes class information for face recognition. According to FSS, each reduced-dimensional basis of a class-specific subspace (CSS) is learned from the training samples of the same class. Each individual CSS thus optimally represents the data within its own class with negligible error. Conversely, a large representation error occurs when the input data is projected and then reconstructed using a reduced set of coordinates with lower covariance (or, equivalently, using a set of principal components that does not belong to the input class). The reconstruction error obtained from this projection-reconstruction process, called the distance from CSS (DFCSS), can therefore be used as a distance for classifying the input data: the smaller the DFCSS, the higher the probability that the input data belongs to the corresponding class. Similar work based on FSS (Belhumeur, 1997), which has attracted wide attention in the face recognition community, has also been published recently.
The original face-specific subspace (FSS) was proposed to manipulate the conventional eigenface in order to improve recognition performance. The difference between FSS and the traditional method is that the covariance matrix of the pth class is evaluated individually from the training samples of the pth class. Thus, the pth FSS is represented as a 4-tuple: the projection matrix, the mean of the pth class, the eigenvalues of the covariance matrix, and the dimension of the pth CSS. For identification, the input sample is projected onto every CSS and then reconstructed from it. The input sample is assigned to the pth class if the reconstruction error obtained from the pth CSS, i.e., the DFCSS, is the minimum.
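The identification rule can be illustrated with a short sketch. It is a simplified reading, not the original FSS implementation: each class keeps only its mean and a fixed number of principal directions (the eigenvalues of the 4-tuple are omitted), and the function names `fit_css`, `dfcss`, and `classify` are hypothetical.

```python
import numpy as np

def fit_css(samples, n_components):
    """Learn one class-specific subspace from the samples of a single class.

    samples : (n_samples, n_features) array. Returns (class mean, basis).
    """
    mean = samples.mean(axis=0)
    # Rows of Vt are the principal directions of the centered class data.
    _, _, Vt = np.linalg.svd(samples - mean, full_matrices=False)
    return mean, Vt[:n_components].T

def dfcss(x, css):
    """Distance from CSS: norm of the projection-reconstruction error."""
    mean, W = css
    centered = x - mean
    reconstruction = W @ (W.T @ centered)
    return np.linalg.norm(centered - reconstruction)

def classify(x, css_list):
    """Assign x to the class whose subspace reconstructs it best."""
    return int(np.argmin([dfcss(x, css) for css in css_list]))

# Toy data: two classes, each lying in its own 2-D affine subspace of R^10.
rng = np.random.default_rng(2)
css_list, bases, means = [], [], []
for _ in range(2):
    basis = rng.standard_normal((10, 2))
    mean = rng.standard_normal(10)
    samples = mean + rng.standard_normal((6, 2)) @ basis.T
    css_list.append(fit_css(samples, n_components=2))
    bases.append(basis)
    means.append(mean)

probe = means[1] + np.array([0.5, -1.0]) @ bases[1].T   # drawn from class 1
predicted = classify(probe, css_list)
```

Only one distance per class is computed, which reflects the storage and computation argument made for CSS.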
There are many advantages of using CSS in face and target recognition. For example, each transformation matrix is trained from samples within its own class, so it represents the samples of that class more efficiently (using fewer components) than a transformation matrix trained on samples from all classes. Additionally, since DFCSS is the distance between the original image and its reconstruction obtained from the CSS, memory is needed only for storing the C transformation matrices, where C is the number of classes. This is far less than in conventional subspace methods, where we need to store both a single all-classes transformation matrix and its prototypes (a large set of feature vectors calculated for all training samples). Moreover, the number of distance calculations in CSS is smaller than in conventional methods, since the number of classes is usually smaller than the number of training samples.
By combining the super-resolution reconstruction approach with the class-specific idea, a new method for face and automatic target recognition is proposed.
3.2. Discriminant analysis of principal components
The PCA criterion chooses the subspace as a function of the data probability distribution, while linear discriminant analysis (LDA) chooses the subspace that yields the maximal inter-class distance while keeping the intra-class distance small. In general, LDA extracts features that are better suited to classification. Both techniques project the vector representing a face image onto a lower-dimensional subspace, so each 2D face image matrix must first be transformed into a vector, and the collection of transformed face vectors is then concatenated into a matrix.
This PCA and LDA implementation causes three major problems in pattern recognition. First, the covariance matrix, which is built from high-dimensional feature vectors, leads to the curse of dimensionality and, in turn, to very demanding computation in terms of both memory and time. Second, the spatial structure information can be lost when column-stacking vectorization and image resizing are applied. Finally, especially in face recognition, the number of available training samples is relatively small compared to the feature dimension, so the covariance matrix estimated from these features tends to be singular. This is referred to as the singularity problem or the small sample size (SSS) problem. As a supervised technique, LDA in particular has a tendency to overfit because of the SSS problem.
Various solutions have been proposed for solving the SSS problem. Among these LDA extensions, Fisherface and the discriminant analysis of principal components framework (Zhao, 1998) demonstrate a significant improvement by applying LDA over the principal components subspace, since PCA and LDA can each overcome the drawbacks of the other.
It has also been noted that LDA faces two drawbacks when applied directly to the original input space. First, some non-face information, such as the image background, is regarded by LDA as discriminant information. This causes misclassification when the face of the same subject is presented against a different background. Second, the within-class scatter matrix tends to be singular when the SSS problem occurs. Projecting the high-dimensional input space into a low-dimensional subspace via PCA first can overcome these shortcomings of LDA. In other words, class information should be added to PCA by incorporating LDA.
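A compact sketch of this two-stage idea is given below. It is a generic PCA-then-LDA pipeline written for illustration, not the exact procedure of (Zhao, 1998): the number of retained components, the function name `pca_lda`, and the synthetic data are all assumptions. The LDA stage solves the generalized eigenvalue problem $S_b w = \lambda S_w w$ inside the PCA subspace, where the within-class scatter is no longer singular.

```python
import numpy as np

def pca_lda(X, labels, n_pca, n_lda):
    """Return (global mean, combined projection) for PCA followed by LDA.

    X : (n_samples, n_features) data, labels : (n_samples,) class ids.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W_pca = Vt[:n_pca].T                 # (n_features, n_pca)
    Z = Xc @ W_pca                       # PCA features, zero global mean

    Sw = np.zeros((n_pca, n_pca))        # within-class scatter
    Sb = np.zeros((n_pca, n_pca))        # between-class scatter
    for c in np.unique(labels):
        Zc = Z[labels == c]
        mc = Zc.mean(axis=0)
        Sw += (Zc - mc).T @ (Zc - mc)
        Sb += len(Zc) * np.outer(mc, mc)

    # Generalized eigenproblem Sb w = lambda Sw w, solved as eig(Sw^-1 Sb).
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-evals.real)[:n_lda]
    W_lda = evecs[:, order].real
    return mean, W_pca @ W_lda

# Toy data: 3 well-separated classes, 8 samples each, in 20 dimensions.
rng = np.random.default_rng(3)
labels = np.repeat(np.arange(3), 8)
centers = 5.0 * rng.standard_normal((3, 20))
X = centers[labels] + 0.5 * rng.standard_normal((24, 20))
mean, W = pca_lda(X, labels, n_pca=5, n_lda=2)
Z = (X - mean) @ W                       # discriminant features
```

With 24 samples in 20 dimensions the within-class scatter in the input space has rank at most 21 minus nothing useful for inversion stability, whereas in the 5-dimensional PCA subspace it is comfortably invertible, which is the point of applying PCA first.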
3.2.1. Proposed reconstruction algorithm
Here, we can obtain a linear projection which maps the HR input image x first into the face subspace, and finally into the classification space z. Thus, we can modify the equation (Eq. 5) to be
whereis the optimal discrimination projection obtained from solving the generalized eigenvalue problem:
With a little manipulation, we can formulate the discriminant analysis of principal components based super-resolution as
4. Two-dimensional eigen-domain based super-resolution
Recently, Yang et al. (2004) proposed an original technique called two-dimensional principal component analysis (2DPCA), in which the image covariance matrix is computed directly from the image matrices so that the spatial structure information is preserved. One benefit of this method is that the dimension of the covariance matrix equals only the width of the face image (or the height, in a variant of 2DPCA). This is much smaller than the size of the covariance matrix estimated in PCA. Therefore, the image covariance matrix can be better estimated, with full rank, when few training examples are available, as in face recognition.
We now consider a linear projection of the form
$$Y = A X$$
where $A$ represents any face image in its original matrix form, $x_1, \ldots, x_d$ are the eigenvectors corresponding to the $d$ largest eigenvalues, which together form $X = [x_1, \ldots, x_d]$, and $Y$ is the projected HR feature of this image on $X$, called the principal component matrix. The criterion used for obtaining the eigenvectors in (Eq. 21) is described in detail by Yang et al. (2004) and Sanguangsat (2006).
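The projection described above can be sketched in a few lines, following the usual formulation of 2DPCA in which the image covariance matrix is the average of the products of each centered image matrix (transposed) with itself. The function names and the random toy images are assumptions for illustration.

```python
import numpy as np

def fit_2dpca(images, d):
    """Fit 2DPCA on a stack of images.

    images : (M, h, w) array. Returns (mean image, (w, d) projection matrix).
    """
    mean = images.mean(axis=0)
    centered = images - mean
    # Image covariance matrix (w x w): average over images of A^T A.
    G = np.einsum("mhi,mhj->ij", centered, centered) / len(images)
    evals, evecs = np.linalg.eigh(G)      # eigenvalues in ascending order
    proj = evecs[:, ::-1][:, :d]          # eigenvectors of the d largest eigenvalues
    return mean, proj

def project_2dpca(image, proj):
    """Return the (h, d) principal component matrix of one image."""
    return image @ proj

# Toy stack of ten 8 x 6 images.
rng = np.random.default_rng(4)
images = rng.random((10, 8, 6))
mean_image, proj = fit_2dpca(images, d=3)
feature = project_2dpca(images[0], proj)
```

Note that the eigen-decomposition involves only a 6 x 6 matrix here (the image width), regardless of the image height, which is the size advantage discussed above.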
4.1. Alternative image super-resolution model
LR and HR images can be simply related as (Vijay, 2008)
where p is the number of available frames, the two operator matrices are downsampling matrices, and the two image matrices come from the kth LR image frame and the HR image, respectively. It should be noted that a two-dimensional Gaussian blur can be represented by using the two separate operator matrices together. An extension to downsampling and affine transforms can also easily be made by placing the elements of the matrices properly (Gsmooth, n.d.). It should also be noted that both the input LR and HR images are represented in their original matrix form. We do not transform the LR and HR images into vectors in lexicographical order as in (Eq. 1).
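To make the matrix-form model concrete, the sketch below assumes it takes the separable form "LR image = left operator x HR image x right operator transposed", with each operator built as a decimation matrix times a 1-D Gaussian blur matrix. This exact form, the names `left_op` and `right_op`, and the border handling are illustrative assumptions.

```python
import numpy as np

def blur_matrix(n, sigma=1.0, radius=3):
    """n x n matrix applying a truncated 1-D Gaussian blur (rows sum to one)."""
    idx = np.arange(n)
    dist = idx[:, None] - idx[None, :]
    B = np.exp(-0.5 * (dist / sigma) ** 2)
    B[np.abs(dist) > radius] = 0.0
    return B / B.sum(axis=1, keepdims=True)

def decimation_matrix(n, factor):
    """(n // factor) x n matrix keeping every `factor`-th sample."""
    m = n // factor
    D = np.zeros((m, n))
    D[np.arange(m), np.arange(m) * factor] = 1.0
    return D

h, w, factor = 16, 12, 4
left_op = decimation_matrix(h, factor) @ blur_matrix(h)    # acts along columns
right_op = decimation_matrix(w, factor) @ blur_matrix(w)   # acts along rows

hr = np.random.default_rng(0).random((h, w))   # stand-in HR image matrix
lr = left_op @ hr @ right_op.T                 # LR image, still in matrix form
```

Applying the two small operators on either side of the image gives a separable 2-D blur followed by decimation without ever vectorizing the image.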
4.2. Proposed reconstruction algorithm
where the eigenvectors corresponding to the d largest eigenvalues form the LR projection matrix, and the result is the projected LR feature of the image on that projection matrix.
Without loss of generality,
It is easy to derive the following equation
It should be noted that the super-resolution feature here is a feature matrix, unlike the feature vector a of Section 2. Thus, it is a little more complicated to solve the inverse problem for the super-resolution feature matrix. By applying the vec operator as presented in Kumar and Schott (Kumar, 2008; Schott, 2005),
(Eq. 26) can be rewritten as
where $\lambda$ is the regularization term. Thus, after converting the solution vector back to matrix form, we obtain the desired super-resolution feature matrix.
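The vectorization step rests on the standard identity $\mathrm{vec}(AXB) = (B^T \otimes A)\,\mathrm{vec}(X)$, where vec stacks the columns of a matrix. The sketch below checks the identity numerically and uses it to recover an unknown matrix from a two-sided linear observation with a small Tikhonov term. The generic names `A`, `B`, `Y` and the helper functions are placeholders, not the chapter's notation.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return M.reshape(-1, order="F")

def unvec(v, shape):
    """Inverse of vec for a matrix of the given shape."""
    return v.reshape(shape, order="F")

def solve_feature_matrix(A, B, Y, lam=1e-10):
    """Minimize ||Y - A X B||_F^2 + lam * ||X||_F^2 over the matrix X."""
    K = np.kron(B.T, A)                   # vec(A X B) = K vec(X)
    n = K.shape[1]
    x = np.linalg.solve(K.T @ K + lam * np.eye(n), K.T @ vec(Y))
    return unvec(x, (A.shape[1], B.shape[0]))   # back to matrix form

rng = np.random.default_rng(5)
A = rng.standard_normal((8, 5))
B = rng.standard_normal((4, 7))
X_true = rng.standard_normal((5, 4))
Y = A @ X_true @ B
X_hat = solve_feature_matrix(A, B, Y)
```

Forming the Kronecker product explicitly is affordable only because the feature matrices are small; that is the regime in which the eigen-domain approach operates.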
5. Extensions to two-dimensional linear discriminant analysis of principal component matrix
Like PCA, 2DPCA is more suitable for face representation than for face recognition. For better performance in recognition tasks, LDA is necessary. Unfortunately, the linear transformation of 2DPCA reduces only the size of the rows. Hence, if we apply LDA directly to the 2DPCA output, the number of rows still equals the height of the original image, and we still face the singularity problem in LDA. Thus, a modified LDA, called two-dimensional linear discriminant analysis (2DLDA), based on the 2DPCA concept, is proposed to overcome the SSS problem. Applying 2DLDA to 2DPCA not only solves the SSS problem and the curse of dimensionality but also allows us to work directly on the image matrix in all projections. In this way, the spatial structure information is maintained. Moreover, the SSS problem is remedied because the size of every scatter matrix cannot be greater than the width of the face image. Our research group (Sanguangsat, 2006) was the first to extend the discriminant analysis of principal components of Section 3.2 by a two-dimensional projection, called two-dimensional linear discriminant analysis of principal component matrix.
6. Experimental results
Assuming that the frame-to-frame motion is perfectly known, we can use this information to form the proper super-resolution matrix equation in (Eq. 5). In our experimental settings, the evaluation images were shifted by a uniform random integer, blurred with a Gaussian point spread function with a standard deviation of 1, and downsampled by a factor of four to produce 16 low-resolution images for each high-resolution image. Using 9 (preselected) of the complete set of 16 frames of each image, we construct the super-resolution subspaces and the super-resolution images, respectively. Our super-resolution subspace approach is then compared with the pixel-domain super-resolution approach using the class-specific subspace for face and automatic target recognition. Here, we conduct and show experiments for the algorithm proposed in Subsection 3.1 only. Experiments on the other reconstruction algorithms, i.e., discriminant analysis of principal components, two-dimensional eigenface-domain based super-resolution, and 2DLDA of 2DPCA, are ongoing, and we expect encouraging recognition results.
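The degradation protocol can be sketched as follows. Several details are assumptions made only for this illustration: the integer shifts are drawn from 0 to 3 pixels and applied circularly, the Gaussian kernel is truncated at three standard deviations, and borders are zero-padded by the convolution.

```python
import numpy as np

def gaussian_kernel(sigma=1.0, radius=3):
    """Normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    t = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (t / sigma) ** 2)
    return k / k.sum()

def blur(img, kernel):
    """Separable blur: filter every row, then every column (same output size)."""
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, rows)

def make_lr_frames(hr, n_frames=16, factor=4, sigma=1.0, seed=0):
    """Shift, blur, and downsample one HR image into a stack of LR frames."""
    rng = np.random.default_rng(seed)
    kernel = gaussian_kernel(sigma)
    frames, shifts = [], []
    for _ in range(n_frames):
        dy, dx = (int(s) for s in rng.integers(0, factor, size=2))
        shifted = np.roll(hr, (dy, dx), axis=(0, 1))    # circular integer shift
        frames.append(blur(shifted, kernel)[::factor, ::factor])
        shifts.append((dy, dx))                         # known motion per frame
    return np.stack(frames), shifts

hr = np.random.default_rng(6).random((32, 32))   # stand-in HR image
frames, shifts = make_lr_frames(hr)
```

The returned shifts play the role of the perfectly known frame-to-frame motion assumed in the experiments.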
6.1. Evaluation databases
The eigenface-domain super-resolution method is used as the baseline for comparison on the well-known Yale and AR face databases (Yale, 1997; Martinez, 1998) and the MSTAR non-face database (Center, 1997).
6.1.1. Yale database
The Yale database contains 165 images of 15 subjects. There are 11 images per subject, one for each of the following facial expressions or configurations: center-light, with glasses, happy, left-light, without glasses, normal, right-light, sad, sleepy, surprised, and wink. All sample images of one person from the Yale database are shown in Fig. 1. Each image was manually cropped and resized to pixels. In all experiments, five image samples (centerlight, glasses, happy, leftlight, and noglasses) are used for training, and the six remaining images (normal, rightlight, sad, sleepy, surprise, and wink) are used for testing.
6.1.2. AR database
The AR face database was created by Aleix Martinez and Robert Benavente in the Computer Vision Center (CVC) at the U.A.B. It contains over 4,000 color images corresponding to the faces of 126 people (70 men and 56 women). The images feature frontal-view faces with different facial expressions, illumination conditions, and occlusions (sunglasses and scarf). The pictures were taken at the CVC under strictly controlled conditions. No restrictions on wear (clothes, glasses, etc.), make-up, hair style, etc. were imposed on participants. Each person participated in two sessions, separated by two weeks (14 days). The same pictures were taken in both sessions.
In our experiments, only the 14 images without occlusions (sunglasses and scarf) are used for each subject, as shown in Fig. 2. All images were manually cropped and resized to pixels, and then converted to 256-level grayscale images. The first five images per subject are used for training, and the remaining images for testing.
6.1.3. MSTAR database
The MSTAR public release data set contains high-resolution synthetic aperture radar (SAR) data collected by the DARPA/Wright Laboratory Moving and Stationary Target Acquisition and Recognition (MSTAR) program. The data set contains SAR images of three different types of military vehicles, i.e., BMP2 armored personnel carriers (APCs), BTR70 APCs, and T72 tanks. Sample images from the MSTAR database are shown in Fig. 3. Because the MSTAR database is large, all images were, for now, centrally cropped to pixels for evaluation purposes.
Tables 1 and 2 detail the training and testing sets, where the depression angle is the look angle at which the antenna beam at the side of the aircraft points at the target. Because its SAR images were acquired at different depression angles and at different times, the testing set can be used as a representative sample of the SAR images of the targets for testing recognition performance.
Table 1 (training set)
| Vehicle No. | Serial No. | Depression Angle | Images |
Table 2 (testing set)
| Vehicle No. | Serial No. | Depression Angle | Images |
6.2. Class-specific subspace results
The class-specific super-resolution images reconstructed for classification with the pixel-domain and eigen-domain based approaches are shown in Fig. 4 and Fig. 5, respectively. The images in the first column are the input test images. The images in the second to sixth columns are the class-specific super-resolution reconstructions obtained from the five corresponding sets of class-specific eigenfaces. Here, we show five class-specific units, so five reconstructed images are obtained from each input image. An input whose reconstruction has the least error at the ith class-specific unit is assigned to the ith class. It should be noted that the images reconstructed using the pixel-domain based super-resolution approach give a good perceptual view. For the eigen-domain based approach, the fourth and fifth input images also give good perceptual views, while the others give comparable reconstruction results. Thus, the reconstructed images based on the class-specific super-resolution subspace depend more strongly on their corresponding eigenvectors.
Tables 3 and 4 show the confusion matrices of the MSTAR target recognition. As shown in Table 5, the performance of the pixel-domain based super-resolution method is slightly better than that of our proposed method. However, our method offers a great benefit in terms of computation. We can derive the principal component coefficients of the face databases using a simple inversion of a very small matrix, because we use the inner product approach to calculate the PCA coefficients. Thus, our algorithm is far faster than implementing super-resolution in the pixel domain, where a very large and sparse system has to be solved using the conjugate gradient method. On the MSTAR database, we found that the class 2 target cannot be recognized at all. This may be because the low-resolution test images are too small. If we increase the size of the test images, we expect better recognition accuracy.
In this chapter we have conducted experiments on face and automatic target recognition by focusing on the eigenface-domain based super-resolution implementations. We have also presented an extensive literature survey on the subject of more advanced and/or discriminant eigenface subspaces. From our discussion, several new super-resolution reconstruction algorithms have been proposed here.
In particular, several new eigenface-domain super-resolution algorithms are suggested as follows
Class-specific face subspace based super-resolution is proposed in Subsection 3.1
Equation (Eq. 18) is used for including discriminant analysis of principal components for extracting face feature for eigenface-domain super-resolution
Equation (Eq. 28) is used for two-dimensional eigenface-domain super-resolution
Two-dimensional eigenface in Equation (Eq. 28) is proposed to be replaced by two-dimensional linear discriminant analysis of principal component matrix
Current research in face and automatic target recognition has yet to utilize the full potential of these techniques. While preparing this chapter, we realized that there are many studies and comparisons that should be conducted to gain more understanding of the variants of eigenface-domain based super-resolution. For example, recognition accuracy should be compared between majority voting over multiple low-resolution eigenfaces and a single super-resolved eigenface. In this way, we can relate recognition from a set of LR faces to a multiple classifier system. Furthermore, all of the proposed algorithms use a two-stage approach: dimensionality reduction is implemented first, and super-resolution enhancement is performed afterwards. It may be more promising to study joint dimensionality reduction and resolution enhancement. This idea is quite similar to joint source-channel coding, which is a very popular approach for transmitting data over networks. Specifically, we are thinking about computing certain desired eigenfaces and then super-resolving the computed eigenfaces on the fly. This approach also tends to be more biologically plausible.
The MSTAR data sets were provided through the Center for Imaging Science, Johns Hopkins University, under contract ARO DAAH049510494. This work was partially supported by the Thailand Research Fund (TRF) under grant number MRG 5080427.