The Maximum Non-Linear Feature Selection of Kernel Based on Object Appearance

Principal component analysis (PCA) is a linear feature extraction method, also known as the Karhunen-Loève method. PCA was first applied to face recognition by Turk and Pentland in 1991, where it became known as eigenfaces [Turk, 1991]. However, PCA has some weaknesses. First, it cannot capture even the simplest invariance of a face image [Arif et al., 2008b] when this information is not provided in the training data. Second, the features it extracts describe only the global structure [Arif, 2008]. Because PCA is very simple and overcomes the curse of dimensionality, several researchers have extended it for face recognition, for example Linear Discriminant Analysis (LDA) [Yambor, 2000; A.M. Martinez, 2003; J.H.P.N. Belhumeur 1998], Linear Preserving Projection (LPP), also known as Laplacianfaces [Cai, 2005; Cai et al, 2006; Kokiopoulou, 2004; X. He et al., 2005], Independent Component Analysis (ICA), Kernel Principal Component Analysis [Schölkopf et al., 1998; Schölkopf 1999], Kernel Linear Discriminant Analysis (KLDA) [Mika, 1999] and maximum feature value selection of nonlinear function based on Kernel PCA [Arif et al., 2008b]. PCA is a dimensionality reduction method based on object appearance: it projects an original n-dimensional image (n = rows × columns) onto k eigenfaces, where k ≪ n. Although PCA has been extended into several other methods, in some cases PCA can still outperform LDA, LPP and ICA when the sample size is small.


Principal component analysis in input space
Over the last two decades, many subspace algorithms have been developed for feature extraction. One of the most popular is Principal Component Analysis (PCA) [Arif et al., 2008a; Jon, 2003; A.M. Martinez and A.C. Kak, 2001; M. Kirby and L. Sirovich, 1990; M. Turk and A. Pentland, 1991]. PCA overcomes the curse of dimensionality in object recognition, because it drastically reduces the number of features needed to characterize an object. Therefore, PCA is still used as a reference when developing new feature extraction methods.
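Suppose a training set contains m images $X^{(k)}$, $k \in 1..m$, each of size h×w, with row index $H \in 1..h$ and column index $W \in 1..w$. Each training image is represented as:

$$X^{(k)} = \begin{bmatrix} x^{(k)}_{1,1} & \cdots & x^{(k)}_{1,w} \\ \vdots & \ddots & \vdots \\ x^{(k)}_{h,1} & \cdots & x^{(k)}_{h,w} \end{bmatrix} \tag{1}$$

Concatenating the rows of each image gives a row vector of length n = h·w:

$$x^{(k)} = \begin{bmatrix} x^{(k)}_{1} & x^{(k)}_{2} & \cdots & x^{(k)}_{n} \end{bmatrix} \tag{2}$$

To express the whole set of m training images, the vectors of Equation (2) are stacked into an m×n matrix:

$$X = \begin{bmatrix} x_{1,1} & \cdots & x_{1,n} \\ \vdots & \ddots & \vdots \\ x_{m,1} & \cdots & x_{m,n} \end{bmatrix} \tag{3}$$

The average of the training set in Equation (3) is obtained by column-wise summation:

$$\bar{x}_{N} = \frac{1}{m}\sum_{M=1}^{m} x_{M,N}, \qquad N \in 1..n \tag{4}$$

The result of Equation (4) is a row vector of size 1×n, which can be written as:

$$\bar{x} = \begin{bmatrix} \bar{x}_{1} & \bar{x}_{2} & \cdots & \bar{x}_{n} \end{bmatrix} \tag{5}$$

Equation (5) is then replicated m times, so that it has as many rows as X. The zero-mean matrix is:

$$\Phi_{M,N} = x_{M,N} - \bar{x}_{N}, \qquad M \in 1..m,\; N \in 1..n \tag{6}$$

The covariance matrix is then computed as:

$$C = \Phi\,\Phi^{T} \tag{7}$$

As shown in Equation (7), C has size m×m, and since m ≪ n the eigenproblem stays small. To obtain the principal components, the eigenvalues and eigenvectors of C are computed from:

$$C\,\Lambda = \lambda\,\Lambda \tag{8}$$

where λ and Λ represent the eigenvalues and eigenvectors of C, respectively. The eigenvalues form the diagonal matrix

$$\lambda = \mathrm{diag}(\lambda_{1}, \lambda_{2}, \ldots, \lambda_{m}) \tag{9}$$

and Equation (9) can be changed into a row vector as seen in the following equation:

$$\lambda = \begin{bmatrix} \lambda_{1} & \lambda_{2} & \cdots & \lambda_{m} \end{bmatrix} \tag{10}$$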
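The steps of Equations (1)-(10) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `pca_snapshot` is hypothetical, and the last step, which maps the m-dimensional eigenvectors of C back to n-dimensional eigenfaces (multiplying by $\Phi^{T}$ and normalizing), is the usual eigenface construction of Turk and Pentland rather than something spelled out above.

```python
import numpy as np

def pca_snapshot(images, k):
    """Eigenface-style PCA following Equations (1)-(10).

    images: array of shape (m, h, w), one training image per entry.
    k: number of components to keep; k < m, because centering leaves
       at most m - 1 non-zero eigenvalues.
    """
    m = images.shape[0]
    X = images.reshape(m, -1).astype(float)  # Eq. (2)-(3): m x n, n = h * w
    x_bar = X.mean(axis=0)                   # Eq. (4)-(5): column-wise mean, length n
    Phi = X - x_bar                          # Eq. (6): broadcasting replicates x_bar m times
    C = Phi @ Phi.T                          # Eq. (7): m x m, small because m << n
    eigvals, eigvecs = np.linalg.eigh(C)     # Eq. (8): C is symmetric; ascending order
    order = np.argsort(eigvals)[::-1][:k]    # keep the k largest eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    eigenfaces = Phi.T @ eigvecs             # n x k (Turk and Pentland back-projection)
    eigenfaces /= np.linalg.norm(eigenfaces, axis=0)  # unit-length columns
    features = Phi @ eigenfaces              # m x k projected training images
    return x_bar, eigenfaces, eigvals, features

rng = np.random.default_rng(0)
images = rng.random((10, 8, 6))              # 10 hypothetical 8x6 "images"
x_bar, eigenfaces, eigvals, features = pca_snapshot(images, 4)
print(features.shape)                        # (10, 4)
```

Working with the m×m matrix of Equation (7) instead of the n×n pixel covariance is what keeps this cheap when images are large and training sets are small.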

Kernel principal component analysis
Principal Component Analysis has inspired several extensions. Kernel Principal Component Analysis (KPCA) is Principal Component Analysis performed in feature space [Schölkopf et al., 1998; Schölkopf et al., 1999; Arif et al., 2008b; Mauridhi et al., 2010]. The training set in input space is transformed into feature space by a Mercer kernel, which yields a positive semi-definite kernel matrix; this is known as the kernel trick [Schölkopf et al., 1998; Schölkopf et al., 1999]:

$$k(x_{i}, x_{j}) = \langle \phi(x_{i}), \phi(x_{j}) \rangle$$

where φ maps a vector from input space into feature space, so inner products in feature space are computed without evaluating φ explicitly.
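To make the kernel trick concrete, the sketch below compares a degree-2 polynomial kernel with its explicit feature map for 2-D inputs. The kernel and the function names are illustrative choices, not necessarily the kernels used in this chapter. It also checks that the resulting Gram matrix has no negative eigenvalues (up to round-off), as expected for a Mercer kernel.

```python
import math
import numpy as np

def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel k(x, y) = (x . y)^2."""
    return float(np.dot(x, y)) ** 2

def poly2_feature_map(x):
    """Explicit map phi for 2-D x with phi(x) . phi(y) = (x . y)^2."""
    x1, x2 = x
    return np.array([x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2])

def gram_matrix(X, kernel):
    """Kernel (Gram) matrix K[i, j] = kernel(X[i], X[j])."""
    m = len(X)
    K = np.empty((m, m))
    for i in range(m):
        for j in range(m):
            K[i, j] = kernel(X[i], X[j])
    return K

X = np.array([[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]])
K = gram_matrix(X, poly2_kernel)                   # kernel trick: phi never evaluated
Phi = np.array([poly2_feature_map(x) for x in X])  # explicit feature space
print(np.allclose(K, Phi @ Phi.T))                 # True
print(np.linalg.eigvalsh(K).min() >= -1e-9)        # True: positive semi-definite
```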

Maximum value selection of kernel principal component analysis as feature extraction in feature space
The results of Equations (12), (13) and (14) are selected as object feature candidates [Arif et al., 2008b; Mauridhi, 2010], and the largest of them is employed as the feature space in the next stage. Each kernel function yields one feature matrix, so the three kernel functions give three feature-space matrices. At each matrix position, the three values are compared and the maximum (greatest) value is selected as the feature candidate:

$$\hat{K}(M,N) = \max\left(K_{1}(M,N),\, K_{2}(M,N),\, K_{3}(M,N)\right), \qquad M, N \in 1..m \tag{15}$$

where $K_{1}$, $K_{2}$ and $K_{3}$ are the kernel matrices produced by Equations (12), (13) and (14). Collecting these values gives the feature-space matrix:

$$\hat{K} = \begin{bmatrix} \hat{K}(1,1) & \cdots & \hat{K}(1,m) \\ \vdots & \ddots & \vdots \\ \hat{K}(m,1) & \cdots & \hat{K}(m,m) \end{bmatrix} \tag{16}$$

The largest value in feature space is regarded as the most dominant feature value. In other words, the feature space of Equation (16) is obtained by transforming the training set with the kernels of Equations (12), (13) and (14) and then selecting the largest value at each position with Equation (15). This selected feature space is then used to determine the average, the zero-mean matrix, the covariance matrix, and the eigenvalues and eigenvectors in feature space. Because these quantities are obtained through the kernel trick, they are nonlinear components, which extend the linear (principal) components; selecting the largest of the kernel values is therefore intended to improve on the performance of PCA.
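Equations (15) and (16) amount to an element-wise maximum over the kernel matrices. Equations (12)-(14) are referred to but not shown in this section, so the three kernels below (Gaussian, polynomial and sigmoid) and their parameters are assumptions made only for illustration; the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(X, sigma=1.0):
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] - 2.0 * X @ X.T + sq[None, :], 0.0)  # squared distances
    return np.exp(-d2 / (2.0 * sigma ** 2))

def polynomial_kernel(X, degree=2, c=1.0):
    return (X @ X.T + c) ** degree

def sigmoid_kernel(X, a=0.01, b=0.0):
    return np.tanh(a * (X @ X.T) + b)

def max_value_selection(kernel_matrices):
    """Eqs. (15)-(16): at every position (M, N) keep the largest kernel value."""
    stacked = np.stack(kernel_matrices)  # shape (number of kernels, m, m)
    return stacked.max(axis=0)           # shape (m, m)

rng = np.random.default_rng(1)
X = rng.random((6, 20))                  # 6 hypothetical non-negative training vectors
kernels = [gaussian_kernel(X), polynomial_kernel(X), sigmoid_kernel(X)]
K_hat = max_value_selection(kernels)
```

One consequence worth noting: for non-negative inputs such as pixel intensities, this particular polynomial kernel is always greater than 1 while the Gaussian and sigmoid kernels never exceed 1, so it wins at every position. The relative scaling of the chosen kernels therefore decides how much each one contributes to the selected feature space.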
The average of the feature-space matrix in Equation (16) is computed column-wise, as in Equation (4):

$$\bar{K}_{N} = \frac{1}{m}\sum_{M=1}^{m} \hat{K}(M,N), \qquad N \in 1..m \tag{17}$$

The zero-mean matrix in feature space is then:

$$\hat{\Phi}(M,N) = \hat{K}(M,N) - \bar{K}_{N}, \qquad M \in 1..m \tag{18}$$

The result of Equation (18) has size m×m. To obtain the eigenvalues and eigenvectors in feature space, the covariance matrix in feature space is calculated as:

$$C_{K} = \hat{\Phi}\,\hat{\Phi}^{T} \tag{19}$$

Based on Equation (19), the eigenvalues and eigenvectors in feature space are determined from:

$$C_{K}\,\Lambda_{K} = \lambda_{K}\,\Lambda_{K} \tag{20}$$

The eigenvalues and eigenvectors yielded by Equation (20) can be written as the following matrices:

$$\lambda_{K} = \mathrm{diag}(\lambda_{1}, \ldots, \lambda_{m}), \qquad \Lambda_{K} = \begin{bmatrix} v_{1} & v_{2} & \cdots & v_{m} \end{bmatrix} \tag{21}$$

To order the features from the most to the least dominant, the eigenvalues in Equation (21) are sorted in decreasing order and the corresponding eigenvectors are reordered accordingly [Arif et al., 2008b; Mauridhi, 2010]. The larger an eigenvalue in feature space, the more dominant its corresponding eigenvector. The result of sorting Equation (21) is:

$$\lambda_{(1)} \ge \lambda_{(2)} \ge \cdots \ge \lambda_{(m)}, \qquad \Lambda_{K} = \begin{bmatrix} v_{(1)} & v_{(2)} & \cdots & v_{(m)} \end{bmatrix} \tag{22}$$

For the 1st ORL scenario (5 training images per person), Figure 3 shows that the recognition rate generally increases as more dimensions are used, but beyond a certain dimension it decreases. As seen in Figure 3, the recognition rate decreased to 95% when 200 dimensions were used. The first maximum recognition rate, 97.5%, occurred when 49 dimensions were used [Arif et al., 2008b].
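Equations (17)-(22) repeat the input-space PCA steps on the m×m matrix of selected kernel values. The sketch below is an assumption-level illustration (the function name and the random stand-in for Equation (16) are hypothetical). How test images are projected onto these eigenvectors is not spelled out in this section, so the sketch stops at the sorted eigenpairs.

```python
import numpy as np

def feature_space_components(K_hat, d):
    """Eqs. (17)-(22) applied to the m x m matrix of selected kernel values."""
    K_bar = K_hat.mean(axis=0)              # Eq. (17): column-wise average, length m
    Phi_K = K_hat - K_bar                   # Eq. (18): zero mean, m x m
    C_K = Phi_K @ Phi_K.T                   # Eq. (19): covariance in feature space
    eigvals, eigvecs = np.linalg.eigh(C_K)  # Eqs. (20)-(21): symmetric eigenproblem
    order = np.argsort(eigvals)[::-1]       # Eq. (22): decreasing eigenvalues ...
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]  # ... with matching eigenvectors
    return eigvals[:d], eigvecs[:, :d], Phi_K

rng = np.random.default_rng(2)
A = rng.random((8, 5))
K_hat = A @ A.T                             # stand-in for the matrix of Eq. (16)
eigvals, eigvecs, Phi_K = feature_space_components(K_hat, 3)
```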
In the 2nd scenario, the maximum number of dimensions was 240 (40 persons × 6 training images). The first maximum recognition rate, 99.375%, occurred when 46 dimensions were used. From 1 to 46 dimensions, the recognition rate increased with the number of dimensions, but from 47 to 240 dimensions it tended to be stable, with insignificant fluctuations, as seen in Figure 4 [Arif et al., 2008b]. Figure 6 shows the experimental results of the 4th scenario. In this scenario, 8 training images per person were used, and the number of dimensions was 320. Figure 6 shows that the recognition rate increased significantly when fewer than 23 dimensions were used, whereas a 100% recognition rate was obtained with more than 24 dimensions [Arif et al., 2008b].
In the last scenario, 9 training images per person were used, and the number of dimensions was 360, as seen in Figure 7. As in the previous scenario, the recognition rate tended to increase when fewer than 6 dimensions were used; 7 dimensions resulted in a 100% recognition rate, 8 dimensions resulted in 97%, and more than 9 dimensions yielded 100%, as shown in Figure 7 [Arif et al., 2008b]. The maximum recognition rate for all scenarios can be seen in Table 2. This table shows that the more training images are used, the higher the recognition rate, and that the first maximum recognition rate tends to occur at a lower dimension as the number of training images increases [Arif et al., 2008b].
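The "first maximum recognition rate" reported for each scenario is the smallest number of dimensions at which the best rate is reached. A small helper makes that definition explicit; the function name is hypothetical and the rates in the usage line are made-up numbers, not values read from the figures.

```python
def first_maximum(rates):
    """Return (dimension, rate) for the first dimension reaching the maximum rate.

    rates[i] is the recognition rate (%) obtained with i + 1 dimensions.
    """
    best = max(rates)
    dim = rates.index(best) + 1  # list.index returns the first occurrence
    return dim, best

print(first_maximum([40.0, 75.5, 97.5, 96.0, 97.5]))  # (3, 97.5)
```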

Experimental results using the YALE face image database
In this last experiment, the YALE face database was used. The experiments were conducted in 6 scenarios, using 5, 6, 7, 8, 9 and 10 training images per person, respectively. The remaining images of each person, i.e. 6, 5, 4, 3, 2 and 1, were used as the testing set, as listed in Table 3. In the first scenario, 5 training images per person were used and the rest of the YALE data was used for testing. In this scenario, the number of dimensions was 75. The complete experimental results can be seen in Figure 8. This figure shows that the recognition rate increased significantly, from 16.67% to 92.22%, when fewer than 9 dimensions were used, whereas the maximum recognition rate, 94.44%, occurred when 13, 14 and 15 dimensions were used [Arif et al., 2008b]. With more than 16 dimensions, the recognition rate fluctuated insignificantly, as seen in Figure 8. The experimental results of the 2nd scenario are shown in Figure 9. The recognition rate increased from 22.67% to 97.33% when fewer than 10 dimensions were used, decreased insignificantly at 16 dimensions, and tended to be stable at around 97.33% with more than 17 dimensions [Arif et al., 2008b]. The 3rd scenario behaved similarly. In this scenario, the recognition rate increased significantly when fewer than 13 dimensions were used, although it decreased at certain dimensions. With more than 14 dimensions, the experiments yielded the maximum rate of 98.33%, as seen in Figure 10 [Arif et al., 2008b]. In the last three scenarios, as seen in Figures 11, 12 and 13, the recognition rate also tended to increase when fewer than 7 dimensions were used, whereas experiments with more than 8 dimensions achieved a 100% recognition rate [Arif et al., 2008b]. The experimental results of the 1st, 2nd and 3rd scenarios were compared with other methods, namely PCA, LDA/QR and LPP/QR, as seen in Table 5, whereas the 4th and 5th scenarios were not compared, since they had already achieved the maximum result (100%).
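Each YALE rate is a fraction of a fixed test set: with 15 people and 11 poses each, a scenario using t training poses per person leaves 15 × (11 − t) test images. The hypothetical helpers below convert a reported percentage back into the implied number of correctly recognized images; these counts follow from arithmetic and are not reported in the source.

```python
PERSONS = 15  # YALE: 15 people
POSES = 11    # 11 poses per person

def count_test_images(train_per_person):
    """Number of test images when train_per_person poses per person are used for training."""
    return PERSONS * (POSES - train_per_person)

def implied_correct(rate_percent, n_test):
    """Number of correctly recognized test images implied by a reported rate."""
    return round(rate_percent / 100.0 * n_test)

for t, rate in [(5, 94.44), (6, 97.33), (7, 98.33)]:
    n = count_test_images(t)
    print(t, n, implied_correct(rate, n))
# 5 90 85
# 6 75 73
# 7 60 59
```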

The maximum value selection of kernel linear preserving projection as extension of kernel principal component analysis
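Kernel Principal Component Analysis, as an appearance-based method in feature space, yields the global structure that characterizes an object. Besides the global structure, the local structure is also important. Kernel Linear Preserving Projection (KLPP) is a method that preserves the intrinsic geometry and the local structure of the data in feature space [Cai et al., 2005; Cai et al., 2006; Kokiopoulou, 2004; Mauridhi et al., 2010]. The objective of LPP in feature space is written as [Mauridhi et al., 2010]:

$$\min_{a} \sum_{i,j} \left(y_{i} - y_{j}\right)^{2} S_{ij}, \qquad y_{i} = a^{T}\phi(x_{i}) \tag{24}$$

In this case, $S_{ij}$ can be defined as:

$$S_{ij} = \begin{cases} \exp\left(-\dfrac{\lVert \phi(x_{i}) - \phi(x_{j}) \rVert^{2}}{t}\right), & \lVert \phi(x_{i}) - \phi(x_{j}) \rVert^{2} < \varepsilon \\ 0, & \text{otherwise} \end{cases} \tag{25}$$

where ε > 0 is sufficiently small compared to the local neighborhood radius, and t > 0 is the heat-kernel width.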
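The weights of Equation (25) only need squared distances in feature space, and these follow from the kernel matrix alone, since $\lVert \phi(x_i) - \phi(x_j) \rVert^2 = k(x_i,x_i) - 2k(x_i,x_j) + k(x_j,x_j)$. The sketch below assumes the heat-kernel form with width t and an ε-neighborhood graph, as commonly used for LPP; the parameter values and function names are illustrative. The diagonal is set to zero because self-pairs contribute nothing to $(y_i - y_j)^2$.

```python
import numpy as np

def feature_space_sq_distances(K):
    """||phi(x_i) - phi(x_j)||^2 = K[i, i] - 2 K[i, j] + K[j, j], from the kernel matrix."""
    diag = np.diag(K)
    return np.maximum(diag[:, None] - 2.0 * K + diag[None, :], 0.0)

def heat_kernel_weights(K, eps, t):
    """S[i, j] = exp(-d2 / t) if d2 < eps, else 0 (the assumed form of Eq. (25))."""
    d2 = feature_space_sq_distances(K)
    S = np.where(d2 < eps, np.exp(-d2 / t), 0.0)
    np.fill_diagonal(S, 0.0)   # self-pairs add nothing to (y_i - y_j)^2
    return S

X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
K = X @ X.T                    # linear kernel, so phi is the identity map here
S = heat_kernel_weights(K, eps=2.0, t=1.0)
```

With these three points only the first two are closer than ε (squared distance 1), so they are the only pair linked in the graph.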
Minimizing the objective function keeps points that lie in the same class close together. If neighboring points $\phi(x_i)$ and $\phi(x_j)$ are mapped far apart in feature space, i.e. if $y_i - y_j$ is large, the weight $S_{ij}$ imposes a heavy penalty. Suppose a weighted graph G = (V, E) is constructed from the data points, where points that are close to each other are linked by an edge. A map from the graph to a line is chosen to minimize the objective function of KLPP in Equation (24) under appropriate constraints. Let a be the transformation vector and $x_i$ the i-th column vector of X. By simple algebra, the objective function in feature space reduces to [Mauridhi et al., 2010]:

$$\frac{1}{2}\sum_{i,j}\left(y_{i} - y_{j}\right)^{2} S_{ij} = a^{T}\phi(X)\,(D - S)\,\phi(X)^{T}a = a^{T}\phi(X)\,L\,\phi(X)^{T}a \tag{26}$$

where D is a diagonal matrix with $D_{ii} = \sum_{j} S_{ij}$, and L = D − S is the Laplacian matrix in feature space, known as Laplacianlips when it is applied to smile stage classification. The minimum of the objective function in feature space, subject to the constraint $a^{T}\phi(X)\,D\,\phi(X)^{T}a = 1$, is given by the minimum eigenvalue solution of:

$$\phi(X)\,L\,\phi(X)^{T}a = \lambda\,\phi(X)\,D\,\phi(X)^{T}a \tag{27}$$

The eigenvalues and eigenvectors in feature space are calculated using Equation (27). The features, from the most to the least dominant, are obtained by sorting the eigenvalues in decreasing order and reordering the corresponding eigenvectors in feature space.
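Because φ(X) is not available explicitly, a common way to solve Equation (27) is to write $a = \phi(X)\alpha$, which turns it into $K L K \alpha = \lambda K D K \alpha$ with K the kernel matrix. The sketch below solves that generalized eigenproblem with NumPy through a Cholesky reduction. The kernel substitution, the small ridge term `reg` that keeps $K D K$ positive definite, the toy chain graph and the function name are all assumptions, not details given in this chapter.

```python
import numpy as np

def kernel_lpp(K, S, d, reg=1e-6):
    """Solve K L K alpha = lambda K D K alpha, i.e. Eq. (27) with a = phi(X) alpha."""
    D = np.diag(S.sum(axis=1))                  # D_ii = sum_j S_ij
    L = D - S                                   # Laplacian in feature space, Eq. (26)
    A = K @ L @ K
    B = K @ D @ K + reg * np.eye(K.shape[0])    # ridge keeps B positive definite
    R = np.linalg.cholesky(B)                   # B = R R^T, R lower triangular
    R_inv = np.linalg.inv(R)
    M = R_inv @ A @ R_inv.T                     # equivalent symmetric problem M u = lambda u
    M = (M + M.T) / 2.0                         # remove round-off asymmetry
    eigvals, U = np.linalg.eigh(M)              # ascending: minimum eigenvalues first
    alphas = R_inv.T @ U                        # back-transform: alpha = R^{-T} u
    return eigvals[:d], alphas[:, :d]

X = 2.0 * np.arange(5.0)[:, None]               # five hypothetical 1-D samples
K = np.exp(-((X - X.T) ** 2) / 2.0)             # Gaussian kernel matrix
S = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)  # chain graph: neighbors linked
eigvals, alphas = kernel_lpp(K, S, 3)
```

`np.linalg.eigh` returns eigenvalues in ascending order, so the first columns correspond to the minimum-eigenvalue solution; reverse the columns to follow the decreasing sort described in the text.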

Experimental results of smile stage classification based on the maximum value selection of kernel linear preserving projection
To evaluate the Maximum Value Selection of Kernel Linear Preserving Projection method, an experiment was conducted with 30 persons. Each person has 3 patterns, namely smiling patterns I, III and IV; smiling pattern II is not used. The original image size was 640×640 pixels, and every face image was resized to 50×50 pixels (Figure 14). Before feature extraction, each face image was manually cropped at the oral area using the spatial coordinates [5.90816 34.0714 39.3877 15.1020] [Mauridhi et al., 2010], to simplify the calculation. The cropped data were used for both the training and testing sets. Cropping reduced the face data size to 40×16 pixels, as seen in Figure 15. Equations (17) and (18) are the similarity measures for angular separation and Canberra distance, respectively [Mauridhi et al., 2010], and Equation (19) was used to compute the classification rate percentage. The classification results of the 1st, 2nd and 3rd scenarios can be seen in Figures 16, 17 and 18, respectively [Mauridhi et al., 2010].
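Equations (17)-(19) of this experiment are referred to but not shown. Assuming the standard definitions (angular separation as the cosine of the angle between two feature vectors, Canberra distance as $\sum_i |x_i - y_i| / (|x_i| + |y_i|)$, and the classification rate as the percentage of correctly classified test images), they can be sketched as below. The nearest-neighbor rule in `nearest_label` is a hypothetical classifier added only for illustration.

```python
import math

def angular_separation(x, y):
    """Cosine of the angle between x and y; larger means more similar."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return dot / norm

def canberra(x, y):
    """Canberra distance; smaller means more similar. 0/0 terms are skipped."""
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total

def classification_rate(predicted, actual):
    """Percentage of test samples whose predicted label matches the true label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)

def nearest_label(query, train_feats, train_labels, measure):
    """Hypothetical 1-nearest-neighbor rule using one of the two measures."""
    idx = range(len(train_feats))
    if measure == "angular":
        best = max(idx, key=lambda i: angular_separation(query, train_feats[i]))
    elif measure == "canberra":
        best = min(idx, key=lambda i: canberra(query, train_feats[i]))
    else:
        raise ValueError("measure must be 'angular' or 'canberra'")
    return train_labels[best]
```

Note the opposite directions: angular separation is a similarity to maximize, whereas the Canberra measure is a distance to minimize.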
The 1st, 2nd and 3rd scenarios show similar trends, as seen in Figures 16, 17 and 18. The recognition rate increased significantly from the 1st to the 10th dimension, whereas with more than 11 dimensions it fluctuated slightly. In the 1st scenario, the maximum and the average recognition rate were the same, 93.33%. In the 2nd scenario, the maximum recognition rate was 90%, obtained with the Canberra similarity measure. In the 3rd scenario, the maximum recognition rate was 100%, obtained with angular separation. The maximum recognition rate was 93.33% for both angular separation and the Canberra similarity measure [Mauridhi et al., 2010], as seen in Table 6.
Fig. 1. Average of ORL Face Image Database Using 3, 5 and 7 Face Images for Each Person

The zero-mean matrix is calculated by subtracting Equation (5) from each face image of the training set. To perform the subtraction, the face image matrix and Equation (5) must have the same size, which is why the average row vector is replicated m times.
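The replication of Equation (5) can be written explicitly with `np.tile`, or left to NumPy broadcasting, which produces the same zero-mean matrix without building the m×n copy. A tiny example with two hypothetical 3-pixel images:

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0],
              [3.0, 4.0, 5.0]])                    # m = 2 images, n = 3 pixels each
x_bar = X.mean(axis=0)                             # Eq. (5): [2. 3. 4.]
x_bar_rep = np.tile(x_bar, (X.shape[0], 1))        # explicit replication to m x n
Phi_explicit = X - x_bar_rep
Phi_broadcast = X - x_bar                          # broadcasting replicates implicitly
print(Phi_broadcast)                               # [[-1. -1. -1.]
                                                   #  [ 1.  1.  1.]]
```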

5.1 Experimental results using the ORL face image database

The ORL face image database consists of 40 persons, 36 men and 4 women. Each person has 10 poses. The poses were taken at different times with various kinds of lighting and expressions (eyes open/closed, smiling/not smiling) [Research Center of Att, 2007]. The face position is frontal, with angle variations of 10 up to 20%. The face image size is 92×112 pixels, as shown in Figure 2.

Fig. 2. Face Images of ORL Database

The experiments were carried out 5 times, using 5, 6, 7, 8 and 9 poses per person for training, respectively. The remaining poses, i.e. 5, 4, 3, 2 and 1, were used for testing [Arif et al., 2008b], as seen in Table 1.

Fig. 3. Experimental Results on ORL Face Image Database Using 5 Training Set
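The ORL scenarios split each person's 10 poses into training and testing images. The text does not say which poses go to which set, so the sketch below assumes that the first `n_train` poses of each person are used for training; the function name is hypothetical.

```python
def split_per_person(images_by_person, n_train):
    """Assign the first n_train poses of each person to training, the rest to testing.

    images_by_person: list with one list of poses per person.
    Returns two lists of (image, person_index) pairs.
    """
    train, test = [], []
    for person, poses in enumerate(images_by_person):
        for idx, img in enumerate(poses):
            (train if idx < n_train else test).append((img, person))
    return train, test

# 40 persons x 10 poses, as in ORL; strings stand in for image arrays.
people = [[f"p{p}_{k}" for k in range(10)] for p in range(40)]
train, test = split_per_person(people, 5)
print(len(train), len(test))  # 200 200
```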

Fig. 4. Experimental Results on ORL Face Image Database Using 6 Training Set

In the 3rd scenario, 7 training images were used for each person, and the number of dimensions was 280; the more training images are used, the more dimensions are available. In this scenario, the maximum recognition rate was 100%, which occurred when 23 to 53 dimensions were used, whereas with more than 53 dimensions the recognition rate decreased to 99.67%, as seen in Figure 5 [Arif et al., 2008b].

Fig. 5. Experimental Results on ORL Face Image Database Using 7 Training Set

Fig. 6. Experimental Results on ORL Face Image Database Using 8 Training Set

Fig. 8. Experimental Results on YALE Face Image Database Using 5 Training Set

Fig. 9. Experimental Results on YALE Face Image Database Using 6 Training Set

Fig. 10. Experimental Results on YALE Face Image Database Using 7 Training Set

Fig. 16. Smile Stage Classification Recognition Rate Based on the Maximum Value Selection of Kernel Linear Preserving Projection Method Using 1st Scenario

Results of face recognition using maximum value selection of kernel principal component analysis as feature extraction in feature space
In this section, the experimental results of "The Maximum Value Selection of Kernel Principal Component Analysis for Face Recognition" are explained. We use the Olivetti-Att-ORL (ORL) [Research Center of Att, 2007] and YALE face image databases [Yale Center for Computational Vision and Control, 2007] as experimental material.

Table 1. The Scenario of ORL Face Database Experiment

In this experiment, each scenario used a different number of dimensions. The 1st, 2nd, 3rd, 4th

Table 2. The ORL Face Database Recognition Rate using Maximum Feature Value Selection Method of Nonlinear Function based on KPCA

The YALE face database contains 15 people, each with 11 poses. The poses were taken under various kinds of lighting (left lighting and center lighting), with various expressions (normal, smiling, sad, sleepy, surprised and winking) and accessories (with or without glasses) [Yale Center for Computational Vision and Control, 2007], as shown in Figure 8.

Fig. 8. Face Sample Images of YALE Database

Table 3. The Scenario of the YALE Face Database Experiment

Table 4. The YALE Face Database Recognition Rate using Maximum Feature Value Selection Method of Nonlinear Function based on KPCA

With 5, 6 and 7 training images per person, on both the ORL and the YALE face databases, "The Maximum Value Selection of Kernel Principal Component Analysis" outperformed the other methods.

Table 5. The Comparative Results for Face Recognition Rate

Fig. 18. Smile Stage Classification Recognition Rate Based on the Maximum Value Selection of Kernel Linear Preserving Projection Method Using 3rd Scenario

Table 6. The Smile Stage Classification Recognition Rate using Maximum Feature Value Selection Method of Nonlinear Kernel Function based on Kernel Linear Preserving Projection

The experimental results of the Maximum Value Selection of Kernel Linear Preserving Projection method have been compared with "Two Dimensional Principal Component Analysis (2D-PCA) and Support Vector Machine (SVM) as its classifier" [Rima et al., 2010] and with a combination of Principal Component Analysis (PCA) + Linear Discriminant Analysis (LDA) with SVM as its classifier [Gunawan et al., 2009], as seen in Figure 19.

Both the maximum non-linear feature selection of Kernel Principal Component Analysis and that of Kernel Linear Preserving Projection yield local feature structure for extraction, which is more important than global structure in feature space. The experiments show that the maximum non-linear feature selection of Kernel Principal Component Analysis outperformed PCA, LDA/QR and LPP/QR for face recognition on the ORL and YALE face databases, whereas the maximum value selection of Kernel Linear Preserving Projection, as an extension of Kernel Principal Component Analysis, outperformed 2D-PCA+SVM and PCA+LDA+SVM for smile stage classification.