## 1. Introduction

Since the tragic September 11 attacks in 2001, a growing body of research has focused on security issues addressed with computational intelligence. It is very important to prevent such tragic events from happening again and to quickly identify terrorists and suspects before or after an event occurs. Therefore, the effectiveness of security is being examined almost everywhere. Current security vulnerabilities should be continually uncovered, and new methods explored, to improve security systems. Security measures based on computational intelligence are used to improve the safety of our everyday lives.

A central security task is to recognize wanted criminals and intruders from their biometric characteristics such as the face, fingerprint, iris, palm and so on. Among these biometric signals, the face image is easier and more direct to capture with distant cameras than the others. For instance, the face of any suspect who walks through a hotel lobby will be recorded by cameras. Hence, computer vision technologies with cameras can be applied to realize intelligent video surveillance systems. Since many face images of criminals and terrorists are available in police departments, they can be used to determine whether unknown faces captured by distributed cameras belong to these individuals. Thus, an efficient face recognition system could help to improve security. Face recognition systems would not only help to identify criminals and terrorists, but could also be used to search for missing persons or to detect incidents involving vulnerable people. Accordingly, face recognition systems with surveillance cameras have already been installed in many locations such as department stores, airports and supermarkets. Besides, if a face recognition system installed at home can detect the user's facial expression in a timely manner, smart services can be offered to the user accordingly.

The goal of face recognition is to distinguish a specific identity and its appearance from face images. However, in realistic situations such as video surveillance and access control, the face recognition task may encounter great challenges such as different facial expressions, illumination variations, partial occlusions and even low resolution, all of which degrade recognition performance and can result in severe security complications. For example, an image captured by a CCTV camera at a distance has a very low resolution, which degrades the recognition performance significantly. Besides, in the testing phase, the face image is a factor that is out of our control. In other words, the person may not be in a frontal pose and the image may not be clean; that is, the person may be wearing glasses, a hat, or a mask, or the image may be affected by lighting and facial expressions. Over the past years, subspace projection optimizations with linear [1] and non-linear [2] approaches have been widely proposed to solve this problem. Principal component analysis (PCA) [3, 4] and linear discriminant analysis (LDA) [5] are two typical examples of linear transform approaches, which seek a low-dimensional subspace for dimensionality reduction. Nonlinear projection approaches, such as kernel PCA (KPCA) [6] and kernel LDA (KLDA) [7], have also been widely used in the literature; they can uncover the underlying structure when the samples lie on a nonlinear manifold in the image space.

Recently, the linear regression classification (LRC) proposed in 2010 by Naseem *et al*. [8] has been treated as an effective subspace projection method, which performs well on face recognition. Moreover, the robust linear regression classification (RLRC) [9] estimating regression parameters by using the robust Huber estimation was introduced to achieve robust face recognition under illumination variation and random pixel corruption. Ridge regression (RR) [10] estimated the regression parameters by using a regularized least square method to model the linear dependency in the spatial domain. Huang *et al*. and Chou *et al*. presented several improved approaches of LRC, including improved-PCA-LRC [11], LDA-LRC [12], unitary-LRC [13], and generalized-LRC [14, 15] for dealing with different situations like facial expressions, lighting changes, and pose variations. Lai *et al*. [16] utilized the least trimmed square (LTS) as a robust estimator to detect the contaminated pixels from query for boosting the performance under the partial occlusion situation.

The rest of this chapter is organized as follows. After an overview of fundamentals and facial representation, several well-known face recognition algorithms are presented in Section 2. Section 3 presents several advances of subspace projection optimizations for robust face recognition, including RLRC, RR, improved principal component regression (IPCR), unitary regression classification (URC), linear discriminant regression classification (LDRC), generalized linear regression classification (GLRC) and trimmed linear regression (TLR). The performance of the aforementioned projection methods is shown in Section 4. Finally, conclusions are drawn in Section 5.

## 2. Fundamentals of face recognition and representation

As shown in **Figure 1**, a typical face recognition system contains two major parts: face detection and face recognition. In this section, face detection methods are first briefly introduced. Then, the well-known subspace projection methods are reviewed. Finally, similarity measures for image feature vectors are surveyed. Generally, the unknown data vector is projected into a certain subspace, and a similarity measure is then used to classify it. To reduce computation and increase recognition accuracy, the first step of the recognition system, called face detection, is to detect and crop the face region from the image or video.

### 2.1. Face detection

Face detection methods [17–20] can be separated into neural network, feature-based, and color-based approaches. The neural network approach [21] is trained on facial and non-facial classes, so that faces in a new image or video can be detected based on the prior training data. A well-known method is the AdaBoost learning algorithm [22, 23]. The feature-based approach utilizes facial features to detect the facial region. For example, the relative positions of the eyes, nose, and mouth are useful features; moreover, the shape of the face, which is almost an ellipse, can be included. The rule-based algorithm [24] and the elliptical edge method [25] are two popular feature-based methods. The color-based approach, as in [26], adopts the variance of skin color to decide whether a region is a face or not. For example, the face region in grayscale should not vary greatly, while the eyes, mouth and hair should be darker than the other parts of the face.

Once the face areas are detected by a selected face detection method, their face images of size *a*×*b* pixels can be projected into another subspace such as the principal space, kernel space, frequency space and so on, in order to find a proper set of features for boosting the recognition performance. Assume there are *C* subjects, each with *N* training color images. For the *i*^{th} class, *i* = 1, 2, …, *C*, the *j*^{th} training color image of size *a*×*b* pixels with *K* components forms a data array *ν*_{i,j,k} ∈ *R*^{a×b×K} for *j* = 1, 2, …, *N* and *k* = 1, 2, …, *K*. For example, with *K* = 3 color components, *k* = 1, 2, and 3 denote the red, green, and blue channels, respectively. For some recognition algorithms, *ν*_{i,j,k} is transformed to grayscale as *g*_{i,j} = *c*_{1}*ν*_{i,j,1} + *c*_{2}*ν*_{i,j,2} + *c*_{3}*ν*_{i,j,3}, where *c*_{1}, *c*_{2}, and *c*_{3} are fixed coefficients. The gray image *g*_{i,j} is reshaped into one column vector *x*_{i,j} ∈ *R*^{M×1}, where *M* = *a*×*b*. In the testing phase, an unknown color image, **z** ∈ *R*^{M×K}, is given. In order to predict the unknown **z** from the training data, it should be transformed to grayscale, normalized, and reshaped into a column vector **y** ∈ *R*^{M×1}.
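The grayscale conversion and vectorization step can be sketched in Python with NumPy. This is a minimal sketch, not the authors' implementation: the helper name `to_feature_vector`, the ITU-R BT.601 luma weights used for *c*_{1}, *c*_{2}, *c*_{3}, and the scaling by the maximum intensity are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical helper; the BT.601 luma weights and max-scaling are assumptions.
def to_feature_vector(image, weights=(0.299, 0.587, 0.114)):
    """Map an a x b x 3 color image to a normalized M x 1 column vector (M = a*b)."""
    image = np.asarray(image, dtype=float)
    gray = image @ np.asarray(weights)      # g = c1*R + c2*G + c3*B, shape (a, b)
    peak = gray.max()
    if peak > 0:
        gray = gray / peak                  # for non-negative input, scales into [0, 1]
    return gray.reshape(-1, 1)              # stack all pixels into one column


# A random 30 x 25 color image stands in for a cropped face.
z = np.random.default_rng(0).integers(0, 256, size=(30, 25, 3))
y = to_feature_vector(z)
print(y.shape)  # (750, 1)
```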

### 2.2. Subspace projection methods

The famous subspace projection methods, such as PCA and LDA are reviewed in the following sections.

#### 2.2.1. Principal component analysis (PCA)

PCA is widely used for dimensionality reduction in the computer vision field, especially in face recognition. In PCA, the data are represented as a linear combination of an orthonormal set of vectors that maximize the data scatter across all images. The first principal component captures as much of the variability of the images as possible, the second one captures the second most, and so on. The flow chart for finding the PCA transformation bases is shown in **Figure 2**. The main objective of PCA is to reduce the dimension of the feature image *x*_{i,j} while retaining a few principal components. This means that most of the useless information is discarded, and the remaining data can be well represented in a lower-dimensional space.

As shown in **Figure 2**, the PCA transformation bases are derived as follows. First, the global mean is removed from each feature face image:

$$\bar{x}_{i,j} = x_{i,j} - \mu, \tag{1}$$

where

$$\mu = \frac{1}{CN}\sum_{i=1}^{C}\sum_{j=1}^{N} x_{i,j}.$$

After the computation of the mean-removed feature face images, we can obtain the *M*×*M* covariance matrix of all feature face images as:

$$\Sigma = \frac{1}{CN}\sum_{i=1}^{C}\sum_{j=1}^{N} \bar{x}_{i,j}\,\bar{x}_{i,j}^{T}. \tag{2}$$

Based on the covariance matrix, the eigenvectors and eigenvalues can be retrieved by singular value decomposition (SVD) or eigen-decomposition as:

$$\Sigma\, u_m = r_m u_m, \quad m = 1, 2, \ldots, M, \tag{3}$$

where **r** = {*r*_{1}, *r*_{2}, …, *r*_{M}} is the set of all *M* eigenvalues in descending order and **u** = {*u*_{1}, *u*_{2}, …, *u*_{M}} is the set of their corresponding eigenvectors. According to the expected dimension, we can choose *P* principal components. Thus, the PCA transformation *P*_{PCA} of size *P*×*M* is formed from the eigenvectors corresponding to the *P* largest eigenvalues as:

$$P_{PCA} = [u_1, u_2, \ldots, u_P]^{T}. \tag{4}$$

Finally, we can obtain the PCA features, *w*_{i,j} ∈ *R*^{P×1}, by multiplying the PCA transformation and the feature image vector as:

$$w_{i,j} = P_{PCA}\,\bar{x}_{i,j}. \tag{5}$$

On the other hand, the testing image vector **y** can be projected onto the PCA subspace by

$$w_{y} = P_{PCA}\,(y - \mu). \tag{6}$$

The similarity measure based on this feature vector is then calculated to determine the final result.
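The PCA steps above (mean removal, decomposition, selection of the *P* leading eigenvectors, projection) can be sketched with NumPy. This is a minimal sketch under stated assumptions: the function names `pca_fit` and `pca_project` are hypothetical, the eigenvectors are obtained from the SVD of the centered data rather than from an explicit *M*×*M* covariance matrix, and the test vector is assumed to be centered with the training mean.

```python
import numpy as np

def pca_fit(X, P):
    """X: M x n matrix whose columns are vectorized face images.

    Returns the global mean (M x 1) and the P x M PCA transformation.
    """
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean                                   # remove the global mean
    # Left singular vectors of the centered data are the eigenvectors of the
    # covariance matrix, already sorted by descending eigenvalue.
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return mean, U[:, :P].T

def pca_project(P_pca, mean, x):
    """Project one or more column vectors onto the PCA subspace."""
    return P_pca @ (x - mean)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))      # 20 synthetic images with M = 50 pixels each
mean, P_pca = pca_fit(X, 5)
W = pca_project(P_pca, mean, X)    # 5 x 20 matrix of PCA features
```

Using the SVD avoids forming the covariance matrix explicitly, which matters when *M* is much larger than the number of training images.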

#### 2.2.2. Linear discriminant analysis

Fisher proposed LDA for recognition; like PCA, it is a statistical analysis method. The difference is that LDA can discriminate between different subjects even when their maximum-variance subspaces overlap, as shown in **Figure 3**. The goal of LDA is to find projections onto a line such that samples from different classes are well separated while samples from the same class are tightly concentrated.

Thus, the concept of LDA is to seek the optimal projection by maximizing the ratio of between-class scatter to within-class scatter. Fisher's criterion for this optimization problem is:

$$W_{LDA} = \arg\max_{W}\frac{\left|W^{T} S_{B} W\right|}{\left|W^{T} S_{W} W\right|}, \tag{7}$$

where $S_{B} = \sum_{i=1}^{C} N(\mu_i - \mu)(\mu_i - \mu)^{T}$ is the between-class scatter matrix, $S_{W} = \sum_{i=1}^{C}\sum_{j=1}^{N}(x_{i,j} - \mu_i)(x_{i,j} - \mu_i)^{T}$ is the within-class scatter matrix, *μ* is the global mean, and *μ*_{i} is the mean of the *i*^{th} class. *X*_{i} = [*x*_{i,1}, …, *x*_{i,N}] is the concatenation of the *N* training gray images of the *i*^{th} class. Then, the optimal projection matrix, *W*_{LDA}, can be solved by computing the generalized SVD or eigen-decomposition as:

$$S_{B} W_{LDA} = S_{W} W_{LDA}\,\Lambda, \tag{8}$$

where **Λ** is the diagonal eigenvalue matrix. We apply the optimal projection matrix, written as $P_{LDA} = W_{LDA}^{T}$, to convert the face feature vector *x*_{i,j} into a new discriminant vector, *w*_{Fisher,i,j}, as:

$$w_{Fisher,i,j} = P_{LDA}\,x_{i,j}. \tag{9}$$

In the same way, the testing image vector is projected onto the LDA subspace by *P*_{LDA} and can be represented as:

$$w_{Fisher,y} = P_{LDA}\,y. \tag{10}$$

The final result is then determined by a similarity measure based on this feature vector.
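A compact way to see the LDA projection at work is the following NumPy sketch. It is illustrative only: the function name `lda_fit` is hypothetical, and the small ridge term `reg` added to the within-class scatter is an assumption introduced here so that the matrix stays invertible when the image dimension exceeds the number of training samples.

```python
import numpy as np

def lda_fit(X, labels, dims, reg=1e-6):
    """X: M x n data matrix (columns are samples); labels: length-n class ids.

    Returns a dims x M projection matrix whose rows are discriminant directions.
    """
    labels = np.asarray(labels)
    M = X.shape[0]
    mu = X.mean(axis=1, keepdims=True)              # global mean
    Sb = np.zeros((M, M))                           # between-class scatter
    Sw = np.zeros((M, M))                           # within-class scatter
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        mu_c = Xc.mean(axis=1, keepdims=True)       # class mean
        Sb += Xc.shape[1] * (mu_c - mu) @ (mu_c - mu).T
        Sw += (Xc - mu_c) @ (Xc - mu_c).T
    # Solve the generalized eigenproblem Sb w = lambda Sw w.
    # reg * I is an assumption that keeps Sw invertible.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + reg * np.eye(M), Sb))
    order = np.argsort(evals.real)[::-1][:dims]     # largest eigenvalues first
    return evecs[:, order].real.T


# Two classes that are elongated along the first axis but separated along the second.
rng = np.random.default_rng(0)
scale = np.array([[3.0], [0.3]])
A = rng.normal(size=(2, 30)) * scale
B = rng.normal(size=(2, 30)) * scale + np.array([[0.0], [3.0]])
W = lda_fit(np.hstack([A, B]), [0] * 30 + [1] * 30, dims=1)
```

In this synthetic example the maximum-variance direction is the first axis, yet the one-dimensional LDA projection separates the two classes because it follows the class difference rather than the overall variance.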

### 2.3. Similarity measures

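Three commonly used distance measures [27–29] are the city block distance (taxicab geometry, *L*_{1}), the Euclidean distance (*L*_{2}) and the *L*_{∞} norm distance. These distance measures are defined between two column vectors, here a training vector *x*_{i,j} and the test vector **y**, of dimension *M* (or dimension *P* when the projected features *w*_{i,j} are compared). The distance measures *L*_{1}, *L*_{2}, and *L*_{∞} can be respectively written as:

$$L_{1}(x_{i,j}, y) = \sum_{m=1}^{M}\left|x_{i,j}(m) - y(m)\right|, \tag{11}$$

$$L_{2}(x_{i,j}, y) = \sqrt{\sum_{m=1}^{M}\left(x_{i,j}(m) - y(m)\right)^{2}}, \tag{12}$$

and

$$L_{\infty}(x_{i,j}, y) = \max_{m}\left|x_{i,j}(m) - y(m)\right|, \tag{13}$$

where *x*_{i,j}(*m*) and *y*(*m*) denote the *m*^{th} components of the *x*_{i,j} and **y** column vectors, respectively.

However, these vectors satisfy the Cauchy-Schwarz inequality as:

$$\left|x_{i,j}^{T}\,y\right| \le \left\|x_{i,j}\right\|_{2}\left\|y\right\|_{2}. \tag{14}$$

To ignore the amplitudes of two feature data vectors, the similarity measure can also be defined by a cosine criterion as:

$$\cos(x_{i,j}, y) = \frac{x_{i,j}^{T}\,y}{\left\|x_{i,j}\right\|_{2}\left\|y\right\|_{2}}. \tag{15}$$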
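The three distance measures and the cosine criterion translate directly into NumPy. The function names below are hypothetical and chosen only for this sketch; inputs may be one-dimensional arrays or column vectors of equal length.

```python
import numpy as np

def l1_distance(w, y):
    """City block (taxicab) distance."""
    return float(np.sum(np.abs(np.ravel(w) - np.ravel(y))))

def l2_distance(w, y):
    """Euclidean distance."""
    return float(np.sqrt(np.sum((np.ravel(w) - np.ravel(y)) ** 2)))

def linf_distance(w, y):
    """Largest absolute component-wise difference."""
    return float(np.max(np.abs(np.ravel(w) - np.ravel(y))))

def cosine_similarity(w, y):
    """Cosine of the angle between the two vectors; ignores their amplitudes."""
    w, y = np.ravel(w), np.ravel(y)
    return float(w @ y / (np.linalg.norm(w) * np.linalg.norm(y)))


w = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 6.0, 3.0])
print(l1_distance(w, y), l2_distance(w, y), linf_distance(w, y))  # 7.0 5.0 4.0
```

By the Cauchy-Schwarz inequality the cosine value always lies in [-1, 1], and scaling either vector by a positive constant leaves it unchanged.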

## 3. Advances of subspace projection optimization

In this section, advances in subspace projection optimization are presented for robust face recognition systems. The subspace projection methods introduced are LRC, RLRC, RR, IPCR, URC, LDRC, GLRC and TLR.

### 3.1. Linear regression classification (LRC)

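To apply linear regression to estimate the class-specific model, all *N* training gray images from the same class are concatenated as:

$$X_{i} = [x_{i,1}, x_{i,2}, \ldots, x_{i,N}] \in R^{M\times N}, \quad i = 1, 2, \ldots, C, \tag{16}$$

where *X*_{i} is of size *M*×*N* and is called the class-specific model. In other words, in the training phase the *i*^{th} class is represented by a vector space *X*_{i}, which is called the regressor for each subject.

In the testing phase, if an unknown column vector **y** belongs to the *i*^{th} class, it can be written as a linear combination of the training data from the *i*^{th} class and can be formulated as:

$$y = X_{i}\,\beta_{i}, \tag{17}$$

where *β*_{i} ∈ *R*^{N×1} is the vector of regression parameters. The goal of the linear regression is to find the regression parameters by minimizing the residual errors as:

$$\hat{\beta}_{i} = \arg\min_{\beta_{i}}\left\|y - X_{i}\,\beta_{i}\right\|_{2}^{2}. \tag{18}$$

The regression coefficients, *β*_{i}, can be solved through the least-squares estimation method and can be represented as:

$$\hat{\beta}_{i} = \left(X_{i}^{T}X_{i}\right)^{-1}X_{i}^{T}\,y. \tag{19}$$

For each class *i*, the regressed vector $\hat{y}_{i}$ is predicted from the estimated parameters and the predictors as:

$$\hat{y}_{i} = X_{i}\,\hat{\beta}_{i}. \tag{20}$$

By substituting Equation (19) into Equation (20), the predicted response vector $\hat{y}_{i}$ can be rewritten as:

$$\hat{y}_{i} = X_{i}\left(X_{i}^{T}X_{i}\right)^{-1}X_{i}^{T}\,y. \tag{21}$$

Theoretically, we can treat Equation (21) as a class-specific projection as:

$$\hat{y}_{i} = H_{i}\,y, \tag{22}$$

where $\hat{y}_{i}$ is the projection of **y** onto the subspace of the *i*^{th} class by the projection matrix $H_{i} = X_{i}\left(X_{i}^{T}X_{i}\right)^{-1}X_{i}^{T}$.

In the LRC approach, the minimum reconstruction error is adopted for determining the final result. In other words, the distance between the predicted response vector $\hat{y}_{i}$ and **y** will be smallest when the unknown column vector belongs to the training vector space of class *i*. Therefore, the identity *i** can be determined by minimizing the Euclidean distance between the predicted response vector and the unknown vector as:

$$i^{*} = \arg\min_{i}\left\|y - \hat{y}_{i}\right\|_{2}, \quad i = 1, 2, \ldots, C. \tag{23}$$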
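The LRC decision rule can be sketched in a few lines of NumPy. This is a minimal sketch, not a reference implementation: the function name `lrc_predict` is hypothetical, and `np.linalg.lstsq` is used for the least-squares estimate in place of forming the inverse of the Gram matrix explicitly, which gives the same solution when each class-specific model has full column rank.

```python
import numpy as np

def lrc_predict(class_models, y):
    """class_models: list of M x N matrices X_i; y: length-M test vector.

    Returns the index of the class with the smallest reconstruction error,
    together with the list of per-class errors.
    """
    errors = []
    for X in class_models:
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)    # least-squares beta_i
        y_hat = X @ beta                                # projection onto span(X_i)
        errors.append(float(np.linalg.norm(y - y_hat)))
    return int(np.argmin(errors)), errors


rng = np.random.default_rng(0)
X0 = rng.normal(size=(20, 3))                 # class 0: three training vectors
X1 = rng.normal(size=(20, 3))                 # class 1: three training vectors
y = X1 @ np.array([0.5, -1.0, 2.0])           # test vector inside the span of class 1
idx, errs = lrc_predict([X0, X1], y)
print(idx)  # 1
```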

### 3.2. Robust linear regression classification (RLRC)

Classical statistical methods are often claimed to be robust, but they are robust only when their underlying assumptions actually hold. Once the data distribution violates these assumptions, the regression parameters obtained by the original least-squares estimation can be inaccurate. In other words, the original least-squares estimation is inefficient and can be biased in the presence of outliers. There exist several approaches for robust estimation, such as the *R*-estimator [30, 31] and the *L*-estimator [30, 32]. However, *M*-estimators have shown superiority owing to their generality, efficiency and high breakdown point [30, 33]. Based on the *M*-estimator, the objective function becomes:

$$\hat{\beta}_{i} = \arg\min_{\beta_{i}}\sum_{m=1}^{M}\rho\left(e_{i}(m)\right), \quad e_{i} = y - X_{i}\,\beta_{i}, \tag{24}$$

where

$$\rho(e) = \begin{cases}\dfrac{1}{2}e^{2}, & |e| \le \gamma,\\[4pt] \gamma|e| - \dfrac{1}{2}\gamma^{2}, & |e| > \gamma,\end{cases} \tag{25}$$

and *ρ*(•) is a symmetric function with *γ* being a tuning constant, also called the Huber threshold.
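To illustrate how a Huber-type estimate limits the influence of corrupted pixels, the following NumPy sketch minimizes the Huber loss with iteratively reweighted least squares (IRLS). The choice of IRLS, the function names `huber_rho` and `huber_regression`, and the fixed iteration count are assumptions made for this illustration; the chapter does not state which solver RLRC uses.

```python
import numpy as np

def huber_rho(e, gamma):
    """Huber loss: quadratic for |e| <= gamma, linear beyond it."""
    e = np.abs(e)
    return np.where(e <= gamma, 0.5 * e**2, gamma * e - 0.5 * gamma**2)

def huber_regression(X, y, gamma=1.0, iters=50):
    """Estimate beta by IRLS (an assumed solver) for the Huber objective."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)        # start from least squares
    for _ in range(iters):
        r = y - X @ beta
        # Weight 1 inside the threshold, gamma/|r| outside, so large residuals count less.
        w = np.where(np.abs(r) <= gamma, 1.0, gamma / np.maximum(np.abs(r), 1e-12))
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
b = np.array([1.0, -2.0])
y = X @ b + 0.01 * rng.normal(size=100)
y[:10] += 50.0                                           # ten grossly corrupted entries
ols, *_ = np.linalg.lstsq(X, y, rcond=None)
rob = huber_regression(X, y, gamma=0.5)
```

In this synthetic setting the Huber estimate `rob` stays much closer to the true parameters `b` than the ordinary least-squares estimate `ols`, because each corrupted entry contributes only linearly to the objective.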

### 3.3. Ridge regression (RR)

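The goal of RR is to minimize the residual errors together with a penalty on the regression parameters as:

$$\hat{\beta}_{i} = \arg\min_{\beta_{i}}\left\|y - X_{i}\,\beta_{i}\right\|_{2}^{2} + \lambda\left\|\beta_{i}\right\|_{2}^{2}, \tag{26}$$

where *λ* is the regularization parameter. Compared with linear regression, RR adds the penalty $\lambda\|\beta_{i}\|_{2}^{2}$ to the least-squares objective.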
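The regularized least-squares problem has the well-known closed-form solution $\hat{\beta}_{i} = (X_{i}^{T}X_{i} + \lambda I)^{-1}X_{i}^{T}y$, which the following NumPy sketch evaluates. The function name `ridge_coefficients` is hypothetical, and a linear solve is used instead of an explicit matrix inverse.

```python
import numpy as np

def ridge_coefficients(X, y, lam):
    """Closed-form ridge estimate (X^T X + lam * I)^{-1} X^T y."""
    N = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(N), X.T @ y)


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = rng.normal(size=30)
b0 = ridge_coefficients(X, y, 0.0)      # lam = 0 reduces to ordinary least squares
b10 = ridge_coefficients(X, y, 10.0)    # a larger lam shrinks the coefficients
```

The added term also keeps the system solvable when the columns of *X*_{i} are nearly collinear, which is the situation the penalty is meant to handle.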

### 3.4. Improved principal component regression (IPCR)

Multicollinearity denotes the interrelations among the independent variables. In the linear regression, the regression estimation could be imprecise because the multicollinearity phenomenon would inflate the variance and covariance. To overcome the problem of multicollinearity, various approaches have been proposed. IPCR is one of the powerful approaches.

The IPCR is a two-step classification method. In the first step, the PCAZ is adopted to transform the observed variables into the new decorrelated components. Then, the first *n* components are dropped because these components are very sensitive to the lighting changes. Mathematically, the PCA process is used in all training samples including covariance matrix evaluation as Equation (2), and eigen-decomposition estimation as Equation (3). Then, we can obtain a set of eigenvectors, **u** = {*u*_{1}, *u*_{2}, …, *u*_{M}}, and a set of eigenvalues, **r** = {*r*_{1}, *r*_{2}, …, *r*_{M}} with *r*_{1} ≥ *r*_{2} ≥ … ≥ *r*_{M}. As mentioned above, we drop the first *n* components and the projection matrix can be expressed as:

The PCAZ features, *R*^{P×1}, can be obtained by multiplying the projection matrix and the average image vector as:

In order to apply LRC to estimate the class-specific model, the feature vectors should be grouped according to their class membership. Hence, for the *i*^{th} class, the projected feature vectors are grouped into a class-specific model. Likewise, the test vector **y** is transformed to the PCAZ subspace as *y*^{(PCAZ)}. In the second step, the new subspace of the PCAZ projection is used in LRC such that more reliable regression coefficients can be found for each subject for face recognition. The goal of regression becomes minimizing the residual errors as:

The regression parameter vectors can be rewritten as a matrix form as:
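The two-step IPCR idea described in this subsection (project onto principal components while discarding the first *n*, then run LRC in the projected space) can be sketched as follows. This is only a sketch under assumptions: the function names `ipcr_projection` and `ipcr_classify` are hypothetical, the components kept are assumed to be those ranked *n*+1 to *n*+*P*, and both training and test vectors are assumed to be centered with the global training mean before projection.

```python
import numpy as np

def ipcr_projection(X_all, n_drop, P):
    """X_all: M x n matrix of all training vectors.

    Returns the global mean and a P x M projection that skips the first
    n_drop principal components (assumed to carry most lighting variation).
    """
    mean = X_all.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(X_all - mean, full_matrices=False)
    return mean, U[:, n_drop:n_drop + P].T

def ipcr_classify(class_models, y, mean, proj):
    """LRC carried out in the projected subspace; returns the predicted class index."""
    y_p = proj @ (np.reshape(y, (-1, 1)) - mean)
    errors = []
    for X in class_models:
        X_p = proj @ (X - mean)                          # projected class-specific model
        beta, *_ = np.linalg.lstsq(X_p, y_p, rcond=None)
        errors.append(float(np.linalg.norm(y_p - X_p @ beta)))
    return int(np.argmin(errors))


rng = np.random.default_rng(0)
models = [rng.normal(size=(40, 4)) for _ in range(3)]    # 3 classes, 4 images each
mean, proj = ipcr_projection(np.hstack(models), n_drop=1, P=8)
label = ipcr_classify(models, models[1][:, 2], mean, proj)
```

The projected dimension *P* is kept larger than the number of training images per class so that the regression in the subspace remains overdetermined.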

### 3.5. Unitary regression classification (URC)

The previously mentioned methods do not take into account the total within-class projection error over all classes, which can degrade the recognition accuracy. The URC is proposed to minimize the total within-class projection error from all classes for LRC in order to improve the robustness of pattern recognition.

Instead of original space, we hope to find a global unitary rotation *P*_{URC}=[*s*_{1},…,*s*_{Ψ}] with *Ψ*≤*M*, which can rotate the original data space to a new compact *w*_{URC} data space as:

to achieve the total minimum projection error of all training data stated as:

where *w*_{URC} data space, the *i*^{th} class projection matrix can be obtained by following *P*_{URC}, is used to achieve the total minimum within-class projection error for LRC. From minimum reconstruction error, the objective function in ** T** data space can be represented as:

(34) |

By substituting

where *P*_{URC} = [*s*_{1}, …, *s*_{Ψ}] can be solved by evaluating eigen-decomposition as:

where *λ*_{Ψ}≧… ≧*λ*_{l}≧…≧*λ*_{1}≧0.

### 3.6. Linear discriminant regression classification (LDRC)

Although the previous methods including LRC, RLRC, and IPCRC can perform well on face recognition, we cannot guarantee that the projection subspace in LRC or IPCRC is the most discriminative. When the projection subspaces of different subjects overlap, the recognition result may be incorrect. To obtain an effective discriminant subspace for LRC, LRC with discriminant analysis is presented, which maximizes the ratio of the between-class reconstruction error (BCRE) to the within-class reconstruction error (WCRE) of the LRC.

Mathematically, all images are collected from *C* classes as **X** = [*X*_{1}, *X*_{2}, …, *X*_{C}] = [*x*_{1,1}, …, *x*_{i,j}, …, *x*_{C,N}]. LDRC is to find an optimal projection by maximizing the BCRE over the WCRE for the LRC such that the LRC on the optimal subspace has better discrimination for classification. The goal of LDRC is to maximize the objective function as:

where *P*_{LDRC}=[*u*_{1}, *u*_{2},…, *u*_{φ}] is the optimal projection matrix, and *E*_{BC} and *E*_{WC} denote the BCRE and WCRE, respectively. The original space, *x*_{i,j}, can be mapped into the subspace,

where *q*^{th} class and *x*_{i,j} is used to instead of

(39) |

With some algebraic deduction, the form becomes:

(40) |

where

and

is inter-class and intra-class reconstruction error, respectively. In other words, the objective function can be represented as:

For solving the optimization problem, Equation (43) can be reformulated as the following:

where ϑ is a constant. The projection matrix, *P*_{LDRC}=[*u*_{1}, *u*_{2},…, *u*_{φ}], can be solved by evaluating eigen-decomposition as:

where *λ*_{1}≧… ≧*λ*_{l}≧…≧*λ*_{φ}.

### 3.7. Generalized linear regression classification (GLRC)

In real-world recognition applications, the input images generally have multiple components, which can help to overcome unexpected effects such as pose variations, limited image information and so on. For color face recognition, the GLRC with membership grade (MG) criteria is proposed to counter these unexpected effects.

Mathematically, each channel component is separately normalized and transformed to one column vector such that *ν*_{i,j,k} ∈ *R*^{p×q×K} → *x*_{i,j,k} ∈ *R*^{d×K}, where *d* = *p*⋅*q*. In the *i*^{th} class, the *k*^{th} component of the *N* training images is collected as:

for *i* = 1, 2, …, *C* and *k* = 1, 2, …, *K*, where *X*_{i,k} is treated as the *k*^{th}-channel collected training data of the *i*^{th} class in the training phase.

For the test image, the *k*^{th}-channel testing image, *z*_{k}, is normalized and reshaped into a column vector as *y*_{k} ∈ *R*^{d×1}. For the *k*^{th} component, the linear combination of *X*_{i,k} from the *i*^{th} class for the test vector *y*_{k} becomes:

where *β*_{GLRC,i}∈*R*^{N×1} is an ideal projection vector of the *i*^{th}-class regression parameter for all channels. In order to estimate the projection vector, the objective function becomes:

After solving the optimization problem, the regression vector can be expressed as:

In order to achieve optimal performance, the different components should be treated as unequally important. Thus, the absolute sum of prediction residual of the *k*^{th} component after the direct least square optimization is given as:

where *k*^{th} component to be inverse of the normalized absolute sum of prediction residual, which is expressed by:

where *ε* is a tiny value used to avoid division by zero when *r*_{k} = 0. The larger the residual *r*_{k} is, the less important the *k*^{th} component will be. For the GLRC optimization, the linear combination of *X*_{i,k} of the *k*^{th} component in the *i*^{th} class for the test vector *y*_{k} becomes:

where *R*^{N×1} is the vector of the *i*^{th}-class total regression parameters to achieve the GLRC optimization as:

The optimal total regression parameter vector,

The prediction,

For identity recognition, the minimum prediction error of the GLRC should be further designed to compute the similarity between the prediction vector and the test vector **y**. The similarity in terms of minimization of the prediction errors of all *K* components can be designed by the following MG criteria as:

where *t* is the pre-selected fuzzy factor.

### 3.8. Trimmed linear regression (TLR)

For occlusion situations, the previous methods including LRC, RLRC, IPCR, URC, LDRC, and GLRC are not suitable because they treat all pixels as equally important. Conversely, if the outliers can be detected and trimmed from the testing image and the corresponding training samples, the regression mechanism can still work. The Hampel identifier [34, 35] is highly regarded for outlier detection because it picks out extreme values easily. An advantage of the Hampel identifier is that it adopts the median absolute deviation (MAD), a powerful robust measure in statistics, for removing the masking data. Mathematically, the Hampel identifier can be expressed as:

$$\frac{\left|\delta - \mathrm{median}(\Delta)\right|}{MAD/0.6745} > 2.24, \qquad MAD = \mathrm{median}\left(\left|\Delta - \mathrm{median}(\Delta)\right|\right),$$

where Δ is a data set, *δ* is one of its elements, and median(Δ) denotes the median value of the Δ data set. The constant 0.6745 is the probable error of the standard deviation, so *MAD*/0.6745 estimates the standard deviation for normally distributed data. When the ratio is larger than 2.24, the datum is discarded. For example, consider the data set [2, 3, 3, 4, 4, 250]. The sample mean is 44.33, the sample standard deviation is 100.76, the sample median is 3.5, and *MAD* equals 0.5. For the value 250, the detection rules based on the mean and on the median are:

$$\frac{|250 - 44.33|}{100.76} \approx 2.04 < 2.24$$

and

$$\frac{|250 - 3.5|}{0.5/0.6745} \approx 332.53 > 2.24,$$

respectively. We can observe that the Hampel identifier flags the outlier clearly, whereas the mean-based rule fails to detect it.
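The worked example can be checked numerically with a short NumPy sketch. The function names `hampel_outliers` and `mean_rule_outliers` are hypothetical; the threshold 2.24 and the constant 0.6745 are taken from the text above.

```python
import numpy as np

def hampel_outliers(data, threshold=2.24):
    """Flag entries whose median/MAD-based score exceeds the threshold."""
    data = np.asarray(data, dtype=float)
    med = np.median(data)
    mad = np.median(np.abs(data - med))
    # Note: mad == 0 (more than half the values identical) would divide by zero.
    return np.abs(data - med) / (mad / 0.6745) > threshold

def mean_rule_outliers(data, threshold=2.24):
    """Same rule, but with the sample mean and sample standard deviation."""
    data = np.asarray(data, dtype=float)
    return np.abs(data - data.mean()) / data.std(ddof=1) > threshold


data = [2, 3, 3, 4, 4, 250]
print(hampel_outliers(data))     # [False False False False False  True]
print(mean_rule_outliers(data))  # [False False False False False False]
```

The mean-based rule misses the value 250 because that single value inflates both the mean and the standard deviation, which is the masking effect the median-based rule avoids.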

For the face recognition, the error of estimation can be presented as:

where the error follows a zero-mean distribution. In order to detect the occluded part, each pixel should satisfy the Hampel identifier estimation as:

where *τ*<*M*. The objective function becomes:

The regression parameter vectors can be represented as:
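The trimming idea of this subsection can be illustrated with the following NumPy sketch: an initial least-squares fit gives per-pixel errors, the Hampel rule flags suspicious pixels, and the regression is repeated on the remaining pixels only. This is a simplified, single-pass illustration under assumptions; the function name `trimmed_lrc_error` is hypothetical and the procedure is not claimed to match the exact TLR algorithm.

```python
import numpy as np

def trimmed_lrc_error(X, y, threshold=2.24):
    """One-pass trimmed regression for a single class-specific model X (M x N).

    Returns the reconstruction error on the retained pixels and the boolean
    mask of retained pixels.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)         # initial fit on all pixels
    e = y - X @ beta                                     # per-pixel estimation error
    med = np.median(e)
    mad = np.median(np.abs(e - med))
    keep = np.abs(e - med) / (mad / 0.6745) <= threshold # Hampel rule on the errors
    beta_t, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    return float(np.linalg.norm(y[keep] - X[keep] @ beta_t)), keep


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, 2.0, -1.0]) + 0.01 * rng.normal(size=200)
y[:20] += 5.0                                            # simulate 20 occluded pixels
err, keep = trimmed_lrc_error(X, y)
```

In a classifier, this error would be computed for every class and the class with the smallest trimmed error selected, in the same way as the LRC decision rule.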

## 4. Experimental results

In order to verify the recognition accuracy, the well-known Yale B, AR, FERET, and FEI databases are utilized. In the experiments, we evaluate the methods described above on the low-resolution problem coupled with facial expressions, illumination changes, pose variations, and partial occlusions.

### 4.1. Yale B database

The Yale B database contains 10 subjects [36, 37]. Each subject has 64 illumination images with 9 different poses. The Yale B database can be divided into five subsets based on the angle of the light source direction, as shown in **Figure 4**. In the experiments, the first subset with normal pose is used for training and the remaining subsets (Subsets 2 to 5) with normal pose are used for testing. All images are cropped and resized to 30×25 pixels. **Table 1** reveals that IPCRC performs better than traditional subspace projections like PCA and LDA. Moreover, IPCRC also outperforms LRC, RLRC and RR. The reason is that the original subspace cannot represent the data distribution very well. Besides, the PCA subspace is very sensitive to illumination variations. However, IPCRC not only transforms the data to the PCA subspace, but also counters the illumination variations by removing the top *n* components. Thus, IPCRC possesses higher robustness to illumination than the other methods.

### 4.2. FERET database

Furthermore, we experiment on the FERET face database [38, 39] to compare the performance of the different subspace projections. In the experiments, we select four facial images, namely fa, fb, ql, and qr, from 300 subjects, as shown in **Figure 5**. All images are converted to grayscale, cropped, and downsampled to 30×25 pixels. As **Figure 5** shows, the fa and fb samples contain small pose and rotation changes; conversely, the ql and qr samples contain major pose variations. In order to obtain a reliable result, a cross-validation procedure is adopted. In other words, three images per person are used for training while the fourth image is used for testing. **Table 2** shows that URC achieves an outstanding average recognition accuracy (ARA). We can observe that RLRC and IPCRC are highly sensitive to pose variations, even though these methods perform well on noisy and illumination-affected face images, respectively.

### 4.3. AR database

The AR face database [40, 41] was created by Martinez and Benavente in 1998. This database contains 4000 mug shots of 126 subjects (70 males and 56 females) with different variations such as facial expressions, lighting changes and partial occlusions. In the normal case, each subject has 26 images taken in two sessions. The first session (AR1 ~ AR13), containing 13 photos, includes facial expressions, different lighting changes, and partial occlusions (sunglasses and scarf) with lighting changes. The second session (AR14 ~ AR26) repeats the first session in the same way two weeks later, as shown in **Figure 6**. In the experiments, 100 subjects are selected and all images are converted to grayscale, cropped, and resized to 30×25 pixels. We classify the images into four different expressions: neutral (AR4, AR14), happy (AR2, AR3), angry (AR1, AR17), and screaming (AR15, AR16). A single-expression training strategy is adopted to evaluate the performance. For example, if the neutral expression images are used for training, the happy, angry, and screaming expressions are used as query images. **Table 3** reveals that LDRC achieves the best performance in all cases. Moreover, we can observe that training on the happy expression images yields higher performance than training on the others; conversely, testing on the screaming expression images yields the lowest performance.

Partial occlusion is also examined. In these experiments, the expression variation images (AR1~AR4, AR14~AR17) are used as the training set, and the testing sets are separated into two cases: sunglasses (AR8, AR21) and scarf (AR11, AR24). All images are converted to grayscale, cropped, and resized to 42×30 pixels. From **Table 4**, we can observe two points. First, TLRC performs better than the other methods under both sunglasses occlusion and scarf occlusion. Second, occlusion of the upper face seems to yield higher performance than occlusion of the lower face. In other words, the mouth features are more useful than the eye features.

### 4.4. FEI database

The FEI face database [42, 43] contains 200 subjects (100 males and 100 females). Each subject has 14 images with different pose variations (image1~image10), facial expressions (image11~image12), and illumination variations (image13~image14), as shown in **Figure 7**. In the experiments, all images are converted to grayscale and resized to 24×20 pixels, and the leave-one-out strategy is adopted. From **Table 5**, it can be seen that IPCRC is more robust to severe lighting variation (image 14) and URC is good at facial profiles (image 1, image 10). Overall, URC achieves the best ARA.

### 4.5. Discussions

From the experimental results, we can observe that IPCRC performs well under illumination variations. The reason is that the first *n* components, which are very sensitive to lighting changes, are removed in IPCRC. However, although IPCRC performs better under lighting changes, it cannot handle pose variations and occlusion problems very well. For pose variations, URC performs better than the other subspace methods because it minimizes the total intra-class reconstruction error to find an optimal projection that reduces the influence of pose. LDRC embeds discriminant analysis into the LRC to seek an optimal projection matrix such that the LRC on that subspace has high discriminatory ability for classification. As a result, LDRC performs better than LRC and IPCRC in most cases. In the occlusion situation, TLRC can effectively remove the masking data and project onto a more reliable subspace.

## 5. Conclusions

In this chapter, we presented several subspace projection methods for robust face recognition to deal with different practical situations such as pose variations, lighting changes, facial expressions, and partial occlusions.

For the illumination variation task in face recognition, improved principal component regression can be used to solve the multicollinearity problem and achieves better recognition accuracy than the original linear regression and RR. For pose variations, the URC has been presented to minimize the total within-class projection error from all classes for LRC in order to improve the robustness of pattern recognition. Moreover, the LDRC has been proposed to handle facial expressions by maximizing the ratio of the BCRE to the WCRE of the LRC. For partial occlusions, trimmed regression classification is used to remove unreliable pixels by means of the Hampel identifier. Finally, experimental results comparing the different subspace projection optimizations have been presented.