Iterative approach for calculating the weight.

## Abstract

The insufficiency of labeled data is an important problem in image classification such as face recognition. However, unlabeled data are abundant in the real-world application. Therefore, semisupervised learning methods, which corporate a few labeled data and a large number of unlabeled data into learning, have received more and more attention in the field of face recognition. During the past years, graph-based semisupervised learning has been becoming a popular topic in the area of semisupervised learning. In this chapter, we newly present graph-based semisupervised learning method for face recognition. The presented method is based on local and global regression regularization. The local regression regularization has adopted a set of local classification functions to preserve both local discriminative and geometrical information, as well as to reduce the bias of outliers and handle imbalanced data; while the global regression regularization is to preserve the global discriminative information and to calculate the projection matrix for out-of-sample extrapolation. Extensive simulations based on synthetic and real-world datasets verify the effectiveness of the proposed method.

### Keywords

- Semi-supervised Learning
- Dimensionality Reduction
- Local and Global Regressions
- Face Recognition
- Transductive and Inductive Learning

## 1. Introduction

In the real world, there are ever-increasing vision face data generated from Internet surfing and daily social communication. These metadata can be labeled or unlabeled, and accordingly be utilized for image retrieval, summarization, and indexing. To handle these datasets for realizing the above tasks, automatic annotation is an elementary step, which can be formulated as a pattern classification problem and accomplished by learning-based techniques. Traditionally, the supervised-learning-based methods, such as Linear discriminant analysis (LDA) and Support Vector Machine (SVM), can deliver satisfactory recognition accuracy given that the number of labeled data is adequate. But labeling a huge amount of data is expensive and time consuming. On the other hand, the unlabeled data are sufficient and can be easily obtained from real-world application. Therefore, semisupervised learning-based methods that utilize a few of labeled data and a huge amount of unlabeled data are becoming more and more popular than only relying on the supervised learning methods [27–33].

Recently, since two pioneer semisupervised methods, i.e., Gaussian Fields and Harmonic Functions (GFHF) and Learning with Local and Global Consistency (LLGC), have been proposed in 2003 and 2004, respectively, graph-based semisupervised learning methods have received considerable research interest in the area of semisupervised learning. These methods usually represent both labeled and unlabeled sets by a graph, and then utilize their graph Laplacian matrix to characterize the manifold structure. Finally, different learning tasks such as image classification, clustering, and dimensionality reduction are performed on the graph Laplacian matrix. For example, GFHF and LLGC work in a transductive way by directly propagating the class label information from the labeled set to the unlabeled set along the graph, where the labels of unlabeled data can be estimated. Other similar works include Random Walk [5] and Special Label Propagation (SLP) [8]. However, the transductive learning methods cannot predict the class labels of new-coming samples, hence suffering the out-of-sample problem.

To solve the out-of-sample problem, inductive learning methods are proposed during the past decades. Typical methods for inductive learning are Manifold Regularization (MR) [1] and Semisupervised Discriminant Analysis (SDA) [2]. The MR tries to learn a projection matrix by adding the graph Laplaican regularized term to the cost function of original supervised methods. Therefore, both unlabeled and new-coming data can be cast into a low-dimensional subspace, hence the out-of-sample problem can be naturally solved [7, 9, 10, 16]. For example, MR has extended the regularized least square and SVM to their semisupervised learning extensions, i.e., Laplacian regularized least squares (Lap-RLS) and Laplacian SVM by adding a manifold regularized term. Similarly, Cai et al. [2] have extended LDA to SDA for semisupervised dimensionality reduction.

It should be noted that the success of semisupervised learning is based on how to utilizing the unlabeled data for characterizing the distribution of labels in data space. Several methods including Locally Linear Reconstruction [11, 12, 20], Local Regression and Global Alignment [13, 14], and Local Spline Regression [18, 19] have been developed to discover the intrinsic manifold structure of data. However, when we do semisupervised classification, the data points lying far away the data manifold are noisy for learning the correct classifier and can deteriorate the classification performance. On the other hand, sampling in real-world applications is usually not uniform. As a result, the sampled data may be imbalanced or with multi-density distribution. None of the aforementioned methods focus on solving the two problems.

In this chapter, we develop an effective semisupervised dimensionality reduction method, i.e., Local and Global Regression (LGR), for face recognition with outliers and imbalanced face data. In order to both handle transductive and inductive learning problems, LGR aims to sufficiently learn the classification function by using all data. In detail, the presented method first extends the original supervised regression term to a supervised loss term and a global regression regularized term, where the loss term is to fix the inconsistency between the predicted labels and initial labels, while the global regression term is to sufficiently learn the classification function using all training data and to obtain the projection matrix for handling out-of-sample problem. Furthermore, to capture the local discriminative information, a set of weighted local classification functions are adopted for each dataset to estimate the labels of its nearby data, where the weight is to reduce the outliers bias and to deal with imbalanced data. Thus, both local and global discriminative information of dataset can be preserved by the proposed LGR method.

The main contributions of this work are as follows: (1) we propose a new effective method for semisupervised dimensionality reduction, which can handle both transductive and inductive learning problems; (2) we develop a graph Laplacian matrix, which can characterize both local geometrical and discriminative information, as well as reduce the bias of outliers and handle imbalanced data; (3) we have also established the connection between the proposed method and other state-of-the-art methods. Theoretical analysis has shown that many popular semi‐supervised methods such as LRGA can be viewed as the special cases of the proposed method. Extensive simulations based on synthetic and real-world datasets verify the effectiveness of the proposed method.

This chapter is organized as follows. In Section 2, the notations and motivations are first given. We then propose our LGR method for both handling transductive and inductive learning problems. Finally, we also establish the connection between the proposed method and other state-of-the-art methods. Section 3 demonstrates the extensive simulations and the final conclusions are drawn in Section 4.

## 2. The proposed method

### 2.1. Notation and motivation

In semi-supervised learning, we define *l* and the remaining *u* columns are the labeled and unlabeled samples, respectively; *y*_{j} representing the class assignment of *x*_{j}, i.e. *x*_{j} belongs to the *i*th class; *D* and *c* are the numbers of features and classes, respectively. We also let *W* is the weight matrix defined as *x*_{i} is within the *k* nearest neighbor of *x*_{j} or if *x*_{j} is within the *k* nearest neighbor of *x*_{i}; *D* is a diagonal matrix satisfying

Most semi-supervised learning methods utilize the Gaussian function based affinity matrix. As point out in references [11, 12], the Gaussian function based affinity matrix is found to be oversensitive to the Gaussian variance; only a slight variation on the variance may affect the results dramatically. Thus, Gaussian function based affinity matrix is not a good method for handling image classification. The method developed should be robust to the parameters.

Second, when carrying out semisupervised classification, the samples lying far away from the data manifold are outliers which may lead to learn an incorrect classifier and deteriorate the classification performance. Considering Figure 1(a and b) as examples, we generalize a two-cycle and two-moon datasets with outliers. Considering the distribution of two data, the ideal decision boundary should lie in the gap between two data sub-manifolds. However, since there are many outliers around the data manifold, these outliers will blur the clear distribution of the whole data and are noisy to learn a correct classifier. Therefore, it is very important to develop a method that can adaptively reduce the effects of outliers.

Third, in real-world applications, sampling is usually not uniform. Consequently, the sampled data can be imbalanced or follows multi-density distribution. Figure 1(c) shows a two-plate dataset with two classes: each class follows a Gaussian distribution but with different cores and density. Obviously, the data points (left data points) in the high-density area will take more important part than those (right data points) in the low-density area when to learn a classifier, which may cause incorrect classification results. The method developed should handle such imbalanced data with multi-density distribution.

The method developed should also solve the out-of-sample problem. To address the above problems, we, in this paper, propose a new semisupervised learning method, which is based on local and global regression.

### 2.2. Local and global regression

We start from the supervised least-squares regression. The least-square regression is to fix a linear model *X* on *Y*:

where *V* is the projection matrix that is to project the new-coming samples and *b* is the bias term. Although the label *y*_{j} of *x*_{j} (*l* is usually very small, the classification function *z*_{j} and add a regression term to Eq. (1) as follows:

According to Eq. (2), the classification function *x*_{j}. We denote *k* neighborhood set of *x*_{j} with itself,

However, minimizing the above total errors over all data samples tends to force each local error *x*_{j} in order to penalize each regression error, which can be shown as

In the following section, we will discuss how to select the weight *V*_{j} and bias *b*_{j}, we perform derivatives to Eq. (4) w.r.t. *V*_{j} and *b*_{j} to zeros. Then, Eq. (4) will be reduced to

where *x*_{p} is the *q*th neighbors to *x*_{p}; *V* and *b* to zero, we have

where *b* and *V* in Eq. (6), the global regression term in Eq. (2) can be written as

where

where *l* and the remaining *u* diagonal elements as 1 and 0, respectively; the second term describes the local discriminative structure of data; the third term describes the global discriminative structure; and **LGR**. Finally, by performing derivatives of *z* to zero, we can calculate the solution of *z* as

Then, we can obtain the optimal projection matrix and bias term by replacing *z* in Eq. (6).

### 2.3. Weight selection for bias reduction

In this section, we consider how to select the weights in the proposed method suggested in Section 2.2. Note, our goal of using the weights is to weaken the effects of outliers and the weight *x*_{j} and should be far away from outliers. Hence, the weight

1. Initialize x_{j}. |

2. Update |

3. Update |

4. Iterate steps 2 and 3 until |

Table 1 shows the basic steps of the iterative approach. Following Table 1, the weight

**Theorem 1**. *The approach in Table 1 will monotonically decrease the objective function ∑i=0k−1∥xji−μjt∥ until convergence*.

**Proof**. According to step 3 in Table 1, we know that

where

Based on the lemma in reference [6] that

By summing Eqs. (11) and (12) in two sides, we have

Eq. (14) indicates that the objective function

Here, in order to show the convergence of the approach, we simply show an example in Figure 2(a), where we generalize eight normal data points and two outliers in *R*^{2}. Figure 2(b) shows the converged route of *t*. From Figure 2(b), we can observe that the optimal solution

### 2.4. Normalizing graph Laplacian matrix

It can be easily proved that *L*_{d} is a graph Laplacian matrix ( see the Appendix). But *L*_{d} may not be a normalized graph Laplacian matrix. As pointed in references [8, 23], the normalization can strengthen the local regressions in the low-density region and weaken those in the high-density region. Since the data sampling is usually uniform in practice, normalization is useful for handling the case when the density of dataset varies dramatically. In this section, we show that by choosing a special weight vector *X*_{j}, *L*_{d} can be a normalized graph Laplacian matrix.

Specifically, let us consider a data sample *x*_{j} and let *K*_{l} be the index set of those neighborhoods; set *x*_{j} as a neighbor of *x*_{j}, i.e., if *x*_{l} can be denoted as *l* and *j*. Obviously, if *x*_{l} is in the low-density area, it has sparse neighbors and *K*_{l} is relatively small. As a result, its connections to other samples will be weaker than that which has large *K*_{l}. Here, to strengthen the connections of samples in the low-density area, we need to normalize the weights corresponding to each *K*_{l}. Let *l* be the global index of

Hence, based on this definition, we have the following theorem:

*Theorem 2. With the normalization for each wji as in Eq. (14), L*_{d} *is both graph Laplaican matrix and normalized graph Laplacian matrix*.

**Proof**. The proof that *L*_{d} is a graph Laplacian matrix can be seen in the Appendix. In order to prove *L*_{d} is a normalized graph Laplacian matrix, we need prove *L*_{d} can be reformulated in the form of *W*_{d} is equal to 1. Note *W*_{d} as follows:

where each

Then, *L*_{d} can be reformulated as

Here, for each *l* corresponding to

The second equation holds as *L*_{d} in the form of *L*_{d} is a graph Laplaican matrix (as proved in the Appendix), it satisfies

which indicates that the sum of each column or row of *W*_{d} is equal to 1. We thus prove the theorem. Theorem 2 indicates that by choosing a special weight vector *L*_{d} can be both graph Laplacian matrix and normalized graph Laplacian matrix.

Here, it should be noted that if *x*_{l} is an outlier, its local weights can be significantly decreased, whether taking *x*_{l} as a neighbor of itself or of other data points. Otherwise, the normalization does not change the magnitude of its original local weights. For some data points in the low-density area, normalizing the weights can increase the information convection through those points. Finally, the basic steps of the proposed LGR are given in Table 2 and the flowchart by utilizing the proposed LGR method for face recognition is given in Figure 3.

Input: Data matrix |

Output: The projection matrix |

Algorithm: |

1. Determine the weight for each local patch based on Table 1. |

2. Normalize the weight as in Eq. (14). |

3. Form local regression regularized term L_{d} as in Eq. (5) with special local weight vector. |

4. Form global regression regularized term L_{g} as in Eq. (7). |

5. Solve the regression problem as in Eq. (8): and calculate estimated label matrix |

6. Calculate the projection matrix V^{*} by replacing z^{*} to Eq. (6) as V^{*}. |

### 2.5. Discussion and relative work

In this section, we discuss the relationship of Learning from Local and Global Information (LLGDI) with other state-of-the-art methods including MR, Flexible Manifold Embedding (FME), and Local Regression and Global Alignment (LRGA).

#### 2.5.1. Relationship to manifold regularization (Lap-RLS/L) [1]

The goal of MR [1] is to develop a semisupervised learning strategy by extending the original supervised methods, such as RLS and SVM to their semisupervised learning versions, i.e., Laplacian RLS and Laplacian SVM. For example, Lap-RLS/L is to fix a linear model *X* on *Y* and simultaneously to preserve the manifold smoothness in the embeddings of both the labeled and the unlabeled set. The objective function of Lap-RLS/L can be given as

However, it can be observed that Lap-RLS/L cannot sufficiently train the classification function due to the utilization of labeled samples, though it uses manifold term as complementary. Hence, the proposed LGR is superior to Lap-RLS/L.

#### 2.5.2. Relationship to FME [7, 10]

Nie et al. has proposed another unified framework, i.e., FME [7, 10], for semisupervised dimensionality reduction, in which they verify that LLGC, GFHF, and Lap-RLS/L are only special cases in the framework. The basic objective function of FME can be given as

It can be observed that Eq. (22) is almost the same as the objective function of LGR in Eq. (10), when we consider

#### 2.5.3. Relationship to LRGA [13, 14]

Recently, Yang et al. has proposed semisupervised transductive learning method, namely, LRGA [13, 14], for multimedia retrieval. They share the similar concept with the proposed method. The basic objective function of LRGA can be given as

It can be noted that LRGA is a special case of LGR when

## 3. Simulation results

In this section, we will evaluate the proposed LGR based on three synthetic datasets and two real-world datasets.

### 3.1. Synthetic datasets

In this section, we evaluate the performance of the proposed LGR and SLP for transductive learning. The SLP is an extensive method to GFHF, LLGC, and Random Walk (RW) hence, it is representative. Here, we utilize two-moon and two-cycle datasets in Figure 1(a and b) for evaluation. Figure 4 shows the results of LGR and SLP for transductive learning. From Figure 4, we can see that LGR can achieve better simulation result than SLP, in a way that less data are misclassified in LGR than SLP. This indicates the proposed LGR is robust to the outliers.

We also evaluate the inductive performance of the proposed LGR for handling the out-of-sample problem. Figure 5 shows the gray images of decision surfaces and boundaries learned by LGR, which are formed as follows: for each pixel, we form the its gray value as the difference from each pixel to its nearest labeled data of different classes in the reduced subspace. Here, we set the reduced dimensionality as 1. Then, we form the decision boundaries by the pixels with the value 0. Following Figure 5, we can observe that the proposed LGR can learn clear decision boundary that can well separate two classes, which verifies the effectiveness of LGR for handling the out-of-sample problem.

To show the merit of normalization, we utilize two-plate dataset in Figure 1(c) for evaluation. Our goal is to show LGR can handle multi-density dataset. Figure 6 shows the gray images of decision surfaces and boundaries learned by LGR without normalization and LGR with normalization. From Figure 6, we can observe that LGR without normalization cannot find proper boundary. However, LGR with normalization can achieve better performance, as there are less missing-classified data points separated by the decision boundary, which becomes more distinctive and accurate. The improved results are believed to be due to the fact that normalization can strengthen the local regressions in the low-density region and weaken those in the high-density region. This is proved to be advantageous to be used for multi-density dataset.

### 3.2. Semisupervised face recognition based on real-world benchmark datasets

For handling the face recognition problem, we use three real-world face datasets to evaluate the performance of methods, which include UMNIST: cannot find the full name [24], Extended Yale-B [25], and Massachusetts Institute of Technology Center for Biological and Computational Learning (MIT-CBCL) [26] datasets. The UMIST dataset is a multi-view face dataset, consisting of 1012 images of 20 peoples, each covering a wide range of poses from profile to frontal views. Therefore, the UMIST has widely been used for general purpose face recognition under different face poses. The size of each image is 112×92 with 256 gray levels per pixel. In our simulation, we down-sample the size of each image to 28×23 and no other preprocessing is performed. The Extended Yale-B dataset contains 16,123 images of 38 human subjects under 9 poses and 64 illumination conditions. Because of the illumination variability, the same object can appear dramatically different even when viewed in fixed pose. Hence, this is another challenge for face recognition, and Extended Yale-B dataset are extensively used for testing appearance-based face recognition methods. Similar to the UMIST dataset, the images are also cropped and resized to 32×32 pixels. This dataset now has around 64 near frontal images under different illuminations per individual. The MIT-CBCL dataset provides 3240 synthetic images rendered from 3D head models of 10 peoples. The head models are generated by fitting a morphable model to the high-resolution training images. Different from UMNIST dataset, the MIT-CBCL dataset is based on the 3D morphable model, which is rendered under varying pose and illumination conditions making the face recognition task more challengeable. The size of each image is originally 200×200 with 256 gray levels per pixel. In our simulation, we down-sample the size of each image to 32×32 and no other preprocessing is performed. The detailed information of dataset and some sampled images of real-world datasets can be shown in Table 3 and Figure 7. For each dataset, we randomly select 10, 50 and 30 samples from each class as training samples for UMNIST, Extended Yale-B, and MIT-CBCL datasets. The test set is then formed by the selected or all remaining samples. The data partitioning for each dataset is also given in Table 3.

Next, we compare our method with other supervised and semisupervised dimension reduction methods. These methods include Regularized Linear discriminant analysis (RLDA), SDA [2], Lap-RLS/L [1], least-square solution for solving SDA in Eq. (16) (in Table 1, we refer to it as LS-SDA) [28], FME [7, 10], and the proposed LGR. Note that Principal Component Analysis (PCA) is an unsupervised method while RLDA is supervised methods, and the remaining methods LGR are all semisupervised methods. The simulation settings are as follows: for SDA, Lap-RLS/L, two parameters, i.e., *α*_{t} and *α*_{m}, need to be determined for balancing the trade-off between the manifold and Tikhonov terms. We use fivefold cross validation to determine the best values and the candidate set is *α*_{t} in RLDA and the addition regularized parameter *α*_{r} in FME and LGR. In order to eliminate the null space before performing dimension reduction, the training sets in all datasets are preliminarily processed with PCA operator. Since most of methods, such as RLDA, SDA, Lap-RLS/L and FME, and the proposed LGR have a limited rank of *c*–1, we simply reduce the dimensionality of all methods to *c*–1. All methods used labeled set in the output reduced subspace to train a nearest neighborhood classifier in order to evaluate the classification accuracy of test set. We also compare the performance of nearest neighborhood classifier with other state-of-the-art methods as a baseline.

Dataset | Database Type | #Samples | #Dim | #Class | #Training per Class | #Test per Class |
---|---|---|---|---|---|---|

UMNIST | Face | 1012 | 1024 | 20 | 20 | Remains |

Extended Yale-B | Face | 16123 | 1024 | 38 | 50 | Remains |

MIT-CBCL | Face | 3240 | 1024 | 10 | 30 | 30 |

The average accuracies over 20 random splits with the above parameters for each dataset are shown in Table 4. From the simulation results, we can obtain the following observation: (1) given sufficient labeled samples, all the supervised and semisupervised dimension reduction methods outperform nearest neighborhood classifier due to the utilization of label information and feature extraction; (2) the semisupervised dimension reduction methods are better than the corresponding supervised methods. For example, SDA outperforms RLDA by about 5–6% in COIL100 dataset with two labeled samples per class. For other datasets, it can outperform by 2–3%. This indicates that by incorporating the unlabeled set into the training procedure, the classification performance can be markedly improved, as the manifold structure embedded in the dataset is preserved; (3) we also observe that both SDA and the least-square solution in Table 1 can achieve the same classification results due to the reason as analyzed in Section 3; (4) the proposed LGR can deliver better accuracies than those delivered by other semisupervised dimension reduction methods such as SDA and Lap-RLS/L by about 3–4% in most datasets. The improvement can even achieve almost 8% in ETH80 dataset with two labeled samples per class. The improvement is believed to be true that LGR aims to characterize both local and global discriminative information embedded in dataset, which is better to handle classification problem; (5) we observe that LGR outperform FME by about 2% in most cases. The main reason is that LGR has utilized a weighted normalized local discriminative Laplacian matrix to preserve both manifold and discriminative structures in dataset, which is better than only relying on neighborhood graph.

Dataset | Method | 4 labeled samples per class | 7 labeled samples per class | 10 labeled samples per class | |||
---|---|---|---|---|---|---|---|

Unlabeled | Test | Unlabeled | Test | Unlabeled | Test | ||

Mean±std | Mean±std | Mean±std | Mean±std | Mean±std | Mean±std | ||

UMNIST | Baseline | 81.1±0.9 | 80.2±1.0 | 88.6±0.7 | 88.3±0.7 | 93.1±0.6 | 93.0±0.7 |

RLDA | 85.2±0.6 | 85.0±0.7 | 90.7±0.5 | 90.4±0.6 | 95.3±0.4 | 94.4±0.5 | |

SDA | 86.4±0.7 | 86.3±0.7 | 92.1±0.6 | 91.7±0.7 | 96.2±0.4 | 95.4±0.5 | |

LS-SDA | 86.4±0.7 | 86.3±0.7 | 92.1±0.6 | 91.7±0.7 | 96.2±0.4 | 95.4±0.5 | |

Lap-RLS/L | 86.6±0.7 | 86.0±0.8 | 91.9±0.3 | 91.9±0.4 | 95.7±0.5 | 95.3±0.6 | |

FME | 88.2±0.6 | 87.7±0.6 | 93.1±0.3 | 92.9±0.4 | 96.7±0.5 | 96.1±0.5 | |

LGR | 89.2±0.4 | 88.9±0.5 | 94.2±0.2 | 93.8±0.4 | 97.9±0.6 | 97.2±0.4 |

Dataset | Method | 4 labeled samples per class | 7 labeled samples per class | 10 labeled samples per class | |||
---|---|---|---|---|---|---|---|

Unlabeled | Test | Unlabeled | Test | Unlabeled | Test | ||

mean±std | mean±std | mean±std | mean±std | mean±std | mean±std | ||

Extended Yale-B | Baseline | 50.5±1.9 | 50.1±1.9 | 63.9±1.5 | 63.6±1.7 | 69.8±1.5 | 69.4±1.6 |

RLDA | 74.8±2.0 | 74.5±2.1 | 81.6±1.6 | 81.1±1.6 | 89.6±1.5 | 89.3±1.7 | |

SDA | 86.4±1.9 | 86.0±2.0 | 89.8±1.7 | 89.5±1.8 | 92.8±1.5 | 92.6±1.5 | |

LS-SDA | 86.4±1.9 | 86.082.0 | 89.8±1.7 | 89.5±1.8 | 92.8±1.5 | 92.6±1.5 | |

Lap-RLS/L | 86.7±2.0 | 86.2±2.2 | 90.1±1.8 | 89.7±1.9 | 93.1±1.6 | 92.8±1.7 | |

FME | 88.2±1.8 | 88.0±2.0 | 91.3±1.7 | 91.1±1.9 | 93.9±1.5 | 93.6±1.6 | |

LGR | 89.5±1.7 | 89.1±2.2 | 92.2±1.6 | 92.1±1.6 | 94.8±1.4 | 94.5±2.0 |

Dataset | Method | 4 labeled samples per class | 7 labeled samples per class | 10 labeled samples per class | |||
---|---|---|---|---|---|---|---|

Unlabeled | Test | Unlabeled | Test | Unlabeled | Test | ||

Mean±std | Mean±std | Mean±std | Mean±std | Mean±std | Mean±std | ||

MIT-CBCL | Baseline | 69.5±3.5 | 68.9±3.9 | 80.5±6.9 | 79.8±7.0 | 90.3±4.0 | 90.1±3.8 |

RLDA | 72.6±2.9 | 73.8±4.6 | 82.5±7.6 | 82.2±7.2 | 92.1±3.2 | 92.9±3.0 | |

SDA | 75.7±2.9 | 76.9±4.7 | 84.7±6.7 | 84.2±6.5 | 95.0±3.3 | 94.9±3.1 | |

LS-SDA | 75.7±2.9 | 76.9±4.7 | 84.7±6.7 | 84.2±6.5 | 95.0±3.3 | 94.9±3.1 | |

Lap-RLS/L | 75.2±4.0 | 76.1±4.1 | 84.7±6.5 | 84.1±6.5 | 93.7±4.3 | 93.7±3.6 | |

FME | 78.4±2.5 | 78.1±3.2 | 85.3±5.4 | 85.1±5.0 | 95.1±3.7 | 94.9±3.1 | |

LGR | 81.0±2.2 | 80.8±3.2 | 87.2±5.1 | 86.5±5.1 | 96.5±3.4 | 96.4±3.2 |

## 4. Conclusion

In this chapter, we propose a semisupervised method, namely LGR, for face recognition. With the above analysis, the following conclusions can be drawn: (1) the proposed LGR can achieve better results in face recognition than those delivered by other state-of-the-art methods as more discriminative information are captured based on local and global regressions, (2) the proposed LGR is robust to outliers and can handle the imbalanced data, and (3) the proposed LGR can deal with out-of-sample extrapolation to estimate the labels of new-coming face data by casting it to the global projection matrix.

## Appendix

In order to prove that *L*_{d} is graph Laplacian matrix, we need to prove *L*_{d} is positive semidefinite matrix and the sum of each row or column of *L*_{d} is equal to zero. We first have the following Lemmas:

*Lemma 1. For each local patch X*_{j},* L*_{j} *can be reformulated as follows:*

*where Gj=(I−ΔjekTek/(ekΔjekT))Δj−1/2∈Rk×k*.

Proof. First, it can be easily noted that

Then, we have

The second equation holds as *A*. Thus, Lemma 1 is proved.

*Lemma 2. Given a positive semidefinite matrix C, DCD*^{T} *is a positive semidefinite matrix for any matrix D*.

*Lemma 3. Given a set of positive semidefinite matrixes* *then* *is a positive semidefinite matrix.*

We neglect the proofs of Lemmas 2 and 3 as they can be seen in reference [15]. Then with Lemmas 1–3, we can easily prove Theorem 2 as follows:

**Proof of Theorem 2**. Note that following Lemma 1, we reformulate each *L*_{j} as *L*_{d}, i.e.,

is also a positive semidefinite matrix. In addition, for each

which indicates that the sum of each row or column of *L*_{d} is equal to zero. We thus prove *L*_{d} is graph Laplacian matrix.