Iterative approach for calculating the weight.
The insufficiency of labeled data is an important problem in image classification such as face recognition. However, unlabeled data are abundant in the real-world application. Therefore, semisupervised learning methods, which corporate a few labeled data and a large number of unlabeled data into learning, have received more and more attention in the field of face recognition. During the past years, graph-based semisupervised learning has been becoming a popular topic in the area of semisupervised learning. In this chapter, we newly present graph-based semisupervised learning method for face recognition. The presented method is based on local and global regression regularization. The local regression regularization has adopted a set of local classification functions to preserve both local discriminative and geometrical information, as well as to reduce the bias of outliers and handle imbalanced data; while the global regression regularization is to preserve the global discriminative information and to calculate the projection matrix for out-of-sample extrapolation. Extensive simulations based on synthetic and real-world datasets verify the effectiveness of the proposed method.
- Semi-supervised Learning
- Dimensionality Reduction
- Local and Global Regressions
- Face Recognition
- Transductive and Inductive Learning
In the real world, there are ever-increasing vision face data generated from Internet surfing and daily social communication. These metadata can be labeled or unlabeled, and accordingly be utilized for image retrieval, summarization, and indexing. To handle these datasets for realizing the above tasks, automatic annotation is an elementary step, which can be formulated as a pattern classification problem and accomplished by learning-based techniques. Traditionally, the supervised-learning-based methods, such as Linear discriminant analysis (LDA) and Support Vector Machine (SVM), can deliver satisfactory recognition accuracy given that the number of labeled data is adequate. But labeling a huge amount of data is expensive and time consuming. On the other hand, the unlabeled data are sufficient and can be easily obtained from real-world application. Therefore, semisupervised learning-based methods that utilize a few of labeled data and a huge amount of unlabeled data are becoming more and more popular than only relying on the supervised learning methods [27–33].
Recently, since two pioneer semisupervised methods, i.e., Gaussian Fields and Harmonic Functions (GFHF) and Learning with Local and Global Consistency (LLGC), have been proposed in 2003 and 2004, respectively, graph-based semisupervised learning methods have received considerable research interest in the area of semisupervised learning. These methods usually represent both labeled and unlabeled sets by a graph, and then utilize their graph Laplacian matrix to characterize the manifold structure. Finally, different learning tasks such as image classification, clustering, and dimensionality reduction are performed on the graph Laplacian matrix. For example, GFHF and LLGC work in a transductive way by directly propagating the class label information from the labeled set to the unlabeled set along the graph, where the labels of unlabeled data can be estimated. Other similar works include Random Walk  and Special Label Propagation (SLP) . However, the transductive learning methods cannot predict the class labels of new-coming samples, hence suffering the out-of-sample problem.
To solve the out-of-sample problem, inductive learning methods are proposed during the past decades. Typical methods for inductive learning are Manifold Regularization (MR)  and Semisupervised Discriminant Analysis (SDA) . The MR tries to learn a projection matrix by adding the graph Laplaican regularized term to the cost function of original supervised methods. Therefore, both unlabeled and new-coming data can be cast into a low-dimensional subspace, hence the out-of-sample problem can be naturally solved [7, 9, 10, 16]. For example, MR has extended the regularized least square and SVM to their semisupervised learning extensions, i.e., Laplacian regularized least squares (Lap-RLS) and Laplacian SVM by adding a manifold regularized term. Similarly, Cai et al.  have extended LDA to SDA for semisupervised dimensionality reduction.
It should be noted that the success of semisupervised learning is based on how to utilizing the unlabeled data for characterizing the distribution of labels in data space. Several methods including Locally Linear Reconstruction [11, 12, 20], Local Regression and Global Alignment [13, 14], and Local Spline Regression [18, 19] have been developed to discover the intrinsic manifold structure of data. However, when we do semisupervised classification, the data points lying far away the data manifold are noisy for learning the correct classifier and can deteriorate the classification performance. On the other hand, sampling in real-world applications is usually not uniform. As a result, the sampled data may be imbalanced or with multi-density distribution. None of the aforementioned methods focus on solving the two problems.
In this chapter, we develop an effective semisupervised dimensionality reduction method, i.e., Local and Global Regression (LGR), for face recognition with outliers and imbalanced face data. In order to both handle transductive and inductive learning problems, LGR aims to sufficiently learn the classification function by using all data. In detail, the presented method first extends the original supervised regression term to a supervised loss term and a global regression regularized term, where the loss term is to fix the inconsistency between the predicted labels and initial labels, while the global regression term is to sufficiently learn the classification function using all training data and to obtain the projection matrix for handling out-of-sample problem. Furthermore, to capture the local discriminative information, a set of weighted local classification functions are adopted for each dataset to estimate the labels of its nearby data, where the weight is to reduce the outliers bias and to deal with imbalanced data. Thus, both local and global discriminative information of dataset can be preserved by the proposed LGR method.
The main contributions of this work are as follows: (1) we propose a new effective method for semisupervised dimensionality reduction, which can handle both transductive and inductive learning problems; (2) we develop a graph Laplacian matrix, which can characterize both local geometrical and discriminative information, as well as reduce the bias of outliers and handle imbalanced data; (3) we have also established the connection between the proposed method and other state-of-the-art methods. Theoretical analysis has shown that many popular semi‐supervised methods such as LRGA can be viewed as the special cases of the proposed method. Extensive simulations based on synthetic and real-world datasets verify the effectiveness of the proposed method.
This chapter is organized as follows. In Section 2, the notations and motivations are first given. We then propose our LGR method for both handling transductive and inductive learning problems. Finally, we also establish the connection between the proposed method and other state-of-the-art methods. Section 3 demonstrates the extensive simulations and the final conclusions are drawn in Section 4.
2. The proposed method
2.1. Notation and motivation
In semi-supervised learning, we define be the data matrix where the first l and the remaining u columns are the labeled and unlabeled samples, respectively; be the binary label matrix with each column yj representing the class assignment of xj, i.e. , as the class matrix, where , if xj belongs to the ith class; , otherwise, D and c are the numbers of features and classes, respectively. We also let be the graph Laplacian matrix associated with both labeled and unlabeled sets ,where W is the weight matrix defined as , if xi is within the k nearest neighbor of xj or if xj is within the k nearest neighbor of xi; , otherwise, D is a diagonal matrix satisfying .
Most semi-supervised learning methods utilize the Gaussian function based affinity matrix. As point out in references [11, 12], the Gaussian function based affinity matrix is found to be oversensitive to the Gaussian variance; only a slight variation on the variance may affect the results dramatically. Thus, Gaussian function based affinity matrix is not a good method for handling image classification. The method developed should be robust to the parameters.
Second, when carrying out semisupervised classification, the samples lying far away from the data manifold are outliers which may lead to learn an incorrect classifier and deteriorate the classification performance. Considering Figure 1(a and b) as examples, we generalize a two-cycle and two-moon datasets with outliers. Considering the distribution of two data, the ideal decision boundary should lie in the gap between two data sub-manifolds. However, since there are many outliers around the data manifold, these outliers will blur the clear distribution of the whole data and are noisy to learn a correct classifier. Therefore, it is very important to develop a method that can adaptively reduce the effects of outliers.
Third, in real-world applications, sampling is usually not uniform. Consequently, the sampled data can be imbalanced or follows multi-density distribution. Figure 1(c) shows a two-plate dataset with two classes: each class follows a Gaussian distribution but with different cores and density. Obviously, the data points (left data points) in the high-density area will take more important part than those (right data points) in the low-density area when to learn a classifier, which may cause incorrect classification results. The method developed should handle such imbalanced data with multi-density distribution.
The method developed should also solve the out-of-sample problem. To address the above problems, we, in this paper, propose a new semisupervised learning method, which is based on local and global regression.
2.2. Local and global regression
We start from the supervised least-squares regression. The least-square regression is to fix a linear model by regressing X on Y:
where V is the projection matrix that is to project the new-coming samples and b is the bias term. Although the label yj of xj () has already been known, since l is usually very small, the classification function may not be sufficiently trained due to the small sample size. To solve this problem, we introduce as a set of estimated labels to play the same roll by replacing with zj and add a regression term to Eq. (1) as follows:
According to Eq. (2), the classification function can be sufficiently learned by using all the predicted labels and to fix to their original labels. In other meaning, the global discriminative information can be preserved by the regression term of Eq. (2). Furthermore, to grasp the local discriminative information, we induce a local regression function for each data sample xj. We denote as the k neighborhood set of xj with itself, as the local data matrix formed by all samples in , where is the index set of and , . We also denote as the local low-dimensional label matrix in . Then, the local regression function for all data samples can be given as follows:
However, minimizing the above total errors over all data samples tends to force each local error similar to each other. Given some cases that the dataset includes some outliers, assuming all the local regression errors equally may emphasize the effects from outliers and weaken the effects from normal data. In this section, to weaken the effects from outliers, we add a weight vector for each local data patch xj in order to penalize each regression error, which can be shown as
In the following section, we will discuss how to select the weight . Our motivation is to let the weight of local error be large given are the normal data and in the contrast to let the weight be small given is outlier. In detail, to obtain local projection matrix Vj and bias bj, we perform derivatives to Eq. (4) w.r.t. Vj and bj to zeros. Then, Eq. (4) will be reduced to
where ; is the selected matrix satisfying , if xp is the qth neighbors to xp; , otherwise, is the local graph Laplacian matrix. Similarly, by setting the derivatives of Eq. (2) w.r.t. V and b to zero, we have
where is the diagonal matrix with the first l and the remaining u diagonal elements as 1 and 0, respectively; the second term describes the local discriminative structure of data; the third term describes the global discriminative structure; and and are the two balancing parameters. Since both local and global regressions are regularized in our method, we refer our method as LGR. Finally, by performing derivatives of w.r.t. z to zero, we can calculate the solution of z as
Then, we can obtain the optimal projection matrix and bias term by replacing z in Eq. (6).
2.3. Weight selection for bias reduction
In this section, we consider how to select the weights in the proposed method suggested in Section 2.2. Note, our goal of using the weights is to weaken the effects of outliers and the weight should be set to a small value if is an outlier. Then we can make the weight inversely proportional to the distance between and a center , i.e., . Such a center is expected to represent the idea center of data in the neighborhoods of xj and should be far away from outliers. Hence, the weight is usually small if is an outlier. But this center is unknown. We next present an iterative approach to calculate and the weight simultaneously. The approach is converged and proved afterward.
|1. Initialize as the average center of all data points in the local patch of xj.|
|2. Update for each as and form the weight matrix .|
|3. Update .|
|4. Iterate steps 2 and 3 until no changes. Output.|
Table 1 shows the basic steps of the iterative approach. Following Table 1, the weight at each iteration is updated from the last and the newly updated center is calculated from current . The whole iterations are continued until convergence, so that the weight can be adaptively and iteratively re-weighted to minimize . In addition, as can be seen in simulation of Figure 2, the updated will be adaptively re-weighted to be close to the main center of most data points, while the updated will be weaken if is outliers or be strengthened if is close to the ideal center. We next discuss a theorem to guarantee the convergence of the approach of Table 1.
Theorem 1. The approach in Table 1 will monotonically decrease the objective function ∑i=0k−1∥xji−μjt∥ until convergence.
Proof. According to step 3 in Table 1, we know that
Based on the lemma in reference  that holds for any two nonzero value, we have
Eq. (14) indicates that the objective function is monotonically decreased in each iteration. Since there is a lower bound in the objective function (), the iterative approach will certainly converge. We thus prove Theorem 1. Finally, by incorporating the weight for reducing the bias for each local regression error into Eq. (4), we can reduce the bias of outliers of data samples.
Here, in order to show the convergence of the approach, we simply show an example in Figure 2(a), where we generalize eight normal data points and two outliers in R2. Figure 2(b) shows the converged route of , where we start as the average mean of all data points and mark in each iteration with t. From Figure 2(b), we can observe that the optimal solution will iterative close to the main center of normal data while be far away from the outliers. Figure 2(c) shows the converged curve of approach as discussed in Table 1. From Figure 2(c), we can observe that the objective will monotonically decrease until convergence. Figure 2(d) shows the converged weight of data points. From Figure 2(d), we can observe the weights of normal data points are strengthen while those of outliers can be reduced.
2.4. Normalizing graph Laplacian matrix
It can be easily proved that Ld is a graph Laplacian matrix ( see the Appendix). But Ld may not be a normalized graph Laplacian matrix. As pointed in references [8, 23], the normalization can strengthen the local regressions in the low-density region and weaken those in the high-density region. Since the data sampling is usually uniform in practice, normalization is useful for handling the case when the density of dataset varies dramatically. In this section, we show that by choosing a special weight vector for each Xj, Ld can be a normalized graph Laplacian matrix.
Specifically, let us consider a data sample xj and let Kl be the index set of those neighborhoods; set contains xj as a neighbor of xj, i.e., if , then , where xl can be denoted as in the neighborhood set , and is the local index depending on l and j. Obviously, if xl is in the low-density area, it has sparse neighbors and Kl is relatively small. As a result, its connections to other samples will be weaker than that which has large Kl. Here, to strengthen the connections of samples in the low-density area, we need to normalize the weights corresponding to each Kl. Let be the weight of and l be the global index of . We then define as follows:
Hence, based on this definition, we have the following theorem:
Theorem 2. With the normalization for each wji as in Eq. (14), Ld is both graph Laplaican matrix and normalized graph Laplacian matrix.
Proof. The proof that Ld is a graph Laplacian matrix can be seen in the Appendix. In order to prove Ld is a normalized graph Laplacian matrix, we need prove Ld can be reformulated in the form of and the sum of each row or column of the affinity matrix Wd is equal to 1. Note and , where
, we first define the affinity matrix Wd as follows:
where each satisfies
Then, Ld can be reformulated as
Here, for each , we have , where is a column vector by putting each to its global index l corresponding to . We thus have
The second equation holds as ; hence, the sum of all in each element is equal to 1. Then, following Eq. (18), it indicates is an identity matrix, i.e., . Then based on the above analysis, we can reformulate Ld in the form of . In addition, since Ld is a graph Laplaican matrix (as proved in the Appendix), it satisfies , then we have
which indicates that the sum of each column or row of Wd is equal to 1. We thus prove the theorem. Theorem 2 indicates that by choosing a special weight vector for each , Ld can be both graph Laplacian matrix and normalized graph Laplacian matrix.
Here, it should be noted that if xl is an outlier, its local weights can be significantly decreased, whether taking xl as a neighbor of itself or of other data points. Otherwise, the normalization does not change the magnitude of its original local weights. For some data points in the low-density area, normalizing the weights can increase the information convection through those points. Finally, the basic steps of the proposed LGR are given in Table 2 and the flowchart by utilizing the proposed LGR method for face recognition is given in Figure 3.
|Input: Data matrix , the initial label matrix , and other related parameters.|
|Output: The projection matrix and estimated label matrix .|
|1. Determine the weight for each local patch based on Table 1.|
|2. Normalize the weight as in Eq. (14).|
|3. Form local regression regularized term Ld as in Eq. (5) with special local weight vector.|
|4. Form global regression regularized term Lg as in Eq. (7).|
|5. Solve the regression problem as in Eq. (8):|
and calculate estimated label matrix as in Eq. (9). Output .
|6. Calculate the projection matrix V* by replacing z* to Eq. (6) as . Output V*.|
2.5. Discussion and relative work
In this section, we discuss the relationship of Learning from Local and Global Information (LLGDI) with other state-of-the-art methods including MR, Flexible Manifold Embedding (FME), and Local Regression and Global Alignment (LRGA).
2.5.1. Relationship to manifold regularization (Lap-RLS/L) 
The goal of MR  is to develop a semisupervised learning strategy by extending the original supervised methods, such as RLS and SVM to their semisupervised learning versions, i.e., Laplacian RLS and Laplacian SVM. For example, Lap-RLS/L is to fix a linear model by regressing X on Y and simultaneously to preserve the manifold smoothness in the embeddings of both the labeled and the unlabeled set. The objective function of Lap-RLS/L can be given as
However, it can be observed that Lap-RLS/L cannot sufficiently train the classification function due to the utilization of labeled samples, though it uses manifold term as complementary. Hence, the proposed LGR is superior to Lap-RLS/L.
Nie et al. has proposed another unified framework, i.e., FME [7, 10], for semisupervised dimensionality reduction, in which they verify that LLGC, GFHF, and Lap-RLS/L are only special cases in the framework. The basic objective function of FME can be given as
It can be observed that Eq. (22) is almost the same as the objective function of LGR in Eq. (10), when we consider . However, LGR has utilized a weighted and normalized local discriminative Laplacian matrix to preserve manifold and discriminative structure in a dataset. This is a better way than only relying on neighborhood graph.
Recently, Yang et al. has proposed semisupervised transductive learning method, namely, LRGA [13, 14], for multimedia retrieval. They share the similar concept with the proposed method. The basic objective function of LRGA can be given as
It can be noted that LRGA is a special case of LGR when . Therefore, LRGA is only a transductive learning method and cannot handle the out-of-sample problem, while LGR is a transductive and inductive learning method. Another superiority of LGR over LRGA is that LGR has adopted a weighted normalized each local regression term. Thus, as shown in the simulation results, LLGDI can handle outliers and multi-density dataset remarkably.
3. Simulation results
In this section, we will evaluate the proposed LGR based on three synthetic datasets and two real-world datasets.
3.1. Synthetic datasets
In this section, we evaluate the performance of the proposed LGR and SLP for transductive learning. The SLP is an extensive method to GFHF, LLGC, and Random Walk (RW) hence, it is representative. Here, we utilize two-moon and two-cycle datasets in Figure 1(a and b) for evaluation. Figure 4 shows the results of LGR and SLP for transductive learning. From Figure 4, we can see that LGR can achieve better simulation result than SLP, in a way that less data are misclassified in LGR than SLP. This indicates the proposed LGR is robust to the outliers.
We also evaluate the inductive performance of the proposed LGR for handling the out-of-sample problem. Figure 5 shows the gray images of decision surfaces and boundaries learned by LGR, which are formed as follows: for each pixel, we form the its gray value as the difference from each pixel to its nearest labeled data of different classes in the reduced subspace. Here, we set the reduced dimensionality as 1. Then, we form the decision boundaries by the pixels with the value 0. Following Figure 5, we can observe that the proposed LGR can learn clear decision boundary that can well separate two classes, which verifies the effectiveness of LGR for handling the out-of-sample problem.
To show the merit of normalization, we utilize two-plate dataset in Figure 1(c) for evaluation. Our goal is to show LGR can handle multi-density dataset. Figure 6 shows the gray images of decision surfaces and boundaries learned by LGR without normalization and LGR with normalization. From Figure 6, we can observe that LGR without normalization cannot find proper boundary. However, LGR with normalization can achieve better performance, as there are less missing-classified data points separated by the decision boundary, which becomes more distinctive and accurate. The improved results are believed to be due to the fact that normalization can strengthen the local regressions in the low-density region and weaken those in the high-density region. This is proved to be advantageous to be used for multi-density dataset.
3.2. Semisupervised face recognition based on real-world benchmark datasets
For handling the face recognition problem, we use three real-world face datasets to evaluate the performance of methods, which include UMNIST: cannot find the full name , Extended Yale-B , and Massachusetts Institute of Technology Center for Biological and Computational Learning (MIT-CBCL)  datasets. The UMIST dataset is a multi-view face dataset, consisting of 1012 images of 20 peoples, each covering a wide range of poses from profile to frontal views. Therefore, the UMIST has widely been used for general purpose face recognition under different face poses. The size of each image is 112×92 with 256 gray levels per pixel. In our simulation, we down-sample the size of each image to 28×23 and no other preprocessing is performed. The Extended Yale-B dataset contains 16,123 images of 38 human subjects under 9 poses and 64 illumination conditions. Because of the illumination variability, the same object can appear dramatically different even when viewed in fixed pose. Hence, this is another challenge for face recognition, and Extended Yale-B dataset are extensively used for testing appearance-based face recognition methods. Similar to the UMIST dataset, the images are also cropped and resized to 32×32 pixels. This dataset now has around 64 near frontal images under different illuminations per individual. The MIT-CBCL dataset provides 3240 synthetic images rendered from 3D head models of 10 peoples. The head models are generated by fitting a morphable model to the high-resolution training images. Different from UMNIST dataset, the MIT-CBCL dataset is based on the 3D morphable model, which is rendered under varying pose and illumination conditions making the face recognition task more challengeable. The size of each image is originally 200×200 with 256 gray levels per pixel. In our simulation, we down-sample the size of each image to 32×32 and no other preprocessing is performed. The detailed information of dataset and some sampled images of real-world datasets can be shown in Table 3 and Figure 7. For each dataset, we randomly select 10, 50 and 30 samples from each class as training samples for UMNIST, Extended Yale-B, and MIT-CBCL datasets. The test set is then formed by the selected or all remaining samples. The data partitioning for each dataset is also given in Table 3.
Next, we compare our method with other supervised and semisupervised dimension reduction methods. These methods include Regularized Linear discriminant analysis (RLDA), SDA , Lap-RLS/L , least-square solution for solving SDA in Eq. (16) (in Table 1, we refer to it as LS-SDA) , FME [7, 10], and the proposed LGR. Note that Principal Component Analysis (PCA) is an unsupervised method while RLDA is supervised methods, and the remaining methods LGR are all semisupervised methods. The simulation settings are as follows: for SDA, Lap-RLS/L, two parameters, i.e., αt and αm, need to be determined for balancing the trade-off between the manifold and Tikhonov terms. We use fivefold cross validation to determine the best values and the candidate set is . The above candidate set is also used for determining the best value for the Tikhonov term parameter αt in RLDA and the addition regularized parameter αr in FME and LGR. In order to eliminate the null space before performing dimension reduction, the training sets in all datasets are preliminarily processed with PCA operator. Since most of methods, such as RLDA, SDA, Lap-RLS/L and FME, and the proposed LGR have a limited rank of c–1, we simply reduce the dimensionality of all methods to c–1. All methods used labeled set in the output reduced subspace to train a nearest neighborhood classifier in order to evaluate the classification accuracy of test set. We also compare the performance of nearest neighborhood classifier with other state-of-the-art methods as a baseline.
|Dataset||Database Type||#Samples||#Dim||#Class||#Training per Class||#Test per Class|
The average accuracies over 20 random splits with the above parameters for each dataset are shown in Table 4. From the simulation results, we can obtain the following observation: (1) given sufficient labeled samples, all the supervised and semisupervised dimension reduction methods outperform nearest neighborhood classifier due to the utilization of label information and feature extraction; (2) the semisupervised dimension reduction methods are better than the corresponding supervised methods. For example, SDA outperforms RLDA by about 5–6% in COIL100 dataset with two labeled samples per class. For other datasets, it can outperform by 2–3%. This indicates that by incorporating the unlabeled set into the training procedure, the classification performance can be markedly improved, as the manifold structure embedded in the dataset is preserved; (3) we also observe that both SDA and the least-square solution in Table 1 can achieve the same classification results due to the reason as analyzed in Section 3; (4) the proposed LGR can deliver better accuracies than those delivered by other semisupervised dimension reduction methods such as SDA and Lap-RLS/L by about 3–4% in most datasets. The improvement can even achieve almost 8% in ETH80 dataset with two labeled samples per class. The improvement is believed to be true that LGR aims to characterize both local and global discriminative information embedded in dataset, which is better to handle classification problem; (5) we observe that LGR outperform FME by about 2% in most cases. The main reason is that LGR has utilized a weighted normalized local discriminative Laplacian matrix to preserve both manifold and discriminative structures in dataset, which is better than only relying on neighborhood graph.
|Dataset||Method||4 labeled samples per class||7 labeled samples per class||10 labeled samples per class|
|Dataset||Method||4 labeled samples per class||7 labeled samples per class||10 labeled samples per class|
|Dataset||Method||4 labeled samples per class||7 labeled samples per class||10 labeled samples per class|
In this chapter, we propose a semisupervised method, namely LGR, for face recognition. With the above analysis, the following conclusions can be drawn: (1) the proposed LGR can achieve better results in face recognition than those delivered by other state-of-the-art methods as more discriminative information are captured based on local and global regressions, (2) the proposed LGR is robust to outliers and can handle the imbalanced data, and (3) the proposed LGR can deal with out-of-sample extrapolation to estimate the labels of new-coming face data by casting it to the global projection matrix.
In order to prove that Ld is graph Laplacian matrix, we need to prove Ld is positive semidefinite matrix and the sum of each row or column of Ld is equal to zero. We first have the following Lemmas:
Lemma 1. For each local patch Xj, Lj can be reformulated as follows:
Proof. First, it can be easily noted that , which is verified as follows:
Then, we have
The second equation holds as , for any matrix A. Thus, Lemma 1 is proved.
Lemma 2. Given a positive semidefinite matrix C, DCDT is a positive semidefinite matrix for any matrix D.
Lemma 3. Given a set of positive semidefinite matrixes then is a positive semidefinite matrix.
We neglect the proofs of Lemmas 2 and 3 as they can be seen in reference . Then with Lemmas 1–3, we can easily prove Theorem 2 as follows:
Proof of Theorem 2. Note that following Lemma 1, we reformulate each Lj as . It can be noted is a positive semidefinite matrix, then, following Lemmas 2 and 3, we have each is a positive semidefinite matrix and Ld, i.e.,
is also a positive semidefinite matrix. In addition, for each , we have and
which indicates that the sum of each row or column of Ld is equal to zero. We thus prove Ld is graph Laplacian matrix.