The insufficiency of labeled data is an important problem in image classification tasks such as face recognition, whereas unlabeled data are abundant in real-world applications. Therefore, semisupervised learning methods, which incorporate a few labeled data and a large number of unlabeled data into learning, have received more and more attention in the field of face recognition. In recent years, graph-based semisupervised learning has become a popular topic in this area. In this chapter, we present a new graph-based semisupervised learning method for face recognition. The presented method is based on local and global regression regularization. The local regression regularization adopts a set of local classification functions to preserve both local discriminative and geometrical information, as well as to reduce the bias of outliers and handle imbalanced data, while the global regression regularization preserves the global discriminative information and calculates the projection matrix for out-of-sample extrapolation. Extensive simulations based on synthetic and real-world datasets verify the effectiveness of the proposed method.
- Semi-supervised Learning
- Dimensionality Reduction
- Local and Global Regressions
- Face Recognition
- Transductive and Inductive Learning
1. Introduction
In the real world, an ever-increasing amount of visual face data is generated from Internet surfing and daily social communication. These data can be labeled or unlabeled, and can accordingly be utilized for image retrieval, summarization, and indexing. To handle these datasets for the above tasks, automatic annotation is an elementary step, which can be formulated as a pattern classification problem and accomplished by learning-based techniques. Traditionally, supervised learning methods, such as Linear Discriminant Analysis (LDA) and Support Vector Machine (SVM), can deliver satisfactory recognition accuracy provided that the number of labeled data is adequate. But labeling a huge amount of data is expensive and time consuming. On the other hand, unlabeled data are plentiful and can be easily obtained from real-world applications. Therefore, semisupervised learning methods, which utilize a few labeled data and a huge amount of unlabeled data, are becoming more popular than approaches that rely only on supervised learning [27–33].
Since two pioneering semisupervised methods, i.e., Gaussian Fields and Harmonic Functions (GFHF) and Learning with Local and Global Consistency (LLGC), were proposed in 2003 and 2004, respectively, graph-based semisupervised learning methods have received considerable research interest in the area of semisupervised learning. These methods usually represent both labeled and unlabeled sets by a graph, and then utilize the graph Laplacian matrix to characterize the manifold structure. Finally, different learning tasks such as image classification, clustering, and dimensionality reduction are performed on the graph Laplacian matrix. For example, GFHF and LLGC work in a transductive way by directly propagating the class label information from the labeled set to the unlabeled set along the graph, so that the labels of unlabeled data can be estimated. Other similar works include Random Walk and Special Label Propagation (SLP). However, transductive learning methods cannot predict the class labels of new-coming samples and hence suffer from the out-of-sample problem.
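To make the idea of transductive label propagation concrete, the following is a minimal sketch in the spirit of LLGC. It assumes the commonly used iteration F ← αSF + (1 − α)Y with the symmetrically normalized affinity S = D^{-1/2} W D^{-1/2}; the function name `propagate_labels`, the toy graph, and the values of `alpha` and `n_iter` are illustrative choices, not settings taken from this chapter.

```python
import math

def propagate_labels(W, Y, alpha=0.9, n_iter=200):
    """Iterate F <- alpha * S * F + (1 - alpha) * Y on a weighted graph.

    W: symmetric affinity matrix (list of lists, every node has an edge).
    Y: initial label matrix, one row per node; one-hot for labeled nodes
       and all zeros for unlabeled nodes.
    """
    n, c = len(W), len(Y[0])
    deg = [sum(row) for row in W]
    # Symmetrically normalized affinity S = D^{-1/2} W D^{-1/2}.
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
         for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(n_iter):
        F = [[alpha * sum(S[i][j] * F[j][k] for j in range(n))
              + (1 - alpha) * Y[i][k] for k in range(c)]
             for i in range(n)]
    return F

# Toy graph: two triangles {0, 1, 2} and {3, 4, 5} joined by one weak edge (2-3).
W = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 0.1, 0, 0],
     [0, 0, 0.1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]
# Only node 0 (class 0) and node 5 (class 1) are labeled.
Y = [[1, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 1]]

F = propagate_labels(W, Y)
predicted = [max(range(2), key=lambda k: row[k]) for row in F]
print(predicted)
```

In this toy graph each unlabeled node receives the label of the labeled node in its own triangle. Note that the sketch only assigns labels to nodes already in the graph; it provides no mapping for a new sample, which is exactly the out-of-sample limitation described above.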
To solve the out-of-sample problem, inductive learning methods have been proposed over the past decade. Typical methods for inductive learning are Manifold Regularization (MR) and Semisupervised Discriminant Analysis (SDA). MR learns a projection matrix by adding a graph Laplacian regularized term to the cost function of the original supervised methods. Therefore, both unlabeled and new-coming data can be cast into a low-dimensional subspace, and the out-of-sample problem is naturally solved [7, 9, 10, 16]. For example, MR has extended regularized least squares and SVM to their semisupervised counterparts, i.e., Laplacian regularized least squares (Lap-RLS) and Laplacian SVM, by adding a manifold regularized term. Similarly, Cai et al. have extended LDA to SDA for semisupervised dimensionality reduction.
It should be noted that the success of semisupervised learning depends on how the unlabeled data are utilized to characterize the distribution of labels in the data space. Several methods, including Locally Linear Reconstruction [11, 12, 20], Local Regression and Global Alignment [13, 14], and Local Spline Regression [18, 19], have been developed to discover the intrinsic manifold structure of data. However, in semisupervised classification, the data points lying far away from the data manifold act as noise when learning the classifier and can deteriorate the classification performance. On the other hand, sampling in real-world applications is usually not uniform. As a result, the sampled data may be imbalanced or follow a multi-density distribution. None of the aforementioned methods focuses on solving these two problems.
In this chapter, we develop an effective semisupervised dimensionality reduction method, i.e., Local and Global Regression (LGR), for face recognition with outliers and imbalanced face data. In order to handle both transductive and inductive learning problems, LGR aims to sufficiently learn the classification function by using all data. In detail, the presented method first extends the original supervised regression term to a supervised loss term and a global regression regularized term, where the loss term penalizes the inconsistency between the predicted labels and the initial labels, while the global regression term learns the classification function from all training data and yields the projection matrix for handling the out-of-sample problem. Furthermore, to capture the local discriminative information, a weighted local classification function is adopted for each data sample to estimate the labels of its nearby data, where the weight serves to reduce the bias of outliers and to deal with imbalanced data. Thus, both local and global discriminative information of the dataset can be preserved by the proposed LGR method.
The main contributions of this work are as follows: (1) we propose a new effective method for semisupervised dimensionality reduction, which can handle both transductive and inductive learning problems; (2) we develop a graph Laplacian matrix that can characterize both local geometrical and discriminative information, as well as reduce the bias of outliers and handle imbalanced data; (3) we establish the connection between the proposed method and other state-of-the-art methods. Theoretical analysis shows that popular semi-supervised methods such as LRGA can be viewed as special cases of the proposed method. Extensive simulations based on synthetic and real-world datasets verify the effectiveness of the proposed method.
This chapter is organized as follows. In Section 2, the notations and motivations are first given. We then propose our LGR method for handling both transductive and inductive learning problems, and establish the connection between the proposed method and other state-of-the-art methods. Section 3 presents extensive simulations, and the final conclusions are drawn in Section 4.
2. The proposed method
2.1. Notation and motivation
In semi-supervised learning, we define
be the data matrix where the first
First, most semi-supervised learning methods utilize a Gaussian function based affinity matrix. As pointed out in references [11, 12], the Gaussian function based affinity matrix is oversensitive to the Gaussian variance; even a slight change in the variance may affect the results dramatically. Thus, a Gaussian function based affinity matrix is not a good choice for image classification, and the method developed should be robust to its parameters.
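The sensitivity to the Gaussian variance can be illustrated with a small numerical sketch. It assumes the common parametrization exp(−‖x − y‖² / (2σ²)) for the affinity; the function name `gaussian_affinity`, the two sample points, and the three values of σ are hypothetical and chosen only for illustration.

```python
import math

def gaussian_affinity(x, y, sigma):
    """Gaussian (heat kernel) affinity between two points, assumed form."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

x, y = (0.0, 0.0), (3.0, 4.0)  # Euclidean distance 5
affinities = {sigma: gaussian_affinity(x, y, sigma) for sigma in (1.0, 2.0, 4.0)}
for sigma, value in affinities.items():
    print(f"sigma = {sigma}: affinity = {value:.3e}")
```

For the same pair of points, changing σ from 1 to 4 changes the affinity by several orders of magnitude, so the graph built from such affinities can look very different depending on a parameter that is hard to tune without labels.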
Second, when carrying out semisupervised classification, the samples lying far away from the data manifold are outliers, which may lead to an incorrect classifier and deteriorate the classification performance. Taking Figure 1(a and b) as examples, we generate two-cycle and two-moon datasets with outliers. Given the distribution of the two datasets, the ideal decision boundary should lie in the gap between the two data sub-manifolds. However, the many outliers around the data manifold blur the distribution of the whole data and act as noise when learning a classifier. Therefore, it is very important to develop a method that can adaptively reduce the effects of outliers.
Third, in real-world applications, sampling is usually not uniform. Consequently, the sampled data can be imbalanced or follow a multi-density distribution. Figure 1(c) shows a two-plate dataset with two classes: each class follows a Gaussian distribution, but with different cores and densities. The data points in the high-density area (left) will play a more important role than those in the low-density area (right) when learning a classifier, which may cause incorrect classification results. The method developed should handle such imbalanced data with a multi-density distribution.
Finally, the method developed should also solve the out-of-sample problem. To address the above problems, we propose in this chapter a new semisupervised learning method based on local and global regression.
2.2. Local and global regression
We start from supervised least-squares regression. Least-squares regression fits a linear model by regressing
According to Eq. (2), the classification function can be sufficiently learned by using all the predicted labels while keeping them consistent with the original labels. In other words, the global discriminative information can be preserved by the regression term of Eq. (2). Furthermore, to capture the local discriminative information, we introduce a local regression function for each data sample
However, minimizing the above total error over all data samples tends to force all local errors to be similar to each other. When the dataset includes outliers, treating all the local regression errors equally may emphasize the effects of outliers and weaken the effects of normal data. To weaken the effects of outliers, we add a weight vector for each local data patch
In the following section, we will discuss how to select the weight . Our motivation is to let the weight of local error be large given are the normal data and in the contrast to let the weight be small given is outlier. In detail, to obtain local projection matrix
where ; is the selected matrix satisfying , if
where is the diagonal matrix with the first
Then, we can obtain the optimal projection matrix and bias term by replacing
2.3. Weight selection for bias reduction
In this section, we consider how to select the weights in the proposed method suggested in Section 2.2. Note, our goal of using the weights is to weaken the effects of outliers and the weight should be set to a small value if is an outlier. Then we can make the weight inversely proportional to the distance between and a center , i.e., . Such a center is expected to represent the idea center of data in the neighborhoods of
|1. Initialize as the average center of all data points in the local patch of |
|2. Update for each as and form the weight matrix .|
|3. Update .|
|4. Iterate steps 2 and 3 until no changes. Output.|
Table 1 shows the basic steps of the iterative approach. Following Table 1, the weight at each iteration is updated from the last and the newly updated center is calculated from current . The whole iterations are continued until convergence, so that the weight can be adaptively and iteratively re-weighted to minimize . In addition, as can be seen in simulation of Figure 2, the updated will be adaptively re-weighted to be close to the main center of most data points, while the updated will be weaken if is outliers or be strengthened if is close to the ideal center. We next discuss a theorem to guarantee the convergence of the approach of Table 1.
Based on the lemma in reference  that holds for any two nonzero value, we have
Eq. (14) indicates that the objective function decreases monotonically in each iteration. Since the objective function is bounded from below, the iterative approach will certainly converge, which proves Theorem 1. Finally, by incorporating the weight for each local regression error into Eq. (4), we can reduce the bias caused by outliers among the data samples.
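The iterative approach of Table 1 and its monotone behavior can be sketched as follows. This is a sketch under stated assumptions rather than the exact procedure of the chapter: it assumes that the objective is the sum of Euclidean distances from the points of a local patch to the center, and that the weight takes the form w_j = 1 / (2‖x_j − c‖), which is one concrete choice consistent with a weight that is inversely proportional to the distance. The function name `robust_center`, the tolerance values, and the ten sample points (eight normal points and two outliers) are hypothetical.

```python
import math

def robust_center(points, n_iter=100, tol=1e-9, eps=1e-12):
    """Alternate between re-weighting the points and updating the center.

    Returns the final center, the last weights that were computed, and the
    value of the assumed objective (sum of distances) at every iteration.
    """
    n, dim = len(points), len(points[0])
    # Step 1: initialize the center as the plain average of the patch.
    center = [sum(p[k] for p in points) / n for k in range(dim)]
    weights, history = [], []
    for _ in range(n_iter):
        dists = [math.dist(p, center) for p in points]
        history.append(sum(dists))
        # Step 2: weights inversely proportional to the distance (assumed
        # form); eps guards against a point coinciding with the center.
        weights = [1.0 / (2.0 * max(d, eps)) for d in dists]
        # Step 3: the new center is the weighted average of the points.
        total = sum(weights)
        new_center = [sum(w * p[k] for w, p in zip(weights, points)) / total
                      for k in range(dim)]
        shift = math.dist(new_center, center)
        center = new_center
        # Step 4: stop when the center no longer moves.
        if shift < tol:
            break
    return center, weights, history

inliers = [(0.1, 0.0), (-0.1, 0.1), (0.0, -0.1), (0.2, 0.1),
           (-0.2, -0.1), (0.1, 0.2), (-0.1, -0.2), (0.0, 0.1)]
outliers = [(5.0, 5.0), (6.0, 4.0)]
points = inliers + outliers

center, weights, history = robust_center(points)
mean = [sum(p[k] for p in points) / len(points) for k in range(2)]
print("plain mean:     ", mean)
print("robust center:  ", center)
print("outlier weights:", weights[-2:])
```

Under these assumptions the recorded objective values never increase from one iteration to the next, the center moves from the plain mean toward the bulk of the normal points, and the two outliers end up with much smaller weights than the normal points.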
Here, in order to show the convergence of the approach, we show a simple example in Figure 2(a), where we generate eight normal data points and two outliers in
2.4. Normalizing graph Laplacian matrix
It can be easily proved that
Specifically, let us consider a data sample
Hence, based on this definition, we have the following theorem:
, we first define the affinity matrix
where each satisfies
Here, for each , we have , where is a column vector by putting each to its global index
The second equation holds as ; hence, the sum of all in each element is equal to 1. Then, following Eq. (18), it indicates is an identity matrix, i.e., . Then based on the above analysis, we can reformulate
which indicates that the sum of each column or row of
Here, it should be noted that if
|1. Determine the weight for each local patch based on Table 1.|
|2. Normalize the weight as in Eq. (14).|
|3. Form local regression regularized term
|4. Form global regression regularized term
|5. Solve the regression problem as in Eq. (8):
and calculate estimated label matrix as in Eq. (9). Output .
|6. Calculate the projection matrix
2.5. Discussion and related work
In this section, we discuss the relationship of the proposed LGR with other state-of-the-art methods including MR, Flexible Manifold Embedding (FME), and Local Regression and Global Alignment (LRGA).
2.5.1. Relationship to manifold regularization (Lap-RLS/L) 
The goal of MR is to develop a semisupervised learning strategy by extending original supervised methods, such as RLS and SVM, to their semisupervised versions, i.e., Laplacian RLS and Laplacian SVM. For example, Lap-RLS/L fits a linear model by regressing
However, it can be observed that Lap-RLS/L cannot sufficiently train the classification function, because its regression term uses only the labeled samples and the manifold term serves merely as a complement. Hence, the proposed LGR is superior to Lap-RLS/L.
2.5.2. Relationship to flexible manifold embedding (FME)
Nie et al. have proposed another unified framework, i.e., FME [7, 10], for semisupervised dimensionality reduction, in which they verify that LLGC, GFHF, and Lap-RLS/L are only special cases of the framework. The basic objective function of FME can be given as
It can be observed that Eq. (22) is almost the same as the objective function of LGR in Eq. (10), when we consider . However, LGR has utilized a weighted and normalized local discriminative Laplacian matrix to preserve manifold and discriminative structure in a dataset. This is a better way than only relying on neighborhood graph.
2.5.3. Relationship to local regression and global alignment (LRGA)
Recently, Yang et al. have proposed a semisupervised transductive learning method, namely LRGA [13, 14], for multimedia retrieval. It shares a similar concept with the proposed method. The basic objective function of LRGA can be given as
It can be noted that LRGA is a special case of LGR when . Therefore, LRGA is only a transductive learning method and cannot handle the out-of-sample problem, while LGR is both a transductive and an inductive learning method. Another advantage of LGR over LRGA is that LGR weights and normalizes each local regression term. Thus, as shown in the simulation results, LGR handles outliers and multi-density datasets remarkably well.
3. Simulation results
In this section, we evaluate the proposed LGR on three synthetic datasets and three real-world datasets.
3.1. Synthetic datasets
In this section, we evaluate the performance of the proposed LGR and SLP for transductive learning. SLP is an extension of GFHF, LLGC, and Random Walk (RW); hence, it is representative. Here, we utilize the two-moon and two-cycle datasets in Figure 1(a and b) for evaluation. Figure 4 shows the results of LGR and SLP for transductive learning. From Figure 4, we can see that LGR achieves better results than SLP, in that fewer data are misclassified by LGR than by SLP. This indicates that the proposed LGR is robust to outliers.
We also evaluate the inductive performance of the proposed LGR for handling the out-of-sample problem. Figure 5 shows the gray images of decision surfaces and boundaries learned by LGR, which are formed as follows: for each pixel, we set its gray value to the difference between its distances to the nearest labeled data of the different classes in the reduced subspace. Here, we set the reduced dimensionality to 1. Then, we form the decision boundaries from the pixels with the value 0. From Figure 5, we can observe that the proposed LGR learns a clear decision boundary that separates the two classes well, which verifies the effectiveness of LGR for handling the out-of-sample problem.
To show the merit of normalization, we utilize the two-plate dataset in Figure 1(c) for evaluation. Our goal is to show that LGR can handle a multi-density dataset. Figure 6 shows the gray images of decision surfaces and boundaries learned by LGR without normalization and LGR with normalization. From Figure 6, we can observe that LGR without normalization cannot find a proper boundary. In contrast, LGR with normalization achieves better performance: fewer data points are misclassified by the decision boundary, which becomes more distinctive and accurate. The improvement is believed to arise because normalization strengthens the local regressions in the low-density region and weakens those in the high-density region, which is advantageous for multi-density datasets.
3.2. Semisupervised face recognition based on real-world benchmark datasets
For the face recognition problem, we use three real-world face datasets to evaluate the performance of the methods: the University of Manchester Institute of Science and Technology (UMIST), Extended Yale-B, and Massachusetts Institute of Technology Center for Biological and Computational Learning (MIT-CBCL) datasets. The UMIST dataset is a multi-view face dataset, consisting of 1012 images of 20 people, each covering a wide range of poses from profile to frontal views. Therefore, UMIST has been widely used for general purpose face recognition under different face poses. The size of each image is 112×92 with 256 gray levels per pixel. In our simulation, we down-sample each image to 28×23 and no other preprocessing is performed. The Extended Yale-B dataset contains 16,123 images of 38 human subjects under 9 poses and 64 illumination conditions. Because of the illumination variability, the same object can appear dramatically different even when viewed in a fixed pose. Hence, this poses another challenge for face recognition, and the Extended Yale-B dataset is extensively used for testing appearance-based face recognition methods. Similar to the UMIST dataset, the images are also cropped and resized to 32×32 pixels. This dataset now has around 64 near frontal images under different illuminations per individual. The MIT-CBCL dataset provides 3240 synthetic images rendered from 3D head models of 10 people. The head models are generated by fitting a morphable model to the high-resolution training images. Different from the UMIST dataset, the MIT-CBCL dataset is based on the 3D morphable model, which is rendered under varying pose and illumination conditions, making the face recognition task more challenging. The size of each image is originally 200×200 with 256 gray levels per pixel. In our simulation, we down-sample each image to 32×32 and no other preprocessing is performed.
The detailed information of the datasets and some sample images are shown in Table 3 and Figure 7. We randomly select 10, 50, and 30 samples from each class as training samples for the UMIST, Extended Yale-B, and MIT-CBCL datasets, respectively. The test set is then formed by the selected or all remaining samples. The data partitioning for each dataset is also given in Table 3.
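The per-class random partitioning described above can be sketched as follows. The helper name `split_per_class`, the use of a fixed `seed`, and the toy label vector are assumptions made for illustration; the sketch places all remaining samples of each class in the test set, which is only one of the two options mentioned in the text.

```python
import random
from collections import defaultdict

def split_per_class(labels, n_train_per_class, seed=0):
    """Randomly pick a fixed number of training samples from every class.

    Returns two sorted lists of sample indices: training and test.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label][:]
        rng.shuffle(indices)
        train.extend(indices[:n_train_per_class])
        test.extend(indices[n_train_per_class:])  # all remaining samples
    return sorted(train), sorted(test)

# Toy example: 3 classes with 20 samples each, 10 training samples per class.
labels = [c for c in range(3) for _ in range(20)]
train_idx, test_idx = split_per_class(labels, n_train_per_class=10, seed=0)
print(len(train_idx), len(test_idx))
```

Repeating the call with different values of `seed` gives independent random splits, over which accuracies can then be averaged.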
Next, we compare our method with other supervised and semisupervised dimension reduction methods. These methods include Regularized Linear Discriminant Analysis (RLDA), SDA, Lap-RLS/L, the least-square solution for solving SDA in Eq. (16) (in Table 1, we refer to it as LS-SDA), FME [7, 10], and the proposed LGR. Note that Principal Component Analysis (PCA) is an unsupervised method and RLDA is a supervised method, while the remaining methods, including LGR, are all semisupervised. The simulation settings are as follows: for SDA, Lap-RLS/L, two parameters, i.e.,
|Dataset||Database Type||#Samples||#Dim||#Class||#Training per Class||#Test per Class|
The average accuracies over 20 random splits with the above parameters for each dataset are shown in Table 4. From the simulation results, we can make the following observations: (1) given sufficient labeled samples, all the supervised and semisupervised dimension reduction methods outperform the nearest neighbor classifier owing to the utilization of label information and feature extraction; (2) the semisupervised dimension reduction methods are better than the corresponding supervised methods. For example, SDA outperforms RLDA by about 5–6% in COIL100 dataset with two labeled samples per class, and by 2–3% for other datasets. This indicates that incorporating the unlabeled set into the training procedure can markedly improve the classification performance, as the manifold structure embedded in the dataset is preserved; (3) SDA and the least-square solution in Table 1 achieve the same classification results, for the reason analyzed in Section 3; (4) the proposed LGR delivers better accuracies than other semisupervised dimension reduction methods such as SDA and Lap-RLS/L, by about 3–4% in most datasets. The improvement reaches almost 8% in ETH80 dataset with two labeled samples per class. We believe the improvement arises because LGR characterizes both the local and global discriminative information embedded in the dataset, which is better suited to the classification problem; (5) LGR outperforms FME by about 2% in most cases. The main reason is that LGR utilizes a weighted and normalized local discriminative Laplacian matrix to preserve both manifold and discriminative structures in the dataset, which is better than relying only on a neighborhood graph.
|Dataset||Method||4 labeled samples per class||7 labeled samples per class||10 labeled samples per class|
4. Conclusions
In this chapter, we propose a semisupervised method, namely LGR, for face recognition. From the above analysis, the following conclusions can be drawn: (1) the proposed LGR achieves better results in face recognition than other state-of-the-art methods, as more discriminative information is captured by the local and global regressions; (2) the proposed LGR is robust to outliers and can handle imbalanced data; and (3) the proposed LGR can deal with out-of-sample extrapolation, estimating the labels of new-coming face data by projecting them with the global projection matrix.
In order to prove that
Proof. First, it can be easily noted that , which is verified as follows:
Then, we have
The second equation holds as , for any matrix
We neglect the proofs of Lemmas 2 and 3 as they can be seen in reference . Then with Lemmas 1–3, we can easily prove Theorem 2 as follows:
is also a positive semidefinite matrix. In addition, for each , we have and
which indicates that the sum of each row or column of
[1] M. Belkin, P. Niyogi, V. Sindhwani. Manifold regularization: a geometric framework for learning from labeled and unlabeled samples. Journal of Machine Learning Research, 7:2399–2434, 2006.
[2] D. Cai, X. He, J. Han. Semi-supervised discriminant analysis. IEEE International Conference on Computer Vision, Rio de Janeiro, Brazil, IEEE, 1–7, 2007.
[3] X. Zhu, Z. Ghahramani, J. D. Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. In Proceedings of ICML, Washington DC, USA, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2003.
[4] D. Zhou, O. Bousquet, T. N. Lal, J. Weston, B. Schölkopf. Learning with local and global consistency. In Proceedings of NIPS, Vancouver, Canada, Massachusetts Institute of Technology Press, Cambridge, MA, USA, 2004.
[5] M. Szummer, T. Jaakkola. Partially labeled classification with Markov random walks. In Proceedings of NIPS, Vancouver, Canada, Massachusetts Institute of Technology Press, Cambridge, MA, USA, 2002.
[6] F. Nie, H. Huang, X. Cai, C. Ding. Efficient and robust feature selection via joint L21-norms minimization. In Proceedings of NIPS, Vancouver, Canada, Massachusetts Institute of Technology Press, Cambridge, MA, USA, 2010.
[7] F. Nie, D. Xu, I. W. H. Tsang, C. Zhang. Flexible manifold embedding: a framework for semi-supervised and unsupervised dimension reduction. IEEE Transactions on Image Processing, 19(7):1921–1932, 2010.
[8] F. Nie, S. Xiang, Y. Liu, C. Zhang. A general graph based semi-supervised learning with novel class discovery. Neural Computing and Applications, 19(4):549–555, 2010.
[9] F. Nie, D. Xu, X. Li, S. Xiang. Semi-supervised dimensionality reduction and classification through virtual label regression. IEEE Transactions on Systems, Man and Cybernetics, Part B, 41(3):675–685, 2011.
[10] F. Nie, D. Xu, I. W. H. Tsang, C. Zhang. A flexible and effective linearization method for subspace learning. In Graph Embedding for Pattern Analysis, Yun Fu, Yunqian Ma, Eds., 177–203, Springer, New York, 2013.
[11] F. Wang, C. Zhang. Label propagation through linear neighborhoods. IEEE Transactions on Knowledge and Data Engineering, 20(1):55–67, 2008.
[12] J. Wang, F. Wang, C. Zhang, H. C. Shen, L. Quan. Linear neighborhood propagation and its applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(9):1600–1615, 2009.
[13] Y. Yang, F. Nie, D. Xu, J. Luo, Y. Zhuang, Y. Pan. A multimedia retrieval framework based on semi-supervised ranking and relevance feedback. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(5):723–742, 2012.
[14] Y. Yang, D. Xu, F. Nie, J. Luo, Y. Zhuang. Ranking with local regression and global alignment for cross media retrieval. In Proceedings of MM, Beijing, China, ACM New York, NY, USA, 2009.
[15] Y. Yang, F. Nie, S. Xiang, Y. Zhuang, W. Wang. Image clustering using local discriminant models and global integration. IEEE Transactions on Image Processing, 19(10):2761–2773, 2010.
[16] D. Wang, F. Nie, H. Huang. Large-scale adaptive semi-supervised learning via unified inductive and transductive model. In Proceedings of KDD, New York, NY, USA, ACM New York, NY, USA, 2014.
[17] X. He, S. Yan, Y. Hu, P. Niyogi, H. Zhang. Face recognition using Laplacian faces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(3):328–340, 2005.
[18] S. Xiang, F. Nie, C. Zhang. Semi-supervised classification via local spline regression. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(11):2039–2053, 2010.
[19] S. Xiang, F. Nie, C. Zhang. Nonlinear dimensionality reduction with local spline embedding. IEEE Transactions on Knowledge and Data Engineering, 21(9):1285–1298, 2009.
[20] S. T. Roweis, L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323–2326, 2000.
[21] S. A. Nene, S. K. Nayar, H. Murase. Columbia object image library (COIL-100). Technical Report CUCS-005-96, Columbia University, New York, NY, 1996.
[22] B. Leibe, B. Schiele. Analyzing appearance and contour based methods for object categorization. In Proceedings of CVPR, Madison, Wisconsin, USA, IEEE, 2003.
[23] R. Johnson, T. Zhang. On the effectiveness of Laplacian normalization for graph semi-supervised learning. Journal of Machine Learning Research, 8:1489–1517, 2007.
[24] D. B. Graham, N. M. Allinson. Characterizing virtual eigensignatures for general purpose face recognition. In Face Recognition: From Theory to Applications, NATO ASI Series F, Computer and Systems Sciences, 163:446–456, 1998.
[25] K. C. Lee, J. Ho, D. Kriegman. Acquiring linear subspaces for face recognition under variable lighting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(5):947–963, 2005.
[26] B. Weyrauch, J. Huang, B. Heisele, V. Blanz. Component-based face recognition with 3D morphable models. First IEEE Workshop on Face Processing in Video, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Washington DC, USA, IEEE, 2004.
[27] M. Zhao, Z. Zhang, T. W. S. Chow. Trace ratio criterion based generalized discriminative information for semi-supervised dimensionality reduction. Pattern Recognition, 45(4):1482–1499, 2012.
[28] M. Zhao, Z. Zhang, H. Zhang. Learning from local and global discriminative information for semi-supervised dimensionality reduction. The International Joint Conference on Neural Networks (IJCNN), 1–8, Dallas, TX, USA, IEEE, 2013.
[29] M. Zhao, Z. Zhang, T. W. S. Chow, B. Li. Soft label based linear discriminant analysis for image recognition and retrieval. Computer Vision and Image Understanding, 121:86–99, 2014.
[30] M. Zhao, Z. Zhang, T. W. S. Chow, B. Li. A general soft label based linear discriminant analysis for semi-supervised dimension reduction. Neural Networks, 55:83–97, 2014.
[31] M. Zhao, T. W. S. Chow, Z. Zhang, B. Li. Automatic image annotation via compact graph based semi-supervised learning. Knowledge Based Systems, 76:148–165, 2015.
[32] M. Zhao, C. Zhan, Z. Wu, P. Tang. Semi-supervised image classification based on local and global regression. IEEE Signal Processing Letters, 22(10):1666–1670, 2015.
[33] M. Zhao, T. W. S. Chow, Z. Wu, Z. Zhang, B. Li. Learning from normalized local and global discriminative information for semi-supervised regression and dimensionality reduction. Information Sciences, 324(10):286–309, 2015.