Recent Advances of Manifold Regularization

Semi-supervised learning (SSL), which uses a small number of labeled samples together with a large number of unlabeled samples to improve learning performance, has received considerable attention. Manifold regularization is one of the most popular approaches: it exploits the geometry of the probability distribution that generates the data and incorporates it as a regularization term. Representative manifold regularization methods include Laplacian regularization (LapR), Hessian regularization (HesR) and p-Laplacian regularization (pLapR). Based on the manifold regularization framework, many extensions and applications have been reported. In this chapter, we review LapR and HesR, and we introduce an approximation algorithm for the graph p-Laplacian. We also study several extensions of this framework, including pairwise constraints, p-Laplacian learning and hypergraph learning.


Introduction
In practical applications, labeled samples are generally laborious to obtain, whereas vast amounts of unlabeled samples are easily acquired and provide auxiliary information. Semi-supervised learning (SSL), which takes full advantage of unlabeled data, is specifically designed to improve learning performance. Representative semi-supervised learning algorithms usually assume that the data distribution is supported on a low-dimensional manifold.
The popular manifold learning methods include principal component analysis (PCA), multidimensional scaling (MDS) [1,2], generative topographic mapping (GTM) [3], locally linear embedding (LLE) [4], ISOMAP [5], Laplacian eigenmaps (LE) [6], Hessian eigenmaps (HLLE) [7], and local tangent space alignment (LTSA) [8]. PCA finds the low-dimensional linear subspace that captures the maximum proportion of the variation within the data. MDS places each object in an N-dimensional space such that the between-object distances are preserved as well as possible. GTM can be seen as a nonlinear form of principal component analysis or factor analysis. LLE assumes that a given sample can be reconstructed from its neighbors, uses the reconstruction weights to represent the local geometry, and then seeks a low-dimensional embedding. ISOMAP incorporates the geodesic distances imposed by a weighted graph. LE preserves the neighbor relations of pairwise samples by operating on an undirected weighted graph. HLLE obtains the final low-dimensional representation by applying eigenanalysis to a matrix built from Hessian estimates over each neighborhood. LTSA exploits the local tangent information as a representation of the local geometry and then aligns this information to provide a global coordinate system.
Regularization is a key idea in the theory of splines [9] and is widely used in machine learning [10] (e.g., support vector machines). In 2006, Belkin et al. [11] proposed the manifold regularization framework by introducing a new regularization term that exploits the geometry of the probability distribution. Based on this framework, many successful manifold regularized semi-supervised learning (MRSSL) algorithms have been reported.
Laplacian regularization (LapR) [11,12] is one prominent manifold regularization-based SSL algorithm, which approximates the manifold by using the graph Laplacian. Because it is simple to compute and performs well, LapR-based SSL algorithms have been widely used in many applications. Liu et al. [13] introduced Laplacian regularization for local structure preserving and proposed manifold regularized kernel logistic regression (KLR) for web image annotation. Luo et al. [14] employed manifold regularization to smooth the functions along the data manifold for multitask learning. Ma et al. [15] proposed a local structure preserving method that effectively integrates Laplacian regularization and pairwise constraints for human action recognition. Hu et al. [16] introduced graph Laplacian regularization for joint denoising and super-resolution of generalized piecewise smooth images.
Hessian regularization (HesR) [17] has attracted considerable attention and has been shown empirically to perform well in practical problems [18][19][20][21][22][23][24][25][26]. Liu et al. [27] incorporated both Hessian regularization and sparsity constraints into auto-encoders and proposed a new auto-encoder algorithm called Hessian regularized sparse auto-encoders (HSAE). Liu et al. [28] proposed multi-view Hessian regularized logistic regression for action recognition. Whereas the null space of the graph Laplacian along the underlying manifold contains only constant functions, HesR steers the learned function to vary linearly with the geodesic distance. As a result, HesR can describe the underlying manifold of the data more accurately and achieves better learning performance than LapR-based methods [18]. However, the stability of the Hessian estimation depends mostly on the quality of the local fit at each data point, which leads to inaccurate estimates, particularly when the function oscillates heavily [17].
As a nonlinear generalization of the standard graph Laplacian, the discrete p-Laplacian has been well studied in the mathematics community, and its properties have been investigated in previous work [29,30]. Meanwhile, the graph p-Laplacian has been shown to have advantages in exploiting the manifold of the data distribution. Bühler et al. [31] provided a rigorous proof that the second eigenvector of the p-Laplacian approximates the Cheeger cut, which indicates the superiority of the graph p-Laplacian in exploiting local geometry. Luo et al. [32] proposed a full eigenvector analysis of the p-Laplacian and obtained a natural global embedding for multi-class clustering problems, instead of using the greedy search strategy of previous researchers. Liu et al. [33] proposed p-Laplacian regularized sparse coding for human activity recognition.
In this chapter, we first present some related work, and then introduce several extensions based on the manifold regularization framework. Specifically, we present the approximation of graph p-Laplacian and the p-Laplacian regularization framework.
Notations: We present some notations that will be used throughout this chapter. We use $L''$ to denote the novel graph Laplacian constructed from the traditional graph Laplacian $L$ and the side information. $L_p$, $L^{hp}_p$ and $L$ represent the graph p-Laplacian, the hypergraph p-Laplacian and the ensemble graph p-Laplacian, respectively.

Related works
This section reviews some related works on manifold regularization, pairwise constraints and hypergraph learning.

Manifold regularization framework
In semi-supervised learning, assume that $N$ training samples $X$ containing $l$ labeled samples $\{(x_i, y_i)\}_{i=1}^{l}$ and $u$ unlabeled samples $\{x_j\}_{j=l+1}^{l+u}$ are available. The labeled samples are pairs generated from a probability distribution, while the unlabeled samples are simply drawn according to its marginal distribution. To utilize the marginal distribution induced by the unlabeled samples, we assume that if two points $x_1$, $x_2$ are close in the intrinsic geometry of the marginal distribution, then the labels of $x_1$ and $x_2$ are similar.
The manifold regularized method introduces an appropriate penalty term $\|f\|_I^2$ and the reproducing kernel Hilbert space (RKHS) norm $\|f\|_K^2$, which control the complexity of the function with respect to the intrinsic geometric structure and the complexity of the classification model, respectively. By incorporating the two regularization terms, the standard framework aims to minimize the following function:

$$f^* = \arg\min_{f \in \mathcal{H}_K} \frac{1}{l}\sum_{i=1}^{l} V(x_i, y_i, f) + \gamma_A \|f\|_K^2 + \gamma_I \|f\|_I^2, \qquad (1)$$

where $V$ is some loss function, such as the hinge loss $\max[0, 1 - y_i f(x_i)]$ for support vector machines (SVM). The parameters $\gamma_A$ and $\gamma_I$ balance the loss function and the two regularization terms. For semi-supervised learning, the manifold regularization term $\|f\|_I^2$ is the key to smoothing the function along the manifold estimated from the unlabeled samples.

Pairwise constraints
Pairwise constraints (side information) [34,35] are a type of supervised information that specifies whether a pair of data samples belongs to the same class (must-link constraints) or to different classes (cannot-link constraints). Compared with class labels, pairwise constraints provide weaker and more general supervised information. They have been widely used in semi-supervised clustering [36,37], distance metric learning [38], feature selection [39] and dimension reduction [40,41].
Let $M = \{(x_i, x_j)\}$ be the pairwise must-link constraint set and $C = \{(x_i, x_j)\}$ be the pairwise cannot-link constraint set, that is,

$$M = \{(x_i, x_j) \mid x_i \text{ and } x_j \text{ belong to the same class}\}, \qquad C = \{(x_i, x_j) \mid x_i \text{ and } x_j \text{ belong to different classes}\}.$$

Defined on the must-link constraint set and the cannot-link constraint set, we construct the similarity matrices $S^M$ and $S^C$, respectively:

$$S^M_{ij} = \begin{cases} 1, & (x_i, x_j) \in M \text{ or } (x_j, x_i) \in M \\ 0, & \text{otherwise} \end{cases} \qquad S^C_{ij} = \begin{cases} 1, & (x_i, x_j) \in C \text{ or } (x_j, x_i) \in C \\ 0, & \text{otherwise.} \end{cases}$$

Then, the must-link Laplacian matrix is given by $L^M = D^M - S^M$, and the cannot-link Laplacian matrix is given by $L^C = D^C - S^C$, where $D^M$ and $D^C$ are two diagonal matrices with $D^M_{ii} = \sum_{j=1}^{n} S^M_{ij}$ and $D^C_{ii} = \sum_{j=1}^{n} S^C_{ij}$, respectively.
Ding et al. [42] introduced pairwise constraints into the spectral clustering algorithm. In particular, they revised the distances between sample points through the distance matrix $D$. Kalakech et al. [43] developed a semi-supervised constraint score by using both pairwise constraints and local properties of the unlabeled data.
Luo et al. [44] denoted the training set with side information by triplets $(x_i, x_j, y_{ij})$, where $y_{ij} = \pm 1$ indicates that $x_i$ and $x_j$ are similar or dissimilar. The side information was utilized through the loss term $y_{ij}\left[1 - \|x_i - x_j\|^2_{A_m}\right]$, where $A_m$ is the metric in the $m$-th heterogeneous domain.
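The construction of the must-link and cannot-link Laplacians can be sketched in a few lines of NumPy. This is a minimal illustration, assuming that the similarity matrices are 0/1 indicators of the constraint pairs; the function name `constraint_laplacian` and the toy index pairs are hypothetical and only serve the example.

```python
import numpy as np

def constraint_laplacian(n, pairs):
    """Return (D - S, S) for an indicator similarity S built from index pairs.

    Assumption: S[i, j] = 1 when (i, j) or (j, i) is a constraint, else 0.
    """
    S = np.zeros((n, n))
    for i, j in pairs:
        S[i, j] = 1.0
        S[j, i] = 1.0            # constraints are treated as symmetric
    D = np.diag(S.sum(axis=1))   # D_ii = sum_j S_ij
    return D - S, S

# Toy example with 4 samples (indices are illustrative only).
must_link = [(0, 1), (1, 2)]
cannot_link = [(0, 3)]
L_M, S_M = constraint_laplacian(4, must_link)
L_C, S_C = constraint_laplacian(4, cannot_link)
```

Both matrices are ordinary graph Laplacians of the constraint graphs, so they are symmetric, positive semi-definite, and have zero row sums.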

Hypergraph learning
A hypergraph [45] is a generalization of a simple graph. Compared with simple graphs, a hypergraph captures complex relationships through hyperedges that can connect three or more vertices (see Figure 1). Thus, a hypergraph contains more local structure information than a simple graph. Hypergraphs have been widely used in image classification [46], ranking [47] and video segmentation [48].
We denote $D_v$ as the diagonal matrix of vertex degrees, $D_e$ as the diagonal matrix of hyperedge degrees and $W$ as the diagonal matrix of hyperedge weights. The hypergraph Laplacian can then be defined from these matrices.
A number of different methods have been used in the literature to build the graph Laplacian of hypergraphs. The first category includes star expansion [49], clique expansion [49], Rodriquez's Laplacian [50], etc. These methods construct a simple graph from the original hypergraph and then partition the vertices by spectral clustering techniques. The second category defines a hypergraph Laplacian by analogy with the simple graph Laplacian. Representative methods in this category include Bolla's Laplacian [51], Zhou's normalized Laplacian [52], etc. According to [52], the normalized hypergraph Laplacian $L^{hp}$ is defined as

$$L^{hp} = I - D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2}, \qquad (5)$$

where $H$ is the vertex-hyperedge incidence matrix. It is worth noting that $L^{hp}$ is positive semi-definite. The adjacency matrix of the hypergraph $W^{hp}$ can be formulated as follows:

$$W^{hp} = H W H^T - D_v. \qquad (6)$$

For a simple graph, the edge degree matrix $D_e$ is replaced by $2I$. Thus, the standard graph Laplacian is

$$L = I - \frac{1}{2} D_v^{-1/2} H W H^T D_v^{-1/2} = \frac{1}{2}\left(I - D_v^{-1/2} W^{hp} D_v^{-1/2}\right).$$
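As an illustration of the normalized hypergraph Laplacian of [52], the sketch below builds it from an incidence matrix. It assumes the usual form $I - D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2}$ with $H$ the vertex-hyperedge incidence matrix, and the adjacency $H W H^T - D_v$; the function name and the toy hypergraph are hypothetical, and every vertex is assumed to belong to at least one hyperedge.

```python
import numpy as np

def hypergraph_laplacian(H, w):
    """Normalized hypergraph Laplacian and adjacency from incidence matrix H.

    H : (n_vertices, n_hyperedges) 0/1 incidence matrix (assumed form).
    w : (n_hyperedges,) positive hyperedge weights.
    """
    H = np.asarray(H, dtype=float)
    w = np.asarray(w, dtype=float)
    d_v = H @ w                    # vertex degree: total weight of incident hyperedges
    d_e = H.sum(axis=0)            # hyperedge degree: number of vertices it contains
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(d_v))
    theta = Dv_inv_sqrt @ H @ np.diag(w / d_e) @ H.T @ Dv_inv_sqrt
    L_hp = np.eye(H.shape[0]) - theta
    W_hp = H @ np.diag(w) @ H.T - np.diag(d_v)
    return L_hp, W_hp

# Toy hypergraph: hyperedge 0 = {v0, v1, v2}, hyperedge 1 = {v2, v3}.
H = np.array([[1, 0],
              [1, 0],
              [1, 1],
              [0, 1]])
w = np.array([1.0, 1.0])
L_hp, W_hp = hypergraph_laplacian(H, w)
d_v = H @ w
```

In this form the vector $D_v^{1/2}\mathbf{1}$ lies in the null space of $L^{hp}$, which gives a quick sanity check of an implementation.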

LapR-based SSL
Laplacian regularization is one of the most prominent manifold regularization methods; it uses the graph Laplacian matrix to characterize the manifold structure. In this section, we introduce the traditional Laplacian support vector machines (LapSVM) and Laplacian kernel least squares (LapKLS) as examples of Laplacian regularization algorithms. Then, we extend these algorithms by building the novel graph Laplacian $L''$, which combines the traditional graph Laplacian $L$ with the side information to boost locality preservation.

LapSVM and LapKLS
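As previously mentioned, the manifold regularization framework is given by Eq. (1). The traditional LapSVM solves this optimization problem with the hinge loss function, estimating the manifold term $\|f\|_I^2$ with the graph Laplacian $L$:

$$f^* = \arg\min_{f \in \mathcal{H}_K} \frac{1}{l}\sum_{i=1}^{l} \max[0, 1 - y_i f(x_i)] + \gamma_A \|f\|_K^2 + \frac{\gamma_I}{(l+u)^2}\,\mathbf{f}^T L\, \mathbf{f},$$

where $\mathbf{f} = [f(x_1), \dots, f(x_{l+u})]^T$. According to the representer theorem, the solution of the above problem can be expressed as

$$f^*(x) = \sum_{i=1}^{l+u} \alpha_i K(x_i, x),$$

where $K$ is the kernel function; below, $K$ also denotes the Gram matrix with entries $K_{ij} = K(x_i, x_j)$. Therefore, we rewrite the objective function as

$$\alpha^* = \arg\min_{\alpha} \frac{1}{l}\sum_{i=1}^{l} \max\Big[0, 1 - y_i \sum_{j=1}^{l+u} \alpha_j K(x_i, x_j)\Big] + \gamma_A\, \alpha^T K \alpha + \frac{\gamma_I}{(l+u)^2}\, \alpha^T K L K \alpha. \qquad (10)$$

By employing the least squares loss in Eq. (10), we obtain the kernel least squares model in Eq. (11):

$$\alpha^* = \arg\min_{\alpha} \frac{1}{l}\sum_{i=1}^{l} \Big(y_i - \sum_{j=1}^{l+u} \alpha_j K(x_i, x_j)\Big)^2 + \gamma_A\, \alpha^T K \alpha + \frac{\gamma_I}{(l+u)^2}\, \alpha^T K L K \alpha. \qquad (11)$$

Setting the derivative of the least squares objective with respect to $\alpha$ to zero yields the solution $\alpha$.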
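For the least squares loss, setting the gradient to zero leads to a linear system in $\alpha$. The sketch below follows the Laplacian regularized least squares solution of the manifold regularization framework [11], $(JK + \gamma_A l I + \frac{\gamma_I l}{(l+u)^2} L K)\alpha = Y$, where $J$ is a diagonal matrix that selects the labeled samples and $Y$ holds the labels padded with zeros. The Gaussian kernel, the kNN graph with heat-kernel weights, the $1/(l+u)^2$ scaling and all function names and parameter values are assumptions made for illustration, not the chapter's exact experimental setup.

```python
import numpy as np

def rbf_kernel(X, Z, sigma=1.0):
    """Gaussian kernel matrix between the rows of X and Z."""
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * sigma ** 2))

def knn_graph_laplacian(X, k=5, sigma=1.0):
    """Unnormalized Laplacian D - W of a symmetrized kNN graph."""
    n = X.shape[0]
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(sq[i])
        nbrs = order[order != i][:k]          # k nearest neighbours, excluding x_i
        W[i, nbrs] = np.exp(-sq[i, nbrs] / (2.0 * sigma ** 2))
    W = np.maximum(W, W.T)                    # make the graph undirected
    return np.diag(W.sum(axis=1)) - W

def lap_kls_fit(X, y_labeled, gamma_A=1e-2, gamma_I=1e-1, k=5, sigma=1.0):
    """Solve the LapKLS linear system; the labeled samples must be the first rows of X."""
    n, l = X.shape[0], len(y_labeled)
    K = rbf_kernel(X, X, sigma)
    L = knn_graph_laplacian(X, k, sigma)
    J = np.zeros((n, n))
    J[:l, :l] = np.eye(l)                     # selects the labeled samples
    Y = np.zeros(n)
    Y[:l] = y_labeled                         # labels padded with zeros
    A = J @ K + gamma_A * l * np.eye(n) + (gamma_I * l / n ** 2) * (L @ K)
    return np.linalg.solve(A, Y), K

# Two well-separated synthetic clusters, two labeled samples per class.
rng = np.random.default_rng(0)
pos = rng.normal([3.0, 0.0], 0.5, size=(20, 2))
neg = rng.normal([-3.0, 0.0], 0.5, size=(20, 2))
X = np.vstack([pos[:2], neg[:2], pos[2:], neg[2:]])
y_labeled = np.array([1.0, 1.0, -1.0, -1.0])
alpha, K = lap_kls_fit(X, y_labeled)
scores = K @ alpha                            # f(x_j) on all training points
```

Replacing the graph Laplacian in `lap_kls_fit` with another regularization matrix (for example a combined Laplacian or an approximated p-Laplacian) gives the corresponding kernel least squares variant without changing the solver.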

Pairwise constraints-combined manifold regularization
Assuming that samples with similar features tend to have similar class labels, combining Laplacian regularization and pairwise constraints is a good way to exploit the local structure and boost the classification results. Therefore, we introduce pairwise constraints into the traditional LapR. In particular, we introduce three combination strategies based on experience. Finally, we present the locality preserved support vector machines and kernel least squares, respectively.
According to the definition, we can compute the must-link Laplacian matrix $L^M$ and the cannot-link Laplacian matrix $L^C$. The first two forms of the combination are defined on the traditional graph Laplacian $L$ and the must-link constraints and can be written as

$$L'' = L\,(L^M + \alpha I)$$

and

$$L'' = L + \alpha L^M,$$

respectively, where $\alpha$ is the parameter that balances the weight between the two types of Laplacian matrices.
Based on the cannot-link constraints $C$, we compute a similarity matrix $S$. The third form of the combination is defined on the traditional graph Laplacian and the pairwise cannot-link constraints. Other combination strategies that use both the must-link and the cannot-link constraints can also improve on traditional methods. In our experience, however, they perform no better than the strategies that use only one type of constraint. Therefore, we only put these three proposed graph Laplacians into practice.
Introducing the novel graph Laplacian $L''$ into SVM, we rewrite the learning model as follows:

$$f^* = \arg\min_{f \in \mathcal{H}_K} \frac{1}{l}\sum_{i=1}^{l} \max[0, 1 - y_i f(x_i)] + \gamma_A \|f\|_K^2 + \frac{\gamma_I}{(l+u)^2}\,\mathbf{f}^T L''\, \mathbf{f}.$$

According to the representer theorem, the solution of the above problem can be expressed as

$$f^*(x) = \sum_{i=1}^{l+u} \alpha_i K(x_i, x). \qquad (16)$$

Therefore, we rewrite the objective function as

$$\alpha^* = \arg\min_{\alpha} \frac{1}{l}\sum_{i=1}^{l} \max\Big[0, 1 - y_i \sum_{j=1}^{l+u} \alpha_j K(x_i, x_j)\Big] + \gamma_A\, \alpha^T K \alpha + \frac{\gamma_I}{(l+u)^2}\, \alpha^T K L'' K \alpha. \qquad (17)$$

By employing the least squares loss in Eq. (17), we obtain the locality preserved kernel least squares model in Eq. (18):

$$\alpha^* = \arg\min_{\alpha} \frac{1}{l}\sum_{i=1}^{l} \Big(y_i - \sum_{j=1}^{l+u} \alpha_j K(x_i, x_j)\Big)^2 + \gamma_A\, \alpha^T K \alpha + \frac{\gamma_I}{(l+u)^2}\, \alpha^T K L'' K \alpha. \qquad (18)$$

We compare our proposed local structure preserving algorithms with the traditional well-known Laplacian algorithms on the CAS-YNU-MHAD dataset [53]. The CAS-YNU-MHAD dataset contains 10 human actions: jumping up, jumping forward, running, walking S, walking quickly, walking, standing up, sitting down, lying down and typing. Figure 2 shows some examples. In the experiments, we choose the data from four sensors (placed on the right shoulder, left forearm, left hand and spine) to construct multi-view features. Ninety percent of the data for each action are randomly selected as the training data, and the rest are used for testing.
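The two must-link combinations can be written down directly. The sketch below assumes the forms $L'' = L(L^M + \alpha I)$ and $L'' = L + \alpha L^M$ that appear in the experimental legend of this chapter; the function names and the toy matrices are hypothetical.

```python
import numpy as np

def combine_product(L, L_M, alpha):
    """First form: L'' = L (L^M + alpha I)."""
    return L @ (L_M + alpha * np.eye(L.shape[0]))

def combine_sum(L, L_M, alpha):
    """Second form: L'' = L + alpha L^M."""
    return L + alpha * L_M

# Path graph on 3 vertices and one must-link pair (0, 2); values are illustrative.
L = np.array([[1.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 1.0]])
L_M = np.array([[1.0, 0.0, -1.0],
                [0.0, 0.0, 0.0],
                [-1.0, 0.0, 1.0]])
L_sum = combine_sum(L, L_M, alpha=0.5)
L_prod = combine_product(L, L_M, alpha=0.5)
```

The sum form is again a symmetric positive semi-definite Laplacian for $\alpha \ge 0$. The product form still annihilates constant vectors, but it is symmetric only when $L$ and $L^M$ commute, which is worth keeping in mind when it is plugged into a solver that assumes symmetry.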
In the semi-supervised classification experiments, we randomly select a certain percentage (10, 20, 30, 50%) of the training samples as labeled data. All the classification methods are measured by the average precision (AP) [54] on the testing data. Note that the supervised information (labeled information and side information) is randomly selected from the training set. To avoid any potential bias induced by data selection, the above process is repeated five times.
For the first two proposed algorithms, which use the must-link constraints, we first determine the parameter $\alpha$ that balances the traditional graph Laplacian and the must-link Laplacian matrix. The parameter $\alpha$ is tuned from the candidate set $\{e^{i} \mid i = -10, -9, -8, \dots, 10\}$ through cross-validation. In addition, the regularization parameters $\gamma_A$ and $\gamma_I$ are chosen from $\{10^{-8}, 10^{-7}, 10^{-6}, \dots, 10^{6}, 10^{7}, 10^{8}\}$ through cross-validation on the training data. We use the AP performance to select the proper parameters. Note that, for the same classifier, the best-performing $\alpha$ may differ under different proportions of side information. In the results, the legend NewLapKLS-1 represents the kernel least squares classifier using $L'' = L(L^M + \alpha I)$, NewLapSVM-2 stands for the support vector machines classifier using $L'' = L + \alpha L^M$, and so on.
Figure 3 shows the classification results achieved by the KLS and SVM classifiers with 10% labeled samples. We can see two main points. First, our three proposed local structure preserving algorithms with pairwise constraints usually achieve better overall performance than the well-known semi-supervised methods (LapKLS and LapSVM) without side information. Second, in most cases, the results gradually improve as the amount of side information increases. From Figures 4-6, we can make analogous observations for our proposed methods compared with their counterparts. These observations indicate that our proposed learning model can better explore and exploit the local structure by taking advantage of the geometrical structure information in the pairwise constraints and manifold regularization. We also note that the classification results fluctuate slightly with more side information when the number of class labels is large. This suggests that parameter selection is critical for our proposed methods.
To investigate whether the proposed methods also perform better on a single action of CAS-YNU-MHAD, we choose jumping up as an example in Figure 7. We find that our proposed algorithm consistently performs better than the previous algorithm without side information. In particular, the classification result improves significantly when the number of labeled samples is limited.

HesR-based SSL
Although LapR has received extensive attention, the null space of the graph Laplacian along the underlying manifold contains only constant functions, which can result in poor generalization. In contrast to the Laplacian, the Hessian can properly exploit the intrinsic local geometry of the data manifold. In recent works [23][24][25][26,28], HesR-based SSL algorithms have been shown to achieve better performance than LapR-based ones.
The Hessian matrix can be computed in the following four steps.
Step 1: Neighborhood construction. Using the k nearest neighbors in Euclidean distance for each input point $x_i$, we get the neighborhood matrix $N_i$.
Step 2: Create local tangent coordinates. Conduct a singular value decomposition on the neighborhood matrix, $N_i = U D V^T$. The first $d$ columns of $V$, $V_i = [v_1, v_2, \dots, v_d]$, give the tangent coordinates of the data points in the neighborhood of $x_i$.
Step 3: Build the local Hessian estimator. Apply the Gram-Schmidt procedure to the matrix $[\mathbf{1}, V_i, Q_i]$, whose first column is a vector of ones and where $Q_i$ collects the squares and cross products of the columns of $V_i$, to get $\hat{M}^k_i$. Then take the last $m(m+1)/2$ columns of $\hat{M}^k_i$ as $H_i$.
Step 4: Construct the Hessian matrix $H$. A symmetric matrix $H$ is constructed by accumulating, over all neighborhoods, the products $H_i H_i^T$ at the rows and columns that correspond to the neighbors of $x_i$. The HesR model can then be expressed as

$$f^* = \arg\min_{f \in \mathcal{H}_K} \frac{1}{l}\sum_{i=1}^{l} V(x_i, y_i, f) + \gamma_A \|f\|_K^2 + \frac{\gamma_I}{(l+u)^2}\,\mathbf{f}^T H\, \mathbf{f}.$$

The Hessian has been widely used to improve SSL classification performance. Liu et al. [18] presented multi-view Hessian discriminative sparse coding (mHDSC), which seamlessly integrates Hessian regularization with discriminative sparse coding for multi-view learning problems. In [24], HesR was incorporated into support vector machines to boost the classifier. In [19], HesR was integrated into multi-view learning for image annotation; extensive experiments on the PASCAL VOC'07 dataset validate the effectiveness of HesR in comparison with LapR.
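The four steps for computing the Hessian matrix can be sketched as follows. This is a minimal illustration in the spirit of Hessian eigenmaps [7], not the exact implementation used in the cited works: it assumes that the neighborhood matrix is centered and stores neighbors as columns, that $Q_i$ contains the squares and cross products of the tangent coordinates, that the Gram-Schmidt step can be carried out with a QR factorization, and that the same intrinsic dimension `d` is used for the tangent space and for the $d(d+1)/2$ Hessian columns. The function name is hypothetical.

```python
import numpy as np

def hessian_regularizer(X, k, d):
    """Assemble an n x n Hessian energy matrix from local estimators."""
    n = X.shape[0]
    dp = d * (d + 1) // 2
    if k < 1 + d + dp:
        raise ValueError("k must be at least 1 + d + d(d+1)/2")
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    H = np.zeros((n, n))
    for i in range(n):
        # Step 1: k nearest neighbours of x_i (x_i itself included).
        idx = np.argsort(sq[i])[:k]
        Ni = (X[idx] - X[idx].mean(axis=0)).T      # columns are centred neighbours
        # Step 2: tangent coordinates from the right singular vectors.
        _, _, Vt = np.linalg.svd(Ni, full_matrices=False)
        Vi = Vt[:d].T                               # shape (k, d)
        # Step 3: orthonormalise [1, V_i, Q_i] and keep the last d(d+1)/2 columns.
        Qi = np.column_stack([Vi[:, a] * Vi[:, b]
                              for a in range(d) for b in range(a, d)])
        Z = np.column_stack([np.ones(k), Vi, Qi])
        Qmat, _ = np.linalg.qr(Z)                   # QR plays the role of Gram-Schmidt
        Hi = Qmat[:, -dp:]
        # Step 4: accumulate the local quadratic form on the neighbour indices.
        H[np.ix_(idx, idx)] += Hi @ Hi.T
    return H

# Points on a 2-D plane embedded in 3-D space.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 3))
H = hessian_regularizer(X, k=10, d=2)
```

Because every local estimator is orthogonal to the constant vector and to the tangent coordinates, both constant and linear functions on a flat manifold receive (numerically) zero penalty, which is the property that distinguishes HesR from LapR.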

pLapR-based SSL
Although the p-Laplacian has nice theoretical foundations, approximating the graph p-Laplacian is still strenuous work, which severely limits the applications of p-Laplacian regularization. In this section, we provide an effective and efficient full approximation of the graph p-Laplacian, which significantly lowers the computation cost. Then we integrate the approximated graph p-Laplacian into the manifold regularization framework and develop p-Laplacian regularization. Based on pLapR, several extended algorithms are proposed.

pLapR
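The graph p-Laplacian is approximated by computing all eigenvectors and eigenvalues of the p-Laplacian [55]. Assume that $f^*_1, f^*_2, \dots, f^*_K$ are $K$ eigenvectors of the p-Laplacian $\Delta^w_p$ associated with unique eigenvalues $\lambda^*_1, \lambda^*_2, \dots, \lambda^*_K$. Luo et al. [32] introduced an approximation for the full eigenvectors of the p-Laplacian by solving the following p-Laplacian embedding problem:

$$\min_{F} J_E(F) = \sum_{k} \frac{\sum_{i,j} w_{ij}\, |f^k_i - f^k_j|^p}{\|f^k\|_p^p}, \quad \text{s.t. } F^T F = I, \qquad (20)$$

where $f^k$ is the $k$-th column of $F$ and $w_{ij}$ is the weight of the edge between $x_i$ and $x_j$. Solving Eq. (20) by gradient descent, we obtain the eigenvectors $F^*$ and the corresponding eigenvalues $\Lambda = \mathrm{diag}(\lambda^*_1, \dots, \lambda^*_K)$ with $\lambda^*_k = \sum_{i,j} w_{ij} |f^{*k}_i - f^{*k}_j|^p / \|f^{*k}\|_p^p$. Finally, the graph p-Laplacian is approximated by $L_p = F^* \Lambda F^{*T}$.
We introduce the approximated graph p-Laplacian into a regularizer to exploit the intrinsic local geometry of the data manifold. Therefore, in the p-Laplacian regularization framework, the optimization problem in Eq. (1) becomes

$$f^* = \arg\min_{f \in \mathcal{H}_K} \frac{1}{l}\sum_{i=1}^{l} V(x_i, y_i, f) + \gamma_A \|f\|_K^2 + \frac{\gamma_I}{(l+u)^2}\,\mathbf{f}^T L_p\, \mathbf{f}.$$

Here, $L_p$ is the graph p-Laplacian.
The proposed pLapR can be applied to various MRSSL-based applications with different choices of loss function. Here, we apply pLapR to support vector machines (SVM) and kernel least squares (KLS) as examples.
Applying the hinge loss function in p-Laplacian learning, the p-Laplacian support vector machine (pLapSVM) solves the following optimization problem:

$$f^* = \arg\min_{f \in \mathcal{H}_K} \frac{1}{l}\sum_{i=1}^{l} \max[0, 1 - y_i f(x_i)] + \gamma_A \|f\|_K^2 + \frac{\gamma_I}{(l+u)^2}\,\mathbf{f}^T L_p\, \mathbf{f}.$$

The representer theorem guarantees that the solution exists and has the general form in Eq. (16). Hence the optimization problem can be expressed as

$$\alpha^* = \arg\min_{\alpha} \frac{1}{l}\sum_{i=1}^{l} \max\Big[0, 1 - y_i \sum_{j=1}^{l+u} \alpha_j K(x_i, x_j)\Big] + \gamma_A\, \alpha^T K \alpha + \frac{\gamma_I}{(l+u)^2}\, \alpha^T K L_p K \alpha.$$

We now outline KLS with p-Laplacian regularization. The p-Laplacian kernel least squares (pLapKLS) solves the following optimization problem:

$$f^* = \arg\min_{f \in \mathcal{H}_K} \frac{1}{l}\sum_{i=1}^{l} \big(y_i - f(x_i)\big)^2 + \gamma_A \|f\|_K^2 + \frac{\gamma_I}{(l+u)^2}\,\mathbf{f}^T L_p\, \mathbf{f}.$$

To evaluate the effectiveness of the proposed pLapR, we apply pLapSVM and pLapKLS to scene recognition on the Scene 67 database [56] and the Scene 15 data set [57]. Figure 8 illustrates the framework of pLapR for scene recognition.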
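The approximation $L_p = F^* \Lambda F^{*T}$ can be illustrated with a small NumPy sketch. It assumes the embedding objective $\sum_k \sum_{i,j} w_{ij}|f^k_i - f^k_j|^p / \|f^k\|_p^p$ under the constraint $F^T F = I$, initializes $F$ with eigenvectors of the standard graph Laplacian, and uses a plain gradient step followed by QR re-orthonormalization. The QR step is a simplification chosen for this example and is not the update rule of [32]; the function names, step size and iteration count are likewise hypothetical.

```python
import numpy as np

def phi(x, p):
    """phi_p(x) = |x|^(p-1) sign(x), the derivative of |x|^p / p."""
    return np.sign(x) * np.abs(x) ** (p - 1)

def quotient_parts(W, F, p):
    """Pairwise differences, numerators and denominators of the p-quotients."""
    D = F[:, None, :] - F[None, :, :]               # D[i, j, k] = f_i^k - f_j^k
    num = np.einsum("ij,ijk->k", W, np.abs(D) ** p)
    den = (np.abs(F) ** p).sum(axis=0)
    return D, num, den

def approx_graph_p_laplacian(W, p, K, steps=100, lr=1e-3):
    """Approximate L_p = F diag(lam) F^T for a symmetric weight matrix W."""
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(L)
    F = vecs[:, :K].copy()                          # start from the p = 2 eigenvectors
    for _ in range(steps):
        D, num, den = quotient_parts(W, F, p)
        d_num = 2.0 * p * np.einsum("ij,ijk->ik", W, phi(D, p))
        d_den = p * phi(F, p)
        grad = (d_num - (num / den) * d_den) / den  # gradient of each quotient
        F, _ = np.linalg.qr(F - lr * grad)          # assumption: QR keeps F^T F = I
    _, num, den = quotient_parts(W, F, p)
    lam = num / den                                 # eigenvalue estimates
    return F @ np.diag(lam) @ F.T, F, lam

# Small random weighted graph.
rng = np.random.default_rng(0)
A = rng.random((6, 6))
W = (A + A.T) / 2.0
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W
L2, _, _ = approx_graph_p_laplacian(W, p=2.0, K=6)
Lp, F, lam = approx_graph_p_laplacian(W, p=1.5, K=3)
```

With the double sum over $i, j$ used here, the case $p = 2$ with all eigenvectors reproduces twice the standard graph Laplacian, so it differs from $L$ only by a constant that can be absorbed into $\gamma_I$.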
The Scene 67 data set contains 15,620 indoor scene images collected from different sources, including online image search tools, online photo sharing sites and the LabelMe dataset. These images are categorized into 67 classes covering 5 big scene groups (i.e., stores, home, public spaces, leisure and working place). Some example images are shown in Figure 9.
The Scene 15 data set is composed of 15 scene categories with 4485 images in total. Each category has 200-400 images. The images contain not only indoor scenes, such as living room, kitchen and store, but also outdoor scenes, such as forest, mountain, tall building, open country and so on (see Figure 10).
For the Scene 67 dataset, we randomly select 80 images of each class to form the training set and use the rest as the testing set. For the Scene 15 dataset, 100 images per class are randomly selected as the training set, and the rest are used for testing. In the semi-supervised experiments, a certain percentage (10, 20, 30, 50%) of the training samples are randomly assigned as labeled data. To avoid any bias introduced by the random partitioning of samples, the above assignment is carried out five times independently. The regularization parameters $\gamma_A$ and $\gamma_I$ are tuned from the candidate set $\{10^{i} \mid i = -10, -9, -8, \dots, 10\}$, and the parameter $p$ for pLapR from the candidate set $\{1, 1.1, 1.2, \dots, 3\}$, through cross-validation on the training data with 10% labeled samples. The performance is measured by the average precision (AP) for a single class and the mean average precision (mAP) over all classes.
First, we show the mAP boxplot of pLapR with $p = 2$ on the Scene 67 dataset, together with the standard LapR for comparison, in Figure 11. We can clearly see that the performance of pLapR with $p = 2$ is similar to that of the standard LapR, which is consistent with the fact that the graph p-Laplacian with $p = 2$ reduces to the standard graph Laplacian. Figure 12 illustrates the performance of pLapKLS with different values of $p$. The upper subfigure shows the performance on the Scene 67 database; the best performance of indoor scene classification on the Scene 67 dataset is obtained with $p = 1.1$. The lower subfigure shows the performance on the Scene 15 database, where the best performance is achieved when $p = 1$.
Then we compare the performance of pLapR with the representative LapR and HesR. Figure 13 and Figure 14 show the mAP performance on the Scene 67 data set and the Scene 15 data set, respectively. The four subfigures of the upper row are KLS methods, and the lower four are SVM methods. From the results on the two data sets, we can see that pLapR outperforms both LapR and HesR, especially when only a small number of samples are labeled.
For the AP results in Figure 15, the subfigures of the upper row are KLS methods, and the lower four are SVM methods. In each subfigure, the y-axis is the AP and the x-axis is the number of labeled samples. From the AP results, we find that, in most cases, pLapR performs better than the traditional methods, including LapR and HesR.

Hypergraph p-Laplacian (HpLapR)
In this subsection, we propose a hypergraph p-Laplacian regularized method for image recognition. Both the hypergraph and the p-Laplacian [31,58,59] have convincing theoretical support for better preserving the local structure of data. However, the computation of the hypergraph p-Laplacian is difficult. We provide an effective and efficient approximation algorithm for the hypergraph p-Laplacian. Considering the higher order relationships among samples, the hypergraph p-Laplacian regularizer is built to preserve local structures. Hypergraph p-Laplacian regularization (HpLapR) is also introduced into logistic regression for remote sensing image recognition.
Assume that the hypergraph p-Laplacian has $n$ eigenvectors $F^{*}_{hp} = (f^{*}_{hp1}, f^{*}_{hp2}, \dots, f^{*}_{hpn})$ associated with unique eigenvalues $\lambda^{*}_{hp} = (\lambda^{*}_{hp1}, \lambda^{*}_{hp2}, \dots, \lambda^{*}_{hpn})$. We compute the approximation of the hypergraph p-Laplacian $L^{hp}_p$ by $L^{hp}_p = F^{*}_{hp}\, \mathrm{diag}(\lambda^{*}_{hp})\, F^{*T}_{hp}$. Thus, it is important to obtain all eigenvectors and eigenvalues of the hypergraph p-Laplacian.
Although a complete analysis of the hypergraph p-Laplacian is challenging, we can easily generate a hypergraph with a group of hyperedges [52]. In detail, we construct the hypergraph Laplacian $L^{hp}$ and compute the adjacency matrix $W^{hp}$ by Eq. (5) and Eq. (6), respectively.
Following the study on pLapR [31,55], the eigenvalues and the corresponding eigenvectors of the hypergraph p-Laplacian can be computed by solving the following hypergraph p-Laplacian embedding problem:

$$\min_{F_{hp}} J_E(F_{hp}) = \sum_{k} \dots$$

... as testing samples. For hypergraph construction, we regard each sample in the training set as a vertex and generate a hyperedge for each vertex with its $k$ nearest neighbors (so the hyperedge connects $k + 1$ samples) [62]. It is worth noting that, in our experiments, the kNN-based hyperedge generation is performed only within six groups, not over all training samples. For example, for a sample of baseball diamond, the vertices of the corresponding hyperedge are chosen from the first group (baseball diamond, golf course and tennis courts) of Figure 17. The setting of class labels is the same as for pLapR.
We conduct experiments on the data set to obtain the proper model parameters. The neighborhood size $k$ of the hypergraph is selected from $\{5, 6, 7, \dots, 15\}$ through cross-validation. The settings of the regularization parameters $\gamma_A$, $\gamma_I$ and of $p$ are the same as in the pLapR experiments. Figure 18 illustrates the mAP performance of pLapR and HpLapR on the validation set when $p$ varies. The x-axis is the parameter $p$ and the y-axis is the mAP. We can see that the best mAP performance of pLapR is obtained when $p = 2.3$, while the best performance of HpLapR is achieved when $p = 2.6$.
We compare our proposed HpLapR with the representative LapR, HLapR and pLapR. From Figure 19, we observe that HpLapR outperforms the other methods, especially when only a small number of samples are labeled. This suggests that our proposed method is better at preserving the local structure of the data because it integrates hypergraph learning with the graph p-Laplacian. To evaluate the effectiveness of HpLapR for single classes, Figure 20 shows the AP results of different methods on several land-use classes, including beach, dense residential, freeway and tennis court. From Figure 20, we find that, in most cases, HpLapR performs better than both pLapR and HLapR, while pLapR and HLapR consistently outperform LapR.
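The kNN-based hyperedge generation described above can be sketched as follows. The sketch builds a 0/1 incidence matrix in which column $j$ is the hyperedge generated by vertex $j$; the optional `groups` argument, which restricts neighbor candidates to samples of the same group, is a hypothetical way to mimic the group-wise construction and is not taken from the original experiments.

```python
import numpy as np

def knn_hyperedges(X, k, groups=None):
    """Incidence matrix H (n x n): hyperedge j = vertex j plus its k nearest neighbours."""
    n = X.shape[0]
    groups = np.zeros(n, dtype=int) if groups is None else np.asarray(groups)
    H = np.zeros((n, n))
    for j in range(n):
        # Candidate neighbours: same group as x_j, excluding x_j itself.
        cand = np.flatnonzero((groups == groups[j]) & (np.arange(n) != j))
        dist = ((X[cand] - X[j]) ** 2).sum(axis=1)
        nearest = cand[np.argsort(dist)[:k]]
        H[nearest, j] = 1.0
        H[j, j] = 1.0                     # the generating vertex belongs to its hyperedge
    return H

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))
groups = np.repeat([0, 1], 6)             # two illustrative groups of six samples
H = knn_hyperedges(X, k=3, groups=groups)
```

Each hyperedge then connects $k + 1$ vertices, provided that the group of the generating vertex contains at least $k$ other samples.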

Ensemble p-Laplacian regularization (EpLapR)
As a natural nonlinear generalization of the graph Laplacian, the p-Laplacian has rich theoretical foundations for better preserving the local structure. However, it is difficult to determine the most suitable graph p-Laplacian, that is, the parameter $p$, which is a critical factor for the performance of the graph p-Laplacian. In this section, we develop an ensemble p-Laplacian regularization to fully approximate the intrinsic manifold of the data distribution. EpLapR incorporates multiple graphs into a regularization term in order to sufficiently explore the complementarity of graph p-Laplacians. Specifically, we construct a fused graph by introducing an optimization approach that assigns suitable weights to graphs with different values of $p$. Then, we conduct semi-supervised learning on the fused graph.
Assume a set of candidate graph p-Laplacians $\{L_{p_1}, \dots, L_{p_m}\}$. According to the manifold regularization framework, the proposed EpLapR can be written as the following optimization problem:

$$\min_{f \in \mathcal{H}_K} \frac{1}{l}\sum_{i=1}^{l} V(x_i, y_i, f) + \gamma_A \|f\|_K^2 + \frac{\gamma_I}{(l+u)^2}\,\mathbf{f}^T L\, \mathbf{f},$$

where $L$ is the optimal fused graph with $L = \sum_{k=1}^{m} \mu_k L_{p_k}$, s.t. $\sum_{k=1}^{m} \mu_k = 1$, $\mu_k \ge 0$. To avoid the parameter $\mu_k$ overfitting to one graph [63], we make a relaxation by changing $\mu_k$ to $\mu_k^{\gamma}$, and obtain the optimization problem

$$\min_{f \in \mathcal{H}_K,\ \mu} \frac{1}{l}\sum_{i=1}^{l} V(x_i, y_i, f) + \gamma_A \|f\|_K^2 + \frac{\gamma_I}{(l+u)^2}\sum_{k=1}^{m} \mu_k^{\gamma}\, \mathbf{f}^T L_{p_k} \mathbf{f}, \quad \text{s.t. } \sum_{k=1}^{m} \mu_k = 1,\ \mu_k \ge 0.$$

The representer theorem gives the existence and the general form of the solution in Eq. (16) under a fixed $\mu$. Therefore, we rewrite the objective function as

$$\min_{\alpha,\ \mu} \frac{1}{l}\sum_{i=1}^{l} V(x_i, y_i, f) + \gamma_A\, \alpha^T K \alpha + \frac{\gamma_I}{(l+u)^2}\sum_{k=1}^{m} \mu_k^{\gamma}\, \alpha^T K L_{p_k} K \alpha, \quad \text{s.t. } \sum_{k=1}^{m} \mu_k = 1,\ \mu_k \ge 0,$$

where $f(x) = \sum_{i=1}^{l+u} \alpha_i K(x_i, x)$. An alternating optimization procedure over $\alpha$ and $\mu$ is then used to solve this problem.
We compare EpLapR with other local structure preserving algorithms, including LapR, HesR and pLapR, on the UC-Merced data set. We apply support vector machines and kernel least squares to remote sensing image classification.
Figures 21 and 22 show the mAP results of the different algorithms with KLS methods and SVM methods, respectively. We can see that, in most cases, EpLapR outperforms LapR, HesR and pLapR, which shows the advantages of EpLapR in preserving local structure.
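In the alternating procedure, the step that updates the graph weights for a fixed function has a closed form when the relaxed term $\sum_k \mu_k^{\gamma}\, \mathbf{f}^T L_{p_k} \mathbf{f}$ is minimized under $\sum_k \mu_k = 1$ with $\gamma > 1$: a Lagrange multiplier argument gives $\mu_k \propto (1 / \mathbf{f}^T L_{p_k} \mathbf{f})^{1/(\gamma - 1)}$. The sketch below implements only this weight update, in the spirit of ensemble manifold regularization [63]; whether the original EpLapR solver uses exactly this update is an assumption, and the function name and toy matrices are hypothetical.

```python
import numpy as np

def update_mu(f, laplacians, gamma):
    """Minimise sum_k mu_k**gamma * f^T L_k f subject to sum_k mu_k = 1 (gamma > 1)."""
    if gamma <= 1.0:
        raise ValueError("gamma must be greater than 1")
    cost = np.array([f @ L @ f for L in laplacians])
    cost = np.maximum(cost, 1e-12)                 # guard against division by zero
    weights = (1.0 / cost) ** (1.0 / (gamma - 1.0))
    return weights / weights.sum()

# Two candidate regularizers: a path-graph Laplacian and a scaled copy of it.
L1 = np.array([[1.0, -1.0, 0.0],
               [-1.0, 2.0, -1.0],
               [0.0, -1.0, 1.0]])
L2 = 3.0 * L1
f = np.array([1.0, 0.0, -1.0])
mu = update_mu(f, [L1, L2], gamma=2.0)
```

With $\gamma = 2$ the weights are inversely proportional to the smoothness costs, so the graph on which the current function is smoother receives the larger weight; as $\gamma$ grows, the weights move toward uniform.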

Conclusions
In this chapter, we review LapR, HesR and pLapR and present several extensions based on the manifold regularization framework. We propose a local structure preserving method that effectively integrates manifold regularization and pairwise constraints. We develop an efficient approximation algorithm for the graph p-Laplacian and propose p-Laplacian regularization to preserve the local geometry. Considering that a hypergraph contains more local grouping information than a simple graph, we propose hypergraph p-Laplacian regularization to preserve the geometry of the probability distribution. In practical applications of the p-Laplacian regularization model, it is difficult to determine the optimal graph p-Laplacian because the parameter $p$ is usually chosen by cross-validation, which lacks the ability to approximate the optimal solution. Therefore, we propose an ensemble p-Laplacian regularization to better approximate the geometry of the data distribution.

Expectations
In general image recognition, images are naturally represented by multi-view features, such as color, shape and texture. Each view summarizes a specific characteristic of the image, and the features of different views are complementary to one another. Therefore, in future work, we will study multi-view p-Laplacian regularization to effectively explore the complementary properties of features from different views. Meanwhile, we will try to combine p-Laplacian learning with deep learning to obtain a more effective p-Laplacian learning algorithm.