Open access peer-reviewed chapter

Recent Advances of Manifold Regularization

By Xueqi Ma and Weifeng Liu

Submitted: April 19th, 2018. Reviewed: June 8th, 2018. Published: November 5th, 2018.

DOI: 10.5772/intechopen.79383

Abstract

Semi-supervised learning (SSL), which can use a small number of labeled data together with a large number of unlabeled data to produce significant improvements in learning performance, has received considerable attention. Manifold regularization is one of the most popular approaches; it exploits the geometry of the probability distribution that generates the data and incorporates it as a regularization term. Representative manifold regularization methods include Laplacian regularization (LapR), Hessian regularization (HesR) and p-Laplacian regularization (pLapR). Based on the manifold regularization framework, many extensions and applications have been reported. In this chapter, we review LapR and HesR, and we introduce an approximation algorithm for the graph p-Laplacian. We study several extensions of this framework for pairwise constraints, p-Laplacian learning, hypergraph learning, etc.

Keywords

  • Laplacian regularization
  • Hessian regularization
  • p-Laplacian regularization
  • semi-supervised learning
  • manifold learning

1. Introduction

In practical applications, it is generally laborious to obtain labeled samples, whereas vast amounts of unlabeled samples are easily obtained and provide auxiliary information. Semi-supervised learning (SSL), which takes full advantage of unlabeled data, is specifically designed to improve learning performance. Representative semi-supervised learning algorithms usually assume that the intrinsic geometry of the data distribution is supported on a low-dimensional manifold.

The popular manifold learning methods include principal component analysis (PCA), multidimensional scaling (MDS) [1, 2], generative topographic mapping (GTM) [3], locally linear embedding (LLE) [4], ISOMAP [5], Laplacian eigenmaps (LE) [6], Hessian eigenmaps (HLLE) [7], and local tangent space alignment (LTSA) [8]. PCA aims to find the low-dimensional linear subspace that captures the maximum proportion of the variation within the data. MDS aims to place each object in an N-dimensional space such that the between-object distances are preserved as well as possible. GTM can be seen as a nonlinear form of principal component analysis or factor analysis. LLE assumes that a given sample can be reconstructed from its neighbors, uses these reconstructions to represent the local geometry, and then seeks a low-dimensional embedding. ISOMAP incorporates the geodesic distances induced by a weighted graph. LE preserves the neighbor relations of pairwise samples by manipulating an undirected weighted graph. HLLE obtains the final low-dimensional representations by applying eigenanalysis to a matrix that is built by estimating the Hessian over each neighborhood. LTSA [8] exploits local tangent information as a representation of the local geometry, and this local tangent information is then aligned to provide global coordinates. Regularization is a key idea in the theory of splines [9] and is widely used in machine learning [10] (e.g., support vector machines). In 2006, Belkin et al. [11] proposed the manifold regularization framework by introducing a new regularization term to exploit the geometry of the probability distribution. Based on this framework, many successful manifold regularized semi-supervised learning (MRSSL) algorithms have been reported.

Laplacian regularization (LapR) [11, 12] is one prominent manifold regularization-based SSL algorithm, which approximates the manifold by using the graph Laplacian. Owing to its simple computation and prominent performance, LapR-based SSL algorithms have been widely used in many applications. Liu et al. [13] introduced Laplacian regularization for local structure preservation and proposed manifold regularized kernel logistic regression (KLR) for web image annotation. Luo et al. [14] employed manifold regularization to smooth the functions along the data manifold for multitask learning. Ma et al. [15] proposed a local structure preserving method that effectively integrates Laplacian regularization and pairwise constraints for human action recognition. Hu et al. [16] introduced graph Laplacian regularization for joint denoising and super-resolution of generalized piecewise smooth images.

Hessian regularization (HesR) [17] has attracted considerable attention and has been shown empirically to perform well in practical problems [18, 19, 20, 21, 22, 23, 24, 25, 26]. Liu et al. [27] incorporated both Hessian regularization and sparsity constraints into auto-encoders and proposed a new auto-encoder algorithm called Hessian regularized sparse auto-encoders (HSAE). Liu et al. [28] proposed multi-view Hessian regularized logistic regression for action recognition. While the null space of the graph Laplacian along the underlying manifold is the space of constant functions, HesR steers the learned function to vary linearly with respect to the geodesic distance. As a result, HesR can describe the underlying manifold of the data more accurately and achieves better learning performance than LapR-based methods [18]. However, the stability of Hessian estimation depends mostly on the quality of the local fit for each data point, which leads to inaccurate estimation particularly when the function oscillates heavily [17].

As a nonlinear generalization of the standard graph Laplacian, the discrete p-Laplacian has been well studied in the mathematics community, and solid properties have been established in previous work [29, 30]. Meanwhile, the graph p-Laplacian has been shown to have advantages in exploiting the manifold of the data distribution. Bühler et al. [31] provided a rigorous proof that the second eigenvector of the p-Laplacian approximates the Cheeger cut, which indicates the superiority of the graph p-Laplacian in exploiting local geometry. Luo et al. [32] proposed full eigenvector analysis of the p-Laplacian and obtained a natural global embedding for multi-class clustering problems, instead of using the greedy search strategy employed by previous researchers. Liu et al. [33] proposed p-Laplacian regularized sparse coding for human activity recognition.

In this chapter, we first present some related work, and then introduce several extensions based on the manifold regularization framework. Specifically, we present the approximation of graph p-Laplacian and the p-Laplacian regularization framework.

Notations: We present some notations that will be used throughout this chapter. We use $\tilde{L}$ for the novel graph Laplacian constructed from the traditional graph Laplacian $L$ and the side information. $L_p$, $L_p^{hp}$ and $L^*$ represent the graph p-Laplacian, the hypergraph p-Laplacian and the ensemble (fused) graph p-Laplacian, respectively.

2. Related works

This section reviews some related works on manifold regularization, pairwise constraints and hypergraph learning.

2.1. Manifold regularization framework

In semi-supervised learning, assume that $N$ training samples $X$ are available, containing $l$ labeled samples $\{(x_i, y_i)\}_{i=1}^{l}$ and $u$ unlabeled samples $\{x_j\}_{j=l+1}^{l+u}$. The labeled samples are pairs generated from the joint probability distribution, while the unlabeled samples are simply drawn according to the marginal distribution. To utilize the marginal distribution induced by the unlabeled samples, we assume that if two points $x_1, x_2$ are close in the intrinsic geometry of the marginal distribution, then the labels of $x_1$ and $x_2$ are similar.

The manifold regularized method introduces an appropriate penalty term $\|f\|_I^2$ and a reproducing kernel Hilbert space (RKHS) norm $\|f\|_K^2$, which control the complexity of the function in the intrinsic geometric structure and the complexity of the classification model, respectively. By incorporating the two regularization terms, the standard framework aims to minimize the following objective:

$$f^* = \arg\min_{f \in \mathcal{H}_K} \frac{1}{l}\sum_{i=1}^{l} V(x_i, y_i, f) + \gamma_A \|f\|_K^2 + \gamma_I \|f\|_I^2, \tag{1}$$

where $V$ is a loss function, such as the hinge loss $\max(0, 1 - y_i f(x_i))$ for support vector machines (SVM). The parameters $\gamma_A$ and $\gamma_I$ balance the loss function and the two regularization terms. For semi-supervised learning, the manifold regularization term $\|f\|_I^2$ is the key to smoothing the function along the manifold estimated from the unlabeled samples.

2.2. Pairwise constraints

Pairwise constraints (side information) [34, 35] are a type of supervised information that specifies whether a pair of data samples belongs to the same class (must-link constraints) or to different classes (cannot-link constraints). Compared with class labels, pairwise constraints provide weaker but more general supervised information. They have been widely used in semi-supervised clustering [36, 37], distance metric learning [38], feature selection [39] and dimension reduction [40, 41].

Denote $X = \{x_i\}_{i=1}^{n}$ as the data set with $Y = \{y_i\}_{i=1}^{n}$ as the class labels. Let $M$ be the set of pairwise must-link constraints and $C$ be the set of pairwise cannot-link constraints, that is,

$$M = \{(x_i, x_j) \mid x_i \text{ and } x_j \text{ belong to the same class}\}$$
$$C = \{(x_i, x_j) \mid x_i \text{ and } x_j \text{ belong to different classes}\}.$$

Defined on the pairwise must-link constraint set and the cannot-link constraint set, we construct similarity matrices $S^M$ and $S^C$, respectively:

$$S_{ij}^{M} = \begin{cases} 1, & \text{if } (x_i, x_j) \in M \\ 0, & \text{otherwise} \end{cases} \tag{2}$$
$$S_{ij}^{C} = \begin{cases} 1, & \text{if } (x_i, x_j) \in C \\ 0, & \text{otherwise.} \end{cases} \tag{3}$$

Then, the must-link Laplacian matrix $L^M$ is given by $L^M = D^M - S^M$, and the cannot-link Laplacian matrix $L^C$ is given by $L^C = D^C - S^C$, where $D^M$ and $D^C$ are diagonal matrices with $D_{ii}^{M} = \sum_{j=1}^{n} S_{ij}^{M}$ and $D_{ii}^{C} = \sum_{j=1}^{n} S_{ij}^{C}$, respectively.
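The following snippet is a minimal sketch of how the constraint similarity matrices of Eqs. (2)-(3) and their Laplacians could be assembled in Python; the function name and the pair-list input format are our own illustration, not from the chapter.

```python
import numpy as np

def constraint_laplacians(n, must_link, cannot_link):
    """Build S^M, S^C (Eqs. (2)-(3)) and the must-link / cannot-link
    Laplacians L^M = D^M - S^M and L^C = D^C - S^C.
    must_link, cannot_link: lists of index pairs (i, j)."""
    SM = np.zeros((n, n))
    SC = np.zeros((n, n))
    for i, j in must_link:
        SM[i, j] = SM[j, i] = 1.0
    for i, j in cannot_link:
        SC[i, j] = SC[j, i] = 1.0
    LM = np.diag(SM.sum(axis=1)) - SM
    LC = np.diag(SC.sum(axis=1)) - SC
    return LM, LC
```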

Ding et al. [42] introduced pairwise constraints into a spectral clustering algorithm. In particular, they revised the distances between sample points by a distance matrix $D$, where $D_{ij} = 0$ if $(x_i, x_j) \in M$ and $D_{ij} = \infty$ if $(x_i, x_j) \in C$.

Kalakech et al. [43] developed a semi-supervised constraint score by using both pairwise constraints and local properties of the unlabeled data.

Luo et al. [44] denoted the training set with side information by $\{(x_i, x_j, y_{ij})\}_{i,j=1}^{N}$, where $y_{ij} = \pm 1$ indicates whether $x_i$ and $x_j$ are similar or dissimilar. The side information was utilized through the loss function $y_{ij}\left(1 - \|x_i - x_j\|_{A_m}^2\right)$, where $A_m$ is the metric in the $m$'th heterogeneous domain.

2.3. Hypergraph learning

A hypergraph [45] is a generalization of a simple graph. Compared with simple graphs, a hypergraph describes complex relationships by hyperedges that connect three or more vertices (see Figure 1). Thus, the hypergraph contains more local structure information than a simple graph. Hypergraphs have been widely used in image classification [46], ranking [47] and video segmentation [48].

Figure 1.

The block scheme of a hypergraph. Left: a simple graph, in which two points are joined by an edge if they are highly similar; a hypergraph completely illustrates the complex relationships among points by hyperedges. Right: the $H$ matrix of the hypergraph. The entry $h(v_i, e_j)$ is set to 1 if hyperedge $e_j$ contains $v_i$, and 0 otherwise.

Let $G = (V, E)$ denote a hypergraph with vertex set $V$ and hyperedge set $E$. Denote the weight associated with each hyperedge $e$ as $w(e)$. The degree $d(v)$ of a vertex is defined by $d(v) = \sum_{e \in E, v \in e} w(e)$. The degree of a hyperedge $e$ is denoted as $\delta(e) = |e|$. Denote the vertex-edge incidence matrix $H$ as a $|V| \times |E|$ matrix whose entry $h(v, e) = 1$ if $v \in e$, and $h(v, e) = 0$ otherwise. By these definitions, we have:

$$d(v) = \sum_{e \in E} w(e)h(v, e), \qquad \delta(e) = \sum_{v \in V} h(v, e). \tag{4}$$

Then, we denote $D_v$ as the diagonal matrix of vertex degrees, $D_e$ as the diagonal matrix of hyperedge degrees and $W$ as the diagonal matrix of hyperedge weights. With these, the hypergraph Laplacian can be defined.

A number of different methods have been used in the literature to build the graph Laplacian of hypergraphs. The first category includes star expansion [49], clique expansion [49], Rodriquez's Laplacian [50], etc. These methods construct a simple graph from the original hypergraph and then partition the vertices by spectral clustering techniques. The second category defines a hypergraph Laplacian using analogies with the simple graph Laplacian. Representative methods in this category include Bolla's Laplacian [51], Zhou's normalized Laplacian [52], etc. According to [52], the normalized hypergraph Laplacian $L^{hp}$ is defined as

$$L^{hp} = I - D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2}. \tag{5}$$

It is worth noting that $L^{hp}$ is positive semi-definite. The adjacency matrix of the hypergraph, $W^{hp}$, can be formulated as follows:

$$W^{hp} = H W H^T - D_v. \tag{6}$$

For a simple graph, the edge degree matrix $D_e$ is replaced by $2I$. Thus, the standard graph Laplacian is

$$L = I - \frac{1}{2} D_v^{-1/2} H W H^T D_v^{-1/2} = \frac{1}{2}\left(I - D_v^{-1/2} W^{hp} D_v^{-1/2}\right). \tag{7}$$
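As a minimal sketch, the hypergraph quantities of Eq. (4) and the Laplacian and adjacency matrices of Eqs. (5)-(6) could be computed from an incidence matrix as follows; the function name and the assumption that every vertex belongs to at least one hyperedge are ours.

```python
import numpy as np

def hypergraph_laplacian(H, w):
    """H: |V| x |E| incidence matrix, w: hyperedge weight vector.
    Returns the normalized hypergraph Laplacian (Eq. (5)) and the
    hypergraph adjacency matrix (Eq. (6)).
    Assumes every vertex lies in at least one hyperedge (no zero degrees)."""
    W = np.diag(w)
    dv = H @ w                      # vertex degrees, Eq. (4)
    de = H.sum(axis=0)              # hyperedge degrees, Eq. (4)
    Dv_isqrt = np.diag(1.0 / np.sqrt(dv))
    De_inv = np.diag(1.0 / de)
    L_hp = np.eye(H.shape[0]) - Dv_isqrt @ H @ W @ De_inv @ H.T @ Dv_isqrt  # Eq. (5)
    W_hp = H @ W @ H.T - np.diag(dv)                                        # Eq. (6)
    return L_hp, W_hp
```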

3. LapR-based SSL

Laplacian regularization is one of the most prominent manifold regularization methods; it utilizes the graph Laplacian matrix to characterize the manifold structure. In this section, we introduce the traditional Laplacian support vector machines (LapSVM) and Laplacian kernel least squares (LapKLS) as examples of Laplacian regularization algorithms. We then extend these algorithms by building a novel graph Laplacian $\tilde{L}$ which combines the traditional graph Laplacian $L$ with the side information to boost locality preservation.

3.1. LapSVM and LapKLS

As previously mentioned, the manifold regularization framework is given by Eq. (1). The traditional LapSVM solves this optimization problem with the hinge loss function:

$$f^* = \arg\min_{f \in \mathcal{H}_K} \frac{1}{l}\sum_{i=1}^{l}\left(1 - y_i f(x_i)\right)_{+} + \gamma_A \|f\|_K^2 + \frac{\gamma_I}{(l+u)^2} f^T L f, \tag{8}$$

where $f = \left[f(x_1), f(x_2), \ldots, f(x_{l+u})\right]^T$ and $L$ is the graph Laplacian with $L = D - W$, where $W_{ij}$ is the edge weight between samples $x_i$ and $x_j$ and the diagonal matrix $D$ is given by $D_{ii} = \sum_{j=1}^{l+u} W_{ij}$.
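A minimal sketch of constructing such a graph Laplacian in Python is given below; the k-nearest-neighbor graph with Gaussian edge weights is a common choice, but the chapter does not prescribe a particular weighting scheme, so the function name and parameters are illustrative.

```python
import numpy as np
from scipy.spatial.distance import cdist

def graph_laplacian(X, k=10, sigma=1.0):
    """Unnormalized graph Laplacian L = D - W on a kNN graph
    with Gaussian (heat kernel) edge weights."""
    n = X.shape[0]
    dist = cdist(X, X)
    W = np.zeros((n, n))
    knn = np.argsort(dist, axis=1)[:, 1:k + 1]        # skip the point itself
    for i in range(n):
        W[i, knn[i]] = np.exp(-dist[i, knn[i]] ** 2 / (2 * sigma ** 2))
    W = np.maximum(W, W.T)                            # symmetrize the kNN graph
    D = np.diag(W.sum(axis=1))
    return D - W
```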

According to the representer theorem, the solution of the above problem can be expressed as follows:

$$f^*(x) = \sum_{i=1}^{l+u} \alpha_i K(x_i, x), \tag{9}$$

where $K$ is the kernel function. Therefore, we rewrite the objective function as

$$f^* = \arg\min_{f \in \mathcal{H}_K} \frac{1}{l}\sum_{i=1}^{l}\left(1 - y_i f(x_i)\right)_{+} + \gamma_A \alpha^T K \alpha + \frac{\gamma_I}{(l+u)^2} \alpha^T K L K \alpha. \tag{10}$$

By employing the least squares loss in place of the hinge loss in Eq. (10), we obtain the Laplacian kernel least squares (LapKLS) model defined in Eq. (11):

$$f^* = \min_{f \in \mathcal{H}_K} \frac{1}{l}\sum_{i=1}^{l}\left(y_i - f(x_i)\right)^2 + \gamma_A \alpha^T K \alpha + \frac{\gamma_I}{(l+u)^2} \alpha^T K L K \alpha. \tag{11}$$

Taking the derivative of the objective function with respect to $\alpha$ and setting it to zero, we can obtain the solution for $\alpha$.
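For the least squares case, setting the derivative to zero yields a linear system; the sketch below uses the LapRLS-style closed form, which is one standard way to read this step (the chapter itself only says that the derivative is taken, so the exact matrix expression and the function name are our assumption).

```python
import numpy as np

def lap_kls_alpha(K, L, y_labeled, l, gamma_A, gamma_I):
    """Closed-form expansion coefficients for Eq. (11):
    alpha = (J K + gamma_A * l * I + gamma_I * l / (l+u)^2 * L K)^{-1} Y,
    where J selects the labeled samples and Y holds labels padded with zeros."""
    n = K.shape[0]                        # n = l + u
    J = np.zeros((n, n))
    J[:l, :l] = np.eye(l)
    Y = np.zeros(n)
    Y[:l] = y_labeled
    A = J @ K + gamma_A * l * np.eye(n) + (gamma_I * l / n ** 2) * L @ K
    return np.linalg.solve(A, Y)
```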

3.2. Pairwise constraints-combined manifold regularization

Assuming that samples with similar features tend to have similar class labels, combining Laplacian regularization with pairwise constraints is a good way to exploit the local structure and boost classification results. Therefore, we introduce pairwise constraints into the traditional LapR. In particular, we introduce three combination strategies based on experience. Finally, we present the locality preserved support vector machines and kernel least squares, respectively.

According to the definitions above, we can compute the must-link Laplacian matrix $L^M$ and the cannot-link Laplacian matrix $L^C$. The first two forms of the combination are defined on the traditional graph Laplacian $L$ and the must-link constraints and can be written as

$$\tilde{L} = L\left(L^M + \alpha I\right) \tag{12}$$

and

$$\tilde{L} = L + \alpha L^M, \tag{13}$$

respectively, where $\alpha$ is a parameter that balances the weight between the two types of Laplacian matrices.

Based on the cannot-link constraints $C$, we can compute the similarity matrix $S$ with $S_{ij} = -1$ if $(x_i, x_j) \in C$ and $S_{ij} = 1$ otherwise. The third form of the combination is defined on the traditional graph Laplacian and the pairwise cannot-link constraints and can be written as

$$\tilde{L} = L \circ S, \tag{14}$$

where $\circ$ denotes the elementwise product.

Actually, there are other combination strategies that use both the must-link and cannot-link constraints to try to obtain better results than traditional methods. However, in our experience their performance is no better than using only one type of constraint. Therefore, we only put these three proposed graph Laplacians into practice.
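The three combination strategies of Eqs. (12)-(14) are straightforward to compute once $L$, $L^M$ and the cannot-link pairs are available; the following sketch (with an illustrative function name) assumes the reading of Eqs. (12) and (14) given above.

```python
import numpy as np

def combined_laplacians(L, LM, cannot_link, alpha):
    """Combined graph Laplacians of Eqs. (12)-(14).
    cannot_link: list of cannot-link index pairs (i, j)."""
    n = L.shape[0]
    L1 = L @ (LM + alpha * np.eye(n))     # Eq. (12)
    L2 = L + alpha * LM                   # Eq. (13)
    S = np.ones((n, n))                   # S_ij = -1 for cannot-link pairs, 1 otherwise
    for i, j in cannot_link:
        S[i, j] = S[j, i] = -1.0
    L3 = L * S                            # Eq. (14), elementwise product
    return L1, L2, L3
```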

Introducing the novel graph Laplacian $\tilde{L}$ into SVM, we rewrite the learning model as follows:

$$f^* = \arg\min_{f \in \mathcal{H}_K} \frac{1}{l}\sum_{i=1}^{l}\left(1 - y_i f(x_i)\right)_{+} + \gamma_A \|f\|_K^2 + \frac{\gamma_I}{(l+u)^2} f^T \tilde{L} f. \tag{15}$$

According to the representer theorem, the solution of the above problem can be expressed as follows:

$$f^*(x) = \sum_{i=1}^{l+u} \alpha_i K(x_i, x). \tag{16}$$

Therefore, we rewrite the objective function as

$$f^* = \arg\min_{f \in \mathcal{H}_K} \frac{1}{l}\sum_{i=1}^{l}\left(1 - y_i f(x_i)\right)_{+} + \gamma_A \alpha^T K \alpha + \frac{\gamma_I}{(l+u)^2} \alpha^T K \tilde{L} K \alpha. \tag{17}$$

By employing the least squares loss in Eq. (17), we can present the locality preserved kernel least squares model defined in Eq. (18) as follows:

$$f^* = \min_{f \in \mathcal{H}_K} \frac{1}{l}\sum_{i=1}^{l}\left(y_i - f(x_i)\right)^2 + \gamma_A \alpha^T K \alpha + \frac{\gamma_I}{(l+u)^2} \alpha^T K \tilde{L} K \alpha. \tag{18}$$

We compare our proposed local structure preserving algorithms with the traditional, well-known Laplacian algorithms on the CAS-YNU-MHAD dataset [53]. The CAS-YNU-MHAD dataset contains 10 human actions: jumping up, jumping forward, running, walking S, walking quickly, walking, standing up, sitting down, lying down and typing. Figure 2 shows some examples. In the experiments, we choose the data from four sensors (placed on the right shoulder, left forearm, left hand and spine) to construct multi-view features. Ninety percent of the data for each action are randomly selected as training data, and the rest are used for testing.

Figure 2.

Three examples from the 10 actions: jumping up, walking S and sitting down (top to bottom).

In the semi-supervised classification experiments, we randomly select a certain percentage (10, 20, 30 and 50%) of the training data as labeled data. All the classification methods are measured by the average precision (AP) [54] on the testing data. Note that the supervised information (labeled information and side information) is randomly selected from the training set. To avoid any potential bias induced by data selection, the above process is repeated five times.

For the first two proposed algorithms using the must-link constraints, we first determine the parameter $\alpha$ that balances the traditional graph Laplacian and the must-link Laplacian matrix. The parameter $\alpha$ of the novel methods is tuned over the candidate set $\{e^{i} \mid i = -10, -9, -8, \ldots, 10\}$ through cross-validation. In addition, the regularization parameters $\gamma_A, \gamma_I$ are chosen from $\{10^{-8}, 10^{-7}, 10^{-6}, \ldots, 10^{6}, 10^{7}, 10^{8}\}$ through cross-validation on the training data. We use the AP performance to select the proper parameters. Note that the parameter $\alpha$ may differ for the same classifier under different proportions of side information in order to obtain the best performance. In the results, the legend NewLapKLS-1 denotes the kernel least squares classifier using $\tilde{L} = L(L^M + \alpha I)$ (Eq. (12)), NewLapSVM-2 denotes the support vector machines classifier using $\tilde{L} = L + \alpha L^M$ (Eq. (13)), and so on.

Figure 3 shows the classification results achieved by the KLS and SVM classifiers with 10% labeled samples. We can make two main observations. First, our three proposed local structure preserving algorithms with pairwise constraints generally achieve better overall performance than the well-known semi-supervised methods (LapKLS and LapSVM) without side information. Second, in most cases the results gradually improve as the amount of side information increases. From Figures 4-6, we obtain analogous observations for our proposed methods compared with their counterparts. These observations indicate that our proposed learning model can better explore and exploit the local structure by taking advantage of the geometrical structure information in the pairwise constraints and manifold regularization. We also note that the classification results fluctuate slightly with more side information when the number of class labels is large. These observations suggest that parameter selection is critical for our proposed methods.

Figure 3.

The overall classification results under 10% labels: (a) KLS, (b) SVM.

Figure 4.

The overall classification results under 20% labels: (a) KLS, (b) SVM.

Figure 5.

The overall classification results under 30% labels: (a) KLS, (b) SVM.

Figure 6.

The overall classification results under 50% labels: (a) KLS, (b) SVM.

To investigate the performance on a single action of CAS-YNU-MHAD, we choose jumping up as an example in Figure 7. We find that our proposed algorithm consistently performs better than the previous algorithm without side information. In particular, the classification result improves significantly when the number of labeled samples is limited.

Figure 7.

The results for jumping up with different proportions of side information using LapKLS, LapSVM, NewLapKLS-1 and NewLapSVM-1.

4. HesR-based SSL

Although LapR has received extensive attention, it is observed that the null space of the graph Laplacian along the underlying manifold is the space of constant functions, which possibly results in poor generalization. In contrast to the Laplacian, the Hessian can properly exploit the intrinsic local geometry of the data manifold. In recent works [23, 24, 25, 26, 28], HesR-based SSL algorithms have been shown to achieve better performance than LapR-based ones.

The Hessian matrix can be computed by the following four steps (a code sketch follows the list).

  1. Step 1: Neighborhood construction. For each input point $x_i$, define its neighbors by the $k$ nearest neighbors in Euclidean distance, yielding the neighborhood matrix $N_i$.

  2. Step 2: Create local tangent coordinates. Conduct a singular value decomposition on the neighborhood matrix, $N_i = UDV^T$. The first $d$ columns of $V$ ($V_i = [v_1, v_2, \ldots, v_d]$) give the tangent coordinates of the data points around $x_i$.

  3. Step 3: Build the local Hessian estimator. Apply the Gram-Schmidt procedure to the matrix $[1, V_i, Q_i]$, whose first column is a vector of ones and where $Q_i$ contains the elementwise products $v_a \circ v_b$ for $1 \le a \le b \le d$ and thus has $d(d+1)/2$ columns, to obtain $\hat{M}_i$. Then take the last $d(d+1)/2$ columns of $\hat{M}_i$ as $H^i$.

  4. Step 4: Construct the Hessian matrix $H$. A symmetric matrix $H$ is constructed with entries $H_{ij} = \sum_{l}\sum_{r} (H^r)_{l,i} (H^r)_{l,j}$.
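Below is a rough Python sketch of these four steps in the spirit of the HLLE construction; the exact orientation of the SVD factors, the neighborhood size requirement ($k \ge 1 + d + d(d+1)/2$) and the function name are our assumptions rather than the authors' implementation.

```python
import numpy as np
from scipy.spatial.distance import cdist

def hessian_energy_matrix(X, k=12, d=2):
    """Global Hessian estimator built from local fits (steps 1-4).
    Assumes k >= 1 + d + d*(d+1)//2 so the QR step returns enough columns."""
    n = X.shape[0]
    dp = d * (d + 1) // 2
    H = np.zeros((n, n))
    dist = cdist(X, X)
    for i in range(n):
        # Step 1: k-nearest neighborhood (including the point itself), centered.
        idx = np.argsort(dist[i])[:k]
        Ni = X[idx] - X[idx].mean(axis=0)
        # Step 2: local tangent coordinates from the SVD of the neighborhood.
        U, _, _ = np.linalg.svd(Ni, full_matrices=False)
        V = U[:, :d]                                   # k x d tangent coordinates
        # Step 3: design matrix [1, V, quadratic terms], orthonormalized (QR plays
        # the role of Gram-Schmidt); the last dp columns give the local estimator.
        quad = [V[:, a] * V[:, b] for a in range(d) for b in range(a, d)]
        M = np.column_stack([np.ones(k), V, np.array(quad).T])
        Q, _ = np.linalg.qr(M)
        Hi = Q[:, -dp:].T                              # dp x k local Hessian estimator
        # Step 4: accumulate the symmetric global matrix H.
        H[np.ix_(idx, idx)] += Hi.T @ Hi
    return H
```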

The HesR model can then be expressed as:

$$f^* = \arg\min_{f \in \mathcal{H}_K} \frac{1}{l}\sum_{i=1}^{l} V(x_i, y_i, f) + \gamma_A \|f\|_K^2 + \frac{\gamma_I}{(l+u)^2} f^T H f. \tag{19}$$

The Hessian has been widely utilized to improve SSL classification performance. Liu et al. [18] presented multi-view Hessian discriminative sparse coding (mHDSC), which seamlessly integrates Hessian regularization with discriminative sparse coding for multi-view learning problems. In [24], HesR was employed in support vector machines to boost the classifier. In [19], HesR was integrated into multi-view learning for image annotation; extensive experiments on the PASCAL VOC'07 dataset validated the effectiveness of HesR in comparison with LapR.

5. pLapR-based SSL

Although the p-Laplacian has nice theoretical foundations, approximating the graph p-Laplacian remains strenuous, which greatly limits the applications of p-Laplacian regularization. In this section, we provide an effective and efficient full approximation of the graph p-Laplacian, which significantly lowers the computational cost. We then integrate the approximated graph p-Laplacian into the manifold regularization framework and develop p-Laplacian regularization. Based on pLapR, several extended algorithms are proposed.

5.1. pLapR

The graph p-Laplacian is approximated by obtaining all eigenvectors and eigenvalues of the p-Laplacian [55]. Assume that $f^1, f^2, \ldots, f^K$ are the $K$ eigenvectors of the p-Laplacian $\Delta_p$ associated with unique eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_K$. Luo et al. [32] introduced an approximation for the full eigenvectors of the p-Laplacian by solving the following p-Laplacian embedding problem:

$$\min_{F} J_E(F) = \sum_{k} \frac{\sum_{ij} w_{ij}\left|f_i^k - f_j^k\right|^p}{\|f^k\|_p^p} \quad \text{s.t. } F^T F = I. \tag{20}$$

Solving Eq. (20) with gradient descent optimization, we can then obtain the full eigenvalues $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_K)$ of the p-Laplacian associated with the eigenvectors $F = [f^1, f^2, \ldots, f^K]$ by $\lambda_p = \frac{\sum_{ij} w_{ij}|f_i - f_j|^p}{\|f\|_p^p}$. Finally, the graph p-Laplacian is approximated by $L_p = F \Lambda F^T$.
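Given the eigenvector matrix $F$ (for example, obtained by gradient descent on Eq. (20)), the eigenvalue formula and the approximation $L_p = F\Lambda F^T$ translate directly into code; the sketch below, with an illustrative function name, assumes $W$ is the symmetric weight matrix of the graph.

```python
import numpy as np

def approximate_p_laplacian(W, F, p):
    """Approximate graph p-Laplacian L_p = F diag(lambda) F^T, where
    lambda_k = sum_ij w_ij |f_i^k - f_j^k|^p / ||f^k||_p^p."""
    n, K = F.shape
    lam = np.zeros(K)
    for k in range(K):
        fk = F[:, k]
        diff = np.abs(fk[:, None] - fk[None, :]) ** p
        lam[k] = (W * diff).sum() / (np.abs(fk) ** p).sum()
    Lp = F @ np.diag(lam) @ F.T
    return Lp, lam
```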

We introduce the approximated graph p-Laplacian into a regularizer to exploit the intrinsic local geometry of the data manifold. Therefore, in the p-Laplacian regularization framework, the optimization problem in Eq. (1) becomes

$$f^* = \arg\min_{f \in \mathcal{H}_K} \frac{1}{l}\sum_{i=1}^{l} V(x_i, y_i, f) + \gamma_A \|f\|_K^2 + \frac{\gamma_I}{(l+u)^2} f^T L_p f. \tag{21}$$

Here, $L_p$ is the graph p-Laplacian.

The proposed pLapR can be applied to various MRSSL-based applications with different choices of loss function. Here, we apply pLapR to support vector machines (SVM) and kernel least squares (KLS) as examples.

Applying the hinge loss function in p-Laplacian learning, the p-Laplacian support vector machines (pLapSVM) solves the following optimization problem:

$$f^* = \arg\min_{f \in \mathcal{H}_K} \frac{1}{l}\sum_{i=1}^{l}\left(1 - y_i f(x_i)\right)_{+} + \gamma_A \|f\|_K^2 + \frac{\gamma_I}{(l+u)^2} f^T L_p f. \tag{22}$$

The representer theorem guarantees that the solution exists and has the general form of Eq. (16). Hence the above optimization problem can be expressed as

$$f^* = \arg\min_{f \in \mathcal{H}_K} \frac{1}{l}\sum_{i=1}^{l}\left(1 - y_i f(x_i)\right)_{+} + \gamma_A \alpha^T K \alpha + \frac{\gamma_I}{(l+u)^2} \alpha^T K L_p K \alpha. \tag{23}$$

We also outline KLS with p-Laplacian regularization. The p-Laplacian kernel least squares (pLapKLS) solves the following optimization problem:

$$f^* = \min_{f \in \mathcal{H}_K} \frac{1}{l}\sum_{i=1}^{l}\left(y_i - f(x_i)\right)^2 + \gamma_A \alpha^T K \alpha + \frac{\gamma_I}{(l+u)^2} \alpha^T K L_p K \alpha. \tag{24}$$

To evaluate the effectiveness of the proposed pLapR, we apply pLapSVM and pLapKLS to scene recognition on the Scene 67 database [56] and Scene 15 data set [57]. Figure 8 illustrates the framework of pLapR for scene recognition.

Figure 8.

The framework of pLapR for indoor scene recognition.

The Scene 67 data set contains 15,620 indoor scene images collected from different sources, including online image search tools, online photo sharing sites and the LabelMe dataset. In particular, these images are categorized into 67 classes covering 5 big scene groups (i.e., stores, home, public spaces, leisure and working places). Some example images are shown in Figure 9.

Figure 9.

Some example images of the Scene 67 database. The dataset has 67 indoor scene categories in total, which can be grouped into 5 big scene groups. Each row shows one big scene group.

The Scene 15 data set is composed of 15 scene categories with 4485 images in total. Each category has 200-400 images. The images contain not only indoor scenes, such as living room, kitchen, and store, but also outdoor scenes, such as forest, mountain, tall building, open country, and so on (see Figure 10).

Figure 10.

Some example images of the Scene 15 data set. The dataset has 15 scene categories in total.

For the Scene 67 dataset, we randomly select 80 images from each class to form the training set and use the rest as the testing set. For the Scene 15 dataset, 100 images per class are randomly selected as the training set, and the rest are used for testing. In the semi-supervised experiments, a certain percentage (10, 20, 30 and 50%) of the training samples are randomly assigned as labeled data. To avoid any bias introduced by the random partitioning of samples, the above assignment is carried out five times independently.

The regularization parameters $\gamma_A$ and $\gamma_I$ are tuned from the candidate set $\{10^{i} \mid i = -10, -9, -8, \ldots, 10\}$ and the parameter $p$ for pLapR from the candidate set $\{1, 1.1, 1.2, \ldots, 3\}$ through cross-validation on the training data with 10% labeled samples, respectively. The performance is measured by the average precision (AP) for single classes and the mean average precision (mAP) over all classes. First, we show the mAP boxplot of pLapR on the Scene 67 dataset when $p = 2$, together with the standard LapR for comparison, in Figure 11. We can clearly see that the performance of pLapR with $p = 2$ is similar to that of standard LapR, which demonstrates that the graph p-Laplacian with $p = 2$ reduces to the standard graph Laplacian.

Figure 11.

mAP of pLapR (p = 2) and LapR on the Scene 67 dataset. Each subfigure reports the results under a different number of labeled samples. In each subfigure, the y-axis is the mAP over all scene classes, and the x-axis shows the different classifiers.

Figure 12 illustrates the performance of pLapKLS with different $p$ values. The upper subfigure shows the performance on the Scene 67 database. We observe that the best indoor scene classification performance on the Scene 67 dataset is obtained with $p = 1.1$. The lower subfigure shows the performance on the Scene 15 database, where the best performance is achieved when $p = 1$.

Figure 12.

mAP results of pLapKLS under different p with 10% labeled samples. The y-axis is the mAP over all classes, and the x-axis is the parameter p.

Then we compare the performance of pLapR with the representative LapR and HesR. Figures 13 and 14 show the mAP performance on the Scene 67 data set and the Scene 15 data set, respectively. The four subfigures in the upper row are KLS methods, and the lower four are SVM methods. From the results on the two data sets, we can see that pLapR outperforms both LapR and HesR, especially when only a small number of samples are labeled.

Figure 13.

mAP of different algorithms on the Scene 67 data set. The four subfigures in the upper row are KLS methods, and the lower four are SVM methods.

Figure 14.

mAP of different algorithms on the Scene 15 data set. The four subfigures in the upper row are KLS methods, and the lower four are SVM methods.

To discuss the AP performance of the different algorithms for single classes, we show the results for several classes of the Scene 15 data set, including mountain, open country, tall building and industrial. Each subfigure corresponds to a single scene class. The upper four subfigures are KLS methods, and the lower four are SVM methods. In each subfigure, the y-axis shows the AP results and the x-axis the number of labeled samples. From the AP results, we find that, in most cases, pLapR performs better than the traditional methods, including LapR and HesR (Figure 15).

Figure 15.

AP of different methods on several classes of the Scene 15 data set, including mountain, open country, tall building and industrial. Each subfigure corresponds to a single scene class. The upper four subfigures are KLS methods, and the lower four are SVM methods. In each subfigure, the y-axis shows the AP results and the x-axis the number of labeled samples.

5.2. Hypergraph p-Laplacian (HpLapR)

In this subsection, we propose a hypergraph p-Laplacian regularized method for image recognition. The hypergraph and the p-Laplacian [31, 58, 59] both provide convincing theoretical evidence of better preserving the local structure of data. However, the computation of the hypergraph p-Laplacian is difficult. We provide an effective and efficient approximation algorithm for the hypergraph p-Laplacian. Considering the higher order relationships among samples, the hypergraph p-Laplacian regularizer is built to preserve local structure. Hypergraph p-Laplacian regularization (HpLapR) is then introduced into logistic regression for remote sensing image recognition.

Assume that the hypergraph p-Laplacian has $n$ eigenvectors $F^{hp} = [f^{hp,1}, f^{hp,2}, \ldots, f^{hp,n}]$ associated with unique eigenvalues $\Lambda^{hp} = \mathrm{diag}(\lambda_1^{hp}, \lambda_2^{hp}, \ldots, \lambda_n^{hp})$; we then compute the approximation of the hypergraph p-Laplacian $L_p^{hp}$ by $L_p^{hp} = F^{hp} \Lambda^{hp} (F^{hp})^T$. Thus, it is important to obtain all eigenvectors and eigenvalues of the hypergraph p-Laplacian.

Although a complete analysis of the hypergraph p-Laplacian is challenging, we can easily generate a hypergraph with a group of hyperedges [52]. In detail, we construct the hypergraph Laplacian $L^{hp}$ and compute the adjacency matrix $W^{hp}$ by Eq. (5) and Eq. (6), respectively.

Following the studies on pLapR [31, 55], the eigenvalues and corresponding eigenvectors of the hypergraph p-Laplacian can be computed via the following hypergraph p-Laplacian embedding problem:

$$\min_{F^{hp}} J_E(F^{hp}) = \sum_{k} \frac{\sum_{ij} w_{ij}^{hp}\left|f_i^{hp,k} - f_j^{hp,k}\right|^p}{\|f^{hp,k}\|_p^p} \quad \text{s.t. } (F^{hp})^T F^{hp} = I. \tag{25}$$

Solving Eq. (25) with gradient descent optimization, we can also obtain the full eigenvalues $\Lambda^{hp} = \mathrm{diag}(\lambda_1^{hp}, \lambda_2^{hp}, \ldots, \lambda_n^{hp})$ by $\lambda_k^{hp} = \frac{\sum_{ij} w_{ij}^{hp}\left|f_i^{hp,k} - f_j^{hp,k}\right|^p}{\|f^{hp,k}\|_p^p}$.

Finally, the approximation of $L_p^{hp}$ is given by $L_p^{hp} = F^{hp} \Lambda^{hp} (F^{hp})^T$.

According to the manifold regularization framework, the proposed HpLapR can be written as the following optimization problem:

$$f^* = \arg\min_{f \in \mathcal{H}_K} \frac{1}{l}\sum_{i=1}^{l} V(x_i, y_i, f) + \gamma_A \|f\|_K^2 + \frac{\gamma_I}{(l+u)^2} f^T L_p^{hp} f. \tag{26}$$

Here, $L_p^{hp}$ is the hypergraph p-Laplacian. We employ the proposed HpLapR with logistic regression.

Substituting the logistic loss function into Eq. (26), the HpLapR can be rewritten as

$$f^* = \arg\min_{f \in \mathcal{H}_K} \frac{1}{l}\sum_{i=1}^{l} \log\left(1 + e^{-y_i f(x_i)}\right) + \gamma_A \|f\|_K^2 + \frac{\gamma_I}{(l+u)^2} f^T L_p^{hp} f. \tag{27}$$

According to the representer theorem, the solution of (27) with respect to $f$ exists and can be expressed by Eq. (16). Thus, we finally construct the HpLapR as the following optimization problem:

$$f^* = \arg\min_{f \in \mathcal{H}_K} \frac{1}{l}\sum_{i=1}^{l} \log\left(1 + e^{-y_i \sum_{j=1}^{l+u} \alpha_j K(x_i, x_j)}\right) + \gamma_A \alpha^T K \alpha + \frac{\gamma_I}{(l+u)^2} \alpha^T K L_p^{hp} K \alpha. \tag{28}$$

Applying the conjugate gradient algorithm, we can obtain the optimized solution.
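A minimal sketch of this optimization step is given below, using scipy's conjugate gradient minimizer as a stand-in for whichever conjugate gradient routine the authors employed; the gradient expression follows from differentiating Eq. (28) with respect to $\alpha$, and the function name and argument conventions are illustrative.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def hplapr_logistic(K, L_php, y_labeled, l, gamma_A, gamma_I):
    """Solve Eq. (28) for the expansion coefficients alpha (labels in {-1, +1},
    labeled samples assumed to come first in the kernel matrix K)."""
    n = K.shape[0]                      # n = l + u
    Kl = K[:l, :]                       # kernel rows of the labeled samples
    R = K @ L_php @ K                   # manifold regularization matrix

    def objective(alpha):
        margins = y_labeled * (Kl @ alpha)
        loss = np.mean(np.logaddexp(0.0, -margins))
        return loss + gamma_A * alpha @ K @ alpha + gamma_I / n ** 2 * alpha @ R @ alpha

    def grad(alpha):
        margins = y_labeled * (Kl @ alpha)
        sig = expit(-margins)                        # sigma(-y_i f(x_i))
        g_loss = -(Kl.T @ (y_labeled * sig)) / l
        return g_loss + 2 * gamma_A * K @ alpha + 2 * gamma_I / n ** 2 * R @ alpha

    res = minimize(objective, np.zeros(n), jac=grad, method='CG')
    return res.x
```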

To evaluate the effectiveness of the proposed HpLapR, we compare HpLapR with other local structure preserving algorithms, including LapR, HLapR and pLapR. Figure 16 illustrates the framework of HpLapR for the UC-Merced data set.

Figure 16.

The framework of HpLapR for remote sensing image classification.

The UC-Merced data set [60] consists of 2100 land-use images in total, collected from aerial orthoimagery with a pixel resolution of one foot. These images were manually assigned to 21 classes: agricultural, airplane, baseball diamond, beach, buildings, chaparral, dense residential, forest, freeway, golf course, harbor, intersection, medium density residential, mobile home park, overpass, parking lot, river, runway, sparse residential, storage tanks, and tennis courts (see Figure 17).

Figure 17.

Class examples of the UC-Merced data set. The dataset has 21 remote sensing categories in total, which can be simply grouped into six groups according to the distinction of land use. Each column represents one group.

In our experiments, we extract high-level visual features using a deep convolutional neural network (CNN) [61]. We randomly choose 50 images per class as training samples and the rest as testing samples. For hypergraph construction, we regard each sample in the training set as a vertex and generate a hyperedge for each vertex with its $k$ nearest neighbors (so the hyperedge connects $k+1$ samples) [62]. It is worth noting that, in our experiments, the kNN-based hyperedge generation is carried out only within the six groups, not over the overall training samples. For example, for a sample of baseball diamond, the vertices of the corresponding hyperedge are chosen from the first group (baseball diamond, golf course and tennis courts) of Figure 17. The setting of class labels is the same as for pLapR.

We conduct experiments on the data set to obtain proper model parameters. The neighborhood size $k$ of the hypergraph varies in the range $\{5, 6, 7, \ldots, 15\}$ through cross-validation. The settings of the regularization parameters $\gamma_A, \gamma_I$ and of $p$ are the same as in the pLapR experiments.

Figure 18 illustrates the mAP performance of pLapR and HpLapR on the validation set as $p$ varies. The x-axis is the parameter $p$ and the y-axis is the mAP performance measure. We can see that the best mAP performance for pLapR is obtained when $p = 2.3$, while the best performance of HpLapR is achieved when $p = 2.6$.

Figure 18.

Performance of mAP with different p values on validation set.

We compare our proposed HpLapR with the representative LapR, HLapR and pLapR. From Figure 19, we can observe that HpLapR outperforms the other methods, especially when only a small number of samples are labeled. This suggests that our proposed method is superior in preserving the local structure of the data because it integrates hypergraph learning with the graph p-Laplacian. To evaluate the effectiveness of HpLapR for single classes, Figure 20 shows the AP results of the different methods on several land-use classes, including beach, dense residential, freeway and tennis court. From Figure 20, we find that, in most cases, HpLapR performs better than both pLapR and HLapR, while pLapR and HLapR consistently outperform LapR.

Figure 19.

mAP performance of different algorithms.

Figure 20.

AP performance of different methods on several classes.

5.3. Ensemble p-Laplacian regularization (EpLapR)

As a natural nonlinear generalization of the graph Laplacian, the p-Laplacian has been shown to have rich theoretical foundations for better preserving the local structure. However, it is difficult to determine the fitting graph p-Laplacian, that is, the parameter $p$, which is a critical factor for the performance of the graph p-Laplacian. In this section, we develop an ensemble p-Laplacian regularization (EpLapR) to fully approximate the intrinsic manifold of the data distribution. EpLapR incorporates multiple graphs into a regularization term in order to sufficiently exploit the complementarity of different graph p-Laplacians. Specifically, we construct a fused graph by introducing an optimization approach that assigns suitable weights to different p-value graphs. Then, we conduct the semi-supervised learning framework on the fused graph.

Assume a set of candidate graph p-Laplacians $L_p^1, \ldots, L_p^m$. According to the manifold regularization framework, the proposed EpLapR can be written as the following optimization problem:

$$f^* = \arg\min_{f \in \mathcal{H}_K} \frac{1}{l}\sum_{i=1}^{l} V(x_i, y_i, f) + \gamma_A \|f\|_K^2 + \frac{\gamma_I}{n^2} f^T L^* f, \tag{29}$$

where $L^*$ is the optimal fused graph Laplacian with $L^* = \sum_{k=1}^{m} \mu_k L_p^k$, subject to $\sum_{k=1}^{m} \mu_k = 1$ and $\mu_k \ge 0$ for $k = 1, \ldots, m$.

To avoid the parameters $\mu_k$ overfitting to one graph [63], we make a relaxation by changing $\mu_k$ to $\mu_k^{\gamma}$ and obtain the following optimization problem:

$$f^* = \arg\min_{f \in \mathcal{H}_K} \frac{1}{l}\sum_{i=1}^{l} V(x_i, y_i, f) + \gamma_A \|f\|_K^2 + \frac{\gamma_I}{n^2} f^T \left(\sum_{k=1}^{m} \mu_k^{\gamma} L_p^k\right) f, \quad \text{s.t. } \sum_{k=1}^{m} \mu_k = 1,\ \mu_k \ge 0,\ k = 1, \ldots, m. \tag{30}$$

The representer theorem guarantees the existence of a solution with the general form of Eq. (16) under a fixed $\mu$. Therefore, we rewrite the objective function as

$$f^* = \arg\min_{f \in \mathcal{H}_K} \frac{1}{l}\sum_{i=1}^{l} V(x_i, y_i, f) + \gamma_A \alpha^T K \alpha + \frac{\gamma_I}{(l+u)^2} \alpha^T K \left(\sum_{k=1}^{m} \mu_k^{\gamma} L_p^k\right) K \alpha, \quad \text{s.t. } \sum_{k=1}^{m} \mu_k = 1,\ \mu_k \ge 0,\ k = 1, \ldots, m. \tag{31}$$

Here, an alternating optimization procedure is utilized to minimize the objective function.
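The sketch below illustrates one way such an alternating procedure could look for the least squares loss: fix $\mu$ and solve for $\alpha$ with a LapRLS-style closed form, then fix $\alpha$ and update $\mu$ from the Lagrangian of the relaxed constraint. The $\mu$ update formula is our own derivation under these assumptions, not quoted from the chapter, and the function name is illustrative.

```python
import numpy as np

def eplapr_kls(K, Lp_list, y_labeled, l, gamma_A, gamma_I, gamma=2.0, n_iter=10):
    """Alternating optimization sketch for EpLapR with the least squares loss (Eq. (31))."""
    m, n = len(Lp_list), K.shape[0]          # n = l + u
    mu = np.full(m, 1.0 / m)
    J = np.zeros((n, n)); J[:l, :l] = np.eye(l)
    Y = np.zeros(n); Y[:l] = y_labeled
    alpha = np.zeros(n)
    for _ in range(n_iter):
        # Step 1: fix mu, solve for alpha on the fused graph (LapRLS-style closed form).
        L_fused = sum((mu[k] ** gamma) * Lp_list[k] for k in range(m))
        A = J @ K + gamma_A * l * np.eye(n) + (gamma_I * l / n ** 2) * L_fused @ K
        alpha = np.linalg.solve(A, Y)
        # Step 2: fix alpha, update mu; minimizing sum_k mu_k^gamma * s_k subject to
        # sum_k mu_k = 1 gives mu_k proportional to (1 / s_k)^(1 / (gamma - 1)),
        # where s_k = f^T L_p^k f with f = K alpha.
        f = K @ alpha
        s = np.array([f @ Lp_list[k] @ f for k in range(m)])
        mu = (1.0 / np.maximum(s, 1e-12)) ** (1.0 / (gamma - 1.0))
        mu /= mu.sum()
    return alpha, mu
```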

We compare EpLapR with other local structure preserving algorithms, including LapR, HesR and pLapR, on the UC-Merced data set. We apply support vector machines and kernel least squares for remote sensing image classification.

In the experiments, we apply the same parameter settings as for pLapR, and the pLapR experiment is conducted with $p = 2.8$. For EpLapR, we created two graph p-Laplacian sets. For the first set (EpLapR-3G), we choose $p \in \{2.5, 2.7, 2.8\}$, which leads to 3 graphs. For the other set (EpLapR-5G), we use 5 graphs with $p \in \{2.4, 2.5, 2.6, 2.7, 2.8\}$.

We compare our proposed EpLapR with the representative LapR, HesR and pLapR. Figures 21 and 22 show the mAP results of the different algorithms for the KLS methods and the SVM methods, respectively. We can see that, in most cases, EpLapR outperforms LapR, HesR and pLapR, which shows the advantages of EpLapR in preserving local structure.

Figure 21.

mAP performance of different algorithms on KLS method.

Figure 22.

mAP performance of different algorithms on SVM method.

6. Conclusions

In this chapter, we present LapR, HesR and pLapR and several extensions based on the manifold regularization framework. We propose a local structure preserving method that effectively integrates manifold regularization and pairwise constraints. We develop an efficient approximation algorithm for the graph p-Laplacian and propose p-Laplacian regularization to preserve the local geometry. Considering that the hypergraph contains more local grouping information than a simple graph, we propose hypergraph p-Laplacian regularization to preserve the geometry of the probability distribution. In practical applications of the p-Laplacian regularization model, it is difficult to determine the optimal graph p-Laplacian because the parameter $p$ is usually chosen by cross-validation, which lacks the ability to approximate the optimal solution. Therefore, we propose an ensemble p-Laplacian regularization to better approximate the geometry of the data distribution.

7. Expectations

In the general image recognition, images are naturally represented by multi-view features, such as color, shape and texture. Each view of a feature summarizes a specific characteristic of the image, and features for different views are complementary to one another. Therefore, in the future work, we will study the multi-view p-Laplacian regularization to effectively explore the complementary properties of different features from different views. Meanwhile, we will try to combine the p-Laplacian learning with the deep learning to get a more effective p-Laplacian learning algorithm.

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
