A Fusion Scheme of Local Manifold Learning Methods

Spectral analysis-based dimensionality reduction algorithms, especially the local manifold learning methods, have become popular recently because their optimizations do not involve local minima and scale well to large, high-dimensional data sets. Despite their attractive properties, these algorithms are developed based on different geometric intuitions, and only partial information from the true geometric structure of the underlying manifold is learned by each method. In order to discover the underlying manifold structure more faithfully, we introduce in this chapter a novel method to fuse the geometric information learned from different local manifold learning algorithms. First, we employ local tangent coordinates to compute the local objects from different local algorithms. Then, we utilize the truncation function from the theory of differentiable manifolds to connect the local objects with a global functional, and finally develop an alternating optimization-based algorithm to discover the low-dimensional embedding. Experiments on synthetic as well as real data sets demonstrate the effectiveness of our proposed method.


Introduction
Nonlinear dimensionality reduction (NLDR) plays an important role in modern data analysis, since many objects in our world can only be represented electronically as high-dimensional data such as images, videos, speech signals, and text documents. We usually need to analyze and process large amounts of such data; however, processing high-dimensional data directly is very complicated or even infeasible because of the high computational cost in both time and space. Over the past decade, numerous manifold learning methods have been proposed for nonlinear dimensionality reduction. Methodologically, these methods can be divided into two categories: global algorithms and local algorithms. Representative global algorithms include isometric mapping [1], maximum variance unfolding [2], and local coordinates alignment with global preservation [3]. Local methods mainly include Laplacian eigenmaps (LEM) [4], locally linear embedding (LLE) [5], Hessian eigenmaps (HLLE) [6], local tangent space alignment (LTSA) [7], local linear transformation embedding [8], stable local approaches [9], and maximal linear embedding [10].
Different local approaches try to learn different geometric information about the underlying manifold, since they are developed based on the knowledge and experience of experts for their own purposes [11]. Therefore, only partial information from the true underlying manifold is learned by each existing local manifold learning method. Thus, to better discover the underlying manifold structure, it is both informative and essential to provide a common framework for synthesizing the geometric information extracted from different local methods. In this chapter, we propose an interesting method to unify the local manifold learning algorithms (e.g., LEM, LLE, HLLE, and LTSA). Inspired by HLLE, which employs local tangent coordinates to compute the local Hessian, we propose to utilize local tangent coordinates to estimate the local objects defined in different local methods. Then, we employ the truncation function from the theory of differentiable manifolds to connect the local objects with a global functional. Finally, we develop an alternating optimization-based algorithm to discover the global coordinate system of lower dimensionality.

Local tangent coordinates system
A manifold is a topological space that locally resembles Euclidean space near every point. That is, around each point of a $d$-dimensional manifold, there is a neighborhood that is topologically the same as the open unit ball in $\mathbb{R}^d$. The simplest manifold is a linear manifold, usually called a hyperplane. There exists a tangent space at each point of a nonlinear manifold. The tangent space is a linear manifold which locally approximates the manifold. Suppose there are $N$ points $\{x_1, \ldots, x_N\}$ in $\mathbb{R}^D$ residing on a smooth manifold $M \subset \mathbb{R}^D$, which is the image of a coordinate space $Y \subset \mathbb{R}^d$ under a smooth mapping $\psi : Y \to \mathbb{R}^D$, where $d \ll D$. The mapping $\psi$ is assumed to be a locally isometric embedding. The aim of an NLDR algorithm is to acquire the corresponding low-dimensional representation $y_i \in Y$ of each $x_i \in M$ while preserving certain intrinsic structures of the data. Suppose $M$ is smooth such that the tangent space $T_x(M)$ is well defined at every point $x \in M$. We can regard the local tangent space as a $d$-dimensional affine subspace of $\mathbb{R}^D$ which is tangent to $M$ at $x$. Thus, the tangent space has the natural inner product induced by the embedding $M \subset \mathbb{R}^D$. Within some neighborhood of $x$, each point $x' \in M$ has a unique closest point in $T_x(M)$, and therefore, an orthonormal coordinate system on the tangent space induces corresponding local coordinates on $M$.
A manifold can be represented by its coordinates. While current research in differential geometry focuses on characterizing the global properties of manifolds, NLDR algorithms, which try to find coordinate representations of data, only need the local properties of manifolds. In this chapter, we use local coordinates associated with the tangent space to estimate the local objects over the manifold. To acquire the local tangent coordinates, we first perform principal component analysis (PCA) [12] on the points in $N(x_i) = \{x_i, x_{i_1}, \ldots, x_{i_k}\}$, the local patch built from the point $x_i$ and its $k$ nearest neighbors, and take the $d$ leading principal directions as an orthonormal basis of $T_{x_i}(M)$ (the span of this basis can be seen as a $d$-dimensional affine subspace of $\mathbb{R}^D$ which is tangent to $M$ at $x_i$). For high-dimensional data, we employ the trick presented by Turk and Pentland for EigenFaces [13]. Then, we obtain the local tangent coordinates $U_i = \{0, u^i_1, \ldots, u^i_k\}$ of the neighborhood $N(x_i)$ by projecting the local neighborhood onto this tangent subspace:
$$u^i_j = (Q_i)^T (x_{i_j} - x_i), \quad j = 1, \ldots, k, \tag{1}$$
where the columns of $Q_i \in \mathbb{R}^{D \times d}$ are the basis vectors, so that $x_i$ itself is mapped to the origin. An illustration of the local tangent space at $x_i$ and the corresponding tangent coordinate system (i.e., the local tangent coordinate of the point $x_{i_j}$ is $u^i_j$) is shown in Figure 1.
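The computation of local tangent coordinates described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `local_tangent_coordinates` and the brute-force neighbor search are assumptions made for the example, and an economy-size SVD of the small centered patch is used so that no $D \times D$ covariance matrix is formed, in the spirit of the EigenFaces trick.

```python
import numpy as np

def local_tangent_coordinates(X, i, k, d):
    """Tangent coordinates of the k nearest neighbors of X[i].

    X is an (N, D) data matrix. Returns the neighbor indices (k,) and
    a (k, d) array whose rows are the tangent coordinates of the neighbors.
    """
    dists = np.linalg.norm(X - X[i], axis=1)
    order = np.argsort(dists)
    idx = order[order != i][:k]               # k nearest neighbors, excluding x_i
    patch = np.vstack([X[i:i + 1], X[idx]])   # local patch N(x_i)
    centered = patch - patch.mean(axis=0)     # PCA centers the patch at its mean
    # Economy SVD: the rows of Vt are the principal directions of the patch.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    Q = Vt[:d].T                              # (D, d) orthonormal tangent basis
    U = (X[idx] - X[i]) @ Q                   # x_i itself maps to the origin
    return idx, U
```

When the data lie exactly on a $d$-dimensional linear subspace, the projection preserves the distances between $x_i$ and its neighbors, which is a convenient sanity check for the sketch.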

Reformulation of Laplacian eigenmaps
The method LEM was introduced by Belkin and Niyogi [4]. We can summarize the geometrical motivation of LEM as follows. Assume that we are searching for a smooth one-dimensional embedding $f : M \to \mathbb{R}$ from the manifold to the real line such that data points that are near each other on the manifold are also mapped close together on the line. Consider two adjacent points $x, z \in M$, which are mapped to $f(x)$ and $f(z)$, respectively. We can obtain that
$$|f(z) - f(x)| \le \|\nabla_M f(x)\| \, \|z - x\| + o(\|z - x\|), \tag{2}$$
where $\nabla_M f$ is the gradient vector field along the manifold. Thus, to first order, $\|\nabla_M f\|$ provides us with an estimate of how far apart $f$ maps nearby points. When we look for a map that best preserves locality on average, a natural choice is to find the $f$ that minimizes [4]:
$$\Phi_{lap}(f) = \int_M \|\nabla_M f\|^2, \tag{3}$$
where the integral is taken with respect to the standard measure over the manifold. Thus, the function $f$ that minimizes $\Phi_{lap}(f)$ has to be an eigenfunction of the Laplace-Beltrami operator $\Delta_M$, which is a key geometric object associated with a Riemannian manifold [14].
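Suppose that the tangent coordinate of a point $x' \in N(x)$ is given by $u$. Then the rule $g(u) = f(x') = f \circ \psi(u)$ defines a function $g : U \to \mathbb{R}$, where $U$ is a neighborhood of $0$ in $\mathbb{R}^d$, and the gradient of $f$ in tangent coordinates is
$$\nabla_{tan} f(x) = \left( \frac{\partial g(u)}{\partial u^1}, \ldots, \frac{\partial g(u)}{\partial u^d} \right)^T \Bigg|_{u = 0}, \tag{4}$$
where $u = (u^1, \ldots, u^d) \in \mathbb{R}^d$, and we keep "tan" in the notation to make clear that it depends on the coordinate system in $T_x(M)$. Although the tangent gradient vector differs between local coordinate systems, the norm $\|\nabla_{tan} f(x)\|$ is uniquely defined, so that Eq. (3) can be approximated by estimating the following functional:
$$\Phi_{lap}(f) = \int_M \|\nabla_{tan} f(x)\|^2 \, dx, \tag{5}$$
where $dx$ stands for the probability measure on $M$.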
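In order to compute the local object $\|\nabla_{tan} f(x_i)\|^2$, we first use the first-order Taylor series expansion to approximate the smooth function $f : M \to \mathbb{R}$ at the neighbors, $\{f(x_{i_j})\}_{j=1}^{k}$. Together with Eq. (4), we have:
$$f(x_{i_j}) = f(x_i) + (\nabla_{tan} f(x_i))^T u^i_j + O(\|u^i_j\|^2) = g(0) + (\nabla g(0))^T u^i_j + O(\|u^i_j\|^2). \tag{6}$$
Over $U_i$, we develop the operator $\alpha_i = [g(0); \nabla g(0)] = [g(0); \nabla_{tan} f(x_i)] \in \mathbb{R}^{1+d}$ that approximates the function $g(u^i_j)$ by its projection on the basis $\{1, u^1, \ldots, u^d\}$. Evaluating this basis at the $k$ neighbors gives the matrix
$$U_i = \begin{bmatrix} 1 & (u^i_1)^T \\ \vdots & \vdots \\ 1 & (u^i_k)^T \end{bmatrix} \in \mathbb{R}^{k \times (1+d)},$$
where, with a slight abuse of notation, $U_i$ now denotes the basis matrix built from the tangent coordinates. The least-squares estimation of the operator $\alpha_i$ can be computed by:
$$\arg\min_{\alpha_i} \sum_{j=1}^{k} \left( f(x_{i_j}) - (U_i \alpha_i)_j \right)^2. \tag{7}$$
It is easy to show that the least-squares solution of the above objective function is
$$\alpha_i = (U_i)^{\dagger} f_i, \tag{8}$$
where $f_i = [f(x_{i_1}), \ldots, f(x_{i_k})]^T \in \mathbb{R}^k$ and $(U_i)^{\dagger}$ is the pseudo-inverse of $U_i$. Furthermore, the local object $\|\nabla_{tan} f(x_i)\|^2$ can be computed as:
$$\|\nabla_{tan} f(x_i)\|^2 = (f_i)^T (G_i)^T G_i f_i, \tag{9}$$
where $G_i \in \mathbb{R}^{d \times k}$ consists of the last $d$ rows of $(U_i)^{\dagger}$. An unresolved problem in our reformulation is how to connect the local object $\|\nabla_{tan} f(x)\|^2$ with the global functional $\Phi_{lap}(f)$ in (5) and its discrete approximation. In Section 4 (Fusion of local manifold learning methods), we will discuss this issue in detail.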
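The least-squares estimate of the tangent gradient can be illustrated as follows. The sketch assumes that the local operator is read off from the pseudo-inverse of the design matrix `[1, u]`, keeping the rows that correspond to the gradient components; the function names are hypothetical and chosen only for this example.

```python
import numpy as np

def gradient_operator(U):
    """U: (k, d) tangent coordinates of the neighbors.

    Returns G of shape (d, k) such that G @ f_i is the least-squares
    estimate of the tangent gradient from the k function values f_i.
    """
    k, d = U.shape
    design = np.hstack([np.ones((k, 1)), U])   # columns: 1, u^1, ..., u^d
    pinv = np.linalg.pinv(design)              # shape (1 + d, k)
    return pinv[1:]                            # drop the row estimating g(0)

def local_gradient_matrix(U):
    """Local matrix G^T G whose quadratic form gives the squared gradient norm."""
    G = gradient_operator(U)
    return G.T @ G
```

For a function that is exactly affine in the tangent coordinates, `G @ f_i` recovers the slope vector, so the quadratic form `f_i @ local_gradient_matrix(U) @ f_i` equals its squared norm. This requires at least `d + 1` neighbors in general position.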

Reformulation of locally linear embedding
The LLE method was introduced by Roweis and Saul [5]. It is based on simple geometric intuitions, which can be described as follows. Globally, the data points are sampled from a nonlinear manifold, while locally each data point and its neighbors reside on or close to a linear patch of the manifold. Thus, under suitable conditions, it is possible to describe the local geometric properties of the neighborhood of each data point in the high-dimensional space by the linear coefficients that reconstruct the data point from its neighbors. LLE computes the low-dimensional embedding that is optimized to preserve these local configurations of the data. In each locally linear patch, the reconstruction error in the original LLE can be written as:
$$\varepsilon_i = \Big\| x_i - \sum_{j=1}^{k} w_{ij} x_{i_j} \Big\|^2, \tag{10}$$
where $\{w_{ij}\}_{j=1}^{k}$ are the reconstruction weights, which encode the geometric information of the high-dimensional inputs and are constrained to satisfy $\sum_j w_{ij} = 1$.
Since the geometric structure of the local patch can be approximated by its projection on the tangent space $T_{x_i}(M)$, we utilize the local tangent coordinates to estimate the local objects over the manifold in our reformulation framework. We can write the reconstruction error of each local tangent coordinate as:
$$\varepsilon_i = \Big\| u_i - \sum_{j} w_{ij} u^i_j \Big\|^2 = \Big\| \sum_{j} w_{ij} (u_i - u^i_j) \Big\|^2 = \sum_{j,l} w_{ij} w_{il} (G_i)_{jl}, \tag{11}$$
where $u_i$ is the tangent coordinate of $x_i$ (the origin of the local coordinate system), we have employed the fact that the weights sum to one, and $G_i$ is the local Gram matrix,
$$(G_i)_{jl} = \langle u_i - u^i_j, \; u_i - u^i_l \rangle. \tag{12}$$
The optimal weights can be obtained analytically by minimizing the above reconstruction error. We solve the linear system of equations
$$\sum_{l} (G_i)_{jl} \, w_{il} = 1, \quad j = 1, \ldots, k, \tag{13}$$
and then normalize the solution so that $\sum_j w_{ij} = 1$. Consider the problem of mapping the data points from the manifold to a line such that each data point on the line can be represented as a linear combination of its neighbors. Let $f(x_{i_1}), \ldots, f(x_{i_k})$ denote the mappings of $u^i_1, \ldots, u^i_k$, respectively. Motivated by the spirit of LLE, the neighborhood of $f(x_i)$ should share the same geometric information as the neighborhood of $u_i$, so we can define the following local object:
$$|\sigma_f(x_i)|^2 = \Big| f(x_i) - \sum_{j} w_{ij} f(x_{i_j}) \Big|^2 = (f_i)^T (W_i)^T W_i f_i, \tag{14}$$
where the row vector $W_i$ holds the coefficient $1$ for $f(x_i)$ and $-w_{ij}$ for each neighbor $f(x_{i_j})$, and $f_i$ collects the corresponding function values on the patch. The optimal mapping $f$ can be obtained by minimizing the following global functional:
$$\int_M |\sigma_f(x)|^2 \, dx, \tag{15}$$
where $dx$ stands for the probability measure on the manifold.
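The weight computation in tangent coordinates can be sketched as below. Two details are assumptions of this example rather than statements from the chapter: the small ridge term `reg`, which is commonly added in LLE implementations because the Gram matrix is singular when $k > d$, and the convention that the local row vector acts on the stacked values `[f(x_i), f(x_i1), ..., f(x_ik)]`.

```python
import numpy as np

def lle_weights(U, reg=1e-3):
    """U: (k, d) tangent coordinates of the neighbors, with x_i at the origin.

    Returns reconstruction weights of shape (k,) that sum to one.
    """
    k = U.shape[0]
    gram = U @ U.T                                   # <0 - u_j, 0 - u_l> = <u_j, u_l>
    gram = gram + reg * np.trace(gram) / k * np.eye(k)  # ridge term (assumption)
    w = np.linalg.solve(gram, np.ones(k))            # solve Gram @ w = 1
    return w / w.sum()                               # enforce sum-to-one

def lle_local_matrix(w):
    """Rank-one local matrix W^T W with W = [1, -w_1, ..., -w_k]."""
    W = np.concatenate([[1.0], -w])[None, :]
    return W.T @ W
```

Because the weights sum to one, any constant function has a zero local object, and the reconstruction of the origin from the neighbors is never worse than with uniform weights.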

Reformulation of Hessian eigenmaps
The HLLE method was introduced by Donoho and Grimes [6]. In contrast to LLE, which obtains a linear embedding by minimizing the $l_2$ error in Eq. (10), HLLE achieves a linear embedding by minimizing the Hessian functional on the manifold where the data points reside. HLLE supposes that, if the manifold is locally isometric to an open connected subset of $\mathbb{R}^d$, the low-dimensional coordinates can be obtained from the $(d+1)$-dimensional null space of the functional $\mathcal{H}(f)$, which represents the average curviness of $f$ over the manifold. We can measure the functional $\mathcal{H}(f)$ by averaging the Frobenius norm of the Hessians on the manifold $M$ as [6]:
$$\mathcal{H}(f) = \int_M \| H^{tan}_f(x) \|_F^2 \, dx, \tag{16}$$
where $H^{tan}_f$ stands for the Hessian of $f$ in tangent coordinates. In order to estimate the local Hessian matrix, we first perform a second-order Taylor expansion of the smooth function $f$ at a fixed $x_i$:
$$f(x_{i_j}) = f(x_i) + (\nabla f)^T u^i_j + \tfrac{1}{2} (u^i_j)^T H^i_f \, u^i_j + O(\|u^i_j\|^3) = g(0) + (\nabla g)^T u^i_j + \tfrac{1}{2} (u^i_j)^T H^i_f \, u^i_j + O(\|u^i_j\|^3). \tag{17}$$
Here, $\nabla f = \nabla g$ is the gradient defined in (4), and $H^i_f$ is the local Hessian matrix defined as:
$$(H^i_f)_{p,q} = \frac{\partial^2 g(u)}{\partial u^p \, \partial u^q} \Bigg|_{u = 0}, \tag{18}$$
where $g : U \to \mathbb{R}$ uses the local tangent coordinates and satisfies the rule $g(u) = f(x) = f \circ \psi(u)$.
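In the second identity of Eq. (17), we have exploited the fact that $g(0) = f(x_i)$, since the tangent coordinate of $x_i$ is the origin. Over $U_i$, we develop the operator $\beta_i$ that approximates the function $g(u^i_j)$ by its projection on the basis $\{1, u^1, \ldots, u^d, (u^1)^2, \ldots, (u^d)^2, u^1 u^2, \ldots, u^{d-1} u^d\}$. Evaluating this basis at the $k$ neighbors gives the matrix $U_i \in \mathbb{R}^{k \times (1 + d + d(d+1)/2)}$ with rows
$$\left[ \, 1, \; u^1, \ldots, u^d, \; (u^1)^2, \ldots, (u^d)^2, \; u^1 u^2, \ldots, u^{d-1} u^d \, \right] \Big|_{u = u^i_j}, \quad j = 1, \ldots, k, \tag{19}$$
and we have $f(x_{i_j}) = g(u^i_j) \approx (U_i \beta_i)_j$. Let $\beta_i = [g(0); \nabla g; h_i] \in \mathbb{R}^{1 + d + d(d+1)/2}$; then $h_i \in \mathbb{R}^{d(d+1)/2}$ is the vector form of the local Hessian matrix $H^i_f$ over the neighborhood $N(x_i)$. The least-squares estimation of the operator $\beta_i$ can be obtained by:
$$\arg\min_{\beta_i} \sum_{j=1}^{k} \left( f(x_{i_j}) - (U_i \beta_i)_j \right)^2. \tag{20}$$
The least-squares solution is $\beta_i = (U_i)^{\dagger} f_i$, where $f_i = [f(x_{i_1}), \ldots, f(x_{i_k})]^T \in \mathbb{R}^k$ and $(U_i)^{\dagger}$ signifies the pseudo-inverse of $U_i$. Notice that $h_i$ is the vector form of the local Hessian matrix $H^i_f$, while the last $d(d+1)/2$ components of $\beta_i$ correspond to $h_i$. Hence, we can construct the local Hessian operator $H_i \in \mathbb{R}^{(d(d+1)/2) \times k}$ from the last $d(d+1)/2$ rows of $(U_i)^{\dagger}$, and obtain $h_i = H_i f_i$. Thus, up to fixed scalings of the second-order basis columns, the local object $\|H^{tan}_f(x_i)\|_F^2$ can be estimated with:
$$\|H^{tan}_f(x_i)\|_F^2 = (h_i)^T h_i = (f_i)^T (H_i)^T H_i f_i. \tag{21}$$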
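A sketch of the local Hessian operator follows. It makes one explicit assumption beyond the text: the second-order columns are scaled as $(u^p)^2/2$ and $u^p u^q/\sqrt{2}$, so that the squared norm of the fitted coefficient vector equals the squared Frobenius norm of the Hessian exactly for a quadratic function. The function name is hypothetical.

```python
import numpy as np
from itertools import combinations

def hessian_operator(U):
    """U: (k, d) tangent coordinates of the neighbors.

    Returns H of shape (d(d+1)/2, k); H @ f_i is the vector form of the
    local Hessian estimated by least squares.
    """
    k, d = U.shape
    squares = 0.5 * U ** 2                        # columns (u^p)^2 / 2
    pairs = list(combinations(range(d), 2))
    if pairs:
        cross = np.stack([U[:, p] * U[:, q] for p, q in pairs], axis=1)
        cross = cross / np.sqrt(2.0)              # columns u^p u^q / sqrt(2)
    else:
        cross = np.empty((k, 0))                  # d = 1 has no cross terms
    design = np.hstack([np.ones((k, 1)), U, squares, cross])
    pinv = np.linalg.pinv(design)
    return pinv[1 + d:]                           # last d(d+1)/2 rows
```

The fit needs at least $1 + d + d(d+1)/2$ neighbors in general position; with fewer neighbors the design matrix is rank deficient and the estimate is no longer unique.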

Reformulation of local tangent space alignment
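The method LTSA was introduced by Zhang and Zha [7]. LTSA is based on geometric intuitions similar to those of LLE. If the data set is sampled from a smooth manifold, the neighbors of each data point remain nearby and similarly colocated in the low-dimensional space. LLE constructs low-dimensional data so that the local linear relations of the original data are preserved, while LTSA constructs a locally linear patch to approximate the tangent space at the point. The coordinates provided by the tangent space give a low-dimensional representation of the patch. From Eq. (6), we can obtain:
$$f(x_{i_j}) - f(x_i) \approx (\nabla_{tan} f(x_i))^T u^i_j. \tag{22}$$
From the above equation, we can see that the global coordinate $f(x_{i_j})$ in the low-dimensional feature space is related to the local coordinate $u^i_j$, which represents the local geometry. The LTSA algorithm requires that the global coordinates $f(x_{i_j})$ respect the local geometry determined by the $u^i_j$:
$$f(x_{i_j}) = \bar{f}(x_i) + L_i u^i_j + \epsilon^i_j, \tag{23}$$
where $\bar{f}(x_i)$ is the mean of $f(x_{i_j})$, $j = 1, \ldots, k$, $L_i$ is a local affine transformation, and $\epsilon^i_j$ is the local reconstruction error. Inspired by LTSA, the affine transformation $L_i$ should align the local coordinates with the global coordinates, and we can define the following local object:
$$|\kappa_f(x_i)|^2 = \sum_{j=1}^{k} \left| f(x_{i_j}) - \bar{f}(x_i) - L_i u^i_j \right|^2, \tag{24}$$
which can be written in matrix form as
$$|\kappa_f(x_i)|^2 = \left\| (f_i)^T \left( I - \tfrac{1}{k} e e^T \right) - L_i U_i \right\|^2, \tag{25}$$
where $f_i = [f(x_{i_1}), \ldots, f(x_{i_k})]^T$, $U_i = [u^i_1, \ldots, u^i_k] \in \mathbb{R}^{d \times k}$, and $e$ is a $k$-dimensional column vector of all ones. Naturally, we should seek the optimal mapping $f$ and the local affine transformations $L_i$ that minimize the following global functional:
$$\int_M |\kappa_f(x)|^2 \, dx. \tag{26}$$
Obviously, the optimal affine transformation $L_i$ that minimizes the local reconstruction error for a fixed $f_i$ is given by:
$$L_i = (f_i)^T \left( I - \tfrac{1}{k} e e^T \right) (U_i)^{\dagger}, \tag{27}$$
and therefore, the local object $\kappa_f(x_i)$ can be estimated as:
$$|\kappa_f(x_i)|^2 = \left\| (f_i)^T \left( I - \tfrac{1}{k} e e^T \right) \left( I - (U_i)^{\dagger} U_i \right) \right\|^2 = (f_i)^T (W_i)^T W_i f_i, \tag{28}$$
where $W_i = \left( I - (U_i)^{\dagger} U_i \right) \left( I - \tfrac{1}{k} e e^T \right)$.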
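The alignment residual can be sketched as follows, under the assumption that the local matrix is the product of a centering matrix and the projector onto the orthogonal complement of the row space of the $d \times k$ coordinate matrix. The function name is hypothetical. If the coordinates passed in are additionally centered at the patch mean, as in the original LTSA of Zhang and Zha, every function that is affine in those coordinates has a zero local object.

```python
import numpy as np

def ltsa_local_matrix(U):
    """U: (d, k) matrix whose columns are the local coordinates of the neighbors.

    Returns the (k, k) matrix W^T W, where W = (I - pinv(U) U)(I - e e^T / k).
    """
    d, k = U.shape
    C = np.eye(k) - np.ones((k, k)) / k        # removes the mean of f_i
    R = np.eye(k) - np.linalg.pinv(U) @ U      # removes the best linear fit L_i U
    W = R @ C
    return W.T @ W
```

A short usage example: with `Uc = U - U.mean(axis=1, keepdims=True)`, the quadratic form `f @ ltsa_local_matrix(Uc) @ f` vanishes for `f = a + Uc.T @ b` and is positive for a generic `f`.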

Fusion of local manifold learning methods
So far we have discussed four basic local objects: $\|\nabla_{tan} f(x)\|^2$, $|\sigma_f(x)|^2$, $\|H^{tan}_f(x)\|_F^2$, and $|\kappa_f(x)|^2$. They depict the geometric information of the manifold from different perspectives. We aim to gather this geometric information to better reflect the geometric structure of the underlying manifold. Notice that we can estimate these local objects under the local tangent coordinate system according to Eqs. (9), (14), (21), and (28), respectively. Given the common structure of these equations, it is not hard to see that we can fuse these local objects together under our proposed framework. Assume that there are $M$ different local manifold learning algorithms; we can define the fused local object as follows:
$$LO(x) = \sum_{j=1}^{M} c_j \, LO_j(x), \tag{29}$$
where $\{c_j\}_{j=1}^{M}$ are the nonnegative balance parameters and $\{LO_j(x)\}_{j=1}^{M}$ are the local objects from the different algorithms, such as $\|\nabla_{tan} f(x)\|^2$, $|\sigma_f(x)|^2$, $\|H^{tan}_f(x)\|_F^2$, and $|\kappa_f(x)|^2$. It is worth noting that other local manifold learning algorithms can also be reformulated and incorporated into our unified framework.
We employ the truncation function from the theory of differentiable manifolds to connect the local objects with their corresponding global functional, such that we can obtain a consistent alignment of the local objects to discover a single global coordinate system of lower dimensionality. The truncation function is a crucial tool in differential geometry for building relationships between global and local properties of the manifold. Assume that $U$ and $V$ are two nonempty subsets of a smooth manifold $M$, where $\bar{V}$ is compact and $\bar{V} \subset U$ ($\bar{V}$ is the closure of $V$). Accordingly, the truncation function [15] can be defined as a smooth function $s : M \to \mathbb{R}$ such that:
$$0 \le s \le 1, \qquad s|_{V} = 1, \qquad s|_{M \setminus U} = 0. \tag{30}$$
The truncation function $s$ can be discretely approximated by the 0-1 selection matrix $S_i \in \mathbb{R}^{N \times k}$. An entry of $S_i$ is defined as:
$$(S_i)_{pq} = \begin{cases} 1, & p = i_q, \\ 0, & \text{otherwise}, \end{cases} \tag{31}$$
where $N_i = \{i_1, \ldots, i_k\}$ denotes the set of indices of the $k$ nearest neighbors of the data point $x_i$. Let $f = [f(x_1), \ldots, f(x_N)]^T \in \mathbb{R}^N$ be a function defined on the whole data set sampled from the global manifold. Thus, the local mapping $f_i = [f(x_{i_1}), \ldots, f(x_{i_k})]^T \in \mathbb{R}^k$ can be expressed as $f_i = (S_i)^T f$. With the help of the selection matrix, we can discretely approximate the global functional $G(f)$ as follows:
$$G(f) = \int_M \sum_{j=1}^{M} c_j \, LO_j(x) \, dx \approx \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} c_j \, (f_i)^T L^i_j f_i = \sum_{j=1}^{M} c_j \, f^T P_j f, \tag{32}$$
where $\{L^i_j\}_{j=1}^{M}$ are the local matrices such as $(G_i)^T G_i$, $(W_i)^T W_i$, $(H_i)^T H_i$, and $(W_i)^T W_i$, which are defined in Eqs. (9), (14), (21), and (28), respectively, and $P_j = \frac{1}{N} \sum_{i=1}^{N} S_i L^i_j (S_i)^T$ is the alignment matrix of the $j$-th local manifold learning method. The global embedding coordinates $Y = [y_1, y_2, \ldots, y_N] \in \mathbb{R}^{d \times N}$ can be obtained by minimizing the functional $G(f)$. Let $y = f^T = [f(x_1), \ldots, f(x_N)]$ be a row vector of $Y$.
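The role of the selection matrix and the assembly of an alignment matrix can be illustrated with the sketch below. The function names are hypothetical, and the local matrices are assumed to be $k \times k$ arrays indexed in the same order as each neighbor list. In practice the product $S_i L S_i^T$ is never formed explicitly; adding the local matrix into the rows and columns selected by the neighbor indices gives the same result.

```python
import numpy as np

def selection_matrix(neighbors, N):
    """0-1 matrix S of shape (N, k) with S[neighbors[q], q] = 1."""
    k = len(neighbors)
    S = np.zeros((N, k))
    S[neighbors, np.arange(k)] = 1.0
    return S

def alignment_matrix(neighbor_lists, local_matrices):
    """P = (1/N) * sum_i S_i L_i S_i^T, accumulated by index scatter."""
    N = len(neighbor_lists)
    P = np.zeros((N, N))
    for idx, L in zip(neighbor_lists, local_matrices):
        P[np.ix_(idx, idx)] += L        # same effect as S_i @ L @ S_i.T
    return P / N
```

With these definitions, `selection_matrix(idx, N).T @ f` simply extracts the entries `f[idx]`, which is the discrete counterpart of restricting $f$ to one neighborhood.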
It is not hard to show that the global embedding coordinates and the nonnegative weights $c = [c_1, \ldots, c_M]$ can be obtained by minimizing the following objective function:
$$\arg\min_{Y, c} \sum_{j=1}^{M} c_j^r \, \mathrm{Tr}(Y P_j Y^T) \quad \text{s.t.} \quad Y Y^T = I, \quad \sum_{j=1}^{M} c_j = 1, \quad c_j \ge 0, \tag{33}$$
where the power parameter $r > 1$ is introduced to avoid the degenerate solution in which $c_j = 1$ for the method with the minimum $\mathrm{Tr}(Y P_j Y^T)$ over the different local methods and $c_l = 0$ $(l \ne j)$ for all others, since our aim is to utilize the complementary geometric information from different manifold learning methods.
We propose to solve the objective function [Eq. (33)] by employing the alternating optimization [16] method, which iteratively updates $Y$ and $c$ in an alternating fashion. First, we fix $c$ to update $Y$. The optimization problem in Eq. (33) is then equivalent to:
$$\arg\min_{Y} \mathrm{Tr}(Y P Y^T) \quad \text{s.t.} \quad Y Y^T = I, \tag{34}$$
where $P = \sum_{j=1}^{M} c_j^r P_j$. When $c$ is fixed, we can solve the optimization problem [Eq. (34)] and obtain the global optimal solution $Y$ as the eigenvectors of the matrix $P$ corresponding to its second through $(d+1)$-th smallest eigenvalues. Second, we fix $Y$ to update $c$. While $Y$ is fixed, we can minimize the objective function [Eq. (33)] analytically by using a Lagrange multiplier to enforce the constraint $\sum_{j=1}^{M} c_j = 1$. The global optimal $c$ is then obtained as:
$$c_j = \frac{\left( 1 / \mathrm{Tr}(Y P_j Y^T) \right)^{1/(r-1)}}{\sum_{l=1}^{M} \left( 1 / \mathrm{Tr}(Y P_l Y^T) \right)^{1/(r-1)}}. \tag{35}$$
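The alternating scheme can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' code: the function names, the uniform initialization of `c`, the fixed iteration count, and the small constant `eps` guarding against division by zero are all choices made for the example. The weight update uses the closed form that results from setting the derivative of the Lagrangian of $\sum_j c_j^r t_j$ with the constraint $\sum_j c_j = 1$ to zero, where $t_j = \mathrm{Tr}(Y P_j Y^T)$.

```python
import numpy as np

def update_embedding(P_list, c, r, d):
    """Fix c: eigenvectors of P = sum_j c_j^r P_j for eigenvalues 2..d+1."""
    P = sum((cj ** r) * Pj for cj, Pj in zip(c, P_list))
    P = 0.5 * (P + P.T)                       # symmetrize against round-off
    _, vecs = np.linalg.eigh(P)               # eigenvalues in ascending order
    return vecs[:, 1:d + 1].T                 # (d, N), rows are orthonormal

def update_weights(P_list, Y, r, eps=1e-12):
    """Fix Y: c_j proportional to (1 / Tr(Y P_j Y^T))^(1 / (r - 1))."""
    t = np.array([np.trace(Y @ Pj @ Y.T) for Pj in P_list])
    w = (1.0 / np.maximum(t, eps)) ** (1.0 / (r - 1.0))
    return w / w.sum()

def flm(P_list, d, r=2.0, n_iter=20):
    """Alternate the two updates, starting from uniform weights."""
    c = np.full(len(P_list), 1.0 / len(P_list))
    for _ in range(n_iter):
        Y = update_embedding(P_list, c, r, d)
        c = update_weights(P_list, Y, r)
    return Y, c
```

Because the objective is convex in `c` for `r > 1` and positive traces, the closed-form update is the global minimizer for the current `Y`. A convergence test on the change of the objective could replace the fixed iteration count.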

Experimental results
In this section, we experiment on both synthetic and real-world data sets to evaluate the performance of our method, named fusion of local manifolds (FLM). We run LEM, LLE, HLLE, LTSA, and FLM on these data sets to obtain both visual and quantitative evaluations. We use the global smoothness and co-directional consistence (GSCD) criterion [17] to compare the embedding quality of the algorithms quantitatively: the smaller the GSCD value, the higher the global smoothness and the better the co-directional consistence. Our FLM method has two adjustable parameters, namely the tuning parameter $r$ and the number of nearest neighbors $k$. FLM works well when the values of $r$ and $k$ are neither too small nor too large. For $r$, the reason is that only one local method is effectively chosen when $r$ is too small, while the relative weights of the different methods tend to be close to each other when $r$ is too large. As a general recommendation, we suggest working with $r \in [2, 6]$ and $k \in [0.7 \lceil \log(N) \rceil, 2 \lceil \log(N) \rceil]$.
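The influence of $r$ on the fused weights, and the recommended range for $k$, can be illustrated numerically. The trace values below are hypothetical and chosen only for the demonstration; the weight formula is the closed-form minimizer of $\sum_j c_j^r t_j$ under $\sum_j c_j = 1$. The natural logarithm is assumed for $\log(N)$, which is consistent with the values $k = 10$ and $k = 12$ used in the experiments.

```python
import math
import numpy as np

def fusion_weights(traces, r):
    """Weights c_j proportional to (1 / t_j)^(1 / (r - 1)), normalized to sum to one."""
    t = np.asarray(traces, dtype=float)
    w = (1.0 / t) ** (1.0 / (r - 1.0))
    return w / w.sum()

def neighbor_range(N):
    """Recommended interval for k, assuming the natural logarithm."""
    base = math.ceil(math.log(N))
    return 0.7 * base, 2.0 * base

traces = [1.0, 2.0, 4.0, 8.0]          # hypothetical per-method trace values
for r in (1.1, 2.0, 6.0, 50.0):
    print(r, np.round(fusion_weights(traces, r), 3))
print(neighbor_range(1000))
```

With `r` close to 1, almost all of the weight goes to the method with the smallest trace, whereas a very large `r` drives the weights toward the uniform value `1/M`, which matches the behavior described above.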

Synthetic data sets
We first apply our FLM to synthetic data sets that have been commonly used by other researchers: S-Curve, Swiss Hole, Punctured Sphere, and Toroidal Helix. The characteristics of these data sets can be summarized as general, non-convex, nonuniform, and noisy, respectively. Each data set contains a total of 1000 sample points, and the number of nearest neighbors is fixed to $k = 10$ for all the algorithms. For the S-Curve and Swiss Hole data sets, we empirically set $r = 2$, and for the Punctured Sphere and Toroidal Helix data sets, we set $r = 3$. Figures 2-5 show the embedding results of the above algorithms on the four synthetic data sets. The name of each manifold learning algorithm and the corresponding GSCD result are shown in the title of each subplot. We can evaluate the performance of these methods by comparing the coloring of the data points, the smoothness, and the shape of the projection coordinates with those of the original manifolds. Figures 2-5 reveal the following interesting observations.
1. On some particular data sets, the traditional local manifold learning methods perform well. For example, LEM works well on the Toroidal Helix; LLE works well on the Punctured Sphere; HLLE works well on the S-Curve and Swiss Hole; and LTSA performs well on the S-Curve, Swiss Hole, and Punctured Sphere.

2. In general, our FLM performs the best on all four data sets.
This outcome arises because only partial geometric information of the underlying manifold is learned by each traditional local manifold learning method, while the complementary geometric information learned from different manifold learning algorithms is respected by our FLM method.

Real-world data set
We next conduct experiments on the isometric feature mapping face (ISOFACE) data set [1], which contains 698 images of a 3-D human head. The ISOFACE data set was collected under different poses and lighting directions. The resolution of each image is 64 × 64. The intrinsic degrees of freedom are the horizontal rotation, the vertical rotation, and the lighting direction. The 2-D embedding results of the different algorithms and the corresponding GSCD results are shown in Figure 6. In the embedding, we randomly mark about 8% of the points with red circles and attach their corresponding training images. In the experiment, we fix the number of nearest neighbors to $k = 12$ for all the algorithms. We empirically set $r = 4$ in FLM. Figure 6 reveals the following interesting observations.
1. As we can observe from Figure 6b and c, the embedding results of LEM and LLE show that the orientations of the faces change smoothly from left to right along the horizontal direction, and the orientations of the faces change from down to up along the vertical direction. However, as we can see at the right-hand side of Figure 6b and c, the embedding results of both LEM and LLE are severely compressed, and the changes along the vertical direction are hard to discern.

2. As we can observe from Figure 6d and e, the horizontal rotation and the variations in the brightness of the faces are well revealed by the embedding results of HLLE and LTSA.

3. As we can observe from Figure 6f, the orientations of the faces change smoothly from left to right along the horizontal direction, while the orientations of the faces change from down to up, and the lighting of the faces varies from bright to dark simultaneously, along the vertical direction. These results illustrate that our FLM method successfully discovers the underlying manifold structure of the data set.
Our FLM performs the best on the ISOFACE data set, since our method makes full use of the complementary geometric information learned from different manifold learning methods. The corresponding GSCD results further verify the above visualization results in a quantitative way.

Conclusions
In this chapter, we introduce an interesting method, named FLM, which provides a systematic framework to estimate the local objects and align them to reveal a single global low-dimensional coordinate space. Within the framework, we can fuse the geometric information learned from different local methods easily and effectively to better discover the underlying manifold structure. Experimental results on both synthetic and real-world data sets show that the proposed method leads to satisfactory results.