3D RMS errors (mm), Dice (%) and Cobb angles (°)

## Abstract

Manifold learning theory has attracted growing interest for modeling large medical imaging datasets, since manifolds capture the essential structure of data in ways that linear methods, which fundamentally describe flat structures, cannot. This problem is particularly relevant for medical imaging data, where linear techniques are frequently unsuitable for capturing variations in anatomical structures. In many cases, the data (CT, MRI, ultrasound) have enough structure that a lower-dimensional object, such as a manifold, can describe their degrees of freedom. Still, complex multivariate distributions tend to exhibit highly variable structural topologies that a single manifold learning algorithm cannot capture. This chapter presents recent techniques developed in manifold theory for medical image analysis: statistical organ shape modeling, image segmentation and registration based on navigating manifolds, classification, and disease prediction models based on discriminant manifolds. We present the theoretical basis of these works, with illustrative results on applications to various organs and pathologies, including neurodegenerative diseases and spinal deformities.

### Keywords

- manifold learning
- medical imaging
- discriminant manifolds
- piecewise geodesic regression
- spine deformities
- neurodegenerative diseases
- shape modeling

## 1. Introduction

Learning on large medical imaging datasets is an emerging discipline driven by the availability of vast amounts of raw data in many of today’s biomedical studies. However, challenges such as unbalanced data distributions, complex multivariate data and the highly variable structural topologies of real-world samples make it much more difficult to learn the associated representation efficiently. An important goal of scientific data analysis in medicine, particularly in neuroscience and oncology, is to understand the behavior of biological processes or physiological and morphological alterations. This creates the need to synthesize large amounts of multivariate data robustly and raises the fundamental question of data reduction: how to discover meaningful representations from unstructured, high-dimensional medical images.

Several approaches have attempted to understand how dimensionality reduction and regression establish relationships in subspaces, and ultimately to determine statistics on manifolds that optimally describe the relationships between samples [1]. However, most of these approaches assume that shapes and images can be represented by smooth manifolds, an assumption that is often inadequate for medical imaging data, which are frequently perturbed by nuisance articulations, clutter or varying contrast.

High-dimensional classification methods have shown promise in measuring subtle and spatially complex imaging patterns that have diagnostic value [2, 3]. Defining statistics on a manifold is not straightforward, since simple statistics cannot be directly applied to general manifolds [4]. While Euclidean estimators have been used for vector spaces, none have been adapted to multimodal data lying in different spaces. Still, there has been interest in characterizing data in a Riemannian space [5, 6]. Unfortunately, manifold-valued metrics based on centrality theory or the geometric median [7] often lack robustness to outliers.

A related topic is the study of dimensionally reduced growth trajectories of anatomical sites, investigated for example in the neurodevelopment of newborns using geodesic shape regression to compute diffeomorphisms from image time series of a population [8]. These regression models were also used to estimate the spatiotemporal evolution of the cerebral cortex [9]. The concept of parallel transport curves in the tangent space of low-dimensional manifolds, proposed by Schiratti et al. [10], was used to analyze shape morphology [11] and adapted to radiotherapy response [12]. Regression models have also been proposed for both cortical and subcortical structures, including the 4D varifold-based learning framework with local topography shape morphing of Rekik et al. [13].

This chapter presents several manifold learning methodologies designed to address challenges encountered in medical imaging. In Section 2, we present an articulated shape inference model from nonlinear embeddings, introduced in [14], that expresses the global and local shape variations of the spine and its vertebrae. In Section 3, we present a probabilistic model based on discriminant manifolds to classify the neurodegenerative stage of Alzheimer’s disease. Section 4 presents a piecewise-geodesic transport curve in the tangent space of low-dimensional manifolds, designed to predict the correction achieved by spinal surgery, which introduces a time-warping function controlling the rate of shape evolution. We conclude the chapter in Section 5.

## 2. Shape inference through navigating manifolds

Statistical models of shape variability have been successful in addressing fundamental vision tasks such as segmentation and registration in medical imaging. However, the high dimensionality and complex nonlinear structure of anatomical shapes make the commonly used linear statistics inapplicable. Manifold learning approaches map high-dimensional observations, presumed to lie on a nonlinear manifold, onto a single global coordinate system of lower dimensionality.

Inferring a model from the underlying manifold is not a novel concept, but it is far from trivial. In this section, we model both the global statistics of the articulated model and the local shape variations of vertebrae, based on local measures in manifold space. We describe a spine inference and segmentation method for CT and MR images, in which the model representation is optimized through a Markov Random Field (MRF) graph that balances the prior distribution against the image data.

### 2.1. Data representation

Our spine model

### 2.2. Manifold embedding

For nonlinear embeddings, we rely on the absolute vector representation

The main limitation of embedding algorithms is their assumption of a Euclidean metric in the ambient space to evaluate the similarity between sample points. We therefore define a metric in the space of articulated structures that accommodates anatomical spine variability and respects the intrinsic Riemannian geometry of the manifold, allowing us to discern between articulated shape deformations in a topologically invariant framework. For each point, the

While for the translation, the

Afterwards, the manifold reconstruction weights are estimated by assuming the local geometry of the patches can be described by linear coefficients that permit the reconstruction of every model point from its neighbors. In order to determine the value of the weights, the reconstruction errors are measured using the following objective function:

Thus,

The algorithm maps each high-dimensional

with
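The reconstruction-weight and embedding steps described above follow the classical locally linear embedding (LLE) recipe. As a minimal sketch, the NumPy code below assumes plain Euclidean distances between flattened vectors for the neighbor search, whereas the chapter substitutes its articulation-aware metric; `lle_weights` and `lle_embedding` are hypothetical names, and the regularization constant `reg` is an illustrative choice.

```python
import numpy as np


def lle_weights(X, n_neighbors=5, reg=1e-3):
    """Weights W minimizing sum_i ||x_i - sum_j W_ij x_j||^2 with sum_j W_ij = 1,
    where W_ij = 0 unless x_j is one of the k nearest neighbors of x_i."""
    n = X.shape[0]
    # Euclidean distances (assumption: stands in for the chapter's articulation metric)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)  # a point is never its own neighbor
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D[i])[:n_neighbors]
        Z = X[nbrs] - X[i]                         # neighbors centered on x_i
        G = Z @ Z.T                                # local Gram matrix (k x k)
        scale = np.trace(G) if np.trace(G) > 0 else 1.0
        G = G + reg * scale * np.eye(len(nbrs))    # regularize a near-singular G
        w = np.linalg.solve(G, np.ones(len(nbrs)))
        W[i, nbrs] = w / w.sum()                   # enforce the sum-to-one constraint
    return W


def lle_embedding(W, d=2):
    """Low-dimensional coordinates from the bottom eigenvectors of (I - W)^T (I - W)."""
    n = W.shape[0]
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    _, vecs = np.linalg.eigh(M)                    # eigenvalues in ascending order
    return vecs[:, 1:d + 1]                        # skip the constant eigenvector
```

For example, `lle_embedding(lle_weights(X, n_neighbors=8), d=3)` returns one 3D coordinate per row of `X`.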

To obtain the articulation vector for a new embedded point in the ambient space (image domain), one has to determine the representation in high-dimensional space based on its intrinsic coordinates. We first assume an explicit mapping

which captures the overall trend of the data in *Nadaraya-Watson* kernel regression [16], we replace densities by kernel functions as

By assuming

which integrates the distance metric
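The inverse mapping from intrinsic coordinates back to the ambient space can be illustrated with a Nadaraya-Watson estimator. The sketch assumes a Gaussian kernel on Euclidean distances in the embedding and averages articulation vectors linearly, which simplifies the chapter's metric-aware formulation; `kernel_regression` and `bandwidth` are hypothetical names.

```python
import numpy as np


def kernel_regression(y_query, Y, A, bandwidth=1.0):
    """Nadaraya-Watson estimate of ambient vectors at embedded query points.

    y_query: (m, d) query coordinates in the embedding
    Y:       (n, d) embedded coordinates of the training models
    A:       (n, D) corresponding articulation vectors in the ambient space
    """
    # Squared Euclidean distances in the embedding (assumption: the chapter
    # plugs its own distance metric in here)
    sq = ((y_query[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    # Gaussian kernel; subtracting the row minimum avoids underflow to 0/0
    logk = -(sq - sq.min(axis=1, keepdims=True)) / (2.0 * bandwidth ** 2)
    K = np.exp(logk)
    K /= K.sum(axis=1, keepdims=True)  # weights of each query sum to one
    return K @ A                       # kernel-weighted average of training vectors
```

A small bandwidth makes the estimate follow the nearest training model; a large one tends toward the global mean.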

### 2.3. Optimization on manifold

Once an appropriate model of spine shape variations has been built on a manifold, inference between the image and the manifold must be performed. We describe here how a new model is generated. We search for the optimal embedded manifold point

The global alignment of the model with the target image primarily drives the deformation of the model. The purpose is to estimate the set of articulations describing the global spine model by determining its optimal representation

The inverse transform allows us to obtain

where

The prior constraints for the rigid alignment are pairwise potentials between neighboring models

This is the smoothness term of the global cost function, which ensures that the deformation

The global data and prior terms can be integrated with the local shape terms, parameterized as higher-order cliques, by combining (9) and (11):

Optimizing the resulting MRF (12) in the continuous domain is not straightforward. Convexity of the solution domain is not guaranteed, and gradient-descent approaches are sensitive to nonlinearities and prone to local minima. We seek to assign the optimal labels

We minimize the higher-order cliques in (13) by transforming them into quadratic functions [18]. We then apply the FastPD method [19], which solves the problem within the primal-dual framework of linear programming.
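To illustrate discrete label assignment, the sketch below considers a simplified version of the problem: vertebrae form a chain, each node takes one of a finite set of candidate labels, and only unary (data) and pairwise (smoothness) costs are kept. On a chain, such an MRF is minimized exactly by min-sum dynamic programming. This is a swapped-in technique for illustration only; it is not FastPD and it ignores the higher-order shape cliques. `chain_mrf_map` is a hypothetical name.

```python
import numpy as np


def chain_mrf_map(unary, pairwise):
    """Exact minimum-energy labeling of a chain MRF by min-sum dynamic programming.

    unary:    (n_nodes, n_labels) data cost of each label at each node
    pairwise: (n_labels, n_labels) smoothness cost between consecutive nodes
    Returns (labels, energy).
    """
    n, n_labels = unary.shape
    cost = unary[0].astype(float).copy()       # best energy ending in each label
    back = np.zeros((n, n_labels), dtype=int)  # argmin pointers for backtracking
    for i in range(1, n):
        # total[a, b]: best energy with label a at node i-1 and label b at node i
        total = cost[:, None] + pairwise
        back[i] = total.argmin(axis=0)
        cost = total.min(axis=0) + unary[i]
    labels = np.empty(n, dtype=int)
    labels[-1] = int(cost.argmin())
    for i in range(n - 1, 0, -1):
        labels[i - 1] = back[i, labels[i]]
    return labels, float(cost.min())
```

With a weak smoothness cost the data term dominates and labels may alternate; raising the pairwise cost forces neighboring vertebrae toward consistent labels.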

### 2.4. Results

**Manifold learning**. The manifold was built from a database containing

Adaptation of the articulated model was done on two different data sets. The first consisted of volumetric CT scans (

**CT imaging experiments**. We first evaluated model accuracy in CT images by measuring the correspondence between the inferred vertebral mesh models and the segmented target structures. As a preprocessing step, a rough thresholding was applied to the whole volume to filter out noise artifacts. We then computed overall surface-to-surface distances between the 3D vertebral models inferred with the articulated model and the known segmentations. The mean errors are

**MR imaging experiments**. For the experiments involving the segmentation of 3D spine models from MR images, the surface-to-surface comparison showed encouraging results (thoracic:
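Surface-to-surface errors of the kind reported above can be approximated on point-sampled meshes by nearest-neighbor distances. This sketch assumes both surfaces are given as vertex arrays in millimeters and uses brute-force distances, which is adequate for a few thousand vertices; the chapter's exact error definition is not given in this section, and `surface_distances` is a hypothetical name.

```python
import numpy as np


def surface_distances(P, Q):
    """Symmetric nearest-neighbor distances between point-sampled surfaces
    P (n, 3) and Q (m, 3); returns mean, RMS and maximum (Hausdorff) distance."""
    D = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)  # (n, m)
    d_pq = D.min(axis=1)  # each vertex of P to its closest vertex of Q
    d_qp = D.min(axis=0)  # each vertex of Q to its closest vertex of P
    both = np.concatenate([d_pq, d_qp])
    return {
        "mean": float(both.mean()),
        "rms": float(np.sqrt((both ** 2).mean())),
        "max": float(both.max()),
    }
```

Vertex-to-vertex distances overestimate true point-to-surface distances on coarse meshes, so dense sampling is preferable.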

## 3. Probabilistic modeling of discriminant nonlinear manifolds in the identification of Alzheimer’s disease

Neurodegenerative pathologies, such as Alzheimer’s disease (AD), are linked with morphological and metabolic alterations that can be assessed from medical imaging and biological data. Recent advances in machine learning have helped improve classification and prognosis rates, but they lack a probabilistic framework to measure uncertainty in the data. In this section, we present a method to identify progressive mild cognitive impairment (MCI) and predict conversion to AD from MRI and positron emission tomography (PET) images. We describe a discriminative probabilistic manifold embedding in which locally linear mappings transform data points in low-dimensional space to corresponding points in high-dimensional space. A discriminant adjacency matrix is constructed to maximize the separation between clinical groups, including MCI converters and nonconverters, while minimizing the distance between latent variables belonging to the same class.

### 3.1. Probabilistic model for discriminant manifolds

Manifold learning algorithms are based on the premise that data are often of artificially high dimension and can be embedded in a lower-dimensional space. However, the presence of outliers and multiclass information can degrade the discrimination and generalization ability of the manifold. We propose to learn the optimal separation between four classes, (1) normal controls, (2) nonconverter MCI patients, (3) converter MCI patients and (4) AD patients, using a discriminant graph embedding. Here,

In order to effectively discover the low-dimensional embedding, it is necessary to maintain the local structure of the data in the new embedding. The graph

Using the theoretical framework from [20], we can determine a distribution of linear maps associated with the low-dimensional representation to describe the data likelihood for a specific model:

This joint distribution can be separated into three prior terms: the linear maps, the latent variables and the likelihood of the high-dimensional points

We now define the discriminant similarity graphs that establish neighborhood relationships, as well as each of the three prior terms included in the joint distribution.

**Within and between similarity graphs**: In our work, the geometrical structure of

with

with
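The within-class and between-class similarity graphs can be sketched as k-nearest-neighbor graphs restricted to same-class and other-class points, with heat-kernel weights. The neighbor count `k`, the kernel width `sigma` and the helper names are assumptions for illustration; the chapter's exact weighting is defined by its own equations. The helper `graph_scatter` evaluates the quantity that a discriminant embedding keeps small on the within-class graph and large on the between-class graph.

```python
import numpy as np


def discriminant_graphs(X, labels, k=3, sigma=1.0):
    """Within-class (Ww) and between-class (Wb) adjacency matrices: each point is
    linked to its k nearest same-class and k nearest other-class neighbors,
    with heat-kernel weights exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    labels = np.asarray(labels)
    n = X.shape[0]
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    Ww, Wb = np.zeros((n, n)), np.zeros((n, n))
    for i in range(n):
        same = np.flatnonzero((labels == labels[i]) & (np.arange(n) != i))
        other = np.flatnonzero(labels != labels[i])
        for idx, W in ((same, Ww), (other, Wb)):
            nn = idx[np.argsort(D2[i, idx])[:k]]
            W[i, nn] = np.exp(-D2[i, nn] / (2.0 * sigma ** 2))
    # Symmetrize: keep an edge if either endpoint selected it
    return np.maximum(Ww, Ww.T), np.maximum(Wb, Wb.T)


def graph_scatter(Z, W):
    """tr(Z^T L Z) with Laplacian L = D - W, i.e. 0.5 * sum_ij W_ij ||z_i - z_j||^2."""
    L = np.diag(W.sum(axis=1)) - W
    return float(np.trace(Z.T @ L @ Z))
```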

**Model components**: The prior added on the latent variables

The prior added to the linear maps uses the Frobenius norm to measure how similar the tangent planes described in the low- and high-dimensional spaces are. This prior ensures smooth manifolds:

Finally, approximation errors from the linear mapping

with
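The three model components can be written as unnormalized negative log-densities, in the spirit of the locally linear latent variable model of [20]. This sketch is simplified: it assumes an isotropic noise precision `gamma`, a single symmetric neighborhood graph `eta` (the discriminant model would use the class-aware graphs instead), and omits the small regularization terms that make the priors proper. `ll_lvm_energies` is a hypothetical name.

```python
import numpy as np


def ll_lvm_energies(X, C, Ydata, eta, alpha=1.0, gamma=1.0):
    """Negative log-terms of a simplified locally linear latent variable model.

    X:     (n, d) latent coordinates
    C:     (n, D, d) one linear (tangent) map per point
    Ydata: (n, D) high-dimensional observations
    eta:   (n, n) symmetric neighborhood weights (0 = not neighbors)
    """
    n = X.shape[0]
    e_x = 0.5 * alpha * float((X ** 2).sum())  # keeps latent points near the origin
    e_c = 0.0
    e_y = 0.0
    for i in range(n):
        for j in range(n):
            if eta[i, j] == 0:
                continue
            dx = X[j] - X[i]
            # Latent prior: neighbors stay close in the embedding
            e_x += 0.5 * eta[i, j] * float(dx @ dx)
            # Linear-map prior: neighboring tangent maps agree (Frobenius norm)
            e_c += 0.5 * eta[i, j] * float(((C[i] - C[j]) ** 2).sum())
            # Likelihood: y_j - y_i is approximated by C_i (x_j - x_i)
            r = (Ydata[j] - Ydata[i]) - C[i] @ dx
            e_y += 0.5 * gamma * eta[i, j] * float(r @ r)
    return e_x, e_c, e_y
```

If the data are an exact linear image of the latent points and every local map equals that linear map, the map prior and likelihood terms both vanish.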

### 3.2. Variational inference

The objective is to infer the low-dimensional coordinates and linear mapping function for the described model, as well as the intrinsic parameters of the model

By assuming the posterior

The discriminant latent variable model can then be used to map new image feature vectors onto the manifold. The variational EM algorithm described above can be used to transform a set of new input points

### 3.3. Experiments

We used the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu), with 1.5 T or 3.0 T structural MR images and FDG-PET images. For this study, 187 subjects with both MRI and PET images acquired over a 24-month period were used to train the probabilistic manifold model, including 46 AD patients, 94 MCI patients and 47 normal controls. During the follow-up period, 43 MCI subjects converted to AD and 56 remained stable. All groups are approximately matched by age (mean of

A 9-fold cross-validation was performed to assess the performance of the method. The optimal manifold dimensionality was set at

## 4. Spatiotemporal manifold model for predicting surgical outcomes

In this final section, we present a statistical framework for predicting the outcomes of spine surgery in adolescents with idiopathic scoliosis. A discriminant manifold is first constructed to maximize the separation between responsive and nonresponsive groups of patients. The model then uses subject-specific correction trajectories based on articulated transformations to map spine correction profiles to a group-average piecewise-geodesic path. Spine correction trajectories are described in a piecewise-geodesic fashion to account for the varying times of follow-up exams, and the curve is regressed through a quadratic optimization process. To predict the evolution of the correction, a baseline reconstruction is projected onto the manifold, from which a spatiotemporal regression model is built using parallel transport curves inferred from neighboring exemplars (Figure 5).

### 4.1. Discriminant embedding of spine models

We propose to embed a collection of (1) nonresponsive (NR) and (2) responsive (R) patients to surgery, using a discriminant graph embedding that offers maximal separation between the classes. Here,

Because the discriminant manifold structure in

### 4.2. Piecewise-geodesic spatiotemporal manifold

Once sample points

However, because estimating the continuous curve is a variational problem in an infinite-dimensional space, the implementation follows a discretization process derived from the procedure in [22], such that:

This minimization reduces to a quadratic optimization problem, which is solved with an LU decomposition. The piecewise nature is represented by the term
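In a Euclidean space, a discretized curve-regression energy of this kind is quadratic in the curve nodes, so it is minimized by solving a single linear system. The sketch below assumes a uniform time grid, attaches each observation to its nearest grid node, and penalizes discrete velocity and acceleration; it is a Euclidean stand-in for the manifold procedure of [22] and omits the piecewise term, whose exact form is not reproduced here. `discrete_curve_regression`, `lam` and `mu` are hypothetical names.

```python
import numpy as np


def discrete_curve_regression(t_obs, P, t_grid, lam=1.0, mu=0.0):
    """Fit a discretized curve g (N, D) on a uniform grid to observations P (m, D)
    at times t_obs by minimizing the quadratic energy
        sum_k ||g[idx_k] - P[k]||^2
      + mu  * h * sum_i ||(g[i+1] - g[i]) / h||^2               (velocity)
      + lam * h * sum_i ||(g[i+1] - 2 g[i] + g[i-1]) / h^2||^2  (acceleration)
    where idx_k is the grid node closest to t_obs[k] (assumption)."""
    t_grid = np.asarray(t_grid, dtype=float)
    N, h = len(t_grid), t_grid[1] - t_grid[0]
    idx = np.abs(t_grid[None, :] - np.asarray(t_obs, dtype=float)[:, None]).argmin(axis=1)
    S = np.zeros((len(idx), N))
    S[np.arange(len(idx)), idx] = 1.0                      # sampling operator
    D1 = (np.eye(N, k=1) - np.eye(N))[:-1] / h             # first differences
    D2 = (np.eye(N, k=2) - 2 * np.eye(N, k=1) + np.eye(N))[:-2] / h ** 2
    # Normal equations of the quadratic energy; np.linalg.solve factorizes H
    # with an LU decomposition (LAPACK gesv) under the hood
    H = S.T @ S + mu * h * D1.T @ D1 + lam * h * D2.T @ D2
    return np.linalg.solve(H, S.T @ np.asarray(P, dtype=float))
```

Because straight lines have zero discrete acceleration, observations sampled from a line (with `mu=0`) are reproduced exactly on the whole grid, which is a convenient sanity check.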

### 4.3. Prediction of spine correction

Finally, to predict the evolution of spine correction from an unseen preoperative spine model, we use the geodesic curve

Based on Riemannian theory, an exponential mapping function at
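The spine manifold itself is learned from data, but the exponential map is easiest to see on a manifold with closed-form geodesics. The sketch below uses the unit sphere as an illustrative stand-in: `sphere_exp` follows the geodesic from `p` with initial velocity `v` for unit time, and `sphere_log` is its local inverse. Both names are hypothetical.

```python
import numpy as np


def sphere_exp(p, v):
    """Exponential map on the unit sphere: follow the geodesic from p
    with initial tangent velocity v for unit time."""
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return p.copy()
    return np.cos(nv) * p + np.sin(nv) * v / nv


def sphere_log(p, q):
    """Logarithm map: tangent vector at p pointing toward q, whose length
    equals the geodesic distance between p and q."""
    u = q - np.dot(p, q) * p  # component of q orthogonal to p
    nu = np.linalg.norm(u)
    if nu < 1e-12:
        return np.zeros_like(p)
    theta = np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))
    return theta * u / nu
```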

Hence, given the manifold at time

Therefore, by repeating this mapping for manifold points seen as samples of individual progression trajectories along
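Repeating the exponential map along transported vectors builds trajectories parallel to a reference geodesic, as in the exp-parallelization of Schiratti et al. [10]. The sketch again uses the unit sphere as a stand-in manifold with a closed-form parallel transport; it assumes the shift `w` is tangent at `p0` and that `t * |v0|` stays below pi so the minimizing geodesic follows the reference curve. All function names are hypothetical.

```python
import numpy as np


def sphere_exp(p, v):
    """Exponential map on the unit sphere."""
    nv = np.linalg.norm(v)
    return p.copy() if nv < 1e-12 else np.cos(nv) * p + np.sin(nv) * v / nv


def sphere_log(p, q):
    """Log map on the unit sphere (inverse of sphere_exp near p)."""
    u = q - np.dot(p, q) * p
    nu = np.linalg.norm(u)
    if nu < 1e-12:
        return np.zeros_like(p)
    return np.arccos(np.clip(np.dot(p, q), -1.0, 1.0)) * u / nu


def transport(p, q, w):
    """Parallel transport of the tangent vector w from p to q along the
    minimizing geodesic of the unit sphere."""
    v = sphere_log(p, q)
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return w.copy()
    u = v / theta
    a = np.dot(u, w)  # only the component along the geodesic direction rotates
    return w + (np.cos(theta) - 1.0) * a * u - np.sin(theta) * a * p


def parallel_curve(p0, v0, w, times):
    """Trajectory parallel to the reference geodesic gamma(t) = exp_p0(t v0):
    eta(t) = exp_gamma(t)(w transported from p0 to gamma(t))."""
    out = []
    for t in times:
        g = sphere_exp(p0, t * v0)
        out.append(sphere_exp(g, transport(p0, g, w)))
    return np.array(out)
```

Transport preserves the length of `w` and keeps it tangent to the sphere, so every shifted trajectory stays on the manifold.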

A time warp function allowing
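A common form for such a time warp, used by Schiratti et al. [10], is an affine reparameterization with an individual acceleration factor and time shift. This section does not reproduce the chapter's exact warp, so the sketch below is an assumption; `time_warp`, `xi` and `tau` are hypothetical names.

```python
import numpy as np


def time_warp(t, t0, xi, tau):
    """Affine time reparameterization psi(t) = exp(xi) * (t - t0 - tau) + t0.
    exp(xi) > 1 speeds up progression along the reference trajectory and
    tau shifts it in time; xi = tau = 0 gives the identity."""
    return np.exp(xi) * (np.asarray(t, dtype=float) - t0 - tau) + t0
```

Evaluating a reference trajectory at `time_warp(t, t0, xi, tau)` instead of `t` lets a fast responder (`xi > 0`) reach a given correction stage earlier.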

For spine correction evolution, displacement vectors

which yields a predicted postoperative model

### 4.4. Experiments

The discriminant manifold was trained from a database of

| Method | FE visit 3D RMS (mm) | FE visit Dice (%) | FE visit Cobb (°) | 1-year 3D RMS (mm) | 1-year Dice (%) | 1-year Cobb (°) | 2-year 3D RMS (mm) | 2-year Dice (%) | 2-year Cobb (°) |
|---|---|---|---|---|---|---|---|---|---|
| Biomec. sim | 3.3 | 85 | 2.8 | 3.6 | 84 | 3.2 | 4.1 | 82 | 3.6 |
| LL-LVM [20] | 3.6 | 83 | 3.8 | 4.7 | 79 | 5.5 | 6.6 | 71 | 7.0 |
| Deep AE [24] | 4.1 | 80 | 5.1 | 5.0 | 77 | 5.8 | 6.3 | 72 | 6.6 |
| Proposed | 2.4 | 92 | 1.8 | 2.9 | 90 | 2.0 | 3.2 | 87 | 2.1 |
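The 3D RMS and Dice metrics in the table can be computed as follows, assuming the RMS error is taken over corresponding vertices of the predicted and reference models and Dice is computed on binary voxel masks and reported in percent. The exact protocol (correspondence, voxelization) is not specified in this section, and the helper names are hypothetical.

```python
import numpy as np


def dice_percent(mask_a, mask_b):
    """Dice overlap 2|A & B| / (|A| + |B|) between binary masks, in percent."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 100.0  # two empty masks are treated as identical
    return 100.0 * 2.0 * np.logical_and(a, b).sum() / denom


def rms_error(P, Q):
    """3D RMS distance between corresponding points P[i] and Q[i], arrays of shape (n, 3)."""
    d2 = ((np.asarray(P, dtype=float) - np.asarray(Q, dtype=float)) ** 2).sum(axis=1)
    return float(np.sqrt(d2.mean()))
```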

## 5. Discussion

Algorithms capable of extracting clinically relevant and meaningful descriptions from medical imaging datasets have attracted widespread interest from both theoreticians and practitioners in the medical field. In recent years, this interest has drawn on fields as varied as machine learning, geometry, statistics and genomics to offer new insights into the analysis of imaging and biological datasets. To this end, manifold learning has demonstrated strong potential for learning the underlying representation of high-dimensional, complex imaging datasets.

We presented a framework that describes longitudinal, multimodal image features from neuroimaging data using a Bayesian model of discriminant nonlinear manifolds to predict the conversion of progressive MCI to Alzheimer’s disease. This probabilistic method introduces class-dependent latent variables, based on the premise that local structure is transferred from the manifold to the high-dimensional domain. This variational learning method can ultimately assess uncertainty within the manifold domain, which can lead to a better understanding of the relationships between converters and nonconverters among patients with MCI.

Finally, we presented a method for predicting the outcomes of spine surgery using geodesic parallel transport curves generated from probabilistic manifold models. These mathematical models describe patterns in a nonlinear, discriminant Riemannian framework by first distinguishing nonprogressive from progressive cases and then predicting structural evolution. The proposed model provides a way to analyze longitudinal samples from a geodesic curve in manifold space, thereby simplifying the mixed effects involved in studying group-average trajectories.