Manifold Learning in Medical Imaging Manifold Learning in Medical Imaging

Manifold learning theory has seen a surge of interest in the modeling of large and exten- sive datasets in medical imaging since they capture the essence of data in a way that fundamentally outperforms linear methodologies, the purpose of which is to essentially describe things that are flat. This problematic is particularly relevant with medical imaging data, where linear techniques are frequently unsuitable for capturing variations in anatomical structures. In many cases, there is enough structure in the data (CT, MRI, ultrasound) so a lower dimensional object can describe the degrees of freedom, such as in a manifold structure. Still, complex, multivariate distributions tend to demonstrate highly variable structural topologies that are impossible to capture with a single manifold learn- ing algorithm. This chapter will present recent techniques developed in manifold theory for medical imaging analysis, to allow for statistical organ shape modeling, image seg- mentation and registration from the concept of navigation of manifolds, classification, as well as disease prediction models based on discriminant manifolds. We will present the theoretical basis of these works, with illustrative results on their applications from various organs and pathologies, including neurodegenerative diseases and spinal deformities. The proposed model provides a way to analyze longitudinal samples from a geodesic curve in manifold space, thus simplifying the mixed effects when studying group-average trajectories.


Introduction
Learning on large medical imaging datasets is an emerging discipline driven from the availability of vast amounts of raw data in many of today's biomedical studies. However, challenges such as unbalanced data distributions, complex multivariate data and highly variable structural topologies demonstrated by real-world samples makes it much more difficult to efficiently learn the associated representation. An important goal of scientific data analysis in medicine, particularly in neurosciences or oncology, is to understand the behavior of biological process or physiological/morphological alterations. This introduces the need to synthesize large amounts of multivariate data in a robust manner and raises the fundamental question of data reduction: how to discover meaningful representations from unstructured high-dimensional medical images.
Several approaches have attempted to understand how dimension reduction and regression establishes the relationship in subspaces and finally determine statistics on manifolds that optimally describe the relationships between the samples [1]. However, certain assumptions based on the representation of shapes and images using smooth manifolds are made in most cases, which frequently will not be adequate in the presence of medical imaging data and often perturbed by nuisance articulations, clutter or varying contrast.
High-dimensional classification methods have shown promise to measure subtle and spatially complex imaging patterns that have diagnostic value [2,3]. Defining statistics on a manifold is not a straightforward process when simple statistics cannot be directly applied to general manifolds [4]. But while Euclidean estimators have been used for vector spaces, none have been adapted for multimodal data lying in different spaces. Still, there has been interest in the characterization of data in a Riemann space [5,6]. Unfortunately, manifold-valued metrics based on the centrality theory or the geometric median [7] often lacks robustness to outliers.
A related topic lies in dimensionally reduced growth trajectories of various anatomical sites which have been investigated in neurodevelopment of newborns for example, based on geodesic shape regression to compute the diffeomorphisms with image time series of a population [8]. These regression models were also used to estimate spatiotemporal evolution of the cerebral cortex [9]. The concept of parallel transport curves in the tangent space from lowdimensional manifolds proposed by Schiratti et al. [10] was used to analyze shape morphology [11] and adapted for radiotherapy response [12]. Regression models were proposed for both cortical and subcortical structures, with 4D varifold-based learning framework with local topography shape morphing being proposed by Rekik et al. [13].
This chapter presents several manifold learning methodologies designed to address challenges encountered in medical imaging. In Section 2, we present an articulated shape inference model from nonlinear embeddings, expressing the global and local shape variations of the spine and vertebrae composing it, introduced in [14]. We then present in Section 3 a probabilistic model from discriminant manifolds to classify the neurodegenerative stage of Alzheimer's disease. Finally, a piecewise-geodesic transport curve in the tangent space from low-dimensional manifolds designed for the prediction of correction in spinal surgeries is shown in Section 4, introducing a time-warping function controlling the rate of shape evolution. We conclude this article in Section 5.

Shape inference through navigating manifolds
Statistical models of shape variability have been successful in addressing fundamental vision tasks such as segmentation and registration in medical imaging. However, the high dimensionality and complex nonlinear underlying structure unfortunately makes the commonly used linear statistics inapplicable for anatomical structures. Manifold learning approaches map high-dimensional observation data that are presumed to lie on a nonlinear manifold, onto a single global coordinate system of lower dimensionality.
Inferring a model from the underlying manifold is not a novel concept but far from being trivial. In this section, we model both global statistics of the articulated model and local shape variations of vertebrae based on local measures in manifold space. We describe a spine inference/segmentation method from CT and MR images, where the model representation is optimized through a Markov Random Field (MRF) graph, balancing prior distribution with image data.

Data representation
To perform global shape modeling of S, we convert A to an absolute representation A abs ¼ using recursive compositions. The transformations are expressed in the local coordinate system (LCS) of the lower vertebra. Center of transformation is the intersection of all three vertebral axes, following anteroposterior, cranial-caudal and leftright directions. Rigid transformations described here are the combination of a rotation matrix R, a translation t and scaling s. We formulate the rigid transformation T ¼ s; R; t f g of a triangular mesh model as y ¼ sRx þ t where x, y, t ∈ ℜ 3 .

Manifold embedding
For nonlinear embeddings, we rely on the absolute vector representation A abs as given previously. Let us now consider N articulated shape models expressed by the feature vectors A i abs , of dimensionality D. The aim is to create a low-dimensional manifold consisting of N points Y i , [15]. In such a framework, if an adequate number of data points is available, then the underlying manifold M is considered to be "well-sampled." Therefore, it can represent the underlying population structure. In the sub-cluster corresponding to a pathological population, each point of the training set and its neighbors would lie within a locally linear patch as illustrated in Figure 1.
The main limitation of embedding algorithms is the assumption of Euclidean metrics in the ambient space to evaluate similarity between sample points. Thus, a metric in the space of articulated structures is defined so that it accommodates for anatomical spine variability and adopts the intrinsic nature of the Riemannian manifold geometry allowing us to discern between articulated shape deformations in a topological invariant framework. For each point, the K closest neighbors are selected using a distortion metric which is particularly suited for geodesics. The metric d M A i abs ; A j abs estimates the distance of articulated models i, j where A i abs . The distance measure for absolute representations can therefore be expressed as a sum of articulation deviations While for the translation, the L 2 norm is chosen, geodesical distances are used between rotation neighborhoods. This is expressed as where the log map is used to map a point in the manifold to the tangent plane.
Afterwards, the manifold reconstruction weights are estimated by assuming the local geometry of the patches can be described by linear coefficients that permit the reconstruction of every model point from its neighbors. In order to determine the value of the weights, the reconstruction errors are measured using the following objective function: Thus, ε W ð Þ sums the squared distances between all data points and their corresponding reconstructed points. The weights W ij represent the importance of the j th data point to the reconstruction of the i th element. The algorithm maps each high-dimensional A i abs to a low-dimensional Y i . These internal coordinates are found with a cost function minimizing the reconstruction error: with M as a sparse and symmetric N Â N matrix enclosing the reconstruction weights W ij such The optimal embedding, up to a global rotation, is obtained from the bottom d þ 1 eigenvectors of M and helps to minimize the cost function Φ Y ð Þ as a simple eigenvalue problem. The d eigenvectors form the d embedding coordinates. The coordinates Y i can be translated by a constant displacement without affecting the overall cost Φ Y ð Þ. The eigenvector corresponding to the smallest eigenvalue corresponds to the mean value of the embedded data Y 0 ¼ y 1 ; …; y d È É , y i ¼ 0, ∀i. This can be discarded with P Y i ¼ 0 to obtain an embedding centered at the origin. Hence, a new ADM can be inferred in the embedded d-space as a low-dimensional point Y new by finding its optimal manifold coordinates y i .
To obtain the articulation vector for a new embedded point in the ambient space (image domain), one has to determine the representation in high-dimensional space based on its intrinsic coordinates. We first assume an explicit mapping f : M ! ℜ D from manifold space M to the ambient space ℜ D . The inverse mapping of Y i is then performed by estimating the relationship between ℜ D and M as a joint distribution, such there exists a smooth functional which belongs to a local neighborhood. Theoretically the manifold should follow the conditional expectation: . The Gaussian regression kernels G require the neighbors A j abs of j ∈ N i ð Þ to determine the bandwidths h, g so it includes all K data points (N i ð Þ representing the neighborhood of i). Plugging these estimates in Eq.(5), this gives: By assuming G is symmetric about the origin, we propose to integrate in the kernel regression estimator, the manifold-based distortion metric d M which is particularly suited for geodesic metrics and articulated diffeomorphisms. This generalizes the expectation such that the observations Y are defined in manifold space M: using the closest neighbors of point Y i in the manifold space. This constrains the regression to be valid for similar data points in its vicinity since locality around Y i preserves locality in A i abs .

Optimization on manifold
Once an appropriate modeling of spine shape variations is determined with a manifold, a successful inference between the image and manifold must be accomplished. We describe here how a new model is generated. We search the optimal embedded manifold point Y ¼ of the global spine model. Such a strategy offers an ideal compromise between the prior constraints, as well as the individual shape variations described by the weight vector Þin a localized sub-patch. The energy E of inferring the model S in the image I is a function of the set of displacement vectors Δ in the manifold space for global shape representation. This involves: (a) a data-related term expressing the image cost and (b) a global prior term measuring deformation between low-dimensional vectors with shape models. The third term represents (c) a higher-order term which is expressed by the reconstruction weights Ω for local vertebra modeling. The energy E can be expressed as the following combination of a global and local optimization: The global alignment of the model with the target image primarily drives the deformation of the model. The purpose is to estimate the set of articulations describing the global spine model by determining its optimal representation Y 0 in the embedded space. This is performed by obtaining the global representation using the mapping in (7) so that: . This allows to optimize the model in manifold space coordinates while retrieving the articulations in I . The global cost can be expressed as: The inverse transform allows to obtain A i abs þ D, with D as deformations in the image space. Since the transformations T i are implicitly modeled in the absolute representation A 0 abs , we can formally consider the singleton image-related term as a summation of costs associated with each L vertebra of the model: Þ minimizes the distance between mesh vertices of the inferred shape and gradient image I by a rigid transformation. Here, n i is the normal pointing outwards and ∇I v i ð Þ the image gradient at v i .
The prior constraint for the rigid alignment are pairwise potentials between neighboring models y i such that the difference in manifold coordinates is minimal with regards to a prior distribution of neighboring distances P: This term represents the smoothness term of the global cost function to ensure that the deformation δ i applied to point coordinates are regular, with V ij ¼ 0; 1 ð Þ a distance assigning function based on the distances to P.
One can integrate the global data and prior terms along with local shape terms parameterized as the higher-order cliques, by combining (9), (11): The optimization strategy of the resulting MRF (12) in the continuous domain is not a straightforward problem. The convexity of the solution domain is not guaranteed, while gradientdescent optimization approaches are prone to nonlinearity and local minimums. We seek to assign the optimal labels L Δ ¼ l 1 ; …; l d f gand L Ω ¼ l 1 ; …; l n f gwhich are associated to the quantized space Δ of displacements and local weight parameters Ω respectively. We consider that displacing the coordinates of point y 0 i by δ li is equivalent to assigning label l i to y 0 i . An incremental approach is adopted where in each iteration t we look for the set of labels that which is a temporal minimization problem. Then (12) can be rewritten as: We solve the minimization of the higher-order cliques in (13) by transforming them into quadratic functions [18]. We apply the FastPD method [19] which solves the problem by formulating the duality theory in linear programming.

Results
Manifold learning. The manifold was built from a database containing 711 scoliotic spines demonstrating several types of deformities. Each spine model in the database was obtained from biplanar radiographic stereo-reconstructions. It is modeled with 12 thoracic and 5 lumbar vertebrae (17 in total), represented by 6 landmarks on each vertebra (4 pedicle extremities and 2 endplate center points) which were manually identified by an expert on the radiographic images. The resulting manifold is shown in Figure 2.
Adaptation of the articulated model was done on two different data sets. The first consisted of volumetric CT scans (512 Â 512 Â 251, resolution: 0:8 Â 0:8 mm, thickness: 1 À 2 mm) of the lumbar and main thoracic regions obtained from 21 different patients acquired for operative planning purposes. The MR dataset comprised multi-parametric volumetric data (256 Â 256 Â 160, resolution: 1:3 Â 0:9 mm, thickness: 1 mm) of 8 patients acquired for diagnostic purposes. For this study, only the T1 sequence was selected for the experiments. All patients on both datasets (29 in total) had 12 thoracic and 5 lumbar vertebrae. Both CT and MR data were manually annotated with 3D landmarks by an expert in radiology, corresponding to left and right pedicle tips as well as midpoints of the vertebral body. Segmentation of the vertebrae from the CT and MR slices were also made by the same operator.
CT imaging experiments. We first evaluated the model accuracy in CT images by computing the correspondence of the inferred vertebral mesh models to the segmented target structures. As a preprocessing step, a rough thresholding was performed on the whole volume to filter out noise artifacts. The overall surface-to-surface comparison results between the inferred 3D vertebral models issued from the articulated model and from known segmentations were first calculated. The mean errors are 2:2 AE 1:5 mm (range: 0:6 À 5:4 mm) for thoracic vertebra and 2:8 AE 1:9 mm (range: 0:7 À 8:1 mm) for lumbar vertebra.

MR imaging experiments.
For the experiments involving the segmentation of 3D spine models from MR images, the surface-to-surface comparison showed encouraging results (thoracic: 2:9 AE 1:8 mm, lumbar: 3:0 AE 1:9 mm) based on differences to ground-truth. As in the previous experiments with CT imaging, ground-truth data was generated by manually segmenting the structures models which were validated by an expert in radiology. As difficult as the CT inference is, the MR problem represent an even greater challenge as the image resolution is more limited and interslice spacing is increased compared to CT. Modeling of the statistical properties of the shape variations and global pose becomes even more important in this case, as it relies heavily in the nonlinear distribution of the patient morphology.

Probabilistic modeling of discriminant nonlinear manifolds in the identification of Alzheimer's
Neurodegenerative pathologies, such as Alzheimer's disease (AD), are linked with morphological and metabolic alterations which can be assessed from medical imaging and biological data. Recent advances in machine learning have helped to improve classification and prognosis rates, but lack a probabilistic framework to measure uncertainty in the data. In this section, we present a method to identify progressive mild cognitive impairment (MCI) and predict their conversion to AD from MRI and positron emitting tomography (PET) images. We show a discriminative probabilistic manifold embedding where locally linear mappings transform data points in low-dimensional space to corresponding points in high-dimensional space. A discriminant adjacency matrix is constructed to maximize the separation between different clinical groups, including MCI converters and nonconverters, while minimizing the distance in latent variables belonging to the same class.

Probabilistic model for discriminant manifolds
Manifold learning algorithms are based on the premise that data are often of artificially high dimension and can be embedded in a lower dimensional space. However the presence of outliers and multiclass information can on the other hand affect the discrimination and/or generalization ability of the manifold. We propose to learn the optimal separation between four classes (1) normal controls, (2) nonconverter MCI patients, (3) converter MCI patients and (4) AD patients, by using a discriminant graph-embedding. Here, n labeled points Y ¼ y i ; l i À Á È É n i¼1 defined in R D are generated from the underlying manifold M, where l i denotes the label (NC, cMCI, nMCI or AD). For the labeled data, there exists a low-dimensional (latent) representation of the high-dimensional samples such that X ¼ x i ; l i ð Þ f g n i¼1 defined in R d . We assume here that the mapping M i ∈ R DÂd between high and low-dimensional spaces is locally linear, such that tangent spaces in local neighborhoods can be estimated with y j À y i and x j À x i , representing the pairwise differences between connected neighbors i, j. Therefore the relationship can be established as y j À y i ≈ M i x j À x i À Á .
In order to effectively discover the low-dimensional embedding, it is necessary to maintain the local structure of the data in the new embedding. The graph G ¼ V; W ð Þ is an undirected similarity graph, with a collection of nodes V connected by edges, and the symmetric matrix W with elements describing the relationships between the nodes. The diagonal matrix D and the Laplacian matrix L are defined as Using the theoretical framework from [20], we can determine a distribution of linear maps associated with the low-dimensional representation to describe the data likelihood for a specific model: This joint distribution can be separated into three prior terms: the linear maps, latent variables and the likelihood of the high dimensional points Y: We now define the discriminant similarity graphs establishing neighborhood relationships, as well define each of the three prior terms included in the joint distribution.
Within and between similarity graphs: In our work, the geometrical structure of M can be modeled by building a within-class similarity graph W w for feature vectors of same group and a between-class similarity graph W b , to separate features from all four classes. When constructing the discriminant locally linear latent variable embedding, elements are partitioned into W w and W b classes. The intrinsic graph G is first created by assigning edges only to samples of the same class (ex: nMCI). Each sample is therefore reconstructed only from feature vectors of the same clinical group. Local reconstruction coefficients are incorporated in the within-class similarity graph, such that W w is defined as: with N w containing neighbors of the same class. Conversely, W b depicts the statistical properties to be avoided in the inference process. Distances between samples from different clinical groups are computed as: with N b containing neighbors having different class labels from the ith sample. The objective is to transform points to a new manifold M of dimensionality d, i.e., y i ! x i , by mapping connected samples from the same group in W w as close as possible to the class cluster, while moving NC, nMCI, cMCI and AD samples of W b as far away from one another.

Model components:
The prior added on the latent variables X are located at the origin of the low-dimensional domain, while minimizing the Euclidean distance of neighboring points that are associated with the neighborhood of high-dimensional points and maximizing the distance between coordinates of different classes. In order to set the variables with an expected scale α and H representing the probability density function, the following log prior is defined: The prior added to the linear maps defines how the tangent planes described in low and high dimensional spaces are similar based on the Frobenius norm. This prior ensures smooth manifolds: Finally, approximation errors from the linear mapping M i between low and high-dimensional domains are penalized by including the following log likelihood: with Δ i; j ð Þ the difference in Euclidean distance between pairs of neighbors in high and lowdimensional space and γ the update parameter for the EM inference. Samples of y are drawn from a multivariate normal distribution.

Variational inference
The objective is to infer the low-dimensional coordinates and linear mapping function for the described model, as well as the intrinsic parameters of the model Φ ¼ α; γ ð Þ. This is achieved by maximizing the marginal likelihood of: By assuming the posterior r M; X ð Þcan be factored in separate terms r M ð Þ and r X ð Þ, a variational expectation maximization algorithm can be used to determine the model's parameters, which are initialized with Φ. The E-step updates the independent posteriors r X ð Þ and r M ð Þ, while the parameters of Φ are updated in the M-step by maximizing Eq. (21).
The discriminant latent variable model can then be used to perform the mapping of new image feature vectors to the manifold. The variational EM algorithm described in the previous section can be used to transform a set of new input points y q without changing the overall neighborhood graph structure, by finding the distribution of the local linear map y q and it is lowdimensional coordinate using the E-step explained above. Once the manifold representation x q is obtained, a cluster analysis finds the corresponding class in the manifold, yielding a prediction of the input feature vector y q .

Experiments
We used the Alzheimer's Disease Neuroimaging Initiative (ADNI) database with 1.5 or 3.0 T structural MR images (adni.loni.usc.edu) and FDG-PET images. For this study, 187 subjects with both MRI and PET images during a 24 month period were used to train the probabilistic manifold model, including 46 AD patients, 94 MCI patients, and 47 normal controls. During the follow-up period, 43 MCI subjects converted to AD and 56 remained stable. All groups are matched approximately by age (mean of 76:7 AE 5:4) and gender. Images were non-rigidly registered to a standard template, which was then segmented using FSL-FIRST automatic segmentation [21].
A 9-fold cross-validation was performed to assess the performance of the method. The optimal manifold dimensionality was set at d ¼ 8, when the trend of the nonlinear residual reconstruction error stabilized for the entire training set. We evaluated the classification performance of the proposed method for discriminating between cMCI and nMCI patients, by training the model with MRI, PET and with MRI + PET biomarkers from the ROIs illustrated in Figure 3. Figure 4 presents ROC curves obtained by the proposed and comparative methods such as SVM (nonlinear RBF kernel), LLE and LL-LVM [20]. The discriminative nature of the proposed framework clearly shows an improvement to standard learning approaches models which were trained using MRI only, PET only and combined multimodal features. It illustrates that increased accuracy (77.4%) can be achieved by combining MRI and PET features, showing the benefit of extracting complementary features from the dataset for prediction purposes. When comparing the performance of the proposed method to the other learning methods (SVM, LLE,

Spatiotemporal manifold prediction model for surgery prediction
In this final section, we present a statistical framework for predicting the surgical outcomes following spine surgery of adolescents with idiopathic scoliosis. A discriminant manifold is first constructed to maximize the separation between responsive and nonresponsive groups of patients. The model then uses subject-specific correction trajectories based on articulated transformations in order to map spine correction profiles to a group-average piecewise-geodesic path. Spine correction trajectories are described in a piecewise-geodesic fashion to account for varying times at follow-up exams, regressing the curve via a quadratic optimization process.
To predict the evolution of correction, a baseline reconstruction is projected onto the manifold, from which a spatiotemporal regression model is built from parallel transport curves inferred from neighboring exemplars ( Figure 5).

Discriminant embedding of spine models
We propose to embed a collection of nonresponsive (NR) and (2) responsive (R) patients to surgery which will offer a maximal separation between the classes, by using a discriminant graph-embedding. Here, n labeled points Y ¼ y i ; l i ; t i À Á È É n i¼1 defined in R D are embedded in the low-dimensional manifold M, where l i describes the label (NR or R) and t i defines the time of follow-up. We assume that for the sampled data, an underlying manifold of the highdimensional data exists such that X ¼ We rely on the assumption that a locally linear mapping M i ∈ R DÂd exists, where local neighborhoods are defined as tangent planes estimated with y j À y i and x j À x i , describing the paired distances between linked neighbors i, j. Hence, the relationship can be established as Because the discriminant manifold structure in R d requires to maintain the local structure of the underlying data, a undirected similarity graph G ¼ V; W ð Þis built, where each node V are connected to each other with edges that are weighted with the graph W. The overall structure of M is therefore defined with W w for feature vectors belonging to the same class and W b , which separate features from both classes. During the embedding of the discriminant locally linear latent manifold, data samples are divided between W w and W b .

Piecewise-geodesic spatiotemporal manifold
Once sample points x i are in manifold space, the objective is to regress a regular and smooth piecewise-geodesic curve γ : t 1 ; t N ½ that accurately fits the embedded data describing the spatiotemporal correction following surgery within a 2 year period. For each sample data x i , the K closest individuals demonstrating similar baseline features are identified from the embedded data, creating neighborhoods N x q À Á with measurements at different time points, thus creating a low-dimensional Riemannian manifold where data points x i, j , with i denoting a particular individual, j the time-point measurement and j ¼ 0 the preoperative model. By assuming the manifold domain is complete and piecewise-geodesic curves are defined for each time trajectories, time-labeled data can be regressed continuously in R D , thereby creating smooth curves in time intervals described by samples in R d .
However, due to the fact the representation of the continuous curve is a variational problem of infinite dimensional space, the implementation follows a discretization process which is derived from the procedure in [22], such that: This minimization process simplifies the problem to a quadratic optimization, solved with LU decomposition. The piecewise nature is represented by the term K d ∈ N x q À Á , defined as samples along γ. The first component of Eq. (22) is a penalty term to minimize the geodesic distance between samples x i, j and the regressed curve, where w i are weight variables based on sample distances. This helps regress a curve that will lie close to x i, j , shifted by x q in order to have the initial reconstructions co-registered. The second term represents the velocity of the curve (defined by v i , approximating _ γ t i ð Þ), minimizing the L 2 distance of the 1 st derivative of γ. By minimizing the value of the curve's first derivatives, this prohibits any discontinuities or rapid transitions of the curve's direction, and is modulated by α i . Finally, an acceleration penalty term (defined by a i ) focuses on the 2 nd derivative of γ with respect to t i by minimizing the L 2 norm. The acceleration is modulated by β i . Estimates for v i and a i (weighted by λ; μ È É , respectively), are generated using geometric finite differences. These estimates dictates the forward and backward step-size on the regressed curve, leading to directional vectors in M as shown in [22]. In order to minimize E γ ð Þ, a nonlinear conjugate gradient technique defined in the low-dimensional space R d is used, thus avoiding convergence and speed issues. The regressed curve γ is therefore defined for all time points, originating at t 0 . The curve creates a group average of spatiotemporal transformations based on individual correction trajectories.

Prediction of spine correction
Finally, to predict the evolution of spine correction from an unseen preoperative spine model, we use the geodesic curve γ : R D ! M modeling the spatiotemporal changes of the spine, where each point x ∈ M is associated to a speed vector v defined with a tangent plane on the manifold such that v ∈ T x M.
Based on Riemannian theory, an exponential mapping function at x with velocity v can be defined from the geodesics such that e M x v ð Þ. Using this concept, parallel transport curves defined in T x can help define a series of time-index vectors along γ as proposed by [10]. The collection of parallel transport curves allows to generate an average trajectory in ambient space R D , describing the spine changes due to the corrective forces of tethering. The general goal is to begin the process at the preoperative sample, and navigate the piecewise-geodesic curve describing correction evolution in time, where one can extract the appearance at any point (time) in R D using the exponential mapping. For implementation purposes, the parallel transport curve are constrained within a smooth tubular boundary perpendicular to the curve (from an ICA) to generate the spatiotemporal evolution in the coordinate system of the preoperative model.
Hence, given the manifold at time t 0 with v defined in the tangent plane and the regressed piecewise-geodesic curve γ, the parallel curve is obtained as: Therefore by repeating this mapping for manifold points seen as samples of individual progression trajectories along γ s ð Þ, an evolution model can be generated. Whenever a new sample is embedded, new samples points along γ s ð Þ, denoted as η v γ; Á ð Þ can be generated parallel to the regressed piecewise curve in M, capturing the spatiotemporal changes in correction.
A time warp function allowing s to vary along the geodesic curve is described as ϕ i t ð Þ ¼ θ i t À t 0 À τ i ð Þþt 0 . Here, we propose to incorporate a personalized acceleration factor based on the spine maturity and flexibility derived from the spine bending radiographs and Risser grade. A coefficient θ i ¼ C i Â R i describing the change in Cobb angle C i between poses, and modulated by the Risser grade R i . This coefficient regulates the rate of correction based on the K neighboring samples. Finally, to take under account the relative differences between the group-wise samples and the query model once mapped onto the regressed curve, a time-shift parameter τ i is incorporated in the warp function.
For spine correction evolution, displacement vectors v i are obtained by a PCA of the hyperplane crossing T xi M in manifold M [10]. Hence, for any query sample x q which represents the mapped preoperative 3D reconstruction (prior to surgery), the predicted model at time t k can be regressed from the piecewise-geodesic curve generated from embedded samples x in N x q À Á such that: which yields a predicted postoperative model y q, t k in high-dimensional space R D , and ε q, t k a zero-mean Gaussian distribution. The generated model offers a complete constellation of interconnected vertebral models composing the spine shape S, at first-erect (FE), 1 or 2year visits, including landmarks on vertebral endplates and pedicle extremities, which can be used to capture the local shape morphology with the correction process.

Experiments
The discriminant manifold was trained from a database of 438 3D spine reconstructions generated from biplanar images [23], originating from 131 patients demonstrating several types of deformities with immediate follow-up (FE), 1 and 2 year visits. Patients were recruited from a single center prospective study. Patients were divided in two groups, with the first group composed of 94 responsive patients showing a reduction in Cobb angle over or equal to 10 ∘ between the FE and follow-up visit. The second group was composed of 37 nonresponsive (NP) patients with a reduction of less than 10 ∘ . We evaluated the geometrical accuracy of the predictive manifold for 56 unseen surgical patients (mean age 12 AE 3, average main Cobb angle on the frontal plane at the first visit was 47 AE 10 ∘ ), with predictions at t ¼ 0 (FE), t ¼ 12 and t ¼ 24 months. For the predicted models, we evaluated the 3D root-mean-square difference of the vertebral landmarks generated, the Dice coefficients of the vertebral shapes and in the main Cobb angle. The results are shown in Table 1. Results were confronted to other techniques such as biomechanical simulations performed on each subject using finite element modeling with ex-vivo parameters [25], a locally linear latent variable model [20] and a deep auto-encoder network [24]. Results from the predicted geometrical models show the regressed spatiotemporal geodesic curve yields anatomically coherent structures, with accurate local vertebral morphology.

Discussion
Algorithms capable of extracting clinically relevant and meaningful descriptions from medical imaging datasets have become of widespread interest to theoreticians as well as practitioners in the medical field, accelerating the pace in recent years involving varied fields such as in machine learning, geometry, statistics and genomics to propose new insights for the analysis of imaging and biologic datasets. Towards this end, manifold learning has demonstrated a tremendous potential to learn the underlying representation of high-dimensional, complex imaging datasets.
We presented frameworks describing longitudinal, multimodal image features from neuroimaging data using a Bayesian model for discriminant nonlinear manifolds to predict the conversion of progressive MCI to Alzheimer's disease. This probabilistic method introduces classdependent latent variables which is based on the concept that local structure is transformed from manifold to the high-dimensional domain. This variational learning method can ultimately assess uncertainty within the manifold domain, which can lead to a better understanding of relationships between converters and nonconverters for patients with MCI.
Finally, a prediction method for the outcomes of spine surgery using geodesic parallel transport curves generated from probabilistic manifold models was presented. The mathematical models allow to describe patterns in a nonlinear and discriminant Riemannian framework by first distinguishing nonprogressive and progressive cases, followed by a prediction of structural evolution. The proposed model provides a way to analyze longitudinal samples from a geodesic curve in manifold space, thus simplifying the mixed effects when studying groupaverage trajectories.