Open access peer-reviewed chapter

Manifold Learning in Medical Imaging

Written By

Samuel Kadoury

Reviewed: 06 July 2018 Published: 05 November 2018

DOI: 10.5772/intechopen.79989

From the Edited Volume

Manifolds II - Theory and Applications

Edited by Paul Bracken


Abstract

Manifold learning theory has seen a surge of interest for modeling large and extensive medical imaging datasets, since it captures the essence of the data in a way that fundamentally outperforms linear methodologies, whose purpose is essentially to describe flat structures. This issue is particularly relevant for medical imaging data, where linear techniques are frequently unsuitable for capturing variations in anatomical structures. In many cases, there is enough structure in the data (CT, MRI, ultrasound) that a lower-dimensional object, such as a manifold, can describe its degrees of freedom. Still, complex multivariate distributions tend to exhibit highly variable structural topologies that are impossible to capture with a single manifold learning algorithm. This chapter presents recent techniques developed in manifold theory for medical image analysis, enabling statistical organ shape modeling, image segmentation and registration through the concept of manifold navigation, classification, as well as disease prediction models based on discriminant manifolds. We present the theoretical basis of these works, with illustrative results on their applications to various organs and pathologies, including neurodegenerative diseases and spinal deformities.

Keywords

  • manifold learning
  • medical imaging
  • discriminant manifolds
  • piecewise geodesic regression
  • spine deformities
  • neurodegenerative diseases
  • shape modeling

1. Introduction

Learning on large medical imaging datasets is an emerging discipline driven by the availability of vast amounts of raw data in many of today’s biomedical studies. However, challenges such as unbalanced data distributions, complex multivariate data and the highly variable structural topologies demonstrated by real-world samples make it much more difficult to efficiently learn the associated representation. An important goal of scientific data analysis in medicine, particularly in neurosciences or oncology, is to understand the behavior of biological processes or physiological/morphological alterations. This introduces the need to synthesize large amounts of multivariate data in a robust manner and raises the fundamental question of data reduction: how to discover meaningful representations from unstructured high-dimensional medical images.

Several approaches have attempted to understand how dimension reduction and regression establish relationships in subspaces and, ultimately, how to determine statistics on manifolds that optimally describe the relationships between the samples [1]. However, most of these works make assumptions about representing shapes and images with smooth manifolds, which are frequently not adequate for medical imaging data, often perturbed by nuisance articulations, clutter or varying contrast.

High-dimensional classification methods have shown promise to measure subtle and spatially complex imaging patterns that have diagnostic value [2, 3]. Defining statistics on a manifold is not a straightforward process, since simple statistics cannot be directly applied to general manifolds [4]. While Euclidean estimators have been used for vector spaces, none have been adapted for multimodal data lying in different spaces. Still, there has been interest in the characterization of data in a Riemannian space [5, 6]. Unfortunately, manifold-valued metrics based on centrality theory or the geometric median [7] often lack robustness to outliers.

A related topic lies in dimensionally reduced growth trajectories of various anatomical sites, which have been investigated in the neurodevelopment of newborns, for example, based on geodesic shape regression to compute diffeomorphisms from image time series of a population [8]. These regression models were also used to estimate the spatiotemporal evolution of the cerebral cortex [9]. The concept of parallel transport curves in the tangent space of low-dimensional manifolds, proposed by Schiratti et al. [10], was used to analyze shape morphology [11] and adapted for radiotherapy response [12]. Regression models were proposed for both cortical and subcortical structures, with a 4D varifold-based learning framework with local topography-based shape morphing proposed by Rekik et al. [13].

This chapter presents several manifold learning methodologies designed to address challenges encountered in medical imaging. In Section 2, we present an articulated shape inference model from nonlinear embeddings, expressing the global and local shape variations of the spine and of the vertebrae composing it, introduced in [14]. We then present in Section 3 a probabilistic model based on discriminant manifolds to classify the neurodegenerative stage of Alzheimer’s disease. Finally, a piecewise-geodesic transport curve in the tangent space of low-dimensional manifolds, designed for the prediction of correction in spinal surgeries, is shown in Section 4, introducing a time-warping function controlling the rate of shape evolution. We conclude this chapter in Section 5.


2. Shape inference through navigating manifolds

Statistical models of shape variability have been successful in addressing fundamental vision tasks such as segmentation and registration in medical imaging. However, the high dimensionality and complex nonlinear underlying structure unfortunately makes the commonly used linear statistics inapplicable for anatomical structures. Manifold learning approaches map high-dimensional observation data that are presumed to lie on a nonlinear manifold, onto a single global coordinate system of lower dimensionality.

Inferring a model from the underlying manifold is not a novel concept, but it is far from trivial. In this section, we model both the global statistics of the articulated model and the local shape variations of vertebrae based on local measures in manifold space. We describe a spine inference/segmentation method from CT and MR images, where the model representation is optimized through a Markov Random Field (MRF) graph balancing the prior distribution with the image data.

2.1. Data representation

Our spine model $S = \{s_1, \ldots, s_L\}$ consists of an interconnection of $L$ vertebrae. For each vertebra $s_i$, we recover a triangular mesh with vertices $\{v_j^i\}_{j=1}^{V}$, where the $j$th vertex corresponds to approximately the same location from one shape to another and $V$ is the number of vertices. Additionally, every $s_i$ is annotated with landmarks used to rigidly register each object to its upper neighbor. Hence, an articulated deformable model (ADM) is represented by a vector of local intervertebral rigid transformations $A = [T_1, T_2, \ldots, T_L]$. To perform global shape modeling of $S$, we convert $A$ to an absolute representation $A_{abs} = [T_1, T_1 \circ T_2, \ldots, T_1 \circ T_2 \circ \cdots \circ T_L]$ using recursive compositions. The transformations are expressed in the local coordinate system (LCS) of the lower vertebra, whose center of transformation is the intersection of the three vertebral axes, following the anteroposterior, cranial-caudal and left-right directions. The rigid transformations described here combine a rotation matrix $R$, a translation $t$ and a scaling $s$. We formulate the rigid transformation $T = \{s, R, t\}$ of a triangular mesh model as $y = sRx + t$, where $x, y, t \in \mathbb{R}^3$.
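
For illustration, a minimal numpy sketch of this recursive composition is given below; the representation of each transform as an $(s, R, t)$ triple and the composition rule follow directly from $y = sRx + t$, but the function names are hypothetical.

```python
import numpy as np

def compose(T1, T2):
    """Composition T1 o T2 of two rigid transforms T = (s, R, t), so that
    applying the result is equivalent to applying T2 first, then T1."""
    s1, R1, t1 = T1
    s2, R2, t2 = T2
    return (s1 * s2, R1 @ R2, s1 * (R1 @ t2) + t1)

def absolute_representation(A_local):
    """Convert local intervertebral transforms [T_1, ..., T_L] into the
    absolute representation [T_1, T_1 o T_2, ..., T_1 o ... o T_L]."""
    A_abs, current = [], None
    for T in A_local:
        current = T if current is None else compose(current, T)
        A_abs.append(current)
    return A_abs

def apply_transform(T, x):
    """Apply y = s R x + t to a 3D point x."""
    s, R, t = T
    return s * (R @ x) + t
```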

2.2. Manifold embedding

For nonlinear embeddings, we rely on the absolute vector representation $A_{abs}$ given previously. Let us now consider $N$ articulated shape models expressed by the feature vectors $A_{abs}^i$, of dimensionality $D$. The aim is to create a low-dimensional manifold consisting of $N$ points $Y_i$, $Y_i \in \mathbb{R}^d$, $i \in [1, N]$, where $d \ll D$, based on [15]. In such a framework, if an adequate number of data points is available, the underlying manifold $\mathcal{M}$ is considered to be “well-sampled” and can therefore represent the underlying population structure. In the sub-cluster corresponding to a pathological population, each point of the training set and its neighbors would lie within a locally linear patch, as illustrated in Figure 1.

Figure 1.

Representation of intervertebral transformations in manifold space.

The main limitation of embedding algorithms is the assumption of Euclidean metrics in the ambient space to evaluate similarity between sample points. Thus, a metric in the space of articulated structures is defined so that it accommodates anatomical spine variability and adopts the intrinsic nature of the Riemannian manifold geometry, allowing us to discern between articulated shape deformations in a topologically invariant framework. For each point, the $K$ closest neighbors are selected using a distortion metric particularly suited for geodesics. The metric $d_M(A_{abs}^i, A_{abs}^j)$ estimates the distance between articulated models $i$ and $j$. The distance measure for absolute representations can therefore be expressed as a sum of articulation deviations:

$d_M(A_{abs}^i, A_{abs}^j) = \sum_{k=1}^{L} d_M(T_k^i, T_k^j) = \sum_{k=1}^{L} \left\| t_k^i - t_k^j \right\| + \sum_{k=1}^{L} d_G(R_k^i, R_k^j).$  (1)

While the L2 norm is chosen for the translations, geodesic distances are used between rotation neighborhoods. This is expressed as $d_G(R_k^i, R_k^j) = \left\| \log\left((R_k^i)^{-1} R_k^j\right) \right\|_F$, where the log map sends a point on the manifold to the tangent plane.
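
As an illustration, the articulated distance of Eq. (1) could be computed as sketched below, assuming each vertebra-level transform is stored as a rotation matrix and a translation vector; the function name is hypothetical and scipy's matrix logarithm is used for the rotation term.

```python
import numpy as np
from scipy.linalg import logm

def articulated_distance(A_i, A_j):
    """Distance d_M between two absolute articulated models (Eq. (1)):
    L2 norm on translations plus the geodesic distance between rotations,
    d_G(R_i, R_j) = || log(R_i^{-1} R_j) ||_F."""
    dist = 0.0
    for (R_i, t_i), (R_j, t_j) in zip(A_i, A_j):
        dist += np.linalg.norm(t_i - t_j)                     # translation term
        dist += np.linalg.norm(logm(R_i.T @ R_j), ord="fro")  # rotation geodesic
    return dist
```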

Afterwards, the manifold reconstruction weights are estimated by assuming the local geometry of the patches can be described by linear coefficients that permit the reconstruction of every model point from its neighbors. In order to determine the value of the weights, the reconstruction errors are measured using the following objective function:

$\varepsilon(W) = \sum_{i=1}^{N} \left\| A_{abs}^i - \sum_{j=1}^{K} W_{ij} A_{abs}^j \right\|^2$  (2)
subject to $W_{ij} = 0$ if $A_{abs}^i$ is not a neighbor of $A_{abs}^j$, and $\sum_{j} W_{ij} = 1$ for every $i$.  (3)

Thus, $\varepsilon(W)$ sums the squared distances between all data points and their corresponding reconstructed points. The weights $W_{ij}$ represent the importance of the $j$th data point to the reconstruction of the $i$th element.

The algorithm maps each high-dimensional $A_{abs}^i$ to a low-dimensional $Y_i$. These internal coordinates are found with a cost function minimizing the reconstruction error:

$\Phi(Y) = \sum_{i=1}^{N} \left\| Y_i - \sum_{j=1}^{K} W_{ij} Y_j \right\|^2 = \sum_{i=1}^{N} \sum_{j=1}^{N} M_{ij}\, Y_i^T Y_j$  (4)

with $M$ a sparse and symmetric $N \times N$ matrix enclosing the reconstruction weights $W_{ij}$, such that $M = (I - W)^T (I - W)$, and $Y$ spanning the $Y_i$’s. The optimal embedding, up to a global rotation, is obtained from the bottom $d+1$ eigenvectors of $M$, which minimize the cost function $\Phi(Y)$ as a simple eigenvalue problem. The $d$ eigenvectors form the $d$ embedding coordinates. The coordinates $Y_i$ can be translated by a constant displacement without affecting the overall cost $\Phi(Y)$. The eigenvector corresponding to the smallest eigenvalue represents the mean of the embedded data, $Y_0 = [y_1, \ldots, y_d]$ with $y_i = 0\ \forall i$; it can be discarded to enforce $\sum_i Y_i = 0$ and obtain an embedding centered at the origin. Hence, a new ADM can be inferred in the embedded $d$-space as a low-dimensional point $Y_{new}$ by finding its optimal manifold coordinates $y_i$.
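
For illustration, a compact numpy sketch of this embedding (Eqs. (2)-(4)) is given below. The optional `metric` argument is where the articulated distance $d_M$ of Eq. (1) would be plugged in for neighbor selection; the Gram-matrix regularization is an implementation assumption.

```python
import numpy as np

def lle_embedding(X, n_neighbors=8, d=3, reg=1e-3, metric=None):
    """Locally linear embedding (Roweis & Saul [15]) as a minimal sketch of
    Eqs. (2)-(4); an optional custom metric is used only for neighbor search."""
    N = X.shape[0]
    if metric is None:
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    else:
        D = np.array([[metric(X[i], X[j]) for j in range(N)] for i in range(N)])
    np.fill_diagonal(D, np.inf)
    neighbors = np.argsort(D, axis=1)[:, :n_neighbors]

    # Eqs. (2)-(3): reconstruction weights with a sum-to-one constraint
    W = np.zeros((N, N))
    for i in range(N):
        Z = X[neighbors[i]] - X[i]                      # local patch centered on sample i
        G = Z @ Z.T                                     # local Gram matrix
        G += reg * np.trace(G) * np.eye(n_neighbors)    # regularization (assumption)
        w = np.linalg.solve(G, np.ones(n_neighbors))
        W[i, neighbors[i]] = w / w.sum()

    # Eq. (4): embedding from the bottom eigenvectors of M = (I - W)^T (I - W)
    M = (np.eye(N) - W).T @ (np.eye(N) - W)
    eigvals, eigvecs = np.linalg.eigh(M)
    # discard the constant eigenvector (smallest eigenvalue), keep the next d
    return eigvecs[:, 1:d + 1]
```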

To obtain the articulation vector for a new embedded point in the ambient space (image domain), one has to determine its representation in high-dimensional space from its intrinsic coordinates. We first assume an explicit mapping $f : \mathcal{M} \to \mathbb{R}^D$ from the manifold space $\mathcal{M}$ to the ambient space $\mathbb{R}^D$. The inverse mapping of $Y_i$ is then performed by estimating the relationship between $\mathbb{R}^D$ and $\mathcal{M}$ as a joint distribution, such that there exists a smooth functional valid within a local neighborhood. Theoretically, the mapping should follow the conditional expectation:

$f(Y_i) \equiv E\left[ A_{abs}^i \mid \mathcal{M}(A_i) = Y_i \right] = \int_{\mathbb{R}^D} A_i \, \frac{p(Y_i, A_i)}{p_{\mathcal{M}(A_i)}(Y_i)} \, dA_i$  (5)

which captures the overall trend of the data in $\mathbb{R}^D$. Here, both $p_{\mathcal{M}(A_i)}(Y_i)$ (marginal density of $\mathcal{M}(A_i)$) and $p(Y_i, A_i)$ (joint density) are unknown. Based on the Nadaraya-Watson kernel regression [16], we replace the densities by kernel estimates $p_{\mathcal{M}(A_i)}(Y_i) = \frac{1}{K} \sum_{j \in N_i} G_h(Y_i - Y_j)$ and $p(Y_i, A_i) = \frac{1}{K} \sum_{j \in N_i} G_h(Y_i - Y_j)\, G_g(A_i - A_j)$ [17]. The Gaussian regression kernels $G$ require the neighbors $A_{abs}^j$, $j \in N_i$, to determine the bandwidths $h, g$ so that all $K$ data points are included ($N_i$ representing the neighborhood of $i$). Plugging these estimates into Eq. (5) gives:

$f_{NW}(Y_i) = \int_{\mathbb{R}^D} A_i \, \frac{\frac{1}{K}\sum_{j \in N_i} G_h(Y_i - Y_j)\, G_g(A_i - A_j)}{\frac{1}{K}\sum_{j \in N_i} G_h(Y_i - Y_j)} \, dA_i.$  (6)

By assuming $G$ is symmetric about the origin, we propose to integrate into the kernel regression estimator the manifold-based distortion metric $d_M$, which is particularly suited for geodesic metrics and articulated diffeomorphisms. This generalizes the expectation such that the observations $Y$ are defined in the manifold space $\mathcal{M}$:

$f_{NW}(Y_i) = \arg\min_{A_{abs}^i} \frac{\sum_{j \in N_i} G(Y_i - Y_j)\, d_M(A_{abs}^i, A_{abs}^j)}{\sum_{j \in N_i} G(Y_i - Y_j)}$  (7)

which integrates the distance metric $d_M(A_{abs}^i, A_{abs}^j)$ defined in Eq. (1) and updates $f_{NW}(Y_i)$ using the closest neighbors of point $Y_i$ in the manifold space. This constrains the regression to be valid for similar data points in its vicinity, since locality around $Y_i$ preserves locality in $A_{abs}^i$.
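
A simplified sketch of this back-mapping is given below; for brevity it replaces the argmin under the articulated metric $d_M$ in Eq. (7) with a kernel-weighted Euclidean average of the neighbors' articulation vectors, which is the classical Nadaraya-Watson form.

```python
import numpy as np

def nw_backmap(y_new, Y_train, A_train, h=1.0):
    """Map a manifold point back to the ambient articulation space with a
    Nadaraya-Watson estimator (sketch of Eq. (7), using a weighted Euclidean
    average instead of the argmin under the articulated metric d_M)."""
    diffs = Y_train - y_new                                     # distances in manifold space
    w = np.exp(-np.sum(diffs ** 2, axis=1) / (2.0 * h ** 2))    # Gaussian kernel G_h
    w /= w.sum()
    return (w[:, None] * A_train).sum(axis=0)                   # weighted combination of A_abs^j
```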

2.3. Optimization on manifold

Once an appropriate modeling of spine shape variations is determined with a manifold, a successful inference between the image and the manifold must be accomplished. We describe here how a new model is generated. We search for the optimal embedded manifold point $Y = [y_1, \ldots, y_d]$ of the global spine model. Such a strategy offers an ideal compromise between the prior constraints and the individual shape variations described by the weight vector $W = [w_1, \ldots, w_n]$ in a localized sub-patch. The energy $E$ of inferring the model $S$ in the image $I$ is a function of the set of displacement vectors $\Delta$ in the manifold space for the global shape representation. This involves: (a) a data-related term expressing the image cost, (b) a global prior term measuring the deformation between low-dimensional vectors with respect to the shape models, and (c) a higher-order term expressed by the reconstruction weights $\Omega$ for local vertebra modeling. The energy $E$ can be expressed as the following combination of a global and local optimization:

$E(S^0, I, \Delta, \Omega) = V(Y^0 + \Delta, I) + \alpha V_N(\Delta) + \beta V_H(\Delta, \Omega).$  (8)

The global alignment of the model with the target image primarily drives the deformation of the model. The purpose is to estimate the set of articulations describing the global spine model by determining its optimal representation $Y^0$ in the embedded space. This is performed by obtaining the global representation using the mapping in Eq. (7), so that $f_{NW}(Y_i + \Delta) = f_{NW}([y_1 + \delta_1, \ldots, y_d + \delta_d])$. This allows the model to be optimized in manifold coordinates while retrieving the articulations in $I$. The global cost can be expressed as:

$V(Y^0 + \Delta, I) = V\big(f_{NW}([y_1 + \delta_1, \ldots, y_d + \delta_d]),\, I\big).$  (9)

The inverse transform yields $A_{abs}^i + D$, with $D$ the deformations in the image space. Since the transformations $T_i$ are implicitly modeled in the absolute representation $A_{abs}^0$, we can formally express the singleton image-related term as a summation of costs associated with each of the $L$ vertebrae of the model:

$V(A_{abs}^0 + D, I) = \sum_{i=1}^{L} V_i\big(s_i(T_i^0 + d_i),\, I\big)$  (10)

where $V_i(s, I) = \sum_{v_i \in s} n_i^T(v_i)\, \nabla I(v_i)$ minimizes the distance between the mesh vertices of the inferred shape and the gradient image $\nabla I$ through a rigid transformation. Here, $n_i$ is the normal pointing outwards and $\nabla I(v_i)$ the image gradient at $v_i$.
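
As an illustration, this image term could be evaluated as sketched below, assuming a precomputed gradient volume and nearest-voxel sampling (both assumptions not specified in the text):

```python
import numpy as np

def image_data_term(vertices, normals, grad_image, spacing=(1.0, 1.0, 1.0)):
    """Sketch of the image term V_i(s, I): sum of vertex-normal / image-gradient
    alignments, sampled with nearest-neighbor interpolation (an assumption;
    trilinear sampling could be used instead). grad_image has shape (X, Y, Z, 3)."""
    cost = 0.0
    for v, n in zip(vertices, normals):
        idx = tuple(np.clip(np.round(v / np.asarray(spacing)).astype(int),
                            0, np.array(grad_image.shape[:3]) - 1))
        g = grad_image[idx]                  # 3-vector image gradient at the vertex
        cost += float(np.dot(n, g))          # alignment of outward normal with gradient
    return cost
```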

The prior constraint for the rigid alignment consists of pairwise potentials between neighboring models $y_i$, such that the difference in manifold coordinates is minimal with respect to a prior distribution of neighboring distances $P$:

$\alpha V_N(\Delta) = \alpha \sum_{i \in G} \sum_{j \in N(i)} V_{ij}\big(y_i^0 + \delta_i,\, y_j^0 + \delta_j \mid P\big).$  (11)

This term represents the smoothness component of the global cost function, ensuring that the deformations $\delta_i$ applied to the point coordinates are regular, with $V_{ij} \in [0, 1]$ a distance-assignment function based on the distances to $P$.

One can integrate the global data and prior terms along with the local shape terms parameterized as higher-order cliques, by combining Eqs. (9) and (11):

$E(S^0, I, \Delta, \Omega) = V\big(f_{NW}([y_1 + \delta_1, \ldots, y_d + \delta_d]),\, I\big) + \alpha \sum_{i \in G} \sum_{j \in N(i)} V_{ij}\big(y_i^0 + \delta_i,\, y_j^0 + \delta_j\big) + \beta \sum_{c \in C} V_c\big(w_c^0 + \omega_c\big).$  (12)

The optimization strategy of the resulting MRF (12) in the continuous domain is not a straightforward problem. The convexity of the solution domain is not guaranteed, while gradient-descent optimization approaches are prone to nonlinearity and local minima. We seek to assign the optimal labels $L^\Delta = \{l_1, \ldots, l_d\}$ and $L^\Omega = \{l_1, \ldots, l_n\}$, which are associated with the quantized space $\Delta$ of displacements and the local weight parameters $\Omega$, respectively. We consider that displacing the coordinates of point $y_i^0$ by $\delta_{l_i}$ is equivalent to assigning label $l_i$ to $y_i^0$. An incremental approach is adopted where, in each iteration $t$, we look for the set of labels that improves the current solution, s.t. $y_i^t = y_i^0 + \sum_t \delta_{l_i^t}$, which is a temporal minimization problem. Then, Eq. (12) can be rewritten as:

$E^t(L^\Delta, L^\Omega) = V\big(f_{NW}([y_1^{t-1} + \delta_{l_1^\Delta}, \ldots, y_d^{t-1} + \delta_{l_d^\Delta}]),\, I\big) + \alpha \sum_{i \in G} \sum_{j \in N(i)} V_{ij}\big(y_i^{t-1} + \delta_{l_i^\Delta},\, y_j^{t-1} + \delta_{l_j^\Delta}\big) + \beta \sum_{c \in C} V_c\big(w_c^{t-1} + \omega_{l_c^\Omega}\big).$  (13)

We solve the minimization of the higher-order cliques in Eq. (13) by transforming them into quadratic functions [18]. We apply the FastPD method [19], which solves the problem by formulating the duality theory in linear programming.
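
For illustration only, a greedy ICM-style stand-in for this incremental discrete search over quantized displacements is sketched below; the actual method uses FastPD [19], and `energy_fn` is a hypothetical callable evaluating Eq. (13):

```python
import numpy as np

def incremental_label_search(y0, energy_fn, step=0.1, n_iter=20):
    """Greedy stand-in (ICM-style) for the discrete MRF optimization: each
    manifold coordinate is displaced by one of a small set of quantized labels
    and the move is kept only if the total energy decreases."""
    labels = np.array([-step, 0.0, step])          # quantized displacement set (assumption)
    y = np.array(y0, dtype=float)
    best = energy_fn(y)
    for _ in range(n_iter):
        improved = False
        for i in range(len(y)):
            for delta in labels:
                cand = y.copy()
                cand[i] += delta
                e = energy_fn(cand)
                if e < best:
                    y, best, improved = cand, e, True
        if not improved:
            break
    return y, best
```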

2.4. Results

Manifold learning. The manifold was built from a database containing 711 scoliotic spines demonstrating several types of deformities. Each spine model in the database was obtained from biplanar radiographic stereo-reconstructions. It is modeled with 12 thoracic and 5 lumbar vertebrae (17 in total), represented by 6 landmarks on each vertebra (4 pedicle extremities and 2 endplate center points) which were manually identified by an expert on the radiographic images. The resulting manifold is shown in Figure 2.

Figure 2.

Low-dimensional manifold embedding of the spine dataset comprising 711 models exhibiting various types of deformities. The sub-domain was used to estimate both the global shape pose costs and individual shape instances based on local neighborhoods.

Adaptation of the articulated model was done on two different data sets. The first consisted of volumetric CT scans (512×512×251, in-plane resolution: 0.8×0.8 mm, slice thickness: 1–2 mm) of the lumbar and main thoracic regions obtained from 21 different patients, acquired for operative planning purposes. The MR dataset comprised multi-parametric volumetric data (256×256×160, in-plane resolution: 1.3×0.9 mm, slice thickness: 1 mm) of 8 patients, acquired for diagnostic purposes. For this study, only the T1 sequence was selected for the experiments. All patients in both datasets (29 in total) had 12 thoracic and 5 lumbar vertebrae. Both CT and MR data were manually annotated with 3D landmarks by an expert in radiology, corresponding to the left and right pedicle tips as well as the midpoints of the vertebral body. Segmentation of the vertebrae from the CT and MR slices was also performed by the same operator.

CT imaging experiments. We first evaluated the model accuracy in CT images by computing the correspondence of the inferred vertebral mesh models to the segmented target structures. As a preprocessing step, a rough thresholding was performed on the whole volume to filter out noise artifacts. The overall surface-to-surface comparison results between the inferred 3D vertebral models issued from the articulated model and the known segmentations were first calculated. The mean errors are 2.2 ± 1.5 mm (range: 0.6–5.4 mm) for thoracic vertebrae and 2.8 ± 1.9 mm (range: 0.7–8.1 mm) for lumbar vertebrae.

MR imaging experiments. For the experiments involving the segmentation of 3D spine models from MR images, the surface-to-surface comparison showed encouraging results (thoracic: 2.9 ± 1.8 mm, lumbar: 3.0 ± 1.9 mm) based on differences to ground truth. As in the previous experiments with CT imaging, ground-truth data were generated by manually segmenting the structure models, which were validated by an expert in radiology. As difficult as the CT inference is, the MR problem represents an even greater challenge, as the image resolution is more limited and the interslice spacing is larger than in CT. Modeling the statistical properties of the shape variations and global pose becomes even more important in this case, as it relies heavily on the nonlinear distribution of the patient morphology.

Advertisement

3. Probabilistic modeling of discriminant nonlinear manifolds in the identification of Alzheimer’s

Neurodegenerative pathologies, such as Alzheimer’s disease (AD), are linked with morphological and metabolic alterations which can be assessed from medical imaging and biological data. Recent advances in machine learning have helped to improve classification and prognosis rates, but lack a probabilistic framework to measure uncertainty in the data. In this section, we present a method to identify progressive mild cognitive impairment (MCI) and predict conversion to AD from MRI and positron emission tomography (PET) images. We present a discriminative probabilistic manifold embedding where locally linear mappings transform data points in low-dimensional space to corresponding points in high-dimensional space. A discriminant adjacency matrix is constructed to maximize the separation between different clinical groups, including MCI converters and nonconverters, while minimizing the distance between latent variables belonging to the same class.

3.1. Probabilistic model for discriminant manifolds

Manifold learning algorithms are based on the premise that data are often of artificially high dimension and can be embedded in a lower-dimensional space. However, the presence of outliers and multiclass information can affect the discrimination and/or generalization ability of the manifold. We propose to learn the optimal separation between four classes, (1) normal controls, (2) nonconverter MCI patients, (3) converter MCI patients and (4) AD patients, by using a discriminant graph-embedding. Here, $n$ labeled points $Y = \{y_i, l_i\}_{i=1}^{n}$ defined in $\mathbb{R}^D$ are generated from the underlying manifold $\mathcal{M}$, where $l_i$ denotes the label (NC, cMCI, nMCI or AD). For the labeled data, there exists a low-dimensional (latent) representation of the high-dimensional samples, $X = \{x_i, l_i\}_{i=1}^{n}$ defined in $\mathbb{R}^d$. We assume here that the mapping $M_i \in \mathbb{R}^{D \times d}$ between the high and low-dimensional spaces is locally linear, such that tangent spaces in local neighborhoods can be estimated from $(y_j - y_i)$ and $(x_j - x_i)$, the pairwise differences between connected neighbors $i, j$. Therefore, the relationship can be established as $y_j - y_i \approx M_i (x_j - x_i)$.

In order to effectively discover the low-dimensional embedding, it is necessary to maintain the local structure of the data in the new embedding. The graph $G = (V, W)$ is an undirected similarity graph, with a collection of nodes $V$ connected by edges, and a symmetric matrix $W$ whose elements describe the relationships between the nodes. The diagonal matrix $D$ and the Laplacian matrix $L$ are defined as $L = D - W$, with $D_{ii} = \sum_{j \ne i} W_{ij}, \forall i$.
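
For reference, a minimal sketch of these graph quantities, assuming $W$ is symmetric with a zero diagonal:

```python
import numpy as np

def graph_laplacian(W):
    """Degree matrix D and Laplacian L = D - W for a symmetric similarity
    matrix W with zero diagonal (a minimal sketch)."""
    D = np.diag(W.sum(axis=1))
    return D, D - W
```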

Using the theoretical framework from [20], we can determine a distribution of linear maps associated with the low-dimensional representation to describe the data likelihood for a specific model:

$\log p(Y \mid G) = \log \int p(Y, M, X \mid G)\, dX\, dM$  (14)

This joint distribution can be separated into three terms: priors on the linear maps and on the latent variables, and the likelihood of the high-dimensional points $Y$:

$p(Y, M, X \mid G) = p(Y \mid M, X, G)\, p(M \mid G)\, p(X \mid G)$  (15)

We now define the discriminant similarity graphs establishing the neighborhood relationships, as well as each of the three prior terms included in the joint distribution.

Within and between similarity graphs: In our work, the geometrical structure of $\mathcal{M}$ can be modeled by building a within-class similarity graph $W_w$ for feature vectors of the same group and a between-class similarity graph $W_b$ to separate features from all four classes. When constructing the discriminant locally linear latent variable embedding, elements are partitioned into $W_w$ and $W_b$. The intrinsic graph $G$ is first created by assigning edges only to samples of the same class (e.g., nMCI). Each sample is therefore reconstructed only from feature vectors of the same clinical group. Local reconstruction coefficients are incorporated in the within-class similarity graph, such that $W_w$ is defined as:

$W_w(i,j) = \begin{cases} 1, & \text{if } y_i \in N_w(y_j) \text{ or } y_j \in N_w(y_i) \\ 0, & \text{otherwise} \end{cases}$  (16)

with $N_w$ containing neighbors of the same class. Conversely, $W_b$ depicts the statistical properties to be avoided in the inference process. Distances between samples from different clinical groups are computed as:

$W_b(i,j) = \begin{cases} 1, & \text{if } y_i \in N_b(y_j) \text{ or } y_j \in N_b(y_i) \\ 0, & \text{otherwise} \end{cases}$  (17)

with $N_b$ containing neighbors having different class labels from the $i$th sample. The objective is to transform the points to a new manifold $\mathcal{M}$ of dimensionality $d$, i.e., $y_i \to x_i$, by mapping connected samples from the same group in $W_w$ as close as possible to the class cluster, while moving the NC, nMCI, cMCI and AD samples of $W_b$ as far away from one another as possible.
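
A minimal numpy sketch of building $W_w$ and $W_b$ (Eqs. (16)-(17)) from k-nearest neighbors is shown below; the Euclidean distance and the fixed neighborhood size are assumptions:

```python
import numpy as np

def discriminant_graphs(Y, labels, k=5):
    """Within-class (W_w) and between-class (W_b) similarity graphs of
    Eqs. (16)-(17), built from k-nearest neighbors of each sample."""
    n = Y.shape[0]
    Ww = np.zeros((n, n))
    Wb = np.zeros((n, n))
    dist = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    for i in range(n):
        same = np.where(labels == labels[i])[0]
        diff = np.where(labels != labels[i])[0]
        # k nearest neighbors within the same class -> within-class edges
        for j in same[np.argsort(dist[i, same])[:k]]:
            Ww[i, j] = Ww[j, i] = 1.0
        # k nearest neighbors from other classes -> between-class edges
        for j in diff[np.argsort(dist[i, diff])[:k]]:
            Wb[i, j] = Wb[j, i] = 1.0
    return Ww, Wb
```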

Model components: The prior on the latent variables $X$ centers them at the origin of the low-dimensional domain, while minimizing the Euclidean distance between neighboring points associated with the neighborhood of high-dimensional points and maximizing the distance between coordinates of different classes. With $\alpha$ an expected scale for the variables and $H_X$ the normalization term of the probability density, the following log prior is defined:

$\log p(X \mid W, \alpha) = -\frac{1}{2} \sum_{i=1}^{n} \left( \alpha \|x_i\|^2 + \sum_{j=1}^{n} W_w(i,j)\, \|x_i - x_j\|^2 - \sum_{j=1}^{n} W_b(i,j)\, \|x_i - x_j\|^2 \right) - \log H_X$  (18)

The prior added to the linear maps defines how similar the tangent planes described in the low and high-dimensional spaces are, based on the Frobenius norm. This prior ensures smooth manifolds:

$\log p(M \mid W) = -\frac{1}{2} \left( \sum_{i=1}^{n} \|M_i\|_F^2 + \sum_{i=1}^{n}\sum_{j=1}^{n} \big(W_w(i,j) - W_b(i,j)\big)\, \|M_i - M_j\|_F^2 \right) - \log H_M$  (19)

Finally, approximation errors from the linear mapping $M_i$ between the low and high-dimensional domains are penalized by including the following log likelihood:

$\log p(Y \mid M, X, W, \gamma) = -\sum_{i=1}^{n} \|y_i\|^2 - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} W_w(i,j)\, \Delta_{ij}^T (\gamma I)\, \Delta_{ij} + \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} W_b(i,j)\, \Delta_{ij}^T (\gamma I)\, \Delta_{ij} - \log H_Y$  (20)

with Δij the difference in Euclidean distance between pairs of neighbors in high and low-dimensional space and γ the update parameter for the EM inference. Samples of y are drawn from a multivariate normal distribution.
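
As an illustration, the unnormalized log prior of Eq. (18) could be evaluated as follows, assuming the graphs of Eqs. (16)-(17) and Euclidean latent coordinates (the normalizer $\log H_X$ is omitted):

```python
import numpy as np

def log_prior_latent(X, Ww, Wb, alpha=1.0):
    """Unnormalized log prior on the latent coordinates X (Eq. (18)):
    penalizes the scale of X and within-class spread, and rewards
    between-class spread."""
    scale = alpha * np.sum(X ** 2)
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
    within = np.sum(Ww * d2)
    between = np.sum(Wb * d2)
    return -0.5 * (scale + within - between)
```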

3.2. Variational inference

The objective is to infer the low-dimensional coordinates and the linear mapping functions for the described model, as well as the intrinsic parameters of the model $\Phi = \{\alpha, \gamma\}$. This is achieved by maximizing a variational lower bound on the marginal likelihood:

$\log p(Y \mid W, \Phi) \ge \int \rho(M, X)\, \log \frac{p(Y, M, X \mid W, \Phi)}{\rho(M, X)}\, dX\, dM.$  (21)

By assuming the posterior $\rho(M, X)$ can be factored into separate terms $\rho(M)$ and $\rho(X)$, a variational expectation-maximization algorithm can be used to determine the model’s parameters, which are initialized with $\Phi$. The E-step updates the independent posteriors $\rho(X)$ and $\rho(M)$, while the parameters $\Phi$ are updated in the M-step by maximizing Eq. (21).

The discriminant latent variable model can then be used to map new image feature vectors to the manifold. The variational EM algorithm described above can transform a set of new input points $y_q$ without changing the overall neighborhood graph structure, by finding the distribution of the local linear map $M_q$ and its low-dimensional coordinate using the E-step explained above. Once the manifold representation $x_q$ is obtained, a cluster analysis finds the corresponding class in the manifold, yielding a prediction for the input feature vector $y_q$.
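
Schematically, the alternating optimization can be organized as in the sketch below; the model-specific E-step, M-step and bound evaluations are passed in as hypothetical callables, since their closed-form updates are not reproduced in this chapter:

```python
def variational_em(Phi0, e_step, m_step, elbo, n_iter=50, tol=1e-6):
    """Generic alternating scheme for the variational EM of Section 3.2:
    the E-step updates the factored posteriors rho(X), rho(M), the M-step
    re-estimates Phi = {alpha, gamma}, and the loop stops when the lower
    bound of Eq. (21) no longer improves."""
    Phi = Phi0
    posteriors = None
    prev = -float("inf")
    for _ in range(n_iter):
        posteriors = e_step(Phi, posteriors)   # update rho(X) and rho(M)
        Phi = m_step(posteriors)               # maximize the bound w.r.t. Phi
        cur = elbo(Phi, posteriors)            # variational lower bound
        if abs(cur - prev) < tol:
            break
        prev = cur
    return Phi, posteriors
```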

3.3. Experiments

We used the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database with 1.5 or 3.0 T structural MR images (adni.loni.usc.edu) and FDG-PET images. For this study, 187 subjects with both MRI and PET images over a 24-month period were used to train the probabilistic manifold model, including 46 AD patients, 94 MCI patients, and 47 normal controls. During the follow-up period, 43 MCI subjects converted to AD and 56 remained stable. All groups are matched approximately by age (mean of 76.7 ± 5.4 years) and gender. Images were non-rigidly registered to a standard template, which was then segmented using FSL-FIRST automatic segmentation [21].

A 9-fold cross-validation was performed to assess the performance of the method. The optimal manifold dimensionality was set at $d = 8$, where the trend of the nonlinear residual reconstruction error stabilized for the entire training set. We evaluated the classification performance of the proposed method for discriminating between cMCI and nMCI patients by training the model with MRI, PET and MRI + PET biomarkers from the ROIs illustrated in Figure 3. Figure 4 presents the ROC curves obtained by the proposed and comparative methods, namely SVM (nonlinear RBF kernel), LLE and LL-LVM [20]. The discriminative nature of the proposed framework clearly shows an improvement over standard learning approaches trained using MRI only, PET only and combined multimodal features. It illustrates that increased accuracy (77.4%) can be achieved by combining MRI and PET features, showing the benefit of extracting complementary features from the dataset for prediction purposes. When comparing the performance of the proposed method to the other learning methods (SVM, LLE, LL-LVM), the probabilistic model integrating similarity graphs shows a statistically significant improvement (p < 0.01) over all three approaches based on paired t-tests.

Figure 3.

Selected FSL segmented brain regions for feature selection on (left) MRI and (right) PET images.

Figure 4.

ROC curves comparing the SVM, LLE and LL-LVM with the proposed method for cMCI/nMCI prediction using MRI, PET and multimodality data.


4. Spatiotemporal manifold prediction model for surgery prediction

In this final section, we present a statistical framework for predicting the surgical outcomes following spine surgery of adolescents with idiopathic scoliosis. A discriminant manifold is first constructed to maximize the separation between responsive and nonresponsive groups of patients. The model then uses subject-specific correction trajectories based on articulated transformations in order to map spine correction profiles to a group-average piecewise-geodesic path. Spine correction trajectories are described in a piecewise-geodesic fashion to account for varying times at follow-up exams, regressing the curve via a quadratic optimization process. To predict the evolution of correction, a baseline reconstruction is projected onto the manifold, from which a spatiotemporal regression model is built from parallel transport curves inferred from neighboring exemplars (Figure 5).

Figure 5.

Proposed prediction framework for spine surgery outcomes. In the training phase, a dataset of spine models is embedded in a spatiotemporal manifold $\mathcal{M}$, into responsive (R) or nonresponsive (NR) groups. During testing, an unseen baseline 3D spine reconstruction $y_q$ is projected onto $\mathcal{M}$ using $f_{NW}$, based on Nadaraya-Watson kernels. The closest samples to the projected point $x$ are selected to regress the spatiotemporal curve $\gamma$ used for predicting the correction achieved by surgery.

4.1. Discriminant embedding of spine models

We propose to embed a collection of nonresponsive (NR) and responsive (R) patients to surgery, offering maximal separation between the classes, by using a discriminant graph-embedding. Here, $n$ labeled points $Y = \{y_i, l_i, t_i\}_{i=1}^{n}$ defined in $\mathbb{R}^D$ are embedded in the low-dimensional manifold $\mathcal{M}$, where $l_i$ describes the label (NR or R) and $t_i$ the time of follow-up. We assume that for the sampled data, an underlying manifold of the high-dimensional data exists such that $X = \{x_i, l_i, t_i\}_{i=1}^{n}$ is defined in $\mathbb{R}^d$. We rely on the assumption that a locally linear mapping $M_i \in \mathbb{R}^{D \times d}$ exists, where local neighborhoods are defined as tangent planes estimated from $(y_j - y_i)$ and $(x_j - x_i)$, the paired differences between linked neighbors $i, j$. Hence, the relationship can be established as $y_j - y_i \approx M_i (x_j - x_i)$.

Because the discriminant manifold structure in $\mathbb{R}^d$ requires maintaining the local structure of the underlying data, an undirected similarity graph $G = (V, W)$ is built, where the nodes $V$ are connected to each other with edges weighted by $W$. The overall structure of $\mathcal{M}$ is therefore defined with $W_w$ for feature vectors belonging to the same class and $W_b$, which separates features from the two classes. During the embedding of the discriminant locally linear latent manifold, data samples are divided between $W_w$ and $W_b$.

4.2. Piecewise-geodesic spatiotemporal manifold

Once sample points $x_i$ are in manifold space, the objective is to regress a regular and smooth piecewise-geodesic curve $\gamma : [t_1, t_N] \to \mathcal{M}$ that accurately fits the embedded data describing the spatiotemporal correction following surgery within a 2-year period. For each sample $x_i$, the $K$ closest individuals demonstrating similar baseline features are identified from the embedded data, creating neighborhoods $N(x_q)$ with measurements at different time points. This creates a low-dimensional Riemannian manifold with data points $x_{i,j}$, with $i$ denoting a particular individual, $j$ the time-point measurement and $j = 0$ the preoperative model. By assuming the manifold domain is complete and piecewise-geodesic curves are defined for each time trajectory, time-labeled data can be regressed continuously in $\mathbb{R}^D$, thereby creating smooth curves in the time intervals described by samples in $\mathbb{R}^d$.

However, because the representation of the continuous curve is a variational problem in an infinite-dimensional space, the implementation follows a discretization process derived from the procedure in [22], such that:

$E(\gamma) = \frac{1}{K_d} \sum_{i=1}^{K_d} \sum_{j=0}^{t_N} w_i \left\| \gamma(t_{i,j}) - \big(x_{i,j} - x_{i,0} + x_q\big) \right\|^2 + \frac{\lambda}{2} \sum_{i=1}^{K_d} \alpha_i \|v_i\|^2 + \frac{\mu}{2} \sum_{i=1}^{K_d} \beta_i \|a_i\|^2.$  (22)

This minimization simplifies the problem to a quadratic optimization, solved with LU decomposition. The piecewise nature is captured by the term $K_d \subseteq N(x_q)$, defined as the samples along $\gamma$. The first component of Eq. (22) is a penalty term minimizing the geodesic distance between the samples $x_{i,j}$ and the regressed curve, where the $w_i$ are weight variables based on sample distances. This helps regress a curve lying close to the $x_{i,j}$, shifted by $x_q$ so that the initial reconstructions are co-registered. The second term represents the velocity of the curve (defined by $v_i$, approximating $\dot{\gamma}(t_i)$), minimizing the L2 norm of the first derivative of $\gamma$; minimizing the curve’s first derivatives prohibits discontinuities or rapid transitions of the curve’s direction, and this term is modulated by $\alpha_i$. Finally, an acceleration penalty term (defined by $a_i$) focuses on the second derivative of $\gamma$ with respect to $t_i$ by minimizing its L2 norm; the acceleration is modulated by $\beta_i$. Estimates of $v_i$ and $a_i$ (weighted by $\lambda$ and $\mu$, respectively) are generated using geometric finite differences. These estimates dictate the forward and backward step-sizes on the regressed curve, leading to directional vectors in $\mathcal{M}$ as shown in [22]. In order to minimize $E(\gamma)$, a nonlinear conjugate gradient technique defined in the low-dimensional space $\mathbb{R}^d$ is used, thus avoiding convergence and speed issues. The regressed curve $\gamma$ is therefore defined for all time points, originating at $t_0$, and creates a group average of spatiotemporal transformations based on individual correction trajectories.
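
A minimal sketch of this discrete regression under a flat (Euclidean) approximation of the low-dimensional space is given below; the knot parameterization, the normalization of time stamps to [0, 1] and the use of scipy's conjugate-gradient optimizer are implementation assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def regress_curve(times, targets, weights, n_knots=25, lam=1.0, mu=1.0):
    """Discrete piecewise regression in the spirit of Eq. (22), in a flat
    low-dimensional space: the data term pulls the curve toward the shifted
    samples x_{i,j} - x_{i,0} + x_q (the `targets`), while first- and
    second-difference penalties stand in for the velocity and acceleration
    terms. `times` are normalized time stamps in [0, 1]."""
    d = targets.shape[1]
    knot_times = np.linspace(0.0, 1.0, n_knots)
    idx = np.clip(np.searchsorted(knot_times, times), 0, n_knots - 1)  # knot per sample

    def energy(flat):
        g = flat.reshape(n_knots, d)
        data = np.sum(weights * np.sum((g[idx] - targets) ** 2, axis=1))
        vel = np.sum(np.diff(g, 1, axis=0) ** 2)       # ~ ||d gamma / dt||^2
        acc = np.sum(np.diff(g, 2, axis=0) ** 2)       # ~ ||d^2 gamma / dt^2||^2
        return data / len(targets) + 0.5 * lam * vel + 0.5 * mu * acc

    res = minimize(energy, np.zeros(n_knots * d), method="CG")  # conjugate gradient, as in [22]
    return knot_times, res.x.reshape(n_knots, d)
```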

4.3. Prediction of spine correction

Finally, to predict the evolution of spine correction from an unseen preoperative spine model, we use the regressed geodesic curve $\gamma$ on $\mathcal{M}$ modeling the spatiotemporal changes of the spine, where each point $x \in \mathcal{M}$ is associated with a speed vector $v$ defined in the tangent plane on the manifold, such that $v \in T_x\mathcal{M}$.

Based on Riemannian theory, an exponential mapping function at $x$ with velocity $v$ can be defined from the geodesics, $\exp_x^{\mathcal{M}}(v)$. Using this concept, parallel transport curves defined in $T_x\mathcal{M}$ can help define a series of time-indexed vectors along $\gamma$, as proposed by [10]. The collection of parallel transport curves allows an average trajectory to be generated in the ambient space $\mathbb{R}^D$, describing the spine changes due to the corrective forces of tethering. The general goal is to begin the process at the preoperative sample and navigate the piecewise-geodesic curve describing the correction evolution in time, where one can extract the appearance at any point (time) in $\mathbb{R}^D$ using the exponential mapping. For implementation purposes, the parallel transport curves are constrained within a smooth tubular boundary perpendicular to the curve (from an ICA) to generate the spatiotemporal evolution in the coordinate system of the preoperative model.

Hence, given the manifold at time t0 with v defined in the tangent plane and the regressed piecewise-geodesic curve γ, the parallel curve is obtained as:

$\eta^{v}_{\gamma}(s) = \exp^{\mathcal{M}}_{\gamma(s)}\big(P_{\gamma, t_0, s}(v)\big), \quad \forall s \in \mathbb{R}$  (23)

Therefore, by repeating this mapping for manifold points seen as samples of individual progression trajectories along $\gamma(s)$, an evolution model can be generated. Whenever a new sample is embedded, new sample points along $\gamma(s)$, denoted as $\eta^v_\gamma$, can be generated parallel to the regressed piecewise curve in $\mathcal{M}$, capturing the spatiotemporal changes in correction.

A time-warp function allowing $s$ to vary along the geodesic curve is described as $\phi_i(t) = \theta_i (t - t_0 - \tau_i) + t_0$. Here, we propose to incorporate a personalized acceleration factor based on the spine maturity and flexibility derived from the spine bending radiographs and the Risser grade: a coefficient $\theta_i = C_i \times R_i$ describing the change in Cobb angle $C_i$ between poses, modulated by the Risser grade $R_i$. This coefficient regulates the rate of correction based on the $K$ neighboring samples. Finally, to take into account the relative differences between the group-wise samples and the query model once mapped onto the regressed curve, a time-shift parameter $\tau_i$ is incorporated in the warp function.

For spine correction evolution, the displacement vectors $v_i$ are obtained by a PCA of the hyperplane crossing $T_{x_i}\mathcal{M}$ in the manifold $\mathcal{M}$ [10]. Hence, for any query sample $x_q$ representing the mapped preoperative 3D reconstruction (prior to surgery), the predicted model at time $t_k$ can be regressed from the piecewise-geodesic curve generated from the embedded samples $x$ in $N(x_q)$ such that:

$y_{q,t_k} = \eta^{v_q}_{\gamma}\big(\phi_i(t_k)\big) + \varepsilon_{q,t_k}$  (24)

which yields a predicted postoperative model $y_{q,t_k}$ in the high-dimensional space $\mathbb{R}^D$, with $\varepsilon_{q,t_k}$ drawn from a zero-mean Gaussian distribution. The generated model offers a complete constellation of interconnected vertebral models composing the spine shape $S$ at first-erect (FE), 1-year or 2-year visits, including landmarks on vertebral endplates and pedicle extremities, which can be used to capture the local shape morphology along the correction process.
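
As a toy illustration, the time warp and the prediction of Eq. (24) can be sketched as follows under a flat-space approximation, where the exponential map reduces to vector addition and parallel transport to the identity; `curve_fn` is a hypothetical callable returning $\gamma(s)$ in the low-dimensional space:

```python
import numpy as np

def time_warp(t, theta_i, tau_i, t0=0.0):
    """Subject-specific time warp phi_i(t) = theta_i * (t - t0 - tau_i) + t0,
    with theta_i = C_i * R_i (Cobb-angle change modulated by the Risser grade)."""
    return theta_i * (t - t0 - tau_i) + t0

def predict_correction(curve_fn, v_q, theta_i, tau_i, t_k, noise_std=0.0):
    """Sketch of Eq. (24) in a flat space: the exp-parallel curve reduces to
    eta(s) = gamma(s) + v_q, evaluated at the warped time phi_i(t_k)."""
    s = time_warp(t_k, theta_i, tau_i)
    eta = curve_fn(s) + v_q                                  # parallel-transported prediction
    noise = np.random.normal(0.0, noise_std, size=np.shape(eta))
    return eta + noise                                       # epsilon_{q,t_k} term
```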

4.4. Experiments

The discriminant manifold was trained from a database of 438 3D spine reconstructions generated from biplanar images [23], originating from 131 patients demonstrating several types of deformities, with immediate follow-up (FE), 1-year and 2-year visits. Patients were recruited from a single-center prospective study and divided into two groups, the first composed of 94 responsive patients showing a reduction in Cobb angle greater than or equal to 10° between the FE and follow-up visits. The second group was composed of 37 nonresponsive (NR) patients with a reduction of less than 10°. We evaluated the geometrical accuracy of the predictive manifold for 56 unseen surgical patients (mean age 12 ± 3 years; average main Cobb angle in the frontal plane at the first visit of 47 ± 10°), with predictions at t = 0 (FE), t = 12 and t = 24 months. For the predicted models, we evaluated the 3D root-mean-square difference of the generated vertebral landmarks, the Dice coefficients of the vertebral shapes and the error in the main Cobb angle. The results are shown in Table 1 and are compared with other techniques, namely biomechanical simulations performed on each subject using finite element modeling with ex-vivo parameters [25], a locally linear latent variable model [20] and a deep auto-encoder network [24]. Results from the predicted geometrical models show that the regressed spatiotemporal geodesic curve yields anatomically coherent structures, with accurate local vertebral morphology.

Method       | FE visit: 3D RMS (mm) / Dice (%) / Cobb (°) | 1-year visit: 3D RMS (mm) / Dice (%) / Cobb (°) | 2-year visit: 3D RMS (mm) / Dice (%) / Cobb (°)
Biomec. sim  | 3.3 ± 1.1 / 85 ± 3.4 / 2.8 ± 0.8            | 3.6 ± 1.2 / 84 ± 3.6 / 3.2 ± 0.9                | 4.1 ± 2.3 / 82 ± 3.9 / 3.6 ± 1.0
LL-LVM [20]  | 3.6 ± 1.4 / 83 ± 4.0 / 3.8 ± 1.5            | 4.7 ± 3.3 / 79 ± 4.4 / 5.5 ± 2.6                | 6.6 ± 4.4 / 71 ± 5.9 / 7.0 ± 3.9
Deep AE [24] | 4.1 ± 1.5 / 80 ± 4.4 / 5.1 ± 2.7            | 5.0 ± 1.9 / 77 ± 4.9 / 5.8 ± 3.0                | 6.3 ± 4.6 / 72 ± 5.7 / 6.6 ± 4.2
Proposed     | 2.4 ± 0.8 / 92 ± 2.7 / 1.8 ± 0.5            | 2.9 ± 0.9 / 90 ± 2.8 / 2.0 ± 0.7                | 3.2 ± 1.3 / 87 ± 3.1 / 2.1 ± 0.6

Table 1.

3D RMS errors (mm), Dice (%) and Cobb angles (°) for the proposed method, compared with biomechanical simulations, locally linear latent variable models (LL-LVM) and deep auto-encoders (AE).

Predictions are evaluated at the FE, 1-year and 2-year visits.


5. Discussion

Algorithms capable of extracting clinically relevant and meaningful descriptions from medical imaging datasets have become of widespread interest to theoreticians as well as practitioners in the medical field. Progress has accelerated in recent years by drawing on varied fields such as machine learning, geometry, statistics and genomics to propose new insights for the analysis of imaging and biological datasets. Toward this end, manifold learning has demonstrated a tremendous potential to learn the underlying representation of high-dimensional, complex imaging datasets.

We presented frameworks describing longitudinal, multimodal image features from neuroimaging data, using a Bayesian model of discriminant nonlinear manifolds to predict the conversion of progressive MCI to Alzheimer’s disease. This probabilistic method introduces class-dependent latent variables, based on the concept that local structure is transferred from the manifold to the high-dimensional domain. This variational learning method can ultimately assess uncertainty within the manifold domain, which can lead to a better understanding of the relationships between converters and nonconverters among patients with MCI.

Finally, a prediction method for the outcomes of spine surgery using geodesic parallel transport curves generated from probabilistic manifold models was presented. The mathematical models make it possible to describe patterns in a nonlinear and discriminant Riemannian framework by first distinguishing nonprogressive and progressive cases, followed by a prediction of structural evolution. The proposed model provides a way to analyze longitudinal samples from a geodesic curve in manifold space, thus simplifying the mixed effects when studying group-average trajectories.

References

  1. Yang Y, Dunson DB, et al. Bayesian manifold regression. The Annals of Statistics. 2016;44(2):876-905
  2. Davatzikos C, Resnick SM, Wu X, Parmpi P, Clark CM. Individual patient diagnosis of AD and FTD via high-dimensional pattern classification of MRI. NeuroImage. 2008;41(4):1220-1227
  3. Li S, Shi F, Pu F, Li X, Jiang T, Xie S, Wang Y. Hippocampal shape analysis of Alzheimer disease based on machine learning methods. American Journal of Neuroradiology. 2007;28(7):1339-1345
  4. Beg MF, Miller MI, Trouvé A, Younes L. Computing large deformation metric mappings via geodesic flows of diffeomorphisms. International Journal of Computer Vision. 2005;61(2):139-157
  5. Fletcher PT, Lu C, Joshi S. Statistics of shape via principal geodesic analysis on Lie groups. In: Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on. Vol. 1. IEEE; 2003. pp. I-I
  6. Pennec X. Intrinsic statistics on Riemannian manifolds: Basic tools for geometric measurements. Journal of Mathematical Imaging and Vision. 2006;25(1):127
  7. Fletcher PT, Venkatasubramanian S, Joshi S. The geometric median on Riemannian manifolds with application to robust atlas estimation. NeuroImage. 2009;45(1):S143-S152
  8. Singh N, Hinkle J, Joshi S, Fletcher PT. A hierarchical geodesic model for diffeomorphic longitudinal shape analysis. In: International Conference on Information Processing in Medical Imaging. Springer; 2013. pp. 560-571
  9. Fishbaugh J, Prastawa M, Gerig G, Durrleman S. Geodesic regression of image and shape data for improved modeling of 4D trajectories. In: 2014 International Symposium on Biomedical Imaging. IEEE; 2014. pp. 385-388
  10. Schiratti JB, Allassonnière S, Colliot O, Durrleman S. Learning spatiotemporal trajectories from manifold-valued longitudinal data. In: Advances in Neural Information Processing Systems. 2015:2404-2412
  11. Kadoury S, Mandel W, Roy-Beaudry, Nault ML, Parent S. 3-D morphology prediction of progressive spinal deformities from probabilistic modeling of discriminant manifolds. IEEE Transactions on Medical Imaging. 2017;36(5):1194-1204
  12. Chevallier J, Oudard S, Allassonnière S. Learning spatiotemporal piecewise-geodesic trajectories from longitudinal manifold-valued data. In: 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA; 2017
  13. Rekik I, Li G, Lin W, Shen D. Predicting infant cortical surface development using a 4D varifold-based learning framework and local topography-based shape morphing. Medical Image Analysis. 2016;28:1-12
  14. Kadoury S, Labelle H, Paragios N. Spine segmentation in medical images using manifold embeddings and higher-order MRFs. IEEE Transactions on Medical Imaging. 2013;32:1227-1238
  15. Roweis ST, Saul LK. Nonlinear dimensionality reduction by locally linear embedding. Science. 2000;290:2323-2326
  16. Nadaraya EA. On estimating regression. Theory of Probability and its Applications. 1964;10:186-190
  17. Davis B, Fletcher P, Bullitt E, Joshi S. Population shape regression from random design data. In: Proceedings of the 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2007. IEEE. 2007;1:1-8
  18. Rother C, Kohli P, Feng W, Jia J. Minimizing sparse higher order energy functions of discrete variables. In: Conference on Computer Vision and Pattern Recognition; 2009. pp. 1382-1389
  19. Komodakis N, Tziritas G, Paragios N. Performance vs computational efficiency for optimizing single and dynamic MRFs: Setting the state of the art with primal dual strategies. Computer Vision and Image Understanding. 2008;112(1):14-29
  20. Park M, Jitkrittum W, Qamar A, Szabó Z, Buesing L, Sahani M. Bayesian manifold learning: The locally linear latent variable model (LL-LVM). In: Advances in Neural Information Processing Systems. 2015:154-162
  21. Patenaude B, Smith SM, Kennedy DN, Jenkinson M. A Bayesian model of shape and appearance for subcortical brain segmentation. NeuroImage. 2011;56(3):907-922
  22. Boumal N, Absil PA. A discrete regression method on manifolds and its application to data on SO(n). IFAC Proceedings Volumes. 2011;44(1):2284-2289
  23. Humbert L, de Guise J, Aubert B, Godbout B, Skalli W. 3D reconstruction of the spine from biplanar X-rays using parametric models based on transversal and longitudinal inferences. Medical Engineering & Physics. 2009;31(6):681-687
  24. Thong W, Parent S, Wu J, Aubin CE, Labelle H, Kadoury S. Three-dimensional morphology study of surgical adolescent idiopathic scoliosis patient from encoded geometric models. European Spine Journal. 2016;25(10):3104-3113
  25. Cobetto N, Parent S, Aubin CE. 3D correction over 2 years with anterior vertebral body growth modulation: A finite element analysis of screw positioning, cable tensioning and postop functional activities. Clinical Biomechanics. 2018;51:26-33
