Open access peer-reviewed chapter

Manifold Learning in Medical Imaging

By Samuel Kadoury

Submitted: April 30th 2018 | Reviewed: July 6th 2018 | Published: November 5th 2018

DOI: 10.5772/intechopen.79989


Abstract

Manifold learning theory has seen a surge of interest in the modeling of large and extensive datasets in medical imaging, since it captures the essence of the data in a way that fundamentally outperforms linear methodologies, whose purpose is essentially to describe flat spaces. This problem is particularly relevant with medical imaging data, where linear techniques are frequently unsuitable for capturing variations in anatomical structures. In many cases, there is enough structure in the data (CT, MRI, ultrasound) that a lower-dimensional object can describe the degrees of freedom, such as in a manifold structure. Still, complex, multivariate distributions tend to demonstrate highly variable structural topologies that are impossible to capture with a single manifold learning algorithm. This chapter presents recent techniques developed in manifold theory for medical imaging analysis, allowing for statistical organ shape modeling, image segmentation and registration based on the concept of navigating manifolds, classification, as well as disease prediction models based on discriminant manifolds. We present the theoretical basis of these works, with illustrative results of their application to various organs and pathologies, including neurodegenerative diseases and spinal deformities.

Keywords

  • manifold learning
  • medical imaging
  • discriminant manifolds
  • piecewise geodesic regression
  • spine deformities
  • neurodegenerative diseases
  • shape modeling

1. Introduction

Learning on large medical imaging datasets is an emerging discipline, driven by the availability of vast amounts of raw data in many of today’s biomedical studies. However, challenges such as unbalanced data distributions, complex multivariate data and the highly variable structural topologies demonstrated by real-world samples make it much more difficult to efficiently learn the associated representation. An important goal of scientific data analysis in medicine, particularly in the neurosciences or oncology, is to understand the behavior of biological processes or physiological/morphological alterations. This introduces the need to synthesize large amounts of multivariate data in a robust manner and raises the fundamental question of data reduction: how to discover meaningful representations from unstructured, high-dimensional medical images.

Several approaches have attempted to understand how dimension reduction and regression establish relationships in subspaces, and ultimately to determine statistics on manifolds that optimally describe the relationships between the samples [1]. However, most of these approaches make assumptions about the representation of shapes and images as smooth manifolds, which is frequently inadequate for medical imaging data, often perturbed by nuisance articulations, clutter or varying contrast.

High-dimensional classification methods have shown promise for measuring subtle and spatially complex imaging patterns that have diagnostic value [2, 3]. Defining statistics on a manifold is not a straightforward process, since simple statistics cannot be directly applied to general manifolds [4]. While Euclidean estimators have been used for vector spaces, none have been adapted for multimodal data lying in different spaces. Still, there has been interest in the characterization of data in Riemannian spaces [5, 6]. Unfortunately, manifold-valued metrics based on centrality theory or the geometric median [7] often lack robustness to outliers.

A related topic lies in dimensionally reduced growth trajectories of various anatomical sites, which have been investigated in the neurodevelopment of newborns, for example, based on geodesic shape regression to compute diffeomorphisms from image time series of a population [8]. These regression models were also used to estimate the spatiotemporal evolution of the cerebral cortex [9]. The concept of parallel transport curves in the tangent space of low-dimensional manifolds, proposed by Schiratti et al. [10], was used to analyze shape morphology [11] and adapted for radiotherapy response [12]. Regression models were proposed for both cortical and subcortical structures, with a 4D varifold-based learning framework with local topography shape morphing proposed by Rekik et al. [13].

This chapter presents several manifold learning methodologies designed to address challenges encountered in medical imaging. In Section 2, we present an articulated shape inference model from nonlinear embeddings, expressing the global and local shape variations of the spine and of the vertebrae composing it, introduced in [14]. We then present in Section 3 a probabilistic model based on discriminant manifolds to classify the neurodegenerative stage of Alzheimer’s disease. Finally, a piecewise-geodesic transport curve in the tangent space of low-dimensional manifolds, designed for the prediction of correction in spinal surgeries, is shown in Section 4, introducing a time-warping function controlling the rate of shape evolution. We conclude this chapter in Section 5.

2. Shape inference through navigating manifolds

Statistical models of shape variability have been successful in addressing fundamental vision tasks such as segmentation and registration in medical imaging. However, the high dimensionality and complex nonlinear underlying structure unfortunately make the commonly used linear statistics inapplicable to anatomical structures. Manifold learning approaches map high-dimensional observation data, presumed to lie on a nonlinear manifold, onto a single global coordinate system of lower dimensionality.

Inferring a model from the underlying manifold is not a novel concept, but it is far from trivial. In this section, we model both the global statistics of the articulated model and the local shape variations of vertebrae based on local measures in manifold space. We describe a spine inference/segmentation method from CT and MR images, where the model representation is optimized through a Markov Random Field (MRF) graph, balancing the prior distribution with image data.

2.1. Data representation

Our spine model $S = \{s_1, \dots, s_L\}$ consists of an interconnection of $L$ vertebrae. For each vertebra $s_i$, we recover a triangular mesh with vertices $\{v_j^i\}_{j=1}^{V}$, where the $j$th vertex corresponds to approximately the same location from one shape to another and $V$ is the number of vertices. Additionally, every $s_i$ is annotated with landmarks on each model to rigidly register each object to its upper neighbor. Hence, an articulated deformable model (ADM) is represented by a vector of local intervertebral rigid transformations $A = [T_1, T_2, \dots, T_L]$. To perform global shape modeling of $S$, we convert $A$ to an absolute representation $A_{abs} = [T_1, T_1 \circ T_2, \dots, T_1 \circ T_2 \circ \cdots \circ T_L]$ using recursive compositions. The transformations are expressed in the local coordinate system (LCS) of the lower vertebra. The center of transformation is the intersection of the three vertebral axes, following the anteroposterior, cranial-caudal and left-right directions. The rigid transformations described here are the combination of a rotation matrix $R$, a translation $t$ and a scaling $s$. We formulate the rigid transformation $T = (s, R, t)$ of a triangular mesh model as $y = sRx + t$, where $x, y, t \in \mathbb{R}^3$.
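As an illustrative sketch (not the chapter's implementation), the recursive composition producing $A_{abs}$ can be written with transformations stored as hypothetical `(s, R, t)` tuples:

```python
import numpy as np

def rigid_transform(s, R, t, x):
    """Apply T = (s, R, t) to a point x: y = s * R @ x + t."""
    return s * (R @ x) + t

def compose(T1, T2):
    """Composition T1 o T2, i.e., apply T2 first, then T1."""
    s1, R1, t1 = T1
    s2, R2, t2 = T2
    return (s1 * s2, R1 @ R2, s1 * (R1 @ t2) + t1)

def absolute_representation(A):
    """Convert local transforms [T1, ..., TL] into
    A_abs = [T1, T1 o T2, ..., T1 o ... o TL] by recursive composition."""
    A_abs = [A[0]]
    for T in A[1:]:
        A_abs.append(compose(A_abs[-1], T))
    return A_abs
```

Since each vertebra's transform is expressed in the LCS of the lower vertebra, composing from the bottom up yields the absolute pose of each level.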

2.2. Manifold embedding

For nonlinear embeddings, we rely on the absolute vector representation $A_{abs}$ given previously. Let us now consider $N$ articulated shape models expressed by the feature vectors $A_{abs}^i$, of dimensionality $D$. The aim is to create a low-dimensional manifold consisting of $N$ points $Y_i$, with $Y_i \in \mathbb{R}^d$, $i \in [1, N]$, where $d \ll D$, based on [15]. In such a framework, if an adequate number of data points is available, then the underlying manifold $M$ is considered to be “well-sampled” and can therefore represent the underlying population structure. In the sub-cluster corresponding to a pathological population, each point of the training set and its neighbors would lie within a locally linear patch, as illustrated in Figure 1.

Figure 1.

Representation of intervertebral transformations in manifold space.

The main limitation of embedding algorithms is the assumption of Euclidean metrics in the ambient space to evaluate similarity between sample points. Thus, a metric in the space of articulated structures is defined so that it accommodates anatomical spine variability and adopts the intrinsic nature of Riemannian manifold geometry, allowing us to discern between articulated shape deformations in a topologically invariant framework. For each point, the $K$ closest neighbors are selected using a distortion metric which is particularly suited for geodesics. The metric $d_M(A_{abs}^i, A_{abs}^j)$ estimates the distance between articulated models $i$ and $j$. The distance measure for absolute representations can therefore be expressed as a sum of articulation deviations

$$d_M(A_{abs}^i, A_{abs}^j) = \sum_{k=1}^{L} d_M(T_k^i, T_k^j) = \sum_{k=1}^{L} \| t_k^i - t_k^j \|_2 + \sum_{k=1}^{L} d_G(R_k^i, R_k^j) \quad (1)$$

While the $L_2$ norm is chosen for the translations, geodesic distances are used between rotation neighborhoods. This is expressed as $d_G(R_k^i, R_k^j) = \| \log((R_k^i)^{-1} R_k^j) \|_F$, where the log map sends a point on the manifold to the tangent plane.
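The rotation term $d_G$ can be computed without forming an explicit matrix logarithm, since for a 3D rotation of angle $\theta$, $\|\log R\|_F = \sqrt{2}\,\theta$. A minimal sketch (the function names are ours, not from the chapter):

```python
import numpy as np

def geodesic_rotation_distance(R1, R2):
    """d_G(R1, R2) = ||log(R1^{-1} R2)||_F. The angle theta of the relative
    rotation is recovered from its trace, and ||log(R)||_F = sqrt(2)*theta."""
    R = R1.T @ R2                                        # R1^{-1} = R1^T (orthogonal)
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return np.sqrt(2.0) * np.arccos(cos_theta)

def rot_z(angle):
    """Rotation about the z-axis (used here only for illustration)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
```

The clipping guards against round-off pushing the cosine slightly outside $[-1, 1]$.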

Afterwards, the manifold reconstruction weights are estimated by assuming the local geometry of the patches can be described by linear coefficients that permit the reconstruction of every model point from its neighbors. In order to determine the value of the weights, the reconstruction errors are measured using the following objective function:

$$\varepsilon(W) = \sum_{i=1}^{N} \Big\| A_{abs}^i - \sum_{j=1}^{K} W_{ij}\, A_{abs}^j \Big\|^2 \quad (2)$$

$$\text{subject to } W_{ij} = 0 \text{ if } A_{abs}^i \text{ is not a neighbor of } A_{abs}^j, \qquad \textstyle\sum_j W_{ij} = 1 \text{ for every } i \quad (3)$$

Thus, $\varepsilon(W)$ sums the squared distances between all data points and their corresponding reconstructed points. The weight $W_{ij}$ represents the contribution of the $j$th data point to the reconstruction of the $i$th element.

The algorithm maps each high-dimensional $A_{abs}^i$ to a low-dimensional $Y_i$. These internal coordinates are found with a cost function minimizing the reconstruction error:

$$\Phi(Y) = \sum_{i=1}^{N} \Big\| Y_i - \sum_{j=1}^{K} W_{ij}\, Y_j \Big\|^2 = \sum_{i=1}^{N}\sum_{j=1}^{N} M_{ij}\, Y_i^T Y_j \quad (4)$$

with $M$ a sparse, symmetric $N \times N$ matrix enclosing the reconstruction weights $W_{ij}$, such that $M = (I - W)^T (I - W)$, and $Y$ spanning the $Y_i$’s. The optimal embedding, up to a global rotation, is obtained from the bottom $d+1$ eigenvectors of $M$, minimizing the cost function $\Phi(Y)$ as a simple eigenvalue problem. The $d$ eigenvectors form the $d$ embedding coordinates. The coordinates $Y_i$ can be translated by a constant displacement without affecting the overall cost $\Phi(Y)$. The eigenvector corresponding to the smallest eigenvalue corresponds to the mean value of the embedded data $Y_0 = (y_1, \dots, y_d)$ with $y_i = 0\ \forall i$; it can be discarded to obtain an embedding centered at the origin. Hence, a new ADM can be inferred in the embedded $d$-space as a low-dimensional point $Y_{new}$ by finding its optimal manifold coordinates $y_i$.
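A compact sketch of the embedding steps of Eqs. (2)–(4) follows, using plain Euclidean neighborhoods for simplicity instead of the articulated metric $d_M$; the function name and regularization constant are our own illustrative choices:

```python
import numpy as np

def lle_embedding(A, K, d):
    """Minimal locally linear embedding sketch (Eqs. 2-4).
    A: (N, D) array of feature vectors; K: neighbors; d: target dimension."""
    N = A.shape[0]
    W = np.zeros((N, N))
    for i in range(N):
        dists = np.linalg.norm(A - A[i], axis=1)
        nbrs = np.argsort(dists)[1:K + 1]             # K nearest neighbors (skip self)
        Z = A[nbrs] - A[i]                            # neighbors centered on A[i]
        C = Z @ Z.T + 1e-6 * np.trace(Z @ Z.T) * np.eye(K)  # regularized Gram matrix
        w = np.linalg.solve(C, np.ones(K))            # minimize Eq. (2) locally
        W[i, nbrs] = w / w.sum()                      # sum-to-one constraint (Eq. 3)
    M = (np.eye(N) - W).T @ (np.eye(N) - W)           # cost matrix of Eq. (4)
    vals, vecs = np.linalg.eigh(M)                    # ascending eigenvalues
    return vecs[:, 1:d + 1]                           # drop bottom (constant) eigenvector
```

`numpy.linalg.eigh` returns eigenvalues in ascending order, so dropping the first column discards the constant eigenvector corresponding to the mean.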

To obtain the articulation vector for a new embedded point in the ambient space (image domain), one has to determine the representation in high-dimensional space based on its intrinsic coordinates. We first assume an explicit mapping $f: M \to \mathbb{R}^D$ from the manifold space $M$ to the ambient space $\mathbb{R}^D$. The inverse mapping of $Y_i$ is then performed by estimating the relationship between $\mathbb{R}^D$ and $M$ as a joint distribution, such that there exists a smooth functional belonging to a local neighborhood. Theoretically, the manifold should follow the conditional expectation:

$$f(Y_i) \approx E\big[A^i \mid M_{A^i} = Y_i\big] = \int A^i \, \frac{p(Y_i, A^i)}{p_{M_{A^i}}(Y_i)} \, dA^i \quad (5)$$

which captures the overall trend of the data in $\mathbb{R}^D$. Here, both $p_{M_{A^i}}(Y_i)$ (marginal density of $M_{A^i}$) and $p(Y_i, A^i)$ (joint density) are unknown. Based on the Nadaraya-Watson kernel regression [16], we replace the densities by kernel estimates $p_{M_{A^i}}(Y_i) = \frac{1}{K}\sum_{j \in N_i} G_h(Y_i - Y_j)$ and $p(Y_i, A^i) = \frac{1}{K}\sum_{j \in N_i} G_h(Y_i - Y_j)\, G_g(A^i - A^j)$ [17]. The Gaussian regression kernels $G$ require the neighbors $A_{abs}^j$, $j \in N_i$, to determine the bandwidths $h, g$ so that all $K$ data points are included ($N_i$ representing the neighborhood of $i$). Plugging these estimates into Eq. (5) gives:

$$f_{NW}(Y_i) = \int A^i \, \frac{\frac{1}{K}\sum_{j \in N_i} G_h(Y_i - Y_j)\, G_g(A^i - A^j)}{\frac{1}{K}\sum_{j \in N_i} G_h(Y_i - Y_j)} \, dA^i \quad (6)$$

By assuming $G$ is symmetric about the origin, we propose to integrate into the kernel regression estimator the manifold-based distortion metric $d_M$, which is particularly suited for geodesic metrics and articulated diffeomorphisms. This generalizes the expectation such that the observations $Y$ are defined in the manifold space $M$:

$$f_{NW}(Y_i) = \arg\min_{A_{abs}^i} \frac{\sum_{j \in N_i} G(Y_i - Y_j)\, d_M(A_{abs}^i, A_{abs}^j)}{\sum_{j \in N_i} G(Y_i - Y_j)} \quad (7)$$

which integrates the distance metric $d_M(A_{abs}^i, A_{abs}^j)$ defined in Eq. (1) and updates $f_{NW}(Y_i)$ using the closest neighbors of the point $Y_i$ in manifold space. This constrains the regression to be valid for similar data points in its vicinity, since locality around $Y_i$ preserves locality in $A_{abs}^i$.
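With a Euclidean ambient metric, the estimator of Eq. (6) reduces to a kernel-weighted average of the neighbors' high-dimensional vectors; the following toy function (our simplification, not the $d_M$-based minimizer of Eq. (7)) illustrates the idea:

```python
import numpy as np

def nadaraya_watson_map(y_new, Y, A, h):
    """Map a manifold point y_new back to ambient space as the Gaussian
    kernel-weighted average of the training pairs (Y, A).
    Y: (N, d) embedded coordinates; A: (N, D) ambient vectors; h: bandwidth."""
    w = np.exp(-np.sum((Y - y_new) ** 2, axis=1) / (2.0 * h ** 2))
    return (w[:, None] * A).sum(axis=0) / w.sum()
```

For data generated by a linear map, the estimator recovers the map exactly at interior points of a symmetric sample grid, which makes it easy to sanity-check.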

2.3. Optimization on manifold

Once an appropriate modeling of spine shape variations is determined with a manifold, a successful inference between the image and the manifold must be accomplished. We describe here how a new model is generated. We search for the optimal embedded manifold point $Y = (y_1, \dots, y_d)$ of the global spine model. Such a strategy offers an ideal compromise between the prior constraints and the individual shape variations described by the weight vector $W = (w_1, \dots, w_n)$ in a localized sub-patch. The energy $E$ of inferring the model $S$ in the image $I$ is a function of the set of displacement vectors $\Delta$ in manifold space for the global shape representation. This involves: (a) a data-related term expressing the image cost; (b) a global prior term measuring the deformation between low-dimensional vectors and shape models; and (c) a higher-order term expressed by the reconstruction weights $\Omega$ for local vertebra modeling. The energy $E$ can be expressed as the following combination of global and local optimization:

$$E(S^0, I, \Delta, \Omega) = V(Y^0 + \Delta, I) + \alpha V_N(\Delta) + \beta V_H(\Delta, \Omega) \quad (8)$$

The global alignment of the model with the target image primarily drives the deformation of the model. The purpose is to estimate the set of articulations describing the global spine model by determining its optimal representation $Y^0$ in the embedded space. This is performed by obtaining the global representation using the mapping in (7), so that $f_{NW}(Y_i + \Delta) = f_{NW}(y_1 + \delta_1, \dots, y_d + \delta_d)$. This allows the model to be optimized in manifold-space coordinates while retrieving the articulations in $I$. The global cost can be expressed as:

$$V(Y^0 + \Delta, I) = V\big(f_{NW}(y_1 + \delta_1, \dots, y_d + \delta_d), I\big) \quad (9)$$

The inverse transform allows us to obtain $A_{abs}^i + D$, with $D$ the deformations in image space. Since the transformations $T_i$ are implicitly modeled in the absolute representation $A_{abs}^0$, we can formally consider the singleton image-related term as a summation of the costs associated with each of the $L$ vertebrae of the model:

$$V(A_{abs}^0 + D, I) = \sum_{i=1}^{L} V_i\big(s_i(T_i^0 + d_i), I\big) \quad (10)$$

where $V_i(s, I) = \sum_{v_i \in s} n_i^T(v_i)\, \nabla I(v_i)$ minimizes the distance between the mesh vertices of the inferred shape and the gradient image $\nabla I$ through a rigid transformation. Here, $n_i$ is the outward-pointing normal and $\nabla I(v_i)$ the image gradient at $v_i$.

The prior constraints for the rigid alignment are pairwise potentials between neighboring models $y_i$, such that the difference in manifold coordinates is minimal with regard to a prior distribution of neighboring distances $P$:

$$\alpha V_N(\Delta) = \alpha \sum_{i \in G} \sum_{j \in N_i} V_{ij}\big(y_i^0 + \delta_i, y_j^0 + \delta_j, P\big) \quad (11)$$

This term represents the smoothness component of the global cost function, ensuring that the deformations $\delta_i$ applied to the point coordinates are regular, with $V_{ij} \in [0, 1]$ a distance-assigning function based on the distances to $P$.

One can integrate the global data and prior terms, along with the local shape terms parameterized as higher-order cliques, by combining (9) and (11):

$$E(S^0, I, \Delta, \Omega) = V\big(f_{NW}(y_1 + \delta_1, \dots, y_d + \delta_d), I\big) + \alpha \sum_{i \in G}\sum_{j \in N_i} V_{ij}\big(y_i^0 + \delta_i, y_j^0 + \delta_j\big) + \beta \sum_{c \in C} V_c\big(w_c^0 + \omega_c\big) \quad (12)$$

The optimization of the resulting MRF (12) in the continuous domain is not a straightforward problem. The convexity of the solution domain is not guaranteed, while gradient-descent optimization approaches are prone to nonlinearity and local minima. We seek to assign the optimal labels $L^\Delta = (l_1, \dots, l_d)$ and $L^\Omega = (l_1, \dots, l_n)$, which are associated with the quantized space $\Delta$ of displacements and the local weight parameters $\Omega$, respectively. We consider that displacing the coordinates of a point $y_i^0$ by $\delta_{l_i}$ is equivalent to assigning the label $l_i$ to $y_i^0$. An incremental approach is adopted where, at each iteration $t$, we look for the set of labels that improves the current solution, such that $y_i^t = y_i^0 + t\,\delta_{l_i^t}$, which is a temporal minimization problem. Then (12) can be rewritten as:

$$E^t(L^\Delta, L^\Omega) = V\big(f_{NW}(y_1^{t-1} + \delta_{l_1^\Delta}, \dots, y_d^{t-1} + \delta_{l_d^\Delta}), I\big) + \alpha \sum_{i \in G}\sum_{j \in N_i} V_{ij}\big(y_i^{t-1}, y_j^{t-1}, \delta_{l_i^\Delta}, \delta_{l_j^\Delta}\big) + \beta \sum_{c \in C} V_c\big(w_c^{t-1}, \delta_{l_c^\Omega}\big) \quad (13)$$

We solve the minimization of the higher-order cliques in (13) by transforming them into quadratic functions [18]. We apply the FastPD method [19], which solves the problem by formulating the duality theory in linear programming.

2.4. Results

Manifold learning. The manifold was built from a database containing 711 scoliotic spines demonstrating several types of deformities. Each spine model in the database was obtained from biplanar radiographic stereo-reconstruction. It is modeled with 12 thoracic and 5 lumbar vertebrae (17 in total), represented by 6 landmarks per vertebra (4 pedicle extremities and 2 endplate center points), which were manually identified by an expert on the radiographic images. The resulting manifold is shown in Figure 2.

Figure 2.

Low-dimensional manifold embedding of the spine dataset comprising 711 models exhibiting various types of deformities. The sub-domain was used to estimate both the global shape pose costs and individual shape instances based on local neighborhoods.

Adaptation of the articulated model was done on two different datasets. The first consisted of volumetric CT scans (512×512×251, in-plane resolution: 0.8×0.8 mm, slice thickness: 1–2 mm) of the lumbar and main thoracic regions obtained from 21 different patients, acquired for operative planning purposes. The MR dataset comprised multi-parametric volumetric data (256×256×160, in-plane resolution: 1.3×0.9 mm, slice thickness: 1 mm) of 8 patients acquired for diagnostic purposes. For this study, only the T1 sequence was selected for the experiments. All patients in both datasets (29 in total) had 12 thoracic and 5 lumbar vertebrae. Both the CT and MR data were manually annotated with 3D landmarks by an expert in radiology, corresponding to the left and right pedicle tips as well as midpoints of the vertebral body. Segmentations of the vertebrae from the CT and MR slices were also made by the same operator.

CT imaging experiments. We first evaluated the model accuracy in CT images by computing the correspondence of the inferred vertebral mesh models to the segmented target structures. As a preprocessing step, a rough thresholding was performed on the whole volume to filter out noise artifacts. The overall surface-to-surface comparison between the inferred 3D vertebral models issued from the articulated model and the known segmentations was calculated first. The mean errors are 2.2 ± 1.5 mm (range: 0.6–5.4 mm) for thoracic vertebrae and 2.8 ± 1.9 mm (range: 0.7–8.1 mm) for lumbar vertebrae.

MR imaging experiments. For the experiments involving the segmentation of 3D spine models from MR images, the surface-to-surface comparison showed encouraging results (thoracic: 2.9 ± 1.8 mm, lumbar: 3.0 ± 1.9 mm) based on differences to ground truth. As in the previous experiments with CT imaging, ground-truth data were generated by manually segmenting the structures, validated by an expert in radiology. As difficult as the CT inference is, the MR problem represents an even greater challenge, as the image resolution is more limited and the interslice spacing is larger than in CT. Modeling the statistical properties of the shape variations and global pose becomes even more important in this case, as it relies heavily on the nonlinear distribution of the patient morphology.

3. Probabilistic modeling of discriminant nonlinear manifolds in the identification of Alzheimer’s

Neurodegenerative pathologies, such as Alzheimer’s disease (AD), are linked with morphological and metabolic alterations which can be assessed from medical imaging and biological data. Recent advances in machine learning have helped to improve classification and prognosis rates, but lack a probabilistic framework to measure uncertainty in the data. In this section, we present a method to identify progressive mild cognitive impairment (MCI) and predict conversion to AD from MRI and positron emission tomography (PET) images. We show a discriminative probabilistic manifold embedding where locally linear mappings transform data points in low-dimensional space to corresponding points in high-dimensional space. A discriminant adjacency matrix is constructed to maximize the separation between different clinical groups, including MCI converters and nonconverters, while minimizing the distance between latent variables belonging to the same class.

3.1. Probabilistic model for discriminant manifolds

Manifold learning algorithms are based on the premise that data are often of artificially high dimension and can be embedded in a lower-dimensional space. However, the presence of outliers and multiclass information can affect the discrimination and/or generalization ability of the manifold. We propose to learn the optimal separation between four classes: (1) normal controls (NC), (2) nonconverter MCI patients (nMCI), (3) converter MCI patients (cMCI) and (4) AD patients, by using a discriminant graph-embedding. Here, $n$ labeled points $Y = \{(y_i, l_i)\}_{i=1}^{n}$ defined in $\mathbb{R}^D$ are generated from the underlying manifold $M$, where $l_i$ denotes the label (NC, cMCI, nMCI or AD). For the labeled data, there exists a low-dimensional (latent) representation of the high-dimensional samples, $X = \{(x_i, l_i)\}_{i=1}^{n}$ defined in $\mathbb{R}^d$. We assume here that the mapping $M_i \in \mathbb{R}^{D \times d}$ between the high- and low-dimensional spaces is locally linear, such that tangent spaces in local neighborhoods can be estimated with $y_j - y_i$ and $x_j - x_i$, representing the pairwise differences between connected neighbors $i, j$. Therefore the relationship $y_j - y_i \approx M_i (x_j - x_i)$ can be established.

In order to effectively discover the low-dimensional embedding, it is necessary to maintain the local structure of the data in the new embedding. The graph $G = (V, W)$ is an undirected similarity graph, with a collection of nodes $V$ connected by edges, and the symmetric matrix $W$ whose elements describe the relationships between the nodes. The diagonal matrix $D$ and the Laplacian matrix $L$ are defined as $L = D - W$, with $D_{ii} = \sum_{j \neq i} W_{ij}\ \forall i$.
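For completeness, the Laplacian construction can be sketched in a few lines (illustrative only; the function name is ours):

```python
import numpy as np

def graph_laplacian(W):
    """L = D - W, with D_ii = sum_{j != i} W_ij (self-loop weights ignored)."""
    W = np.array(W, dtype=float, copy=True)
    np.fill_diagonal(W, 0.0)                 # drop any self-loop weights
    D = np.diag(W.sum(axis=1))
    return D - W
```

By construction every row of $L$ sums to zero, a property often used to verify the implementation.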

Using the theoretical framework from [20], we can determine a distribution of linear maps associated with the low-dimensional representation to describe the data likelihood for a specific model:

$$\log p(Y \mid G) = \log \int p(Y, M, X \mid G) \, dX \, dM \quad (14)$$

This joint distribution can be separated into three prior terms: the linear maps, latent variables and the likelihood of the high dimensional points Y:

$$p(Y, M, X \mid G) = p(Y \mid M, X, G)\, p(M \mid G)\, p(X \mid G) \quad (15)$$

We now define the discriminant similarity graphs establishing the neighborhood relationships, as well as each of the three prior terms included in the joint distribution.

Within and between similarity graphs: In our work, the geometrical structure of $M$ can be modeled by building a within-class similarity graph $W_w$ for feature vectors of the same group and a between-class similarity graph $W_b$ to separate features from all four classes. When constructing the discriminant locally linear latent variable embedding, elements are partitioned into $W_w$ and $W_b$. The intrinsic graph $G$ is first created by assigning edges only to samples of the same class (e.g., nMCI). Each sample is therefore reconstructed only from feature vectors of the same clinical group. Local reconstruction coefficients are incorporated in the within-class similarity graph, such that $W_w$ is defined as:

$$W_w(i,j) = \begin{cases} 1 & \text{if } y_i \in N_w(y_j) \text{ or } y_j \in N_w(y_i) \\ 0 & \text{otherwise} \end{cases} \quad (16)$$

with $N_w$ containing neighbors of the same class. Conversely, $W_b$ depicts the statistical properties to be avoided in the inference process. Distances between samples from different clinical groups are computed as:

$$W_b(i,j) = \begin{cases} 1 & \text{if } y_i \in N_b(y_j) \text{ or } y_j \in N_b(y_i) \\ 0 & \text{otherwise} \end{cases} \quad (17)$$

with $N_b$ containing neighbors whose class labels differ from that of the $i$th sample. The objective is to transform the points to a new manifold $M$ of dimensionality $d$, i.e., $y_i \to x_i$, by mapping connected samples of the same group in $W_w$ as close as possible to their class cluster, while moving the NC, nMCI, cMCI and AD samples of $W_b$ as far away from one another as possible.
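The two adjacency matrices of Eqs. (16) and (17) can be sketched as follows; the $k$-nearest-neighbor rule and function name are our own illustrative assumptions:

```python
import numpy as np

def discriminant_graphs(Y, labels, k):
    """Build within-class (W_w) and between-class (W_b) adjacency matrices:
    an edge is set if either point is among the other's k nearest
    same-class (resp. different-class) neighbors."""
    n = len(Y)
    Ww = np.zeros((n, n))
    Wb = np.zeros((n, n))
    D = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)  # pairwise distances
    for i in range(n):
        same = np.where(labels == labels[i])[0]
        same = same[same != i]                       # exclude the sample itself
        diff = np.where(labels != labels[i])[0]
        for j in same[np.argsort(D[i, same])][:k]:   # k nearest same-class neighbors
            Ww[i, j] = Ww[j, i] = 1.0
        for j in diff[np.argsort(D[i, diff])][:k]:   # k nearest other-class neighbors
            Wb[i, j] = Wb[j, i] = 1.0
    return Ww, Wb
```

Symmetrizing the assignments implements the "either point is a neighbor of the other" condition of the equations.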

Model components: The prior added on the latent variables $X$ places them at the origin of the low-dimensional domain, while minimizing the Euclidean distance between neighboring points associated with neighboring high-dimensional points and maximizing the distance between coordinates of different classes. With $\alpha$ an expected scale and $H_X$ a normalization term of the probability density function, the following log prior is defined:

$$\log p(X \mid W, \alpha) = -\frac{1}{2} \sum_{i=1}^{n} \Big( \alpha \|x_i\|^2 + \sum_{j=1}^{n} W_w(i,j)\, \|x_i - x_j\|^2 - \sum_{j=1}^{n} W_b(i,j)\, \|x_i - x_j\|^2 \Big) - \log H_X \quad (18)$$

The prior added to the linear maps measures how similar the tangent planes described in the low- and high-dimensional spaces are, based on the Frobenius norm. This prior ensures smooth manifolds:

$$\log p(M \mid W) = -\frac{1}{2} \Big( \sum_{i=1}^{n} \|M_i\|_F^2 + \sum_{i=1}^{n}\sum_{j=1}^{n} \big(W_w(i,j) - W_b(i,j)\big)\, \|M_i - M_j\|_F^2 \Big) - \log H_M \quad (19)$$

Finally, approximation errors of the linear mapping $M_i$ between the low- and high-dimensional domains are penalized by including the following log likelihood:

$$\log p(Y \mid M, X, W, \gamma) = -\sum_{i=1}^{n} \|y_i\|^2 - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} W_w(i,j)\, \Delta_{ij}^T (\gamma I)\, \Delta_{ij} + \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} W_b(i,j)\, \Delta_{ij}^T (\gamma I)\, \Delta_{ij} - \log H_Y \quad (20)$$

with $\Delta_{ij}$ the difference in Euclidean distance between pairs of neighbors in the high- and low-dimensional spaces, and $\gamma$ the update parameter for the EM inference. Samples of $y$ are drawn from a multivariate normal distribution.

3.2. Variational inference

The objective is to infer the low-dimensional coordinates and the linear mapping function of the described model, as well as its intrinsic parameters $\Phi = (\alpha, \gamma)$. This is achieved by maximizing a variational lower bound on the marginal likelihood:

$$\log p(Y \mid W, \Phi) \geq \int \rho(M, X) \log \frac{p(Y, M, X \mid W, \Phi)}{\rho(M, X)} \, dX \, dM \quad (21)$$

By assuming the posterior $\rho(M, X)$ can be factored into separate terms $\rho(M)$ and $\rho(X)$, a variational expectation-maximization algorithm can be used to determine the model’s parameters, which are initialized with $\Phi$. The E-step updates the independent posteriors $\rho(X)$ and $\rho(M)$, while the parameters $\Phi$ are updated in the M-step by maximizing Eq. (21).

The discriminant latent variable model can then be used to map new image feature vectors to the manifold. The variational EM algorithm described above can transform a set of new input points $y_q$ without changing the overall neighborhood graph structure, by finding the distribution of the local linear map of $y_q$ and its low-dimensional coordinate using the E-step explained above. Once the manifold representation $x_q$ is obtained, a cluster analysis finds the corresponding class in the manifold, yielding a prediction for the input feature vector $y_q$.
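The final cluster-analysis step is not fully specified in the text; a minimal nearest-class-centroid rule, as one plausible instantiation (the function and rule are our own assumption), could look like:

```python
import numpy as np

def predict_class(x_q, X, labels):
    """Assign an embedded query point x_q to the class whose centroid in the
    manifold is closest (hypothetical clustering rule, for illustration)."""
    classes = np.unique(labels)
    centroids = np.array([X[labels == c].mean(axis=0) for c in classes])
    return classes[np.argmin(np.linalg.norm(centroids - x_q, axis=1))]
```

Any other clustering of the embedded training points (e.g., k-means per class) could be substituted without changing the surrounding pipeline.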

3.3. Experiments

We used the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database with 1.5 or 3.0 T structural MR images (adni.loni.usc.edu) and FDG-PET images. For this study, 187 subjects with both MRI and PET images over a 24-month period were used to train the probabilistic manifold model, including 46 AD patients, 94 MCI patients and 47 normal controls. During the follow-up period, 43 MCI subjects converted to AD and 56 remained stable. All groups were approximately matched by age (mean of 76.7 ± 5.4) and gender. Images were non-rigidly registered to a standard template, which was then segmented using FSL-FIRST automatic segmentation [21].

A 9-fold cross-validation was performed to assess the performance of the method. The optimal manifold dimensionality was set at $d = 8$, where the trend of the nonlinear residual reconstruction error stabilized for the entire training set. We evaluated the classification performance of the proposed method for discriminating between cMCI and nMCI patients by training the model with MRI, PET and MRI + PET biomarkers from the ROIs illustrated in Figure 3. Figure 4 presents ROC curves obtained by the proposed method and comparative methods such as SVM (nonlinear RBF kernel), LLE and LL-LVM [20]. The discriminative nature of the proposed framework clearly shows an improvement over standard learning approaches trained using MRI only, PET only and combined multimodal features. It illustrates that increased accuracy (77.4%) can be achieved by combining MRI and PET features, showing the benefit of extracting complementary features from the dataset for prediction purposes. When comparing the performance of the proposed method to the other learning methods (SVM, LLE, LL-LVM), the probabilistic model integrating similarity graphs shows a statistically significant improvement ($p < 0.01$) over all three approaches based on paired t-tests.

Figure 3.

Selected FSL segmented brain regions for feature selection on (left) MRI and (right) PET images.

Figure 4.

ROC curves comparing the SVM, LLE and LL-LVM with the proposed method for cMCI/nMCI prediction using MRI, PET and multimodality data.

4. Spatiotemporal manifold prediction model for surgery prediction

In this final section, we present a statistical framework for predicting the surgical outcomes following spine surgery of adolescents with idiopathic scoliosis. A discriminant manifold is first constructed to maximize the separation between responsive and nonresponsive groups of patients. The model then uses subject-specific correction trajectories based on articulated transformations in order to map spine correction profiles to a group-average piecewise-geodesic path. Spine correction trajectories are described in a piecewise-geodesic fashion to account for varying times at follow-up exams, regressing the curve via a quadratic optimization process. To predict the evolution of correction, a baseline reconstruction is projected onto the manifold, from which a spatiotemporal regression model is built from parallel transport curves inferred from neighboring exemplars (Figure 5).

Figure 5.

Proposed prediction framework for spine surgery outcomes. In the training phase, a dataset of spine models is embedded in a spatiotemporal manifold $M$, into responsive (R) or nonresponsive (NR) groups. During testing, an unseen baseline 3D spine reconstruction $y_q$ is projected on $M$ using $f_{NW}$, based on Nadaraya-Watson kernels. The closest samples to the projected point $x$ are selected to regress the spatiotemporal curve $\gamma$ used for predicting the correction expected from surgery.

4.1. Discriminant embedding of spine models

We propose to embed a collection of (1) nonresponsive (NR) and (2) responsive (R) patients to surgery in a way that offers maximal separation between the classes, by using a discriminant graph-embedding. Here, $n$ labeled points $Y = \{(y_i, l_i, t_i)\}_{i=1}^{n}$ defined in $\mathbb{R}^D$ are embedded in the low-dimensional manifold $M$, where $l_i$ describes the label (NR or R) and $t_i$ defines the time of follow-up. We assume that for the sampled data an underlying manifold of the high-dimensional data exists, such that $X = \{(x_i, l_i, t_i)\}_{i=1}^{n}$ is defined in $\mathbb{R}^d$. We rely on the assumption that a locally linear mapping $M_i \in \mathbb{R}^{D \times d}$ exists, where local neighborhoods are defined as tangent planes estimated with $y_j - y_i$ and $x_j - x_i$, describing the paired distances between linked neighbors $i, j$. Hence, the relationship $y_j - y_i \approx M_i (x_j - x_i)$ can be established.

Because the discriminant manifold structure in $\mathbb{R}^d$ requires maintaining the local structure of the underlying data, an undirected similarity graph $G = (V, W)$ is built, where the nodes $V$ are connected by edges weighted by the matrix $W$. The overall structure of $M$ is therefore defined with $W_w$ for feature vectors belonging to the same class, and $W_b$, which separates features from the two classes. During the embedding of the discriminant locally linear latent manifold, data samples are divided between $W_w$ and $W_b$.

4.2. Piecewise-geodesic spatiotemporal manifold

Once the sample points $x_i$ are in manifold space, the objective is to regress a regular and smooth piecewise-geodesic curve $\gamma: [t_1, t_N] \to M$ that accurately fits the embedded data describing the spatiotemporal correction following surgery within a 2-year period. For each sample $x_i$, the $K$ closest individuals demonstrating similar baseline features are identified from the embedded data, creating neighborhoods $N(x_q)$ with measurements at different time points. This yields a low-dimensional Riemannian manifold with data points $x_{i,j}$, where $i$ denotes a particular individual, $j$ the time-point measurement, and $j = 0$ the preoperative model. By assuming the manifold domain is complete and that piecewise-geodesic curves are defined for each time trajectory, time-labeled data can be regressed continuously in $\mathbb{R}^D$, thereby creating smooth curves over the time intervals described by the samples in $\mathbb{R}^d$.

However, because the representation of the continuous curve is a variational problem over an infinite-dimensional space, the implementation follows a discretization process derived from the procedure in [22], such that:

$$E(\gamma) = \frac{1}{K_d}\sum_{i=1}^{K_d}\sum_{j=0}^{t_N} w_i \left\| \gamma(t_{i,j}) - (x_{i,j} - x_{i,0}) - x_q \right\|^2 + \frac{\lambda}{2}\sum_{i=1}^{K_d} \alpha_i \|v_i\|^2 + \frac{\mu}{2}\sum_{i=1}^{K_d} \beta_i \|a_i\|^2 \qquad (22)$$

This minimization process simplifies the problem to a quadratic optimization, solved with LU decomposition. The piecewise nature is represented by the term $K_d \subset N(x_q)$, defined as samples along $\gamma$. The first component of Eq. (22) is a penalty term minimizing the geodesic distance between samples $x_{i,j}$ and the regressed curve, where $w_i$ are weight variables based on sample distances. This helps regress a curve lying close to the $x_{i,j}$, shifted by $x_q$ in order to have the initial reconstructions co-registered. The second term represents the velocity of the curve (defined by $v_i$, approximating $\dot{\gamma}(t_i)$), minimizing the $L_2$ norm of the first derivative of $\gamma$. Minimizing the curve's first derivative prohibits discontinuities or rapid transitions in the curve's direction, and is modulated by $\alpha_i$. Finally, an acceleration penalty term (defined by $a_i$) focuses on the second derivative of $\gamma$ with respect to $t_i$ by minimizing its $L_2$ norm, modulated by $\beta_i$. Estimates for $v_i$ and $a_i$ (weighted by $\lambda$ and $\mu$, respectively) are generated using geometric finite differences. These estimates dictate the forward and backward step sizes on the regressed curve, leading to directional vectors in $\mathcal{M}$ as shown in [22]. To minimize $E(\gamma)$, a nonlinear conjugate gradient technique defined in the low-dimensional space $\mathbb{R}^d$ is used, thus avoiding convergence and speed issues. The regressed curve $\gamma$ is therefore defined for all time points, originating at $t_0$. The curve creates a group average of spatiotemporal transformations based on individual correction trajectories.
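The structure of the discretized energy can be sketched with plain Euclidean finite differences standing in for the geometric finite differences of [22], and with $\alpha_i$, $\beta_i$ folded into the global weights $\lambda$, $\mu$. All of these simplifications, and the uniform time grid, are assumptions for illustration only.

```python
import numpy as np

def curve_energy(gamma, t_idx, targets, w, lam=1.0, mu=1.0):
    """Discretized curve energy: data fit + velocity + acceleration penalties.

    gamma   : (N, d) discretized curve samples at uniform time steps
    t_idx   : integer time indices where observations exist
    targets : (len(t_idx), d) shifted observations (x_ij - x_i0, co-registered)
    w       : (len(t_idx),) per-sample weights
    """
    # Data-fit term: squared distances between curve and observations
    data = np.sum(w * np.sum((gamma[t_idx] - targets) ** 2, axis=1))
    # Velocity term: finite-difference approximation of the 1st derivative
    v = np.diff(gamma, axis=0)
    vel = 0.5 * lam * np.sum(v ** 2)
    # Acceleration term: finite-difference approximation of the 2nd derivative
    a = np.diff(gamma, n=2, axis=0)
    acc = 0.5 * mu * np.sum(a ** 2)
    return data / len(t_idx) + vel + acc
```

Because every term is quadratic in the samples of `gamma`, the minimizer reduces to a sparse linear system, which is consistent with the LU-decomposition solve described above.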

4.3. Prediction of spine correction

Finally, to predict the evolution of spine correction from an unseen preoperative spine model, we use the geodesic curve $\gamma$ modeling the spatiotemporal changes of the spine in $\mathbb{R}^D$, where each point $x \in \mathcal{M}$ is associated with a speed vector $v$ defined in the tangent plane of the manifold, such that $v \in T_x\mathcal{M}$.

Based on Riemannian theory, an exponential mapping function at $x$ with velocity $v$ can be defined from the geodesics, such that $\mathrm{Exp}^{\mathcal{M}}_{x}(v)$. Using this concept, parallel transport curves defined in $T_x\mathcal{M}$ can help define a series of time-indexed vectors along $\gamma$, as proposed by [10]. The collection of parallel transport curves makes it possible to generate an average trajectory in ambient space $\mathbb{R}^D$, describing the spine changes due to the corrective forces of tethering. The general goal is to begin the process at the preoperative sample and navigate the piecewise-geodesic curve describing correction evolution in time, where one can extract the appearance at any point (time) in $\mathbb{R}^D$ using the exponential mapping. For implementation purposes, the parallel transport curves are constrained within a smooth tubular boundary perpendicular to the curve (from an ICA) to generate the spatiotemporal evolution in the coordinate system of the preoperative model.

Hence, given the manifold at time $t_0$ with $v$ defined in the tangent plane, and the regressed piecewise-geodesic curve $\gamma$, the parallel curve is obtained as:

$$\eta^{v}(\gamma, s) = \mathrm{Exp}^{\mathcal{M}}_{\gamma(s)}\left( P_{\gamma, t_0, s}(v) \right), \quad s \in \mathbb{R}, \qquad (23)$$

where $P_{\gamma, t_0, s}$ denotes the parallel transport of $v$ along $\gamma$ from $t_0$ to $s$.

Therefore, by repeating this mapping for manifold points seen as samples of individual progression trajectories along $\gamma(s)$, an evolution model can be generated. Whenever a new sample is embedded, new sample points along $\gamma(s)$, denoted as $\eta^{v}(\gamma, s)$, can be generated parallel to the regressed piecewise curve in $\mathcal{M}$, capturing the spatiotemporal changes in correction.
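The chapter's manifold $\mathcal{M}$ is learned from data, but the exponential-map construction of Eq. (23) can be made concrete on a toy analytic manifold where it has a closed form. The unit sphere below is purely an illustrative assumption, not the learned spine manifold: shooting a geodesic from $x$ with tangent velocity $v$ lands at $\cos(\|v\|)\,x + \sin(\|v\|)\,v/\|v\|$.

```python
import numpy as np

def sphere_exp(x, v):
    """Exponential map on the unit sphere: shoot a geodesic from x
    with initial tangent velocity v (requires v . x = 0, |x| = 1)."""
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return x.copy()          # zero velocity: stay at x
    return np.cos(nv) * x + np.sin(nv) * (v / nv)
```

On the learned manifold the same role is played by a numerical exponential map, and the parallel-transported vectors $P_{\gamma,t_0,s}(v)$ supply the velocity argument at each point $\gamma(s)$.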

A time warp function allowing $s$ to vary along the geodesic curve is described as $\phi_i(t) = \theta_i (t - t_0 - \tau_i) + t_0$. Here, we propose to incorporate a personalized acceleration factor based on the spine maturity and flexibility, derived from the spine bending radiographs and Risser grade. A coefficient $\theta_i = C_i \times R_i$ describes the change in Cobb angle $C_i$ between poses, modulated by the Risser grade $R_i$. This coefficient regulates the rate of correction based on the $K$ neighboring samples. Finally, to take into account the relative differences between the group-wise samples and the query model once mapped onto the regressed curve, a time-shift parameter $\tau_i$ is incorporated in the warp function.
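Reading the warp as $\phi_i(t) = \theta_i(t - t_0 - \tau_i) + t_0$ (our reconstruction of the flattened formula), a one-line sketch makes the roles of the two subject-specific parameters explicit:

```python
def time_warp(t, theta_i, tau_i, t0=0.0):
    """Subject-specific time warp phi_i(t) = theta_i * (t - t0 - tau_i) + t0.

    theta_i : acceleration factor (e.g., Cobb-angle change x Risser grade)
    tau_i   : time shift aligning the subject onto the group trajectory
    """
    return theta_i * (t - t0 - tau_i) + t0
```

A subject with $\theta_i > 1$ traverses the group-average correction trajectory faster than average, while $\tau_i$ slides the subject's onset earlier or later along it.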

For spine correction evolution, displacement vectors $v_i$ are obtained by a PCA of the hyperplane crossing $T_{x_i}\mathcal{M}$ in manifold $\mathcal{M}$ [10]. Hence, for any query sample $x_q$ representing the mapped preoperative 3D reconstruction (prior to surgery), the predicted model at time $t_k$ can be regressed from the piecewise-geodesic curve generated from the embedded samples $x \in N(x_q)$, such that:

$$y_{q,t_k} = \eta^{v_q}\left(\gamma, \phi_i(t_k)\right) + \varepsilon_{q,t_k}, \qquad (24)$$

which yields a predicted postoperative model $y_{q,t_k}$ in the high-dimensional space $\mathbb{R}^D$, where $\varepsilon_{q,t_k}$ follows a zero-mean Gaussian distribution. The generated model offers a complete constellation of interconnected vertebral models composing the spine shape $S$ at the first-erect (FE), 1-year or 2-year visits, including landmarks on vertebral endplates and pedicle extremities, which can be used to capture the local shape morphology along with the correction process.

4.4. Experiments

The discriminant manifold was trained from a database of 438 3D spine reconstructions generated from biplanar images [23], originating from 131 patients demonstrating several types of deformities, with immediate follow-up (FE), 1-year and 2-year visits. Patients were recruited from a single-center prospective study and divided in two groups: the first group was composed of 94 responsive (R) patients showing a reduction in Cobb angle greater than or equal to 10° between the FE and follow-up visits, while the second group was composed of 37 nonresponsive (NR) patients with a reduction of less than 10°. We evaluated the geometrical accuracy of the predictive manifold for 56 unseen surgical patients (mean age 12 ± 3 years; average main Cobb angle in the frontal plane at the first visit of 47° ± 10°), with predictions at t = 0 (FE), t = 12 and t = 24 months. For the predicted models, we evaluated the 3D root-mean-square difference of the generated vertebral landmarks, the Dice coefficients of the vertebral shapes and the difference in the main Cobb angle. The results are shown in Table 1, compared against other techniques: biomechanical simulations performed on each subject using finite element modeling with ex-vivo parameters [25], a locally linear latent variable model [20] and a deep auto-encoder network [24]. Results from the predicted geometrical models show the regressed spatiotemporal geodesic curve yields anatomically coherent structures, with accurate local vertebral morphology.

| Method | 3D RMS (FE) | Dice (FE) | Cobb (FE) | 3D RMS (1 yr) | Dice (1 yr) | Cobb (1 yr) | 3D RMS (2 yr) | Dice (2 yr) | Cobb (2 yr) |
|---|---|---|---|---|---|---|---|---|---|
| Biomech. sim. | 3.3 ± 1.1 | 85 ± 3.4 | 2.8 ± 0.8 | 3.6 ± 1.2 | 84 ± 3.6 | 3.2 ± 0.9 | 4.1 ± 2.3 | 82 ± 3.9 | 3.6 ± 1.0 |
| LL-LVM [20] | 3.6 ± 1.4 | 83 ± 4.0 | 3.8 ± 1.5 | 4.7 ± 3.3 | 79 ± 4.4 | 5.5 ± 2.6 | 6.6 ± 4.4 | 71 ± 5.9 | 7.0 ± 3.9 |
| Deep AE [24] | 4.1 ± 1.5 | 80 ± 4.4 | 5.1 ± 2.7 | 5.0 ± 1.9 | 77 ± 4.9 | 5.8 ± 3.0 | 6.3 ± 4.6 | 72 ± 5.7 | 6.6 ± 4.2 |
| Proposed | 2.4 ± 0.8 | 92 ± 2.7 | 1.8 ± 0.5 | 2.9 ± 0.9 | 90 ± 2.8 | 2.0 ± 0.7 | 3.2 ± 1.3 | 87 ± 3.1 | 2.1 ± 0.6 |

Table 1. 3D RMS errors (mm), Dice coefficients (%) and Cobb angle errors (°) for the proposed method, compared with biomechanical simulations, locally linear latent variable models (LL-LVM) and deep auto-encoders (AE). Predictions are evaluated at the FE, 1-year and 2-year visits.

5. Discussion

Algorithms capable of extracting clinically relevant and meaningful descriptions from medical imaging datasets have become of widespread interest to theoreticians as well as practitioners in the medical field. In recent years the pace has accelerated, drawing on varied fields such as machine learning, geometry, statistics and genomics to propose new insights for the analysis of imaging and biological datasets. Toward this end, manifold learning has demonstrated a tremendous potential to learn the underlying representation of high-dimensional, complex imaging datasets.

We presented frameworks describing longitudinal, multimodal image features from neuroimaging data using a Bayesian model for discriminant nonlinear manifolds to predict the conversion of progressive MCI to Alzheimer's disease. This probabilistic method introduces class-dependent latent variables, based on the concept that local structure is transferred from the manifold to the high-dimensional domain. This variational learning method can ultimately assess uncertainty within the manifold domain, which can lead to a better understanding of the relationships between converters and nonconverters among patients with MCI.

Finally, a prediction method for the outcomes of spine surgery using geodesic parallel transport curves generated from probabilistic manifold models was presented. The mathematical models make it possible to describe patterns in a nonlinear and discriminant Riemannian framework by first distinguishing nonresponsive from responsive cases, followed by a prediction of structural evolution. The proposed model provides a way to analyze longitudinal samples from a geodesic curve in manifold space, thus simplifying the mixed effects when studying group-average trajectories.

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Samuel Kadoury (November 5th 2018). Manifold Learning in Medical Imaging, Manifolds II - Theory and Applications, Paul Bracken, IntechOpen, DOI: 10.5772/intechopen.79989.
