Open access peer-reviewed chapter - ONLINE FIRST

Alzheimer’s Disease Computer-Aided Diagnosis on Positron Emission Tomography Brain Images Using Image Processing Techniques

By Mouloud Adel, Imene Garali, Xiaoxi Pan, Caroline Fossati, Thierry Gaidon, Julien Wojak, Salah Bourennane and Eric Guedj

Submitted: May 28th 2018; Reviewed: March 29th 2019; Published: June 5th 2019

DOI: 10.5772/intechopen.86114

Abstract

Positron emission tomography (PET) is a molecular medical imaging modality that is commonly used for the diagnosis of neurodegenerative diseases. Computer-aided diagnosis (CAD), based on medical image analysis, could help with the quantitative evaluation of brain diseases such as Alzheimer’s disease (AD). This chapter presents methods for ranking brain volumes of interest (VOIs) by their ability to separate healthy or normal control (HC or NC) subjects from AD subjects in brain PET images. Brain images are first mapped into anatomical VOIs using an atlas. Statistical, graph, and connectivity-based features are then computed on these VOIs. Top-ranked VOIs are then input into a support vector machine (SVM) classifier. The developed methods are evaluated on a local image database as well as on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) public database and are compared with known feature selection methods. In the two-group separation task, the new approaches achieved better classification results than the methods they were compared with.

Keywords

  • machine learning
  • computer-aided diagnosis
  • first order statistics
  • feature selection
  • positron emission tomography
  • classification
  • Alzheimer’s disease

1. Introduction

Alzheimer’s disease (AD) is a degenerative and incurable brain disease and is considered the main cause of dementia in elderly people worldwide. At present, around 90 million people have been diagnosed with AD, and the number of AD patients is estimated to reach 300 million by 2050 [1, 2]. Diagnosis relies on clinical, neuroimaging, and neuropsychological assessments. Neuroimaging evaluation is based on nonspecific features such as cerebral atrophy, which appears very late in the progression of the disease. Advances in neuroimaging biomarkers help to identify AD in its prodromal stage, mild cognitive impairment (MCI) [3]. Therefore, developing new approaches for early and specific recognition of AD is of crucial importance [4]. Positron emission tomography (PET) is now widely used in neuroscience to characterize the brain molecular mechanisms involved in healthy and pathological models [5]. Applied to the brain, PET imaging provides a noninvasive evaluation of various biomarkers, such as the metabolic rate of glucose, cerebral blood flow, and neurotransmission, as well as of pathological processes such as neuroinflammation, amyloid deposits, and, more recently, tubulin-associated unit (tau) aggregates. In this line, PET using 18F-fluorodeoxyglucose (18F-FDG) has been proposed as a biomarker of Alzheimer’s disease [6, 7, 8, 9, 10]. 18F-FDG-PET provides 3D volumetric brain imaging of the cerebral metabolic rate of glucose, which is thought to reflect synaptic activity and thus the functional brain. However, the metabolic abnormalities associated with AD, although evident on average within a group of patients or in individual cases with advanced disease, can be more difficult to confirm in an individual by purely visual interpretation, particularly in the earliest stages of the disease, when current treatments (and those under development) would be most effective.
This visual interpretation is, in addition, subjective and strongly dependent on the experience of the physician. Individual, quantitative, computer-aided approaches are thus needed for the medical management of brain diseases. There has been growing interest in using the cerebral metabolic rate of glucose for AD classification and for predicting conversion from MCI to AD [11, 12, 13]. Four main groups of methods have been studied: voxels-as-features (VAF)-based [14], discriminative voxel selection-based [15, 16], atlas-based [17, 18, 19, 20], and projection-based methods [21]. Recently, deep learning, and more specifically convolutional neural networks (CNNs) and recurrent neural networks (RNNs), has been investigated for image classification in computer-aided diagnosis of Alzheimer’s disease. These approaches show better results than many of the machine learning techniques previously proposed for recognition tasks [22, 23, 24, 25, 26, 27].

In this chapter, the main steps of a computer-aided diagnosis system for AD based on 18F-FDG-PET brain images are described. Section 2.2 is devoted to data acquisition and preprocessing. Section 2.3 describes the features computed from brain PET images and how they are used to rank and select VOIs. The classification step and its results are reported in Section 2.4. Finally, Section 3 gives a conclusion and outlines future work.

2. Materials and methods

2.1. Overview of the computer-aided diagnosis system

A computer-aided diagnosis (CAD) system (see Figure 1) consists of several stages: image acquisition and preprocessing, feature extraction, feature selection, and classification.

Figure 1.

Flowchart of a CAD system.

In this chapter, the described CAD system is atlas-based and works on 18F-FDG-PET images from a local and a public database. Each PET brain image is mapped into anatomical volumes of interest (VOIs), from which features are extracted and then input to a classifier, instead of inputting all the voxels of the brain image. The goal of such a system is to provide physicians with tools that help them to classify AD and HC subjects.

2.2. Image dataset and preprocessing

2.2.1. Image dataset

Two datasets are used in this chapter to evaluate the two approaches described below. The local database (Table 1) consists of 18F-FDG-PET scans collected in the Nuclear Medicine Department of the “La Timone” University Hospital (Marseille, France). It includes 171 adults aged 50–90 years: 81 patients with AD, 61 healthy control (HC) subjects, and 29 subjects with mild cognitive impairment (MCI). HC subjects were free from neurological or psychiatric disease and from cognitive complaints and had a normal brain MRI. AD subjects met the NINCDS-ADRDA [28] clinical criteria for probable AD.

| | HC | AD | MCI |
|---|---|---|---|
| Number | 61 | 81 | 29 |
| Male/female | 24/37 | 32/49 | 12/17 |
| Age (mean [min, max]) | 68.18 [50, 86] | 70.60 [50, 90] | 67.55 [50, 85] |

Table 1.

Demographic and clinical information of subjects of the local database.

The second dataset (Table 2) was obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial MRI, PET, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of MCI and early AD.

| Characteristic | HC | AD | MCI |
|---|---|---|---|
| Number of subjects | 90 | 94 | 88 |
| Female/male | 34/56 | 38/56 | 32/56 |
| Age (mean ± SD) | 76.08 ± 5.01 | 75.83 ± 7.37 | 76.71 ± 6.63 |
| MMSE (mean ± SD) | 23.46 ± 2.14 | 28.97 ± 1.15 | 26.92 ± 1.62 |

Table 2.

Demographic and clinical information of subjects of ADNI database.

In total, 272 post-processed baseline FDG-PET scans were obtained from ADNI: 94 subjects with AD, 88 subjects with MCI, and 90 NC subjects.

2.2.2. Image preprocessing

Comparing brain images from different subjects is difficult because of the complexity and anatomical variability of brain structures. Image data were therefore preprocessed in three steps: spatial normalization, smoothing, and intensity normalization. Spatial normalization was done by voxel-level registration using SPM8 software [11]; the data were spatially normalized onto the Montreal Neurological Institute (MNI) atlas. The images were then smoothed using a Gaussian filter with an 8 mm full width at half maximum (FWHM) to increase the signal-to-noise ratio (SNR) [20]. Finally, intensity normalization was required in order to compare images from different subjects directly. It consisted in dividing the intensity of each voxel by the mean intensity of the brain’s global gray matter VOI.
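
The last two preprocessing steps can be illustrated with a minimal NumPy sketch. It is not the SPM8 implementation: the function names, the boolean gray matter mask, and the 2 mm voxel size used in the example are assumptions made for illustration only.

```python
import math
import numpy as np

def fwhm_to_sigma(fwhm_mm, voxel_size_mm):
    """Convert a Gaussian FWHM (mm) to a standard deviation in voxel units.

    Uses the identity FWHM = 2 * sqrt(2 * ln 2) * sigma.
    voxel_size_mm is an assumed isotropic voxel size.
    """
    sigma_mm = fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return sigma_mm / voxel_size_mm

def normalize_intensity(volume, gray_matter_mask):
    """Divide every voxel by the mean intensity inside the gray matter mask."""
    volume = np.asarray(volume, dtype=float)
    gm_mean = volume[gray_matter_mask].mean()
    return volume / gm_mean
```

For an 8 mm FWHM and hypothetical 2 mm voxels, `fwhm_to_sigma(8.0, 2.0)` gives a kernel standard deviation of about 1.7 voxels. After `normalize_intensity`, the mean of the gray matter voxels is 1 for every subject, which is what makes intensities comparable across subjects.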

2.3. Feature extraction

Each 3D PET brain image was segmented into 116 volumes of interest (VOIs) using the automated anatomical labeling (AAL) atlas. In this work, the ability of VOIs to distinguish AD from HC subjects was studied. Different parameter combinations for each VOI were used to select and rank VOIs according to their ability to separate the AD group from the HC group. The top-ranked VOIs were then fed into a classifier. Several levels of features were extracted from the VOIs, and two approaches were investigated to achieve this goal.

2.3.1. Separation power factor

In the first approach, only features that extract the statistical information from each VOI are computed. First order statistics and the entropy are extracted from the histogram h(x) of each VOI:

$$h(x) = \frac{\text{number of voxels in a given VOI with gray level } x}{\text{total number of voxels in the given VOI}} \tag{1}$$

where x is a gray-level value of a voxel belonging to a VOI and lmin and lmax are the minimum and the maximum gray-level values in VOI, respectively.

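$$P_1 = \sum_{x=l_{\min}}^{l_{\max}} x\,h(x) \quad \text{(mean)} \tag{2}$$
$$P_2 = \sum_{x=l_{\min}}^{l_{\max}} (x - P_1)^2\,h(x) \quad \text{(variance)} \tag{3}$$
$$P_z = \frac{\sum_{x=l_{\min}}^{l_{\max}} (x - P_1)^z\,h(x)}{P_2^{z/2}}, \quad z \in \{3, 4\} \quad \text{(skewness and kurtosis)} \tag{4}$$
$$P_5 = -\sum_{x=l_{\min}}^{l_{\max}} h(x)\,\log_2 h(x) \quad \text{(entropy)} \tag{5}$$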
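
As an illustration, the five histogram parameters can be computed as in the sketch below. It assumes that the VOI voxels have already been quantized to discrete gray levels, that the VOI is not constant (nonzero variance), and that skewness and kurtosis are standardized by the variance, which is the usual convention. The function name is hypothetical.

```python
import numpy as np

def first_order_features(voi_voxels):
    """Return (mean, variance, skewness, kurtosis, entropy) of a VOI histogram.

    voi_voxels: discrete gray levels of all voxels in one VOI (assumption).
    """
    v = np.asarray(voi_voxels).ravel()
    levels, counts = np.unique(v, return_counts=True)
    x = levels.astype(float)
    h = counts / counts.sum()              # normalized histogram h(x)
    p1 = np.sum(x * h)                     # mean
    p2 = np.sum((x - p1) ** 2 * h)         # variance
    p3 = np.sum((x - p1) ** 3 * h) / p2 ** 1.5   # skewness (z = 3)
    p4 = np.sum((x - p1) ** 4 * h) / p2 ** 2.0   # kurtosis (z = 4)
    p5 = -np.sum(h * np.log2(h))           # entropy in bits; h > 0 by construction
    return float(p1), float(p2), float(p3), float(p4), float(p5)
```

For a toy VOI with voxels `[1, 1, 3, 3]`, the histogram has two equally likely levels, so the entropy is exactly 1 bit and the skewness is 0.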

For a given VOI, we compute a set of parameter values {Pp | p ∈ {1, …, K}} = {P1, P2, P3, P4, P5}. For readability, {Pp} is used instead of {Pp | p ∈ {1, …, K}} in the following. HC and AD subjects are plotted in an N-dimensional feature space defined by a subset of {Pp} with N elements, denoted {Pp}N, N ≤ K. An N-dimensional sphere (N-D sphere) is built around the group of healthy subjects (HC): for N = 1 it is an interval, for N = 2 a disk, and for N ≥ 3 a sphere. The center of the N-D sphere is the center of mass of the healthy subjects’ distribution. Figure 2 shows the case of a VOI described by three parameters: the mean, P1; the standard deviation, P2; and the kurtosis, P4. It is a 3D sphere with {Pp}N = {P1, P2, P4}. For various radii of the N-D sphere, we compute the true positive rate (TPR) and the false positive rate (FPR).

Figure 2.

The separation between AD and HC groups relative to the region “Cingulum_Post_Left” with three parameters: the mean, the standard deviation, and the kurtosis.

The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) for different radii of the N-D sphere, as shown in Figure 3. The separation power factor (SPF) is defined as the area under the ROC curve (AUC) and lies within the range [0, 1].
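
The sphere-based ROC construction can be sketched as follows. The decision rule is an assumption of this sketch: a subject lying strictly outside the sphere is called AD, so the TPR is the fraction of AD subjects outside the sphere and the FPR is the fraction of HC subjects outside it. Features are used as given, without rescaling, and the function name is hypothetical.

```python
import numpy as np

def separation_power_factor(hc, ad):
    """Area under the ROC curve obtained by growing a sphere around the HC centroid.

    hc, ad: arrays of shape (n_subjects, N), one row per subject.
    """
    hc = np.asarray(hc, dtype=float)
    ad = np.asarray(ad, dtype=float)
    center = hc.mean(axis=0)                       # center of mass of the HC group
    d_hc = np.linalg.norm(hc - center, axis=1)
    d_ad = np.linalg.norm(ad - center, axis=1)
    # Candidate radii: "smaller than everything", then every observed distance.
    radii = np.concatenate(([-np.inf], np.sort(np.concatenate((d_hc, d_ad)))))
    tpr = np.array([(d_ad > r).mean() for r in radii])
    fpr = np.array([(d_hc > r).mean() for r in radii])
    # Radii increase, so FPR decreases; reverse to integrate from 0 to 1.
    fpr, tpr = fpr[::-1], tpr[::-1]
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```

With this convention, a value near 1 means that AD subjects lie farther from the HC centroid than HC subjects do, and a value near 0.5 means that the chosen parameters do not separate the groups.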

Figure 3.

ROC curve obtained for the region “Cingulum_Post_Left” using three parameters: the mean, the standard deviation, and the kurtosis.

2.3.1.1. “Combination matrix” analysis

The SPF is the key factor used to build the “combination matrix.” For each VOI, we compute this factor over all the combinations of parameters Pp with a length varying from 1 to K (the number of feature parameters, K = 5).

Each row of this matrix represents a VOI, and each column represents the SPF (noted αv−N or αv−{Pp}N) computed on a subset {Pp}N of N elements of {Pp}. The “combination matrix” thus has L rows and 2^K − 1 = 31 columns, where L = 116 is the number of VOIs.

2.3.1.2. “Combination matrix” 1: mono-parametric analysis

Mono-parametric analysis consists in ranking the VOIs according to their highest SPF values in the “combination matrix.” The top-ranked VOIs are selected for the classification step.
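
The structure of the matrix and of the ranking can be sketched in plain Python. With K = 5 parameters there are 2^5 − 1 = 31 non-empty subsets, one per column. The scorer `spf(v, subset)` is a hypothetical callable standing in for the SPF computation, and ranking each VOI by the largest SPF in its row is one reading of the mono-parametric analysis, taken here as an assumption.

```python
from itertools import combinations

PARAMS = ("P1", "P2", "P3", "P4", "P5")

def parameter_subsets(params=PARAMS):
    """All non-empty subsets of the parameters, ordered by length."""
    return [c for n in range(1, len(params) + 1) for c in combinations(params, n)]

def build_combination_matrix(n_vois, spf):
    """One row per VOI, one column per parameter subset.

    spf(v, subset) -> float is a user-supplied (hypothetical) scorer.
    """
    subsets = parameter_subsets()
    matrix = [[spf(v, s) for s in subsets] for v in range(n_vois)]
    return matrix, subsets

def rank_vois(matrix, top_k):
    """Indices of the top_k VOIs, ranked by the best SPF in their row (assumption)."""
    order = sorted(range(len(matrix)), key=lambda v: max(matrix[v]), reverse=True)
    return order[:top_k]
```

Calling `build_combination_matrix(116, spf)` returns a 116 × 31 matrix, and `rank_vois(matrix, 20)` would return the 20 VOIs to pass on to the classifier.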

2.3.1.3. “Combination matrix” 2: multi-parametric analysis

Multi-parametric analysis of the “combination matrix” depends on both the combination of parameters (a subset of {Pp}) for each VOI and the combination of VOIs, that is, subsets of {Vv | v ∈ {1, …, L}}.

The procedure begins by choosing the first two VOIs among all possible combinations of VOIs of length 2 ({Vj, Vq}, 1 ≤ j, q ≤ L, j ≠ q). It then proceeds iteratively, adding one VOI to the VOI list at each step, as in sequential forward selection (SFS) [29].

At each step of the iterative procedure, HC and AD subjects are plotted in a new feature space formed by the combination of parameters of the selected VOIs, and the SPF value obtained for this combination, called the cumulated SPF (CSPF), is examined. The VOI that provides the best cumulated SPF value is then added to the final VOI combination. The algorithm stops when the cumulated SPF no longer increases, that is, when the same or a lower value is obtained.
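
The greedy loop can be sketched as below. The scorer `cspf(vois)` is a hypothetical callable that returns the cumulated SPF for a tuple of VOI indices; how the parameter subsets of the selected VOIs are combined into one feature space is left to that callable and is not specified here.

```python
from itertools import combinations

def forward_select(n_vois, cspf):
    """Sequential forward selection of VOIs driven by a cumulated SPF scorer.

    cspf(vois) -> float is a user-supplied (hypothetical) function of a tuple
    of VOI indices.
    """
    # Step 1: best pair among all combinations of two VOIs.
    best_pair = max(combinations(range(n_vois), 2), key=cspf)
    selected = list(best_pair)
    best = cspf(best_pair)
    # Step 2: add one VOI at a time while the cumulated SPF strictly increases.
    while len(selected) < n_vois:
        candidates = [v for v in range(n_vois) if v not in selected]
        v_best = max(candidates, key=lambda v: cspf(tuple(selected) + (v,)))
        score = cspf(tuple(selected) + (v_best,))
        if score <= best:   # same or lower cumulated SPF: stop
            break
        selected.append(v_best)
        best = score
    return selected, best
```

The initial pair search evaluates L(L − 1)/2 combinations (6670 for L = 116), and each later step evaluates at most L candidates, so the cost is dominated by the scorer rather than by the loop itself.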

2.3.2. Multilevel representation

In the second approach, we investigate a multilevel feature representation, which considers region properties (first level), connectivity between any pair of VOIs (second level), and the overall connectivity between one region and the other regions (third level). The proposed method ranks regions from highly to slightly affected by the disease. A classifier selection strategy is proposed to choose a pair of classifiers with high diversity.

2.3.2.1. First-level feature

The first-level feature consists in computing the first order statistics of voxels within each VOI. The nth subject can be represented as:

$$r_n^m = \left[r_{n1}^m, r_{n2}^m, \ldots, r_{np}^m\right]$$
$$r_n^s = \left[r_{n1}^s, r_{n2}^s, \ldots, r_{np}^s\right]$$

where $r_n^m$ and $r_n^s$ are the mean intensity and standard deviation of each VOI, respectively, and p is the number of VOIs; here p = 116.

2.3.2.2. Second-level feature

The second-level feature consists in computing a similarity-based connectivity parameter wij between VOIs:

$$w_{ij} = \begin{cases} e^{-\lVert x_i - x_j \rVert^2}, & i \neq j \\ 0, & i = j \end{cases} \tag{6}$$

where xi is the feature vector containing the mean and standard deviation of the ith VOI and wij is the similarity coefficient between the ith and the jth VOIs. The second-level features of any subject form a symmetric matrix, denoted Wr.
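
A minimal NumPy sketch of this similarity matrix is given below, assuming that each VOI is described by the two-element vector [mean, standard deviation] and that no additional scaling is applied inside the exponential. The function name is hypothetical.

```python
import numpy as np

def similarity_matrix(x):
    """Pairwise similarity exp(-||x_i - x_j||^2) with a zero diagonal.

    x: array of shape (p, 2) holding [mean, std] for each of the p VOIs.
    """
    x = np.asarray(x, dtype=float)
    diff = x[:, None, :] - x[None, :, :]          # shape (p, p, 2)
    w = np.exp(-np.sum(diff ** 2, axis=-1))       # shape (p, p)
    np.fill_diagonal(w, 0.0)                      # w_ii = 0 by definition
    return w
```

The same function can be applied across subjects for a fixed VOI by passing one row per subject instead of one row per VOI. For p = 116, the strict upper triangle of the result contains 116 × 115/2 = 6670 coefficients.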

The second-level feature is composed of the similarity coefficients between all 116 VOIs, 6670 dimensions in total (the upper part of matrix Wr, 116 × 115/2), which is clearly too high a dimension for the subsequent classification. Therefore, Wr is further decomposed into three subsets of features. In the same way as similarity coefficients are computed between VOIs, we can obtain similarity coefficients between subjects for a specific VOI:

$$w_{uv} = \begin{cases} e^{-\lVert x_u - x_v \rVert^2}, & u \neq v \\ 0, & u = v \end{cases} \tag{7}$$

where u and v stand for the uth and vth subjects. For any VOI, a symmetric matrix for subjects, Ws is computed.

The dimension of Ws is determined by the number of subjects, N, in a group (AD, NC, MCI). Since each subject’s image is segmented into 116 VOIs, there are 116 matrices like Ws.

On the one hand, a VOI that is not affected by AD will give similar coefficients for AD and NC subjects. On the other hand, a VOI affected by AD will give different similarity coefficients for the two groups. To quantify the difference, we compute the frequency distribution histogram of the upper-triangle values of Ws. Figure 4 shows the cumulative probability curves of the similarity coefficients obtained for the regions angular L (a), hippocampus L (b), and cerebellum 10 R (c). There is a clear difference between the AD and NC groups in Figure 4a, and the difference decreases gradually for the other two VOIs. Cerebellum 10 R appears as a VOI that is almost unaffected by AD, while angular L is the one most affected. The area under the curve, denoted S, quantifies the differences between VOIs.

Figure 4.

Statistics of the similarity coefficients between subjects for certain VOIs. (a) VOI: angular L; (b) VOI: hippocampus L; (c) VOI: cerebellum 10 R.

After ranking all the VOIs, the similarity matrix Wr is recalculated according to the new order of VOIs. Wr is divided into four equal parts, as shown in Figure 5a. VOIs that are highly involved in AD appear in red and are denoted Wh. VOIs that are less involved in AD appear in blue and are denoted Wl. Connectivities between highly and slightly influenced VOIs, denoted Wm, appear in green.

Figure 5.

Instance of the division for a similarity matrix. Highly involved VOIs in AD (red); Less involved VOIs in AD (blue); Connectivities between highly and slightly influenced VOIs appear in green.

Since Wr is symmetric, only its upper triangular part is taken into consideration, as in Figure 5b. The second-level feature Wr is therefore divided into three sets, and after converting them to vectors, the second-level feature for the nth sample is represented as $w_n^h$, $w_n^m$, and $w_n^l$, respectively. For $w_n^h$ and $w_n^l$ (red and blue parts in Figure 5b), the dimension is 1653 (58 × (58 − 1)/2), and for $w_n^m$ (green part), it is 3364 (58 × 58). Compared with the 6670 dimensions of the full upper triangle (red, blue, and green parts together), each set is thus about 50–75% smaller.
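
The block split and the resulting dimensions can be checked with a short NumPy sketch. It assumes that the rows and columns of the matrix have already been reordered from most to least affected VOI and that the number of VOIs is even; the function name is hypothetical.

```python
import numpy as np

def split_second_level(w_ranked):
    """Split a ranked, symmetric similarity matrix into (w_h, w_m, w_l) vectors.

    w_h: upper triangle of the block of highly affected VOIs.
    w_m: full block linking highly and slightly affected VOIs.
    w_l: upper triangle of the block of slightly affected VOIs.
    """
    w_ranked = np.asarray(w_ranked, dtype=float)
    p = w_ranked.shape[0]
    half = p // 2
    iu_h = np.triu_indices(half, k=1)
    iu_l = np.triu_indices(p - half, k=1)
    w_h = w_ranked[:half, :half][iu_h]
    w_m = w_ranked[:half, half:].ravel()
    w_l = w_ranked[half:, half:][iu_l]
    return w_h, w_m, w_l
```

For a 116 × 116 matrix, the three vectors have 1653, 3364, and 1653 entries, which add up to the 6670 entries of the full upper triangle.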

2.3.2.3. Third-level feature

The third-level feature is extracted from a graph, which represents the overall connectivity between a VOI and the others. A graph G = (V, E) is defined by a finite set V of vertices (VOIs) and a finite set E ⊆ V × V of edges (similarity coefficient between the ith and the jth VOI is denoted αij).

After constructing a graph for a subject, several graph measures can be computed [30]. The third-level feature is represented by two graph measures: strength and clustering coefficient.

Strength: the sum of a vertex’s neighboring link weights [30]:

$$s_i = \sum_{j=1}^{p} \alpha_{ij} \tag{8}$$

where si is the strength of a vertex or a VOI.

Clustering coefficient: the geometric mean of all triangles associated with each vertex [30]:

$$c = \frac{\operatorname{diag}\!\left(\left(W_r^{1/3}\right)^{3}\right)}{d\,(d - 1)} \tag{9}$$

where diag() is an operator which takes the diagonal values from a matrix, c is a clustering coefficient vector, and d is a degree vector in which the element di is

$$d_i = \sum_{j=1}^{p} a_{ij} \tag{10}$$

where aij is the connection status between the ith vertex and the jth vertex: aij = 0 when wij = 0, otherwise aij = 1.
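
Both graph measures can be sketched in NumPy as below. The sketch follows the weighted undirected clustering coefficient as it is usually implemented for this formula: the cube root is taken element by element, the third power is a matrix power, and the division by d(d − 1) is element-wise. Setting the coefficient to 0 for vertices with fewer than two neighbors is an assumption of this sketch, as are the function names.

```python
import numpy as np

def strength(w):
    """Sum of the link weights attached to each vertex."""
    return np.asarray(w, dtype=float).sum(axis=1)

def clustering_coefficient(w):
    """Weighted clustering coefficient of each vertex of an undirected graph.

    w: symmetric weight matrix with non-negative entries and a zero diagonal.
    """
    w = np.asarray(w, dtype=float)
    a = (w != 0).astype(float)             # connection status a_ij
    d = a.sum(axis=1)                      # degree vector
    root = np.cbrt(w)                      # element-wise cube root
    triangles = np.diag(root @ root @ root)
    denom = d * (d - 1.0)
    c = np.zeros_like(denom)
    np.divide(triangles, denom, out=c, where=denom > 0)
    return c
```

On a triangle whose three edges all have weight 0.125, every vertex has strength 0.25 and clustering coefficient 0.125, that is, the geometric mean of the edge weights of the single triangle it belongs to.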

These features exhibit different ranges of values. Thus a procedure of feature normalization is necessary by z-score prior to classification:

$$z_{mn} = \frac{f_{mn} - \mu_m}{\delta_m} \tag{11}$$

where fmn is the value of the mth feature of the nth sample and μm and δm are the mean value and standard deviation of the mth feature, respectively. Most of the normalized values zmn are within the range [−1, 1], while out-of-range values are clamped to either −1 or 1.
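
The normalization and clamping step can be sketched as follows. The sketch assumes one row per sample and one column per feature, uses the population standard deviation, and leaves constant features at zero; these choices and the function name are illustrative assumptions. In a cross-validation setting, the mean and standard deviation would normally be estimated on the training samples only.

```python
import numpy as np

def zscore_clamp(features):
    """Z-score each feature column, then clamp the result to [-1, 1].

    features: array of shape (n_samples, n_features).
    """
    f = np.asarray(features, dtype=float)
    mu = f.mean(axis=0)
    sd = f.std(axis=0)                     # population standard deviation
    sd = np.where(sd == 0, 1.0, sd)        # avoid division by zero for constant features
    z = (f - mu) / sd
    return np.clip(z, -1.0, 1.0)
```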

2.4. Classification results

2.4.1. Separation power factor approach

Classification was performed using a support vector machine (SVM) classifier. An SVM maps pattern vectors to a high-dimensional feature space in which a “best” separating hyperplane (the maximal margin hyperplane) is built. In the present work, we used a linear-kernel SVM classifier [29]. Because the number of subjects in this first approach is small (142 subjects), a leave-one-out (LOO) strategy was the most suitable for validating the classification. This technique iteratively holds out one subject for testing while training the classifier on the remaining subjects, so that each subject is left out once. The parameter C is used during the training phase and controls how strongly outliers are taken into account when computing the support vectors. A suitable value of C, which depends on the database, can be estimated by cross-validation; here it was fixed to 10 (C = 10).
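
The hold-out loop itself does not depend on the classifier and can be sketched with NumPy alone. In the sketch below, a nearest-centroid rule stands in for the linear SVM so that the example stays self-contained; this substitution, the `fit_predict` interface, and the function names are assumptions and do not reproduce the chapter’s classifier.

```python
import numpy as np

def nearest_centroid_predict(x_train, y_train, x_test):
    """Stand-in classifier (NOT an SVM): assign each test row to the closest class mean."""
    labels = np.unique(y_train)
    centroids = np.stack([x_train[y_train == c].mean(axis=0) for c in labels])
    dists = np.linalg.norm(x_test[:, None, :] - centroids[None, :, :], axis=2)
    return labels[dists.argmin(axis=1)]

def leave_one_out_accuracy(x, y, fit_predict=nearest_centroid_predict):
    """Hold out each subject once, train on the rest, and average the hits."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    n = len(y)
    correct = 0
    for i in range(n):
        train = np.arange(n) != i          # boolean mask excluding subject i
        pred = fit_predict(x[train], y[train], x[i:i + 1])
        correct += int(pred[0] == y[i])
    return correct / n
```

Any classifier with the same `fit_predict(x_train, y_train, x_test)` shape, including a linear SVM with C = 10, can be passed in place of the stand-in. With 142 subjects, each misclassified held-out subject changes the LOO accuracy by 1/142, about 0.70 percentage points; for instance, 135 correct decisions out of 142 correspond to 95.07%.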

The results achieved using the proposed method with the SVM are shown in Figure 6. They are compared with those of different feature selection methods: Fisher score [31], support vector machine-recursive feature elimination (SVM-RFE) [32], feature selection with random forest [33], ReliefF [34], and minimum redundancy maximum relevance (mRMR) [35]. Average cross-validation accuracy is shown as a function of the number of selected VOIs. With “combination matrix” 1 (mono-parametric analysis), feeding 19, 20, or 21 VOIs with their combination of parameters into the SVM gives a classification rate of 95.07%. “Combination matrix” 2 achieved a higher accuracy with a lower number of features (14 VOIs): the best classification result, 96.47%, was obtained with it. The proposed feature selection from PET images therefore provides good discrimination between AD and HC subjects; the selected VOIs are illustrated in Figure 7.

Figure 6.

Average accuracy obtained with SVM classifier varying number of features for different VOI selection analyses and applying LOO cross-validation with estimation value C = 10.

Figure 7.

15 VOIs’ representation of the “combination matrix” 2 on a coronal plane (a), on a transverse plane (b), and on a sagittal plane (c).

2.4.2. Multilevel approach

The support vector machine (SVM) classifier was also applied in this second approach to separate AD or MCI subjects from NC subjects. The three levels of features comprise seven feature types in total, each of which is input to its own linear SVM. The margin parameter C of all the SVMs is fixed to one for a fair comparison. The final decision is made through majority voting over the seven classifiers’ outputs:

$$Y = \operatorname{sgn}\!\left(\sum_{t=1}^{7} y_t\right) \tag{12}$$

where sgn() is the sign function and yt denotes the label output by SVMt.
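
The voting rule is simple enough to sketch in a few lines, assuming that each classifier outputs labels coded as +1 or −1 (the coding is an assumption; the function name is hypothetical). With seven voters the sum of the labels is always odd, so it is never zero and the sign is always defined.

```python
import numpy as np

def majority_vote(labels):
    """Combine +1/-1 labels from several classifiers by the sign of their sum.

    labels: array of shape (n_classifiers, n_subjects).
    """
    labels = np.asarray(labels)
    return np.sign(labels.sum(axis=0)).astype(int)
```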

Classification concerns AD vs. NC and MCI vs. NC. Performance was evaluated using four measures: classification accuracy (ACC), sensitivity (SEN), specificity (SPE), and area under the curve (AUC). Tenfold cross-validation was used to assess the performance and was repeated 10 times to reduce possible bias. The least absolute shrinkage and selection operator (LASSO) was used for feature selection, and its parameter λ was obtained by nested cross-validation on the training dataset. Tables 3 and 4 show the results obtained with each feature for AD vs. NC and MCI vs. NC, respectively. As can be seen, the first-level and second-level features outperformed the third-level features for AD and MCI diagnosis. A possible explanation is that the first- and second-level features are more closely linked to the properties of a VOI or to the connectivity between a pair of VOIs, while the third-level features represent the overall connectivity between a VOI and all the others.
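
For reference, ACC, SEN, and SPE follow directly from the confusion counts, as in the sketch below. It assumes that the patient class (AD or MCI) is the positive class and that both classes are present in the true labels; the function name and label coding are illustrative.

```python
def acc_sen_spe(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, and specificity from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    acc = (tp + tn) / len(pairs)     # fraction of correct decisions
    sen = tp / (tp + fn)             # fraction of patients detected
    spe = tn / (tn + fp)             # fraction of controls correctly rejected
    return acc, sen, spe
```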

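| Feature | ACC | SEN | SPE | AUC |
|---|---|---|---|---|
| Mean intensity | 85.13 | 86.61 | 83.97 | 93.39 |
| Standard deviation | 85.49 | 84.98 | 86.24 | 93.84 |
| Connectivity wh | 85.05 | 86.24 | 84.56 | 93.01 |
| Connectivity wm | 86.88 | 88.82 | 85.17 | 93.88 |
| Connectivity wl | 83.98 | 84.31 | 83.37 | 91.37 |
| Strength | 80.77 | 80.29 | 81.50 | 88.63 |
| Clustering coefficient | 83.89 | 84.03 | 84.26 | 92.05 |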

Table 3.

Performance of different types of feature for AD vs. NC (%).

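| Feature | ACC | SEN | SPE | AUC |
|---|---|---|---|---|
| Mean intensity | 73.55 | 75.01 | 72.87 | 81.36 |
| Standard deviation | 78.19 | 78.31 | 78.69 | 86.67 |
| Connectivity wh | 72.78 | 70.63 | 74.35 | 83.19 |
| Connectivity wm | 74.67 | 76.06 | 73.65 | 83.27 |
| Connectivity wl | 74.89 | 77.01 | 72.68 | 78.94 |
| Strength | 71.72 | 70.62 | 72.01 | 80.07 |
| Clustering coefficient | 72.31 | 74.73 | 70.26 | 80.36 |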

Table 4.

Performance of different types of feature for MCI vs. NC (%).

This second method was compared with state-of-the-art methods applied to FDG-PET data: Hinrichs’s method [14], Gray’s method [12], Li’s method [36], and Padilla’s method [21]. The results are shown in Tables 5 and 6. The proposed approach outperformed these methods in terms of ACC and SEN for AD diagnosis. For MCI diagnosis, our method outperforms the other methods in SEN and AUC, and it is within 0.21 and 1.65 percentage points of the best result for ACC and SPE, respectively. Moreover, the significant improvements over the single-feature results in Tables 3 and 4 indicate the effectiveness of the ensemble classification and show that the multilevel features are necessary.

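| Method | ACC | SEN | SPE | AUC |
|---|---|---|---|---|
| Hinrichs et al. [14] | 84 | 84 | 82 | 87.16 |
| Gray et al. [12] | 88.4 | 83.2 | 93.6 | – |
| Li et al. [36] | 89.1 | 92 | 86 | 97 |
| Padilla et al. [21] | 86.59 | 87.50 | 85.36 | – |
| Our method | 90.48 | 90.58 | 89.38 | 95.95 |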

Table 5.

Performance comparison for AD vs. NC (%).

| Method | ACC | SEN | SPE | AUC |
|---|---|---|---|---|
| Gray et al. [12] | 81.3 | 79.8 | 82.9 | – |
| Li et al. [36] | 63.2 | 65 | 62 | 72 |
| Our method | 81.09 | 80.99 | 81.25 | 87.65 |

Table 6.

Performance comparison for MCI vs. NC (%).

3. Conclusion

In this chapter, two novel methods for VOI ranking were developed to improve the classification of brain PET images. The first approach ranks VOIs using ROC curves and quantifies the ability of a VOI to separate HC from AD subjects by the area under the ROC curve (AUC).

The second approach uses multilevel features to address the PET brain classification problem. Three levels of features are extracted from PET brain images and ranked in order to feed an SVM. Different models are trained using different types of features, and the final decision is made through majority voting over the models’ outputs. According to experiments on the ADNI dataset, the proposed method improves the performance of AD and MCI diagnosis compared with state-of-the-art methods that were also developed for FDG-PET.

To go further in computer-aided diagnosis tasks, other features, such as texture and gradient features computed on VOIs, have to be combined with first-order statistical parameters in order to enrich the information. Modern machine learning methods based on deep neural networks will be included in our future work.

Acknowledgments

This work has been conducted in the framework of DHU-Imaging thanks to the support of the A*MIDEX project (n°ANR-11-IDEX-0001-02) (« Investissements d’Avenir » French Government programme, managed by the French National Research Agency (ANR)).

© 2019 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
