Demographic and clinical information of subjects of the local database.

## Abstract

Positron emission tomography (PET) is a molecular medical imaging modality that is commonly used for the diagnosis of neurodegenerative diseases. Computer-aided diagnosis (CAD), based on medical image analysis, could help with the quantitative evaluation of brain diseases such as Alzheimer’s disease (AD). This book chapter presents methods for ranking the effectiveness of brain volumes of interest (VOIs) in separating healthy or normal control (HC or NC) subjects from AD subjects on brain PET images. Brain images are first mapped into anatomical VOIs using an atlas. Different features, including statistical, graph, and connectivity-based features, are then computed on these VOIs. Top-ranked VOIs are then input into a support vector machine (SVM) classifier. The developed methods are evaluated on a local image database as well as on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) public database and are compared with known feature selection methods. The new approaches achieved better classification results than the compared methods in the case of a two-group separation.

### Keywords

- machine learning
- computer-aided diagnosis
- first order statistics
- feature selection
- positron emission tomography
- classification
- Alzheimer’s disease

## 1. Introduction

Alzheimer’s disease (AD) is a degenerative and incurable brain disease that is considered the main cause of dementia in elderly people worldwide. At present, around 90 million people have been diagnosed with AD, and it is estimated that the number of AD patients will reach 300 million by 2050 [1, 2]. The disease is diagnosed through clinical, neuroimaging, and neuropsychological assessments. Neuroimaging evaluation is based on nonspecific features such as cerebral atrophy, which appears very late in the progression of the disease. Advances in neuroimaging biomarkers help to identify AD in its prodromal stage, mild cognitive impairment (MCI) [3]. Developing new approaches for early and specific recognition of AD is therefore of crucial importance [4]. Positron emission tomography (PET) is now widely used in neuroscience to characterize the brain molecular mechanisms involved in healthy and pathological models [5]. Applied to the brain, PET imaging provides a noninvasive evaluation of various biomarkers, such as the metabolic rate of glucose, cerebral blood flow, and neurotransmission, as well as of pathological processes such as neuroinflammation, amyloid deposits, and, more recently, tubulin-associated unit (TAU) aggregates. In this line, PET using 18F-fluorodeoxyglucose (18F-FDG) has been proposed as a biomarker of Alzheimer’s disease [6, 7, 8, 9, 10]. 18F-FDG-PET provides 3D volumetric imaging of the cerebral metabolic rate of glucose, which is thought to reflect synaptic activity and thus brain function. However, metabolic abnormalities that are evident on average within a group of patients, or in individual cases with advanced disease, can be more difficult to confirm in an individual patient by purely visual interpretation, particularly in the earliest stages of the disease, when current treatments (and those under development) would be most effective.
This visual interpretation is, in addition, subjective and strongly dependent on the experience of the physician. Individual, quantitative, and computer-aided approaches are thus needed for the medical management of brain diseases. There has been growing interest in using the cerebral metabolic rate of glucose for AD classification and for predicting conversion from MCI to AD [11, 12, 13]. Four main groups of methods have been studied: voxels-as-features (VAF)-based [14], discriminative voxel selection-based [15, 16], atlas-based [17, 18, 19, 20], and projection-based methods [21]. Recently, deep learning, and more specifically convolutional neural networks (CNN) and recurrent neural networks (RNN), has been investigated for image classification in computer-aided diagnosis of Alzheimer’s disease. These approaches show better results than many of the machine learning techniques previously proposed for recognition tasks [22, 23, 24, 25, 26, 27].

In this chapter, the main steps of a computer-aided diagnosis system for AD based on 18F-FDG-PET brain images are described. Section 2.2 is devoted to data acquisition and preprocessing. Section 2.3 describes the features extracted from brain PET images and the way VOIs are selected from them. Section 2.4 explains the classification step and reports the results. Finally, Section 3 gives a conclusion and outlines future work.

## 2. Materials and methods

### 2.1 Overview of the computer-aided diagnosis system

A computer-aided diagnosis (CAD) system (see Figure 1) consists of several stages: image acquisition and preprocessing, feature extraction, feature selection, and classification.

In this chapter, the described CAD system is atlas-based and focuses on 18F-FDG-PET images from a local and a public database. Each PET brain image is mapped into anatomical volumes of interest (VOIs), on which features are extracted and then input to a classifier, instead of inputting all the voxels of each brain image. The goal of such a system is to provide physicians with tools that help them distinguish AD from HC subjects.

### 2.2 Image dataset and preprocessing

#### 2.2.1 Image dataset

Two datasets are presented in this chapter and used to evaluate the two approaches described below. The local database (Table 1) consists of 18F-FDG-PET scans collected in the Nuclear Medicine Department of the “La Timone” University Hospital (Marseille, France). It includes 171 adults aged 50–90 years: 81 patients with AD, 61 healthy controls (HC), and 29 patients with mild cognitive impairment (MCI). HC subjects were free from neurological or psychiatric disease and cognitive complaints and had a normal brain MRI. AD subjects met the NINCDS-ADRDA [28] clinical criteria for probable AD.

Characteristic | HC | AD | MCI |
---|---|---|---|
Number of subjects | 61 | 81 | 29 |
Male/female | 24/37 | 32/49 | 12/17 |
Age in years (mean [min, max]) | 68.18 [50, 86] | 70.60 [50, 90] | 67.55 [50, 85] |

The second dataset (Table 2) was obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database.

Characteristic | HC | AD | MCI |
---|---|---|---|
Number of subjects | 90 | 94 | 88 |
Female/male | 34/56 | 38/56 | 32/56 |
Age in years (mean ± SD) | 76.08 ± 5.01 | 75.83 ± 7.37 | 76.71 ± 6.63 |
MMSE (mean ± SD) | 23.46 ± 2.14 | 28.97 ± 1.15 | 26.92 ± 1.62 |

In total, 272 post-processed baseline FDG-PET scans were obtained from ADNI: 94 subjects with AD, 88 subjects with MCI, and 90 NC subjects.

#### 2.2.2 Image preprocessing

Comparing brain images from different subjects is difficult because of the complexity and anatomical variability of brain structures. Image data were therefore preprocessed in three steps: spatial normalization, smoothing, and intensity normalization. Spatial normalization was performed by voxel-level registration using the SPM8 software [11]: the data were spatially normalized onto the Montreal Neurological Institute (MNI) atlas. The images were then smoothed using a Gaussian filter with an 8 mm full width at half maximum (FWHM) to increase the signal-to-noise ratio (SNR) [20]. Finally, intensity normalization was required in order to allow direct comparison between images of different subjects. It consisted in dividing the intensity of each voxel by the mean intensity of the brain’s global gray matter VOI.
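As an illustration of the smoothing and intensity normalization steps, the following minimal sketch converts a Gaussian FWHM into a standard deviation and divides every voxel by the mean gray matter intensity. The function names, the flattened-list image representation, the 2 mm voxel size, and the toy values are assumptions made for the example; they do not come from the SPM8 pipeline used in this work.

```python
import math

def fwhm_to_sigma(fwhm_mm, voxel_size_mm):
    """Convert a Gaussian FWHM (mm) into a standard deviation in voxel units."""
    sigma_mm = fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return sigma_mm / voxel_size_mm

def normalize_intensity(voxels, gray_matter_mask):
    """Divide each voxel by the mean intensity of the gray matter voxels."""
    gm_values = [v for v, in_gm in zip(voxels, gray_matter_mask) if in_gm]
    gm_mean = sum(gm_values) / len(gm_values)
    return [v / gm_mean for v in voxels]

# Toy flattened image (made-up values): four gray matter voxels, two others.
image = [2.0, 4.0, 6.0, 8.0, 1.0, 3.0]
mask = [True, True, True, True, False, False]

normalized = normalize_intensity(image, mask)
sigma_voxels = fwhm_to_sigma(8.0, 2.0)  # 8 mm FWHM, assumed 2 mm voxel size
```

After normalization the mean gray matter intensity equals 1 for every subject, which is what makes intensities comparable across scans.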

### 2.3 Feature extraction

Each 3D PET brain image was segmented into 116 volumes of interest (VOIs) using the automated anatomical labeling (AAL) atlas. In this research project, the ability of VOIs to distinguish AD from HC subjects was studied. Different parameter combinations for each VOI were used to select and rank VOIs according to their ability to separate the AD group from the HC group. The top-ranked VOIs were then input into a classifier. Several levels of features were extracted from the VOIs, and two approaches were investigated to achieve this goal.

#### 2.3.1 Separation power factor

In the first approach, only features that capture the statistical information of each VOI are computed. First-order statistics and the entropy are extracted from the histogram *h*(*x*) of each VOI, where *x* is the gray-level value of a voxel belonging to the VOI and l_{min} and l_{max} are the minimum and maximum gray-level values in the VOI, respectively.
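The histogram-based features can be sketched as follows. The text names the mean (P_{1}), the standard deviation (P_{2}), the kurtosis (P_{4}), and the entropy; taking the skewness as P_{3}, the entropy as P_{5}, the normalized histogram as probability estimate, and base-2 logarithms are assumptions of this example, as is the function name.

```python
import math
from collections import Counter

def first_order_features(gray_levels):
    """Return (mean, std, skewness, kurtosis, entropy) of a VOI histogram."""
    n = len(gray_levels)
    hist = Counter(gray_levels)                  # h(x): voxel count per gray level
    prob = {x: c / n for x, c in hist.items()}   # normalized histogram p(x)
    mean = sum(x * p for x, p in prob.items())
    var = sum((x - mean) ** 2 * p for x, p in prob.items())
    std = math.sqrt(var)                         # must be non-zero (non-constant VOI)
    skew = sum((x - mean) ** 3 * p for x, p in prob.items()) / std ** 3
    kurt = sum((x - mean) ** 4 * p for x, p in prob.items()) / std ** 4
    entropy = -sum(p * math.log2(p) for p in prob.values())
    return mean, std, skew, kurt, entropy

# Toy VOI with made-up integer gray levels.
voi = [1, 2, 2, 3, 3, 3, 4, 4, 5]
mean, std, skew, kurt, entropy = first_order_features(voi)
```

The sums run over the gray levels present in the VOI, that is, between l_{min} and l_{max}.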

For a given VOI, we compute a set of K = 5 parameter values {P_{p} | p ∈ {1…K}} = {P_{1}, P_{2}, P_{3}, P_{4}, P_{5}}. For readability, {P_{p}} is used instead of {P_{p} | p ∈ {1…K}} in the following. HC and AD subjects are plotted in an N-dimensional feature space defined by a subset of N parameters of {P_{p}}, denoted {P_{p}}_{N}, with N ≤ K. An N-dimensional sphere (N-D sphere) is created over the group of healthy subjects (HC): for N = 1 it is an interval, for N = 2 a disk, and for N ≥ 3 a sphere. The center of the N-D sphere is the center of mass of the distribution of healthy subjects. Figure 2 shows the case of a VOI described by three parameters: the mean, P_{1}; the standard deviation, P_{2}; and the kurtosis, P_{4}. It is a 3D sphere with {P_{p}}_{N} = {P_{1}, P_{2}, P_{4}} and N = 3. For various radii of the N-D sphere, we compute the true positive rate (TPR) and the false positive rate (FPR).

The ROC curve is created by plotting the TPR against the FPR for the different radii of the N-D sphere, as shown in Figure 3. The separation power factor (SPF) is defined as the area under the ROC curve (AUC) and lies within the range [0, 1].
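A minimal sketch of the SPF computation is given below. It assumes that a subject lying outside the sphere is labeled AD (so AD is the positive class), that the candidate radii are the observed distances to the HC center of mass, and that the AUC is obtained by the trapezoidal rule; the function name `spf` and the toy points are hypothetical.

```python
import math

def spf(hc_points, ad_points):
    """Area under the ROC curve obtained by sweeping the N-D sphere radius."""
    dim = len(hc_points[0])
    # Center of mass of the healthy control distribution.
    center = [sum(p[k] for p in hc_points) / len(hc_points) for k in range(dim)]
    d_hc = [math.dist(p, center) for p in hc_points]
    d_ad = [math.dist(p, center) for p in ad_points]
    # Sweep the radius from the largest observed distance to the smallest.
    radii = sorted(set(d_hc + d_ad), reverse=True)
    roc = [(0.0, 0.0)]  # radius larger than every distance: nobody labeled AD
    for r in radii:
        tpr = sum(d >= r for d in d_ad) / len(d_ad)  # AD subjects outside the sphere
        fpr = sum(d >= r for d in d_hc) / len(d_hc)  # HC subjects outside the sphere
        roc.append((fpr, tpr))
    # Trapezoidal integration of TPR over FPR.
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(roc, roc[1:]))

# Toy 2-D feature space (made-up values): AD subjects lie far from the HC cloud.
hc = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
ad = [(3, 0), (0, 4), (-5, 0)]
separable_spf = spf(hc, ad)
chance_spf = spf(hc, hc)
```

With perfectly separated groups the sketch returns an SPF of 1, and with identical groups it returns 0.5, the chance level.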

#### 2.3.1.1 “Combination matrix” analysis

The SPF is the key factor used to build the “combination matrix.” For each VOI, we compute this factor over all the combinations of parameters P_{p} with a length varying from 1 to K (number of feature parameters K = 5).

Each row of this matrix represents a VOI, and each column represents the SPF (noted α_{v}−N or α_{v}−{P_{p}}_{N}) computed on a subset {P_{p}}_{N} of N elements of {P_{p}}. The “combination matrix” therefore has L rows and 2^{K}−1 = 31 columns, where L = 116 is the number of VOIs.
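The layout of the “combination matrix” can be sketched as follows. The helper names and the `placeholder_spf` scorer are hypothetical; in practice the scorer would compute the SPF of a VOI in the feature space spanned by the given parameter subset.

```python
from itertools import combinations

PARAMS = ("P1", "P2", "P3", "P4", "P5")  # K = 5 parameters per VOI

def parameter_subsets(params):
    """All non-empty parameter subsets, ordered by length: 2^K - 1 in total."""
    return [subset
            for n in range(1, len(params) + 1)
            for subset in combinations(params, n)]

def combination_matrix(vois, params, spf_fn):
    """One row per VOI, one column per parameter subset."""
    subsets = parameter_subsets(params)
    return [[spf_fn(voi, subset) for subset in subsets] for voi in vois]

def placeholder_spf(voi, subset):
    """Stand-in scorer (not a real SPF): returns a constant chance-level value."""
    return 0.5

subsets = parameter_subsets(PARAMS)
matrix = combination_matrix(range(116), PARAMS, placeholder_spf)
```

The first five columns correspond to single parameters and the last column to the full set {P_{1}, …, P_{5}}.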

#### 2.3.1.2 “Combination matrix” 1: mono-parametric analysis

Mono-parametric analysis consists in ranking the VOIs according to their highest SPF value in the “combination matrix.” The top-ranked VOIs are then selected for the classification step.
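This ranking can be sketched as taking the maximum of each row of the matrix and sorting the VOIs accordingly. The function name `rank_vois` and the SPF values below are made up for the illustration.

```python
def rank_vois(matrix, voi_names, top_k):
    """Rank VOIs by their highest SPF over all parameter subsets."""
    best_spf = {name: max(row) for name, row in zip(voi_names, matrix)}
    ranked = sorted(voi_names, key=lambda name: best_spf[name], reverse=True)
    return ranked[:top_k]

# Toy matrix: three VOIs, three parameter subsets (illustrative values only).
names = ["angular L", "hippocampus L", "cerebellum 10 R"]
toy_matrix = [
    [0.91, 0.88, 0.93],
    [0.85, 0.90, 0.87],
    [0.52, 0.55, 0.50],
]
top_two = rank_vois(toy_matrix, names, top_k=2)
```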

#### 2.3.1.3 “Combination matrix” 2: multi-parametric analysis

Multi-parametric analysis of the “combination matrix” depends on both the combination of parameters (a subset of {P_{p}}) for each VOI and the combination of VOIs, i.e., subsets of {V_{v} | v ∈ {1…L}}.

The procedure begins with the choice of the first two VOIs, evaluated over all possible combinations of VOIs of length 2 ({V_{j}, V_{q}}, 1 ≤ j, q ≤ L, j ≠ q). It then proceeds iteratively, adding one VOI to the VOI list at each step, following sequential forward selection (SFS) [29].

At each step of the iterative procedure, HC and AD subjects are plotted in a new feature space formed by the combined parameters of the selected VOIs, and the SPF value obtained for this combination, named the cumulated SPF (CSPF), is examined. The VOI that provides the best cumulated SPF value is then added to the final VOI combination. The algorithm stops when the same or a lower cumulated SPF is obtained.
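The iterative procedure can be sketched as a greedy loop. The scoring function `toy_cspf` and its gain values are hypothetical stand-ins; a real implementation would compute the cumulated SPF of the selected VOIs in their combined feature space.

```python
from itertools import combinations

def forward_select(vois, score_fn):
    """Sequential forward selection driven by a cumulated score."""
    # Step 1: best pair among all combinations of two VOIs.
    selected = list(max(combinations(vois, 2), key=score_fn))
    best = score_fn(selected)
    # Step 2: add one VOI at a time while the cumulated score improves.
    while len(selected) < len(vois):
        remaining = [v for v in vois if v not in selected]
        candidate = max(remaining, key=lambda v: score_fn(selected + [v]))
        score = score_fn(selected + [candidate])
        if score <= best:  # same or lower cumulated score: stop
            break
        selected.append(candidate)
        best = score
    return selected, best

# Made-up additive gains standing in for the cumulated SPF.
GAIN = {"A": 0.20, "B": 0.15, "C": 0.05, "D": 0.0, "E": -0.10}

def toy_cspf(subset):
    return 0.5 + sum(GAIN[v] for v in subset)

chosen, final_score = forward_select(list(GAIN), toy_cspf)
```

In this toy run the pair (A, B) is chosen first, C is added because it raises the score, and the loop stops at D because the score no longer improves.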

#### 2.3.2 Multilevel representation

In the second approach, we investigate a multilevel feature representation, which considers region properties (first level), connectivity between any pair of VOIs (second level), and the overall connectivity between one region and all the other regions (third level). The proposed method ranks regions from highly to slightly affected by the disease. A classifier selection strategy is proposed to choose a pair of classifiers with high diversity.

#### 2.3.2.1 First-level feature

The first-level feature consists in computing first-order statistics of the voxels within each VOI. The *n*th subject is thus represented by the first-order statistics (mean intensity and standard deviation) of its 116 VOIs.

#### 2.3.2.2 Second-level feature

The second-level feature consists in computing a similarity-based connectivity parameter *w*_{ij} between each pair of VOIs, where **x**_{i} is the feature vector containing the mean and standard deviation of the *i*th VOI and *w*_{ij} is the similarity coefficient between the *i*th and the *j*th VOIs. The second-level features of a subject are gathered in a symmetric matrix denoted **W**_{r}.

The second-level feature is composed of the similarity coefficients between all 116 VOIs, i.e., 116 × 115 / 2 = 6670 dimensions in total (upper triangle of matrix **W**_{r}), which is clearly too high a dimension for the subsequent classification. Therefore, **W**_{r} is further decomposed into three subsets of features. In the same way as similarity coefficients are computed between VOIs, similarity coefficients can be computed between subjects for a specific VOI, where u and v stand for the *u*th and *v*th subjects. For any VOI, a symmetric matrix over subjects, **W**_{s}, is thus computed.

The dimension of **W**_{s} is determined by the number of subjects, N, in a group (AD, NC, or MCI). Since each brain image is segmented into 116 VOIs, there are 116 such matrices **W**_{s}.

On the one hand, a VOI that is not affected by AD will give similar coefficients for AD and NC subjects. On the other hand, a VOI affected by AD will give different similarity coefficients for the two groups. In order to quantify the difference, we compute the frequency distribution histogram of the upper-triangle values of **W**_{s}. Figure 4 shows the cumulative probability curves of the similarity coefficients obtained for region angular L (a), region hippocampus L (b), and region cerebellum 10 R (c), respectively. There is a clear difference between the AD and NC groups in Figure 4a, and the difference decreases gradually for the other two VOIs. Cerebellum 10 R appears as a VOI that is almost unaffected by AD, while angular L is the VOI most affected by AD. The area under the curve, denoted S, is used to quantify these differences and to rank the VOIs.

After ranking all the VOIs, the similarity matrix **W**_{r} is recalculated according to the new order of VOIs. **W**_{r} is divided into four equal parts, as shown in Figure 5a. Connectivities between VOIs that are highly involved in AD appear in red and are denoted **W**^{h}. Connectivities between VOIs that are less involved in AD appear in blue and are denoted **W**^{l}. Connectivities between highly and slightly affected VOIs, denoted **W**^{m}, appear in green.
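A minimal sketch of this partition is given below, assuming that `order` lists the VOI indices from most to least affected and that only the upper triangle of the two symmetric diagonal blocks is kept. The function name and the toy 4 × 4 matrix are hypothetical.

```python
def split_connectivity(w, order):
    """Reorder a symmetric similarity matrix by VOI rank and split it in three."""
    n = len(order)
    half = n // 2
    r = [[w[i][j] for j in order] for i in order]  # reordered matrix
    # Highly affected block: upper triangle of the top-left quadrant.
    w_h = [r[i][j] for i in range(half) for j in range(i + 1, half)]
    # Mixed block: full top-right quadrant (highly vs. slightly affected VOIs).
    w_m = [r[i][j] for i in range(half) for j in range(half, n)]
    # Slightly affected block: upper triangle of the bottom-right quadrant.
    w_l = [r[i][j] for i in range(half, n) for j in range(i + 1, n)]
    return w_h, w_m, w_l

# Toy symmetric matrix for four VOIs already sorted by rank (made-up values).
w4 = [
    [0.0, 0.9, 0.5, 0.1],
    [0.9, 0.0, 0.4, 0.2],
    [0.5, 0.4, 0.0, 0.3],
    [0.1, 0.2, 0.3, 0.0],
]
w_h, w_m, w_l = split_connectivity(w4, [0, 1, 2, 3])
```

With 116 VOIs the three vectors have 1653, 3364, and 1653 elements, which together cover the 6670 coefficients of the upper triangle.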

Since **W**_{r} is symmetric, only its upper triangular part is taken into consideration, as in Figure 5b. The second-level feature **W**_{r} is therefore divided into three sets, **W**^{h}, **W**^{m}, and **W**^{l}, each of which is converted to a vector to form the second-level feature of the *n*th sample.

#### 2.3.2.3 Third-level feature

The third-level feature is extracted from a graph that represents the overall connectivity between a VOI and the others. A graph G = (V, E) is defined by a finite set V of vertices (the VOIs) and a finite set E ⊆ V × V of edges, each weighted by the similarity coefficient *w*_{ij} between the *i*th and the *j*th VOIs.

After constructing a graph for a subject, several graph measures can be computed [30]. The third-level feature is represented by two graph measures: strength and clustering coefficient.

**Strength**: the sum of a vertex’s neighboring link weights [30]:

$$s_i = \sum_{j \neq i} w_{ij}$$

where *s*_{i} is the strength of the *i*th vertex (VOI).

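**Clustering coefficient**: the geometric mean of all triangles associated with each vertex [30]:

$$\mathbf{c} = \frac{\operatorname{diag}\left(\left(\mathbf{W}^{1/3}\right)^{3}\right)}{\mathbf{d} \odot (\mathbf{d} - 1)}$$

where diag() is an operator which takes the diagonal values from a matrix, the cube root of **W** and the division are taken element-wise, **c** is the clustering coefficient vector, and **d** is the degree vector, whose element *d*_{i} is

$$d_i = \sum_{j \neq i} a_{ij}$$

where *a*_{ij} is the connection status between the *i*th vertex and the *j*th vertex: *a*_{ij} = 0 when *w*_{ij} = 0, otherwise *a*_{ij} = 1.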
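The two graph measures can be sketched as follows, assuming the usual weighted undirected definitions of [30]: strength as the row sum of the weight matrix, and clustering coefficient as the diagonal of the cubed element-wise cube root of the weight matrix divided by *d*_{i}(*d*_{i} − 1). Non-negative weights and a zero diagonal are assumed, and the function names are hypothetical.

```python
def strength(w):
    """Sum of the link weights attached to each vertex."""
    return [sum(row) for row in w]

def degree(w):
    """Number of non-zero links attached to each vertex."""
    return [sum(1 for x in row if x != 0) for row in w]

def clustering_coefficient(w):
    """Weighted clustering coefficient of each vertex (weights must be >= 0)."""
    n = len(w)
    root = [[x ** (1.0 / 3.0) for x in row] for row in w]  # element-wise cube root
    deg = degree(w)
    coeffs = []
    for i in range(n):
        # i-th diagonal entry of root @ root @ root: triangle intensities around i.
        triangles = sum(root[i][j] * root[j][h] * root[h][i]
                        for j in range(n) for h in range(n))
        denom = deg[i] * (deg[i] - 1)
        coeffs.append(triangles / denom if denom > 0 else 0.0)
    return coeffs

# Fully connected toy graph of three VOIs with unit weights.
w3 = [
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
s = strength(w3)
d = degree(w3)
c = clustering_coefficient(w3)
```

For a complete graph with unit weights every vertex has a clustering coefficient of 1; scaling all weights by a common factor scales the coefficient by the same factor.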

These features exhibit different ranges of values. Feature normalization by z-score is therefore necessary prior to classification:

$$\hat{f}_{mn} = \frac{f_{mn} - \mu_m}{\delta_m}$$

where *f*_{mn} is the value of the *m*th feature of the *n*th sample and *μ*_{m} and *δ*_{m} are the mean value and standard deviation of the *m*th feature, respectively. Most of the normalized values are within the range [−1, 1], while out-of-range values are clamped to either −1 or 1.
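The normalization and clamping of one feature across samples can be sketched as follows. Using the population standard deviation is an assumption of this example, and the function name is hypothetical; in a cross-validation setting the mean and standard deviation would normally be estimated on the training samples only.

```python
import statistics

def zscore_clamped(values, low=-1.0, high=1.0):
    """Z-score one feature across samples, then clamp to [low, high]."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation (assumed)
    return [min(high, max(low, (v - mu) / sigma)) for v in values]

# Made-up values of one feature over five samples, with one outlier.
feature = [1.0, 2.0, 3.0, 4.0, 10.0]
scaled = zscore_clamped(feature)
```

The outlier, whose z-score exceeds 1, is mapped to exactly 1, while the other samples keep their z-scores.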

### 2.4 Classification results

#### 2.4.1 Separation power factor approach

Classification was performed using a support vector machine (SVM) classifier. Such classifiers map pattern vectors to a high-dimensional feature space where a “best” separating hyperplane (the maximal margin hyperplane) is built. In the present work, we used a linear-kernel SVM classifier [29]. Given the reduced number of subjects in this first approach (142 subjects: 81 AD and 61 HC), a leave-one-out (LOO) strategy was chosen as the most suitable for classification validation. This technique iteratively holds out one subject for testing while training the classifier on the remaining subjects, so that each subject is left out once. The parameter C is used during the training phase and controls how much outliers are taken into account when computing the support vectors. A good way to estimate the best value of C is cross-validation. As a result, C was fixed to 10 (C = 10) for this database.
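The leave-one-out loop itself can be sketched independently of the classifier. To keep the example free of external dependencies, a nearest-centroid rule stands in for the linear SVM used in this work; this substitution, the function names, and the toy data are assumptions of the sketch.

```python
import math

def leave_one_out_accuracy(samples, labels, fit, predict):
    """Hold out each subject once, train on the others, and count correct calls."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        correct += predict(model, samples[i]) == labels[i]
    return correct / len(samples)

def fit_centroids(xs, ys):
    """Stand-in classifier (not an SVM): one centroid per class."""
    model = {}
    for label in set(ys):
        pts = [x for x, y in zip(xs, ys) if y == label]
        model[label] = [sum(c) / len(pts) for c in zip(*pts)]
    return model

def predict_centroid(model, x):
    """Assign the class whose centroid is closest to x."""
    return min(model, key=lambda label: math.dist(x, model[label]))

# Toy 2-D features (made-up values) for three HC and three AD subjects.
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["HC", "HC", "HC", "AD", "AD", "AD"]
loo_acc = leave_one_out_accuracy(X, y, fit_centroids, predict_centroid)
```

With 142 subjects, each LOO accuracy is a multiple of 1/142; for instance, 135 correct decisions give 95.07%.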

The results achieved using the proposed method with the SVM are shown in Figure 6. These results are compared with different feature selection methods: Fisher score [31], support vector machine-recursive feature elimination (SVM-RFE) [32], feature selection with random forest [33], ReliefF [34], and minimum redundancy maximum relevance (mRMR) [35]. The average cross-validation accuracy is shown as a function of the number of selected VOIs. With “combination matrix” 1 (mono-parametric analysis), inputting 19, 20, or 21 VOIs with their combination of parameters into the SVM yields a classification rate of 95.07%. “Combination matrix” 2 achieved a higher accuracy with a lower number of features (14 VOIs): the best classification result, 96.47%, was obtained with this matrix. The proposed feature selection from PET images is therefore effective and provides a good discrimination between AD and HC subjects; the VOIs considered in the brain image are illustrated in Figure 7.

#### 2.4.2 Multilevel approach

The support vector machine (SVM) classifier was also applied in this second approach to classify AD or MCI against NC subjects. This second approach takes into account three levels of features, amounting to seven types of features in total, each of which is input to one of seven linear SVMs. The margin parameter C of all the SVMs is fixed to one for a fair comparison. The final decision is made through a majority voting of the seven classifiers’ outputs:

$$y = \operatorname{sgn}\left(\sum_{t=1}^{7} y_t\right)$$

where sgn() is the sign function and *y*_{t} denotes the label output by SVM_{t}.
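The voting rule can be sketched as follows, assuming that each SVM outputs a label in {−1, +1} (for example +1 for the patient group and −1 for NC). The function name and the example votes are hypothetical.

```python
def majority_vote(labels):
    """Sign of the sum of +1/-1 votes; an odd number of voters prevents ties."""
    total = sum(labels)
    return 1 if total > 0 else -1

# Made-up outputs of the seven SVMs for one subject.
votes = [1, 1, 1, -1, -1, -1, 1]
decision = majority_vote(votes)
```

Because seven classifiers vote, the sum is always odd and the sign is never ambiguous.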

Classification concerns AD vs. NC and MCI vs. NC. Evaluation was done using four different measures: classification accuracy (ACC), sensitivity (SEN), specificity (SPE), and area under the curve (AUC). A tenfold cross-validation was used to assess the performance and was repeated 10 times to reduce possible bias. The least absolute shrinkage and selection operator (LASSO) was used for feature selection. The LASSO parameter λ retained for the experiments was obtained by nested cross-validation on the training dataset. Tables 3 and 4 show the results obtained with each feature for AD vs. NC and MCI vs. NC, respectively. As can be seen, the first-level and second-level features outperformed the third-level features for AD and MCI diagnosis. This could be explained by the fact that the first two levels of features are more closely linked to the properties of a VOI or to the connectivity between each pair of VOIs, while the third-level features represent an overall connectivity between a VOI and the others.

Table 3. Classification performance of each feature for AD vs. NC.

Feature | ACC (%) | SEN (%) | SPE (%) | AUC (%) |
---|---|---|---|---|
Mean intensity | 85.13 | 86.61 | 83.97 | 93.39 |
Standard deviation | 85.49 | 84.98 | 86.24 | 93.84 |
Connectivity w^{h} | 85.05 | 86.24 | 84.56 | 93.01 |
Connectivity w^{m} | 86.88 | 88.82 | 85.17 | 93.88 |
Connectivity w^{l} | 83.98 | 84.31 | 83.37 | 91.37 |
Strength | 80.77 | 80.29 | 81.50 | 88.63 |
Clustering coefficient | 83.89 | 84.03 | 84.26 | 92.05 |

Table 4. Classification performance of each feature for MCI vs. NC.

Feature | ACC (%) | SEN (%) | SPE (%) | AUC (%) |
---|---|---|---|---|
Mean intensity | 73.55 | 75.01 | 72.87 | 81.36 |
Standard deviation | 78.19 | 78.31 | 78.69 | 86.67 |
Connectivity w^{h} | 72.78 | 70.63 | 74.35 | 83.19 |
Connectivity w^{m} | 74.67 | 76.06 | 73.65 | 83.27 |
Connectivity w^{l} | 74.89 | 77.01 | 72.68 | 78.94 |
Strength | 71.72 | 70.62 | 72.01 | 80.07 |
Clustering coefficient | 72.31 | 74.73 | 70.26 | 80.36 |
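The ACC, SEN, and SPE measures reported in these tables follow the usual confusion-matrix definitions, which can be sketched as follows. Treating the patient group (AD or MCI) as the positive class with label +1 is an assumption of this example, as are the function name and the toy labels.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, and specificity in percent."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "ACC": 100.0 * (tp + tn) / len(pairs),
        "SEN": 100.0 * tp / (tp + fn),   # true positive rate
        "SPE": 100.0 * tn / (tn + fp),   # true negative rate
    }

# Made-up labels: four patients (+1) and six controls (-1).
y_true = [1, 1, 1, 1, -1, -1, -1, -1, -1, -1]
y_pred = [1, 1, 1, -1, -1, -1, -1, -1, 1, 1]
scores = binary_metrics(y_true, y_pred)
```

In a repeated tenfold cross-validation, these measures would be computed on each test fold and then averaged over folds and repetitions.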

This second method was compared with state-of-the-art methods applied to FDG-PET data, including Hinrichs’s method [14], Gray’s method [12], Li’s method [36], and Padilla’s method [21]. The results are shown in Tables 5 and 6. The proposed approach outperformed these methods in terms of ACC and SEN for AD diagnosis. For MCI diagnosis, our method outperforms the other methods in SEN and AUC, and its difference from the best result is 0.21% for ACC and 1.65% for SPE. Moreover, the significant improvements over the single-feature results of Tables 3 and 4 indicate the effectiveness of the ensemble classification and show that the multilevel features are necessary.

## 3. Conclusion

In this chapter, two novel methods for VOI ranking are developed to improve the classification of brain PET images. The first approach ranks VOIs using ROC curves and quantifies the ability of a VOI to separate HC from AD subjects by means of the area under the ROC curve (AUC).

The second approach uses multilevel features to address the PET brain classification problem. Three levels of features are extracted from PET brain images and ranked in order to feed SVMs. Different models are trained using different types of features, and the final decision is made through majority voting of the models’ outputs. According to experiments on the ADNI dataset, the proposed method can improve the performance of AD and MCI diagnosis when compared with state-of-the-art methods that were also developed for FDG-PET.

To go further in computer-aided diagnosis tasks, other features such as texture and gradient computed on VOIs have to be combined with first-order statistical parameters in order to enrich the information. Modern machine learning based on deep neural networks will be included in our future work.

## Acknowledgments

This work has been conducted in the framework of DHU-Imaging thanks to the support of the A*MIDEX project (n°ANR-11-IDEX-0001-02) (« Investissements d’Avenir » French Government programme, managed by the French National Research Agency (ANR)).