TCV Classification Results for Musicians Study

## 1. Introduction

The principal challenge of MRI brain scan classification is to capture the features of interest in such a way that relative spatial information is retained while tractability is maintained. Popular feature representations are directed at colour, texture and/or shape; little work has been done on techniques that maintain the relative structure of the features of interest. This chapter describes a number of mechanisms whereby this may be achieved. More specifically, the work is directed at medical image classification according to a particular feature of interest that may appear across a given image set. Many medical studies [1, 8, 10, 12, 15, 24, 26, 32, 38, 40, 49] demonstrate that the shape and size of specific regions of interest play an important role in medical image classification. One example (and the application focus of the work described) is that the shape and size of the corpus callosum, a prominent feature located in brain MRI scans, is influenced by neurological diseases such as epilepsy and autism, and by special abilities (such as mathematical or musical ability) [35, 43, 47].

Given the above, the work described in this chapter is motivated by a need for techniques that can classify images according to the shape and relative size of features of interest that occur across some medical image sets. The main issue to be addressed is how best to process image collections so that an efficient and effective representation can be generated, suited to the classification of such images according to some Region of Interest (ROI) contained across the image set. Because the proposed techniques assume that an appropriate ROI exists across the image set, they will not be applicable to all image classification problems, only to the subset of problems where classification according to a ROI makes sense. Resolving the ROI classification problem, as formulated above, requires that the following issues be addressed:

1. Any derived solution should serve to maximise classification accuracy while at the same time allowing for efficient processing (although in the medical context efficient processing can be viewed as secondary to accuracy).

2. To achieve the desired classification accuracy, any proposed feature extraction (representation) method needs to capture the salient elements of the ROI without knowing in advance what those salient elements might be. In other words, any proposed feature extraction method, whatever form it takes, must retain as much relevant information as possible.

3. Notwithstanding point 2, it is also desirable to conduct the classification in reasonable time, although there tends to be a trade-off between accuracy and efficiency that must also be addressed.

4. Not all potential representations are compatible with all available classification paradigms; different representations may therefore require the application of different classification techniques.

The rest of this chapter is organised as follows. Section 2 provides an overview of the application domain. An essential precursor to the techniques described, although not the focus of this chapter, is the registration and segmentation of the region of interest; a note on the registration and segmentation process adopted is therefore given in Section 3. The four proposed techniques for classifying MRI brain scan data according to a single object that occurs across the data are founded on the Hough transform, weighted frequent subgraph mining, Zernike moments and time series analysis respectively; each is described in one of the following four sections (Sections 4, 5, 6 and 7). Section 8 then reports on the comparative evaluation of the proposed techniques, and some conclusions are presented in Section 9.

## 2. Application domain

Magnetic Resonance Imaging (MRI) came into prominence in the 1970s. MRI is similar to Computed Tomography (CT) in that cross-sectional images of some object are produced. A special kind of MRI, called Magnetic Resonance Angiography (MRA), can be used to examine blood vessels. MRI is also used for brain diagnosis, for example to detect abnormal changes in different parts of the brain. An MRI scan of the brain produces a very detailed picture; an example brain scan image is given in Figure 1. MRI brain scans underpin the diagnosis and management of patients suffering from various neurological and psychiatric conditions. Analysis of MRI data relies on the expertise of specialists (radiologists) and is therefore subjective. Automated classification of MRI image data can thus provide useful support for the categorisation process and potentially free up resources.

As noted in the introduction to this chapter, the focus of the work described is the classification of MRI brain scan data according to a feature called the corpus callosum. Figure 2 gives an example midsagittal slice of an MRI scan [1]; the corpus callosum is located in the centre of the brain (highlighted in the left-hand image, where an associated structure, the fornix, is also shown). The size and shape of the corpus callosum have been shown to be correlated with sex, age, neurological diseases (e.g. epilepsy, multiple sclerosis and schizophrenia) and various lateralised behaviours (such as handedness). It is also conjectured that the size and shape of the corpus callosum reflect certain human characteristics (such as mathematical or musical ability). Within neuro-imaging research considerable effort has been directed at quantifying parameters such as the length, surface area and volume of structures in living adult brains, and at investigating differences in these parameters between sample groups. As noted in [33], a number of reported studies have demonstrated that the size and shape of the human corpus callosum are related to gender [1, 12, 40], age [40, 49], handedness [10], brain development and degeneration [24, 32], conditions such as epilepsy [8, 38, 47] and brain dysfunction [15, 26]. It is worth noting that although the work described in this chapter is directed at MRI brain scan classification according to the corpus callosum, there are other features in MRI brain scans to which the techniques could be applied, such as the ventricles.

## 3. Image preprocessing and registration

Although the primary concern of this chapter is the representation of images to permit classification according to some feature that appears across these images (more specifically, the classification of MRI brain scans according to the nature of the corpus callosum), for this to happen the images must first be segmented and registered. In our case the images were registered by trained physicians using the Brain Voyager QX software package [21], which supports registration using the Talairach transformation. Segmentation was conducted using a variation of the Normalized Cuts (NCuts) segmentation technique, which formulates segmentation as a graph-partitioning problem. The basic NCuts algorithm was proposed by Shi and Malik [44]; however, the authors found that it did not operate well when applied to large images such as MRI brain scans. An established enhancement to the basic algorithm, the multiscale normalized cuts algorithm proposed by Cour et al. [11], was also considered. In the context of the corpus callosum application it was found that the multiscale normalized cuts algorithm could be improved upon so as to reduce the computational resources required to achieve the segmentation. A variation of the multiscale normalized cuts algorithm, developed by the authors, was thus adopted; details can be found in [17]. Alternative registration and segmentation techniques can clearly be adopted. What is important to note, with respect to the contents of this chapter, is that the starting point for each of the techniques described is a segmented corpus callosum.

## 4. Method 1: Region of interest image classification using a Hough transform signature representation

The Hough transform was originally proposed by Paul Hough in 1962 [25]. It was subsequently refined, in various ways, as part of a number of image analysis techniques directed at a great variety of application domains. In the context of image analysis, the Hough transform is principally used to detect parametric shapes (boxes, cylinders, cones, etc.) in image data. It was initially used to detect straight lines, then extended to simple parametric forms, and eventually generalised to detect arbitrary shapes [2]. The fundamental idea behind the Hough transform is that image patterns can be “transformed” (translated) into some alternative parameter space so that the desired shape detection problem becomes one of simply identifying peaks in the newly defined space. The principal disadvantages of the Hough transform are: (i) its substantial storage requirement and (ii) the associated computational overhead. The effect of these two disadvantages can be partially reduced by using additional information from the image data to limit the range of parameters that must be calculated for each point in a given image. For example, Ballard [2] used gradient information to support circle detection.

The proposed image classification method, based on the Hough transform, is directed at the extraction of shape signatures which can be used as feature vectors in a classification process. It is assumed that the input image is a binary representation of a region of interest (i.e. the corpus callosum with respect to the focus of the work described in this chapter) that has been appropriately segmented from “source” MRI brain scans of the form described above. The proposed shape signature extraction method is founded on an idea first presented by Vlachos et al. [45], which gave good results when classifying simple line-drawn symbol images according to their shapes. However, direct application of the Vlachos approach was found to perform consistently poorly with respect to the classification of brain MRI scans according to the nature of the corpus callosum. The proposed method therefore commences by simplifying the shape of the region of interest using a polygonal approximation method, after which the Vlachos signature extraction process is applied.

The proposed image classification technique based on the Hough transform comprises three major steps. We start with a data set of pre-labelled images from which the ROI (the corpus callosum in our case) has been extracted. First (Step 1), for each image, the ROI is processed using a Canny edge detector [6] to determine its boundary. Secondly (Step 2), a polygonal approximation technique is applied to reduce the complexity of the boundaries by approximating them with a minimum number of line segments. Thirdly (Step 3), signature extraction using the Vlachos approach is applied to extract the desired feature vectors, which are then placed in a Case Base (CB). The CB ultimately comprises the feature vectors extracted from all the images in the given training set together with their corresponding class labels. This CB can then be used, in the context of a Case Based Reasoning (CBR) framework, to classify unseen MRI brain scans according to the nature of the corpus callosum. Each of these steps is considered in further detail in the following three sub-sections.

### 4.1. Preprocessing (Edge detection)

As already noted, the extraction of the desired shape signatures (one per region of interest within each image) commences by applying the Canny edge detector [6]. The Canny operator detects the edge pixels of an object using a multi-stage process. First, the image is smoothed by applying a Gaussian filter. The edge strength is then calculated by applying a simple 2D first-derivative operator. The region is then scanned along the gradient direction and pixels that are not local maxima are set to zero, a process known as non-maximal suppression. Finally, hysteresis thresholding (using a high and a low threshold) is applied to select the correct edge pixels. When the edge detection technique is applied to the corpus callosum, each region is represented by its boundary.
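For an already-segmented binary ROI, the Canny stages largely reduce to tracing the region outline. The following minimal Python sketch does not implement Canny; it substitutes a simpler morphological rule, marking a ROI pixel as a boundary pixel when any of its 4-neighbours lies outside the ROI. The function name `boundary_pixels` and the list-of-lists mask format are illustrative assumptions.

```python
def boundary_pixels(mask):
    """Return (row, col) coordinates of ROI pixels that touch the background.

    `mask` is a list of lists of 0/1 values; positions beyond the image
    border are treated as background. This is a simple stand-in for the
    Canny detector, suitable only for already-binarised ROIs.
    """
    rows, cols = len(mask), len(mask[0])

    def inside(r, c):
        return 0 <= r < rows and 0 <= c < cols and mask[r][c] == 1

    edges = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1:
                continue
            # A ROI pixel lies on the boundary if any 4-neighbour is background.
            neighbours = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            if not all(inside(nr, nc) for nr, nc in neighbours):
                edges.append((r, c))
    return edges
```

On a solid 3 × 3 block, for example, this returns the eight outer pixels and omits the centre.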

### 4.2. Polygonal approximation

The aim of the region boundary simplification step is to obtain a smooth curve, comprising a minimum number of line segments, that describes the region’s boundary. This process is referred to as the polygonal approximation of a polygonal curve, which consists of a set of vertices. The approximation of polygonal curves is aimed at finding a subset of the original vertices so that a given objective function is minimised. The problem can be defined in a number of ways; the definition used here is referred to as the *min-# problem*: given an *N*-vertex polygonal curve *C*, approximate it by another polygonal curve *C*_{a} with the minimum number of straight line segments *M* such that the approximation error does not exceed a given tolerance.

One of the most widely used solutions to the min-# problem is a heuristic method called the Douglas-Peucker (DP) algorithm [14]. With respect to the work described in this chapter, the DP algorithm was used to simplify the boundaries of the regions of interest before the application of the Hough transform to extract signatures. The DP algorithm uses the closeness of a vertex to an edge segment. It works in a top-down manner, starting with a crude initial guess at a simplified polygonal curve, namely the single edge joining the first and last vertices of the polygonal curve. The remaining vertices are then tested for closeness to that edge. If there are vertices further than a specified tolerance, *ε* > 0, away from the edge, then the vertex furthest from it is added to the simplification. This creates a new guess for the simplified polygonal curve. Using recursion, this process continues for each edge of the current guess until all vertices of the original polygonal curve are within tolerance of the simplification.
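The recursive procedure just described can be sketched in Python as follows. This is a minimal illustration, assuming vertices are `(x, y)` tuples and measuring closeness as the distance from a vertex to the current edge segment; the function names are hypothetical and this is not the authors' implementation.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to segment ab (to the nearer endpoint if p projects outside)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Simplify an open polygonal curve so every vertex lies within epsilon of the result."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the vertex furthest from the current single-edge guess.
    index, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [first, last]
    # Keep the furthest vertex and recurse on both halves; drop the shared vertex once.
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right
```

Lowering `epsilon` keeps more vertices (a finer approximation), consistent with the tolerance behaviour discussed below.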

In the case of the approximation of the corpus callosum boundary as a closed curve, an optimal allocation of all approximation vertices, including the starting point, must be found. A straightforward solution is to try every vertex as the starting point and choose the one with minimal error; for an *N*-vertex curve the complexity of this algorithm is *N* times that of the algorithm for an open curve. A number of heuristic approaches exist for selecting the starting point. In this work we adopted a heuristic founded on that presented by Sato [42], in which the point farthest from the centroid of the region of interest is chosen as the starting point.
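A minimal sketch of the starting-point heuristic described above, assuming the closed contour is a list of `(x, y)` vertices. As a simplifying assumption, the centroid is approximated by the mean of the contour vertices rather than computed from the filled region; the function name is hypothetical.

```python
import math


def rotate_to_start(vertices):
    """Reorder a closed contour so it starts at the vertex farthest from the centroid.

    The centroid is approximated by the mean of the contour vertices (an
    assumption; the region centroid could be used instead).
    """
    cx = sum(x for x, _ in vertices) / len(vertices)
    cy = sum(y for _, y in vertices) / len(vertices)
    start = max(range(len(vertices)),
                key=lambda i: math.hypot(vertices[i][0] - cx, vertices[i][1] - cy))
    # Rotate the list and repeat the start vertex at the end, so the closed
    # curve can be handed to an open-curve simplifier such as Douglas-Peucker.
    ordered = vertices[start:] + vertices[:start]
    return ordered + [ordered[0]]
```

Closing the list with a repeated start vertex means the simplifier always keeps that vertex as both endpoints.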

The value of the tolerance *ε* affects the approximation of the original polygonal curves. For smaller tolerance values the polygonal curve is approximated by a large number of line segments *M*, which means that the approximation is very similar to the original curve, whereas larger values give a much coarser approximation of the original curve with a smaller number of line segments *M*. Figure 3 shows an example of a simplification of the boundary of a corpus callosum using *ε* = 0.9, resulting in 17 line segments. Figure 4 shows another example using *ε* = 0.4, resulting in 52 line segments.

### 4.3. Shape signature extraction

The generation of the shape signature based on the Straight Line Hough Transform (SLHT) relies on creating an *M* × *N* accumulator matrix *A* where (using the polar coordinate scheme) each row corresponds to one value of *ρ* (length) and each column to one value of *θ* (orientation). As already noted, the procedure for generating the feature vector from the accumulator matrix is founded on that presented in Vlachos et al. [45] and is as follows:

1. Determine the set of boundary pixels corresponding to the region of interest.
2. Transform each pixel in the set into a parametric curve in the parameter space.
3. Increment the cells in the accumulator matrix *A* as directed by the parametric curve.
4. Calculate a preliminary feature vector.
5. Calculate the vector mean.
6. Normalise the feature vector.

By transforming every point (*x*, *y*) in the image into the parameter space, the line parameters can be found at the intersections of the parametrised curves in the accumulator matrix, as shown in Figure 4.1. In step 4, the accumulator matrix is projected onto a one-dimensional *θ* vector by summing up the *ρ* values in each column. Finally, the feature vector is normalised according to its mean in steps 5 and 6. The extracted feature vector describing the ROI within each image can then be used as an image signature.
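The six steps above can be sketched in Python under stated assumptions. Boundary points are `(x, y)` tuples, *θ* is sampled over [0, π) and *ρ* is quantised with a fixed step. Because every point votes exactly once in each *θ* column, summing raw counts would yield a constant vector; this sketch therefore sums *squared* counts to emphasise peaks, and normalises by dividing by the mean. Both choices are assumptions, not necessarily the exact Vlachos et al. formulation, and the function name is hypothetical.

```python
import math


def hough_signature(points, n_theta=180, rho_step=1.0):
    """Project a straight-line Hough accumulator onto theta and normalise by the mean."""
    thetas = [math.pi * j / n_theta for j in range(n_theta)]
    # rho = x cos(theta) + y sin(theta) never exceeds the largest point radius.
    rho_max = max(math.hypot(x, y) for x, y in points)
    n_rho = int(math.ceil(2 * rho_max / rho_step)) + 1
    acc = [[0] * n_theta for _ in range(n_rho)]  # rows: rho bins, columns: theta bins
    # Steps 2-3: each point traces a sinusoid; increment the cells it passes through.
    for x, y in points:
        for j, theta in enumerate(thetas):
            rho = x * math.cos(theta) + y * math.sin(theta)
            acc[int(round((rho + rho_max) / rho_step))][j] += 1
    # Step 4 (assumed variant): squared counts, so columns with peaks stand out.
    prelim = [sum(acc[i][j] ** 2 for i in range(n_rho)) for j in range(n_theta)]
    # Steps 5-6: divide by the vector mean.
    mean = sum(prelim) / n_theta
    return [v / mean for v in prelim]
```

For boundary points lying on a horizontal line, the signature peaks at *θ* = π/2, the orientation of that line's normal.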

### 4.4. Classification

The signatures from a labelled training set can thus be collected together and stored in a Case Base (CB) within a Case Based Reasoning (CBR) framework, with Euclidean distance used as the similarity measure. Let us assume that we have the feature vector *T* for a pre-labelled image and the feature vector *Q* for the test image (both of size *N*). Their distance apart is calculated as:

$$dist(T, Q) = \sqrt{\sum_{i=1}^{N} (T_i - Q_i)^2}$$

Here *dist* = 0 indicates a perfect match, and *dist* = *distmax* indicates two images with maximum dissimilarity.

To categorise “unseen” MRI brain scans, according to the nature of the corpus callosum, signatures describing the unseen cases were compared with the signatures of labelled cases held in the CB. The well established K-Nearest Neighbour (KNN) technique was used to identify the most similar signature in the CB from which a class label was then extracted.
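Retrieval from the CB can be sketched as a plain nearest-neighbour search with Euclidean distance as the similarity measure. A minimal Python sketch, assuming the CB is a list of `(signature, label)` pairs; `k = 1` corresponds to adopting the label of the single most similar case. Function names and labels are hypothetical.

```python
import math


def euclidean(t, q):
    """dist(T, Q) = sqrt(sum_i (T_i - Q_i)^2) for equal-length signatures."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t, q)))


def knn_classify(case_base, query, k=1):
    """Label a query signature by majority vote among its k nearest cases.

    `case_base` is a list of (signature, label) pairs.
    """
    nearest = sorted(case_base, key=lambda case: euclidean(case[0], query))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)
```

With `k = 1` no voting is needed; larger `k` can smooth over noisy cases but requires a tie-breaking policy.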

## 5. Method 2: Region of interest image classification using a weighted frequent subgraph representation

As already noted, classifying image data according to some common object that features across an image set requires the image objects in question to be represented in some appropriate format. The previous section considered representing image objects using a signature generation process founded on the Hough transform. In this section an image decomposition method is considered whereby the ROIs are represented using a quad-tree. More specifically, the Minimum Bounding Rectangles (MBRs) surrounding the ROIs are represented using quad-trees. The conjectured advantage is that a quad-tree representation maintains the structural information (shape and size) of the ROI contained in the MBR. By applying a weighted frequent subgraph mining algorithm, gSpan-ATW [28], to this representation, frequent subgraphs that occur across the set of tree-represented MBRs can be identified. The identified frequent subgraphs, each describing, in terms of size and shape, some part of the MBR, can then be used to form the fundamental elements of a feature space. This feature space can in turn be used to describe a set of feature vectors, one per image, to which standard classification processes can be applied (e.g. decision tree classifiers, SVMs or rule based classifiers).

The graph based approach to image classification, as in the case of all the other methods described in this chapter, commences with segmentation and registration to isolate the Region Of Interest (ROI). Secondly, image decomposition takes place to represent the details of the identified ROI in terms of a quad-tree data structure. Feature extraction using a weighted frequent subgraph mining approach (the gSpan-ATW algorithm with respect to the evaluation described later in this chapter) is then applied to the tree-represented image set (one tree per image) to identify frequent subgraphs. The identified subtrees (subgraphs) then form the fundamental elements of a feature space (a set of attributes with which to describe the image set). Finally, because a substantial number of features (frequent subgraphs) is generated, feature selection takes place to select the most relevant and discriminatory features. Standard classifier generation techniques can then be applied to build a classifier that can be applied to unseen data. Each of the steps involved in the process is discussed in further detail in the following subsections.

### 5.1. Image decomposition

Image decomposition methods are commonly used in image analysis, compression and segmentation. Different types of image decomposition may be adopted; with respect to the work described in this chapter a quad-tree representation is proposed. A quad-tree is a tree data structure which can be used to represent a 2D area (such as an image) that has been recursively subdivided into “quadrants” [31]. In the context of the representation of ROIs in terms of quad-trees, the pixels representing the MBR surrounding each ROI are tessellated into homogeneous sub-regions [16, 17]. The tessellation can be conducted according to a variety of image features such as colour or intensity. With respect to the corpus callosum a binary encoding was used: the “tiles” included in the corpus callosum were allocated a “1” (black) and the tiles not included a “0” (white). A tile was deemed to be sufficiently homogeneous if it was 95% black or white. The tessellation continues until either sufficiently homogeneous tiles are identified or some user specified level of granularity is reached. The result is then stored in a quad-tree data structure such that each leaf node represents a tile. Leaf nodes nearer the root of the tree represent larger tiles than nodes further away; the tree is thus “unbalanced” in that some leaf nodes cover larger areas of the ROI than others. It is argued that tiles covering small regions are of greater interest than those covering large regions because they indicate a greater level of detail (they are typically located on the boundary of the ROI). The advantage of the representation is thus that it maintains information about the relative location and size of groups of pixels (i.e. the shape of the corpus callosum). The decomposition process is illustrated in Figures 5 and 6: Figure 5 illustrates the decomposition (in this case down to a level of 3), and Figure 6 illustrates the resulting quad-tree.
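The tessellation can be sketched as a recursive function. This minimal Python sketch assumes a square binary MBR whose side is a power of two, the 95% homogeneity threshold given above and a default maximum depth of 3 (as in Figure 5). Leaves are labelled black or white; internal (“nothing”) nodes are lists of four children in NW, NE, SW, SE order. Labelling a still-mixed tile by majority when the depth limit is reached is an assumption, and the function name is hypothetical.

```python
def quadtree(mask, r0=0, c0=0, size=None, depth=0, max_depth=3, homogeneity=0.95):
    """Recursively tessellate a square binary MBR into a quad-tree.

    Returns "black" or "white" for a sufficiently homogeneous tile (a leaf),
    otherwise a list of four child subtrees in NW, NE, SW, SE order.
    """
    if size is None:
        size = len(mask)  # assumes a square mask whose side is a power of two
    cells = [mask[r][c] for r in range(r0, r0 + size) for c in range(c0, c0 + size)]
    black = sum(cells) / len(cells)
    if black >= homogeneity:
        return "black"
    if 1 - black >= homogeneity:
        return "white"
    if depth == max_depth:
        # Granularity limit reached: label the mixed tile by majority (assumption).
        return "black" if black >= 0.5 else "white"
    half = size // 2
    corners = ((r0, c0), (r0, c0 + half), (r0 + half, c0), (r0 + half, c0 + half))
    return [quadtree(mask, r, c, half, depth + 1, max_depth, homogeneity)
            for r, c in corners]
```

Small tiles, which appear deep in the returned structure, correspond to the boundary detail the text argues is most informative.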

### 5.2. Feature extraction using gSpan-ATW algorithm

From the literature two separate problem formulations for Frequent Subgraph Mining (FSM) can be identified: (i) transaction graph based, and (ii) single graph based. In transaction graph based mining, the input data comprises a collection of relatively small graphs, whereas in single graph based mining the input data comprises a very large single graph. The graph mining based approach adopted with respect to the work described in this chapter focuses on transaction graph based mining. In the context of transaction graph based mining, FSM aims to discover all the subgraphs whose occurrences in a graph database are over a user defined threshold *σ*. Many FSM algorithms have been proposed of which the most well known is arguably gSpan [46].

Frequent subgraph mining is computationally expensive because of the candidate generation and support computation processes that are required. The first process is concerned with generating candidate subgraphs in a non-redundant manner, such that the same graph is not generated more than once; graph isomorphism checking is thus required to remove duplicate graphs. The second process computes the support of a graph in the graph database, which also requires subgraph isomorphism checking in order to determine the set of graphs in which a given candidate occurs. Although algorithms such as gSpan can achieve competitive performance compared with other FSM algorithms, their performance degrades considerably when the graphs are relatively large or feature few node and/or edge labels. The mechanism adopted here to address this issue was weighted frequent subgraph mining.

Given the quad-tree representation a weighted frequent subgraph mining algorithm (gSpan-ATW) was applied to identify frequently occurring subgraphs (subtrees) within the tree representation. The Average Total Weighting (ATW) scheme weights nodes according to their occurrence count. The nodes in the tree (see for example Figure 6) are labelled as being either: “black”, “white” or “nothing”. The black and white labels are used for the leaf nodes and represent the shape of the corpus callosum. These should therefore be weighted more highly than the “nothing” nodes. It can also be argued that these should be weighted more highly because they are further away (on average) from the root than the “nothing” nodes, and therefore the leaf nodes can be said to provide more detail. The ATW scheme achieves this.

The ATW weighting scheme was incorporated into the gSpan algorithm to produce gSpan-ATW. As a result of the application of gSpan-ATW the identified frequent subgraphs (i.e. subtrees) each describing, in terms of size and shape, some part of a ROI that occurs regularly across the data set, are then used to form the fundamental elements of a feature space. Using this feature space each image (ROI) can be described in terms of a feature vector of length *N*, with each element having a value equal to the frequency of that feature (sub-graph).

### 5.3. Feature selection and classification

As noted above, the graph mining process typically identifies a great many frequent subgraphs, more than are required for the desired classification. A feature selection strategy was therefore applied to the feature space so that only those subgraphs that serve as good discriminators between cases are retained. A straightforward wrapper method was adopted whereby a decision tree generator was applied to the feature space. Features included as “choice points” in the decision tree were then selected, whilst all remaining features were discarded. For the work described here, the well established C4.5 decision tree algorithm [37] was adopted, although any other decision tree generator would have sufficed. On completion of the feature selection process each image was described in terms of a reduced feature vector indicating the selected features (subgraphs) that appear in the image. Once the image set had been represented in this manner any appropriate classifier generator could be applied. With respect to the work described in this chapter the C4.5 algorithm was again adopted (both applications of C4.5 used the WEKA implementation [23]).
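The wrapper idea can be sketched as follows. This minimal Python sketch assumes numeric (subgraph frequency) features and grows a small greedy information-gain tree with binary threshold splits; it is an ID3-style stand-in, not C4.5 (which additionally uses gain ratio and pruning). Only the features used at the tree's choice points are returned. Function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (gain, feature, threshold) for the best binary split, or None."""
    base, best = entropy(labels), None
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between consecutive observed values
            left = [l for r, l in zip(rows, labels) if r[f] <= t]
            right = [l for r, l in zip(rows, labels) if r[f] > t]
            rem = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            if best is None or base - rem > best[0]:
                best = (base - rem, f, t)
    return best


def selected_features(rows, labels, used=None):
    """Grow the tree greedily and collect the features used as choice points."""
    used = set() if used is None else used
    if len(set(labels)) <= 1:
        return used
    split = best_split(rows, labels)
    if split is None or split[0] <= 0:
        return used
    _, f, t = split
    used.add(f)
    for keep in (lambda v: v <= t, lambda v: v > t):
        idx = [i for i, r in enumerate(rows) if keep(r[f])]
        selected_features([rows[i] for i in idx], [labels[i] for i in idx], used)
    return used
```

Each image can then be re-described by the reduced vector `[row[f] for f in sorted(selected)]`.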

## 6. Method 3: Region of interest image classification using a Zernike moment signature representation

This section describes the third proposed approach to image classification according to some feature that appears across the image set. The proposed approach is founded on the concept of Zernike moments. Moments are scalar quantities used to characterise a function and to capture its significant features. They have been widely used for many years in statistics, to describe the shape of probability density functions, and in classical “rigid-body” mechanics, to measure the mass distribution of a body. From a mathematical point of view, moments are “projections” of a function onto a polynomial basis.

Zernike moments are a class of orthogonal moments (moments produced using orthogonal basis sets) that can be used as an effective image descriptor. Unfortunately, direct computation of Zernike moments is computationally expensive, which makes it impractical for many applications. This limitation has prompted considerable study of algorithms for the fast evaluation of Zernike moments [3, 29, 36]. Belkasim et al. [3] introduced a fast algorithm based on the series expansion of radial polynomials. Prata et al. [36] and Kintner [29] proposed recurrence relations for the fast computation of the radial polynomials of Zernike moments. Chong et al. [7] modified Kintner's method so that it would be applicable to all cases. Unfortunately, all of these methods approximated the Zernike moment polynomials and consequently produced inaccurate sets of Zernike moments. Wee et al. [48] proposed a new algorithm that computed exact Zernike moments through a set of exact geometric moments; their method was accurate but still entailed a significant computational overhead.

The authors have thus developed an efficient method for exact Zernike moment computation based on the observation that exact Zernike moments can be expressed as a function of geometric moments. The proposed algorithm is based on a quad-tree representation of images (similar to that described in Section 5) whereby a given pixel-represented region is decomposed into a number of non-overlapping tiles. Since computing the geometric moments of each tile is easier than computing those of the whole region, this reduces the computational complexity significantly. The algorithm proposed by Wu et al. [50] for the fast computation of geometric moments was adopted to calculate the required geometric moments. The resulting Zernike moments were then used to define a feature vector (one per image) which can be input to a standard classification mechanism.

### 6.1. Fast calculation of Zernike moments

As noted above, a new method for Zernike moment computation, based on the observation that exact Zernike moments can be expressed as a function of Geometric Moments (GMs), is proposed here. The method eases the computational complexity associated with Zernike moment calculation. A given pixel-represented object is first decomposed into a number of non-overlapping squares, for which GMs can be calculated.

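The complex 2D Zernike moments of order *p* and repetition *q* are defined as:

$$Z_{pq} = \frac{p+1}{\pi} \iint_{x^2 + y^2 \le 1} V_{pq}^{*}(x, y)\, f(x, y)\, dx\, dy$$

where *p* = 0, 1, 2, ..., ∞ and *q* is a positive or negative integer subject to the conditions that *p* − |*q*| is even and |*q*| ≤ *p*; the asterisk denotes the complex conjugate and *f*(*x*, *y*) is the image intensity function. The Zernike polynomial:

$$V_{pq}(x, y) = V_{pq}(\rho, \theta) = R_{pq}(\rho)\, e^{iq\theta}$$

describes a complete set of complex-valued orthogonal functions defined on the unit disk, *x*^{2} + *y*^{2} ≤ 1, with *i* = √−1. The radial polynomial *R*_{pq}(*ρ*) is defined as:

$$R_{pq}(\rho) = \sum_{\substack{k = |q| \\ p - k \text{ even}}}^{p} B_{p|q|k}\, \rho^{k}$$

where the polynomial coefficient, *B*_{p|q|k}, is defined as:

$$B_{p|q|k} = \frac{(-1)^{\frac{p-k}{2}} \left(\frac{p+k}{2}\right)!}{\left(\frac{p-k}{2}\right)! \left(\frac{k+|q|}{2}\right)! \left(\frac{k-|q|}{2}\right)!}$$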

Zernike polynomials are thus defined in terms of polar coordinates (*ρ*, *θ*) over a unit disk, while the object intensity function is always defined in terms of Cartesian coordinates (*x*, *y*); the computation of Zernike moments therefore requires an image transformation. There are two traditional mapping approaches [7]. In the first approach, the unit disk is inscribed in the square image plane, with the centre of the image taken as the origin of the coordinate system; all pixels outside the unit disk are ignored, which results in a loss of some image information. In the second approach, the whole square image plane is mapped inside the unit disk, again with the centre of the image as the coordinate origin. In this chapter the second approach is used to avoid loss of information. Zernike moments can be expressed in terms of GMs as follows:

where Φ is defined as:

To speed up the calculation of Zernike moments in terms of GMs, as noted above, a quad-tree decomposition was adopted as used in the graph based approach described in Section 5 and in [50]. The GMs for each object can then be easily calculated by summing the GMs for all the squares in the decomposition that are part of the object (the computation of GMs of squares is easier than that for the whole object).
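The additive property that makes the decomposition useful can be checked directly. For a solid square of unit-intensity pixels, the double sum defining the geometric moment *m*_{pq} = Σ_x Σ_y *x*^{p} *y*^{q} *f*(*x*, *y*) separates into (Σ *x*^{p})(Σ *y*^{q}), and the moment of the whole object is the sum over its squares. This minimal Python sketch assumes integer pixel coordinates and a tile list of `(x0, y0, size)` triples produced by some quad-tree decomposition; it is not the Wu et al. [50] algorithm, and the function names are hypothetical.

```python
def square_moment(x0, y0, size, p, q):
    """Geometric moment m_pq of a solid size x size block of unit-intensity pixels.

    The double sum over the block separates into (sum of x^p) * (sum of y^q).
    """
    sx = sum(x ** p for x in range(x0, x0 + size))
    sy = sum(y ** q for y in range(y0, y0 + size))
    return sx * sy


def region_moment(squares, p, q):
    """Sum the moments of the non-overlapping squares (x0, y0, size) covering an object."""
    return sum(square_moment(x0, y0, s, p, q) for x0, y0, s in squares)


def brute_force_moment(mask, p, q):
    """Reference m_pq computed pixel by pixel; mask[y][x] is 0 or 1."""
    return sum(x ** p * y ** q
               for y, row in enumerate(mask) for x, v in enumerate(row) if v)
```

The saving comes from large tiles: a tile of side *s* costs about 2*s* power evaluations instead of *s*^{2}.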

### 6.2. Feature extraction based on Zernike moments

In the context of the proposed ROI based image classification approach, the calculated exact Zernike moment magnitudes were used to define a feature space representing the image set. Each image, or more specifically the object of interest within each image, can then be represented in terms of a feature vector. The feature vector {*AFV*}_{N} consists of the accumulated Zernike moment magnitudes from order *p* = 0 to order *p* = *N* with all possible repetitions of *q*; since |*Z*_{p,−q}| = |*Z*_{pq}| for a real-valued image, only non-negative repetitions need be included. For example, where *N* = 4, the feature vector {*AFV*}_{4} will consist of the set of all Zernike moments corresponding to the orders *p* = 0, 1, 2, 3, 4 coupled with all possible repetitions of *q*: {|*Z*_{00}|, |*Z*_{11}|, |*Z*_{20}|, |*Z*_{22}|, |*Z*_{31}|, |*Z*_{33}|, |*Z*_{40}|, |*Z*_{42}|, |*Z*_{44}|}. Consequently a set of images that contain a common ROI (such as the corpus callosum in the case of the brain MRI scan data of interest with respect to this chapter) can be represented as a set of feature vectors which can be input to standard classification techniques.
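A minimal Python sketch of building {*AFV*}_{N} by direct summation over pixels, deliberately simple and slow rather than the fast GM-based method described in Section 6.1. It assumes a square image and uses the second mapping approach, placing pixel centres inside the unit disk so that the image corners touch the circle. The radial polynomial follows the *B*_{p|q|k} expansion, and only *q* ≥ 0 is enumerated, matching the example list above. Function names are hypothetical.

```python
import cmath
import math


def radial_poly(p, q, rho):
    """R_pq(rho) = sum of B_{p|q|k} * rho^k over k = |q|, |q|+2, ..., p."""
    q = abs(q)
    total = 0.0
    for k in range(q, p + 1, 2):  # p - |q| is even, so p - k stays even
        b = ((-1) ** ((p - k) // 2) * math.factorial((p + k) // 2)
             / (math.factorial((p - k) // 2)
                * math.factorial((k + q) // 2)
                * math.factorial((k - q) // 2)))
        total += b * rho ** k
    return total


def zernike_features(image, order):
    """Return {AFV}_order: |Z_pq| for p = 0..order and q >= 0 with p - q even."""
    n = len(image)  # assumes a square n x n image
    # Map pixel centres into [-1/sqrt(2), 1/sqrt(2)] so the whole square fits in the disk.
    coords = [(2 * i + 1 - n) / (n * math.sqrt(2)) for i in range(n)]
    pixel_area = 2 / n ** 2  # the mapped square has side sqrt(2)
    feats = []
    for p in range(order + 1):
        for q in range(p % 2, p + 1, 2):
            z = 0j
            for yi, row in enumerate(image):
                for xi, value in enumerate(row):
                    if value:
                        x, y = coords[xi], coords[yi]
                        rho, theta = math.hypot(x, y), math.atan2(y, x)
                        # Conjugate basis function: V*_pq = R_pq(rho) e^{-i q theta}.
                        z += value * radial_poly(p, q, rho) * cmath.exp(-1j * q * theta)
            feats.append(abs(z * (p + 1) / math.pi * pixel_area))
    return feats
```

Using magnitudes discards the phase term, which is what makes these features insensitive to rotation of the ROI.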

## 7. Method 4: Region of interest image classification using a time series representation

In this section the fourth proposed approach to ROIBIC, founded on a time series representation coupled with a Case Based Reasoning (CBR) mechanism, is described. In this approach the features of interest are represented as time series, one per image. There are a number of mechanisms whereby the desired time series can be generated; the method proposed in this chapter is founded on a ROI intersection mechanism. The generated time series are then stored in a Case Base (CB), which can be used to categorise unseen data using CBR. The unseen data is compared with the cases contained in the CB using a Dynamic Time Warping (DTW) similarity checking mechanism. The class associated with the most similar time series (case) in the CB is then adopted as the class for the unseen data. It should be noted that the phrase “time series” is used with respect to the adopted representation because the proposed image classification technique is founded on work on time series analysis, not because the representation includes some temporal dimension.

### 7.1. ROI Intersection Time Series Generation

Using the ROI intersection mechanism, the desired image signature (“pseudo” time series) is generated using an ordered sequence of *M* “spokes” radiating out from a single reference point. The time series is then expressed as a series of values (one for each spoke) describing the size (length) of the intersection of each spoke with the ROI. The representation thus maintains the structural information (shape and size) of the ROI. It should also be noted that the value of *M* may vary from image to image according to the shape and size of the individual ROI within the image data set.

Formally speaking, it is assumed that there are *M* spokes and that each spoke *i*, radiating out from some reference point, intersects the ROI boundary at two points (*x*_{1}(*i*), *y*_{1}(*i*)) and (*x*_{2}(*i*), *y*_{2}(*i*)); the proposed image signature is then given by:

$$S = \{D_1, D_2, \ldots, D_M\}, \qquad D_i = \sqrt{\big(x_2(i) - x_1(i)\big)^2 + \big(y_2(i) - y_1(i)\big)^2}$$
With respect to the corpus callosum application, the time series generation procedure is illustrated in Figure 7. The midpoint of the lower edge of the object’s Minimum Bounding Rectangle (MBR) was selected as the reference point, because this ensures that there are only two boundary intersections per spoke. The spokes were derived by rotating an arc about the reference point pixel, with an interval between spokes of one pixel measured along the edge of the MBR. For each spoke, the distance *D*_{i} (where *i* is the spoke identification number) over which the spoke intersected a sequence of corpus callosum pixels was measured and recorded. The result was a time series with the spoke number *i* representing time and the value *D*_{i} (the intersection length) representing the magnitude for each spoke. By plotting *D*_{i} against *i*, a pseudo time series can be derived, as shown in Figure 7.
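The spoke-based signature can be sketched as below. This is a minimal illustration under several assumptions that are not taken from the chapter: the input is a binary ROI mask already cropped to its MBR (row 0 at the top); one spoke is cast to every pixel along the left, top and right edges of the MBR; each spoke is sampled at sub-pixel steps (`step`, a hypothetical parameter); and *D*_{i} is taken as the Euclidean distance between the first and last ROI pixels met along the spoke, i.e. the two boundary crossings.

```python
import numpy as np


def spoke_signature(mask: np.ndarray, step: float = 0.25) -> np.ndarray:
    """Pseudo time series D_1..D_M for a binary ROI mask cropped to its MBR.

    Spokes radiate from the midpoint of the MBR's lower edge to every pixel
    on the left, top and right edges, so M = 2h + w - 2 depends on the MBR size.
    """
    h, w = mask.shape
    ref = np.array([h - 1, (w - 1) / 2.0])             # (row, col); row 0 is the top
    targets = ([(r, 0) for r in range(h - 1, 0, -1)]   # up the left edge
               + [(0, c) for c in range(w)]            # across the top edge
               + [(r, w - 1) for r in range(1, h)])    # down the right edge
    series = []
    for target in targets:
        direction = np.array(target, dtype=float) - ref
        length = float(np.hypot(direction[0], direction[1]))
        if length == 0.0:
            series.append(0.0)
            continue
        # Sample the spoke at intervals of at most `step` pixels.
        frac = np.linspace(0.0, 1.0, int(np.ceil(length / step)) + 1)
        pts = ref + frac[:, None] * direction
        rows = np.clip(np.rint(pts[:, 0]).astype(int), 0, h - 1)
        cols = np.clip(np.rint(pts[:, 1]).astype(int), 0, w - 1)
        hits = np.flatnonzero(mask[rows, cols])
        if hits.size == 0:
            series.append(0.0)                         # spoke misses the ROI
            continue
        entry, exit_point = pts[hits[0]], pts[hits[-1]]  # the two boundary crossings
        series.append(float(np.hypot(*(exit_point - entry))))
    return np.array(series)
```

For an arch-shaped ROI (such as the upper half of an annulus centred on the reference point), every spoke crosses the arch once, so each *D*_{i} is close to the arch's thickness; on an MBR of *h* × *w* pixels the sketch produces *M* = 2*h* + *w* − 2 values.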

### 7.2. Similarity measuring using dynamic time warping

The objective of most similarity measures is to identify the distance between two feature vectors, and there are a number of methods whereby this may be achieved. In time series analysis a common similarity measure is Dynamic Time Warping (DTW), a well-known algorithm used in many areas. It was first introduced in the 1960s [4] and was extensively explored in the 1970s for use in speech recognition systems. DTW operates as follows. In order to align two time series (sequences) *A* and *B* with lengths *N* and *M*, an *N* × *M* matrix (*D*) is constructed, where each element (*i*, *j*) of the matrix contains the distance between the points *A*_{i} and *B*_{j}. The goal is to find a path through this matrix that minimises the sum of the local distances of the points. The path from (1, 1) to (*N*, *M*) in the matrix *D* is called the warping path *W*:

$$W = w_1, w_2, \ldots, w_k, \ldots, w_K, \qquad \max(N, M) \le K \le N + M - 1$$
which is subject to the following constraints:

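**Boundary condition:** This requires the warping path to start at *w*_{1} = (1, 1) and finish at *w*_{K} = (*N*, *M*), where *K* is the length of the warping path.

**Continuity:** Given two consecutive points along the warping path, *w*_{k-1} = (*c*, *d*) and *w*_{k} = (*a*, *b*):

$$a - c \le 1 \quad \text{and} \quad b - d \le 1$$

thus restricting the allowable steps in the warping path to adjacent (including diagonally adjacent) cells.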

**Monotonicity:** Given *w*_{k-1} = (*c*, *d*) and *w*_{k} = (*a*, *b*), then:

$$a - c \ge 0 \quad \text{and} \quad b - d \ge 0$$
The above inequalities force the points in *W* to be monotonically spaced in time. The warping path on the matrix *D* is found using dynamic programming, which accumulates the partial distances between the sequences. If *D*(*i*, *j*) is the global (cumulative) distance up to (*i*, *j*) and the local distance at (*i*, *j*) is given by *d*(*i*, *j*), then the DTW algorithm uses the following recurrence relation:

$$D(i, j) = d(i, j) + \min\{D(i-1, j-1),\; D(i-1, j),\; D(i, j-1)\}$$
Given *D*(1, 1) = *d*(*A*_{1}, *B*_{1}) as the initial condition, this provides the basis for an efficient dynamic programming algorithm for computing *D*(*i*, *j*). The algorithm starts from *D*(1, 1) and iterates through the matrix, summing the partial distances until *D*(*N*, *M*) is reached; this value is the overall matching score of the time series (sequences) *A* and *B*.

The computational cost of DTW is *O*(*NM*). To reduce this cost, global constraints may be introduced whereby matrix locations far from the main diagonal are ignored. Two well-known global constraints are the “Sakoe-Chiba band” [39] and the “Itakura parallelogram” [27]. The Sakoe-Chiba band runs along the main diagonal and has a fixed width *R* such that *j* − *R* ≤ *i* ≤ *j* + *R* for the indices of the warping path *w*_{k} = (*i*, *j*). The Itakura parallelogram, by contrast, defines a parallelogram-shaped region that constrains the warping path options. There are several reasons for using global constraints, one of which is that they speed up the DTW distance calculation somewhat. However, the most important reason is to prevent pathological warpings, where a relatively small section of one time series maps onto a relatively large section of another. In the work described here, the Sakoe-Chiba band was adopted.
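The recurrence and the Sakoe-Chiba band can be combined in a few lines. The sketch below is illustrative only: it assumes the local distance *d*(*i*, *j*) is the absolute difference |*A*_{i} − *B*_{j}| (the chapter does not specify the local distance), uses 0-based storage with a padding row and column so that *D*(1, 1) = *d*(*A*_{1}, *B*_{1}) falls out of the recurrence, and widens the band to at least |*N* − *M*| so that (*N*, *M*) stays reachable for sequences of unequal length (a common design choice, not taken from the chapter).

```python
import math
from typing import Optional, Sequence


def dtw(a: Sequence[float], b: Sequence[float], band: Optional[int] = None) -> float:
    """DTW distance D(N, M) with an optional Sakoe-Chiba band of width R.

    Uses D(i, j) = d(i, j) + min(D(i-1, j-1), D(i-1, j), D(i, j-1)),
    with d(i, j) = |a_i - b_j| (an assumed local distance).
    """
    n, m = len(a), len(b)
    # Without a band every cell is allowed; with one, keep |i - j| <= R,
    # widened to |n - m| so that the end cell (n, m) can be reached.
    r = max(n, m) if band is None else max(band, abs(n - m))
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0                          # padding cell: gives D(1, 1) = d(1, 1)
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(m, i + r) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]
```

With `band=0` and equal-length inputs only the main diagonal is searched, so the result is the plain point-by-point distance; widening the band lets the path absorb local stretching, which is exactly what the band limits in order to avoid pathological warpings.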

### 7.3. Image classification based on time series representation

The time series based image classification method commences, as in the case of the previous methods, with the segmentation and registration of the input images as described in Chapter 3. Once the ROIs have been segmented and identified, the next step is to derive the time series from the boundary line circumscribing each ROI, using the time series generation technique described above. Each ROI signature is then conceptualised as a prototype or case contained in a Case Base (CB), to which a CBR mechanism can be applied (as in the case of Method 1).

As noted previously, CBR can be used for classification purposes where, given an unseen record (case), the record is classified according to the “best match” discovered in the CB. With respect to the proposed technique, and in the case of the corpus callosum application, the CB comprises a set of pre-labelled ROI time series “signatures”, each describing a record. The DTW time series matching strategy was then adopted to identify the best match for a new (“unseen”) ROI signature. To do this, each pre-labelled signature of size *N* is compared to the given unseen signature of size *M* using DTW, and a sequence of similarity measures is obtained. The well-established k-nearest neighbour (KNN) technique was then used to identify the most similar signature in the CB, from which a class label was extracted.
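The retrieval step can be sketched as a nearest-neighbour search over the case base. This is a minimal illustration, assuming the case base is a list of (signature, label) pairs and that the local DTW distance is an absolute difference; with `k=1` it reproduces the “most similar case” rule described above, while the majority vote for `k > 1` is an assumption. The names `classify` and the labels in the usage note are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def dtw(a: Sequence[float], b: Sequence[float]) -> float:
    """Unconstrained DTW with |a_i - b_j| as the (assumed) local distance."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = abs(a[i - 1] - b[j - 1]) + min(
                D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]


def classify(unseen: Sequence[float],
             case_base: List[Tuple[Sequence[float], str]], k: int = 1) -> str:
    """Label an unseen signature from its k nearest cases under DTW."""
    ranked = sorted(case_base, key=lambda case: dtw(unseen, case[0]))
    votes = Counter(label for _, label in ranked[:k])
    # most_common keeps insertion order among equal counts, so ties go
    # to the label of the nearer case.
    return votes.most_common(1)[0][0]
```

For instance, with a case base `[([1, 2, 3, 2, 1], "musician"), ([5, 5, 5, 5], "non-musician")]`, the unseen signature `[1, 2, 2, 3, 2, 1]` is labelled `"musician"` because DTW can absorb the repeated value at zero cost, even though the two signatures differ in length.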

## 8. Comparison of the proposed approaches

The four advocated approaches to ROI based image classification were evaluated in the context of the classification of brain MRI scans according to the nature of the corpus callosum, a particular ROI that appears across such data sets. This section reports on the evaluation of the proposed approaches. The comparison was undertaken in terms of classification performance and run time. The statistical analysis of the significance of the reported results was conducted using the best performing parameters for each technique (so as to consider each technique to its best advantage). In addition, the proposed approaches were compared with two notable alternative ROI representation techniques: Curvature Scale Space (CSS) [34] and the Angular Radial Transform (ART) [5]. These two techniques were selected because, in the MPEG-7 standard, CSS has been adopted as the contour-based shape descriptor and ART as the region-based shape descriptor.

### 8.1. Datasets

A number of data sets were used to evaluate the techniques described in this chapter for classifying medical images according to the nature of the corpus callosum. As already noted, the data sets were generated by extracting the *midsagittal slice* from MRI brain scan data volumes (bundles), one image per volume. Each data set comprised a number of brain MRI scans, each measuring 256 × 256 pixels with 256 grayscale levels and each describing a midsagittal slice. To support the evaluation the data sets were grouped as follows: (i) Musicians, (ii) Epilepsy and (iii) Handedness. Each group is described in further detail below.

**Musicians datasets** For the musicians study the data set comprised 106 MRI scans, 53 representing musicians and 53 non-musicians (i.e. two equal classes). The study was of interest because of the conjecture that the size and shape of the corpus callosum reflects human characteristics such as musical ability.

**Epilepsy datasets** For the epilepsy study, a data set comprising the 106 MRI brain scans used for the musicians study, complemented with 106 MRI brain scans from epilepsy patients to give a total of 212 MRI brain scans, was used. The objective was to seek support for the conjecture that the shape and size of the corpus callosum is influenced by conditions such as epilepsy. It should be noted that, as far as the authors are aware, the musicians study did not include any epilepsy patients.

**Handedness datasets** For the handedness study a data set comprising 82 MRI brain scans was used, 42 representing right handed individuals and 40 left handed individuals. The study was of interest because of the conjecture that the size and shape of the corpus callosum reflects certain human characteristics (such as handedness).

All three brain MRI data sets were preprocessed, using the variation of the multiscale normalised cuts algorithm introduced in Subsection 3, so as to extract the corpus callosum ROI. On completion of data cleaning (noise removal), a “local” registration process was undertaken by fitting each identified corpus callosum into a Minimum Bounding Rectangle (MBR) so that each identified corpus callosum was founded upon the same origin.
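The “local” registration step amounts to cropping each ROI to its MBR. The sketch below assumes an axis-aligned MBR and takes the top-left corner of the MBR as the shared origin (the chapter does not say which corner is used); `crop_to_mbr` is a hypothetical helper name.

```python
import numpy as np


def crop_to_mbr(mask: np.ndarray) -> np.ndarray:
    """Crop a binary ROI mask to its axis-aligned minimum bounding rectangle,
    so that every ROI is expressed relative to the same origin."""
    rows = np.flatnonzero(mask.any(axis=1))   # rows containing ROI pixels
    cols = np.flatnonzero(mask.any(axis=0))   # columns containing ROI pixels
    if rows.size == 0:
        raise ValueError("mask contains no ROI pixels")
    return mask[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
```

Two copies of the same ROI placed at different positions in the 256 × 256 slice therefore yield identical cropped masks, which is what allows the later feature extraction steps to compare shapes directly.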

### 8.2. Experimental evaluation

Table 1 shows the TCV results obtained using the musicians data set. The HT, GB, ZM and TS rows indicate the results using the Hough transform, frequent sub-graph, Zernike moments and time series based approaches respectively. The CSS and ART rows indicate the MPEG-7 descriptors (Curvature Scale Space and Angular Radial Transform respectively). The “Acc”, “Sens” and “Spec” columns indicate accuracy, sensitivity and specificity respectively. The best results are indicated in bold font. Inspection of Table 1 demonstrates that the overall classification accuracies obtained using the four advocated approaches were all over 90%, while the overall classification accuracy obtained using the time series based approach improved significantly on that obtained using the other three approaches. The best sensitivity and specificity were also obtained using the time series based approach (100% in the case of sensitivity). The four advocated approaches all outperformed the CSS and ART techniques. These are excellent results.
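The three columns are simple ratios of confusion-matrix counts. As a worked check, the sketch below assumes that the musicians class is treated as positive and that the TCV results are pooled counts over all 106 scans; the counts used are inferred from the reported percentages, not reported in the text. For the TS row, 53 of 53 musicians and 51 of 53 non-musicians correctly classified reproduces the Table 1 figures.

```python
from typing import Dict


def tcv_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity (in %) from confusion counts."""
    return {
        "Acc": 100 * (tp + tn) / (tp + fn + tn + fp),
        "Sens": 100 * tp / (tp + fn),   # true positive rate
        "Spec": 100 * tn / (tn + fp),   # true negative rate
    }


# Inferred counts for the TS row of Table 1 (53 musicians, 53 non-musicians).
print({k: round(v, 2) for k, v in tcv_metrics(53, 0, 51, 2).items()})
# {'Acc': 98.11, 'Sens': 100.0, 'Spec': 96.23}
```

Because the two classes are of equal size, accuracy here is simply the mean of sensitivity and specificity.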

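Table 1. TCV results (%) obtained using the musicians data set.

| | Acc | Sens | Spec |
|---|---|---|---|
| HT | 91.51 | 92.45 | 90.57 |
| GB | 95.28 | 96.23 | 94.34 |
| ZM | 96.23 | 98.11 | 94.34 |
| TS | **98.11** | **100.00** | **96.23** |
| CSS | 86.79 | 88.68 | 84.91 |
| ART | 89.62 | 90.57 | 88.68 |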

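Table 2. TCV results (%) obtained using the handedness data set.

| | Acc | Sens | Spec |
|---|---|---|---|
| HT | 90.24 | 92.50 | 88.10 |
| GB | 93.90 | 95.00 | 92.86 |
| ZM | 93.90 | 95.00 | 92.86 |
| TS | **96.34** | **97.50** | **95.24** |
| CSS | 85.37 | 85.00 | 85.71 |
| ART | 87.80 | 90.00 | 85.71 |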

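Table 3. TCV results (%) obtained using the epilepsy data set.

| | Acc | Sens | Spec |
|---|---|---|---|
| HT | 76.42 | 81.13 | 71.70 |
| GB | **86.32** | **87.74** | **84.91** |
| ZM | 85.38 | **87.74** | 83.02 |
| TS | 77.36 | 82.08 | 72.64 |
| CSS | 68.40 | 72.64 | 64.15 |
| ART | 70.28 | 73.58 | 66.98 |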

Table 2 shows the TCV results obtained using the handedness data set. The column and row headers are defined as in Table 1. Inspection of Table 2 indicates that the four advocated approaches also performed well with respect to the handedness study. The best overall classification results were again obtained using the time series based approach, which showed significant improvement over the other three approaches. The best sensitivity and specificity were also obtained using the time series based approach. The four advocated approaches again outperformed the CSS and ART techniques. Again, these are excellent results.

Table 3 shows the TCV results obtained using the epilepsy data set. The column and row headers are defined as in Table 1. From Table 3 it can be observed that the outcome differs from that recorded for the musicians and handedness studies. The graph based and Zernike moments based approaches, which consider all the pixels of each ROI in the feature extraction process, outperformed the Hough transform and time series based approaches (recall that these two approaches consider only the pixels on the boundary of the ROI). Again, all four of the advocated approaches outperformed the CSS and ART techniques. The results for the epilepsy data set seem to be at odds with those obtained for the musicians and handedness studies reported above. Subsequent discussion with medical domain experts did not give an indication as to why this might be the case. However, the suspicion is that the results reflect the fact that, although the nature of the corpus callosum may play a part in the identification of epilepsy, other factors are also involved.

With respect to classification accuracy, all four ROI based image classification approaches performed well in general, although the time series based approach produced the best results for the musicians and handedness studies while the graph based approach produced the best results for the epilepsy study. There is no obvious reason why this might be the case; visual inspection of the MRI scans does not reveal any obvious distinguishing attributes with respect to the size and shape of the corpus callosum. Tracing the cause of a particular classification back to a particular part of the corpus callosum is thus seen as a desirable “avenue” for future research. It is also interesting to note that the Hough transform based approach performed consistently worst of the four advocated approaches across all of the above evaluation studies, suggesting that generating shape signatures using the Hough transform is not a technique to be recommended in the context of feature based classification, although the Hough transform is popular in other branches of image analysis.

In the literature there are a few reported studies on classifying medical images according to the nature of the corpus callosum. For example, Sampat et al. [41] used the cross-sectional area of the corpus callosum and the inferior subolivary Medulla Oblongata Volume (MOV) to distinguish patients with Relapsing-Remitting Multiple Sclerosis (RRMS), Secondary-Progressive Multiple Sclerosis (SPMS) and Primary-Progressive Multiple Sclerosis (PPMS). Their study produced a classification accuracy of 80%. Fahmi et al. [19] proposed a classification approach to distinguish between healthy controls and autistic patients according to the nature of the corpus callosum. They analysed the displacement fields generated from the non-rigid registration of different corpus callosum segments onto a chosen reference within each group, and reported a classification accuracy of 86%. Golland et al. [22] adopted a version of “skeletons” for feature extraction, coupled with the Fisher linear discriminant and linear support vector machines, for the classification of corpus callosum data from schizophrenia patients. The best classification accuracy achieved using their support vector machine classification method was less than 80%. These studies indicate how comparatively effective the classification results obtained using the four proposed approaches are: the results obtained using the proposed methods significantly improved on those produced in these earlier studies.

The run times of the four ROIBIC approaches on the musicians, handedness and epilepsy data sets are presented in Figures 8, 9 and 10 respectively. The classification time is the overall run time, i.e. it incorporates the feature extraction, training and testing phases. All the experiments were performed on a 1.86 GHz Intel(R) Core(TM)2 PC with 2 GB RAM. The graph based approach was computationally the most expensive, while the time series based approach was computationally the least expensive. However, it is worth remarking that, especially in the medical context, it is classification accuracy, not speed, that is the most important feature of the proposed processes.

In summary, there is no consistent “winner” among the four proposed ROI based image classification approaches. However, excellent classification results were produced.

### 8.3. Statistical comparison of the proposed image classification approaches

The AUC values for the best results obtained for all five data sets are given in Table 4. It should be noted that the AUC values support the results reported in Subsection 8.2. The Friedman test [13, 20] was used to compare the AUCs of the different classifiers. The Friedman test statistic is based on the Average Ranked (AR) performances of the classification techniques on each data set, and is calculated as follows:

$$\chi_F^2 = \frac{12N}{K(K+1)}\left[\sum_{j=1}^{K} AR_j^2 - \frac{K(K+1)^2}{4}\right], \qquad AR_j = \frac{1}{N}\sum_{i=1}^{N} r_i^j$$

where *N* denotes the number of data sets used in the study, *K* is the total number of classifiers and *r*_{i}^{j} is the rank of classifier *j* on data set *i*. The statistic is distributed according to the chi-squared distribution with *K* − 1 degrees of freedom. If the resulting *p*-value is sufficiently small (here *p* < 0.005), the null hypothesis that there is no difference between the techniques can be rejected. In Table 4, the technique achieving the highest AUC on each data set, and the overall highest ranked technique, are indicated in bold font. From the table it can be seen that the graph based approach (GB) has the best (lowest) average rank (AR). The AR associated with the Hough transform approach is statistically worse than the AR associated with all the other approaches, supporting the results obtained earlier.

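Table 4. AUC values for the best results (rank in parentheses) and average rank (AR) of the four proposed approaches. Friedman test statistic = 10.68 (p < 0.005).

| | Musician | Handedness | Epilepsy | AR |
|---|---|---|---|---|
| HT | 92.6 (4) | 91.3 (4) | 78.6 (4) | 4 |
| GB | 97.1 (2) | 96.2 (2) | **88.3 (1)** | **1.4** |
| ZM | 96.4 (3) | 94.7 (3) | 87.2 (2) | 2.4 |
| TS | **99.1 (1)** | **96.8 (1)** | 79.3 (3) | 2.2 |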
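The ranking and the Friedman statistic can be reproduced with a short sketch. It assumes no tied AUCs within a data set (tied ranks would need averaging) and takes *N* = 5, following the text's reference to five data sets; only three data set columns appear in Table 4. With the reported ARs and *N* = 5, the formula above gives the reported statistic of 10.68. The function names are hypothetical.

```python
from typing import List, Sequence


def ranks_from_scores(scores: Sequence[float]) -> List[int]:
    """Rank classifiers on one data set: 1 = highest AUC (ties not handled)."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    ranks = [0] * len(scores)
    for rank, j in enumerate(order, start=1):
        ranks[j] = rank
    return ranks


def friedman_statistic(avg_ranks: Sequence[float], n_datasets: int) -> float:
    """chi^2_F = 12N / (K(K+1)) * (sum_j AR_j^2 - K(K+1)^2 / 4)."""
    k = len(avg_ranks)
    n = n_datasets
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in avg_ranks)
                                      - k * (k + 1) ** 2 / 4)


# Musician column of Table 4 (HT, GB, ZM, TS) and the reported average ranks.
print(ranks_from_scores([92.6, 97.1, 96.4, 99.1]))                     # [4, 2, 3, 1]
print(round(friedman_statistic([4, 1.4, 2.4, 2.2], n_datasets=5), 2))  # 10.68
```

Working from average ranks rather than raw AUCs makes the test non-parametric: only the ordering of the classifiers on each data set matters.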

To determine the operational difference between the individual classifiers, a post hoc Nemenyi test was applied [13]. The Nemenyi test states that the performances of two or more classifiers are significantly different if their average ranks differ by at least the Critical Difference (*CD*), given by:

$$CD = q_{\alpha,\infty,K}\sqrt{\frac{K(K+1)}{6N}}$$
where the value for *q*_{α,∞,K} is based on the Studentised range statistic [13]. A post hoc Nemenyi test was therefore applied to each class distribution and the results displayed using a modified version of a Demšar significance diagram [30]. A Demšar diagram displays the ranked performances of the classification techniques, along with the critical difference, to highlight any techniques that are significantly different from the best performing classifiers. Figure 11 displays the Demšar diagram for the proposed classification approaches. The diagram shows the AUC performance rank for each approach, along with the Nemenyi CD tail. The CD value for the diagram shown in Figure 11 is 1.48. The classification techniques are listed in ascending order of ranked performance on the y-axis, and each technique's average rank across all data sets is shown along the x-axis. From the figure it can be seen that the graph based approach is the best performing classification technique, with the time series approach coming second. The diagram again clearly shows that, despite its popularity, the Hough transform performs significantly worse (with an average rank of 4) than the best performing classifiers in the context of the corpus callosum classification problem.
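The Nemenyi comparison reduces to checking pairwise rank differences against the CD. In the sketch below, `q_alpha` would have to be looked up in a Studentised range table [13]; rather than assume a value, the usage example plugs in the CD of 1.48 reported above together with the ARs from Table 4. The function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def critical_difference(q_alpha: float, k: int, n: int) -> float:
    """Nemenyi CD = q_alpha * sqrt(K(K+1) / (6N))."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))


def significant_pairs(avg_ranks: Dict[str, float], cd: float) -> List[Tuple[str, str]]:
    """Pairs of classifiers whose average ranks differ by at least CD."""
    return [(a, b) for a, b in combinations(avg_ranks, 2)
            if abs(avg_ranks[a] - avg_ranks[b]) >= cd]


ranks = {"HT": 4.0, "GB": 1.4, "ZM": 2.4, "TS": 2.2}
print(significant_pairs(ranks, cd=1.48))
# [('HT', 'GB'), ('HT', 'ZM'), ('HT', 'TS')]
```

With CD = 1.48 only the pairs involving HT exceed the critical difference, which matches the reading of Figure 11 that the Hough transform is significantly worse while GB, ZM and TS are not significantly different from one another.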

## 9. Discussion and conclusion

Referring back to Section 8, all four algorithms performed well, although the time series approach produced the best results for the musicians and handedness studies, while the graph based approach produced the best results with respect to the epilepsy study. The Friedman statistic and the post hoc Nemenyi test indicated that the graph based technique provided the best overall performance (with the time series approach coming second). It is interesting to note that, for all the data sets, visual inspection of the MRI scans does not reveal any obvious distinguishing attributes with respect to the size and shape of the corpus callosum. It is also interesting to note that the HT, although popular in the literature, performed consistently worst of the four approaches across all of the above evaluation studies, suggesting that generating shape signatures using the HT is not a technique to be recommended in the context of object based image classification.

Thus, in summary, four techniques for single object based image classification have been described. Although the work described focused on the classification of MRI brain scan data according to a particular object (the corpus callosum) that features within this data, the approaches clearly have more general applicability. The main findings were that the graph based approach produced the best performance, followed by the time series based approach, while the HT based approach produced the worst performance in all cases. With respect to future work, the research team is interested in developing techniques to trace the cause of a particular classification back to its origin (a particular part of the corpus callosum).

## Notes

- The midsagittal slice is the middle slice of a sequence of MRI slices.