Open access peer-reviewed chapter

Incorporating Domain Knowledge into Medical Image Mining

By Haiwei Pan

Submitted: November 30th 2011Reviewed: May 27th 2012Published: August 29th 2012

DOI: 10.5772/50207

Downloaded: 2016

1. Introduction

Advances in image acquisition and storage technology have led to tremendous growth in very large and detailed image databases [2]. A vast amount of image data is generated in our daily life and each field, such as medical image (CT images, ECT images and MR images etc), satellite images and all kinds of digital photographs. These images involve a great number of useful and implicit information that is difficult for users to discover.

Image mining can automatically discover these implicit information and patterns from the high volume of images and is rapidly gaining attention in the field of data mining. Image mining is more than just an extension of data mining to image domain. It is an interdisciplinary endeavor that draws upon computer vision, image processing, image retrieval, machine learning, artificial intelligence, database and data mining, etc. While some of individual fields in themselves may be quite matured, image mining, to date, is just a growing research focus and is still at an experimental stage.

Broadly speaking, image mining deals with the extraction of implicit knowledge, image data relationship, or other patterns not explicitly stored in the images and between image and other alphanumeric data. For example, in the field of archaeology, many photographs of various archeological sites have been captured and stored as digital images. These images, once mined, may reveal interesting patterns that could shed some lights on the behavior of the people living at that period of time. Clearly, image mining is different from low-level computer vision and image processing techniques. The focus of image mining is in the extraction of patterns from a large collection of images, whereas the focus of computer vision and image processing techniques is in understanding and/or extracting specific features from a single image. While there seems to be some overlap between image mining and content-based retrieval (since both deals with large collection of images), image mining goes beyond the problem of retrieving relevant images. In image mining, the goal is the discovery of image patterns that are significant in a given collection of images and the related alphanumeric data. Perhaps the most common misconception of image mining is that image mining is yet another term for pattern recognition. While the two fields do share a large number of common functions such as feature extraction, they differ in their fundamental assumptions. In pattern recognition, the objective is to recognize some specific patterns; whereas in image mining, the aim is to generate all significant patterns without prior knowledge of what patterns may exist in the image databases. Another key difference is in the types of patterns examined by the two research fields. In pattern recognition, the patterns are mainly classification patterns. In image mining, the patterns types are more diverse. It could be classification patterns, description patterns, correlation patterns, temporal patterns, and spatial patterns. Finally, pattern recognition deals only with pattern generation and pattern analysis. In image mining, this is only one (albeit an important) aspect of image mining. Image mining deals with all aspects of large image databases which imply that the indexing scheme, the storage of images, and the retrieval of images are all of concerns in an image mining system[3]. A few interesting studies and successful applications involving image mining have been reported. For example, [4] describes the CONQUEST system that combines satellite data with geophysical data to discover patterns in global climate change. The SKICAT system [5] integrates techniques for image processing and data classification in order to identify “sky objects” captured in a very large satellite picture set. A multimedia data mining system prototype MultiMediaMiner [2, 6] uses a data cube structure for mining characteristic, association, and classification rules. However, the system does not use image content to the extent we wanted. In [7], localization of the visual features, their spatial relationships and their motion in time (for video) are presented. A discovering association rules algorithm based on image content from a simple image dataset is presented in [8]. [9]

Research in image mining can be broadly classified into two main directions [3]. The first direction involves domain-specific applications where the focus is to extract the most relevant image features into a form suitable for data mining [10, 11]. The second direction involves general applications where the focus is to generate image patterns that maybe helpful in the understanding of the interaction between high-level human perceptions of images and low level image features [2, 8, 12]. Clustering medical images belongs to the first direction.

Image clustering is unsupervised classification of images into groups. The problem in image clustering is to group a given collection of unlabeled images into meaningful clusters according to the image content without a priori knowledge [13]. The fundamental objective for carrying out image clustering in image mining is to acquire content information the users are interested in from the image group label associated with the image[3]. Image clustering is usually performed in the early stages of the mining process. Feature attributes that have received the most attention for clustering are color, texture and shape. Generally, any of the three, individually or in combination, could be used. There is a wealth of clustering techniques available: hierarchical clustering algorithms, partition-based algorithms, mixture-resolving and mode-seeking algorithms, nearest neighbor clustering, fuzzy clustering and evolutionary clustering approaches. Once the images have been clustered, a domain expert is needed to examine the images of each cluster to label the abstract concepts denoted by the cluster. Chang et al. use clustering technique in an attempt to detect unauthorized image copying on the World Wide Web [3, 14]. Yu and Zhang present an unsupervised clustering and query approach (also known as ACQ for Automatic Clustering and Query) for large-scale image databases [15]. ACQ does not require the number of clusters to be known a priori and is insensitive to noise. By intelligently applying wavelet transforms on the feature space, this clustering can effectively and efficiently detect clustering of arbitrary shape of high dimensional feature vectors. Kitamoto apply clustering methods such as k-means and the self-organizing map (SOM) for visualizing the distribution of typhoon cloud patterns on a two-dimensional space [3, 11].

Some algorithms used in the medical images [16, 17, 18] are generally for classification. [19, 20, 21] propose some new clustering methods in relational databases. They are not suitable to cluster the medical images because there are important differences between relational databases versus image databases: (1) Absolute versus relative values. In relational databases, the data values are semantically meaningful. For example, one item is milk is well understood. However, in medical image databases, the data values themselves may not be significant unless the domain of medicine supports them. For example, a grey scale value of 46 could appear darker than a grey scale value of 87 if the surrounding context pixels values are all very bright. (2) Spatial information (Independent versus dependent position) [9]. Another important difference between relational databases and medical image databases is that the implicit spatial information is critical for interpretation of image contents but there is no such requirement in relational databases. As a result, image miners try to overcome this problem by extracting position-independent features from images first before attempting to mine useful patterns from the images[3]. (3) Unique versus multiple interpretation. A third important difference deals with image characteristics of having multiple interpretations for the same visual patterns. The traditional data mining algorithm of associating a pattern to a class (interpretation) will not work well here. A new class of discovery algorithms is needed to cater to the special needs in mining useful patterns from images[3].

Association rule mining from medical images is another significant research topic in the field of image mining. Finding these valuable rules is typically done in two steps: discovering frequent itemsets and generating association rules. The second step is rather straightforward, and the first step dominates the processing time, so we explicitly focus this chapter on the first step. A number of efficient association rule mining algorithms have been proposed in the last few years. Among these, the Apriori algorithm by Agrawal, R., Imielinski, T [22][23] has been very influential. Later, many scholars have improved and optimized the Apriori algorithm and have presented new Apriori-like algorithms [24][25][26][27][28]. The Apriori-like algorithms consist of two major procedures: the join procedure and the prune procedure. These algorithms require a huge calculation and a complicated transaction process during the two procedures. Therefore, the mining efficiency of the Apriori-like algorithms is not very good when transaction database is very large.

L.Jaba Sheela proposed the FAR algorithm [29] in 2009. This algorithm transforms a transaction database into a Feature matrix stored in bits. Meanwhile it uses the Boolean vector “relational calculus” method to discover frequent itemsets. This method uses the fast and simple “and calculus” in the Feature matrix to replace the calculations and complicated transactions that deal with large number of itemsets [32]. It scans the database only once and has a good efficiency. But there is a great shortcoming in FAR algorithm. The generating of candidate itemsets relies on the combination of column of feature matrix. When the number of column of feature matrix is very large, that is to say the number of items in transaction database is very large, FAR algorithm will cost much time to do useless calculus.

In this chapter, we firstly quantify the domain knowledge about brain image (especially the brain symmetry), and then incorporate this quantified measurement into the clustering algorithm. Our algorithm contains two parts: (1) clustering regions of interest (ROI) detected from brain image; (2) clustering images based on the similarity of ROI. We apply the method to cluster brain images and present results to demonstrate its usefulness and effectiveness[1]. Secondly, we proposed the GMA (association graph and matrix pruning algorithm) algorithm to solution this problem. Yen and Chen proposed the DLG (Direct Large Itemset Generation Algorithm) algorithm [30][31] based association graph for the first time in 1996. GMA algorithm adopts both association graph and matrix pruning to reduce the generation of candidate itemsets. It also scans database only once and generates frequent itemsets by the “and calculus’’. Throughout the chapter we try to provide a general framework to understand these approaches. We believe many of the problems we are facing are likely to appear in other domains. As such this work tries to isolate those problems which we consider will be of most interest to the database community doing research on clustering [9] and association rule mining.

The rest of the chapter is organized as follows: section 2 is pre-processing describing an algorithm to detect objects in medical images [9]. Section 3 presents the medical image clustering algorithm. This section includes four parts: (1) Feature extraction is to extract the most relevant features from the object with the direction of domain knowledge; (2) Object clustering. We define the similarity measurement according to the above features and present OCA algorithm to cluster objects into some groups; (3) Image clustering. We firstly determine weights of objects that appear in images based on term frequency and inverse document frequency, similar to IR. Each image will then be represented by a vector, where a vector contains a set of weights that correspond to the importance of the objects that appear in the image. Finally ICA algorithm is presented to cluster medical images; (4) Experiment results. This part reports the results of our experiments and performance study. Section 4 presents the association rule mining algorithm. This section includes three parts: (1) Basic concept. We define Support, Confidence, Association Graph and Feature Matrix, etc. to describe this problem; (2) GMA algorithm. The GMA algorithm consists of four phases as follows: generating feature matrix and association graph, pruning the feature matrix, selecting and extending by the association graph and generating the set of frequent-k itemsets; (3) Experiment results give the experiment results and analysis. Section 5 introduces the future work. Section 6 concludes the study in this chapter.

2. Preprocessing

Since the images we studied were raw Computerized Tomography (CT) scans that were scanned at different illumination conditions, some of them appeared too bright and some were too dark. We should digitize them to no loss, no compression and 256 gray scale images through special medical scanner. We used CT scan images because this modality is the most used in radiotherapy planning for two main reasons. The first reason is that scanner images contain anatomical information which offers the possibility to plan the direction and the entry points of the radiotherapy rays which have to target the tumor and to avoid some risk organs. The second reason is that CT scan images are obtained using rays, which is the same physical principle as radiotherapy. This is very important because the radiotherapy rays intensity can be computed from the scanner image intensities[33].

In this section, we firstly use progressive water immersion method with guidance of domain knowledge to detect region of interest (ROI) in medical images, then we combine these ROIs with their location, size and other descriptors to form a table for mining.

Water immersion algorithm is considered to be a powerful technique for ROI detection. It works by grouping pixels with similar gradient information. Direct application of water immersion method to the digitized medical images typically produces over-segmentation of the trivial regions. Instead, we propose a progressive water immersion algorithm with guidance of domain knowledge to cope with this situation [34]. Details of the algorithm follow.

First, a N×N window is used to locate the local optimal points in the image. For each segmented patches, we place the center of the window over each pixel in the patches. If the grey level of the central pixel is optimal with respect to all the other pixels in the window, we say that the central pixel is a local optimum; otherwise, the window will move to be centered at another pixel to continue the search for all local optimal points. At the end of this phase, all the optimum is marked and they will be treated as the starting seeds for water immersion method. One advantage of using the sliding window approach is that with the appropriate window size, it is possible to eliminate a large amount of optimal points that correspond to the light and dark reflection regions thus removing false detection. This is because the grey level of the optimal points corresponding to the light and dark reflection patches are generally lower and higher than that of potential ROI. Given that the distances between the optimal points of the light and dark reflection patches and the nearest optimal points of the neighboring ROI are generally less than that between two touching ROI, it is possible to set the window size in such a way that these false optimal points are ‘absorbed’ by the neighboring ROI optimal points while the true optimal points are not affected.

Having identified the true optimal points, water immersion process starts from these detected points and progressively floods its neighboring pixels. The neighboring pixels are defined to be the 8-direction neighbors. These neighbors are placed in a growing queue structure sorted in descending order of the grey level of the pixels. The lowest and highest grey pixel in the growing queue will be ‘immersed’ first but respectively and it is marked as belonging to the same region label as the current seed. The marked pixel is then removed from the growing queue. All neighboring pixels whose grey level is lower or higher than the marked pixel are added to the growing queue. This immersion process continues until the growing queue is empty [9].

Unfortunately, simple application of the water immersion technique has the tendency of over-flooding. To overcome this problem [9], we firstly give some definitions.

  1. We partition all pixels in pixel set P into m blocks. Pixels in the same block have the same grey level and pixels in the different blocks have the different grey level. Let G(P)={g1, g2, …, gm} be P’s grey-scale (GS) set if G(P) is an ascending sort set of g1’, g2’, …, gm’ and gi’ is grey level of pixels in the ith block, where gi (i=1,…,m) is the ith GS, g1’ and gm’ are P’s minimum and maximum GS respectively. The GS of pixel pi is denoted as g(pi) [33].

  2. We call gmean(P) the mean GS if

|P|
gmean(P)=g(pi)/|P|.
i=1

  1. For any P and distance function DisA=|gk - gmean(P)|, mid-value GS is a middle value in the GS set that minimizes DisA. Mid-value GS set is a set of mid-value GS [33].

  2. For any P, if

    1. Mid-value GS set includes one element gmid, and gs is the minimum value between gmean and gmid;

    2. Mid GS set includes two elements, gs is the minimum value between these two values; gs is called Benchmark GS and another one is denoted as gs’ [33].

  3. For pixel set P, let

g(l)={gi | g1 gi g1+|g1-gs|/2} be low bound GS;

g(h)={gi | gm-|gm-gs’|/2gigm} be high bound GS;

g(b)= g(l) g(h) be bound GS; [33]

Our progressive water immersion ignores all those pixels whose grey level doesn’t belong to the bound GS. Bound GS is defined with guidance of domain knowledge that describes the degree of dark and light. So the optimality of point in a certain region is defined as follows [9]:

optimality={ maximal grey level, if point belongs to high bound pixel set ;minimal grey level, if point belongs to low bound pixel set ;E1

The pseudo-codes for the progressive water immersion algorithm are given as follows.

Input: medical image
Output: objects in this medical image

WHILE(image scan not finished)
{
IF(the first scan)
initialize slide window;
ELSE
relocate initial position of the slide window;
WHILE (1)
{
compare pixel grey value in the window and find the optimal point;
IF (the optimal point is the center of the window)
this point is stored as seed;
Break;
ELSE
move the slide window and make the optimal point as the center of window;
}
}
Count=1;
Con_th= seed-to-pixel contrast threshold;
FOR(each seed)
{
FOR (each 8-directional neighbor pixel of the seed)
IF (absolute value of seed-to-neighbor pixel contrast is less than Con_th)
{
push this pixel into queue and mark this pixel;
Count++;
}
WHILE (Count!=0)
{
sort pixel in queue in descent order according to the grey value;
pop the last pixel from queue;
Count--;
FOR (each 8-directional neighbor pixel of the last pixel)
IF(absolute value of seed-to- neighbor pixel contrast is less than Con_th)
{
push this pixel into queue and mark this pixel;
Count++;
}
}
}

After the above process, all the ROIs are detected and we will call them objects later, see figure 1. Next, images with many different objects are represented by transactions and we use a table to describe these transactions, see table 1 [9].

Figure 1.

Figure (a) and (c) are two original abnormal brain images. Progressive water immersion algorithm is used to mark the objects with dotted line in figure (b) and (d).

In table 1, the first column is the medical image id. The second column is the objects in each image. The other columns are the features extracted from the object.

Image IDObject IDfeature1feature n
IM1O1feature1_vfeaturen_v
IM1O2
IM1O3
IM2O1
IM2O2
IMnOn

Table 1.

Images are modeled by transactions

3. Medical image clustering

By applying progressive water immersion algorithm, we segment images into objects. Let IM={IM1, IM2, …, IMN} be a image set. After the above algorithm, Each image IMj contains k objects Rj1, Rj2, …, Rjk. For different image, k may be not equal. Let the total of objects in IM be M, then we denote the object set as R={ R1, R2, …, RM}.

3.1. Feature extraction

For each extracted object Ri, we need to extract relevant information for features mining to take place. The domain knowledge of the brain image characteristics indicates that the normal persons have nearly the same brain structure that is evident to be bilateral symmetry. That is, the distribution of density in the left hemisphere of the brain is almost identical with the right, see figure 2 [9]. The pathological regions result in irregularly shaped grey level distribution in the CT scan images and destroy the symmetry, see figure 1. At this point, the following relevant features are able to provide sufficient discriminative power to cluster objects into different groups: (1) grey level of the object of interest; (2) area of the object of interest; (3) location of the object of interest; (4) elongation of the object of interest; (5) direction of the object of interest; and (6) symmetry of the object of interest.

Figure 2.

Normal person’s brain image

With the above method, the object extracted from brain image is either brightness or darkness. So we define grey level as GL= 0 (brightness), or 1 (darkness). The second feature we have found useful relate to the size of the region in the form of the area of the region. Area of the region of interest is defined as the total number of pixels within the region, up to and including the boundary pixels. The location of an object of interest is defined as a ratio. The coordinates of the centroid of the object [9] are computed with the formula (1).

x¯=1kj=1kxjy¯=1kj=1kyjE2

k is the number of the pixels of the object and the location is (x¯/|x|, y¯/|y|).

The fourth feature is the elongation of an object which is defined as the ratio of the width of the minor axis to the length of the major axis. This ratio is computed as the minor axis width distance divided by the major axis length distance, giving a value between 0 and 1. If the ratio is equal to 1, the object is roughly a square or is circular in shaped. As the ratio decreases from 1, the object becomes more elongated. Major axis is the longest line that can be drawn through the object. The two end points of the major axis are found by selecting the pairs of boundary pixels with the maximum distance between them. This maximum distance is also known as the major axis length. Similarly, the minor axis is defined as the line that it is perpendicular with respect to the major axis and the length between the two end points of the line intersected with the object is the longest which is called the width of the minor axis. The ratio, elongation, is a measure of the degree of elongation of an object.

elongation = lenmajor/ lenminor E3

where lenmajor is the major axis length of the object and lenminor is the minor axis length of the object. We establish a common coordinate with brain midline as y and perpendicular line going through the midpoint of midline as x. With the major axis of the object, we define the next feature, direction of the object, as the inclination of the major axis and x positive direction and [0,180] [9].

Before introducing the final feature, we will give some definitions.

  1. Pixel set of IMp is defined as P={pi|pi is the pixel with coordinate (xi, yi) in the image IMp}, P(L) and P(R) are pixel set of IMp(L) and IMp(R) respectively [33].

  2. For any pliP(L), prjP(R), they are symmetric pixel if the line between pli and prj is halved vertically by brain midline. They are denoted as pli and pri below [33].

  3. g(P) is IMp’s difference set if for any symmetrical pixel pli and pri, g(P)={ gi|gi=g(pli)-g(pri), i=1,2,…,|p|/2 }, where g(pli) and g(pri) are grey level of these two pixels respectively.

Now we define the sixth feature as following.

  1. An object of interest Rpk is symmetrical if for IMp and Rp1, Rp2, …, Rpk, P'=Pi=1kP(Rpi), the mean grey level of g(Rpk), mean(g(Rpk)), and the mean grey level of g(P’), mean(g(P’)), satisfy the following condition:

|mean(Δg(Rpk))  mean(Δg(P))| <e.E4

Theorem 1: If for IMp, an object of interest Rpk is symmetrical, then there must exist another object of interest Rpt that satisfies the following conditions: (1) Rpk and Rpt must lie in different side of the midline; (2) Rpk and Rpt are either two different objects or the same one.

Proof: According to the definition of symmetric pixel and object symmetry, condition 1 is evident. For the second condition, it is also evident for Rpk and Rpt to be two different objects. If Rpk bestrides two hemispheres and is symmetrical, it satisfies the definition of object symmetry. Actually, Rpt is the part of Rpk that lies in the other side of the brain midline. Both of them belong to the same object.

3.2. Object clustering

Now, we would like to determine similarity between two objects based on these features. For an object Ri, a vector Vi(vi,1, vi,2, …, vi,k) is constructed and similarity between object Ri and object Rj is as follows [9]:

SimR(Ri,Rj) =Δvij,1*(h=2k1vi,h*vj,h)/(h=2k1vi,h2*h=2k1vj,h2)E5

According to domain knowledge, we let vij,1 be GL. Two objects are not possible to be similar if one is very darkness and the other is very brightness. So we let vi,1 be 1 if vi,1 = vj,1, or 0 if vi,1 vj,1 in formula (3) [9]. Another similar rule is that if two objects Ri and Rj satisfy the second condition in theorem 1, they will be grouped into the same cluster. This rule is prior to the above similarity function.

It is difficult for us to know the number of the clusters in advance. So we use DBscan algorithm to group these objects. We use a non-negative threshold TR to construct object clusters. If similarity between two objects is smaller than TR, then the two objects can be in the same group [9].

  1. For an object Rj, its -neighborhood, denoted by -N(Rj), is defined by:

-N(Rj) = {XR| SimR(Rj,X) TR }

  1. Ri is core object (c-object) if its -N(Ri) involves no less than MP objects, that is, |-N(Ri)|MP.

  2. Ri is directly density reachable from Rj if Ri -N(Rj) and |-N(Ri)|MP (Ri is c-object).

  3. R1 is density reachable from Rk if there is a chain of objects R1, R2, …, Rk, any object Ri+1 is directly density reachable from Ri.

  4. Ri is density connected to Rj if there is an object Rk that is density reachable from not only Ri but also Rj.

Density-connectivity is a symmetric relation. For density reachable points, the relation of density-connectivity is also reflexive. The key idea is that for each object of a cluster the neighborhood of a given radius has to contain at least a minimum number of objects, i.e. the density in the neighborhood has to exceed some threshold. The shape of a neighborhood is determined by the similarity function. To find a cluster, OCA starts with an arbitrary object R and retrieves all objects density-reachable from R. If R is a core object, this procedure yields a cluster. If R is a border object, no objects are density-reachable from R and OCA visits the next object of the dataset.

If two clusters C1 and C2 are very close to each other, it might happen that some object R belongs to both C1 and C2. Then R must be a border object in both clusters because otherwise C1 would be equal to C2 since we use global parameters. In this case, object R will be assigned to the cluster discovered first. Except from these rare situations, the result of our algorithm is independent of the order in which the objects of the database are visited.

In the following, we present the Object Clustering Algorithm (OCA) [9].

Object Clustering Algorithm (OCA):
Input: object set R, TR and MP
Output: p clusters
Assume that the size of object set R is M and examine -N(R);
For k = 1 to M {
If (Ri is unclassified) Then
If (-N(Ri) involves more than MP objects) Then {
mark Ri as the core object;
Cluster_Expanding(Ri);
else mark Ri classified;
}
}

The function Cluster_Expanding() is most important for discovering arbitrary shape clusters. It is presented bellow [9].

Cluster_Expanding(Ri)

While(all core objects) {
mark Ri classified;
cluster all density reachable objects;
record all the core objects;
mark those non-core objects as classified;
}

The average run time complexity of this algorithm is O(MlogM) with spatial index. Otherwise, it is O(M2).

3.3. Image clustering

The OCA algorithm clusters these M objects into p groups which we denoted as RC={C1, C2, …, Cp}. Image clustering is based on image similarity. To calculate image similarity, we construct a vector Wi(w1,i, w2,i, …, wp,i) for an image IMi. wp,i is the weight of object cluster Cp in image i. In this vector we keep the weight of each group. Thus, the size of vector Wi is same as the total number of object clusters (=p). It is possible that the weight of a group may be zero. This is because no object of an image may be a participant of that group during clustering. To determine an image vector, we adopt the idea from the area of information retrieval. Here, images correspond to documents, and object clusters correspond to terms (keywords) [9].

  1. Let CIMj,i be the times of objects of cluster Cj in image IMi, IMCj be the number of images in which objects of cluster Cj appear. We define

iIMCj= log (N/IMCj) E6

where N is the size of the image set [9].

IMCj is used to evaluate the function of Cj to measure the similarity of two images. For iIMCj, the higher its value is, the more impact Ci has to distinguish two different images.

  1. For a vector Wi(w1,i, w2,i, …, wp,i), we define

wj,i= CIMj,i* iIMCjE7

After computing image vectors, we get similarity between any two images using cosine similarity:

SimIM(IMi,IMj) =(h=1pwh,i*wh,j)/(h=1pwh,i2*h=1pwh,j2)E8

The higher the value of SimIM is, the more similar the two images are [9].

When two microclusters, each of which includes more than two images, are to be grouped into a new cluster, we redefine the similar function to measure the similarity of two microclusters.

SimMC(MCi, MCj) =(h=1pw¯h,i*w¯h,j)/(h=1pw¯h,i2*h=1pw¯h,j2)E9

where W¯i(w¯1,i,w¯2,i,...,w¯p,i)is the centroid of all vectors within microcluster MCi.

The algorithm will stop when the images are clustered into k group.

Image Clustering Algorithm (ICA):
Input: Image set IM and the number of clusters k;
Output: k clusters
Each element of IM is regarded as an atomic cluster and compute SimIM;
Find the biggest SimIM and Amalgamate to form a new cluster;
While (the number of clusters is not equal to k) {
If all Rts in one image are symmetrical
Then cluster this image to special cluster;
Compute SimMC;
Cluster the sub-clusters or images; }

Table 6.

he average run time complexity of this algorithm for the worst case is O(n2). The clustering algorithm starts with each input image as a separate cluster, and at each successive step merges the closest pair of clusters. In order to compute the distance between a pair of clusters, for each cluster, c representative images are stored. These are determined by first choosing c well scattered image within the cluster, and then shrinking them toward the mean of the cluster by a fraction . The distance between two clusters is then the distance between the closest pair of representative images - one belonging to each of the two clusters. Thus, only the representative images of a cluster are used to compute its distance from other clusters.

The c representative images attempt to capture the physical shape and geometry of the cluster. Furthermore, shrinking the scattered images toward the mean by a factor gets rid of surface abnormalities and mitigates the effects of outliers. The reason for this is that outliers typically will be further away from the cluster center, and as a result, the shrinking would cause outliers to move more toward the center while the remaining representative images would experience minimal shifts. The larger movements in the outliers would thus reduce their ability to cause the wrong clusters to be merged. The parameter can also be used to control the shapes of clusters. A smaller value of shrinks the scattered images very little and thus favors elongated clusters. On the other hand, with larger values of , the scattered images get located closer to the mean, and clusters tend to be more compact [9].

3.4. Experiment results

The main reason why we study on real brain CT images instead of any simulative data is to avoid insignificance and uninterestingness and the reliability of the discovered knowledge [9]. On the other hand, it is because brain tissue is human’s advanced nerve center, its function is particularly important. The disease affecting the brain has received much attention in the domain of medicine. In China, about 40,000 to 60,000 persons suffer from brain tumor every year. Especially during these years, the incidence of brain disease (especially brain tumor) has increased significantly. Therefore, the early diagnosis of brain diseases is becoming more and more crucial and is directly working on patients’ treatment [33]. To have access to real medical images is a very difficult undertaking due to legally privacy issues and management of hospital. But with some specialists' help and support, we got 618 precious images and their corresponding diagnosis records which, for simplicity, we generalized to normal(N) and abnormal(A) [9].

Our algorithm is written in Visual C++ and compiled by Microsoft Visual Studio 6.0. All of experiments are performed on Acer computer using a 2.8GHZ Intel PC, 512MB of RAM, and 1024MB of virtual memory. The operating system in use was the Microsoft Windows XP.

To measure the quality of a cluster, we use precision, recall, and E measure. Recall is the ratio of relevant images to total images for a given category. Precision is the ratio of relevant images to images that appear in a cluster for a given category.

Precision(P)=number of images correctly classified into each class
number of total images

Recall(R)=number of images correctly classified into one certain class
number of images in one certain class

E measure is defined as follows:

E(p,r)=121/p+1/rE10

where p and r are the Precision and Recall of a cluster. Note that E (p, r) is simply one minus harmonic mean of the precision and recall; E (p, r) ranges from 0 to 1 where E (p, r) =0 corresponds to perfect precision and recall, and E (p, r) corresponds to zero precision and recall. Thus, the smaller the E measure values the better the quality of a cluster [9].

The existing image clustering methods generally include two parts: (1) extract related features from image to form feature vector; (2) use distance function as the image similarity measurement to cluster images. Therefore, we design an image feature-based method as the comparison with our ROI-based method (ICA). Firstly, two related features we extract from medical images are asymmetry and grey level mean difference. Then, k-means algorithm is used to cluster these medical images. In the experiment, we sample five patients’ images each time to test two clustering algorithms. Each patient has an image sequence. All the images in the same sequence, which are similar since they belong to the same patient, should be clustered into one group. In this way, we sample five times from the medical image dataset and get the average value of precision, recall and E value for each time, see figure 3-5. It is obviously that ICA is better than the image feature-based method because the precision and recall of ICA is higher than the image feature-based method and E value is lower.

Figure 3.

Precision for different number of sampling

Figure 4.

Recall for different number of sampling

Figure 5.

E measure for different number of sampling

In figure 3-5, x axis represents different sample time and the y axis represents p, r and E respectively. We have observed that precision and recall are higher for medical images.

Figure 6 shows an instance of our clustering algorithm.

Figure 6.

a) and (b) show two images of two different clusters

In any case fairly large medical data sets exist but they are not available to us. Also, it would be interesting to apply these ideas in other domains where large complex data sets are available [9].

4. Association rule mining

In this section, we proposed the improved algorithm that generating candidate itemsets based association graph and matrix algorithm (GMA). That is, the GMA algorithm’s candidate itemsets is the intersection of DLG and FAR algorithm’s candidate itemsets. Experiments show that, GMA algorithm reduced the candidate itemsets generation greatly and had higher efficiency compared with other algorithms. The DLG algorithm’s main idea is to construct a direct edge for every itemset of frequent-2 itemsets L2. So L2 can be mapped to a digraph, association graph. DLG algorithm uses the information of association graph to mine the frequent-k itemsets, where k is an integer. The FAR algorithm’s main idea is to map the transaction database to a matrix with elements values of ‘0’ and ‘1’. In order to mine frequent itemsets fast, it deletes some column and row to prune the matrix. Moreover, it must consider column of the matrix first, then consider the row of the matrix. The FAR algorithm also needs to generate the candidate itemsets. The candidate-k itemsets generate from k-vectors combination of columns of matrix. If the feature matrix has n columns, the number of candidate-k itemsets isCnk. Then it do “and calculus” for each combination of k-vectors. If the sum of element values in the “and” calculation result is not smaller than the minimum support, the k-itemsets corresponding to this combination of k-vectors are the frequent k-itemsets and are added to the set of frequent k-itemsets Lk [32]. DLG algorithm relies on the information of association graph to generate candidate itemsets, but when the transaction database is very large, the association graph is large, and the number of candidate itemsets also is very large. FAR algorithm generates the candidate-k itemsets by arbitrary k-vectors combination, that isCnk. The column of matrix is very large normally, so the number of candidate itemsets is also large.

4.1. Basic concept

A formal statement of the association rule is shown in Definition 1, 2 and 3.

  1. Let I = {I1, I2, …, Im} be a set of m distinct attributes, also called literals. Let D be a database, where each record (tuple) T has a unique identifier, and contains a set of items such that TI. An association rule is an implication of the form XY, where X, YI, are sets of items called itemsets, and X∩Y=. Here, X is called antecedent, and Y consequent [3].

Two important measures for association rules, support (s) and confidence (), can be defined as follows.

  1. The support (s) of an association rule is the ratio (in percent) of the records that contain X∪Y to the total number of records in the database.

  2. For a given number of records, confidence () is the ratio (in percent) of the number of records that contain X∪Y to the number of records that contain X.

Mining of association rules from a database consists of finding all rules that meet the user-specified threshold support s and confidence a.

  1. Let D be a transaction database. Lk is the set of frequent-k itemsets, where k is an integer, for each itemset of Lk, it occurs with a frequency that is greater than or equal to the user-specified threshold support, s.

  2. Association Graph.

For a directed graph G = {V, E}. V(G) is the collection of vertex in G. E(G) is the collection of edge in G. <Vi, Vj> is a direct edge that from vertex of Vi to Vj{i < j}. Suppose that Vi, Vj mapped to item Oi,Oj in transaction database DB. Moreover, G can meet the following conditions:

V(G) = {Vi |OiL1},

E(G) = {<Vi, Vj>|{OiOj}L2, i<j}

G is the association graph of the transactions database.

Property 1. Suppose the directed graph G is the association graph of image database D, the number of edge in G is equal to the number of frequent-2 itemsets |L2| in image database D.

  1. Association Graph Extend and Join

Suppose X = {O1O2…Ok} is a k-itemset of the database, if the association graph G of database has a direct edge e = <Vk,Vj>, X can be extended and joined to X = {O1O2…Ok,Oj}.

Property 2. Suppose X = {O1O2…Ok} is a k-itemset of the database, X = {O1O2…Ok,Oj} is a (k+1)-itemset which extended from k-itemset X by the association graph G. The number of X is equal to the outdegree of the vertex Vk in G.

  1. Feature Matrix

For a transaction database D, T is the transactions and O is the set of item in D. R is the binary relation from T = {T1T2,…,Tm} to O = {O1,O2,…,On}, R: if the item Oi occur in transaction Ti, set R(Ti,Oj) = 1; else set R(Ti,Oj) = 0. FM is the feature matrix of the transaction database D. We denote the feature matrix as FM = (rij)mn, (rij) = R(Ti, Oj).

For the sake of convenience, we denote the feature matrix FMmn and item Oj as Amn and Ij respectively.

4.2. GMA algorithm

This section proposed the GMA algorithm to mine the association rules for ROI in medical images database. In this section, we will give the algorithm details. In general, the GMA algorithm consists of four phases as follows: generating feature matrix and association graph, pruning the feature matrix, selecting and extending by the association graph, generating the set of frequent-k itemsets Lk(k>2).

4.2.1. Generating Feature Matrix and Association Graph

In the first phase, GMA algorithm has two steps. One is to transform the medical image transaction database into the feature matrix and get the set of frequent-1 itemsets L1. The other is to get the set of frequent-2 itemsets L2 and the association graph of medical image database.

Suppose that the mined medical image transaction database is D, with D having m images and n categories of ROI. First, GMA algorithm transforms the medical image transaction database into the feature matrix. Let T={T1,T2,…,Tm} be the set of transactions and I={I1,I2,…,In}be the set of items. So the D can be mapped to a matrix Amn relative to ROI which has m rows and n columns. Scanning the image transaction database D, if item Ij is in transaction Ti, where 1≤j≤n,1≤i≤m, the element value of Aij is ‘1’, otherwise the value of Aij is ‘ 0’ [32].

After transforming, GMA algorithm scans the ROI matrix, computes the supports of all items, and generates the set of frequent-1 itemsets L1. The support number Ij.support of item Ij is the number of ‘1’ in the jth column of the feature matrix Amn. If Ij.support is not smaller than the minimum support, Ij.support≥min_sup, Ij will be added to the set of frequent-1 itemsets L1. Otherwise the column of the jth will be deleted from the feature matrix.

The phase of Generating feature matrix can be detailed in Algorithm 1.

Algorithm 1: Feature Matrix Construction
Input: DB, min_sup.
Output: Feature Matrix A, L1.
Method:
for i = 1 to m
for j = 1 to n
set Aij = 0;
Scan the DB;
for j = 1 to n
for i = 1 to m
if item j in the ith transaction do begin
set Aij to 1;
end
end
L1= Ф;
Scan Feature Matrix A;
for i = 1 to n do
denote the 1’s counts of the column of Aj as sum(Aj);
if sum(Aj) ≥ min_sup
L1=L1∪{Ij};
else delete Aj from A;
end

The set of frequent-2 itemsets generates to construct the association graph. For each itemset Ii of L1, a node is allocated for item Ii and Ii.link = NULL. For every combination of Ii and Ij (i<j) in L1, the ith and jth column of feature matrix are two vectors, the “and” relational calculus Ii∧Ij is done. If the sum of element values in the “and” calculation result is not smaller than the minimum support number min_sup, sum(Ii∧Ij) ≥min_sup, a directed edge from item Ii to item Ij is constructed and add itemset {IiIj} to the set of frequent-2 itemset L2. So after generating the L2, we can get the association graph, see Algorithm 2 [32].
Algorithm 2: Association Graph Construction
Input: L1, min_sup.
Output: L2, Association Graph.
Method:
if L1 ≠Ф, do
allocate a node for every item Ii of L1, and do
Item[Ii].link = NULL; then do
produce every 2-vectors Ai, Aj(i<j) combination
B=Ai∧Aj;
if the 1 counts of vector B, sum(B) ≥ min_sup do
allocate a node p;
p.link = Item[Ii].link; p.Item = Item[Ij];
Item[Ii].link = p; L2 = L2 ∪{Ii,Ij};
end
end

4.2.2. Pruning matrix

In order to introduce the pruning principle clearly, we give two properties and their proof in the following.

Proposition 1. Let X is a k-itemset, |Lk-1(j)| presents the number of items ‘j’ in all frequent (k-1)-itemsets of the frequent set Lk-1. There is an item j in X. If |Lk-1(j)| < k-1, itemset X is not a frequent itemset.

Proposition 2. For each row vector Ai in the feature matrix of the transaction database D, If the sum of ‘1’ in a row vector Ai is smaller than k, it is not necessary for Ai attending calculus of the k- supports. [32]

Algorithm3:OptMatrix( )
Input: Lk-1, Feature Matrix Am(n.
Output: Matrix Bp(q.
Method:
Scan the matrix A;
for j = 1 to n
if |Lk-1(Aj)| < k-1
Delete the column Aj of Matrix A;
for i = 1 to m
Compute the 1’counts of Am;
if sum(Am) < k
Delete the row Am of Matrix;
for i = 1 to p
for j = 1 to q
Output the corresponding matrix Bij;

Pruning the matrix means deleting some rows and columns from it. According to the proposition 1 and proposition 2, we can get the pruning principle. The pruning principle can be described as Algorithm 3. First, the column of the feature matrix is pruned. This is described in detail as: Let I be the set of all items in the frequent set Lk-1, where k>2. Compute all |Lk-1(j)| where j∈I, and delete the column of correspondence item j if |Lk-1(j)| is smaller than k-1. Second recompute the sum of the element values in each row in the feature matrix. These rows of the feature matrix whose sum of element values is smaller than k are deleted from this matrix. [32]

4.2.3. Selecting and extending by association graph

GMA algorithm generates the candidate-k itemsets depending on selecting and extending the frequent-(k-1) itemset by the association graph. For a (k-1)-itemset of Lk-1, if it do not contain the item j which the corresponding column of feature matrix was deleted by pruning in the (k-1)th pass. GMA extends it to a k-itemset as a candidate-k itemset.

In order to generate candidate-k itemsets, GMA need to consider all itemsets of Lk-1. However, this procedure performed following the feature matrix pruning procedure. That is to say, when GMA mine the frequent-k itemsets, it must do the two steps, one is the feature matrix pruning, the other is selecting and extending frequent-(k-1) itemsets to generate the candidate-k itemsets. If there is a column of matrix has been deleted by optimizing matrix, we will not consider the itemset of Lk-1 which contains the corresponding of item. Otherwise, for each itemset {I1, I2, …, Ik-1} of Lk-1, finding edges that from vertex Ik-1 to other vertex in association graph. If there is an edge from vertex Ik-1 to vertex u, the itemset {I1, I2, …, Ik-1, u} is a candidate-k itemset. Not that all (k-1)-itemset needs to extend, this idea can be described by Algorithm 4.

Algorithm 4: SEJ( )Select to Extend and Join
Input: Lk-1, Association Graph.
Output: Candidate-k itemset Ck.
Ck =Ф;
Scan the Lk-1;
for i = 1 to | Lk-1|
if {I1, I2, …, Ik-1}∈Lk-1&& {I1, I2, …, Ik-1} not contain item Id which the corresponding column of matrix was deleted do begin
pointer = Item[Ik-1].link;
while (pointer ≠ NULL) do begin
u = pointer.Item;
Ck = Ck ∪ {I1, I2, …, Ik-1,Iu};
pointer = pointer.link;
end
end

4.2.4. Generating Frequent-k Itemsets Lk(k>2)

The most important phase in GMA algorithm is to generate the set of frequent-k itemsets Lk (k>2). In order to find the set of frequent-k (k>2) itemset, GMA algorithm firstly optimizes the matrix. Afterwards it generates the candidate-k itemsets using the information of association graph and matrix pruning. At last it verifies whether the candidate-k itemset is a frequent-k itemset. The details of above procedures are described as follows.

Proposition 3. |Lk| presents the number of k-itemsets in the frequent set Lk. If |Lk| is smaller than k+1, the maximum length frequent itemsets is k.

Most of algorithm for mining frequent-k itemsets’s terminal condition is that Lk-1 is null. However, GMA algorithm terminal condition is described by Property 3. that is to say, when GMA get the Lk, it examine the number of Lk, if |Lk| is smaller than k+1, GMA algorithm terminate to perform the the k+1th mining.

In order to generate the set of frequent-k itemsets Lk, GMA must obtain the candidate-k itemsets, then to verify by the “and calculus” operation. We can get the candidate-k itemsets by the Algorithm 4. GMA algorithm is an iterative to mine the frequent itemsets.

GMA algorithm does the “and” relational calculus to verify whether the candidate-k itemset is a frequent-k itemset of Lk. The “and” relational calculus is for combination of k vectors which corresponding to the candidate-k itemset {I1, I2, …, Ik-1, u}. Then make k=k+1, do this again to find all frequent-k itemsets, see Algorithm 5.

Algorithm 5: Frequent-k (k2)Itemset Generation
Input: Lk-1, min_sup.
Output: Lk.
Method:
while (|Lk-1| ≥ k) do begin
Lk=Ф;
OptMatrix(A );
SEJ(Lk-1, Association Graph);
for i = 1 to |Ck|
if Ck = {I1, I2, …, Ik-1,u}&&
sum(A1∧A2∧…∧Ak-1∧Au) ≥ min_sup do
Lk = Lk∪{I1, I2, …, Ik-1,Iu};
end
k = k+1;
end

4.3. Experiment results

In order to appraise the performance of the GMA algorithm, we conducted an experiment using the DLG algorithm, the FAR algorithm and GMA algorithm. These algorithms were implemented in C and tested on Windows XP SP3 Operating system, Mobile Intel Pentium 4 1.89GHz CPU,1024MB DDR RAM,VC++6.0 compiler platform. The test database T10I4D100K was generated synthetically by an algorithm designed by the IBM Quest project. The number of items N is set to 1000; |D| is the number of transactions; |T| is the averages size of transactions, and |I| is the average size of the maximum frequent itemsets. Fig. 7 presents the experimental results for different values of minimum supports.

Figure 7.

The performance comparisons of DLG,FAR and GMA

Experiment shows that, we can get the same set of frequent-k itemsets using three algorithms. However, the run_time of the algorithms execution is not same. DLG algorithm is better than FAR algorithm when the min_sup is small. FAR algorithm excels DLG algorithm as the value of min_sup increasing. Because when the min_sup increase, the efficiency of matrix pruning is better. In general, GMA algorithm is outperform the two algorithms whatever the value of min_sup is.

5. Further research

Based on the above work, the further research involves the following 3 parts:

  1. Semantic cluster concept generation. The above clustering method allows us to automatically obtain the information regarding the spatial layout, the area and the density of a specific cluster. Based on these information, we are able to define a few semantic cluster concepts, such as center cluster, left cluster, dense cluster, sparse cluster, big cluster, small cluster and so on.

  2. Semantic concept image indexing and retrieval. After the generation of cluster semantic concepts, semantic concept indexing of medical images is built to support high-level image retrieval based on these semantic concepts. Examples of such image retrieval are:“retrieval all the medical images which have dense cluster in the center of the image”, and “retrieval all the medical images in which the clusters located in the left and lower corners are all small ones”.

  3. Trends and patterns mining. Finally, it is desirable to produce some spatial and temporal trends and patterns of the patient who have the medical images with different time. To this end, we explore the pathology cluster information to discover any spatial and temporal trends and patterns of pathology development in terms of scale, area, time duration and location. These trends and patterns are potentially useful for better understanding of the pathology behavior. [3]

6. Conclusion

In this chapter, we firstly presented a progressive water immersion algorithm with guidance of domain knowledge to preprocessing the medical image set to detect objects in medical images. Then we proposed two new algorithms with guidance of domain knowledge to cluster the medical images. We quantified the domain knowledge and use them in the clustering algorithm. Secondly, we proposed GMA algorithm for mining association rules on medical images. GMA algorithm draws on the advantages both association graph and feature matrix pruning to reduce the candidate itemsets generation. In actually, it greatly reduces the candidate itemsets and has improved the efficiency of frequent itemset mining. Experiment shows that GMA algorithm can adapt and adjust better to the change of the value of min_sup. Moreover, GMA algorithm can be used for mining association rules on medical images effectively [9]. We have described the problems with a general form to provide a common framework for other problems appeared in other domains.

Acknowledgement

I would like to express my sincere gratitude to people for helping me during my study. I would especially like to thank my adviser, Jianzhong Li, for his being supportive all the time. I wish to thank Prof. Guisheng Yin, Prof. Jing Zhang, Prof. Qilong Han, Prof. for the advice and the kindness. I would also like to thank Mr. Xiaolei Tan, Ms. Chunxin Zhang, Ms. Guizhen Sun and Mr. Mingde Pan for the research and the support.

The chapter is partly supported by the National Natural Science Foundation of China under Grant No.60803036, No.60803037, Natural Science Foundation of Heilongjiang Province under Grant No.F200903, the Fundamental Research Funds for Central Universities No. HEUCFZ1010, the National High-tech R&D Program of China under Grant No. 2009AA01Z143, and Nuclear Safety & Simulation Tech. Lab Foundation under Grant No.HEUFN0802.

How to cite and reference

Link to this chapter Copy to clipboard

Cite this chapter Copy to clipboard

Haiwei Pan (August 29th 2012). Incorporating Domain Knowledge into Medical Image Mining, Data Mining Applications in Engineering and Medicine, Adem Karahoca, IntechOpen, DOI: 10.5772/50207. Available from:

chapter statistics

2016total chapter downloads

More statistics for editors and authors

Login to your personal dashboard for more detailed statistics on your publications.

Access personal reporting

Related Content

This Book

Next chapter

Discovering Fragrance Biosynthesis Genes from Vanda Mimi Palmer Using the Expressed Sequence Tag (EST) Approach

By Seow-Ling Teh, Janna Ong Abdullah, Parameswari Namasivayam and Rusea Go

Related Book

First chapter

Towards the Formulation of a Unified Data Mining Theory, Implemented by Means of Multiagent Systems (MASs)

By Dost Muhammad Khan, Nawaz Mohamudally and D. K. R. Babajee

We are IntechOpen, the world's leading publisher of Open Access books. Built by scientists, for scientists. Our readership spans scientists, professors, researchers, librarians, and students, as well as business professionals. We share our knowledge and peer-reveiwed research papers with libraries, scientific and engineering societies, and also work with corporate R&D departments and government entities.

More About Us