Incorporating Local Data and KL Membership Divergence into Hard C-Means Clustering for Fuzzy and Noise-Robust Data Segmentation

Hard C-means (HCM) and fuzzy C-means (FCM) algorithms are among the most popular algorithms for data clustering, including image data. The HCM algorithm assigns each data entity a cluster membership of 0 or 1, which implies that the entity is assigned to only one cluster. In contrast, the FCM algorithm provides an entity with a membership value between 0 and 1, which means that the entity may belong to all clusters but with different membership values. The main disadvantage of both the HCM and FCM algorithms is that they cluster an entity based only on its own features and do not incorporate the influence of the entity's neighborhood, which makes clustering sensitive to additive noise. In this chapter, Kullback-Leibler (KL) membership divergence is incorporated into the HCM for image data clustering. This HCM-KL-based clustering algorithm provides a twofold advantage. First, it offers a fuzzification approach to the HCM clustering algorithm. Second, by incorporating a local spatial membership function into the HCM objective function, additive noise can be tolerated. Spatial data is also incorporated for more noise-robust clustering. Results of segmentation of synthetic, simulated medical and real-world images have shown that the proposed local membership KL divergence-based FCM (LMKLFCM) and local data and membership KL divergence-based FCM (LDMKLFCM) algorithms outperform several widely used FCM-related algorithms. Moreover, the average runtimes of all algorithms have been measured via simulation. In all runs, all algorithms start from the same randomly generated initial conditions, as mentioned in the simulation section, and are stopped at the same fixed point. The LDMKLFCM, LMKLFCM, standard FCM, MEFCM, and SFCM algorithms have average runtimes of 1.5, 1.75, 1, 0.9 and 1 s, respectively. The simulations have been carried out using Matlab R2013b under Windows on an Intel(R) Core(TM) i3 M370 2.4 GHz processor with 4 GB of RAM.


Introduction
Image segmentation is a principal process in many image, video, scene analysis and computer vision applications [1][2][3]. The objective of segmentation is to divide an image into multiple separate regions or clusters, which makes it easier to recognize and distinguish the different objects in the image. Over the last few decades, several image segmentation methods have been developed. However, performance is still unsatisfactory, especially for noisy images, which makes the development of segmentation algorithms capable of handling noisy images an active area of research. Current segmentation methods can be classified into thresholding, region detection, edge detection, probabilistic and artificial neural network classification, and clustering [1][2][3]. Among the most widely used are the hard and fuzzy clustering methods, since clustering needs no training examples. Hard C-means (HCM), also called the K-means clustering algorithm, is an unsupervised approach in which data is partitioned based on the locations of, and distances between, the data points [4][5][6]. K-means partitions the data into C clusters so that data points within each cluster are as close to each other as possible and data points in different clusters are as far apart as possible. The HCM clustering algorithm offers crisp segmentation, in which each data point belongs to only one cluster. It therefore does not take into consideration fine details of the underlying structure of the data, such as hybridization or mixing. Compared with the HCM algorithm, the fuzzy C-means (FCM) algorithm is able to provide soft segmentation by incorporating a degree of belonging described by a membership function [7,8]. However, one disadvantage of the standard FCM is that it does not incorporate any spatial or local information from the image context, which makes it very sensitive to additive noise and other imaging artifacts. To handle this problem, different techniques have been developed [9][10][11][12][13]. These techniques have involved spatial or local data information to enhance and regularize the performance of the standard FCM algorithm. Local membership information has also been employed to generate a parameter that weights or modifies the membership function so as to give more weight to the pixel membership if the immediate neighborhood pixels are of the same cluster [14]. The HCM algorithm has also been fuzzified by involving membership entropy optimization [15][16][17].
In this chapter, the HCM clustering algorithm is modified by incorporating local spatial data and Kullback-Leibler (KL) membership divergence [18][19][20][21][22]. The local data information is incorporated via an additional weighted HCM function in which the smoothed image data is used for the distance computation. The KL membership divergence aims at minimizing the information distance between the membership function of each pixel and the locally smoothed one in the pixel vicinity. The KL membership divergence thus provides an approach for both regularization and fuzzification. The chapter is organized as follows. In Section 2, the clustering problem formulation is overviewed. In Section 3, the HCM clustering algorithm is described. In Section 4, several FCM-related clustering algorithms are explained. In Sections 5 and 6, the proposed local membership KL divergence-based FCM (LMKLFCM) and local data and membership KL divergence-based FCM (LDMKLFCM) clustering algorithms are discussed. In Section 7, simulation results of clustering and segmentation of synthetic and real-world images are presented. Finally, Section 8 presents the conclusions.

Problem formulation
The objective is to cluster a set of observed data $\{x_n;\ n = 1, 2, \ldots, N\}$, where each data point is an $M$-dimensional real vector called the feature or pattern vector, i.e., $x_n \in \mathbb{R}^{1 \times M}$. For gray-scale image data, $\{x_n;\ n = 1, 2, \ldots, N\}$ is a row-wise concatenation of a 2-D image $\{X_{pq};\ p = 1, 2, \ldots, P;\ q = 1, 2, \ldots, Q\}$. That is, $n = (p - 1)Q + q$, and the intensity feature $x_n$ is a single real value, i.e., $M = 1$. Clustering aims at partitioning these $N$ observations into $C < N$ divisions $\{\mu_1, \mu_2, \ldots, \mu_C\}$, called C clusters or segments, so as to make the entities or pixels in the same cluster as similar as possible and the ones in different clusters as dissimilar as possible. One approach to clustering these data is to minimize the within-cluster sum of squares of distances (WCSS) and to maximize the between-cluster sum of squares of distances (BCSS).
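As a small illustration of the row-wise indexing and of the WCSS/BCSS criteria, the sketch below flattens a toy 2-D image and evaluates both sums for a given labeling. The function names `flatten_rowwise` and `wcss_bcss`, the toy image and the labels are hypothetical and only serve the example. BCSS is taken here as the size-weighted squared distance of each cluster mean from the overall mean, which is one common definition and an assumption about the one intended in the text.

```python
def flatten_rowwise(image):
    """Row-wise concatenation: pixel (p, q), 1-based, maps to n = (p - 1) * Q + q."""
    return [value for row in image for value in row]


def wcss_bcss(x, labels, num_clusters):
    """Within- and between-cluster sums of squared distances for scalar data."""
    overall_mean = sum(x) / len(x)
    wcss, bcss = 0.0, 0.0
    for i in range(num_clusters):
        members = [xn for xn, lab in zip(x, labels) if lab == i]
        if not members:
            continue  # an empty cluster contributes nothing
        center = sum(members) / len(members)
        wcss += sum((xn - center) ** 2 for xn in members)
        bcss += len(members) * (center - overall_mean) ** 2
    return wcss, bcss


image = [[0.0, 0.1, 0.9],
         [0.1, 1.0, 0.9]]           # P = 2 rows, Q = 3 columns
x = flatten_rowwise(image)
P, Q = 2, 3
p, q = 2, 1                          # second row, first column (1-based)
n = (p - 1) * Q + q                  # 1-based index into the flattened data
labels = [0, 0, 1, 0, 1, 1]          # hypothetical hard labels, one per pixel
wcss, bcss = wcss_bcss(x, labels, num_clusters=2)
```

With this definition, WCSS and BCSS add up to the total sum of squares about the overall mean, so decreasing one increases the other for a fixed data set.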

Hard C-means (HCM)
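In the hard C-means (HCM) algorithm, also called the K-means algorithm, the objective is to minimize the following function [4][5][6][15]:
$$J_{HCM} = \sum_{i=1}^{C} \sum_{n=1}^{N} u_{in} d_{in}, \tag{1}$$
where $d_{in} = \|x_n - v_i\|^2$ is the square of the Euclidean distance between the nth pixel feature $x_n$ of the image under segmentation and $v_i \in V = \{v_1, v_2, \ldots, v_C\}$, called the center of the ith cluster and given by
$$v_i = \frac{1}{N_i} \sum_{x_n \in \mu_i} x_n, \tag{2}$$
where $\mu_i$ is the ith cluster label and $N_i$ is the number of patterns in cluster i. In (2), it is clear that the pattern $x_n$ belongs to only one cluster, which means that $u_{in} \in \{0, 1\}$, called the membership function, is given by [15]
$$u_{in} = \begin{cases} 1, & i = \arg\min_{j} d_{jn} \\ 0, & \text{otherwise.} \end{cases} \tag{3}$$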
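The alternation between the nearest-center assignment and the cluster-mean update can be sketched as follows for scalar (gray-level) features. This is a minimal sketch rather than the chapter's implementation: the function name `hcm`, the initialization from randomly sampled data points, the tolerance `eps` and the iteration cap `max_iter` are all assumptions made for the example.

```python
import random


def hcm(x, num_clusters, eps=1e-9, max_iter=100, seed=0):
    """Hard C-means (K-means) on scalar features x; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(x, num_clusters)        # assumed init: centers drawn from the data
    labels = [0] * len(x)
    for _ in range(max_iter):
        # Membership step: each point joins its nearest center (crisp 0/1 membership).
        labels = [min(range(num_clusters), key=lambda i: (xn - centers[i]) ** 2)
                  for xn in x]
        # Center step: mean of the points currently assigned to each cluster.
        new_centers = []
        for i in range(num_clusters):
            members = [xn for xn, lab in zip(x, labels) if lab == i]
            new_centers.append(sum(members) / len(members) if members else centers[i])
        change = sum((a - b) ** 2 for a, b in zip(centers, new_centers))
        centers = new_centers
        if change <= eps:                         # negligible change: stop iterating
            break
    return centers, labels


x = [0.02, 0.05, 0.01, 0.97, 1.00, 0.95]          # two well-separated gray-level groups
centers, labels = hcm(x, num_clusters=2)
```

Each pixel ends up with exactly one label, which is the crisp behavior that the fuzzy variants in the following sections relax.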
From (3), it is obvious that the HCM provides a crisp membership function $u_{in} \in \{0, 1\}$, i.e., {False, True}. Thus the HCM algorithm does not take into account fine details of the underlying structure of the data, such as hybridization or mixing, which are important in data clustering and decision making. The algorithm is implemented by an iterative procedure as summarized in Table 1.
Table 1. HCM pseudo-code (surviving steps): given $x_n$, $n = 1, 2, \ldots, N$; check whether $\|V^t - V^{t+1}\|^2 > \varepsilon$; repeat steps 1-5 until the change is negligible (convergence).

Related fuzzy clustering algorithms

Conventional FCM
The fuzzy C-means (FCM) algorithm seeks to minimize the following objective function [7]:
$$J_{FCM} = \sum_{i=1}^{C} \sum_{n=1}^{N} u_{in}^{m} d_{in}. \tag{4}$$
The difference between the FCM and HCM algorithms is the exponent parameter $m$, called the fuzzification parameter; if it is set to 1, the FCM algorithm reduces to the HCM one. Owing to this exponent, the membership $u_{in}$ of the nth pixel to the ith cluster can take any value from 0 to 1. In the extreme case, each pixel belongs to all clusters with equal membership values of $1/C$, which gives a membership function that is too fuzzy. The exponent $m > 1$ therefore controls the degree of fuzzification: the bigger the $m$, the more the fuzzification. Finally, the fuzzy membership $u_{in}$ should satisfy [7]
$$u_{in} \in [0, 1], \qquad \sum_{i=1}^{C} u_{in} = 1 \quad \forall n. \tag{5}$$
The membership $u_{in}$ and the cluster center $v_i$ that minimize the FCM function in (4), subject to $\sum_{i=1}^{C} u_{in} = 1\ \forall n$, are given by [7]
$$u_{in} = \frac{1}{\sum_{j=1}^{C} \left( d_{in} / d_{jn} \right)^{1/(m-1)}}, \tag{6}$$
$$v_i = \frac{\sum_{n=1}^{N} u_{in}^{m} x_n}{\sum_{n=1}^{N} u_{in}^{m}}. \tag{7}$$
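The standard FCM alternation between the membership update and the center update can be sketched as follows for scalar features. The function names, the small constant `tiny` that guards against a zero distance, and the fixed number of 50 iterations are assumptions of this sketch, not details taken from [7].

```python
def fcm_memberships(x, centers, m=2.0, tiny=1e-12):
    """Membership update: u[i][n] = 1 / sum_j (d_in / d_jn)^(1/(m-1))."""
    power = 1.0 / (m - 1.0)
    u = [[0.0] * len(x) for _ in centers]
    for n, xn in enumerate(x):
        d = [max((xn - v) ** 2, tiny) for v in centers]   # guard against d = 0
        for i in range(len(centers)):
            u[i][n] = 1.0 / sum((d[i] / dj) ** power for dj in d)
    return u


def fcm_centers(x, u, m=2.0):
    """Center update: weighted mean of the data with weights u^m."""
    centers = []
    for ui in u:
        weights = [uin ** m for uin in ui]
        centers.append(sum(w * xn for w, xn in zip(weights, x)) / sum(weights))
    return centers


x = [0.0, 0.1, 0.9, 1.0]
centers = [0.2, 0.8]                      # hypothetical initial centers
for _ in range(50):                       # fixed number of alternating updates
    u = fcm_memberships(x, centers)
    centers = fcm_centers(x, u)
column_sums = [sum(u[i][n] for i in range(len(centers))) for n in range(len(x))]
```

The memberships of every pixel sum to one by construction, and each membership depends only on that pixel's own distances to the centers, which is why noise in $x_n$ directly perturbs $u_{in}$.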

Local spatial data-based FCM (LDFCM)
The neighboring pixels of an image are highly correlated and are thus highly likely to belong to the same cluster or object. To benefit from this spatial information, the standard FCM objective function in (4) has been modified by adding a weighted regularization function that depends on local image data [10][11][12]. The resulting LDFCM objective function is given by (8), where $\alpha$ is a weight to be experimentally selected by the user, $m$ is the fuzzification parameter, and $\bar{x}_n \in \bar{X}$ is the nth pixel of the locally smoothed image $\bar{X}$, obtained in advance from the original image by $\bar{X} = w^{(x)} ** X$, where $**$ denotes two-dimensional convolution. The weights $w^{(x)}$ may be equal or not, provided that the center weight is zero and the weights sum to unity. From (8), it is clear that the LDFCM minimizes the standard FCM objective function plus another weighted, modified FCM function acting as a regularization function. In this regularization function, the distances are computed from the locally smoothed image data instead of the original image data. This correlates the pixel being clustered, $x_n$, with its immediate spatial neighbors, which biases the algorithm toward clustered images with piecewise homogeneous regions. The membership $u_{in}$ and the cluster center $v_i$ of the LDFCM method are given by (9) and (10) [10][11][12].
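Equations (8)-(10) can be written out as follows. This is a reconstruction from the description above and from the usual form of spatially regularized FCM objectives, so the notation (in particular $\bar{d}_{in}$ for the distance computed from the smoothed pixel $\bar{x}_n$) should be read as an assumption rather than as the notation of [10][11][12].

```latex
\begin{align}
J_{LDFCM} &= \sum_{i=1}^{C}\sum_{n=1}^{N} u_{in}^{m}\, d_{in}
           + \alpha \sum_{i=1}^{C}\sum_{n=1}^{N} u_{in}^{m}\, \bar{d}_{in},
           \qquad \bar{d}_{in} = \|\bar{x}_n - v_i\|^2 \tag{8} \\
u_{in} &= \frac{\left(d_{in} + \alpha \bar{d}_{in}\right)^{-1/(m-1)}}
               {\sum_{j=1}^{C} \left(d_{jn} + \alpha \bar{d}_{jn}\right)^{-1/(m-1)}} \tag{9} \\
v_i &= \frac{\sum_{n=1}^{N} u_{in}^{m}\left(x_n + \alpha \bar{x}_n\right)}
            {(1+\alpha)\sum_{n=1}^{N} u_{in}^{m}} \tag{10}
\end{align}
```

Setting $\alpha = 0$ in these expressions recovers the standard FCM updates.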
It is obvious from (9) and (10) that when $\alpha = 0$, the membership $u_{in}$ and the cluster center $v_i$ become the ones provided by the standard FCM algorithm in (6) and (7). The advantage of the LDFCM method arises from involving the weighted locally smoothed data $\alpha \bar{x}_n$ in computing the membership $u_{in}$ and the cluster center $v_i$, which helps the algorithm handle additive noise.
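The effect of the smoothed data on the membership can be illustrated with a 3x3 averaging window whose center weight is zero. The sketch assumes that the membership is computed from the combined distance $d_{in} + \alpha \bar{d}_{in}$ in the same way as the FCM membership, and that border pixels average over whichever neighbors exist; both are assumptions of the example, as are the function names and the value $\alpha = 2$, chosen only to make the effect visible.

```python
def smooth_zero_center(image):
    """Mean of the available 8 neighbors of each pixel (3x3 window, center weight 0)."""
    rows, cols = len(image), len(image[0])
    smoothed = [[0.0] * cols for _ in range(rows)]
    for p in range(rows):
        for q in range(cols):
            neighbors = [image[a][b]
                         for a in range(max(0, p - 1), min(rows, p + 2))
                         for b in range(max(0, q - 1), min(cols, q + 2))
                         if (a, b) != (p, q)]
            smoothed[p][q] = sum(neighbors) / len(neighbors)
    return smoothed


def ldfcm_memberships(x, x_bar, centers, alpha=0.5, m=2.0, tiny=1e-12):
    """Membership from the combined distance d_in + alpha * dbar_in (assumed form)."""
    power = 1.0 / (m - 1.0)
    u = [[0.0] * len(x) for _ in centers]
    for n, (xn, xbn) in enumerate(zip(x, x_bar)):
        d = [max((xn - v) ** 2 + alpha * (xbn - v) ** 2, tiny) for v in centers]
        for i in range(len(centers)):
            u[i][n] = 1.0 / sum((d[i] / dj) ** power for dj in d)
    return u


image = [[0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],       # a single noisy bright pixel in a dark region
         [0.0, 0.0, 0.0]]
x = [v for row in image for v in row]
x_bar = [v for row in smooth_zero_center(image) for v in row]
centers = [0.0, 1.0]            # hypothetical dark and bright cluster centers
u_fcm = ldfcm_memberships(x, x_bar, centers, alpha=0.0)   # alpha = 0: plain FCM membership
u_ld = ldfcm_memberships(x, x_bar, centers, alpha=2.0)
```

For the noisy center pixel, the plain FCM membership in the bright cluster is essentially 1, whereas with the smoothed-data term the pixel is drawn toward the dark cluster of its neighbors.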

Spatial-based fuzzy C-means (SFCM)
An approach to incorporating local spatial data information into the standard FCM has been presented in [13]. The objective function of the SFCM algorithm is given by (11), where $D_{in}$ is a modified or weighted distance between the nth pixel and the ith cluster center. This modified distance is computed from the original or standard distance $d_{in} = \|x_n - v_i\|^2$ as in (12), where $\lambda \in [0, 1]$ is an experimentally selected weight and $f_{in}$ is a spatial or local data function given by (13) [13].
It is obvious from (12) that with $\lambda = 1$, the SFCM clustering method reduces to the standard FCM method. The spatial data function $f_{in}$ depends on the original distances of the set of pixels $N_n$ in the immediate neighborhood of the nth pixel. If none of the pixels in the neighbor set belongs to the ith cluster, $f_{in}$ is maximum, since its denominator is minimum while its numerator is maximum. This implies that $f_{in}$ causes $D_{in}$ to increase when the pixels in the immediate neighborhood of the nth pixel do not belong to the ith cluster. This increase of $D_{in}$ in turn decreases the membership $u_{in}$ so as to reach and preserve the minimum of the SFCM function in (11).
The membership $u_{in}$ and the cluster center $v_i$ associated with the SFCM method are given by (14) and (15) [13].
It is obvious from (14) that, as in the standard FCM, the membership $u_{in}$ is inversely proportional to the weighted distance $D_{in}$; increasing $D_{in}$ when the immediate neighbors of the nth pixel do not belong to the ith cluster therefore decreases the membership $u_{in}$. From (15), however, it is clear that the SFCM algorithm computes the cluster center $v_i$ in the same way as the standard FCM method does. Hence, additive noise can still reduce the accuracy of the cluster centers $v_i$ obtained by the SFCM algorithm.

HCM incorporating membership entropy
The membership entropy has been incorporated into the HCM for fuzzification. The membership entropy-based FCM (MEFCM) algorithm has the objective function given by (16) [17].
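A common way to write an entropy-regularized HCM objective and its minimizers is sketched below. It assumes that $\beta$ multiplies the negative membership entropy; the form used in [17] may differ in scaling, so this should be read as an illustration of (16)-(18) rather than a quotation.

```latex
\begin{align}
J_{MEFCM} &= \sum_{i=1}^{C}\sum_{n=1}^{N} u_{in}\, d_{in}
           + \beta \sum_{i=1}^{C}\sum_{n=1}^{N} u_{in} \ln u_{in} \tag{16} \\
u_{in} &= \frac{\exp\left(-d_{in}/\beta\right)}
               {\sum_{j=1}^{C}\exp\left(-d_{jn}/\beta\right)} \tag{17} \\
v_i &= \frac{\sum_{n=1}^{N} u_{in}\, x_n}{\sum_{n=1}^{N} u_{in}} \tag{18}
\end{align}
```

Under this assumed form, the membership follows from setting the derivative of the Lagrangian with respect to $u_{in}$ to zero under the constraint $\sum_{i} u_{in} = 1$.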
In (16), $\beta > 0$ is a weight experimentally selected to control the fuzziness induced by the entropy term. The membership matrix $U$ must still be constrained to satisfy (5). It can be shown that the membership and the cluster center that minimize (16) are given by (17) and (18), respectively [17].
It is obvious so far that the membership of the nth entity provided by the FCM, HCM and MEFCM algorithms depends only on the squared Euclidean distance $d_{in}$, which is a function of $x_n$ alone; no data or membership information from the entity's neighbors is involved. Hence, the FCM, HCM and MEFCM algorithms miss important local spatial data and membership information. Additive noise can thus degrade $x_n$, $v_i$ and $d_{in}$, thereby biasing the membership of a degraded entity toward a false cluster.
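How the entropy weight controls fuzziness can be sketched numerically. The example assumes the membership is proportional to $\exp(-d_{in}/\beta)$, which is the minimizer when $\beta$ multiplies the entropy term; the function name `mefcm_memberships` and the two values of `beta` are hypothetical.

```python
import math


def mefcm_memberships(x, centers, beta):
    """Entropy-regularized membership: u_in proportional to exp(-d_in / beta)."""
    u = [[0.0] * len(x) for _ in centers]
    for n, xn in enumerate(x):
        d = [(xn - v) ** 2 for v in centers]
        d_min = min(d)                           # shift for numerical stability
        weights = [math.exp(-(di - d_min) / beta) for di in d]
        total = sum(weights)
        for i, w in enumerate(weights):
            u[i][n] = w / total
    return u


x = [0.0, 0.4, 1.0]
centers = [0.0, 1.0]
u_sharp = mefcm_memberships(x, centers, beta=0.01)    # small beta: nearly crisp
u_fuzzy = mefcm_memberships(x, centers, beta=100.0)   # large beta: nearly uniform
```

Under this assumed scaling, a small weight reproduces the crisp HCM labels and a large weight drives every membership toward $1/C$; in both cases only the pixel's own distances enter the computation.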

HCM incorporating local membership KL divergence
In [18], an approach to incorporating local spatial membership information into the HCM algorithm has been presented. By adding the Kullback-Leibler (KL) divergence between the membership function of an entity and the locally smoothed membership in its immediate spatial neighborhood, the modified objective function, called the local membership KL divergence-based FCM (LMKLFCM), is given by (19) [18][19][20][21][22], where $\gamma$ is a weighting parameter experimentally selected to control the fuzziness induced by the second term in (19), $\bar{u}_{in} = 1 - u_{in}$ is the complement of the membership function $u_{in}$, and $\pi_{in}$ and $\bar{\pi}_{in}$ are the spatial local (moving) averages of the membership $u_{in}$ and of the complement membership $\bar{u}_{in}$, respectively. These local membership and membership-complement averages are computed by (20) and (21) [18][19][20][21][22].
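Based on the description of the two terms, the objective (19) and the local averages (20)-(21) can be written as below. This is a reconstruction offered under two assumptions: the window set $N_n$ excludes the center pixel $n$, so that the equal weights $1/N_K$ sum to unity, and the KL term takes its usual form for a membership and its complement. The exact expressions in [18] may differ.

```latex
\begin{align}
J_{LMKLFCM} &= \sum_{i=1}^{C}\sum_{n=1}^{N} u_{in}\, d_{in}
  + \gamma \sum_{i=1}^{C}\sum_{n=1}^{N}
    \left[ u_{in}\ln\frac{u_{in}}{\pi_{in}}
         + \bar{u}_{in}\ln\frac{\bar{u}_{in}}{\bar{\pi}_{in}} \right] \tag{19} \\
\pi_{in} &= \frac{1}{N_K}\sum_{k \in N_n} u_{ik} \tag{20} \\
\bar{\pi}_{in} &= \frac{1}{N_K}\sum_{k \in N_n} \bar{u}_{ik} = 1 - \pi_{in} \tag{21}
\end{align}
```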
In (20) and (21), $N_n$ is the set of entities/pixels falling in a square window around the nth pixel and $N_K$ is its cardinality. All entities in the window are weighted equally by the weight $w^{(u)}$. Other windows, such as a Gaussian one, can be used provided that the weight of the window center is 0 and the remaining weights sum to unity. The first term in (19) provides hard cluster labeling: it pushes the membership function toward 0 or 1. The KL membership and membership-complement divergences, in addition to providing a fuzzification approach to HCM clustering, measure the proximity between the membership of a pixel in a certain cluster and the local average of the membership over the immediate spatial neighborhood pixels in this cluster. They therefore push the membership function toward the locally smoothed membership function $\pi_{in}$. This can smooth out additive noise and bias the solution toward piecewise homogeneous labels, which leads to a segmented image with piecewise homogeneous regions.
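The local membership average and the per-pixel KL penalty can be sketched for a single cluster as follows. The sketch assumes a 3x3 window with the center excluded and equal weights, with border pixels averaging over the neighbors that exist, and it clips values away from 0 and 1 to keep the logarithms finite; the function names and the toy membership map are hypothetical.

```python
import math


def local_membership_average(u_map):
    """pi for one cluster: mean membership over the available 8 neighbors (center excluded)."""
    rows, cols = len(u_map), len(u_map[0])
    pi = [[0.0] * cols for _ in range(rows)]
    for p in range(rows):
        for q in range(cols):
            vals = [u_map[a][b]
                    for a in range(max(0, p - 1), min(rows, p + 2))
                    for b in range(max(0, q - 1), min(cols, q + 2))
                    if (a, b) != (p, q)]
            pi[p][q] = sum(vals) / len(vals)
    return pi


def kl_membership_term(u, pi, tiny=1e-12):
    """u*ln(u/pi) + (1-u)*ln((1-u)/(1-pi)) for one pixel and one cluster."""
    u = min(max(u, tiny), 1.0 - tiny)        # clip to keep the logarithms finite
    pi = min(max(pi, tiny), 1.0 - tiny)
    return u * math.log(u / pi) + (1.0 - u) * math.log((1.0 - u) / (1.0 - pi))


u_map = [[0.9, 0.9, 0.9],
         [0.9, 0.1, 0.9],       # the center pixel disagrees with its neighbors
         [0.9, 0.9, 0.9]]
pi = local_membership_average(u_map)
penalty_outlier = kl_membership_term(u_map[1][1], pi[1][1])
penalty_agreeing = kl_membership_term(0.9, pi[1][1])
```

A membership that disagrees with its neighborhood average incurs a large penalty, while one that matches the average incurs essentially none, which is the mechanism that pulls isolated noisy labels toward their surroundings.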
The minimization of the objective function $J_{LMKLFCM}$ in (19) yields $u_{in}$ and $v_i$ given, respectively, by (22) and (23) [18]. It is obvious from (22) that $u_{in}$ is proportional to $\pi_{in}$, that the proportionality parameter $\delta_{in}$ is inversely proportional to the entity's distance $d_{in}$, and that the maximum $\delta_{kn}$ occurs when $d_{kn} = 0$.
It is clear that if $\gamma \to \infty$, then $u_{in} = \pi_{in} / \sum_{j=1}^{C} \pi_{jn}$. The resultant membership is then independent of the data to be clustered and depends only on the initial value of the membership matrix $U^0$ and on the smoothing scheme. If $u^0_{in}$ is generated from a positive random process, then, because of the recursive averaging and normalizing, $u^t_{in}$ converges with the iteration number $t$ to a normally distributed variable with mean $1/C$, which means a membership function that is too fuzzy. This has been verified experimentally using a synthetic image of 4 clusters and $\gamma = 10^{10}$. Finally, as shown by (23), the computation of the cluster center $v_i$ is still independent of the local original data.
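The limiting behavior for very large $\gamma$ can be reproduced with a few lines: starting from random positive memberships, the update $u_{in} = \pi_{in} / \sum_j \pi_{jn}$ is applied repeatedly without ever looking at the image data. The grid size, the seed, the number of iterations and the border handling (averaging over available neighbors) are assumptions of this sketch.

```python
import random


def neighbor_average(u_map):
    """Mean over the available 8 neighbors of each pixel (3x3 window, center excluded)."""
    rows, cols = len(u_map), len(u_map[0])
    out = [[0.0] * cols for _ in range(rows)]
    for p in range(rows):
        for q in range(cols):
            vals = [u_map[a][b]
                    for a in range(max(0, p - 1), min(rows, p + 2))
                    for b in range(max(0, q - 1), min(cols, q + 2))
                    if (a, b) != (p, q)]
            out[p][q] = sum(vals) / len(vals)
    return out


rng = random.Random(1)
C, rows, cols = 4, 6, 6
# Random positive initial memberships, normalized over the clusters at each pixel.
raw = [[[rng.random() + 1e-3 for _ in range(cols)] for _ in range(rows)] for _ in range(C)]
u = [[[raw[i][p][q] / sum(raw[j][p][q] for j in range(C)) for q in range(cols)]
      for p in range(rows)] for i in range(C)]

for _ in range(300):
    pi = [neighbor_average(u[i]) for i in range(C)]
    # gamma -> infinity: the membership is the normalized local average only.
    u = [[[pi[i][p][q] / sum(pi[j][p][q] for j in range(C)) for q in range(cols)]
          for p in range(rows)] for i in range(C)]

# Largest difference between any two pixels of the same cluster map.
spread = max(max(map(max, u[i])) - min(map(min, u[i])) for i in range(C))
```

After the iterations each cluster map is spatially flat, with a value close to $1/C$ set by the random initialization alone, which illustrates why an excessively large $\gamma$ yields an uninformative membership.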

HCM incorporating local data and membership KL divergence
To incorporate local spatial data into the LMKLFCM objective function in (19), the objective function given by (24) has been proposed in [18].
Similar to (22) and (23), the membership function $u_{in}$ and the cluster center $v_i$ are then given by (25) and (26), respectively [18].
It is obvious that the LDMKLFCM algorithm in (24)-(26) provides a membership that depends upon both the local spatial data and the local membership information, while the cluster center depends upon the locally smoothed data. The algorithm thus has a twofold approach to handling additive noise.
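A plausible form of the objective (24) and of the resulting cluster center (26) is sketched below. It assumes that the local data term is an additional HCM-type function weighted by $\alpha$ and computed from the smoothed pixels $\bar{x}_n$, with $\bar{d}_{in} = \|\bar{x}_n - v_i\|^2$, and that the KL term is the same as in the LMKLFCM objective. The expressions in [18] may differ, and the membership (25) is not reconstructed here.

```latex
\begin{align}
J_{LDMKLFCM} &= \sum_{i=1}^{C}\sum_{n=1}^{N} u_{in}\left(d_{in} + \alpha\, \bar{d}_{in}\right)
  + \gamma \sum_{i=1}^{C}\sum_{n=1}^{N}
    \left[ u_{in}\ln\frac{u_{in}}{\pi_{in}}
         + \bar{u}_{in}\ln\frac{\bar{u}_{in}}{\bar{\pi}_{in}} \right] \tag{24} \\
v_i &= \frac{\sum_{n=1}^{N} u_{in}\left(x_n + \alpha\, \bar{x}_n\right)}
            {(1+\alpha)\sum_{n=1}^{N} u_{in}} \tag{26}
\end{align}
```

Under this assumed objective, the center expression follows from setting the derivative with respect to $v_i$ to zero, since the KL term does not depend on $v_i$.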

Simulation results
This simulation examines the performance of the conventional FCM, the membership entropy-based FCM (MEFCM), the spatial distance weighted FCM (SFCM), the local membership KL divergence-based FCM (LMKLFCM) and the local data and membership KL divergence-based FCM (LDMKLFCM) algorithms. Note that all the algorithms can be implemented in a manner similar to the pseudo-code in Table 1 by replacing steps 3 and 4 with the corresponding computation of the membership function and cluster centers of each algorithm.

Clustering validity
To measure the performance of the fuzzy clustering algorithms, several quantitative measures or indices have been adopted in [23,25] and references therein. Among these measures are the partition coefficient $V_{PC}$ and the partition entropy $V_{PE}$ indices of Bezdek and the Xie-Beni (XB) index $V_{XB}$, given respectively by
$$V_{PC} = \frac{1}{N}\sum_{i=1}^{C}\sum_{n=1}^{N} u_{in}^{2}, \qquad V_{PE} = -\frac{1}{N}\sum_{i=1}^{C}\sum_{n=1}^{N} u_{in} \ln u_{in}, \qquad V_{XB} = \frac{\sum_{i=1}^{C}\sum_{n=1}^{N} u_{in}^{2} \|x_n - v_i\|^2}{N \min_{i \neq j} \|v_i - v_j\|^2}.$$
The closer $V_{PC}$ is to 1, the better the performance, since the minimization is constrained by $\sum_{i=1}^{C} u_{in} = 1$. The closer $V_{PE}$ is to 0, the better the performance, since this means less fuzziness of the membership and thus well-separated clusters.
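The partition coefficient and partition entropy can be computed directly from a membership matrix stored as a list of C rows of N memberships. The sketch uses the standard Bezdek definitions; the function names and the two toy membership matrices are hypothetical, and terms with zero membership are skipped in the entropy, following the convention that $0 \ln 0 = 0$.

```python
import math


def partition_coefficient(u):
    """V_PC = (1/N) * sum_i sum_n u_in^2; equals 1 for a crisp partition."""
    n_points = len(u[0])
    return sum(uin ** 2 for ui in u for uin in ui) / n_points


def partition_entropy(u):
    """V_PE = -(1/N) * sum_i sum_n u_in * ln(u_in); equals 0 for a crisp partition."""
    n_points = len(u[0])
    return -sum(uin * math.log(uin) for ui in u for uin in ui if uin > 0.0) / n_points


u_crisp = [[1.0, 1.0, 0.0],
           [0.0, 0.0, 1.0]]     # every pixel fully in one cluster
u_fuzzy = [[0.5, 0.5, 0.5],
           [0.5, 0.5, 0.5]]     # every pixel split evenly between two clusters
pc_crisp, pe_crisp = partition_coefficient(u_crisp), partition_entropy(u_crisp)
pc_fuzzy, pe_fuzzy = partition_coefficient(u_fuzzy), partition_entropy(u_fuzzy)
```

The crisp matrix attains the best values of both indices, while the evenly split matrix attains the worst values possible for two clusters, namely $1/C$ for the partition coefficient and $\ln C$ for the partition entropy.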
For synthetic images, in addition to the above clustering validity measures, several performance measures have also been used, such as the accuracy, sensitivity and specificity, given respectively by
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP},$$
where T, F, P, and N mean true, false, positive, and negative, respectively. TP, FP, TN, and FN are computed as follows. While generating the synthetic image, the ground-truth labels are formulated as a logical matrix [23] defined from the noise-free pixel $x_n$ of the synthetic image, in which 1 and 0 represent True and False, respectively. After the segmentation is done, the estimated labels are also formulated as logical matrices [20]. Finally, TP, TN, FP, and FN are obtained from the ground-truth and estimated logical matrices and their logical complements [20].
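The counting of TP, TN, FP and FN from logical label matrices can be sketched as follows. The sketch assumes that both matrices have one row per cluster and one column per pixel, that the estimated label of a pixel is the cluster with the largest membership, and that the counts are accumulated over all clusters; these conventions, the function names and the toy data are assumptions of the example.

```python
def confusion_counts(truth, estimate):
    """Count TP, TN, FP, FN over two equally shaped logical (0/1) matrices."""
    tp = tn = fp = fn = 0
    for truth_row, est_row in zip(truth, estimate):
        for t, e in zip(truth_row, est_row):
            tp += int(bool(t) and bool(e))
            tn += int(not t and not e)
            fp += int(not t and bool(e))
            fn += int(bool(t) and not e)
    return tp, tn, fp, fn


def hard_labels(u):
    """Logical matrix with a 1 where a cluster has the largest membership for a pixel."""
    num_clusters, n_points = len(u), len(u[0])
    winners = [max(range(num_clusters), key=lambda i: u[i][n]) for n in range(n_points)]
    return [[1 if winners[n] == i else 0 for n in range(n_points)]
            for i in range(num_clusters)]


truth = [[1, 1, 0, 0],          # rows: clusters, columns: pixels
         [0, 0, 1, 1]]
u = [[0.9, 0.4, 0.2, 0.1],
     [0.1, 0.6, 0.8, 0.9]]      # the second pixel is misclassified
estimate = hard_labels(u)
tp, tn, fp, fn = confusion_counts(truth, estimate)
accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
```

In this toy case one misclassified pixel out of four produces one false negative in its true cluster and one false positive in the cluster it was assigned to.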

Artificial image
In this simulation, the artificial (synthetic) noise-free image shown in Figure 1(a) is degraded by adding zero-mean white Gaussian noise (WGN) with different variances. The noisy image shown in Figure 1(b) is for a noise variance of 0.08. We have studied the performance of the five algorithms, namely, the standard FCM, the membership entropy-based FCM (MEFCM), the spatial distance weighted FCM (SFCM), the local membership KLFCM (LMKLFCM) and the local data and membership KLFCM (LDMKLFCM) algorithms, in segmenting these noisy images with $m = 2$ and $C = 4$. The parameters of the algorithms have been selected via simulation as $\beta = 1000$ for MEFCM; $\lambda = 0.5$ for SFCM; $\gamma = 1000$ for LMKLFCM; and $\gamma = 1000$ and $\alpha = 0.5$ for LDMKLFCM. For the computation of the locally smoothed data $\bar{x}_n$, a neighboring window of size 3x3 has been used. The same spatial window has been used for the computation of the locally smoothed membership function $\pi_{in}$. The initial values of the membership functions $U$ and the cluster centers $V$ are generated from uniformly distributed random processes with means equal to 0.5 and to the image mean, respectively. We have collected results from 25 Monte Carlo runs of each algorithm. In each run, the initial values of $U$ and $V$ for the FCM are new random samples, while those for the remaining algorithms are generated by executing a few iterations of the FCM algorithm. Simulation results, not included owing to space limitations, have shown that the algorithms perform better with initial values generated by the FCM algorithm than with randomly generated ones. Also, in each run, a new random sample of WGN is used in generating the noisy images. Figure 1(c-g) shows the clustered images generated by the five algorithms in the case of 0.08 noise variance. These clustered images show that the LMKLFCM and LDMKLFCM algorithms provide images with less noise, which means fewer misclassified pixels. Moreover, the LDMKLFCM algorithm offers the best clustered image. Figure 2 shows the performance measures versus noise variance. It is clear that both the LMKLFCM and the LDMKLFCM algorithms provide the best performance among the five algorithms and that the LDMKLFCM algorithm shows more noise robustness.

Magnetic resonance image (MRI)
A simulated MRI from [26], illustrated in Figure 3(a), has been used as a noise-free image. It has been degraded by adding white Gaussian noise (WGN) with zero mean and 0.005 variance to generate the noisy MRI illustrated in Figure 3(b). This noisy MRI has been clustered by the five algorithms. The parameters for all algorithms have been taken to be the same as in the synthetic image simulation except $\beta = 200$ for the MEFCM algorithm and $\gamma = 1000$ for both the LMKLFCM and LDMKLFCM algorithms. We have also executed 25 runs of each algorithm. The initial values of $u_{in}$ and $v_i$ have been generated and adjusted as explained in the synthetic image simulation. Figure 3(c-g) shows the resulting clustered images provided by the five algorithms in a certain run. Table 2 shows the averages and standard deviations ($\mu \pm \sigma$) of the performance measures $V_{PC}$ and $V_{PE}$ of the five algorithms. It is obvious that the LMKLFCM and LDMKLFCM algorithms provide segmented images with less noise, i.e., fewer misclassified pixels, as well as the maximum $V_{PC}$ and the minimum $V_{PE}$.
A real MRI from [27], shown in Figure 4(a), has been considered as a noise-free image. To generate the noisy MRI shown in Figure 4(b), white Gaussian noise has been added. The noisy MRI has been clustered by the FCM, SFCM, MEFCM, LMKLFCM and LDMKLFCM algorithms. The parameters for all algorithms have been taken to be the same as in the synthetic image simulation except $\beta = 300$ for the MEFCM algorithm and $\gamma = 800$ for both the LMKLFCM and LDMKLFCM algorithms. We have also obtained the results of 25 runs of each algorithm. The initial values of $u_{in}$ and $v_i$ have been generated and adjusted as mentioned in the synthetic image simulation. Figure 4(c-g) shows the segmented images provided by the five algorithms in a certain run, while Table 2 summarizes the averages and standard deviations ($\mu \pm \sigma$) of the performance measures. It is obvious that the proposed LMKLFCM and LDMKLFCM algorithms provide segmented images with less noise, i.e., fewer misclassified pixels, as well as the maximum $V_{PC}$ and the minimum $V_{PE}$.

Lena image
The popular Lena image shown in Figure 5(a) has been considered as a noise-free image example. The noisy Lena image shown in Figure 5(b) has been generated by adding WGN with zero mean and 0.01 variance. The parameters of the five algorithms have been set to the same values as in the previous simulations except $C = 2$; $\beta = 1000$ for the MEFCM algorithm; $\gamma = 2000$ for the LMKLFCM algorithm; and $\gamma = 2000$ and $\alpha = 0.5$ for the LDMKLFCM algorithm. We have also executed 25 Monte Carlo runs of each algorithm as explained above. Figure 5(c-g) shows the resulting segmented images obtained by the five algorithms. Visual inspection of the segmented images shows that the LMKLFCM and LDMKLFCM algorithms provide images with fewer misclassified pixels. Table 2 shows the averages and standard deviations ($\mu \pm \sigma$) of the performance measures of the five algorithms. It is also clear that these two algorithms provide the maximum $V_{PC}$ and the minimum $V_{PE}$.

Conclusions
The hard C-means algorithm has been fuzzified by incorporating spatial local information into the objective function through two KL membership divergences. The first KL membership divergence measures the information proximity between the membership of each pixel and its local membership average in the pixel neighborhood. The second one measures the information proximity between the complement membership and its local average in the pixel neighborhood. For regularization, the local data information has been incorporated by an additional weighted hard C-means function in which the noisy image is replaced by a noise-reduced one. Incorporating both local data and local membership information biases the algorithm to classify each pixel in correlation with its immediate neighboring pixels. Results of segmentation of synthetic, simulated medical and real-world images have shown that the proposed local membership KL divergence-based FCM (LMKLFCM) and local data and membership KL divergence-based FCM (LDMKLFCM) algorithms outperform several widely used FCM-related algorithms. Moreover, the average runtimes of all algorithms have been measured via simulation. In all runs, all algorithms start from the same randomly generated initial conditions, as mentioned in the simulation section, and are stopped at the same fixed point. The LDMKLFCM, LMKLFCM, standard FCM, MEFCM, and SFCM algorithms have average runtimes of 1.5, 1.75, 1, 0.9 and 1 s, respectively. The simulations have been carried out using Matlab R2013b under Windows on an Intel(R) Core(TM) i3 M370 2.4 GHz processor with 4 GB of RAM.