
Data Clustering for Fuzzyfier Value Derivation

Written By

JaeHyuk Cho

Submitted: 24 November 2020 Reviewed: 03 February 2021 Published: 03 May 2021

DOI: 10.5772/intechopen.96385

From the Edited Volume

Fuzzy Systems - Theory and Applications

Edited by Constantin Volosencu


Abstract

The fuzzifier value m is a significant factor in achieving accurate clustering. In this chapter, various clustering methods are therefore introduced together with definitions of the values important for clustering. To adaptively calculate an appropriate fuzzifier value for interval type-2 fuzzy c-means, two fuzzifier values m1 and m2 are provided by extracting information from individual data points using a histogram scheme. Most of the clustering methods in this chapter obtain the values m1 and m2 automatically, whereas their determination previously depended on repeated experiments. Also, in order to derive valid fuzzifier values more efficiently, we introduce interval type-2 possibilistic fuzzy C-means (IT2PFCM), an advanced fuzzy clustering method for classifying fixed patterns. In the efficient IT2PFCM method, a proper fuzzifier value for each data point is obtained from an algorithm combining histogram analysis and the Gaussian curve fitting method. Using the information extracted from these fuzzifier values, two modified fuzzifier values m1 and m2 are determined and then used to calculate the new membership values. Determining these updated values not only improves the clustering accuracy of the measured sensor data but also removes the need for additional procedures such as data labeling. The approach is also efficient for monitoring numerous sensors and for managing and verifying sensor data obtained in real time, for example in smart cities.

Keywords

  • fuzzifier value determining
  • sensor data clustering
  • fuzzy C-means
  • histogram approach
  • interval type-2 PFCM

1. Introduction

In the majority of cases, fuzzy clustering algorithms have been verified to be a better method than hard clustering for discriminating similar structures [1] and datasets in high-dimensional spaces [2], and they are more useful for unlabeled data with outliers [3]. Fuzzy C-means has proved to offer better solutions in machine learning and image processing than hard clustering methods such as Ward's clustering and the k-means algorithm [4, 5, 6, 7, 8, 9]. As a reference point, fuzzy c-means reached 66% accuracy while Gustafson-Kessel scored 70% [10]. Fuzzy c-means is one of the most widely applied and modified techniques in pattern recognition applications [11], even though its sensitivity to the initial prototypes and to the optimization process is counted as a weak point [12, 13, 14].

Classification algorithms are generally subject to various sources of uncertainty that should be appropriately managed. Fuzzy clustering can be used with datasets where the variables have a high level of overlap. Therefore, membership functions are represented as a fuzzy set, which can be type-1, type-2, or intuitionistic.

Data are generated by some underlying distribution or collected from various sources. Euclidean distance leads to clustering outcomes of spherical shapes, which is suitable for most cases, so it is a top choice for many applications and is the measure used in most clustering algorithms to decide new centers [15].

Advertisement

2. Basic notions

  • Degree of membership: The degree of likelihood of one data point belonging to several centers. The membership degrees of each point sum to 1.

  • Data: Data can be categorical, compound, or numeric. Data in matrix form contain subjects and features of various units, for instance value and time.

  • Clusters: A cluster is a group of data points or datasets that share similarities. Distance, or the distance norm, is a mathematical interpretation of likeness. The point of model-based clustering algorithms is the data structure.

  • Fuzzifier value: The fuzzifier value is essential for finding the clustering membership function when the density or volume of a given cluster is dissimilar to those of another cluster. Suppose all of the relative distances to the cluster centers are equally 0.5; this implies that the fuzzifier value m is 1 and the decision boundary is crisp. Under these conditions, no fuzzy area exists.

Figure 1(a) shows the case where a small m value is set for two clusters with different volumes. Because the region with fuzzy membership values extends into the bulky C2 cluster, the C1 cluster is allotted many relatively unnecessary patterns. In Figure 1(b), a large m value is set. This seems to perform well since similar membership values are assigned, but the center of the C1 cluster tends to drift toward the C2 cluster. Figure 1(c) shows the fuzzy area obtained with an interval type-2 m value. By forming the fuzzy area according to the values of m1 and m2 using the characteristics of the interval type-2 membership set, uncertainty can be reduced and a fuzzy area appropriate to the cluster volumes can be formed.

Figure 1.

Fuzzy area between clusters according to m. (a) a small m value is set, (b) a large m value is set, (c) an appropriate fuzzy area using interval type-2.

As presented above, several methods have been suggested for deciding the lowest and highest boundary values of the fuzzifier value extracted from particular data. The following concerns the PFCM membership function for deciding the fuzzifier value's range. The membership function at the k-th data point for cluster i is presented in Eq. (1), where $d_{ik}/d_{ij}$ signifies the ratio of Euclidean distances between the data point and the clusters.

$$u_{ik} = \frac{1}{\sum_{j=1}^{c}\left(d_{ik}/d_{ij}\right)^{2/(m-1)}} \tag{1}$$
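For illustration, Eq. (1) can be evaluated directly from a distance matrix. A minimal NumPy sketch; the function name `fcm_memberships` and the array layout are our assumptions, not from the chapter:

```python
import numpy as np

def fcm_memberships(d, m):
    """Eq. (1): u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)).

    d : (c, n) array of distances d_ik between cluster i and point k.
    m : fuzzifier value, m > 1.
    Returns a (c, n) membership matrix whose columns sum to 1.
    """
    p = 2.0 / (m - 1.0)
    # ratio[i, j, k] = (d_ik / d_jk)^p; summing over j gives the denominator
    ratio = (d[:, None, :] / d[None, :, :]) ** p
    return 1.0 / ratio.sum(axis=1)
```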

The neighboring membership values are computed employing the membership value presented in Eq. (1) in order to decide the fuzzifier value's range. Summarizing with an expression that includes the fuzzifier value yields Eq. (2), which gives the lower and upper boundary values of the fuzzy constant, where C is the number of clusters and m the fuzzifier value.

$$1+\frac{C-1}{C-2\delta}\,\Delta \;\le\; m \;\le\; \frac{2\log d}{\log\dfrac{\delta}{(1-\delta)(c-1)}}+1, \qquad \text{where } \Delta=\frac{d_i-\bar d_i}{d_i} \text{ and } \delta \text{ is a threshold} \tag{2}$$

3. Conventional fuzzy clustering algorithm

3.1 Fuzzy C- means (FCM)

FCM includes the concept of a fuzzifier m used to determine the membership value of data $x_k$ in a specific cluster with a cluster prototype. Specifically, the equations of FCM consist of the cluster center $v_i$ and the membership value of data $x_k$, where k = 1, 2, ..., n and i = 1, 2, ..., c, n being the number of patterns and c the number of clusters. FCM requires knowledge of the initial number of desired clusters. The membership value is given by the relative distance between the pattern $x_k$ and the cluster center $v_i$. However, the main weaknesses of FCM are its noise sensitivity and its constrained memberships. The weighting exponent m is known to affect the clustering performance of the FCM algorithm [16].
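For concreteness, a sketch of the resulting FCM iteration, alternating the membership update of Eq. (1) with the weighted center update. It reuses `fcm_memberships` from the sketch above and is illustrative rather than the chapter's exact implementation:

```python
def fcm(X, c, m=2.0, iters=100, tol=1e-5, seed=0):
    """Plain FCM: X is (n, dim), c clusters, fuzzifier m > 1."""
    rng = np.random.default_rng(seed)
    V = X[rng.choice(len(X), c, replace=False)]      # initial prototypes
    for _ in range(iters):
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)
        d = np.maximum(d, 1e-12)                     # avoid division by zero
        U = fcm_memberships(d, m)                    # Eq. (1)
        W = U ** m
        V_new = (W @ X) / W.sum(axis=1, keepdims=True)
        if np.linalg.norm(V_new - V) < tol:          # converged
            return U, V_new
        V = V_new
    return U, V
```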

3.2 PCM

In order to solve the problems of the FCM method, PCM uses a parameter whose value is estimated from the dataset itself. PCM applies the possibilistic approach, which means that the membership value of a point in a class represents the typicality of the point in the class, i.e., the possibility of data $x_k$ belonging to the class with cluster prototype $v_i$, where k = 1, 2, ..., n and i = 1, 2, ..., c. Noise points are then comparatively less typical, thanks to the use of typicality in the PCM algorithm, and noise sensitivity is significantly reduced [17, 18]. However, the PCM algorithm has the problem that the clustering outcome reacts sensitively to the initial parameter values [19].
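A sketch of the PCM scale and typicality updates, matching the $\eta_i$ formula shown in the comparison listing of Section 3.4 below (array layout and names are assumptions):

```python
import numpy as np

def pcm_scale(d, U, m=2.0):
    """eta_i = sum_k u_ik^m d_ik^2 / sum_k u_ik^m (scale, e.g. from an FCM run)."""
    W = np.asarray(U) ** m
    return (W * np.asarray(d) ** 2).sum(axis=1) / W.sum(axis=1)

def pcm_typicality(d, eta, m=2.0):
    """PCM update: t_ik = 1 / (1 + (d_ik^2 / eta_i)^(1/(m-1)))."""
    d, eta = np.asarray(d), np.asarray(eta)
    return 1.0 / (1.0 + (d**2 / eta[:, None]) ** (1.0 / (m - 1.0)))
```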

3.3 PFCM

The PFCM algorithm is a mixture of the PCM and FCM algorithms [20]. Although the sum-to-one constraint on the typicality values was relaxed, the constraints on the membership values were preserved, so the PFCM algorithm generates both membership and possibility and solves the noise sensitivity problem seen in FCM [21]. PFCM is based on the fuzzifier value m, which determines the membership value, and it also uses constants to define the relative importance of the fuzzy membership and typicality values in the objective function. PFCM utilizes more parameters to determine the optimal solution for clustering, which increases the degrees of freedom and thus yields better-controlled results than the above-mentioned studies. However, when considering fuzzy sets and other parameters in such algorithms, we face the potential fuzziness of these parameters. In this chapter, we describe the fuzziness of the fuzzifier value m and of the possibilistic bandwidth parameter, and generate the FOU of uncertainty for both by considering the fuzzifier interval, i.e., the interval between m1 and m2, and the possibilistic interval. Existing studies measured the optimal range along the upper and lower bounds of fuzzifier values through multiple iterations [22]. This work is ongoing, but the same fuzzifier range cannot be applied to all data [23].
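To make the role of the constants a and b concrete, a sketch that merely evaluates the PFCM objective shown in the comparison listing of Section 3.4 below (illustrative names; the $\gamma_i$ are the per-cluster penalty constants):

```python
import numpy as np

def pfcm_objective(d, U, T, gamma, a=1.0, b=1.0, m=2.0, eta=2.0):
    """J_PFCM = sum_ik (a u_ik^m + b t_ik^eta) d_ik^2
                + sum_i gamma_i sum_k (1 - t_ik)^eta.

    d, U, T : (c, n) arrays of distances, memberships, typicalities.
    gamma   : (c,) per-cluster penalty constants gamma_i.
    """
    d, U, T, gamma = map(np.asarray, (d, U, T, gamma))
    fidelity = ((a * U**m + b * T**eta) * d**2).sum()
    penalty = (gamma[:, None] * (1.0 - T) ** eta).sum()
    return float(fidelity + penalty)
```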

3.4 Type-1 fuzzy set (T1FS)

Type-1 fuzzy logic was first introduced by Zadeh (1965). Fuzzy logic systems based on type-1 fuzzy sets (T1FS) have demonstrated their capabilities in many applications, especially for the control of complex nonlinear systems that are difficult to model analytically [24, 25]. Since a type-1 fuzzy logic system uses crisp and precise type-1 fuzzy sets, T1FS can be used to model user behavior under certain conditions. Type-1 fuzzy sets deal with uncertainty using precise membership functions that users believe capture the uncertainty [26, 27, 28, 29, 30]. Once the type-1 membership function is selected, all uncertainty disappears, because the type-1 membership function is completely precise. The type-2 fuzzy set concept was presented by Zadeh as an extension of the ordinary fuzzy set concept, i.e., the type-1 fuzzy set [31]. All fuzzy sets are characterized by membership functions. In a type-1 fuzzy set, each element is characterized by a two-dimensional membership function, and the membership grade is a crisp number in [0, 1]. A comparison of the objective functions of the conventional fuzzy clustering algorithms is shown below [32].

FCM: $J_{FCM}(V,U;X)=\sum_{i=1}^{c}\sum_{k=1}^{n} u_{ik}^{m}\left\|x_k-v_i\right\|^2,\quad 1<m<\infty$

PCM: $J_{PCM}(V,U;X)=\sum_{i=1}^{c}\sum_{k=1}^{n} u_{ik}^{m}\, d_{ik}^{2}+\sum_{i=1}^{c}\eta_i\sum_{k=1}^{n}\left(1-u_{ik}\right)^{m}$, with scale (typicality) parameter $\eta_i=\dfrac{\sum_{k=1}^{n}u_{ik}^{m}\left\|x_k-v_i\right\|^2}{\sum_{k=1}^{n}u_{ik}^{m}}$

FPCM: $J_{FPCM}(U,T,V)=\sum_{i=1}^{c}\sum_{k=1}^{n}\left(u_{ik}^{m}+t_{ik}^{\eta}\right)\left\|x_k-v_i\right\|^2$

PFCM: $J_{PFCM}(U,T,V)=\sum_{i=1}^{c}\sum_{k=1}^{n}\left(a\,u_{ik}^{m}+b\,t_{ik}^{\eta}\right)\left\|x_k-v_i\right\|^2+\sum_{i=1}^{c}\delta_i\sum_{k=1}^{n}\left(1-t_{ik}\right)^{\eta}$

T1FC: $J_{T1FC}(X,U,C)=\sum_{j=1}^{c}\sum_{i=1}^{n}u_j(x_i)^{m}\,d_{ij}^{2}$


4. Advanced fuzzy clustering algorithm

Fuzzy c-means (FCM) is an unsupervised clustering algorithm in which unlabeled data X = {x1, x2, ..., xN} are grouped according to their fuzzy membership values [33, 34]. Since analyzing and handling uncertainty is a very important issue in data analysis and computer vision, FCM is widely used in these fields. Several IT2 approaches to pattern recognition algorithms have been successfully reported [35, 36, 37, 38, 39, 40, 41]. Type-1 fuzzy sets cannot handle such uncertainties; therefore, type-2 fuzzy sets were defined to represent the uncertainties associated with type-1 fuzzy sets. As shown in Figure 2, the type-reduction process in IT2 FSs requires a relatively large amount of computation, as type-2 fuzzy methods increase the computational complexity due to the numerous combinations of embedded T2 FSs. Methods for reducing the computational complexity have been proposed; moreover, the increased computational complexity of T2 FSs may be an acceptable cost for performance improved beyond the satisfactory results obtained with T1 FSs. In [42], it was suggested that two fuzzifier m values be used and that the centroid type-reduction algorithm for the center update be incorporated, yielding an interval type-2 (IT2) fuzzy approach to FCM clustering. IT2 FCM was suggested to clear up the complications FCM has with clusters of different volumes and numbers of patterns. Moreover, it was suggested that miscellaneous uncertainties are linked with clustering algorithms such as FCM and PCM [43]. The success IT2 FSs have shown over T1 FS algorithms motivates their use here.

Figure 2.

(a) Cluster position uncertainty for T1 FCM, (b) IT2 FCM, (c) QT2 FCM, (d) GT2 FCM algorithms.

4.1 Type-2 fuzzy set (T2 FS)

Due to their potential to model various uncertainties, type-2 fuzzy sets (T2 FSs) have received increased research interest [44]. Type-2 fuzzy sets are characterized by a three-dimensional fuzzy membership function: the membership grade of each element of a type-2 fuzzy set is itself a fuzzy set in [0, 1]. The extra third dimension provides additional degrees of freedom to convey more information about the expressed term. Type-2 fuzzy sets are valuable in situations where it is difficult to determine the exact membership function of a fuzzy set; this helps to incorporate uncertainty [45].

The computational complexity of a type-2 fuzzy set is higher than that of a type-1 fuzzy set. However, the results gained by type-2 fuzzy sets are much better than those gained by type-1 fuzzy sets. Therefore, if type-2 fuzzy sets can significantly improve performance (depending on the application), their increased computational complexity can be an affordable price to pay [46].

4.2 Type-2 FCM (T2-FCM)

In type-2 FCM (T2-FCM), the type-2 membership is generated directly by extending a scalar membership degree to a T1 FS. When the secondary fuzzy set is limited to a triangular membership function, T2-FCM extends the scalar membership $u_{ij}$ to a triangular secondary membership function [47, 48].

4.3 General type-2 FCM

The GT2 FCM algorithm accepts a linguistic description of the fuzzifier value expressed as a T1 fuzzy set with upper and lower values [49]. The linguistic fuzzifier value is denoted as a T1 fuzzy set of m. Figure 3 shows two examples of encoding the linguistic notion of an appropriate fuzzifier value for the GT2 FCM algorithm using three linguistic terms.

Figure 3.

Two possible linguistic representations of the fuzzifier m using T1 fuzzy sets. (a) membership value for a sample x′, (b) vertical slice at x′.

4.4 Interval type 2 fuzzy sets (IT2 FSs)

In order to model the uncertainty associated with a type-1 fuzzy set by an interval type-2 fuzzy set, the primary membership $J_{x'}$ of a sample point x′ can be represented by a membership interval whose secondary grades all equal one [18, 50].

Figure 3(a) represents an instance of an interval type-2 fuzzy set where the gray shaded region indicates the FOU. In the figure, the membership value for a sample x′ is represented by the interval between the upper membership $\overline{\mu}_{\tilde A}(x')$ and the lower membership $\underline{\mu}_{\tilde A}(x')$. Therefore, each x′ has the primary membership interval

$$J_{x'} = \left[\underline{\mu}_{\tilde{A}}(x'),\; \overline{\mu}_{\tilde{A}}(x')\right] \tag{3}$$

Figure 3(b) shows the vertical slice at x′, where the secondary grade for the primary membership of each x′ equals one, in accordance with the property of interval type-2 fuzzy sets. This interval is defined as the FOU. An interval type-2 fuzzy set $\tilde A$ can be expressed as

$$\tilde{A} = \left\{\left((x,u),\,\mu_{\tilde{A}}(x,u)\right) \;\middle|\; \forall x \in A,\ \forall u \in J_x \subseteq [0,1],\ \mu_{\tilde{A}}(x,u)=1\right\} \tag{4}$$

4.5 Interval type-2 FCM (IT2-FCM)

In fuzzy clustering algorithms such as FCM, the fuzzifier value m plays a significant role in determining clustering uncertainty [50]. However, it is generally difficult to determine the value of m properly. IT2-FCM regards the fuzzifier value as an interval [m1, m2] and solves two optimization problems [51].

First, interval type-2 FCM is used to obtain a rough estimate of which data points belong to which cluster. The objective function is minimized with respect to $u_{ij}$ to provide the upper and lower membership values of Eqs. (5) and (6).

$$\overline{u}_j(x_i) = \begin{cases} \dfrac{1}{\sum_{k=1}^{c}\left(d_{ij}/d_{ik}\right)^{2/(m_1-1)}}, & \text{if } \dfrac{1}{\sum_{k=1}^{c}\left(d_{ij}/d_{ik}\right)} < \dfrac{1}{c} \\[2ex] \dfrac{1}{\sum_{k=1}^{c}\left(d_{ij}/d_{ik}\right)^{2/(m_2-1)}}, & \text{otherwise} \end{cases} \tag{5}$$

$$\underline{u}_j(x_i) = \begin{cases} \dfrac{1}{\sum_{k=1}^{c}\left(d_{ij}/d_{ik}\right)^{2/(m_1-1)}}, & \text{if } \dfrac{1}{\sum_{k=1}^{c}\left(d_{ij}/d_{ik}\right)} \ge \dfrac{1}{c} \\[2ex] \dfrac{1}{\sum_{k=1}^{c}\left(d_{ij}/d_{ik}\right)^{2/(m_2-1)}}, & \text{otherwise} \end{cases} \tag{6}$$
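A sketch of Eqs. (5) and (6), selecting between the m1 and m2 memberships entrywise according to the 1/c test. It reuses `fcm_memberships` from the sketch in Section 2 (illustrative):

```python
def it2fcm_bounds(d, m1, m2):
    """Upper/lower memberships of Eqs. (5)-(6) for the interval [m1, m2].

    d : (c, n) distance matrix; returns (u_upper, u_lower), each (c, n).
    """
    c = d.shape[0]
    u1 = fcm_memberships(d, m1)            # membership with fuzzifier m1
    u2 = fcm_memberships(d, m2)            # membership with fuzzifier m2
    # condition of Eqs. (5)-(6): 1 / sum_k (d_ij / d_ik) compared with 1/c
    cond = 1.0 / (d[:, None, :] / d[None, :, :]).sum(axis=1) < 1.0 / c
    u_upper = np.where(cond, u1, u2)       # Eq. (5)
    u_lower = np.where(cond, u2, u1)       # Eq. (6)
    # guard: ensure upper >= lower elementwise
    return np.maximum(u_upper, u_lower), np.minimum(u_upper, u_lower)
```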

After this, the cluster prototypes are calculated, followed by type reduction and classification. Qiu et al. (2014) proposed this complete interval type-2 FCM method; with the labeled clusters, the histogram of each class is acquired in each individual dimension. This histogram is smoothed by a moving-window mean (using a triangular window in this case). Curve fitting of the smoothed histogram yields the membership function. Histogram values greater than the membership value are assigned to the upper-membership histogram, and values less than the membership value are kept as the lower-membership histogram. Curve fitting is carried out separately on the upper and lower histograms to supply the upper and lower membership values [52]. These membership values are used to estimate the fuzzifier values m1 and m2.
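The histogram smoothing and curve fitting just described could look as follows. This sketch assumes SciPy's `curve_fit`, a Gaussian model, and a triangular (Bartlett) moving window; the bin count and window length are arbitrary choices, not prescribed by the chapter:

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, a, mu, sigma):
    return a * np.exp(-((x - mu) ** 2) / (2.0 * sigma**2))

def smooth_and_fit(values, bins=50, window=5):
    """Histogram the memberships, smooth with a triangular moving window,
    then fit a Gaussian curve to obtain a membership function."""
    hist, edges = np.histogram(values, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    tri = np.bartlett(window)
    tri /= tri.sum()                                  # triangular window
    smooth = np.convolve(hist, tri, mode="same")
    p0 = (smooth.max(), centers[np.argmax(smooth)], np.std(values) + 1e-6)
    params, _ = curve_fit(gaussian, centers, smooth, p0=p0, maxfev=5000)
    return centers, smooth, params                    # (a, mu, sigma)
```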

Fixed-point iteration is a method of expressing a transcendental equation f(x) = 0 in the form x = g(x) and then solving this expression iteratively for x via the relationship

$$x_{i+1} = g(x_i),\quad i = 0, 1, 2, \ldots \tag{7}$$

where $x_0$ is some initial guess. Rewriting Eqs. (5) and (6) in the form of Eq. (7) and dropping the upper and lower bars,

$$u_j = \frac{1}{\sum_{k=1}^{c}\left(d_{ij}/d_{ik}\right)^{2/(m-1)}} \tag{8}$$
$$\frac{1}{u_j} = \sum_{k=1}^{c}\left(d_{ij}/d_{ik}\right)^{2/(m-1)}$$

Taking the logarithm on both sides, Eq. (8) can be rewritten as

$$\log\frac{1}{u_j} = \log\sum_{k=1}^{c}\left(\frac{d_{ij}}{d_{ik}}\right)^{2/(m-1)} \tag{9}$$
using the identity $\log(a+c) = \log a + \log\left(1+\dfrac{c}{a}\right)$.

Extending this logarithmic identity to the sum of N elements,

$$\log\left(a_0 + \sum_{k=1}^{N} a_k\right) = \log a_0 + \log\left(1 + \sum_{k=1}^{N}\frac{a_k}{a_0}\right) \tag{10}$$
$$\log\frac{1}{u_j} = \frac{2}{m-1}\log\frac{d_{ij}}{d_{1j}} + \log\left(1 + \sum_{k=2}^{c}\left(\frac{d_{ij}}{d_{ik}}\right)^{2/(m_i^{old}-1)}\right) \tag{11}$$

Rearranging Eq. (11) and expressing it in terms of m gives Eq. (12).

$$\gamma = \frac{\log\frac{1}{u_j} - \log\left(1 + \sum_{k=2}^{c}\left(\frac{d_{ij}}{d_{ik}}\right)^{2/(m_i^{old}-1)}\right)}{\log\frac{d_{ij}}{d_{1j}}}, \qquad m_j^{new} = 1 + \frac{2}{\gamma} \tag{12}$$

Eq. (12) thus gives $m_{1j}^{new}$ and $m_{2j}^{new}$, where $m_{1j}^{new} \ge m_{2j}^{new}$, and is used to calculate the fuzzifier values of each data point. In some cases, the fuzzifier value of particular data shows relatively large variation. Here, upper ($m_{upper}$) and lower ($m_{lower}$) fuzzifier bounds are necessary, obtained using Eq. (2). If a certain data point has a fuzzifier value below the lower bound, its fuzzifier value is set to $m_{lower}$; if it exceeds the upper bound, it is set to $m_{upper}$. In the end, the mean of these fuzzifiers is taken to get the final fuzzifier values m1 and m2.
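A sketch of one fixed-point step, Eqs. (8)-(12), together with the clamping to $[m_{lower}, m_{upper}]$ described above. Which cluster's distance plays $d_{ij}$ is our illustrative choice (the farthest, so the logarithm in the denominator stays nonzero):

```python
import numpy as np

def fuzzifier_update(u, d, m_old, m_lower, m_upper):
    """One fixed-point step, Eqs. (8)-(12): solve for m from membership u.

    u : membership value of the data point in a chosen cluster.
    d : distances of this point to all c clusters; after sorting, d[0]
        plays d_1j and d[-1] the chosen cluster's distance (illustrative).
    """
    d = np.sort(np.asarray(d, dtype=float))
    di = d[-1]                                   # distance to chosen cluster
    p = 2.0 / (m_old - 1.0)
    tail = np.log1p(((di / d[1:]) ** p).sum())   # log(1 + sum_{k=2}^c ...)
    gamma = (np.log(1.0 / u) - tail) / np.log(di / d[0])   # Eq. (12)
    m_new = 1.0 + 2.0 / gamma
    return float(np.clip(m_new, m_lower, m_upper))  # clamp to Eq. (2) bounds
```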

4.6 Multiple kernels PFCM algorithm

Typically, the kernel method uses a spatial transformation function to convert input data from the input feature space to a kernel feature space [53]. The transformation makes it easier to distinguish overlapping data with nonlinear boundary surfaces in the input feature space. If the data in the input space are $x_i,\ i=1,\ldots,N$, the data converted to the kernel feature space through the function $\Phi$ are represented by $\Phi(x_j),\ j=1,\ldots,N$. As with general PFCM, the goal of kernel PFCM is to minimize the following objective function.

$$J^{\Phi} = \sum_{k=1}^{n}\sum_{i=1}^{c}\left(a\,u_{ik}^{m} + b\,t_{ik}^{\eta}\right) d_{ik}^{2} + \sum_{i=1}^{c}\gamma_i\sum_{k=1}^{n}\left(1 - t_{ik}\right)^{\eta} \tag{13}$$

For kernel K, the distance $d_{ij}$ between the pattern $x_j$ and the cluster prototype $v_j$ in the kernel feature space is expressed by the kernel function as Eq. (14).

$$d_{ij}^{2} = \left\|\Phi(x_j)-\Phi(v_j)\right\|^2 = \Phi(x_j)\!\cdot\!\Phi(x_j) + \Phi(v_j)\!\cdot\!\Phi(v_j) - 2\,\Phi(x_j)\!\cdot\!\Phi(v_j) = K(x_j,x_j) + K(v_j,v_j) - 2K(x_j,v_j) \tag{14}$$

Commonly, the new Gaussian multi-kernel $\tilde k$ assumes a multi-kernel built from S Gaussian kernels, with the following formula [54].

$$\tilde{k}(x_j, v_j) = \frac{\sum_{l=1}^{S}\frac{w_{il}}{\sigma_l}\exp\left(-\frac{\|x_j - v_j\|^2}{2\sigma_l^2}\right)}{\sum_{t=1}^{S}\frac{w_{it}}{\sigma_t}} \tag{15}$$
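A sketch of the normalized Gaussian multi-kernel of Eq. (15) for one pattern-prototype pair (names are illustrative):

```python
import numpy as np

def gaussian_multikernel(x, v, w, sigma):
    """Eq. (15): normalized weighted sum of S Gaussian kernels.

    x, v  : pattern and prototype vectors.
    w     : (S,) resolution-specific weights w_il for one cluster.
    sigma : (S,) kernel bandwidths sigma_l.
    """
    w, sigma = np.asarray(w, float), np.asarray(sigma, float)
    sq = float(np.sum((np.asarray(x, float) - np.asarray(v, float)) ** 2))
    num = np.sum((w / sigma) * np.exp(-sq / (2.0 * sigma**2)))
    return float(num / np.sum(w / sigma))
```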

Following the approach of [55], as in FCM-MK, a normalized kernel is defined to account for the weights via the cluster prototypes, resolutions, and membership values. Using this optimization, the following PFCM objective function should be minimized; by minimizing it, the cluster prototype $v_i$, resolution-specific weight $w_{il}$, and membership value $u_{ij}$ are defined.

$$J_{m,\eta}(U,T,V;X) = 2\sum_{k=1}^{n}\sum_{i=1}^{c}\left(a\,u_{ik}^{m} + b\,t_{ik}^{\eta}\right)\left(1 - \frac{\sum_{l=1}^{S}\frac{w_{il}}{\sigma_l}\exp\left(-\frac{\|x_k - v_i\|^2}{2\sigma_l^2}\right)}{\sum_{t=1}^{S}\frac{w_{it}}{\sigma_t}}\right) + \sum_{i=1}^{c}\gamma_i\sum_{k=1}^{n}\left(1-t_{ik}\right)^{\eta} \tag{16}$$

Here, ρ is the learning-rate parameter of a gradient descent step. Finally, using type reduction and hard partitioning, clustering is performed as described for the interval type-2 PFCM [56].

4.7 Interval type-2 fuzzy c-regression clustering

Let the regression function be represented by Eq. (17)

$$y_i = f^{z}(x_i;\alpha_j) = a_1^{z}x_{1i} + a_2^{z}x_{2i} + \cdots + a_M^{z}x_{Mi} + b_0^{z} \tag{17}$$

where $x_i = [x_{1i}, x_{2i}, \ldots, x_{Mi}]$ represents a data point, i = 1, ..., n indexes the data, j = 1, ..., c indexes the clusters (or rules), q = 1, ..., M indexes the variables in each regression, and z = 1, ..., r indexes the regression functions. The regression coefficients are denoted by $\alpha_j$. We use the weighted least squares (WLS) method for calculating the regression coefficients $\alpha_j$; in this way, the membership grades of the partition matrix P serve as weights. In Eq. (18), $x_i$ is a data point vector of inputs and y is a data point vector of outputs.

$$x_i = \left[x_{1,i}, x_{2,i}, \ldots, x_{M,i}\right]^{T},\qquad y = \left[y_1, y_2, \ldots, y_M\right]^{T},\qquad P_j = \operatorname{diag}\bigl(u_j(x_1), u_j(x_2), \ldots, u_j(x_n)\bigr) \tag{18}$$
$$\alpha_j = \left(X^{T} P_j X\right)^{-1} X^{T} P_j\, y$$
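The WLS solve of Eq. (18) is standard; a sketch in which the j-th cluster's membership grades form the diagonal weight matrix (illustrative):

```python
import numpy as np

def wls_coefficients(X, y, u_j):
    """Eq. (18): alpha_j = (X^T P_j X)^{-1} X^T P_j y,
    with P_j = diag(u_j(x_1), ..., u_j(x_n)).

    X   : (n, M+1) design matrix (append a column of ones for b0).
    y   : (n,) outputs.
    u_j : (n,) membership grades used as weights.
    """
    P = np.diag(np.asarray(u_j, float))
    A = X.T @ P @ X
    b = X.T @ P @ y
    return np.linalg.solve(A, b)   # more stable than an explicit inverse
```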

The partition matrix P is acquired through a Gaussian mixture distribution, which is the first stage in computing the regression coefficients. We consider two fuzzifiers (weighting exponents) m1 and m2 to cast the problem as interval type-2. However, there is a difference in that the reference model is FCM, whereas our model is FCRM. These two fuzzifiers divide the objective function into two separate functions; the aim is to minimize the total error. Eq. (19) shows these two objective functions. It should be mentioned that the following proof is an extended and modified version of the type-1 proof presented in [57].

$$J_{m_1}(U,\upsilon) = \sum_{i=1}^{n}\sum_{j=1}^{C} u_j(x_i)^{m_1} E_{ji}(\alpha_j),\qquad J_{m_2}(U,\upsilon) = \sum_{i=1}^{n}\sum_{j=1}^{C} u_j(x_i)^{m_2} E_{ji}(\alpha_j) \tag{19}$$

As in type-1 FCRM, $E_{ji}$ is the total error, which indicates the distance between the actual output and the estimated regression equation, as presented in Eq. (20).

$$E_{ji}(\alpha_j) = \left\|y_i - f_j(x_i;\alpha_j)\right\|^2 \tag{20}$$

Eq. (21) represents the Lagrangians of the objective functions of the IT2 FCRM model. We extend the type-1 NFCRM algorithm to interval type-2 NFCRM.

$$L_1(\lambda_1, u_j) = \sum_{i=1}^{n}\sum_{j=1}^{C} u_j(x_i)^{m_1} E_{ji}(\alpha_j) - \lambda_1\left(\sum_{j=1}^{c} u_j - 1\right),\qquad L_2(\lambda_2, u_j) = \sum_{i=1}^{n}\sum_{j=1}^{C} u_j(x_i)^{m_2} E_{ji}(\alpha_j) - \lambda_2\left(\sum_{j=1}^{c} u_j - 1\right) \tag{21}$$

The partial derivatives of Eq. (21) with respect to $u_j$ are set to 0 in Eqs. (22) and (23) to minimize the objective functions.

$$\frac{\partial L_1}{\partial u_1(x_i)} = m_1 u_1(x_i)^{m_1-1} E_{1i}(\alpha_1) - \lambda_1 = 0,\;\ldots,\; \frac{\partial L_1}{\partial u_C(x_i)} = m_1 u_C(x_i)^{m_1-1} E_{Ci}(\alpha_C) - \lambda_1 = 0 \tag{22}$$
$$\frac{\partial L_2}{\partial u_1(x_i)} = m_2 u_1(x_i)^{m_2-1} E_{1i}(\alpha_1) - \lambda_2 = 0,\;\ldots,\; \frac{\partial L_2}{\partial u_C(x_i)} = m_2 u_C(x_i)^{m_2-1} E_{Ci}(\alpha_C) - \lambda_2 = 0 \tag{23}$$

Next, the partial derivatives with respect to $\lambda_1$ and $\lambda_2$ are taken.

$$\frac{\partial L_1}{\partial \lambda_1} = \sum_{j=1}^{c} u_j(x_i) - 1 = 0 \tag{24}$$

To adapt KPCM to IT2 KPCM, three steps are included: initialization with two different fuzzifiers, calculation of the upper and lower membership (typicality) values, and type reduction with defuzzification to update the prototype locations for the data patterns. In the approach we propose, using IT2 FSs, the point lies in developing a prototype update process that can solve the cluster-matching problem caused by KPCM. Cluster matching usually arises for pattern sets containing clusters that are relatively close to each other. By definition, a type-reduced fuzzy set can be obtained as a combination of the centroid intervals estimated from the embedded fuzzy sets; this is the standard method for obtaining the type-reduced set from an IT2 FS. However, this approach is avoided due to its huge computational requirements, which involve enumerating a large number of embedded fuzzy sets. Therefore, we consider the KM (Karnik-Mendel) algorithm as an alternative type-reduction method. Since KM is an iterative algorithm that estimates both ends of an interval, the left (right) interval endpoint vL (vR) can be found without using all of the embedded fuzzy sets.
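A sketch of the Karnik-Mendel iteration used here for type reduction: it locates the switch point for each interval endpoint without enumerating embedded sets (a standard formulation; the variable names are ours):

```python
import numpy as np

def karnik_mendel(x, u_lo, u_hi, right=True, iters=100):
    """Return one endpoint (v_R if right else v_L) of the centroid interval.

    x          : (N,) pattern values, sorted ascending internally.
    u_lo, u_hi : (N,) lower/upper interval memberships for each pattern.
    """
    order = np.argsort(x)
    x, u_lo, u_hi = x[order], u_lo[order], u_hi[order]
    w = 0.5 * (u_lo + u_hi)                       # initial weights
    y = (w * x).sum() / w.sum()
    for _ in range(iters):
        k = np.searchsorted(x, y)                 # switch point
        if right:   # v_R: lower memberships left of k, upper to the right
            w = np.concatenate([u_lo[:k], u_hi[k:]])
        else:       # v_L: upper memberships left of k, lower to the right
            w = np.concatenate([u_hi[:k], u_lo[k:]])
        y_new = (w * x).sum() / w.sum()
        if np.isclose(y_new, y):
            return float(y_new)
        y = y_new
    return float(y)
```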

For the kernel PCM (KPCM) algorithm, whose FOU is illustrated in Figure 4:

Figure 4.

FOU representation for our proposed IT2 KPCM approach with m1 = 2, m2 = 5 and variance = 0.5; (a) FOU of cluster 1 (b) FOU of cluster 2 [58].

The kernel distance,

$$\left\|\Phi(x_k) - v_i\right\|^2 \tag{25}$$

can be derived using the kernel trick as

$$\left\|\Phi(x_k)-v_i\right\|^2 = K(x_k,x_k) - \frac{2\sum_{j=1}^{N} u_{ij}^{m} K(x_k,x_j)}{\sum_{j=1}^{N} u_{ij}^{m}} + \frac{\sum_{j=1}^{N}\sum_{l=1}^{N} u_{ij}^{m} u_{il}^{m} K(x_j,x_l)}{\left(\sum_{j=1}^{N} u_{ij}^{m}\right)^2} \tag{26}$$
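A sketch of Eq. (26): the feature-space distances computed entirely through the Gram matrix K, so the mapping Φ never has to be evaluated explicitly (illustrative layout):

```python
import numpy as np

def kernel_distances(K, U, m=2.0):
    """Eq. (26): ||Phi(x_k) - v_i||^2 from a Gram matrix K.

    K : (N, N) kernel matrix, K[j, l] = K(x_j, x_l).
    U : (C, N) membership matrix; returns a (C, N) squared-distance matrix.
    """
    W = U ** m                                    # (C, N)
    s = W.sum(axis=1, keepdims=True)              # (C, 1)
    term1 = np.diag(K)[None, :]                   # K(x_k, x_k)
    term2 = 2.0 * (W @ K) / s                     # 2 sum_j w_ij K(x_k, x_j) / s_i
    term3 = ((W @ K) * W).sum(axis=1, keepdims=True) / s**2  # w_i^T K w_i / s_i^2
    return term1 - term2 + term3
```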

The inverse mapping of the prototypes is also needed to approximate the prototype expressions $v_i$ in the feature space. The objective function can be written as

$$V(\hat v_i, v_i) = \sum_{i=1}^{C}\left\|\Phi(\hat v_i) - v_i\right\|^2 = \sum_{i=1}^{C}\left(\Phi(\hat v_i)^{T}\Phi(\hat v_i) - 2\,\Phi(\hat v_i)^{T} v_i + v_i^{T} v_i\right) \tag{27}$$

The final location for $\hat v_i$ in the KPCM algorithm then becomes

$$\hat v_i = \frac{\sum_{k=1}^{N} u_{ik}^{m}\, K(x_k,\hat v_i)\, x_k}{\sum_{k=1}^{N} u_{ik}^{m}\, K(x_k,\hat v_i)} \tag{28}$$

The left (right) interval of the centroids can be found by employing the KM algorithm on the ascending order of a pattern set and its associated interval memberships. The result of the KM algorithm can be expressed as,

$$v_i = 1\big/\left[v_L,\, v_R\right] \tag{29}$$

After the procedure calculates the left endpoint $v_L$ and right endpoint $v_R$ of the interval set, defuzzification is used to calculate the crisp centers, defined as the midpoint between $v_L$ and $v_R$. We can now compute the defuzzified output, a crisp value of the prototypes, using the expression

$$v_i = \frac{\sum_{v\in J_{Y_i}} u(v)\,v}{\sum_{v\in J_{Y_i}} u(v)} = \frac{v_L + v_R}{2} \tag{30}$$

Hard partitioning is used to classify test patterns using the prototypes resulting from the procedure above. Euclidean distance is used to hard-partition the patterns with respect to the prototypes in feature space: each pattern is assigned to the cluster prototype at minimum Euclidean distance. Experimental results presented in the following sections demonstrate the validity of the proposed IT2 approach to KPCM clustering.

4.8 Interval type-2 possibilistic fuzzy C-means (IT2PFCM)

In order to resolve the uncertainty in the fuzzifier value m of the general PFCM algorithm, the multiple-kernels PFCM algorithm should be extended to the interval type-2 fuzzy set. Given N data, a set W of resolution-specific weights, a partition matrix U, C clusters, a set V of cluster prototypes, and S kernels, the cluster prototype can be obtained by minimizing the Gaussian kernel objective function as follows.

$$w_{il}^{new} = w_{il}^{old} - \rho\,\frac{\partial J}{\partial w_{il}} \tag{31}$$
$$d_{ij}^{2} = 2 - 2\,\frac{\sum_{l=1}^{S}\frac{w_{il}}{\sigma_l}\exp\left(-\frac{\|x_j - v_i\|^2}{2\sigma_l^2}\right)}{\sum_{t=1}^{S}\frac{w_{it}}{\sigma_t}} \tag{32}$$

Where,

$$v_i = \frac{\sum_{j=1}^{N} u_{ij}^{m}\,\tilde k(x_j, v_i)\, x_j}{\sum_{j=1}^{N} u_{ij}^{m}\,\tilde k(x_j, v_i)} \tag{33}$$

The cluster prototype is calculated to optimize the objective function for the center vi of each cluster [23].

Where,

$$\bar K_i(x_j, v_i) = \frac{\sum_{l=1}^{S}\frac{w_{il}}{\sigma_l^{3}}\exp\left(-\frac{\|x_j - v_i\|^2}{2\sigma_l^2}\right)}{\sum_{t=1}^{S}\frac{w_{it}}{\sigma_t}} \tag{34}$$

The optimized membership value, i.e., the smallest and largest membership value for each pattern under the interval type-2 fuzzy set, is used for calculating the crisp value $v_i$. In order to compute $v_R$ and $v_L$, determination of the upper or lower bound of the fuzzifier is essential. The corresponding objective function is given in Eq. (35) [59].

$$J(U,V,W) = 2\sum_{i=1}^{C}\sum_{j=1}^{N} u_{ij}^{m}\, d_{ij}^{2} \tag{35}$$

Using the final $v_R$ and $v_L$, the crisp center value is obtained by defuzzification as follows.

$$\text{For } v_R:\ \text{if } v_i < k \text{ then } u_{ij} = \underline{u}_{ij},\ \text{else } u_{ij} = \overline{u}_{ij} \tag{36}$$

Using the cluster prototype $v_i$ obtained through the optimization function and the membership value $u_{ij}$, the resolution-specific weight value $w_{il}$ is re-obtained as follows.

$$\frac{\partial J}{\partial w_{il}} = 2\sum_{j=1}^{N} u_{ij}^{m}\,\frac{K(x_j, v_i) - \bar K_i(x_j, v_i)}{\sum_{t=1}^{S} w_{it}/\sigma_t} \tag{37}$$

Where

$$v_i^{R} = \frac{\sum_{j=1}^{N} u_{ij}^{m}\,\bar K_i(x_j, v_i)\, x_j}{\sum_{j=1}^{N} u_{ij}^{m}\,\bar K_i(x_j, v_i)} \tag{38}$$

To define the interval type-2 fuzzy set and calculate the membership uncertainty, the input data, as the primary fuzzy set, need to be assigned into the interval type-2 fuzzy set. Eventually, the upper and lower membership functions are created from the primary membership function.

After calculating the upper and lower memberships for each cluster, we need to update the new center values. The membership is obtained from the type-2 fuzzy set; however, the center value is a crisp value, so it cannot be calculated by the above method directly. Therefore, in order to compute the center value, type reduction to a type-1 fuzzy set is performed, and defuzzification is then applied to turn the type-1 value into a crisp value.


5. Heuristic method: histogram analysis

The goal of the heuristic method is to extract information from the data and then adaptively calculate the fuzzifier value. In this approach, a heuristic type-1 membership function appropriate for the given dataset is used, and the parameters defining the upper and lower memberships are decided according to the following rules. First, given the membership values, the IT2 PFCM algorithm calculates roughly which cluster each data point belongs to and then obtains a histogram based on the classified clusters. The histogram from IT2 PFCM becomes gentler and smoother through the membership function obtained by curve fitting of the same histogram. Curve fitting is applied separately to the upper and lower histograms to obtain the upper and lower membership values. In order to arrive at the IT2 FS, determination of the FOU is necessary, which is in general the set of primary membership values of the T2 FS. Given that, histogram values greater than the membership value are allocated to the highest-membership histogram, while the opposite case forms the lower one. Figure 5 shows the histograms and FOU determined by classification and per-dimension calculation. To find X satisfying f(X) = 0, we express it as X = g(X) using fixed-point iteration, where

Figure 5.

FOU obtained for individual class and dimension; updated fuzzifier values m1 and m2 are obtained for (a) class 1, dimension 1, and (b) class 2, dimension 1.

$$X_{i+1} = g(X_i),\quad i = 0, 1, \ldots, N \tag{39}$$

Through Eqs. (7) and (8), the membership function $u_i$ can be written in the form of Eq. (40) as follows.

$$u_i = \frac{1}{\sum_{j=1}^{c}\left(d_{ik}/d_{ij}\right)^{2/(m-1)}} \tag{40}$$

Here the fuzzifier value m determines, as the value of the fuzzy parameter, the degree of final clustering fuzziness. The values of m1 and m2 are then fed back into the algorithm to calculate updated clusters, and this routine is repeated iteratively. The detailed algorithm is as follows (a compact sketch follows the list):

  1. Set the initial fuzzifier value of m1 and m2.

  2. Apply m1 and m2 to interval type-2 FCM and obtain the membership of data.

  3. Generate a histogram of each cluster from the membership.

  4. Curve fit the histogram to get primary memberships.

  5. Create histogram of upper and lower membership.

  6. Use curve fitting over upper and lower histograms to calculate upper and lower memberships.

  7. Normalize the memberships according to upper membership.

  8. Fuzzifier m1i and m2i are obtained using Eq. (43).

  9. Average m1i and m2i and update m1 and m2 from the average.

  10. The algorithm is iteratively performed using updated m1 and m2.
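A compact sketch of this loop, wiring together the helpers sketched in earlier sections (`fcm`, `it2fcm_bounds`, `fuzzifier_update`). The plain FCM call stands in for the IT2 clustering step, and the histogram refinement of steps 3-7 is condensed, so this is an approximation of the procedure, not the chapter's implementation:

```python
import numpy as np

def heuristic_fuzzifier_loop(X, c, m1=3.0, m2=1.5, outer_iters=10,
                             m_lower=1.1, m_upper=10.0):
    """Steps 1-10 above, condensed: alternate clustering with
    re-estimation of the fuzzifier interval (kept so that m1 >= m2)."""
    for _ in range(outer_iters):
        V = fcm(X, c, m=0.5 * (m1 + m2))[1]            # steps 1-2 (stand-in)
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)
        d = np.maximum(d, 1e-12)
        u_hi, u_lo = it2fcm_bounds(d, m1, m2)          # steps 3-7 (condensed)
        m1_new, m2_new = [], []
        for k in range(d.shape[1]):                    # step 8, per data point
            m1_new.append(fuzzifier_update(u_hi[:, k].max(), d[:, k],
                                           m1, m_lower, m_upper))
            m2_new.append(fuzzifier_update(u_lo[:, k].max(), d[:, k],
                                           m2, m_lower, m_upper))
        m1 = float(np.mean(m1_new))                    # steps 9-10: average
        m2 = float(np.mean(m2_new))
        m1, m2 = max(m1, m2), min(m1, m2)              # keep m1 >= m2
    return m1, m2
```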

The Upper Membership Function (UMF) histogram and the Lower Membership Function (LMF) histogram are drawn in Figure 5. A new membership function is obtained from the Gaussian Curve Fitting (GCF) method.

Simply taking the logarithm of both sides of Eq. (40), Eq. (41) can be expressed as follows:

$$\log\frac{1}{u_1} = \frac{2}{m-1}\log\frac{d_{ki}}{d_{1i}} + \log\left(1 + \sum_{j=2}^{c}\left(\frac{d_{ki}}{d_{ji}}\right)^{2/(m^{old}-1)}\right) \tag{41}$$

Rearranging Eq. (41) and solving it in terms of m gives Eqs. (42) and (43).

$$\gamma = \frac{\log\frac{1}{u_j} - \log\left(1 + \sum_{k=2}^{C}\left(\frac{d_{ij}}{d_{ik}}\right)^{2/(m^{old}-1)}\right)}{\log\frac{d_{ij}}{d_{1j}}} \tag{42}$$
$$m_j^{new} = 1 + \frac{2}{\gamma} \tag{43}$$

As in the above process, the membership value $u_i \in \{u_i(x_k)\}$ and $m_j^{new}$ are used as a function to obtain $u_i$. When Eq. (43) is applied to each clustered data point and updated, the $m_{1i}^{new}$ and $m_{2i}^{new}$ values are easily calculated; averaging the fuzzifier values by Eq. (44), the new fuzzifier values m1 and m2 are finally calculated as follows.

$$m_1 = \frac{1}{N}\sum_{i=1}^{N} m_{1i},\qquad m_2 = \frac{1}{N}\sum_{i=1}^{N} m_{2i} \tag{44}$$

6. Comparing performances algorithms

Algorithms can be compared, as in previous experiments, using the following criteria:

Root Mean Squared Error (RMSE): The evaluation metric used by all the clustering algorithms is RMSE. RMSE is calculated as the root of the average of all squared errors between the original data (X) and the corresponding predicted values (X̄).

$$RMSE = \sqrt{\frac{\sum_{k=1}^{n}\sum_{i=1}^{c}\left(x_{ik} - \bar x_{ik}\right)^2}{n}} \tag{45}$$

where n is the total number of patterns in a given data set and c is the number of clusters; $x_{ik}$ and $\bar x_{ik}$ are the actual and predicted rating values, respectively.

Accuracy is one metric for evaluating classification models. Informally, accuracy is the fraction of predictions the model got right. Formally, accuracy has the following definition:

$$Accuracy = \frac{\text{number of correct samples}}{\text{total number of samples}} \times 100 \tag{46}$$
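Both metrics are straightforward to compute; a sketch of Eqs. (45) and (46), where the array shapes are our assumption:

```python
import numpy as np

def rmse(x, x_pred, n=None):
    """Eq. (45): sqrt( sum over all entries of (x - x_pred)^2 / n ).

    x, x_pred : arrays of actual and predicted values, first axis = patterns.
    n         : number of patterns; defaults to the first dimension of x.
    """
    x, x_pred = np.asarray(x, float), np.asarray(x_pred, float)
    n = x.shape[0] if n is None else n
    return float(np.sqrt(((x - x_pred) ** 2).sum() / n))

def accuracy(y_true, y_pred):
    """Eq. (46): percentage of correctly classified samples."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return 100.0 * float((y_true == y_pred).mean())
```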

Acknowledgments

This work was supported by the Korea Evaluation Institute of Industrial Technology (KEIT, Next Generation Artificial Intelligence Semiconductor R&D Program) grant funded by the Korea government (Ministry of Trade, Industry & Energy, MOTIE) (Project No. 20010098, Development of Mixed Signal SoC with complex sensor for Smart Home Appliances).

References

  1. M.J. Bayley, V.J. Gillet, P. Willett, J. Bradshaw and D.V.S. Green, "Computational Analysis of Molecular Diversity for Drug Discovery", Proceedings of the 3rd Annual Conference on Research in Computational Molecular Biology, pp. 321–330, 1999.
  2. J.M. Barnard and G.M. Downs, "Clustering of Chemical Structures on the Basis of Two-Dimensional Similarity Measures", Journal of Chemical Information and Computer Science, vol. 32, pp. 644–649, 1992.
  3. M. Feher and J.M. Schmidt, "Fuzzy Clustering as a Means of Selecting Representative Conformers and Molecular Alignment", Journal of Chemical Information and Computer Science, vol. 43, pp. 810–818, 2003.
  4. R. Guthke, W. Schmidt-Heck, D. Hahn and M. Pfaff, "Gene Expression Data Mining for Functional Genomics using Fuzzy Technology", in Advances in Computational Intelligence and Learning Methods and Applications, Kluwer, pp. 475–487, 2002.
  5. S.L. Rodgers, J.D. Holliday and P. Willett, "Clustering Files of Chemical Structures Using the Fuzzy k-Means Clustering Method", Journal of Chemical Information and Computer Science, vol. 44, pp. 894–902, 2004.
  6. R. Rastogi et al., "GA-Based Clustering of Mixed Data Type of Attributes (Numeric, Categorical, Ordinal, Binary and Ratio-Scaled)", BVICAM's International Journal of Information Technology, vol. 7, no. 2, 2015.
  7. M. Ramze Rezaee, B.P.F. Lelieveldt and J.H.C. Reiber, "A New Cluster Validity Index for the Fuzzy C-mean", Pattern Recognition Letters, vol. 19, no. 3–4, pp. 237–246, 1998.
  8. J.C. Dunn, "A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters", J. Cybern., vol. 3, no. 3, pp. 32–57, 1973.
  9. L.J. Hubert and P. Arabie, "Comparing Partitions", J. Classification, vol. 2, pp. 193–218, 1985.
  10. J.C. Bezdek, "Mathematical Models for Systematics and Taxonomy", Proceedings of the 8th International Conference on Numerical Taxonomy, 1975.
  11. V. Schwämmle and O.N. Jensen, "A Simple and Fast Method to Determine the Parameters for Fuzzy C-means Cluster Validation", arXiv preprint arXiv:1004.1307, 2010.
  12. J.C. Bezdek, "Cluster Validity with Fuzzy Sets", J. Cybernet., pp. 58–73, 1974.
  13. L. Zadeh, "Fuzzy Sets", Inform. Control, vol. 8, pp. 338–353, 1965.
  14. U. Kaymak and M. Setnes, "Extended Fuzzy Clustering Algorithms", ERIM Report Series Research in Management, pp. 1–23, 2000.
  15. J. Dunn, "A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters", Journal of Cybernetics, vol. 3, no. 3, pp. 32–57, 1973.
  16. F. Şahinli, "Fuzzy Set Theory Approach to Cluster Analysis" (in Turkish), Master's Thesis, Gazi University Institute of Science and Technology, Ankara, 119 pp., 1999.
  17. J. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms. New York: Plenum, 1981.
  18. J. Valente de Oliveira and W. Pedrycz (Eds.), Advances in Fuzzy Clustering and its Applications. Hoboken, NJ: Wiley, 2007.
  19. F. Ch.-H. Rhee, "Uncertain Fuzzy Clustering: Insights and Recommendations", IEEE Comput. Intell. Mag., vol. 2, no. 1, pp. 44–56, Feb. 2007.
  20. N.R. Pal, K. Pal, J.M. Keller and J.C. Bezdek, "A Possibilistic Fuzzy C-means Clustering Algorithm", IEEE Transactions on Fuzzy Systems, vol. 13, pp. 517–530, 2005.
  21. R. Hathaway, J.C. Bezdek and W. Pedrycz, "A Parametric Model for Fusing Heterogeneous Fuzzy Data", IEEE Trans. Fuzzy Syst., vol. 4, no. 3, pp. 270–281, Aug. 1996.
  22. W. Pedrycz, J.C. Bezdek, R.J. Hathaway and G.W. Rogers, "Two Nonparametric Models for Fusing Heterogeneous Fuzzy Data", IEEE Trans. Fuzzy Syst., vol. 6, no. 3, pp. 411–425, Aug. 1998.
  23. W. Pedrycz, "Shadowed Sets: Representing and Processing Fuzzy Sets", IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 28, no. 1, pp. 103–109, Feb. 1998.
  24. L.A. Zadeh, "Outline of a New Approach to the Analysis of Complex Systems and Decision Processes", IEEE Transactions on Systems, Man, and Cybernetics, pp. 28–44, 1973.
  25. P.J. King and E.H. Mamdani, "The Application of Fuzzy Control Systems to Industrial Processes", Automatica, vol. 13, no. 3, pp. 235–242, 1977.
  26. L.A. Zadeh, "From Imprecise to Granular Probabilities", Fuzzy Sets and Systems, vol. 154, no. 3, pp. 370–374, 2005.
  27. G.J. Klir and T.A. Folger, Fuzzy Sets, Uncertainty, and Information. Prentice Hall, 1988.
  28. N.N. Karnik and J.M. Mendel, "Centroid of a Type-2 Fuzzy Set", Information Sciences, vol. 132, no. 1–4, pp. 195–220, 2001.
  29. H. Wu and J.M. Mendel, "Uncertainty Bounds and Their Use in the Design of Interval Type-2 Fuzzy Logic Systems", IEEE Transactions on Fuzzy Systems, vol. 10, no. 5, pp. 622–639, 2002.
  30. O. Castillo and P. Melin, "Intelligent Systems with Interval Type-2 Fuzzy Logic", International Journal of Innovative Computing, Information and Control, vol. 4, no. 4, pp. 771–783, 2008.
  31. L.A. Zadeh, "The Concept of a Linguistic Variable and Its Application to Approximate Reasoning—I", Information Sciences, vol. 8, no. 3, pp. 199–249, 1975.
  32. O. Linda and M. Manic, "General Type-2 Fuzzy C-means Algorithm for Uncertain Fuzzy Clustering", IEEE Transactions on Fuzzy Systems, vol. 20, no. 5, pp. 883–897, 2012.
  33. J.C. Bezdek, R. Ehrlich and W. Full, "FCM: The Fuzzy C-means Clustering Algorithm", Computers & Geosciences, vol. 10, no. 2–3, pp. 191–203, 1984. http://doi.org/10.1016/0098-3004(84)90020-7
  34. R.L. Cannon, J.V. Dave and J.C. Bezdek, "Efficient Implementation of the Fuzzy C-means Clustering Algorithms", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-8, no. 2, pp. 248–255, 1986. http://doi.org/10.1109/TPAMI.1986.4767778
  35. C. Hwang and F. Rhee, "Uncertain Fuzzy Clustering: Interval Type-2 Fuzzy Approach to C-means", IEEE Transactions on Fuzzy Systems, vol. 15, pp. 107–120, 2007.
  36. F. Rhee, "Uncertain Fuzzy Clustering: Insights and Recommendations", IEEE Computational Intelligence Magazine, vol. 2, pp. 44–56, 2007.
  37. D. Neog, M. Raza and F. Rhee, "An Interval Type 2 Fuzzy Approach to Multilevel Image Segmentation", in Proc. 20th IEEE Int. Conf. Fuzzy Syst. (FUZZ-IEEE 2011), pp. 1164–1170, Taipei, Taiwan, June 27–30, 2011.
  38. C. Hwang and F. Rhee, "An Interval Type-2 Fuzzy C Spherical Shells Algorithm", in Proc. IEEE Int. Conf. Fuzzy Syst. (FUZZ-IEEE 2004), vol. 2, pp. 1117–1122, 2004.
  39. F. Rhee and H. Cheul, "A Type-2 Fuzzy C-means Clustering Algorithm", in Proc. 2001 Joint Conf. IFSA/NAFIPS, pp. 1926–1929, July 2001.
  40. F. Rhee and C. Hwang, "An Interval Type-2 Fuzzy K-nearest Neighbor", in Proc. 2003 Int. Conf. Fuzzy Syst., vol. 2, pp. 802–807, May 2003.
  41. F. Rhee and C. Hwang, "An Interval Type-2 Fuzzy Perceptron", in Proc. 2002 Int. Conf. Fuzzy Syst., vol. 2, pp. 1331–1335, 2002.
  42. J. Min, E. Shim and F. Rhee, "An Interval Type-2 Fuzzy PCM Algorithm for Pattern Recognition", in Proc. IEEE Int. Conf. Fuzzy Syst. (FUZZ-IEEE 2009), pp. 480–483, 2009.
  43. Y. Long, L. Yue and X. Shixiong, "Robust Interval Type-2 Possibilistic C-means Clustering and Its Application for Fuzzy Modeling", in Sixth Int. Conf. on Fuzzy Systems and Knowledge Discovery, pp. 360–365, 2009.
  44. W. Pedrycz, "Interpretation of Clusters in the Framework of Shadowed Sets", Pattern Recog. Lett., vol. 26, no. 15, pp. 2439–2449, Nov. 2005.
  45. J.M. Mendel, Uncertain Rule-Based Fuzzy Logic Systems: Introduction and New Directions. Prentice Hall PTR, 2001.
  46. J.M. Mendel, "Computing Derivatives in Interval Type-2 Fuzzy Logic Systems", IEEE Transactions on Fuzzy Systems, vol. 12, no. 1, pp. 84–98, 2004. http://doi.org/10.1109/TFUZZ.2003.822681
  47. O. Linda and M. Manic, "Interval Type-2 Fuzzy Voter Design for Fault Tolerant Systems", Inf. Sci., vol. 181, no. 14, pp. 2933–2950, Jul. 2011.
  48. O. Linda and M. Manic, "Interval Type-2 Fuzzy Voter Design for Fault Tolerant Systems", Inf. Sci., vol. 181, no. 14, pp. 2933–2950, Jul. 2011.
  49. O. Linda and M. Manic, "General Type-2 Fuzzy C-means Algorithm for Uncertain Fuzzy Clustering", IEEE Transactions on Fuzzy Systems, vol. 20, no. 5, pp. 883–897, 2012.
  50. F. Rhee and C. Hwang, "A Type-2 Fuzzy C-means Clustering Algorithm", in Proc. Joint Conf. Int. Fuzzy Syst. Assoc./North Am. Fuzzy Inf. Process. Soc., pp. 1926–1929, Jul. 2001.
  51. L.A. Zadeh, "The Concept of a Linguistic Variable and Its Application to Approximate Reasoning—II", Inf. Sci., vol. 8, pp. 301–357, 1975.
  52. E. Rubio and O. Castillo, "Interval Type-2 Fuzzy Clustering for Membership Function Generation", in 2013 IEEE Workshop on Hybrid Intelligent Models and Applications (HIMA), pp. 13–18, 2013.
  53. N. Karnik and J.M. Mendel, "Centroid of a Type-2 Fuzzy Set", Inf. Sci., vol. 132, pp. 195–220, 2001.
  54. O. Linda and M. Manic, "Uncertainty-Robust Design of Interval Type-2 Fuzzy Logic Controller for Delta Parallel Robot", IEEE Trans. Ind. Inf., vol. 7, no. 4, pp. 661–671, Nov. 2011.
  55. N. Karnik and J.M. Mendel, "Centroid of a Type-2 Fuzzy Set", Inf. Sci., vol. 132, pp. 195–220, 2001.
  56. H.B. Mitchell, "Pattern Recognition Using Type-II Fuzzy Sets", Inf. Sci., vol. 170, no. 2–4, pp. 409–418, Feb. 2005.
  57. D. Zhang and S. Chen, "Kernel-Based Fuzzy and Possibilistic C-means Clustering", in International Conference on Artificial Neural Networks (ICANN 2003), pp. 122–125, 2003.
  58. J.M. Mendel, "Advances in Type-2 Fuzzy Sets and Systems", Inf. Sci., vol. 177, pp. 84–110, 2007.
  59. J.M. Mendel, R. John and F. Liu, "Interval Type-2 Fuzzy Logic Systems Made Simple", IEEE Trans. Fuzzy Syst., vol. 14, no. 6, pp. 808–821, Dec. 2006.
