Open access peer-reviewed chapter - ONLINE FIRST

Data Clustering for Fuzzyfier Value Derivation

By JaeHyuk Cho

Submitted: August 30th 2020. Reviewed: February 3rd 2021. Published: May 3rd 2021.

DOI: 10.5772/intechopen.96385


Abstract

The fuzzifier value m is a significant factor in achieving accurate data clustering. This chapter therefore introduces various clustering methods together with definitions of the values that are important for clustering. To adaptively calculate an appropriate fuzzifier value for interval type-2 fuzzy c-means, two fuzzifier values m1 and m2 are derived by extracting information from individual data points using a histogram scheme. Most of the clustering methods in this chapter determine m1 and m2 automatically, whereas these values previously depended on repeated experiments. In addition, to derive valid fuzzifier values more efficiently, we introduce interval type-2 possibilistic fuzzy C-means (IT2PFCM), an advanced fuzzy clustering method for classifying a fixed pattern. In the efficient IT2PFCM method, a proper fuzzifier value for each data point is obtained by an algorithm that includes histogram analysis and Gaussian curve fitting. Using the information extracted from these fuzzifier values, two modified fuzzifier values m1 and m2 are determined, and the updated fuzzifier values are used to calculate new membership values. Determining these updated values not only improves the clustering accuracy for measured sensor data but also requires no additional procedure such as data labeling. The method is also efficient for monitoring numerous sensors and for managing and verifying sensor data obtained in real time, for example in smart cities.

Keywords

  • fuzzifier value determining
  • sensor data clustering
  • fuzzy C-means
  • histogram approach
  • interval type-2 PFCM

1. Introduction

In the majority of cases, fuzzy clustering algorithms have been shown to perform better than hard clustering in discriminating similar structures [1] and in handling datasets in multidimensional spaces [2], and to be more useful for unlabeled data with outliers [3]. Fuzzy C-means has been shown to offer better solutions in machine learning and image processing than hard clustering methods such as Ward's clustering and the k-means algorithm [4, 5, 6, 7, 8, 9]. In the comparison reported in [10], fuzzy c-means achieved 66% accuracy while Gustafson-Kessel achieved 70%. Fuzzy c-means is one of the most widely applied and modified techniques in pattern recognition applications [11], even though the sensitivity of its outcome to the prototypes and to the optimization process is regarded as a weak point [12, 13, 14].

Classification algorithms are generally subject to various sources of uncertainty that should be appropriately managed. Fuzzy clustering can be used with datasets where the variables have a high level of overlap. Therefore, membership functions are represented as a fuzzy set which can be either Type-I, Type-II or Intuitionistic.

Data are generated by a possible distribution or collected from various sources. Euclidean distance leads to clusters of spherical shape, which suits most cases, so it is the top choice for many applications and is the measure used in most clustering algorithms to decide new centers [15].


2. Basic notions

  • Degree of membership: The degree of likelihood that one data point belongs to each of several centers. The membership degrees of a data point sum to 1.

  • Data: Data can be categorical, compound, or numerical. Data in matrix form contain the themes and features of various units, for instance value and time.

  • Clusters: A cluster is a group of data points or datasets that share similarities. A distance, or distance norm, is the mathematical interpretation of likeness. The focus of clustering algorithms is the structure of the data.

  • Fuzzifier value: The fuzzifier value is essential for finding the clustering membership function when the density or volume of a given cluster differs from that of another cluster. When the fuzzifier value m is 1, the decision boundary lies where the relative distances to the cluster centers are equal (0.5 each). Under these conditions, no fuzzy area exists.

Figure 1(a) shows the case where a small m value is set for two clusters with different volumes. Because the region with fuzzy membership values extends into the bulky cluster C2, a relatively large number of unnecessary patterns are allotted to cluster C1. Figure 1(b) shows the case where a large m value is set. It appears to perform well since similar membership values are assigned, but the center of cluster C1 tends to move toward cluster C2. Figure 1(c) shows the fuzzy area obtained with interval type-2 m values. By forming the fuzzy area from m1 and m2 using the characteristics of the interval type-2 membership set, rather than from a single value of m, uncertainty can be reduced and a fuzzy area appropriate to the cluster volumes can be formed.

Figure 1.

Fuzzy area between clusters according to m: (a) a small m value is set, (b) a large m value is set, (c) an appropriate fuzzy area obtained using interval type-2.

Figure 2.

Cluster position uncertainty for the (a) T1 FCM, (b) IT2 FCM, (c) QT2 FCM, and (d) GT2 FCM algorithms.

As presented above, some methods have been suggested for deciding the lower and upper boundary values of the fuzzifier range from particular data. The following describes the PFCM membership function used to decide the fuzzifier value's range. The membership of the k-th data point in cluster i is given in Eq. (1), where d_ik/d_ij is the ratio of the Euclidean distances between the data point and the cluster centers.

$$u_{ik} = \frac{1}{\sum_{j=1}^{c} \left( d_{ik}/d_{ij} \right)^{2/(m-1)}} \tag{1}$$
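As a small illustration of Eq. (1), the following sketch evaluates the memberships of a single point for different fuzzifier values. The function name `fcm_membership` and the sample distances are assumptions made for this example only.

```python
def fcm_membership(distances, m):
    """Membership of one data point in each cluster, following Eq. (1).

    distances: distances from the point to every cluster center (all > 0).
    m: fuzzifier, m > 1.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for d_i in distances:
        total = sum((d_i / d_j) ** exponent for d_j in distances)
        memberships.append(1.0 / total)
    return memberships


# A point at distance 1 from center A and 3 from center B (made-up values).
crisp = fcm_membership([1.0, 3.0], m=1.5)  # small m: close to a hard assignment
soft = fcm_membership([1.0, 3.0], m=4.0)   # large m: memberships approach 1/c
```

With a small m the nearer cluster receives almost all of the membership, while a large m pulls both memberships toward 1/c, which is the behaviour sketched in Figure 1(a) and (b).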

The neighbor membership values are computed using the membership value in Eq. (1) in order to decide the fuzzifier value's range. Summarizing this as an expression in the fuzzifier value yields Eq. (2), which gives the lower and upper boundary values of the fuzzifier, where C is the number of clusters and m is the fuzzifier value.

1+C1C2δΔm2logdlogδ1δ1c1+1where=dididiandδis thresholdE2

3. Conventional fuzzy clustering algorithm

3.1 Fuzzy C- means (FCM)

FCM uses a fuzzifier m to determine the membership value of data point x_k in a specific cluster with a given cluster prototype. Specifically, the FCM equations consist of the cluster center v_i and the membership value of data point x_k, for k = 1, 2, ..., n and i = 1, 2, ..., c, where n is the number of patterns and c is the number of clusters. FCM requires the number of desired clusters to be known in advance. The membership value is determined by the relative distance between the pattern x_k and the cluster center v_i. However, the main weaknesses of FCM are its sensitivity to noise and its constrained memberships. The weighting exponent m is known to affect the clustering performance of the FCM algorithm [16].
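The alternation between membership and center updates can be sketched for one-dimensional data as follows. This is a minimal illustration under stated assumptions, not the chapter's implementation: the function name `fcm_1d`, the fixed iteration count, the small constant `eps`, and the sample data are all choices made for this example.

```python
def fcm_1d(points, centers, m=2.0, iterations=20):
    """Alternate membership and center updates for 1-D data (illustrative only)."""
    exponent = 2.0 / (m - 1.0)
    eps = 1e-12  # avoids division by zero when a point sits on a center
    for _ in range(iterations):
        # Membership u[i][k] of point k in cluster i from relative distances.
        u = []
        for v_i in centers:
            row = []
            for x in points:
                d_i = abs(x - v_i) + eps
                total = sum((d_i / (abs(x - v_j) + eps)) ** exponent
                            for v_j in centers)
                row.append(1.0 / total)
            u.append(row)
        # Center update: mean of the points weighted by u^m.
        centers = [
            sum((u_ik ** m) * x for u_ik, x in zip(row, points))
            / sum(u_ik ** m for u_ik in row)
            for row in u
        ]
    return centers, u


points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, u = fcm_1d(points, centers=[0.0, 6.0])
```

The number of clusters enters only through the length of the initial `centers` list, which reflects the requirement that FCM knows the number of clusters in advance.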

3.2 PCM

To solve the problems of the FCM method, PCM uses a parameter whose value is estimated from the dataset itself. PCM takes a possibilistic approach, in which the membership value of a point in a class represents the typicality of the point in that class, that is, the possibility that data point x_k belongs to the class with cluster prototype v_i, where k = 1, 2, ..., n and i = 1, 2, ..., c. Because PCM uses typicality, noise points are comparatively less typical, and noise sensitivity is significantly reduced [17, 18]. However, the PCM algorithm has the problem that the clustering outcome is sensitive to the initial parameter values [19].
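A brief sketch of how typicality suppresses noise points is given below. It assumes the commonly used PCM typicality update, t = 1 / (1 + (d^2 / eta)^(1/(m-1))), which is not written out in this chapter, together with a scale eta estimated from the data as a membership-weighted mean of squared distances. The function names and all numbers are illustrative assumptions.

```python
def pcm_scale(memberships, sq_distances, m=2.0):
    """Scale eta of one cluster: u^m-weighted mean of squared distances."""
    weights = [u ** m for u in memberships]
    return sum(w * d2 for w, d2 in zip(weights, sq_distances)) / sum(weights)


def pcm_typicality(sq_distance, eta, m=2.0):
    """Typicality of a point at the given squared distance from the prototype."""
    return 1.0 / (1.0 + (sq_distance / eta) ** (1.0 / (m - 1.0)))


# Three inliers and one outlier; memberships are made-up starting values.
memberships = [0.9, 0.8, 0.9, 0.5]
sq_distances = [0.1, 0.2, 0.1, 25.0]
eta = pcm_scale(memberships, sq_distances)
typicalities = [pcm_typicality(d2, eta) for d2 in sq_distances]
```

A point whose squared distance equals eta receives typicality 0.5, so eta acts as the bandwidth that separates typical points from atypical ones. Because eta is computed from the initial memberships, different initial values lead to different typicalities.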

3.3 PFCM

The PFCM algorithm is a hybrid of the PCM and FCM algorithms [20]. Although the row-sum constraint on the typicality values (sum = 1) is relaxed, the column constraint on the membership values is preserved, so the PFCM algorithm generates both memberships and possibilities and solves the noise sensitivity problem seen in FCM [21]. PFCM is based on the fuzzifier value m, which determines the membership value, and it also uses constants that define the relative importance of the fuzzy membership and typicality values in the objective function. PFCM uses more parameters to determine the optimal clustering solution, which increases the degrees of freedom and thus yields better results than the methods mentioned above. However, when fuzzy sets and other parameters are considered in such algorithms, the parameters themselves are potentially uncertain. In this chapter, we describe the uncertainty of the fuzzifier value m and of the possibilistic bandwidth parameter, and generate an FOU of uncertainty for both by considering an interval for the fuzzifier m, i.e. the interval between m1 and m2, and an interval for the bandwidth parameter. Existing studies have sought the optimal range between the upper and lower bounds of the fuzzifier value through multiple iterations [22]. This line of study is ongoing, but the same fuzzifier range cannot be applied to all data [23].

3.4 Type-1 fuzzy set (T1FS)

Type-1 fuzzy logic was first introduced by Zadeh (1965). Fuzzy logic systems based on type-1 fuzzy sets (T1FS) have demonstrated their capabilities in many applications, especially the control of complex nonlinear systems that are difficult to model analytically [24, 25]. Since a type-1 fuzzy logic system uses crisp and precise type-1 fuzzy sets, it can be used to model user behavior under certain conditions. Type-1 fuzzy sets deal with uncertainty using precise membership functions that users believe capture the uncertainty [26, 27, 28, 29, 30]. Once the type-1 membership function has been selected, all uncertainty disappears because the type-1 membership function is completely precise. The type-2 fuzzy set concept was presented by Zadeh as an extension of the ordinary fuzzy set concept, i.e. the type-1 fuzzy set [31]. All fuzzy sets are characterized by membership functions. In a type-1 fuzzy set, each element is characterized by a two-dimensional membership function. The membership grade of a type-1 fuzzy set is a crisp number in [0, 1]. A comparison of the conventional fuzzy clustering algorithms, in terms of their objective functions, is shown below [32].

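FCM:
$$J_{FCM}(V, U, X) = \sum_{i=1}^{c}\sum_{k=1}^{n} u_{ik}^{m} \lVert x_k - v_i \rVert^{2}, \qquad 1 < m < \infty$$

PCM:
$$J_{PCM}(V, U, X) = \sum_{i=1}^{c}\sum_{k=1}^{n} u_{ik}^{m} d_{ik}^{2} + \sum_{i=1}^{c} \eta_i \sum_{k=1}^{n} \left(1 - u_{ik}\right)^{m}$$

where η_i is the scale (typicality) parameter:
$$\eta_i = \frac{\sum_{k=1}^{n} u_{ik}^{m} \lVert x_k - v_i \rVert^{2}}{\sum_{k=1}^{n} u_{ik}^{m}}$$

FPCM:
$$J_{FPCM}(U, T, V) = \sum_{i=1}^{c}\sum_{k=1}^{n} \left(u_{ik}^{m} + t_{ik}^{\eta}\right) \lVert x_k - v_i \rVert^{2}$$

PFCM:
$$J_{PFCM}(U, T, V) = \sum_{i=1}^{c}\sum_{k=1}^{n} \left(a u_{ik}^{m} + b t_{ik}^{\eta}\right) \lVert x_k - v_i \rVert^{2} + \sum_{i=1}^{c} \delta_i \sum_{k=1}^{n} \left(1 - t_{ik}\right)^{\eta}$$

T1FC:
$$J_{T1FC}(X, U, C) = \sum_{j=1}^{c}\sum_{i=1}^{n} u_j(x_i)^{m} d_{ij}^{2}$$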

4. Advanced fuzzy clustering algorithm

Fuzzy c-means (FCM) is an unsupervised clustering algorithm in which unlabeled data X = {x1, x2, ..., xN} are grouped according to their fuzzy membership values [33, 34]. Because analyzing and handling uncertainty is a very important issue in data analysis and computer vision, FCM is widely used in these fields. Several other IT2 approaches to pattern recognition algorithms have been reported successfully [35, 36, 37, 38, 39, 40, 41]. Type-1 fuzzy sets cannot handle such uncertainties; type-2 fuzzy sets were therefore defined to represent the uncertainties associated with type-1 fuzzy sets. As shown in Figure 2, the type-reduction process in IT2 FSs requires a relatively large amount of computation, since type-2 fuzzy methods increase the computational complexity through the numerous combinations of embedded T2 FSs. Methods for reducing the computational complexity have been proposed, and the increased computational cost of T2 FSs may be an acceptable price for improved performance where T1 FSs do not give satisfactory results. In [42], it was suggested that two fuzzifier values m be used and that the centroid type-reduction algorithm be incorporated for the center update in the interval type-2 (IT2) fuzzy approach to FCM clustering. IT2 FCM was proposed to resolve the difficulty that FCM has with clusters of different volumes and numbers of patterns. Moreover, it was suggested that various uncertainties are associated with clustering algorithms such as FCM and PCM [43]. The motivation is the success that IT2 FSs have achieved over T1 FS algorithms.

4.1 Type-2 fuzzy set (T2 FS)

Due to their potential to model various uncertainties, type-2 fuzzy sets (T2 FSs) have received increasing research interest [44]. Type-2 fuzzy sets are characterized by a three-dimensional fuzzy membership function. The membership grade of each element of a type-2 fuzzy set is itself a fuzzy set in [0, 1]. The extra third dimension provides additional degrees of freedom for capturing more information about the expressed term. Type-2 fuzzy sets are valuable in situations where it is difficult to determine the exact membership function of a fuzzy set, which helps to incorporate uncertainty [45].

The computational complexity of the Type-2 fuzzy set is higher than that of the Type 1 fuzzy set. However, the results gained by the Type-2 fuzzy set are much better than those gained by the Type 1 fuzzy set. Therefore, if type-2 fuzzy sets can significantly improve performance (depending on the application), the increased computational complexity of the type-2 fuzzy sets can be an affordable price to pay [46].

4.2 Type-2 FCM (T2-FCM)

In type-2 FCM (T2-FCM), the type-2 membership is generated directly by extending a scalar membership degree to a T1 FS. When the secondary fuzzy set is restricted to a triangular membership function, T2-FCM extends the scalar membership u_ij to a triangular secondary membership function [47, 48].

4.3 General type-2 FCM

The GT2 FCM algorithm accepts a linguistic description of the fuzzifier value expressed as a T1 fuzzy set with upper and lower values [49]. The linguistic fuzzifier value is denoted as a T1 fuzzy set of m. Figure 3 shows two examples of encoding the linguistic notion of an appropriate fuzzifier value for the GT2 FCM algorithm using three linguistic terms.

Figure 3.

Two possible linguistic representations of the fuzzifier m using T1 fuzzy sets: (a) membership value for a sample x′, (b) vertical slice at x′.

Figure 4.

FOU representation for our proposed IT2 KPCM approach with m1 = 2, m2 = 5 and variance = 0.5: (a) FOU of cluster 1, (b) FOU of cluster 2 [58].

4.4 Interval type 2 fuzzy sets (IT2 FSs)

To model the uncertainty associated with a type-1 fuzzy set by means of an interval type-2 fuzzy set, the primary membership $J_{x'}$ of a sample point x′ can be represented by a membership interval in which all secondary grades of the primary memberships equal one [18, 50].

Figure 3(a) shows an instance of an interval type-2 fuzzy set in which the gray shaded region indicates the FOU. In the figure, the membership value for a sample x′ is represented by the interval between the upper membership $\bar{\mu}_{\tilde{A}}(x')$ and the lower membership $\underline{\mu}_{\tilde{A}}(x')$. Therefore, each x′ has a primary membership interval given by

$$J_{x'} = \left[ \underline{\mu}_{\tilde{A}}(x'),\ \bar{\mu}_{\tilde{A}}(x') \right] \tag{3}$$

Figure 3(b) shows the vertical slice at x′, where the secondary grade of each primary membership of x′ equals one, in accordance with the property of interval type-2 fuzzy sets. This interval is defined as the FOU. An interval type-2 fuzzy set $\tilde{A}$ can be expressed as

$$\tilde{A} = \left\{ \left( (x, u),\ \mu_{\tilde{A}}(x, u) \right) \mid \forall x \in A,\ \forall u \in J_x \subseteq [0, 1],\ \mu_{\tilde{A}}(x, u) = 1 \right\} \tag{4}$$

4.5 Interval type-2 FCM (IT2-FCM)

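In fuzzy clustering algorithms such as FCM, the fuzzifier value m plays a significant role in determining clustering uncertainty [50]. However, it is generally difficult to determine the value of m properly. IT2-FCM treats the fuzzifier value as an interval [m1, m2] and solves two optimization problems [51].

First, interval type-2 FCM is used to obtain a rough estimate of which data points belong to which cluster.

The objective function is minimized with respect to u_ij to provide the upper and lower membership values.

$$\bar{u}_j(x_i) = \begin{cases} \dfrac{1}{\sum_{k=1}^{c} \left(d_{ij}/d_{ik}\right)^{2/(m_1-1)}}, & \text{if } \dfrac{1}{\sum_{k=1}^{c} \left(d_{ij}/d_{ik}\right)} < \dfrac{1}{c} \\[2ex] \dfrac{1}{\sum_{k=1}^{c} \left(d_{ij}/d_{ik}\right)^{2/(m_2-1)}}, & \text{otherwise} \end{cases} \tag{5}$$
$$\underline{u}_j(x_i) = \begin{cases} \dfrac{1}{\sum_{k=1}^{c} \left(d_{ij}/d_{ik}\right)^{2/(m_1-1)}}, & \text{if } \dfrac{1}{\sum_{k=1}^{c} \left(d_{ij}/d_{ik}\right)} \geq \dfrac{1}{c} \\[2ex] \dfrac{1}{\sum_{k=1}^{c} \left(d_{ij}/d_{ik}\right)^{2/(m_2-1)}}, & \text{otherwise} \end{cases} \tag{6}$$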
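The switching rule of Eqs. (5) and (6) can be sketched for a single data point as below. The function name `it2_fcm_memberships` and the sample values are assumptions for illustration, and the sketch assumes m1 >= m2 > 1, which is the ordering under which the rule yields an upper membership that is not smaller than the lower one.

```python
def it2_fcm_memberships(distances, m1, m2):
    """Upper and lower memberships of one point in each cluster (Eqs. (5)-(6)).

    distances: distances from the point to every cluster center (all > 0).
    m1, m2: the two fuzzifiers; this sketch assumes m1 >= m2 > 1.
    """
    c = len(distances)

    def membership(d_j, m):
        exponent = 2.0 / (m - 1.0)
        return 1.0 / sum((d_j / d_k) ** exponent for d_k in distances)

    upper, lower = [], []
    for d_j in distances:
        # Switching test: compare 1 / sum(d_j / d_k) with 1 / c.
        below = 1.0 / sum(d_j / d_k for d_k in distances) < 1.0 / c
        upper.append(membership(d_j, m1 if below else m2))
        lower.append(membership(d_j, m2 if below else m1))
    return upper, lower


# One point at distance 1 from the first center and 3 from the second.
upper, lower = it2_fcm_memberships([1.0, 3.0], m1=4.0, m2=2.0)
```

The gap between `upper` and `lower` for each cluster is the primary membership interval of the point, i.e. its slice of the FOU.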

After the cluster prototypes are calculated, type reduction and then classification are performed. Qiu et al. (2014) proposed this complete interval type-2 FCM method. With the resulting labeled clusters, a histogram is acquired for each class in each individual dimension. The histogram is smoothed with a moving-window mean (a triangular window is used here), and curve fitting of the smoothed histogram gives the membership function. Histogram values greater than the membership value are assigned to the histogram for the upper membership, and values less than the membership value are assigned to the histogram for the lower membership. Curve fitting is then carried out separately on the upper and lower histograms to give the upper and lower membership values [52]. These membership values are used to estimate the fuzzifiers m1 and m2. Fixed-point iteration is a method of expressing a transcendental equation f(x) = 0 in the form x = g(x) and then solving this expression for x iteratively through the relationship

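$$x_{i+1} = g(x_i), \qquad i = 0, 1, 2, \ldots \tag{7}$$

where x0 is some initial guess. Rewriting Eqs. (5) and (6) in the form of Eq. (7) and dropping the upper and lower notation gives

$$u_j = \frac{1}{\sum_{k=1}^{c} \left(d_{ij}/d_{ik}\right)^{2/(m-1)}} \tag{8}$$
$$\frac{1}{u_j} = \sum_{k=1}^{c} \left(d_{ij}/d_{ik}\right)^{2/(m-1)}$$

Taking the logarithm of both sides, Eq. (8) can be rewritten as

$$\log\frac{1}{u_j} = \log \sum_{k=1}^{c} \left(d_{ij}/d_{ik}\right)^{2/(m-1)} \tag{9}$$
$$\log(a + c) = \log a + \log\left(1 + \frac{c}{a}\right)$$

Extending this logarithmic identity to the sum of N elements,

$$\log\left(a_0 + \sum_{k=1}^{N} a_k\right) = \log a_0 + \log\left(1 + \sum_{k=1}^{N} \frac{a_k}{a_0}\right) \tag{10}$$
$$\log\frac{1}{u_j} = \frac{2}{m-1}\log\frac{d_{ij}}{d_{1j}} + \log\left(1 + \sum_{k=2}^{c} \left(\frac{d_{ij}}{d_{ik}}\right)^{2/(m_i^{old}-1)}\right) \tag{11}$$

Rearranging Eq. (11) in terms of m gives Eq. (12).

$$\gamma = \frac{\log\dfrac{1}{u_j} - \log\left(1 + \sum_{k=2}^{c} \left(\dfrac{d_{ij}}{d_{ik}}\right)^{2/(m_i^{old}-1)}\right)}{\log\dfrac{d_{ij}}{d_{1j}}}, \qquad m_j^{new} = 1 + \frac{2}{\gamma} \tag{12}$$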

Thus, Eq. (12) gives m1j_new and m2j_new, where m1j_new ≥ m2j_new, and is used to calculate the fuzzifier values of each data point. In some cases, the fuzzifier value of a particular data point shows relatively large variation, so an upper bound (m_upper) and a lower bound (m_lower) on the fuzzifier are needed, obtained using Eq. (2). If a data point has a fuzzifier value below the lower bound, the value is set to m_lower, and if it exceeds the upper bound, the value is set to m_upper. Finally, the mean of these fuzzifiers is taken to obtain the final fuzzifier values m1 and m2.
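The fixed-point idea can be sketched for the simplest setting of one data point and two clusters, followed by the clamp-and-average step. This is an illustration under assumptions, not the chapter's implementation: the function names, starting value, and iteration count are hypothetical, and the remainder term inside the logarithm is written as the ratio that makes the two-cluster factorisation exact.

```python
import math


def estimate_fuzzifier(u_j, d_j, d_other, m_init=3.0, iterations=60):
    """Fixed-point estimate of m for one point and two clusters.

    u_j: target membership of the point in cluster j (0 < u_j < 1).
    d_j, d_other: distances to cluster j and to the other cluster
    (d_j != d_other, and u_j consistent with them).
    Solves 1/u_j = 1 + (d_j/d_other)**(2/(m-1)) for m.
    """
    p = 2.0 / (m_init - 1.0)  # p stands for 2 / (m - 1)
    log_ratio = math.log(d_j / d_other)
    for _ in range(iterations):
        # 1/u_j = (d_j/d_other)^p * (1 + (d_other/d_j)^p): isolate the leading p.
        remainder = math.log(1.0 + (d_other / d_j) ** p)
        p = (math.log(1.0 / u_j) - remainder) / log_ratio  # plays the role of gamma
    return 1.0 + 2.0 / p  # m = 1 + 2 / gamma


def clip_and_average(values, m_lower, m_upper):
    """Clamp per-point fuzzifiers to [m_lower, m_upper], then take their mean."""
    clipped = [min(max(v, m_lower), m_upper) for v in values]
    return sum(clipped) / len(clipped)


# A membership of 0.2 at distances 2 and 1 corresponds to m = 2.
m_point = estimate_fuzzifier(u_j=0.2, d_j=2.0, d_other=1.0)
# Bounds 1.5 and 4.0 are made-up stand-ins for m_lower and m_upper.
m_final = clip_and_average([m_point, 1.2, 9.0], m_lower=1.5, m_upper=4.0)
```

For two clusters the update is a contraction, so the iteration settles quickly. With more clusters, convergence depends on which term is factored out and is not guaranteed by this sketch.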

4.6 Multiple kernels PFCM algorithm

Typically, the kernel method uses a mapping function to transform the input data from the input feature space to a kernel feature space [53]. The purpose of this mapping is to make data that overlap, or that have a nonlinear boundary surface, in the input feature space easier to separate. If the data in the input space are $x_i,\ i = 1, \ldots, N$, the data mapped to the kernel feature space through the function Φ are represented by $\Phi(x_j),\ j = 1, \ldots, N$. As in general PFCM, the goal of kernel PFCM is to minimize the following objective function.

$$J_{\phi} = \sum_{k=1}^{n}\sum_{i=1}^{c} \left(a u_{ik}^{m} + b t_{ik}^{\eta}\right) \times d_{ij}^{2} + \sum_{i=1}^{c} \gamma_i \sum_{k=1}^{n} \left(1 - t_{ik}\right)^{\eta} \tag{13}$$

For a kernel K, the distance d_ij in the kernel feature space between the pattern x_j and the cluster prototype v_j is expressed through the kernel function as in Eq. (14).

$$d_{ij} = \lVert \Phi(x_j) - \Phi(v_j) \rVert^{2} = \Phi(x_j) \cdot \Phi(x_j) + \Phi(v_j) \cdot \Phi(v_j) - 2\,\Phi(x_j) \cdot \Phi(v_j) = K(x_j, x_j) + K(v_j, v_j) - 2K(x_j, v_j) \tag{14}$$

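Commonly, a new Gaussian multi-kernel $\tilde{k}$ is built from Gaussian kernels, assuming S kernels, and is given by the following formula [54].

$$\tilde{k}(x_j, v_i) = \frac{\sum_{l=1}^{S} \dfrac{w_{il}}{\sigma_l} \exp\left(-\dfrac{\lVert x_j - v_i \rVert^{2}}{2\sigma_l^{2}}\right)}{\sum_{t=1}^{S} \dfrac{w_{it}}{\sigma_t}} \tag{15}$$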
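A short sketch of the normalised Gaussian multi-kernel and the feature-space distance of Eq. (14) follows. The function names, the weights, and the kernel widths are assumptions chosen for illustration; in particular, a single set of weights is used rather than one set per cluster.

```python
import math


def multi_gaussian_kernel(x, v, weights, sigmas):
    """Normalised weighted sum of Gaussian kernels between vectors x and v."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, v))
    numerator = sum((w / s) * math.exp(-sq_dist / (2.0 * s * s))
                    for w, s in zip(weights, sigmas))
    denominator = sum(w / s for w, s in zip(weights, sigmas))
    return numerator / denominator


def kernel_sq_distance(x, v, weights, sigmas):
    """Squared feature-space distance K(x, x) + K(v, v) - 2 K(x, v)."""
    def k(a, b):
        return multi_gaussian_kernel(a, b, weights, sigmas)
    return k(x, x) + k(v, v) - 2.0 * k(x, v)


weights, sigmas = [0.5, 0.5], [0.5, 2.0]  # illustrative values only
d_same = kernel_sq_distance([1.0, 2.0], [1.0, 2.0], weights, sigmas)
d_far = kernel_sq_distance([1.0, 2.0], [9.0, 9.0], weights, sigmas)
```

Because the normalised kernel of a vector with itself equals 1, the distance reduces to 2 - 2k(x, v) and is bounded by 2 however far apart the pattern and the prototype are.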

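Following the approach of [55], which uses FCM-MK, a normalized kernel is defined so that the weights are learned together with the cluster prototypes, resolutions and membership values. Under this optimization approach, the following PFCM objective function is minimized. By minimizing the objective function, the cluster prototype v_i, the resolution-specific weight w_il and the membership value u_ij are determined.

$$J_{m,\eta}(U, T, V; X) = 2\sum_{k=1}^{n}\sum_{i=1}^{c} \left(a u_{ik}^{m} + b t_{ik}^{\eta}\right) \times \left(1 - \sum_{l=1}^{S} \frac{w_{il}}{\sigma_l} \exp\left(-\frac{\lVert x_k - v_i \rVert^{2}}{2\sigma_l^{2}}\right) \times \frac{1}{\sum_{t=1}^{S} w_{it}/\sigma_t}\right) + \sum_{i=1}^{c} \gamma_i \sum_{k=1}^{n} \left(1 - t_{ik}\right)^{\eta} \tag{16}$$

Here, ρ is the learning-rate parameter of the gradient descent method used to update the weights. Finally, using type reduction and hard partitioning, clustering is performed as described for interval type-2 PFCM [56].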

4.7 Interval type-2 fuzzy c-regression clustering

Let the regression function be represented by Eq. (17):

$$y_i = f^{z}(x_i, \alpha_j) = a_1^{z} x_{1i} + a_2^{z} x_{2i} + \cdots + a_M^{z} x_{Mi} + b_0^{z} \tag{17}$$

where x_i = [x_1i, x_2i, ..., x_Mi] is a data point, i = 1, ..., n indexes the data, j = 1, ..., c indexes the clusters (or rules), q = 1, ..., M indexes the variables in each regression, and z = 1, ..., r indexes the regression functions. The regression coefficients are denoted by α_j. We use the weighted least squares (WLS) method to calculate the regression coefficients α_j, with the membership grades of the partition matrix P serving as weights. In Eq. (18), X is the matrix of input data points and y is the vector of outputs.

$$x_i = \left[x_{1,i}, x_{2,i}, \ldots, x_{M,i}\right]^{T}, \qquad y = \left[y_1, y_2, \ldots, y_n\right]^{T}, \qquad P_j = \begin{bmatrix} u_j(x_1) & 0 & \cdots & 0 \\ 0 & u_j(x_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & u_j(x_n) \end{bmatrix} \tag{18}$$
$$\alpha_j = \left[X^{T} P_j X\right]^{-1} X^{T} P_j\, y$$
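For a single input variable, the weighted least squares solution can be written out in closed form, as in the sketch below. The function name `weighted_least_squares` and the data are assumptions for illustration; the weights play the role of the memberships on the diagonal of P_j.

```python
def weighted_least_squares(xs, ys, weights):
    """Fit y = a*x + b by minimising sum(w_i * (y_i - a*x_i - b)**2).

    This is the normal-equation solution written out for one input
    variable, with P = diag(weights) and design-matrix rows [x_i, 1].
    """
    s_w = sum(weights)
    s_x = sum(w * x for w, x in zip(weights, xs))
    s_y = sum(w * y for w, y in zip(weights, ys))
    s_xx = sum(w * x * x for w, x in zip(weights, xs))
    s_xy = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    det = s_w * s_xx - s_x * s_x  # determinant of X^T P X
    a = (s_w * s_xy - s_x * s_y) / det
    b = (s_xx * s_y - s_x * s_xy) / det
    return a, b


# Points on y = 2x + 1 plus one point that belongs to another cluster.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 20.0]
a, b = weighted_least_squares(xs, ys, weights=[1.0, 1.0, 1.0, 0.0])
```

Giving the last point zero weight removes its influence entirely, so the fit recovers the line through the first three points. Intermediate memberships reduce a point's influence proportionally.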

The partition matrix P is obtained through a Gaussian mixture distribution, which is the first stage in computing the regression coefficients. We consider two fuzzifiers, or weighting exponents, m1 and m2 to cast the problem as IT2F. The difference is that the earlier model is FCM whereas our model is FCRM. These two fuzzifiers divide the objective function into two separate functions, shown in Eq. (19), and the aim is to minimize the total error. It should be mentioned that the following proof is an extended and modified version of the type-1 proof presented in [57].

$$J_{m_1}(U, \upsilon) = \sum_{i=1}^{n}\sum_{j=1}^{C} u_j(x_i)^{m_1} E_{ji}(\alpha_j), \qquad J_{m_2}(U, \upsilon) = \sum_{i=1}^{n}\sum_{j=1}^{C} u_j(x_i)^{m_2} E_{ji}(\alpha_j) \tag{19}$$

As in type-1 FCRM, E_ji is the total error, i.e. the distance between the actual output and the estimated regression equation, given by Eq. (20).

$$E_{ji}(\alpha_j) = \left(y_i - f_j(x_i, \alpha_j)\right)^{2} \tag{20}$$

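Eq. (21) gives the Lagrangians of the objective functions of the IT2 FCRM model. We extend the type-1 NFCRM algorithm to interval type-2 NFCRM.

$$\begin{aligned} L_1(\lambda_1, u_j) &= \sum_{i=1}^{n}\sum_{j=1}^{C} u_j(x_i)^{m_1} E_{ji}(\alpha_j) - \lambda_1\left(\sum_{j=1}^{c} u_j - 1\right) \\ L_2(\lambda_2, u_j) &= \sum_{i=1}^{n}\sum_{j=1}^{C} u_j(x_i)^{m_2} E_{ji}(\alpha_j) - \lambda_2\left(\sum_{j=1}^{c} u_j - 1\right) \end{aligned} \tag{21}$$

To minimize the objective functions, the partial derivatives of Eq. (21) with respect to u_j are set to 0 in Eqs. (22) and (23).

$$\frac{\partial L_1}{\partial u_1(x_i)} = m_1 u_1(x_i)^{m_1-1} E_{1i}(\alpha_1) - \lambda_1 = 0, \quad \ldots, \quad \frac{\partial L_1}{\partial u_C(x_i)} = m_1 u_C(x_i)^{m_1-1} E_{Ci}(\alpha_C) - \lambda_1 = 0 \tag{22}$$
$$\frac{\partial L_2}{\partial u_1(x_i)} = m_2 u_1(x_i)^{m_2-1} E_{1i}(\alpha_1) - \lambda_2 = 0, \quad \ldots, \quad \frac{\partial L_2}{\partial u_C(x_i)} = m_2 u_C(x_i)^{m_2-1} E_{Ci}(\alpha_C) - \lambda_2 = 0 \tag{23}$$

Next, the partial derivatives with respect to λ1 and λ2 are taken.

$$\frac{\partial L_1}{\partial \lambda_1} = -\left(\sum_{j=1}^{c} u_j(x_i) - 1\right) = 0 \tag{24}$$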

Three steps are involved in adapting KPCM to IT2 KPCM. In other words, the prototype locations are updated through initialization, calculation of the upper and lower membership (or typicality) values using two different fuzzifiers, and type reduction and defuzzification for the data patterns. In the proposed approach, the use of IT2 FSs is aimed at developing a prototype update process that can solve the cluster coincidence problem of KPCM. Coincident clusters usually arise when a pattern set contains clusters that are relatively close to each other. By definition, type reduction yields a type-1 fuzzy set via the embedded fuzzy sets: the type-reduced fuzzy set is obtained by combining the centroid intervals estimated from the embedded fuzzy sets. This is the standard method for obtaining a type-reduced fuzzy set from an IT2 FS. However, this approach is avoided because of its huge computational requirements, which stem from the large number of embedded fuzzy sets. We therefore consider the Karnik-Mendel (KM) algorithm as an alternative type-reduction method. Since KM is an iterative algorithm that estimates both ends of an interval, the left (right) endpoint v_L (v_R) can be found without using all of the embedded fuzzy sets.

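For the kernel-based algorithm illustrated in Figure 4, the kernel distance

$$\lVert \Phi(x_k) - v_i \rVert^{2} \tag{25}$$

can be derived using the kernel trick as

$$\lVert \Phi(x_k) - v_i \rVert^{2} = K(x_k, x_k) - 2\,\frac{\sum_{j=1}^{N} u_{ij}^{m} K(x_k, x_j)}{\sum_{j=1}^{N} u_{ij}^{m}} + \frac{\sum_{j=1}^{N}\sum_{l=1}^{N} u_{ij}^{m} u_{il}^{m} K(x_j, x_l)}{\left(\sum_{j=1}^{N} u_{ij}^{m}\right)^{2}} \tag{26}$$

The inverse mapping of the prototypes is also needed to approximate the prototype expressions v_i in the feature space. The objective function can be written as

$$V(\hat{v}_i, v_i) = \sum_{i=1}^{C} \lVert \Phi(\hat{v}_i) - v_i \rVert^{2} = \sum_{i=1}^{C} \left(\Phi(\hat{v}_i)^{T}\Phi(\hat{v}_i) - 2\,\Phi(\hat{v}_i)^{T} v_i + v_i^{T} v_i\right) \tag{27}$$

The final location of $\hat{v}_i$ in the KPCM algorithm then becomes

$$\hat{v}_i = \frac{\sum_{k=1}^{N} u_{ik}^{m} K(x_k, \hat{v}_i)\, x_k}{\sum_{k=1}^{N} u_{ik}^{m} K(x_k, \hat{v}_i)} \tag{28}$$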

The left (right) interval of the centroids can be found by employing the KM algorithm on the ascending order of a pattern set and its associated interval memberships. The result of the KM algorithm can be expressed as,

$$v_i = 1.0 / [v_L, v_R] \tag{29}$$

Once the left value v_L and the right value v_R of the interval have been calculated, defuzzification is used to compute the crisp centers, defined as the midpoint between v_L and v_R. The defuzzified output, a crisp value of the prototypes, is computed using the expression

$$v_i = \frac{\sum_{v \in J_{Y_i}} u(v)\, v}{\sum_{v \in J_{Y_i}} u(v)} = \frac{v_L + v_R}{2} \tag{30}$$
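The type-reduction and defuzzification steps can be sketched in one dimension as follows. This is a simplified reading of the Karnik-Mendel iteration under stated assumptions: the function name `km_endpoint`, the stopping rule (stop when the switch point no longer changes), and the sample memberships are chosen for this illustration.

```python
def km_endpoint(xs, lower, upper, right):
    """One end of the centroid interval via Karnik-Mendel iterations (1-D).

    xs must be sorted in ascending order; lower[i] <= upper[i] are the
    interval memberships of xs[i]. right=True returns v_R, right=False v_L.
    """
    weights = [(lo + up) / 2.0 for lo, up in zip(lower, upper)]
    center = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
    previous_k = None
    while True:
        # Switch point: number of patterns at or to the left of the center.
        k = sum(1 for x in xs if x <= center)
        if k == previous_k:
            return center
        previous_k = k
        if right:  # small weights on the left, large on the right
            weights = [lower[i] if i < k else upper[i] for i in range(len(xs))]
        else:      # large weights on the left, small on the right
            weights = [upper[i] if i < k else lower[i] for i in range(len(xs))]
        center = sum(w * x for w, x in zip(weights, xs)) / sum(weights)


xs = [1.0, 2.0, 3.0]
lower = [0.5, 0.5, 0.5]
upper = [1.0, 1.0, 1.0]
v_left = km_endpoint(xs, lower, upper, right=False)
v_right = km_endpoint(xs, lower, upper, right=True)
v_crisp = (v_left + v_right) / 2.0  # defuzzified center: midpoint of the interval
```

Only a handful of weighted means are evaluated, rather than one centroid per embedded fuzzy set, which is the computational saving that motivates the iterative approach.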

Hard partitioning is used to classify test patterns using the prototypes resulting from the procedure above. Euclidean distance is now used to hard-partition the patterns because the prototype is in feature space. Each pattern is assigned to the cluster prototype at minimum Euclidean distance. Experimental results presented in the following sections will demonstrate the validity of the proposed IT2 approach to KPCM clustering.

4.8 Interval type-2 possibilistic fuzzy C-means (IT2PFCM)

To handle the uncertainty in the fuzzifier value m of the general PFCM algorithm, the multiple kernels PFCM algorithm is extended to interval type-2 fuzzy sets. Given N data points, a set W of resolution-specific weights, a partition matrix U, C clusters, a set V of cluster prototypes and S kernels, the cluster prototype can be obtained by minimizing the Gaussian kernel objective function as follows.

$$w_{il}^{new} = w_{il}^{old} - \rho\,\frac{\partial J}{\partial w_{il}} \tag{31}$$
$$d_{ij}^{2} = 2 - 2\,\frac{\sum_{l=1}^{S} \dfrac{w_{il}}{\sigma_l} \exp\left(-\dfrac{\lVert x_j - v_i \rVert^{2}}{2\sigma_l^{2}}\right)}{\sum_{t=1}^{S} \dfrac{w_{it}}{\sigma_t}} \tag{32}$$

Where,

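$$v_i = 2 - 2\,\frac{\sum_{l=1}^{S} \dfrac{w_{il}}{\sigma_l} \exp\left(-\dfrac{\lVert x_j - v_j \rVert^{2}}{2\sigma_l^{2}}\right)}{\sum_{t=1}^{S} \dfrac{w_{t}}{\sigma_t}} \tag{33}$$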

The cluster prototype is calculated to optimize the objective function with respect to the center v_i of each cluster [23].

Where,

$$\bar{K}_i(x_j, v_i) = \frac{\sum_{l=1}^{S} \dfrac{w_{il}}{\sigma_l^{3}} \exp\left(-\dfrac{\lVert x_j - v_i \rVert^{2}}{2\sigma_l^{2}}\right)}{\sum_{t=1}^{S} \dfrac{w_{it}}{\sigma_t}} \tag{34}$$

The optimized membership values, i.e. the smallest and the largest membership value of each pattern under the interval type-2 fuzzy set, are used to calculate the crisp value v_i. To compute v_R and v_L, the upper or lower bound of the fuzzifier must be determined. The procedure is organized as follows, given Eq. (38) [59].

$$J(U, V, W) = 2\sum_{i=1}^{C}\sum_{j=1}^{N} u_{ij}^{m} d_{ij}^{2} \tag{35}$$

Using the final v_R and v_L, the crisp center value is obtained from defuzzification as follows.

$$\text{For } v_R:\ \text{if } v_i < k \text{ then } u_{ij} = \underline{u}_{ij} \text{ else } u_{ij} = \bar{u}_{ij} \tag{36}$$

Using the cluster prototype v_i obtained through the optimization function and the membership value u_ij, the resolution-specific weight w_il is obtained again as follows.

Jwil=2i=1Nuijmt=1SwtσtK(xjviK¯ixivjE37

Where

$$v_i^{R} = \frac{\sum_{j=1}^{N} u_{ij}^{m} \bar{K}_i(x_j, v_i)\, x_j}{\sum_{j=1}^{N} u_{ij}^{m} \bar{K}_i(x_j, v_i)} \tag{38}$$

To define the interval type-2 fuzzy set and calculate the uncertainty of the membership, the input data, i.e. the primary fuzzy set, must be assigned to the interval type-2 fuzzy set. The upper and lower membership functions are then created from the primary membership functions.

After the upper and lower memberships have been calculated for each cluster, the center values must be updated. The membership is obtained from the type-2 fuzzy set, whereas the center value is a crisp value, so the center cannot be calculated directly by the above method. Therefore, to compute the center value, type reduction to a type-1 fuzzy set is performed first, and defuzzification is then applied to convert the type-1 value to a crisp value.

5. Heuristic method: histogram analysis

The goal of the heuristic method is to extract information from the data and then adaptively calculate the fuzzifier value. In this approach, a heuristic type-1 membership function appropriate to the given dataset is used, and the upper and lower memberships are decided according to the following rules. First, given the membership values, the IT2 PFCM algorithm roughly determines which cluster each data point belongs to, and a histogram is then built for each of the classified clusters. Curve fitting of this histogram yields a gentler and smoother membership function. Curve fitting is applied separately to the upper and lower histograms to obtain the upper and lower membership values. To arrive at the IT2 FS, the FOU must be determined, which is generally the set of membership values of the T2 FS. To this end, histogram values greater than the membership value are assigned to the upper membership histogram, and smaller values are assigned to the lower membership histogram. Figure 5 shows the histograms and the FOU determined for each class and dimension. To find X satisfying f(X) = 0, the problem can be expressed as X = g(X) and solved by fixed-point iteration, as given in Eq. (39).

Figure 5.

FOU obtained for each class and dimension, from which the updated fuzzifier values m1 and m2 are obtained: (a) class 1, dimension 1, and (b) class 2, dimension 1.

$$X_{i+1} = g(X_i), \qquad i = 0, 1, \ldots, N \tag{39}$$

The membership function u_i of Eqs. (7) and (8) can be written in the form of Eq. (40) as follows.

$$u_i = \frac{1}{\sum_{j=1}^{c} \left(\dfrac{d_{ik}}{d_{ij}}\right)^{2/(m-1)}} \tag{40}$$

Here the fuzzifier value m is the fuzzy parameter that determines the degree of fuzziness of the final clustering. The values of m1 and m2 are then applied in the algorithm to calculate the updated clusters, and this routine is repeated. The detailed algorithm is as follows:

  1. Set the initial fuzzifier values m1 and m2.

  2. Apply m1 and m2 to interval type-2 FCM and obtain the membership of the data.

  3. Generate a histogram of each cluster from the membership.

  4. Curve fit the histogram to get the primary memberships.

  5. Create histograms of the upper and lower memberships.

  6. Use curve fitting over the upper and lower histograms to calculate the upper and lower memberships.

  7. Normalize the memberships according to the upper membership.

  8. Obtain the fuzzifiers m1i and m2i using Eq. (13).

  9. Average m1i and m2i and update m1 and m2 from the averages.

  10. Repeat the algorithm iteratively using the updated m1 and m2.
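Steps 3 and 4 can be sketched as follows for the values of one cluster in one dimension. Two simplifications are assumed and should be read as stand-ins rather than as the chapter's procedure: the triangular window uses fixed weights (1, 2, 1)/4, and the Gaussian is obtained by matching the mean and variance of the smoothed histogram instead of by least-squares curve fitting. All function names and data are hypothetical.

```python
import math


def histogram(values, bins, low, high):
    """Counts of values in equal-width bins over [low, high]."""
    counts = [0] * bins
    width = (high - low) / bins
    for v in values:
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return counts


def smooth_triangular(counts):
    """Moving-window smoothing with triangular weights (1, 2, 1) / 4."""
    padded = [counts[0]] + list(counts) + [counts[-1]]
    return [(padded[i] + 2 * padded[i + 1] + padded[i + 2]) / 4.0
            for i in range(len(counts))]


def gaussian_from_histogram(counts, low, high):
    """Mean and standard deviation matched to the histogram moments."""
    width = (high - low) / len(counts)
    mids = [low + (i + 0.5) * width for i in range(len(counts))]
    total = sum(counts)
    mean = sum(c * x for c, x in zip(counts, mids)) / total
    var = sum(c * (x - mean) ** 2 for c, x in zip(counts, mids)) / total
    return mean, math.sqrt(var)


cluster_values = [4.1, 4.7, 4.9, 5.1, 5.2, 5.05, 4.95, 5.9, 5.3, 4.8]
counts = histogram(cluster_values, bins=4, low=4.0, high=6.0)
smoothed = smooth_triangular(counts)
mean, std = gaussian_from_histogram(smoothed, 4.0, 6.0)
# Gaussian membership of the value 5.0 in this cluster, with peak height 1.
membership_at_5 = math.exp(-((5.0 - mean) ** 2) / (2.0 * std ** 2))
```

The same three functions could be applied separately to the upper and lower histograms of steps 5 and 6 to obtain the two membership curves that bound the FOU.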

The upper membership function (UMF) histogram and lower membership function (LMF) histogram are drawn in Figure 5. A new membership function is obtained from the Gaussian Curve Fitting (GF-F) method.

Taking the logarithm of both sides of Eq. (40) gives Eq. (41):

$$\log\frac{1}{u_i} = \frac{2}{m-1}\log\frac{d_{ki}}{d_{1i}} + \log\left(1 + \sum_{j=2}^{c} \left(\frac{d_{ki}}{d_{ji}}\right)^{2/(m_{old}-1)}\right) \tag{41}$$

Rearranging Eq. (41) in terms of m gives Eqs. (42) and (43).

$$\gamma = \frac{\log\dfrac{1}{u_j} - \log\left(1 + \sum_{k=2}^{C} \left(\dfrac{d_{ij}}{d_{ik}}\right)^{2/(m_{old}-1)}\right)}{\log\dfrac{d_{ij}}{d_{1j}}} \tag{42}$$
$$m_j^{new} = 1 + \frac{2}{\gamma} \tag{43}$$

As in the above process, m_j_new is obtained as a function of the membership value u_i ∈ {u_i(x_k)}. Applying Eq. (43) to each clustered data point gives the updated values m1i_new and m2i_new, and averaging these per-point fuzzifier values gives the new fuzzifier values m1 and m2 as follows:

$$m_1 = \frac{1}{N}\sum_{i=1}^{N} m_{1i}, \qquad m_2 = \frac{1}{N}\sum_{i=1}^{N} m_{2i} \tag{44}$$

6. Comparing the performance of algorithms

As in previous experiments, algorithms can be compared using the following criteria:

Root Mean Squared Error (RMSE): RMSE is the evaluation metric used for all of the clustering algorithms. It is calculated as the square root of the average of all squared errors between the original data (X) and the corresponding predicted data (X̅).

$$RMSE = \sqrt{\frac{\sum_{k=1}^{n}\sum_{i=1}^{c} \left(x_{ik} - \bar{x}_{ik}\right)^{2}}{n}} \tag{45}$$

where n is the total number of patterns in a given data set and c is the number of clusters; x_ik and x̄_ik are the actual and predicted values, respectively.

Accuracy is one metric for evaluating classification models. Informally, accuracy is the fraction of predictions the model got right. Formally, accuracy has the following definition:

$$\text{Accuracy} = \frac{\text{number of correct samples}}{\text{total number of samples}} \times 100 \tag{46}$$
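Both criteria can be computed with a few lines of code, as sketched below. The function names and the toy inputs are assumptions for illustration; the accuracy function additionally assumes that cluster labels have already been matched to the true class labels, since cluster indices are arbitrary.

```python
import math


def rmse(actual, predicted):
    """Square root of the summed squared errors divided by the number of patterns.

    actual, predicted: lists of n patterns, each a list of values.
    """
    n = len(actual)
    squared = sum((a - p) ** 2
                  for row_a, row_p in zip(actual, predicted)
                  for a, p in zip(row_a, row_p))
    return math.sqrt(squared / n)


def accuracy(true_labels, predicted_labels):
    """Percentage of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return 100.0 * correct / len(true_labels)


error = rmse([[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 6.0]])
score = accuracy([0, 0, 1, 1], [0, 1, 1, 1])
```

A lower RMSE and a higher accuracy both indicate a better clustering result.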

Acknowledgments

This work was supported by Institute of Korea Evaluation Institute of Industrial Technology (KEIT, Next Generation Artificial Intelligence Semiconductor R&D Program) grant funded by the Korea government (Ministry of Trade, Industry & Energy, MOTIE) (Project No. 20010098, Development of Mixed Signal SoC with complex sensor for Smart Home Appliances).


© 2021 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite and reference


JaeHyuk Cho (May 3rd 2021). Data Clustering for Fuzzyfier Value Derivation [Online First], IntechOpen, DOI: 10.5772/intechopen.96385. Available from:
