Open access peer-reviewed chapter

Learning Algorithms for Fuzzy Inference Systems Using Vector Quantization

By Hirofumi Miyajima, Noritaka Shigei and Hiromi Miyajima

Submitted: May 19th 2018Reviewed: July 4th 2018Published: November 5th 2018

DOI: 10.5772/intechopen.79925

Downloaded: 320

Abstract

Many studies on learning of fuzzy inference systems have been made. Specifically, it is known that learning methods using vector quantization (VQ) and steepest descent method (SDM) are superior to other methods. In their learning methods, VQ is used only in determination of the initial parameters for the antecedent part of fuzzy rules. In order to improve them, some methods determining the initial parameters for the consequent part by VQ are proposed. For example, learning method composed of three stages as VQ, generalized inverse matrix (GIM), and SDM was proposed in the previous paper. In this paper, we will propose improved methods for learning process of SDM for learning methods using VQ, GIM, and SDM and show that the methods are superior in the number of rules to the conventional methods in numerical simulations.

Keywords

  • fuzzy inference systems
  • vector quantization
  • neural gas
  • generalized inverse method

1. Introduction

There have been many studies on learning of fuzzy systems [1, 2, 3, 4, 5, 6, 7, 8]. Their aim is to construct learning methods based on SDM. Some novel methods on them have been developed which (1) generate fuzzy rules one by one starting from any number of rules, or reduce fuzzy rules one by one starting from a sufficiently large number of rules [2]; (2) use genetic algorithm (GA) and particle swarm optimization (PSO) to determine fuzzy systems [3]; (3) use fuzzy inference systems composed of a small number of input rule modules, such as single input rule modules (SIRMs) and double input rule modules (DIRMs) methods [9, 10]; and (4) use a self-organization or a vector quantization technique to determine the initial assignment of parameters [11, 12, 13, 14, 15, 19]. Specifically, it is known that learning methods using vector quantization (VQ) and steepest descent method (SDM) are superior in the number of rules (parameters) to other methods [16, 19]. So, why is it effective to combine VQ with SDM in fuzzy modeling? First, let us explain how to combine SDM with methods other than VQ. (1) Although the learning time is short, the generation method is known to have low test accuracy, while the reduction method has high test accuracy but takes long learning time [2]. (2) The method using GA and PSO shows high accuracy when the input dimension and the number of rules are small, but it is known that there is a problem of scalability [3]. (3) SIRM and DIRM methods are excellent in scalability, but the accuracy of learning is not always sufficient [9]. As described above, many methods are not necessarily effective models because of the difficulty of learning accompanying the increase of the input dimension and the number of rules and the low accuracy. On the other hand, the method combining VQ with SDM is possible to efficiently conduct learning of SDM by arranging suitably the initial parameters of fuzzy rules using VQ [1, 16]. However, since VQ is unsupervised learning, it is easy to reflect the input part of learning data, but how to capture output information in learning is difficult. With their studies, the first learning method is the one using VQ only in determining the initial parameters of the antecedent part of fuzzy rules using input part of learning data [1, 11, 12, 13, 14]. The second method is the one determining the same parameter using input/output parts of learning data [15, 19]. Further, the third method is one iterating learning process of VQ and SDM for the second method. Kishida and Pedrycz proposed the method based on the third one [13, 15]. These methods are the ones determining only the antecedent parameters by VQ. Therefore, we introduced generalized inverse matrix (GIM) to determine the initial assignment of weight parameters for the consequent part of fuzzy rules as the fourth method and showed the effectiveness in the previous paper [16, 17]. In this paper, improved methods for learning process of SDM in learning methods using VQ, GIM, and SDM are introduced and show that the method is superior in the number of rules to other methods in numerical simulations.

2. Preliminaries

2.1. The conventional fuzzy inference model

The conventional fuzzy inference model using SDM is described [1]. Let Zj = {1, …, j} and Zj∗ = {0, 1,…, j}. Let R be the set of real numbers. Let x = (x1, …, xm) and y be input and output variables, respectively, where xjR for jZm, and yR. Then, the rule of simplified fuzzy inference model is expressed as

Ri:ifx1isMi1andxjisMij·andxmisMim,thenyiswiE1

where jZm is a rule number, iZn is a variable number, Mij is a membership function of the antecedent part, and wi is the weight of the consequent part.

A membership value μi of the antecedent part for input x is expressed as

μi=j=1mMijxjE2

Then, the output y of fuzzy inference method is obtained as

y=i=1nμi·wii=1nμiE3

If Gaussian membership function is used, then Mij is expressed as

Mijxj=exp12xjcijbij2E4

where cij and bij denote the center and the width values of Mij, respectively.

The objective function E is determined to evaluate the inference error between the desirable output yr and the inference output y.

Let D = {(xp, …, xp, yr)|pZP } and D = {(xp, …, xp)|pZp} be the set of learning data and the set of input part of D, respectively. The objective of learning is to minimize the following mean square error (MSE) as

E=1Pp=1Pypypr2E5

where yp∗ and yr mean inference and desired output for the pth input xp.

In order to minimize the objective function E, each parameter of c, b, and w is updated based on SDM using the following relation:

Ewi=μiI=1nμIyyrE6
Ecij=μiI=1nμIyyrwiyxjcijbij2E7
Ecij=μiI=1nμIyyrwiyxjcij2bij3E8

where t is iteration time and is a learning constant [1].

The learning algorithm for the conventional fuzzy inference model is shown as follows:

Learning Algorithm A

Step A1: The threshold θ of inference error and the maximum number of learning time Tmax are set. Let n0 be the initial number of rules. Let t = 1.

Step A2: The parameters bij, cij, and wi are set randomly.

Step A3: Let p = 1.

Step A4: A data x1pxmpyprDis given.

Step A5: From Eqs. (2) and (3), μi and y are computed.

Step A6: Parameters wi, cij, and bij are updated by Eqs. (6), (7), and (8).

Step A7: If p = P, then go to Step A8, and if p < P then go to Step A4 with pp + 1.

Step A8: Let E(t) be inference error at step t calculated by Eq. (5). If E(t) > θ and t < Tmax, then go to Step A3 with tt + 1; else, if E(t)θ and t Tmax, then the algorithm terminates.

Step A9: If t > Tmax and E(t) > θ, then go to Step A2 with n = n + 1 and t = 1.

In particular, Algorithm SDM is defined as follows:

Algorithm SDM (c, b, w)

θ1: inference error

Tmax1: the maximum number of learning time

n: the number of rules

input: current parameters

output: parameters c, b, and w after learning

Steps A3 to A8 of Algorithm A are performed.

2.2. Neural gas method

Vector quantization techniques encode a data space VRm, utilizing only a finite set C = {ci|iZr} of reference vectors [18].

Let the winner vector ci(v) be defined for any vector vV as

iv=argminiZrvci.E9

By using the finite set C, the space V is partitioned as

Vi=vVvcivcjforjZr,E10

where V=iZrViand ViVj=φfor ij.

The evaluation function for the partition is defined by

E=i=1rvVi1nivci2,E11

where ni = |Vi|.

Let us introduce the neural gas method as follows [18]:

For any input data vector v, the neighborhood ranking cik for kZr1is

determined, being the reference vector for which there are k vectors cj with

vcj<vcikE12

Let the number k associated with each vector ci denoted by ki(v,ci). Then, the adaption step for adjusting the parameters is given by

ci=ε·hλkivci·vciE13
hλkivci=expkivci/λE14

where ε [0, 1] and λ > 0.

Let the probability of v selected from V be denoted by p(v).

The flowchart of the conventional neural gas algorithm is shown in Figure 1 [18], where εint, εfin, and Tmax2 are learning constants and the maximum number of learning, respectively. The method is called learning algorithm NG.

Figure 1.

Neural gas method [18].

Using the set D∗, a decision procedure for center and width parameters is given as follows:

Algorithm Center (c)

D = {(xp,…, xp)|pZp}

p(x): the probability of x selected for xD.

Step 1: By using p(x) for xD, NG method of Figure 1 [16, 18] is performed.

As a result, the set C of reference vectors for D is determined, where C = n.

Step 2: Each value for center parameters is assigned to a reference vector. Let

bij=1nixkCicijxkj2E15

where Ci and ni are the set and the number of learning data belonging to the ith cluster Ci and C=i=1rCiand n=i=1rni.

As a result, center and width parameters are determined from algorithm center (c).

Learning Algorithm B using Algorithm Center (c) is introduced as follows [16, 17]:

Learning Algorithm B

θ: threshold of MSE

Tmax0: maximum number of learning time for NG

Tmax: maximum number of learning time for SDM

M: the size of ranges

n: the number of rules

Step 1: Initialize()

Step 2: Center and width parameters are determined from Algorithm Center(P) and the set D.

Step 3: Parameters c, b, and w are updated using Algorithm SDM (c, b, w).

Step 4: If E(t)≤θ, then algorithm terminates else go to Step 3 with nn + 1 and tt + 1.

2.3. The probability distribution of input data based on the rate of change of output

It is known that many rules are needed at or near the places where output data change quickly in fuzzy modeling. Then, how can we find the rate of output change? The probability pM (x) is one method to perform it. As shown in Eqs. (16) and (17), any input data where output changes quickly is selected with the high probability, and any input data where output changes slowly is selected with the low probability, where M is the size of range considering output change.

Based on the literature [13], the probability (distribution) is defined as follows:

Algorithm Prob (pM (x))

Input: D = {(xp, yr)|pZP } and D = {(xp)|pZP }

Output: pM (x)

Step 1: Give an input data xiD, we determine the neighborhood ranking (xi0, xi1, …, xik ,…, xiP−1) of the vector xi with xi0 = xi, xi1 being closest to xi and xik (k = 0, …, P − 1) being the vector xi for which there are k vectors xj with xixj<xixik.

Step 2: Determine H(xi) which shows the rate of output change for input data xi, by the following equation:

Hxi=l=1Myiyilxixil,E16

where xilfor l ZM means the lth neighborhood ranking of xi, iZP, and yi and yilare output for input xi and xil, respectively. The number M means the range considering H(x).

Step 3: Determine the probability pM (xi) for xi by normalizing H(xi) as follows:

pMxi=Hxij=1PHxj,E17

where i=1PpMxi=1.

See Ref. [19] for the detailed explanation using the example of pM (x). Using pM (x), Kishida has proposed the following learning algorithm [13]:

Learning Algorithm C

θ: threshold of MSE

Tmax0: maximum number of learning time for NG

Tmax: maximum number of learning time for SDM

M: the size of ranges

n: the number of rules

Step 1: Initialize ( )

Step 2: The probability pM (x) is obtained from algorithm prob (pM (x)).

Step 3: Center and width parameters are determined using pM (x) from Algorithm Center (P) and the data set D.

Step 4: Parameters c, b, and w are updated using Algorithm SDM (c, b, w).

Step 5: If E(t)≤θ, then algorithm terminates else go to Step 3 with nn + 1 and t = 1.

2.4. Determination of weight parameters using the generalized inverse method

The optimum values of parameters c and b are determined by using pK(x). Then, how can we decide weight parameters w? We can determine them as the interpolation problem for parameters c, b, and w. That is, it is the method that membership values for antecedent part of rules are computed from c and b and weight parameters w are determined by solving the interpolation problem. So far, the method was used as a determination problem of weight parameters for RBF networks [1].

Let us explain fuzzy inference systems and interpolation problem using the generalized inverse method [1]. This problem can be stated mathematically as follows:

Given P points {xp|pZP } and P real numbers {ypr|pZP }, find a function f: RmR such that the following conditions are satisfied:

fxp=yprE18

In fuzzy modeling, this problem is solved as follows:

yp=fxp=i=1nwiφpixpciE19
φpixpci=μiI=1nμI,μi=j=1mMijxj,E20

where μi and Mij are defined as Eqs. (2) and (4).

That is,

φw=y,E21

where φ= (φij) (iZP and jZn), w = (w1, …, wn)T, and y = (y1r,…,ypr)T.

Let P = n and xi = ci. The width parameters are determined by Eq. (15). Then, if φij( ) is suitably selected as Gaussian function, then the solution of weights w is obtained as

w=φ1yE22

Let us consider the case n < P. This is the realistic case. The optimum solution w that minimizes E = ||yrφw||2 can be obtained as follows:

w+=φTyandEmin=IΨy2,E23

where Φ+≜[ΦT Φ]−1ΦT, ΨΦΦT, and I is identify matrix of P×P .

The matrix Φ+ is called the generalized inverse of φ. The method using Φ+ to determine the weights is called the generalized inverse method (GIM).

Using GIM, a decision procedure for parameters is defined as follows:

Algorithm Weight(c, b)

Input: D = {(xp, yr)|pZP }

Output: The weight parameters w

Step 1: Calculate μi based on Eq. (2)

Step 2: Calculate the matrix Φ and Φ+ using Eq. (20):

φpi=xpci=μipj=1nμjp,μip=j=1mexp12xjpcijbij2

Step 3: Determine the weight vectors w as follows:

w=Φ+yrE24

2.5. The relation between the proposed algorithm and related works

Let us explain the relation between the proposed method and related works using Figure 2.

  1. The fundamental flow of algorithm A is shown in Figure 2(a). Initial parameters of c, b, and w are set randomly, and all parameters are updated using SDM until the inference error become sufficiently small (see Figure 2(a)) [1].

  2. The first method using VQ is the one that both the initial assignment of parameters and the assignment of parameters in iterating step (see outer loop of Figure 2(b)) are also determined by NG using D. That is, it is learning method composed of two stages. The center parameters c are determined using D by VQ, b is computed by Eq. (15) using the result of center parameters, and weight parameter w is set to the results of SDM, where the initial values of w are set randomly. Further, all parameters are updated using SDM for the definite number of learning time. In iterating processes, parameters of the result obtained by SDM are set as initial ones of the next process. Outer iterating process is repeated until the inference error become sufficiently small (see Figure 2(b)).

  3. The second method using VQ is the one that is the same method as the first one except for selecting any learning data based on pM (x) (see Figure 2(c)). That is, center parameters c are determined by pM (x) using input and output learning data.

  4. The third learning method using VQ is the one that parameters w are determined using GIM after parameters c and b are determined by VQ using pM (x) and all parameters are updated based on SDM. That is, it is learning method composed of three phases. In the first phase, the center parameters c are determined using the probability pM (x), and b is computed from the result of center parameters. In the second phase, weight parameters w are determined by solving the interpolation problem using GIM. In the third phase, all parameters are updated using SDM for the definite number of learning time. In iterating process, the result of SDM is set to initial ones of the next process based on hill climing. Outer process is repeated until the inference error becomes sufficiently small (see Figure 2(d)).

  5. The fourth method is the same to the one as the third method except for using pM (x) in learning process of SDM (see Figure 2(d’)). This is a proposed method in this paper.

Figure 2.

Concept of conventional and proposed algorithms: mark 1 means that initial values of w are selected randomly and parameters w are set to the result of SDM after the second step.

3. The proposed learning method using VQ

Let us explain the detailed algorithm of Figure 2(d’). The method is called Learning Algorithm D’. It is composed of four techniques as follows:

  1. Determine the initial assignment of c using the probability pK(x).

  2. Determine the assignment of weight parameters w by solving the interpolation problem using GIM.

  3. The processes (1) and (2) and learning steps of SDM using pM (x) are iterated.

  4. The optimum value of M is determined by hill climing method [16].

The general scheme of the proposed method is shown as Figure 3, where cmin, bmin, and wmin are the optimal parameters for c, b, and w.

Figure 3.

Flowchart of Learning Algorithm D’ corresponding to Figure 2(d’).

Tmax1 and Tmax2: The maximum numbers of learning time for NG and SDM.

θ and θ1: Thresholds for MSE and SDM

M0, Mmax: The size of initial and final of ranges

△M: The rate of change of the range

D and D: Learning data D = {(xi, yr)|iZP } and D = {xi|iZP }

n: The number of rules

E(t): MSE of inference error at step t

Emin: The minimum MSE of E for the rule number

The proposed method of Figure 3 consists of five phases: In the first phase, all values for algorithm are initialized. In the second phase, the probability pM (x) is determined for the size of range M. In the third phase, parameters c are determined by NG using pM (x), and parameters b are computed from parameters c. In the forth phase, parameters w are determined from algorithm weight(c, b). In the fifth phase, all parameters are updated using pM (x) by SDM. The optimum number n of rules and the optimum size M of range are determined in Figure 4. That is, the number M for the fixed number n is adjusted, and the optimum values of n and M with the minimum number for MSE are determined. Especially, Learning Algorithm D is same method as Learning Algorithm D’ except for the step with the symbol “*” in Figure 3. In learning steps of SDM for Learning Algorithm D, learning data is selected randomly (see Figure 2(d)).

Figure 4.

The optimum values M ∗ and n∗ for M and n.

Likewise, we also propose improved methods for Figure 2(a)–(c). In learning process of SDM for algorithm (a), (b), and (c), any learning data is selected randomly. In the proposed methods, any learning data is selected based on pM (x). These algorithms are defined as (a’), (b’), and (c’).

4. Numerical simulations

In order to compare the ability of Learning Algorithms (a’), (b’), (c’), and (d’) with Learning Algorithms (a), (b), (c), and (d), numerical simulations for function approximation and pattern classification are performed.

4.1. Function approximation

The systems are identified by fuzzy inference systems. This simulation uses four systems specified by the following functions with two-dimensional input space [0, 1]2 (Eqs. (25)–(28)) and one output with the range [0, 1];

y=sinπx13x2E25
y=sin2πx13cosπx2+12E26
y=1.91.35+expx1sin13x10.62expx2sin7x22E27
y=sin10x10.52+10x20.52+12E28

In this simulation, Tmax1 = 100000 and Tmax2 = 50000 for (a) and Tmax1 = 10000 and Tmax2 = 5000 for (b), (c), and (d) and θ = 1.0 × 10−4, K0 = 100, Kmax = 190, K = 10, Kc = 0.01, Kb = 0.01, Kc = 0.1, the number of learning data is 200 and the number of test data is 2500.

Table 1 shows the results for the simulation. In Table 1, the number of rules, MSEs for learning and test, and learning time (second) are shown, where the number of rules means the one when the threshold θ of inference error is achieved in learning. The result of simulation is the average value from 20 trials. As a result, the results of (a’), (b’), (c’), and (d’) are almost same as the cases of (a), (b), (c), and (d) as shown in Table 1. It seems that there is no difference of the ability for the regression problem.

Eq. (25)Eq. (26)Eq. (27)Eq. (28)
(a)The number of rules8.322.552.46.1
MSE for learning(×10−4)0.470.350.650.41
MSE of test (×10−4)2.2921.122.837.37
(b)The number of rules4.76.89.64.0
MSE of learning (×10−4)0.440.380.840.35
MSE of test (×10−4)0.702.962.340.48
(c)The number of rules5.47.411.13.5
MSE of learning (×10−4)0.240.540.650.33
MSE of test (×10−4)0.651.364.480.44
(d)The number of rules4.36.19.73.5
MSE of learning (×10−4)0.280.390.690.29
MSE of test (×10−4)0.571.931.780.36
(a’)The number of rules5.08.911.84.7
MSE for learning (×10−4)0.370.410.520.45
MSE of test (×10−4)1.559.562.81.06
(b’)The number of rules5.08.913.04.3
MSE for learning (×10−4)0.420.380.650.39
MSE of test (×10−4)1.419.664.122.38
(c’)The number of rules5.78.013.14.1
MSE for learning (×10−4)0.400.230.570.35
MSE of test (×10−4)1.701.283.901.10
(d’)The number of rules4.66.910.03.6
MSE for learning (×10−4)0.390.490.620.35
MSE of test (×10−4)1.432.581.890.42

Table 1.

The results of simulations for function approximation.

4.2. Classification problems for UCI database

Iris, Wine, Sonar, and BCW data from UCI database shown in Table 2 are used as the second numerical simulation [20]. In this simulation, fivefold cross validation is used. As the initial conditions for classification problem, Kc = 0.001, Kb = 0.001, Kw = 0.05, εinit = 0.1, εfin = 0.01, and λ = 0.7 are used. Further, Tmax = 50000, M = 100, and θ = 1.0 × 10−2 for iris and wine. Tmax = 50000, M = 200, and θ = 2.0 × 10−2 for BCW; and Tmax = 5000, M = 100, and θ = 5.0 × 10−2 for sonar are used.

IrisWineBCWSonar
The number of data150178683208
The number of input413960
The number of class3322

Table 2.

The dataset for pattern classification.

Table 3 shows the result of classification problem. In Table 3, the number of rules, RMs for learning, and test data are shown, where RM means the rate of misclassification. As a result, the results of (a’), (b’), (c’), and (d’) are superior in the number of rules to the cases of (a), (b), (c), and (d) as shown in Table 3. It seems that there is the difference of ability for pattern classification.

IrisWineBCWSonar
(a)The number of rules3.47.814.411.0
RM for learning (%)3.01.41.65.3
RM of test (%)3.310.34.320.6
(b)The number of rules2.020.826.03.7
RM of learning (%)3.313.62.25.1
RM of test (%)3.316.63.518.2
(c)The number of rules2.03.24.84.0
RM of learning (%)3.31.51.65.1
RM of test (%)4.06.73.819.0
(d)The number of rules3.72.52.54.0
RM of learning (%)3.31.11.35.1
RM of test (%)3.86.52.118.3
(a’)The number of rules2.32.23.54.6
RM for learning (%)2.91.41.65.0
RM of test (%)3.58.53.920.0
(b’)The number of rules2.02.02.13.7
RM for learning (%)3.93.02.15.0
RM of test (%)4.99.23.919.0
(c’)The number of rules2.33.03.64.0
RM for learning (%)3.32.62.25.3
RM of test (%)4.07.23.519.4
(d’)The number of rules2.32.02.43.3
RM for learning (%)3.01.82.25.0
RM of test (%)3.57.63.719.1

Table 3.

The result for pattern classification.

Let us consider the reason why we can get the good result by using the probability pM (x). In the conventional learning method, parameters are updated by any data selected randomly from the set of learning data. In the proposed method, parameters are updated by any data selected based on the probability pM (x). The probability pM (x) is determined based on output change for learning data, so many fuzzy rules are likely to generate at or near the places where output change is large for the set of learning data.

For example, if the number of learning time is 100 and pM (x0) = 0.5, then learning data x0 is selected 50 times from the set of learning data in learning. As a result, membership functions are likely to generate at or near the places where output change is large for the set of learning data. The probability pM (x) is used in a method to improve the local search of SDM.

5. Conclusion

In this paper, we proposed the improved methods using VQ, GIM, and SDM. The features of the proposed methods are as follows:

  1. In determining the initial assignment of parameters, both input and output parts of learning data are used.

  2. The initial assignment of weight parameters is determined by GIM.

  3. In order to determine the range of the rate of output change, hill climing is used.

  4. Any learning data in SDM is selected based on the probability distribution

pM (x) considering both input and output of learning data.

As a result, it was shown that the proposed methods using the probability distribution considering both input and output parts of learning data were superior to other methods in numerical simulation of pattern classification.

In the future works, we will consider the new idea using VQ and apply the proposed method to control problem.

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite and reference

Link to this chapter Copy to clipboard

Cite this chapter Copy to clipboard

Hirofumi Miyajima, Noritaka Shigei and Hiromi Miyajima (November 5th 2018). Learning Algorithms for Fuzzy Inference Systems Using Vector Quantization, From Natural to Artificial Intelligence - Algorithms and Applications, Ricardo Lopez-Ruiz, IntechOpen, DOI: 10.5772/intechopen.79925. Available from:

chapter statistics

320total chapter downloads

More statistics for editors and authors

Login to your personal dashboard for more detailed statistics on your publications.

Access personal reporting

Related Content

This Book

Next chapter

Query Morphing: A Proximity-Based Approach for Data Exploration

By Jay Patel and Vikram Singh

Related Book

First chapter

Mechanical Models of Microtubules

By Slobodan Zdravković

We are IntechOpen, the world's leading publisher of Open Access books. Built by scientists, for scientists. Our readership spans scientists, professors, researchers, librarians, and students, as well as business professionals. We share our knowledge and peer-reveiwed research papers with libraries, scientific and engineering societies, and also work with corporate R&D departments and government entities.

More About Us