System Identification Using Fuzzy Cerebellar Model Articulation Controllers

An artificial neural network inspired by the cerebellum, the cerebellar model articulation controller (CMAC) was first developed by Albus (1975a, 1975b). Owing to advantages such as fast learning, a high convergence rate, good generalization capability, and easy hardware implementation (Lin & Lee, 2009; Peng & Lin, 2011), the CMAC has been successfully applied in many fields, for example, identification (Lee et al., 2004), image coding (Iiguni, 1996), ultrasonic motors (Leu et al., 2010), grey relational analysis (Chang et al., 2010), pattern recognition (Glanz et al., 1991), robot control (Harmon et al., 2005; Mese, 2003; Miller et al., 1990), signal processing (Kolcz & Allinson, 1994), and diagnosis (Hung & Wang, 2004; Wang & Jiang, 2004). However, Albus' CMAC has three main drawbacks, namely a large memory requirement (Lee et al., 2007; Leu et al., 2010; Lin et al., 2008), a relatively poor function approximation ability (Commuri & Lewis, 1997; Guo et al., 2002; Ker et al., 1997), and the difficulty of adaptively selecting its structural parameters (Hwang & Lin, 1998; Lee et al., 2003).

The rest of this chapter is organized as follows. Starting from the first CMAC model of 1975, section 2 briefly reviews the development process, related learning algorithms, and system identification examples of fuzzy CMACs. Sections 3 and 4 discuss the self-constructing FCMAC (SC-FCMAC) and the powerful parametric FCMAC (P-FCMAC), respectively. Lastly, section 5 concludes this chapter and suggests directions for further research.
firstly considered, while the developments of five other training schemes (i.e., credit assignment, grey relational, error norm, active deformable, and Tikhonov schemes) were mentioned as well. To reduce memory usage, hierarchical and self-organizing CMAC approaches were discussed, whereas the fuzzy variant of the self-organizing CMAC is presented in the following section of this chapter.
In addition, Mohajeri et al. (2009b) provide a review of FCMACs covering more than 23 related aspects, such as membership functions, layered memory structure, defuzzification, and fuzzy systems. Although FCMACs already reduce the memory requirement of the CMAC, clustering approaches (such as fuzzy C-means, discrete incremental clustering, and Bayesian Ying-Yang) and hierarchical approaches for further reducing the memory size of FCMACs themselves are also surveyed in (Mohajeri et al., 2009b). Furthermore, Dai et al. (2010) divide FCMAC architectures into two classes, i.e., feedforward and feedback fuzzy neural networks, which gives beginners a useful overview of the basic concepts of FCMACs.
In the following sections, the self-constructing FCMAC (SC-FCMAC, Lee et al., 2007a) and the powerful parametric FCMAC (P-FCMAC, Lin & Lee, 2009) are reviewed as example models, to give readers insight into how these FCMACs work. Their architectures and learning schemes are accompanied by illustrative examples of system identification.

The self-constructing fuzzy CMAC
This section reviews and discusses the self-constructing FCMAC (SC-FCMAC, Lee et al., 2007a), from its architecture to its learning algorithms.

Architecture of the SC-FCMAC model
As illustrated in Fig. 2, the SC-FCMAC model (Lee et al., 2007a) consists of input space partition, association memory selection, and defuzzification. Similar to the traditional CMAC model, the SC-FCMAC model approximates a nonlinear function $y = f(\mathbf{x})$ by applying two primary mappings:

$$S(\mathbf{x}): X \rightarrow A, \qquad P(\boldsymbol{\alpha}): A \rightarrow D,$$

where X is an s-dimensional input space, A is an $N_A$-dimensional association space, and D is a one-dimensional (1-D) output space. These two mappings are realized by fuzzy operations. The function $S(\mathbf{x})$ maps each point $\mathbf{x}$ in the input space onto an association vector $\boldsymbol{\alpha} = S(\mathbf{x}) = (\alpha_1, \alpha_2, \ldots, \alpha_{N_A})$ whose elements are

$$\alpha_j = \prod_{i=1}^{N_D} \exp\!\left[-\frac{(x_i - m_{ij})^2}{\sigma_{ij}^2}\right],$$

where $\prod$ represents the product operation, $\alpha_j$ the j-th element of the association memory selection vector, $x_i$ the input value of the i-th dimension for a specific input state $\mathbf{x}$, $m_{ij}$ the center of the receptive field functions, $\sigma_{ij}$ the variance of the receptive field functions, and $N_D$ the number of receptive field functions for each input state.

The function $P(\boldsymbol{\alpha})$ computes a scalar output y by projecting the association memory selection vector onto a vector of adjustable fuzzy weights. Each fuzzy weight is inferred to produce a partial fuzzy output, using the value of its corresponding association memory selection element as the input matching degree. Because the weights are fuzzy, the partial fuzzy outputs are defuzzified into a scalar output using standard volume-based centroid defuzzification (Kosko, 1997; Paul & Kumar, 2002). The term volume is used in a general sense to include multi-dimensional functions; for 2-D functions, the volume reduces to the area. If $v_j$ is the volume of the j-th consequent set and $\alpha_j$ is its weight (scale), then the general expression for defuzzification is

$$y = \frac{\sum_{j=1}^{N_L} \alpha_j v_j m_j^w}{\sum_{j=1}^{N_L} \alpha_j v_j},$$

where $m_j^w$ is the mean value of the fuzzy weights and $N_L$ is the number of hypercube cells. The volume $v_j$ in this case is simply the area of the consequent weights, which are represented by Gaussian fuzzy sets with mean $m_j^w$ and width $\sigma_j^w$, so $v_j$ is proportional to $\sigma_j^w$ (for a Gaussian of the form $\exp[-(y - m_j^w)^2/(\sigma_j^w)^2]$, $v_j = \sqrt{\pi}\,\sigma_j^w$). Therefore, the constant factor cancels and

$$y = \frac{\sum_{j=1}^{N_L} \alpha_j \sigma_j^w m_j^w}{\sum_{j=1}^{N_L} \alpha_j \sigma_j^w}.$$
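To make the two mappings concrete, the following Python sketch computes the association degrees and the volume-based centroid output for given parameters. It assumes Gaussian receptive fields of the form $\exp[-(x_i - m_{ij})^2/\sigma_{ij}^2]$ and evaluates every hypercube cell (no restriction to the $N_L$ active cells); the function and argument names (`association`, `sc_fcmac_output`, `w_mean`, `w_sigma`) are illustrative and not taken from Lee et al. (2007a).

```python
import math
from typing import List, Sequence

def association(x: Sequence[float], m: List[List[float]],
                sigma: List[List[float]]) -> List[float]:
    """S(x): alpha_j = prod_i exp(-((x_i - m_ij) / sigma_ij)**2) for every hypercube cell j."""
    return [
        # a product of exponentials is the exponential of the summed exponents
        math.exp(-sum(((xi - mij) / sij) ** 2 for xi, mij, sij in zip(x, m_j, s_j)))
        for m_j, s_j in zip(m, sigma)
    ]

def sc_fcmac_output(x: Sequence[float], m: List[List[float]], sigma: List[List[float]],
                    w_mean: List[float], w_sigma: List[float]) -> float:
    """P(alpha): volume-based centroid defuzzification with Gaussian fuzzy weights.

    Each weight's volume is proportional to its width w_sigma[j]; the constant
    factor (sqrt(pi) for exp(-(y - m)**2 / s**2)) cancels in the ratio.
    """
    alpha = association(x, m, sigma)
    num = sum(a * wm * ws for a, wm, ws in zip(alpha, w_mean, w_sigma))
    den = sum(a * ws for a, ws in zip(alpha, w_sigma))
    return num / den

# Two cells equidistant from x = 0 fire equally, so the wider weight (width 3) dominates.
y = sc_fcmac_output([0.0], [[-1.0], [1.0]], [[1.0], [1.0]], [0.0, 2.0], [1.0, 3.0])
# y is 1.5 up to floating-point rounding: (2 * 3) / (1 + 3)
```

The example shows why volume-based defuzzification differs from a plain weighted average: with equal firing strengths, a plain average of the means would give 1.0, whereas the wider fuzzy weight pulls the centroid to 1.5.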

Learning Algorithm of the SC-FCMAC
To complete the description of the SC-FCMAC model (Lee et al., 2007a), this section reviews its self-constructing learning algorithm, which consists of an input space partition scheme (i.e., scheme 1) and a parameter-learning scheme (i.e., scheme 2). First, the input space partition scheme determines a proper partitioning of the input space and finds the mean and the width of each receptive field function. This scheme is based on the self-clustering method (SCM), which adapts to the distribution of the input training data. Second, the parameter-learning scheme is based on the gradient descent learning algorithm.
To minimize a given cost function, the receptive field functions and the fuzzy weights are adjusted using the back-propagation algorithm. According to the requirements of the system, these parameters are given proper values to represent the memory information. For the initial system, the tuning parameters $m_j^w$ and $\sigma_j^w$ of the fuzzy weights are generated randomly, and the centers m and widths σ of the receptive field functions are generated by the SCM clustering method.

The input space partition scheme
The receptive field functions map input patterns into new features, so the discriminative ability of these features is determined by the centers of the receptive field functions. To achieve good classification, the centers are best selected for their ability to provide large class separation.
An input space partition scheme, the SCM, is used to implement scatter partitioning of the input space. The SCM is a fast, one-pass algorithm that requires no optimization; it dynamically estimates the number of hypercube cells in a data set and finds the current centers of the hypercube cells in the input space. It is a distance-based connectionist clustering algorithm. In any hypercube cell, the maximum distance between an example point and the hypercube cell center does not exceed a threshold value, which is set as a clustering parameter and which affects the number of hypercube cells that are estimated.
In the clustering process, the data examples come from a data stream, and the process starts with an empty set of hypercube cells. When a new hypercube cell is created, its center C is defined, and its hypercube cell distance Dc and hypercube cell width Wd are initially set to zero. As more samples are presented one after another, some existing hypercube cells are updated by moving their centers and increasing their hypercube cell distances and widths. Which hypercube cell is updated, and by how much, depends on the position of the current example in the input space. A hypercube cell is no longer updated once its hypercube cell distance Dc reaches the threshold value $D_{thr}$.
Figure 3 shows a brief clustering process using the SCM in a two-input space; the detailed clustering process can be found in (Lee et al., 2007a). In this way, the maximum distance from any hypercube cell center to the examples that belong to that cell is not greater than the threshold value $D_{thr}$, even though the algorithm keeps no information on past examples. The centers and the jump positions of the receptive field functions are then defined from the resulting hypercube cells (Lee et al., 2007a).
The threshold parameter $D_{thr}$ is an important parameter of the input space partition scheme. A low threshold value leads to the learning of fine clusters (many hypercube cells are generated), whereas a high threshold value leads to the learning of coarse clusters (fewer hypercube cells are generated). The selection of $D_{thr}$ therefore critically affects the simulation results, and its value is determined by practical experimentation or trial and error.
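The clustering rules are only described qualitatively above. The following Python function is a minimal one-pass, distance-threshold clusterer modelled on the evolving clustering method (ECM) of Kasabov and Song, which the SCM description closely resembles (create, update, or do nothing for each incoming example, with cell distances capped at $D_{thr}$). It is a sketch under that assumption, not the SCM itself: the exact SCM update rules, including how the per-dimension widths Wd are maintained, are given in Lee et al. (2007a) and may differ. The name `scm_like_clustering` is hypothetical, and only the centers $C_j$ and distances $Dc_j$ are tracked.

```python
import math
from typing import List, Sequence, Tuple

def scm_like_clustering(samples: Sequence[Sequence[float]],
                        d_thr: float) -> Tuple[List[List[float]], List[float]]:
    """One-pass, distance-threshold clustering in the spirit of the SCM (ECM-style sketch).

    Returns hypercube cell centers C_j and cell distances Dc_j, with every Dc_j <= d_thr.
    """
    centers: List[List[float]] = []
    dc: List[float] = []
    for x in samples:
        if not centers:
            # Empty set of cells: the first example creates cell C_1 with Dc = 0.
            centers.append(list(x))
            dc.append(0.0)
            continue
        dist = [math.dist(x, c) for c in centers]
        # Example already lies inside an existing cell: do nothing.
        if any(d <= r for d, r in zip(dist, dc)):
            continue
        # Candidate cell minimizes (distance to x) + (its current Dc).
        s = [d + r for d, r in zip(dist, dc)]
        a = min(range(len(s)), key=s.__getitem__)
        if s[a] > 2.0 * d_thr:
            # Enlarging any cell would push Dc past the threshold: create a new cell.
            centers.append(list(x))
            dc.append(0.0)
        else:
            # Update cell a: its new Dc is s/2, and its center slides toward x so that
            # the distance from the new center to x equals the new Dc.
            new_r = s[a] / 2.0
            t = (dist[a] - new_r) / dist[a]   # dist[a] > dc[a] >= 0, so no division by zero
            centers[a] = [c + t * (xi - c) for c, xi in zip(centers[a], x)]
            dc[a] = new_r
    return centers, dc

# Two nearby points share one cell; a distant point opens a second cell.
cells, radii = scm_like_clustering([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)], d_thr=1.0)
```

Because a moved center shifts by only $(d_a - Dc_a)/2$ while its distance grows by the same amount, examples already absorbed by a cell stay inside it, which mirrors the property stated above that every example stays within $D_{thr}$ of its cell center although no past examples are stored.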

The parameter-learning scheme
In the parameter-learning scheme, four adjustable parameters ($m_{ij}$, $\sigma_{ij}$, $m_j^w$, and $\sigma_j^w$) need to be tuned. The parameter-learning algorithm of the SC-FCMAC model uses the supervised gradient descent method to modify these parameters. Considering the single-output case for clarity, our goal is to minimize the cost function E, defined as

$$E(t) = \frac{1}{2}\left[y_d(t) - y(t)\right]^2, \qquad (9)$$

where $y_d(t)$ denotes the desired output at time t and y(t) denotes the actual output at time t. The parameter-learning algorithm, based on back-propagation, is defined as follows.
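The fuzzy weight cells are updated according to the following equations:

$$m_j^w(t+1) = m_j^w(t) + \Delta m_j^w(t), \qquad \sigma_j^w(t+1) = \sigma_j^w(t) + \Delta\sigma_j^w(t),$$

where the increments are computed by back-propagation with η, the learning rate of the mean and the variance for the fuzzy weight functions (between 0 and 1), and e, the error between the desired output and the actual output, $e = y_d - y$.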
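The explicit increments are not legible in this text. As a hedged reconstruction (our own derivation, which may differ in constants or notation from Lee et al., 2007a), applying $\Delta\theta = -\eta\,\partial E/\partial\theta$ to $E = \tfrac{1}{2}e^2$, with $e = y_d - y$ and the volume-based output $y = \sum_j \alpha_j \sigma_j^w m_j^w / \sum_k \alpha_k \sigma_k^w$, gives

$$\Delta m_j^w = \eta\, e\, \frac{\alpha_j\, \sigma_j^w}{\sum_{k=1}^{N_L} \alpha_k \sigma_k^w}, \qquad \Delta\sigma_j^w = \eta\, e\, \frac{\alpha_j \left(m_j^w - y\right)}{\sum_{k=1}^{N_L} \alpha_k \sigma_k^w}.$$

The first increment moves each active weight's mean in proportion to its normalized firing strength. The second widens a weight (increasing its pull on the centroid) when its mean lies on the same side of y as the target, and narrows it otherwise.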
The receptive field functions are updated according to the following equations:

$$m_{ij}(t+1) = m_{ij}(t) + \Delta m_{ij}(t), \qquad \sigma_{ij}(t+1) = \sigma_{ij}(t) + \Delta\sigma_{ij}(t),$$

where i denotes the i-th input dimension for $i = 1, 2, \ldots, N_D$, $m_{ij}$ denotes the mean of the receptive field functions, and $\sigma_{ij}$ denotes the variance of the receptive field functions. The parameters of the receptive field functions are updated by the amounts $\Delta m_{ij}$ and $\Delta\sigma_{ij}$, where η is the learning rate of the mean and the variance for the receptive field functions.
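The update rules can be sketched as one gradient-descent step in Python. This is a minimal sketch assuming the cost $E = \tfrac{1}{2}(y_d - y)^2$, Gaussian receptive fields $\exp[-(x_i - m_{ij})^2/\sigma_{ij}^2]$, and the volume-based output; the receptive-field increments follow from the chain rule through $\partial\alpha_j/\partial m_{ij} = 2\alpha_j (x_i - m_{ij})/\sigma_{ij}^2$ and $\partial\alpha_j/\partial\sigma_{ij} = 2\alpha_j (x_i - m_{ij})^2/\sigma_{ij}^3$. The helpers `forward` and `sgd_step` are hypothetical names, and the exact expressions in Lee et al. (2007a) may differ.

```python
import math
from typing import List, Sequence, Tuple

def forward(x: Sequence[float], m: List[List[float]], sigma: List[List[float]],
            w_mean: List[float], w_sigma: List[float]) -> Tuple[float, List[float], float]:
    """Return (y, alpha, den) for the volume-based SC-FCMAC output."""
    alpha = [math.exp(-sum(((xi - mij) / sij) ** 2 for xi, mij, sij in zip(x, m_j, s_j)))
             for m_j, s_j in zip(m, sigma)]
    den = sum(a * ws for a, ws in zip(alpha, w_sigma))
    y = sum(a * wm * ws for a, wm, ws in zip(alpha, w_mean, w_sigma)) / den
    return y, alpha, den

def sgd_step(x: Sequence[float], y_d: float, m: List[List[float]], sigma: List[List[float]],
             w_mean: List[float], w_sigma: List[float], eta: float = 0.1) -> float:
    """One gradient-descent step on E = 0.5 * (y_d - y)**2, updating the lists in place.

    Every increment uses the pre-update y, alpha and den, so all parameters move
    simultaneously along the negative gradient. Returns e = y_d - y before the step.
    """
    y, alpha, den = forward(x, m, sigma, w_mean, w_sigma)
    e = y_d - y
    for j, a_j in enumerate(alpha):
        dy_dalpha = w_sigma[j] * (w_mean[j] - y) / den     # dy/d(alpha_j)
        d_wm = eta * e * a_j * w_sigma[j] / den              # Delta m_j^w
        d_ws = eta * e * a_j * (w_mean[j] - y) / den         # Delta sigma_j^w
        for i, xi in enumerate(x):
            diff = xi - m[j][i]
            s = sigma[j][i]
            # Chain rule through alpha_j = prod_i exp(-(diff / s)**2).
            m[j][i] += eta * e * dy_dalpha * a_j * 2.0 * diff / s ** 2
            sigma[j][i] += eta * e * dy_dalpha * a_j * 2.0 * diff ** 2 / s ** 3
        w_mean[j] += d_wm
        w_sigma[j] += d_ws
    return e
```

In a training loop, `sgd_step` would be called once per pattern per epoch; with η between 0 and 1 as stated above, the widths should be monitored so that they stay positive.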

An example: Learning chaotic behaviors
A nonlinear system y(t) with chaotic behavior (Wang, 1994) is considered. We solved the differential Eqs. (18) and (19) for t from t = 0 to t = 20 with $x_1(0) = 1.0$ and $x_2(0) = 1.0$, and obtained 107 values of $x_1(t)$ and $x_2(t)$ (the chaotic glycolytic oscillator, Wang, 1994) together with 107 values of y(t). The training patterns were formed as $[x_1(t_p), x_2(t_p); y(t_p)]$ for p = 1, 2, ..., 107. For this chaotic problem, the initial parameters η = 0.1 and $D_{thr}$ = 1.3 were chosen. First, the SCM clustering method produced three hypercube cells. The learning scheme then entered parameter learning using the back-propagation algorithm. The parameter training process continued for 200 epochs, and the final trained rms (root mean square) error was 0.000474. The number of training epochs was determined by practical experimentation or trial-and-error tests.
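The rms error is reported without an explicit definition. Assuming the conventional definition over the P = 107 training patterns, with $y_d(t_p)$ the desired output and $y(t_p)$ the SC-FCMAC output for pattern p, it reads

$$\mathrm{RMS} = \sqrt{\frac{1}{P}\sum_{p=1}^{P}\bigl[y_d(t_p) - y(t_p)\bigr]^2}, \qquad P = 107.$$

Under this definition, the reported value of 0.000474 corresponds to a typical per-pattern deviation on the order of $5 \times 10^{-4}$ in the units of y(t).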
We compared the SC-FCMAC model with other models (Lin et al., 2004; Lin et al., 2001). Figure 5(a) shows the learning curves of the SC-FCMAC model, the FCMAC model (Lin et al., 2004), and the SCFNN model (Lin et al., 2001). As shown in this figure, the learning curve of our method reaches a lower rms error. Trajectories of the desired output y(t) and the SC-FCMAC model's output are shown in Figs. 5(b)-5(d). A comparison of the SC-FCMAC model, the FCMAC model (Lin et al., 2004), and the SCFNN model (Lin et al., 2001) is presented in Table 1. It can be concluded that the proposed model obtains better results than these two existing models.

The parametric fuzzy CMAC (P-FCMAC)
In this section, the architecture and learning algorithms of the parametric FCMAC (P-FCMAC, Lin & Lee, 2009) are reviewed; the P-FCMAC is mainly derived from the traditional CMAC and the Takagi-Sugeno-Kang (TSK) parametric fuzzy inference system (Sugeno & Kang, 1988; Takagi & Sugeno, 1985). Because the P-FCMAC inherits the SCM as its input-space partition scheme, it is expected by design to perform better than the SC-FCMAC. Therefore, a different system-identification problem is used to explore the benefits of the P-FCMAC more fully and more fairly.

Architecture of the P-FCMAC model
As illustrated in Fig. 6, the P-FCMAC model consists of input space partition, association memory selection, and defuzzification. Like the conventional CMAC network, the P-FCMAC network approximates a nonlinear function $y = f(\mathbf{x})$ by two primary mappings, $S(\mathbf{x})$ and $P(\boldsymbol{\alpha})$, which are realized by fuzzy operations. The function $S(\mathbf{x})$ maps each point $\mathbf{x}$ in the input space onto an association vector $\boldsymbol{\alpha} = S(\mathbf{x}) \in A$ that has $N_L$ nonzero elements ($N_L < N_A$). Unlike in the conventional CMAC network, the association vector $\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_{N_A})$, with $0 \le \alpha_j \le 1$ for all components, is derived from the composition of the receptive field functions and the sensory inputs. Moreover, the input state $\mathbf{x}$ addresses several hypercubes, and the value of each hypercube is calculated as the product of the strengths of the receptive field functions for each input state. In the P-FCMAC network, Gaussian basis functions are used as the receptive field functions, and a linear parametric equation of the network input variables is used as the TSK-type output for learning. The learned information is stored in the receptive field functions and the TSK-type output vectors. A one-dimensional Gaussian basis function is defined as in Eq. (3). Similar to section 2.2, for an $N_D$-dimensional problem the Gaussian basis function with $N_D$ dimensions is given by Eq. (4), so that the value of the j-th hypercube is

$$\alpha_j = \prod_{i=1}^{N_D} \exp\!\left[-\frac{(x_i - m_{ij})^2}{\sigma_{ij}^2}\right].$$
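Each element of the receptive field functions is inferred to produce a partial fuzzy output by using the value of its corresponding association vector element as the input matching degree. The partial fuzzy outputs are defuzzified into a scalar output y by the centroid of area (COA) approach. The actual output y is then derived as

$$y = \frac{\sum_{j=1}^{N_L} \alpha_j\, \mathrm{TSK}_j}{\sum_{j=1}^{N_L} \alpha_j},$$

where $\mathrm{TSK}_j$ is the j-th element of the TSK-type output vector.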

The j-th element of the TSK-type output vector is described as

$$\mathrm{TSK}_j = a_{0j} + \sum_{i=1}^{N_D} a_{ij}\, x_i, \qquad j = 1, 2, \ldots, N_L,$$

where $a_{0j}$ and $a_{ij}$ denote scalar values, $N_D$ the number of input dimensions, $N_L$ the number of hypercube cells, and $x_i$ the i-th input dimension. Based on this structure, a learning algorithm is proposed to determine the proper network structure and its adjustable parameters.
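A minimal Python sketch of the P-FCMAC forward pass follows, combining product-of-Gaussians hypercube values, linear TSK-type consequents, and COA defuzzification. It assumes the Gaussian form $\exp[-(x_i - m_{ij})^2/\sigma_{ij}^2]$ for Eqs. (3) and (4); the function name `p_fcmac_output` and its argument layout are hypothetical, not the interface of Lin & Lee (2009).

```python
import math
from typing import List, Sequence

def p_fcmac_output(x: Sequence[float], m: List[List[float]], sigma: List[List[float]],
                   a0: List[float], a: List[List[float]]) -> float:
    """Forward pass of a P-FCMAC-style network (sketch).

    m, sigma, a: N_L x N_D nested lists; a0: list of length N_L. Each hypercube
    cell j has Gaussian receptive fields (m[j], sigma[j]) and a linear TSK consequent.
    """
    # Hypercube values: product of the 1-D Gaussian receptive field strengths.
    alpha = [math.exp(-sum(((xi - mij) / sij) ** 2 for xi, mij, sij in zip(x, m_j, s_j)))
             for m_j, s_j in zip(m, sigma)]
    # TSK_j = a_0j + sum_i a_ij * x_i
    tsk = [a0_j + sum(aij * xi for aij, xi in zip(a_j, x)) for a0_j, a_j in zip(a0, a)]
    # Centroid of area: firing-strength-weighted average of the TSK outputs.
    return sum(al * t for al, t in zip(alpha, tsk)) / sum(alpha)

# A single cell reproduces its own linear consequent: 1.0 + 0.5 * 2.0
y = p_fcmac_output([2.0], [[0.0]], [[1.0]], [1.0], [[0.5]])
```

Unlike the SC-FCMAC, whose fuzzy weights are constants, each P-FCMAC cell contributes a value that varies with $\mathbf{x}$ through $\mathrm{TSK}_j$, so even a single cell can represent a linear trend.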

Learning algorithm of the P-FCMAC model
Similar to the SC-FCMAC model, the learning algorithm of the P-FCMAC consists of an input space partition scheme and a parameter learning scheme. Since the same SCM method is applied for input space partitioning, the following paragraphs focus on the parameter learning scheme of the P-FCMAC, which is illustrated in Fig. 7.
First, the input space partition scheme (i.e., scheme 1) is used to determine a proper input space partitioning and to find the mean and the width of each receptive field function. The input space partition is based on the SCM, which adapts to the distribution of the input training data. After the SCM, the number of hypercube cells is determined; that is, the initial m and σ of the receptive field functions are obtained by the SCM. Second, the parameter learning scheme (i.e., scheme 2) is based on supervised learning. The gradient descent learning algorithm is used to adjust the free parameters: to minimize a given cost function, the m and σ of the receptive field functions and the parameters $a_{0j}$ and $a_{ij}$ of the TSK-type output vector are adjusted using the backpropagation algorithm. According to the requirements of the system, these parameters are given proper values to represent the memory information. For the initial system, the tuning parameters $a_{0j}$ and $a_{ij}$ of the TSK-type output vector are generated randomly, and the m and σ of the receptive field functions are generated by the SCM clustering method.

Fig. 7. Flowchart of the P-FCMAC model's learning scheme.
In the parameter learning scheme, four kinds of parameters need to be tuned, i.e., $m_{ij}$, $\sigma_{ij}$, $a_{0j}$, and $a_{ij}$. The total number of tuning parameters for the multi-input single-output P-FCMAC network is $2N_D N_L + 4N_L$, where $N_D$ and $N_L$ denote the number of inputs and the number of hypercube cells, respectively. The parameter learning algorithm of the P-FCMAC network uses the supervised gradient descent method to modify these parameters. Considering the single-output case for clarity, our goal is to minimize the cost function E, defined as in Eq. (9).
The parameter learning algorithm, based on backpropagation, is described in detail as follows. The TSK-type outputs are updated according to the following equations:

$$a_{0j}(t+1) = a_{0j}(t) + \Delta a_{0j}(t), \qquad a_{ij}(t+1) = a_{ij}(t) + \Delta a_{ij}(t),$$

where $a_{0j}$ denotes the scalar term, $a_{ij}$ the scalar coefficient of the i-th input dimension, and j the j-th element of the TSK-type output vector for $j = 1, 2, \ldots, N_L$. The elements of the TSK-type output vectors are updated by the amounts $\Delta a_{0j}$ and $\Delta a_{ij}$, where η is the learning rate, between 0 and 1, and e is the error between the desired output and the actual output, $e = y_d - y$.
The receptive field functions are updated according to the following equations:

$$m_{ij}(t+1) = m_{ij}(t) + \Delta m_{ij}(t), \qquad \sigma_{ij}(t+1) = \sigma_{ij}(t) + \Delta\sigma_{ij}(t),$$

where i denotes the i-th input dimension for $i = 1, 2, \ldots, N_D$, $m_{ij}$ the mean of the receptive field functions, and $\sigma_{ij}$ the variance of the receptive field functions. The parameters of the receptive field functions are updated by the amounts $\Delta m_{ij}$ and $\Delta\sigma_{ij}$, where η is the learning rate of the mean and the variance for the receptive field functions.
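The increments themselves are not legible here. Assuming $E = \tfrac{1}{2}e^2$ and the COA output $y = \sum_j \alpha_j \mathrm{TSK}_j / \sum_k \alpha_k$, plain gradient descent gives $\Delta a_{0j} = \eta e\, \alpha_j / \sum_k \alpha_k$, $\Delta a_{ij} = \eta e\, \alpha_j x_i / \sum_k \alpha_k$, and receptive-field increments that pass through $\partial y/\partial\alpha_j = (\mathrm{TSK}_j - y)/\sum_k \alpha_k$. The Python sketch below implements one such step; it is our own derivation under those assumptions, the names `p_fcmac_forward` and `p_fcmac_step` are hypothetical, and the exact expressions of Lin & Lee (2009) may differ.

```python
import math
from typing import List, Sequence, Tuple

def p_fcmac_forward(x: Sequence[float], m: List[List[float]], sigma: List[List[float]],
                    a0: List[float], a: List[List[float]]) -> Tuple[float, List[float], List[float], float]:
    """Return (y, alpha, tsk, sum_alpha) for the COA output of a P-FCMAC-style network."""
    alpha = [math.exp(-sum(((xi - mij) / sij) ** 2 for xi, mij, sij in zip(x, m_j, s_j)))
             for m_j, s_j in zip(m, sigma)]
    tsk = [a0_j + sum(aij * xi for aij, xi in zip(a_j, x)) for a0_j, a_j in zip(a0, a)]
    s = sum(alpha)
    y = sum(al * t for al, t in zip(alpha, tsk)) / s
    return y, alpha, tsk, s

def p_fcmac_step(x: Sequence[float], y_d: float, m: List[List[float]], sigma: List[List[float]],
                 a0: List[float], a: List[List[float]], eta: float = 0.01) -> float:
    """One gradient-descent step on E = 0.5 * (y_d - y)**2, updating the lists in place.

    All increments use pre-update quantities. Returns e = y_d - y before the step.
    """
    y, alpha, tsk, s = p_fcmac_forward(x, m, sigma, a0, a)
    e = y_d - y
    for j, al in enumerate(alpha):
        g = al / s                      # dy/d(a_0j); dy/d(a_ij) = g * x_i
        dy_dalpha = (tsk[j] - y) / s    # dy/d(alpha_j)
        a0[j] += eta * e * g
        for i, xi in enumerate(x):
            diff = xi - m[j][i]
            sg = sigma[j][i]
            a[j][i] += eta * e * g * xi
            # Chain rule through alpha_j = prod_i exp(-(diff / sg)**2).
            m[j][i] += eta * e * dy_dalpha * al * 2.0 * diff / sg ** 2
            sigma[j][i] += eta * e * dy_dalpha * al * 2.0 * diff ** 2 / sg ** 3
    return e
```

The default `eta=0.01` simply mirrors the learning rate quoted in the identification example that follows; it is not a recommendation from the original work.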

An example: Identification of a nonlinear system
In this example, a nonlinear system containing an unknown nonlinear function, which is approximated by the P-FCMAC network as shown in Fig. 8(b), is modelled. First, training data from the unknown function are collected for an off-line initial learning process of the P-FCMAC network. After off-line learning, the trained P-FCMAC network replaces the unknown nonlinear function in the nonlinear system for on-line testing.
Consider the nonlinear system in (Wang, 1994) governed by the difference equation

$$y(k+1) = 0.3\,y(k) + 0.6\,y(k-1) + f(u(k)),$$

where f(·) is the unknown nonlinear function. The error is defined as in Eq. (9). In this example, the initial threshold value in the SCM is 0.15, and the learning rate is η = 0.01.
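The series-parallel identification setup can be sketched in Python. The plant recursion and the on-line test input $u(k) = \sin(2\pi k/250)$ follow this example; everything else is assumed. The stand-in nonlinearity `f_benchmark` is the form commonly paired with this plant in the identification literature (Narendra and Parthasarathy's benchmark), not a function given in this chapter, which treats f as unknown, and the zero initial conditions $y(0) = y(1) = 0$ are likewise an assumption. In the series-parallel model, each one-step prediction uses the measured plant outputs rather than the model's own past predictions, so only the estimate of f must be learned.

```python
import math
from typing import Callable, List

def f_benchmark(u: float) -> float:
    """Hypothetical stand-in for the unknown f (benchmark form, not from the chapter)."""
    return (0.6 * math.sin(math.pi * u) + 0.3 * math.sin(3 * math.pi * u)
            + 0.1 * math.sin(5 * math.pi * u))

def u_test(k: int) -> float:
    """On-line test input u(k) = sin(2*pi*k/250)."""
    return math.sin(2 * math.pi * k / 250)

def simulate_plant(f: Callable[[float], float], n: int) -> List[float]:
    """y(k+1) = 0.3 y(k) + 0.6 y(k-1) + f(u(k)), from assumed y(0) = y(1) = 0."""
    y = [0.0, 0.0]
    for k in range(1, n - 1):
        y.append(0.3 * y[k] + 0.6 * y[k - 1] + f(u_test(k)))
    return y

def series_parallel(f_hat: Callable[[float], float], y: List[float]) -> List[float]:
    """One-step predictions of y(2), ..., y(n-1) using the measured outputs y(k), y(k-1)."""
    return [0.3 * y[k] + 0.6 * y[k - 1] + f_hat(u_test(k)) for k in range(1, len(y) - 1)]

y = simulate_plant(f_benchmark, 500)
# A trained network would be plugged in as f_hat; a perfect estimate reproduces y exactly.
pred = series_parallel(f_benchmark, y)
```

Because the linear part of the recursion is known, the identification error of the series-parallel model at step k+1 equals the error of the function estimate at $u(k)$, which is why the network only needs to learn the single-input map $u \mapsto f(u)$.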
After the SCM clustering process, eleven hypercube cells are generated. Using the scheme-1 and scheme-2 parameter learning methods, the final trained errors of the output are approximately 0.00057 and 0.00024, respectively, after 300 epochs. The number of adjustable parameters of the trained P-FCMAC network is 66. The outputs of the nonlinear system and of the P-FCMAC network are shown in Figs. 9(a) and (c) for the scheme-1 and scheme-2 methods, and the errors between the desired output and the P-FCMAC network output are shown in Figs. 9(b) and (d). The learning curves of the scheme-1 and scheme-2 methods are shown in Fig. 10.

Table 2. Training data and approximated data obtained using the P-FCMAC network for 300 epochs.

Table 3 compares the learning results of various models; the previous results are taken from (Wan & Li, 2003; Wang et al., 1995; Farag et al., 1998; Juang et al., 2000). The very compact fuzzy system obtained by the P-FCMAC network performs better than all of these previous works.
www.intechopen.com

Conclusions
In this chapter, starting from a discussion of the traditional CMAC approach, two novel, recently developed fuzzy CMACs are reviewed. After summarizing the drawbacks of the CMAC model, related improvements made in the literature have been presented. The exhibited self-constructing FCMAC (SC-FCMAC) and parametric FCMAC (P-FCMAC) not only demonstrate the inference ability of FCMACs but also represent the state of the art in the field of fuzzy inference systems.

Fig. 3. A brief clustering process using the SCM with samples $P_1$ to $P_9$ in a 2-D space. (Notations: $P_i$ for pattern, $C_j$ for hypercube cell center, $Dc_j$ for hypercube cell distance, $Wd_{j\_x}$ for the x-dimension hypercube cell width, and $Wd_{j\_y}$ for the y-dimension hypercube cell width.) (a) The example $P_1$ causes the SCM to create a new hypercube cell center $C_1$. (b) $P_2$: update hypercube cell center $C_1$; $P_3$: create a new hypercube cell center $C_2$; $P_4$: do nothing. (c) $P_5$: update hypercube cell $C_1$; $P_6$: do nothing; $P_7$: update hypercube cell center $C_2$; $P_8$: create a new hypercube cell $C_3$. (d) $P_9$: update hypercube cell $C_1$.
In the fuzzy weight update equations, j denotes the j-th fuzzy weight cell for $j = 1, 2, \ldots, N_L$, $m_j^w$ the mean of the fuzzy weights, and $\sigma_j^w$ the variance of the fuzzy weights. Figure 4 shows y(t), which is the desired function to be learned by the SC-FCMAC model.

For on-line testing, we assume that the series-parallel model shown in Fig. 8(b) is driven by $u(k) = \sin(2\pi k/250)$. The test results of the P-FCMAC network are shown in Fig. 9.

Fig. 8. The series-parallel identification model. (a) Off-line learning with twenty-one training data in Table IV; (b) on-line testing with the input $u(k) = \sin(2\pi k/250)$.

Fig. 9. Comparison of simulation results. (a) Outputs of the nonlinear system (solid line) and the identification model using the proposed network (dotted line) for the scheme 1 method. (b) Identification error of the approximated model for the scheme 1 method. (c) Outputs of the nonlinear system (solid line) and the identification model using the proposed network (dotted line) for the scheme 2 method. (d) Identification error of the approximated model for the scheme 2 method.

Fig. 10. Learning curves for the scheme 1 and scheme 2 parameter learning methods.

Table 1. Comparisons of the SC-FCMAC model with some existing models for dynamic system identification.

Table 3. Comparison results for the twenty-one training data used in off-line learning.