Heuristic Search Applied to Fuzzy Cognitive Maps Learning

Fuzzy Cognitive Maps (FCMs) were initially proposed by Kosko [1-3] as an extension of the cognitive maps proposed by Axelrod [4]. A FCM is a graph used for representing causal relationships among concepts that stand for the states and variables of a system, emulating the cognitive knowledge of experts in a specific area. FCMs can be interpreted as a combination of Fuzzy Logic and Neural Networks, because they combine the fuzzy rules of Fuzzy Logic with the learning capability of Neural Networks. A FCM describes the behavior of a knowledge-based system in terms of concepts, where each concept represents an entity, a state, a variable, or a characteristic of the system. The human knowledge and experience about the system determine the type and the number of the nodes, as well as the initial conditions of the FCM.


Introduction
The knowledge of the experts first defines the influence of one concept on another, determining the causality relationships. Then, the concept values are qualitatively obtained through linguistic terms, such as strong, weak, null, and so on. These linguistic variables are transformed into numerical values using a defuzzification method, for instance, the center of gravity scheme described in [2].
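As a concrete illustration, the center-of-gravity (centroid) scheme reduces a fuzzy linguistic value to a crisp weight as the weighted mean of its membership function. The Python sketch below shows the idea; the triangular membership and its peak at 0.7 are illustrative assumptions, not values taken from [2].

```python
import numpy as np

def cog_defuzzify(x, mu):
    """Center-of-gravity defuzzification: weighted mean of the support."""
    x, mu = np.asarray(x, float), np.asarray(mu, float)
    return float(np.sum(x * mu) / np.sum(mu))

# Illustrative example: a triangular membership function for a linguistic
# term such as "strong positive influence", peaking at 0.7 on [0.4, 1.0].
x = np.linspace(0.4, 1.0, 61)
mu = np.maximum(0.0, 1.0 - np.abs(x - 0.7) / 0.3)
w = cog_defuzzify(x, mu)  # crisp weight at the centroid of the triangle
```

For a symmetric membership function, the centroid coincides with the peak, so `w` evaluates to 0.7 here.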
Hence, in general, experts develop a FCM by identifying key concepts, defining the causal relationships among the concepts, and estimating the strength of these relationships. However, when the experts are not able to express the causal relationships, or when they substantially diverge in opinion about them, data-driven methods for learning FCMs may be necessary.
In particular, this Chapter focuses on FCM learning using three different population-based metaheuristics: particle swarm optimization (PSO), genetic algorithm (GA) and differential evolution (DE). Two process control problems, described in [8] and [9], are considered in this work. A complete convergence analysis of PSO, GA and DE is carried out, considering 10000 realizations of each algorithm in every scenario of the studied processes.
The rest of the Chapter is organized as follows: Section 2 briefly describes the FCM modeling and the processes to be controlled. Section 3 considers the PSO, GA and DE approaches for FCM learning, while Section 4 shows the simulation results. Lastly, Section 5 points out the main conclusions.

FCM modeling in control processes
In FCMs, concepts (nodes) are utilized to represent different aspects and behaviors of the system, and the system dynamics are simulated by the interaction of concepts. Concepts are interconnected according to the underlying causal relationships among the factors, characteristics, and components that constitute the system. Each interconnection between two concepts, C_i and C_j, has a weight, W_{i,j}, whose magnitude represents the strength of the causal relationship between C_i and C_j, while its sign indicates the nature of the causality. Hence, if:
• W_{i,j} > 0, there is positive causality: an increase in C_i causes an increase in C_j;
• W_{i,j} < 0, there is negative causality: an increase in C_i causes a decrease in C_j;
• W_{i,j} = 0, there is no causal relationship between C_i and C_j.
The number of concepts and the initial weights of the FCM are determined by human knowledge and experience. The numerical value, A_i, of each concept is a transformation of the fuzzy values assigned by the experts. The FCM converges to a steady state (or a limit cycle) according to the scheme proposed in [3]:

A_i(k+1) = f( A_i(k) + Σ_{j=1, j≠i}^{N} W_{j,i} A_j(k) ),   f(x) = 1 / (1 + e^{−λx}),   (1)

where k is the iteration index, f(·) is the sigmoid function that guarantees the values A_i ∈ [0, 1], and λ > 0 is a parameter representing the learning memory. In this work, λ = 1 has been adopted.
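The iterative scheme in Equation (1) can be sketched in a few lines of Python. The stopping tolerance and the assumption of a zero diagonal in W are implementation choices, not part of the original formulation:

```python
import numpy as np

def fcm_step(A, W, lam=1.0):
    """One FCM update: A_i(k+1) = f(A_i(k) + sum_{j!=i} W_ji * A_j(k)).
    W[j, i] is the weight from concept C_j to C_i; diagonal assumed zero."""
    x = A + W.T @ A
    return 1.0 / (1.0 + np.exp(-lam * x))  # sigmoid keeps values in (0, 1)

def fcm_run(A0, W, lam=1.0, tol=1e-6, max_iter=1000):
    """Iterate the map until the concept vector reaches a steady state."""
    A = np.asarray(A0, dtype=float)
    for _ in range(max_iter):
        A_next = fcm_step(A, W, lam)
        if np.max(np.abs(A_next - A)) < tol:
            return A_next
        A = A_next
    return A
```

With all weights equal to zero, each concept converges to the fixed point of the sigmoid (about 0.659 for λ = 1), which illustrates why the steady state depends on W and not only on the initial conditions.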

First control system (PROC1)
A simple chemical process frequently considered in the literature [8, 13, 14] is initially selected for illustrating the need for a FCM learning technique. Figure 1 represents the process (PROC1), consisting of one tank and three valves that control the liquid level in the tank.
Valves V_1 and V_2 fill the tank with different liquids. A chemical reaction takes place inside the tank, producing a new liquid that leaves the recipient through valve V_3. A sensor (gauger) measures the specific gravity of the resultant mixture. When the value of the specific gravity, G, is in the range [G_min, G_max], the desired liquid has been produced. Similarly, the height of the liquid inside the tank, H, must lie in the range [H_min, H_max]. Hence, the controller has to keep G and H within their bounds, i.e., G_min ≤ G ≤ G_max and H_min ≤ H ≤ H_max.
The group of experts defined a list of five concepts, C_i, i = 1, 2, ..., 5, related to the main physical quantities of the process [8]:
• Concept C_1: volume of liquid inside the tank (depends on V_1, V_2, and V_3);
• Concept C_2: state of V_1 (closed, open or partially open);
• Concept C_3: state of V_2 (closed, open or partially open);
• Concept C_4: state of V_3 (closed, open or partially open);
• Concept C_5: specific gravity of the produced mixture.
For this process, the fuzzy cognitive map in Figure 2 can be abstracted [8]. The experts also had a consensus regarding the range of the weights between concepts, as presented in Equations (5a) to (5h). For this problem the following weight matrix is obtained: According to [8], all the experts agreed on the range of values for W_{2,1}, W_{3,1}, and W_{4,1}, and most of them agreed on the same range for W_{1,2} and W_{1,3}. However, regarding the weights W_{1,5}, W_{5,2}, and W_{5,4}, their opinions varied significantly.
Finally, the group of experts determined that the values of the output concepts, C_1 and C_5, which are crucial for the system operation, must lie, respectively, in the following regions:

Second control system (PROC2)
In [9], a system consisting of two identical tanks is considered, each with one input and one output valve, the output valve of the first tank being the input valve of the second (PROC2), as illustrated in Figure 3. The objective is to control the volume of liquid within the limits determined by the heights H_min and H_max, and the temperature of the liquid in both tanks within the limits T_min and T_max. The temperature of the liquid in tank 1 is increased by a heater. A temperature sensor continuously monitors the temperature in tank 1, turning the heater on or off. There is also a temperature sensor in tank 2. When T_2 decreases, the valve V_2 is opened and hot liquid comes into tank 2.
Based on this process, a FCM is constructed with eight concepts:
• Concept C_1: volume of liquid inside the tank 1 (depends on V_1 and V_2);
• Concept C_2: volume of liquid inside the tank 2 (depends on V_1 and V_2);
• Concept C_3: state of V_1 (closed, open or partially open);
• Concept C_4: state of V_2 (closed, open or partially open);
• Concept C_5: state of V_3 (closed, open or partially open);
• Concept C_6: temperature of the liquid in tank 1;
• Concept C_7: temperature of the liquid in tank 2;
• Concept C_8: operation of the heater.
According to [9], the fuzzy cognitive map in Figure 4 can be constructed. For PROC2, this Chapter assumes only causality constraints on the weights between concepts: the weights W_{4,1} and W_{5,2} lie in (−1, 0], while the others have positive causality. The weight matrix for PROC2 is given by Finally, the values of the output concepts, C_1, C_2, C_6 and C_7, which are crucial for the system operation, must lie, respectively, in the following regions: Two significant weaknesses of FCMs are their critical dependence on the experts' opinions and their potential convergence to undesired states. In order to handle these impairments, learning procedures can be incorporated, increasing the efficiency of FCMs. In this sense, heuristic optimization approaches have been deployed as effective learning methods in FCMs [15].

Heuristic FCM learning
A FCM construction can be done in the following manner: • Identification of the concepts and their interconnections, determining the nature (positive, negative or null) of the causal relationships between concepts.
• Initial data acquisition by the expert opinions and/or by an equation analysis when the mathematical system model is known.
• Submitting the data from the expert opinions to a fuzzy system whose output represents the weights of the FCM.
• Weight adaptation and optimization of the initially proposed FCM, adjusting its response to the desired output.
• Validation of the adjusted FCM.
This section focuses on the weight adaptation (FCM learning). In [16], a very interesting survey on FCM learning is provided. The FCM weight optimization (FCM learning) methods can be classified into three different categories.
In the Hebbian-based methodologies, the FCM weights are iteratively adapted based on a law which depends on the concepts' behavior [10], [17]. These algorithms require the experts' knowledge for the initial weight values. The differential Hebbian learning (DHL) algorithm proposed by Dickerson and Kosko is a classic example [10]. On the other hand, heuristic (metaheuristic) techniques try to find a proper W matrix by minimizing a cost function based on the error between the desired values of the output concepts and the current output concept values (13). The experts' knowledge is not strictly necessary, except for the causality constraints due to the physical restrictions. These techniques are optimization tools and generally are computationally complex. Examples of hybrid approaches combining Hebbian learning and heuristic optimization techniques can be found in [18], [19].
There are several works in the literature dealing with heuristic optimization learning, most of them population-based algorithms. For instance, in [8] the PSO algorithm with constriction factor is adopted; in [20] a FCM learning method based on a combination of Tabu Search (TS) and GA is presented; in [21] a variation of GA named RCGA (real-coded GA) is proposed; in [22] a comparison between GA and Simulated Annealing (SA) is carried out; and in [13] the authors present a GA-based algorithm named Extended Great Deluge Algorithm.
The purpose of the learning is to determine the values of the FCM weights that will produce a desired behavior of the system, which is characterized by M output concept values that lie within desired bounds determined by the experts. Hence, the main goal is to obtain a connection (or weight) matrix that leads the FCM to a steady state with output concept values within the specified region. Note that, with this notation, and defining A(k) = [A_1(k), ..., A_N(k)]^⊤ and W̄ = W^⊤ + I, with {·}^⊤ meaning transposition and I the identity matrix, Equation (1) can be compactly written as

A(k+1) = f( W̄ A(k) ).   (12)

After the updating procedure in (12), the following cost function is considered for obtaining the optimum weight matrix W [8]:

J(W) = Σ_{i=1}^{M} [ H(A_i^min − A_i^out) · |A_i^out − A_i^min| + H(A_i^out − A_i^max) · |A_i^out − A_i^max| ],   (13)

where H(·) is the Heaviside function, and A_i^out, i = 1, ..., M, represents the steady-state value of the ith output concept, required to lie in [A_i^min, A_i^max].
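The cost function (13) can be sketched in Python as follows; the array-based interface, with one entry per output concept, is an implementation convenience rather than part of the formulation in [8]:

```python
import numpy as np

def heaviside(x):
    """H(x) = 1 for x > 0, else 0."""
    return np.where(x > 0, 1.0, 0.0)

def fcm_cost(A_out, A_min, A_max):
    """Cost (13): penalize output concepts lying outside [A_min, A_max].
    All arguments are arrays with one entry per output concept."""
    A_out, A_min, A_max = map(np.asarray, (A_out, A_min, A_max))
    below = heaviside(A_min - A_out) * np.abs(A_out - A_min)
    above = heaviside(A_out - A_max) * np.abs(A_out - A_max)
    return float(np.sum(below + above))
```

A candidate weight matrix whose steady-state outputs all fall inside their bounds yields zero cost; any violation contributes its distance to the nearest bound.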

Particle Swarm Optimization
The PSO is a metaheuristic based on the movement of a population (swarm) of individuals (particles) randomly distributed in the search space, each one with its own position and velocity. The position of a particle is modified by the application of velocity in order to reach a better performance [23, 24]. In PSO, each particle is treated as a point in a W-dimensional space and represents a candidate vector. The ith particle position at instant t is represented as x_i(t) = [x_{i,1}(t), ..., x_{i,W}(t)]. In this Chapter, each component x_{i,w}(t) represents one of the W_{i,j} in the tth iteration. Each particle retains a memory of the best position it ever encountered. The best position among all particles until the tth iteration (best global position) is represented by x_g^best, while the best position of the ith particle is represented as x_i^best. As proposed in [25], the particles are manipulated according to the following velocity and position equations:

v_i(t+1) = ω v_i(t) + φ_1 U_{1i} (x_i^best − x_i(t)) + φ_2 U_{2i} (x_g^best − x_i(t)),
x_i(t+1) = x_i(t) + v_i(t+1),

where φ_1 and φ_2 are two positive constants representing the individual and global acceleration coefficients, respectively, U_{1i} and U_{2i} are diagonal matrices whose elements are random variables uniformly distributed (u.d.) in the interval [0, 1], and ω is the inertia weight that balances the global search (higher ω) and the local search (smaller ω).
A typical value for φ_1 and φ_2 is φ_1 = φ_2 = 2 [24]. Regarding the inertia weight, experimental results suggest that it is preferable to initialize ω to a large value and gradually decrease it.
The population size P is kept constant in all iterations. In order to obtain further diversification of the search universe, a factor V_max is added to the PSO model, which is responsible for limiting the velocity to the range [−V_max, V_max], allowing the algorithm to escape from possible local solutions.
Regarding the FCM, the ith candidate vector x_i is formed by the W FCM weights. It is important to point out that, after each particle update, restrictions must be imposed on W_{i,j} according to the experts' opinion, before the cost function evaluation.
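One PSO iteration for FCM learning can be sketched as follows. The velocity limit, the clipping of the weights to [−1, 1], and the fixed random seed are illustrative assumptions; in practice the sign constraints given by the experts must also be enforced on each W_{i,j}:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed only for reproducibility

def pso_step(X, V, p_best, g_best, w=0.7, phi1=2.0, phi2=2.0, v_max=0.5):
    """One velocity/position update for all P particles (rows of X)."""
    P, D = X.shape
    U1 = rng.random((P, D))          # u.d. in [0, 1], one draw per dimension
    U2 = rng.random((P, D))
    V = w * V + phi1 * U1 * (p_best - X) + phi2 * U2 * (g_best - X)
    V = np.clip(V, -v_max, v_max)    # velocity limiting with V_max
    X = X + V
    X = np.clip(X, -1.0, 1.0)        # FCM weights constrained to [-1, 1]
    return X, V
```

After each step, the positions would be evaluated with the cost function (13) and `p_best`/`g_best` updated accordingly.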

Genetic Algorithm
Genetic Algorithm is an optimization and search technique based on selection mechanisms and natural evolution, following Darwin's theory of the evolution of species, which explains the history of life through the action of physical processes and genetic operators on populations or species. GA allows a population composed of many individuals to evolve under specified selection rules to a state that optimizes the "fitness" (maximizes or minimizes a cost function). The algorithm became popular through the work of John Holland in the early 1970s, particularly his book Adaptation in Natural and Artificial Systems (1975). It can be implemented in a binary form or in a continuous (real-valued) form; this Chapter considers the latter case.
Initially, a set of P chromosomes (individuals) is randomly defined (uniformly distributed), where each chromosome x_i, i = 1, 2, ..., P, consists of a vector of variables to be optimized, which, in this case, is formed by the FCM weights, respecting the constraints. Each variable is represented by a continuous floating-point number. The P chromosomes are evaluated through a cost function.
The T strongest chromosomes are selected for mating, forming the mating pool, using the roulette wheel method, where the probability of choosing a given chromosome is proportional to its fitness value. In this work, each pairing generates two offspring by crossover. The weakest T chromosomes are replaced by the T offspring resulting from the T/2 pairings.
The crossover procedure is similar to the one presented in [26]. It begins by randomly selecting a variable in the first pair of parents to be the crossover point,

α = ⌈u · W⌉,

where u is a u.d. random variable (r.v.) in the interval [0, 1], and ⌈·⌉ is the upper integer (ceiling) operator. The jth pair of parents, j = 1, 2, ..., T/2, is defined as

p_a = [p_{a,1}, p_{a,2}, ..., p_{a,α}, ..., p_{a,W}],
p_b = [p_{b,1}, p_{b,2}, ..., p_{b,α}, ..., p_{b,W}].

Then the selected variables are combined to form new variables that will appear in the offspring,

p_new1 = p_{a,α} − β (p_{a,α} − p_{b,α}),
p_new2 = p_{b,α} + β (p_{a,α} − p_{b,α}),

where β is also a r.v. u.d. in the interval [0, 1]. Finally, the offspring are completed with the remaining variables of the parents:

offspring_1 = [p_{a,1}, ..., p_{a,α−1}, p_new1, p_{b,α+1}, ..., p_{b,W}],
offspring_2 = [p_{b,1}, ..., p_{b,α−1}, p_new2, p_{a,α+1}, ..., p_{a,W}].

In order to allow escaping from possible local minima, a mutation operation is introduced in the resultant population, except for the strongest individual (elitism). A Gaussian mutation is assumed in this work. If the rate of mutations is given by P_m, there will be N_m = ⌈P_m · (P − 1) · W⌉ mutations uniformly chosen among the (P − 1) · W variables. If x_{i,w} is chosen, with w = 1, 2, ..., W, then, after Gaussian mutation, it is substituted by

x_{i,w} ← x_{i,w} + N(0, σ_m²),

where N(0, σ_m²) represents a normal r.v. with zero mean and variance σ_m². After mutation, restrictions must be imposed on W_{i,j} according to the experts' opinion, before the cost function evaluation.
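The crossover and mutation steps above can be sketched in Python as follows. The parameter values (P_m, σ_m) are illustrative, and row 0 of the population is assumed to hold the strongest (elite) chromosome:

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed only for reproducibility

def blend_crossover(pa, pb):
    """Single-point blend crossover of a pair of real-valued parents."""
    W = len(pa)
    alpha = max(int(np.ceil(rng.random() * W)) - 1, 0)  # crossover variable
    beta = rng.random()
    new_a = pa[alpha] - beta * (pa[alpha] - pb[alpha])
    new_b = pb[alpha] + beta * (pa[alpha] - pb[alpha])
    # offspring take the head of one parent and the tail of the other
    off1 = np.concatenate([pa[:alpha], [new_a], pb[alpha + 1:]])
    off2 = np.concatenate([pb[:alpha], [new_b], pa[alpha + 1:]])
    return off1, off2

def gaussian_mutation(pop, p_m=0.05, sigma_m=0.1):
    """Mutate N_m uniformly chosen variables, sparing the elite (row 0)."""
    P, W = pop.shape
    n_m = int(np.ceil(p_m * (P - 1) * W))
    for _ in range(n_m):
        i = rng.integers(1, P)   # never mutate the strongest chromosome
        w = rng.integers(0, W)
        pop[i, w] += rng.normal(0.0, sigma_m)
    return pop
```

After mutation, the sign constraints on the FCM weights would be re-imposed before evaluating the cost function, as described above.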

Differential Evolution
The Differential Evolution (DE) search was introduced by Kenneth Price and Rainer Storn [27, 28]. DE is a parallel direct search method. As in GA, a population with P elements is randomly defined, where each W-dimensional element consists of a vector x of variables to be optimized (FCM weights, in this case), respecting the constraints.
In the classical DE, a perturbation is created by using a difference-vector-based mutation,

y_i = x_{r0} + F_e (x_{r1} − x_{r2}),

where the real and constant factor F_e (typically ∈ [0.5, 1.0]) controls the gain of the differential variation, and the indexes r0, r1 and r2 are randomly chosen and mutually exclusive. In this work, an alternative perturbation procedure named DE/current-to-best/1/bin is considered [28, 29], such that

y_i = x_i + F_e (x^best − x_i) + F_e (x_{r1} − x_{r2}).

There are also other variants of the perturbation procedure [28, 29]. A uniform crossover operation is applied in order to enhance diversity, mixing parameters of the mutation vector y_i and of x_i to generate the trial vector u_i:

u_{i,w} = y_{i,w}, if r ≤ χ or w = w_rand; u_{i,w} = x_{i,w}, otherwise,

where χ is the crossover constant, typically ∈ [0.8, 1.0], and r is a random variable u.d. in the interval [0, 1). In order to prevent the case u_i = x_i, at least one component (the randomly chosen index w_rand) is taken from the mutation vector y_i.
For selection, the algorithm uses a simple greedy method where the trial vector u_i competes against the target vector x_i, such that

x_i(t+1) = u_i, if J(u_i) ≤ J(x_i(t)); x_i(t+1) = x_i(t), otherwise.
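Putting mutation, crossover and greedy selection together, one generation of DE/current-to-best/1/bin can be sketched as follows; the sphere function used in the usage note is only a placeholder for the FCM cost function (13):

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed only for reproducibility

def de_current_to_best(pop, cost, F=0.8, chi=0.9):
    """One generation of DE/current-to-best/1/bin with greedy selection.
    `cost` is a callable evaluating one candidate vector."""
    P, W = pop.shape
    J = np.array([cost(x) for x in pop])
    best = pop[np.argmin(J)]
    new_pop = pop.copy()
    for i in range(P):
        r1, r2 = rng.choice([r for r in range(P) if r != i],
                            size=2, replace=False)
        # mutation: y = x_i + F*(best - x_i) + F*(x_r1 - x_r2)
        y = pop[i] + F * (best - pop[i]) + F * (pop[r1] - pop[r2])
        # binomial crossover; force at least one component from y
        mask = rng.random(W) < chi
        mask[rng.integers(W)] = True
        u = np.where(mask, y, pop[i])
        # greedy selection: trial competes against target
        if cost(u) <= J[i]:
            new_pop[i] = u
    return new_pop
```

Because selection is greedy per individual, the best cost in the population can never increase from one generation to the next.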

Simulation results
All the simulations were carried out considering 10^4 trials for PROC1 and PROC2. It is worth noting that the PSO, GA and DE input parameters were previously chosen after exhaustive simulation tests. As a result, optimal or quasi-optimal input parameters for the PSO, GA and DE heuristic algorithms have been obtained.

Process PROC1
The adopted values for the input parameters of PSO, GA and DE are summarized in Table 1. DE has fewer input parameters than the other two methods, which is a relative advantage. Four different scenarios for the process PROC1 were analyzed. The main performance results for each scenario are described in the next subsections.

Scenario 1
This scenario considers all the constraints on the FCM weights shown in Equations (5a) to (5h). As mentioned in [8], and also verified here, there is no solution in this case.

Scenario 3
In this Scenario, all the weight constraints were relaxed, but the causalities were kept, i.e., the value of each weight was confined to the interval [0, 1] or to the interval [−1, 0), according to the causality determined by the experts. Table 5 presents the obtained results. As can be seen, P = 10 was enough for achieving 100% of convergence with GA and PSO. With P = 10, DE was not able to find a proper solution in 36 trials, resulting in a probability of success equal to 0.9964; with P = 20, DE obtained 100% of success. Figure 7 presents the mean convergence of the concepts over 10^4 independent experiments. In this scenario, DE presented the fastest average convergence.
Figure 2. Fuzzy Cognitive Map proposed in [8] for the chemical process control problem.
Figure 4. Fuzzy Cognitive Map proposed in [9] for the chemical process control problem.

Figure 10. Mean convergence for (a) PSO and (b) DE in PROC2 with P = 20.