A Modified Neuro-Fuzzy System Using Metaheuristic Approaches for Data Classification

Neuro-fuzzy systems (NFS) have emerged as a dominant technique for addressing difficult research problems in business. The adaptive neuro-fuzzy inference system (ANFIS) is an efficient combination of artificial neural networks (ANN) and fuzzy logic for modeling highly non-linear, complex and dynamic systems. It has been proved that, with a proper number of rules, an ANFIS system is able to approximate every plant. Although widely used, ANFIS has a major drawback: computational complexity. The number of rules and tunable parameters increases exponentially as the number of inputs grows. Moreover, the standard learning process of ANFIS relies on gradient-based learning, which is prone to becoming trapped in local minima. Many researchers have therefore used meta-heuristic algorithms to tune the parameters of ANFIS. This study modifies the ANFIS architecture to reduce its complexity and improve accuracy on classification problems. Experiments were carried out with different types and shapes of membership functions and with the meta-heuristic Artificial Bee Colony (ABC) algorithm, and the training error was measured for each combination. The results showed that the modified ANFIS combined with ABC yields lower training error than the standard ANFIS model.


Introduction
Recent advances in artificial intelligence and soft computing techniques have opened new avenues for researchers to explore their applications. These machine learning techniques comprise several intelligent computing paradigms, including artificial neural networks (ANN), support vector machines (SVM), decision trees, and neuro-fuzzy systems (NFS), which have been successfully employed to model various real-world problems [1]. These problems range broadly from engineering to finance, geology and the biosciences.
Among the soft computing techniques mentioned above, ANFIS is an efficient combination of ANN and fuzzy logic for modeling highly non-linear, complex, and dynamic systems [2]. It has been proved that, with a proper number of rules, an ANFIS system is able to approximate every plant. ANFIS systems are therefore widely used and offer good applicability, since they can be interpreted both as non-linear models and as conventional linear techniques for state estimation and control [3]. Despite this wide use, ANFIS has a major drawback: computational complexity. The number of rules and tunable parameters increases exponentially as the number of inputs grows. Moreover, the standard learning process of ANFIS relies on gradient-based learning, which is prone to becoming trapped in local minima. The systems designed in the literature generally have few inputs; ANFIS models with many inputs have generally not been implemented because of the curse of dimensionality, and many researchers have used meta-heuristic algorithms to tune the parameters of ANFIS. This study modifies the ANFIS architecture to reduce its complexity. The proposed approach also examines different types of membership functions, because ANFIS accuracy depends strongly on the type and shape of its membership functions. In this way, the research proposes a solution for applying ANFIS to problems with a large number of inputs and thereby addresses the curse of dimensionality.
The remainder of this study is organized as follows. Section 2 presents the literature review and gap analysis. Section 3 presents the methodology used to address the gaps identified in Section 2. Section 4 explains the results and analysis of the experiments performed in this study. Section 5 summarizes the research findings and outlines future work.

Research background
Recently, neuro-fuzzy systems have attracted more attention in the research community than other types of fuzzy expert systems, since they combine the learning ability of neural networks with the reasoning ability of fuzzy logic to solve many non-linear and complex real-world problems with high accuracy. This combination has been broadly used in education, medicine, electrical systems, traffic control, image processing, prediction, and the control of linear and nonlinear systems [4]. Every technique has its advantages and limitations: fuzzy logic is good at explaining how a decision is reached but cannot learn rules by itself, whereas neural networks are good at pattern matching but cannot give a clear picture of how they reach a decision. Adaptability is the main advantage of neural networks; they can adjust their weights automatically to optimize their behavior as pattern recognizers, decision makers, predictors, etc. In fuzzy expert systems, finding appropriate membership function parameters and rules is not an easy task; it requires either a high level of expertise or laborious trial and error [5]. These limitations are the main motivation for building hybrid networks that combine two techniques to overcome the limitations of each. Among such hybrid networks, the adaptive neuro-fuzzy inference system (ANFIS) is one of the best combinations of neural networks and fuzzy logic. The next section explains the basic structure and operation of ANFIS.

Adaptive neuro-fuzzy inference system (ANFIS)
The concept of ANFIS was introduced by Jang in 1993 as a combination of neural networks and fuzzy logic. A fuzzy inference system (FIS) is a soft computing technique built on fuzzy set theory, fuzzy if-then rules, and fuzzy reasoning; since ANFIS uses fuzzy logic, the selection of the FIS is the major concern when designing an ANFIS. ANFIS is a neuro-fuzzy framework that can integrate human expertise as well as adapt itself through learning. As an adaptive neuro-fuzzy model, it has the advantage of being flexible, adaptive and effective for non-linear complex problems [6]. Recently, ANFIS has been successfully applied to classification, rule-based process control and pattern recognition.
ANFIS consists of five layers, and the nodes of each layer are connected to the next layer by directed links. Each node performs a particular function on its incoming signals to produce a single output; the network is therefore often described as a multilayer feed-forward network. The main objective of ANFIS is to determine the optimum values of the equivalent fuzzy inference system parameters by applying a learning algorithm. Figure 1 shows the complete ANFIS architecture. The five layers are: (i) fuzzification layer, (ii) product layer, (iii) normalization layer, (iv) defuzzification layer, and (v) output layer. There are two types of nodes: fixed (circle nodes) and adaptable (square nodes). Two fuzzy if-then rules with two inputs x, y and one output f are considered:

Rule 1: If x is $A_1$ and y is $B_1$, then $f_1 = p_1 x + q_1 y + r_1$
Rule 2: If x is $A_2$ and y is $B_2$, then $f_2 = p_2 x + q_2 y + r_2$

where $A_i$ and $B_i$ are fuzzy sets and $p_1, q_1, r_1, p_2, q_2, r_2$ are linear parameters, i.e., the design parameters identified during the training process. Parameters in the IF part are known as antecedent or premise parameters, whereas parameters in the THEN part are known as consequent parameters. The nodes of layer 1 (premise part) and layer 4 (consequent part) are trainable or adaptable nodes, while the nodes of layer 2 (product) and layer 3 (normalization) are fixed nodes. The five-layer architecture that executes the above two rules is explained as follows.

Layer 1 (Fuzzification)
This layer is an adaptive layer with square nodes. Every node i in this layer applies an adaptive membership function to its input to generate the membership degree of a linguistic variable:

$$O_{1,i} = \mu_{A_i}(x), \quad i = 1, 2; \qquad O_{1,i} = \mu_{B_{i-2}}(y), \quad i = 3, 4$$

The membership function can be of any shape, e.g., triangular, trapezoidal, Gaussian, or generalized bell.
Here, x and y are the two inputs. If $\mu_{A_i}$ and $\mu_{B_i}$ are Gaussian MFs (Eq. 2), they are specified by two parameters, the center c and the width σ, which are referred to as premise parameters. A Gaussian MF is commonly written as

$$\mu(x) = \exp\left(-\frac{(x - c)^2}{2\sigma^2}\right) \tag{2}$$

$O_{1,i}$ is the output of the ith node of layer 1.

Layer 2 (Product)
The nodes in this layer are fixed nodes, labeled Π, that compute the firing strength of a rule. Each node receives the membership degrees produced by the first layer, which represent the fuzzy sets of the respective input variables, and its output is the product of all its incoming signals:

$$O_{2,i} = w_i = \mu_{A_i}(x)\,\mu_{B_i}(y), \quad i = 1, 2$$

Layer 3 (Normalization)
The nodes of layer 3 are also fixed nodes, and all of them are labeled N. Each node normalizes the firing strength of a rule from the previous layer by calculating the ratio of the ith rule's firing strength to the sum of all rules' firing strengths:

$$O_{3,i} = \bar{w}_i = \frac{w_i}{w_1 + w_2}, \quad i = 1, 2$$

where $\bar{w}_i$ is referred to as the normalized firing strength of a rule.

Layer 4 (Defuzzification)
The nodes in this layer are adaptive, with the node function

$$O_{4,i} = \bar{w}_i f_i = \bar{w}_i (p_i x + q_i y + r_i)$$

where $\bar{w}_i$ is the rule's normalized firing strength and $O_{4,i}$ is the output of layer 4. The parameters $p_i$, $q_i$, $r_i$ in this layer are linear parameters known as consequent parameters. They are identified during the training process of ANFIS.

Layer 5 (Overall output)
This single node, labeled Σ, forms the output layer. It sums the outputs of all rules from the previous layer and converts the fuzzy result into a crisp output:

$$O_5 = \sum_i \bar{w}_i f_i = \frac{\sum_i w_i f_i}{\sum_i w_i}$$
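The five layers described above can be traced in a few lines of code. The following is a minimal sketch, not the implementation used in this study: it assumes Gaussian membership functions in the common form exp(-(x - c)^2 / (2σ^2)), two inputs and two rules, and the function names `gaussmf` and `anfis_forward` as well as all parameter values are illustrative only.

```python
import math


def gaussmf(x, c, sigma):
    """Gaussian membership degree with center c and width sigma (assumed form)."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))


def anfis_forward(x, y, premise, consequent):
    """Forward pass of a two-input, first-order Sugeno ANFIS.

    premise:    one ((c_A, sigma_A), (c_B, sigma_B)) pair per rule
    consequent: one (p, q, r) triple per rule
    """
    # Layer 1 (fuzzification): membership degrees of x and y for each rule.
    mu = [(gaussmf(x, *a), gaussmf(y, *b)) for a, b in premise]
    # Layer 2 (product): firing strength of each rule.
    w = [mu_a * mu_b for mu_a, mu_b in mu]
    # Layer 3 (normalization): ratio of each strength to the sum of all.
    total = sum(w)  # would be zero only if every rule has zero firing strength
    w_bar = [w_i / total for w_i in w]
    # Layer 4 (defuzzification): normalized strength times the rule polynomial.
    rule_out = [wb * (p * x + q * y + r)
                for wb, (p, q, r) in zip(w_bar, consequent)]
    # Layer 5 (overall output): sum of all rule outputs gives the crisp value.
    return sum(rule_out), w_bar


# Illustrative (made-up) parameters for two rules.
premise = [((0.0, 1.0), (0.0, 1.0)), ((2.0, 1.0), (2.0, 1.0))]
consequent = [(1.0, 1.0, 0.0), (0.0, 0.0, 5.0)]
output, w_bar = anfis_forward(1.0, 1.0, premise, consequent)
print(output, w_bar)  # 3.5 [0.5, 0.5]
```

At the point (1, 1) both rules fire equally, so the crisp output is the plain average of the two rule polynomials (2 and 5).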

ANFIS learning algorithm
ANFIS learns and updates all of its modifiable parameters using a two-pass learning algorithm consisting of a forward pass and a backward pass. ANFIS trains the MF parameters c and σ and the consequent parameters $p_i$, $q_i$, $r_i$ to minimize the error between the actual and the desired output, using a hybrid of gradient descent (GD) and the least squares estimator (LSE). During the forward pass, the node outputs propagate forward and the consequent parameters are updated by the LSE method. During the backward pass, the error signals propagate backward from the output layer to the input layer and the premise parameters are updated by the GD algorithm. In this way, the network is trained to determine parameter values that sufficiently fit the training data (Table 1).
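The two passes can be summarized in the usual textbook notation. This is a sketch under stated assumptions rather than a formula taken from this study: the design matrix A, the target vector d, the squared-error measure E and the learning rate η are symbols introduced here for illustration.

```latex
% Forward pass: with the premise parameters held fixed, the output is linear
% in the consequent parameters \theta = [p_1, q_1, r_1, p_2, q_2, r_2]^{\top},
% so they can be estimated by least squares over all training samples.
f = \bar{w}_1 (p_1 x + q_1 y + r_1) + \bar{w}_2 (p_2 x + q_2 y + r_2)
\quad\Rightarrow\quad
A\,\theta = d, \qquad
\hat{\theta} = (A^{\top} A)^{-1} A^{\top} d

% Backward pass: with the consequent parameters held fixed, the premise
% parameters follow the negative gradient of the squared error.
E = \tfrac{1}{2} \sum_{k} (d_k - f_k)^2, \qquad
c \leftarrow c - \eta \, \frac{\partial E}{\partial c}, \qquad
\sigma \leftarrow \sigma - \eta \, \frac{\partial E}{\partial \sigma}
```

Each row of A holds the terms $\bar{w}_1 x, \bar{w}_1 y, \bar{w}_1, \bar{w}_2 x, \bar{w}_2 y, \bar{w}_2$ for one training sample.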

Data partitioning
ANFIS can be constructed by partitioning the input-output data into rule patches. This can be accomplished by three methods: genfis1 (grid partitioning), genfis2 (subtractive clustering) and genfis3 (fuzzy c-means). Grid partitioning divides the data space into grids based on the number of membership functions per input. Generally, grid partitioning is appropriate only for problems with fewer than six input variables, because the number of rules increases exponentially with the number of inputs of the underlying system [7]. To avoid this issue, clustering is an effective partitioning approach that divides data points into groups (clusters) according to their membership grade or degree. For this reason, clustering-based methods tend to be preferred to grid partitioning. Subtractive clustering is an effective approach when there is no clear idea of how many clusters will be needed for a given dataset. It reduces the computation time by finding the cluster centers from the data itself [8]. The fuzzy c-means (FCM) algorithm allows one data point to belong to two or more clusters. FCM thus lets the membership value range between 0 and 1, which improves the result [9]. Beyond partitioning methods, many researchers have proposed ANFIS training methods based on meta-heuristic algorithms. Section 2.4 discusses meta-heuristic algorithms in more detail.
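The exponential growth of the rule base under grid partitioning is easy to quantify. The sketch below assumes that every input has the same number of membership functions and that every combination of membership functions forms one rule; the function name is illustrative.

```python
def grid_partition_rule_count(num_inputs, mfs_per_input):
    """Rules generated when each combination of membership functions is a rule."""
    return mfs_per_input ** num_inputs


# With three membership functions per input, the rule base grows quickly.
for n in (2, 4, 6, 10):
    print(n, grid_partition_rule_count(n, 3))
# 2 9
# 4 81
# 6 729
# 10 59049
```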

Optimization using meta-heuristics
In meta-heuristic algorithms, "meta" means "higher level." All meta-heuristic algorithms use some trade-off between local search and global exploration. The main components of any meta-heuristic algorithm are exploitation and exploration. Exploration means generating diverse solutions to explore the search space on a global scale, while exploitation means focusing the search on a local region by exploiting the information that a good current solution has been found there. For ANFIS, structure learning and parameter identification are the two dimensions of training. Some studies have focused on one of the two dimensions, while others have worked on both. The original ANFIS proposed by Jang uses hybrid learning, which combines GD and LSE. However, the drawbacks of GD have prompted researchers to adopt alternatives such as ant colony optimization, particle swarm optimization (PSO), the firefly algorithm (FFA), the cuckoo search algorithm (CSA), and artificial bee colony (ABC). Among these meta-heuristic training methods, this study employs the ABC optimization algorithm to train the parameters of the ANFIS model instead of the gradient-based learning mechanism.

Artificial Bee Colony algorithm
Artificial Bee Colony (ABC) is a swarm intelligence-based algorithm inspired by the intelligent behavior of honey bees. It was introduced by Dervis Karaboga in 2005 to solve optimization problems. Since then, the ABC algorithm has been used in fields such as data mining, image processing and numerical problems. As shown in Figure 2, ABC provides a population-based search in which the bees searching for food are divided into three groups: employed bees, onlooker bees and scout bees. Employed bees explore the search space and gather information about the position and quality of food sources, while onlooker bees stay in the hive and choose a food source based on the information provided by the employed bees. Scout bees take over from employed bees whose food sources have been abandoned and search for new food sources at random [21]. The position of a food source is a possible solution to the optimization problem, and the amount of nectar in a food source represents the quality (fitness) of that solution.
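The three bee phases described above can be sketched as follows. This is a simplified, generic ABC loop written for illustration, not the configuration used in this study: the function name `abc_minimize`, the default settings (20 food sources, abandonment limit 30, 200 cycles) and the sphere test function are assumptions.

```python
import random


def abc_minimize(objective, dim, bounds, n_sources=20, limit=30, cycles=200, seed=0):
    """Minimal Artificial Bee Colony sketch that minimizes `objective`."""
    rng = random.Random(seed)
    lo, hi = bounds

    def random_source():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    def try_improve(i):
        # Perturb one coordinate of source i relative to another random source k.
        k = rng.choice([s for s in range(n_sources) if s != i])
        j = rng.randrange(dim)
        candidate = sources[i][:]
        candidate[j] += rng.uniform(-1.0, 1.0) * (sources[i][j] - sources[k][j])
        candidate[j] = min(max(candidate[j], lo), hi)
        cost = objective(candidate)
        # Greedy selection: keep the candidate only if it is better.
        if cost < costs[i]:
            sources[i], costs[i], trials[i] = candidate, cost, 0
        else:
            trials[i] += 1

    sources = [random_source() for _ in range(n_sources)]
    costs = [objective(s) for s in sources]
    trials = [0] * n_sources
    best_cost = min(costs)
    best = sources[costs.index(best_cost)][:]

    for _ in range(cycles):
        # Employed bees: one local search around every food source.
        for i in range(n_sources):
            try_improve(i)
        # Onlooker bees: pick sources with probability proportional to quality.
        quality = [1.0 / (1.0 + c) if c >= 0 else 1.0 + abs(c) for c in costs]
        for _ in range(n_sources):
            i = rng.choices(range(n_sources), weights=quality)[0]
            try_improve(i)
        # Memorize the best solution before any source is abandoned.
        cycle_best = min(costs)
        if cycle_best < best_cost:
            best_cost = cycle_best
            best = sources[costs.index(cycle_best)][:]
        # Scout bees: replace sources that failed to improve `limit` times.
        for i in range(n_sources):
            if trials[i] > limit:
                sources[i] = random_source()
                costs[i] = objective(sources[i])
                trials[i] = 0
    return best, best_cost


def sphere(v):
    """Simple test objective with its minimum of 0 at the origin."""
    return sum(x * x for x in v)


best, best_cost = abc_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
print(best, best_cost)
```

To train ANFIS in this style, each food source would encode one full set of ANFIS parameters and the objective would be the training error; that mapping is not shown here.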

Comparative study of ANFIS
The successful integration of neural network and fuzzy logic models in the form of ANFIS makes it well suited to applications that are highly non-linear and complex. Owing to the robustness of its results, ANFIS has been implemented in a wide variety of applications, including classification tasks, rule-based process control and pattern recognition [4]. ANFIS is one of the best trade-offs between neural networks and fuzzy systems, providing smoothness through fuzzy logic interpolation and adaptability through neural network back-propagation; however, the model suffers from high computational complexity.
The literature generally emphasizes three main problems related to the computational complexity of ANFIS: reducing the rule base, developing efficient training methods, and selecting membership functions. A significant increase in the number of rules increases the complexity of the ANFIS architecture, since rules are generated from all possible combinations of antecedents, and the quality of the result depends on the effectiveness of these rules [10]. Accordingly, the literature contains many techniques for rule-base minimization and accuracy maximization. These techniques select potential rules and remove non-potential rules from the ANFIS knowledge base, for example by means of a Karnaugh map (K-map) [11]. Besides rule-base minimization, training the parameters of the ANFIS model is one of the main issues encountered when the model is applied to real-world problems. The original ANFIS architecture introduced by Jang has drawbacks, since it uses a hybrid learning algorithm that combines GD and LSE. Because it uses GD, it is likely to be trapped in local minima [6]. To cope with this, many researchers have proposed meta-heuristic algorithms such as the genetic algorithm (GA), particle swarm optimization (PSO), and artificial bee colony (ABC). For example, a new hybrid method was introduced by Orouskhani and Mansouri [12], who modified Cat Swarm Optimization to train ANFIS by updating the antecedent parameters. Karaboga and Kaya [13] proposed an Adaptive and Hybrid Artificial Bee Colony (aABC) algorithm to train all parameters of ANFIS. Najafi [14] employed PSO with ANFIS to optimize and train the parameters for predicting the viscosity of mixed oils. In addition to the rule-base and training issues, ANFIS also suffers from uncertainty.
Techniques such as fuzzy logic have therefore been applied, because fuzzy logic is able to describe uncertainty with the help of membership functions [15]. Since ANFIS also uses fuzzy logic, the correct choice of membership functions is the most important factor in building the ANFIS model. Various studies on the choice of membership function in fuzzy inference systems can be found in the literature. Suntae [16] compared membership functions for decision making in a security robot system. Similarly, Saha [17] investigated the best membership function for sign language applications. Adil [18] compared the effects of different types of membership functions on the performance of a fuzzy logic controller. For neuro-fuzzy systems such as ANFIS, however, such studies are scarce.
Since ANFIS accuracy depends strongly on the type and shape of the membership function, this study tries different types and shapes of membership functions to find the most suitable one for the ANFIS model. Additionally, this study modifies the standard ANFIS architecture by reducing the fourth layer, because the consequent part of the rules contains the larger number of parameters; furthermore, a meta-heuristic approach is used to tune the parameters of the ANFIS model. The final model is intended for datasets with a large number of inputs. The proposed approach is explained further in the next section, which defines the research methodology.

Research methodology
The research methodology starts by collecting classification datasets. This research addresses six real-world classification problems, namely Iris, Breast Cancer, Car Evaluation, Teacher Assistant, Glass Identification and Seeds, covering small to large numbers of inputs (4-10 inputs) and taken from the University of California Irvine Machine Learning Repository (UCIMLR). For splitting the data into training and testing sets, most researchers in the literature [19,20] used a 70:30 ratio (70% training, 30% testing), on the grounds that the more data is used for training, the more accurate the results a system generates. Therefore, in this study 70% of the dataset instances were selected for the training set and the remaining 30% for the testing set.
After the selected data have been collected, preprocessed and partitioned, the modification phase examines different types and shapes of membership functions in the standard ANFIS architecture while solving the six classification problems taken from the UCI repository. The ANFIS architecture is then modified to lessen its computational complexity, because the basic ANFIS architecture contains five layers and uses GD, and updating the parameters consumes a lot of computational time when the number of parameters is large. The collected datasets are then applied to the modified ANFIS architecture to compare and validate the results of the standard and modified architectures. The outcome of this research is the modified ANFIS architecture described in the proposed approach (Figure 3: Research Framework).
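A 70:30 split of this kind can be sketched as follows. The shuffling step and the fixed seed are assumptions made for illustration, since the text does not state how instances were assigned to the two sets; the function name is hypothetical.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into training and testing parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


rows = list(range(150))  # stand-in for the 150 instances of the Iris dataset
train, test = train_test_split(rows)
print(len(train), len(test))  # 105 45
```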

Proposed approach
Among the available partitioning techniques, grid partitioning (genfis1), subtractive clustering (genfis2) and fuzzy c-means clustering (genfis3) are the most widely used to generate the initial fuzzy inference system (FIS), which acts as a model reflecting the relationship between the different input parameters. The comparative analysis and literature review indicate that these approaches help ANFIS achieve good accuracy, since the maximum number of rules is generated by considering all possible combinations; however, this also increases the computational cost, because the consequent part of the rules contains a large number of parameters. The fourth layer, which holds the linear coefficients, therefore accounts for most of the computational cost of the training algorithm: the consequent part holds more trainable parameters than the premise part. The following example illustrates the number of trainable parameters held by the premise and consequent parts.

Let n be the number of inputs, let each input be partitioned into m membership functions, and let p be the number of trainable parameters per membership function. The total number of premise parameters is then $n \times m \times p$. Each rule has $n + 1$ consequent parameters and the system has $m^n$ rules, so the total number of consequent parameters is $m^n \times (n + 1)$. For instance, if the dataset applied to the ANFIS model has n = 4 inputs, m = 2 membership functions per input, and triangular membership functions with p = 3 parameters, then

$$\text{premise} = n \times m \times p = 4 \times 2 \times 3 = 24, \qquad \text{consequent} = m^n \times (n + 1) = 2^4 \times 5 = 80$$

Thus, of the 104 trainable parameters in total, 24 are premise parameters and 80 are consequent parameters, far more than the premise parameters.
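The counting argument above can be checked with a short helper. The function name is illustrative; the formulas are the ones stated in the text for the standard architecture with grid partitioning.

```python
def standard_anfis_parameter_counts(n_inputs, mfs_per_input, params_per_mf):
    """Return (premise, consequent, total) trainable parameters."""
    premise = n_inputs * mfs_per_input * params_per_mf
    rules = mfs_per_input ** n_inputs       # one rule per MF combination
    consequent = rules * (n_inputs + 1)     # n coefficients plus a constant per rule
    return premise, consequent, premise + consequent


# n = 4 inputs, m = 2 triangular membership functions (p = 3) per input.
print(standard_anfis_parameter_counts(4, 2, 3))  # (24, 80, 104)
```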
Therefore, removing the fourth layer may reduce the computation, and the ANFIS architecture can be reduced to four layers. Figure 3 shows the standard five-layer ANFIS architecture and the function associated with each layer. This research modifies the ANFIS architecture by removing the fourth layer, which contains adaptable nodes and holds the most parameters to update. As observed in Figure 3, the fourth layer computes the node function

$$O_{4,i} = \bar{w}_i f_i = \bar{w}_i (p_i x + q_i y + r_i)$$

where $\bar{w}_i$ is the rule's normalized firing strength from the third layer and $\{p_i, q_i, r_i\}$ are the coefficients of the first-order polynomial $f_i$. The parameters in the fourth layer are referred to as consequent parameters and are identified during the training process of ANFIS. This research removes the two parameters $p_i$ and $q_i$ from the function $f_i$ in the fourth layer, so that

$$f_i = r_i$$

This removes two trainable parameters per rule in the two-input case and makes the third layer adaptable, so that the fourth-layer function can be merged into the third layer as

$$O_{3,i} = \bar{w}_i f_i = \bar{w}_i r_i$$

The resulting ANFIS is a simpler architecture comprising four layers and fewer trainable parameters. Figure 4 shows the proposed modification of the ANFIS architecture.

Under the modified architecture, the total number of parameters trained by the system is reduced. Consider the same example as for the standard ANFIS model, with n = 4 inputs and m = 2 membership functions per input, here with Gaussian membership functions (p = 2 parameters):

$$\text{premise} = n \times m \times p = 4 \times 2 \times 2 = 16, \qquad \text{consequent} = m^n \times 1 = 2^4 = 16$$

Hence, of the 32 trainable parameters in total, 16 are premise parameters and 16 are consequent parameters. The consequent parameters in the modified ANFIS model are far fewer than in the standard ANFIS model (16 versus 80).
After the architecture has been modified, datasets with small to large numbers of inputs will be applied to the modified ANFIS, and the results will be compared in terms of computation time and efficiency using the following performance metrics: MSE, percentage accuracy, number of trainable parameters and number of epochs.
Furthermore, the training algorithm also plays an important role in training the ANFIS model. The literature review shows that meta-heuristic algorithms such as ant colony optimization, particle swarm optimization (PSO), the firefly algorithm (FFA), the cuckoo search algorithm (CSA), artificial bee colony (ABC) and many others have recently been developed and used by researchers to train ANFIS models. Among these meta-heuristic training methods, this study employs the ABC optimization algorithm to train the parameters of the ANFIS model instead of the gradient-based learning mechanism. This study therefore not only reduces the computational complexity of the ANFIS architecture but also applies an efficient training mechanism based on meta-heuristic swarm intelligence.
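A minimal sketch of the reduced architecture, in which each rule contributes only a constant r_i, may help make the change concrete. It assumes Gaussian membership functions in the common form exp(-(x - c)^2 / (2σ^2)); the function names and all parameter values are illustrative and not taken from the experiments.

```python
import math


def gaussmf(x, c, sigma):
    """Gaussian membership degree with center c and width sigma (assumed form)."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))


def modified_anfis_forward(inputs, premise, r):
    """Forward pass with constant consequents f_i = r_i.

    premise[k][j] is the (c, sigma) pair of rule k for input j;
    r[k] is the single consequent parameter of rule k.
    """
    firing = []
    for rule in premise:
        w = 1.0
        for x, (c, sigma) in zip(inputs, rule):
            w *= gaussmf(x, c, sigma)   # layers 1 and 2: fuzzify, then multiply
        firing.append(w)
    total = sum(firing)
    # Merged layer 3 (normalize and weight by r_i), followed by the summation.
    return sum((w / total) * r_i for w, r_i in zip(firing, r))


def modified_anfis_parameter_counts(n_inputs, mfs_per_input, params_per_mf):
    """Return (premise, consequent, total) with one constant per rule."""
    premise = n_inputs * mfs_per_input * params_per_mf
    consequent = mfs_per_input ** n_inputs
    return premise, consequent, premise + consequent


premise = [[(0.0, 1.0), (0.0, 1.0)], [(2.0, 1.0), (2.0, 1.0)]]
y_hat = modified_anfis_forward([1.0, 1.0], premise, r=[2.0, 5.0])
print(y_hat)                                     # 3.5
print(modified_anfis_parameter_counts(4, 2, 2))  # (16, 16, 32)
```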
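Two of the metrics named above, MSE and percentage accuracy, can be computed as sketched below. How the crisp ANFIS output is mapped to a class label is not stated in the text; rounding to the nearest integer label is an assumption made here for illustration, and the sample values are made up.

```python
def mean_squared_error(targets, outputs):
    """Average squared difference between desired and actual outputs."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)


def accuracy_percent(targets, outputs):
    """Share of samples whose rounded output equals the integer class label.

    Rounding the crisp output to the nearest label is an assumption.
    """
    correct = sum(1 for t, o in zip(targets, outputs) if round(o) == t)
    return 100.0 * correct / len(targets)


targets = [1, 2, 3, 2]           # made-up class labels
outputs = [1.1, 1.8, 2.4, 2.2]   # made-up crisp model outputs
mse = mean_squared_error(targets, outputs)
acc = accuracy_percent(targets, outputs)
print(round(mse, 4), acc)  # 0.1125 75.0
```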

Experimental results and discussions
This section discusses and evaluates the experimental results of the study in detail, following the research framework explained in the previous section. First, experiments were performed with the ANFIS models genfis1, genfis2 and genfis3 to investigate the most suitable membership function for the input parameters of the ANFIS model. Second, the standard five-layer ANFIS model was modified and trained with the ABC (Artificial Bee Colony) optimization algorithm instead of the gradient-based learning mechanism, with the aim of reducing the computational complexity of the ANFIS architecture. The performance of the proposed modified ANFIS model was compared with the three ANFIS models based on genfis1, genfis2 and genfis3, using MSE, percentage accuracy, number of trainable parameters and number of epochs as the measurement criteria. The simulations were performed on classification problems; six benchmark classification datasets, namely Iris, Teacher Assistant Evaluation, Car Evaluation, Seeds, Breast Cancer and Glass Identification, with 4 to 10 input attributes, were taken from the University of California Irvine Machine Learning Repository (UCIMLR).

Membership function
To evaluate the effect of membership functions on the performance of ANFIS, the Fuzzy Logic Toolbox™ in MATLAB was used to build ANFIS models with the partitioning methods genfis1, genfis2 and genfis3 generating the initial FIS. Most settings of the ANFIS models were left at the toolbox defaults; the changes are presented in Table 2.
Since ANFIS with grid partitioning (genfis1) allows different types of membership functions to be tested, it was used to test four basic types of membership functions, whereas subtractive clustering (genfis2) and fuzzy c-means clustering (genfis3) use Gaussian membership functions by default in the MATLAB toolbox. The resulting six combinations of model and membership function were ranked from smallest to largest (rank 1-6) according to the sum of training and testing RMSE for each dataset. The average of these ranks was then computed to produce an overall rank within the group of membership functions. Table 3 presents the final comparison in terms of average training and testing RMSE, average rank and overall rank.
The overall results in Table 3 show that, among all three ANFIS models (genfis1, genfis2 and genfis3) and membership functions, ANFIS with subtractive clustering (genfis2) and the Gaussian membership function performed best on all classification datasets. Similarly, the results of ANFIS with grid partitioning (genfis1) show that the Gaussian membership function achieved the best RMSE compared with the three other shapes, i.e., trapezoidal, bell and triangular, because the Gaussian membership function draws a smooth curve that represents the data points effectively with minute differences. This experiment therefore indicates that the Gaussian membership function is the best option to employ with the ANFIS model for solving classification problems.
The performance of the proposed modified ANFIS model was compared with the three other ANFIS models, genfis1 (grid partitioning), genfis2 (subtractive clustering) and genfis3 (fuzzy c-means). In this experiment, the six benchmark classification datasets were applied to the ANFIS models. The performance criteria used for evaluation were MSE, percentage accuracy, number of trainable parameters and number of epochs. Table 4 summarizes the simulation results for the modified ANFIS model and for genfis1, genfis2 and genfis3.
According to Table 4, the proposed modified ANFIS model with the ABC optimization algorithm outperformed the standard ANFIS model genfis1 with grid partitioning on the classification problems. The results indicate that the modified ANFIS model achieves better MSE and reasonable accuracy while training fewer parameters over the same number of epochs as the standard ANFIS model. Hence, removing the fourth layer and reducing the number of trainable parameters not only addressed the computational complexity of the standard ANFIS model but can also be considered a way to save the training cost of the model.
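The ranking procedure described above can be sketched as follows. The function name, the model labels and the RMSE values are hypothetical and serve only to show the mechanics; they are not results from Table 3, and ties are not treated specially in this sketch.

```python
def rank_models(rmse_by_model):
    """Rank models per dataset by train + test RMSE, then average the ranks.

    rmse_by_model maps a model name to a list of (train_rmse, test_rmse)
    pairs, one pair per dataset. Returns (average_rank, overall_rank).
    """
    models = list(rmse_by_model)
    n_datasets = len(next(iter(rmse_by_model.values())))
    ranks = {m: [] for m in models}
    for d in range(n_datasets):
        # Rank 1 goes to the smallest sum of training and testing RMSE.
        ordered = sorted(models, key=lambda m: sum(rmse_by_model[m][d]))
        for position, m in enumerate(ordered, start=1):
            ranks[m].append(position)
    average_rank = {m: sum(r) / len(r) for m, r in ranks.items()}
    by_average = sorted(models, key=lambda m: average_rank[m])
    overall_rank = {m: i for i, m in enumerate(by_average, start=1)}
    return average_rank, overall_rank


# Hypothetical RMSE values for three models on two datasets.
example = {
    "A": [(0.1, 0.2), (0.2, 0.3)],
    "B": [(0.2, 0.2), (0.3, 0.3)],
    "C": [(0.5, 0.5), (0.2, 0.2)],
}
average_rank, overall_rank = rank_models(example)
print(average_rank)  # {'A': 1.5, 'B': 2.5, 'C': 2.0}
print(overall_rank)  # {'A': 1, 'C': 2, 'B': 3}
```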

Conclusion and future work
ANFIS accuracy depends strongly on the shape and type of the membership function. To investigate this, ANFIS with three partitioning methods, (i) grid partitioning (genfis1), (ii) subtractive clustering (genfis2), and (iii) fuzzy c-means clustering (genfis3), was used to evaluate the membership functions. The simulation results indicate that the Gaussian membership function is the best fit for the ANFIS model when solving classification problems. Moreover, since the standard ANFIS architecture has five layers and uses gradient-based learning, its complexity increases when the number of inputs is large; this research therefore proposed a modified ANFIS architecture that reduces the fourth layer and the number of parameters in the consequent part, easing the burden of training the parameters. In addition, a meta-heuristic optimization algorithm, the artificial bee colony (ABC) algorithm, was used to train the parameters of the ANFIS model instead of gradient descent. The performance of the proposed modified ANFIS model was compared with the three ANFIS models genfis1 (grid partitioning), genfis2 (subtractive clustering) and genfis3 (fuzzy c-means). The results in Table 4 indicate that the modified ANFIS model can be used by researchers with confidence for solving classification problems.
For future work, more experiments can be performed to find appropriate membership functions for the standard and modified ANFIS models on problems other than classification, such as time series and clustering problems. This research focused on finding a good membership function for the ANFIS model for classification problems. Additionally, this study modified the ANFIS model to make it less complex and, instead of using the typical two-pass learning algorithm to train the parameters, applied a meta-heuristic approach, the ABC optimization algorithm. The meta-heuristic paradigm offers many other optimization algorithms for training the ANFIS parameters, such as ant colony (AC), particle swarm optimization (PSO), the firefly algorithm (FFA), the cuckoo search algorithm (CSA), and the genetic algorithm (GA). Hence, in future, other meta-heuristic algorithms can be applied to the modified ANFIS model for comparison, to find the best meta-heuristic algorithm for it.