T-Operators used in fuzzy reasoning method.
Olive oil is an important agricultural food product. In particular, protected designation of origin (PDO) and protected geographic indication (PGI) labels help protect the intellectual property rights of consumers and producers. Geographic classification is therefore increasingly important for tracing geographical indications. This chapter proposes a geographical classification system for virgin olive oils. The system is built on chemical parameters, which carry fuzziness. The proposed system constructs its rules with a fuzzy decision tree algorithm: it derives the rules with the fuzzy ID3 algorithm, which applies fuzzy entropy to the fuzzified data. The reasoning procedure relies on a weighted rule-based system and is adapted to fuzzy reasoning with different T-operators. Fuzzification of the olive oil data set is performed with the fuzzy c-means algorithm. The number of clusters for each variable is selected with the partition coefficient validity criterion. The model is examined with different decision tree approaches (C4.5 and the standard fuzzy ID3 algorithm) and with the FID3 reasoning method based on eight different T-operators. The conclusions are supported by statistical analysis. The experimental results indicate that the rule weights play an important role in the fuzzy reasoning method for the geographic classification system.
- fuzzy decision tree
- fuzzy rule
- geographic classification
- olive oil
Geographic indications are important signs used on products. Their aim is to specify the geographical origin of a product and to track its qualities. There are two kinds of geographical indications: protected designation of origin (PDO) and protected geographic indication (PGI). These indications are generally used for agricultural products, among which olive oil has a crucial place. It is necessary to observe the properties of olive oil produced in different regions or from different olive varieties. The geographical classification problem investigates the relationship among the chemical and sensorial parameters for each region.
In the information age, machine learning is increasingly applied to chemical data. Machine learning is concerned with the design and development of algorithms that let computers learn from data. It aims to uncover the relationships within a data structure and to extract knowledge without prior assumptions. Several machine learning algorithms are available for this purpose.
Decision trees are among the most commonly used machine learning methods. There are several types of decision tree algorithms, such as ID3, C4.5, and CART. Fuzzy logic has been adapted into decision tree algorithms to handle uncertainty. Decision trees adapted with fuzzy logic are called fuzzy decision trees [1, 2, 3]. A fuzzy decision tree consists of nodes for testing attributes, edges for branching by the test values of fuzzy sets, and leaves for deciding the class according to class membership.
Chemical measurements also carry uncertainty [4, 5, 6, 7, 8], and the geographical classification problem in this study uses chemical measurements. This study therefore aims to propose an improved methodological approach for the classification of olive oil samples based on the fuzzy ID3 classification approach.
The proposed system constructs its rules with a fuzzy decision tree algorithm. Its reasoning procedure is based on a weighted rule-based system adapted to fuzzy reasoning with different T-operators. The model is examined with different decision tree approaches (C4.5 and the standard fuzzy ID3 algorithm) and with the FID3 reasoning method based on eight different T-operators. The study uses 101 virgin olive oil samples collected from four regions (North Aegean, South Aegean, Mediterranean, and South East), described by measurements of chemical parameters. Min-max normalization was applied to the data set. Nonparametric methods were preferred for the statistical analysis because of the data structure. A leave-one-out procedure was used to measure the performance of the algorithms. The Friedman aligned ranks test and pairwise comparisons were performed to evaluate the fuzzy reasoning method based on different T-operators. The unweighted and weighted fuzzy reasoning approaches were also compared. The rest of the paper is organized as follows: Section 2 presents the definition of the geographical classification problem and related works. The preliminaries, namely fuzzification, the fuzzy ID3 algorithm, and the fuzzy rule-based classification system, are given in Section 3. The experimental study on the unweighted and weighted fuzzy rule-based approach to the geographic classification of virgin olive oil using T-operators is given in Section 4, and the conclusion is presented in Section 5.
2. Geographic classification problem
The geographic classification problem aims to find the region of origin of an unassigned olive oil sample. The problem arises from the need to support the traceability of the protected designation of origin policy for olive oil. In particular, the definition of a methodology is an important issue for Turkey. In the literature, scholars have generally studied the classification of olive oils [9, 10]. Principal component analysis, linear discriminant analysis, probabilistic neural networks, and binary classification trees have been the preferred techniques for evaluating the parameters [9, 10]. Back propagation artificial neural networks (BP-ANN) have also been used to solve this kind of problem. In another study, adulteration in olive oil was detected by near-infrared spectroscopy combined with chemometric techniques such as principal component analysis and partial least squares regression (PLS), together with data pretreatment methods such as signal correction. Principal component analysis and the SIMCA classification model are other methods that support the geographic classification problem given in Figure 1.
3. Preliminaries
This section briefly explains fuzzy logic and the fuzzy c-means algorithm as a fuzzification tool. It also briefly reviews the fuzzy ID3 builder combined with fuzzy rule-based classification and its reasoning method. Finally, it introduces T-operators and proposes a weighted fuzzy ID3 reasoning method based on different types of T-operators. Each topic is covered in its own subsection.
3.1. Fuzzy logic and fuzzy c-means algorithm as fuzzification tool
Fuzzy set theory was first proposed by Zadeh in 1965. A fuzzy subset $v$ of the universe of discourse $U$ is described by a membership function $\mu_v: U \to [0, 1]$, which represents the degree to which an element $u \in U$ belongs to the set $v$. Each value is thus characterized by a membership degree. The process of transforming values into membership degrees for each term of a fuzzy variable is called fuzzification. Many types of membership functions appear in the literature, such as triangular, trapezoidal, and Gaussian membership functions. Triangular membership functions are generally preferred. Alternatively, the fuzzy c-means (FCM) algorithm, originally suggested by Dunn and later improved by Bezdek, can be used to obtain the membership degrees for each term of a fuzzy variable. FCM is a clustering algorithm that seeks a fuzzy c-partition matrix $U = [u_{ik}]$. The following objective function is minimized over fuzzy partitions (Eq. (1)):

$$J_m(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ik}^{m}\, d_{ik}^{2} \tag{1}$$

$$d_{ik} = \lVert x_k - v_i \rVert \tag{2}$$

Here $u_{ik}$ is the membership degree of the $k$th data point in the $i$th class, $c$ is the number of clusters, and $N$ is the number of data points. The dimensionality of the data space is denoted by $p$. The parameter $m > 1$ controls the sharpness of the fuzzification process. In Eq. (2), $d_{ik}$ is any distance measure (usually the Euclidean distance) between the data point $x_k$ and the cluster center $v_i$ in the $p$-dimensional space. Eq. (3) calculates the center of each cluster:

$$v_i = \frac{\sum_{k=1}^{N} u_{ik}^{m}\, x_k}{\sum_{k=1}^{N} u_{ik}^{m}} \tag{3}$$

Membership degrees are calculated according to Eq. (4):

$$u_{ik} = \left[ \sum_{j=1}^{c} \left( \frac{d_{ik}}{d_{jk}} \right)^{2/(m-1)} \right]^{-1} \tag{4}$$

The optimal number of clusters is determined by calculating the partition coefficient, $PC = \frac{1}{N} \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ik}^{2}$. The selected cluster number gives the number of fuzzy linguistic terms for each fuzzy variable.
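As a rough illustration of the fuzzification step, the sketch below runs fuzzy c-means on a single variable and computes the partition coefficient used to compare cluster numbers. It is a minimal pure-Python sketch, not the MATLAB implementation used in the study. The function names `fcm_1d` and `partition_coefficient`, the random initialization, the stopping tolerance, and the default fuzzifier value of 2 are all assumptions made for illustration.

```python
import random


def fcm_1d(values, c, m=2.0, iters=200, tol=1e-6, seed=0):
    """Fuzzy c-means on a list of scalars (illustrative sketch).

    Returns (centers, U), where U[i][k] is the membership of point k in cluster i.
    """
    rng = random.Random(seed)
    n = len(values)
    # Assumed initialization: random memberships, each column normalized to sum to 1.
    U = [[rng.random() for _ in range(n)] for _ in range(c)]
    for k in range(n):
        col = sum(U[i][k] for i in range(c))
        for i in range(c):
            U[i][k] /= col
    centers = [0.0] * c
    expo = 2.0 / (m - 1.0)
    for _ in range(iters):
        # Center update: membership-weighted mean of the data.
        for i in range(c):
            w = [U[i][k] ** m for k in range(n)]
            total = sum(w)
            if total > 0.0:
                centers[i] = sum(wk * x for wk, x in zip(w, values)) / total
        # Membership update from the distances to the current centers.
        shift = 0.0
        for k, x in enumerate(values):
            d = [abs(x - v) for v in centers]
            zeros = [i for i, dist in enumerate(d) if dist == 0.0]
            if zeros:
                # A point sitting exactly on a center belongs fully to it.
                new = [1.0 / len(zeros) if i in zeros else 0.0 for i in range(c)]
            else:
                new = [1.0 / sum((d[i] / d[j]) ** expo for j in range(c))
                       for i in range(c)]
            for i in range(c):
                shift = max(shift, abs(new[i] - U[i][k]))
                U[i][k] = new[i]
        if shift < tol:
            break
    return centers, U


def partition_coefficient(U):
    """Mean of squared memberships; closer to 1 means a crisper partition."""
    n = len(U[0])
    return sum(u * u for row in U for u in row) / n
```

In practice one would run `fcm_1d` for several candidate values of `c` on each normalized variable and keep the cluster number with the highest partition coefficient; the rows of `U` then act as the membership degrees of the linguistic terms.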
3.2. Fuzzy rule-based classification system (FRBCS)
Fuzzy rule-based classification systems (FRBCSs) are very useful for solving classification problems. They have been applied to many kinds of real-life problems, such as image processing and medical problems.
In a classification problem, an object belonging to a certain feature space is to be assigned a class from a preassigned class set; the task of a classifier is to realize this assignment.
A classifier can be a neural network, a decision tree, a fuzzy decision tree, etc. If the classifier produces a set of fuzzy rules, the system is called a fuzzy rule-based classification system (FRBCS).
The antecedents of fuzzy rules are defined by fuzzy variables, which provides computational flexibility. A classification problem is solved by using a set of training samples and a classifier; the resulting model then provides the class of a new sample. The scheme of the classification problem with the fuzzy ID3 algorithm combined with a fuzzy rule-based classification system is summarized in Figure 2.
In this study, the fuzzy iterative dichotomiser 3 (fuzzy ID3) algorithm is preferred as the classifier. To generate rules, the fuzzy ID3 algorithm constructs a tree during the learning process. Fuzzy entropy is used to find the attribute that carries the maximum information, that is, the minimum uncertainty. Each path of the tree corresponds to a rule. Each leaf node has a rule weight (RW) for each class: $RW_j$ denotes the weight of the $j$th rule, which is obtained from the fuzzy confidence value (Section 3.4.2). After rule induction, fuzzy rule-based reasoning is performed to handle the classification task.
In the literature, there are three definitions for fuzzy rules. In this study, the following type of rule, constructed from the fuzzy decision trees, is used in the experiments.
Fuzzy rules with a class and a certainty degree in the consequent.
Here the certainty degree expresses how certain the classification in the given class is for a pattern belonging to the fuzzy subspace delimited by the fuzzy antecedent.
3.3. Fuzzy iterative dichotomiser 3
A fuzzy decision tree is the adaptation of the decision tree structure with fuzzy logic. Many decision tree algorithms have been adapted with fuzzy logic to construct fuzzy decision trees. A tree is generated, and the decision rules are obtained from each path from the root to the leaves of the tree. Fuzzy iterative dichotomiser 3 (fuzzy ID3) is widely used as a classification tree builder. It is the adaptation with fuzzy logic of the ID3 algorithm proposed by Quinlan. One of its important advantages is that it handles both crisp variables and fuzzy variables defined by the user. The algorithm splits the data set according to an attribute selected by a measure called information gain, which is based on fuzzy entropy. It seeks the attribute that carries the most information, that is, the one with the highest discriminating power.
Let the training set consist of $N$ labeled fuzzified patterns described by $n$ attributes, where each pattern holds one value per attribute and belongs to one of the $m$ classes of the problem. Each attribute takes values in a set of fuzzy subsets (its linguistic labels). $C$ denotes the classification target attribute, taking $m$ values $C_1, \dots, C_m$. The symbol $M(\cdot)$ denotes the cardinality of a given fuzzy set, that is, the sum of the membership values of the fuzzy set [2, 26].
The induction process of fuzzy ID3 is given as follows:
Step 1: Create a root node that contains the set of all data. Each datum is fuzzified, and every membership degree in the root node is initialized to 1.
Step 2: The attribute for each internal node is selected by using the following steps:
Step 2a: For each linguistic label of each attribute, compute its relative frequencies with respect to each class.
Step 2b: For each linguistic label, compute its fuzzy classification entropy.
Step 2c: Compute the average fuzzy classification entropy of each attribute.
Step 2d: Select the attribute (Attr) that maximizes the information gain.
Step 2e: Assign the selected attribute as the root node and its linguistic labels as candidate branches of the tree.
Step 3: Pick one branch to analyze. Remove the branch if it is empty. If the branch is not empty, calculate the relative frequencies (Eq. (6)) of all objects within the branch for each class. If the relative frequency of some class is above the given threshold, or all the attributes have already been expanded for this branch, close the branch as a leaf. Otherwise, among the attributes that have not yet been expanded in this branch, select the one with the smallest average fuzzy classification entropy (Eq. (9)) as a new decision node for the branch, and add its linguistic labels as candidate branches to analyze. At each leaf, each class has its own relative frequency.
Step 4: Repeat Step 3 while there are branches to analyze. When no candidate branches remain, the decision tree is complete.
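The attribute selection in Steps 2a to 2d can be sketched as follows. This is only an illustrative sketch under stated assumptions: fuzzy intersections are taken with the product, cardinality is the sum of memberships, and the helper names (`label_entropy`, `average_entropy`, `best_attribute`) and the dictionary layout are hypothetical, not the chapter's notation. Because the entropy of the node itself is the same for every candidate attribute, maximizing the information gain is equivalent to minimizing the average fuzzy classification entropy, which is what the sketch does.

```python
import math


def cardinality(mu):
    """Cardinality of a fuzzy set: the sum of its membership values."""
    return sum(mu)


def label_entropy(node_mu, label_mu, classes):
    """Fuzzy classification entropy of one linguistic label inside a node.

    node_mu, label_mu: per-sample membership lists.
    classes: dict mapping class name -> per-sample class membership list.
    Returns (entropy, cardinality of the branch).
    """
    # Assumed intersection operator: product of memberships.
    branch = [a * b for a, b in zip(node_mu, label_mu)]
    total = cardinality(branch)
    if total == 0.0:
        return 0.0, 0.0
    entropy = 0.0
    for cls_mu in classes.values():
        # Relative frequency of the class within this branch (Step 2a).
        p = cardinality([a * b for a, b in zip(branch, cls_mu)]) / total
        if p > 0.0:
            entropy -= p * math.log2(p)  # Step 2b
    return entropy, total


def average_entropy(node_mu, attr_labels, classes):
    """Cardinality-weighted mean entropy over an attribute's labels (Step 2c)."""
    parts = [label_entropy(node_mu, mu, classes) for mu in attr_labels.values()]
    weight = sum(w for _, w in parts)
    return sum(e * w for e, w in parts) / weight if weight else 0.0


def best_attribute(node_mu, attributes, classes):
    """Pick the attribute with the smallest average entropy (Step 2d)."""
    return min(attributes,
               key=lambda a: average_entropy(node_mu, attributes[a], classes))
```

For the root node, `node_mu` is a list of ones, matching the initialization in Step 1; for a deeper node it would hold the memberships accumulated along the path to that node.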
The rule structure generated from each branch of the fuzzy decision tree.
After the fuzzy decision tree is induced, a rule is generated from each branch, where each branch acts as a path. The rule is given as follows:
Rule $R_j$: If $x_1$ is $A_{j1}$ and … and $x_n$ is $A_{jn}$ then Class $= C_j$ with $RW_j$, where $R_j$ is the label of the $j$th rule, $x = (x_1, \dots, x_n)$ is an n-dimensional pattern vector that represents the example, $A_{ji}$ is a fuzzy set, $C_j$ is the class label, and $RW_j$ is the rule weight. In a fuzzy decision tree, each leaf node has rule weights. These rule weights are obtained from the relative frequency of each class (as given in Step 3).
3.4. Fuzzy reasoning method based on T-operators
A fuzzy reasoning method (FRM) is an inference procedure that derives a class assignment from a set of fuzzy if-then rules. It combines the information of the rules that fire with the pattern to be classified. This ability of the FRM supports the generalization capability of the classification system. This section is organized as follows. First, the general model of fuzzy reasoning is presented together with the classical FRM. Next, we describe a general reasoning model that admits different reasoning methods, and we suggest eight alternative FRMs as particular proposals adapted to this general model. Finally, the experiments that show the behavior of the alternative reasoning methods are presented in Section 4.
3.4.1. General model of fuzzy reasoning
Let $x_p = (x_{p1}, \dots, x_{pn})$ be the $p$th example of the training set, which is composed of $P$ examples, where $x_{pi}$ is the value of the $i$th attribute of the $p$th sample. Each example belongs to a class $y_p \in \{C_1, \dots, C_m\}$, where $m$ is the number of classes of the problem. A novel example $x_p$ is assumed to be classified with the FID3 reasoning procedure. The fuzzy reasoning method for FARC-HD is summarized in four steps. In our approach, the fuzzy ID3 reasoning method is combined with T-operators. T-operators were developed from the triangular inequalities [29, 30]. In fuzzy set theory, T-operators are used to compute the intersection and union of two fuzzy sets [31, 32]. There are different types of T-operators, which are also called T-norms and T-conorms in the literature. These operators are used in different types of problems. T-operators are two-place functions from $[0, 1] \times [0, 1]$ to $[0, 1]$ that are monotonic, commutative, and associative.
A T-norm is used to find the intersection of two fuzzy sets A and B. The intersection of two fuzzy sets A and B is a fuzzy set C, written as $C = A \cap B$, whose membership function (MF) is related to those of A and B by $\mu_C(x) = T(\mu_A(x), \mu_B(x))$.
On the other hand, a T-conorm is used to obtain the union of two fuzzy sets A and B. The union is a fuzzy set C, written as $C = A \cup B$, whose MF is related to those of A and B by $\mu_C(x) = T^*(\mu_A(x), \mu_B(x))$.
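To make the T-norm and T-conorm pairs concrete, the sketch below implements four commonly used families in their standard textbook forms: Zadeh (min/max), product with probabilistic sum, Hamacher, and Yager. The function names and the parameter names `gamma` and `p` are assumptions chosen for illustration, and the parameterizations may differ from those listed in Table 1; the Dombi, Dubois, and Weber families are omitted.

```python
def t_zadeh(a, b):
    """Zadeh T-norm: minimum."""
    return min(a, b)


def s_zadeh(a, b):
    """Zadeh T-conorm: maximum."""
    return max(a, b)


def t_product(a, b):
    """Algebraic product T-norm."""
    return a * b


def s_prob_sum(a, b):
    """Probabilistic (algebraic) sum T-conorm, dual of the product."""
    return a + b - a * b


def t_hamacher(a, b, gamma=0.0):
    """Hamacher T-norm; gamma = 0 gives the Hamacher product."""
    den = gamma + (1.0 - gamma) * (a + b - a * b)
    return 0.0 if den == 0.0 else a * b / den


def s_hamacher(a, b, gamma=0.0):
    """Hamacher T-conorm, the dual of t_hamacher under 1 - x."""
    den = 1.0 - (1.0 - gamma) * a * b
    return 1.0 if den == 0.0 else (a + b - (2.0 - gamma) * a * b) / den


def t_yager(a, b, p=2.0):
    """Yager T-norm with exponent p > 0."""
    return max(0.0, 1.0 - ((1.0 - a) ** p + (1.0 - b) ** p) ** (1.0 / p))


def s_yager(a, b, p=2.0):
    """Yager T-conorm with exponent p > 0."""
    return min(1.0, (a ** p + b ** p) ** (1.0 / p))


# Each pair is (T-norm, T-conorm); both take two degrees in [0, 1].
T_OPERATORS = {
    "zadeh": (t_zadeh, s_zadeh),
    "product_sum": (t_product, s_prob_sum),
    "hamacher": (t_hamacher, s_hamacher),
    "yager": (t_yager, s_yager),
}
```

Every pair satisfies the boundary conditions T(a, 1) = a and T*(a, 0) = a, and every T-norm is bounded above by the minimum while every T-conorm is bounded below by the maximum, which is why the Zadeh pair is the least strict intersection and the least generous union.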
3.4.2. Fuzzy rule evaluation measures in data mining
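Two measures, confidence and support, are used in the field of data mining to evaluate rules. Assume that a fuzzy rule is written as $A_q \Rightarrow \text{Class } h$, where $A_q = (A_{q1}, \dots, A_{qn})$ is its antecedent. Fuzzy versions of these two rule evaluation measures are explained in [34, 35, 36, 37] as follows.
Let us assume that $m$ labeled patterns
$x_p = (x_{p1}, \dots, x_{pn})$, $p = 1, 2, \dots, m$, are given from $M$ classes for an $n$-dimensional pattern classification problem.
In the literature [38, 39, 40], the compatibility grade of each training pattern $x_p$ with the antecedent $A_q$ is defined by the product operation as $\mu_{A_q}(x_p) = \mu_{A_{q1}}(x_{p1}) \cdots \mu_{A_{qn}}(x_{pn})$, where $\mu_{A_{qi}}(\cdot)$ is the membership function of the antecedent fuzzy set $A_{qi}$. The confidence is the share of the total compatibility that comes from patterns of Class $h$:

$$c(A_q \Rightarrow \text{Class } h) = \frac{\sum_{x_p \in \text{Class } h} \mu_{A_q}(x_p)}{\sum_{p=1}^{m} \mu_{A_q}(x_p)}$$

The support measures the coverage of the training patterns by $A_q \Rightarrow \text{Class } h$:

$$s(A_q \Rightarrow \text{Class } h) = \frac{\sum_{x_p \in \text{Class } h} \mu_{A_q}(x_p)}{m}$$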
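The two rule evaluation measures can be computed as in the following sketch. It assumes that the antecedent is passed as a list of membership functions, one per attribute, and that the compatibility grade is the product of the per-attribute memberships; the function names are hypothetical.

```python
def compatibility(pattern, antecedent):
    """Product of the per-attribute membership degrees of one pattern."""
    grade = 1.0
    for x, membership in zip(pattern, antecedent):
        grade *= membership(x)
    return grade


def confidence_and_support(patterns, labels, antecedent, target):
    """Fuzzy confidence and support of the rule 'antecedent => target'."""
    grades = [compatibility(x, antecedent) for x in patterns]
    # Compatibility mass contributed by patterns of the target class.
    hit = sum(g for g, y in zip(grades, labels) if y == target)
    total = sum(grades)
    confidence = hit / total if total > 0.0 else 0.0
    support = hit / len(patterns)
    return confidence, support
```

As a small example with one attribute and the membership function `low(x) = max(0, 1 - x)`, the patterns 0.0, 0.5, and 1.0 with labels A, A, and B have compatibility grades 1.0, 0.5, and 0.0. The rule "low implies A" then has confidence 1.5 / 1.5 = 1.0 and support 1.5 / 3 = 0.5.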
3.4.3. Heuristic methods for rule weight specification
There are many ways to assign weights to the rules when determining the consequent class [38, 39, 40]. In general, the consequent of a fuzzy rule is set to the class that has the maximum confidence for the antecedent.
The confidence can then be used as the rule weight of the fuzzy rule.
Given a set of antecedent fuzzy sets for each attribute, the antecedent part of each fuzzy rule is defined as a combination of antecedent fuzzy sets for the n attributes. For a fuzzy rule with multiple consequent classes, the confidence is used directly as the weight for each class.
The adaptation of generalized model with weighted fuzzy reasoning based on T-operators.
The steps are given below combined with FID3 reasoning based on T-operators:
Step 1: Antecedent degree of a rule: The strength of activation of the if-part is computed with respect to the pattern for every rule in the rule base (RB), where each rule is obtained from one path of the fuzzy decision tree.
The matching degrees of the example with the individual antecedents of the rule, which is obtained from a leaf node at the end of a path, are combined with a T-norm T (listed in Table 1) over all antecedents of the rule.
Step 2: Consequent degree for a class: The consequent degree in favor of class l given by the rule for the pattern is computed, where the weight is computed according to the multiple consequent classes (Eq. (16)).
Step 3: Confidence degree for a class: The confidence degree for class l according to all rules in the RB is computed. To obtain the confidence degree of a class, the association degrees of the rules for that class are aggregated with a T-conorm T* (listed in Table 1) [2, 27].
Here each aggregated term is the association degree of the pattern to class l according to one rule of the RB.
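The reasoning steps above can be sketched as follows. This is a hedged illustration rather than the chapter's exact procedure: the association degree is assumed to be the product of the antecedent degree and the rule weight, the final decision is assumed to be the class with the largest confidence degree, and the rule layout and the name `classify` are hypothetical. Any T-norm and T-conorm taking two arguments can be passed in.

```python
from functools import reduce


def classify(degrees, rules, tnorm, tconorm):
    """Weighted fuzzy reasoning over a rule base (illustrative sketch).

    degrees: degrees[attribute][label] is the membership of the pattern
             in that linguistic label.
    rules:   list of (antecedent, weights) pairs, where antecedent is a list
             of (attribute, label) pairs and weights maps class -> rule weight.
    Returns (predicted class, confidence degree per class).
    """
    scores = {}
    for antecedent, weights in rules:
        # Step 1: antecedent degree, combining matching degrees with the T-norm.
        firing = reduce(tnorm, (degrees[a][lab] for a, lab in antecedent))
        for cls, rw in weights.items():
            # Step 2: association degree (assumed: antecedent degree times weight).
            assoc = firing * rw
            # Step 3: aggregate over the rules with the T-conorm.
            scores[cls] = tconorm(scores[cls], assoc) if cls in scores else assoc
    # Assumed final step: pick the class with the largest confidence degree.
    label = max(scores, key=scores.get)
    return label, scores
```

For instance, with the Zadeh pair (`min`, `max`), a rule with antecedent degrees 0.8 and 0.6 and weights 0.9 and 0.1 contributes 0.54 and 0.06 to its two classes, so rule weights directly scale how strongly each leaf votes.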
4. Experimental study on fuzzy rule-based approach to geographic classification of virgin olive oil using T-operators
This section summarizes the fuzzy rule-based approach to the geographic classification of virgin olive oil and gives the solution step by step. We then describe the experimental study. First, the olive oil samples and the methodology used in their chemical analyses are described. Second, we explain the performance measure and the statistical tests. Fuzzy reasoning methods with nonparametric operators are then examined, and the behavior of the weighted fuzzy ID3 reasoning method based on different T-operators is observed. Finally, the weighted and unweighted fuzzy reasoning methods based on different T-operators are compared.
4.1. Olive oil samples
Olives were collected from selected trees of the cultivars studied in this work: Ayvalik, Memecik, Kilis Yaglik, and Nizip Yaglik. The samples were collected in the 2002–2003, 2004–2005, and 2005–2006 harvest seasons. A total of 101 olive oil samples were used for the experimental study. These samples were collected from different regions [North Aegean (33), South Aegean (53), Mediterranean (4), and South East (11)]. Detailed information about the chemical analysis of the samples is given in earlier studies [27, 47, 48]. PCA was applied in SPSS 20.0, and the partition coefficients and the fuzzy c-means algorithm were computed in MATLAB 2015. Software named OliveDeSoft was developed in Visual C# for the experimental study (Intel i7, 2.4 GHz, 4 GB RAM). The data were fuzzified with the fuzzy c-means (FCM) algorithm, and the partition coefficient determined the number of clusters [19, 20]. The calculated partition coefficient value for each cluster number is given in a former study.
4.2. Performance measure and statistical tests
In a former study, principal component analysis was performed on this data set in order to explore the data structure. The results showed that the chemical analyses clearly explain the geographic origin of the virgin olive oils. However, one region (Mediterranean) has fewer samples than the other regions, so it is not explained. The analysis was carried out in IBM SPSS 20. Because the chemical measurements carry fuzziness, we prefer the fuzzy ID3 algorithm, which is based on fuzzy logic, for the classification in our study. The classical ID3 algorithm works with categorical variables. An advantage of the fuzzy ID3 algorithm is that it handles numerical variables by converting each of them into a fuzzy variable. The fuzzy c-means algorithm is used for the fuzzification. The proposed approach uses eight different T-operators in the reasoning procedure. The performances of the standard fuzzy ID3 presented in [2, 27] and of the C4.5 algorithm are also examined in the experimental study. A leave-one-out validation procedure was used to measure the performance of the algorithms, and the accuracy rate is used to compare the methods. In the experimental study, the threshold value of the fuzzy decision tree is held fixed. The parameters of the parametric operators are fixed as Yager p = 2, Hamacher p = 0.25, Dombi = 1, Dubois = 0.25, and Weber = 15 for the fuzzy reasoning procedure. The performances of the unweighted and weighted fuzzy reasoning approaches are then compared.
Studying fuzzy reasoning method with nonparametric operators: The C4.5 algorithm also uses entropy as its splitting criterion. It is an improved version of the ID3 algorithm, presented by Quinlan in 1993, that can work on numerical data. Its accuracy on this data set is 86.14%. The fuzzy ID3 algorithm with the standard reasoning method also reaches 86.14%.
The results of the nonparametric approaches in Table 2 show that three nonparametric operators reach the same accuracy as the C4.5 algorithm, whereas the accuracy obtained with the Zadeh T-operators is lower, at 82.18%.
|Algorithms||Accuracy rate (%)|
|FuzzyID3_reasoning with Weighted Product-Sum_Umano||86.14|
|FuzzyID3_reasoning with Weighted Zadeh T-Operators||82.18|
|FuzzyID3_reasoning with Weighted Product-Sum||86.14|
|FuzzyID3_reasoning with Weighted Nonparametric Hamacher||86.14|
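As a quick arithmetic check, assuming the accuracy rate is the number of correctly classified samples divided by the 101 samples left out one at a time, the reported percentages correspond to whole numbers of correct classifications. The counts below are inferred from the percentages, not reported in the text, and the helper name is hypothetical.

```python
def accuracy_percent(correct, total=101):
    """Leave-one-out accuracy rate in percent, rounded to two decimals."""
    return round(100.0 * correct / total, 2)


# Inferred counts of correctly classified samples and the resulting rates.
for correct in (83, 87, 88, 89):
    print(correct, accuracy_percent(correct))
```

Under this assumption, 83 correct samples give 82.18%, 87 give 86.14%, 88 give 87.13%, and 89 give 88.12%; a reported value of 88.11% would correspond to the same 89 samples with the last digit truncated rather than rounded.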
Study of the behavior of fuzzy ID3 weighted fuzzy reasoning method based on different T-operators: We used the Friedman aligned ranks test as a nonparametric statistical procedure to detect statistical differences among the results obtained for 20 threshold values (Table 3).
|Algorithm||Rank||Friedman aligned ranks|
|Nonparametric Hamacher||6.88||Test Statistic||76.396|
|Hamacher||6.80||Degrees of Freedom||8|
|Dubois||3.22||Asymptotic Sig. (2 sided test)||0.000|
The results of the pairwise comparisons for weighted fuzzy ID3 reasoning based on different T-operators with 20 different thresholds (range = 0.71–0.90), based on adjusted significance values, are given in Table 4.
|Weber||Zadeh||Yager||Hamacher||Nonparametric Hamacher||Product sum||Umano||Dubois|
|Nonparametric Hamacher||1.000||0.000||0.004||1.000|
The Friedman aligned ranks test gives a p-value below 0.001, which means that there are significant differences among the results. Pairwise comparisons were therefore performed, and their results are shown in Table 4. These nonparametric tests were performed in IBM SPSS 20.
The comparison of the weighted and unweighted fuzzy reasoning methods based on different T-operators: The accuracy rates obtained for different thresholds with the unweighted fuzzy reasoning method based on different T-operators are given in Table 5. The maximum accuracy, 88.11%, is obtained with the Dombi T-operators. This shows that better results can also be reached by using different threshold values.
|Zadeh||Umano||Product-sum||Nonparametric Hamacher||Yager (p = 2)||Hamacher (p = 0.25)||Dombi (1)||Dubois (0.25)||Weber (15)|
On the other hand, the accuracy rates obtained for different thresholds with the weighted fuzzy reasoning method based on different T-operators are given in Table 6. The Umano T-operators, the Product-Sum T-operators, the nonparametric Hamacher operators, and the Hamacher operators reach the maximum accuracy rate of 88.12%. Whereas unweighted fuzzy reasoning based on the Dombi T-operators reached its maximum accuracy rate of 88.11%, weighted fuzzy reasoning based on the Dombi T-operators reached 87.13%.
|Zadeh||Umano||Product-sum||Nonparametric Hamacher||Yager (p = 2)||Hamacher (p = 0.25)||Dombi (1)||Dubois (0.25)||Weber (15)|
The performances of weighted and unweighted fuzzy reasoning were compared for each T-operator with the Wilcoxon signed rank test. The performances of unweighted and weighted fuzzy reasoning are significantly different for the Zadeh T-operators (p < 0.001), Yager T-operators (p < 0.001), Dombi T-operators (p < 0.001), Dubois T-operators (p < 0.05), and Weber T-operators (p < 0.001).
When the performances of the T-operators are averaged over the 20 thresholds (range = 0.71–0.90), Hamacher has the maximum value, 85.59%, for the unweighted fuzzy reasoning approach, and Weber has the maximum value, 81.74%, for the weighted fuzzy reasoning approach.
5. Conclusion
Geographical classification of olive oil is an important topic. Olive oil has long been crucial for human health, and its geographical classification is central to the traceability of designation of origin olive oil. In an earlier study, we addressed a geographic classification system for olive oil. In this paper, chemical measurements were used for the experimental study. Chemical measurements contain imprecise information. To deal with this imprecision, the fuzzy ID3 classifier was selected for the classification of olive oil samples, and a fuzzy ID3 reasoning method based on T-operators was suggested. We ran experiments on the performance of the proposed fuzzy reasoning method in order to solve the geographic classification problem. In this paper, we propose a weighted fuzzy reasoning approach based on T-operators. Three nonparametric operators [Product-Sum_Umano, Product-Sum, and Nonparametric Hamacher] reach the same accuracy as the C4.5 algorithm, whereas the accuracy obtained with the Zadeh T-operators is lower, at 82.18%. We then examined the performance of the parametric operators. A statistical procedure was used to detect differences among the results for 20 threshold values. Significant differences were observed between the unweighted and weighted fuzzy reasoning approaches. The weighted fuzzy reasoning approach based on the Umano T-operators, the Product-Sum T-operators, the Nonparametric Hamacher operators, and the Hamacher operators reached the maximum accuracy rate of 88.12%. We therefore conclude that better reasoning performance can be obtained by using different parameters and a weight for each rule.
The authors would like to thank Erden Kantarcı for his valuable support and Mrs. Ummuhan Tibet and Dr. Aytac Gumuskesen for allowing us to use the data set.