T-Operators used in fuzzy reasoning method.
Olive oil is an important agricultural food product. Protected designation of origin (PDO) and protected geographical indication (PGI) labels are useful for protecting the intellectual property rights of consumers and producers. For this reason, geographic classification is increasingly important for tracing geographical indications. This chapter proposes a geographical classification system for virgin olive oils. The system is built on chemical parameters, which involve fuzziness. The proposed system constructs its rules with a fuzzy decision tree algorithm: it produces rules with the fuzzy ID3 algorithm, which uses fuzzy entropy on the fuzzified data. The reasoning procedure is a weighted rule-based system adapted to fuzzy reasoning with different T-operators. Fuzzification is performed with the fuzzy c-means algorithm on the olive oil data set. The number of clusters for each variable is selected with the partition coefficient validity criterion. The model is examined with different decision tree approaches (C4.5 and the standard fuzzy ID3 algorithm) and the FID3 reasoning method with eight different T-operators. The conclusions are also supported by statistical analysis. Experimental results indicate that the rule weights play an important role in the fuzzy reasoning method for the geographic classification system.
- fuzzy decision tree
- fuzzy rule
- geographic classification
- olive oil
1. Introduction
Geographical indications are important signs used on products. Their aim is to specify the geographical origin of a product and to vouch for its qualities. There are two kinds of geographical indications: protected designation of origin (PDO) and protected geographical indication (PGI). These indications are generally used for agricultural products, and olive oil holds a crucial place among them. It is necessary to observe the properties of olive oil produced in different regions or from different olive varieties. The geographical classification problem investigates the relationship among the chemical and sensorial parameters for each region.
Machine learning and chemical data have come together in the information age. Machine learning is concerned with the design and development of algorithms that let computers learn from data. It aims to uncover relationships within the data and to extract knowledge without prior assumptions. There are several machine learning algorithms for this purpose.
Decision trees are among the most commonly used methods in machine learning. There are several types of decision tree algorithms, such as ID3, C4.5, and CART. Fuzzy logic has been adapted into decision tree algorithms to handle uncertainty; decision trees adapted with fuzzy logic are called fuzzy decision trees [1, 2, 3]. A fuzzy decision tree consists of nodes for testing attributes, edges for branching by the test values of fuzzy sets, and leaves for deciding the class according to class membership.
Chemical measurements also involve uncertainty [4, 5, 6, 7, 8]. In this study, the geographical classification problem uses chemical measurements. The study aims to propose an improved methodological approach for the classification of olive oil samples based on the fuzzy ID3 classification approach.
The proposed system constructs its rules with a fuzzy decision tree algorithm. Its reasoning procedure is a weighted rule-based system adapted to fuzzy reasoning with different T-operators. The model is examined with different decision tree approaches (C4.5 and the standard fuzzy ID3 algorithm) and the FID3 reasoning method with eight different T-operators. The study uses measurements of chemical parameters for 101 virgin olive oil samples collected from four different regions (North Aegean, South Aegean, Mediterranean, and South East). Min-max normalization was applied to the data set. Nonparametric methods were preferred for the statistical analysis because of the data structure. A leave-one-out procedure was used to measure the performance of the algorithms. The Friedman aligned rank test and pairwise comparisons were performed to evaluate the fuzzy reasoning method based on different T-operators, and the unweighted and weighted fuzzy reasoning approaches were compared.
The rest of the paper is organized as follows: Section 2 presents the definition of the geographical classification problem and related works. The preliminaries, such as fuzzification, the fuzzy ID3 algorithm, and the fuzzy rule-based classification system, are given in Section 3. The experimental study on the unweighted and weighted fuzzy rule-based approach to the geographic classification of virgin olive oil using T-operators is given in Section 4, and finally, the conclusion is presented in Section 5.
2. Geographic classification problem
The geographic classification problem aims to find the region of origin of an unassigned olive oil sample. The problem arises from the need to support the traceability of the protected designation of origin policy for olive oil samples. Defining such a methodology is an especially important issue for Turkey. In the literature, scholars generally prefer to study the classification of olive oils [9, 10]. Principal component analysis, linear discriminant analysis, probabilistic neural networks, and binary classification trees were the preferred techniques for evaluating the parameters [9, 10]. Back propagation artificial neural networks (BP-ANN) have also been used to solve this kind of problem. In another study, adulteration in olive oil was detected by near-infrared spectroscopy using chemometric techniques such as principal component analysis and partial least squares regression (PLS), with data pretreatment methods such as signal detection correction. Principal component analysis and the SIMCA classification model are other methods that support the geographic classification problem given in Figure 1.
3. Preliminaries
We briefly explain fuzzy logic and the fuzzy c-means algorithm as a fuzzification tool. We also briefly review the fuzzy ID3 builder combined with fuzzy rule-based classification and its reasoning method. In the subsections, we give information about T-operators and suggest a fuzzy ID3 weighted reasoning method based on different types of T-operators.
3.1. Fuzzy logic and fuzzy c-means algorithm as fuzzification tool
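In 1965, fuzzy set theory was first proposed by Zadeh. A fuzzy subset $A$ of the universe of discourse $X$ is characterized by a membership function $\mu_A : X \to [0, 1]$, and $\mu_A(x)$ is explained as the membership degree of the element $x$ in $A$.
Membership degrees are calculated according to Eq. (4), given here in the standard fuzzy c-means form, where $x_i$ is the $i$th sample, $v_j$ is the center of the $j$th cluster, $c$ is the number of clusters, and $m > 1$ is the fuzzifier:
$$u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \frac{\lVert x_i - v_j \rVert}{\lVert x_i - v_k \rVert} \right)^{2/(m-1)}} \tag{4}$$
whereas the optimal cluster number is determined by the calculation of the partition coefficient. Each cluster number represents the number of fuzzy linguistic terms for each fuzzy variable.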
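The fuzzification step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes each variable is clustered separately (one-dimensional data), uses the textbook fuzzy c-means updates with fuzzifier `m = 2`, starts from evenly spaced centers, and scores each candidate cluster number with the partition coefficient. The function names and the toy values are hypothetical.

```python
def memberships(values, centers, m=2.0):
    """Membership of every value in every cluster (textbook FCM update)."""
    power = 2.0 / (m - 1.0)
    rows = []
    for x in values:
        dists = [abs(x - v) for v in centers]
        zeros = [d == 0.0 for d in dists]
        if any(zeros):
            # The value sits exactly on a center: share full membership there.
            rows.append([1.0 / sum(zeros) if z else 0.0 for z in zeros])
        else:
            rows.append([1.0 / sum((dj / dk) ** power for dk in dists)
                         for dj in dists])
    return rows


def fcm_1d(values, c, m=2.0, max_iter=200, tol=1e-9):
    """Fuzzy c-means on one numeric variable; returns (centers, memberships)."""
    lo, hi = min(values), max(values)
    # Assumed initialisation: centers spread evenly over the data range.
    centers = [lo + (hi - lo) * (j + 0.5) / c for j in range(c)]
    for _ in range(max_iter):
        u = memberships(values, centers, m)
        new_centers = []
        for j in range(c):
            weights = [row[j] ** m for row in u]
            new_centers.append(
                sum(w * x for w, x in zip(weights, values)) / sum(weights))
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(values, centers, m)


def partition_coefficient(u):
    """Mean of squared memberships; values near 1 mean a crisper partition."""
    return sum(v * v for row in u for v in row) / len(u)


# Hypothetical min-max normalised measurements of one chemical parameter.
values = [0.02, 0.05, 0.08, 0.45, 0.50, 0.55, 0.92, 0.95, 0.98]
for c in (2, 3, 4):
    centers, u = fcm_1d(values, c)
    print(c, round(partition_coefficient(u), 3))
```

The cluster number with the largest partition coefficient would then give the number of linguistic terms for that variable, and each row of `u` holds the membership degrees of one sample in those terms.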
3.2. Fuzzy rule-based classification system (FRBCS)
Fuzzy rule-based classification systems (FRBCSs) are very useful for solving classification problems. In real life, they have been applied to different kinds of problems, such as image processing and medical problems.
In a classification problem, an object that belongs to a certain feature space is to be given a class from a preassigned class set, and the task of the classifier is to make this assignment to the appropriate class.
In general, the classifier can be a neural network, a decision tree, a fuzzy decision tree, etc. If the classifier produces a set of fuzzy rules, the system is called a fuzzy rule-based classification system (FRBCS).
The antecedents of fuzzy rules, defined by fuzzy variables, provide computational flexibility. A classification problem is solved by using a set of training samples and a classifier; the resulting model provides the class of a new sample. The scheme of the classification problem with the fuzzy ID3 algorithm combined with a fuzzy rule-based classification system is summarized in Figure 2.
In this study, the fuzzy iterative dichotomizer 3 (fuzzy ID3) algorithm is preferred as the classifier. To generate rules, the fuzzy ID3 algorithm constructs a tree in the learning process. Fuzzy entropy is applied to find the attribute that carries the maximum information, and hence the minimum uncertainty. Each path of the tree gives a rule. Each leaf node has a rule weight (RW) for each class.
In the literature, there are three definitions for fuzzy rules. In this study, the following type of rule, constructed from the fuzzy decision trees, is used for the experiments:
$$R_k: \text{If } x_1 \text{ is } A_1^k \text{ and } \ldots \text{ and } x_N \text{ is } A_N^k \text{ then } (r_1^k, \ldots, r_M^k)$$
where $r_j^k$ is the certainty degree of the classification in the class $C_j$ for a pattern belonging to the fuzzy subspace delimited by the fuzzy antecedent.
3.3. Fuzzy iterative dichotomizer 3
A fuzzy decision tree is the adaptation of the decision tree structure to fuzzy logic. Many types of decision tree algorithms have been adapted with fuzzy logic to construct a fuzzy decision tree. A tree is generated, and the decision rules are obtained from each path from the root to the leaves of the tree. Fuzzy iterative dichotomizer 3 (fuzzy ID3) is widely used as a classification tree builder algorithm. It is the adaptation of the ID3 algorithm proposed by Quinlan to fuzzy logic. One of its important advantages is that it can deal with both crisp and fuzzy variables defined by the user. The algorithm splits the data set according to an attribute that is selected by a measure called information gain, which is based on fuzzy entropy. It seeks the attribute that carries the information with the highest degree of resolution.
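The attribute selection described above can be illustrated with a short Python sketch. It assumes one common formulation of fuzzy ID3, in which class proportions are computed from sums of membership degrees and the membership of a sample in a child node is the product of its membership in the parent node and in the linguistic term of the branch. The function names and the data layout are hypothetical and are not taken from the chapter.

```python
from math import log2


def fuzzy_entropy(weights, labels):
    """Entropy of a fuzzy node; weights[i] is sample i's membership in the node."""
    total = sum(weights)
    if total == 0:
        return 0.0
    mass = {}
    for w, y in zip(weights, labels):
        mass[y] = mass.get(y, 0.0) + w
    return -sum((s / total) * log2(s / total) for s in mass.values() if s > 0)


def information_gain(weights, labels, term_memberships):
    """Gain of splitting a node on one fuzzy attribute.

    term_memberships maps each linguistic term of the attribute to a list
    holding every sample's membership degree in that term.
    """
    children = [[w * mu for w, mu in zip(weights, mus)]
                for mus in term_memberships.values()]
    total = sum(sum(child) for child in children)
    if total == 0:
        return 0.0
    expected = sum((sum(child) / total) * fuzzy_entropy(child, labels)
                   for child in children)
    return fuzzy_entropy(weights, labels) - expected


# Toy node with four samples from two regions (hypothetical values).
labels = ["North Aegean", "North Aegean", "South Aegean", "South Aegean"]
node = [1.0, 1.0, 1.0, 1.0]
informative = {"low": [0.9, 0.8, 0.1, 0.2], "high": [0.1, 0.2, 0.9, 0.8]}
uninformative = {"low": [0.5, 0.5, 0.5, 0.5], "high": [0.5, 0.5, 0.5, 0.5]}
print(information_gain(node, labels, informative))
print(information_gain(node, labels, uninformative))
```

The tree builder would evaluate this gain for every candidate attribute at a node and branch on the one with the largest value.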
Let a training set consists of
The induction process of fuzzy ID3 is given as follows:
After the fuzzy decision tree induction, the rules are generated from each branch, where each branch acts as a path. The rule is given as follows:
3.4. Fuzzy reasoning method based on T-operators
The fuzzy reasoning method (FRM) is an inference procedure that derives a class assignment from a set of fuzzy if-then rules. It combines the information of the fired rules with the pattern to be classified. This ability of the FRM supports the generalization capability of the classification system. This section is organized as follows. First, the general model of fuzzy reasoning is presented together with the classical FRM. Next, within this general model, which admits different reasoning methods, we suggest eight alternative FRMs as particular new proposals. Finally, we present the experiments carried out, which display the advantageous behavior of the proposed alternative reasoning methods.
3.4.1. General model of fuzzy reasoning
Let be the
A T-norm is used to find the intersection of two fuzzy sets A and B. The intersection of A and B is a fuzzy set C, written as $C = A \cap B$, whose membership function (MF) is related to those of A and B by
$$\mu_C(x) = T(\mu_A(x), \mu_B(x)).$$
On the other hand, a T-conorm is used to obtain the union of two fuzzy sets A and B. The union is a fuzzy set C, written as $C = A \cup B$, whose MF is related to those of A and B by
$$\mu_C(x) = S(\mu_A(x), \mu_B(x)).$$
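As an illustration, a few widely used T-norm and T-conorm pairs can be written as plain Python functions. These are the textbook forms of the Zadeh (min-max), product and probabilistic sum, nonparametric Hamacher, and Yager operators. The exact definitions and parameter values used in the chapter's own table of T-operators are not reproduced here, so the formulas, the function names, and the default `p = 2.0` should be read as assumptions.

```python
def zadeh_t(a, b):
    return min(a, b)


def zadeh_s(a, b):
    return max(a, b)


def product_t(a, b):
    return a * b


def prob_sum_s(a, b):
    return a + b - a * b


def hamacher_t(a, b):
    den = a + b - a * b
    return 0.0 if den == 0 else a * b / den


def hamacher_s(a, b):
    den = 1.0 - a * b
    return 1.0 if den == 0 else (a + b - 2.0 * a * b) / den


def yager_t(a, b, p=2.0):
    return max(0.0, 1.0 - ((1.0 - a) ** p + (1.0 - b) ** p) ** (1.0 / p))


def yager_s(a, b, p=2.0):
    return min(1.0, (a ** p + b ** p) ** (1.0 / p))


# Each entry pairs a T-norm (intersection) with its dual T-conorm (union).
T_OPERATORS = {
    "Zadeh": (zadeh_t, zadeh_s),
    "Product-sum": (product_t, prob_sum_s),
    "Nonparametric Hamacher": (hamacher_t, hamacher_s),
    "Yager": (yager_t, yager_s),
}

for name, (t_norm, t_conorm) in T_OPERATORS.items():
    print(name, t_norm(0.6, 0.7), t_conorm(0.6, 0.7))
```

Every pair satisfies the boundary conditions $T(a, 1) = a$ and $S(a, 0) = a$, and each T-conorm is the De Morgan dual of its T-norm, $S(a, b) = 1 - T(1 - a, 1 - b)$.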
3.4.2. Fuzzy rule evaluation measures in data mining
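There are two measures, called confidence and support, that are used in the field of data mining to evaluate rules. Assume that a fuzzy rule is defined as $A_q \Rightarrow \text{Class } h$, where $A_q = (A_{q1}, \ldots, A_{qn})$. In [34, 35, 36, 37], fuzzy versions of the two rule evaluation measures were explained as below:
Let us assume that $m$ labeled patterns $x_p = (x_{p1}, \ldots, x_{pn})$, $p = 1, 2, \ldots, m$,
are given from $M$ classes.
In the literature [38, 39, 40], the compatibility grade of each training pattern $x_p$ with the antecedent $A_q$ is defined by the product operation as $\mu_{A_q}(x_p) = \mu_{A_{q1}}(x_{p1}) \cdot \ldots \cdot \mu_{A_{qn}}(x_{pn})$, where $\mu_{A_{qi}}(\cdot)$ is the membership function of the antecedent fuzzy set $A_{qi}$. The confidence of the rule is
$$c(A_q \Rightarrow \text{Class } h) = \frac{\sum_{x_p \in \text{Class } h} \mu_{A_q}(x_p)}{\sum_{p=1}^{m} \mu_{A_q}(x_p)},$$
and its support is
$$s(A_q \Rightarrow \text{Class } h) = \frac{\sum_{x_p \in \text{Class } h} \mu_{A_q}(x_p)}{m}.$$
The support measures the coverage of the training patterns by $A_q \Rightarrow \text{Class } h$.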
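The two measures can be computed directly from compatibility grades. The sketch below assumes the usual fuzzy definitions: confidence is the share of the total compatibility with the antecedent that comes from patterns of the target class, and support divides the same numerator by the number of training patterns. The helper names, the `low` membership function, and the sample values are hypothetical.

```python
def compatibility(pattern, antecedent):
    """Product of the memberships of each attribute value in its fuzzy set.

    antecedent is a list of membership functions, one per attribute.
    """
    grade = 1.0
    for x, mu in zip(pattern, antecedent):
        grade *= mu(x)
    return grade


def confidence_and_support(patterns, labels, antecedent, target):
    """Fuzzy confidence and support of the rule 'antecedent => target'."""
    grades = [compatibility(p, antecedent) for p in patterns]
    covered = sum(grades)
    in_class = sum(g for g, y in zip(grades, labels) if y == target)
    confidence = in_class / covered if covered > 0 else 0.0
    support = in_class / len(patterns)
    return confidence, support


def low(x):
    """Hypothetical linguistic term 'low' on a [0, 1] normalised attribute."""
    return max(0.0, 1.0 - x)


patterns = [(0.1,), (0.2,), (0.9,)]
labels = ["North Aegean", "North Aegean", "South Aegean"]
conf, supp = confidence_and_support(patterns, labels, [low], "North Aegean")
print(round(conf, 4), round(supp, 4))
```

A rule with high confidence but low support is reliable on the few patterns it covers, which is why the two measures are usually read together.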
3.4.3. Heuristic methods for rule weight specification
When determining the consequent class, there are many ways to assign weights to the rules [38, 39, 40]. In general, the consequent of a fuzzy rule is set to the class that has the maximum confidence for the rule's antecedent.
The confidence can be used as the rule weight of the fuzzy rule.
When a set of antecedent fuzzy sets is given for each attribute, the antecedent part of each fuzzy rule is defined by a combination of antecedent fuzzy sets for the n attributes. For fuzzy rules with multiple consequent classes, the confidence is used directly as the weight for each class.
The steps are given below combined with FID3 reasoning based on T-operators:
where is the matching degree of the example with
where , , is the association degree of the pattern , to the class
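Because the individual steps did not survive in this text, the following Python sketch shows only one plausible reading of a weighted reasoning procedure built on T-operators. It assumes that the matching degree of a sample with a rule is the T-norm of its membership degrees in the antecedent terms, that the association degree with a class is the T-norm of the matching degree and the rule weight for that class, that association degrees are aggregated over rules with the T-conorm, and that the class with the largest aggregated value is chosen. All names, rule weights, and membership values are hypothetical.

```python
from functools import reduce


def classify(rule_memberships, rule_weights, t_norm, t_conorm):
    """Return (predicted class, per-class scores) for one sample.

    rule_memberships[k]: the sample's membership degrees in the antecedent
        terms of rule k (one value per tested attribute).
    rule_weights[k]: dict mapping each class to the weight of rule k for it.
    """
    scores = {}
    for mus, weights in zip(rule_memberships, rule_weights):
        matching = reduce(t_norm, mus)            # step 1: matching degree
        for cls, weight in weights.items():
            assoc = t_norm(matching, weight)      # step 2: association degree
            # step 3: aggregate over rules (0 is neutral for a T-conorm)
            scores[cls] = t_conorm(scores.get(cls, 0.0), assoc)
    return max(scores, key=scores.get), scores    # step 4: best class


# Two hypothetical rules read from the leaves of a fuzzy decision tree.
rule_memberships = [[0.8, 0.6], [0.3, 0.7]]
rule_weights = [
    {"North Aegean": 0.9, "South Aegean": 0.1},
    {"North Aegean": 0.2, "South Aegean": 0.8},
]

zadeh = (min, max)
product_sum = (lambda a, b: a * b, lambda a, b: a + b - a * b)
for name, (t_norm, t_conorm) in {"Zadeh": zadeh, "Product-sum": product_sum}.items():
    label, scores = classify(rule_memberships, rule_weights, t_norm, t_conorm)
    print(name, label, scores)
```

An unweighted variant can be obtained under the same assumptions by giving each rule a weight of 1 for its consequent class and 0 for the others.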
4. Experimental study on fuzzy rule-based approach to geographic classification of virgin olive oil using T-operators
In this section, the fuzzy rule-based approach to the geographic classification of virgin olive oil is summarized, and the solution is given step by step. Then, we describe the experimental study. First, the olive oil samples and the methodology used in their chemical analyses are explained in detail. Second, we explain the performance measure and statistical tests. Fuzzy reasoning methods with nonparametric operators are examined, and the behavior of the fuzzy ID3 weighted fuzzy reasoning method based on different T-operators is observed. Then, the weighted and unweighted fuzzy reasoning methods based on different T-operators are compared.
4.1. Olive oil samples
Olives were collected from selected trees of the cultivars that are the subject of this work: Ayvalik, Memecik, Kilis Yaglik, and Nizip Yaglik. The samples were collected in the 2002–2003, 2004–2005, and 2005–2006 harvest seasons. A total of 101 olive oil samples were used for the experimental study. These samples were collected from different regions [North Aegean (33), South Aegean (53), Mediterranean (4), and South East (11)]. Detailed information about the chemical analysis of the samples was given in previous studies [27, 47, 48]. PCA was applied in SPSS 20.0, partition coefficients and fuzzy
4.2. Performance measure and statistical tests
In a former study, principal component analysis was performed on this data set in order to explore the data structure. The results showed that the chemical analyses clearly explain the geographic origin of the virgin olive oils. However, one region (Mediterranean) has fewer samples than the other regions, so it is not explained. The analysis was carried out in IBM SPSS 20. The chemical measurements involve fuzziness, so we prefer to use the fuzzy ID3 algorithm, which is based on fuzzy logic, for the classification in our study. The classical ID3 algorithm works with categorical variables. An advantage of the fuzzy ID3 algorithm is that it handles numerical variables via fuzzy variables: each numeric variable is converted to a fuzzy variable. Fuzzy
The performance results of the nonparametric approaches given in Table 2 show that three nonparametric operators achieve the same accuracy as the C4.5 algorithm. However, the accuracy obtained with Zadeh T-operators is lower, at 82.18%.
| Algorithms | Accuracy rate (%) |
|---|---|
| FuzzyID3_reasoning with Weighted Product-Sum_Umano | 86.14 |
| FuzzyID3_reasoning with Weighted T-Operators | 82.18 |
| FuzzyID3_reasoning with Weighted Product-Sum | 86.14 |
| FuzzyID3_reasoning with Weighted Nonparametric Hamacher () | 86.14 |
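The accuracy rates are obtained with a leave-one-out procedure, which can be sketched generically in Python. The `fit_predict` callback and the majority-class rule below are hypothetical placeholders used only to make the example runnable; they are not the classifiers evaluated in this study. The region counts are the ones reported for the 101 samples, and the resulting baseline figure is an illustration rather than a result of the study.

```python
from collections import Counter


def leave_one_out_accuracy(samples, labels, fit_predict):
    """Hold out each sample once, train on the rest, and count correct predictions."""
    hits = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if fit_predict(train_x, train_y, samples[i]) == labels[i]:
            hits += 1
    return 100.0 * hits / len(samples)


def majority_class(train_x, train_y, _sample):
    """Placeholder classifier: always predict the most frequent training class."""
    return Counter(train_y).most_common(1)[0][0]


labels = (["North Aegean"] * 33 + ["South Aegean"] * 53
          + ["Mediterranean"] * 4 + ["South East"] * 11)
samples = [None] * len(labels)  # features are irrelevant for the placeholder
baseline = leave_one_out_accuracy(samples, labels, majority_class)
print(round(baseline, 2))
```

With 101 samples, each correctly classified sample contributes just under one percentage point, so an accuracy rate of 86.14% is consistent with 87 correct predictions and 82.18% with 83.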
| Algorithm | Rank | Friedman aligned ranks | Value |
|---|---|---|---|
| Nonparametric Hamacher () | 6.88 | Test statistic | 76.396 |
| Hamacher | 6.80 | Degrees of freedom | 8 |
| Dubois | 3.22 | Asymptotic sig. (2-sided test) | 0.000 |
The results of the pairwise comparisons for weighted fuzzy ID3 reasoning based on different T-operators with 20 different thresholds (range = 0.71–0.90), using adjusted significance values, are given in Table 4.
|Weber||Zadeh||Yager||Hamacher||Nonparametric Hamacher ()||Product sum||Umano||Dubois|
|Non parametric Hamacher ()||1.000||0.000||0.004||1.000|
|Zadeh||Umano||Product-sum||Nonparametric Hamacher ()||Yager (p = 2)||Hamacher (p = 0.25)||Dombi (1)||Dubois (0.25)||Weber (15)|
On the other hand, the accuracy rates obtained for different thresholds with the weighted fuzzy reasoning method based on different T-operators are given in Table 6. The Umano, Product-Sum, nonparametric Hamacher (), and Hamacher T-operators reached the maximum accuracy rate of 88.12%. While unweighted fuzzy reasoning based on Dombi T-operators () reached a maximum accuracy rate of 88.11%, weighted fuzzy reasoning based on Dombi T-operators () reached 87.13%.
|Zadeh||Umano||Product-sum||Nonparametric Hamacher ()||Yager (p = 2)||Hamacher (.25)||Dombi (1)||Dubois (0.25)||Weber (15)|
The performances of weighted and unweighted fuzzy reasoning are compared for each T-operator with the Wilcoxon signed-rank test. The performances of unweighted and weighted fuzzy reasoning are significantly different for the Zadeh (p < 0.001), Yager (p < 0.001), Dombi (p < 0.001), Dubois (p < 0.05), and Weber (p < 0.001) T-operators.
When the performances of the T-operators are averaged over the 20 different thresholds (range = 0.71–0.90), Hamacher () has the maximum value, 85.59%, for the unweighted fuzzy reasoning approach, and Weber () has the maximum value, 81.74%, for the weighted fuzzy reasoning approach.
5. Conclusion
Geographical classification of olive oil is an important topic. It has long been crucial for human health, and it is central to the traceability of olive oil with a designation of origin. In a previous study, we investigated a geographic classification system for olive oil. In this paper, chemical measurements were used for the experimental study. Chemical measurements contain imprecise information. In order to deal with this imprecision, the fuzzy ID3 classifier was selected for the classification of olive oil samples. In addition, a fuzzy ID3 reasoning method based on T-operators has been suggested, and we carried out experiments to measure the performance of the proposed fuzzy reasoning method on the geographic classification problem. In this paper, we propose a weighted fuzzy reasoning approach based on T-operators. Three nonparametric operators [Product-Sum_Umano, Product-Sum, and Nonparametric Hamacher ()] achieve the same accuracy as the C4.5 algorithm, whereas the accuracy obtained with Zadeh T-operators is lower, at 82.18%. We then checked the performance of the parametric operators. A statistical procedure was performed in order to detect statistical differences among the results for 20 threshold () values. There are significant differences between the results of the unweighted and weighted fuzzy reasoning approaches. The weighted fuzzy reasoning approach based on the Umano, Product-Sum, Nonparametric Hamacher (), and Hamacher T-operators reached the maximum accuracy rate of 88.12%. We therefore claim that better reasoning performance can be obtained by using different parameters and weights for each rule.
The authors would like to thank Erden Kantarcı for his valuable support and Mrs. Ummuhan Tibet and Dr. Aytac Gumuskesen for allowing us to use the data set.