Open access peer-reviewed chapter

Revealing Interesting If-Then Rules

Written By

Abraham Meidan

Submitted: 01 March 2023 Reviewed: 17 March 2023 Published: 05 December 2023

DOI: 10.5772/intechopen.111376

From the Edited Volume

Research Advances in Data Mining Techniques and Applications

Edited by Yves Rybarczyk


Abstract

One of the challenges of data mining is revealing the interesting if-then rules in the mined dataset. We present two methods for the automated discovery of unexpected, and, thereby, interesting rules within the set of all the if-then rules previously revealed in the dataset. The first method calculates, for each rule, the probability that the rule exists accidentally. The lower this probability, the more unexpected the rule is. The second method calculates the conditional probability of the event described by the rule, given the relevant more basic rules. Once again, the lower this conditional probability, the more unexpected the rule is. These two methods are independent and can be combined.

Keywords

  • data mining
  • if-then rules
  • interesting rules
  • unexpected rules
  • probability

1. Introduction

Several data mining algorithms reveal if-then rules in the mined data [1]. However, in many cases, the size of the analyzed data is enormous [2], and as a result, the number of discovered rules is so large that the user cannot practically review all of them manually. Consequently, the user needs a method for accessing the most interesting rules among the discovered ones.

What makes a rule interesting? Several papers in the data mining literature have addressed this issue, and two different interpretations of the concept of “interesting rules” can be found there. Some papers, such as [3], use “interesting rules” in the sense of “rules that are best or optimal.” Others use “interesting rules” to mean rules that satisfy the user’s curiosity. In this paper, we refer to this second meaning.

One can filter out the non-interesting rules using the support and confidence levels [2, 4, 5]. The support level of an if-then rule denotes the number of records where the rule holds (both the rule’s conditions and conclusion are true) relative to the total number of records in the dataset. The user may set a threshold on the support level, implying that rules whose support falls below this threshold are not interesting. The confidence level designates the frequency of cases where both the rule’s conditions and conclusion hold relative to the cases where the rule’s conditions hold, with or without the conclusion. (In other words, the confidence level denotes the rule’s probability: for example, if the value in field A is a, and the value in field B is b, then there is an 80% probability that the value in field C is c.) Obviously, a rule is interesting only if its confidence level is significantly higher than the a priori frequency of the conclusion in the dataset. Once again, the user may set a threshold, implying that rules whose confidence falls below it are not interesting. However, this method alone cannot guarantee that all the interesting rules are revealed: when the support and/or confidence thresholds are too high, interesting rules may be missed, and when they are too low, non-interesting rules may be included.
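To make these two measures concrete, here is a minimal sketch that computes both for a single rule; the record layout and the helper name are illustrative, not part of any particular tool:

```python
def support_and_confidence(records, conditions, conclusion):
    """Support and confidence of 'if conditions then conclusion'.
    conditions: list of (field, value) pairs; conclusion: (field, value)."""
    n_total = len(records)
    cond_hits = [r for r in records if all(r[f] == v for f, v in conditions)]
    both_hits = [r for r in cond_hits if r[conclusion[0]] == conclusion[1]]
    support = len(both_hits) / n_total              # fraction of all records
    confidence = len(both_hits) / len(cond_hits) if cond_hits else 0.0
    return support, confidence

# "If A is a and B is b, then C is c" over four records:
records = [
    {"A": "a", "B": "b", "C": "c"},
    {"A": "a", "B": "b", "C": "c"},
    {"A": "a", "B": "b", "C": "d"},
    {"A": "x", "B": "b", "C": "c"},
]
print(support_and_confidence(records, [("A", "a"), ("B", "b")], ("C", "c")))
# (0.5, 0.666...): the rule holds in 2 of 4 records; its probability is 2/3
```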

Following this line, it was suggested that other means be used on top of the support and confidence levels to reveal the interesting rules. One suggestion was to measure how unexpected each rule is. The idea that unexpected phenomena are interesting dates back to the ancient world. In the twentieth century, it was discussed in the philosophy of science by Agassi [6]. In the field of data mining, it was presented by Silberschatz and Tuzhilin [7, 8]. According to this view, users classify a rule as interesting when it is inconsistent with their expectations; in other words, rules are interesting when they are unexpected.

How can the level of a rule’s unexpectedness be measured? Liu and Hsu [9] and Liu, Hsu, and Chen [10] suggest measuring the extent of a rule’s unexpectedness by comparing the rule with a predefined set of expectations. At a preliminary stage, the users enter their expectations, and based on this set, the program calculates an evaluation of unexpectedness for each rule. Such an approach can be used when the user can formulate all the expected rules, but doing so is not easy. Moreover, since some of the discovered rules might result from noise, and the user obviously does not expect these rules, they will be presented as unexpected and, thereby, interesting. Sahar [11] suggests a process where the system selects a few rules, the user marks the rules that are not interesting, and the program then automatically eliminates the other associated rules. Sahar reports that “by little over five iterations, this process often reduces the list of rules to half its original size.” However, in many cases, such a reduction is not sufficient: when the list contains 10,000 rules, reducing it to 5000 might not satisfy the user, who might instead prefer sorting the rules by their level of interestingness and reviewing just the 10 or 100 most interesting ones. Wesley Romão, Alex A. Freitas, and Itana M. de S. Gimenes [12] suggest using a genetic algorithm to reveal surprising if-then rules based on user-defined general impressions (subjective knowledge). A similar approach was proposed by Jyoti Vashishtha, Dharminder Kumar, Saroj Ratnoo, and Kapila Kundu [13].

In this paper, we suggest two additional methods for revealing the unexpected and, thereby, interesting rules. Contrary to the previous methods, ours reveal the unexpected rules automatically, that is, unassisted by the user. Moreover, they sort the rules by their level of unexpectedness and, thereby, interestingness.

The first method for measuring the degree of unexpectedness rests on the assumption that when users have no preliminary knowledge about the relations among the fields and their values in the data, their “natural” expectation is that the relations are accidental. By calculating the probability that a given rule is accidental, we can therefore measure the extent to which the rule is unexpected. The logic underlying this idea is similar to that underlying the calculation of the significance level in classical statistical tests, such as the t-test or F-test. These tests refer to the concept of α, which denotes the probability that the phenomenon is accidental; the lower this probability, the more significant the phenomenon is. Following the same line of thought, we calculate the probability that a given rule is accidental: each rule refutes the expectation of accidental relations, and the level of the refutation can be measured by this probability. The lower it is, the more unexpected and, thereby, the more interesting the rule is.

The second method for measuring the degree of unexpectedness rests on the assumption that the set of one-condition rules can be considered the user’s basic expectations. Given the one-condition rules, the conditional probability of the rules having more than one condition can then be calculated; the lower this conditional probability, the more unexpected the rule is. For example, suppose that the following three rules were discovered in the data: (1) if A then R, (2) if B then R, and (3) if A and B then NOT R. Rule (3) can be considered unexpected relative to rules (1) and (2): assuming that the user already knows rules (1) and (2), being one-condition rules, rule (3) turns out to be unexpected and, thereby, interesting. This approach can be elaborated by considering not just the one-condition rules as the basic expectations. Rather, the conditional probability of any rule having more than one condition can be calculated given any rules whose conditions are a subset of the conditions of the rule under discussion. Thus, a three-condition rule can be compared against the relevant one-condition and two-condition rules.

A similar approach was suggested by Suzuki [14], who analyzed rule pairs such as “If A then B” and “If A and C then not B.” However, his approach does not calculate the conditional probability of the unexpected rule given the one-condition rules, and it is limited to pairs of if-then rules whose conclusions are inconsistent. Our approach covers such rule pairs as a special case.

In what follows, we present the algorithms for calculating the level of unexpectedness of the rules according to these two methods.


2. If-then rules

There are several algorithms for revealing if-then rules in the data [2, 3]. For the sake of simplicity and without loss of generality, we will limit the discussion to association rules that are related to one Boolean field in the “then” part (the dependent variable).

Consider a dataset containing $n+1$ fields, among which one field is selected as the dependent variable, $y$. The remaining $n$ fields $x_1, \ldots, x_n$ are considered input data used for explaining the selected field $y$. Consider the most frequent case, when field $y$ is Boolean. Without loss of generality, we assume $y \in \{0, 1\}$.

Let $I_i = \{a_{i1}, \ldots, a_{im_i}\}$ be the set of codes of values of variable (field) $x_i$. For example, if field $x_i$ is quantitative, $a_{ij}$ denotes the code of the interval of change of values of variable $x_i$. A single condition (1-condition) is a condition of the following type: $x_i = a_{ij}$, $j \in \{1, \ldots, m_i\}$. A composite condition is a conjunction of $q$ single conditions, $q = 2, \ldots, n$. We will call a composite condition consisting of $q$ single conditions a $q$-condition. Thus, a $q$-condition is a condition of the type $x_{i_1} = a_{i_1 j_1} \wedge \ldots \wedge x_{i_q} = a_{i_q j_q}$.

An if-then rule is the following statement:

$$\text{If the } q\text{-condition, then } y \text{ is } y_1. \tag{1}$$

Rule’s probability (confidence level) $= p$.

Rule’s support level $= s$,

where $y_1$ belongs to the set of values of field $y$; that is, $y_1 = 1$ or $y_1 = 0$.

The number of records, $s$, at which both the rule’s condition (the $q$-condition) and the rule’s conclusion ($y$ is $y_1$) are fulfilled is called the rule’s support level. (Contrary to other definitions of “support level,” we refer to the number of records satisfying the rule rather than their percentage of the total number of records in the dataset.) The rule’s probability, $p$, is the ratio of the rule’s support to the number of records satisfying the rule’s condition.

Let $p_a$ be the a priori probability that $y = 1$: $p_a = M/N$, where $N$ is the total number of records in the mined dataset, and $M$ is the number of records at which $y = 1$. Rules of type (1) can be interesting only if their probability significantly deviates from $p_a$. Formally, only the rules of type (1) in which $p \geq \bar{p}_1$ or $p \leq \bar{p}_0$ can be interesting, where $\bar{p}_1 > p_a$ and $\bar{p}_0 < p_a$ are predetermined values of the probability that $y$ is $y_1$. Here we suppose that all the rules of type (1) are represented such that they have $y$ is 1 in the “then” part. (Note that a rule having probability $p \leq \bar{p}_0$ that $y$ is 1 is equivalent to the rule containing the same condition in the “if” part and having probability $1 - p \geq 1 - \bar{p}_0$ that $y$ is 0.) Moreover, the support level of an interesting rule must be not less than the pre-given value $s_{min}$, that is, $s \geq s_{min}$.
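The filtering step just described can be sketched as follows, under this chapter’s conventions (support is a record count); the `Rule` container and its field names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Rule:
    q_condition: tuple  # e.g. (("x1", 3), ("x2", 5))
    s: int              # support: records where condition and conclusion hold
    p: float            # probability that y is 1 given the condition

def candidate_rules(rules, p_bar_1, p_bar_0, s_min):
    # A rule of type (1) is a candidate only if its probability deviates
    # enough from p_a in either direction and its support count reaches s_min.
    return [r for r in rules
            if (r.p >= p_bar_1 or r.p <= p_bar_0) and r.s >= s_min]
```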

Assume that all the rules satisfying the abovementioned requirements have already been found. Then a new problem arises: the number of such rules can be so huge that it is quite impractical for the user to review all of them in order to select the interesting ones. The user can decrease the number of revealed rules by raising the minimum support level $s_{min}$ and/or the minimum probability (confidence level) $\bar{p}_1$, or by decreasing the value $\bar{p}_0$, but this can lead to missing rules that have a low support level, or a probability not strongly deviating from $p_a$, and are nevertheless interesting.


3. Calculating the probability that a rule is accidental

This section will present the first method for measuring the degree of unexpectedness. We will present an algorithm for calculating the probability that a given if-then rule is accidental. As mentioned, the lower this probability, the more unexpected and interesting the rule is.

Let $J$ be the set of all the discovered association rules. Consider a rule $j \in J$. Let us evaluate the significance level of rule $j$, namely the probability that rule $j$ exists in the investigated dataset not accidentally.

The significance level of rule $j$ may be determined as $1 - \alpha_j$, where $\alpha_j$ is the probability that rule $j$ exists in the investigated dataset by chance. To define $\alpha_j$ precisely, assume that rule $j$ has probability $p_j \geq \bar{p}_1$ (that $y$ is 1) and support level $s_j$. It is intuitively clear that there is a certain a priori probability that this rule, or any such rule whose probability is not less than $p_j$, will be revealed; $\alpha_j$ is this a priori probability. Formally, assume that $m_j$ records have been chosen at random from a dataset containing $N$ records, among which there are $p_a N$ records where $y$ is 1. Here $m_j = s_j / p_j$ is the number of records satisfying the condition of rule $j$. $\alpha_j$ is the probability that there are not less than $s_j$ records where $y$ is 1 among these $m_j$ records. $\alpha_j$ is calculated by the formula of the hypergeometric distribution, that is:

$$\alpha_j = \sum_{k=s_j}^{m_j} P_{N,M}^{m_j}(k), \tag{2}$$

where

$$P_{N,M}^{m_j}(k) = \frac{\binom{M}{k}\binom{N-M}{m_j-k}}{\binom{N}{m_j}} \tag{3}$$

For a rule $j$ with probability $p_j \leq \bar{p}_0$ that $y$ is 1 (i.e., for a rule having probability $1 - p_j \geq 1 - \bar{p}_0$ that $y$ is 0), $\alpha_j$ is the a priori probability that this rule, or any such rule whose probability that $y$ is 1 is not greater than $p_j$, will be revealed. In this case, $\alpha_j$ is calculated as follows:

$$\alpha_j = \sum_{k=0}^{s_j} P_{N,M}^{m_j}(k), \tag{4}$$

where $P_{N,M}^{m_j}(k)$ is calculated by formula (3).

To calculate $P_{N,M}^{m_j}(k)$ by formula (3), $\ln P_{N,M}^{m_j}(k)$ is first calculated using the approximation formula $\ln n! \approx (n + 0.5)\ln n - n + 0.5\ln 2\pi$. Then $e^{\ln P_{N,M}^{m_j}(k)}$ is calculated.
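A sketch of this computation follows, applying the same Stirling approximation in log space so that the factorials do not overflow; `alpha_high` corresponds to formula (2) and `alpha_low` to formula (4), and the function names are illustrative:

```python
import math

def ln_factorial(n):
    # Stirling approximation from the text: ln n! ≈ (n + 0.5) ln n − n + 0.5 ln 2π
    return 0.0 if n < 2 else (n + 0.5) * math.log(n) - n + 0.5 * math.log(2 * math.pi)

def ln_hypergeom(N, M, m, k):
    # ln of formula (3): C(M, k) · C(N−M, m−k) / C(N, m)
    def ln_choose(a, b):
        return ln_factorial(a) - ln_factorial(b) - ln_factorial(a - b)
    return ln_choose(M, k) + ln_choose(N - M, m - k) - ln_choose(N, m)

def alpha_high(N, M, s, p):
    # Formula (2): probability of at least s records with y = 1
    # among m = s / p randomly chosen records.
    m = round(s / p)
    return sum(math.exp(ln_hypergeom(N, M, m, k))
               for k in range(s, min(m, M) + 1))

def alpha_low(N, M, s, p):
    # Formula (4): probability of at most s records with y = 1 among m records.
    m = round(s / p)
    return sum(math.exp(ln_hypergeom(N, M, m, k))
               for k in range(max(0, m - (N - M)), s + 1))
```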

The smaller $\alpha_j$, the greater the significance level of rule $j$. The significance level of rule $j$ may be interpreted as a measure of the unexpectedness of this rule and, thereby, as a designation of the extent to which the rule is interesting.

Thus, we can calculate the significance level for each rule $j \in J$ and sort the array of all discovered rules in descending order of significance level. This order ranks the rules by their level of unexpectedness according to the first criterion of unexpectedness.

Now, the first $k$ rules of the sorted array can be considered unexpected and interesting. The number $k$ may be determined by the condition that the significance level of a rule is not less than a pre-given value. Another way of determining the boundary value for the significance level is as follows. The average value $\bar{a}$ of the significance levels over all rules $j \in J$ and the corresponding standard deviation $\sigma$ are calculated. The value $\bar{a} + \sigma$ can then be accepted as the boundary value for the significance level of an interesting rule.
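A short sketch of this ranking and thresholding step, assuming the α values have already been computed and paired with their rules:

```python
import statistics

def most_significant(rules_with_alpha):
    # rules_with_alpha: iterable of (rule, alpha_j) pairs;
    # significance = 1 − alpha_j.
    scored = sorted(((r, 1.0 - a) for r, a in rules_with_alpha),
                    key=lambda t: t[1], reverse=True)
    levels = [lvl for _, lvl in scored]
    boundary = statistics.mean(levels) + statistics.pstdev(levels)  # ā + σ
    return [r for r, lvl in scored if lvl >= boundary]
```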


4. Calculating the conditional probability of a rule given the basic trends

The previous section referred to the first method for measuring the degree of unexpectedness. We turn now to the second method. This method refers to rules having more than one condition. It calculates the conditional probability of each of these rules given the basic trends in the data.

Consider first an example. Assume that the a priori probability that $y = 1$ is 0.3, and we wish to find all the if-then rules having a probability (1) not less than $\bar{p}_1 = 0.4$, or (2) not greater than $\bar{p}_0 = 0.2$. Assume also that the following two rules were discovered (among others):

  1. If $x_1$ is 3, then there is a probability of 0.45 that $y$ is 1.

  2. If $x_1$ is 3 and $x_2$ is 5, then there is a probability of 0.85 that $y$ is 0 (i.e., a probability of 0.15 that $y$ is 1).

This pair of rules may be considered unexpected because (1) the first rule states that, under the condition $x_1$ is 3, the probability that $y$ is 1 deviates significantly upward from $p_a$, and (2) when the condition $x_2$ is 5 is added to $x_1$ is 3, the probability that $y$ is 1 unexpectedly deviates from $p_a$ in the opposite direction.

If the probability that $y$ is 1 in the second rule were not 0.15 but, for example, 0.9, this fact would also be considered unexpected, since adding the second condition to the first one leads to a sudden leap in probability. In this case, even if the first rule had not been revealed, the second rule would still be considered unexpected. One can easily see why by considering the reason the first rule was not discovered. The first rule’s support cannot be less than the second rule’s support $s \geq s_{min}$. Consequently, the only possible reason for not discovering the first rule is that neither the condition $p \geq \bar{p}_1$ nor the condition $p \leq \bar{p}_0$ is fulfilled, where $p$ is the first rule’s probability. Hence, $\bar{p}_0 < p < \bar{p}_1$. Thus, analogously to the previous case, the addition of the condition $x_2$ is 5 to $x_1$ is 3 leads to an even greater leap in the probability that $y$ is 1.

Let us now give a formal definition for an unexpected rule (following the second method).

Definition: A rule containing a $q$-condition ($q > 1$) in the “if” part, $y$ is $y_1$ in the “then” part, and having probability $P$ that $y$ is $y_1$ is called unexpected if at least one of the following two requirements is fulfilled:

  1. There is a rule containing $y$ is not $y_1$ in the “then” part and a $q_1$-condition ($q_1 < q$) in the “if” part, such that the set of 1-conditions entering the $q_1$-condition is a subset of the set of 1-conditions entering the $q$-condition.

  2. There is no rule containing $y$ is $y_1$ in the “then” part and a $q_1$-condition ($q_1 < q$) in the “if” part (where the set of 1-conditions entering the $q_1$-condition is a subset of the set of 1-conditions entering the $q$-condition) for which the inequality $p \ll P$ is not fulfilled, where $p$ denotes the probability of such a rule. (If $y_1$ is 1, $p$ is the probability that $y$ is 1; if $y_1$ is 0, $p$ is the probability that $y$ is 0.)

The second requirement means that the probability that $y$ is $y_1$ for every discovered rule with the above-defined $q_1$-condition in the “if” part and $y$ is $y_1$ in the “then” part must be much less than $P$.

To reveal the unexpected rules according to the above definition, it is first necessary to define the relation “$\ll$” precisely. Consider the following possible definitions of the relation $p \ll P$.

Assume that $p$ is relatively small (for definiteness, $p < 0.2$). To determine the lower boundary value for $P$, let us add a third of the length of the segment $[p, 1]$ to $p$. We get the inequality $P \geq (2p + 1)/3$. Let $0.2 \leq p < 0.5$. In this case, to determine the lower boundary value for $P$, let us add a half of the length of the segment $[p, 1]$ to $p$. We get $P \geq (p + 1)/2$. Let $0.5 \leq p < 0.7$. To determine the lower boundary value for $P$, let us add two thirds of the length of the segment $[p, 1]$ to $p$. We get $P \geq (p + 2)/3$. For the case when $0.7 \leq p < 0.9$, let us define the relation $p \ll P$ as $P = 1$. If $p \geq 0.9$, we do not define the relation $p \ll P$; that is, in this case, the fulfillment of the first of the two abovementioned requirements is the necessary condition for accepting the rule having probability $P$ as unexpected.
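This piecewise definition translates directly into code; a minimal sketch:

```python
def much_less(p, P):
    """The relation p << P as defined above."""
    if p < 0.2:
        return P >= (2 * p + 1) / 3   # p plus a third of the segment [p, 1]
    if p < 0.5:
        return P >= (p + 1) / 2       # p plus half of [p, 1]
    if p < 0.7:
        return P >= (p + 2) / 3       # p plus two thirds of [p, 1]
    if p < 0.9:
        return P == 1.0               # only the certain rule qualifies
    return False                      # relation undefined for p >= 0.9
```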

Another definition uses the notion of an expected probability for a rule containing a $q$-condition in the “if” part ($q > 1$) and suspected of being unexpected. Consider the set of 1-conditions entering the $q$-condition. Let $p_i$ be the probability that $y$ is 1 under the $i$th 1-condition, $i = 1, \ldots, q$. (Note that if, for a fixed $i$, $p_i \geq \bar{p}_1$ or $p_i \leq \bar{p}_0$, then we have the corresponding rule with this 1-condition.) Let $s_i$ be the number of records satisfying both the $i$th 1-condition and the condition $y$ is 1. Then the number of records satisfying the $i$th 1-condition is $m_i = s_i / p_i$. The probability that the $i$th 1-condition is fulfilled on the set of records of the investigated dataset is $m_i / N$. Assume that all the events, each of which is defined by the set of records satisfying the $i$th 1-condition, are independent. Then the expected number $k_{ind}$ of records satisfying all of these conditions can be calculated as follows:

$$k_{ind} = N \prod_{i=1}^{q} \frac{m_i}{N} \tag{5}$$

or

$$k_{ind} = \frac{\prod_{i=1}^{q} m_i}{N^{q-1}} \tag{6}$$

Assume that all the events, defined as “both the $i$th 1-condition and $y$ is 1 are fulfilled,” are independent. Then the expected number of records satisfying all these events is

$$k_{ind}^{1} = M \prod_{i=1}^{q} \frac{s_i}{M} = \frac{\prod_{i=1}^{q} s_i}{M^{q-1}} \tag{7}$$

Assume that all the events, defined as “both the $i$th 1-condition and $y$ is 0 are fulfilled,” are independent. Then the expected number of records satisfying all these events is

$$k_{ind}^{0} = (N-M) \prod_{i=1}^{q} \frac{m_i - s_i}{N-M} = \frac{\prod_{i=1}^{q} (m_i - s_i)}{(N-M)^{q-1}} \tag{8}$$

Let us now calculate the expected probability $P_{ind}$ that $y$ is 1 under the abovementioned assumptions:

$$P_{ind} = \frac{k_{ind}^{1}}{k_{ind}^{1} + k_{ind}^{0}} \tag{9}$$

After the transformations, we get:

$$P_{ind} = \frac{1}{1+A}, \tag{10}$$

where

$$A = \left(\frac{p_a}{1-p_a}\right)^{q-1} \prod_{i=1}^{q} \frac{1-p_i}{p_i} \tag{11}$$

Assume now that all the events, each of which is defined by the set of records satisfying the $i$th 1-condition, are dependent to the maximal degree. In this case, the number $k_{dep}$ of records satisfying all the 1-conditions is equal to the minimum of the numbers of records satisfying each individual 1-condition, that is:

$$k_{dep} = \min_{i=1,\ldots,q} m_i \tag{12}$$

Let $i_0 = \arg\min_{i} m_i$. The probability $P_{dep}$ that $y$ is 1 for the rule containing all the 1-conditions is then equal to $p_{i_0}$, i.e.,

$$P_{dep} = p_{i_0} \tag{13}$$

Thus, we have considered two extreme cases, where the 1-conditions are (1) independent and (2) dependent as much as possible. The extent of the dependency between the 1-conditions can be determined by the number of records satisfying all the 1-conditions, that is, the number of records satisfying the $q$-condition of the rule suspected of being unexpected. Let us denote this number by $K$. Obviously, $K \leq k_{dep}$. The nearer $K$ is to $k_{dep}$, the greater the extent of dependency between the 1-conditions. If $K$ is near $k_{ind}$, it can be assumed that the 1-conditions are independent or “almost” independent. When $K < k_{ind}$, we can say that the events, each of which is defined by the set of records satisfying the $i$th 1-condition, are more inconsistent than independent. In this case, however, there is a very low chance of establishing the abovementioned rule with the $q$-condition, because the necessary condition $K \geq s_{min}$ will likely not be fulfilled.

Consider now the function $P_{exp} = P_{exp}(K)$, where $P_{exp}$ is the expected probability that $y$ is 1 for the above rule with the $q$-condition. For the sake of simplicity, assume that this function is linear for $k_{ind} \leq K \leq k_{dep}$, with $P_{exp} = P_{ind}$ if $K = k_{ind}$, and $P_{exp} = P_{dep}$ if $K = k_{dep}$. For the extremely rare cases when $K < k_{ind}$, let us set $P_{exp} = P_{ind}$. Then, $P_{exp}$ is calculated as follows:

$$P_{exp} = \frac{P_{dep} - P_{ind}}{k_{dep} - k_{ind}}\,(K - k_{ind}) + P_{ind}, \quad \text{if } k_{ind} \leq K \leq k_{dep} \tag{14}$$
$$P_{exp} = P_{ind}, \quad \text{if } K < k_{ind}, \tag{15}$$

where $k_{ind}$, $P_{ind}$, $k_{dep}$, $P_{dep}$ are calculated by formulas (6)–(13).
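Putting formulas (6) through (15) together, a sketch of the computation follows; the argument names are illustrative, and the degenerate case $k_{dep} = k_{ind}$, which the text does not treat, is handled explicitly:

```python
import math

def expected_probability(N, p_a, p, m, K):
    """Formulas (6)-(15): expected probability that y is 1 for a rule whose
    q-condition conjoins q 1-conditions, where p[i] is the probability that
    y is 1 under the i-th 1-condition and m[i] is the number of records
    satisfying it; K records satisfy the whole q-condition."""
    q = len(p)
    k_ind = math.prod(m) / N ** (q - 1)                       # formula (6)
    A = ((p_a / (1 - p_a)) ** (q - 1)
         * math.prod((1 - pi) / pi for pi in p))              # formula (11)
    P_ind = 1 / (1 + A)                                       # formula (10)
    k_dep = min(m)                                            # formula (12)
    P_dep = p[m.index(k_dep)]                                 # formula (13)
    if K < k_ind:                                             # formula (15)
        return P_ind
    if k_dep == k_ind:          # degenerate case: the two extremes coincide
        return P_dep
    # formula (14): linear interpolation between the two extreme cases
    return (P_dep - P_ind) / (k_dep - k_ind) * (K - k_ind) + P_ind
```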

Thus, a rule containing a $q$-condition in the “if” part and having probability $P$ that $y$ is 1 can be defined as unexpected if $P_{exp} \ll P$. The definition of the relation “$\ll$” was presented above.

Assume that we have found an unexpected rule containing a $q$-condition in the “if” part, whose support level is $S$ and probability (confidence level) is $P$. We can determine the probability of the event described by this rule given the events described by each set of records defined by the $i$th 1-condition, where this 1-condition belongs to the set of 1-conditions entering the $q$-condition, $i = 1, \ldots, q$. Note that the corresponding rule with the $i$th 1-condition has perhaps been revealed for all $i$ or only some of them. If such a rule with the $i$th 1-condition exists, let us denote its support level, probability, and number of records satisfying its condition by $s_i$, $p_i$, and $m_i$ respectively. In the case when a rule with the $i$th 1-condition does not exist, we use the same notations.

Let us define the level of unlikelihood $U_i$ of the unexpected rule relative to rule $i$ as the probability of the existence of this unexpected rule provided that rule $i$ holds (more precisely, provided that the $i$th 1-condition is characterized by the three above parameters whose values are $s_i$, $p_i$, $m_i$). The level of unlikelihood $U_i$ can be calculated by the formula of the hypergeometric distribution:

$$U_i = \frac{\binom{s_i}{S}\binom{m_i - s_i}{K - S}}{\binom{m_i}{K}} \tag{16}$$

The maximum (or the minimum) of $U_i$ over $i = 1, \ldots, q$ can be considered the level of unlikelihood of the unexpected rule.

The rules can then be sorted by their level of unlikelihood; the lower this probability, the more unlikely the rule is, and the higher its level of unexpectedness according to the second criterion.
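A sketch of formula (16) and the aggregation across the 1-conditions; Python’s exact integer binomials stand in for the log-space approximation of Section 3, and taking the minimum here is one of the two options the text allows:

```python
from math import comb

def unlikelihood(S, K, s_i, m_i):
    # Formula (16): hypergeometric probability of observing S records with
    # y = 1 among K records drawn from the m_i records satisfying the i-th
    # 1-condition, s_i of which have y = 1.
    return comb(s_i, S) * comb(m_i - s_i, K - S) / comb(m_i, K)

def rule_unlikelihood(S, K, one_condition_stats):
    # one_condition_stats: (s_i, m_i) pairs for the q 1-conditions.
    return min(unlikelihood(S, K, s, m) for s, m in one_condition_stats)
```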


5. Experiments with real data

The abovementioned two algorithms have been implemented in a commercial data mining tool, WizWhy, which can be downloaded from www.wizsoft.com.

In what follows, we present the reports issued by this tool for the Diabetes dataset (www.ics.uci.edu/∼mlearn/MLRepository.html). We ran an analysis with the following thresholds:

Minimum support level: 20 records (The total number of records was 768.)

Minimum confidence level for if-then rules: 0.48 (The a priori probability was 0.349.)

Minimum confidence level for if-then-not rules: 0.79.

Two hundred twenty rules were discovered. The two rules having the lowest error probability (α being smaller than 0.0000001) were:

  1. If Plasma glucose is 108.00 … 197.00 (average = 142.80)

    and Age is 29.00 … 81.00 (average = 42.39)

    Then Predict is 2

    Confidence level: Rule’s probability is 0.593

    Support level: The rule exists in 172 records.

  2. If Plasma glucose is 0.00 … 107.00 (average = 90.65)

    Then Predict is not 2

    Confidence level: Rule’s probability is 0.875

    Support level: The rule exists in 253 records.

These rules are the most interesting rules according to the first method. Thirty-four rules (out of the 220 rules that were discovered) were unexpected according to the second method. The first rule in this list was:

  3. If Number of times pregnant is 1.00 … 2.00 (average = 1.35)

    and Diastolic blood pressure is 64.00 … 88.00 (average = 74.20)

    and Age is 29.00 … 62.00 (average = 36.83)

    Then Predict is 2

    Confidence level: Rule’s probability is 0.550

    Support level: The rule exists in 22 records.

    Significance Level: Error probability < 0.01

This rule was found to be unlikely relative to the following three rules and trends:

  4. If Number of times pregnant is 1.00 … 2.00 (average = 1.43)

    Then Predict is not 2

    Confidence level: Rule’s probability is 0.798

    Support level: The rule exists in 190 records.

  5. If Diastolic blood pressure is 64.00 … 88.00

    Then Predict is 2

    Confidence level: Trend’s probability is 0.369

    Support level: The trend exists in 190 records.

  6. If Age is 29.00 … 81.00

    Then Predict is 2

    Confidence level: Trend’s probability is 0.491

    Support level: The trend exists in 197 records.

Note that rule #(4) says that if the number of times pregnant is 1 or 2, then there is a high probability (0.798) that the value in the Predict field is not 2. However, following rule #(3), if the conditions presented in rules #(5) and #(6) are added to the condition of rule #(4), then there is a high probability that the Predict field is 2. On the basis of the events described in rules #(4), (5), and (6), we should have expected the confidence level of rule #(3) to be 0.332, while in fact it is 0.55. In this sense, rule #(3) is unexpected and, as such, interesting.
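As a sanity check, the 0.332 figure can be re-derived from formulas (10) and (11) alone, since $P_{ind}$ depends only on $p_a$ and the three 1-condition probabilities that Predict is 2 (for rule #(4), that probability is 1 − 0.798 = 0.202); by formula (15), $P_{exp} = P_{ind}$, assuming here that $K$ falls below $k_{ind}$:

```python
p_a = 0.349                      # a priori probability that Predict is 2
p = [1 - 0.798, 0.369, 0.491]    # probabilities from rules #(4), (5), and (6)

A = (p_a / (1 - p_a)) ** (len(p) - 1)     # formula (11)
for p_i in p:
    A *= (1 - p_i) / p_i
print(round(1 / (1 + A), 3))     # 0.332, the expected confidence quoted above
```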

Judging by experiments with the other datasets in the abovementioned repository, these results are quite representative: on average, about 5% of the rules have an error probability lower than 0.01, and about another 5% are unexpected according to the second method.


6. Conclusion

We have presented two methods for the automated discovery of unexpected and, thereby, interesting rules within the set of all the if-then rules previously revealed in the mined dataset.

The first method calculates, for each rule, the probability that the rule exists accidentally. The lower this probability, the more unexpected the rule is. The second method calculates the conditional probability of each rule having more than one condition, given the relevant more basic rules and trends. The lower this conditional probability, the more unexpected the rule is.

These two methods are independent, and it makes sense to always use both. One can calculate a combined score by multiplying the scores of the two methods. And since there are additional methods for revealing interesting rules (some of which were mentioned in the first section of this paper), one can develop a score that combines all of them. The search for additional methods for revealing interesting rules, and for ways of combining them, is a project to be continued.

The author would like to thank Boris Levin and Ilya Vorobyov for their help in writing this paper and in developing the software program presented here.

References

  1. Fayyad U, Piatetsky-Shapiro G, Smyth P. From data mining to knowledge discovery: An overview. In: Advances in Knowledge Discovery and Data Mining. 1996. pp. 1-34
  2. Agrawal R, Imielinski T, Swami A. Mining association rules between sets of items in large databases. Proceedings of the ACM SIGMOD Conference on Management of Data. 1993. pp. 207-216
  3. Bayardo R, Agrawal R. Mining the most interesting rules. Proceedings of KDD-99, the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1999. pp. 145-154
  4. Agrawal R, Mannila H, Srikant R, Toivonen H, Verkamo AI. Fast discovery of association rules. In: Advances in Knowledge Discovery and Data Mining. 1995
  5. Piatetsky-Shapiro G, Matheus CJ. The interestingness of deviations. Proceedings of the AAAI-94 Workshop on Knowledge Discovery in Databases. 1994. pp. 25-36
  6. Agassi J. Science in Flux. Reidel; 1975. p. 28
  7. Silberschatz A, Tuzhilin A. What makes patterns interesting in knowledge discovery systems. IEEE Transactions on Knowledge and Data Engineering. 1996;8(6):970-974
  8. Silberschatz A, Tuzhilin A. On subjective measures of interestingness in knowledge discovery. Proceedings of the Third International Conference on Knowledge Discovery and Data Mining. 1997. pp. 259-262
  9. Liu B, Hsu W. Post-analysis of learned rules. Proceedings of AAAI-96. 1996. pp. 828-834
  10. Liu B, Hsu W, Chen S. Using general impressions to analyze discovered classification rules. Proceedings of the Third International Conference on Knowledge Discovery and Data Mining. 1997. pp. 31-36
  11. Sahar S. Interestingness via what is not interesting. Proceedings of KDD-99, the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1999. pp. 332-336
  12. Romão W, Freitas AA, Gimenes IMS. Discovering interesting knowledge from a science and technology database with a genetic algorithm. Applied Soft Computing. 2004;4(2):121-137
  13. Vashishtha J, Kumar D, Ratnoo S, Kundu K. Mining comprehensible and interesting rules: A genetic algorithm approach. International Journal of Computer Applications. 2011;31(1):39-47
  14. Suzuki E. Autonomous discovery of reliable exception rules. Proceedings of the Third International Conference on Knowledge Discovery and Data Mining. 1997. pp. 259-262
