Inconsistent Decision System: Rough Set Data Mining Strategy to Extract Decision Algorithm of a Numerical Distance Relay – Tutorial

© 2012 Othman and Aris, licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Introduction
Modern numerical protective relays, being intelligent electronic devices (IEDs), are inevitably vulnerable to false tripping or failure to operate for faults in the power system [1]. With regular and rigorous analyses, the performance reliability of digital protective relays can be ascertained, their availability maximized and their misoperation risks consequently minimized [2]. Precise relay operation analysis normally involves assessing the relay characteristics, evaluating the relay performance and identifying the relay-power system interactions, so as to ensure that the protective relays operate in accordance with their predetermined settings [3,4].
In practice, protection engineers resort to computing technologies to automate the analysis process when the volume of event data to be explored, manipulated and reasoned about exceeds what can be managed manually. The voluminous amount of data to be processed has prompted the need for intelligent data mining, an essential constituent of the Knowledge Discovery in Databases (KDD) process [5]. This has motivated the adoption of rough set theory to data mine the protective relay event report so as to discover its decision algorithm.

Problem statement and objective
The following two pertinent problems motivate this paper's study of protective relay operation analysis:

• Inconsistencies in the device's event report. Upon power system fault inception, a protective relay detects and invokes a common combination of tripping conditions in time succession, yet with two distinct tripping decisions (classifications). In the first, the trip signal has not yet been asserted immediately after relay pick-up; in the second, the trip signal is subsequently asserted after a preset time delay set by the protection engineer.
• The non-linear nature of relay operation, which makes it very difficult to select a group of effective attributes that fully represents relay tripping behavior.
In the grueling manual analysis of relay event reports [1,6], the selected attributes hardly provide adequate knowledge to map the interclass boundary in the relay decision system accurately, because of inconsistency. The interclass boundary is therefore usually "rough". Based on the selected attributes, some relay events close to the boundary cannot be classified as either trip or non-trip. The small overlaps between different relay events make protective relay operation analysis a rough classification problem. Thus, rough set theory has been chosen to resolve this conflict [7].

Rough set data mining in dealing with inconsistent numerical distance relay decision system to extract decision algorithm: The fundamental concept
With the rough set approach, relay decision rule extraction is a natural and easily understood byproduct of the data reduction process. Rule extraction is inherent to the machine learning process of rough set theory, and this inherent capability to discover fundamental patterns in relay data has prompted this study. Using an approximation concept, rough set theory is able to remove data redundancies and consequently generate decision rules. In contrast to crisp sets, a rough set has boundary-line cases, i.e. events that cannot be classified with certainty either as members of the set or of its complement. Rough set theory is an alternative intelligent data analysis tool that can be employed to handle vagueness and inconsistencies [8].
An information system (IS), alternatively known as a knowledge representation system (KRS), is a tabulated data set whose rows are labeled by objects (events) of interest, whose columns are labeled by attributes, and whose entries are attribute values [8]. This data layout fits the protective relay event report very well, since the report is characterized by the attributes of the relay's multifunctional elements versus a sequence of time-stamped events [7].
In the protective relay event report, the IS is more appropriately referred to as a relay decision table or decision system (DS). As Huang et al. [9] put it, a decision table is characterized by disjoint sets of condition attributes ($C \subset Q$) and decision (action) attributes ($D \subset Q$), so that $Q = C \cup D$ and $C \cap D = \emptyset$. The DS is a 4-tuple structure formulated as $DS = \langle U, Q, V, f \rangle$, the elements of which are as follows [8,10,11]:
• $U$, the universe, denoted as $U = \{t_1, t_2, t_3, \ldots, t_m\}$, is a finite set of relay events ($t_i$'s).
• $Q = C \cup D$ is a non-empty finite union set of condition and decision attributes. The condition attributes ($c_i \in C$) indicate the various internal multifunctional protective elements and analog measurands, while the decision attribute ($d \in D$) indicates the trip output of the relay.
• $V = \bigcup_{q \in Q} V_q$ is the set of attribute values, where $V_q$ is the domain of attribute $q$.
• $f: U \times Q \to V$ is the information function, with $f(t, q) \in V_q$ for every $t \in U$ and $q \in Q$, such that each attribute acts as a mapping $q: U \to V_q$ for every $q \in Q$.

Relay decision system indiscernibility relation
If $P \subseteq Q = C \cup D$ is a set of attributes and $f(t_x, q) = f(t_y, q)$ for every $q \in P$, where $t_x, t_y \in U$, then $t_x$ and $t_y$ are indiscernible (indistinguishable) by the set of attributes $P$ in DS. Thus, every $P \subseteq Q$ brings forth a binary relation on $U$ called the $P$-indiscernibility relation (an equivalence relation), denoted by $IND(P)$. This means that there will be sets of relay events that are indiscernible based on any selected subset of attributes $P$. $U/IND(P)$ denotes the family of all equivalence classes of the relation $IND(P)$. $IND(P)$ and $U/IND(P)$ can be formulated as

\[ IND(P) = \{ (t_x, t_y) \in U \times U : f(t_x, q) = f(t_y, q) \ \ \forall q \in P \} \]
\[ U/IND(P) = \{ [t]_{IND(P)} : t \in U \} \]

where $U/IND(P)$ is also interchangeably referred to as $P$-basic knowledge or the $P$-elementary sets in DS. The $P$-elementary set including relay event $t$ is denoted as $[t]_{IND(P)}$. The first step in classification with rough sets is the construction of elementary sets [11]. A description of a $P$-elementary set $X \in U/IND(P)$ in terms of values of attributes from $P$ is denoted as $Des_P(X)$, i.e.

\[ Des_P(X) = \{ (q, v) : f(t, q) = v, \ \forall t \in X, \ \forall q \in P \} \]
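As a minimal sketch of how elementary sets can be constructed, the Python fragment below groups events that share identical values on a chosen attribute subset. The five-event table is hypothetical toy data (it is not the chapter's relay decision table), and the names `events` and `elementary_sets` are illustrative assumptions. Events t2 and t3 are deliberately given identical conditions but different trip decisions, mimicking the pick-up-before-trip inconsistency described earlier.

```python
from collections import defaultdict

# Hypothetical toy decision table (NOT the chapter's Table 1).
# Condition attributes: bg, Z1pu, Z1trp; decision attribute: Trip.
events = {
    "t1": {"bg": 0, "Z1pu": 0, "Z1trp": 0, "Trip": "0"},
    "t2": {"bg": 1, "Z1pu": 1, "Z1trp": 0, "Trip": "0"},
    "t3": {"bg": 1, "Z1pu": 1, "Z1trp": 0, "Trip": "b"},
    "t4": {"bg": 1, "Z1pu": 1, "Z1trp": 1, "Trip": "b"},
    "t5": {"bg": 0, "Z1pu": 0, "Z1trp": 0, "Trip": "0"},
}


def elementary_sets(table, attrs):
    """Return U/IND(attrs) as {description tuple: set of event labels}."""
    classes = defaultdict(set)
    for event, row in table.items():
        # The tuple of attribute values plays the role of Des_P(X).
        classes[tuple(row[a] for a in attrs)].add(event)
    return dict(classes)


condition_classes = elementary_sets(events, ["bg", "Z1pu", "Z1trp"])
decision_classes = elementary_sets(events, ["Trip"])
print(condition_classes)
print(decision_classes)
```

Keying each class by its tuple of attribute values keeps the description of the elementary set attached to the set itself, which is convenient when rules are later read off the table.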

Relay decision system set approximation
In the context of protective relay operations, consider $T \subseteq U$ as an arbitrary target set of relay events described (classified) by a particular trip assertion status, which needs to be represented by equivalence classes originating from an attribute subset $P \subseteq Q$. $P$ could be a selected condition attribute set $P \subseteq C$ or all condition attributes $C$ reflecting the relay's multifunctional protective elements, while $T$ could be, for example, the set of relay events indiscernible with respect to the decision attribute $D = \{Trip\}$ having the domain value 'b' for pole-B tripping [7].
The idea of the rough set revolves around the concept of approximation [11]. The target set $T$ can be approximated by a pair of sets, called the lower and upper approximations of $T$, using only the information contained within $P$.
Formally, for a given relay decision system DS, each target subset $T \subseteq U$ is associated, through the equivalence relation $IND(P)$, with two subsets of $U$ as follows.
The $P$-lower approximation of $T$, expressed as
\[ \underline{P}T = \{ t \in U : [t]_{IND(P)} \subseteq T \}, \]
is defined as the union of all elementary sets $[t]_{IND(P)}$ which are contained in $T$. Any relay event $t_i$ of the lower approximation of $T$ with respect to the set of attributes $P$ (i.e. $t_i \in \underline{P}T$) certainly belongs to $T$.
The $P$-upper approximation of $T$, expressed as
\[ \overline{P}T = \{ t \in U : [t]_{IND(P)} \cap T \neq \emptyset \}, \]
is defined as the union of the elementary sets $[t]_{IND(P)}$ which have a non-empty intersection with $T$. Any relay event $t_i$ of the upper approximation of $T$ with respect to the set of attributes $P$ (i.e. $t_i \in \overline{P}T$) may possibly belong to $T$.
The $P$-boundary of the set $T$, expressed as
\[ BN_P(T) = \overline{P}T - \underline{P}T, \]
is the difference between $\overline{P}T$ and $\underline{P}T$. It is the set of events $t_i$ which cannot be classified with certainty as belonging to $T$ using the set of attributes $P$ [12].
The following three regions are derived from the lower and upper approximations, as illustrated in Figure 1 [7,10,13].
• $POS_P(T) = \underline{P}T$, described as the $P$-positive region of $T$, is the set of relay events which can be classified with certainty as belonging to the approximated set $T$.
• $NEG_P(T) = U - \overline{P}T$, described as the $P$-negative region of $T$, is the set of relay events which can be classified without ambiguity as not belonging to $T$ (i.e. as belonging to the complement of $T$).
• $BN_P(T) = \overline{P}T - \underline{P}T$, described as the $P$-boundary region of $T$, is the set of relay events none of which can be classified with certainty into either $T$ or its complement as far as the attributes $P$ are concerned. The set $T$ is crisp if there is no boundary region, i.e. $BN_P(T) = \emptyset$ (empty set); otherwise it is rough.
In what follows, the cardinality $|X|$ of a set $X$ is the number of events contained in the set [11].
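The lower and upper approximations and the three regions above can be sketched in a few lines of Python. The table is the same hypothetical five-event toy example (not the chapter's data), and `approximations`, `P` and `T` are illustrative names introduced only for this sketch.

```python
# Hypothetical toy decision table (NOT the chapter's Table 1).
events = {
    "t1": {"bg": 0, "Z1pu": 0, "Z1trp": 0, "Trip": "0"},
    "t2": {"bg": 1, "Z1pu": 1, "Z1trp": 0, "Trip": "0"},
    "t3": {"bg": 1, "Z1pu": 1, "Z1trp": 0, "Trip": "b"},
    "t4": {"bg": 1, "Z1pu": 1, "Z1trp": 1, "Trip": "b"},
    "t5": {"bg": 0, "Z1pu": 0, "Z1trp": 0, "Trip": "0"},
}


def approximations(table, attrs, target):
    """Return the (lower, upper) approximations of target w.r.t. attrs."""
    groups = {}
    for event, row in table.items():
        groups.setdefault(tuple(row[a] for a in attrs), set()).add(event)
    lower, upper = set(), set()
    for block in groups.values():
        if block <= target:   # elementary set fully contained in T
            lower |= block
        if block & target:    # elementary set overlapping T
            upper |= block
    return lower, upper


P = ["bg", "Z1pu", "Z1trp"]
# Target set: events whose decision attribute Trip has the value 'b'.
T = {e for e, row in events.items() if row["Trip"] == "b"}

lower, upper = approximations(events, P, T)
positive = lower                 # POS_P(T)
negative = set(events) - upper   # NEG_P(T)
boundary = upper - lower         # BN_P(T)
print(positive, negative, boundary)
```

In this toy case the boundary is non-empty because t2 and t3 share the same condition values but differ in Trip, so the target set is rough with respect to P.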

Approximation accuracy and quality
The accuracy of the approximation of a target set $T$ by the attributes $P$ is expressed as
\[ \alpha_P(T) = \frac{|\underline{P}T|}{|\overline{P}T|}, \qquad 0 \le \alpha_P(T) \le 1. \]
Clearly, equal upper and lower approximations, i.e. an empty boundary region and $\alpha_P(T) = 1$, mean that the target set $T$ is definable in $U$, since it is perfectly approximated. Zero accuracy means that the lower approximation is empty, regardless of the size of the upper approximation.
In general, the set $T$ can be defined in $U$ according to one of the following four concepts of definability [14,15]:
• Roughly definable $T$ in $U$, given $\underline{P}T \neq \emptyset$ and $\overline{P}T \neq U$
• Externally undefinable $T$ in $U$, given $\underline{P}T \neq \emptyset$ and $\overline{P}T = U$
• Internally undefinable $T$ in $U$, given $\underline{P}T = \emptyset$ and $\overline{P}T \neq U$
• Totally undefinable $T$ in $U$, given $\underline{P}T = \emptyset$ and $\overline{P}T = U$
The quality of approximation of a target set $T$ is expressed as
\[ \gamma_P(T) = \frac{|\underline{P}T|}{|U|}, \]
i.e. the ratio of $P$-correctly approximated events to all events in the system.
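A short worked sketch of these measures follows. The sets `universe`, `lower` and `upper` are hypothetical values chosen for illustration (they are not results from the chapter), and the function names are assumptions. Treating a set with equal lower and upper approximations as "definable (crisp)" before testing the four rough categories is a choice made in this sketch.

```python
def accuracy(lower, upper):
    """alpha_P(T) = |lower| / |upper|; assumes a non-empty upper approximation."""
    return len(lower) / len(upper)


def quality(lower, universe):
    """gamma_P(T) = |lower| / |U|."""
    return len(lower) / len(universe)


def definability(lower, upper, universe):
    """Classify T by the emptiness of its lower and fullness of its upper approximation."""
    if lower == upper:
        return "definable (crisp)"
    if lower and upper != universe:
        return "roughly definable"
    if lower and upper == universe:
        return "externally undefinable"
    if not lower and upper != universe:
        return "internally undefinable"
    return "totally undefinable"


# Hypothetical approximations of a target set in a five-event universe.
universe = {"t1", "t2", "t3", "t4", "t5"}
lower = {"t4"}
upper = {"t2", "t3", "t4"}

alpha = accuracy(lower, upper)
gamma = quality(lower, universe)
kind = definability(lower, upper, universe)
print(alpha, gamma, kind)
```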

The concept of reduct and core in reduction of protective relay attributes
Dependencies between attributes are of primary importance in protective relay data analysis using the rough set approach. The set of attributes $R \subseteq Q$ depends on the set of attributes $P \subseteq Q$ in IS if and only if $IND(P) \subseteq IND(R)$. This dependency is denoted as $P \Rightarrow R$.
Attribute reduction is performed such that the reduced set of attributes provides the same approximation quality as the original set of attributes. If a particular set of attributes is dependent, it is of interest to find the reducts (all possible minimal subsets of attributes that lead to the same number of elementary sets as the whole set of attributes) and the core (the set of all indispensable attributes) [11]. By adopting the fundamental concepts of core and reduct, rough set theory minimizes the subset of attributes in the relay database while still fully characterizing the inherent knowledge of relay operation behavior.
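A reduct is essentially a sufficient set of features of a DS, which discerns (differentiates) all events discernible by the original DS. A reduct is a subset of attributes $RED \subseteq P$ (where $P \subseteq Q$) such that:
• The reduced attribute set $RED$ induces the same equivalence classes as those induced by the full attribute set $P$. This is denoted as
\[ U/IND(RED) = U/IND(P). \]
• The attribute set $RED$ is minimal, i.e.
\[ U/IND(RED \setminus \{q\}) \neq U/IND(P) \quad \forall q \in RED. \]
This means that no attribute can be dispensed from the set $RED$ without modifying the equivalence classes $[t]_{IND(P)}$ [16].
The core is defined as the set of attributes found in common in all reducts. The core is a subset of attributes $CORE \subseteq RED$ (where $RED \subseteq P$ and $P \subseteq Q$) such that:
\[ CORE(P) = \bigcap RED(P), \]
where $RED(P)$ denotes the family of all reducts of $P$.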
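The two reduct conditions and the core can be illustrated with a brute-force search. This is only a sketch under stated assumptions: the table is hypothetical toy data, the names `partition`, `reducts` and `core` are illustrative, and exhaustive enumeration of attribute subsets is practical only for small attribute sets (it is not claimed to be the procedure used in the chapter).

```python
from itertools import combinations

# Hypothetical toy decision table (NOT the chapter's Table 1).
events = {
    "t1": {"bg": 0, "Z1pu": 0, "Z1trp": 0},
    "t2": {"bg": 1, "Z1pu": 1, "Z1trp": 0},
    "t3": {"bg": 1, "Z1pu": 1, "Z1trp": 0},
    "t4": {"bg": 1, "Z1pu": 1, "Z1trp": 1},
    "t5": {"bg": 0, "Z1pu": 0, "Z1trp": 0},
}


def partition(table, attrs):
    """Return U/IND(attrs) as a set of frozensets of event labels."""
    groups = {}
    for event, row in table.items():
        groups.setdefault(tuple(row[a] for a in attrs), set()).add(event)
    return {frozenset(g) for g in groups.values()}


def reducts(table, attrs):
    """Enumerate minimal subsets that induce the same partition as attrs."""
    full = partition(table, attrs)
    found = []
    for size in range(1, len(attrs) + 1):
        for subset in combinations(attrs, size):
            if any(set(r) <= set(subset) for r in found):
                continue  # contains a smaller reduct, hence not minimal
            if partition(table, subset) == full:
                found.append(subset)
    return found


def core(reduct_list):
    """Attributes common to all reducts (empty set if there are none)."""
    if not reduct_list:
        return set()
    return set.intersection(*(set(r) for r in reduct_list))


P = ["bg", "Z1pu", "Z1trp"]
all_reducts = reducts(events, P)
print(all_reducts, core(all_reducts))
```

Because subsets are visited in order of increasing size, any subset that contains an already accepted reduct can be skipped, which enforces the minimality condition.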

Decision rules interpreted from protective relay event report
Relay DS analysis is considered a supervised learning (classification) problem [13]. Each row of the DS determines a logical implication, called a decision rule, in which the conditions specified by the condition attributes determine which decisions (trip assertions) take effect [18]. In this study the logical implication is designated a relay decision rule. A complete set of relay decision rules can be derived from the relay decision table DS. The events in DS, i.e. $U = \{t_1, t_2, t_3, \ldots, t_m\}$, serve as labels of the relay decision rules.
Formally, a relay decision rule takes the form $Des_C(X_i) \Rightarrow Des_D(Y_j)$, where $X_i \in U/IND(C)$ ($i = 1, \ldots, k$) is a condition class and $Y_j \in U/IND(D)$ ($j = 1, \ldots, n$) is a decision class with $X_i \cap Y_j \neq \emptyset$ (recall that $Des_P(X)$ is the description of a $P$-elementary set $X$ in terms of values of attributes from $P$).
The relay CD-decision rules are logical statements read as 'if C … then D'. These rules relate descriptions of the condition attributes $C \subset Q$ (for internal multifunctional protective elements, voltages, currents and impedance measurements) to classes of the decision attribute $D \subset Q$ (i.e. the type of trip assertion).
The set of decision rules for each decision class $Y_j$ ($j = 1, \ldots, n$) is denoted by:
\[ \{r_{ij}\} = \{ Des_C(X_i) \Rightarrow Des_D(Y_j) : X_i \cap Y_j \neq \emptyset, \ i = 1, \ldots, k \}. \]
The decision algorithm in DS means the set of decision rules for all decision classes, i.e. the CD-algorithm [10,18]. In the context of protective relay operation characteristics, a decision algorithm is a collection of relay CD-decision rules and is thus referred to as the relay CD-decision algorithm in this study.
Rules having the same conditions but different decisions are inconsistent (nondeterministic, conflicting); otherwise they are consistent (certain, deterministic, nonconflicting) [17]. When some conditions are satisfied, a deterministic DS uniquely describes the decisions (actions) to be made. In a non-deterministic DS, decisions are not uniquely determined by the conditions [9]. Formally, the relay decision rule labeled by event $t_x$ is consistent if, for every other event $t_y \in U$, $f(t_x, c) = f(t_y, c)$ for all $c \in C$ implies $f(t_x, d) = f(t_y, d)$ for all $d \in D$; otherwise it is inconsistent.
The degree of consistency (or degree of dependency) between the sets of attributes $C$ and $D$ of a relay CD-decision algorithm is denoted as $C \Rightarrow_k D$ and can be formally defined as
\[ k = \gamma(C, D) = \frac{|POS_C(D)|}{|U|}, \qquad POS_C(D) = \bigcup_{T \in U/IND(D)} \underline{C}T \]
(i.e. conceptually similar to the quality of approximation or classification) [10]. In other words, $D$ depends on $C$ in a degree of dependency $k$ ($0 \le k \le 1$). All the values of attributes from $D$ depend totally on (i.e. are uniquely determined by) the values of attributes from $C$ if $k = 1$, i.e. $C \Rightarrow_1 D$ or simply $C \Rightarrow D$. $D$ depends partially, in a degree $k$, on $C$ if $k < 1$ [17].
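The degree of dependency can be sketched as follows: an event lies in the positive region exactly when every event sharing its condition values also shares its decision. The table is hypothetical toy data and `dependency_degree` is an illustrative name, not a function from any rough set library.

```python
# Hypothetical toy decision table (NOT the chapter's Table 1).
events = {
    "t1": {"bg": 0, "Z1pu": 0, "Z1trp": 0, "Trip": "0"},
    "t2": {"bg": 1, "Z1pu": 1, "Z1trp": 0, "Trip": "0"},
    "t3": {"bg": 1, "Z1pu": 1, "Z1trp": 0, "Trip": "b"},
    "t4": {"bg": 1, "Z1pu": 1, "Z1trp": 1, "Trip": "b"},
    "t5": {"bg": 0, "Z1pu": 0, "Z1trp": 0, "Trip": "0"},
}


def dependency_degree(table, cond, dec):
    """Return (k, positive region) for the dependency cond => dec."""
    decisions = {}
    for row in table.values():
        key = tuple(row[a] for a in cond)
        decisions.setdefault(key, set()).add(tuple(row[a] for a in dec))
    # An event is consistently classified if its condition class
    # maps to exactly one decision description.
    positive = [
        event for event, row in table.items()
        if len(decisions[tuple(row[a] for a in cond)]) == 1
    ]
    return len(positive) / len(table), positive


k, positive = dependency_degree(events, ["bg", "Z1pu", "Z1trp"], ["Trip"])
print(k, positive)
```

Here k is below 1 because the rules labeled by t2 and t3 conflict, so the toy algorithm is only partially consistent.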
It may happen that the set $D$ depends not on the entire set $C$ but on a subset $C'$ of it, called a relative reduct. $C'$ is a relative reduct, called a D-reduct of $C$, if $C' \subseteq C$ is a minimal subset of $C$ for which $\gamma(C, D) = \gamma(C', D)$ holds (i.e. the dependency is unchanged). $RED_D(C)$ denotes the family of all D-reducts of $C$ [18]. Put simply, the minimal subsets of condition attributes that discern all decision equivalence classes of the relation $U/IND(D)$ discernible by the entire set of attributes are called D-reducts [11]. Accordingly, $C' \in RED_D(C)$ denotes that $C'$ is a D-reduct of $C$, and $CORE_D(C)$ denotes the D-core of $C$. The following property also holds for the DS as previously defined:
\[ CORE_D(C) = \bigcap RED_D(C). \]
These definitions reduce to the previous ones if $D = C$ [18].
Relative reducts can be computed using a slightly modified discernibility matrix called the D-discernibility matrix of $C$. An element of the D-discernibility matrix of $C$ is the set of all condition attributes which discern events $t_i$ and $t_j$ that do not belong to the same equivalence class of the relation $U/IND(D)$. The set of all single-attribute elements of the D-discernibility matrix of $C$ is the D-core of $C$ [10,11]. Rather than the ordinary reduct of $C$, the D-reduct of $C$ is the essence of this paper's study, which aims to derive the relay CD-decision rules (i.e. $C \Rightarrow D$).
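A minimal sketch of the D-discernibility matrix and the D-core is given below, again on hypothetical toy data with illustrative function names. Pairs of events with the same decision are skipped; a pair with different decisions but identical conditions yields an empty entry, which flags an inconsistency.

```python
from itertools import combinations

# Hypothetical toy decision table (NOT the chapter's Table 1).
events = {
    "t1": {"bg": 0, "Z1pu": 0, "Z1trp": 0, "Trip": "0"},
    "t2": {"bg": 1, "Z1pu": 1, "Z1trp": 0, "Trip": "0"},
    "t3": {"bg": 1, "Z1pu": 1, "Z1trp": 0, "Trip": "b"},
    "t4": {"bg": 1, "Z1pu": 1, "Z1trp": 1, "Trip": "b"},
    "t5": {"bg": 0, "Z1pu": 0, "Z1trp": 0, "Trip": "0"},
}


def d_discernibility(table, cond, dec):
    """Map each event pair with different decisions to its discerning attributes."""
    matrix = {}
    for ti, tj in combinations(table, 2):
        ri, rj = table[ti], table[tj]
        if all(ri[d] == rj[d] for d in dec):
            continue  # same decision class: nothing to discern
        matrix[(ti, tj)] = frozenset(c for c in cond if ri[c] != rj[c])
    return matrix


def d_core(matrix):
    """Collect the attributes that occur as single-attribute entries."""
    return {next(iter(entry)) for entry in matrix.values() if len(entry) == 1}


matrix = d_discernibility(events, ["bg", "Z1pu", "Z1trp"], ["Trip"])
for pair, entry in matrix.items():
    print(pair, sorted(entry))
print(d_core(matrix))
```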

Discovering decision algorithm of numerical distance protective relay
To illustrate the indiscernibility relation and rule discovery from a distance protective relay decision system DS, the following tutorial is presented.

Protective relay decision table
[Table 1. Protective relay decision table DS. Table body not recoverable; surviving column labels: sec, code, zone, logic, pole; decision attribute: Trip.]

Protective relay decision table analysis
From the relay decision table, the elementary sets with respect to the decision attribute $D = \{Trip\}$ are $D_1$ and $D_2$ (Table 2), where $D_2 = \{t_8, t_9, t_{10}, t_{11}, t_{12}, t_{13}, t_{14}, t_{15}, t_{16}, t_{17}, t_{18}, t_{19}\}$ corresponds to Trip = b. With classification accuracies of 0.33 and 0.53, the respective elementary sets $D_1$ and $D_2$ are roughly definable (vaguely classified) in the DS. This is to be expected. After a particular relay trip trigger, the decision attribute $D = \{Trip\}$ may remain at a given domain value over a sequence of relay events, according to the signal assertion duration preset by the protection engineer [7]. This may persist even though the condition attributes change during this duration, which explains the inconsistency found in the CD-algorithm.
The accuracy and quality of the overall classification $D$ are both below unity, i.e. the overall classification with respect to $C$ is rough.
After normalization, the final Boolean expression $f_C(D)$ is in Disjunctive Normal Form (DNF). DNF is analogous to the Sum of Products (SOP) form of Boolean algebra in digital logic. $f_C(D)$ in DNF is an alternative representation of the DS in which all the constituents are the D-reducts of $C$ (i.e. $RED_D(C)$) [11,17]. The $RED_D(C)$ obtained from the final $f_C(D)$ shows that any one of the D-reducts of $C$ can be used to represent exactly the same equivalence relation $U/IND(D)$ of the DS, and hence the same data classification, as that represented by the whole set of attributes $C$.
The D-core of $C$ can be determined either by identifying all the single-attribute entries in the D-discernibility matrix of $C$ [11] (Table 4) or by collecting the attributes common to all D-reducts of $C$. Either way, $CORE_D(C) = \{Z3pu\}$. Hence, Z3pu is the most characteristic attribute: it cannot be removed from the DS without reducing the approximation quality of the equivalence relation $U/IND(C)$ with respect to $D$.
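The normalization step can be sketched as an expansion of the discernibility function from a product of sums into a sum of products, simplified with the absorption law. The clauses below are hypothetical non-empty matrix entries from a toy example (not the entries of Table 4), and `cnf_to_reducts` is an illustrative name.

```python
def cnf_to_reducts(clauses):
    """Expand an AND of OR-clauses into its minimal DNF terms.

    Each clause is a set of attribute names (one non-empty entry of the
    D-discernibility matrix). Each returned frozenset is one product
    term of the DNF, i.e. one candidate D-reduct.
    """
    terms = {frozenset()}
    for clause in clauses:
        # Distribute: AND every existing term with every literal of the clause.
        expanded = {term | {attr} for term in terms for attr in clause}
        # Absorption law: drop any term that strictly contains another term.
        terms = {t for t in expanded if not any(o < t for o in expanded)}
    return terms


# Hypothetical clauses: (bg OR Z1pu) AND (bg OR Z1pu OR Z1trp) AND (Z1trp)
clauses = [{"bg", "Z1pu"}, {"bg", "Z1pu", "Z1trp"}, {"Z1trp"}]
dnf_terms = cnf_to_reducts(clauses)
print(sorted(sorted(term) for term in dnf_terms))
```

Applying absorption after every clause keeps the intermediate term set small; the attribute common to all resulting terms (here Z1trp) coincides with the single-attribute clause, in line with the two ways of finding the D-core described above.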
$CORE_D(C) = \{Z3pu\}$ does not appear to carry any significance for the behavior of the relay under analysis. Had the reduct analysis been based only on the condition attributes $C$ (as per the equivalence relation in Table 3, where the decision attribute $D$ is excluded, as in the case of an IS instead of a DS), the core of $C$ (i.e. the core of the equivalence relation $U/IND(C)$ with respect to $C$) would have been one implying that the protective relay had been subjected to a B-G fault. In reality this fault occurred within the zone 1 operation characteristic and was picked up by the zone 1 distance element. However, the D-core of $C$ identifies the condition attribute Z3pu as indispensable when the decision attribute $D$ is considered in the DS analysis. In fact, this attribute is entirely insignificant given the manner in which the distance relay functions. This follows from the concurrent nature of the distance relay quadrilateral operation characteristic, whereby the zone 1 element is encapsulated in the zone 2 element, which is in turn encapsulated in the zone 3 element. The zone 4 element is a separate entity, not encapsulated in any other zone element [7]. Thus, considering only the assertion of the zone 1 element in the case of a zone 1 fault, and correspondingly disregarding the operation of zones 2, 3 and 4, is correct in principle. Figure 2 illustrates that a fault occurring in zone 1 is concurrently shown as present in zones 2 and 3.
To simplify the analysis and make it more meaningful, an attribute priority of the distance relay operation has to be formulated so that the relay DS can be modified, as shown in Table 5.

Table 5 (column headers and the Case 1 row not recoverable):
Case      Markers     Priority attribute
Case 1´   + + + *     Z2pu
Case 2    + + *       Z2pu
Case 2´   + + *       Z3pu
Case 3    + *         Z3pu
Case 4    + *         Z4pu
+ denotes an attribute value equal to "1", i.e. $V_{c_i} = 1$ where attribute $c_i \in C$. * denotes the attribute's value of "1" occurring at possibly different events (rows).

The absence of a relay trip assertion signal in the attributes Z2trp, Z3trp and Z4trp, represented by the attribute value "0", further justifies disregarding the attributes Z2pu, Z3pu and Z4pu for a fault in zone 1. This is because, for example, the assertion of attribute Z1pu (value of "1") must always be accompanied by the assertion (after and for a preset time duration, i.e. a sequence of consecutive events) of the corresponding attribute Z1trp in order to be taken into consideration in the analysis. However, in the above example, it is highly likely that attribute Z2trp will assert (after and for a preset number of events) in lieu of attribute Z1trp, as shown in Table 5, if the relay fails to assert Z1trp when Z1pu is asserted.
Taking this proposition into account, the DS in Table 1 is modified prior to reanalysis using rough sets, as shown in Table 6. From Table 6, the elementary sets with respect to the decision attribute $D = \{Trip\}$ are still the same as shown in Table 2.
However, the elementary sets with respect to the reduced condition attribute set $C = \{ag, bg, cg, Z1pu, Z4pu, Z1trp, Z2trp, Z3trp, Z4trp\}$, as shown in Table 7, are slightly different from those found when the whole attribute set $C$ is considered (Table 3).
Table 7 (recovered fragment): {t1, t2, t3, t16, t17, t18, t19, t20, t21}
The new D-discernibility matrix of $C$ in Table 8 results in new D-reducts and a new D-core of $C$ when events belonging to different equivalence classes of the relation $U/IND(D)$ are discerned with respect to the modified condition attributes $C$. As before, only events appearing in different classes in D-space are discerned.
Table 8. D-discernibility matrix of modified C
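The effect of modifying the condition attribute set can be sketched as a projection of the decision table onto the retained attributes. The retained list follows the reduced attribute set quoted above; the full attribute list `ALL` (the retained attributes plus Z2pu and Z3pu) is an assumption, and the three events are hypothetical, not rows of Table 1 or Table 6.

```python
# Assumed full condition attribute list: the retained set plus Z2pu and Z3pu.
ALL = ["ag", "bg", "cg", "Z1pu", "Z2pu", "Z3pu", "Z4pu",
       "Z1trp", "Z2trp", "Z3trp", "Z4trp"]
RETAINED = ["ag", "bg", "cg", "Z1pu", "Z4pu",
            "Z1trp", "Z2trp", "Z3trp", "Z4trp"]


def make_row(asserted, trip):
    """Build a row with value 1 for asserted attributes and 0 otherwise."""
    row = {a: int(a in asserted) for a in ALL}
    row["Trip"] = trip
    return row


def project(table, attrs, decision="Trip"):
    """Keep only the given condition attributes plus the decision attribute."""
    return {e: {a: r[a] for a in attrs + [decision]} for e, r in table.items()}


def count_classes(table, attrs):
    """Number of elementary sets induced by attrs."""
    return len({tuple(r[a] for a in attrs) for r in table.values()})


# Hypothetical events during a B-G fault (NOT rows of the chapter's tables).
events = {
    "t1": make_row({"bg", "Z3pu"}, "0"),
    "t2": make_row({"bg", "Z2pu", "Z3pu"}, "0"),
    "t3": make_row({"bg", "Z1pu", "Z2pu", "Z3pu"}, "0"),
}

modified = project(events, RETAINED)
before = count_classes(events, ALL)
after = count_classes(modified, RETAINED)
print(before, after)
```

In this toy case t1 and t2 differ only in the disregarded zone 2 pick-up, so they fall into the same elementary set once the table is projected, which is the kind of change in elementary sets the modification is expected to produce.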

Protective relay decision algorithm discovery
As mentioned earlier, a relay decision algorithm in DS, called the CD-decision algorithm, manifests as a CD-decision table. It comprises a finite set of relay CD-decision rules or instructions. The event report of a protective distance relay in the form of a DS is a manifestation of the relay decision algorithm. Protection engineers refer to the relay decision algorithm as the relay operation logic. It is envisaged that, with rough set theory, knowledge of the relay operation logic can be discovered. It can later be transformed into the knowledge base of a decision support system for determining the anticipated relay behavior from a new test DS [7].
Checking whether all the relay operation logics (decision rules) are true makes it possible to check whether a relay decision algorithm is consistent. As mentioned earlier, consistency is measured by the degree of dependency $k$ (or alternatively, dependency is measured by the degree of consistency) [10]. Consistency is thus assessed with the degree of consistency given in Equation (10). The modified DS in Table 6 produces a rather simplified version of the relay CD-decision algorithm (Table 9). When the said condition attributes are removed one by one, the changes incurred in the positive region of the relay CD-decision algorithm concur with the indispensability of the core attributes. Thus, the core comprising both attributes {bg, Z1pu} is correct.

Protective relay decision algorithm minimization
It is subsequently desirable to further minimize the decision rules in the relay CD-decision algorithm after the above simplification via reduction of the set of condition attributes. This is achieved by removal of any possibly superfluous decision rules which essentially involves reducing the superfluous values of attributes. In other words, the unnecessary conditions have to be separately removed leaving only the core attribute in each decision rule of the algorithm [10].
The tabulated version of the above simplified relay CD-decision algorithm is shown in Table 10. In Table 11 the condition attributes of each decision rule in Table 10 are removed one by one.
After each removal the resultant rule is cross-checked against the other rules to find whether they are in conflict (inconsistent), i.e. whether the values of the remaining condition attributes are the same while the implied value of the decision attribute is different. This process discovers the core attribute(s) whose elimination causes the corresponding decision rule, or in general the CD-decision algorithm, to become inconsistent and consequently invalid (albeit not necessarily from the relay analysis perspective).
In summary, Table 12 contains the cores of each decision rule. A condition attribute whose value has been eliminated can be said to have no effect whatsoever on the CD-decision algorithm and may be termed "don't care". It may or may not be assigned a value. Combining duplicate rules and demarcating separate decision classes, Table 13 is obtained.
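The one-by-one elimination and cross-check described above can be sketched as follows. The three rules are hypothetical (they are not the rules of Table 10), and `rule_core` is an illustrative name. A condition attribute belongs to the core of a rule if dropping it makes the rule match another rule on all remaining conditions while the decisions differ.

```python
def rule_core(rules, cond, dec, index):
    """Return the condition attributes that cannot be dropped from rules[index]."""
    rule = rules[index]
    core = set()
    for removed in cond:
        kept = [c for c in cond if c != removed]
        # Conflict: same remaining condition values but a different decision.
        conflict = any(
            all(other[c] == rule[c] for c in kept) and other[dec] != rule[dec]
            for other in rules
        )
        if conflict:
            core.add(removed)
    return core


# Hypothetical consistent decision rules (NOT the chapter's Table 10).
rules = [
    {"bg": 0, "Z1pu": 0, "Trip": "0"},
    {"bg": 1, "Z1pu": 0, "Trip": "0"},
    {"bg": 1, "Z1pu": 1, "Trip": "b"},
]
cond = ["bg", "Z1pu"]

cores = [rule_core(rules, cond, "Trip", i) for i in range(len(rules))]
print(cores)
```

Attributes outside a rule's core are the "don't care" conditions of that rule. In this sketch each attribute is tested on its own, so when several attributes are individually dispensable they cannot necessarily all be dropped together without a further consistency check.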
For decision attribute Trip = 0, one minimal set of decision rules is obtained:
\[ bg_0 \, Z1pu_0 \Rightarrow Trip_0 \]
The final form of the CD-decision algorithm can now be easily interpreted. For instance, the rule above reads: if bg = 0 and Z1pu = 0, then Trip = 0.
Table 11. Eliminating unnecessary condition attribute in decision rules

Conclusion
Rough set theory has been shown to be a useful mathematical tool for intelligent data mining analysis of inconsistent and vague protective relay data patterns, as is evident in the rough classification involved in the assertion of the trip decision attribute. The adoption of rough set theory is managed under supervised learning.
A single D-reduct of $C$ (i.e. $RED_D(C) = \{bg, Z1pu\}$) has been discovered after formulating the attribute priority of the distance relay operation to trim the DS. $RED_D(C)$ can alternatively be used to represent exactly the same equivalence relation $U/IND(D)$ as that represented by the whole set of attributes $C$. By relying on the reduced number of condition attributes represented by $RED_D(C)$, relay analysis can be achieved with ease.