Open access peer-reviewed chapter

# Converting Graphic Relationships into Conditional Probabilities in Bayesian Network

Written By

Loc Nguyen

Submitted: December 6th, 2016 Reviewed: June 8th, 2017 Published: November 2nd, 2017

DOI: 10.5772/intechopen.70057

From the Edited Volume

## Bayesian Inference

Edited by Javier Prieto Tejedor


## Abstract

Bayesian network (BN) is a powerful mathematical tool for prediction and diagnosis applications. A large Bayesian network can be composed of many simple networks, which in turn are constructed from simple graphs. A simple graph consists of one child node and many parent nodes. The strength of each relationship between a child node and a parent node is quantified by a weight, and all relationships share the same semantics, such as prerequisite, diagnostic, and aggregation. The research focuses on converting graphic relationships into conditional probabilities in order to construct a simple Bayesian network from a graph. The diagnostic relationship is the main research object, for which a sufficient diagnostic proposition is proposed to validate the relationship. Relationship conversion is tied to logic gates such as AND, OR, and XOR, which are essential features of the research.

### Keywords

• diagnostic relationship
• Bayesian network
• transformation coefficient

## 1. Introduction

A Bayesian network (BN) is a directed acyclic graph (DAG) consisting of a set of nodes and a set of arcs. Each node is a random variable. Each arc represents a relationship between two nodes. The strength of a relationship in a graph can be quantified by a number called a weight. There are some important relationships such as prerequisite, diagnostic, and aggregation. The difference between a BN and a normal graph is that the strength of every relationship in a BN is represented by a conditional probability table (CPT) whose entries are conditional probabilities of a child node given its parent nodes. There are two main approaches to constructing a BN:

• The first approach aims to learn the BN from training data by machine learning algorithms.

• The second approach is that experts define some graph patterns according to specific relationships, and then the BN is constructed based on such patterns along with the determined CPTs.

This research focuses on the second approach, in which relationships are converted into CPTs. Essentially, relationship conversion aims to determine conditional probabilities based on the weights and meanings of relationships. We will have different ways to convert graphic weights into CPTs for different relationships. It is impossible to convert all relationships, but some of them, such as diagnostic, aggregation, and prerequisite, are mandatory ones that we must specify as computable CPTs of the BN. Especially, these relationships are tied to logic X-gates [1] such as AND-gate, OR-gate, and SIGMA-gate. The X-gate inference in this research is derived and inspired from the noisy OR-gate described in the book “Learning Bayesian Networks” by Neapolitan ([2], pp. 157–159). Díez and Druzdzel [3] also researched OR/MAX, AND/MIN, and noisy XOR inferences, but they focused on canonical models, deterministic models, and ICI models, whereas I focused on logic gates and graphic relationships. So, their research is different from mine, but we share the same result, the AND-gate model. In general, my research focuses on applied probability tied to Bayesian networks, logic gates, and Bayesian user modeling [4]. The scientific results are shared with Millán and Pérez-de-la-Cruz [4].

A factor graph [5] represents the factorization of a global function into many partial functions. If the joint distribution of a BN is considered as the global function and the CPTs are considered as partial functions, the sum-product algorithm [6] of factor graphs can be applied to calculate posterior probabilities of variables in the BN. Pearl’s propagation algorithm [7] is very successful in BN inference. The application of factor graphs to BNs is only feasible if all CPTs of the BN are already determined, whereas this research focuses on defining such CPTs first. I did not use factor graphs for constructing a BN. The concept “X-gate inference” only implies how to convert a simple graph into a BN. However, the arrangement sum with a fixed variable mentioned in this research is the “not-sum” ([6], p. 499) of factor graphs. Essentially, the X-gate probability shown in Eq. (10) is the same as the λ message in Pearl’s algorithm ([6], p. 518), but I use the most basic way to prove the X-gate probability.

By default, the research is applied in a learning context in which the BN is used to assess students’ knowledge. Evidence variables are tests, exams, exercises, etc., and hypotheses are learning concepts, knowledge items, etc. Note that the diagnostic relationship is very important to Bayesian evaluation in the learning context because it is used to evaluate a student’s mastery of concepts (knowledge items) over the entire BN. Now, we start relationship conversion with a study of the diagnostic relationship in the next section.

## 2. Diagnostic relationship

In my opinion, shared by others, the diagnostic relationship should point from hypothesis to evidence. For example, a disease is a hypothesis and a symptom is evidence. The symptom must be conditionally dependent on the disease. Given a symptom, calculating the posterior probability of the disease is essentially to diagnose the likelihood of such disease ([8], p. 1666). Conversely, the arc from evidence to hypothesis implies prediction, where evidence and hypothesis represent observation and event, respectively. Given an observation, calculating the posterior probability of the event is essentially to predict/assert such event ([8], p. 1666). Figure 1 shows diagnosis and prediction.

The weight w of the relationship between X and D is 1. Figure 1 depicts the simplest graph with two random variables. We need to convert the diagnostic relationship into conditional probabilities in order to construct the simplest BN from the simplest graph. Note that the hypothesis is binary but the evidence can be numerical. In the learning context, evidence D can be a test, exam, exercise, etc. The conditional probability of D given X (the likelihood function) is P(D|X). The posterior probability of X is P(X|D), which is used to evaluate a student’s mastery of concept (hypothesis) X given evidence D. Eq. (1) specifies the CPT of D when D is binary (0 and 1).

$$P(D|X) = \begin{cases} D & \text{if } X = 1 \\ 1 - D & \text{if } X = 0 \end{cases} \tag{1}$$

Eq. (1) is our first relationship conversion. It implies

$$P(D|X=0) + P(D|X=1) = D + 1 - D = 1$$

Evidence D can be used to diagnose hypothesis X if the so-called sufficient diagnostic proposition is satisfied, as seen in Table 1.

 D is equivalent to X in the diagnostic relationship if P(X|D) = kP(D|X) given a uniform distribution of X, where the transformation coefficient k is independent of D. In other words, k is constant with regard to D, and so D is called sufficient evidence.

### Table 1.

Sufficient diagnostic proposition.

The concept of sufficient evidence is borrowed from the concept of sufficient statistics, and it is inspired by the equivalence of variables T and T’ in the research ([4], pp. 292–295). The proposition can be restated as: evidence D is only used to assess hypotheses if it is sufficient evidence. As a convention, the proposition is called the diagnostic condition, and hypotheses have uniform distribution. The assumption of hypothetic uniform distribution (P(X = 1) = P(X = 0)) implies that we cannot assert whether or not a given hypothesis is true before we observe its evidence.

In learning context, D can be totally used to assess student’s mastery of X if diagnostic condition is satisfied. Derived from such condition, Eq. (2) specifies transformation coefficient k given uniform distribution of X.

$$k = \frac{P(X|D)}{P(D|X)} \tag{2}$$

We need to prove that Eq. (1) satisfies diagnostic condition. Suppose the prior probability of X is uniform.

$$P(X=0) = P(X=1)$$

we have

$$\begin{aligned} P(X|D) &= \frac{P(D|X)P(X)}{P(D)} \\ &= \frac{P(D|X)P(X)}{P(D|X=0)P(X=0) + P(D|X=1)P(X=1)} && \text{(due to Bayes' rule)} \\ &= \frac{P(D|X)P(X)}{P(X)\big(P(D|X=0) + P(D|X=1)\big)} && \text{(due to } P(X=0) = P(X=1)\text{)} \\ &= \frac{P(D|X)}{P(D|X=0) + P(D|X=1)} \\ &= 1 \cdot P(D|X) && \text{(due to } P(D|X=0) + P(D|X=1) = 1\text{)} \end{aligned}$$

It is easy to infer that the transformation coefficient k is 1 if D is binary. In practice, evidence D is often a test whose grade ranges over an integer interval {0, 1, 2,…, η}. Eq. (3) specifies the CPT of D in this case.

$$P(D|X) = \begin{cases} \dfrac{D}{S} & \text{if } X = 1 \\[4pt] \dfrac{\eta - D}{S} & \text{if } X = 0 \end{cases} \tag{3}$$

where

$$D \in \{0, 1, 2, \ldots, \eta\}, \qquad S = \sum_{D=0}^{\eta} D = \frac{\eta(\eta+1)}{2}$$

As a convention, $P(D|X) = 0$ for all $D \notin \{0, 1, 2, \ldots, \eta\}$. Eq. (3) implies that if a student has mastered the concept (X = 1), the probability that she/he completes the exercise/test D is proportional to her/his mark on D, that is, $P(D|X=1) = \frac{D}{S}$. We also have

$$P(D|X=0) + P(D|X=1) = \frac{D}{S} + \frac{\eta - D}{S} = \frac{\eta}{S} = \frac{2}{\eta+1}$$

$$\sum_{D=0}^{\eta} P(D|X=1) = \sum_{D=0}^{\eta} \frac{D}{S} = \frac{S}{S} = 1$$

$$\sum_{D=0}^{\eta} P(D|X=0) = \sum_{D=0}^{\eta} \frac{\eta - D}{S} = \frac{\eta(\eta+1) - S}{S} = \frac{2S - S}{S} = 1$$

We need to prove that Eq. (3) satisfies diagnostic condition. Suppose the prior probability of X is uniform.

$$P(X=0) = P(X=1)$$

The assumption of prior uniform distribution of X implies that we do not determine if student has mastered X yet. Similarly, we have

$$P(X|D) = \frac{P(D|X)P(X)}{P(D)} = \frac{P(D|X)}{P(D|X=0) + P(D|X=1)} = \frac{\eta+1}{2} P(D|X)$$

So, the transformation coefficient is $k = \frac{\eta+1}{2}$ if D ranges over {0, 1, 2,…, η}.
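As an illustrative sketch (not code from the chapter; the class and method names are my own), the CPT of Eq. (3) and the diagnostic condition P(X|D) = kP(D|X) with k = (η+1)/2 can be checked numerically:

```java
// Sketch: CPT of a graded test D in {0, 1, ..., eta} per Eq. (3), plus a
// numeric check of the diagnostic condition under a uniform prior on X.
public class GradedTestCPT {
    // P(D = d | X = x) per Eq. (3), with S = eta*(eta+1)/2.
    public static double likelihood(int d, int x, int eta) {
        double S = eta * (eta + 1) / 2.0;
        return (x == 1) ? d / S : (eta - d) / S;
    }

    // Posterior P(X = x | D = d), assuming the uniform prior P(X=0)=P(X=1).
    public static double posterior(int x, int d, int eta) {
        double num = likelihood(d, x, eta);
        double den = likelihood(d, 0, eta) + likelihood(d, 1, eta);
        return num / den;
    }

    public static void main(String[] args) {
        int eta = 10;
        double k = (eta + 1) / 2.0; // transformation coefficient of Eq. (2)
        for (int d = 0; d <= eta; d++) {
            double lhs = posterior(1, d, eta);
            double rhs = k * likelihood(d, 1, eta);
            if (Math.abs(lhs - rhs) > 1e-9)
                throw new AssertionError("diagnostic condition violated at D=" + d);
        }
        System.out.println("k = " + k + "; diagnostic condition holds for all D");
    }
}
```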

In the most general discrete case, evidence D ranges over an arbitrary integer interval $\{a, a+1, a+2, \ldots, b\}$. In other words, D is a bounded integer variable whose lower bound and upper bound are a and b, respectively. Eq. (4) specifies the CPT of D, where $D \in \{a, a+1, a+2, \ldots, b\}$.

$$P(D|X) = \begin{cases} \dfrac{D}{S} & \text{if } X = 1 \\[4pt] \dfrac{b + a - D}{S} & \text{if } X = 0 \end{cases} \tag{4}$$

where

$$D \in \{a, a+1, a+2, \ldots, b\}, \qquad S = a + (a+1) + (a+2) + \cdots + b = \frac{(b+a)(b-a+1)}{2}$$

Note that $P(D|X) = 0$ for all $D \notin \{a, a+1, a+2, \ldots, b\}$. According to the diagnostic condition, we need to prove the equality $P(X|D) = kP(D|X)$, where

$$k = \frac{b-a+1}{2}$$

Similarly, we have

$$P(X|D) = \frac{P(D|X)P(X)}{P(D)} = \frac{P(D|X)}{P(D|X=0) + P(D|X=1)} = \frac{b-a+1}{2} P(D|X)$$

If evidence D is continuous in the real interval [a, b], where a and b are real numbers, Eq. (5) specifies the probability density function (PDF) of continuous evidence $D \in [a, b]$. The PDF p(D|X) replaces the CPT in the case of a continuous random variable.

$$p(D|X) = \begin{cases} \dfrac{2D}{b^2 - a^2} & \text{if } X = 1 \\[4pt] \dfrac{2}{b-a} - \dfrac{2D}{b^2 - a^2} & \text{if } X = 0 \end{cases} \tag{5}$$

where

$$D \in [a, b] \text{ with } a, b \text{ real}, \qquad S = \int_a^b D\,\mathrm{d}D = \frac{b^2 - a^2}{2}$$

As a convention, [a, b] is called the domain of continuous evidence, which can be replaced by open or half-open intervals such as (a, b), (a, b], and [a, b). Of course, $p(D|X) = 0$ for all $D \notin [a, b]$. In the learning context, evidence D is often a test whose grade ranges within the real interval [a, b].

Functions p(D|X = 1) and p(D|X = 0) are valid PDFs due to

$$\int_a^b p(D|X=1)\,\mathrm{d}D = \int_a^b \frac{2D}{b^2 - a^2}\,\mathrm{d}D = \frac{1}{b^2 - a^2}\int_a^b 2D\,\mathrm{d}D = 1$$

$$\int_a^b p(D|X=0)\,\mathrm{d}D = \frac{2}{b-a}\int_a^b \mathrm{d}D - \frac{1}{b^2 - a^2}\int_a^b 2D\,\mathrm{d}D = 2 - 1 = 1$$

According to the diagnostic condition, we need to prove the equality

$$P(X|D) = k\,p(D|X)$$

where

$$k = \frac{b-a}{2}$$

When D is continuous, its probability is calculated in an ε-vicinity, where ε is a very small number. Usually, ε is the bias if D consists of measured values produced by equipment. The probability of D given X, where $D + \varepsilon \in [a, b]$ and $D - \varepsilon \in [a, b]$, is

$$P(D|X) = \int_{D-\varepsilon}^{D+\varepsilon} p(D|X)\,\mathrm{d}D = \begin{cases} \dfrac{4\varepsilon D}{b^2 - a^2} & \text{if } X = 1 \\[4pt] \dfrac{4\varepsilon}{b-a} - \dfrac{4\varepsilon D}{b^2 - a^2} & \text{if } X = 0 \end{cases} = 2\varepsilon\,p(D|X)$$

In fact, we have

$$P(X|D) = \frac{P(D|X)P(X)}{P(D|X=0)P(X=0) + P(D|X=1)P(X=1)} = \frac{P(D|X)}{P(D|X=0) + P(D|X=1)}$$

(due to Bayes' rule and the assumption $P(X=0) = P(X=1)$)

$$= \frac{b-a}{4\varepsilon} P(D|X) = \frac{b-a}{4\varepsilon} \cdot 2\varepsilon\,p(D|X) = \frac{b-a}{2}\,p(D|X) = k\,p(D|X)$$

In general, Eq. (6) summarizes CPT of evidence of single diagnostic relationship.

$$P(D|X) = \begin{cases} \dfrac{D}{S} & \text{if } X = 1 \\[4pt] \dfrac{M}{S} - \dfrac{D}{S} & \text{if } X = 0 \end{cases}, \qquad k = \frac{N}{2} \tag{6}$$

where

$$N = \begin{cases} 2 & \text{if } D \in \{0,1\} \\ \eta + 1 & \text{if } D \in \{0,1,2,\ldots,\eta\} \\ b - a + 1 & \text{if } D \in \{a, a+1, a+2, \ldots, b\} \\ b - a & \text{if } D \text{ continuous and } D \in [a,b] \end{cases} \qquad M = \begin{cases} 1 & \text{if } D \in \{0,1\} \\ \eta & \text{if } D \in \{0,1,2,\ldots,\eta\} \\ b + a & \text{if } D \in \{a, a+1, a+2, \ldots, b\} \\ b + a & \text{if } D \text{ continuous and } D \in [a,b] \end{cases}$$

$$S = \sum_D D = \frac{NM}{2} = \begin{cases} 1 & \text{if } D \in \{0,1\} \\ \dfrac{\eta(\eta+1)}{2} & \text{if } D \in \{0,1,2,\ldots,\eta\} \\ \dfrac{(b+a)(b-a+1)}{2} & \text{if } D \in \{a, a+1, a+2, \ldots, b\} \\ \dfrac{b^2 - a^2}{2} & \text{if } D \text{ continuous and } D \in [a,b] \end{cases}$$

In general, if the conditional probability P(D|X) is specified by Eq. (6), the diagnostic condition will be satisfied. Note that the CPT P(D|X) is the PDF p(D|X) in case of continuous evidence. The diagnostic relationship will be extended with more than one hypothesis. The next section will mention how to determine CPTs of a simple graph with one child node and many parent nodes based on X-gate inferences.
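As a small sketch (not from the chapter; class and method names are illustrative), the bounded discrete case of this CPT can be implemented and its normalization checked:

```java
// Sketch of Eq. (4): P(D|X=1) = D/S and P(D|X=0) = (b+a-D)/S for
// D in {a,...,b}, with S = (b+a)(b-a+1)/2 and k = (b-a+1)/2.
public class BoundedEvidenceCPT {
    public static double likelihood(int d, int x, int a, int b) {
        double S = (b + a) * (b - a + 1) / 2.0;
        return (x == 1) ? d / S : (b + a - d) / S;
    }

    public static void main(String[] args) {
        int a = 3, b = 8;
        double k = (b - a + 1) / 2.0;
        // Both conditionals must sum to 1 over the domain {a,...,b}.
        double sum1 = 0, sum0 = 0;
        for (int d = a; d <= b; d++) {
            sum1 += likelihood(d, 1, a, b);
            sum0 += likelihood(d, 0, a, b);
        }
        System.out.println("sum P(D|X=1) = " + sum1
                + ", sum P(D|X=0) = " + sum0 + ", k = " + k);
    }
}
```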

## 3. X-gate inferences

Given a simple graph consisting of one child variable Y and n parent variables Xi, as shown in Figure 2, each relationship from Xi to Y is quantified by normalized weight wi where 0 ≤ wi ≤ 1. A large graph is an integration of many simple graphs. Figure 2 shows the DAG of a simple BN. As aforementioned, the essence of constructing simple BN is to convert graphic relationships of simple graph into CPTs of simple BN.

Child variable Y is called the target and parent variables Xi are called sources. Especially, these relationships are tied to X-gates such as AND-gate, OR-gate, and SIGMA-gate. These gates originate from logic gates [1]. For instance, AND-gate and OR-gate represent the prerequisite relationship, and SIGMA-gate represents the aggregation relationship. Therefore, relationship conversion is to determine the X-gate inference. The simple graph shown in Figure 2 is also called an X-gate graph or X-gate network. Please distinguish the letter “X” in the term “X-gate inference,” which implies logic operators (AND, OR, XOR, etc.), from the “variable X.”

All variables are binary and they represent events. The probability P(X) indicates event X occurs. Thus, P(X) implicates P(X = 1) and P(not(X)) implicates P(X = 0). Eq. (7) specifies the simple NOT-gate inference.

$$P(\mathrm{not}(X)) = P(\bar{X}) = P(X=0) = 1 - P(X=1) = 1 - P(X), \qquad P(\mathrm{not}(\mathrm{not}(X))) = P(X) \tag{7}$$

X-gate inference is based on three assumptions mentioned in Ref. ([2], p. 157), which are as follows

• X-gate inhibition: Given a relationship from source Xi to target Y, there is a factor Ii that inhibits Xi from being integrated into Y. Factor Ii is called the inhibition of Xi. The inhibition Ii being turned off is a prerequisite for Xi to be integrated into Y.

• Inhibition independence: Inhibitions are mutually independent. For example, inhibition I1 of X1 is independent from inhibition I2 of X2.

• Accountability: X-gate network is established by accountable variables Ai for Xi and Ii. Each X-gate inference owns particular combination of Ais.

Figure 3 shows the extended X-gate network with accountable variables Ais ([2], p. 158).

The strength of each relationship from source Xi to target Y is quantified by a weight 0 ≤ wi ≤ 1. According to the assumption of inhibition, probability of Ii = OFF is pi, which is set to be the weight wi.

$$p_i = w_i$$

If notation wi is used, we focus on the strength of relationship. If notation pi is used, we focus on probability of OFF inhibition. In probabilistic inference, pi is also prior probability of Xi = 1. However, we will assume each Xi has uniform distribution later on. Eq. (8) specifies probabilities of inhibitions Iis and accountable variables Ais.

$$\begin{aligned} &P(I_i = \mathrm{OFF}) = p_i = w_i, \qquad P(I_i = \mathrm{ON}) = 1 - p_i = 1 - w_i \\ &P(A_i = \mathrm{ON} \mid X_i = 1, I_i = \mathrm{OFF}) = 1, \qquad P(A_i = \mathrm{ON} \mid X_i = 1, I_i = \mathrm{ON}) = 0 \\ &P(A_i = \mathrm{ON} \mid X_i = 0, I_i = \mathrm{OFF}) = 0, \qquad P(A_i = \mathrm{ON} \mid X_i = 0, I_i = \mathrm{ON}) = 0 \\ &P(A_i = \mathrm{OFF} \mid X_i = 1, I_i = \mathrm{OFF}) = 0, \qquad P(A_i = \mathrm{OFF} \mid X_i = 1, I_i = \mathrm{ON}) = 1 \\ &P(A_i = \mathrm{OFF} \mid X_i = 0, I_i = \mathrm{OFF}) = 1, \qquad P(A_i = \mathrm{OFF} \mid X_i = 0, I_i = \mathrm{ON}) = 1 \end{aligned} \tag{8}$$

According to Eq. (8), the entry P(Ai=ON | Xi=1, Ii=OFF) = 1 means it is 100% certain that the accountable variable Ai is turned on if source Xi is 1 and the inhibition Ii is turned off. Eq. (9) specifies the conditional probability of each accountable variable Ai given Xi, which is a corollary of Eq. (8).

$$\begin{aligned} &P(A_i = \mathrm{ON} \mid X_i = 1) = p_i = w_i, \qquad P(A_i = \mathrm{ON} \mid X_i = 0) = 0 \\ &P(A_i = \mathrm{OFF} \mid X_i = 1) = 1 - p_i = 1 - w_i, \qquad P(A_i = \mathrm{OFF} \mid X_i = 0) = 1 \end{aligned} \tag{9}$$
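Eq. (9) can also be verified numerically by summing out the inhibition Ii in Eq. (8), i.e., P(Ai|Xi) = Σ over Ii of P(Ai | Xi, Ii)P(Ii). A minimal sketch (names and the example weight are my own, not from the chapter):

```java
// Sketch: marginalize the inhibition I_i out of Eq. (8) to recover Eq. (9).
public class InhibitionMarginal {
    // P(A = ON | X = x) obtained by summing over I in {OFF, ON}.
    public static double pAOn(int x, double w) {
        double pOff = w, pOn = 1 - w;        // P(I=OFF) = w, P(I=ON) = 1 - w
        double givenOff = (x == 1) ? 1 : 0;  // P(A=ON | X=x, I=OFF) per Eq. (8)
        double givenOn = 0;                  // P(A=ON | X=x, I=ON) per Eq. (8)
        return givenOff * pOff + givenOn * pOn;
    }

    public static void main(String[] args) {
        double w = 0.7; // example weight
        System.out.println("P(A=ON|X=1) = " + pAOn(1, w)); // equals w
        System.out.println("P(A=ON|X=0) = " + pAOn(0, w)); // equals 0
    }
}
```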

Appendix A1 is the proof of Eq. (9). As a definition, the set of all Xis is complete if and only if

$$P(X_1 \cup X_2 \cup \cdots \cup X_n) = P(\Omega) = \sum_{i=1}^{n} w_i = 1$$

The set of all Xis is mutually exclusive if and only if

$$X_i \cap X_j = \varnothing, \quad \forall i \neq j$$

For each Xi there is exactly one Ai and vice versa, which establishes a bijection between the Xis and the Ais. Obviously, the set of all Xis being complete is equivalent to the set of all Ais being complete. We prove by contradiction that “the set of all Xis being mutually exclusive is equivalent to the set of all Ais being mutually exclusive.” Suppose $X_i \cap X_j = \varnothing$ for all $i \neq j$ but there exist $i \neq j$ such that $A_i \cap A_j = B \neq \varnothing$. Let $B_1$ be the preimage of B. Because $B \subseteq A_i$ and $B \subseteq A_j$, we have $B_1 \subseteq X_i$ and $B_1 \subseteq X_j$, which implies $X_i \cap X_j \supseteq B_1 \neq \varnothing$. This is a contradiction, and so we have

$$X_i \cap X_j = \varnothing\ \forall i \neq j \implies A_i \cap A_j = \varnothing\ \forall i \neq j$$

By similar proof, we have

$$A_i \cap A_j = \varnothing\ \forall i \neq j \implies X_i \cap X_j = \varnothing\ \forall i \neq j$$

The extended X-gate network shown in Figure 3 is interpretation of simple network shown in Figure 2. Specifying CPT of the simple network is to determine the conditional probability P(Y = 1 | X1, X2,…, Xn) based on extended X-gate network. The X-gate inference is represented by such probability P(Y = 1 | X1, X2,…, Xn) specified by Eq. (10) ([2], p. 159).

$$P(Y \mid X_1, X_2, \ldots, X_n) = \sum_{A_1, A_2, \ldots, A_n} P(Y \mid A_1, A_2, \ldots, A_n) \prod_{i=1}^{n} P(A_i \mid X_i) \tag{10}$$

Appendix A2 is the proof of Eq. (10). It is necessary to introduce some mathematical notation because Eq. (10) is complicated and relevant to arrangements of the Xis. Given the set Ω = {X1, X2,…, Xn} where all variables are binary, Table 2 specifies binary arrangements of Ω.

Given Ω = {X1, X2,…, Xn}, where |Ω| = n is the cardinality of Ω:

- Let a(Ω) be an arrangement of Ω, that is, a set of n instances {X1 = x1, X2 = x2,…, Xn = xn} where each xi is 1 or 0. The number of all a(Ω) is 2^|Ω|. For instance, given Ω = {X1, X2}, there are 2² = 4 arrangements: {X1 = 1, X2 = 1}, {X1 = 1, X2 = 0}, {X1 = 0, X2 = 1}, {X1 = 0, X2 = 0}.
- Let a(Ω:{Xi}) be an arrangement of Ω with fixed Xi. The number of all a(Ω:{Xi}) is 2^(|Ω|−1). Similarly, a(Ω:{X1, X2, X3}) is an arrangement of Ω with fixed X1, X2, X3, and the number of all a(Ω:{X1, X2, X3}) is 2^(|Ω|−3).
- Let c(Ω) and c(Ω:{Xi}) denote the numbers of arrangements a(Ω) and a(Ω:{Xi}), respectively. They are called arrangement counters. As usual, c(Ω) = 2^|Ω| and c(Ω:{Xi}) = 2^(|Ω|−1), but they vary according to specific cases.
- Let ∑_a F(a(Ω)) and ∏_a F(a(Ω)) denote the sum and product of values generated by a function F acting on every a(Ω). The number of arrangements on which F acts is c(Ω).
- Let x denote the X-gate operator, for instance, x = ⊙ for AND-gate, x = ⊕ for OR-gate, x = not ⊙ for NAND-gate, x = not ⊕ for NOR-gate, x = ⊗ for XOR-gate, x = not ⊗ for XNOR-gate, x = ⊎ for U-gate, and x = + for SIGMA-gate. Given an x-operator, let s(Ω) and s(Ω:{Xi}) be the sums of P(X1 x X2 x … x Xn) over every arrangement of Ω, without and with fixed Xi, respectively: s(Ω) = ∑_a P(X1 x X2 x … x Xn | a(Ω)) = ∑_a P(Y = 1 | a(Ω)) and s(Ω:{Xi}) = ∑_a P(X1 x X2 x … x Xn | a(Ω:{Xi})) = ∑_a P(Y = 1 | a(Ω:{Xi})). For example, for the OR-gate, s(Ω) = ∑_a P(X1 ⊕ X2 ⊕ … ⊕ Xn | a(Ω)) and s(Ω:{Xi}) = ∑_a P(X1 ⊕ X2 ⊕ … ⊕ Xn | a(Ω:{Xi})). Such s(Ω) and s(Ω:{Xi}) are called arrangement sums; they are the acting function F.
- Note that Ω can be any set of binary variables.

### Table 2.

Binary arrangements.

It is not easy to produce all binary arrangements of Ω. Table 3 shows a code snippet, written in the Java programming language, for producing all such arrangements.

```java
import java.util.ArrayList;

public class ArrangementGenerator {
    private ArrayList<int[]> arrangements;
    private int n; // number of values each position can take
    private int r; // length of each arrangement

    private ArrangementGenerator(int n, int r) {
        this.n = n;
        this.r = r;
        this.arrangements = new ArrayList<int[]>();
    }

    // Recursive method that generates all n^r arrangements.
    private void create(int[] a, int i) {
        for (int j = 0; j < n; j++) {
            a[i] = j;
            if (i < r - 1)
                create(a, i + 1);
            else if (i == r - 1) {
                int[] b = new int[a.length];
                for (int k = 0; k < a.length; k++) b[k] = a[k];
                arrangements.add(b);
            }
        }
    }

    public int[] get(int i) {
        return arrangements.get(i);
    }

    public long size() {
        return arrangements.size();
    }

    public static ArrangementGenerator parse(int n, int r) {
        ArrangementGenerator arr = new ArrangementGenerator(n, r);
        int[] a = new int[r];
        for (int i = 0; i < r; i++) a[i] = 0;
        arr.create(a, 0);
        return arr;
    }
}
```

### Table 3.

Code snippet generating all binary arrangements.

Each element of the list “arrangements” is a binary arrangement a(Ω) represented by an array of bits (0 and 1). The recursive method “create(int[] a, int i)” is the main one that generates arrangements. The method call “ArrangementGenerator.parse(2, n)” lists all possible binary arrangements of n variables.
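The 2^|Ω| count of binary arrangements can also be checked with a short, self-contained sketch that enumerates arrangements via bitmasks instead of recursion (my own illustration, assumed equivalent to Table 3's generator for the binary case):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: every integer mask in [0, 2^n) encodes one binary arrangement
// of n variables, so the list size is exactly 2^n.
public class BinaryArrangements {
    public static List<int[]> all(int n) {
        List<int[]> out = new ArrayList<>();
        for (int mask = 0; mask < (1 << n); mask++) {
            int[] a = new int[n];
            for (int i = 0; i < n; i++) a[i] = (mask >> i) & 1; // bit i -> x_i
            out.add(a);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(all(3).size()); // 8 = 2^3
    }
}
```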

Eq. (11) specifies the connection between s(Ω:{Xi = 1}) and s(Ω:{Xi = 0}), between c(Ω:{Xi = 1}) and c(Ω:{Xi = 0}).

$$s(\Omega:\{X_i=1\}) + s(\Omega:\{X_i=0\}) = s(\Omega), \qquad c(\Omega:\{X_i=1\}) + c(\Omega:\{X_i=0\}) = c(\Omega) \tag{11}$$

Eq. (11) follows easily because the set of all arrangements a(Ω:{Xi = 1}) is the complement of the set of all arrangements a(Ω:{Xi = 0}).

Let K be a set of Xis whose values are 1 and let L be a set of Xis whose values are 0. K and L are mutually complementary. Eq. (12) determines sets K and L.

$$K = \{i : X_i = 1\}, \qquad L = \{i : X_i = 0\}, \qquad K \cap L = \varnothing, \qquad K \cup L = \{1, 2, \ldots, n\} \tag{12}$$

The AND-gate inference represents prerequisite relationship satisfying AND-gate condition specified by Eq. (13).

$$P(Y = 1 \mid A_i = \mathrm{OFF} \text{ for some } i) = 0 \tag{13}$$

From Eq. (10), we have

$$\begin{aligned} P(Y=1 \mid X_1, X_2, \ldots, X_n) &= \sum_{A_1, \ldots, A_n} P(Y=1 \mid A_1, \ldots, A_n) \prod_{i=1}^{n} P(A_i \mid X_i) \\ &= \prod_{i=1}^{n} P(A_i = \mathrm{ON} \mid X_i) && \text{(due to } P(Y=1 \mid A_i = \mathrm{OFF} \text{ for some } i) = 0\text{)} \\ &= \Big(\prod_{i \in K} P(A_i = \mathrm{ON} \mid X_i = 1)\Big)\Big(\prod_{i \notin K} P(A_i = \mathrm{ON} \mid X_i = 0)\Big) \\ &= \Big(\prod_{i \in K} p_i\Big)\Big(\prod_{i \notin K} 0\Big) && \text{(due to Eq. (9))} \\ &= \begin{cases} \prod_{i=1}^{n} p_i & \text{if all } X_i = 1 \\ 0 & \text{if there exists at least one } X_i = 0 \end{cases} \end{aligned}$$

In general, Eq. (14) specifies AND-gate inference.

$$\begin{aligned} P(X_1 \odot X_2 \odot \cdots \odot X_n) &= P(Y=1 \mid X_1, X_2, \ldots, X_n) = \begin{cases} \prod_{i=1}^{n} p_i & \text{if all } X_i = 1 \\ 0 & \text{if there exists at least one } X_i = 0 \end{cases} \\ P(Y=0 \mid X_1, X_2, \ldots, X_n) &= \begin{cases} 1 - \prod_{i=1}^{n} p_i & \text{if all } X_i = 1 \\ 1 & \text{if there exists at least one } X_i = 0 \end{cases} \end{aligned} \tag{14}$$

The AND-gate inference was also described in ([3], p. 33). Eq. (14) varies according to two cases whose arrangement counters are listed as follows

- $L = \varnothing$: $c(\Omega:\{X_i=1\}) = 1$, $c(\Omega:\{X_i=0\}) = 0$, $c(\Omega) = 1$.
- $L \neq \varnothing$: $c(\Omega:\{X_i=1\}) = 2^{n-1} - 1$, $c(\Omega:\{X_i=0\}) = 2^{n-1}$, $c(\Omega) = 2^n - 1$.
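A minimal sketch of Eq. (14) (illustrative names, not the chapter's code): the AND-gate probability is the product of weights when every source is 1, and 0 otherwise.

```java
// Sketch of the AND-gate inference of Eq. (14).
public class AndGate {
    // x[i] is the value of X_{i+1} (0 or 1); p[i] is its weight p_{i+1}.
    public static double infer(int[] x, double[] p) {
        double prod = 1;
        for (int i = 0; i < x.length; i++) {
            if (x[i] == 0) return 0; // some X_i = 0 -> probability 0
            prod *= p[i];
        }
        return prod; // all X_i = 1 -> product of weights
    }

    public static void main(String[] args) {
        double[] p = {0.8, 0.9};
        System.out.println(infer(new int[]{1, 1}, p)); // 0.8 * 0.9
        System.out.println(infer(new int[]{1, 0}, p)); // 0
    }
}
```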

The OR-gate inference represents prerequisite relationship satisfying OR-gate condition specified by Eq. (15) ([2], p. 157).

$$P(Y = 1 \mid A_i = \mathrm{ON} \text{ for some } i) = 1 \tag{15}$$

The OR-gate condition implies

$$P(Y = 0 \mid A_i = \mathrm{ON} \text{ for some } i) = 0$$

From Eq. (10), we have ([2], p. 159)

$$\begin{aligned} P(Y=0 \mid X_1, X_2, \ldots, X_n) &= \sum_{A_1, \ldots, A_n} P(Y=0 \mid A_1, \ldots, A_n) \prod_{i=1}^{n} P(A_i \mid X_i) \\ &= \prod_{i=1}^{n} P(A_i = \mathrm{OFF} \mid X_i) && \text{(due to } P(Y=0 \mid A_i = \mathrm{ON} \text{ for some } i) = 0\text{)} \\ &= \Big(\prod_{i \in K} P(A_i = \mathrm{OFF} \mid X_i = 1)\Big)\Big(\prod_{i \notin K} P(A_i = \mathrm{OFF} \mid X_i = 0)\Big) \\ &= \Big(\prod_{i \in K} (1 - p_i)\Big)\Big(\prod_{i \notin K} 1\Big) && \text{(due to Eq. (9))} \\ &= \begin{cases} \prod_{i \in K} (1 - p_i) & \text{if } K \neq \varnothing \\ 1 & \text{if } K = \varnothing \end{cases} \end{aligned}$$

In general, Eq. (16) specifies OR-gate inference.

$$\begin{aligned} P(X_1 \oplus X_2 \oplus \cdots \oplus X_n) &= 1 - P(Y=0 \mid X_1, X_2, \ldots, X_n) = \begin{cases} 1 - \prod_{i \in K} (1 - p_i) & \text{if } K \neq \varnothing \\ 0 & \text{if } K = \varnothing \end{cases} \\ P(Y=0 \mid X_1, X_2, \ldots, X_n) &= \begin{cases} \prod_{i \in K} (1 - p_i) & \text{if } K \neq \varnothing \\ 1 & \text{if } K = \varnothing \end{cases} \end{aligned} \tag{16}$$

where K is the set of Xis whose values are 1. The OR-gate inference was mentioned in Refs. ([2], p. 158) and ([3], p. 20). Eq. (16) varies according to two cases whose arrangement counters are listed as follows

- $K \neq \varnothing$: $c(\Omega:\{X_i=1\}) = 2^{n-1}$, $c(\Omega:\{X_i=0\}) = 2^{n-1} - 1$, $c(\Omega) = 2^n - 1$.
- $K = \varnothing$: $c(\Omega:\{X_i=1\}) = 0$, $c(\Omega:\{X_i=0\}) = 1$, $c(\Omega) = 1$.
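A matching sketch of Eq. (16) (illustrative names, not the chapter's code), i.e., the familiar noisy-OR form:

```java
// Sketch of the OR-gate (noisy-OR) inference of Eq. (16).
public class OrGate {
    // P(Y=1|X) = 1 - prod_{i in K} (1 - p_i) where K = {i : x_i = 1};
    // the probability is 0 when K is empty.
    public static double infer(int[] x, double[] p) {
        boolean anyOn = false;
        double prod = 1;
        for (int i = 0; i < x.length; i++) {
            if (x[i] == 1) {
                anyOn = true;
                prod *= 1 - p[i];
            }
        }
        return anyOn ? 1 - prod : 0;
    }

    public static void main(String[] args) {
        double[] p = {0.8, 0.9};
        System.out.println(infer(new int[]{1, 1}, p)); // 1 - 0.2*0.1 ≈ 0.98
    }
}
```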

According to De Morgan’s rule with regard to AND-gate and OR-gate, we have

$$\begin{aligned} P(\mathrm{not}(X_1 \odot X_2 \odot \cdots \odot X_n)) &= P\big(\mathrm{not}(X_1) \oplus \mathrm{not}(X_2) \oplus \cdots \oplus \mathrm{not}(X_n)\big) \\ &= \begin{cases} 1 - \prod_{i \in L} \big(1 - (1 - p_i)\big) & \text{if } L \neq \varnothing \\ 0 & \text{if } L = \varnothing \end{cases} && \text{(due to Eq. (16))} \\ &= \begin{cases} 1 - \prod_{i \in L} p_i & \text{if } L \neq \varnothing \\ 0 & \text{if } L = \varnothing \end{cases} \end{aligned}$$

According to Eq. (14), we also have

$$\begin{aligned} P(\mathrm{not}(X_1 \oplus X_2 \oplus \cdots \oplus X_n)) &= P\big(\mathrm{not}(X_1) \odot \mathrm{not}(X_2) \odot \cdots \odot \mathrm{not}(X_n)\big) \\ &= \begin{cases} \prod_{i=1}^{n} P(\mathrm{not}(X_i)) & \text{if all } \mathrm{not}(X_i) = 1 \\ 0 & \text{if there exists at least one } \mathrm{not}(X_i) = 0 \end{cases} \\ &= \begin{cases} \prod_{i=1}^{n} (1 - p_i) & \text{if all } X_i = 0 \\ 0 & \text{if there exists at least one } X_i = 1 \end{cases} \end{aligned}$$

In general, Eq. (17) specifies NAND-gate inference and NOR-gate inference derived from AND-gate and OR-gate

$$\begin{aligned} P(\mathrm{not}(X_1 \odot X_2 \odot \cdots \odot X_n)) &= \begin{cases} 1 - \prod_{i \in L} p_i & \text{if } L \neq \varnothing \\ 0 & \text{if } L = \varnothing \end{cases} \\ P(\mathrm{not}(X_1 \oplus X_2 \oplus \cdots \oplus X_n)) &= \begin{cases} \prod_{i=1}^{n} (1 - p_i) & \text{if } K = \varnothing \\ 0 & \text{if } K \neq \varnothing \end{cases} \end{aligned} \tag{17}$$

where K and L are the sets of Xis whose values are 1 and 0, respectively.
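As a small sanity check, Eq. (17) can be implemented directly (a sketch with my own names, not the chapter's code):

```java
// Sketch of the NAND-gate and NOR-gate inferences of Eq. (17).
public class NandNorGates {
    // NAND: 1 - prod_{i in L} p_i when L is nonempty; 0 when L is empty.
    public static double nand(int[] x, double[] p) {
        boolean anyZero = false;
        double prod = 1;
        for (int i = 0; i < x.length; i++) {
            if (x[i] == 0) {
                anyZero = true;
                prod *= p[i];
            }
        }
        return anyZero ? 1 - prod : 0;
    }

    // NOR: prod (1 - p_i) when all x_i = 0 (K empty); 0 otherwise.
    public static double nor(int[] x, double[] p) {
        double prod = 1;
        for (int i = 0; i < x.length; i++) {
            if (x[i] == 1) return 0;
            prod *= 1 - p[i];
        }
        return prod;
    }
}
```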

Suppose the number of sources Xis is even. Let O be the set of Xis whose indices are odd. Let O1 and O2 be subsets of O, in which all Xis are 1 and 0, respectively. Let E be the set of Xis whose indices are even. Let E1 and E2 be the subsets of E, in which all Xis are 1 and 0, respectively.

$$\begin{cases} O = \{1, 3, 5, \ldots, n-1\} \\ O_1 \subseteq O,\ O_2 \subseteq O \\ O_1 \cup O_2 = O,\ O_1 \cap O_2 = \varnothing \\ X_i = 1\ \forall i \in O_1,\quad X_i = 0\ \forall i \in O_2 \end{cases} \quad \text{and} \quad \begin{cases} E = \{2, 4, 6, \ldots, n\} \\ E_1 \subseteq E,\ E_2 \subseteq E \\ E_1 \cup E_2 = E,\ E_1 \cap E_2 = \varnothing \\ X_i = 1\ \forall i \in E_1,\quad X_i = 0\ \forall i \in E_2 \end{cases}$$

Thus, O1 and E1 are the subsets of K. Sources Xis and target Y follow XOR-gate if one of two XOR-gate conditions specified by Eq. (18) is satisfied.

$$\begin{aligned} P\big(Y=1 \mid A_i = \mathrm{ON}\ \forall i \in O,\ A_i = \mathrm{OFF}\ \forall i \notin O\big) &= P(Y=1 \mid A_1 = \mathrm{ON}, A_2 = \mathrm{OFF}, \ldots, A_{n-1} = \mathrm{ON}, A_n = \mathrm{OFF}) = 1 \\ P\big(Y=1 \mid A_i = \mathrm{ON}\ \forall i \in E,\ A_i = \mathrm{OFF}\ \forall i \notin E\big) &= P(Y=1 \mid A_1 = \mathrm{OFF}, A_2 = \mathrm{ON}, \ldots, A_{n-1} = \mathrm{OFF}, A_n = \mathrm{ON}) = 1 \end{aligned} \tag{18}$$

From Eq. (10), we have

$$P(Y=1 \mid X_1, X_2, \ldots, X_n) = \sum_{A_1, A_2, \ldots, A_n} P(Y=1 \mid A_1, A_2, \ldots, A_n) \prod_{i=1}^{n} P(A_i \mid X_i)$$

If neither XOR-gate condition is satisfied, then

$$P(Y=1 \mid X_1, X_2, \ldots, X_n) = 0$$

If the first XOR-gate condition is satisfied, we have

$$\begin{aligned} P(Y=1 \mid X_1, X_2, \ldots, X_n) &= P(Y=1 \mid A_1 = \mathrm{ON}, A_2 = \mathrm{OFF}, \ldots, A_{n-1} = \mathrm{ON}, A_n = \mathrm{OFF}) \prod_{i=1}^{n} P(A_i \mid X_i) \\ &= \Big(\prod_{i \in O} P(A_i = \mathrm{ON} \mid X_i)\Big)\Big(\prod_{i \in E} P(A_i = \mathrm{OFF} \mid X_i)\Big) \end{aligned}$$

We have

$$\prod_{i \in O} P(A_i = \mathrm{ON} \mid X_i) = \Big(\prod_{i \in O_1} P(A_i = \mathrm{ON} \mid X_i = 1)\Big)\Big(\prod_{i \in O_2} P(A_i = \mathrm{ON} \mid X_i = 0)\Big) = \Big(\prod_{i \in O_1} p_i\Big)\Big(\prod_{i \in O_2} 0\Big) = \begin{cases} \prod_{i \in O_1} p_i & \text{if } O_2 = \varnothing \\ 0 & \text{if } O_2 \neq \varnothing \end{cases}$$

(due to Eq. (9))

We also have

$$\prod_{i \in E} P(A_i = \mathrm{OFF} \mid X_i) = \Big(\prod_{i \in E_1} P(A_i = \mathrm{OFF} \mid X_i = 1)\Big)\Big(\prod_{i \in E_2} P(A_i = \mathrm{OFF} \mid X_i = 0)\Big) = \Big(\prod_{i \in E_1} (1 - p_i)\Big)\Big(\prod_{i \in E_2} 1\Big) = \begin{cases} \prod_{i \in E_1} (1 - p_i) & \text{if } E_1 \neq \varnothing \\ 1 & \text{if } E_1 = \varnothing \end{cases}$$

(due to Eq. (9))

Given the first XOR-gate condition, it implies

$$P(Y=1 \mid X_1, X_2, \ldots, X_n) = \Big(\prod_{i \in O} P(A_i = \mathrm{ON} \mid X_i)\Big)\Big(\prod_{i \in E} P(A_i = \mathrm{OFF} \mid X_i)\Big) = \begin{cases} \big(\prod_{i \in O_1} p_i\big)\big(\prod_{i \in E_1} (1 - p_i)\big) & \text{if } O_2 = \varnothing \text{ and } E_1 \neq \varnothing \\ \prod_{i \in O_1} p_i & \text{if } O_2 = \varnothing \text{ and } E_1 = \varnothing \\ 0 & \text{if } O_2 \neq \varnothing \end{cases}$$

Similarly, given the second XOR-gate condition, we have

$$P(Y=1 \mid X_1, X_2, \ldots, X_n) = \Big(\prod_{i \in E} P(A_i = \mathrm{ON} \mid X_i)\Big)\Big(\prod_{i \in O} P(A_i = \mathrm{OFF} \mid X_i)\Big) = \begin{cases} \big(\prod_{i \in E_1} p_i\big)\big(\prod_{i \in O_1} (1 - p_i)\big) & \text{if } E_2 = \varnothing \text{ and } O_1 \neq \varnothing \\ \prod_{i \in E_1} p_i & \text{if } E_2 = \varnothing \text{ and } O_1 = \varnothing \\ 0 & \text{if } E_2 \neq \varnothing \end{cases}$$

If one of XOR-gate conditions is satisfied then,

$$P(Y=1 \mid X_1, X_2, \ldots, X_n) = \Big(\prod_{i \in O} P(A_i = \mathrm{ON} \mid X_i)\Big)\Big(\prod_{i \in E} P(A_i = \mathrm{OFF} \mid X_i)\Big) + \Big(\prod_{i \in E} P(A_i = \mathrm{ON} \mid X_i)\Big)\Big(\prod_{i \in O} P(A_i = \mathrm{OFF} \mid X_i)\Big)$$

This implies Eq. (19) to specify XOR-gate inference.

$$P(X_1 \otimes X_2 \otimes \cdots \otimes X_n) = P(Y=1 \mid X_1, X_2, \ldots, X_n) = \begin{cases} \big(\prod_{i \in O_1} p_i\big)\big(\prod_{i \in E_1} (1-p_i)\big) + \big(\prod_{i \in E_1} p_i\big)\big(\prod_{i \in O_1} (1-p_i)\big) & \text{if } O_2 = \varnothing \text{ and } E_2 = \varnothing \\ \big(\prod_{i \in O_1} p_i\big)\big(\prod_{i \in E_1} (1-p_i)\big) & \text{if } O_2 = \varnothing \text{ and } E_1 \neq \varnothing \text{ and } E_2 \neq \varnothing \\ \prod_{i \in O_1} p_i & \text{if } O_2 = \varnothing \text{ and } E_1 = \varnothing \\ \big(\prod_{i \in E_1} p_i\big)\big(\prod_{i \in O_1} (1-p_i)\big) & \text{if } E_2 = \varnothing \text{ and } O_1 \neq \varnothing \text{ and } O_2 \neq \varnothing \\ \prod_{i \in E_1} p_i & \text{if } E_2 = \varnothing \text{ and } O_1 = \varnothing \\ 0 & \text{if } O_2 \neq \varnothing \text{ and } E_2 \neq \varnothing \\ 0 & \text{if } n < 2 \text{ or } n \text{ is odd} \end{cases} \tag{19}$$

where O = {1, 3, 5,…, n−1} with $O_1 \cup O_2 = O$, $O_1 \cap O_2 = \varnothing$, $X_i = 1\ \forall i \in O_1$, $X_i = 0\ \forall i \in O_2$, and E = {2, 4, 6,…, n} with $E_1 \cup E_2 = E$, $E_1 \cap E_2 = \varnothing$, $X_i = 1\ \forall i \in E_1$, $X_i = 0\ \forall i \in E_2$.

Given n ≥ 2 and n even, Eq. (19) varies according to six cases whose arrangement counters are listed as follows

- $O_2 = \varnothing$ and $E_2 = \varnothing$: $c(\Omega:\{X_i=1\}) = 1$, $c(\Omega:\{X_i=0\}) = 0$, $c(\Omega) = 1$.
- $O_2 = \varnothing$ and $E_1 \neq \varnothing$ and $E_2 \neq \varnothing$: $c(\Omega:\{X_i=1\}) = 2^{n/2} - 2$, $c(\Omega:\{X_i=0\}) = 0$, $c(\Omega) = 2^{n/2} - 2$.
- $O_2 = \varnothing$ and $E_1 = \varnothing$: $c(\Omega:\{X_i=1\}) = 1$, $c(\Omega:\{X_i=0\}) = 0$, $c(\Omega) = 1$.
- $E_2 = \varnothing$ and $O_1 \neq \varnothing$ and $O_2 \neq \varnothing$: $c(\Omega:\{X_i=1\}) = 2^{n/2-1} - 1$, $c(\Omega:\{X_i=0\}) = 2^{n/2-1} - 1$, $c(\Omega) = 2^{n/2} - 2$.
- $E_2 = \varnothing$ and $O_1 = \varnothing$: $c(\Omega:\{X_i=1\}) = 0$, $c(\Omega:\{X_i=0\}) = 1$, $c(\Omega) = 1$.
- $O_2 \neq \varnothing$ and $E_2 \neq \varnothing$: $c(\Omega:\{X_i=1\}) = (2^{n/2-1} - 1)(2^{n/2} - 1)$, $c(\Omega:\{X_i=0\}) = 2^{n/2-1}(2^{n/2} - 1)$, $c(\Omega) = (2^{n/2} - 1)^2$.
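The case analysis of Eq. (19) can be condensed into a short sketch (my own names, not the chapter's code): accumulate the product for each XOR-gate condition while tracking whether O2 and E2 are empty, then sum the contributions of the conditions that remain feasible.

```java
// Sketch of the XOR-gate inference of Eq. (19). Indices 1..n; odd indices
// form O, even indices form E; x[i] holds X_{i+1}.
public class XorGate {
    public static double infer(int[] x, double[] p) {
        int n = x.length;
        if (n < 2 || n % 2 != 0) return 0; // n must be even and at least 2
        double c1 = 1, c2 = 1; // contributions of the two XOR-gate conditions
        boolean o2Empty = true, e2Empty = true;
        for (int i = 0; i < n; i++) {
            boolean oddIndex = (i % 2 == 0); // position i holds X_{i+1}
            if (x[i] == 1) {
                if (oddIndex) { c1 *= p[i]; c2 *= 1 - p[i]; } // i+1 in O1
                else          { c2 *= p[i]; c1 *= 1 - p[i]; } // i+1 in E1
            } else {
                if (oddIndex) o2Empty = false;
                else e2Empty = false;
            }
        }
        double result = 0;
        if (o2Empty) result += c1; // first condition feasible
        if (e2Empty) result += c2; // second condition feasible
        return result;
    }
}
```

For instance, with p = {0.8, 0.9} and X1 = X2 = 1, the result is 0.8·0.1 + 0.9·0.2 = 0.26, matching the first case of Eq. (19).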

Suppose the number of sources Xis is even. According to XNOR-gate inference [1], the output is on if all inputs get the same value 1 (or 0). Sources Xi (s) and target Y follow XNOR-gate if one of two XNOR-gate conditions specified by Eq. (20) is satisfied.

$$P(Y = 1 \mid A_i = \mathrm{ON}\ \forall i) = 1, \qquad P(Y = 1 \mid A_i = \mathrm{OFF}\ \forall i) = 1 \tag{20}$$

From Eq. (10), we have

$$P(Y=1 \mid X_1, X_2, \ldots, X_n) = \sum_{A_1, A_2, \ldots, A_n} P(Y=1 \mid A_1, A_2, \ldots, A_n) \prod_{i=1}^{n} P(A_i \mid X_i)$$

If neither XNOR-gate condition is satisfied, then

$$P(Y=1 \mid X_1, X_2, \ldots, X_n) = 0$$

If Ai = ON for all i, we have

$$P(Y=1 \mid X_1, X_2, \ldots, X_n) = P(Y=1 \mid A_i = \mathrm{ON}\ \forall i) \prod_{i=1}^{n} P(A_i = \mathrm{ON} \mid X_i) = \prod_{i=1}^{n} P(A_i = \mathrm{ON} \mid X_i) = \begin{cases} \prod_{i=1}^{n} p_i & \text{if } L = \varnothing \\ 0 & \text{if } L \neq \varnothing \end{cases}$$

(Please see similar proof in AND-gate inference)

If Ai = OFF for all i, we have

$$P(Y=1 \mid X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(A_i = \mathrm{OFF} \mid X_i) = \begin{cases} \prod_{i \in K} (1 - p_i) & \text{if } K \neq \varnothing \\ 1 & \text{if } K = \varnothing \end{cases}$$

(Please see similar proof in OR-gate inference)

If one of XNOR-gate conditions is satisfied then,

$$P(Y=1 \mid X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(A_i = \mathrm{ON} \mid X_i) + \prod_{i=1}^{n} P(A_i = \mathrm{OFF} \mid X_i)$$

This implies Eq. (21) to specify XNOR-gate inference.

$$P(\mathrm{not}(X_1 \otimes X_2 \otimes \cdots \otimes X_n)) = P(Y=1 \mid X_1, X_2, \ldots, X_n) = \begin{cases} \prod_{i=1}^{n} p_i + \prod_{i=1}^{n} (1 - p_i) & \text{if } L = \varnothing \\ \prod_{i \in K} (1 - p_i) & \text{if } L \neq \varnothing \text{ and } K \neq \varnothing \\ 1 & \text{if } L \neq \varnothing \text{ and } K = \varnothing \end{cases} \tag{21}$$

where K and L are the sets of Xis whose values are 1 and 0, respectively. Eq. (21) varies according to three cases whose arrangement counters are listed as follows

- $L = \varnothing$: $c(\Omega:\{X_i=1\}) = 1$, $c(\Omega:\{X_i=0\}) = 0$, $c(\Omega) = 1$.
- $L \neq \varnothing$ and $K \neq \varnothing$: $c(\Omega:\{X_i=1\}) = 2^{n-1} - 1$, $c(\Omega:\{X_i=0\}) = 2^{n-1} - 1$, $c(\Omega) = 2^n - 2$.
- $L \neq \varnothing$ and $K = \varnothing$: $c(\Omega:\{X_i=1\}) = 0$, $c(\Omega:\{X_i=0\}) = 1$, $c(\Omega) = 1$.
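A compact sketch of Eq. (21) (illustrative names, not the chapter's code):

```java
// Sketch of the XNOR-gate inference of Eq. (21).
// K = {i : x_i = 1}, L = {i : x_i = 0}.
public class XnorGate {
    public static double infer(int[] x, double[] p) {
        boolean allOne = true, allZero = true;
        double prodP = 1, prodQ = 1, prodK = 1;
        for (int i = 0; i < x.length; i++) {
            prodQ *= 1 - p[i]; // product of (1 - p_i) over all i
            if (x[i] == 1) {
                allZero = false;
                prodP *= p[i];      // product of p_i over all i (when L empty)
                prodK *= 1 - p[i];  // product of (1 - p_i) over K
            } else {
                allOne = false;
            }
        }
        if (allOne) return prodP + prodQ; // L empty
        if (allZero) return 1;            // K empty
        return prodK;                     // both nonempty
    }
}
```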

Let U be the set of indices i such that Ai = ON, and let α ≥ 0 and β ≥ 0 be predefined numbers. The U-gate inference is defined based on α, β, and the cardinality of U. Table 4 specifies four common U-gate conditions.

- $|U| = \alpha$: $P(Y=1\mid A_1,\dots,A_n) = 1$ if there are exactly α variables $A_i = \mathrm{ON}$; otherwise $P(Y=1\mid A_1,\dots,A_n) = 0$.
- $|U| \ge \alpha$: $P(Y=1\mid A_1,\dots,A_n) = 1$ if there are at least α variables $A_i = \mathrm{ON}$; otherwise $P(Y=1\mid A_1,\dots,A_n) = 0$.
- $|U| \le \beta$: $P(Y=1\mid A_1,\dots,A_n) = 1$ if there are at most β variables $A_i = \mathrm{ON}$; otherwise $P(Y=1\mid A_1,\dots,A_n) = 0$.
- $\alpha \le |U| \le \beta$: $P(Y=1\mid A_1,\dots,A_n) = 1$ if the number of variables $A_i = \mathrm{ON}$ is from α to β; otherwise $P(Y=1\mid A_1,\dots,A_n) = 0$.

### Table 4.

U-gate conditions.

Note that the U-gate condition on |U| can be arbitrary; it is only relevant to the Ai (s) (ON or OFF) and the way to combine them. For example, AND-gate and OR-gate are specific cases of U-gate with |U| = n and |U| ≥ 1, respectively. XOR-gate and XNOR-gate are also specific cases of U-gate with specific conditions on the Ai (s). However, it must be assured that there is at least one combination of Ai (s) satisfying the predefined U-gate condition, which ensures that the U-gate probability is not identically 0. In this research, U-gate is the most general nonlinear gate, whose probability contains products of weights (see Table 5). Later on, we will research a so-called SIGMA-gate that contains only a linear combination of weights (a sum of weights, see Eq. (23)). Shortly, each X-gate is a pattern owning a particular X-gate inference, that is, the X-gate probability P(X1 x X2 x…x Xn). Each X-gate inference is based on particular X-gate condition (s) relevant to only the variables Ai (s).
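A U-gate whose condition depends only on |U| can be evaluated by enumerating the configurations of the Ai (s). The sketch below (names are mine) reproduces AND, OR, and XNOR as specific U-gate cases:

```python
from itertools import product

def ugate_prob(xs, ps, cond):
    """P(Y=1|X1..Xn) for a U-gate whose condition depends only on |U|,
    where U = {i : Ai = ON}."""
    total = 0.0
    for a in product([0, 1], repeat=len(xs)):
        if not cond(sum(a)):                  # U-gate condition on |U|
            continue
        w = 1.0
        for ai, xi, pi in zip(a, xs, ps):
            on = pi if xi == 1 else 0.0       # Eq. (9)
            w *= on if ai == 1 else 1.0 - on
        total += w
    return total

ps = [0.5, 0.5, 0.5]
and_p  = ugate_prob([1, 1, 1], ps, lambda u: u == 3)        # AND: |U| = n
or_p   = ugate_prob([1, 1, 1], ps, lambda u: u >= 1)        # OR: |U| >= 1
xnor_p = ugate_prob([1, 1, 1], ps, lambda u: u in (0, 3))   # XNOR as a U-gate
print(and_p, or_p, xnor_p)   # 0.125 0.875 0.25
```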

Let:
$$S_U = \sum_{U\in\mathcal{U}}\prod_{i\in U}p_i\prod_{j\in K\setminus U}(1-p_j),\qquad P_U = P(X_1\uplus X_2\uplus\dots\uplus X_n) = P(Y=1\mid X_1,X_2,\dots,X_n)$$
As a convention, $\prod_{i\in U}p_i = 1$ if $|U|=0$ and $\prod_{j\in K\setminus U}(1-p_j)=1$ if $|U|=|K|$.

- $|U|=0$: $P_U = \prod_{j=1}^n(1-p_j)$ if $|K|>0$ and $P_U=1$ if $|K|=0$; $|\mathcal{U}|=1$.
- $|U|\ge 0$ (the same as the case $|U|\le n$): $P_U = S_U$ if $|K|>0$ and $P_U=1$ if $|K|=0$; $|\mathcal{U}|=2^{|K|}$.
- $|U|=n$: $P_U = \prod_{i=1}^n p_i$ if $|K|=n$ and $P_U=0$ if $|K|<n$; $|\mathcal{U}|=1$ if $|K|=n$ and $|\mathcal{U}|=0$ if $|K|<n$.
- $|U|\le\beta$: $P_U = S_U$ if $|K|>0$ and $P_U=1$ if $|K|=0$; $|\mathcal{U}| = \sum_{j=0}^{\min(\beta,|K|)}\binom{|K|}{j}$ if $|K|>0$ and $|\mathcal{U}|=1$ if $|K|=0$.
- $|U|=\alpha$ ($0<\alpha$) and $\alpha\le|U|\le\beta$ ($0<\alpha$): $P_U$ and $|\mathcal{U}|$ are determined by the same scheme, with $\mathcal{U}$ restricted to the subsets $U\subseteq K$ satisfying the condition.

### Table 5.

U-gate inference.

From Eq. (10), we have

$$P(Y=1 \mid X_1,X_2,\dots,X_n) = \sum_{A_1,A_2,\dots,A_n} P(Y=1 \mid A_1,A_2,\dots,A_n)\prod_{i=1}^n P(A_i \mid X_i) \tag{E116}$$

Let $\mathcal{U}$ be the set of all possible U (s); we have

$$P(Y=1 \mid X_1,\dots,X_n) = \sum_{U\in\mathcal{U}} P(Y=1\mid A_1,\dots,A_n)\prod_{i=1}^n P(A_i\mid X_i) = \sum_{U\in\mathcal{U}}\prod_{i\in U}P(A_i=\mathrm{ON}\mid X_i)\prod_{j\notin U}P(A_j=\mathrm{OFF}\mid X_j) \tag{E117}$$

If $X_i = 0$ for some $i \in U$ then,

$$P(Y=1 \mid X_1,\dots,X_n) = \sum_{U\in\mathcal{U}}\prod_{i\in U}0\prod_{j\notin U}P(A_j=\mathrm{OFF}\mid X_j) = 0 \tag{E118}$$

This implies that all sets U (s) must be subsets of K. The U-gate probability is rewritten as follows:

$$\begin{aligned} P(Y=1\mid X_1,\dots,X_n) &= \sum_{U\in\mathcal{U}}\prod_{i\in U}P(A_i=\mathrm{ON}\mid X_i=1)\prod_{j\notin U}P(A_j=\mathrm{OFF}\mid X_j)\\ &= \sum_{U\in\mathcal{U}}\prod_{i\in U}p_i\prod_{j\notin U}P(A_j=\mathrm{OFF}\mid X_j)\\ &= \sum_{U\in\mathcal{U}}\prod_{i\in U}p_i\prod_{j\in K\setminus U}P(A_j=\mathrm{OFF}\mid X_j=1)\prod_{j\notin K}P(A_j=\mathrm{OFF}\mid X_j=0)\\ &= \sum_{U\in\mathcal{U}}\prod_{i\in U}p_i\prod_{j\in K\setminus U}(1-p_j)\prod_{j\notin K}1 = \sum_{U\in\mathcal{U}}\prod_{i\in U}p_i\prod_{j\in K\setminus U}(1-p_j)\end{aligned} \tag{E119}$$

(Due to Eq. (9))

Let $P_U$ be the U-gate probability; Table 5 specifies U-gate inference and the cardinality of $\mathcal{U}$, where $\mathcal{U}$ is the set of subsets U of K.

Note that the notation $\binom{n}{j}$ denotes the number of combinations of j elements taken from n elements.

$$\binom{n}{j} = \frac{n!}{j!(n-j)!} \tag{E120}$$

Arrangement counters relevant to U-gate inference and the set K are listed as follows

$$|K|=0 \tag{E121}$$
$$c(\Omega{:}\{X_i=1\})=0,\quad c(\Omega{:}\{X_i=0\})=1,\quad c(\Omega)=1. \tag{E122}$$
$$|K|=1 \tag{E123}$$
$$c(\Omega{:}\{X_i=1\})=1,\quad c(\Omega{:}\{X_i=0\})=0,\quad c(\Omega)=1. \tag{E124}$$
$$|K|=\alpha \text{ and } \alpha>0 \tag{E125}$$
$$c(\Omega{:}\{X_i=1\})=\binom{n-1}{\alpha-1},\quad c(\Omega{:}\{X_i=0\})=\binom{n-1}{\alpha},\quad c(\Omega)=\binom{n}{\alpha}. \tag{E126}$$
$$|K|\le\alpha \text{ and } \alpha>0 \tag{E127}$$
$$c(\Omega{:}\{X_i=1\})=\sum_{j=1}^{\alpha}\binom{n-1}{j-1},\quad c(\Omega{:}\{X_i=0\})=\sum_{j=0}^{\alpha}\binom{n-1}{j},\quad c(\Omega)=\sum_{j=0}^{\alpha}\binom{n}{j}. \tag{E128}$$
$$|K|\ge\alpha \text{ and } \alpha>0 \tag{E129}$$
$$c(\Omega{:}\{X_i=1\})=\sum_{j=\alpha}^{n}\binom{n-1}{j-1},\quad c(\Omega{:}\{X_i=0\})=\sum_{j=\alpha}^{n-1}\binom{n-1}{j},\quad c(\Omega)=\sum_{j=\alpha}^{n}\binom{n}{j}. \tag{E130}$$
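The counters for the condition |K| ≥ α can be cross-checked with Pascal's rule; a small Python sketch (names are mine):

```python
from math import comb

def counters_K_ge_alpha(n, alpha):
    """Arrangement counters for the condition |K| >= alpha (E129-E130)."""
    c_on  = sum(comb(n - 1, j - 1) for j in range(alpha, n + 1))   # c(Omega:{Xi=1})
    c_off = sum(comb(n - 1, j) for j in range(alpha, n))           # c(Omega:{Xi=0})
    c_all = sum(comb(n, j) for j in range(alpha, n + 1))           # c(Omega)
    # Pascal's rule C(n-1,j-1) + C(n-1,j) = C(n,j) guarantees consistency
    assert c_on + c_off == c_all
    return c_on, c_off, c_all

print(counters_K_ge_alpha(5, 2))   # (15, 11, 26)
```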

The SIGMA-gate inference [9] represents aggregation relationship satisfying SIGMA-gate condition specified by Eq. (22).

$$P(Y) = P\left(\sum_{i=1}^n A_i\right) \tag{E131}$$

where the set of Ai is complete and mutually exclusive

$$\sum_{i=1}^n w_i = 1, \qquad A_i\cap A_j=\varnothing,\ \forall i\neq j \tag{E22}$$

The sigma sum $\sum_{i=1}^n A_i$ indicates that Y is the exclusive union of the Ai (s); here, it does not express arithmetical addition.

$$Y = \sum_{i=1}^n A_i = \bigcup_{i=1}^n A_i \tag{E132}$$

This implies

$$P(Y) = P\left(\sum_{i=1}^n A_i\right) = P\left(\bigcup_{i=1}^n A_i\right) = \sum_{i=1}^n P(A_i) \tag{E133}$$

The sigma sum $\sum_{i=1}^n P(A_i)$ now expresses arithmetical addition of the probabilities P(Ai).

SIGMA-gate inference requires that the set of Ai (s) be complete and mutually exclusive, which implies that the set of Xi (s) is complete and mutually exclusive too. The SIGMA-gate probability is [9]

$$\begin{aligned} P(Y\mid X_1,\dots,X_n) &= P\left(\sum_{i=1}^n A_i \,\Big|\, X_1,\dots,X_n\right) && (\text{due to SIGMA-gate condition})\\ &= \sum_{i=1}^n P(A_i\mid X_1,\dots,X_n) && (\text{because } A_i \text{ (s) are mutually exclusive})\\ &= \sum_{i=1}^n P(A_i\mid X_i) && (\text{because } A_i \text{ is only dependent on } X_i)\end{aligned} \tag{E134}$$

It implies

$$P(Y=1\mid X_1,\dots,X_n) = \sum_{i=1}^n P(A_i=\mathrm{ON}\mid X_i) = \sum_{i\in K}P(A_i=\mathrm{ON}\mid X_i=1) + \sum_{i\notin K}P(A_i=\mathrm{ON}\mid X_i=0) = \sum_{i\in K}w_i + \sum_{i\notin K}0 = \sum_{i\in K}w_i \tag{E135}$$

(Due to Eq. (9))

In general, Eq. (23) specifies the theorem of SIGMA-gate inference [9]. The base of this theorem was mentioned by Millán and Pérez-de-la-Cruz ([4], pp. 292-295).

$$P(X_1+X_2+\dots+X_n) = P\left(\sum_{i=1}^n X_i\right) = P(Y=1\mid X_1,\dots,X_n) = \sum_{i\in K}w_i,\qquad P(Y=0\mid X_1,\dots,X_n) = 1-\sum_{i\in K}w_i = \sum_{i\in L}w_i \tag{E136}$$

where the set of Xis is complete and mutually exclusive.

$$\sum_{i=1}^n w_i = 1, \qquad X_i\cap X_j=\varnothing,\ \forall i\neq j \tag{E23}$$

The arrangement counters of SIGMA-gate inference are $c(\Omega{:}\{X_i=1\}) = c(\Omega{:}\{X_i=0\}) = 2^{n-1}$ and $c(\Omega) = 2^n$.
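The SIGMA-gate inference of Eq. (23) and its arrangement sum are easy to verify numerically. A sketch with illustrative weights (names are mine):

```python
from itertools import product

def sigma_prob(xs, ws):
    """SIGMA-gate inference, Eq. (23): P(Y=1|X1..Xn) = sum of wi over i in K (Xi=1)."""
    assert abs(sum(ws) - 1.0) < 1e-9   # complete set of weights
    return sum(w for x, w in zip(xs, ws) if x == 1)

ws = [0.2, 0.3, 0.5]
print(sigma_prob([1, 0, 1], ws))   # 0.7 = w1 + w3

# arrangement sum s(Omega) = 2^(n-1) * sum(wi), i.e. 4 here (within floating error)
s = sum(sigma_prob(xs, ws) for xs in product([0, 1], repeat=3))
```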

Eq. (9) specifies the "clockwise" strength of the relationship between Xi and Y: event Xi = 1 causes event Ai = ON with "clockwise" weight wi. There is a question: given Xi = 0, how likely is the event Ai = OFF? In order to answer it, I define a so-called "counterclockwise" strength of the relationship between Xi and Y, denoted ωi: event Xi = 0 causes event Ai = OFF with "counterclockwise" weight ωi. In other words, each arc in a simple graph is associated with a clockwise weight wi and a counterclockwise weight ωi. Such a graph is called a bi-weight simple graph, shown in Figure 4.

With bi-weight simple graph, all X-gate inferences are extended as so-called X-gate bi-inferences. Derived from Eq. (9), Eq. (24) specifies conditional probability of accountable variables with regard to bi-weight graph.

$$P(A_i=\mathrm{ON}\mid X_i=1) = p_i = w_i,\qquad P(A_i=\mathrm{ON}\mid X_i=0) = 1-\rho_i = 1-\omega_i$$
$$P(A_i=\mathrm{OFF}\mid X_i=1) = 1-p_i = 1-w_i,\qquad P(A_i=\mathrm{OFF}\mid X_i=0) = \rho_i = \omega_i \tag{E24}$$

The probabilities P(Ai = ON | Xi = 0) and P(Ai = OFF | Xi = 1) are called the clockwise adder di and the counterclockwise adder δi, respectively. As usual, di and δi are smaller than wi and ωi. When di = 0, the bi-weight graph becomes a normal simple graph.

$$d_i = P(A_i=\mathrm{ON}\mid X_i=0) = 1-\rho_i = 1-\omega_i,\qquad \delta_i = P(A_i=\mathrm{OFF}\mid X_i=1) = 1-p_i = 1-w_i \tag{E137}$$

The total clockwise weight (total counterclockwise weight) is defined as the sum of the clockwise (counterclockwise) weight and its adder. Eq. (25) specifies such total weights $W_i$ and $\overline{W}_i$. These weights are also called relationship powers.

$$W_i = w_i + d_i,\qquad \overline{W}_i = \omega_i + \delta_i \tag{E138}$$

where

$$d_i = 1-\rho_i = 1-\omega_i,\qquad \delta_i = 1-p_i = 1-w_i \tag{E25}$$

Given Eq. (25), the set of all Ai (s) is complete if and only if $\sum_{i=1}^n W_i = 1$.

By extending aforementioned X-gate inferences, we get bi-inferences for AND-gate, OR-gate, NAND-gate, NOR-gate, XOR-gate, XNOR-gate, and U-gate as shown in Table 6.

There are $2^{n-1}$ possible sets K (or L) when some Xi is fixed and $2^n$ when none is fixed. Thus, it is possible to calculate the arrangement counters. As a convention, a product of probabilities over an empty index set is 1.

$$\prod_{i\in I} f_i = 1 \quad \text{if } I=\varnothing \tag{E139}$$

With regard to SIGMA-gate bi-inference, the sum of all total clockwise weights must be 1 as follows

$$\sum_{i=1}^n W_i = \sum_{i=1}^n (w_i+d_i) = \sum_{i=1}^n(w_i + 1 - \omega_i) = 1 \tag{E140}$$

Derived from Eq. (23), the SIGMA-gate probability for bi-weight graph is

$$P(X_1+X_2+\dots+X_n) = \sum_{i=1}^n P(A_i=\mathrm{ON}\mid X_i) = \sum_{i\in K}P(A_i=\mathrm{ON}\mid X_i=1) + \sum_{i\in L}P(A_i=\mathrm{ON}\mid X_i=0) = \sum_{i\in K}w_i + \sum_{i\in L}d_i \tag{E141}$$

Shortly, Eq. (26) specifies SIGMA-gate bi-inference.

$$P(X_1+X_2+\dots+X_n) = \sum_{i\in K}w_i + \sum_{i\in L}d_i \tag{E142}$$

where the set of Xi(s) is complete and mutually exclusive.

$$\sum_{i=1}^n W_i = 1, \qquad X_i\cap X_j = \varnothing,\ \forall i\neq j \tag{E26}$$
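The SIGMA-gate bi-inference of Eq. (26) can be sketched in the same way; the weights below are illustrative and chosen so that the total weights Wi = wi + di sum to 1:

```python
def sigma_bi_prob(xs, ws, ds):
    """SIGMA-gate bi-inference, Eq. (26): sum of wi over K (Xi=1) plus di over L (Xi=0).
    Requires the total weights Wi = wi + di to sum to 1 (complete sources)."""
    assert abs(sum(w + d for w, d in zip(ws, ds)) - 1.0) < 1e-9
    return sum(w if x == 1 else d for x, w, d in zip(xs, ws, ds))

# hypothetical weights: W1 = 0.4, W2 = 0.35, W3 = 0.25
ws, ds = [0.3, 0.3, 0.2], [0.1, 0.05, 0.05]
p = sigma_bi_prob([1, 0, 1], ws, ds)   # w1 + d2 + w3 = 0.3 + 0.05 + 0.2
```

Setting all di = 0 recovers the normal SIGMA-gate inference of Eq. (23).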

The next section will research diagnostic relationship which adheres to X-gate inference.

## 4. Multihypothesis diagnostic relationship

Given the simple graph shown in Figure 2, if we replace the target source Y by an evidence D, we get a so-called multihypothesis diagnostic relationship whose property adheres to X-gate inference. There may be other diagnostic relationships in which X-gate inference is not concerned; however, this research focuses on X-gate inference, and so the multihypothesis diagnostic relationship is called X-gate diagnostic relationship. Sources X1, X2,…, Xn become hypotheses. As a convention, these hypotheses have a uniform prior distribution.

According to the aforementioned X-gate network shown in Figures 2 and 3, the target variable must be binary, whereas the evidence D can be numeric. It is impossible to establish the evidence D as the direct target variable. Thus, the solution is to add an augmented binary target variable Y, to which the evidence D is connected directly. In other words, the X-gate diagnostic network has n sources {X1, X2,…, Xn}, one augmented hypothesis Y, and one evidence D. As a convention, the X-gate diagnostic network is called X-D network. The CPTs of the entire network are determined based on the combination of diagnostic relationship and X-gate inference mentioned in previous sections. Figure 5 depicts the augmented X-D network. Note that the variables X1, X2,…, Xn, and Y are always binary.

Appendix A3 is the proof that the augmented X-D network is equivalent to X-D network with regard to variables X1, X2,…, Xn and D. As a convention, augmented X-D network is considered as same as X-D network.

The simplest case of X-D network is NOT-D network having one hypothesis X1 and one evidence D, equipped with NOT-gate inference. NOT-D network satisfies diagnostic condition because it essentially represents the single diagnostic relationship. Inferred from Eqs. (1) and (7), the conditional probability P(D|X1) and posterior probability P(X1|D) of NOT-D network are

$$P(D\mid X_1) = \begin{cases}1-D & \text{if } X_1=1\\ D & \text{if } X_1=0\end{cases} \tag{E143}$$
$$P(X_1\mid D) = \frac{P(D\mid X_1)P(X_1)}{P(X_1)\big(P(D\mid X_1=0)+P(D\mid X_1=1)\big)} \tag{E144}$$

(Due to Bayes' rule and uniform distribution of X1)

$$= \frac{P(D\mid X_1)}{P(D\mid X_1=0)+P(D\mid X_1=1)} = 1\cdot P(D\mid X_1) \tag{E145}$$
$$\big(\text{due to } P(D\mid X_1=0)+P(D\mid X_1=1)=1\big) \tag{E146}$$

It implies NOT-D network satisfies diagnostic condition. Let

$$\Omega = \{X_1, X_2, \dots, X_n\}, \qquad n = |\Omega| \tag{E147}$$

We will validate whether the CPT of the diagnostic relationship, P(D|X) specified by Eq. (6), still satisfies the diagnostic condition in the general case, the X-D network. In other words, the X-D network is the general case of the single diagnostic relationship.

Recall from dependencies shown in Figure 5, Eq. (27) specifies the joint probability of X-D network.

$$P(\Omega,Y,D) = P(X_1,X_2,\dots,X_n,Y,D) = P(D\mid Y)\,P(Y\mid X_1,X_2,\dots,X_n)\prod_{i=1}^n P(X_i) \tag{E27}$$
where Ω = {X1, X2,…, Xn}.

Eq. (28) specifies the conditional probability of D given Xi (likelihood function) and the posterior probability of Xi given D.

$$P(D\mid X_i) = \frac{P(X_i,D)}{P(X_i)} = \frac{\sum_{\{\Omega,Y,D\}\setminus\{X_i,D\}}P(\Omega,Y,D)}{\sum_{\{\Omega,Y,D\}\setminus\{X_i\}}P(\Omega,Y,D)},\qquad P(X_i\mid D) = \frac{P(X_i,D)}{P(D)} = \frac{\sum_{\{\Omega,Y,D\}\setminus\{X_i,D\}}P(\Omega,Y,D)}{\sum_{\{\Omega,Y,D\}\setminus\{D\}}P(\Omega,Y,D)} \tag{E28}$$

where Ω = {X1, X2,…, Xn} and the sign “\” denotes the subtraction (excluding) operator in set theory [10]. Eq. (29) specifies the joint probability P(Xi, D) and the marginal probability P(D) given uniform distribution of all sources. Appendix A4 is the proof of Eq. (29).

$$P(X_i,D) = \frac{1}{2^n S}\Big((2D-M)\,s(\Omega{:}\{X_i\}) + 2^{n-1}(M-D)\Big),\qquad P(D) = \frac{1}{2^n S}\Big((2D-M)\,s(\Omega) + 2^n(M-D)\Big) \tag{E29}$$

where s(Ω) and s(Ω:{Xi}) are specified in Table 2. Derived from Eqs. (28) and (29), Eq. (30) specifies the conditional probability P(D|Xi), the posterior probability P(Xi|D), and the transformation coefficient for X-gate inference.

$$\begin{aligned} P(D\mid X_i=1) &= \frac{P(X_i=1,D)}{P(X_i=1)} = \frac{(2D-M)\,s(\Omega{:}\{X_i=1\}) + 2^{n-1}(M-D)}{2^{n-1}S}\\ P(D\mid X_i=0) &= \frac{P(X_i=0,D)}{P(X_i=0)} = \frac{(2D-M)\,s(\Omega{:}\{X_i=0\}) + 2^{n-1}(M-D)}{2^{n-1}S}\\ P(X_i=1\mid D) &= \frac{P(X_i=1,D)}{P(D)} = \frac{(2D-M)\,s(\Omega{:}\{X_i=1\}) + 2^{n-1}(M-D)}{(2D-M)\,s(\Omega) + 2^n(M-D)}\\ P(X_i=0\mid D) &= 1 - P(X_i=1\mid D) = \frac{(2D-M)\,s(\Omega{:}\{X_i=0\}) + 2^{n-1}(M-D)}{(2D-M)\,s(\Omega) + 2^n(M-D)}\\ k &= \frac{P(X_i\mid D)}{P(D\mid X_i)} = \frac{2^{n-1}S}{(2D-M)\,s(\Omega) + 2^n(M-D)}\end{aligned} \tag{E30}$$

The transformation coefficient is rewritten as follows

$$k = \frac{2^{n-1}S}{2D\big(s(\Omega)-2^{n-1}\big) + M\big(2^n - s(\Omega)\big)} \tag{E148}$$
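Eq. (30) can be evaluated directly once the arrangement sums are known. The sketch below (names are mine) uses the AND-gate arrangement sums for n = 2 and p1 = p2 = 0.5 (both sums equal 0.25) and shows that k then varies with D:

```python
def eq30(D, M, S, n, s_all, s_on):
    """Eq. (30): likelihoods, posterior, and transformation coefficient k,
    given s(Omega) = s_all and s(Omega:{Xi=1}) = s_on."""
    s_off = s_all - s_on
    denom = (2 * D - M) * s_all + 2 ** n * (M - D)
    p_d_x1 = ((2 * D - M) * s_on + 2 ** (n - 1) * (M - D)) / (2 ** (n - 1) * S)
    p_d_x0 = ((2 * D - M) * s_off + 2 ** (n - 1) * (M - D)) / (2 ** (n - 1) * S)
    p_x1_d = ((2 * D - M) * s_on + 2 ** (n - 1) * (M - D)) / denom
    k = 2 ** (n - 1) * S / denom
    return p_d_x1, p_d_x0, p_x1_d, k

# AND-gate with n = 2, p1 = p2 = 0.5: s(Omega) = s(Omega:{Xi=1}) = 0.25
M, S, n = 3, 6, 2   # D in {0,...,3}, S = 0+1+2+3
ks = [eq30(D, M, S, n, 0.25, 0.25)[3] for D in range(M + 1)]
print(ks[0] != ks[3])   # True: k varies with D, so AND-gate fails the diagnostic condition
```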

Note that S, D, and M are abstract symbols, and there is no proportional connection between $2^{n-1}S$ and D for all D, as specified by Eq. (6). Assume such a proportional connection $2^{n-1}S = aD^j$ existed for all D, where a is an arbitrary constant. In the binary case, when D = 0 and S = 1, we would have

$$2^{n-1} = 2^{n-1}\cdot 1 = 2^{n-1}S = aD^j = a\cdot 0^j = 0 \tag{E149}$$

This is a contradiction, which implies that it is impossible to reduce k into the following form:

$$k = \frac{aD^j}{bD^j} \tag{E150}$$

Therefore, if k is constant with regard to D then,

$$2D\big(s(\Omega)-2^{n-1}\big) + M\big(2^n - s(\Omega)\big) = C \neq 0,\ \forall D \tag{E151}$$

where C is constant. We have

$$\sum_D\Big(2D\big(s(\Omega)-2^{n-1}\big) + M\big(2^n-s(\Omega)\big)\Big) = \sum_D C$$
$$\Rightarrow 2S\big(s(\Omega)-2^{n-1}\big) + NM\big(2^n-s(\Omega)\big) = NC \Rightarrow 2^n S = NC \tag{E152}$$
(noting that $\sum_D D = S$, $\sum_D 1 = N$, and $NM = 2S$)

It is implied that

$$k = \frac{2^{n-1}S}{2D\big(s(\Omega)-2^{n-1}\big) + M\big(2^n-s(\Omega)\big)} = \frac{NC/2}{C} = \frac{N}{2} \tag{E153}$$

This holds

$$2^n S = N\Big(2D\big(s(\Omega)-2^{n-1}\big) + M\big(2^n - s(\Omega)\big)\Big) = 2ND\big(s(\Omega)-2^{n-1}\big) + 2S\big(2^n - s(\Omega)\big)$$
$$\Rightarrow 2ND\big(s(\Omega)-2^{n-1}\big) - 2S\big(s(\Omega)-2^{n-1}\big) = 0 \Rightarrow (ND-S)\big(s(\Omega)-2^{n-1}\big) = 0 \tag{E154}$$

Assuming ND = S we have

$$ND = S = \frac{NM}{2} \Rightarrow D = \frac{M}{2} \tag{E155}$$

This is a contradiction because D varies over its range up to the maximum value M and cannot be fixed at M/2. Therefore, if k is constant with regard to D then s(Ω) = 2^{n−1}. Inversely, if s(Ω) = 2^{n−1} then k is

$$k = \frac{2^{n-1}S}{2D\big(2^{n-1}-2^{n-1}\big) + M\big(2^n-2^{n-1}\big)} = \frac{S}{M} = \frac{N}{2} \tag{E156}$$

In general, the event that k is constant with regard to D is equivalent to the event s(Ω) = 2n−1. This implies diagnostic theorem stated in Table 7.

$$P(X_1\odot X_2\odot\dots\odot X_n) = \prod_{i\in K}p_i\prod_{i\in L}d_i \tag{E157}$$
$$P(X_1\oplus X_2\oplus\dots\oplus X_n) = 1 - \prod_{i\in K}\delta_i\prod_{i\in L}\rho_i \tag{E158}$$
$$P(\mathrm{not}(X_1\odot X_2\odot\dots\odot X_n)) = 1 - \prod_{i\in L}\rho_i\prod_{i\in K}\delta_i \tag{E159}$$
$$P(\mathrm{not}(X_1\oplus X_2\oplus\dots\oplus X_n)) = \prod_{i\in L}d_i\prod_{i\in K}p_i \tag{E160}$$
$$P(X_1\otimes X_2\otimes\dots\otimes X_n) = \prod_{i\in O_1}p_i\prod_{i\in O_2}d_i\prod_{i\in E_1}\delta_i\prod_{i\in E_2}\rho_i + \prod_{i\in E_1}p_i\prod_{i\in E_2}d_i\prod_{i\in O_1}\delta_i\prod_{i\in O_2}\rho_i \tag{E161}$$
$$P(\mathrm{not}(X_1\otimes X_2\otimes\dots\otimes X_n)) = \prod_{i\in K}p_i\prod_{i\in L}d_i + \prod_{i\in K}\delta_i\prod_{i\in L}\rho_i \tag{E162}$$
$$P(X_1\uplus X_2\uplus\dots\uplus X_n) = \sum_{U\in\mathcal{U}}\Big(\prod_{i\in U\cap K}p_i\prod_{i\in U\cap L}d_i\Big)\Big(\prod_{i\in \bar{U}\cap K}\delta_i\prod_{i\in \bar{U}\cap L}\rho_i\Big) \tag{E163}$$

There are four common conditions on U: |U| = α, |U| ≥ α, |U| ≤ β, and α ≤ |U| ≤ β. Note that $\bar{U}$ is the complement of U:
$$\bar{U} = \{1,2,\dots,n\}\setminus U \tag{E164}$$
The largest cardinality of $\mathcal{U}$ is $|\mathcal{U}| = 2^n$. (E165)

### Table 6.

Bi-inferences for AND-gate, OR-gate, NAND-gate, NOR-gate, XOR-gate, XNOR-gate, and U-gate.

Given that X-D network is the combination of diagnostic relationship and X-gate inference:
$$P(Y=1\mid X_1,X_2,\dots,X_n) = P(X_1\,x\,X_2\,x\dots x\,X_n) \tag{E166}$$
$$P(D\mid Y) = \begin{cases}\dfrac{D}{S} & \text{if } Y=1\\[4pt] \dfrac{M}{S}-\dfrac{D}{S} & \text{if } Y=0\end{cases} \tag{E167}$$
The diagnostic condition of the X-D network is satisfied if and only if
$$s(\Omega) = \sum_{a}P(Y=1\mid a(\Omega)) = 2^{|\Omega|-1},\ \forall \Omega\neq\varnothing \tag{E168}$$
At that time, the transformation coefficient becomes:
$$k = \frac{N}{2} \tag{E169}$$
Note that the weights pi = wi and ρi = ωi, which are inputs of s(Ω), are abstract variables. Thus, the equality s(Ω) = 2^{|Ω|−1} implies all abstract variables are removed, and so s(Ω) does not depend on weights.

### Table 7.

Diagnostic theorem.

The diagnostic theorem is the optimal way to validate the diagnostic condition.

Eq. (30) becomes simple with AND-gate inference. Recall that Eq. (14) specified AND-gate inference as follows:

$$P(X_1\odot X_2\odot\dots\odot X_n) = P(Y=1\mid X_1,\dots,X_n) = \begin{cases}\prod_{i=1}^n p_i & \text{if all } X_i \text{ (s) are } 1\\ 0 & \text{if there exists at least one } X_i=0\end{cases} \tag{E170}$$

Due to only one case X1 = X2 =…= Xn = 1, we have

$$s(\Omega) = s(\Omega{:}\{X_i=1\}) = \prod_{i=1}^n p_i \tag{E171}$$

Due to Xi = 0, we have

$$s(\Omega{:}\{X_i=0\}) = 0 \tag{E172}$$

Derived from Eq. (30), Eq. (31) specifies the conditional probability P(D|Xi), posterior probability P(Xi|D), and transformation coefficient according to the X-D network with AND-gate inference, called AND-D network.

$$\begin{aligned} P(D\mid X_i=1) &= \frac{(2D-M)\prod_{i=1}^n p_i + 2^{n-1}(M-D)}{2^{n-1}S}\\ P(D\mid X_i=0) &= \frac{M-D}{S}\\ P(X_i=1\mid D) &= \frac{(2D-M)\prod_{i=1}^n p_i + 2^{n-1}(M-D)}{(2D-M)\prod_{i=1}^n p_i + 2^n(M-D)}\\ P(X_i=0\mid D) &= \frac{2^{n-1}(M-D)}{(2D-M)\prod_{i=1}^n p_i + 2^n(M-D)}\\ k &= \frac{2^{n-1}S}{(2D-M)\prod_{i=1}^n p_i + 2^n(M-D)}\end{aligned} \tag{E31}$$

For convenience, we validate the diagnostic condition with a case of two sources: Ω = {X1, X2}, p1 = p2 = w1 = w2 = 0.5, and $D \in \{0, 1, 2, 3\}$. According to the diagnostic theorem stated in Table 7, if s(Ω) ≠ 2 for a given X-gate, then such X-gate does not satisfy the diagnostic condition.

Given AND-gate inference, by applying Eq. (14), we have

$$s(\Omega) = (0.5\times 0.5) + 0 + 0 + 0 = 0.25 \tag{E173}$$

Given OR-gate inference, by applying Eq. (16), we have

$$s(\Omega) = (1-0.5\times 0.5) + (1-0.5) + (1-0.5) + 0 = 1.75 \tag{E174}$$

Given XOR-gate inference, by applying Eq. (19), we have

$$s(\Omega) = (0.5\times 0.5 + 0.5\times 0.5) + 0.5 + 0.5 + 0 = 1.5 \tag{E175}$$

Given XNOR-gate inference, by applying Eq. (21), we have

$$s(\Omega) = (0.5\times 0.5 + 0.5\times 0.5) + 0.5 + 0.5 + 1 = 2.5 \tag{E176}$$

Given SIGMA-gate inference, by applying Eq. (23), we have

$$s(\Omega) = (0.5 + 0.5) + 0.5 + 0.5 + 0 = 2 \tag{E177}$$
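The five arrangement sums above can be reproduced by brute force over the hidden variables; a Python sketch (names are mine):

```python
from itertools import product

def gate_prob(xs, ps, accept):
    """P(Y=1|X1..Xn), Eq. (10): sum over the A-configurations accepted by the gate."""
    total = 0.0
    for a in product([0, 1], repeat=len(xs)):   # 1 = ON, 0 = OFF
        if not accept(a):
            continue
        w = 1.0
        for ai, xi, pi in zip(a, xs, ps):
            on = pi if xi == 1 else 0.0          # Eq. (9)
            w *= on if ai == 1 else 1.0 - on
        total += w
    return total

def s_omega(ps, accept):
    """Arrangement sum s(Omega): sum of P(Y=1|arrangement) over all 2^n arrangements."""
    return sum(gate_prob(xs, ps, accept) for xs in product([0, 1], repeat=len(ps)))

ps = [0.5, 0.5]
print(s_omega(ps, all))                          # AND  -> 0.25
print(s_omega(ps, any))                          # OR   -> 1.75
print(s_omega(ps, lambda a: sum(a) % 2 == 1))    # XOR  -> 1.5
print(s_omega(ps, lambda a: sum(a) % 2 == 0))    # XNOR -> 2.5
# SIGMA-gate is linear, Eq. (23): s(Omega) = 2^(n-1) * sum(wi) = 2
print(sum(sum(w for x, w in zip(xs, ps) if x == 1)
          for xs in product([0, 1], repeat=2)))  # SIGMA -> 2.0
```

Only the SIGMA-gate reaches the required value 2 = 2^{n−1}.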

It is asserted that AND-gate, OR-gate, XOR-gate, and XNOR-gate do not satisfy the diagnostic condition, and so they should not be used to assess hypotheses. However, whether U-gate and SIGMA-gate satisfy the diagnostic condition is not yet asserted. It is necessary to expand the equations for the SIGMA-gate diagnostic network (called SIGMA-D network) in order to validate it.

In case of SIGMA-gate inference, by applying Eq. (23), we have

$$\sum_i w_i = 1 \tag{E178}$$
$$s(\Omega) = 2^{n-1}\sum_i w_i = 2^{n-1} \tag{E179}$$
$$s(\Omega{:}\{X_i=1\}) = 2^{n-1}w_i + 2^{n-2}\sum_{j\neq i}w_j = 2^{n-1}w_i + 2^{n-2}(1-w_i) = 2^{n-2}(1+w_i) \tag{E180}$$
$$s(\Omega{:}\{X_i=0\}) = s(\Omega) - s(\Omega{:}\{X_i=1\}) = 2^{n-2}(1-w_i) \tag{E181}$$

It is necessary to validate SIGMA-D network with SIGMA-gate bi-inference. By applying Eq. (26), we recalculate these quantities as follows

$$s(\Omega) = 2^{n-1}\sum_i w_i + 2^{n-1}\sum_i d_i = 2^{n-1}\sum_i(w_i+d_i) = 2^{n-1} \tag{E182}$$
$$\Big(\text{due to } \sum_i(w_i+d_i) = 1\Big) \tag{E183}$$
$$s(\Omega{:}\{X_i=1\}) = 2^{n-1}w_i + 2^{n-2}\sum_{j\neq i}w_j + 2^{n-2}\sum_i d_i = 2^{n-2}w_i + 2^{n-2}\sum_i(w_i+d_i) = 2^{n-2}(1+w_i) \tag{E184}$$
$$s(\Omega{:}\{X_i=0\}) = s(\Omega) - s(\Omega{:}\{X_i=1\}) = 2^{n-2}(1-w_i) \tag{E185}$$

Obviously, the quantities s(Ω), s(Ω:{Xi = 1}), and s(Ω:{Xi = 0}) are kept intact. According to the diagnostic theorem, we conclude that SIGMA-D network does satisfy the diagnostic condition due to s(Ω) = 2^{n−1}. Thus, SIGMA-D network can be used to assess hypotheses.

Eq. (32), an immediate consequence of Eq. (30), specifies conditional probability P(D|Xi), posterior probability P(Xi|D), and transformation coefficient for SIGMA-D network.

$$P(D\mid X_i=1) = \frac{(2D-M)w_i + M}{2S},\qquad P(D\mid X_i=0) = \frac{(M-2D)w_i + M}{2S}$$
$$P(X_i=1\mid D) = \frac{(2D-M)w_i + M}{2M},\qquad P(X_i=0\mid D) = \frac{(M-2D)w_i + M}{2M},\qquad k = \frac{N}{2} \tag{E32}$$
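Eq. (32) can be validated against an exhaustive computation of the joint probability of Eq. (27); a sketch with illustrative weights (names are mine):

```python
from itertools import product

def sigma_d_posterior(i, D, ws, M):
    """P(Xi=1|D) in SIGMA-D network by exhaustive summation of the joint, Eqs. (27)-(28)."""
    S = sum(range(M + 1))                                # S = 0 + 1 + ... + M
    n = len(ws)
    num = den = 0.0
    for xs in product([0, 1], repeat=n):
        p_y1 = sum(w for x, w in zip(xs, ws) if x == 1)  # SIGMA-gate, Eq. (23)
        for y in (0, 1):
            p_y = p_y1 if y == 1 else 1.0 - p_y1
            p_d = D / S if y == 1 else (M - D) / S       # CPT of D (E167)
            joint = p_d * p_y * 0.5 ** n                 # uniform sources
            den += joint
            if xs[i] == 1:
                num += joint
    return num / den

ws, M = [0.2, 0.3, 0.5], 3
for D in range(M + 1):
    closed = ((2 * D - M) * ws[0] + M) / (2 * M)         # Eq. (32)
    assert abs(sigma_d_posterior(0, D, ws, M) - closed) < 1e-9
```

The assertions confirm the closed form of Eq. (32) for every value of D.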

In case of SIGMA-gate, the augmented variable Y can be removed from X-D network. The evidence D is now established as direct target variable. Figure 6 shows a so-called direct SIGMA-gate diagnostic network (direct SIGMA-D network).

Derived from Eq. (23), the CPT of direct SIGMA-D network is determined by Eq. (33).

$$P(D\mid X_1,X_2,\dots,X_n) = \sum_{i\in K}\frac{D}{S}w_i + \sum_{j\in L}\frac{M-D}{S}w_j \tag{E186}$$

where the set of Xi (s) is complete and mutually exclusive.

$$\sum_{i=1}^n w_i = 1,\qquad X_i\cap X_j=\varnothing,\ \forall i\neq j \tag{E33}$$

Eq. (33) specifies valid CPT due to

$$\sum_D P(D\mid X_1,\dots,X_n) = \frac{1}{S}\sum_{i\in K}w_i\sum_D D + \frac{1}{S}\sum_{j\in L}w_j\sum_D(M-D) = \frac{1}{S}\sum_{i\in K}Sw_i + \frac{1}{S}\sum_{j\in L}w_j(NM-S) = \frac{1}{S}\sum_{i\in K}Sw_i + \frac{1}{S}\sum_{j\in L}Sw_j = \sum_{i=1}^n w_i = 1 \tag{E187}$$
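The validity check E187 can be replayed numerically; a sketch with illustrative weights (names are mine):

```python
def direct_sigma_cpt(D, xs, ws, M, S):
    """CPT of the direct SIGMA-D network, Eq. (33)."""
    return sum((D / S) * w if x == 1 else ((M - D) / S) * w
               for x, w in zip(xs, ws))

M, S = 3, 6                      # D in {0,1,2,3}, S = 0+1+2+3
ws = [0.2, 0.3, 0.5]             # illustrative complete weights
total = sum(direct_sigma_cpt(D, [1, 0, 1], ws, M, S) for D in range(M + 1))
# total equals 1 (within floating error), confirming Eq. (33) is a valid CPT
```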

From dependencies shown in Figure 6, Eq. (34) specifies the joint probability of direct SIGMA-D network.

$$P(X_1,X_2,\dots,X_n,D) = P(D\mid X_1,X_2,\dots,X_n)\prod_{i=1}^n P(X_i) \tag{E34}$$

Inferred from Eq. (29), Eq. (35) specifies the joint probability P(Xi, D) and the marginal probability P(D) of direct SIGMA-D network, given uniform distribution of all sources.

$$P(X_i,D) = \frac{1}{2^n}s(\Omega{:}\{X_i\}),\qquad P(D) = \frac{1}{2^n}s(\Omega) \tag{E35}$$

where s(Ω) and s(Ω:{Xi}) are specified in Table 2.

By browsing all variables of direct SIGMA-D network, we have

$$s(\Omega{:}\{X_i=1\}) = 2^{n-1}\frac{D}{S}w_i + 2^{n-2}\sum_{j\neq i}\frac{D}{S}w_j + 2^{n-2}\sum_{j\neq i}\frac{M-D}{S}w_j = \frac{2^{n-2}}{S}\Big(2Dw_i + M\sum_{j\neq i}w_j\Big) = \frac{2^{n-2}}{S}\big(2Dw_i + M(1-w_i)\big) = \frac{2^{n-2}}{S}\big((2D-M)w_i + M\big) \tag{E188}$$
(due to $\sum_{i=1}^n w_i = 1$)

Similarly, we have

$$s(\Omega{:}\{X_i=0\}) = 2^{n-1}\frac{M-D}{S}w_i + 2^{n-2}\sum_{j\neq i}\frac{M-D}{S}w_j + 2^{n-2}\sum_{j\neq i}\frac{D}{S}w_j = \frac{2^{n-2}}{S}\big((M-2D)w_i + M\big) \tag{E189}$$
$$s(\Omega) = 2^{n-1}\sum_i\frac{D}{S}w_i + 2^{n-1}\sum_i\frac{M-D}{S}w_i = 2^{n-1}\frac{M}{S} \tag{E190}$$

By applying Eq. (35), s(Ω:{Xi = 0}), s(Ω:{Xi = 1}), and s(Ω), we get the same result with Eq. (32).

$$P(D\mid X_i=1) = \frac{(2D-M)w_i+M}{2S} \tag{E191}$$
$$P(D\mid X_i=0) = \frac{(M-2D)w_i+M}{2S} \tag{E192}$$
$$P(X_i=1\mid D) = \frac{(2D-M)w_i+M}{2M} \tag{E193}$$
$$P(X_i=0\mid D) = \frac{(M-2D)w_i+M}{2M} \tag{E194}$$
$$k = \frac{N}{2} \tag{E195}$$

Therefore, it is possible to use direct SIGMA-D network to assess hypotheses. It is asserted that SIGMA-D networks satisfy the diagnostic condition, since the single diagnostic relationship, NOT-D network, and direct SIGMA-D network are specific cases of SIGMA-D network. There remains a question: does there exist an X-D network, different from SIGMA-D network and from those aforementioned, that satisfies the diagnostic condition?

Recall that each X-D network is a pattern owning a particular X-gate inference, which in turn is based on particular X-gate condition (s) relevant to only the variables Ai (s). The most general nonlinear X-D network is U-D network, whereas SIGMA-D network is the linear one. The U-gate inference given an arbitrary condition on U is

$$P(X_1\uplus X_2\uplus\dots\uplus X_n) = \sum_{U\in\mathcal{U}}\Big(\prod_{i\in U\cap K}p_i\prod_{i\in U\cap L}(1-\rho_i)\Big)\Big(\prod_{i\in\bar{U}\cap K}(1-p_i)\prod_{i\in\bar{U}\cap L}\rho_i\Big) \tag{E196}$$

Let f be the arrangement sum of U-gate inference.

$$f(p_i,\rho_i) = \sum_{a(\Omega)}\sum_{U\in\mathcal{U}}\Big(\prod_{i\in U\cap K}p_i\prod_{i\in U\cap L}(1-\rho_i)\Big)\Big(\prod_{i\in\bar{U}\cap K}(1-p_i)\prod_{i\in\bar{U}\cap L}\rho_i\Big) \tag{E197}$$

The function f is sum of many large expressions and each expression is product of four possible sub-products (Π) as follows

$$Expr = \prod_{i\in U\cap K}p_i\prod_{i\in U\cap L}(1-\rho_i)\prod_{i\in\bar{U}\cap K}(1-p_i)\prod_{i\in\bar{U}\cap L}\rho_i \tag{E198}$$

In any case of degradation, there always exist expression Expr (s) having at least 2 sub-products (Π), for example,

$$Expr = \prod_{i\in U\cap K}p_i\prod_{i\in U\cap L}(1-\rho_i) \tag{E199}$$

Consequently, there always exist Expr (s) having at least 5 terms relevant to pi and ρi if n ≥ 5, for example,

$$Expr = p_1p_2p_3(1-\rho_4)(1-\rho_5) \tag{E200}$$

Thus, the degree of f will be larger than or equal to 5 given n ≥ 5. According to the diagnostic theorem, U-gate network satisfies the diagnostic condition if and only if f(pi, ρi) = 2^{n−1} for all n ≥ 1 and for all abstract variables pi and ρi. Without loss of generality, each pi or ρi is the sum of a variable x and a variable ai or bi, respectively. Note that all pi, ρi, ai, and bi are abstract variables.

$$p_i = x + a_i \tag{E201}$$
$$\rho_i = x + b_i \tag{E202}$$

The equation $f - 2^{n-1} = 0$ becomes the equation g(x) = 0, whose degree is m ≥ 5 if n ≥ 5.

$$g(x) = \pm x^m + C_1x^{m-1} + \dots + C_{m-1}x + C_m - 2^{n-1} = 0 \tag{E203}$$

where the coefficients Ci are functions of ai (s) and bi (s). According to Abel-Ruffini theorem [11], the equation g(x) = 0 has no general algebraic solution when m ≥ 5. Thus, the abstract variables pi and ρi cannot be eliminated entirely from g(x) = 0, and so there is no specification of U-gate inference P(X1 x X2 x…x Xn) for which the diagnostic condition is satisfied.

It is concluded that there is no nonlinear X-D network satisfying the diagnostic condition, but a new question is raised: does there exist a general linear X-D network satisfying the diagnostic condition? Such a linear network is called GL-D network, and SIGMA-D network is a specific case of GL-D network. The GL-gate probability must be a linear combination of weights.

$$P(X_1\,x\,X_2\,x\dots x\,X_n) = C + \sum_{i=1}^n\alpha_i w_i + \sum_{i=1}^n\beta_i d_i \tag{E204}$$

where C is arbitrary constant.

The GL-gate inference is singular if αi and βi are functions of only Xi as follows

$$P(X_1\,x\,X_2\,x\dots x\,X_n) = C + \sum_{i=1}^n h_i(X_i)w_i + \sum_{i=1}^n g_i(X_i)d_i \tag{E205}$$

The functions hi and gi are not relevant to Ai because the final equation of GL-gate inference is only relevant to Xi (s) and weights. Because GL-D network is a pattern, we only survey the singular GL-gate; the mentioned GL-gate is singular by default, and it depends on how the functions hi and gi are defined. The arrangement sum with regard to GL-gate is

$$s(\Omega) = \sum_a\Big(C + \sum_{i=1}^n h_i(X_i)w_i + \sum_{i=1}^n g_i(X_i)d_i\Big) = 2^nC + 2^{n-1}\sum_{i=1}^n\big(h_i(X_i=1)+h_i(X_i=0)\big)w_i + 2^{n-1}\sum_{i=1}^n\big(g_i(X_i=1)+g_i(X_i=0)\big)d_i \tag{E206}$$

Suppose hi and gi are probability mass functions with regard to Xi. For all i, we have

$$0 \le h_i(X_i) \le 1 \tag{E207}$$
$$0 \le g_i(X_i) \le 1 \tag{E208}$$
$$h_i(X_i=1) + h_i(X_i=0) = 1 \tag{E209}$$
$$g_i(X_i=1) + g_i(X_i=0) = 1 \tag{E210}$$

The arrangement sum becomes

$$s(\Omega) = 2^nC + 2^{n-1}\sum_{i=1}^n(w_i+d_i) \tag{E211}$$

GL-D network satisfies diagnostic condition if

$$s(\Omega) = 2^nC + 2^{n-1}\sum_{i=1}^n(w_i+d_i) = 2^{n-1} \Leftrightarrow 2C + \sum_{i=1}^n(w_i+d_i) = 1 \tag{E212}$$

Suppose the set of Xis is complete.

$$\sum_{i=1}^n(w_i+d_i) = 1 \tag{E213}$$

This implies C = 0. Shortly, Eq. (36) specifies the singular GL-gate inference so that GL-D network satisfies diagnostic condition.

$$P(X_1\,x\,X_2\,x\dots x\,X_n) = \sum_{i=1}^n h_i(X_i)w_i + \sum_{i=1}^n g_i(X_i)d_i \tag{E36}$$
where hi and gi are probability mass functions and the set of Xi (s) is complete:
$$\sum_{i=1}^n W_i = 1$$

The functions hi(Xi) and gi(Xi) are always linear because $X_i^m = X_i$ for all m ≥ 1 when Xi is binary. It is easy to infer that SIGMA-D network is a GL-D network with the following definition of the functions hi and gi.

$$h_i(X_i) = 1 - g_i(X_i) = X_i,\ \forall i \tag{E214}$$
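The singular GL-gate of Eq. (36) and its SIGMA-gate specialization can be sketched as follows (the weights are illustrative):

```python
def gl_prob(xs, ws, ds, h, g):
    """Singular GL-gate inference, Eq. (36): sum of hi(Xi)*wi + gi(Xi)*di."""
    return sum(h(x) * w + g(x) * d for x, w, d in zip(xs, ws, ds))

# SIGMA-gate as the special case h(x) = x, g(x) = 1 - x (E214)
ws, ds = [0.3, 0.3, 0.2], [0.1, 0.05, 0.05]   # total weights sum to 1
p = gl_prob([1, 0, 1], ws, ds, lambda x: x, lambda x: 1 - x)
# p equals the SIGMA-gate bi-inference of Eq. (26): w1 + d2 + w3 = 0.55
```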

According to Millán and Pérez-de-la-Cruz [4], a hypothesis can have multiple evidences as seen in Figure 7. This is multi-evidence diagnostic relationship opposite to aforementioned multihypothesis diagnostic relationship.

Figure 7 depicts the multi-evidence diagnostic network called M-E-D network in which there are m evidences D1, D2,…, Dm and one hypothesis Y. Note that Y has uniform distribution.

In simplest case where all evidences are binary, the joint probability of M-E-D network is

$$P(Y,D_1,D_2,\dots,D_m) = P(Y)\prod_{j=1}^m P(D_j\mid Y) = P(Y)\,P(D_1,D_2,\dots,D_m\mid Y) \tag{E215}$$

The product $\prod_{j=1}^m P(D_j\mid Y)$ is denoted as the likelihood function:

$$P(D_1,D_2,\dots,D_m\mid Y) = \prod_{j=1}^m P(D_j\mid Y) \tag{E216}$$

The posterior probability P(Y | D1, D2,…, Dm) given uniform distribution of Y is

$$P(Y\mid D_1,\dots,D_m) = \frac{P(Y,D_1,\dots,D_m)}{P(Y=1,D_1,\dots,D_m)+P(Y=0,D_1,\dots,D_m)} = \frac{1}{\prod_{j=1}^m P(D_j\mid Y=1)+\prod_{j=1}^m P(D_j\mid Y=0)}\,P(D_1,\dots,D_m\mid Y) \tag{E217}$$

The possible transformation coefficient is

$$\frac{1}{k} = \prod_{j=1}^m P(D_j\mid Y=1)+\prod_{j=1}^m P(D_j\mid Y=0) \tag{E218}$$

M-E-D network will satisfy the diagnostic condition if k = 1. Because all hypotheses and evidences are binary, this requires that the following equation, specified by Eq. (37), have 2m real roots P(Dj|Y) for all m ≥ 2.

$$\prod_{j=1}^m P(D_j\mid Y=1) + \prod_{j=1}^m P(D_j\mid Y=0) = 1 \tag{E37}$$

Eq. (37) has no real root given m = 2, according to the following proof. Suppose Eq. (37) had 4 real roots as follows:

$$a_1 = P(D_1=1\mid Y=1) \tag{E219}$$
$$a_2 = P(D_2=1\mid Y=1) \tag{E220}$$
$$b_1 = P(D_1=1\mid Y=0) \tag{E221}$$
$$b_2 = P(D_2=1\mid Y=0) \tag{E222}$$

From Eq. (37), it holds

$$\begin{cases} a_1a_2 + b_1b_2 = 1\\ a_1(1-a_2)+b_1b_2=1\\ (1-a_1)a_2+b_1b_2=1\\ a_1a_2+b_1(1-b_2)=1\\ a_1a_2+(1-b_1)b_2=1\end{cases} \Rightarrow \begin{cases}a_1=a_2\\ b_1=b_2\\ a_1^2+b_1^2=1\\ a_1+2b_1^2=2\\ b_1+2a_1^2=2\end{cases} \Rightarrow \begin{cases}a_1=a_2=0\\b_1=b_2\\a_1^2+b_1^2=1\\b_1=2\end{cases} \text{ or } \begin{cases}a_1=a_2=0.5\\b_1=b_2\\a_1^2+b_1^2=1\\b_1=1.5\end{cases} \tag{E223}$$

The final equations lead to a contradiction (b1 = 2 or b1 = 1.5), and so it is impossible to apply the sufficient diagnostic proposition to M-E-D network. Such proposition is only used for one-evidence networks. Moreover, X-gate inference absorbs many sources and produces one targeted result, whereas M-E-D network essentially splits one source into many results; it is impossible to model M-E-D network by X-gates. A potential solution is to group the evidences D1, D2,…, Dm into one representative evidence D, which in turn is dependent on hypothesis Y. However, this solution is inaccurate in specifying conditional probabilities because the directions of dependencies become inconsistent (relationships from Dj to D and from Y to D), unless all Dj (s) are removed and D becomes a vector. An evidence vector does not simplify the hazardous problem; it merely changes the current problem into a new one.
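The non-existence of roots in [0, 1] for m = 2 can also be confirmed by a coarse numerical search; a sketch over the five constraints of E223 (names are mine):

```python
from itertools import product

def residuals(a1, a2, b1, b2):
    """The five constraints derived from Eq. (37) for m = 2 (E223), as residuals."""
    return [a1*a2 + b1*b2 - 1,
            a1*(1 - a2) + b1*b2 - 1,
            (1 - a1)*a2 + b1*b2 - 1,
            a1*a2 + b1*(1 - b2) - 1,
            a1*a2 + (1 - b1)*b2 - 1]

grid = [i / 10 for i in range(11)]          # step 0.1 over [0, 1]
best = min(max(abs(r) for r in residuals(*q))
           for q in product(grid, repeat=4))
print(best > 0)   # True: no grid point in [0, 1]^4 satisfies all constraints
```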

Another solution is to reverse the direction of the relationship, making the hypothesis dependent on the evidences so as to take advantage of X-gate inference as usual. However, this reversion violates the viewpoint in this research, where a diagnostic relationship must go from hypothesis to evidence. In other words, we would have to change the viewpoint.

Another solution is based on a so-called partial diagnostic condition that is a loose case of diagnostic condition for M-E-D network, which is defined as follows

$$P(Y\mid D_j) = k\,P(D_j\mid Y) \tag{E224}$$

where k is constant with regard to Dj. The joint probability is

$$P(Y,D_1,D_2,\dots,D_m) = P(Y)\prod_{j=1}^m P(D_j\mid Y) \tag{E225}$$

M-E-D network satisfies partial diagnostic condition. In fact, given all variables are binary, we have

$$P(Y\mid D_j) = \frac{\sum_{\Psi\setminus\{D_j\}}P(Y,D_1,\dots,D_m)}{\sum_{(\Psi\cup\{Y\})\setminus\{D_j\}}P(Y,D_1,\dots,D_m)} \tag{E226}$$

(Let Ψ = {D1, D2,…, Dm})

$$= \frac{P(D_j\mid Y)\prod_{k=1,k\neq j}^m\Big(\sum_{D_k}P(D_k\mid Y)\Big)}{\prod_{k=1,k\neq j}^m\Big(\sum_{D_k}P(D_k\mid Y=1)\Big)+\prod_{k=1,k\neq j}^m\Big(\sum_{D_k}P(D_k\mid Y=0)\Big)} \tag{E227}$$

(Due to uniform distribution of Y)

$$= P(D_j\mid Y)\frac{\prod_{k=1,k\neq j}^m 1}{\prod_{k=1,k\neq j}^m 1+\prod_{k=1,k\neq j}^m 1} = \frac{1}{2}P(D_j\mid Y) \tag{E228}$$
$$\Big(\text{due to } \sum_{D_k}P(D_k\mid Y) = P(D_k=0\mid Y)+P(D_k=1\mid Y) = 1\Big) \tag{E229}$$

Partial diagnostic condition expresses a different viewpoint. It is not an optimal solution because, for example, we cannot test a disease based on only one symptom while ignoring other obvious symptoms. The equality P(Y|Dj) = 0.5P(Dj|Y) indicates that the strength of the assessment is halved. However, Bayesian network provides an inference mechanism based on personal belief; it is subjective. You can use the partial diagnostic condition if you think that such condition is appropriate to your application.

If we are successful in specifying conditional probabilities of M-E-D network, it is possible to define an extended network which is constituted of n hypotheses X1, X2,…, Xn and m evidences D1, D2,…, Dm. Such extended network represents multi-hypothesis multi-evidence diagnostic relationship, called M-HE-D network. Figure 8 depicts M-HE-D network.

The M-HE-D network is the most general case of diagnostic network, which was mentioned in Ref. ([4], p. 297). We can construct any large diagnostic BN from M-HE-D networks and so the research is still open.

## 5. Conclusion

In short, relationship conversion is to determine conditional probabilities based on logic gates that adhere to the semantics of relationships. The weak point of logic gates is the requirement that all variables be binary. For example, in the learning context, it is inconvenient for an expert to create an assessment BN with studying exercises (evidences) whose marks are only 0 and 1. In order to lessen the impact of this weak point, numeric evidence is used to extend the capacity of simple Bayesian network. However, the combination of binary hypothesis and numeric evidence can lead to errors or biases in inference; for example, given a student who gets the maximum grade for an exercise, the built-in inference may conclude that she/he has not fully mastered the associated learning concept (hypothesis). Therefore, I propose the sufficient diagnostic proposition so as to confirm that numeric evidence is adequate for complicated inference tasks in BN; probabilistic reasoning based on such evidence is then accurate. Application of the research can go beyond the learning context whenever probabilistic deduction relevant to constraints of semantic relationships is required. A large BN can be constituted of many simple BN (s). Inference in a large BN is a hard problem, and there are many optimal algorithms for solving it. In the future, I will research effective inference methods for the special BN constituted of X-gate BN (s) mentioned in this research, because X-gate BN (s) have precise and useful features of which we should take advantage: their CPT (s) are simple in some cases, and the meanings of their relationships are mandatory in many applications. Moreover, I will try my best to research deeply M-E-D network and M-HE-D network, whose problems I cannot solve absolutely now.

Two main documents that I referred to in this research are the book “Learning Bayesian Networks” [2] by Richard E. Neapolitan and the article “A Bayesian Diagnostic Algorithm for Student Modeling and its Evaluation” [4] by Eva Millán and José Luis Pérez-de-la-Cruz. In particular, the SIGMA-gate inference is based on and derived from the work of Eva Millán and José Luis Pérez-de-la-Cruz. This research originated from my PhD research “A User Modeling System for Adaptive Learning” [12]. Other references relevant to user modeling, the overlay model, and Bayesian networks are [13–16]. Please consult these references.

A1. Following is the proof of Eq. (9)

$$P(A_i = ON \mid X_i) = P(A_i = ON \mid X_i, I_i = ON)P(I_i = ON) + P(A_i = ON \mid X_i, I_i = OFF)P(I_i = OFF)$$

By applying Eq. (8):

$$= 0 \cdot (1 - p_i) + P(A_i = ON \mid X_i, I_i = OFF) \cdot p_i = p_i P(A_i = ON \mid X_i, I_i = OFF) \quad (E230)$$

It implies

$$P(A_i = ON \mid X_i = 1) = p_i P(A_i = ON \mid X_i = 1, I_i = OFF) = p_i \quad (E231)$$

$$P(A_i = ON \mid X_i = 0) = p_i P(A_i = ON \mid X_i = 0, I_i = OFF) = 0 \quad (E232)$$

$$P(A_i = OFF \mid X_i = 1) = 1 - P(A_i = ON \mid X_i = 1) = 1 - p_i \quad (E233)$$

$$P(A_i = OFF \mid X_i = 0) = 1 - P(A_i = ON \mid X_i = 0) = 1 \quad (E234)$$
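The four entries above can be checked numerically. The following sketch (the function name `p_a_on_given_x` and the value 0.8 for p_i are my own illustration) marginalizes over the inhibition I_i exactly as in Eq. (E230), encoding ON/OFF as 1/0:

```python
def p_a_on_given_x(x, p_i):
    """P(A_i = ON | X_i = x) by marginalizing over the inhibition I_i (Eq. E230)."""
    p_i_off = p_i          # P(I_i = OFF) = p_i
    p_i_on = 1.0 - p_i     # P(I_i = ON) = 1 - p_i
    # Eq. (8): A_i is never ON when the inhibition fires (I_i = ON);
    # when I_i = OFF, A_i simply copies X_i.
    p_on_given_i_on = 0.0
    p_on_given_i_off = float(x)
    return p_on_given_i_on * p_i_on + p_on_given_i_off * p_i_off

print(p_a_on_given_x(1, 0.8))  # 0.8, i.e., P(A_i = ON | X_i = 1) = p_i  (E231)
print(p_a_on_given_x(0, 0.8))  # 0.0, i.e., P(A_i = ON | X_i = 0) = 0    (E232)
```

The OFF probabilities (E233) and (E234) follow as the complements of these two values.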

A2. Following is the proof of Eq. (10)

$$\begin{aligned}
P(Y \mid X_1, X_2, \ldots, X_n) &= \frac{P(Y, X_1, X_2, \ldots, X_n)}{P(X_1, X_2, \ldots, X_n)} \quad \text{(due to Bayes' rule)} \\
&= \frac{\sum_{A_1, A_2, \ldots, A_n} P(Y, X_1, \ldots, X_n \mid A_1, \ldots, A_n) P(A_1, \ldots, A_n)}{P(X_1, \ldots, X_n)} \quad \text{(due to total probability rule)} \\
&= \frac{\sum_{A_1, A_2, \ldots, A_n} P(Y \mid A_1, \ldots, A_n) P(X_1, \ldots, X_n \mid A_1, \ldots, A_n) P(A_1, \ldots, A_n)}{P(X_1, \ldots, X_n)} \quad (E235)
\end{aligned}$$

(because Y is conditionally independent of the X_i (s) given the A_i (s))

$$\begin{aligned}
&= \sum_{A_1, A_2, \ldots, A_n} P(Y \mid A_1, \ldots, A_n) \frac{P(X_1, \ldots, X_n, A_1, \ldots, A_n)}{P(X_1, \ldots, X_n)} \\
&= \sum_{A_1, A_2, \ldots, A_n} P(Y \mid A_1, \ldots, A_n) P(A_1, \ldots, A_n \mid X_1, \ldots, X_n) \quad \text{(due to Bayes' rule)} \\
&= \sum_{A_1, A_2, \ldots, A_n} P(Y \mid A_1, \ldots, A_n) \prod_{i=1}^n P(A_i \mid X_1, \ldots, X_n) \quad (E236)
\end{aligned}$$

(because the A_i (s) are mutually independent)

$$= \sum_{A_1, A_2, \ldots, A_n} P(Y \mid A_1, \ldots, A_n) \prod_{i=1}^n P(A_i \mid X_i) \quad (E237)$$

(because each A_i depends only on X_i) ■
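Eq. (10) can be verified numerically for the OR gate. The sketch below (function names are mine; it assumes the inhibition CPT of Eqs. (E231)–(E234) and a deterministic OR gate on the A_i) sums over all combinations of A_1, …, A_n and compares the result with the closed-form noisy-OR probability:

```python
from itertools import product

def p_a_given_x(a, x, p):
    """P(A_i = a | X_i = x) from Eqs. (E231)-(E234): P(ON | 1) = p, P(ON | 0) = 0."""
    p_on = p if x == 1 else 0.0
    return p_on if a == 1 else 1.0 - p_on

def p_y_given_x(xs, ps, gate):
    """Eq. (10): P(Y = 1 | X_1..X_n) = sum over A of P(Y = 1 | A) * prod_i P(A_i | X_i)."""
    total = 0.0
    for a in product((0, 1), repeat=len(xs)):
        weight = 1.0
        for ai, xi, pi in zip(a, xs, ps):
            weight *= p_a_given_x(ai, xi, pi)
        total += gate(a) * weight
    return total

or_gate = lambda a: 1.0 if any(a) else 0.0    # OR gate: Y = A_1 v ... v A_n
xs, ps = (1, 1, 0), (0.9, 0.8, 0.7)
lhs = p_y_given_x(xs, ps, or_gate)
rhs = 1.0 - (1 - 0.9) * (1 - 0.8)             # noisy-OR closed form over X_i = 1 terms
print(abs(lhs - rhs) < 1e-9)  # True
```

Swapping `or_gate` for an AND or XOR gate reuses the same summation, which is the point of Eq. (10): the gate only enters through P(Y | A_1, …, A_n).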

A3. Following is the proof that the augmented X-D network (shown in Figure 5) is equivalent to the X-D network (shown in Figures 2 and 3) with regard to variables X1, X2,…, Xn, and D.

The joint probability of augmented X-D network shown in Figure 5 is

$$P(X_1, X_2, \ldots, X_n, Y, D) = P(D \mid Y) P(Y \mid X_1, X_2, \ldots, X_n) \prod_{i=1}^n P(X_i) \quad (E238)$$

The joint probability of X-D network is

$$P(X_1, X_2, \ldots, X_n, D) = P(D \mid X_1, X_2, \ldots, X_n) \prod_{i=1}^n P(X_i) \quad (E239)$$

By applying total probability rule into X-D network, we have

$$\begin{aligned}
P(X_1, X_2, \ldots, X_n, D) &= \frac{P(D, X_1, \ldots, X_n)}{P(X_1, \ldots, X_n)} \prod_{i=1}^n P(X_i) \quad \text{(due to Bayes' rule)} \\
&= \frac{\sum_Y P(D, X_1, \ldots, X_n \mid Y) P(Y)}{P(X_1, \ldots, X_n)} \prod_{i=1}^n P(X_i) \quad \text{(due to total probability rule)} \\
&= \left( \sum_Y \frac{P(D \mid Y) P(X_1, \ldots, X_n \mid Y) P(Y)}{P(X_1, \ldots, X_n)} \right) \prod_{i=1}^n P(X_i) \quad (E240)
\end{aligned}$$

(because D is conditionally independent of all X_i (s) given Y)

$$\begin{aligned}
&= \left( \sum_Y P(D \mid Y) \frac{P(Y, X_1, \ldots, X_n)}{P(X_1, \ldots, X_n)} \right) \prod_{i=1}^n P(X_i) \\
&= \sum_Y P(D \mid Y) P(Y \mid X_1, \ldots, X_n) \prod_{i=1}^n P(X_i) \quad \text{(due to Bayes' rule)} \\
&= \sum_Y P(X_1, X_2, \ldots, X_n, Y, D) \quad (E241)
\end{aligned}$$
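The equivalence can be illustrated with a small numeric check. The sketch below uses hypothetical numbers of my own: a two-parent network with a deterministic OR gate for P(Y | X1, X2), uniform priors, and one fixed evidence value D. The X-D network's CPT P(D | X1, X2) is written out by hand, and marginalizing Y out of the augmented joint (E238) recovers the X-D joint (E239) for every parent combination:

```python
from itertools import product

p_x = 0.5                                   # uniform priors: P(X_i = 1) = P(X_i = 0) = 1/2
p_d_given_y = {1: 0.7, 0: 0.3}              # P(D | Y) for one fixed evidence value D

def p_y_given_xs(y, xs):
    """Deterministic OR gate: Y = 1 exactly when some X_i = 1."""
    return 1.0 if y == (1 if any(xs) else 0) else 0.0

# X-D network CPT P(D | X1, X2) for the same gate, written out by hand:
# only (0, 0) forces Y = 0, so P(D | 0, 0) = 0.3 and 0.7 otherwise.
p_d_given_xs = {(0, 0): 0.3, (0, 1): 0.7, (1, 0): 0.7, (1, 1): 0.7}

ok = all(
    abs(sum(p_d_given_y[y] * p_y_given_xs(y, xs) * p_x * p_x for y in (0, 1))
        - p_d_given_xs[xs] * p_x * p_x) < 1e-12
    for xs in product((0, 1), repeat=2))
print(ok)  # True: sum over Y of P(X1, X2, Y, D) equals P(X1, X2, D)
```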

A4. Following is the proof of Eq. (29)

Given uniform distribution of Xi (s), we have

$$P(X_1) = P(X_2) = \cdots = P(X_n) = \frac{1}{2} \quad (E242)$$

The joint probability becomes

$$P(\Omega, Y, D) = \frac{1}{2^n} P(Y \mid X_1, X_2, \ldots, X_n) P(D \mid Y) \quad (E243)$$

The joint probability of Xi and D is

$$\begin{aligned}
P(X_i, D) &= \sum_{\{\Omega, Y, D\} \setminus \{X_i, D\}} P(\Omega, Y, D) \\
&= \frac{1}{2^n}\frac{D}{S} \sum_{a} P\big(Y = 1 \mid a(\Omega : \{X_i\})\big) + \frac{1}{2^n}\frac{M - D}{S} \sum_{a} P\big(Y = 0 \mid a(\Omega : \{X_i\})\big) \quad (E244)
\end{aligned}$$

where each sum ranges over the $2^{n-1}$ complete assignments $a(\Omega : \{X_i\})$ in which $X_i$ keeps its fixed value.

(Due to Eq. (6))

The marginal probability of D is

$$\begin{aligned}
P(D) &= \sum_{\{\Omega, Y, D\} \setminus \{D\}} P(\Omega, Y, D) \\
&= \frac{1}{2^n}\frac{D}{S} \sum_{a} P\big(Y = 1 \mid a(\Omega)\big) + \frac{1}{2^n}\frac{M - D}{S} \sum_{a} P\big(Y = 0 \mid a(\Omega)\big) \quad (E245)
\end{aligned}$$

where each sum ranges over all $2^n$ complete assignments $a(\Omega)$.

By applying Table 2, the joint probability P(Xi, D) is determined as follows

$$\begin{aligned}
P(X_i, D) &= \frac{1}{2^n S} \left( D \sum_{a} P\big(Y = 1 \mid a(\Omega : \{X_i\})\big) + (M - D) \sum_{a} P\big(Y = 0 \mid a(\Omega : \{X_i\})\big) \right) \\
&= \frac{1}{2^n S} \left( D \sum_{a} P\big(Y = 1 \mid a(\Omega : \{X_i\})\big) + (M - D) \sum_{a} \Big(1 - P\big(Y = 1 \mid a(\Omega : \{X_i\})\big)\Big) \right) \\
&= \frac{1}{2^n S} \Big( (2D - M)\, s(\Omega : \{X_i\}) + 2^{n-1}(M - D) \Big) \quad (E246)
\end{aligned}$$

Similarly, the marginal probability P(D) is

$$P(D) = \frac{1}{2^n S} \Big( (2D - M)\, s(\Omega) + 2^n (M - D) \Big) \quad (E247)$$
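The closed form (E246) can be checked against a brute-force summation of the joint (E243). The numbers below are hypothetical (mine): n = 3 causes with uniform priors, an OR gate for P(Y | X1, …, Xn), numeric evidence D in [0, M] with P(D | Y = 1) = D/S and P(D | Y = 0) = (M − D)/S, and X_i fixed to 1:

```python
from itertools import product

n, M, D, S = 3, 10.0, 7.0, 55.0   # hypothetical sizes and evidence value

def p_y1(assignment):
    """OR gate: P(Y = 1 | assignment) = 1 if some X_j = 1, else 0."""
    return 1.0 if any(assignment) else 0.0

def p_d(y):
    """Assumed numeric-evidence CPT: P(D | Y=1) = D/S, P(D | Y=0) = (M-D)/S."""
    return D / S if y == 1 else (M - D) / S

i = 0  # check P(X_i = 1, D) for the first cause
# Brute force: sum the joint (E243) over Y and all assignments with X_i = 1.
lhs = sum((1 / 2**n) * (p_y1(a) * p_d(1) + (1 - p_y1(a)) * p_d(0))
          for a in product((0, 1), repeat=n) if a[i] == 1)
# Closed form (E246): s(Omega:{X_i}) sums P(Y = 1 | .) over those same assignments.
s_i = sum(p_y1(a) for a in product((0, 1), repeat=n) if a[i] == 1)
rhs = (1 / (2**n * S)) * ((2 * D - M) * s_i + 2**(n - 1) * (M - D))
print(abs(lhs - rhs) < 1e-12)  # True
```

Dropping the `a[i] == 1` filter and replacing $s(\Omega : \{X_i\})$ with $s(\Omega)$ and $2^{n-1}$ with $2^n$ performs the analogous check for (E247).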

## References

1. Wikipedia. Logic gate. Wikimedia Foundation [Internet]. 2016. Available from: https://en.wikipedia.org/wiki/Logic_gate [Accessed: June 4, 2016]
2. Neapolitan RE. Learning Bayesian Networks. Upper Saddle River, New Jersey: Prentice Hall; 2003. p. 674
3. Díez FJ, Druzdzel MJ. Canonical Probabilistic Models. Madrid: Research Centre on Intelligent Decision-Support Systems; 2007
4. Millán E, Pérez-de-la-Cruz JL. A Bayesian diagnostic algorithm for student modeling and its evaluation. User Modeling and User-Adapted Interaction. 2002;12(2-3):281–330
5. Wikipedia. Factor graph. Wikimedia Foundation [Internet]. 2015. Available from: https://en.wikipedia.org/wiki/Factor_graph [Accessed: February 8, 2017]
6. Kschischang FR, Frey BJ, Loeliger HA. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory. 2001;47(2):498–519
7. Pearl J. Fusion, propagation, and structuring in belief networks. Artificial Intelligence. 1986;29(3):241–288
8. Millán E, Loboda T, Pérez-de-la-Cruz JL. Bayesian networks for student model engineering. Computers & Education. 2010;55(4):1663–1683
9. Nguyen L. Theorem of SIGMA-gate inference in Bayesian network. Wulfenia Journal. 2016;23(3):280–289
10. Wikipedia. Set (mathematics). Wikimedia Foundation [Internet]. 2014. Available from: http://en.wikipedia.org/wiki/Set_(mathematics) [Accessed: October 11, 2014]
11. Wikipedia. Abel–Ruffini theorem. Wikimedia Foundation [Internet]. 2016. Available from: https://en.wikipedia.org/wiki/Abel%E2%80%93Ruffini_theorem [Accessed: June 26, 2016]
12. Nguyen L. A User Modeling System for Adaptive Learning. Abuja, Nigeria: Standard Research Journals; 2014
13. Fröschl C. User modeling and user profiling in adaptive E-learning systems [master's thesis]. Graz, Austria: Graz University of Technology; 2005
14. De Bra P, Smits D, Stash N. The design of AHA!. In: Proceedings of the Seventeenth ACM Conference on Hypertext and Hypermedia (Hypertext '06); 22-25 August 2006; Odense, Denmark. New York, NY: ACM; 2006. pp. 133–134
15. Murphy KP. A Brief Introduction to Graphical Models and Bayesian Networks. University of British Columbia; 1998. Available from: http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html [Accessed: 2008]
16. Heckerman D. A Tutorial on Learning With Bayesian Networks. Redmond: Microsoft Research; 1995
