InTechOpen uses cookies to offer you the best online experience. By continuing to use our site, you agree to our Privacy Policy.

Mathematics » "Bayesian Inference", book edited by Javier Prieto Tejedor, ISBN 978-953-51-3578-4, Print ISBN 978-953-51-3577-7, Published: November 2, 2017 under CC BY 3.0 license. © The Author(s).

# Converting Graphic Relationships into Conditional Probabilities in Bayesian Network

By Loc Nguyen
DOI: 10.5772/intechopen.70057

Article top

## Overview

Figure 1. Diagnosis and prediction with hypothesis X and evidence D.

Figure 2. Simple graph or simple network.

Figure 3. Extended X-gate network with accountable variables Ais.

Figure 4. Bi-weight simple graph.

Figure 5. Augmented X-D network.

Figure 6. Direct SIGMA-gate diagnostic network (direct SIGMA-D network).

Figure 7. Diagnostic relationship with multiple evidences (M-E-D network).

Figure 8. M-HE-D network.

# Converting Graphic Relationships into Conditional Probabilities in Bayesian Network

Loc Nguyen
Show details

## Abstract

Bayesian network (BN) is a powerful mathematical tool for prediction and diagnosis applications. A large Bayesian network can constitute many simple networks, which in turn are constructed from simple graphs. A simple graph consists of one child node and many parent nodes. The strength of each relationship between a child node and a parent node is quantified by a weight and all relationships share the same semantics such as prerequisite, diagnostic, and aggregation. The research focuses on converting graphic relationships into conditional probabilities in order to construct a simple Bayesian network from a graph. Diagnostic relationship is the main research object, in which sufficient diagnostic proposition is proposed for validating diagnostic relationship. Relationship conversion is adhered to logic gates such as AND, OR, and XOR, which are essential features of the research.

Keywords: diagnostic relationship, Bayesian network, transformation coefficient

## 1. Introduction

Bayesian network (BN) is a directed acyclic graph (DAG) consists of a set of nodes and a set of arcs. Each node is a random variable. Each arc represents a relationship between two nodes. The strength of a relationship in a graph can be quantified by a number called weight. There are some important relationships such as prerequisite, diagnostic, and aggregation. The difference between BN and normal graph is that the strength of every relationship in BN is represented by a conditional probability table (CPT) whose entries are conditional probabilities of a child node given parent nodes. There are two main approaches to construct a BN, which are as follows

• The first approach aims to learn BN from training data by learning machine algorithms.

• The second approach is that experts define some graph patterns according to specific relationships and then, BN is constructed based on such patterns along with determined CPTs.

This research focuses on the second approach in which relationships are converted into CPTs. Essentially, relationship conversion aims to determine conditional probabilities based on weights and meanings of relationships. We will have different ways to convert graphic weights into CPTs for different relationships. It is impossible to convert all relationships but some of them such as diagnostic, aggregation, and prerequisite are mandatory ones that we must specify as computable CPTs of BN. Especially, these relationships are adhered to logic X-gates [1] such as AND-gate, OR-gate, and SIGMA-gate. The X-gate inference in this research is derived and inspired from noisy OR-gate described in the book “Learning Bayesian Networks” Neapolitan ([2], pp. 157–159). Díez and Druzdzel [3] also researched OR/MAX, AND/MIN, and noisy XOR inferences but they focused on canonical models, deterministic models, and ICI models whereas I focused on logic gate and graphic relationships. So, their research is different from mine but we share the same result that is AND-gate model. In general, my research focuses on applied probability adhered to Bayesian network, logic gates, and Bayesian user modeling [4]. The scientific results are shared with Millán and Pérez-de-la-Cruz [4].

Factor graph [5] represents factorization of a global function into many partial functions. If joint distribution of BN is considered as the global function and CPTs are considered as partial functions, the sumproduct algorithm [6] of factor graph is applied into calculating posterior probabilities of variables in BN. Pearl’s propagation algorithm [7] is very successful in BN inference. The application of factor graph into BN is only realized if all CPT (s) of BN are already determined whereas this research focuses on defining such CPTs firstly. I did not use factor graph for constructing BN. The concept “X-gate inference” only implies how to convert simple graph into BN. However, the arrange sum with a fixed variable mentioned in this research is the “not-sum” ([6], p. 499) of factor graph. Essentially, X-gate probability shown in Eq. (10) is as same as λ message in the Pearl’s algorithm ([6], p. 518) but I use the most basic way to prove the X-gate probability.

As default, the research is applied in learning context in which BN is used to assess students’ knowledge. Evidences are tests, exams, exercises, etc. and hypotheses are learning concepts, knowledge items, etc. Note that diagnostic relationship is very important to Bayesian evaluation in learning context because it is used to evaluate student’s mastery of concepts (knowledge items) over entire BN. Now, we start relationship conversion with a research on diagnostic relationship in the next section.

## 2. Diagnostic relationship

In some opinions like mine, the diagnostic relationship should be from hypothesis to evidence. For example, disease is hypothesis and symptom is evidence. The symptom must be conditionally dependent on disease. Given a symptom, calculating the posterior probability of disease is essentially to diagnose likelihood of such disease ([8], p. 1666). Inversely, the arc from evidence to hypothesis implies prediction where evidence and hypothesis represent observation and event, respectively. Given an observation, calculating the posterior probability of the event is essentially to predict/assert such event ([8], p. 1666). Figure 1 shows diagnosis and prediction.

#### Figure 1.

Diagnosis and prediction with hypothesis X and evidence D.

The weight w of the relationship between X and D is 1. Figure 1 depicts simplest graph with two random variables. We need to convert diagnostic relationship into conditional probabilities in order to construct a simplest BN from the simplest graph. Note that hypothesis is binary but evidence can be numerical. In learning context, evidence D can be test, exam, exercise, etc. The conditional probability of D given X (likelihood function) is P(D|X). The posterior probability of X is P(X|D), which is used to evaluate student’s mastery over concept (hypothesis) X given evidence D. Eq. (1) specifies CPT of D when D is binary (0 and 1)

 P(D|X)={D if X=11−D if X=0 (1)

Eq (1) is our first relationship conversion. It implies

P(D|X=0)+P(D|X=1)=D+1D=1

Evidence D can be used to diagnose hypothesis X if the so-called sufficient diagnostic proposition is satisfied, as seen in Table 1.

 D is equivalent to X in diagnostic relationship if P(X|D) = kP(D|X) given uniform distribution of X and the transformation coefficient k is independent from D. In other words, k is constant with regards to D and so D is called sufficient evidence.

#### Table 1.

Sufficient diagnostic proposition.

The concept of sufficient evidence is borrowed from the concept of sufficient statistics and it is inspired from equivalence of variables T and T’ in the research ([4], pp. 292-295). The proposition can be restated that evidence D is only used to assess hypotheses if it is sufficient evidence. As a convention, the proposition is called diagnostic condition and hypotheses have uniform distribution. The assumption of hypothetic uniform distribution (P(X = 1) = P(X = 0)) implies that we cannot assert whether or not given hypothesis is true before we observe its evidence.

In learning context, D can be totally used to assess student’s mastery of X if diagnostic condition is satisfied. Derived from such condition, Eq. (2) specifies transformation coefficient k given uniform distribution of X.

 k=P(X|D)P(D|X) (2)

We need to prove that Eq. (1) satisfies diagnostic condition. Suppose the prior probability of X is uniform.

P(X=0)=P(X=1)

we have

P(X|D)=P(D|X)P(X)P(D)=P(D|X)P(X)P(D|X=0)P(X=0)+P(D|X=1)P(X=1)(due to Bayesrule)=P(D|X)P(X)P(X)(P(D|X=0)+P(D|X=1))(due to P(X=0)=P(X=1))=P(D|X)P(D|X=0)+P(D|X=1)=1*P(D|X)(due to P(D|X=0)+P(D|X=1)=1)

It is easy to infer that the transformation coefficient k is 1, if D is binary. In practice, evidence D is often a test whose grade ranges within an interval {0, 1, 2,…, η}. Eq. (3) specifies CPT of D in this case

 P(D|X)={DSif X=1ηS−DSif X=0 (3)

Where

D{0,1,2,,η} S=D=0nD=η(η+1)2

As a convention, P(D|X)=0,D{0,1,2,,η} . Eq. (3) implies that if student has mastered concept (X = 1), the probability that she/he completes the exercise/test D is proportional to her/his mark on D (P(D|X)=DS) . We also have

P(D|X=0)+P(D|X=1)=DS+ηDS=ηS=2(η+1)
D=0ηP(D|X=1)=D=0ηDS=D=0ηDS=SS=1
D=0ηP(D|X=0)=D=0ηηDS=D=0η(ηD)S=D=0ηηD=0ηDS=η(η+1)SS=2SSS=1

We need to prove that Eq. (3) satisfies diagnostic condition. Suppose the prior probability of X is uniform.

P(X=0)=P(X=1)

The assumption of prior uniform distribution of X implies that we do not determine if student has mastered X yet. Similarly, we have

P(X|D)=P(D|X)P(X)P(D)=P(D|X)P(D|X=0)+P(D|X=1)=η+12P(D|X)

So, the transformation coefficient k is η+12 if D ranges in {0, 1, 2,…, η}.

In the most general case, discrete evidence D ranges within an arbitrary integer interval {a,a+1,a+2,,b} . In other words, D is bounded integer variable whose lower bound and upper bound are a and b, respectively. Eq. (4) specifies CPT of D, where D{a,a+1,a+2,,b} .

 P(D|X)={DSif X=1b+aS−DSif X=0 (4)

Where

D{a,a+1,a+2,,b} S=a+(a+1)+(a+2)++b=(b+a)(ba+1)2

Note, P(D|X)=0,D{a,a+1,a+2,,b} . According to the diagnostic condition, we need to prove the equality P(X|D)=kP(D|X) , where

k=ba+12

Similarly, we have

P(X|D)=P(D|X)P(X)P(D)=P(D|X)P(D|X=0)+P(D|X=1)=ba+12P(D|X)

If evidence D is continuous in the real interval [a, b] with note that a and b are real numbers, Eq. (5) specifies probability density function (PDF) of continuous evidence D[a,b] . The PDF p(D|X) replaces CPT in case of continuous random variable.

p(D|X)={2Db2a2if X=12ba2Db2a2if X=0

where

D[a,b] where a and b are real numbers
 S=∫abDdD=b2−a22 (5)

As a convention, [a, b] is called domain of continuous evidence, which can be replaced by open or half-open intervals such as (a, b), (a, b], and [a, b). Of course we have p(D|X)=0,D[a,b] . In learning context, evidence D is often a test whose grade ranges within real interval [a, b].

Functions p(D|X = 1) and p(D|X = 0) are valid PDFs due to

Dp(D|X=1)dD=ab2Db2a2dD=1b2a2ab2DdD=1
Dp(D|X=0)dD=2baabdD1b2a2ab2DdD=1.

According to the diagnostic condition, we need to prove the equality

P(X|D)=kp(D|X)

where,

k=ba2

When D is continuous, its probability is calculated in ε-vicinity where ε is very small number. As usual, ε is bias if D is measure values produced from equipment. The probability of D given X, where D + ε [a, b] and Dε [a, b] is

P(D|X)=DεD+εp(D|X)dD={DεD+ε2Db2a2dD if X=1DεD+ε(2ba2Db2a2)dD if X=0={4εDb2a2if X=14εba4εDb2a2if X=0=2εp(D|X)

In fact, we have

P(X|D)=P(D|X)P(X)P(D|X=0)P(X=0)+P(D|X=1)P(X=1)=P(D|X)P(D|X=0)+P(D|X=1)
(due to Bayes'rule and the assumption P(X=0)=P(X=1))
=ba4εP(D|X)=kp(D|X)

In general, Eq. (6) summarizes CPT of evidence of single diagnostic relationship.

P(D|X)={DS if X=1MSDS if X=0k=N2

Where,

 N={2 if D∈{0,1}η+1 if D∈{0,1,2,…,η}b−a+1 if D∈{a,a+1,a+2,…,b}b−a if D continuous and D∈[a,b]M={1 if D∈{0,1}η if D∈{0,1,2,…,η}b+a if D∈{a,a+1,a+2,…,b}b+a if D continuous and D∈[a,b]S=∑DD=NM2={1 if D∈{0,1}η(η+1)2 if D∈{0,1,2,…,η}(b+a)(b−a+1)2 if D∈{a,a+1,a+2,…,b}b2−a22 if D continuous and D∈[a,b] (6)

In general, if the conditional probability P(D|X) is specified by Eq. (6), the diagnostic condition will be satisfied. Note that the CPT P(D|X) is the PDF p(D|X) in case of continuous evidence. The diagnostic relationship will be extended with more than one hypothesis. The next section will mention how to determine CPTs of a simple graph with one child node and many parent nodes based on X-gate inferences.

## 3. X-gate inferences

Given a simple graph consisting of one child variable Y and n parent variables Xi, as shown in Figure 2, each relationship from Xi to Y is quantified by normalized weight wi where 0 ≤ wi ≤ 1. A large graph is an integration of many simple graphs. Figure 2 shows the DAG of a simple BN. As aforementioned, the essence of constructing simple BN is to convert graphic relationships of simple graph into CPTs of simple BN.

#### Figure 2.

Simple graph or simple network.

Child variable Y is called target and parent variables Xis are called sources. Especially, these relationships are adhered to X-gates such as AND-gate, OR-gate, and SIGMA-gate. These gates are originated from logic gate [1]. For instance, AND-gate and OR-gate represent prerequisite relationship. SIGMA-gate represents aggregation relationship. Therefore, relationship conversion is to determined X-gate inference. The simple graph shown in Figure 2 is also called X-gate graph or X-gate network. Please distinguish the letter “X” in the term “X-gate inference” which implies logic operators (AND, OR, XOR, etc.) from the “variable X”.

All variables are binary and they represent events. The probability P(X) indicates event X occurs. Thus, P(X) implicates P(X = 1) and P(not(X)) implicates P(X = 0). Eq. (7) specifies the simple NOT-gate inference.

 P(not(X))=P(X¯)=P(X=0)=1−P(X=1)=1−P(X)P(not(not(X)))=P(X) (7)

X-gate inference is based on three assumptions mentioned in Ref. ([2], p. 157), which are as follows

• X-gate inhibition: Given a relationship from source Xi to target Y, there is a factor Ii that inhibits Xi from being integrated into Y. Factor Ii is called inhibition of Xi. That the inhibition Ii is turned off is prerequisite of Xi integrated into Y.

• Inhibition independence: Inhibitions are mutually independent. For example, inhibition I1 of X1 is independent from inhibition I2 of X2.

• Accountability: X-gate network is established by accountable variables Ai for Xi and Ii. Each X-gate inference owns particular combination of Ais.

Figure 3 shows the extended X-gate network with accountable variables Ais ([2], p. 158).

#### Figure 3.

Extended X-gate network with accountable variables Ais.

The strength of each relationship from source Xi to target Y is quantified by a weight 0 ≤ wi ≤ 1. According to the assumption of inhibition, probability of Ii = OFF is pi, which is set to be the weight wi.

pi=wi

If notation wi is used, we focus on the strength of relationship. If notation pi is used, we focus on probability of OFF inhibition. In probabilistic inference, pi is also prior probability of Xi = 1. However, we will assume each Xi has uniform distribution later on. Eq. (8) specifies probabilities of inhibitions Iis and accountable variables Ais.

 P(Ii=OFF)=pi=wiP(Ii=ON)=1−pi=1−wiP(Ai=ON|Xi=1,Ii=OFF)=1P(Ai=ON|Xi=1,Ii=ON)=0P(Ai=ON|Xi=0,Ii=OFF)=0P(Ai=ON|Xi=0,Ii=ON)=0P(Ai=OFF|Xi=1,Ii=OFF)=0P(Ai=OFF|Xi=1,Ii=ON)=1P(Ai=OFF|Xi=0,Ii=OFF)=1P(Ai=OFF|Xi=0,Ii=ON)=1 (8)

According to Eq. (8), given probability P(Ai=ON | Xi=1, Ii=OFF), it is assured 100% confident that accountable variables Ai is turned on if source Xi is 1 and inhibition Ii is turned off. Eq. (9) specifies conditional probability of accountable variables Ai (s) given Xi (s), which is corollary of Eq. (8).

 P(Ai=ON|Xi=1)=pi=wiP(Ai=ON|Xi=0)=0P(Ai=OFF|Xi=1)=1−pi=1−wiP(Ai=OFF|Xi=0)=1 (9)

Appendix A1 is the proof of Eq. (9). As a definition, the set of all Xis is complete if and only if

P(X1X2Xn)=P(Ω)=i=1nwi=1

The set of all Xis is mutually exclusive if and only if

XiXj=,ij

For each Xi, there is only one Ai and vice versa, which establishes a bijection between Xis and Ais. Obviously, the fact that the set of all Xis is complete is equivalent to the fact that the set of all Ai (s) is complete. We will prove by contradiction that “the fact that the set of all Xi (s) is mutually exclusive is equivalent to the fact that the set of all Ai (s) is mutually exclusive.” Suppose XiXj=,ij but ij : AiAj=B . Let B1 be preimage of B. Due to B  Ai and  B  Aj , we have B1  Xi and B1  Xj , which causes that XiXj=B1 . There is a contradiction and so we have

XiXj=,ijAiAj=,ij

By similar proof, we have

AiAj=,ijXiXj=,ij

The extended X-gate network shown in Figure 3 is interpretation of simple network shown in Figure 2. Specifying CPT of the simple network is to determine the conditional probability P(Y = 1 | X1, X2,…, Xn) based on extended X-gate network. The X-gate inference is represented by such probability P(Y = 1 | X1, X2,…, Xn) specified by Eq. (10) ([2], p. 159).

 P(Y|X1,X2,…,Xn)=∑A1,A2,…,AnP(Y|A1,A2,…,An)∏i=1nP(Ai|Xi) (10)

Appendix A2 is the proof of Eq. (10). It is necessary to make some mathematical notations because Eq. (10) is complicated, which is relevant to arrangements of Xi (s). Given the set Ω = {X1, X2,…, Xn} where all variables are binary, Table 2 specifies binary arrangements of Ω.

 Given Ω = {X1, X2,…, Xn} where |Ω| = n is cardinality of Ω.Let a(Ω) be an arrangement of Ω which is a set of n instances {X1=x1, X2=x2,…, Xn=xn} where xi is 1 or 0. The number of all a(Ω) is 2|Ω|. For instance, given Ω = {X1, X2}, there are 22=4 arrangements as follows: a(Ω)={X1=1,X2=1},a(Ω)={X1=1,X2=0},a(Ω)={X1=0,X2=1},a(Ω) = {X1=0,X2=0}. Let a(Ω:{Xi}) be the arrangement of Ω with fixed Xi. The number of all a(Ω:{Xi}) is 2|Ω|−1. Similarly, for instance, a(Ω:{X1, X2, X3}) is an arrangement of Ω with fixed X1, X2, X3. The number of all a(Ω:{X1, X2, X3}) is 2|Ω|−3.Let c(Ω) and c(Ω:{Xi}) be the number of arrangements a(Ω) and a(Ω:{Xi}), respectively. Such c(Ω) and c(Ω:{Xi}) are called arrangement counters. As usual, counters c(Ω) and c(Ω:{Xi}) are equal to 2|Ω| and 2|Ω|−1, respectively but they will vary according to specific cases.Let ∑aF(a(Ω)) and ∏aF(a(Ω)) denote sum and product of values generated from function F acting on every a(Ω). The number of arrangements on which F acts is c(Ω).Let x denote the X-gate operator, for instance, x = ⊙ for AND-gate, x = ⊕ for OR-gate, x = not ⊙ for NAND-gate, x = not ⊕ for NOR-gate, x = ⊗ for XOR-gate, x = not ⊗ for XNOR-gate, x = ⊎ for U-gate, x=+ for SIGMA-gate. Given an x-operator, let s(Ω:{Xi}) and s(Ω) be sum of all P(X1xX2x…xXn) through every arrangement of Ω with and without fixed Xi, respectively. s(Ω)=∑aP(X1xX2x…xXn|a(Ω))=∑aP(Y=1|a(Ω))s(Ω:{Xi})=∑aP(X1xX2x…xXn|a(Ω:{Xi}))=∑aP(Y=1|a(Ω:{Xi})) For example, s(Ω) and s(Ω:{Xi}) for OR-gate are: s(Ω)=∑aP(X1⊕X2⊕…⊕Xn|a(Ω))s(Ω:{Xi})=∑aP(X1⊕X2⊕…⊕Xn|a(Ω:{Xi})) Such s(Ω) and s(Ω:{Xi}) are called arrangement sum. They are acting function F.Note that Ω can be any set of binary variables.

#### Table 2.

Binary arrangements.

It is not easy to produce all binary arrangements of Ω. Table 3 shows a code snippet written by Java programming language for producing such all arrangements.

 public class ArrangementGenerator { private ArrayList arrangements; private int n; private int r;  private ArrangementGenerator(int n, int r) {    this.n = n;      this.r = r;      this.arrangements = new ArrayList(); } private void create(int[] a, int i) {     for(int j = 0; j < n; j++) {         a[i] = j;        if(i < r - 1)            create(a, i + 1);        else if(i == r -1) {            int[] b = new int[a.length];            for(int k = 0; k < a.length; k++) b[k] = a[k];            arrangements.add(b);        }     } } public int[] get(int i) {     return arrangements.get(i); } public long size() {     return arrangements.size(); } public static ArrangementGenerator parse(int n, int r) {     ArrangementGenerator arr =         new ArrangementGenerator(n, r);     int[] a = new int[r];     for(int i=0; i

#### Table 3.

Code snippet generating all binary arrangements.

Each element of the list “arrangements” is a binary arrangement a(Ω) presented by an array of bits (0 and 1). The method “create(int[] a, int i)” which is recursive method, is the main one that generates arrangements. The method call “ArrangementGenerator.parse(2, n)” will list all possible binary arrangements.

Eq. (11) specifies the connection between s(Ω:{Xi = 1}) and s(Ω:{Xi = 0}), between c(Ω:{Xi = 1}) and c(Ω:{Xi = 0}).

 s(Ω:{Xi=1})+s(Ω:{Xi=0})=s(Ω)c(Ω:{Xi=1})+c(Ω:{Xi=0})=c(Ω) (11)

It is easy to draw Eq. (11) when the set of all arrangements a(Ω:{Xi = 1) is complement of the set of all arrangements a(Ω:{Xi = 0).

Let K be a set of Xis whose values are 1 and let L be a set of Xis whose values are 0. K and L are mutually complementary. Eq. (12) determines sets K and L.

 {K={i:Xi=1}L={i:Xi=0}K∩​L=∅K∪​L={1,2,…,n} (12)

The AND-gate inference represents prerequisite relationship satisfying AND-gate condition specified by Eq. (13).

 P(Y=1|Ai=OFF for some i)=0 (13)

From Eq. (10), we have

P(Y=1|X1,X2,,Xn)=A1,A2,,AnP(Y=1|A1,A2,,An)i=1nP(Ai|Xi)=i=1nP(Ai=ON|Xi)(Due to P(Y=1|Ai=OFF for some i)=0)=(iKP(Ai=ON|Xi=1))(iKP(Ai=ON|Xi=0))=(iKpi)(iK0)={i=1npi if all Xi(s) are 10 if there exists at least one Xi=0

(Due to Eq. (9))

In general, Eq. (14) specifies AND-gate inference.

 P(X1⊙X2⊙…⊙Xn)=P(Y=1|X1,X2,…,Xn)={∏i=1n pi if all Xi(s) are 10 if there exists at least one Xi=0P(Y=0|X1,X2,…,Xn)={1−∏i=1n pi if all Xi(s) are 11 if there exists at least one Xi=0 (14)

The AND-gate inference was also described in ([3], p. 33). Eq. (14) varies according to two cases whose arrangement counters are listed as follows

L=
c(Ω:{Xi=1})=1,c(Ω:{Xi=0})=0,c(Ω)=1.
L
c(Ω:{Xi=1})=2n11,c(Ω:{Xi=0})=2n1,c(Ω)=2n1.

The OR-gate inference represents prerequisite relationship satisfying OR-gate condition specified by Eq. (15) ([2], p. 157).

 P(Y=1|Ai=ON for some i)=1 (15)

The OR-gate condition implies

P(Y=0|Ai=ON for some i)=0

From Eq. (10), we have ([2], p. 159)

P(Y=0|X1,X2,,Xn)=A1,A2,,AnP(Y=1|A1,A2,,An)i=1nP(Ai|Xi)=i=1nP(Ai=OFF|Xi)(due to P(Y=1|Ai=ON for some i)=0)=(iKP(Ai=OFF|Xi=1))(iKP(Ai=OFF|Xi=0))=(iK(1pi))(iK1)={iK(1pi)if K1 if K=

(Due to Eq. (9))

In general, Eq. (16) specifies OR-gate inference.

 P(X1⊕X2⊕…⊕Xn)=1−P(Y=0|X1,X2,…,Xn)={1−∏i∈K(1−pi) if K≠∅0 if K=∅P(Y=0|X1,X2,…,Xn)={∏i∈K(1−pi) if K≠∅1 if K=∅ (16)

where K is the set of Xis whose values are 1. The OR-gate inference was mentioned in Refs. ([2], p. 158) and ([3], p. 20). Eq. (16) varies according to two cases whose arrangement counters are listed as follows

K
c(Ω:{Xi=1})=2n1,c(Ω:{Xi=0})=2n11,c(Ω)=2n1.
K=
c(Ω:{Xi=1})=0,c(Ω:{Xi=0})=1,c(Ω)=1.

According to De Morgan’s rule with regard to AND-gate and OR-gate, we have

P(not(X1X2Xn))=P((not(X1))(not(X2))(not(Xn)))={1iL(1(1pi))if L0 if L=

(Due to Eq. (16))

According to Eq. (14), we also have

P(not(X1X2Xn))=P((not(X1))(not(X2))(not(Xn)))={i=1nP(not(Xi)) if all not (Xi)(s) are 10 if there exists at least one not (Xi)=0={i=1n(1pi) if all Xi(s) are 00 if there exists at least one Xi=1

In general, Eq. (17) specifies NAND-gate inference and NOR-gate inference derived from AND-gate and OR-gate

 P(not(X1⊙X2⊙…⊙Xn))={1−∏i∈Lpi if L≠∅0 if L=∅P(not(X1⊕X2⊕…⊕Xn))={∏i=1nqi if K=∅0 if K≠∅ (17)

where K and L are the sets of Xis whose values are 1 and 0, respectively.

Suppose the number of sources Xis is even. Let O be the set of Xis whose indices are odd. Let O1 and O2 be subsets of O, in which all Xis are 1 and 0, respectively. Let E be the set of Xis whose indices are even. Let E1 and E2 be the subsets of E, in which all Xis are 1 and 0, respectively.

{E={2,4,6,,n}E1EE2EE1E2=EE1E2=Xi=1,iE1Xi=0,iE2and {O={1,3,5,,n1}O1OO2OO1O2=OO1O2=Xi=1,iO1Xi=0,iO2

Thus, O1 and E1 are the subsets of K. Sources Xis and target Y follow XOR-gate if one of two XOR-gate conditions specified by Eq. (18) is satisfied.

 P(Y=1|{Ai=ON for i∈OAi=OFF for i ∉ O})=P(Y=1|A1=ON,A2=OFF,…,An−1=ON,An=OFF)=1P(Y=1|{Ai=ON for i∈EAi=OFF for i ∉ E})=P(Y=1|A1=OFF,A2=ON,…,An−1=OFF,An=ON)=1 (18)

From Eq. (10), we have

P(Y=1|X1,X2,,Xn)=A1,A2,,AnP(Y=1|A1,A2,,An)i=1nP(Ai|Xi)

If both XOR-gate conditions are not satisfied then,

P(Y=1|X1,X2,,Xn)=0

If the first XOR-gate condition is satisfied, we have

P(Y=1|X1,X2,,Xn)=P(Y=1|A1=ON,A2=OFF,,An1=ON,An=OFF)i=1nP(Ai|Xi)=(iOP(Ai=ON|Xi))(iEP(Ai=OFF|Xi))

We have

iOP(Ai=ON|Xi)=(iO1P(Ai=ON|Xi=1))*(iO2P(Ai=ON|Xi=0))=(iO1pi)*(iO20)={iO1pi if O2=0 if O2

(Due to Eq. (9))

We also have

iEP(Ai=OFF|Xi)=(iE1P(Ai=OFF|Xi=1))*(iE2P(Ai=OFF|Xi=0))=(iE1(1pi))(iE21)={iE1(1pi) if E11  if E1=

(Due to Eq. (9))

Given the first XOR-gate condition, it implies

P(Y=1|X1,X2,,Xn)=(iOP(Ai=ON|Xi))(iEP(Ai=OFF|Xi))={(iO1pi)(iE1(1pi))if O2= and E1 iO1piif O2= and E1=0 if O2

Similarly, given the second XOR-gate condition, we have

P(Y=1|X1,X2,,Xn)=(iEP(Ai=ON|Xi))(iOP(Ai=OFF|Xi))={(iE1pi)(iO1(1pi))if E2= and O1 iE1pi if E2= and O1=0 if E2

If one of XOR-gate conditions is satisfied then,

P(Y=1|X1,X2,,Xn)=(iOP(Ai=ON|Xi))(iEP(Ai=OFF|Xi))+(iEP(Ai=ON|Xi))(iOP(Ai=OFF|Xi))

This implies Eq. (19) to specify XOR-gate inference.

 P(X1⊗X2⊗…⊗Xn)=P(Y=1|X1,X2,…,Xn)={(∏i∈O1pi)(∏i∈E1(1−pi))+(∏i∈E1pi)(∏i∈O1(1−pi)) if O2=∅ and E2=∅(∏i∈O1pi)(∏i∈E1(1−pi)) if O2=∅ and E1≠∅ and E2≠∅∏i∈O1pi if O2=∅ and E1=∅(∏i∈E1pi)(∏i∈O1(1−pi)) if E2=∅ and O1≠∅ and O2≠∅∏i∈E1pi if E2=∅ and O1=∅0 if O2≠∅ and E2≠∅0 if n<2 or n is oddwhere{O={1,3,5,…,n−1}O1 ⊆ OO2 ⊆ OO1∪​O2=OO1∩​O2=∅Xi=1,∀i∈O1Xi=0,∀i∈O2and {E={2,4,6,…,n}E1 ⊆ EE2 ⊆ EE1∪​E2=EE1∩​E2=∅Xi=1,∀i∈E1Xi=0,∀i∈E2 (19)

Where,

Given n ≥ 2 and n is even, Eq. (19) varies according to six cases whose arrangement counters are listed as follows

O2= and E2=
c(Ω:{Xi=1})=1,c(Ω:{Xi=0})=0,c(Ω)=1.
O2= and E1 and E2
c(Ω:{Xi=1})=2n22,c(Ω:{Xi=0})=0,c(Ω)=2n22.
O2= and E1=
c(Ω:{Xi=1})=1,c(Ω:{Xi=0})=0,c(Ω)=1.
E2= and O1 and O2
c(Ω:{Xi=1})=2n211,c(Ω:{Xi=0})=2n211,c(Ω)=2n22.
E2= and O1=
c(Ω:{Xi=1})=0,c(Ω:{Xi=0})=1,c(Ω)=1.
O2 and E2
c(Ω:{Xi=1})=(2n211)(2n21),c(Ω:{Xi=0})=2n21(2n21),c(Ω)=(2n21)2.

Suppose the number of sources Xis is even. According to XNOR-gate inference [1], the output is on if all inputs get the same value 1 (or 0). Sources Xi (s) and target Y follow XNOR-gate if one of two XNOR-gate conditions specified by Eq. (20) is satisfied.

 P(Y=1|Ai=ON,∀i)=1P(Y=1|Ai=OFF,∀i)=1 (20)

From Eq. (10), we have

P(Y=1|X1,X2,,Xn)=A1,A2,,AnP(Y=1|A1,A2,,An)i=1nP(Ai|Xi)

If both XNOR-gate conditions are not satisfied then,

P(Y=1|X1,X2,,Xn)=0

If Ai = ON for all i, we have

P(Y=1|X1,X2,,Xn)=P(Y=1|Ai=ON,i)i=1nP(Ai=ON|Xi)=i=1nP(Ai=ON|Xi)={i=1npiif L=0 if L

(Please see similar proof in AND-gate inference)

If Ai = OFF for all i, we have

P(Y=1|X1,X2,,Xn)=i=1nP(Ai=OFF|Xi)={iK(1pi) if K1 if K=

(Please see similar proof in OR-gate inference)

If one of XNOR-gate conditions is satisfied then,

P(Y=1|X1,X2,,Xn)=i=1nP(Ai=ON|Xi)+i=1nP(Ai=OFF|Xi)

This implies Eq. (21) to specify XNOR-gate inference.

 P(not(X1⊗X2⊗…⊗Xn))=P(Y=1|X1,X2,…,Xn)={∏i=1npi+∏i=1n(1−pi) if L=∅∏i∈K(1−pi) if L≠∅ and K≠∅1 if L≠∅ and K=∅ (21)

where K and L are the sets of Xis whose values are 1 and 0, respectively. Eq. (21) varies according to three cases whose arrangement counters are listed as follows

L=
c(Ω:{Xi=1})=1,c(Ω:{Xi=0})=0,c(Ω)=1.
L and K
c(Ω:{Xi=1})=2n11,c(Ω:{Xi=0})=2n11,c(Ω)=2n2.
L and K=
c(Ω:{Xi=1})=0,c(Ω:{Xi=0})=1,c(Ω)=1.

Let U be a set of indices such that Ai = ON and let α ≥ 0 and β ≥ 0 be predefined numbers. The U-gate inference is defined based on α, β and cardinality of U. Table 4 specifies four common U-gate conditions.

 |U|=α P(Y=1|A1,A2,…,An)=1 if there are exactly α variables Ai = ON (s). Otherwise, P(Y=1|A1,A2,…,An)=0 . |U|≥α P(Y=1|A1,A2,…,An)=1 if there are at least α variables Ai = ON (s). Otherwise, P(Y=1|A1,A2,…,An)=0 . |U|≤β P(Y=1|A1,A2,…,An)=1 if there are at most β variables Ai = ON (s). Otherwise, P(Y=1|A1,A2,…,An)=0 . α≤|U|≤β P(Y=1|A1,A2,…,An)=1 if the number of Ai = ON (s) is from α to β. Otherwise, P(Y=1|A1,A2,…,An)=0 .

#### Table 4.

U-gate conditions.

Note that U-gate condition on |U| can be arbitrary and it is only relevant to Ais (ON or OFF) and the way to combine Ais. For example, AND-gate and OR-gate are specific cases of U-gate with |U| = n and |U| ≥ 1, respectively. XOR-gate and XNOR-gate are also specific cases of U-gate with specific conditions on Ai (s). However, it must be assured that there is at least one combination of Ais satisfying the predefined U-gate condition, which causes that U-gate probability is not always equal to 0. In this research, U-gate is the most general nonlinear gate where U-gate probability contains products of weights (see Table 5). Later on, we will research a so-called SIGMA-gate that contains only linear combination of weights (sum of weights, see Eq. (23)). Shortly, each X-gate is a pattern owning a particular X-gate inference that is X-gate probability P(X1 × X2 ×…× Xn). Each X-gate inference is based on particular X-gate condition(s) relevant to only variables Ais.

 Let, SU=∑U∈U ∏i∈Upi∏j∈K\U(1−pj) PU=P(X1⊎X2⊎…⊎Xn)=P(Y=1|X1,X2,…,Xn) As a convention, ∏i∈Upi=1 if|U|=0 ∏j∈K\U(1−pj)=1 if|U|=|K| |U|=0 PU={∏j=1n(1−pj) if |K|>01  if |K|=0 |U|=1 |U|≥0 PU={SU if |K|>01 if |K|=0 |U|=2|K| The case |U|≥0 is the same to the case |U|≤n |U|=n PU={∏i=1npiif |K|=n0 if |K|01 if |K|=0 |U|={∑j=0min(β,|K|)(|K|j) if |K|>01 if |K|=0 α≤|U|≤β0<α

#### Table 5.

U-gate inference.

From Eq. (10), we have

P(Y=1|X1,X2,,Xn)=A1,A2,,AnP(Y=1|A1,A2,,An)i=1nP(Ai|Xi)

Let U be the set of all possible U (s), we have

P(Y=1|X1,X2,,Xn)=UUP(Y=1|A1,A2,,An)i=1nP(Ai|Xi)=UUiUP(Ai=ON|Xi)jUP(Aj=OFF|Xj)

If Xi=0,iU then,

P(Y=1|X1,X2,,Xn)=UUiU0jUP(Aj=OFF|Xj)=0

This implies all sets U (s) must be subsets of K. The U-gate probability is rewritten as follows

P(Y=1|X1,X2,,Xn)=UU iUP(Ai=ON|Xi=1)jUP(Aj=OFF|Xj)=UU iUpijUP(Aj=OFF|Xj)=UU iUpijK\UP(Aj=OFF|Xj=1)jKP(Aj=OFF|Xj=0)=UU iUpijK\U(1pj)jK1=UU iUpijK\U(1pj)

(Due to Eq. (9))

Let PU be the U-gate probability; Table 5 specifies U-gate inference and cardinality of U where U is the set of subsets (U) of K.

Note that the notation (nj) denotes the number of combinations of j elements taken from n elements.

(nj)=n!j!(nj)!

Arrangement counters relevant to U-gate inference and the set K are listed as follows

|K|=0
c(Ω:{Xi=1})=0,c(Ω:{Xi=0})=1,c(Ω)=1.
|K|=1
c(Ω:{Xi=1})=1,c(Ω:{Xi=0})=0,c(Ω)=1.
|K|=α and α>0
c(Ω:{Xi=1})=(n1α1),c(Ω:{Xi=0})=(n1α),c(Ω)=(nα).
|K|α and α>0
c(Ω:{Xi=1})=j=1α(n1j1),c(Ω:{Xi=0})=j=0α(n1j),c(Ω)=j=0α(nj).
|K|α and α>0
c(Ω:{Xi=1})=j=αn(n1j1),c(Ω:{Xi=0})=j=αn1(n1j),c(Ω)=j=αn(nj).

The SIGMA-gate inference [9] represents aggregation relationship satisfying SIGMA-gate condition specified by Eq. (22).

P(Y)=P(i=1nAi)

where the set of Ai is complete and mutually exclusive

 ∑i=1nwi=1Ai∩​​​Aj=∅,∀i≠j (22)

The sigma sum i=1nAi indicates that Y is exclusive union of Ais and here, it does not express arithmetical additions.

Y=i=1nAi=i=1nAi

This implies

P(Y)=P(i=1nAi)=P(i=1nAi)=i=1nP(Ai)

The sigma sum i=1nP(Ai) now expresses arithmetical additions of probabilities P(Ai).

SIGMA-gate inference requires the set of Ais is complete and mutually exclusive, which means that the set of Xis is complete and mutually exclusive too. The SIGMA-gate probability is [9]

P(Y|X1,X2,,Xn)=P(i=1nAi|X1,X2,,Xn)(due to SIGMAgate condition)=i=1nP(Ai|X1,X2,,Xn)(because Ai(s) are mutually exclusive)=i=1nP(Ai|Xi)(because Ai is only dependent on Xi)

It implies

P(Y=1|X1,X2,,Xn)=i=1nP(Ai=ON|Xi)=(iKP(Ai=ON|Xi=1))+(iKP(Ai=ON|Xi=0))=iKwi+iK0=iKwi

(Due to Eq. (9))

In general, Eq. (23) specifies the theorem of SIGMA-gate inference [9]. The base of this theorem was mentioned by Millán and Pérez-de-la-Cruz ([4], pp. 292-295).

P(X1+X2++Xn)=P(i=1nXi)=P(Y=1|X1,X2,,Xn)=iKwiP(Y=0|X1,X2,,Xn)=1iKwi=iLwi

where the set of Xis is complete and mutually exclusive.

 ∑i=1nwi=1Xi∩​​Xj=∅,∀i≠j (23)

The arrangement counters of SIGMA-gate inference are c(Ω:{Xi = 1}) = c(Ω:{Xi = 0}) = 2n−1, c(Ω) = 2n.

Eq. (9) specifies the “clockwise” strength of relationship between Xi and Y. Event Xi = 1 causes event Ai = ON with “clockwise” weight wi. There is a question “given Xi = 0, how likely the event Ai = OFF is”. In order to solve this problem, I define a so-called “counterclockwise” strength of relationship between Xi and Y denoted ωi. Event Xi = 0 causes event Ai = OFF with “counterclockwise” weight ωi. In other words, each arc in simple graph is associated with a clockwise weight wi and a counterclockwise weight ωi. Such graph is called bi-weight simple graph shown in Figure 4.

#### Figure 4.

Bi-weight simple graph.

With bi-weight simple graph, all X-gate inferences are extended as so-called X-gate bi-inferences. Derived from Eq. (9), Eq. (24) specifies conditional probability of accountable variables with regard to bi-weight graph.

 P(Ai=ON|Xi=1)=pi=wiP(Ai=ON|Xi=0)=1−ρi=1−ωiP(Ai=OFF|Xi=1)=1−pi=1−wiP(Ai=OFF|Xi=0)=ρi=ωi (24)

The probabilities P(Ai = ON | Xi = 0) and P(Ai = OFF | Xi = 1) are called clockwise adder di and counterclockwise adder δi. As usual, di and δi are smaller than wi and ωi. When di = 0, bi-weight graph becomes normal simple graph.

di=P(Ai=ON|Xi=0)=1ρi=1ωiδi=P(Ai=OFF|Xi=1)=1pi=1wi

The total clockwise weight or total counterclockwise weight is defined as sum of clockwise weight and clockwise adder or sum of counterclockwise weight and counterclockwise adder. Eq. (25) specifies such total weights Wi and Wi . These weights are also called relationship powers.

Wi=wi+diWi=ωi+δi

where

 di=1−ρi=1−ωiδi=1−pi=1−wi (25)

Given Eq. (25), the set of all Ais is complete if and only if i=1nwi=1 .

By extending aforementioned X-gate inferences, we get bi-inferences for AND-gate, OR-gate, NAND-gate, NOR-gate, XOR-gate, XNOR-gate, and U-gate as shown in Table 6.

The largest cardinalities of K (L) are 2n−1 and 2n with and without fixed Xi. Thus, it is possible to calculate arrangement counters. As a convention, the product of probabilities is 1 if indices set is empty.

iIfi=1 if I=

With regard to SIGMA-gate bi-inference, the sum of all total clockwise weights must be 1 as follows

i=1nWi=i=1n(wi+di)=i=1n(wi+1ωi)=1

Derived from Eq. (23), the SIGMA-gate probability for bi-weight graph is

P(X1+X2++Xn)=i=1nP(Ai=ON|Xi)=iKP(Ai=ON|Xi=1)+iLP(Ai=ON|Xi=0)=iKwi+iLdi

Shortly, Eq. (26) specifies SIGMA-gate bi-inference.

P(X1+X2++Xn)=iKwi+iLdi

where the set of Xi(s) is complete and mutually exclusive.

 ∑i=1nWi=1Xi∩​Xj=∅,∀i≠j (26)

The next section will research diagnostic relationship which adheres to X-gate inference.

## 4. Multihypothesis diagnostic relationship

Given a simple graph shown in Figure 2, if we replace the target source Y by an evidence D, we get a so-called multihypothesis diagnostic relationship whose property adheres to X-gate inference. Maybe there are other diagnostic relationships in which X-gate inference is not concerned. However, this research focuses on X-gate inference and so multi-hypothesis diagnostic relationship is called X-gate diagnostic relationship. Sources X1, X2,…, Xn become hypotheses. As a convention, these hypotheses have prior uniform distribution.

According to aforementioned X-gate network shown in Figures 2 and 3, the target variable must be binary whereas evidence D can be numeric. It is impossible to establish the evidence D as direct target variable. Thus, the solution of this problem is to add an augmented target binary variable Y and then, the evidence D is connected directly to Y. In other words, the X-gate diagnostic network have n sources {X1, X2,…, Xn}, one augmented hypothesis Y, and one evidence D. As a convention, X-gate diagnostic network is called X-D network. The CPTs of the entire network are determined based on combination of diagnostic relationship and X-gate inference mentioned in previous sections. Figure 5 depicts the augmented X-D network. Note that variables X1, X2,…, Xn, and Y are always binary.

#### Figure 5.

Augmented X-D network.

Appendix A3 is the proof that the augmented X-D network is equivalent to X-D network with regard to variables X1, X2,…, Xn and D. As a convention, augmented X-D network is considered as same as X-D network.

The simplest case of X-D network is NOT-D network having one hypothesis X1 and one evidence D, equipped with NOT-gate inference. NOT-D network satisfies diagnostic condition because it essentially represents the single diagnostic relationship. Inferred from Eqs. (1) and (7), the conditional probability P(D|X1) and posterior probability P(X1|D) of NOT-D network are

P(D|X1)={1D if X1=1D if X1=0
P(X1|D)=P(D|X1)P(X1)P(X1)(P(D|X1=0)+P(D|X1=1))

(Due to Bayes’ rule and uniform distribution of X1)

=P(D|X1)P(D|X1=0)+P(D|X1=1)=1*P(D|X1)
(due to P(D|X1=0)+P(D|X1=1)=1)

It implies NOT-D network satisfies diagnostic condition. Let

Ω={X1,X2,,Xn}n=|Ω|

We will validate whether the CPT of diagnostic relationship, P(D|X) specified by Eq. (6), still satisfies diagnostic condition within general case, X-D network. In other words, X-D network is general case of single diagnostic relationship.

Recall from dependencies shown in Figure 5, Eq. (27) specifies the joint probability of X-D network.

 P(Ω,Y,D)=P(X1,X2,…,Xn,Y,D)=P(D|Y)P(Y|X1,X2,…,Xn)∏i=1nP(Xi)where Ω ={X1, X2,…, Xn}. (27)

Eq. (28) specifies the conditional probability of D given Xi (likelihood function) and the posterior probability of Xi given D.

 P(D|Xi)=P(Xi,D)P(Xi)=∑{Ω,Y,D}\{Xi,D}P(Ω,Y,D)∑{Ω,Y,D}\{Xi}P(Ω,Y,D)P(Xi|D)=P(Xi,D)P(D)=∑{Ω,Y,D}\{Xi,D}P(Ω,Y,D)∑{Ω,Y,D}\{D}P(Ω,Y,D) (28)

where Ω = {X1, X2,…, Xn} and the sign “\” denotes the subtraction (excluding) operator in set theory [10]. Eq. (29) specifies the joint probability P(Xi, D) and the marginal probability P(D) given uniform distribution of all sources. Appendix A4 is the proof of Eq. (29).

 P(Xi,D)=12nS((2D−M)s(Ω:{Xi})+2n−1(M−D))P(D)=12nS((2D−M)s(Ω)+2n(M−D)) (29)

where s(Ω) and s(Ω:{Xi}) are specified in Table 2. From Eqs. (2830) specifies conditional probability P(D|Xi), posterior probability P(Xi|D), and transformation coefficient for X-gate inference.

 P(D|Xi=1)=P(Xi=1,D)P(Xi=1)=(2D−M)s(Ω:{Xi=1})+2n−1(M−D)2n−1SP(D|Xi=0)=P(Xi=0,D)P(Xi=0)=(2D−M)s(Ω:{Xi=0})+2n−1(M−D)2n−1SP(Xi=1|D)=P(Xi=1,D)P(D)=(2D−M)s(Ω:{Xi=1})+2n−1(M−D)(2D−M)s(Ω)+2n(M−D)P(Xi=0|D)=1−P(Xi=1|D)=(2D−M)s(Ω:{Xi=0})+2n−1(M−D)(2D−M)s(Ω)+2n(M−D)k=P(Xi|D)P(D|Xi)=2n−1S(2D−M)s(Ω)+2n(M−D) (30)

The transformation coefficient is rewritten as follows

k=2n1S2D(s(Ω)2n1)+M(2ns(Ω))

Note that S, D, and M are abstract symbols and there is no proportional connection between 2n−1S and D for all D, specified by Eq. (6). Assuming that such proportional connection 2n−1S = aDj exists for all D where a is arbitrary constant. Given binary case when D = 0 and S = 1, we have

There is a contradiction, which implies that it is impossible to reduce k into the following form

Therefore, if k is constant with regard to D then,

2D(s(Ω)2n1)+M(2ns(Ω))=C0,D

where C is constant. We have

D(2D(s(Ω)2n1)+M(2ns(Ω)))=DC2S(s(Ω)2n1)+NM(2ns(Ω))=NC2nS=NC

It is implied that

k=2n1S2D(s(Ω)2n1)+M(2ns(Ω))=NC2C=N2

This holds

2nS=N(2D(s(Ω)2n1)+M(2ns(Ω)))=2ND(s(Ω)2n1)+2S(2ns(Ω))2ND(s(Ω)2n1)2S(s(Ω)2n1)=0(NDS)(s(Ω)2n1)=0

Assuming ND = S we have

ND=S=2NMD=2M

There is a contradiction because M is maximum value of D. Therefore, if k is constant with regard to D then s(Ω) = 2n−1. Inversely, if s(Ω) = 2n−1 then k is

k=2n1S2D(2n12n1)+M(2n2n1)=N2

In general, the event that k is constant with regard to D is equivalent to the event s(Ω) = 2n−1. This implies diagnostic theorem stated in Table 7.

 P(X1⊙X2⊙…⊙Xn)=∏i∈Kpi∏i∈Ldi P(X1⊕X2⊕…⊕Xn)=1−∏i∈Kδi∏i∈Lρi P(not(X1⊙X2⊙…⊙Xn))=1−∏i∈Lρi∏i∈Kδi P(not(X1⊕X2⊕…⊕Xn))=∏i∈Ldi∏i∈Kpi P(X1⊗X2⊗…⊗Xn)=∏i∈O1pi∏i∈O2di∏i∈E1δi∏i∈E2ρi+∏i∈E1pi∏i∈E2di∏i∈O1δi∏i∈O2ρi P(not(X1⊗X2⊗…⊗Xn))=∏i∈Kpi∏i∈Ldi+∏i∈Kδi∏i∈Lρi P(X1⊎X2⊎…⊎Xn)=∑U∈U(∏i∈U∩​Kpi∏i∈U∩​Ldi)(∏i∈U¯∩​Kδi∏i∈U¯∩​Lρi) There are four common conditions of U: |U|=α, |U|≥α, |U|≤β, and α≤|U|≤β. Note that U¯ is the complement of U, U¯={1,2,…,n}\U The largest cardinality of U is: |U|=2n

#### Table 6.

Bi-inferences for AND-gate, OR-gate, NAND-gate, NOR-gate, XOR-gate, XNOR-gate, and U-gate.

 Given X-D network is combination of diagnostic relationship and X-gate inference: P(Y=1|X1,X2,…,Xn)=P(X1xX2x…xXn) P(D|Y)={DSif Y=1MS−DSif Y=0 The diagnostic condition of X-D network is satisfied if and only if s(Ω)=∑aP(Y=1|a(Ω))=2|Ω|−1,∀Ω≠∅ At that time, the transformation coefficient becomes: k=N2 Note that weights pi = wi and ρi = ωi, which are inputs of s(Ω), are abstract variables. Thus, the equality s(Ω) = 2|Ω|−1 implies all abstract variables are removed and so s(Ω) does not depend on weights.

#### Table 7.

Diagnostic theorem.

The diagnostic theorem is the optimal way to validate the diagnostic condition.

The Eq. (30) becomes simple with AND-gate inference. Recall that Eq. (14) specified AND-gate inference as follows

P(X1X2Xn)=P(Y=1|X1,X2,,Xn)={i=1n pi if all Xi(s) are 10 if there exists at least one Xi=0

Due to only one case X1 = X2 =…= Xn = 1, we have

s(Ω)=s(Ω:{Xi=1})=i=1n pi

Due to Xi = 0, we have

s(Ω:{Xi=0})=0

Derived from Eq. (30), Eq. (31) specifies conditional probability P(D|Xi), posterior probability P(Xi|D), and transformation coefficient according to X-D network with AND-gate reference called AND-D network.

 P(D|Xi=1)=(2D−M)∏i=1npi+2n−1(M−D)2n−1SP(D|Xi=0)=M−DSP(Xi=1|D)=(2D−M)∏i=1npi+2n−1(M−D)(2D−M)∏i=1npi+2n(M−D)P(Xi=0|D)=2n−1(M−D)(2D−M)∏i=1npi+2n(M−D)k=2n−1S(2D−M)∏i=1npi+2n(M−D) (31)

For convenience, we validate diagnostic condition with a case of two sources Ω = {X1, X2}, p1 = p2 = w1 = w2 = 0.5, D{0,1,2,3} . According to diagnostic theorem stated in Table 7, if s(Ω) ≠ 2 for given X-gate then, such X-gate does not satisfy diagnostic condition.

Given AND-gate inference, by applying Eq. (14), we have

s(Ω)=(0.5*0.5)+0+0+0=0.25

Given OR-gate inference, by applying Eq. (16), we have

s(Ω)=(10.5*0.5)+(10.5)+(10.5)+0=33*0.5*0.5=1.75

Given XOR-gate inference, by applying Eq. (19), we have

s(Ω)=(0.5*0.5+0.5*0.5)+0.5+0.5+0=1.5

Given XNOR-gate inference, by applying Eq. (21), we have

s(Ω)=(0.5*0.5+0.5*0.5)+0.5+0.5+1=2.5

Given SIGMA-gate inference, by applying Eq. (23), we have

s(Ω)=(0.5+0.5)+0.5+0.5+0=2

It is asserted that AND-gate, OR-gate, XOR-gate, and XNOR-gate do not satisfy diagnostic condition and so they should not be used to assess hypotheses. However, it is not asserted if U-gate and SIGMA-gate satisfy such diagnostic condition. It is necessary to expend equation for SIGMA-gate diagnostic network (called SIGMA-D network) in order to validate it.

In case of SIGMA-gate inference, by applying Eq. (23), we have

iwi=1
s(Ω)=2n1iwi=2n1
s(Ω:{Xi=1})=2n1wi+2n2jiwj=2n1wi+2n2(1wi)=2n2(1+wi)
s(Ω:{Xi=0})=s(Ω)s(Ω:{Xi=1})=2n2(1wi)

It is necessary to validate SIGMA-D network with SIGMA-gate bi-inference. By applying Eq. (26), we recalculate these quantities as follows

s(Ω)=2n1iwi+2n1idi=2n1i(wi+di)=2n1
(due toi(wi+di)=1)
s(Ω:{Xi=1})=2n1wi+2n2jiwj+2n2idi=2n2wi+2n2i(wi+di)=2n2(1+wi)
s(Ω:{Xi=0})=s(Ω)s(Ω:{Xi=1})=2n2(1wi)

Obviously, quantities s(Ω), s(Ω:{Xi=1}), and s(Ω:{Xi = 0}) are kept intact. According to diagnostic theorem, we conclude that SIGMA-D network does satisfy diagnostic condition due to s(Ω)=2n−1. Thus, SIGMA-D network can be used to assess hypotheses.

Eq. (32), an immediate consequence of Eq. (30), specifies conditional probability P(D|Xi), posterior probability P(Xi|D), and transformation coefficient for SIGMA-D network.

 P(D|Xi=1)=(2D−M)wi+M2SP(D|Xi=0)=(M−2D)wi+M2SP(Xi=1|D)=(2D−M)wi+M2MP(Xi=0|D)=(M−2D)wi+M2Mk=N2 (32)

In case of SIGMA-gate, the augmented variable Y can be removed from X-D network. The evidence D is now established as direct target variable. Figure 6 shows a so-called direct SIGMA-gate diagnostic network (direct SIGMA-D network).

#### Figure 6.

Direct SIGMA-gate diagnostic network (direct SIGMA-D network).

Derived from Eq. (23), the CPT of direct SIGMA-D network is determined by Eq. (33).

P(D|X1,X2,,Xn)=iKDSwi+jLMDSwj

where the set of Xi (s) is complete and mutually exclusive.

 ∑i=1nwi=1Xi∩​Xj=∅,∀i≠j (33)

Eq. (33) specifies valid CPT due to

DP(D|X1,X2,,Xn)=1SiKwiDD+1SjLwjD(MD)=1SiKSwi+1SjLwj(NMS)=1SiKSwi+1SjLSwj=i=1nwi=1

From dependencies shown in Figure 6, Eq. (34) specifies the joint probability of direct SIGMA-D network.

 P(X1,X2,…,Xn,Y,D)=P(D|X1,X2,…,Xn)∏ni=1P(Xi) (34)

Inferred from Eq. (29), Eq. (35) specifies the joint probability P(Xi, D) and the marginal probability P(D) of direct SIGMA-D network, given uniform distribution of all sources.

 P(Xi,D)=12ns(Ω:{Xi})P(D)=12ns(Ω) (35)

where s(Ω) and s(Ω:{Xi}) are specified in Table 2.

By browsing all variables of direct SIGMA-D network, we have

s(Ω:{Xi=1})=2n1DSwi+2n2jiDSwj+2n2jiMDSwj=2n2S(2Dwi+Mjiwj)=2n2S(2Dwi+M(1wi))(Due to i=1nwi=1)=2n2S((2DM)wi+M)

Similarly, we have

s(Ω:{Xi=0})=2n1MDSwi+2n2jiMDSwj+2n2jiDSwj=2n2S((M2D)wi+M)
s(Ω)=2n1iDSwi+2n1iMDSwi=2n1MS

By applying Eq. (35), s(Ω:{Xi = 0}), s(Ω:{Xi = 1}), and s(Ω), we get the same result with Eq. (32).

P(D|Xi=1)=(2DM)wi+M2S
P(D|Xi=0)=(M2D)wi+M2S
P(Xi=1|D)=(2DM)wi+M2M
P(Xi=0|D)=(M2D)wi+M2M
k=N2

Therefore, it is possible to use direct SIGMA-D network to assess hypotheses. It is asserted that SIGMA-D network satisfy diagnostic condition when single relationship, NOT-D network, direct SIGMA-D network are specific cases of SIGMA-D network. There is a question: does an X-D network that is different from SIGMA-D network and not aforementioned exist such that it satisfies diagnostic condition?

Recall that each X-D network is a pattern owning a particular X-gate inference which in turn is based on particular X-gate condition (s) relevant to only variables Ais. The most general nonlinear X-D network is U-D network whereas SIGMA-D network is linear one. The U-gate inference given arbitrary condition on U is

P(X1X2Xn)=UU(iUKpiiUL(1ρi))(iU¯K(1pi)iU¯Lρi)

Let f be the arrangement sum of U-gate inference.

f(pi,ρi)=a(Ω) UU(iUKpiiUL(1ρi))(iU¯K(1pi)iU¯Lρi)

The function f is sum of many large expressions and each expression is product of four possible sub-products (Π) as follows

Expr=iUKpiiUL(1ρi)iU¯K(1pi)iU¯Lρi

In any case of degradation, there always exist expression Expr (s) having at least 2 sub-products (Π), for example,

Expr=iUKpiiUL(1ρi)

Consequently, there always exist Expr (s) having at least 5 terms relevant to pi and ρi if n ≥ 5, for example,

Expr=p1p2p3(1ρ4)(1ρ5)

Thus, degree of f will be larger than or equal to 5 given n ≥ 5. According to diagnostic theorem, U-gate network satisfies diagnostic condition if and only if f(pi, ρi) = 2n−1 for all n ≥ 1 and for all abstract variables pi and ρi. Without loss of generality, each pi or ρi is sum of variable x and a variable ai or bi, respectively. Note that all pi, ρi, ai are bi are abstract variables.

pi=x+ai
ρi=x+bi

The equation f−2n−1 = 0 becomes equation g(x) = 0 whose degree is m ≥ 5 if n ≥ 5.

ɡ(x)=±xm+C1xm1++Cm1x+Cm2n1=0

where coefficients Ci s are functions of ai and bis. According to Abel-Ruffini theorem [11], equation g(x) = 0 has no algebraic solution when m ≥ 5. Thus, abstract variables pi and ρi cannot be eliminated entirely from g(x) = 0, which causes that there is no specification of U-gate inference P(X1xX2x…xXn) so that diagnostic condition is satisfied.

It is concluded that there is no nonlinear X-D network satisfying diagnostic condition, but a new question is raised: does there exist the general linear X-D network satisfying diagnostic condition? Such linear network is called GL-D network and SIGMA-D network is specific case of GL-D network. The GL-gate probability must be linear combination of weights.

P(X1xX2xxXn)=C+i=1nαiwi+i=1nβidi

where C is arbitrary constant.

The GL-gate inference is singular if αi and βi are functions of only Xi as follows

P(X1xX2xxXn)=C+i=1nhi(Xi)wi+i=1nɡi(Xi)di

The functions hi and gi are not relevant to Ai because the final equation of GL-gate inference is only relevant to Xi (s) and weights (s). Because GL-D network is a pattern, we only survey singular GL-gate. Mentioned GL-gate is singular by default and it is dependent on how to define functions hi and gi. The arrangement sum with regard to GL-gate is

s(Ω)=a(C+i=1nhi(Xi)wi+i=1nɡi(Xi)di)=2nC+2n1i=1n(hi(Xi=1)+hi(Xi=0))wi+2n1i=1n(ɡi(Xi=1)+ɡi(Xi=0))di

Suppose hi and gi are probability mass functions with regard to Xi. For all i, we have

0hi(Xi)1
0ɡi(Xi)1
hi(Xi=1)+hi(Xi=0)=1
ɡi(Xi=1)+gi(Xi=0)=1

The arrangement sum becomes

s(Ω)=2nC+2n1i=1n(wi+di)

GL-D network satisfies diagnostic condition if

s(Ω)=2nC+2n1i=1n(wi+di)=2n12C+i=1n(wi+di)=1

Suppose the set of Xis is complete.

i=1n(wi+di)=1

This implies C = 0. Shortly, Eq. (36) specifies the singular GL-gate inference so that GL-D network satisfies diagnostic condition.

 P(X1xX2x…xXn)=∑i=1nhi(Xi)wi+∑i=1nɡi(Xi)diwhere hi and ɡi are probability mass functions and the set of Xi(s) is complete.∑i=1nWi=1 (36)

Functions hi(Xi) and gi(Xi) are always linear due to Xim = Xi for all m ≥ 1 when Xi is binary. It is easy to infer that SIGMA-D network is GL-D network with following definition of functions hi and gi.

hi(Xi)=1ɡi(Xi)=Xi,i

According to Millán and Pérez-de-la-Cruz [4], a hypothesis can have multiple evidences as seen in Figure 7. This is multi-evidence diagnostic relationship opposite to aforementioned multihypothesis diagnostic relationship.

#### Figure 7.

Diagnostic relationship with multiple evidences (M-E-D network).

#### Figure 8.

M-HE-D network.

Figure 7 depicts the multi-evidence diagnostic network called M-E-D network in which there are m evidences D1, D2,…, Dm and one hypothesis Y. Note that Y has uniform distribution.

In simplest case where all evidences are binary, the joint probability of M-E-D network is

P(Y,D1,D2,,Dm)=P(Y)j=1mP(Dj|Y)=P(Y)P(D1,D2,,Dm|Y)

The product j=1mP(Dj|Y) is denoted as likelihood function as follows

P(D1,D2,,Dm|Y)=j=1mP(Dj|Y)

The posterior probability P(Y | D1, D2,…, Dm) given uniform distribution of Y is

P(Y|D1,D2,,Dm)=P(Y,D1,D2,,Dm)P(Y=1,D1,D2,,Dm)+P(Y=0,D1,D2,,Dm)=1j=1mP(Dj|Y=1)+j=1mP(Dj|Y=0)*P(D1,D2,,Dm|Y)

The possible transformation coefficient is

1k=j=1mP(Dj|Y=1)+j=1mP(Dj|Y=0)

M-E-D network will satisfy diagnostic condition if k = 1 because all hypotheses and evidence are binary, which leads that following equation specified by Eq. (37) has 2m real roots P(Dj|Y) for all m ≥ 2.

 ∏j=1mP(Dj|Y=1)+∏j=1mP(Dj|Y=0)=1 (37)

Eq. (37) has no real root given m = 2 according to following proof. Suppose Eq. (37) has 4 real roots as follows

a1=P(D1=1|Y=1)
a2=P(D2=1|Y=1)
b1=P(D1=1|Y=0)
b2=P(D2=1|Y=0)

From Eq. (37), it holds

{a1a2+b1b2=1a1(1a2)+b1b2=1(1a1)a2+b1b2=1a1a2+b1(1b2)=1a1a2+(1b1)b2=1{a1=a2b1=b2a12+b12=1a1+2b12=2b1+2a12=2{a1=a2=0b1=b2a12+b12=1b1=2or {a1=a2=0.5b1=b2a12+b12=1b1=1.5

The final equation leads a contradiction (b1 = 2 or b1 = 1.5) and so it is impossible to apply the sufficient diagnostic proposition into M-E-D network. Such proposition is only used for one-evidence network. Moreover, X-gate inference absorbs many sources and then produces out of one targeted result whereas the M-E-D network essentially splits one source into many results. It is impossible to model M-E-D network by X-gates. The potential solution for this problem is to group many evidences D1, D2,…, Dm into one representative evidence D which in turn is dependent on hypothesis Y but this solution will be inaccurate in specifying conditional probabilities because directions of dependencies become inconsistent (relationships from Dj to D and from Y to D) except that all Djs are removed and D becomes a vector. However, evidence vector does not simplify the hazardous problem and it changes the current problem into a new problem.

Another solution is to reverse the direction of relationship, in which the hypothesis is dependent on evidences so as to take advantages of X-gate inference as usual. However, the reversion method violates the viewpoint in this research where diagnostic relationship must be from hypothesis to evidence. In other words, we should change the viewpoint.

Another solution is based on a so-called partial diagnostic condition that is a loose case of diagnostic condition for M-E-D network, which is defined as follows

P(Y|Dj)=kP(Dj|Y)

where k is constant with regard to Dj. The joint probability is

P(Y,D1,D2,,Dm)=P(Y)j=1mP(Dj|Y)

M-E-D network satisfies partial diagnostic condition. In fact, given all variables are binary, we have

P(Y|Dj)=Ψ\{Y,Dj}P(Y,D1,D2,,Dm)Ψ\{Dj}P(Y,D1,D2,,Dm)

(Let Ψ = {D1, D2,…, Dm})

=P(Dj|Y)k=1,kjm(DkP(Dk|Y))k=1,kjm(DkP(Dk|Y=1))+k=1,kjm(DkP(Dk|Y=0))

(Due to uniform distribution of Y)

=P(Dj|Y)k=1,kjm1k=1,kjm1+k=1,kjm1=12P(Dj|Y)
(Due to DkP(Dk|Y)=P(Dk=0|Y)+P(Dk=1|Y)=1)

Partial diagnostic condition expresses a different viewpoint. It is not an optimal solution because we cannot test a disease based on only one symptom while ignoring other obvious symptoms, for example. The equality P(Y|Dj) = 0.5P(Dj|Y) indicates the accuracy is decreased two times. However, Bayesian network provides inference mechanism based on personal belief. It is subjective. You can use partial diagnostic condition if you think that such condition is appropriate to your application.

If we are successful in specifying conditional probabilities of M-E-D network, it is possible to define an extended network which is constituted of n hypotheses X1, X2,…, Xn and m evidences D1, D2,…, Dm. Such extended network represents multi-hypothesis multi-evidence diagnostic relationship, called M-HE-D network. Figure 8 depicts M-HE-D network.

The M-HE-D network is the most general case of diagnostic network, which was mentioned in Ref. ([4], p. 297). We can construct any large diagnostic BN from M-HE-D networks and so the research is still open.

## 5. Conclusion

In short, relationship conversion is to determine conditional probabilities based on logic gates that are adhered to semantics of relationships. The weak point of logic gates is to require that all variables must be binary. For example, in learning context, it is inconvenient for expert to create an assessment BN with studying exercises (evidences) whose marks are only 0 and 1. In order to lessen the impact of such weak point, the numeric evidence is used for extending capacity of simple Bayesian network. However, combination of binary hypothesis and numeric evidence leads to errors or biases in inference. For example, given a student gets maximum grade for an exercise but the built-in inference results out that she/he has not mastered fully the associated learning concept (hypothesis). Therefore, I propose the sufficient diagnostic proposition so as to confirm that numeric evidence is adequate to make complicated inference tasks in BN. The probabilistic reasoning based on evidence is always accurate. Application of the research can go beyond learning context whenever probabilistic deduction relevant to constraints of semantic relationships is required. A large BN can be constituted of many simple BN (s). Inference in large BN is hazardous problem and there are many optimal algorithms for solving such problem. In future, I will research effective inference methods for the special BN that is constituted of X-gate BN (s) mentioned in this research because X-gate BN (s) have precise and useful features of which we should take advantages. For instance, their CPT (s) are simple in some cases and the meanings of their relationships are mandatory in many applications. Moreover, I try my best to research deeply M-E-D network and M-HE-D network whose problems I cannot solve absolutely now.

Two main documents that I referred to do this research are the book “Learning Bayesian Networks” [2] by the author Richard E. Neapolitan and the article “A Bayesian Diagnostic Algorithm for Student Modeling and its Evaluation” [4] by authors Eva Millán and José Luis Pérez-de-la-Cruz. Especially, the SIGMA-gate inference is based on and derived from the work of the Eva Millán and José Luis Pérez-de-la-Cruz. This research is originated from my PhD research “A User Modeling System for Adaptive Learning” [12]. Other references relevant to user modeling, overlay model, and Bayesian network are [1316]. Please concern these references.

## Appendices

### Appendices

A1. Following is the proof of Eq. (9)

P(Ai=ON|Xi)=P(Ai=ON|Xi,Ii=ON)P(Ii=ON)+P(Ai=ON|Xi,Ii=OFF)P(Ii=OFF)=0*(1pi)+P(Ai=ON|Xi,Ii=OFF)pi(By applying Eq. (8))=piP(Ai=ON|Xi,Ii=OFF)

It implies

P(Ai=ON|Xi=1)=piP(Ai=ON|Xi=1,Ii=OFF)=pi
P(Ai=ON|Xi=0)=piP(Ai=ON|Xi=0,Ii=OFF)=0
P(Ai=OFF|Xi=1)=1P(Ai=ON|Xi=1)=1pi
P(Ai=OFF|Xi=0)=1P(Ai=ON|Xi=0)=1

A2. Following is the proof of Eq. (10)

P(Y|X1,X2,,Xn)=P(Y,X1,X2,,Xn)P(X1,X2,,Xn)(Due to Bayes’ rule)=A1,A2,,AnP(Y,X1,X2,,Xn|A1,A2,,An)*P(A1,A2,,An)P(X1,X2,,Xn)(Due to total probability rule)=A1,A2,,AnP(Y,X1,X2,,Xn|A1,A2,,An)*P(A1,A2,,An)P(X1,X2,,Xn)=A1,A2,,AnP(Y|A1,A2,,An)*P(X1,X2,,Xn|A1,A2,,An)*P(A1,A2,,An)P(X1,X2,,Xn)

(Because Y is conditionally independent from Xis given Ais)

=A1,A2,,AnP(Y|A1,A2,,An)*P(X1,X2,,Xn,A1,A2,,An)P(X1,X2,,Xn)=A1,A2,,AnP(Y|A1,A2,,An)*P(A1,A2,,An|X1,X2,,Xn)(Due to Bayes’ rule)=A1,A2,,AnP(Y|A1,A2,,An)i=1nP(Ai|X1,X2,,Xn)

(Because Ais are mutually independent)

=A1,A2,,AnP(Y|A1,A2,,An)i=1nP(Ai|Xi)

(Because each Ai is only dependent on Xi) ■

A3. Following is the proof that the augmented X-D network (shown in Figure 5) is equivalent to X-D network (shown in shown in Figures 2 and 3) with regard to variables X1, X2,…, Xn, and D.

The joint probability of augmented X-D network shown in Figure 5 is

P(X1,X2,,Xn,Y,D)=P(D|Y)P(Y|X1,X2,,Xn)i=1nP(Xi)

The joint probability of X-D network is

P(X1,X2,,Xn,D)=P(D|X1,X2,,Xn)i=1nP(Xi)

By applying total probability rule into X-D network, we have

P(X1,X2,,Xn,D)=P(D,X1,X2,,Xn)P(X1,X2,,Xn)i=1nP(Xi)(Due to Bayes’ rule)=YP(D,X1,X2,,Xn|Y)P(Y)P(X1,X2,,Xn)i=1nP(Xi)(Due to total probability rule)=YP(D,X1,X2,,Xn|Y)P(Y)P(X1,X2,,Xn)i=1nP(Xi)=(YP(D,X1,X2,,Xn|Y)*P(Y)P(X1,X2,,Xn))*i=1nP(Xi)=(YP(D|Y)*P(X1,X2,,Xn|Y)P(Y)P(X1,X2,,Xn))*i=1nP(Xi)

(Because D is conditionally independent from all Xi (s) given Y)

=(YP(D|Y)*P(Y,X1,X2,,Xn)P(X1,X2,,Xn))*i=1nP(Xi)=YP(D|Y)P(Y|X1,X2,,Xn)i=1nP(Xi)(Due to Bayes’ rule)=YP(X1,X2,,Xn,Y,D)

A4. Following is the proof of Eq. (29)

Given uniform distribution of Xi (s), we have

P(X1)=P(X2)==P(Xn)=12

The joint probability becomes

P(Ω,Y,D)=12nP(Y|X1,X2,,Xn)P(D|Y)

The joint probability of Xi and D is

P(Xi,D)={Ω,Y,D}\{Xi,D}P(Ω,Y,D)=P(X1=1,X2=1,,Xi,,Xn1=1,Xn=1,Y=1,D)+P(X1=1,X2=1,,Xi,,Xn1=1,Xn=0,Y=1,D)++P(X1=0,X2=0,,Xi,,Xn1=0,Xn=1,Y=1,D)+P(X1=0,X2=0,,Xi,,Xn1=0,Xn=0,Y=1,D)+P(X1=1,X2=1,,Xi,,Xn1=1,Xn=1,Y=0,D)+P(X1=1,X2=1,,Xi,,Xn1=1,Xn=0,Y=0,D)++P(X1=0,X2=0,,Xi,,Xn1=0,Xn=1,Y=0,D)+P(X1=0,X2=0,,Xi,,Xn1=0,Xn=0,Y=0,D)=12nDS(P(Y=1|X1=1,X2=1,,Xi,,Xn1=1,Xn=1)+P(Y=1|X1=1,X2=1,,Xi,,Xn1=1,Xn=0)++P(Y=1|X1=1,X2=1,,Xi,,Xn1=0,Xn=1)+P(Y=1|X1=1,X2=1,,Xi,,Xn1=0,Xn=0))+12nMDS(P(Y=0|X1=1,X2=1,,Xi,,Xn1=1,Xn=1)+P(Y=0|X1=1,X2=1,,Xi,,Xn1=1,Xn=0)++P(Y=0|X1=1,X2=1,,Xi,,Xn1=0,Xn=1)+P(Y=0|X1=1,X2=1,,Xi,,Xn1=0,Xn=0))

(Due to Eq. (6))

The marginal probability of D is

P(D)={Ω,Y,D}\{D}P(Ω,Y,D)=P(X1=1,X2=1,,Xn=1,Y=1,D)+P(X1=1,X2=1,,Xn=0,Y=1,D)++P(X1=0,X2=0,,Xn=1,Y=1,D)+P(X1=0,X2=0,,Xn=0,Y=1,D)+P(X1=1,X2=1,,Xn=1,Y=0,D)=12nDS(P(Y=1|X1=1,X2=1,,Xn=1)+P(Y=1|X1=1,X2=1,,Xn=0)++P(Y=1|X1=1,X2=1,,Xn=1)+P(Y=1|X1=1,X2=1,,Xn=0))+12nMDS(P(Y=0|X1=1,X2=1,,Xn=1)+P(Y=0|X1=1,X2=1,,Xn=0)++P(Y=0|X1=1,X2=1,,Xn=1)+P(Y=0|X1=1,X2=1,,Xn=0))+P(X1=1,X2=1,,Xn=0,Y=0,D)+

By applying Table 2, the joint probability P(Xi, D) is determined as follows

P(Xi,D)=12nS(DaP(Y=1|a(Ω:{Xi}))+(MD)aP(Y=0|a(Ω:{Xi})))=12nS(DaP(Y=1|a(Ω:{Xi}))+(MD)a(1P(Y=1|a(Ω:{Xi}))))=12nS((2DM)s(Ω:{Xi})+2n1(MD))

Similarly, the marginal probability P(D) is

P(D)=12nS((2DM)s(Ω)+2n(MD))

## References

1 - Wikipedia. Logic gate. Wikimedia Foundation [Internet]. 2016. [Online]. Available from: https://en.wikipedia.org/wiki/Logic_gate [Accessed June 4, 2016]
2 - Neapolitan RE. Learning Bayesian Networks. Upper Saddle River, New Jersey: Prentice Hall; 2003. p. 674
3 - Díez FJ, Druzdzel MJ. Canonical Probabilistic Models. Madrid: Research Centre on Intelligent Decision-Support Systems; 2007
4 - Millán E, Pérez-de-la-Cruz JL. A bayesian diagnostic algorithm for student modeling and its evaluation. User Modeling and User-Adapted Interaction. 2002;12(2-3):281–330
5 - Wikipedia. Factor graph. Wikimedia Foundation [Internet]. 2015. [Online]. Available from: https://en.wikipedia.org/wiki/Factor_graph [Accessed: February 8, 2017]
6 - Kschischang FR, Frey BJ, Loeliger HA. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory. 2001;47(2):498–519
7 - Pearl J. Fusion, propagation, and structuring in belief networks. Artificial Intelligence. 1986;29(3):241–288
8 - Millán E, Loboda T, Pérez-de-la-Cruz JL. Bayesian networks for student model engineering. Computers & Education. 2010:55(4):1663–1683
9 - Nguyen L. Theorem of SIGMA-gate inference in Bayesian network. Wulfenia Journal. 2016;23(3):280–289
10 - Wikipedia. Set (mathematics), Wikimedia Foundation [Internet]. 2014. [Online]. Available from: http://en.wikipedia.org/wiki/Set_(mathematics) [Accessed: October 11, 2014]
11 - Wikipedia. Abel-Ruffini theorem. Wikimedia Foundation [Internet]. 2016. [Online]. Available from: https://en.wikipedia.org/wiki/Abel%E2%80%93Ruffini_theorem [Accessed: June 26, 2016]
12 - Nguyen L. A User Modeling System for Adaptive Learning. Abuja, Nigeria: Standard Research Journals; 2014
13 - Fröschl C. User modeling and user profiling in adaptive E-learning systems [master thesis]. Graz, Austria: Graz University of Technology; 2005
14 - De Bra P, Smits D, Stash N. The Design of AHA!. In Proceedings of the Seventeenth ACM Hypertext Conference on Hypertext and hypermedia (Hypertext ’06); 22-25 August 2006; Odense, Denmark. New York, NY: ACM; 2006. pp. 133–134
15 - Murphy KP. A Brief Introduction to Graphical Models and Bayesian Networks. University of British Columbia; 1998. [Online]. Available from: http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html [Accessed: 2008]
16 - Heckerman D. A Tutorial on Learning With Bayesian Networks. Redmond: Microsoft Research; 1995