Open access peer-reviewed chapter

Converting Graphic Relationships into Conditional Probabilities in Bayesian Network

By Loc Nguyen

Submitted: December 6th 2016; Reviewed: June 8th 2017; Published: November 2nd 2017

DOI: 10.5772/intechopen.70057


Abstract

A Bayesian network (BN) is a powerful mathematical tool for prediction and diagnosis applications. A large Bayesian network can be composed of many simple networks, which in turn are constructed from simple graphs. A simple graph consists of one child node and many parent nodes. The strength of each relationship between the child node and a parent node is quantified by a weight, and all relationships share the same semantics, such as prerequisite, diagnostic, or aggregation. This research focuses on converting graphic relationships into conditional probabilities in order to construct a simple Bayesian network from a graph. The diagnostic relationship is the main research object, for which a sufficient diagnostic proposition is proposed to validate the diagnostic relationship. Relationship conversion adheres to logic gates such as AND, OR, and XOR, which are essential features of the research.

Keywords

  • diagnostic relationship
  • Bayesian network
  • transformation coefficient

1. Introduction

A Bayesian network (BN) is a directed acyclic graph (DAG) consisting of a set of nodes and a set of arcs. Each node is a random variable. Each arc represents a relationship between two nodes. The strength of a relationship in a graph can be quantified by a number called a weight. There are some important relationships such as prerequisite, diagnostic, and aggregation. The difference between a BN and a normal graph is that the strength of every relationship in a BN is represented by a conditional probability table (CPT) whose entries are conditional probabilities of a child node given its parent nodes. There are two main approaches to constructing a BN:

  • The first approach aims to learn the BN from training data using machine learning algorithms.

  • The second approach is that experts define some graph patterns according to specific relationships and then, BN is constructed based on such patterns along with determined CPTs.

This research focuses on the second approach, in which relationships are converted into CPTs. Essentially, relationship conversion aims to determine conditional probabilities based on the weights and meanings of relationships. Different relationships require different ways of converting graphic weights into CPTs. It is impossible to convert all relationships, but some of them, such as diagnostic, aggregation, and prerequisite, are mandatory ones that we must specify as computable CPTs of a BN. In particular, these relationships adhere to logic X-gates [1] such as AND-gate, OR-gate, and SIGMA-gate. The X-gate inference in this research is derived from and inspired by the noisy OR-gate described in the book “Learning Bayesian Networks” by Neapolitan ([2], pp. 157–159). Díez and Druzdzel [3] also researched OR/MAX, AND/MIN, and noisy XOR inferences, but they focused on canonical models, deterministic models, and ICI models, whereas I focus on logic gates and graphic relationships. Their research therefore differs from mine, although we share one result, the AND-gate model. In general, my research focuses on applied probability adhering to Bayesian networks, logic gates, and Bayesian user modeling [4]. The scientific results are shared with Millán and Pérez-de-la-Cruz [4].

A factor graph [5] represents the factorization of a global function into many partial functions. If the joint distribution of a BN is considered the global function and the CPTs are considered partial functions, the sum-product algorithm [6] of factor graphs can be applied to calculate posterior probabilities of variables in the BN. Pearl’s propagation algorithm [7] is very successful in BN inference. Factor graphs can only be applied to a BN once all CPTs of the BN are determined, whereas this research focuses on defining such CPTs in the first place. I did not use factor graphs for constructing the BN. The concept “X-gate inference” only concerns how to convert a simple graph into a BN. However, the arrangement sum with a fixed variable mentioned in this research is the “not-sum” ([6], p. 499) of factor graphs. Essentially, the X-gate probability shown in Eq. (10) is the same as the λ message in Pearl’s algorithm ([6], p. 518), but I use the most basic way to prove the X-gate probability.

As default, the research is applied in learning context in which BN is used to assess students’ knowledge. Evidences are tests, exams, exercises, etc. and hypotheses are learning concepts, knowledge items, etc. Note that diagnostic relationship is very important to Bayesian evaluation in learning context because it is used to evaluate student’s mastery of concepts (knowledge items) over entire BN. Now, we start relationship conversion with a research on diagnostic relationship in the next section.

2. Diagnostic relationship

In my view, the diagnostic relationship should go from hypothesis to evidence. For example, a disease is a hypothesis and a symptom is evidence. The symptom must be conditionally dependent on the disease. Given a symptom, calculating the posterior probability of the disease is essentially diagnosing the likelihood of that disease ([8], p. 1666). Conversely, an arc from evidence to hypothesis implies prediction, where evidence and hypothesis represent observation and event, respectively. Given an observation, calculating the posterior probability of the event is essentially predicting/asserting that event ([8], p. 1666). Figure 1 shows diagnosis and prediction.

Figure 1.

Diagnosis and prediction with hypothesis X and evidence D.

The weight w of the relationship between X and D is 1. Figure 1 depicts simplest graph with two random variables. We need to convert diagnostic relationship into conditional probabilities in order to construct a simplest BN from the simplest graph. Note that hypothesis is binary but evidence can be numerical. In learning context, evidence D can be test, exam, exercise, etc. The conditional probability of D given X (likelihood function) is P(D|X). The posterior probability of X is P(X|D), which is used to evaluate student’s mastery over concept (hypothesis) X given evidence D. Eq. (1) specifies CPT of D when D is binary (0 and 1)

$$
P(D \mid X) = \begin{cases} D & \text{if } X = 1 \\ 1 - D & \text{if } X = 0 \end{cases} \tag{1}
$$

Eq. (1) is our first relationship conversion. It implies

$$
P(D \mid X=0) + P(D \mid X=1) = D + 1 - D = 1
$$

Evidence D can be used to diagnose hypothesis X if the so-called sufficient diagnostic proposition is satisfied, as seen in Table 1.

D is equivalent to X in diagnostic relationship if P(X|D) = kP(D|X) given uniform distribution of X and the transformation coefficient k is independent from D. In other words, k is constant with regards to D and so D is called sufficient evidence.

Table 1.

Sufficient diagnostic proposition.

The concept of sufficient evidence is borrowed from the concept of sufficient statistics and it is inspired from equivalence of variables T and T’ in the research ([4], pp. 292-295). The proposition can be restated that evidence D is only used to assess hypotheses if it is sufficient evidence. As a convention, the proposition is called diagnostic condition and hypotheses have uniform distribution. The assumption of hypothetic uniform distribution (P(X = 1) = P(X = 0)) implies that we cannot assert whether or not given hypothesis is true before we observe its evidence.

In learning context, D can be totally used to assess student’s mastery of X if diagnostic condition is satisfied. Derived from such condition, Eq. (2) specifies transformation coefficient k given uniform distribution of X.

$$
k = \frac{P(X \mid D)}{P(D \mid X)} \tag{2}
$$

We need to prove that Eq. (1) satisfies diagnostic condition. Suppose the prior probability of X is uniform.

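$$
P(X=0) = P(X=1)
$$

We have

$$
\begin{aligned}
P(X \mid D) &= \frac{P(D \mid X)P(X)}{P(D)} = \frac{P(D \mid X)P(X)}{P(D \mid X=0)P(X=0) + P(D \mid X=1)P(X=1)} && \text{(due to Bayes' rule)} \\
&= \frac{P(D \mid X)P(X)}{P(X)\big(P(D \mid X=0) + P(D \mid X=1)\big)} && \text{(due to } P(X=0) = P(X=1)\text{)} \\
&= \frac{P(D \mid X)}{P(D \mid X=0) + P(D \mid X=1)} = 1 \cdot P(D \mid X) && \text{(due to } P(D \mid X=0) + P(D \mid X=1) = 1\text{)}
\end{aligned}
$$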

It is easy to infer that the transformation coefficient k is 1, if D is binary. In practice, evidence D is often a test whose grade ranges within an interval {0, 1, 2,…, η}. Eq. (3) specifies CPT of D in this case

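$$
P(D \mid X) = \begin{cases} \dfrac{D}{S} & \text{if } X = 1 \\[2mm] \dfrac{\eta}{S} - \dfrac{D}{S} & \text{if } X = 0 \end{cases} \tag{3}
$$

where

$$
D \in \{0, 1, 2, \ldots, \eta\}, \qquad S = \sum_{D=0}^{\eta} D = \frac{\eta(\eta+1)}{2}
$$

As a convention, P(D|X) = 0, ∀D ∉ {0, 1, 2,…, η}. Eq. (3) implies that if the student has mastered the concept (X = 1), the probability that she/he completes the exercise/test D is proportional to her/his mark on D, namely $P(D \mid X) = D/S$. We also have

$$
P(D \mid X=0) + P(D \mid X=1) = \frac{D}{S} + \frac{\eta - D}{S} = \frac{\eta}{S} = \frac{2}{\eta + 1}
$$

$$
\sum_{D=0}^{\eta} P(D \mid X=1) = \sum_{D=0}^{\eta} \frac{D}{S} = \frac{\sum_{D=0}^{\eta} D}{S} = \frac{S}{S} = 1
$$

$$
\sum_{D=0}^{\eta} P(D \mid X=0) = \sum_{D=0}^{\eta} \frac{\eta - D}{S} = \frac{\sum_{D=0}^{\eta} (\eta - D)}{S} = \frac{\sum_{D=0}^{\eta} \eta - \sum_{D=0}^{\eta} D}{S} = \frac{\eta(\eta+1) - S}{S} = \frac{2S - S}{S} = 1
$$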

We need to prove that Eq. (3) satisfies diagnostic condition. Suppose the prior probability of X is uniform.

$$
P(X=0) = P(X=1)
$$

The assumption of prior uniform distribution of X implies that we do not determine if student has mastered X yet. Similarly, we have

$$
P(X \mid D) = \frac{P(D \mid X)P(X)}{P(D)} = \frac{P(D \mid X)}{P(D \mid X=0) + P(D \mid X=1)} = \frac{\eta + 1}{2} P(D \mid X)
$$

So, the transformation coefficient k is $(\eta + 1)/2$ if D ranges in {0, 1, 2,…, η}.
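The diagnostic condition for graded evidence can be checked numerically. The following is a minimal sketch, assuming η ≥ 1 and a uniform prior on X; the class name `GradedEvidence` and its method names are hypothetical and not part of the chapter.

```java
// Sketch of Eq. (3): CPT of graded evidence D in {0, ..., eta}.
// All names here are illustrative assumptions, not the author's code.
final class GradedEvidence {
    // P(D = d | X = x) as in Eq. (3), with S = eta(eta + 1) / 2. Assumes eta >= 1.
    static double likelihood(int d, int x, int eta) {
        if (d < 0 || d > eta) return 0.0;      // convention: zero outside the range
        double s = eta * (eta + 1) / 2.0;
        return x == 1 ? d / s : (eta - d) / s;
    }

    // P(X = x | D = d) by Bayes' rule under the uniform prior P(X=0) = P(X=1) = 0.5.
    static double posterior(int x, int d, int eta) {
        double numerator = likelihood(d, x, eta) * 0.5;
        double evidence = likelihood(d, 0, eta) * 0.5 + likelihood(d, 1, eta) * 0.5;
        return numerator / evidence;
    }

    public static void main(String[] args) {
        int eta = 10;
        double k = (eta + 1) / 2.0;            // transformation coefficient
        for (int d = 0; d <= eta; d++) {
            System.out.printf("D=%d  P(X=1|D)=%.4f  k*P(D|X=1)=%.4f%n",
                    d, posterior(1, d, eta), k * likelihood(d, 1, eta));
        }
    }
}
```

For every grade the two printed columns should agree, which is the statement P(X|D) = kP(D|X) with k independent of D.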

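In the most general case, discrete evidence D ranges within an arbitrary integer interval {a, a+1, a+2,…, b}. In other words, D is a bounded integer variable whose lower bound and upper bound are a and b, respectively. Eq. (4) specifies the CPT of D, where D ∈ {a, a+1, a+2,…, b}.

$$
P(D \mid X) = \begin{cases} \dfrac{D}{S} & \text{if } X = 1 \\[2mm] \dfrac{b + a}{S} - \dfrac{D}{S} & \text{if } X = 0 \end{cases} \tag{4}
$$

where

$$
D \in \{a, a+1, a+2, \ldots, b\}, \qquad S = a + (a+1) + (a+2) + \cdots + b = \frac{(b+a)(b-a+1)}{2}
$$

Note that P(D|X) = 0, ∀D ∉ {a, a+1, a+2,…, b}. According to the diagnostic condition, we need to prove the equality P(X|D) = kP(D|X), where

$$
k = \frac{b - a + 1}{2}
$$

Similarly, we have

$$
P(X \mid D) = \frac{P(D \mid X)P(X)}{P(D)} = \frac{P(D \mid X)}{P(D \mid X=0) + P(D \mid X=1)} = \frac{b - a + 1}{2} P(D \mid X)
$$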

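If evidence D is continuous in the real interval [a, b], where a and b are real numbers, Eq. (5) specifies the probability density function (PDF) of continuous evidence D ∈ [a, b]. The PDF p(D|X) replaces the CPT in the case of a continuous random variable.

$$
p(D \mid X) = \begin{cases} \dfrac{2D}{b^2 - a^2} & \text{if } X = 1 \\[2mm] \dfrac{2}{b - a} - \dfrac{2D}{b^2 - a^2} & \text{if } X = 0 \end{cases} \tag{5}
$$

where D ∈ [a, b], a and b are real numbers, and

$$
S = \int_a^b D \, \mathrm{d}D = \frac{b^2 - a^2}{2}
$$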

As a convention, [a, b] is called the domain of continuous evidence, which can be replaced by open or half-open intervals such as (a, b), (a, b], and [a, b). Of course we have p(D|X) = 0, ∀D ∉ [a, b]. In learning context, evidence D is often a test whose grade ranges within the real interval [a, b].

Functions p(D|X = 1) and p(D|X = 0) are valid PDFs due to

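$$
\int_D p(D \mid X=1) \, \mathrm{d}D = \int_a^b \frac{2D}{b^2 - a^2} \, \mathrm{d}D = \frac{1}{b^2 - a^2} \int_a^b 2D \, \mathrm{d}D = 1
$$

$$
\int_D p(D \mid X=0) \, \mathrm{d}D = \frac{2}{b - a} \int_a^b \mathrm{d}D - \frac{1}{b^2 - a^2} \int_a^b 2D \, \mathrm{d}D = 2 - 1 = 1
$$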

According to the diagnostic condition, we need to prove the equality

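$$
P(X \mid D) = k \, p(D \mid X)
$$

where

$$
k = \frac{b - a}{2}
$$

When D is continuous, its probability is calculated in an ε-vicinity, where ε is a very small number. Typically, ε is the measurement bias when D is a value produced by equipment. The probability of D given X, where D + ε ∈ [a, b] and D − ε ∈ [a, b], is

$$
\begin{aligned}
P(D \mid X) &= \int_{D-\varepsilon}^{D+\varepsilon} p(D \mid X) \, \mathrm{d}D
= \begin{cases} \displaystyle\int_{D-\varepsilon}^{D+\varepsilon} \frac{2D}{b^2 - a^2} \, \mathrm{d}D & \text{if } X = 1 \\[3mm] \displaystyle\int_{D-\varepsilon}^{D+\varepsilon} \left( \frac{2}{b - a} - \frac{2D}{b^2 - a^2} \right) \mathrm{d}D & \text{if } X = 0 \end{cases} \\
&= \begin{cases} \dfrac{4\varepsilon D}{b^2 - a^2} & \text{if } X = 1 \\[2mm] \dfrac{4\varepsilon}{b - a} - \dfrac{4\varepsilon D}{b^2 - a^2} & \text{if } X = 0 \end{cases}
= 2\varepsilon \, p(D \mid X)
\end{aligned}
$$

In fact, we have

$$
\begin{aligned}
P(X \mid D) &= \frac{P(D \mid X)P(X)}{P(D \mid X=0)P(X=0) + P(D \mid X=1)P(X=1)} = \frac{P(D \mid X)}{P(D \mid X=0) + P(D \mid X=1)} \\
&\qquad \text{(due to Bayes' rule and the assumption } P(X=0) = P(X=1)\text{)} \\
&= \frac{b - a}{4\varepsilon} P(D \mid X) = k \, p(D \mid X)
\end{aligned}
$$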
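The continuous case can be sketched in the same way. The code below is an illustration only, assuming 0 ≤ a < b so that both densities of Eq. (5) are non-negative; the class `ContinuousEvidence` and its methods are hypothetical names. Because the factor 2ε appears in both the numerator and the denominator of Bayes' rule, the posterior can be computed directly from the densities.

```java
// Sketch of Eq. (5) and the diagnostic condition for continuous evidence.
// Assumption: 0 <= a < b. Names are illustrative, not from the chapter.
final class ContinuousEvidence {
    // Density p(D = d | X = x) on the domain [a, b]; zero outside it.
    static double pdf(double d, int x, double a, double b) {
        if (d < a || d > b) return 0.0;
        double mastered = 2 * d / (b * b - a * a);
        return x == 1 ? mastered : 2 / (b - a) - mastered;
    }

    // P(X = x | D = d) under a uniform prior; the 2*epsilon factors cancel.
    static double posterior(int x, double d, double a, double b) {
        return pdf(d, x, a, b) / (pdf(d, 0, a, b) + pdf(d, 1, a, b));
    }

    public static void main(String[] args) {
        double a = 0, b = 10, d = 4;
        double k = (b - a) / 2;                // transformation coefficient
        System.out.println(posterior(1, d, a, b) + " vs " + k * pdf(d, 1, a, b));
    }
}
```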

In general, Eq. (6) summarizes CPT of evidence of single diagnostic relationship.

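$$
P(D \mid X) = \begin{cases} \dfrac{D}{S} & \text{if } X = 1 \\[2mm] \dfrac{M}{S} - \dfrac{D}{S} & \text{if } X = 0 \end{cases}
\qquad k = \frac{N}{2} \tag{6}
$$

where

$$
N = \begin{cases} 2 & \text{if } D \in \{0, 1\} \\ \eta + 1 & \text{if } D \in \{0, 1, 2, \ldots, \eta\} \\ b - a + 1 & \text{if } D \in \{a, a+1, a+2, \ldots, b\} \\ b - a & \text{if } D \text{ continuous and } D \in [a, b] \end{cases}
\qquad
M = \begin{cases} 1 & \text{if } D \in \{0, 1\} \\ \eta & \text{if } D \in \{0, 1, 2, \ldots, \eta\} \\ b + a & \text{if } D \in \{a, a+1, a+2, \ldots, b\} \\ b + a & \text{if } D \text{ continuous and } D \in [a, b] \end{cases}
$$

$$
S = \sum_D D = \frac{NM}{2} = \begin{cases} 1 & \text{if } D \in \{0, 1\} \\[1mm] \dfrac{\eta(\eta + 1)}{2} & \text{if } D \in \{0, 1, 2, \ldots, \eta\} \\[2mm] \dfrac{(b + a)(b - a + 1)}{2} & \text{if } D \in \{a, a+1, a+2, \ldots, b\} \\[2mm] \dfrac{b^2 - a^2}{2} & \text{if } D \text{ continuous and } D \in [a, b] \end{cases}
$$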

In general, if the conditional probability P(D|X) is specified by Eq. (6), the diagnostic condition will be satisfied. Note that the CPT P(D|X) is the PDF p(D|X) in case of continuous evidence. The diagnostic relationship will be extended with more than one hypothesis. The next section will mention how to determine CPTs of a simple graph with one child node and many parent nodes based on X-gate inferences.

3. X-gate inferences

Given a simple graph consisting of one child variable Y and n parent variables Xi, as shown in Figure 2, each relationship from Xi to Y is quantified by normalized weight wi where 0 ≤ wi ≤ 1. A large graph is an integration of many simple graphs. Figure 2 shows the DAG of a simple BN. As aforementioned, the essence of constructing simple BN is to convert graphic relationships of simple graph into CPTs of simple BN.

Figure 2.

Simple graph or simple network.

Child variable Y is called the target and parent variables Xi are called sources. In particular, these relationships adhere to X-gates such as AND-gate, OR-gate, and SIGMA-gate. These gates originate from logic gates [1]. For instance, AND-gate and OR-gate represent the prerequisite relationship, and SIGMA-gate represents the aggregation relationship. Therefore, relationship conversion amounts to determining the X-gate inference. The simple graph shown in Figure 2 is also called an X-gate graph or X-gate network. Please distinguish the letter “X” in the term “X-gate inference,” which stands for logic operators (AND, OR, XOR, etc.), from the variable X.

All variables are binary and they represent events. The probability P(X) indicates event X occurs. Thus, P(X) implicates P(X = 1) and P(not(X)) implicates P(X = 0). Eq. (7) specifies the simple NOT-gate inference.

$$
\begin{aligned}
P(\mathrm{not}(X)) &= P(\bar{X}) = P(X=0) = 1 - P(X=1) = 1 - P(X) \\
P(\mathrm{not}(\mathrm{not}(X))) &= P(X)
\end{aligned} \tag{7}
$$

X-gate inference is based on three assumptions mentioned in Ref. ([2], p. 157), which are as follows

  • X-gate inhibition: Given a relationship from source Xi to target Y, there is a factor Ii that inhibits Xi from being integrated into Y. Factor Ii is called inhibition of Xi. That the inhibition Ii is turned off is prerequisite of Xi integrated into Y.

  • Inhibition independence: Inhibitions are mutually independent. For example, inhibition I1 of X1 is independent from inhibition I2 of X2.

  • Accountability: X-gate network is established by accountable variables Ai for Xi and Ii. Each X-gate inference owns particular combination of Ais.

Figure 3 shows the extended X-gate network with accountable variables Ais ([2], p. 158).

Figure 3.

Extended X-gate network with accountable variables Ais.

The strength of each relationship from source Xi to target Y is quantified by a weight 0 ≤ wi ≤ 1. According to the assumption of inhibition, probability of Ii = OFF is pi, which is set to be the weight wi.

$$
p_i = w_i
$$

If notation wi is used, we focus on the strength of relationship. If notation pi is used, we focus on probability of OFF inhibition. In probabilistic inference, pi is also prior probability of Xi = 1. However, we will assume each Xi has uniform distribution later on. Eq. (8) specifies probabilities of inhibitions Iis and accountable variables Ais.

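$$
\begin{aligned}
P(I_i = \mathrm{OFF}) &= p_i = w_i & P(I_i = \mathrm{ON}) &= 1 - p_i = 1 - w_i \\
P(A_i = \mathrm{ON} \mid X_i = 1, I_i = \mathrm{OFF}) &= 1 & P(A_i = \mathrm{OFF} \mid X_i = 1, I_i = \mathrm{OFF}) &= 0 \\
P(A_i = \mathrm{ON} \mid X_i = 1, I_i = \mathrm{ON}) &= 0 & P(A_i = \mathrm{OFF} \mid X_i = 1, I_i = \mathrm{ON}) &= 1 \\
P(A_i = \mathrm{ON} \mid X_i = 0, I_i = \mathrm{OFF}) &= 0 & P(A_i = \mathrm{OFF} \mid X_i = 0, I_i = \mathrm{OFF}) &= 1 \\
P(A_i = \mathrm{ON} \mid X_i = 0, I_i = \mathrm{ON}) &= 0 & P(A_i = \mathrm{OFF} \mid X_i = 0, I_i = \mathrm{ON}) &= 1
\end{aligned} \tag{8}
$$

According to Eq. (8), the probability P(Ai = ON | Xi = 1, Ii = OFF) = 1 means that the accountable variable Ai is certain to be turned on if source Xi is 1 and inhibition Ii is turned off. Eq. (9) specifies the conditional probability of the accountable variables Ai given Xi, which is a corollary of Eq. (8).

$$
\begin{aligned}
P(A_i = \mathrm{ON} \mid X_i = 1) &= p_i = w_i & P(A_i = \mathrm{ON} \mid X_i = 0) &= 0 \\
P(A_i = \mathrm{OFF} \mid X_i = 1) &= 1 - p_i = 1 - w_i & P(A_i = \mathrm{OFF} \mid X_i = 0) &= 1
\end{aligned} \tag{9}
$$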
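The step from Eq. (8) to Eq. (9) is a marginalization over the inhibition Ii. The sketch below illustrates it, assuming that Ii is independent of Xi (as the X-gate model implies); the class `AccountableVariable` and its methods are hypothetical names.

```java
// Sketch: deriving Eq. (9) from Eq. (8) by summing out the inhibition I_i.
// Assumption: I_i is independent of X_i. Names are illustrative only.
final class AccountableVariable {
    // P(A_i = ON | X_i = x, I_i) from Eq. (8): ON only if X_i = 1 and inhibition is OFF.
    static double onGivenSourceAndInhibition(int x, boolean inhibitionOn) {
        return (x == 1 && !inhibitionOn) ? 1.0 : 0.0;
    }

    // P(A_i = ON | X_i = x) = sum over I_i of P(A_i = ON | X_i, I_i) P(I_i),
    // where P(I_i = OFF) = w and P(I_i = ON) = 1 - w.
    static double onGivenSource(int x, double w) {
        return onGivenSourceAndInhibition(x, false) * w
             + onGivenSourceAndInhibition(x, true) * (1.0 - w);
    }

    public static void main(String[] args) {
        System.out.println(onGivenSource(1, 0.7)); // weight w_i when X_i = 1
        System.out.println(onGivenSource(0, 0.7)); // zero when X_i = 0
    }
}
```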

Appendix A1 is the proof of Eq. (9). As a definition, the set of all Xis is complete if and only if

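$$
P(X_1 \cup X_2 \cup \cdots \cup X_n) = P(\Omega) = \sum_{i=1}^{n} w_i = 1
$$

The set of all Xi is mutually exclusive if and only if

$$
X_i \cap X_j = \emptyset, \quad \forall i \neq j
$$

For each Xi, there is only one Ai and vice versa, which establishes a bijection between the Xi and the Ai. Obviously, the set of all Xi being complete is equivalent to the set of all Ai being complete. We will prove by contradiction that the set of all Xi being mutually exclusive is equivalent to the set of all Ai being mutually exclusive. Suppose $X_i \cap X_j = \emptyset, \forall i \neq j$ but $\exists i \neq j: A_i \cap A_j = B \neq \emptyset$. Let $B^{-1}$ be the preimage of B. Due to $B \subseteq A_i$ and $B \subseteq A_j$, we have $B^{-1} \subseteq X_i$ and $B^{-1} \subseteq X_j$, which implies $X_i \cap X_j = B^{-1} \neq \emptyset$. This is a contradiction, and so we have

$$
X_i \cap X_j = \emptyset, \ \forall i \neq j \ \Rightarrow \ A_i \cap A_j = \emptyset, \ \forall i \neq j
$$

By a similar proof, we have

$$
A_i \cap A_j = \emptyset, \ \forall i \neq j \ \Rightarrow \ X_i \cap X_j = \emptyset, \ \forall i \neq j
$$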

The extended X-gate network shown in Figure 3 is interpretation of simple network shown in Figure 2. Specifying CPT of the simple network is to determine the conditional probability P(Y = 1 | X1, X2,…, Xn) based on extended X-gate network. The X-gate inference is represented by such probability P(Y = 1 | X1, X2,…, Xn) specified by Eq. (10) ([2], p. 159).

$$
P(Y \mid X_1, X_2, \ldots, X_n) = \sum_{A_1, A_2, \ldots, A_n} P(Y \mid A_1, A_2, \ldots, A_n) \prod_{i=1}^{n} P(A_i \mid X_i) \tag{10}
$$
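Eq. (10) can be evaluated by brute force for small n. The following sketch assumes a deterministic gate, that is, P(Y = 1 | A1,…, An) is either 0 or 1, and encodes each configuration of the Ai as a bit mask; the class `XGateInference`, the mask encoding, and the use of `IntPredicate` are illustrative assumptions rather than the chapter's method.

```java
import java.util.function.IntPredicate;

// Sketch of Eq. (10) for a deterministic gate over accountable variables.
// Bit i of the mask set means A_(i+1) = ON. Names are illustrative only.
final class XGateInference {
    // P(A_i = ON | X_i = x) from Eq. (9).
    static double pOn(int x, double w) {
        return x == 1 ? w : 0.0;
    }

    // Sums prod_i P(A_i | X_i) over all configurations for which the gate yields Y = 1.
    static double probability(int[] x, double[] w, IntPredicate gate) {
        int n = x.length;
        double total = 0.0;
        for (int mask = 0; mask < (1 << n); mask++) {
            if (!gate.test(mask)) continue;    // P(Y=1 | A) = 0 for this configuration
            double product = 1.0;
            for (int i = 0; i < n; i++) {
                double on = pOn(x[i], w[i]);
                product *= ((mask >> i) & 1) == 1 ? on : 1.0 - on;
            }
            total += product;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] x = {1, 1};
        double[] w = {0.8, 0.6};
        System.out.println(probability(x, w, m -> m == 3)); // all A_i ON
        System.out.println(probability(x, w, m -> m != 0)); // at least one A_i ON
    }
}
```

The enumeration costs on the order of n·2^n operations, so it is only a checking tool for the closed forms derived later in this section.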

Appendix A2 is the proof of Eq. (10). It is necessary to make some mathematical notations because Eq. (10) is complicated, which is relevant to arrangements of Xi (s). Given the set Ω = {X1, X2,…, Xn} where all variables are binary, Table 2 specifies binary arrangements of Ω.

Given Ω = {X1, X2,…, Xn} where |Ω| = n is the cardinality of Ω.
Let a(Ω) be an arrangement of Ω, which is a set of n instances {X1=x1, X2=x2,…, Xn=xn} where xi is 1 or 0. The number of all a(Ω) is $2^{|\Omega|}$. For instance, given Ω = {X1, X2}, there are $2^2 = 4$ arrangements as follows:
a(Ω) = {X1=1, X2=1}, a(Ω) = {X1=1, X2=0}, a(Ω) = {X1=0, X2=1}, a(Ω) = {X1=0, X2=0}.

Let a(Ω:{Xi}) be the arrangement of Ω with fixed Xi. The number of all a(Ω:{Xi}) is $2^{|\Omega|-1}$. Similarly, for instance, a(Ω:{X1, X2, X3}) is an arrangement of Ω with fixed X1, X2, X3. The number of all a(Ω:{X1, X2, X3}) is $2^{|\Omega|-3}$.
Let c(Ω) and c(Ω:{Xi}) be the number of arrangements a(Ω) and a(Ω:{Xi}), respectively. Such c(Ω) and c(Ω:{Xi}) are called arrangement counters. As usual, counters c(Ω) and c(Ω:{Xi}) are equal to $2^{|\Omega|}$ and $2^{|\Omega|-1}$, respectively, but they will vary according to specific cases.
Let $\sum_a F(a(\Omega))$ and $\prod_a F(a(\Omega))$ denote the sum and product of values generated from function F acting on every a(Ω). The number of arrangements on which F acts is c(Ω).
Let x denote the X-gate operator, for instance, x = ⊙ for AND-gate, x = ⊕ for OR-gate, x = not ⊙ for NAND-gate, x = not ⊕ for NOR-gate, x = ⊗ for XOR-gate, x = not ⊗ for XNOR-gate, x = ⊎ for U-gate, x = + for SIGMA-gate. Given an x-operator, let s(Ω:{Xi}) and s(Ω) be the sum of all P(X1 x X2 x … x Xn) through every arrangement of Ω with and without fixed Xi, respectively.

$$
\begin{aligned}
s(\Omega) &= \sum_a P(X_1 \,x\, X_2 \,x \cdots x\, X_n \mid a(\Omega)) = \sum_a P(Y = 1 \mid a(\Omega)) \\
s(\Omega{:}\{X_i\}) &= \sum_a P(X_1 \,x\, X_2 \,x \cdots x\, X_n \mid a(\Omega{:}\{X_i\})) = \sum_a P(Y = 1 \mid a(\Omega{:}\{X_i\}))
\end{aligned}
$$

For example, s(Ω) and s(Ω:{Xi}) for OR-gate are:

$$
\begin{aligned}
s(\Omega) &= \sum_a P(X_1 \oplus X_2 \oplus \cdots \oplus X_n \mid a(\Omega)) \\
s(\Omega{:}\{X_i\}) &= \sum_a P(X_1 \oplus X_2 \oplus \cdots \oplus X_n \mid a(\Omega{:}\{X_i\}))
\end{aligned}
$$

Such s(Ω) and s(Ω:{Xi}) are called arrangement sums; they are sums of the acting function F.
Note that Ω can be any set of binary variables.

Table 2.

Binary arrangements.

It is not easy to produce all binary arrangements of Ω by hand. Table 3 shows a code snippet written in the Java programming language that produces all such arrangements.

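    import java.util.ArrayList;
    import java.util.List;

    // Generates all length-r arrangements over the alphabet {0, 1, ..., n-1}.
    // With n = 2, every arrangement is an array of bits.
    public class ArrangementGenerator {
        private final List<int[]> arrangements = new ArrayList<>();
        private final int n; // number of values per position (2 for binary)
        private final int r; // arrangement length (number of variables)

        private ArrangementGenerator(int n, int r) {
            this.n = n;
            this.r = r;
        }

        // Recursively assigns every value to position i and stores a copy
        // of the array once the last position (r - 1) has been assigned.
        private void create(int[] a, int i) {
            for (int j = 0; j < n; j++) {
                a[i] = j;
                if (i < r - 1) {
                    create(a, i + 1);
                } else {
                    arrangements.add(a.clone());
                }
            }
        }

        public int[] get(int i) {
            return arrangements.get(i);
        }

        public long size() {
            return arrangements.size();
        }

        // Factory method: parse(2, r) lists all 2^r binary arrangements.
        public static ArrangementGenerator parse(int n, int r) {
            if (n < 1 || r < 1) {
                throw new IllegalArgumentException("n and r must be positive");
            }
            ArrangementGenerator arr = new ArrangementGenerator(n, r);
            arr.create(new int[r], 0);
            return arr;
        }
    }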

Table 3.

Code snippet generating all binary arrangements.

Each element of the list “arrangements” is a binary arrangement a(Ω) represented by an array of bits (0 and 1). The recursive method “create(int[] a, int i)” is the main one that generates arrangements. The method call “ArrangementGenerator.parse(2, n)” lists all possible binary arrangements of n variables; the first argument, 2, is the number of values each variable can take.
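As an alternative to recursion, binary arrangements can also be read off the binary expansion of a counter. The following is a minimal sketch of that idea, not the chapter's listing; the class `BinaryArrangements` is a hypothetical name, and the cap n ≤ 20 is an arbitrary assumption to keep the list small.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: enumerating all 2^n binary arrangements with a bit counter.
// Names and the n <= 20 limit are illustrative assumptions.
final class BinaryArrangements {
    // Row k holds the binary expansion of k, with X_1 as the most significant bit.
    static List<int[]> all(int n) {
        if (n < 0 || n > 20) throw new IllegalArgumentException("n must be in [0, 20]");
        List<int[]> result = new ArrayList<>(1 << n);
        for (int k = 0; k < (1 << n); k++) {
            int[] a = new int[n];
            for (int i = 0; i < n; i++) {
                a[i] = (k >> (n - 1 - i)) & 1;
            }
            result.add(a);
        }
        return result;
    }

    public static void main(String[] args) {
        for (int[] a : all(2)) {
            System.out.println(java.util.Arrays.toString(a));
        }
    }
}
```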

Eq. (11) specifies the connection between s(Ω:{Xi = 1}) and s(Ω:{Xi = 0}), between c(Ω:{Xi = 1}) and c(Ω:{Xi = 0}).

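$$
\begin{aligned}
s(\Omega{:}\{X_i = 1\}) + s(\Omega{:}\{X_i = 0\}) &= s(\Omega) \\
c(\Omega{:}\{X_i = 1\}) + c(\Omega{:}\{X_i = 0\}) &= c(\Omega)
\end{aligned} \tag{11}
$$

Eq. (11) follows because the set of all arrangements a(Ω:{Xi = 1}) is the complement of the set of all arrangements a(Ω:{Xi = 0}).

Let K be the set of Xi whose values are 1 and let L be the set of Xi whose values are 0. K and L are mutually complementary. Eq. (12) determines sets K and L.

$$
\begin{cases} K = \{i : X_i = 1\} \\ L = \{i : X_i = 0\} \\ K \cap L = \emptyset \\ K \cup L = \{1, 2, \ldots, n\} \end{cases} \tag{12}
$$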

The AND-gate inference represents prerequisite relationship satisfying AND-gate condition specified by Eq. (13).

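$$
P(Y = 1 \mid A_i = \mathrm{OFF} \text{ for some } i) = 0 \tag{13}
$$

From Eq. (10), we have

$$
\begin{aligned}
P(Y = 1 \mid X_1, X_2, \ldots, X_n) &= \sum_{A_1, A_2, \ldots, A_n} P(Y = 1 \mid A_1, A_2, \ldots, A_n) \prod_{i=1}^{n} P(A_i \mid X_i) \\
&= \prod_{i=1}^{n} P(A_i = \mathrm{ON} \mid X_i) \qquad \text{(due to } P(Y = 1 \mid A_i = \mathrm{OFF} \text{ for some } i) = 0\text{)} \\
&= \Big( \prod_{i \in K} P(A_i = \mathrm{ON} \mid X_i = 1) \Big) \Big( \prod_{i \notin K} P(A_i = \mathrm{ON} \mid X_i = 0) \Big) \\
&= \Big( \prod_{i \in K} p_i \Big) \Big( \prod_{i \notin K} 0 \Big) \qquad \text{(due to Eq. (9))} \\
&= \begin{cases} \prod_{i=1}^{n} p_i & \text{if all } X_i \text{ are } 1 \\ 0 & \text{if there exists at least one } X_i = 0 \end{cases}
\end{aligned}
$$

In general, Eq. (14) specifies AND-gate inference.

$$
\begin{aligned}
P(X_1 \odot X_2 \odot \cdots \odot X_n) = P(Y = 1 \mid X_1, X_2, \ldots, X_n) &= \begin{cases} \prod_{i=1}^{n} p_i & \text{if all } X_i \text{ are } 1 \\ 0 & \text{if there exists at least one } X_i = 0 \end{cases} \\
P(Y = 0 \mid X_1, X_2, \ldots, X_n) &= \begin{cases} 1 - \prod_{i=1}^{n} p_i & \text{if all } X_i \text{ are } 1 \\ 1 & \text{if there exists at least one } X_i = 0 \end{cases}
\end{aligned} \tag{14}
$$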

The AND-gate inference was also described in ([3], p. 33). Eq. (14) varies according to two cases whose arrangement counters are listed as follows

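Case $L = \emptyset$:

$$
c(\Omega{:}\{X_i = 1\}) = 1, \quad c(\Omega{:}\{X_i = 0\}) = 0, \quad c(\Omega) = 1.
$$

Case $L \neq \emptyset$:

$$
c(\Omega{:}\{X_i = 1\}) = 2^{n-1} - 1, \quad c(\Omega{:}\{X_i = 0\}) = 2^{n-1}, \quad c(\Omega) = 2^n - 1.
$$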
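The closed form of Eq. (14) is short enough to state directly in code. This is a minimal sketch; the class `AndGate` and its method are hypothetical names, and the arrays `x` and `p` are assumed to have the same length.

```java
// Sketch of the AND-gate inference in Eq. (14). Names are illustrative only.
final class AndGate {
    // P(Y = 1 | X_1..X_n): product of all p_i when every X_i = 1, otherwise 0.
    static double probability(int[] x, double[] p) {
        double product = 1.0;
        for (int i = 0; i < x.length; i++) {
            if (x[i] == 0) return 0.0;         // one unmet prerequisite blocks Y
            product *= p[i];
        }
        return product;
    }

    public static void main(String[] args) {
        System.out.println(probability(new int[]{1, 1}, new double[]{0.8, 0.6}));
        System.out.println(probability(new int[]{1, 0}, new double[]{0.8, 0.6}));
    }
}
```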

The OR-gate inference represents prerequisite relationship satisfying OR-gate condition specified by Eq. (15) ([2], p. 157).

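$$
P(Y = 1 \mid A_i = \mathrm{ON} \text{ for some } i) = 1 \tag{15}
$$

The OR-gate condition implies

$$
P(Y = 0 \mid A_i = \mathrm{ON} \text{ for some } i) = 0
$$

From Eq. (10), we have ([2], p. 159)

$$
\begin{aligned}
P(Y = 0 \mid X_1, X_2, \ldots, X_n) &= \sum_{A_1, A_2, \ldots, A_n} P(Y = 0 \mid A_1, A_2, \ldots, A_n) \prod_{i=1}^{n} P(A_i \mid X_i) \\
&= \prod_{i=1}^{n} P(A_i = \mathrm{OFF} \mid X_i) \qquad \text{(due to } P(Y = 0 \mid A_i = \mathrm{ON} \text{ for some } i) = 0\text{)} \\
&= \Big( \prod_{i \in K} P(A_i = \mathrm{OFF} \mid X_i = 1) \Big) \Big( \prod_{i \notin K} P(A_i = \mathrm{OFF} \mid X_i = 0) \Big) \\
&= \Big( \prod_{i \in K} (1 - p_i) \Big) \Big( \prod_{i \notin K} 1 \Big) \qquad \text{(due to Eq. (9))} \\
&= \begin{cases} \prod_{i \in K} (1 - p_i) & \text{if } K \neq \emptyset \\ 1 & \text{if } K = \emptyset \end{cases}
\end{aligned}
$$

In general, Eq. (16) specifies OR-gate inference.

$$
\begin{aligned}
P(X_1 \oplus X_2 \oplus \cdots \oplus X_n) = 1 - P(Y = 0 \mid X_1, X_2, \ldots, X_n) &= \begin{cases} 1 - \prod_{i \in K} (1 - p_i) & \text{if } K \neq \emptyset \\ 0 & \text{if } K = \emptyset \end{cases} \\
P(Y = 0 \mid X_1, X_2, \ldots, X_n) &= \begin{cases} \prod_{i \in K} (1 - p_i) & \text{if } K \neq \emptyset \\ 1 & \text{if } K = \emptyset \end{cases}
\end{aligned} \tag{16}
$$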

where K is the set of Xis whose values are 1. The OR-gate inference was mentioned in Refs. ([2], p. 158) and ([3], p. 20). Eq. (16) varies according to two cases whose arrangement counters are listed as follows

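Case $K \neq \emptyset$:

$$
c(\Omega{:}\{X_i = 1\}) = 2^{n-1}, \quad c(\Omega{:}\{X_i = 0\}) = 2^{n-1} - 1, \quad c(\Omega) = 2^n - 1.
$$

Case $K = \emptyset$:

$$
c(\Omega{:}\{X_i = 1\}) = 0, \quad c(\Omega{:}\{X_i = 0\}) = 1, \quad c(\Omega) = 1.
$$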
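Eq. (16) can be sketched in the same style. The class `OrGate` and its method are hypothetical names; the empty product over K = ∅ is taken as 1, so the function returns 0 when no source is on.

```java
// Sketch of the OR-gate inference in Eq. (16). Names are illustrative only.
final class OrGate {
    // P(Y = 1 | X_1..X_n) = 1 - product over K of (1 - p_i), with K = {i : X_i = 1}.
    static double probability(int[] x, double[] p) {
        double allOff = 1.0;                   // P(Y = 0 | X), i.e. every A_i is OFF
        for (int i = 0; i < x.length; i++) {
            if (x[i] == 1) allOff *= 1.0 - p[i];
        }
        return 1.0 - allOff;
    }

    public static void main(String[] args) {
        System.out.println(probability(new int[]{1, 1}, new double[]{0.8, 0.6}));
        System.out.println(probability(new int[]{0, 0}, new double[]{0.8, 0.6}));
    }
}
```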

According to De Morgan’s rule with regard to AND-gate and OR-gate, we have

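$$
P(\mathrm{not}(X_1 \odot X_2 \odot \cdots \odot X_n)) = P(\mathrm{not}(X_1) \oplus \mathrm{not}(X_2) \oplus \cdots \oplus \mathrm{not}(X_n)) = \begin{cases} 1 - \prod_{i \in L} \big(1 - (1 - p_i)\big) & \text{if } L \neq \emptyset \\ 0 & \text{if } L = \emptyset \end{cases}
$$

(Due to Eq. (16))

According to Eq. (14), we also have

$$
\begin{aligned}
P(\mathrm{not}(X_1 \oplus X_2 \oplus \cdots \oplus X_n)) &= P(\mathrm{not}(X_1) \odot \mathrm{not}(X_2) \odot \cdots \odot \mathrm{not}(X_n)) \\
&= \begin{cases} \prod_{i=1}^{n} P(\mathrm{not}(X_i)) & \text{if all } \mathrm{not}(X_i) \text{ are } 1 \\ 0 & \text{if there exists at least one } \mathrm{not}(X_i) = 0 \end{cases} \\
&= \begin{cases} \prod_{i=1}^{n} (1 - p_i) & \text{if all } X_i \text{ are } 0 \\ 0 & \text{if there exists at least one } X_i = 1 \end{cases}
\end{aligned}
$$

In general, Eq. (17) specifies NAND-gate inference and NOR-gate inference derived from AND-gate and OR-gate.

$$
\begin{aligned}
P(\mathrm{not}(X_1 \odot X_2 \odot \cdots \odot X_n)) &= \begin{cases} 1 - \prod_{i \in L} p_i & \text{if } L \neq \emptyset \\ 0 & \text{if } L = \emptyset \end{cases} \\
P(\mathrm{not}(X_1 \oplus X_2 \oplus \cdots \oplus X_n)) &= \begin{cases} \prod_{i=1}^{n} q_i & \text{if } K = \emptyset \\ 0 & \text{if } K \neq \emptyset \end{cases}
\end{aligned} \tag{17}
$$

where K and L are the sets of Xi whose values are 1 and 0, respectively, and, following the derivation above, $q_i = 1 - p_i$.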

Suppose the number of sources Xis is even. Let O be the set of Xis whose indices are odd. Let O1 and O2 be subsets of O, in which all Xis are 1 and 0, respectively. Let E be the set of Xis whose indices are even. Let E1 and E2 be the subsets of E, in which all Xis are 1 and 0, respectively.

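$$
\begin{cases} E = \{2, 4, 6, \ldots, n\} \\ E_1 \subseteq E, \ E_2 \subseteq E \\ E_1 \cup E_2 = E \\ E_1 \cap E_2 = \emptyset \\ X_i = 1, \ \forall i \in E_1 \\ X_i = 0, \ \forall i \in E_2 \end{cases}
\quad \text{and} \quad
\begin{cases} O = \{1, 3, 5, \ldots, n-1\} \\ O_1 \subseteq O, \ O_2 \subseteq O \\ O_1 \cup O_2 = O \\ O_1 \cap O_2 = \emptyset \\ X_i = 1, \ \forall i \in O_1 \\ X_i = 0, \ \forall i \in O_2 \end{cases}
$$

Thus, O1 and E1 are subsets of K. Sources Xi and target Y follow XOR-gate if one of the two XOR-gate conditions specified by Eq. (18) is satisfied.

$$
\begin{aligned}
P\big(Y = 1 \mid A_i = \mathrm{ON} \text{ for } i \in O, \ A_i = \mathrm{OFF} \text{ for } i \notin O\big) &= P(Y = 1 \mid A_1 = \mathrm{ON}, A_2 = \mathrm{OFF}, \ldots, A_{n-1} = \mathrm{ON}, A_n = \mathrm{OFF}) = 1 \\
P\big(Y = 1 \mid A_i = \mathrm{ON} \text{ for } i \in E, \ A_i = \mathrm{OFF} \text{ for } i \notin E\big) &= P(Y = 1 \mid A_1 = \mathrm{OFF}, A_2 = \mathrm{ON}, \ldots, A_{n-1} = \mathrm{OFF}, A_n = \mathrm{ON}) = 1
\end{aligned} \tag{18}
$$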

From Eq. (10), we have

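$$
P(Y = 1 \mid X_1, X_2, \ldots, X_n) = \sum_{A_1, A_2, \ldots, A_n} P(Y = 1 \mid A_1, A_2, \ldots, A_n) \prod_{i=1}^{n} P(A_i \mid X_i)
$$

If neither XOR-gate condition is satisfied, then

$$
P(Y = 1 \mid X_1, X_2, \ldots, X_n) = 0
$$

If the first XOR-gate condition is satisfied, we have

$$
\begin{aligned}
P(Y = 1 \mid X_1, X_2, \ldots, X_n) &= P(Y = 1 \mid A_1 = \mathrm{ON}, A_2 = \mathrm{OFF}, \ldots, A_{n-1} = \mathrm{ON}, A_n = \mathrm{OFF}) \prod_{i=1}^{n} P(A_i \mid X_i) \\
&= \Big( \prod_{i \in O} P(A_i = \mathrm{ON} \mid X_i) \Big) \Big( \prod_{i \in E} P(A_i = \mathrm{OFF} \mid X_i) \Big)
\end{aligned}
$$

We have

$$
\begin{aligned}
\prod_{i \in O} P(A_i = \mathrm{ON} \mid X_i) &= \Big( \prod_{i \in O_1} P(A_i = \mathrm{ON} \mid X_i = 1) \Big) \Big( \prod_{i \in O_2} P(A_i = \mathrm{ON} \mid X_i = 0) \Big) \\
&= \Big( \prod_{i \in O_1} p_i \Big) \Big( \prod_{i \in O_2} 0 \Big) = \begin{cases} \prod_{i \in O_1} p_i & \text{if } O_2 = \emptyset \\ 0 & \text{if } O_2 \neq \emptyset \end{cases}
\end{aligned}
$$

(Due to Eq. (9))

We also have

$$
\begin{aligned}
\prod_{i \in E} P(A_i = \mathrm{OFF} \mid X_i) &= \Big( \prod_{i \in E_1} P(A_i = \mathrm{OFF} \mid X_i = 1) \Big) \Big( \prod_{i \in E_2} P(A_i = \mathrm{OFF} \mid X_i = 0) \Big) \\
&= \Big( \prod_{i \in E_1} (1 - p_i) \Big) \Big( \prod_{i \in E_2} 1 \Big) = \begin{cases} \prod_{i \in E_1} (1 - p_i) & \text{if } E_1 \neq \emptyset \\ 1 & \text{if } E_1 = \emptyset \end{cases}
\end{aligned}
$$

(Due to Eq. (9))

Given the first XOR-gate condition, it follows that

$$
P(Y = 1 \mid X_1, X_2, \ldots, X_n) = \Big( \prod_{i \in O} P(A_i = \mathrm{ON} \mid X_i) \Big) \Big( \prod_{i \in E} P(A_i = \mathrm{OFF} \mid X_i) \Big) = \begin{cases} \big( \prod_{i \in O_1} p_i \big) \big( \prod_{i \in E_1} (1 - p_i) \big) & \text{if } O_2 = \emptyset \text{ and } E_1 \neq \emptyset \\ \prod_{i \in O_1} p_i & \text{if } O_2 = \emptyset \text{ and } E_1 = \emptyset \\ 0 & \text{if } O_2 \neq \emptyset \end{cases}
$$

Similarly, given the second XOR-gate condition, we have

$$
P(Y = 1 \mid X_1, X_2, \ldots, X_n) = \Big( \prod_{i \in E} P(A_i = \mathrm{ON} \mid X_i) \Big) \Big( \prod_{i \in O} P(A_i = \mathrm{OFF} \mid X_i) \Big) = \begin{cases} \big( \prod_{i \in E_1} p_i \big) \big( \prod_{i \in O_1} (1 - p_i) \big) & \text{if } E_2 = \emptyset \text{ and } O_1 \neq \emptyset \\ \prod_{i \in E_1} p_i & \text{if } E_2 = \emptyset \text{ and } O_1 = \emptyset \\ 0 & \text{if } E_2 \neq \emptyset \end{cases}
$$

If one of the XOR-gate conditions is satisfied, then

$$
P(Y = 1 \mid X_1, X_2, \ldots, X_n) = \Big( \prod_{i \in O} P(A_i = \mathrm{ON} \mid X_i) \Big) \Big( \prod_{i \in E} P(A_i = \mathrm{OFF} \mid X_i) \Big) + \Big( \prod_{i \in E} P(A_i = \mathrm{ON} \mid X_i) \Big) \Big( \prod_{i \in O} P(A_i = \mathrm{OFF} \mid X_i) \Big)
$$

This implies Eq. (19), which specifies XOR-gate inference.

$$
P(X_1 \otimes X_2 \otimes \cdots \otimes X_n) = P(Y = 1 \mid X_1, X_2, \ldots, X_n) = \begin{cases}
\big( \prod_{i \in O_1} p_i \big) \big( \prod_{i \in E_1} (1 - p_i) \big) + \big( \prod_{i \in E_1} p_i \big) \big( \prod_{i \in O_1} (1 - p_i) \big) & \text{if } O_2 = \emptyset \text{ and } E_2 = \emptyset \\
\big( \prod_{i \in O_1} p_i \big) \big( \prod_{i \in E_1} (1 - p_i) \big) & \text{if } O_2 = \emptyset \text{ and } E_1 \neq \emptyset \text{ and } E_2 \neq \emptyset \\
\prod_{i \in O_1} p_i & \text{if } O_2 = \emptyset \text{ and } E_1 = \emptyset \\
\big( \prod_{i \in E_1} p_i \big) \big( \prod_{i \in O_1} (1 - p_i) \big) & \text{if } E_2 = \emptyset \text{ and } O_1 \neq \emptyset \text{ and } O_2 \neq \emptyset \\
\prod_{i \in E_1} p_i & \text{if } E_2 = \emptyset \text{ and } O_1 = \emptyset \\
0 & \text{if } O_2 \neq \emptyset \text{ and } E_2 \neq \emptyset \\
0 & \text{if } n < 2 \text{ or } n \text{ is odd}
\end{cases} \tag{19}
$$

where

$$
\begin{cases} O = \{1, 3, 5, \ldots, n-1\} \\ O_1 \subseteq O, \ O_2 \subseteq O \\ O_1 \cup O_2 = O \\ O_1 \cap O_2 = \emptyset \\ X_i = 1, \ \forall i \in O_1 \\ X_i = 0, \ \forall i \in O_2 \end{cases}
\quad \text{and} \quad
\begin{cases} E = \{2, 4, 6, \ldots, n\} \\ E_1 \subseteq E, \ E_2 \subseteq E \\ E_1 \cup E_2 = E \\ E_1 \cap E_2 = \emptyset \\ X_i = 1, \ \forall i \in E_1 \\ X_i = 0, \ \forall i \in E_2 \end{cases}
$$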

Given n ≥ 2 and n is even, Eq. (19) varies according to six cases whose arrangement counters are listed as follows

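Case $O_2 = \emptyset$ and $E_2 = \emptyset$:

$$
c(\Omega{:}\{X_i = 1\}) = 1, \quad c(\Omega{:}\{X_i = 0\}) = 0, \quad c(\Omega) = 1.
$$

Case $O_2 = \emptyset$ and $E_1 \neq \emptyset$ and $E_2 \neq \emptyset$:

$$
c(\Omega{:}\{X_i = 1\}) = 2^{n/2} - 2, \quad c(\Omega{:}\{X_i = 0\}) = 0, \quad c(\Omega) = 2^{n/2} - 2.
$$

Case $O_2 = \emptyset$ and $E_1 = \emptyset$:

$$
c(\Omega{:}\{X_i = 1\}) = 1, \quad c(\Omega{:}\{X_i = 0\}) = 0, \quad c(\Omega) = 1.
$$

Case $E_2 = \emptyset$ and $O_1 \neq \emptyset$ and $O_2 \neq \emptyset$:

$$
c(\Omega{:}\{X_i = 1\}) = 2^{n/2 - 1} - 1, \quad c(\Omega{:}\{X_i = 0\}) = 2^{n/2 - 1} - 1, \quad c(\Omega) = 2^{n/2} - 2.
$$

Case $E_2 = \emptyset$ and $O_1 = \emptyset$:

$$
c(\Omega{:}\{X_i = 1\}) = 0, \quad c(\Omega{:}\{X_i = 0\}) = 1, \quad c(\Omega) = 1.
$$

Case $O_2 \neq \emptyset$ and $E_2 \neq \emptyset$:

$$
c(\Omega{:}\{X_i = 1\}) = \big(2^{n/2 - 1} - 1\big)\big(2^{n/2} - 1\big), \quad c(\Omega{:}\{X_i = 0\}) = 2^{n/2 - 1}\big(2^{n/2} - 1\big), \quad c(\Omega) = \big(2^{n/2} - 1\big)^2.
$$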
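The case analysis of Eq. (19) collapses to the two-term sum given just before it, which is easier to code. The sketch below evaluates that sum using Eq. (9) for each factor; the class `XorGate` and its methods are hypothetical names, and array position i corresponds to the chapter's index i + 1.

```java
// Sketch of the XOR-gate inference in Eq. (19), written as the sum of the two
// admissible patterns (odd ON / even OFF, and even ON / odd OFF).
// Names are illustrative only.
final class XorGate {
    // P(A_i = ON | X_i = x) from Eq. (9).
    static double pOn(int x, double p) {
        return x == 1 ? p : 0.0;
    }

    static double probability(int[] x, double[] p) {
        int n = x.length;
        if (n < 2 || n % 2 != 0) return 0.0;   // last case of Eq. (19)
        double oddOnEvenOff = 1.0;
        double evenOnOddOff = 1.0;
        for (int i = 0; i < n; i++) {
            double on = pOn(x[i], p[i]);
            boolean oddIndex = (i + 1) % 2 == 1; // the chapter's indices start at 1
            oddOnEvenOff *= oddIndex ? on : 1.0 - on;
            evenOnOddOff *= oddIndex ? 1.0 - on : on;
        }
        return oddOnEvenOff + evenOnOddOff;
    }

    public static void main(String[] args) {
        System.out.println(probability(new int[]{1, 1}, new double[]{0.8, 0.6}));
        System.out.println(probability(new int[]{1, 0}, new double[]{0.8, 0.6}));
    }
}
```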

Suppose the number of sources Xis is even. According to XNOR-gate inference [1], the output is on if all inputs get the same value 1 (or 0). Sources Xi (s) and target Y follow XNOR-gate if one of two XNOR-gate conditions specified by Eq. (20) is satisfied.

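$$
\begin{aligned}
P(Y = 1 \mid A_i = \mathrm{ON}, \ \forall i) &= 1 \\
P(Y = 1 \mid A_i = \mathrm{OFF}, \ \forall i) &= 1
\end{aligned} \tag{20}
$$

From Eq. (10), we have

$$
P(Y = 1 \mid X_1, X_2, \ldots, X_n) = \sum_{A_1, A_2, \ldots, A_n} P(Y = 1 \mid A_1, A_2, \ldots, A_n) \prod_{i=1}^{n} P(A_i \mid X_i)
$$

If neither XNOR-gate condition is satisfied, then

$$
P(Y = 1 \mid X_1, X_2, \ldots, X_n) = 0
$$

If Ai = ON for all i, we have

$$
P(Y = 1 \mid X_1, X_2, \ldots, X_n) = P(Y = 1 \mid A_i = \mathrm{ON}, \ \forall i) \prod_{i=1}^{n} P(A_i = \mathrm{ON} \mid X_i) = \prod_{i=1}^{n} P(A_i = \mathrm{ON} \mid X_i) = \begin{cases} \prod_{i=1}^{n} p_i & \text{if } L = \emptyset \\ 0 & \text{if } L \neq \emptyset \end{cases}
$$

(Please see the similar proof in AND-gate inference)

If Ai = OFF for all i, we have

$$
P(Y = 1 \mid X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(A_i = \mathrm{OFF} \mid X_i) = \begin{cases} \prod_{i \in K} (1 - p_i) & \text{if } K \neq \emptyset \\ 1 & \text{if } K = \emptyset \end{cases}
$$

(Please see the similar proof in OR-gate inference)

If one of the XNOR-gate conditions is satisfied, then

$$
P(Y = 1 \mid X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(A_i = \mathrm{ON} \mid X_i) + \prod_{i=1}^{n} P(A_i = \mathrm{OFF} \mid X_i)
$$

This implies Eq. (21), which specifies XNOR-gate inference.

$$
P(\mathrm{not}(X_1 \otimes X_2 \otimes \cdots \otimes X_n)) = P(Y = 1 \mid X_1, X_2, \ldots, X_n) = \begin{cases} \prod_{i=1}^{n} p_i + \prod_{i=1}^{n} (1 - p_i) & \text{if } L = \emptyset \\ \prod_{i \in K} (1 - p_i) & \text{if } L \neq \emptyset \text{ and } K \neq \emptyset \\ 1 & \text{if } L \neq \emptyset \text{ and } K = \emptyset \end{cases} \tag{21}
$$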

where K and L are the sets of Xis whose values are 1 and 0, respectively. Eq. (21) varies according to three cases whose arrangement counters are listed as follows

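Case $L = \emptyset$:

$$
c(\Omega{:}\{X_i = 1\}) = 1, \quad c(\Omega{:}\{X_i = 0\}) = 0, \quad c(\Omega) = 1.
$$

Case $L \neq \emptyset$ and $K \neq \emptyset$:

$$
c(\Omega{:}\{X_i = 1\}) = 2^{n-1} - 1, \quad c(\Omega{:}\{X_i = 0\}) = 2^{n-1} - 1, \quad c(\Omega) = 2^n - 2.
$$

Case $L \neq \emptyset$ and $K = \emptyset$:

$$
c(\Omega{:}\{X_i = 1\}) = 0, \quad c(\Omega{:}\{X_i = 0\}) = 1, \quad c(\Omega) = 1.
$$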
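The three cases of Eq. (21) all follow from the sum of the "all ON" and "all OFF" products, which the sketch below computes directly. The class `XnorGate` and its methods are hypothetical names; the restriction to an even number of sources stated in the text is not enforced here.

```java
// Sketch of the XNOR-gate inference in Eq. (21): P(all A_i ON) + P(all A_i OFF).
// Names are illustrative only.
final class XnorGate {
    // P(A_i = ON | X_i = x) from Eq. (9).
    static double pOn(int x, double p) {
        return x == 1 ? p : 0.0;
    }

    static double probability(int[] x, double[] p) {
        double allOn = 1.0;
        double allOff = 1.0;
        for (int i = 0; i < x.length; i++) {
            double on = pOn(x[i], p[i]);
            allOn *= on;
            allOff *= 1.0 - on;
        }
        return allOn + allOff;
    }

    public static void main(String[] args) {
        System.out.println(probability(new int[]{1, 1}, new double[]{0.8, 0.6}));
        System.out.println(probability(new int[]{0, 0}, new double[]{0.8, 0.6}));
    }
}
```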

Let U be a set of indices such that Ai = ON and let α ≥ 0 and β ≥ 0 be predefined numbers. The U-gate inference is defined based on α, β and cardinality of U. Table 4 specifies four common U-gate conditions.

|U| = α: P(Y=1 | A1, A2,…, An) = 1 if there are exactly α variables Ai = ON. Otherwise, P(Y=1 | A1, A2,…, An) = 0.
|U| ≥ α: P(Y=1 | A1, A2,…, An) = 1 if there are at least α variables Ai = ON. Otherwise, P(Y=1 | A1, A2,…, An) = 0.
|U| ≤ β: P(Y=1 | A1, A2,…, An) = 1 if there are at most β variables Ai = ON. Otherwise, P(Y=1 | A1, A2,…, An) = 0.
α ≤ |U| ≤ β: P(Y=1 | A1, A2,…, An) = 1 if the number of variables Ai = ON is from α to β. Otherwise, P(Y=1 | A1, A2,…, An) = 0.

Table 4.

U-gate conditions.

Note that the U-gate condition on |U| can be arbitrary; it is only relevant to the Ai (ON or OFF) and the way they are combined. For example, AND-gate and OR-gate are specific cases of U-gate with |U| = n and |U| ≥ 1, respectively. XOR-gate and XNOR-gate are also specific cases of U-gate with specific conditions on the Ai. However, it must be assured that at least one combination of the Ai satisfies the predefined U-gate condition, so that the U-gate probability is not always equal to 0. In this research, U-gate is the most general nonlinear gate, whose probability contains products of weights (see Table 5). Later on, we will research a so-called SIGMA-gate that contains only a linear combination of weights (sum of weights, see Eq. (23)). In short, each X-gate is a pattern owning a particular X-gate inference, that is, the X-gate probability P(X1 x X2 x…x Xn). Each X-gate inference is based on particular X-gate condition(s) relevant only to the variables Ai.

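Let

$$
S_U = \sum_{U \in \mathcal{U}} \prod_{i \in U} p_i \prod_{j \in K \setminus U} (1 - p_j), \qquad P_U = P(X_1 \uplus X_2 \uplus \cdots \uplus X_n) = P(Y = 1 \mid X_1, X_2, \ldots, X_n)
$$

As a convention,

$$
\prod_{i \in U} p_i = 1 \ \text{if } |U| = 0, \qquad \prod_{j \in K \setminus U} (1 - p_j) = 1 \ \text{if } |U| = |K|
$$

Condition |U| = 0:

$$
P_U = \begin{cases} \prod_{j=1}^{n} (1 - p_j) & \text{if } |K| > 0 \\ 1 & \text{if } |K| = 0 \end{cases} \qquad |\mathcal{U}| = 1
$$

Condition |U| ≥ 0 (the same as the case |U| ≤ n):

$$
P_U = \begin{cases} S_U & \text{if } |K| > 0 \\ 1 & \text{if } |K| = 0 \end{cases} \qquad |\mathcal{U}| = 2^{|K|}
$$

Condition |U| = n:

$$
P_U = \begin{cases} \prod_{i=1}^{n} p_i & \text{if } |K| = n \\ 0 & \text{if } |K| < n \end{cases} \qquad |\mathcal{U}| = \begin{cases} 1 & \text{if } |K| = n \\ 0 & \text{if } |K| < n \end{cases}
$$

Condition |U| = α, with 0 < α < n:

$$
P_U = \begin{cases} S_U & \text{if } |K| \geq \alpha \\ 0 & \text{if } |K| < \alpha \end{cases} \qquad |\mathcal{U}| = \begin{cases} \binom{|K|}{\alpha} & \text{if } |K| \geq \alpha \\ 0 & \text{if } |K| < \alpha \end{cases}
$$

Condition |U| ≥ α, with 0 < α < n:

$$
P_U = \begin{cases} S_U & \text{if } |K| \geq \alpha \\ 0 & \text{if } |K| < \alpha \end{cases} \qquad |\mathcal{U}| = \begin{cases} \sum_{j=\alpha}^{|K|} \binom{|K|}{j} & \text{if } |K| \geq \alpha \\ 0 & \text{if } |K| < \alpha \end{cases}
$$

Condition |U| ≤ β, with 0 < β < n:

$$
P_U = \begin{cases} S_U & \text{if } |K| > 0 \\ 1 & \text{if } |K| = 0 \end{cases} \qquad |\mathcal{U}| = \begin{cases} \sum_{j=0}^{\min(\beta, |K|)} \binom{|K|}{j} & \text{if } |K| > 0 \\ 1 & \text{if } |K| = 0 \end{cases}
$$

Condition α ≤ |U| ≤ β, with 0 < α < n and 0 < β < n:

$$
P_U = \begin{cases} S_U & \text{if } |K| \geq \alpha \\ 0 & \text{if } |K| < \alpha \end{cases} \qquad |\mathcal{U}| = \begin{cases} \sum_{j=\alpha}^{\min(\beta, |K|)} \binom{|K|}{j} & \text{if } |K| \geq \alpha \\ 0 & \text{if } |K| < \alpha \end{cases}
$$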

Table 5.

U-gate inference.

From Eq. (10), we have

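$$
P(Y = 1 \mid X_1, X_2, \ldots, X_n) = \sum_{A_1, A_2, \ldots, A_n} P(Y = 1 \mid A_1, A_2, \ldots, A_n) \prod_{i=1}^{n} P(A_i \mid X_i)
$$

Let $\mathcal{U}$ be the set of all possible sets U. We have

$$
P(Y = 1 \mid X_1, X_2, \ldots, X_n) = \sum_{U \in \mathcal{U}} P(Y = 1 \mid A_1, A_2, \ldots, A_n) \prod_{i=1}^{n} P(A_i \mid X_i) = \sum_{U \in \mathcal{U}} \prod_{i \in U} P(A_i = \mathrm{ON} \mid X_i) \prod_{j \notin U} P(A_j = \mathrm{OFF} \mid X_j)
$$

If $X_i = 0, \ \exists i \in U$, then

$$
P(Y = 1 \mid X_1, X_2, \ldots, X_n) = \sum_{U \in \mathcal{U}} \prod_{i \in U} 0 \prod_{j \notin U} P(A_j = \mathrm{OFF} \mid X_j) = 0
$$

This implies that all sets U must be subsets of K. The U-gate probability is rewritten as follows

$$
\begin{aligned}
P(Y = 1 \mid X_1, X_2, \ldots, X_n) &= \sum_{U \in \mathcal{U}} \prod_{i \in U} P(A_i = \mathrm{ON} \mid X_i = 1) \prod_{j \notin U} P(A_j = \mathrm{OFF} \mid X_j) \\
&= \sum_{U \in \mathcal{U}} \prod_{i \in U} p_i \prod_{j \notin U} P(A_j = \mathrm{OFF} \mid X_j) \\
&= \sum_{U \in \mathcal{U}} \prod_{i \in U} p_i \prod_{j \in K \setminus U} P(A_j = \mathrm{OFF} \mid X_j = 1) \prod_{j \notin K} P(A_j = \mathrm{OFF} \mid X_j = 0) \\
&= \sum_{U \in \mathcal{U}} \prod_{i \in U} p_i \prod_{j \in K \setminus U} (1 - p_j) \prod_{j \notin K} 1 \\
&= \sum_{U \in \mathcal{U}} \prod_{i \in U} p_i \prod_{j \in K \setminus U} (1 - p_j)
\end{aligned}
$$

(Due to Eq. (9))

Let $P_U$ be the U-gate probability; Table 5 specifies U-gate inference and the cardinality of $\mathcal{U}$, where $\mathcal{U}$ is the set of subsets (U) of K.

Note that the notation $\binom{n}{j}$ denotes the number of combinations of j elements taken from n elements.

$$
\binom{n}{j} = \frac{n!}{j!\,(n - j)!}
$$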

Arrangement counters relevant to U-gate inference and the set K are listed as follows

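Case $|K| = 0$:

$$
c(\Omega{:}\{X_i = 1\}) = 0, \quad c(\Omega{:}\{X_i = 0\}) = 1, \quad c(\Omega) = 1.
$$

Case $|K| = 1$:

$$
c(\Omega{:}\{X_i = 1\}) = 1, \quad c(\Omega{:}\{X_i = 0\}) = 0, \quad c(\Omega) = 1.
$$

Case $|K| = \alpha$ and $\alpha > 0$:

$$
c(\Omega{:}\{X_i = 1\}) = \binom{n-1}{\alpha-1}, \quad c(\Omega{:}\{X_i = 0\}) = \binom{n-1}{\alpha}, \quad c(\Omega) = \binom{n}{\alpha}.
$$

Case $|K| \leq \alpha$ and $\alpha > 0$:

$$
c(\Omega{:}\{X_i = 1\}) = \sum_{j=1}^{\alpha} \binom{n-1}{j-1}, \quad c(\Omega{:}\{X_i = 0\}) = \sum_{j=0}^{\alpha} \binom{n-1}{j}, \quad c(\Omega) = \sum_{j=0}^{\alpha} \binom{n}{j}.
$$

Case $|K| \geq \alpha$ and $\alpha > 0$:

$$
c(\Omega{:}\{X_i = 1\}) = \sum_{j=\alpha}^{n} \binom{n-1}{j-1}, \quad c(\Omega{:}\{X_i = 0\}) = \sum_{j=\alpha}^{n-1} \binom{n-1}{j}, \quad c(\Omega) = \sum_{j=\alpha}^{n} \binom{n}{j}.
$$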
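The U-gate probability derived above is a sum over the admissible subsets U of K. The sketch below enumerates those subsets with a bit mask and accepts a condition on |U| as a predicate; the class `UGate`, the predicate parameter, and the mask encoding are illustrative assumptions, suitable only for small |K|.

```java
import java.util.function.IntPredicate;
import java.util.stream.IntStream;

// Sketch of the U-gate probability: sum over subsets U of K = {i : X_i = 1}
// whose size satisfies the given condition. Names are illustrative only.
final class UGate {
    static double probability(int[] x, double[] p, IntPredicate sizeCondition) {
        // Indices of the sources that are on (the set K).
        int[] k = IntStream.range(0, x.length).filter(i -> x[i] == 1).toArray();
        double total = 0.0;
        for (int mask = 0; mask < (1 << k.length); mask++) {
            if (!sizeCondition.test(Integer.bitCount(mask))) continue;
            double term = 1.0;
            for (int j = 0; j < k.length; j++) {
                double pj = p[k[j]];
                // Member of U contributes p_j; member of K \ U contributes 1 - p_j.
                term *= ((mask >> j) & 1) == 1 ? pj : 1.0 - pj;
            }
            total += term;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] x = {1, 1, 0};
        double[] p = {0.8, 0.6, 0.5};
        System.out.println(probability(x, p, size -> size >= 1)); // OR-like condition
        System.out.println(probability(x, p, size -> size == 3)); // AND-like condition
    }
}
```

With the condition |U| ≥ 1 the result coincides with the OR-gate of Eq. (16), and with |U| = n it coincides with the AND-gate of Eq. (14).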

The SIGMA-gate inference [9] represents aggregation relationship satisfying SIGMA-gate condition specified by Eq. (22).

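$$
P(Y) = P\Big( \sum_{i=1}^{n} A_i \Big)
$$

where the set of Ai is complete and mutually exclusive

$$
\sum_{i=1}^{n} w_i = 1, \qquad A_i \cap A_j = \emptyset, \ \forall i \neq j \tag{22}
$$

The sigma sum $\sum_{i=1}^{n} A_i$ indicates that Y is the exclusive union of the Ai; here it does not express arithmetical addition.

$$
Y = \sum_{i=1}^{n} A_i = \bigcup_{i=1}^{n} A_i
$$

This implies

$$
P(Y) = P\Big( \sum_{i=1}^{n} A_i \Big) = P\Big( \bigcup_{i=1}^{n} A_i \Big) = \sum_{i=1}^{n} P(A_i)
$$

The sigma sum $\sum_{i=1}^{n} P(A_i)$ now expresses arithmetical addition of the probabilities P(Ai).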

SIGMA-gate inference requires the set of Ais is complete and mutually exclusive, which means that the set of Xis is complete and mutually exclusive too. The SIGMA-gate probability is [9]

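$$
\begin{aligned}
P(Y \mid X_1, X_2, \ldots, X_n) &= P\Big( \sum_{i=1}^{n} A_i \,\Big|\, X_1, X_2, \ldots, X_n \Big) && \text{(due to SIGMA-gate condition)} \\
&= \sum_{i=1}^{n} P(A_i \mid X_1, X_2, \ldots, X_n) && \text{(because the } A_i \text{ are mutually exclusive)} \\
&= \sum_{i=1}^{n} P(A_i \mid X_i) && \text{(because } A_i \text{ is only dependent on } X_i\text{)}
\end{aligned}
$$

It implies

$$
\begin{aligned}
P(Y = 1 \mid X_1, X_2, \ldots, X_n) &= \sum_{i=1}^{n} P(A_i = \mathrm{ON} \mid X_i) = \Big( \sum_{i \in K} P(A_i = \mathrm{ON} \mid X_i = 1) \Big) + \Big( \sum_{i \notin K} P(A_i = \mathrm{ON} \mid X_i = 0) \Big) \\
&= \sum_{i \in K} w_i + \sum_{i \notin K} 0 = \sum_{i \in K} w_i
\end{aligned}
$$

(Due to Eq. (9))

In general, Eq. (23) specifies the theorem of SIGMA-gate inference [9]. The base of this theorem was mentioned by Millán and Pérez-de-la-Cruz ([4], pp. 292-295).

$$
\begin{aligned}
P(X_1 + X_2 + \cdots + X_n) = P\Big( \sum_{i=1}^{n} X_i \Big) = P(Y = 1 \mid X_1, X_2, \ldots, X_n) &= \sum_{i \in K} w_i \\
P(Y = 0 \mid X_1, X_2, \ldots, X_n) = 1 - \sum_{i \in K} w_i &= \sum_{i \in L} w_i
\end{aligned}
$$

where the set of Xi is complete and mutually exclusive.

$$
\sum_{i=1}^{n} w_i = 1, \qquad X_i \cap X_j = \emptyset, \ \forall i \neq j \tag{23}
$$

The arrangement counters of SIGMA-gate inference are $c(\Omega{:}\{X_i = 1\}) = c(\Omega{:}\{X_i = 0\}) = 2^{n-1}$ and $c(\Omega) = 2^n$.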
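Unlike the product-based gates, the SIGMA-gate of Eq. (23) is a plain weighted sum. The sketch below adds up the weights of the sources that are on and rejects weight vectors that are not complete; the class `SigmaGate` and the tolerance of 1e-9 are illustrative assumptions.

```java
// Sketch of the SIGMA-gate inference in Eq. (23). Names and the tolerance
// used for the completeness check are illustrative assumptions.
final class SigmaGate {
    // P(Y = 1 | X_1..X_n) = sum of w_i over K = {i : X_i = 1}; requires sum of all w_i = 1.
    static double probability(int[] x, double[] w) {
        double total = 0.0;
        double onSum = 0.0;
        for (int i = 0; i < x.length; i++) {
            total += w[i];
            if (x[i] == 1) onSum += w[i];
        }
        if (Math.abs(total - 1.0) > 1e-9) {
            throw new IllegalArgumentException("weights must sum to 1");
        }
        return onSum;
    }

    public static void main(String[] args) {
        System.out.println(probability(new int[]{1, 0, 1}, new double[]{0.5, 0.3, 0.2}));
    }
}
```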

Eq. (9) specifies the “clockwise” strength of relationship between Xi and Y. Event Xi = 1 causes event Ai = ON with “clockwise” weight wi. There is a question “given Xi = 0, how likely the event Ai = OFF is”. In order to solve this problem, I define a so-called “counterclockwise” strength of relationship between Xi and Y denoted ωi. Event Xi = 0 causes event Ai = OFF with “counterclockwise” weight ωi. In other words, each arc in simple graph is associated with a clockwise weight wi and a counterclockwise weight ωi. Such graph is called bi-weight simple graph shown in Figure 4.

Figure 4.

Bi-weight simple graph.

With bi-weight simple graph, all X-gate inferences are extended as so-called X-gate bi-inferences. Derived from Eq. (9), Eq. (24) specifies conditional probability of accountable variables with regard to bi-weight graph.

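$$\begin{aligned}
P(A_i=\text{ON} \mid X_i=1) &= p_i = w_i\\
P(A_i=\text{ON} \mid X_i=0) &= 1-\rho_i = 1-\omega_i\\
P(A_i=\text{OFF} \mid X_i=1) &= 1-p_i = 1-w_i\\
P(A_i=\text{OFF} \mid X_i=0) &= \rho_i = \omega_i
\end{aligned} \tag{24}$$

The probabilities P(Ai = ON | Xi = 0) and P(Ai = OFF | Xi = 1) are called the clockwise adder di and the counterclockwise adder δi, respectively. As usual, di and δi are smaller than wi and ωi. When di = 0, the bi-weight graph becomes a normal simple graph.

$$\begin{aligned}
d_i &= P(A_i=\text{ON} \mid X_i=0) = 1-\rho_i = 1-\omega_i\\
\delta_i &= P(A_i=\text{OFF} \mid X_i=1) = 1-p_i = 1-w_i
\end{aligned}$$

The total clockwise weight is defined as the sum of the clockwise weight and the clockwise adder, and the total counterclockwise weight as the sum of the counterclockwise weight and the counterclockwise adder. Eq. (25) specifies these total weights $W_i$ and $\mathcal{W}_i$. These weights are also called relationship powers.

$$W_i = w_i + d_i, \qquad \mathcal{W}_i = \omega_i + \delta_i$$

where

$$d_i = 1-\rho_i = 1-\omega_i, \qquad \delta_i = 1-p_i = 1-w_i \tag{25}$$

Given Eq. (25), the set of all Ai is complete if and only if $\sum_{i=1}^{n} w_i = 1$.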

By extending aforementioned X-gate inferences, we get bi-inferences for AND-gate, OR-gate, NAND-gate, NOR-gate, XOR-gate, XNOR-gate, and U-gate as shown in Table 6.

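The largest cardinalities of K (L) are $2^{n-1}$ and $2^n$ with and without fixed Xi, respectively. Thus, it is possible to calculate arrangement counters. As a convention, the product of probabilities is 1 if the index set is empty.

$$\prod_{i \in I} f_i = 1 \ \text{ if } I = \emptyset$$

With regard to SIGMA-gate bi-inference, the sum of all total clockwise weights must be 1 as follows

$$\sum_{i=1}^{n} W_i = \sum_{i=1}^{n} (w_i + d_i) = \sum_{i=1}^{n} (w_i + 1 - \omega_i) = 1$$

Derived from Eq. (23), the SIGMA-gate probability for bi-weight graph is

$$\begin{aligned}
P(X_1 + X_2 + \cdots + X_n) &= \sum_{i=1}^{n} P(A_i=\text{ON} \mid X_i)\\
&= \sum_{i \in K} P(A_i=\text{ON} \mid X_i=1) + \sum_{i \in L} P(A_i=\text{ON} \mid X_i=0)\\
&= \sum_{i \in K} w_i + \sum_{i \in L} d_i
\end{aligned}$$

Shortly, Eq. (26) specifies SIGMA-gate bi-inference.

$$P(X_1 + X_2 + \cdots + X_n) = \sum_{i \in K} w_i + \sum_{i \in L} d_i$$

where the set of Xi is complete and mutually exclusive.

$$\sum_{i=1}^{n} W_i = 1, \qquad X_i \cap X_j = \emptyset, \ \forall i \neq j \tag{26}$$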
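As a rough illustration of Eq. (26), the following Python sketch evaluates the SIGMA-gate bi-inference for a two-source bi-weight graph. The function name `sigma_gate_bi` and the numeric weights are assumptions made for this example; they are chosen so that the total clockwise weights sum to 1.

```python
def sigma_gate_bi(w, omega, xs):
    """P(Y=1 | X1..Xn) per Eq. (26): w_i for sources in K, d_i = 1 - omega_i for sources in L."""
    d = [1 - o for o in omega]  # clockwise adders d_i
    return sum(w[i] if x == 1 else d[i] for i, x in enumerate(xs))

# Hypothetical clockwise and counterclockwise weights.
w = [0.5, 0.25]
omega = [0.875, 0.875]

# Total clockwise weights W_i = w_i + d_i must sum to 1.
total_clockwise = sum(wi + (1 - oi) for wi, oi in zip(w, omega))
print(total_clockwise)
print(sigma_gate_bi(w, omega, (1, 0)))  # w_1 + d_2
print(sigma_gate_bi(w, omega, (0, 0)))  # d_1 + d_2
```

Setting every ωi to 1 makes all adders di zero, and the function then reduces to the plain SIGMA-gate inference of Eq. (23).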

The next section will research diagnostic relationship which adheres to X-gate inference.

4. Multihypothesis diagnostic relationship

Given a simple graph shown in Figure 2, if we replace the target source Y by an evidence D, we get a so-called multihypothesis diagnostic relationship whose property adheres to X-gate inference. Maybe there are other diagnostic relationships in which X-gate inference is not concerned. However, this research focuses on X-gate inference and so multi-hypothesis diagnostic relationship is called X-gate diagnostic relationship. Sources X1, X2,…, Xn become hypotheses. As a convention, these hypotheses have prior uniform distribution.

According to the aforementioned X-gate network shown in Figures 2 and 3, the target variable must be binary whereas evidence D can be numeric. It is therefore impossible to establish the evidence D as a direct target variable. The solution is to add an augmented binary target variable Y and then connect the evidence D directly to Y. In other words, the X-gate diagnostic network has n sources {X1, X2,…, Xn}, one augmented hypothesis Y, and one evidence D. As a convention, X-gate diagnostic network is called X-D network. The CPTs of the entire network are determined by combining the diagnostic relationship and the X-gate inference mentioned in previous sections. Figure 5 depicts the augmented X-D network. Note that variables X1, X2,…, Xn, and Y are always binary.

Figure 5.

Augmented X-D network.

Appendix A3 is the proof that the augmented X-D network is equivalent to X-D network with regard to variables X1, X2,…, Xn and D. As a convention, augmented X-D network is considered as same as X-D network.

The simplest case of X-D network is NOT-D network having one hypothesis X1 and one evidence D, equipped with NOT-gate inference. NOT-D network satisfies diagnostic condition because it essentially represents the single diagnostic relationship. Inferred from Eqs. (1) and (7), the conditional probability P(D|X1) and posterior probability P(X1|D) of NOT-D network are

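$$P(D \mid X_1) = \begin{cases} 1-D & \text{if } X_1 = 1\\ D & \text{if } X_1 = 0 \end{cases}$$

$$P(X_1 \mid D) = \frac{P(D \mid X_1)P(X_1)}{P(X_1)\big(P(D \mid X_1=0) + P(D \mid X_1=1)\big)}$$

(Due to Bayes’ rule and uniform distribution of X1)

$$= \frac{P(D \mid X_1)}{P(D \mid X_1=0) + P(D \mid X_1=1)} = 1 * P(D \mid X_1)$$

(Due to P(D|X1 = 0) + P(D|X1 = 1) = 1)

It implies NOT-D network satisfies diagnostic condition. Let

$$\Omega = \{X_1, X_2, \dots, X_n\}, \qquad n = |\Omega|$$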

We will validate whether the CPT of diagnostic relationship, P(D|X) specified by Eq. (6), still satisfies diagnostic condition within general case, X-D network. In other words, X-D network is general case of single diagnostic relationship.

Recall from dependencies shown in Figure 5, Eq. (27) specifies the joint probability of X-D network.

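$$P(\Omega, Y, D) = P(X_1, X_2, \dots, X_n, Y, D) = P(D \mid Y)\, P(Y \mid X_1, X_2, \dots, X_n) \prod_{i=1}^{n} P(X_i) \tag{27}$$

where Ω = {X1, X2,…, Xn}.

Eq. (28) specifies the conditional probability of D given Xi (likelihood function) and the posterior probability of Xi given D.

$$\begin{aligned}
P(D \mid X_i) &= \frac{P(X_i, D)}{P(X_i)} = \frac{\sum_{\{\Omega, Y, D\} \setminus \{X_i, D\}} P(\Omega, Y, D)}{\sum_{\{\Omega, Y, D\} \setminus \{X_i\}} P(\Omega, Y, D)}\\
P(X_i \mid D) &= \frac{P(X_i, D)}{P(D)} = \frac{\sum_{\{\Omega, Y, D\} \setminus \{X_i, D\}} P(\Omega, Y, D)}{\sum_{\{\Omega, Y, D\} \setminus \{D\}} P(\Omega, Y, D)}
\end{aligned} \tag{28}$$

where Ω = {X1, X2,…, Xn} and the sign “\” denotes the subtraction (excluding) operator in set theory [10]. Eq. (29) specifies the joint probability P(Xi, D) and the marginal probability P(D) given uniform distribution of all sources. Appendix A4 is the proof of Eq. (29).

$$\begin{aligned}
P(X_i, D) &= \frac{1}{2^n S}\Big((2D-M)\,s(\Omega:\{X_i\}) + 2^{n-1}(M-D)\Big)\\
P(D) &= \frac{1}{2^n S}\Big((2D-M)\,s(\Omega) + 2^{n}(M-D)\Big)
\end{aligned} \tag{29}$$

where s(Ω) and s(Ω:{Xi}) are specified in Table 2. From Eqs. (28) and (29), Eq. (30) specifies conditional probability P(D|Xi), posterior probability P(Xi|D), and transformation coefficient for X-gate inference.

$$\begin{aligned}
P(D \mid X_i=1) &= \frac{P(X_i=1, D)}{P(X_i=1)} = \frac{(2D-M)\,s(\Omega:\{X_i=1\}) + 2^{n-1}(M-D)}{2^{n-1}S}\\
P(D \mid X_i=0) &= \frac{P(X_i=0, D)}{P(X_i=0)} = \frac{(2D-M)\,s(\Omega:\{X_i=0\}) + 2^{n-1}(M-D)}{2^{n-1}S}\\
P(X_i=1 \mid D) &= \frac{P(X_i=1, D)}{P(D)} = \frac{(2D-M)\,s(\Omega:\{X_i=1\}) + 2^{n-1}(M-D)}{(2D-M)\,s(\Omega) + 2^{n}(M-D)}\\
P(X_i=0 \mid D) &= 1 - P(X_i=1 \mid D) = \frac{(2D-M)\,s(\Omega:\{X_i=0\}) + 2^{n-1}(M-D)}{(2D-M)\,s(\Omega) + 2^{n}(M-D)}\\
k &= \frac{P(X_i \mid D)}{P(D \mid X_i)} = \frac{2^{n-1}S}{(2D-M)\,s(\Omega) + 2^{n}(M-D)}
\end{aligned} \tag{30}$$

The transformation coefficient is rewritten as follows

$$k = \frac{2^{n-1}S}{2D\big(s(\Omega) - 2^{n-1}\big) + M\big(2^{n} - s(\Omega)\big)}$$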

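Note that S, D, and M are abstract symbols and there is no proportional connection between $2^{n-1}S$ and D for all D, as specified by Eq. (6). To see this, assume that such a proportional connection $2^{n-1}S = aD^{j}$ exists for all D, where a is an arbitrary constant. In the binary case, with D = 0 and S = 1, we have

$$2^{n-1} = 2^{n-1} * 1 = 2^{n-1}S = aD^{j} = a * 0^{j} = 0$$

There is a contradiction, which implies that it is impossible to reduce k into the following form

$$k = \frac{aD^{j}}{bD^{j}}$$

Therefore, if k is constant with regard to D then,

$$2D\big(s(\Omega) - 2^{n-1}\big) + M\big(2^{n} - s(\Omega)\big) = C \neq 0, \ \forall D$$

where C is constant. We have

$$\begin{aligned}
&\sum_{D} \Big(2D\big(s(\Omega) - 2^{n-1}\big) + M\big(2^{n} - s(\Omega)\big)\Big) = \sum_{D} C\\
&\Rightarrow 2S\big(s(\Omega) - 2^{n-1}\big) + NM\big(2^{n} - s(\Omega)\big) = NC\\
&\Rightarrow 2^{n}S = NC
\end{aligned}$$

It is implied that

$$k = \frac{2^{n-1}S}{2D\big(s(\Omega) - 2^{n-1}\big) + M\big(2^{n} - s(\Omega)\big)} = \frac{NC}{2C} = \frac{N}{2}$$

This holds

$$\begin{aligned}
&2^{n}S = N\Big(2D\big(s(\Omega) - 2^{n-1}\big) + M\big(2^{n} - s(\Omega)\big)\Big) = 2ND\big(s(\Omega) - 2^{n-1}\big) + 2S\big(2^{n} - s(\Omega)\big)\\
&\Rightarrow 2ND\big(s(\Omega) - 2^{n-1}\big) - 2S\big(s(\Omega) - 2^{n-1}\big) = 0\\
&\Rightarrow (ND - S)\big(s(\Omega) - 2^{n-1}\big) = 0
\end{aligned}$$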

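Assuming ND = S, and recalling that NM = 2S as used in the derivation above, we have

$$ND = S = \frac{NM}{2} \Rightarrow D = \frac{M}{2}$$

There is a contradiction because the equality must hold for every value of D, not only for D = M/2. Therefore, if k is constant with regard to D then $s(\Omega) = 2^{n-1}$. Inversely, if $s(\Omega) = 2^{n-1}$ then k is

$$k = \frac{2^{n-1}S}{2D\big(2^{n-1} - 2^{n-1}\big) + M\big(2^{n} - 2^{n-1}\big)} = \frac{S}{M} = \frac{N}{2}$$

In general, the event that k is constant with regard to D is equivalent to the event $s(\Omega) = 2^{n-1}$. This implies diagnostic theorem stated in Table 7.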

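Gate: bi-inference probability P(Y = 1 | X1, X2,…, Xn)

AND-gate: $\prod_{i \in K} p_i \prod_{i \in L} d_i$

OR-gate: $1 - \prod_{i \in K} \delta_i \prod_{i \in L} \rho_i$

NAND-gate: $1 - \prod_{i \in L} \rho_i \prod_{i \in K} \delta_i$

NOR-gate: $\prod_{i \in L} d_i \prod_{i \in K} p_i$

XOR-gate: $\prod_{i \in O_1} p_i \prod_{i \in O_2} d_i \prod_{i \in E_1} \delta_i \prod_{i \in E_2} \rho_i + \prod_{i \in E_1} p_i \prod_{i \in E_2} d_i \prod_{i \in O_1} \delta_i \prod_{i \in O_2} \rho_i$

XNOR-gate: $\prod_{i \in K} p_i \prod_{i \in L} d_i + \prod_{i \in K} \delta_i \prod_{i \in L} \rho_i$

U-gate: $\sum_{U \in \mathcal{U}} \Big(\prod_{i \in U \cap K} p_i \prod_{i \in U \cap L} d_i\Big)\Big(\prod_{i \in \bar{U} \cap K} \delta_i \prod_{i \in \bar{U} \cap L} \rho_i\Big)$

There are four common conditions of U: |U| = α, |U| ≥ α, |U| ≤ β, and α ≤ |U| ≤ β. Note that $\bar{U}$ is the complement of U, $\bar{U} = \{1, 2, \dots, n\} \setminus U$. The largest cardinality of $\mathcal{U}$ is $|\mathcal{U}| = 2^{n}$.

Table 6.

Bi-inferences for AND-gate, OR-gate, NAND-gate, NOR-gate, XOR-gate, XNOR-gate, and U-gate.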

Given X-D network is combination of diagnostic relationship and X-gate inference:

$$P(Y=1 \mid X_1, X_2, \dots, X_n) = P(X_1\,x\,X_2\,x \cdots x\,X_n)$$

$$P(D \mid Y) = \begin{cases} \dfrac{D}{S} & \text{if } Y = 1\\[2mm] \dfrac{M}{S} - \dfrac{D}{S} & \text{if } Y = 0 \end{cases}$$

The diagnostic condition of X-D network is satisfied if and only if

$$s(\Omega) = \sum_{a} P\big(Y=1 \mid a(\Omega)\big) = 2^{|\Omega|-1}, \ \forall \Omega$$

At that time, the transformation coefficient becomes:

$$k = \frac{N}{2}$$

Note that weights pi = wi and ρi = ωi, which are inputs of s(Ω), are abstract variables. Thus, the equality $s(\Omega) = 2^{|\Omega|-1}$ implies all abstract variables are removed and so s(Ω) does not depend on weights.

Table 7.

Diagnostic theorem.

The diagnostic theorem is the optimal way to validate the diagnostic condition.
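The role of $s(\Omega)$ in the transformation coefficient can be checked numerically. The sketch below evaluates k from Eq. (30) with exact fractions. It assumes that the evidence D ranges over the integers 0, 1,…, M, so that N = M + 1 and S is the sum of those values; Eq. (6) is not reproduced in this section, so this domain is an assumption, and the function name is hypothetical.

```python
from fractions import Fraction

def transformation_coefficient(s_omega, n, D, M):
    """k from Eq. (30), assuming D takes the integer values 0..M (so S = M(M+1)/2)."""
    S = Fraction(M * (M + 1), 2)
    denominator = (2 * D - M) * Fraction(s_omega) + 2**n * (M - D)
    return 2**(n - 1) * S / denominator

n, M = 2, 3  # two sources, D in {0, 1, 2, 3}, hence N = 4

# s(Omega) = 2^(n-1) = 2: k should not depend on D.
k_when_condition_holds = [transformation_coefficient(2, n, D, M) for D in range(M + 1)]

# s(Omega) = 1/4 (an arbitrary value different from 2^(n-1)): k varies with D.
k_otherwise = [transformation_coefficient(Fraction(1, 4), n, D, M) for D in range(M + 1)]

print(k_when_condition_holds)
print(k_otherwise)
```

Under these assumptions, the first list is constant and equal to N/2, while the second list changes with D, which matches the statement of the diagnostic theorem.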

Eq. (30) simplifies under AND-gate inference. Recall that Eq. (14) specified AND-gate inference as follows

$$P(Y=1 \mid X_1, X_2, \dots, X_n) = \begin{cases} \prod_{i=1}^{n} p_i & \text{if all } X_i \text{ are } 1\\ 0 & \text{if there exists at least one } X_i = 0 \end{cases}$$

Because the probability is nonzero only in the single case X1 = X2 =…= Xn = 1, we have

$$s(\Omega) = s(\Omega:\{X_i=1\}) = \prod_{i=1}^{n} p_i$$

Due to Xi = 0, we have

$$s(\Omega:\{X_i=0\}) = 0$$

Derived from Eq. (30), Eq. (31) specifies conditional probability P(D|Xi), posterior probability P(Xi|D), and transformation coefficient according to X-D network with AND-gate inference, called AND-D network.

$$\begin{aligned}
P(D \mid X_i=1) &= \frac{(2D-M)\prod_{i=1}^{n} p_i + 2^{n-1}(M-D)}{2^{n-1}S}\\
P(D \mid X_i=0) &= \frac{M-D}{S}\\
P(X_i=1 \mid D) &= \frac{(2D-M)\prod_{i=1}^{n} p_i + 2^{n-1}(M-D)}{(2D-M)\prod_{i=1}^{n} p_i + 2^{n}(M-D)}\\
P(X_i=0 \mid D) &= \frac{2^{n-1}(M-D)}{(2D-M)\prod_{i=1}^{n} p_i + 2^{n}(M-D)}\\
k &= \frac{2^{n-1}S}{(2D-M)\prod_{i=1}^{n} p_i + 2^{n}(M-D)}
\end{aligned} \tag{31}$$

For convenience, we validate diagnostic condition with a case of two sources Ω = {X1, X2}, p1 = p2 = w1 = w2 = 0.5, D ∈ {0, 1, 2, 3}. According to diagnostic theorem stated in Table 7, if s(Ω) ≠ 2 for a given X-gate then such X-gate does not satisfy diagnostic condition.

Given AND-gate inference, by applying Eq. (14), we have

$$s(\Omega) = (0.5 * 0.5) + 0 + 0 + 0 = 0.25$$

Given OR-gate inference, by applying Eq. (16), we have

$$s(\Omega) = (1 - 0.5 * 0.5) + (1 - 0.5) + (1 - 0.5) + 0 = 0.75 + 0.5 + 0.5 = 1.75$$

Given XOR-gate inference, by applying Eq. (19), we have

$$s(\Omega) = (0.5 * 0.5 + 0.5 * 0.5) + 0.5 + 0.5 + 0 = 1.5$$

Given XNOR-gate inference, by applying Eq. (21), we have

$$s(\Omega) = (0.5 * 0.5 + 0.5 * 0.5) + 0.5 + 0.5 + 1 = 2.5$$

Given SIGMA-gate inference, by applying Eq. (23), we have

$$s(\Omega) = (0.5 + 0.5) + 0.5 + 0.5 + 0 = 2$$
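The five arrangement sums above can be reproduced with a short Python sketch. It is only a sketch under stated assumptions: each gate is modeled as a deterministic Boolean function of the accountable variables Ai, combined with P(Ai = ON | Xi = 1) = pi and P(Ai = ON | Xi = 0) = 0 in the manner of Eq. (10); Eqs. (14) to (21) are not reproduced in this section, so this reading of the gates (and treating XOR as "an odd number of Ai are ON", which is unambiguous for two sources) is an assumption. All function and variable names are hypothetical.

```python
from itertools import product

def gate_probability(gate, ps, xs):
    """P(Y=1 | X) = sum over A of gate(A) * prod_i P(A_i | X_i)."""
    total = 0.0
    for a in product((0, 1), repeat=len(ps)):
        if not gate(a):
            continue
        prob = 1.0
        for ai, pi, xi in zip(a, ps, xs):
            p_on = pi if xi == 1 else 0.0  # P(A_i = ON | X_i), single-weight graph
            prob *= p_on if ai == 1 else 1.0 - p_on
        total += prob
    return total

GATES = {
    "AND": lambda a: all(a),
    "OR": lambda a: any(a),
    "XOR": lambda a: sum(a) % 2 == 1,
    "XNOR": lambda a: sum(a) % 2 == 0,
}

def arrangement_sum(gate, ps):
    """s(Omega): sum of P(Y=1 | a) over all binary arrangements a of the sources."""
    return sum(gate_probability(gate, ps, xs) for xs in product((0, 1), repeat=len(ps)))

ps = [0.5, 0.5]
sums = {name: arrangement_sum(gate, ps) for name, gate in GATES.items()}
# SIGMA-gate: P(Y=1 | X) is the sum of the weights of the sources that equal 1.
sums["SIGMA"] = sum(
    sum(p for p, x in zip(ps, xs) if x == 1) for xs in product((0, 1), repeat=len(ps))
)

for name, value in sums.items():
    print(name, value, "meets 2^(n-1)" if value == 2 ** (len(ps) - 1) else "differs from 2^(n-1)")
```

Among these gates, only the SIGMA-gate sum equals $2^{n-1} = 2$ for this choice of weights.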

It is asserted that AND-gate, OR-gate, XOR-gate, and XNOR-gate do not satisfy diagnostic condition and so they should not be used to assess hypotheses. However, a single numeric example cannot establish whether U-gate and SIGMA-gate satisfy such diagnostic condition. It is necessary to expand the equations for SIGMA-gate diagnostic network (called SIGMA-D network) in order to validate it.

In case of SIGMA-gate inference, by applying Eq. (23), we have

$$\begin{aligned}
\sum_{i} w_i &= 1\\
s(\Omega) &= 2^{n-1}\sum_{i} w_i = 2^{n-1}\\
s(\Omega:\{X_i=1\}) &= 2^{n-1}w_i + 2^{n-2}\sum_{j \neq i} w_j = 2^{n-1}w_i + 2^{n-2}(1 - w_i) = 2^{n-2}(1 + w_i)\\
s(\Omega:\{X_i=0\}) &= s(\Omega) - s(\Omega:\{X_i=1\}) = 2^{n-2}(1 - w_i)
\end{aligned}$$

It is necessary to validate SIGMA-D network with SIGMA-gate bi-inference. By applying Eq. (26), we recalculate these quantities as follows

$$s(\Omega) = 2^{n-1}\sum_{i} w_i + 2^{n-1}\sum_{i} d_i = 2^{n-1}\sum_{i} (w_i + d_i) = 2^{n-1}$$

(Due to $\sum_{i} (w_i + d_i) = 1$)

$$\begin{aligned}
s(\Omega:\{X_i=1\}) &= 2^{n-1}w_i + 2^{n-2}\sum_{j \neq i} w_j + 2^{n-2}\sum_{i} d_i = 2^{n-2}w_i + 2^{n-2}\sum_{i} (w_i + d_i) = 2^{n-2}(1 + w_i)\\
s(\Omega:\{X_i=0\}) &= s(\Omega) - s(\Omega:\{X_i=1\}) = 2^{n-2}(1 - w_i)
\end{aligned}$$

Obviously, quantities s(Ω), s(Ω:{Xi = 1}), and s(Ω:{Xi = 0}) are kept intact. According to diagnostic theorem, we conclude that SIGMA-D network does satisfy diagnostic condition due to $s(\Omega) = 2^{n-1}$. Thus, SIGMA-D network can be used to assess hypotheses.

Eq. (32), an immediate consequence of Eq. (30), specifies conditional probability P(D|Xi), posterior probability P(Xi|D), and transformation coefficient for SIGMA-D network.

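$$\begin{aligned}
P(D \mid X_i=1) &= \frac{(2D-M)w_i + M}{2S}\\
P(D \mid X_i=0) &= \frac{(M-2D)w_i + M}{2S}\\
P(X_i=1 \mid D) &= \frac{(2D-M)w_i + M}{2M}\\
P(X_i=0 \mid D) &= \frac{(M-2D)w_i + M}{2M}\\
k &= \frac{N}{2}
\end{aligned} \tag{32}$$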
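Eq. (32) can be cross-checked by brute force: build the joint distribution of Eq. (27) for a small SIGMA-D network, marginalize it, and compare with the closed forms. The sketch below does this with exact fractions. It assumes uniform priors P(Xi) = 1/2, an evidence domain D ∈ {0, 1,…, M}, and P(D|Y) as given in Table 7; the function name `sigma_d_rows` and the example weights are hypothetical.

```python
from fractions import Fraction
from itertools import product

def sigma_d_rows(w, M, i):
    """For each D, return (D, P(D | X_i=1), P(X_i=1 | D), k) by enumerating Eq. (27)."""
    n = len(w)
    values = range(M + 1)            # assumed evidence domain 0..M
    N, S = M + 1, sum(values)
    rows = []
    for D in values:
        joint_xi1_d = Fraction(0)    # P(X_i = 1, D)
        marginal_d = Fraction(0)     # P(D)
        for xs in product((0, 1), repeat=n):
            p_y1 = sum(wj for wj, x in zip(w, xs) if x == 1)  # SIGMA-gate, Eq. (23)
            # Sum over Y of P(D|Y) P(Y|X) prod P(X_i), with uniform priors.
            p = Fraction(1, 2**n) * (p_y1 * Fraction(D, S) + (1 - p_y1) * Fraction(M - D, S))
            marginal_d += p
            if xs[i] == 1:
                joint_xi1_d += p
        likelihood = joint_xi1_d / Fraction(1, 2)
        posterior = joint_xi1_d / marginal_d
        rows.append((D, likelihood, posterior, posterior / likelihood))
    return N, S, rows

# Illustrative weights summing to 1 (an assumption, not taken from the chapter).
w = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
M = 3
N, S, rows = sigma_d_rows(w, M, i=0)
for D, likelihood, posterior, k in rows:
    print(D, likelihood, posterior, k)
```

For these inputs the enumerated likelihood and posterior agree with the closed forms of Eq. (32), and the ratio k stays at N/2 for every D.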

In case of SIGMA-gate, the augmented variable Y can be removed from X-D network. The evidence D is now established as direct target variable. Figure 6 shows a so-called direct SIGMA-gate diagnostic network (direct SIGMA-D network).

Figure 6.

Direct SIGMA-gate diagnostic network (direct SIGMA-D network).

Derived from Eq. (23), the CPT of direct SIGMA-D network is determined by Eq. (33).

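$$P(D \mid X_1, X_2, \dots, X_n) = \sum_{i \in K} \frac{D}{S} w_i + \sum_{j \in L} \frac{M-D}{S} w_j$$

where the set of Xi is complete and mutually exclusive.

$$\sum_{i=1}^{n} w_i = 1, \qquad X_i \cap X_j = \emptyset, \ \forall i \neq j \tag{33}$$

Eq. (33) specifies valid CPT due to

$$\begin{aligned}
\sum_{D} P(D \mid X_1, X_2, \dots, X_n) &= \frac{1}{S}\sum_{i \in K} w_i \sum_{D} D + \frac{1}{S}\sum_{j \in L} w_j \sum_{D} (M - D)\\
&= \frac{1}{S}\sum_{i \in K} S w_i + \frac{1}{S}\sum_{j \in L} w_j (NM - S)\\
&= \frac{1}{S}\sum_{i \in K} S w_i + \frac{1}{S}\sum_{j \in L} S w_j = \sum_{i=1}^{n} w_i = 1
\end{aligned}$$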
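The validity argument for Eq. (33) can also be checked numerically. The following sketch evaluates the direct SIGMA-D CPT with exact fractions and sums each row over D. As before, the evidence domain D ∈ {0, 1,…, M} is an assumption, and the function name and weights are hypothetical.

```python
from fractions import Fraction
from itertools import product

def direct_sigma_cpt(w, xs, D, M):
    """P(D | X1..Xn) per Eq. (33), assuming D takes the integer values 0..M."""
    S = sum(range(M + 1))
    on = sum(wi for wi, x in zip(w, xs) if x == 1)   # weights of sources in K
    off = sum(wi for wi, x in zip(w, xs) if x == 0)  # weights of sources in L
    return Fraction(D, S) * on + Fraction(M - D, S) * off

# Illustrative weights summing to 1 (an assumption, not taken from the chapter).
w = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
M = 3
row_sums = {
    xs: sum(direct_sigma_cpt(w, xs, D, M) for D in range(M + 1))
    for xs in product((0, 1), repeat=len(w))
}
print(row_sums)
```

Every row sums to 1 because, on this domain, the sums of D and of M − D over all values of D both equal S.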

From dependencies shown in Figure 6, Eq. (34) specifies the joint probability of direct SIGMA-D network.

$$P(X_1, X_2, \dots, X_n, D) = P(D \mid X_1, X_2, \dots, X_n) \prod_{i=1}^{n} P(X_i) \tag{34}$$

Inferred from Eq. (29), Eq. (35) specifies the joint probability P(Xi, D) and the marginal probability P(D) of direct SIGMA-D network, given uniform distribution of all sources.

$$P(X_i, D) = \frac{1}{2^{n}}\, s(\Omega:\{X_i\}), \qquad P(D) = \frac{1}{2^{n}}\, s(\Omega) \tag{35}$$

where s(Ω) and s(Ω:{Xi}) are specified in Table 2.

By browsing all variables of direct SIGMA-D network, we have

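$$\begin{aligned}
s(\Omega:\{X_i=1\}) &= 2^{n-1}\frac{D}{S}w_i + 2^{n-2}\sum_{j \neq i} \frac{D}{S}w_j + 2^{n-2}\sum_{j \neq i} \frac{M-D}{S}w_j\\
&= \frac{2^{n-2}}{S}\Big(2Dw_i + M\sum_{j \neq i} w_j\Big)\\
&= \frac{2^{n-2}}{S}\big(2Dw_i + M(1 - w_i)\big) \qquad \Big(\text{due to } \sum_{i=1}^{n} w_i = 1\Big)\\
&= \frac{2^{n-2}}{S}\big((2D - M)w_i + M\big)
\end{aligned}$$

Similarly, we have

$$\begin{aligned}
s(\Omega:\{X_i=0\}) &= 2^{n-1}\frac{M-D}{S}w_i + 2^{n-2}\sum_{j \neq i} \frac{M-D}{S}w_j + 2^{n-2}\sum_{j \neq i} \frac{D}{S}w_j = \frac{2^{n-2}}{S}\big((M - 2D)w_i + M\big)\\
s(\Omega) &= 2^{n-1}\sum_{i} \frac{D}{S}w_i + 2^{n-1}\sum_{i} \frac{M-D}{S}w_i = \frac{2^{n-1}M}{S}
\end{aligned}$$

By applying Eq. (35) to s(Ω:{Xi = 0}), s(Ω:{Xi = 1}), and s(Ω), we get the same result as Eq. (32).

$$\begin{aligned}
P(D \mid X_i=1) &= \frac{(2D-M)w_i + M}{2S}\\
P(D \mid X_i=0) &= \frac{(M-2D)w_i + M}{2S}\\
P(X_i=1 \mid D) &= \frac{(2D-M)w_i + M}{2M}\\
P(X_i=0 \mid D) &= \frac{(M-2D)w_i + M}{2M}\\
k &= \frac{N}{2}
\end{aligned}$$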

Therefore, it is possible to use direct SIGMA-D network to assess hypotheses. It is asserted that SIGMA-D network satisfies diagnostic condition, and that the single diagnostic relationship, NOT-D network, and direct SIGMA-D network are specific cases of SIGMA-D network. This raises a question: does there exist an X-D network, different from SIGMA-D network and from the cases above, that satisfies diagnostic condition?

Recall that each X-D network is a pattern owning a particular X-gate inference, which in turn is based on particular X-gate conditions relevant only to the variables Ai. The most general nonlinear X-D network is U-D network, whereas SIGMA-D network is a linear one. The U-gate inference given an arbitrary condition on U is

$$P(Y=1 \mid X_1, X_2, \dots, X_n) = \sum_{U \in \mathcal{U}} \Big(\prod_{i \in U \cap K} p_i \prod_{i \in U \cap L} (1 - \rho_i)\Big)\Big(\prod_{i \in \bar{U} \cap K} (1 - p_i) \prod_{i \in \bar{U} \cap L} \rho_i\Big)$$

Let f be the arrangement sum of U-gate inference.

$$f(p_i, \rho_i) = \sum_{a(\Omega)} \sum_{U \in \mathcal{U}} \Big(\prod_{i \in U \cap K} p_i \prod_{i \in U \cap L} (1 - \rho_i)\Big)\Big(\prod_{i \in \bar{U} \cap K} (1 - p_i) \prod_{i \in \bar{U} \cap L} \rho_i\Big)$$

The function f is a sum of many large expressions and each expression is a product of four possible sub-products (Π) as follows

$$Expr = \prod_{i \in U \cap K} p_i \prod_{i \in U \cap L} (1 - \rho_i) \prod_{i \in \bar{U} \cap K} (1 - p_i) \prod_{i \in \bar{U} \cap L} \rho_i$$

In any case of degradation, there always exist expressions Expr having at least 2 sub-products (Π), for example,

$$Expr = \prod_{i \in U \cap K} p_i \prod_{i \in U \cap L} (1 - \rho_i)$$

Consequently, there always exist expressions Expr having at least 5 terms relevant to pi and ρi if n ≥ 5, for example,

$$Expr = p_1 p_2 p_3 (1 - \rho_4)(1 - \rho_5)$$

Thus, the degree of f will be larger than or equal to 5 given n ≥ 5. According to diagnostic theorem, U-gate network satisfies diagnostic condition if and only if $f(p_i, \rho_i) = 2^{n-1}$ for all n ≥ 1 and for all abstract variables pi and ρi. Without loss of generality, each pi or ρi is the sum of a variable x and a variable ai or bi, respectively. Note that all pi, ρi, ai, and bi are abstract variables.

$$p_i = x + a_i, \qquad \rho_i = x + b_i$$

The equation $f - 2^{n-1} = 0$ becomes an equation g(x) = 0 whose degree is m ≥ 5 if n ≥ 5.

$$g(x) = \pm x^{m} + C_1 x^{m-1} + \cdots + C_{m-1} x + C_m - 2^{n-1} = 0$$

where the coefficients Ci are functions of the ai and bi. According to Abel-Ruffini theorem [11], the general equation g(x) = 0 has no algebraic solution (that is, no solution in radicals) when m ≥ 5. Thus, abstract variables pi and ρi cannot be eliminated entirely from g(x) = 0, which implies that there is no specification of U-gate inference P(X1xX2x…xXn) such that diagnostic condition is satisfied.

It is concluded that there is no nonlinear X-D network satisfying diagnostic condition, but a new question is raised: does there exist a general linear X-D network satisfying diagnostic condition? Such linear network is called GL-D network and SIGMA-D network is a specific case of GL-D network. The GL-gate probability must be a linear combination of weights.

$$P(X_1\,x\,X_2\,x \cdots x\,X_n) = C + \sum_{i=1}^{n} \alpha_i w_i + \sum_{i=1}^{n} \beta_i d_i$$

where C is arbitrary constant.

The GL-gate inference is singular if αi and βi are functions of only Xi as follows

$$P(X_1\,x\,X_2\,x \cdots x\,X_n) = C + \sum_{i=1}^{n} h_i(X_i) w_i + \sum_{i=1}^{n} g_i(X_i) d_i$$

The functions hi and gi are not relevant to Ai because the final equation of GL-gate inference is only relevant to the Xi and the weights. Because GL-D network is a pattern, we only survey singular GL-gate. The mentioned GL-gate is singular by default and it depends on how functions hi and gi are defined. The arrangement sum with regard to GL-gate is

$$\begin{aligned}
s(\Omega) &= \sum_{a} \Big(C + \sum_{i=1}^{n} h_i(X_i) w_i + \sum_{i=1}^{n} g_i(X_i) d_i\Big)\\
&= 2^{n}C + 2^{n-1}\sum_{i=1}^{n} \big(h_i(X_i=1) + h_i(X_i=0)\big) w_i + 2^{n-1}\sum_{i=1}^{n} \big(g_i(X_i=1) + g_i(X_i=0)\big) d_i
\end{aligned}$$

Suppose hi and gi are probability mass functions with regard to Xi. For all i, we have

$$\begin{aligned}
&0 \leq h_i(X_i) \leq 1\\
&0 \leq g_i(X_i) \leq 1\\
&h_i(X_i=1) + h_i(X_i=0) = 1\\
&g_i(X_i=1) + g_i(X_i=0) = 1
\end{aligned}$$

The arrangement sum becomes

$$s(\Omega) = 2^{n}C + 2^{n-1}\sum_{i=1}^{n} (w_i + d_i)$$

GL-D network satisfies diagnostic condition if

$$s(\Omega) = 2^{n}C + 2^{n-1}\sum_{i=1}^{n} (w_i + d_i) = 2^{n-1} \Rightarrow 2C + \sum_{i=1}^{n} (w_i + d_i) = 1$$

Suppose the set of Xi is complete.

$$\sum_{i=1}^{n} (w_i + d_i) = 1$$

This implies C = 0. Shortly, Eq. (36) specifies the singular GL-gate inference so that GL-D network satisfies diagnostic condition.

$$P(X_1\,x\,X_2\,x \cdots x\,X_n) = \sum_{i=1}^{n} h_i(X_i) w_i + \sum_{i=1}^{n} g_i(X_i) d_i$$

where hi and gi are probability mass functions and the set of Xi is complete.

$$\sum_{i=1}^{n} W_i = 1 \tag{36}$$

Functions hi(Xi) and gi(Xi) are always linear because $X_i^{m} = X_i$ for all m ≥ 1 when Xi is binary. It is easy to infer that SIGMA-D network is GL-D network with the following definition of functions hi and gi.

$$h_i(X_i) = 1 - g_i(X_i) = X_i, \ \forall i$$

According to Millán and Pérez-de-la-Cruz [4], a hypothesis can have multiple evidences as seen in Figure 7. This is multi-evidence diagnostic relationship opposite to aforementioned multihypothesis diagnostic relationship.

Figure 7.

Diagnostic relationship with multiple evidences (M-E-D network).

Figure 8.

M-HE-D network.

Figure 7 depicts the multi-evidence diagnostic network called M-E-D network in which there are m evidences D1, D2,…, Dm and one hypothesis Y. Note that Y has uniform distribution.

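In the simplest case where all evidences are binary, the joint probability of M-E-D network is

$$P(Y, D_1, D_2, \dots, D_m) = P(Y)\prod_{j=1}^{m} P(D_j \mid Y) = P(Y)\,P(D_1, D_2, \dots, D_m \mid Y)$$

The product $\prod_{j=1}^{m} P(D_j \mid Y)$ is denoted as likelihood function as follows

$$P(D_1, D_2, \dots, D_m \mid Y) = \prod_{j=1}^{m} P(D_j \mid Y)$$

The posterior probability P(Y | D1, D2,…, Dm) given uniform distribution of Y is

$$\begin{aligned}
P(Y \mid D_1, D_2, \dots, D_m) &= \frac{P(Y, D_1, D_2, \dots, D_m)}{P(Y=1, D_1, D_2, \dots, D_m) + P(Y=0, D_1, D_2, \dots, D_m)}\\
&= \frac{1}{\prod_{j=1}^{m} P(D_j \mid Y=1) + \prod_{j=1}^{m} P(D_j \mid Y=0)} * P(D_1, D_2, \dots, D_m \mid Y)
\end{aligned}$$

The possible transformation coefficient is

$$\frac{1}{k} = \prod_{j=1}^{m} P(D_j \mid Y=1) + \prod_{j=1}^{m} P(D_j \mid Y=0)$$

M-E-D network will satisfy diagnostic condition if k = 1 because all hypotheses and evidences are binary, which requires that the equation specified by Eq. (37) has 2m real roots P(Dj|Y) for all m ≥ 2.

$$\prod_{j=1}^{m} P(D_j \mid Y=1) + \prod_{j=1}^{m} P(D_j \mid Y=0) = 1 \tag{37}$$

Eq. (37) has no real root given m = 2, according to the following proof. Suppose Eq. (37) has 4 real roots as follows

$$\begin{aligned}
a_1 &= P(D_1=1 \mid Y=1)\\
a_2 &= P(D_2=1 \mid Y=1)\\
b_1 &= P(D_1=1 \mid Y=0)\\
b_2 &= P(D_2=1 \mid Y=0)
\end{aligned}$$

From Eq. (37), it holds

$$\begin{cases} a_1 a_2 + b_1 b_2 = 1\\ a_1(1 - a_2) + b_1 b_2 = 1\\ (1 - a_1)a_2 + b_1 b_2 = 1\\ a_1 a_2 + b_1(1 - b_2) = 1\\ a_1 a_2 + (1 - b_1)b_2 = 1 \end{cases} \Rightarrow \begin{cases} a_1 = a_2\\ b_1 = b_2\\ a_1^2 + b_1^2 = 1\\ a_1 + 2b_1^2 = 2\\ b_1 + 2a_1^2 = 2 \end{cases} \Rightarrow \begin{cases} a_1 = a_2 = 0\\ b_1 = b_2\\ a_1^2 + b_1^2 = 1\\ b_1 = 2 \end{cases} \text{ or } \begin{cases} a_1 = a_2 = 0.5\\ b_1 = b_2\\ a_1^2 + b_1^2 = 1\\ b_1 = 1.5 \end{cases}$$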

The final system leads to a contradiction (b1 = 2 or b1 = 1.5, neither of which is a valid probability) and so it is impossible to apply the sufficient diagnostic proposition to M-E-D network. Such proposition is only used for one-evidence network. Moreover, X-gate inference absorbs many sources and produces one targeted result, whereas M-E-D network essentially splits one source into many results. It is therefore impossible to model M-E-D network by X-gates. A potential solution is to group the evidences D1, D2,…, Dm into one representative evidence D, which in turn is dependent on hypothesis Y. However, this solution is inaccurate in specifying conditional probabilities because the directions of dependencies become inconsistent (relationships from Dj to D and from Y to D), unless all Dj are removed and D becomes a vector. Even then, the evidence vector does not simplify the problem; it only changes the current problem into a new one.

Another solution is to reverse the direction of relationship, in which the hypothesis is dependent on evidences so as to take advantages of X-gate inference as usual. However, the reversion method violates the viewpoint in this research where diagnostic relationship must be from hypothesis to evidence. In other words, we should change the viewpoint.

Another solution is based on a so-called partial diagnostic condition that is a loose case of diagnostic condition for M-E-D network, which is defined as follows

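$$P(Y \mid D_j) = k\,P(D_j \mid Y)$$

where k is constant with regard to Dj. The joint probability is

$$P(Y, D_1, D_2, \dots, D_m) = P(Y)\prod_{j=1}^{m} P(D_j \mid Y)$$

M-E-D network satisfies partial diagnostic condition. In fact, given all variables are binary, we have

$$P(Y \mid D_j) = \frac{\sum_{\Psi \setminus \{Y, D_j\}} P(Y, D_1, D_2, \dots, D_m)}{\sum_{\Psi \setminus \{D_j\}} P(Y, D_1, D_2, \dots, D_m)}$$

(Let Ψ = {Y, D1, D2,…, Dm})

$$= \frac{P(D_j \mid Y)\prod_{k=1, k \neq j}^{m} \Big(\sum_{D_k} P(D_k \mid Y)\Big)}{\prod_{k=1, k \neq j}^{m} \Big(\sum_{D_k} P(D_k \mid Y=1)\Big) + \prod_{k=1, k \neq j}^{m} \Big(\sum_{D_k} P(D_k \mid Y=0)\Big)}$$

(Due to uniform distribution of Y)

$$= \frac{P(D_j \mid Y)\prod_{k=1, k \neq j}^{m} 1}{\prod_{k=1, k \neq j}^{m} 1 + \prod_{k=1, k \neq j}^{m} 1} = \frac{1}{2}P(D_j \mid Y)$$

(Due to $\sum_{D_k} P(D_k \mid Y) = P(D_k=0 \mid Y) + P(D_k=1 \mid Y) = 1$)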

Partial diagnostic condition expresses a different viewpoint. It is not an optimal solution because, for example, we cannot test for a disease based on only one symptom while ignoring other obvious symptoms. The equality P(Y|Dj) = 0.5P(Dj|Y) indicates that the accuracy is halved. However, Bayesian network provides an inference mechanism based on personal belief, which is subjective. You can use partial diagnostic condition if you think that such condition is appropriate to your application.

If we are successful in specifying conditional probabilities of M-E-D network, it is possible to define an extended network which is constituted of n hypotheses X1, X2,…, Xn and m evidences D1, D2,…, Dm. Such extended network represents multi-hypothesis multi-evidence diagnostic relationship, called M-HE-D network. Figure 8 depicts M-HE-D network.

The M-HE-D network is the most general case of diagnostic network, which was mentioned in Ref. ([4], p. 297). We can construct any large diagnostic BN from M-HE-D networks and so the research is still open.

5. Conclusion

In short, relationship conversion determines conditional probabilities based on logic gates that adhere to the semantics of relationships. The weak point of logic gates is that they require all variables to be binary. For example, in a learning context, it is inconvenient for an expert to create an assessment BN with exercises (evidences) whose marks are only 0 and 1. In order to lessen the impact of this weak point, numeric evidence is used to extend the capacity of the simple Bayesian network. However, the combination of binary hypothesis and numeric evidence can lead to errors or biases in inference. For example, a student may get the maximum grade for an exercise while the built-in inference concludes that she/he has not fully mastered the associated learning concept (hypothesis). Therefore, I propose the sufficient diagnostic proposition so as to confirm that numeric evidence is adequate for complicated inference tasks in BN. When the diagnostic condition holds, the posterior probability of a hypothesis is proportional to the likelihood of the evidence, so reasoning based on evidence remains consistent. Application of the research can go beyond the learning context whenever probabilistic deduction relevant to constraints of semantic relationships is required. A large BN can be constituted of many simple BNs. Inference in a large BN is a hard problem, and many algorithms have been proposed for solving it. In future, I will research effective inference methods for the special BN that is constituted of the X-gate BNs mentioned in this research, because X-gate BNs have precise and useful features of which we should take advantage. For instance, their CPTs are simple in some cases and the meanings of their relationships are mandatory in many applications. Moreover, I will try my best to research deeply M-E-D network and M-HE-D network, whose problems I cannot yet solve completely.

The two main documents that I referred to in this research are the book “Learning Bayesian Networks” [2] by Richard E. Neapolitan and the article “A Bayesian Diagnostic Algorithm for Student Modeling and its Evaluation” [4] by Eva Millán and José Luis Pérez-de-la-Cruz. In particular, the SIGMA-gate inference is based on and derived from the work of Eva Millán and José Luis Pérez-de-la-Cruz. This research originated from my PhD research “A User Modeling System for Adaptive Learning” [12]. Other references relevant to user modeling, overlay model, and Bayesian network are [13–16]. Please refer to these references.

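A1. Following is the proof of Eq. (9)

$$\begin{aligned}
P(A_i=\text{ON} \mid X_i) &= P(A_i=\text{ON} \mid X_i, I_i=\text{ON})P(I_i=\text{ON}) + P(A_i=\text{ON} \mid X_i, I_i=\text{OFF})P(I_i=\text{OFF})\\
&= 0 * (1 - p_i) + P(A_i=\text{ON} \mid X_i, I_i=\text{OFF})\,p_i && \text{(by applying Eq. (8))}\\
&= p_i\,P(A_i=\text{ON} \mid X_i, I_i=\text{OFF})
\end{aligned}$$

It implies

$$\begin{aligned}
P(A_i=\text{ON} \mid X_i=1) &= p_i\,P(A_i=\text{ON} \mid X_i=1, I_i=\text{OFF}) = p_i\\
P(A_i=\text{ON} \mid X_i=0) &= p_i\,P(A_i=\text{ON} \mid X_i=0, I_i=\text{OFF}) = 0\\
P(A_i=\text{OFF} \mid X_i=1) &= 1 - P(A_i=\text{ON} \mid X_i=1) = 1 - p_i\\
P(A_i=\text{OFF} \mid X_i=0) &= 1 - P(A_i=\text{ON} \mid X_i=0) = 1
\end{aligned}$$

■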

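A2. Following is the proof of Eq. (10)

$$\begin{aligned}
P(Y \mid X_1, \dots, X_n) &= \frac{P(Y, X_1, \dots, X_n)}{P(X_1, \dots, X_n)} && \text{(due to Bayes’ rule)}\\
&= \frac{\sum_{A_1, \dots, A_n} P(Y, X_1, \dots, X_n \mid A_1, \dots, A_n) * P(A_1, \dots, A_n)}{P(X_1, \dots, X_n)} && \text{(due to total probability rule)}\\
&= \sum_{A_1, \dots, A_n} P(Y \mid A_1, \dots, A_n) * \frac{P(X_1, \dots, X_n \mid A_1, \dots, A_n) * P(A_1, \dots, A_n)}{P(X_1, \dots, X_n)}
\end{aligned}$$

(Because Y is conditionally independent of the Xi given the Ai)

$$\begin{aligned}
&= \sum_{A_1, \dots, A_n} P(Y \mid A_1, \dots, A_n) * \frac{P(X_1, \dots, X_n, A_1, \dots, A_n)}{P(X_1, \dots, X_n)}\\
&= \sum_{A_1, \dots, A_n} P(Y \mid A_1, \dots, A_n) * P(A_1, \dots, A_n \mid X_1, \dots, X_n) && \text{(due to Bayes’ rule)}\\
&= \sum_{A_1, \dots, A_n} P(Y \mid A_1, \dots, A_n) \prod_{i=1}^{n} P(A_i \mid X_1, \dots, X_n)
\end{aligned}$$

(Because the Ai are mutually independent)

$$= \sum_{A_1, \dots, A_n} P(Y \mid A_1, \dots, A_n) \prod_{i=1}^{n} P(A_i \mid X_i)$$

(Because each Ai is only dependent on Xi) ■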

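A3. Following is the proof that the augmented X-D network (shown in Figure 5) is equivalent to X-D network (shown in Figures 2 and 3) with regard to variables X1, X2,…, Xn, and D.

The joint probability of augmented X-D network shown in Figure 5 is

$$P(X_1, \dots, X_n, Y, D) = P(D \mid Y)\,P(Y \mid X_1, \dots, X_n)\prod_{i=1}^{n} P(X_i)$$

The joint probability of X-D network is

$$P(X_1, \dots, X_n, D) = P(D \mid X_1, \dots, X_n)\prod_{i=1}^{n} P(X_i)$$

By applying total probability rule into X-D network, we have

$$\begin{aligned}
P(X_1, \dots, X_n, D) &= \frac{P(D, X_1, \dots, X_n)}{P(X_1, \dots, X_n)}\prod_{i=1}^{n} P(X_i) && \text{(due to Bayes’ rule)}\\
&= \frac{\sum_{Y} P(D, X_1, \dots, X_n \mid Y)P(Y)}{P(X_1, \dots, X_n)}\prod_{i=1}^{n} P(X_i) && \text{(due to total probability rule)}\\
&= \left(\sum_{Y} \frac{P(D, X_1, \dots, X_n \mid Y) * P(Y)}{P(X_1, \dots, X_n)}\right) * \prod_{i=1}^{n} P(X_i)\\
&= \left(\sum_{Y} P(D \mid Y) * \frac{P(X_1, \dots, X_n \mid Y)P(Y)}{P(X_1, \dots, X_n)}\right) * \prod_{i=1}^{n} P(X_i)
\end{aligned}$$

(Because D is conditionally independent of all Xi given Y)

$$\begin{aligned}
&= \left(\sum_{Y} P(D \mid Y) * \frac{P(Y, X_1, \dots, X_n)}{P(X_1, \dots, X_n)}\right) * \prod_{i=1}^{n} P(X_i)\\
&= \sum_{Y} P(D \mid Y)\,P(Y \mid X_1, \dots, X_n)\prod_{i=1}^{n} P(X_i) && \text{(due to Bayes’ rule)}\\
&= \sum_{Y} P(X_1, \dots, X_n, Y, D)
\end{aligned}$$

■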

A4. Following is the proof of Eq. (29)

Given uniform distribution of the Xi, we have

$$P(X_1) = P(X_2) = \cdots = P(X_n) = \frac{1}{2}$$

The joint probability becomes

$$P(\Omega, Y, D) = \frac{1}{2^{n}}\,P(Y \mid X_1, X_2, \dots, X_n)\,P(D \mid Y)$$

The joint probability of Xi and D is

$$\begin{aligned}
P(X_i, D) &= \sum_{\{\Omega, Y, D\} \setminus \{X_i, D\}} P(\Omega, Y, D)\\
&= P(X_1=1, \dots, X_i, \dots, X_n=1, Y=1, D) + \cdots + P(X_1=0, \dots, X_i, \dots, X_n=0, Y=1, D)\\
&\quad + P(X_1=1, \dots, X_i, \dots, X_n=1, Y=0, D) + \cdots + P(X_1=0, \dots, X_i, \dots, X_n=0, Y=0, D)\\
&= \frac{1}{2^{n}}\frac{D}{S}\Big(P(Y=1 \mid X_1=1, \dots, X_i, \dots, X_n=1) + \cdots + P(Y=1 \mid X_1=0, \dots, X_i, \dots, X_n=0)\Big)\\
&\quad + \frac{1}{2^{n}}\frac{M-D}{S}\Big(P(Y=0 \mid X_1=1, \dots, X_i, \dots, X_n=1) + \cdots + P(Y=0 \mid X_1=0, \dots, X_i, \dots, X_n=0)\Big)
\end{aligned}$$

(Due to Eq. (6))

The marginal probability of D is

$$\begin{aligned}
P(D) &= \sum_{\{\Omega, Y, D\} \setminus \{D\}} P(\Omega, Y, D)\\
&= P(X_1=1, \dots, X_n=1, Y=1, D) + \cdots + P(X_1=0, \dots, X_n=0, Y=1, D)\\
&\quad + P(X_1=1, \dots, X_n=1, Y=0, D) + \cdots + P(X_1=0, \dots, X_n=0, Y=0, D)\\
&= \frac{1}{2^{n}}\frac{D}{S}\Big(P(Y=1 \mid X_1=1, \dots, X_n=1) + \cdots + P(Y=1 \mid X_1=0, \dots, X_n=0)\Big)\\
&\quad + \frac{1}{2^{n}}\frac{M-D}{S}\Big(P(Y=0 \mid X_1=1, \dots, X_n=1) + \cdots + P(Y=0 \mid X_1=0, \dots, X_n=0)\Big)
\end{aligned}$$

By applying Table 2, the joint probability P(Xi, D) is determined as follows

$$\begin{aligned}
P(X_i, D) &= \frac{1}{2^{n}S}\Big(D\sum_{a} P\big(Y=1 \mid a(\Omega:\{X_i\})\big) + (M-D)\sum_{a} P\big(Y=0 \mid a(\Omega:\{X_i\})\big)\Big)\\
&= \frac{1}{2^{n}S}\Big(D\sum_{a} P\big(Y=1 \mid a(\Omega:\{X_i\})\big) + (M-D)\sum_{a} \Big(1 - P\big(Y=1 \mid a(\Omega:\{X_i\})\big)\Big)\Big)\\
&= \frac{1}{2^{n}S}\Big((2D-M)\,s(\Omega:\{X_i\}) + 2^{n-1}(M-D)\Big)
\end{aligned}$$

Similarly, the marginal probability P(D) is

$$P(D) = \frac{1}{2^{n}S}\Big((2D-M)\,s(\Omega) + 2^{n}(M-D)\Big)$$

■

How to cite and reference


Loc Nguyen (November 2nd 2017). Converting Graphic Relationships into Conditional Probabilities in Bayesian Network, Bayesian Inference, Javier Prieto Tejedor, IntechOpen, DOI: 10.5772/intechopen.70057. Available from:
