Sufficient diagnostic proposition.

## Abstract

Bayesian network (BN) is a powerful mathematical tool for prediction and diagnosis applications. A large Bayesian network can constitute many simple networks, which in turn are constructed from simple graphs. A simple graph consists of one child node and many parent nodes. The strength of each relationship between a child node and a parent node is quantified by a weight and all relationships share the same semantics such as prerequisite, diagnostic, and aggregation. The research focuses on converting graphic relationships into conditional probabilities in order to construct a simple Bayesian network from a graph. Diagnostic relationship is the main research object, in which sufficient diagnostic proposition is proposed for validating diagnostic relationship. Relationship conversion is adhered to logic gates such as AND, OR, and XOR, which are essential features of the research.

### Keywords

- diagnostic relationship
- Bayesian network
- transformation coefficient

## 1. Introduction

Bayesian network (BN) is a directed acyclic graph (DAG) consists of a set of nodes and a set of arcs. Each node is a random variable. Each arc represents a relationship between two nodes. The strength of a relationship in a graph can be quantified by a number called *weight*. There are some important relationships such as prerequisite, diagnostic, and aggregation. The difference between BN and normal graph is that the strength of every relationship in BN is represented by a conditional probability table (CPT) whose entries are conditional probabilities of a child node given parent nodes. There are two main approaches to construct a BN, which are as follows

The first approach aims to learn BN from training data by learning machine algorithms.

The second approach is that experts define some graph patterns according to specific relationships and then, BN is constructed based on such patterns along with determined CPTs.

This research focuses on the second approach in which relationships are converted into CPTs. Essentially, relationship conversion aims to determine conditional probabilities based on weights and meanings of relationships. We will have different ways to convert graphic weights into CPTs for different relationships. It is impossible to convert all relationships but some of them such as diagnostic, aggregation, and prerequisite are mandatory ones that we must specify as computable CPTs of BN. Especially, these relationships are adhered to logic X-gates [1] such as AND-gate, OR-gate, and SIGMA-gate. The X-gate inference in this research is derived and inspired from noisy OR-gate described in the book “Learning Bayesian Networks” Neapolitan ([2], pp. 157–159). Díez and Druzdzel [3] also researched OR/MAX, AND/MIN, and noisy XOR inferences but they focused on canonical models, deterministic models, and ICI models whereas I focused on logic gate and graphic relationships. So, their research is different from mine but we share the same result that is AND-gate model. In general, my research focuses on applied probability adhered to Bayesian network, logic gates, and Bayesian user modeling [4]. The scientific results are shared with Millán and Pérez-de-la-Cruz [4].

Factor graph [5] represents factorization of a global function into many partial functions. If joint distribution of BN is considered as the global function and CPTs are considered as partial functions, the sumproduct algorithm [6] of factor graph is applied into calculating posterior probabilities of variables in BN. Pearl’s propagation algorithm [7] is very successful in BN inference. The application of factor graph into BN is only realized if all CPT (s) of BN are already determined whereas this research focuses on defining such CPTs firstly. I did not use factor graph for constructing BN. The concept “X-gate inference” only implies how to convert simple graph into BN. However, the arrange sum with a fixed variable mentioned in this research is the “*not-sum*” ([6], p. 499) of factor graph. Essentially, X-gate probability shown in Eq. (10) is as same as *λ* message in the Pearl’s algorithm ([6], p. 518) but I use the most basic way to prove the X-gate probability.

As default, the research is applied in learning context in which BN is used to assess students’ knowledge. Evidences are tests, exams, exercises, etc. and hypotheses are learning concepts, knowledge items, etc. Note that diagnostic relationship is very important to Bayesian evaluation in learning context because it is used to evaluate student’s mastery of concepts (knowledge items) over entire BN. Now, we start relationship conversion with a research on diagnostic relationship in the next section.

## 2. Diagnostic relationship

In some opinions like mine, the diagnostic relationship should be from hypothesis to evidence. For example, disease is hypothesis and symptom is evidence. The symptom must be conditionally dependent on disease. Given a symptom, calculating the posterior probability of disease is essentially to diagnose likelihood of such disease ([8], p. 1666). Inversely, the arc from evidence to hypothesis implies prediction where evidence and hypothesis represent observation and event, respectively. Given an observation, calculating the posterior probability of the event is essentially to predict/assert such event ([8], p. 1666). Figure 1 shows diagnosis and prediction.

The weight *w* of the relationship between *X* and *D* is 1. Figure 1 depicts simplest graph with two random variables. We need to convert diagnostic relationship into conditional probabilities in order to construct a simplest BN from the simplest graph. Note that hypothesis is binary but evidence can be numerical. In learning context, evidence *D* can be test, exam, exercise, etc. The conditional probability of *D* given *X* (likelihood function) is *P*(*D*|*X*). The posterior probability of *X* is *P*(*X*|*D*), which is used to evaluate student’s mastery over concept (hypothesis) *X* given evidence *D*. Eq. (1) specifies CPT of *D* when *D* is binary (0 and 1)

Eq (1) is our first relationship conversion. It implies

Evidence *D* can be used to diagnose hypothesis *X* if the so-called *sufficient diagnostic proposition* is satisfied, as seen in Table 1.

D is equivalent to X in diagnostic relationship if P(X|D) = kP(D|X) given uniform distribution of X and the transformation coefficient k is independent from D. In other words, k is constant with regards to D and so D is called sufficient evidence. |

The concept of sufficient evidence is borrowed from the concept of sufficient statistics and it is inspired from equivalence of variables *T* and *T’* in the research ([4], pp. 292-295). The proposition can be restated that evidence *D* is only used to assess hypotheses if it is sufficient evidence. As a convention, the proposition is called *diagnostic condition* and hypotheses have uniform distribution. The assumption of hypothetic uniform distribution (*P*(*X* = 1) = *P*(*X* = 0)) implies that we cannot assert whether or not given hypothesis is true before we observe its evidence.

In learning context, *D* can be totally used to assess student’s mastery of *X* if diagnostic condition is satisfied. Derived from such condition, Eq. (2) specifies transformation coefficient *k* given uniform distribution of *X*.

We need to prove that Eq. (1) satisfies diagnostic condition. Suppose the prior probability of *X* is uniform.

we have

It is easy to infer that the transformation coefficient *k* is 1, if *D* is binary. In practice, evidence *D* is often a test whose grade ranges within an interval {0, 1, 2,…, *η*}. Eq. (3) specifies CPT of *D* in this case

Where

As a convention, *X* = 1), the probability that she/he completes the exercise/test *D* is proportional to her/his mark on *D*

We need to prove that Eq. (3) satisfies diagnostic condition. Suppose the prior probability of *X* is uniform.

The assumption of prior uniform distribution of *X* implies that we do not determine if student has mastered *X* yet. Similarly, we have

So, the transformation coefficient *k* is *D* ranges in {0, 1, 2,…, *η*}.

In the most general case, discrete evidence *D* ranges within an arbitrary integer interval *D* is bounded integer variable whose lower bound and upper bound are *a* and *b*, respectively. Eq. (4) specifies CPT of *D*, where

Where

Note,

Similarly, we have

If evidence *D* is continuous in the real interval [*a, b*] with note that *a* and *b* are real numbers, Eq. (5) specifies probability density function (PDF) of continuous evidence

where

As a convention, [*a, b*] is called domain of continuous evidence, which can be replaced by open or half-open intervals such as (*a, b*), (*a, b*], and [*a, b*). Of course we have *D* is often a test whose grade ranges within real interval [*a, b*].

Functions *p*(*D*|*X* = 1) and *p*(*D*|*X* = 0) are valid PDFs due to

According to the diagnostic condition, we need to prove the equality

where,

When *D* is continuous, its probability is calculated in *ε*-vicinity where *ε* is very small number. As usual, *ε* is bias if *D* is measure values produced from equipment. The probability of *D* given *X*, where *D* + *ε**D* – *ε*

In fact, we have

In general, Eq. (6) summarizes CPT of evidence of single diagnostic relationship.

Where,

In general, if the conditional probability *P*(*D|X*) is specified by Eq. (6), the diagnostic condition will be satisfied. Note that the CPT *P*(*D|X*) is the PDF *p*(*D|X*) in case of continuous evidence. The diagnostic relationship will be extended with more than one hypothesis. The next section will mention how to determine CPTs of a simple graph with one child node and many parent nodes based on X-gate inferences.

## 3. X-gate inferences

Given a simple graph consisting of one child variable *Y* and *n* parent variables *X*_{i}, as shown in Figure 2, each relationship from *X*_{i} to *Y* is quantified by normalized weight *w*_{i} where 0 ≤ *w*_{i} ≤ 1. A large graph is an integration of many simple graphs. Figure 2 shows the DAG of a simple BN. As aforementioned, the essence of constructing simple BN is to convert graphic relationships of simple graph into CPTs of simple BN.

Child variable *Y* is called target and parent variables *X*_{i}s are called sources. Especially, these relationships are adhered to X-gates such as AND-gate, OR-gate, and SIGMA-gate. These gates are originated from logic gate [1]. For instance, AND-gate and OR-gate represent prerequisite relationship. SIGMA-gate represents aggregation relationship. Therefore, relationship conversion is to determined X-gate inference. The simple graph shown in Figure 2 is also called X-gate graph or X-gate network. Please distinguish the letter “X” in the term “X-gate inference” which implies logic operators (AND, OR, XOR, etc.) from the “variable *X*”.

All variables are binary and they represent events. The probability *P*(*X*) indicates event *X* occurs. Thus, *P*(*X*) implicates *P*(*X* = 1) and *P*(not(*X*)) implicates *P*(*X* = 0). Eq. (7) specifies the simple NOT-gate inference.

X-gate inference is based on three assumptions mentioned in Ref. ([2], p. 157), which are as follows

*X-gate inhibition*: Given a relationship from source*X*_{i}to target*Y*, there is a factor*I*_{i}that inhibits*X*_{i}from being integrated into*Y*. Factor*I*_{i}is called inhibition of*X*_{i}. That the inhibition*I*_{i}is turned off is prerequisite of*X*_{i}integrated into*Y*.*Inhibition independence*: Inhibitions are mutually independent. For example, inhibition*I*_{1}of*X*_{1}is independent from inhibition*I*_{2}of*X*_{2}.*Accountability*: X-gate network is established by accountable variables*A*_{i}for*X*_{i}and*I*_{i}. Each X-gate inference owns particular combination of*A*_{i}s.

Figure 3 shows the extended X-gate network with accountable variables *A*_{i}s ([2], p. 158).

The strength of each relationship from source *X*_{i} to target *Y* is quantified by a weight 0 ≤ *w*_{i} ≤ 1. According to the assumption of inhibition, probability of *I*_{i} = *OFF* is *p*_{i}, which is set to be the weight *w*_{i}.

If notation *w*_{i} is used, we focus on the strength of relationship. If notation *p*_{i} is used, we focus on probability of *OFF* inhibition. In probabilistic inference, *p*_{i} is also prior probability of *X*_{i} = 1. However, we will assume each *X*_{i} has uniform distribution later on. Eq. (8) specifies probabilities of inhibitions *I*_{i}s and accountable variables *A*_{i}s.

According to Eq. (8), given probability *P*(*A*_{i}=*ON* | *X*_{i}=1, *I*_{i}=*OFF*), it is assured 100% confident that accountable variables *A*_{i} is turned on if source *X*_{i} is 1 and inhibition *I*_{i} is turned off. Eq. (9) specifies conditional probability of accountable variables *A*_{i} (s) given *X*_{i} (s), which is corollary of Eq. (8).

Appendix A1 is the proof of Eq. (9). As a definition, the set of all *X*_{i}s is complete if and only if

The set of all *X*_{i}s is mutually exclusive if and only if

For each *X*_{i,} there is only one *A*_{i} and vice versa, which establishes a bijection between *X*_{i}s and *A*_{i}s. Obviously, the fact that the set of all *X*_{i}s is complete is equivalent to the fact that the set of all *A*_{i} (s) is complete. We will prove by contradiction that “the fact that the set of all *X*_{i} (s) is mutually exclusive is equivalent to the fact that the set of all *A*_{i} (s) is mutually exclusive.” Suppose *B*. Due to

By similar proof, we have

The extended X-gate network shown in Figure 3 is interpretation of simple network shown in Figure 2. Specifying CPT of the simple network is to determine the conditional probability *P*(*Y* = 1 | *X*_{1}, *X*_{2},…, *X*_{n}) based on extended X-gate network. The X-gate inference is represented by such probability *P*(*Y* = 1 | *X*_{1}, *X*_{2},…, *X*_{n}) specified by Eq. (10) ([2], p. 159).

Appendix A2 is the proof of Eq. (10). It is necessary to make some mathematical notations because Eq. (10) is complicated, which is relevant to arrangements of *X*_{i} (s). Given the set Ω = {*X*_{1}, *X*_{2},…, *X*_{n}} where all variables are binary, Table 2 specifies binary arrangements of Ω.

Given Ω = {X_{1}, X_{2},…, X_{n}} where |Ω| = n is cardinality of Ω.Let a(Ω) be an arrangement of Ω which is a set of n instances {X_{1}=x_{1}, X_{2}=x_{2},…, X_{n}=x_{n}} where x_{i} is 1 or 0. The number of all a(Ω) is 2^{|Ω|}. For instance, given Ω = {X_{1}, X_{2}}, there are 2^{2}=4 arrangements as follows:Let a(Ω:{X_{i}}) be the arrangement of Ω with fixed X_{i}. The number of all a(Ω:{X_{i}}) is 2^{|Ω|−1}. Similarly, for instance, a(Ω:{X_{1}, X_{2}, X_{3}}) is an arrangement of Ω with fixed X_{1}, X_{2}, X_{3}. The number of all a(Ω:{X_{1}, X_{2}, X_{3}}) is 2^{|Ω|−3}.Let c(Ω) and c(Ω:{X_{i}}) be the number of arrangements a(Ω) and a(Ω:{X_{i}}), respectively. Such c(Ω) and c(Ω:{X_{i}}) are called arrangement counters. As usual, counters c(Ω) and c(Ω:{X_{i}}) are equal to 2^{|Ω|} and 2^{|Ω|−1}, respectively but they will vary according to specific cases.Let F acting on every a(Ω). The number of arrangements on which F acts is c(Ω).Let x denote the X-gate operator, for instance, x = ⊙ for AND-gate, x = ⊕ for OR-gate, x = not ⊙ for NAND-gate, x = not ⊕ for NOR-gate, x = ⊗ for XOR-gate, x = not ⊗ for XNOR-gate, x = ⊎ for U-gate, s(Ω:{X_{i}}) and s(Ω) be sum of all X_{i}, respectively.For example, s(Ω) and s(Ω:{X_{i}}) for OR-gate are:Such s(Ω) and s(Ω:{X_{i}}) are called arrangement sum. They are acting function F.Note that Ω can be any set of binary variables. |

It is not easy to produce all binary arrangements of Ω. Table 3 shows a code snippet written by Java programming language for producing such all arrangements.

public class ArrangementGenerator { private ArrayList<int[]> arrangements; private int n; private int r; private ArrangementGenerator(int n, int r) { this.n = n; this.r = r; this.arrangements = new ArrayList(); } private void create(int[] a, int i) { for(int j = 0; j < n; j++) { a[i] = j; if(i < r - 1) create(a, i + 1); else if(i == r -1) { int[] b = new int[a.length]; for(int k = 0; k < a.length; k++) b[k] = a[k]; arrangements.add(b); } } } public int[] get(int i) { return arrangements.get(i); } public long size() { return arrangements.size(); } public static ArrangementGenerator parse(int n, int r) { ArrangementGenerator arr = new ArrangementGenerator(n, r); int[] a = new int[r]; for(int i=0; i<r; i++) a[i] = -1; arr.create(a, 0); return arr; } } |

Each element of the list “*arrangements*” is a binary arrangement *a*(Ω) presented by an array of bits (0 and 1). The method “*create*(*int*[] *a, int i*)” which is recursive method, is the main one that generates arrangements. The method call “*ArrangementGenerator.parse*(2, *n*)” will list all possible binary arrangements.

Eq. (11) specifies the connection between *s*(Ω:{*X*_{i} = 1}) and *s*(Ω:{*X*_{i} = 0}), between *c*(Ω:{*X*_{i} = 1}) and *c*(Ω:{*X*_{i} = 0}).

It is easy to draw Eq. (11) when the set of all arrangements *a*(Ω:{*X*_{i} = 1) is complement of the set of all arrangements *a*(Ω:{*X*_{i} = 0).

Let *K* be a set of *X*_{i}s whose values are 1 and let *L* be a set of *X*_{i}s whose values are 0. *K* and *L* are mutually complementary. Eq. (12) determines sets *K* and *L*.

The AND-gate inference represents prerequisite relationship satisfying AND-gate condition specified by Eq. (13).

From Eq. (10), we have

(Due to Eq. (9))

In general, Eq. (14) specifies AND-gate inference.

The AND-gate inference was also described in ([3], p. 33). Eq. (14) varies according to two cases whose arrangement counters are listed as follows

The **OR-gate** inference represents prerequisite relationship satisfying OR-gate condition specified by Eq. (15) ([2], p. 157).

The OR-gate condition implies

From Eq. (10), we have ([2], p. 159)

(Due to Eq. (9))

In general, Eq. (16) specifies OR-gate inference.

where *K* is the set of *X*_{i}s whose values are 1. The OR-gate inference was mentioned in Refs. ([2], p. 158) and ([3], p. 20). Eq. (16) varies according to two cases whose arrangement counters are listed as follows

According to De Morgan’s rule with regard to AND-gate and OR-gate, we have

(Due to Eq. (16))

According to Eq. (14), we also have

In general, Eq. (17) specifies NAND-gate inference and NOR-gate inference derived from AND-gate and OR-gate

where *K* and *L* are the sets of *X*_{i}s whose values are 1 and 0, respectively.

Suppose the number of sources *X*_{i}s is even. Let *O* be the set of *X*_{i}s whose indices are odd. Let *O*_{1} and *O*_{2} be subsets of *O*, in which all *X*_{i}s are 1 and 0, respectively. Let *E* be the set of *X*_{i}s whose indices are even. Let *E*_{1} and *E*_{2} be the subsets of *E*, in which all *X*_{i}s are 1 and 0, respectively.

Thus, *O*_{1} and *E*_{1} are the subsets of *K*. Sources *X*_{i}s and target *Y* follow **XOR-gate** if one of two XOR-gate conditions specified by Eq. (18) is satisfied.

From Eq. (10), we have

If both XOR-gate conditions are not satisfied then,

If the first XOR-gate condition is satisfied, we have

We have

(Due to Eq. (9))

We also have

(Due to Eq. (9))

Given the first XOR-gate condition, it implies

Similarly, given the second XOR-gate condition, we have

If one of XOR-gate conditions is satisfied then,

This implies Eq. (19) to specify XOR-gate inference.

Where,

Given *n* ≥ 2 and *n* is even, Eq. (19) varies according to six cases whose arrangement counters are listed as follows

Suppose the number of sources *X*_{i}s is even. According to **XNOR-gate** inference [1], the output is on if all inputs get the same value 1 (or 0). Sources *X*_{i} (s) and target *Y* follow XNOR-gate if one of two XNOR-gate conditions specified by Eq. (20) is satisfied.

From Eq. (10), we have

If both XNOR-gate conditions are not satisfied then,

If *A*_{i} = *ON* for all *i*, we have

(Please see similar proof in AND-gate inference)

If *A*_{i} = *OFF* for all *i*, we have

(Please see similar proof in OR-gate inference)

If one of XNOR-gate conditions is satisfied then,

This implies Eq. (21) to specify XNOR-gate inference.

where *K* and *L* are the sets of *X*_{i}s whose values are 1 and 0, respectively. Eq. (21) varies according to three cases whose arrangement counters are listed as follows

Let *U* be a set of indices such that *A*_{i} = *ON* and let *α* ≥ 0 and *β* ≥ 0 be predefined numbers. The U-gate inference is defined based on *α, β* and cardinality of *U*. Table 4 specifies four common U-gate conditions.

|U|=α α variables A_{i} = ON (s). Otherwise, |

|U|≥α α variables A_{i} = ON (s). Otherwise, |

|U|≤β β variables A_{i} = ON (s). Otherwise, |

α≤|U|≤β A_{i} = ON (s) is from α to β. Otherwise, |

Note that U-gate condition on |*U*| can be arbitrary and it is only relevant to *A*_{i}s (*ON* or *OFF*) and the way to combine *A*_{i}s. For example, AND-gate and OR-gate are specific cases of U-gate with |*U*| = *n* and |*U*| ≥ 1, respectively. XOR-gate and XNOR-gate are also specific cases of U-gate with specific conditions on *A*_{i} (s). However, it must be assured that there is at least one combination of *A*_{i}s satisfying the predefined U-gate condition, which causes that U-gate probability is not always equal to 0. In this research, U-gate is the most general nonlinear gate where U-gate probability contains products of weights (see Table 5). Later on, we will research a so-called SIGMA-gate that contains only linear combination of weights (sum of weights, see Eq. (23)). Shortly, each X-gate is a pattern owning a particular X-gate inference that is X-gate probability *P*(*X*_{1} × *X*_{2} ×…× *X*_{n}). Each X-gate inference is based on particular X-gate condition(s) relevant to only variables *A*_{i}s.

Let, | |

As a convention, | |

|U|=0 | |

|U|≥0 | The case | U|≥0 is the same to the case |U|≤n |

|U|=n | |

|U|=α0< α<n | |

|U|≥α0< α<n | |

|U|≤β0< β<n | |

α≤|U|≤β0< α<n0< β<n |

From Eq. (10), we have

Let *U* (s), we have

If

This implies all sets *U* (s) must be subsets of *K*. The U-gate probability is rewritten as follows

(Due to Eq. (9))

Let *P*_{U} be the U-gate probability; Table 5 specifies U-gate inference and cardinality of *U*) of *K*.

Note that the notation *j* elements taken from *n* elements.

Arrangement counters relevant to U-gate inference and the set *K* are listed as follows

The **SIGMA-gate** inference [9] represents aggregation relationship satisfying SIGMA-gate condition specified by Eq. (22).

where the set of *Ai* is complete and mutually exclusive

The sigma sum *Y* is exclusive union of *A*_{i}s and here, it does not express arithmetical additions.

This implies

The sigma sum *P*(*A*_{i}).

SIGMA-gate inference requires the set of *A*_{i}s is complete and mutually exclusive, which means that the set of *X*_{i}s is complete and mutually exclusive too. The SIGMA-gate probability is [9]

It implies

(Due to Eq. (9))

In general, Eq. (23) specifies the theorem of SIGMA-gate inference [9]. The base of this theorem was mentioned by Millán and Pérez-de-la-Cruz ([4], pp. 292-295).

where the set of *X*_{i}s is complete and mutually exclusive.

The arrangement counters of SIGMA-gate inference are *c*(Ω:{*X*_{i} = 1}) = *c*(Ω:{*X*_{i} = 0}) = 2^{n−1}, *c*(Ω) = 2^{n}.

Eq. (9) specifies the “clockwise” strength of relationship between *X*_{i} and *Y*. Event *X*_{i} = 1 causes event *A*_{i} = *ON* with “clockwise” weight *w*_{i}. There is a question “given *X*_{i} = 0, how likely the event *A*_{i} = *OFF* is”. In order to solve this problem, I define a so-called “counterclockwise” strength of relationship between *X*_{i} and *Y* denoted *ω*_{i}. Event *X*_{i} = 0 causes event *A*_{i} = *OFF* with “counterclockwise” weight *ω*_{i}. In other words, each arc in simple graph is associated with a clockwise weight *w*_{i} and a counterclockwise weight *ω*_{i}. Such graph is called *bi-weight simple graph* shown in Figure 4.

With bi-weight simple graph, all X-gate inferences are extended as so-called X-gate bi-inferences. Derived from Eq. (9), Eq. (24) specifies conditional probability of accountable variables with regard to bi-weight graph.

The probabilities *P*(*A*_{i} = *ON* | *X*_{i} = 0) and *P*(*A*_{i} = *OFF* | *X*_{i} = 1) are called clockwise adder *d*_{i} and counterclockwise adder *δ*_{i}. As usual, *d*_{i} and *δ*_{i} are smaller than *w*_{i} and *ω*_{i}. When *d*_{i} = 0, bi-weight graph becomes normal simple graph.

The total clockwise weight or total counterclockwise weight is defined as sum of clockwise weight and clockwise adder or sum of counterclockwise weight and counterclockwise adder. Eq. (25) specifies such total weights *W*_{i} and

where

Given Eq. (25), the set of all *A*_{i}s is complete if and only if

By extending aforementioned X-gate inferences, we get bi-inferences for AND-gate, OR-gate, NAND-gate, NOR-gate, XOR-gate, XNOR-gate, and U-gate as shown in Table 6.

The largest cardinalities of *K* (*L*) are 2^{n−1} and 2^{n} with and without fixed *X*_{i}. Thus, it is possible to calculate arrangement counters. As a convention, the product of probabilities is 1 if indices set is empty.

With regard to SIGMA-gate bi-inference, the sum of all total clockwise weights must be 1 as follows

Derived from Eq. (23), the SIGMA-gate probability for bi-weight graph is

Shortly, Eq. (26) specifies SIGMA-gate bi-inference.

where the set of *X*_{i}(s) is complete and mutually exclusive.

The next section will research diagnostic relationship which adheres to X-gate inference.

## 4. Multihypothesis diagnostic relationship

Given a simple graph shown in Figure 2, if we replace the target source *Y* by an evidence *D*, we get a so-called *multihypothesis diagnostic relationship* whose property adheres to X-gate inference. Maybe there are other diagnostic relationships in which X-gate inference is not concerned. However, this research focuses on X-gate inference and so multi-hypothesis diagnostic relationship is called *X-gate diagnostic relationship*. Sources *X*_{1}, *X*_{2},…, *X*_{n} become hypotheses. As a convention, these hypotheses have prior uniform distribution.

According to aforementioned X-gate network shown in Figures 2 and 3, the target variable must be binary whereas evidence *D* can be numeric. It is impossible to establish the evidence *D* as direct target variable. Thus, the solution of this problem is to add an augmented target binary variable *Y* and then, the evidence *D* is connected directly to *Y*. In other words, the *X-gate diagnostic network* have *n* sources {*X*_{1}, *X*_{2},…, *X*_{n}}, one augmented hypothesis *Y*, and one evidence *D*. As a convention, X-gate diagnostic network is called *X-D network*. The CPTs of the entire network are determined based on combination of diagnostic relationship and X-gate inference mentioned in previous sections. Figure 5 depicts the augmented X-D network. Note that variables *X*_{1}, *X*_{2},…, *X*_{n}, and *Y* are always binary.

Appendix A3 is the proof that the augmented X-D network is equivalent to X-D network with regard to variables *X*_{1}, *X*_{2},…, *X*_{n} and *D*. As a convention, augmented X-D network is considered as same as X-D network.

The simplest case of X-D network is NOT-D network having one hypothesis *X*_{1} and one evidence *D*, equipped with NOT-gate inference. NOT-D network satisfies diagnostic condition because it essentially represents the single diagnostic relationship. Inferred from Eqs. (1) and (7), the conditional probability *P*(*D|X*_{1}) and posterior probability *P*(*X*_{1}*|D*) of NOT-D network are

(Due to Bayes’ rule and uniform distribution of *X*_{1})

It implies NOT-D network satisfies diagnostic condition. Let

We will validate whether the CPT of diagnostic relationship, *P*(*D|X*) specified by Eq. (6), still satisfies diagnostic condition within general case, X-D network. In other words, X-D network is general case of single diagnostic relationship.

Recall from dependencies shown in Figure 5, Eq. (27) specifies the joint probability of X-D network.

Eq. (28) specifies the conditional probability of *D* given *X*_{i} (likelihood function) and the posterior probability of *X*_{i} given *D*.

where Ω = {*X*_{1}, *X*_{2},…, *X*_{n}} and the sign “\” denotes the subtraction (excluding) operator in set theory [10]. Eq. (29) specifies the joint probability *P*(*X*_{i}, *D*) and the marginal probability *P*(*D*) given uniform distribution of all sources. Appendix A4 is the proof of Eq. (29).

where *s*(Ω) and *s*(Ω:{*X*_{i}}) are specified in Table 2. From Eqs. (28–30) specifies conditional probability *P*(*D|X*_{i}), posterior probability *P*(*X*_{i}*|D*), and transformation coefficient for X-gate inference.

The transformation coefficient is rewritten as follows

Note that *S, D*, and *M* are abstract symbols and there is no proportional connection between 2^{n−1}*S* and *D* for all *D*, specified by Eq. (6). Assuming that such proportional connection 2^{n−1}*S* = *aD*^{j} exists for all *D* where *a* is arbitrary constant. Given binary case when *D* = 0 and *S* = 1, we have

There is a contradiction, which implies that it is impossible to reduce *k* into the following form

Therefore, if *k* is constant with regard to *D* then,

where *C* is constant. We have

It is implied that

This holds

Assuming *ND* = *S* we have

There is a contradiction because *M* is maximum value of *D*. Therefore, if *k* is constant with regard to *D* then *s*(Ω) = 2^{n−1}. Inversely, if *s*(Ω) = 2^{n−1} then *k* is

In general, the event that *k* is constant with regard to *D* is equivalent to the event *s*(Ω) = 2^{n−1}. This implies *diagnostic theorem* stated in Table 7.

There are four common conditions of U: |U|=α, |U|≥α, |U|≤β, and α≤|U|≤β. Note that U,The largest cardinality of |

Given X-D network is combination of diagnostic relationship and X-gate inference: The diagnostic condition of X-D network is satisfied if and only if At that time, the transformation coefficient becomes: Note that weights p_{i}^{} = w_{i} and ρ_{i}^{} = ω_{i}, which are inputs of s(Ω), are abstract variables. Thus, the equality s(Ω) = 2^{|Ω|−1} implies all abstract variables are removed and so s(Ω) does not depend on weights. |

The diagnostic theorem is the optimal way to validate the diagnostic condition.

The Eq. (30) becomes simple with AND-gate inference. Recall that Eq. (14) specified AND-gate inference as follows

Due to only one case *X*_{1} = *X*_{2} =…= *X*_{n} = 1, we have

Due to *X*_{i} = 0, we have

Derived from Eq. (30), Eq. (31) specifies conditional probability *P*(*D|X*_{i}), posterior probability *P*(*X*_{i}*|D*), and transformation coefficient according to X-D network with AND-gate reference called *AND-D network*.

For convenience, we validate diagnostic condition with a case of two sources Ω = {*X*_{1}, *X*_{2}}, *p*_{1} = *p*_{2} = *w*_{1} = *w*_{2} = 0.5, *s*(Ω) ≠ 2 for given X-gate then, such X-gate does not satisfy diagnostic condition.

Given AND-gate inference, by applying Eq. (14), we have

Given OR-gate inference, by applying Eq. (16), we have

Given XOR-gate inference, by applying Eq. (19), we have

Given XNOR-gate inference, by applying Eq. (21), we have

Given SIGMA-gate inference, by applying Eq. (23), we have

It is asserted that AND-gate, OR-gate, XOR-gate, and XNOR-gate do not satisfy diagnostic condition and so they should not be used to assess hypotheses. However, it is not asserted if U-gate and SIGMA-gate satisfy such diagnostic condition. It is necessary to expend equation for SIGMA-gate diagnostic network (called *SIGMA-D network*) in order to validate it.

In case of SIGMA-gate inference, by applying Eq. (23), we have

It is necessary to validate SIGMA-D network with SIGMA-gate bi-inference. By applying Eq. (26), we recalculate these quantities as follows

Obviously, quantities *s*(Ω), *s*(Ω:{*X*_{i}=1}), and *s*(Ω:{*X*_{i} = 0}) are kept intact. According to diagnostic theorem, we conclude that SIGMA-D network does satisfy diagnostic condition due to *s*(Ω)=2^{n−1}. Thus, SIGMA-D network can be used to assess hypotheses.

Eq. (32), an immediate consequence of Eq. (30), specifies conditional probability *P*(*D|X*_{i}), posterior probability *P*(*X*_{i}*|D*), and transformation coefficient for SIGMA-D network.

In case of SIGMA-gate, the augmented variable *Y* can be removed from X-D network. The evidence *D* is now established as direct target variable. Figure 6 shows a so-called *direct SIGMA-gate diagnostic network* (direct SIGMA-D network).

Derived from Eq. (23), the CPT of direct SIGMA-D network is determined by Eq. (33).

where the set of *X*_{i} (s) is complete and mutually exclusive.

Eq. (33) specifies valid CPT due to

From dependencies shown in Figure 6, Eq. (34) specifies the joint probability of direct SIGMA-D network.

Inferred from Eq. (29), Eq. (35) specifies the joint probability *P*(*X*_{i}, *D*) and the marginal probability *P*(*D*) of direct SIGMA-D network, given uniform distribution of all sources.

where *s*(Ω) and *s*(Ω:{*X*_{i}}) are specified in Table 2.

By browsing all variables of direct SIGMA-D network, we have

Similarly, we have

By applying Eq. (35), *s*(Ω:{*X*_{i} = 0}), *s*(Ω:{*X*_{i} = 1}), and *s*(Ω), we get the same result with Eq. (32).

Therefore, it is possible to use direct SIGMA-D network to assess hypotheses. It is asserted that SIGMA-D network satisfy diagnostic condition when single relationship, NOT-D network, direct SIGMA-D network are specific cases of SIGMA-D network. There is a question: does an X-D network that is different from SIGMA-D network and not aforementioned exist such that it satisfies diagnostic condition?

Recall that each X-D network is a pattern owning a particular X-gate inference which in turn is based on particular X-gate condition (s) relevant to only variables *A*_{i}s. The most general nonlinear X-D network is U-D network whereas SIGMA-D network is linear one. The U-gate inference given arbitrary condition on *U* is

Let *f* be the arrangement sum of U-gate inference.

The function *f* is sum of many large expressions and each expression is product of four possible sub-products (Π) as follows

In any case of degradation, there always exist expression *Expr* (s) having at least 2 sub-products (Π), for example,

Consequently, there always exist *Expr* (s) having at least 5 terms relevant to *p*_{i} and *ρ*_{i} if *n* ≥ 5, for example,

Thus, degree of *f* will be larger than or equal to 5 given *n* ≥ 5. According to diagnostic theorem, U-gate network satisfies diagnostic condition if and only if *f(p*_{i}, *ρ*_{i}) = 2^{n−1} for all *n* ≥ 1 and for all abstract variables *p*_{i} and *ρ*_{i}. Without loss of generality, each *p*_{i} or *ρ*_{i} is sum of variable *x* and a variable *a*_{i} or *b*_{i}, respectively. Note that all *p*_{i}, *ρ*_{i}, *a*_{i} are *b*_{i} are abstract variables.

The equation *f*−2^{n−1} = 0 becomes equation *g*(*x*) = 0 whose degree is *m* ≥ 5 if *n* ≥ 5.

where coefficients *C*_{i} s are functions of *a*_{i} and *b*_{i}s. According to Abel-Ruffini theorem [11], equation *g*(*x*) = 0 has no algebraic solution when *m* ≥ 5. Thus, abstract variables *p*_{i} and *ρ*_{i} cannot be eliminated entirely from *g*(*x*) = 0, which causes that there is no specification of U-gate inference *P*(*X*_{1}x*X*_{2}x…x*X*_{n}) so that diagnostic condition is satisfied.

It is concluded that there is no nonlinear X-D network satisfying diagnostic condition, but a new question is raised: does there exist the general linear X-D network satisfying diagnostic condition? Such linear network is called GL-D network and SIGMA-D network is specific case of GL-D network. The GL-gate probability must be linear combination of weights.

where *C* is arbitrary constant.

The GL-gate inference is singular if *α*_{i} and *β*_{i} are functions of only *X*_{i} as follows

The functions *h*_{i} and *g*_{i} are not relevant to *A*_{i} because the final equation of GL-gate inference is only relevant to *X*_{i} (s) and weights (s). Because GL-D network is a pattern, we only survey singular GL-gate. Mentioned GL-gate is singular by default and it is dependent on how to define functions *h*_{i} and *g*_{i}. The arrangement sum with regard to GL-gate is

Suppose *h*_{i} and *g*_{i} are probability mass functions with regard to *X*_{i}. For all *i*, we have

The arrangement sum becomes

GL-D network satisfies diagnostic condition if

Suppose the set of *X*_{i}s is complete.

This implies *C* = 0. Shortly, Eq. (36) specifies the singular GL-gate inference so that GL-D network satisfies diagnostic condition.

Functions *h*_{i}(*X*_{i}) and *g*_{i}(*X*_{i}) are always linear due to *X*_{i}^{m} = *X*_{i} for all *m* ≥ 1 when *X*_{i} is binary. It is easy to infer that SIGMA-D network is GL-D network with following definition of functions *h*_{i} and *g*_{i}.

According to Millán and Pérez-de-la-Cruz [4], a hypothesis can have multiple evidences as seen in Figure 7. This is *multi-evidence diagnostic relationship* opposite to aforementioned multihypothesis diagnostic relationship.

Figure 7 depicts the multi-evidence diagnostic network called M-E-D network in which there are *m* evidences *D*_{1}, *D*_{2},…, *D*_{m} and one hypothesis *Y*. Note that *Y* has uniform distribution.

In simplest case where all evidences are binary, the joint probability of M-E-D network is

The product

The posterior probability *P*(*Y* | *D*_{1}, *D*_{2},…, *D*_{m}) given uniform distribution of *Y* is

The possible transformation coefficient is

M-E-D network will satisfy diagnostic condition if *k* = 1 because all hypotheses and evidence are binary, which leads that following equation specified by Eq. (37) has 2*m* real roots *P*(*D*_{j}*|Y*) for all *m* ≥ 2.

Eq. (37) has no real root given *m* = 2 according to following proof. Suppose Eq. (37) has 4 real roots as follows

From Eq. (37), it holds

The final equation leads a contradiction (*b*_{1} = 2 or *b*_{1} = 1.5) and so it is impossible to apply the sufficient diagnostic proposition into M-E-D network. Such proposition is only used for one-evidence network. Moreover, X-gate inference absorbs many sources and then produces out of one targeted result whereas the M-E-D network essentially splits one source into many results. It is impossible to model M-E-D network by X-gates. The potential solution for this problem is to group many evidences *D*_{1}, *D*_{2},…, *D*_{m} into one representative evidence *D* which in turn is dependent on hypothesis *Y* but this solution will be inaccurate in specifying conditional probabilities because directions of dependencies become inconsistent (relationships from *D*_{j} to *D* and from *Y* to *D*) except that all *D*_{j}s are removed and *D* becomes a vector. However, evidence vector does not simplify the hazardous problem and it changes the current problem into a new problem.

Another solution is to reverse the direction of relationship, in which the hypothesis is dependent on evidences so as to take advantages of X-gate inference as usual. However, the reversion method violates the viewpoint in this research where diagnostic relationship must be from hypothesis to evidence. In other words, we should change the viewpoint.

Another solution is based on a so-called *partial diagnostic condition* that is a loose case of diagnostic condition for M-E-D network, which is defined as follows

where *k* is constant with regard to *D*_{j}. The joint probability is

M-E-D network satisfies partial diagnostic condition. In fact, given all variables are binary, we have

(Let Ψ = {*D*_{1}, *D*_{2},…, *D*_{m}})

(Due to uniform distribution of *Y*)

Partial diagnostic condition expresses a different viewpoint. It is not an optimal solution because we cannot test a disease based on only one symptom while ignoring other obvious symptoms, for example. The equality *P*(*Y|D*_{j}) = 0.5*P*(*D*_{j}*|Y*) indicates the accuracy is decreased two times. However, Bayesian network provides inference mechanism based on personal belief. It is subjective. You can use partial diagnostic condition if you think that such condition is appropriate to your application.

If we are successful in specifying conditional probabilities of M-E-D network, it is possible to define an extended network which is constituted of *n* hypotheses *X*_{1}, *X*_{2},…, *X*_{n} and *m* evidences *D*_{1}, *D*_{2},…, *D*_{m}. Such extended network represents *multi-hypothesis multi-evidence diagnostic relationship*, called M-HE-D network. Figure 8 depicts M-HE-D network.

The M-HE-D network is the most general case of diagnostic network, which was mentioned in Ref. ([4], p. 297). We can construct any large diagnostic BN from M-HE-D networks and so the research is still open.

## 5. Conclusion

In short, relationship conversion is to determine conditional probabilities based on logic gates that are adhered to semantics of relationships. The weak point of logic gates is to require that all variables must be binary. For example, in learning context, it is inconvenient for expert to create an assessment BN with studying exercises (evidences) whose marks are only 0 and 1. In order to lessen the impact of such weak point, the numeric evidence is used for extending capacity of simple Bayesian network. However, combination of binary hypothesis and numeric evidence leads to errors or biases in inference. For example, given a student gets maximum grade for an exercise but the built-in inference results out that she/he has not mastered fully the associated learning concept (hypothesis). Therefore, I propose the sufficient diagnostic proposition so as to confirm that numeric evidence is adequate to make complicated inference tasks in BN. The probabilistic reasoning based on evidence is always accurate. Application of the research can go beyond learning context whenever probabilistic deduction relevant to constraints of semantic relationships is required. A large BN can be constituted of many simple BN (s). Inference in large BN is hazardous problem and there are many optimal algorithms for solving such problem. In future, I will research effective inference methods for the special BN that is constituted of X-gate BN (s) mentioned in this research because X-gate BN (s) have precise and useful features of which we should take advantages. For instance, their CPT (s) are simple in some cases and the meanings of their relationships are mandatory in many applications. Moreover, I try my best to research deeply M-E-D network and M-HE-D network whose problems I cannot solve absolutely now.

Two main documents that I referred to do this research are the book “Learning Bayesian Networks” [2] by the author Richard E. Neapolitan and the article “A Bayesian Diagnostic Algorithm for Student Modeling and its Evaluation” [4] by authors Eva Millán and José Luis Pérez-de-la-Cruz. Especially, the SIGMA-gate inference is based on and derived from the work of the Eva Millán and José Luis Pérez-de-la-Cruz. This research is originated from my PhD research “A User Modeling System for Adaptive Learning” [12]. Other references relevant to user modeling, overlay model, and Bayesian network are [13–16]. Please concern these references.

**A1**. Following is the proof of Eq. (9)

It implies

**A2**. Following is the proof of Eq. (10)

(Because *Y* is conditionally independent from *X*_{i}s given *A*_{i}s)

(Because *A*_{i}s are mutually independent)

(Because each *A*_{i} is only dependent on *X*_{i}) ■

**A3**. Following is the proof that the augmented X-D network (shown in Figure 5) is equivalent to X-D network (shown in shown in Figures 2 and 3) with regard to variables *X*_{1}, *X*_{2},…, *X*_{n}, and *D*.

The joint probability of augmented X-D network shown in Figure 5 is

The joint probability of X-D network is

By applying total probability rule into X-D network, we have

(Because *D* is conditionally independent from all *X*_{i} (s) given *Y*)

**A4**. Following is the proof of Eq. (29)

Given uniform distribution of *X*_{i} (s), we have

The joint probability becomes

The joint probability of *X*_{i} and *D* is

(Due to Eq. (6))

The marginal probability of *D* is

By applying Table 2, the joint probability *P*(*X*_{i}, *D*) is determined as follows

Similarly, the marginal probability *P*(*D*) is