Open access peer-reviewed chapter

Node-Level Conflict Measures in Bayesian Hierarchical Models Based on Directed Acyclic Graphs

By Jørund I. Gåsemyr and Bent Natvig

Submitted: December 12th 2016. Reviewed: June 8th 2017. Published: November 2nd 2017.

DOI: 10.5772/intechopen.70058

Abstract

Over the last decades, Bayesian hierarchical models defined by means of directed, acyclic graphs have become an essential and widely used methodology in the analysis of complex data. Simulation-based model criticism in such models can be based on conflict measures constructed by contrasting separate local information sources about each node in the graph. An initial suggestion of such a measure was not well calibrated. This shortcoming has, however, to a large extent been rectified by subsequently proposed alternative mutually similar tail probability-based measures, which have been proved to be uniformly distributed under the assumed model under various circumstances, and in particular, in quite general normal models with known covariance matrices. An advantage of this is that computationally costly precalibration schemes needed for some other suggested methods can be avoided. Another advantage is that noninformative prior distributions can be used when performing model criticism. In this chapter, we describe the basic framework and review the main uniformity results.

Keywords

  • cross-validation
  • data splitting
  • information contribution
  • MCMC
  • model criticism
  • pivotal quantity
  • preexperimental distribution
  • p-value

1. Introduction

Over the last decades, Bayesian hierarchical models have become an essential and widely used methodology in the analysis of complex data. Computational techniques such as Markov chain Monte Carlo (MCMC) methods make it possible to treat very complex models and data structures. Analysis of such models gives intuitively appealing Bayesian inference based on posterior probability distributions for the parameters.

In the construction of such models, an understanding of the underlying structure of the problem can be represented by means of directed acyclic graphs (DAGs), with nodes in the graph corresponding to data or parameters, and directed edges between parameters representing conditional distributions. However, a perfect understanding of the underlying structure is usually an unachievable goal, and there is always a danger of constructing inadequate models. Box [1] suggests a pattern for the model building process where an initial candidate model is assessed for adequacy, and if necessary modified and elaborated on, leading to a new candidate that again is checked for adequacy, and so on. As a tool in this model criticism process, Ref. [1] suggests using the prior predictive distribution of some checking function or test statistic as a reference for the observed value of this checking function, resulting in a prior predictive p-value. This requires an informative and realistic prior distribution, which is not always available or even desirable. Indeed, as pointed out in Ref. [2], in an early phase of the model building process, it is often convenient to use noninformative or even improper priors and thus avoid costly and time-consuming elicitation of prior information. Noninformative priors may be used also for the inference because relevant prior information is unavailable.

There exist many other methods for checking the overall fit of the model or an aspect of the model of special interest, based on locating a test statistic or a discrepancy measure in some kind of a reference distribution. The posterior predictive p-value (ppp) of Ref. [3] uses the posterior distribution as reference and does not require informative priors. But this method uses data twice and can as a result be very conservative [2, 4–6]. Hjort et al. [5] suggest remedying this by using the ppp value as a test statistic in a prior predictive test. The computation of the resulting calibrated cppp-value is, however, very computer intensive in the general case, and again realistic, informative priors are needed. A node-level discrepancy measure suggested in Ref. [7] is subject to the same limitations. The partial posterior predictive p-value of Ref. [4] avoids double use of data and allows noninformative priors but may be difficult to compute and interpret in hierarchical models.

Comparison with other candidate models through a technique for model comparison or model choice, such as predictive methods, maximum posterior probability, Bayes factors or an information criterion, can also serve as tools for checking model adequacy indirectly when alternative candidate models exist.

In this chapter, we will, however, focus on methods for criticizing models in the absence of any particular alternatives. We will review methods for checking the modeling assumptions at each node of the DAG. The aim is to identify parts or building blocks of the model that are in discordance with reality, which may be in need of adjustment or further elaboration. O’Hagan [8] regards any node in the graph as receiving information from two disjoint subsets of the neighboring nodes. This information is represented as a conditional probability density or a likelihood or as a combination of these two kinds of information sources. Adopting the same basic perspective, our aim is to check for inconsistency between such subsets. The suggestion in Ref. [8] is to normalize these information sources to have equal height 1 and to regard the height of the curves at the point of intersection as a measure of conflict. However, as shown in Ref. [2], this measure tends to be quite conservative. Dahl et al. [9] demonstrated that it is also poorly calibrated, with false warning probabilities that vary substantially between models. Dahl et al. [9] also identified the different sources of inaccuracy and modified the measure of Ref. [8] to an approximately χ2-distributed quantity under the assumed model by instead normalizing the information sources to probability densities. In Ref. [10], these densities were instead used to define tail probability-based conflict measures. Gåsemyr and Natvig [10] showed that these measures are uniformly distributed in quite general hierarchical normal models with fixed variances/covariances. In Ref. [11], such uniformity results were proved in various situations involving nonnormal and nonsymmetric distributions. These uniformity results indicate that the measures of Refs. [9] and [10] have comparable interpretations across different models. 
Therefore, they can be used without computationally costly precalibration schemes, such as the one suggested in Ref. [5]. Gåsemyr [12] focuses on some situations where the conflict measure approach can be directly compared to the calibration method of Ref. [5] and shows that the less computer-intensive conflict measure approach performs at least as well in these situations. Moreover, the conflict measure approach can be applied in models using noninformative prior distributions.

Focusing on the special problem of identifying outliers among the second-level parameters in a random-effects model, Ref. [13] defines similar conflict measures. In this setting, the group-specific means are the nodes of interest. In some models, there exist sufficient statistics for these means. Then, outlier detection at the group level can also be based on cross-validation, measuring the tail probability beyond the observed value of the statistic in the posterior predictive distribution given data from the other groups. In this context, the conflict measure approach can be viewed as an extension of cross-validation to situations where sufficient statistics do not exist. Ref. [13] also gives applications to the examination of exceptionally high hospital mortality rates and to results from a vaccination program. In Ref. [14], this methodology is used to check for inconsistency in multiple treatment comparison of randomized clinical trials. Presanis et al. [15] apply these conflict measures in complex cases of medical evidence synthesis.

2. Directed acyclic graphs and node-specific conflict

2.1. Directed acyclic graphs and Bayesian hierarchical models

An example of a DAG discussed extensively in Ref. [8] is the random-effects model with normal random effects and normal error terms defined by

$$Y_{i,j}\sim N(\lambda_i,\sigma^2),\qquad \lambda_i\sim N(\mu,\tau^2),\qquad j=1,\ldots,n_i,\quad i=1,\ldots,m. \tag{1}$$

In general, we identify the nodes or vertices of the graph with the unknown parameters θ and the observed data y, the latter appearing as bottom nodes and being the realizations of the random vector Y. In the Bayesian model, the parameters, the components of θ, are also considered as random variables. In general, if there is a directed edge from node a to node b, then a is a parent of b, and b is a child of a. We denote by Ch(a) the set of child nodes of a, and by Pa(b) the set of parent nodes of b. More generally, b is a descendant of a if there is a directed path from a to b. The set of descendants of a is denoted by Desc(a) and, for convenience, is defined to contain a itself. The directed edges encode conditional independence assumptions, indicating that, given its parents, a node is assumed to be independent of all other nondescendants. Hence, writing θ = (ν, μ), with μ representing the vector of top-level nodes, the joint density of (Y, θ) = (Y, ν, μ) is

$$p(y,\nu,\mu)=\prod_{y'\in y}p(y'\,|\,\mathrm{Pa}(y'))\prod_{\nu'\in\nu}p(\nu'\,|\,\mathrm{Pa}(\nu'))\,\pi(\mu), \tag{2}$$

where π(μ) is the prior distribution of μ. The posterior distribution π(θ|y) is the basis for the inference.
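As a concrete illustration, the factorization in Eq. (2) can be evaluated numerically for the random-effects model of Eq. (1). The sketch below (function and parameter names are our own, not from the chapter) accumulates the log joint density node by node; the improper prior π(μ) = 1 contributes log 1 = 0:

```python
import math

def log_norm(x, mean, var):
    # log of the normal density with the given mean and variance
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def log_joint(y, lam, mu, sigma2, tau2):
    """Log joint density p(y, lambda, mu) for the random-effects model of
    Eq. (1), accumulated factor by factor as in Eq. (2).
    y is a list of lists (y[i][j]), lam a list of group means lambda_i."""
    lp = 0.0
    for i, lam_i in enumerate(lam):
        lp += log_norm(lam_i, mu, tau2)          # p(lambda_i | mu)
        for y_ij in y[i]:
            lp += log_norm(y_ij, lam_i, sigma2)  # p(y_ij | lambda_i)
    return lp
```

Each term of the sum corresponds to one directed edge of the DAG, which is exactly the content of the factorization in Eq. (2).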

This setup can be generalized in various directions. The nodes may be allowed to represent vectors, at both the parameter and the data levels [10]. Instead of DAGs, one may consider chain graphs, as described in Ref. [16], with undirected edges representing mutual dependence as in Markov random fields. Scheel et al. [17] introduce a graphical diagnostic for model criticism in such models.

2.2. Information contributions

The representation of a Bayesian hierarchical model in terms of a DAG is often meant to reflect an understanding of the underlying structure of the problem. By looking for a conflict associated with the different nodes in the DAG, we may therefore put our understanding of this structure to test. We may also identify parts of the model that need adjustment.

The idea put forward in Ref. [8] is that for each node λ in a DAG one may in general think of each neighboring node as providing information about λ and that it is of interest to consider the possibility of conflict between different sources of information. For instance, one may want to contrast the local prior information provided by the factor p(λ | Pa(λ)) with the likelihood information source formed by multiplying the factors p(γ | Pa(γ)) for all child nodes γ ∈ Ch(λ). The full conditional distribution of λ given all the observed and unobserved variables in the DAG, i.e.,

$$\pi(\lambda\,|\,(y,\theta)_{-\lambda})\propto p(\lambda\,|\,\mathrm{Pa}(\lambda))\prod_{\gamma\in \mathrm{Ch}(\lambda)}p(\gamma\,|\,\mathrm{Pa}(\gamma)), \tag{3}$$

is determined by these two types of factors. Here, (y, θ)_{−λ} denotes the vector of all components of (y, θ) except for λ.

Dahl et al. [9] normalize the product ∏_{γ∈Ch(λ)} p(γ | Pa(γ)) to a probability density function denoted by f_c(λ), the likelihood or child node information contribution, whereas the local prior density is denoted by f_p(λ) and called the prior or parent node information contribution. These information contributions are integrated with respect to posterior distributions for the unknown nuisance parameters to form integrated information contribution (iic) densities, denoted by g_c and g_p. In this construction, a key to avoiding the conservatism of the measure suggested in Ref. [8] is to prevent dependence between the two information sources by introducing a suitable data splitting Y = (Y_p, Y_c) and conditioning the parameters of f_p on y_p and the parameters of f_c on y_c.

Definition 1. For a given parameter node λ, denote by β_p the vector whose components are Pa(λ), and by β_c the vector whose components are

$$\bigcup_{\gamma\in \mathrm{Ch}(\lambda)}\big(\{\gamma\}\cup \mathrm{Pa}(\gamma)\big)\setminus\{\lambda\}=\mathrm{Ch}(\lambda)\cup\big[\mathrm{Pa}(\mathrm{Ch}(\lambda))\setminus\{\lambda\}\big]. \tag{4}$$

Let Y = (Y_p, Y_c) be a splitting of the data Y. Define the densities f_p and f_c, the prior and likelihood information contributions, respectively, by

$$f_p(\lambda;\beta_p)=p(\lambda\,|\,\beta_p),\qquad f_c(\lambda;\beta_c)\propto\prod_{\gamma\in \mathrm{Ch}(\lambda)}p(\gamma\,|\,\mathrm{Pa}(\gamma)). \tag{5}$$

Define the integrated information contribution densities g_p and g_c by

$$g_p(\lambda)=\int f_p(\lambda;\beta_p)\,\pi(\beta_p\,|\,y_p)\,d\beta_p,\qquad g_c(\lambda)=\int f_c(\lambda;\beta_c)\,\pi(\beta_c\,|\,y_c)\,d\beta_c, \tag{6}$$

and denote by G_p and G_c the corresponding cumulative distribution functions.

Note that β_c may contain data nodes. The second integral in Eq. (6) is then taken only with respect to the random components of β_c, i.e., the parameters in β_c. If β_c contains no parameters, then g_c and f_c coincide. Definition 1 may also be extended to the case when λ is a vector, corresponding to a subset of parameter nodes.

Combining the set of information sources linked to a specific node in different ways leads to a modification of Definition 1 where β_c does not contain all child nodes of λ, the others being instead included in β_p together with their parent nodes. In this way, different types of conflict about the node may be revealed. This is natural, e.g., in the context of outlier detection among independent observations with a common mean. Note that β_p and β_c may then be overlapping, containing coparents in common with λ. The setup is illustrated in Figure 1 in the case when the set of common components, by abuse of notation denoted by β_p ∩ β_c, is empty. For the general setup, Definition 1 is modified as follows.

Figure 1.

Part of a DAG showing information sources about λ.

Definition 2. Let γ be a vector whose components are a subset of Ch(λ), and define β_c as in Eq. (4). Denote by γ₁ the rest of the child nodes of λ, and let β_p consist of γ₁ together with its parent nodes in the same way as in Eq. (4), as well as Pa(λ). The information contributions are then given by

$$f_p(\lambda;\beta_p)\propto p(\gamma_1\,|\,\mathrm{Pa}(\gamma_1))\,p(\lambda\,|\,\mathrm{Pa}(\lambda)), \tag{7}$$
$$f_c(\lambda;\beta_c)\propto p(\gamma\,|\,\mathrm{Pa}(\gamma)). \tag{8}$$

In Eq. (7), p(λ | Pa(λ)) is replaced by the prior density π(λ) if λ is a top-level parameter. The corresponding iic densities are defined by Eq. (6) as before.

2.3. Node-specific conflict measures

The conflict measure c_λ² of Ref. [9] is defined as

$$c_\lambda^2=\big(E_{G_p}(\lambda)-E_{G_c}(\lambda)\big)^2\big/\big(\mathrm{var}_{G_p}(\lambda)+\mathrm{var}_{G_c}(\lambda)\big). \tag{9}$$
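Given independent Monte Carlo samples from G_p and G_c, Eq. (9) can be estimated by plugging in sample means and variances. A minimal sketch (the two sample lists are hypothetical inputs, e.g. produced by MCMC as described in Section 2.4):

```python
def c2(lam_p, lam_c):
    """Monte Carlo estimate of the conflict measure c_lambda^2 of Eq. (9),
    from samples lam_p ~ G_p and lam_c ~ G_c."""
    n_p, n_c = len(lam_p), len(lam_c)
    m_p = sum(lam_p) / n_p
    m_c = sum(lam_c) / n_c
    v_p = sum((x - m_p) ** 2 for x in lam_p) / (n_p - 1)
    v_c = sum((x - m_c) ** 2 for x in lam_c) / (n_c - 1)
    return (m_p - m_c) ** 2 / (v_p + v_c)
```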

The χ₁²-distribution is the reference distribution for this measure. For the conflict measures of Ref. [10], the uniform distribution on [0, 1] is the reference distribution. They focus on tail behavior but are based on the same iic distributions. The general distribution of information sources given in Definition 2 is also introduced in Ref. [10]. For a given pair G_p, G_c of iic distributions, let λ_p* and λ_c* be independent samples from G_p and G_c, respectively. Let G be the cumulative distribution function for δ = λ_p* − λ_c*. Define

$$c_\lambda^{3+}=G(0),\qquad c_\lambda^{3-}=\bar G(0)\overset{\mathrm{def}}{=}1-G(0) \tag{10}$$

and

$$c_\lambda^{3}=1-2\min\big(G(0),\bar G(0)\big)=2\,\big|G(0)-1/2\big|. \tag{11}$$
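The tail-probability measures of Eqs. (10) and (11) can likewise be estimated from paired samples λ_p*, λ_c*, since G(0) is simply the proportion of nonpositive differences δ. A sketch under that sampling setup:

```python
def c3_measures(lam_p, lam_c):
    """Estimate c3+, c3- and c3 of Eqs. (10)-(11) from paired independent
    samples lambda_p* ~ G_p and lambda_c* ~ G_c: G(0) is estimated by the
    proportion of differences delta = lambda_p* - lambda_c* that are <= 0."""
    deltas = [p - c for p, c in zip(lam_p, lam_c)]
    g0 = sum(d <= 0 for d in deltas) / len(deltas)
    c3_plus, c3_minus = g0, 1.0 - g0
    c3 = 1.0 - 2.0 * min(g0, 1.0 - g0)   # = 2 * |G(0) - 1/2|
    return c3_plus, c3_minus, c3
```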

The c_λ^{3+}-measure and the P_λ^{conf} measure of Ref. [13] are very similar. The latter measure is aimed at detecting outlying groups or units in a three-level hierarchical model, with the second-level parameters being location parameters for group-specific data. However, the measure is interpreted as a p-value, with small values indicative of conflict. Gåsemyr and Natvig [10] also define a measure based on defining a tail area in terms of the density g of G, namely

$$c_\lambda^{4}=P_G\big(g(\delta)>g(0)\big), \tag{12}$$

applicable also when λ is a vector.

Example 1. To illustrate the theory, consider the random-effects model (1), with the variance parameters σ², τ² assumed known, and with μ having the improper prior π(μ) = 1. For simplicity, assume n_i = n for all i. Suspecting the mth group of representing an outlier, let λ = λ_m be the node of interest. Define the data splitting (Y_p, Y_c) by letting Y_c = Y_m = (Y_{m,1}, …, Y_{m,n}), and let β_c = y_c, β_p = μ. Denoting the normal density function by φ, it is easy to see that g_c(λ) = f_c(λ) = φ(λ; ȳ_c, σ²/n). Furthermore, f_p(λ; μ) = φ(λ; μ, τ²). Given y_p, μ has the density π(μ | y_p) = φ(μ; Σ_{i=1}^{m−1} ȳ_i/(m−1), (1/(m−1))τ² + (1/(n(m−1)))σ²). By a standard argument,

$$g_p(\lambda)=\int f_p(\lambda;\mu)\,\pi(\mu\,|\,y_p)\,d\mu=\varphi\Big(\lambda;\ \textstyle\sum_{i=1}^{m-1}\bar y_i/(m-1),\ \big(1+1/(m-1)\big)\tau^2+\big(1/(n(m-1))\big)\sigma^2\Big).$$

It follows that g(δ) = φ(δ; Σ_{i=1}^{m−1} ȳ_i/(m−1) − ȳ_c, (m/(m−1))(τ² + σ²/n)). The conflict measures (Eqs. (9), (10), (11), and (12)) can hence be calculated analytically, with no simulation needed in this case.
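Since g(δ) here is a known normal density, the measures can be coded in closed form. The following sketch (our own function name; it takes the group sample means ȳ_1, …, ȳ_m) computes c_λ^{3+}, c_λ^{3−} and c_λ³ for the mth group:

```python
from math import erf, sqrt

def Phi(z):
    # standard normal cumulative distribution function
    return 0.5 * (1 + erf(z / sqrt(2)))

def outlier_conflict(group_means, sigma2, tau2, n):
    """Analytic conflict measures of Example 1 for the last of the m groups.
    group_means = [ybar_1, ..., ybar_m]; sigma2, tau2 are the known
    variances and n the common group size."""
    m = len(group_means)
    ybar_c = group_means[-1]
    # mean and variance of delta = lambda_p* - lambda_c* under g
    mean_delta = sum(group_means[:-1]) / (m - 1) - ybar_c
    var_delta = (m / (m - 1)) * (tau2 + sigma2 / n)
    g0 = Phi(-mean_delta / sqrt(var_delta))      # G(0) = c3+
    return g0, 1 - g0, 1 - 2 * min(g0, 1 - g0)   # c3+, c3-, c3
```

When the suspected group mean equals the average of the others, G(0) = 1/2 and c_λ³ = 0, i.e., no indication of conflict.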

In a simulation study of the c_λ²-measure in Ref. [9], using a warning level equal to the 95% quantile of the χ₁²-distribution, a false warning probability close to 5% is obtained for a normal random-effects model with unknown variance parameters as in Eq. (1) and also in similar random-effects models with heavy-tailed t- and uniformly distributed random effects. Also with respect to detection power, this measure performs well when compared to a calibrated version of the measure given in Ref. [8], if an optimal data splitting is used. Refs. [10] and [11] prove preexperimental uniformity of the conflict measures in various situations, i.e., their distributions as functions of a Y which is distributed according to the assumed model are uniform, regardless of the true value of the basic parameter. Another way of stating this is that we obtain a proper p-value by subtracting these measures from 1. These results are reviewed in Section 5 of the present chapter.

2.4. Integrated information contributions as posterior distributions

In most cases, the conflict measures of Refs. [9] and [10] are based on simulated samples from G_p and G_c. Definitions 1 and 2 suggest obtaining such samples by running an MCMC algorithm to generate posterior samples of the unknown parameters in β_p and β_c and then generating samples λ_p* and λ_c* from the respective information contributions for each such parameter sample. If the information contributions are standard probability densities, this procedure is straightforward. If not, one may instead often use the fact that, under certain conditions on the data splitting, the distributions G_p and G_c are posterior distributions conditional on y_p and y_c, respectively, the latter based on the improper prior π(λ) = 1, independently of the coparents.
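Schematically, this two-stage procedure draws one λ* from the relevant information contribution for each posterior draw of its parameters. A generic sketch (the sampler argument is a hypothetical user-supplied function, not part of the chapter's notation):

```python
import random

def iic_samples(beta_draws, draw_f, seed=0):
    """Two-stage sampling from an iic distribution as in Eq. (6): for each
    posterior draw of the parameter vector beta (e.g. produced by MCMC),
    draw one lambda* from the information contribution f(.; beta).
    draw_f(rng, beta) is a user-supplied sampler, e.g.
    lambda rng, mu: rng.gauss(mu, tau) for f_p in the random-effects model."""
    rng = random.Random(seed)
    return [draw_f(rng, beta) for beta in beta_draws]
```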

Theorem 1. Suppose that the data splitting satisfies

$$Y_c=Y\cap\Big[\bigcup_{\gamma\in \mathrm{Ch}(\lambda)\cap\beta_c}\mathrm{Desc}(\gamma)\Big],\qquad Y_p=Y\setminus Y_c, \tag{13}$$

the latter expression by abuse of notation meaning the components of Y not present in Y_c. Assume λ and the coparents Pa(Ch(λ) ∩ β_p) \ {λ} are independent. We then have

$$g_p(\lambda)=\pi(\lambda\,|\,y_p)$$

and, specifying the prior density

$$\pi\big(\lambda\,|\,\mathrm{Pa}(\mathrm{Ch}(\lambda)\cap\beta_c)\setminus\{\lambda\}\big)=1,\qquad g_c(\lambda)=\pi(\lambda\,|\,y_c). \tag{14}$$

The proof is given in Appendix A in the online supporting information for Ref. [11]. Specializing to the standard setup of Definition 1, where Ch(λ) ⊆ β_c, we see that the requirement for Eq. (13) to hold is that Y_c consists of all data descendant nodes of λ. In Ref. [9], this splitting was compared with two other splittings for c_λ² and found to be optimal with respect to detection power. The measure was also found to be well calibrated under this splitting.

3. Noninvariance and reparametrizations

The iic distributions and the corresponding conflict measures are parametrization dependent. Based on experience so far, the conflict measures seem to be fairly robust to changes in parametrization. However, this noninvariance can be handled in a theoretically satisfactory way under certain circumstances.

Let φ be the parameter, in a standard parametrization, corresponding to a specific node in the DAG. Suppose for simplicity that Y_c = Ch(φ). Assume that there exists a sufficient statistic Y_c and an alternative parametrization λ, being a strictly monotonic function λ(φ), such that Y_c − λ is a pivotal quantity, i.e., the density for Y_c given λ is of the form

$$p(y_c\,|\,\lambda)=f_{Y_c}(y_c\,|\,\lambda)=f_0(y_c-\lambda) \tag{15}$$

for some known density function f₀. Such a parametrization will be considered a canonical or reference parametrization if it exists, as opposed to the standard parametrization involving φ. Accordingly, the conflict measures given in Eqs. (9)–(12) are preferably based on this reference parametrization.

By Theorem 1, samples λ_c* from G_c may be obtained by MCMC as posterior samples from π(λ | y_c) when the splitting satisfies Eq. (13) and the prior for λ satisfies Eq. (14), i.e., equals 1. According to an argument given in Section 1.3 of Ref. [18], such a prior expresses noninformativity for likelihoods of the form of Eq. (15). Computationally, we may, however, use the standard parametrization. When generating φ_c* as posterior samples from π(φ | y_c), the prior density |dλ/dφ| for φ must be used. Then, we may calculate λ_c* = λ(φ_c*). To represent the iic distribution G_p(λ), we may calculate λ_p* = λ(φ_p*) for samples φ_p* from π(φ | y_p) according to the given model. Now, the c_λ⁴-measure can be estimated from Eq. (12), using a kernel density estimate of g(δ) based on corresponding samples δ* = λ_p* − λ_c*. However, if we limit attention to the c_λ³-measure (Eq. (11)) and its one-sided versions (Eq. (10)), we may use the samples from π(φ | y_c) and π(φ | y_p) directly. To see this, note that the condition λ_p* ≤ λ_c* is equivalent to the condition φ_p* ≤ φ_c* (assuming that λ is increasing as a function of φ). Hence, the probability G(0) that λ_p* − λ_c* ≤ 0 can be estimated as the proportion of sample pairs for which φ_p* ≤ φ_c*.

4. Extensions to deterministic nodes: Relation to cross-validation, prediction and hypothesis testing

4.1. Cross-validation and data node conflict

The model variables Y are represented by the bottom nodes in the DAG describing the hierarchical model. The framework can be extended to also cover conflict concerning these nodes. In this way, cross-validation can be viewed as a special case of the conflict measure approach.

Let Y_c be an element of the vector Y of observable random variables. We define the prior iic density g_p(y_c) exactly as in Eq. (6), with λ replaced by y_c. The Dirac measure at the observed value y_c represents a degenerate iic information contribution about Y_c. This leads to the following definitions:

$$c_{y_c}^{3+}=G_p(y_c),\qquad c_{y_c}^{3-}=\bar G_p(y_c), \tag{16}$$
$$c_{y_c}^{3}=1-2\min\big(G_p(y_c),\bar G_p(y_c)\big), \tag{17}$$
$$c_{y_c}^{4}=P_{g_p}\big(g_p(Y_c)>g_p(y_c)\big). \tag{18}$$

The measures in Eqs. (16)–(18) are called data node conflict measures. To see that these definitions are consistent with Eqs. (10)–(12), note that λ_p* corresponds to Y_c, and λ_c* is deterministic and corresponds to y_c. We define X = Y_c − y_c, corresponding to δ. We then have g(x) = g_p(x + y_c). Hence,

$$G(0)=\int_{-\infty}^{0}g(x)\,dx=\int_{-\infty}^{y_c}g_p(y)\,dy=G_p(y_c), \tag{21}$$

and accordingly, Ḡ(0) = Ḡ_p(y_c). It follows that Eqs. (16) and (17) are special cases of Eqs. (10) and (11). Moreover,

$$P_g\big(g(X)>g(0)\big)=P_{g_p}\big(g_p(Y_c)>g_p(y_c)\big), \tag{22}$$

showing that Eq. (18) is a special case of Eq. (12).
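In practice, G_p(y_c) in Eq. (16) is estimated from posterior predictive samples of Y_c given y_p, exactly as in cross-validation. A sketch (the predictive samples are hypothetical inputs, e.g. generated by MCMC):

```python
def data_node_c3(pred_samples, y_obs):
    """Data node conflict measures of Eqs. (16)-(17): locate the observed
    value y_obs of Y_c in the estimated cdf G_p of its posterior predictive
    distribution given y_p, represented by the list pred_samples."""
    gp = sum(s <= y_obs for s in pred_samples) / len(pred_samples)
    return gp, 1.0 - gp, 1.0 - 2.0 * min(gp, 1.0 - gp)   # c3+, c3-, c3
```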

Furthermore, this correspondence between the data node conflict measures (Eqs. (16) and (17)) and the parameter node conflict measures (Eqs. (10) and (11)) can be used to motivate the latter measures. We will treat the c^{3+} measure as an example. Consider again a parameter node λ. If λ were actually observable and known to take the value λ_c, the data node version of the c^{3+} measure could be used to measure deviations toward the right tail of G_p as

$$G_p(\lambda_c)=\int_{-\infty}^{\lambda_c}g_p(\lambda)\,d\lambda=\int_{-\infty}^{0}g_p(\delta+\lambda_c)\,d\delta. \tag{23}$$

Now λ is in reality not known, but we can take the expectation of this conflict with respect to the distribution G_c, which reflects the uncertainty about λ when influence from the data y_p is removed. The result is the following theorem:

Theorem 2

$$E_{G_c}\big(G_p(\lambda)\big)=c_\lambda^{3+}. \tag{24}$$

Proof:

$$E_{G_c}\big(G_p(\lambda)\big)=\int g_c(\lambda)\Big(\int_{-\infty}^{0}g_p(\delta+\lambda)\,d\delta\Big)d\lambda=\int_{-\infty}^{0}\Big(\int g_p(\delta+\lambda)\,g_c(\lambda)\,d\lambda\Big)d\delta=\int_{-\infty}^{0}g(\delta)\,d\delta=G(0)=c_\lambda^{3+} \tag{25}$$

by Eq. (10).

4.2. Cross-validation and sufficient statistics

Suppose the node λ of interest is the parent of the subvector Y_c of Y. Suppose also that Ȳ_c is a sufficient statistic for Y_c. Evidently then, the measures c_λ^{3+} and c_{Ȳ_c}^{3+} address the same kind of possible conflict in the model. The following theorem, proved in Ref. [11], states that the two measures agree under certain conditions. This is a generalization of a result in Ref. [13], which also unnecessarily assumed symmetry for the conditional density of Ȳ_c.

Theorem 3. Suppose the conditional density for the scalar variable Ȳ_c given the parameter λ is of the form f_{Ȳ_c}(y | λ) = f_{c,0}(y − λ). Then,

$$c_{\bar Y_c}^{3+}=c_\lambda^{3+}. \tag{26}$$

When a sufficient statistic exists, the cross-validatory p-value is considered by Ref. [13] as the gold standard, and the aim of their construction is to provide a measure which is generally applicable and matches cross-validation when a sufficient statistic exists.

4.3. Prediction

As mentioned in Section 2, the c⁴ measure can be used to assess conflict concerning vectors of nodes. Applying this at the data node level, we may assess the quality of predictions of a subvector Y_c of Y based on a complementary subvector y_p of observations. The relevant measure is given by Eq. (18), with the scalar Y_c replaced by the vector Y_c. This is particularly well suited to models where data accumulate as time evolves. Such a conflict measure can be used to assess the overall quality of the model. It can also be used as a tool for model comparison and model choice.

4.4. Hypothesis testing

Suppose the top-level nodes μ appearing in Eq. (2) are assumed fixed and known according to the model, so that π(μ) is a Dirac measure at these fixed values of the components of μ. Hence, the DAG has deterministic nodes both at the top and at the bottom, namely the vectors μ and y, respectively. We may then check for a conflict concerning a component λ of μ by introducing a random version λ̃ of λ and contrasting the corresponding g_c(λ̃) with the fixed value λ. The random λ̃ has the same children and coparents as λ, and the vector β_c, the information contribution f_c(λ̃; β_c) and the iic density g_c are defined as in Eqs. (4), (5) and (6). The respective conflict measures are defined as in Eqs. (16)–(18), with y_c replaced by λ and with G_p and g_p replaced by G_c and g_c. If the model is rejected when the conflict exceeds a certain predefined warning level, this corresponds to a formal Bayesian test of the hypothesis λ̃ = λ. Using the conflict measure of Eq. (18), we may put the whole vector μ to test in this way.

5. Preexperimental uniformity of the conflict measures

In this section, we review some results concerning the distribution of the conflict measures. If c is one of the measures in Eqs. (10), (11), (12), (16), (17) or (18), then preexperimentally, i.e., prior to observing the data y, c is a random variable taking a value in [0, 1]. A large value of c indicates a possible conflict in the model, and uniformity of c corresponds to 1 − c being a proper p-value. This does not mean that we propose a formal hypothesis testing procedure for model criticism, possibly even adjusted for multiple testing, nor that we think that a fixed significance level represents an appropriate criterion signaling the need for changing the model. A relatively large value of c may be accepted if there are convincing arguments for believing in a particular modeling aspect, while a less extreme value of c may indicate a need for adjustments in modeling aspects that are considered questionable for other reasons. But the terms “relatively large” and “less extreme” must refer to a meaningful common scale. In our view, uniformity of the conflict measure under all sources of uncertainty is the natural ideal criterion for a well-calibrated conflict measure, the fulfillment of which ensures comparable assessment of the level of conflict across models. This means that we aim for preexperimental uniformity in cases where the prior distribution is highly noninformative, and also, as discussed in the following subsection, in cases where an informative prior represents part of the randomness in the data-generating process (aleatory uncertainty) rather than subjective (epistemic) uncertainty about the location of a fixed but unknown λ. In this chapter, we limit attention to situations where exact uniformity is achieved. The pivotality condition of Eq. (15) turns out to be a key assumption needed to obtain such exact results. Refs. [10] and [12] provide some examples where exact uniformity is achieved in other cases.

5.1. Data-prior conflict

Consider the model

$$Y\sim F_Y(y\,|\,\lambda),\qquad \lambda\sim F_\lambda(\lambda), \tag{27}$$

where F_λ is an arbitrary informative prior distribution. Here, we think of this prior distribution as representing aleatory rather than epistemic uncertainty. The corresponding densities are denoted by f_Y and f_λ. If contrasting the prior density with the likelihood f_Y(y | λ) indicates a conflict between the prior and likelihood information contributions, we consider this a data-prior conflict. The following theorem, proved in Ref. [11], deals with this kind of conflict. Note that in this situation, the Y_p part of the data splitting is empty.

Theorem 4. Suppose the conditional density for the scalar variable Y given the parameter λ is of the form f_Y(y | λ) = f₀(y − λ) and that λ is generated from an arbitrary informative prior density f_λ(λ). Then, the data-prior conflict measures about λ are preexperimentally uniformly distributed for both the c_λ³- and c_λ⁴-measures.

The theorem obviously applies to the location parameter of normal and t-distributions with fixed variance parameters, as well as the location parameter in the skew normal distribution [19]. If the vector Y consists of IID normal variables, the theorem also applies to the location parameter, using as scalar variable the sufficient statistic Ȳ. If the n components of Y are IID exponentially distributed with failure rate λ, their sum is a sufficient statistic that is gamma distributed with shape parameter n and scale parameter λ. We may then use the fact that for a variable Y which is gamma distributed with known shape parameter and unknown scale parameter λ, the quantity log(Y) − log(λ) is a pivotal statistic, and uniformity is obtained by combining Theorem 4 with the approach of Section 3. In the standard parametrization, the appropriate prior distribution is π(λ) = 1/λ. Details are given in Ref. [11], which also deals with the gamma, inverse gamma, Weibull and lognormal distributions in a similar way.
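Theorem 4 can be checked by simulation. In the toy model Y ∼ N(λ, 1), λ ∼ N(0, 1) (our own choice of example, not from the chapter), g_p is the prior, g_c(λ) = φ(λ; y, 1), so δ ∼ N(−y, 2) and G(0) = Φ(y/√2) in closed form; preexperimental draws of c_λ³ should then look uniform on [0, 1]:

```python
import random
from math import erf, sqrt

def Phi(z):
    # standard normal cumulative distribution function
    return 0.5 * (1 + erf(z / sqrt(2)))

def simulate_c3(n_rep, seed=1):
    """Preexperimental draws of the data-prior conflict c_lambda^3 in the
    toy model Y ~ N(lambda, 1), lambda ~ N(0, 1): here G(0) = Phi(y/sqrt(2))
    analytically, so c3 = 2 * |G(0) - 1/2| as in Eq. (11)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_rep):
        lam = rng.gauss(0, 1)      # lambda from the informative prior
        y = rng.gauss(lam, 1)      # data given lambda
        out.append(2 * abs(Phi(y / sqrt(2)) - 0.5))
    return out
```

Under the assumed model the resulting sample should be indistinguishable from uniform draws, in line with Theorem 4.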

5.2. Data-data conflict

Suppose all components of Y have distributions determined by the same parameter λ. Suppose we want to contrast information contributions from separate parts of Y about λ and define the splitting (Y_p, Y_c) accordingly. Focusing on this kind of possible conflict, we assume complete prior ignorance about λ and accordingly assume that λ has the improper prior π(λ) = 1. Hence, recalling Eqs. (7) and (8), we contrast the information in f_c(λ; Y_c) with that in f_p(λ; Y_p). We use the term data-data conflict in this context, since there is no prior information incorporated in f_p, and the two information contributions play symmetric roles. However, as a particular application, one may think of Y_c as a scalar variable representing a possible outlier.

The following theorem is proved in Ref. [11].

Theorem 5. Suppose that the conditional densities for the scalar variables Y_p and Y_c given the parameter λ are of the form f_{Y_p}(y | λ) = f_{p,0}(y − λ) and f_{Y_c}(y | λ) = f_{c,0}(y − λ).

Assume λ has the improper prior π(λ) = 1. Then, the data-data conflict measures about λ are preexperimentally uniformly distributed for both the c_λ³- and c_λ⁴-measures.

Theorem 5 can be applied if the components of Y_c and Y_p are normally or lognormally distributed with known variance parameter, exponentially distributed, or gamma, inverse gamma or Weibull distributed with known shape parameter, since pivotal quantities based on sufficient statistics exist for these distributions.

5.3. Normal hierarchical models with fixed covariance matrices

Allowing each y and ν appearing in Eq. (2) to be interpreted as a vector of nodes, we now assume that each conditional distribution in the decomposition of Eq. (2) is multinormal with fixed and known covariance matrices. The random-effects model of Eq. (1) is a simple example of this. We also assume that the top-level parameter vector μ has the improper prior 1 and that each linear mapping Pa(ν) ↦ E(ν | Pa(ν)) has full rank.

Now let λ be any node in the model description. It is standard to verify that, regardless of how the vector of neighboring and coparent nodes β is decomposed into β_p, containing Pa(λ), and β_c, the densities f_p(λ; β_p) and f_c(λ; β_c) of Eqs. (5) and (8) are multinormal with fixed covariance matrices. Furthermore, this is true also for the iic densities g_p and g_c of Eq. (6), regardless of the data splitting. It follows that the density g of the difference δ between independent samples from g_p and g_c is multinormal with expectation E_G(δ) = E_{G_p}(λ) − E_{G_c}(λ) and covariance matrix cov_G(δ) = cov_{G_p}(λ) + cov_{G_c}(λ). It follows that (δ − E_G(δ))ᵗ cov_G(δ)⁻¹ (δ − E_G(δ)) is χ²-distributed with n = dim(λ) degrees of freedom, and the probability under G that g(δ) > g(0) is easily seen to be Ψ_n(E_G(δ)ᵗ cov_G(δ)⁻¹ E_G(δ)), where Ψ_n is the cumulative distribution function of the χ_n²-distribution. The preexperimental uniformity of this quantity is proved in Ref. [10].
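For a scalar node, Ψ₁(x) = 2Φ(√x) − 1 = erf(√(x/2)), so c_λ⁴ has a one-line closed form given the two iic means and variances. A sketch (function name is our own):

```python
from math import erf, sqrt

def c4_normal_scalar(mean_p, var_p, mean_c, var_c):
    """c_lambda^4 of Eq. (12) for a scalar node in a normal model with known
    variances: delta ~ N(mean_p - mean_c, var_p + var_c), and
    c4 = Psi_1(E[delta]^2 / var(delta)), Psi_1 the chi^2_1 cdf."""
    x = (mean_p - mean_c) ** 2 / (var_p + var_c)
    # Psi_1(x) = 2 * Phi(sqrt(x)) - 1 = erf(sqrt(x / 2))
    return erf(sqrt(x / 2))
```

With equal iic means the measure is 0 (no conflict), and it increases monotonically toward 1 as the standardized distance between the two means grows.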

Theorem 6. Consider a hierarchical normal model as described above.

  1. Let λ be an arbitrary scalar or vector parameter node. If the data splitting satisfies Eq. (13), then c_λ^4 is preexperimentally uniformly distributed.

  2. Suppose the data splitting (Y_p, Y_c) satisfies Ch(Pa(Y_c)) = Y_c. Then c_{Y_c}^4 is preexperimentally uniformly distributed.

If λ in (i) or Y_c in (ii) is one-dimensional, then G is symmetric and unimodal, and the respective c^3-measures are therefore defined and coincide with the c^4-measures. Gåsemyr and Natvig [10] also show that in this case the c^{3+}- and c^{3−}-measures are preexperimentally uniformly distributed.

Example 2. Consider the following DAG model, a regression model with randomly varying regression coefficients:

Y_{i,j} ∼ N(X_{i,j}^t ξ_i, σ^2), ξ_i ∼ N(ξ, Ω), j = 1, …, n, i = 1, …, m, π(ξ) ∝ 1. (19)

The m units could be groups of individuals, with y_{i,j} the measurement for a group member with individual covariate vector X_{i,j}, or individuals, with the successive y_{i,j} representing repeated measurements over time. In this model, we could check for possible exceptional behavior of the mth unit by means of the conflict measure c_{ξ_m}^4. With a data splitting for which Y_c = Y_m = (Y_{m,1}, …, Y_{m,n}), the conditions of Theorem 6, part (i), are satisfied if dim(ξ) ≤ n, and the measure is preexperimentally uniformly distributed.
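As an illustrative check (not part of the chapter), the following simulation reduces Example 2 to the random-effects special case X_{i,j} ≡ 1 with known σ^2 and Ω, where g_p and g_c for ξ_m are available in closed form: g_c carries the information from unit m's own data, and g_p the information from the other m − 1 units through the flat-prior posterior of ξ. The computed c_{ξ_m}^4 values should look uniform preexperimentally. All variable names and parameter values are ours.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
m, n = 10, 5
sigma2, omega = 1.0, 0.5          # known variances (illustrative choice)
n_rep = 50_000

# Preexperimental replications of the model with X_{i,j} = 1 and xi = 0 (wlog).
xi = rng.normal(0.0, np.sqrt(omega), (n_rep, m))     # unit effects xi_i
ybar = rng.normal(xi, np.sqrt(sigma2 / n))           # per-unit means of n obs

# g_c: information about xi_m from unit m's own data.
mean_c, var_c = ybar[:, -1], sigma2 / n

# g_p: information about xi_m from the other m-1 units; with pi(xi) = 1,
# xi | data has mean ybar_{-m} and variance (omega + sigma2/n)/(m-1), and
# xi_m | xi ~ N(xi, omega), so the two variance terms add.
v = omega + sigma2 / n
mean_p, var_p = ybar[:, :-1].mean(axis=1), omega + v / (m - 1)

c4 = stats.chi2.cdf((mean_p - mean_c) ** 2 / (var_p + var_c), df=1)
print(np.mean(c4), np.var(c4))    # approx 1/2 and 1/12 if uniform
```

The empirical moments match the uniform distribution, consistent with Theorem 6 applied to this data splitting.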

6. Concluding remarks

The assumption of fixed covariance matrices in the previous subsection is admittedly quite restrictive. In general, the presence of unknown nuisance parameters, such as parameters describing the covariance matrices in a normal model, makes the derivation of exact uniformity difficult at best and often impossible. Promising approximate results are reported in Ref. [9] for the closely related c_λ^2-measure. Further empirical studies are needed to examine to what extent the conflict measures are approximately uniformly distributed in other situations. As an informal tool used in conjunction with subject matter insight, the conflict measure approach does not require exact uniformity in order to be useful.

© 2017 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
