Node-Level Conflict Measures in Bayesian Hierarchical Models Based on Directed Acyclic Graphs

Over the last decades, Bayesian hierarchical models defined by means of directed, acyclic graphs have become an essential and widely used methodology in the analysis of complex data. Simulation-based model criticism in such models can be based on conflict measures constructed by contrasting separate local information sources about each node in the graph. An initial suggestion of such a measure was not well calibrated. This shortcoming has, however, to a large extent been rectified by subsequently proposed alternative mutually similar tail probability-based measures, which have been proved to be uniformly distributed under the assumed model under various circumstances, and in particular, in quite general normal models with known covariance matrices. An advantage of this is that computationally costly precalibration schemes needed for some other suggested methods can be avoided. Another advantage is that noninformative prior distributions can be used when performing model criticism. In this chapter, we describe the basic framework and review the main uniformity results.


1. Introduction
Over the last decades, Bayesian hierarchical models have become an essential and widely used methodology in the analysis of complex data. Computational techniques such as Markov chain Monte Carlo (MCMC) methods make it possible to treat very complex models and data structures. Analysis of such models gives intuitively appealing Bayesian inference based on posterior probability distributions for the parameters.
In the construction of such models, an understanding of the underlying structure of the problem can be represented by means of directed acyclic graphs (DAGs), with nodes in the graph corresponding to data or parameters, and directed edges between parameters representing conditional distributions. However, a perfect understanding of the underlying structure is usually an unachievable goal, and there is always a danger of constructing inadequate models. Box [1] suggests a pattern for the model building process where an initial candidate model is assessed for adequacy, and if necessary modified and elaborated on, leading to a new candidate that again is checked for adequacy, and so on. As a tool in this model criticism process, Ref. [1] suggests using the prior predictive distribution of some checking function or test statistic as a reference for the observed value of this checking function, resulting in a prior predictive p-value. This requires an informative and realistic prior distribution, which is not always available or even desirable. Indeed, as pointed out in Ref. [2], in an early phase of the model building process, it is often convenient to use noninformative or even improper priors and thus avoid costly and time-consuming elicitation of prior information. Noninformative priors may be used also for the inference because relevant prior information is unavailable.
There exist many other methods for checking the overall fit of the model or an aspect of the model of special interest, based on locating a test statistic or a discrepancy measure in some kind of a reference distribution. The posterior predictive p-value (ppp) of Ref. [3] uses the posterior distribution as reference and does not require informative priors. But this method uses data twice and can as a result be very conservative [2,[4][5][6]. Hjort et al. [5] suggest remedying this by using the ppp value as a test statistic in a prior predictive test. The computation of the resulting calibrated cppp-value is, however, very computer intensive in the general case, and again realistic, informative priors are needed. A node-level discrepancy measure suggested in Ref. [7] is subject to the same limitations. The partial posterior predictive p-value of Ref. [4] avoids double use of data and allows noninformative priors but may be difficult to compute and interpret in hierarchical models.
Comparison with other candidate models through a technique for model comparison or model choice, such as predictive methods, maximum posterior probability, Bayes factors or an information criterion, can also serve as tools for checking model adequacy indirectly when alternative candidate models exist.
In this chapter, we will, however, focus on methods for criticizing models in the absence of any particular alternatives. We will review methods for checking the modeling assumptions at each node of the DAG. The aim is to identify parts or building blocks of the model that are in discordance with reality, which may be in need of adjustment or further elaboration. O'Hagan [8] regards any node in the graph as receiving information from two disjoint subsets of the neighboring nodes. This information is represented as a conditional probability density or a likelihood or as a combination of these two kinds of information sources. Adopting the same basic perspective, our aim is to check for inconsistency between such subsets. The suggestion in Ref. [8] is to normalize these information sources to have equal height 1 and to regard the height of the curves at the point of intersection as a measure of conflict. However, as shown in Ref. [2], this measure tends to be quite conservative. Dahl et al. [9] demonstrated that it is also poorly calibrated, with false warning probabilities that vary substantially between models. Dahl et al. [9] also identified the different sources of inaccuracy and modified the measure of Ref. [8] to an approximately χ²-distributed quantity under the assumed model by instead normalizing the information sources to probability densities. In Ref. [10], these densities were instead used to define tail probability-based conflict measures. Gåsemyr and Natvig [10] showed that these measures are uniformly distributed in quite general hierarchical normal models with fixed variances/covariances. In Ref. [11], such uniformity results were proved in various situations involving nonnormal and nonsymmetric distributions. These uniformity results indicate that the measures of Refs. [9] and [10] have comparable interpretations across different models.
Therefore, they can be used without computationally costly precalibration schemes, such as the one suggested in Ref. [5]. Gåsemyr [12] focuses on some situations where the conflict measure approach can be directly compared to the calibration method of Ref. [5] and shows that the less computer-intensive conflict measure approach performs at least as well in these situations. Moreover, the conflict measure approach can be applied in models using noninformative prior distributions.
Focusing on the special problem of identifying outliers among the second-level parameters in a random-effects model, Ref. [13] defines similar conflict measures. In this setting, the group-specific means are the nodes of interest. In some models, there exist sufficient statistics for these means. Then, outlier detection at the group level can also be based on cross-validation, measuring the tail probability beyond the observed value of the statistic in the posterior predictive distribution given data from the other groups. In this context, the conflict measure approach can be viewed as an extension of cross-validation to situations where sufficient statistics do not exist. Ref. [13] also gives applications to the examination of exceptionally high hospital mortality rates and to results from a vaccination program. In Ref. [14], this methodology is used to check for inconsistency in multiple treatment comparisons of randomized clinical trials. Presanis et al. [15] apply these conflict measures in complex cases of medical evidence synthesis.
2. Directed acyclic graphs and node-specific conflict

2.1. Directed acyclic graphs and Bayesian hierarchical models

An example of a DAG discussed extensively in Ref. [8] is the random-effects model with normal random effects and normal error terms defined by

Y_{i,j} ~ N(λ_i, σ²), λ_i ~ N(μ, τ²), j = 1, …, n, i = 1, …, m, (1)

with a prior distribution π(μ) on the top-level mean. In general, we identify the nodes or vertices of the graph with the unknown parameters θ and the observed data y, the latter appearing as bottom nodes and being the realizations of the random vector Y. In the Bayesian model, the parameters, the components of θ, are also considered as random variables. In general, if there is a directed edge from node a to node b, then a is a parent of b, and b is a child of a. We denote by Ch(a) the set of child nodes of a, and by Pa(b) the set of parent nodes of b. More generally, b is a descendant of a if there is a directed path from a to b. The set of descendants of a is denoted by Desc(a) and, for convenience, is defined to contain a itself. The directed edges encode conditional independence assumptions, indicating that, given its parents, a node is assumed to be independent of all other nondescendants. Hence, writing θ = (ν, μ), with μ representing the vector of top-level nodes, the joint density of (Y, θ) = (Y, ν, μ) is

π(y, ν, μ) = [ ∏_{a ∈ (y, ν)} p(a | Pa(a)) ] π(μ), (2)

where π(μ) is the prior distribution of μ. The posterior distribution π(θ|y) is the basis for the inference.
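The factorization in Eq. (2) is easy to express in code. The following is a minimal illustrative sketch, not taken from the chapter: group sizes, variances and the flat top-level prior are all assumed. It evaluates the log joint density of the normal random-effects model by summing one term per DAG node.

```python
import math

def log_norm_pdf(x, mean, var):
    """Log density of N(mean, var) at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def log_joint(y, lam, mu, sigma2=1.0, tau2=1.0, log_prior_mu=lambda m: 0.0):
    """log pi(y, lam, mu) = log pi(mu) + sum_i log p(lam_i | mu)
    + sum_{i,j} log p(y_ij | lam_i): one factor per node, as in Eq. (2)."""
    lp = log_prior_mu(mu)                            # top-level prior pi(mu), flat here
    for i, lam_i in enumerate(lam):
        lp += log_norm_pdf(lam_i, mu, tau2)          # p(lambda_i | mu)
        for y_ij in y[i]:
            lp += log_norm_pdf(y_ij, lam_i, sigma2)  # p(y_ij | lambda_i)
    return lp

y = [[0.1, -0.2], [1.3, 0.9]]        # two groups, two observations each (made up)
print(log_joint(y, lam=[0.0, 1.0], mu=0.5))
```

Each directed edge contributes exactly one conditional density factor; this is the structure that the node-level conflict measures below exploit.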
This setup can be generalized in various directions. The nodes may be allowed to represent vectors, at both the parameter and the data levels [10]. Instead of DAGs, one may consider chain graphs, as described in Ref. [16], with undirected edges representing mutual dependence as in Markov random fields. Scheel et al. [17] introduce a graphical diagnostic for model criticism in such models.

2.2. Information contributions
The representation of a Bayesian hierarchical model in terms of a DAG is often meant to reflect an understanding of the underlying structure of the problem. By looking for a conflict associated with the different nodes in the DAG, we may therefore put our understanding of this structure to test. We may also identify parts of the model that need adjustment.
The idea put forward in Ref. [8] is that for each node λ in a DAG one may in general think of each neighboring node as providing information about λ and that it is of interest to consider the possibility of conflict between different sources of information. For instance, one may want to contrast the local prior information provided by the factor p(λ|Pa(λ)) with the likelihood information source formed by multiplying the factors p(γ|Pa(γ)) for all child nodes γ ∈ Ch(λ). The full conditional distribution of λ given all the observed and unobserved variables in the DAG, i.e.,

π(λ | (y, θ)_−λ) ∝ p(λ | Pa(λ)) ∏_{γ ∈ Ch(λ)} p(γ | Pa(γ)),

is determined by these two types of factors. Here, (y, θ)_−λ denotes the vector of all components of (y, θ) except for λ.
Dahl et al. [9] normalize the product ∏_{γ ∈ Ch(λ)} p(γ | Pa(γ)) to a probability density function denoted by f_c(λ), the likelihood or child node information contribution, whereas the local prior density is denoted by f_p(λ) and called the prior or parent node information contribution. These information contributions are integrated with respect to posterior distributions for the unknown nuisance parameters to form the integrated information contributions (iic), denoted by g_c and g_p. In this construction, a key to avoiding the conservatism of the measure suggested in Ref. [8] is to prevent dependence between the two information sources by introducing a suitable data splitting Y = (Y_p, Y_c) and conditioning the parameters of f_p on y_p and the parameters of f_c on y_c.
Definition 1. For a given parameter node λ, denote by β_p the vector whose components are Pa(λ), and by β_c the vector whose components are

Ch(λ) together with Pa(Ch(λ)) − λ. (4)

Define the densities f_p, f_c, the prior respectively likelihood information contributions, by

f_p(λ; β_p) = p(λ | Pa(λ)), f_c(λ; β_c) ∝ ∏_{γ ∈ Ch(λ)} p(γ | Pa(γ)). (5)

Define the integrated information contribution densities g_p, g_c by

g_p(λ) = ∫ f_p(λ; β_p) π(β_p | y_p) dβ_p, g_c(λ) = ∫ f_c(λ; β_c) π(β_c | y_c) dβ_c, (6)

and denote by G_p, G_c the corresponding cumulative distribution functions.
Note that β_c may contain data nodes. The second integral in Eq. (6) is then taken only with respect to the random components of β_c, i.e., the parameters in β_c. If β_c contains no parameters, then g_c and f_c coincide. Definition 1 may also be extended to the case when λ is a vector, corresponding to a subset of parameter nodes.

Combining the set of information sources linked to a specific node in different ways leads to a modification of Definition 1 in which β_c does not contain all child nodes of λ, the others being instead included in β_p together with their parent nodes. In this way, different types of conflict about the node may be revealed. This is natural, e.g., in the context of outlier detection among independent observations with a common mean. Note that β_p and β_c may then overlap, containing common coparents with λ. The setup is illustrated in Figure 1 in the case when the set of common components, by abuse of notation denoted by β_p ∩ β_c, is empty. For the general setup, Definition 1 is modified as follows.
Definition 2. Let γ be a vector whose components are a subset of Ch(λ), and define β_c as in Eq. (4). Denote by γ₁ the rest of the child nodes of λ, and let β_p consist of γ₁ together with its parent nodes in the same way as in Eq. (4), as well as Pa(λ). The information contributions are then given by

f_p(λ; β_p) ∝ p(λ | Pa(λ)) p(γ₁ | Pa(γ₁)), (7)

f_c(λ; β_c) ∝ p(γ | Pa(γ)). (8)

In Eq. (7), p(λ | Pa(λ)) is replaced by the prior density π(λ) if λ is a top-level parameter. The corresponding iic densities are defined by Eq. (6) as before.

2.3. Node-specific conflict measures
The conflict measure c²_λ of Ref. [9] is defined as

c²_λ = (E_{G_p}(λ) − E_{G_c}(λ))² / (var_{G_p}(λ) + var_{G_c}(λ)). (9)

The χ²₁-distribution is the reference distribution for this measure. For the conflict measures of Ref. [10], the uniform distribution on [0, 1] is the reference distribution. They focus on tail behavior but are based on the same iic distributions. The general distribution of information sources given in Definition 2 is also introduced in Ref. [10]. For a given pair G_p, G_c of iic distributions, let λ*_p and λ*_c be independent samples from G_p and G_c, respectively. Let G be the cumulative distribution function for

δ = λ*_p − λ*_c,

and define

c³⁺_λ = G(0), c³⁻_λ = 1 − G(0), (10)

c³_λ = 1 − 2 min(G(0), 1 − G(0)). (11)

The c³⁺_λ-measure and the P^conf_λ measure of Ref. [13] are very similar. The latter measure is aimed at detecting outlying groups or units in a three-level hierarchical model, with the second-level parameters being location parameters for group-specific data. However, that measure is interpreted as a p-value, with small values indicative of conflict. Gåsemyr and Natvig [10] also define a measure based on defining a tail area in terms of the density g of G, namely

c⁴_λ = P_g(g(δ) ≥ g(0)), (12)

applicable also when λ is a vector.
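Given samples from the two iic distributions, the tail-probability measures reduce to counting. The following Monte Carlo sketch uses two illustrative normal distributions as stand-ins for G_p and G_c; the means and variances are assumptions, not values from the chapter.

```python
import random

random.seed(1)
lam_p = [random.gauss(0.0, 1.0) for _ in range(20000)]   # samples from G_p (assumed)
lam_c = [random.gauss(2.0, 1.0) for _ in range(20000)]   # samples from G_c (assumed)

delta = [p - c for p, c in zip(lam_p, lam_c)]            # delta = lam*_p - lam*_c
G0 = sum(d <= 0 for d in delta) / len(delta)             # estimate of G(0)

c3_plus = G0                          # one-sided measure, Eq. (10)
c3 = 1 - 2 * min(G0, 1 - G0)         # two-sided measure, Eq. (11)
print(c3_plus, c3)
```

With the two iic distributions two standard deviations apart, both measures land well above their reference median, flagging a possible conflict.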
In a simulation study of the c²_λ-measure in Ref. [9] using a warning level equal to the 95% quantile of the χ²₁-distribution, a false warning probability close to 5% is obtained for a normal random-effects model with unknown variance parameters as in Eq. (1), and also in similar random-effects models with heavy-tailed t- and uniformly distributed random effects. Also with respect to detection power, this measure performs well when compared to a calibrated version of the measure given in Ref. [8], if an optimal data splitting is used. Refs. [10] and [11] prove preexperimental uniformity of the conflict measures in various situations, i.e., their distributions as functions of a Y distributed according to the assumed model are uniform, regardless of the true value of the basic parameter. Another way of stating this is that we obtain a proper p-value by subtracting these measures from 1. These results are reviewed in Section 5 of the present chapter.

2.4. Integrated information contributions as posterior distributions
In most cases, the conflict measures of Refs. [9] and [10] are based on simulated samples from G_p and G_c. Definitions 1 and 2 suggest obtaining such samples by running an MCMC algorithm to generate posterior samples of the unknown parameters in β_p and β_c and then generating samples λ*_p and λ*_c from the respective information contributions for each such parameter sample. If the information contributions are standard probability densities, this procedure is straightforward. If not, one may instead often use the fact that, under certain conditions on the data splitting, the distributions G_p and G_c are posterior distributions conditional on y_p and y_c, respectively, the latter based on the improper prior π(λ) = 1, independently of the coparents.
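In conjugate normal cases, the two-stage sampling just described can be written in a few lines. The sketch below checks the last group mean λ_m in a random-effects model with known variances; the data, group sizes and the flat prior on μ are all illustrative assumptions. λ*_p is drawn by first drawing μ* from π(μ | y_p), while λ*_c comes directly from the flat-prior posterior given the last group's data.

```python
import random, statistics

random.seed(2)
sigma2, tau2, n = 1.0, 1.0, 5           # known variances, group size (assumed)

ybar_p = [0.1, -0.3, 0.2, 0.0]          # group means from the other groups (y_p)
ybar_c = 0.4                            # mean of the n observations in group m (y_c)

# posterior of mu given y_p under a flat prior: N(mean(ybar_p), (tau2 + sigma2/n)/(m-1))
mu_mean = statistics.mean(ybar_p)
mu_var = (tau2 + sigma2 / n) / len(ybar_p)

lam_p, lam_c = [], []
for _ in range(20000):
    mu_star = random.gauss(mu_mean, mu_var ** 0.5)            # draw from pi(mu | y_p)
    lam_p.append(random.gauss(mu_star, tau2 ** 0.5))          # lam*_p ~ g_p
    lam_c.append(random.gauss(ybar_c, (sigma2 / n) ** 0.5))   # lam*_c ~ g_c (flat prior)

G0 = sum(p <= c for p, c in zip(lam_p, lam_c)) / 20000
print("estimated G(0):", G0)
```

The two-stage draw for λ*_p is exactly the integration over the nuisance parameter μ in Eq. (6), performed by simulation.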
Theorem 1. Suppose that the data splitting satisfies

Y_c = Y ∩ Desc(γ), Y_p = Y − Y_c, (13)

the latter expression by abuse of notation meaning the components of Y not present in Y_c. Assume λ and the coparents Pa(Ch(λ) ∩ β_p) − λ are independent. We then have g_p(λ) = π(λ | y_p) and, specifying as prior density

π(λ | Pa(Ch(λ) ∩ β_c) − λ) = 1, (14)

also g_c(λ) = π(λ | y_c).

The proof is given in Appendix A in the online supporting information for Ref. [11]. Specializing to the standard setup of Definition 1, where Ch(λ) ⊆ β_c, we see that the requirement for Eq. (13) to hold is that Y_c consists of all data descendant nodes of λ. In Ref. [9], this splitting was compared with two other splittings for c²_λ and found to be optimal with respect to detection power. The measure was also found to be well calibrated under this splitting.

3. Noninvariance and reparametrizations
The iic distributions and the corresponding conflict measures are parametrization dependent. Based on experience so far, the conflict measures seem to be fairly robust to changes in parametrization. However, this noninvariance can be handled in a theoretically satisfactory way under certain circumstances.
Let φ be the parameter, in a standard parametrization, corresponding to a specific node in the DAG. Suppose for simplicity that Y_c = Ch(φ). Assume that there exists a sufficient statistic Ȳ_c and an alternative parametrization λ, λ(φ) being a strictly monotonic function of φ, such that Ȳ_c − λ is a pivotal quantity, i.e., the density for Ȳ_c given λ is of the form

p(ȳ_c | λ) = f_{Ȳ_c}(ȳ_c | λ) = f_0(ȳ_c − λ) (15)

for some known density function f_0. Such a parametrization will be considered as a canonical or reference parametrization if it exists, as opposed to the standard parametrization involving φ. Accordingly, the conflict measures given in Eqs. (9)-(12) are preferably based on this reference parametrization.
By Theorem 1, samples λ*_c from G_c may be obtained by MCMC as posterior samples from π(λ | y_c) when the splitting satisfies Eq. (13) and the prior for λ satisfies Eq. (14), i.e., equals 1. According to an argument given in Section 1.3 of Ref. [18], such a prior expresses noninformativity for likelihoods of the form (15). Computationally, we may, however, use the standard parametrization. When generating φ*_c as posterior samples from π(φ | y_c), the prior density |dλ/dφ| for φ must be used. Then, we may calculate λ*_c = λ(φ*_c). To represent the iic distribution G_p(λ), we may calculate λ*_p = λ(φ*_p) for samples φ*_p from π(φ | y_p) according to the given model. Now, the c⁴_λ-measure can be estimated from Eq. (12), using a kernel density estimate of g(δ) based on corresponding samples δ* = λ*_p − λ*_c. However, if we limit attention to the c³_λ-measure (Eq. (11)) and its one-sided versions (Eq. (10)), we may use the samples from π(φ | y_c) and π(φ | y_p) directly. To see this, note that the condition λ*_p ≥ λ*_c is equivalent to the condition φ*_p ≥ φ*_c (assuming that λ is increasing as a function of φ). Hence, the probability G(0) that λ*_p − λ*_c ≤ 0 can be estimated as the proportion of sample values for which φ*_p ≤ φ*_c.
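The rank-based shortcut in the last sentence is easy to illustrate: for the c³-type measures, only the ordering of the samples matters, so posterior samples of φ can be compared directly even though the measure is defined on the reference scale λ = log(φ). The two gamma "posteriors" below are purely illustrative stand-ins for π(φ | y_p) and π(φ | y_c).

```python
import math, random

random.seed(3)
phi_p = [random.gammavariate(5.0, 1.0) for _ in range(20000)]  # stand-in for pi(phi | y_p)
phi_c = [random.gammavariate(8.0, 1.0) for _ in range(20000)]  # stand-in for pi(phi | y_c)

# G(0) = P(lam*_p <= lam*_c) = P(phi*_p <= phi*_c), since log is increasing
G0_phi = sum(p <= c for p, c in zip(phi_p, phi_c)) / 20000
G0_lam = sum(math.log(p) <= math.log(c) for p, c in zip(phi_p, phi_c)) / 20000
print(G0_phi, G0_lam)   # agree, by monotonicity of the log transform
```

No kernel density estimation or explicit change of variables is needed for c³ and its one-sided versions; only c⁴ requires working with the density g on the reference scale.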
4. Extensions to deterministic nodes: Relation to cross-validation, prediction and hypothesis testing

4.1. Cross-validation and data node conflict

The model variables Y are represented by the bottom nodes in the DAG describing the hierarchical model. The framework can be extended to also cover conflict concerning these nodes. In this way, cross-validation can be viewed as a special case of the conflict measure approach.
Let Y_c be an element in the vector Y of observable random variables. We define the prior iic density g_p(y_c) exactly as in Eq. (6), with λ replaced by y_c. The Dirac measure at the observed value y_c represents a degenerate iic information contribution about Y_c. This leads to the following definitions:

c³⁺_{y_c} = G_p(y_c), c³⁻_{y_c} = 1 − G_p(y_c), (16)

c³_{y_c} = 1 − 2 min(G_p(y_c), 1 − G_p(y_c)), (17)

c⁴_{y_c} = P_{g_p}(g_p(Y_c) ≥ g_p(y_c)). (18)

The measures (16)-(18) are called data node conflict measures. To see that these definitions are consistent with Eqs. (10)-(12), note that λ*_p corresponds to Y_c, and λ*_c is deterministic and corresponds to y_c. We define X = Y_c − y_c, corresponding to δ. We then have g(x) = g_p(x + y_c). Hence, G(x) = G_p(x + y_c), and accordingly, G(0) = G_p(y_c). It follows that Eqs. (16) and (17) are special cases of Eqs. (10) and (11). Moreover,

P_g(g(X) ≥ g(0)) = P_{g_p}(g_p(Y_c) ≥ g_p(y_c)),

showing that Eq. (18) is a special case of Eq. (12).
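For a data node, G_p can be represented by forward sampling through the model given y_p. The sketch below assumes a normal posterior for the parent parameter and a normal observation model; the observed value and all distributional constants are illustrative.

```python
import random

random.seed(4)
yc_obs = 2.5                                 # observed value y_c (illustrative)
samples = []
for _ in range(20000):
    lam = random.gauss(0.0, 0.5)             # draw lambda from pi(lambda | y_p), assumed
    samples.append(random.gauss(lam, 1.0))   # draw Y_c | lambda; marginally Y_c ~ g_p

Gp = sum(s <= yc_obs for s in samples) / len(samples)   # estimate of G_p(y_c)
c3_plus = Gp                                            # Eq. (16)
c3 = 1 - 2 * min(Gp, 1 - Gp)                            # Eq. (17)
print(c3_plus, c3)
```

When y_p consists of the other observations, this is exactly a cross-validatory tail-probability check of the held-out value y_c.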
Furthermore, this correspondence between the data node conflict measures (Eqs. (16) and (17)) and the parameter node conflict measures (Eqs. (10) and (11)) can be used to motivate these latter measures. We will treat the c³⁺-measure as an example. Consider again a parameter node λ. If λ were actually observable and known to take the value λ_c, the data node version of the c³⁺-measure could be used to measure deviations toward the right tail of G_p as G_p(λ_c). Now λ is in reality not known, but we can take the expectation of this conflict with respect to the distribution G_c, which reflects the uncertainty about λ when influence from data y_p is removed. The result is the following theorem.

Theorem 2. c³⁺_λ = E_{G_c}(G_p(λ*_c)).

Proof: E_{G_c}(G_p(λ*_c)) = P(λ*_p ≤ λ*_c) = G(0) = c³⁺_λ by Eq. (10).

4.2. Cross-validation and sufficient statistics
Suppose the node λ of interest is the parent of the subvector Y_c of Y. Suppose also that Ȳ_c is a sufficient statistic for λ based on Y_c. Evidently then, the measures c³⁺_λ and c³⁺_{Ȳ_c} address the same kind of possible conflict in the model. The following theorem, proved in Ref. [11], states that the two measures agree under certain conditions. It generalizes a result in Ref. [13], which also unnecessarily assumed symmetry for the conditional density of Ȳ_c.
Theorem 3. Suppose the conditional density for the scalar variable Ȳ_c given the parameter λ is of the form f_{Ȳ_c}(y | λ) = f_{c,0}(y − λ). Then,

c³⁺_λ = c³⁺_{Ȳ_c}.

When a sufficient statistic exists, the cross-validatory p-value is considered by Ref. [13] as the gold standard, and the aim of their construction is to provide a measure that is generally applicable and matches cross-validation when a sufficient statistic exists.

4.3. Prediction
As mentioned in Section 2, the c⁴ measure can be used to assess conflict concerning vectors of nodes. Applying this at the data node level, we may assess the quality of predictions of a subvector Y_c of Y based on a complementary subvector y_p of observations. The relevant measure is given by Eq. (18), with the scalar Y_c replaced by the vector Y_c. This is particularly well suited to models where data accumulate as time evolves. Such a conflict measure can be used to assess the overall quality of the model. It can also be used as a tool for model comparison and model choice.

4.4. Hypothesis testing
Suppose the top-level nodes μ appearing in Eq. (2) are assumed fixed and known according to the model, so that π(μ) is a Dirac measure at these fixed values of the components of μ. Hence, the DAG has deterministic nodes both at the top and at the bottom, namely the vectors μ and y, respectively. We may then check for a conflict concerning a component λ of μ by introducing a random version λ̃ of λ and contrasting the corresponding g_c(λ̃) with the fixed value λ. The random λ̃ has the same children and coparents as λ, and the vector β_c, the information contribution f_c(λ̃; β_c) and the iic density g_c are defined as in Eqs. (4), (5) and (6). The respective conflict measures are defined as in Eqs. (16)-(18) with y_c replaced by λ and G_p and g_p replaced by G_c and g_c. If the model is rejected when the conflict exceeds a certain predefined warning level, this corresponds to a formal Bayesian test of the hypothesis λ̃ = λ. Using the conflict measure (18), we may put the whole vector μ to test in this way.

5. Preexperimental uniformity of the conflict measures
In this section, we review some results concerning the distribution of the conflict measures. If c is one of the measures (Eqs. (10), (11), (12), (16), (17) or (18)), then preexperimentally, i.e., prior to observing the data y, c is a random variable taking a value in [0, 1]. A large value of c indicates a possible conflict in the model, and uniformity of c corresponds to 1 − c being a proper p-value. This does not mean that we propose a formal hypothesis testing procedure for model criticism, possibly even adjusted for multiple testing, nor that we think that a fixed significance level represents an appropriate criterion signaling the need for changing the model. A relatively large value of c may be accepted if there are convincing arguments for believing in a particular modeling aspect, while a less extreme value of c may indicate a need for adjustments in modeling aspects that are considered questionable for other reasons. But the terms "relatively large" and "less extreme" must refer to a meaningful common scale. In our view, uniformity of the conflict measure under all sources of uncertainty is the natural ideal criterion for being a well-calibrated conflict measure, the fulfillment of which ensures comparable assessment of the level of conflict across models. This means that we aim for preexperimental uniformity in cases where the prior distribution is highly noninformative, and also, as discussed in the following subsection, in cases where an informative prior represents part of the randomness in the data-generating process (aleatory uncertainty) rather than subjective (epistemic) uncertainty about the location of a fixed but unknown λ. In this chapter, we limit attention to situations where exact uniformity is achieved. The pivotality condition (Eq. (15)) turns out to be a key assumption needed to obtain such exact results. Refs. [10] and [12] provide some examples where exact uniformity is achieved in other cases.

5.1. Data-prior conflict
Consider the model in which λ is generated from an arbitrary informative prior distribution F_λ and Y is generated from the conditional distribution F_Y(· | λ). Here, we think of this prior distribution as representing aleatory rather than epistemic uncertainty. The corresponding densities are denoted by f_Y and f_λ. If contrasting the prior density with the likelihood f_Y(y | λ) indicates a conflict between the prior and likelihood information contributions, we consider this a data-prior conflict. The following theorem, proved in Ref. [11], deals with this kind of conflict. Note that in this situation, the Y_p part of the data splitting is empty.
Theorem 4. Suppose the conditional density for the scalar variable Y given the parameter λ is of the form f_Y(y | λ) = f_0(y − λ) and that λ is generated from an arbitrary informative prior density f_λ(λ). Then, the data-prior conflict measures about λ are preexperimentally uniformly distributed for both the c³_λ- and c⁴_λ-measures.
The theorem obviously applies to the location parameter of normal and t-distributions with fixed variance parameters, as well as the location parameter in the skew normal distribution [19]. If the vector Y consists of IID normal variables, the theorem also applies to the location parameter, using as scalar variable the sufficient statistic Ȳ. If the n components of Y are IID exponentially distributed with failure rate λ, their sum is a sufficient statistic that is gamma distributed with shape parameter n and scale parameter λ. We may then use the fact that for a variable Y which is gamma distributed with known shape parameter and unknown scale parameter λ, the quantity log(Y) − log(λ) is a pivotal statistic, and uniformity is obtained by combining Theorem 4 with the approach of Section 3. In the standard parametrization, the appropriate prior distribution is π(λ) = 1/λ. Details are given in Ref. [11], which also deals with the gamma, inverse gamma, Weibull and lognormal distributions in a similar way.
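Theorem 4 is easy to check by simulation. The sketch below assumes a unit-rate exponential prior (an arbitrary informative choice, not from the chapter) and a unit-variance normal likelihood; the replicated c³ values should then look uniform on [0, 1].

```python
import random

random.seed(5)
M, N = 400, 2000
conflicts = []
for _ in range(M):
    lam_true = random.expovariate(1.0)                   # lambda drawn from the prior
    y = random.gauss(lam_true, 1.0)                      # one observation Y | lambda
    lam_p = [random.expovariate(1.0) for _ in range(N)]  # iic samples: the prior itself
    lam_c = [random.gauss(y, 1.0) for _ in range(N)]     # iic samples: flat-prior posterior N(y, 1)
    G0 = sum(p <= c for p, c in zip(lam_p, lam_c)) / N
    conflicts.append(1 - 2 * min(G0, 1 - G0))            # c3 for this replication

mean_c = sum(conflicts) / M
frac_below_half = sum(c < 0.5 for c in conflicts) / M
print(mean_c, frac_below_half)   # both should be near 0.5 under uniformity
```

The same experiment with a skewed or heavy-tailed prior should give the same uniform behavior, since Theorem 4 places no restriction on f_λ.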

5.2. Data-data conflict
Suppose all components of Y have distributions determined by the same parameter λ.
Suppose we want to contrast information contributions from separate parts of Y about λ and define the splitting (Y_p, Y_c) accordingly. Focusing on this kind of possible conflict, we assume complete prior ignorance about λ and accordingly give λ the improper prior π(λ) = 1. Hence, recalling Eqs. (7) and (8), we contrast the information in f_c(λ; Y_c) with that in f_p(λ; Y_p). We use the term data-data conflict in this context, since there is no prior information incorporated in f_p, and the two information contributions play symmetric roles. However, as a particular application, one may think of Y_c as a scalar variable representing a possible outlier.
The following theorem is proved in Ref. [11].
Theorem 5. Suppose that the conditional densities for the scalar variables Y_p and Y_c given the parameter λ are of the form f_{Y_p}(y | λ) = f_{p,0}(y − λ) and f_{Y_c}(y | λ) = f_{c,0}(y − λ), and assume λ has the improper prior π(λ) = 1. Then, the data-data conflict measures about λ are preexperimentally uniformly distributed for both the c³_λ- and c⁴_λ-measures.
Theorem 5 can be applied if the components of Y_c and Y_p are normally or lognormally distributed with known variance parameter, exponentially distributed, or gamma, inverse gamma or Weibull distributed with known shape parameter, since pivotal quantities based on sufficient statistics exist for these distributions.
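In the normal case, the data-data measure is available in closed form, which makes the uniformity claim transparent: with a flat prior, g_p = N(y_p, v_p) and g_c = N(y_c, v_c), so c³⁺ = G(0) = Φ((y_c − y_p)/√(v_p + v_c)), and preexperimentally the argument of Φ is standard normal. A quick check, with illustrative variances:

```python
import math, random

random.seed(6)
Phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))   # standard normal cdf
v_p, v_c = 1.0, 2.0                                      # known variances (assumed)

c3_plus = []
for _ in range(5000):
    lam = random.gauss(10.0, 5.0)        # arbitrary "true" lambda for each replication
    y_p = random.gauss(lam, v_p ** 0.5)
    y_c = random.gauss(lam, v_c ** 0.5)
    c3_plus.append(Phi((y_c - y_p) / math.sqrt(v_p + v_c)))   # G(0) in closed form

frac_q = sum(c < 0.25 for c in c3_plus) / len(c3_plus)
print(frac_q)   # near 0.25 if c3+ is uniform
```

Note that λ is redrawn from an arbitrary distribution in each replication: with the improper prior, uniformity holds regardless of the true value of λ.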

5.3. Normal hierarchical models with fixed covariance matrices
Allowing each y and ν appearing in Eq. (2) to be interpreted as a vector of nodes, we now assume that each conditional distribution in the decomposition (2) is multinormal with fixed and known covariance matrices. The random-effects model (1) is a simple example of this. We also assume that the top-level parameter vector μ has the improper prior 1 and that each linear mapping Pa(ν) → E(ν | Pa(ν)) has full rank. Now let λ be any node in the model description. It is standard to verify that, regardless of how the vector of neighboring and coparent nodes β is decomposed into β_p, containing Pa(λ), and β_c, the densities f_p(λ; β_p) and f_c(λ; β_c) of Eqs. (5) and (8) are multinormal with fixed covariance matrices. Furthermore, this is true also for the iic densities g_p and g_c of Eq. (6), regardless of the data splitting. It follows that the density g of the difference δ between independent samples from g_p and g_c is multinormal with expectation E_G(δ) = E_{G_p}(λ) − E_{G_c}(λ) and covariance matrix cov_G(δ) = cov_{G_p}(λ) + cov_{G_c}(λ). It follows that (δ − E_G(δ))ᵀ cov_G(δ)⁻¹ (δ − E_G(δ)) is χ²-distributed with n = dim(λ) degrees of freedom, and the probability under G that g(δ) ≥ g(0) is easily seen to be Ψ_n(E_G(δ)ᵀ cov_G(δ)⁻¹ E_G(δ)), where Ψ_n is the cumulative distribution function for the χ²_n-distribution. The preexperimental uniformity of this quantity is proved in Ref. [10].
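For instance, with n = dim(λ) = 2, Ψ_n reduces to the exponential cdf, Ψ₂(x) = 1 − e^{−x/2}, so c⁴ is a one-line computation from the two iic moments. The expectation vector and (diagonal) covariance matrix below are illustrative stand-ins for E_G(δ) and cov_G(δ).

```python
import math

E = [1.0, -0.5]                      # stand-in for E_Gp(lambda) - E_Gc(lambda)
Sigma = [[0.5, 0.0], [0.0, 0.8]]     # stand-in for cov_Gp(lambda) + cov_Gc(lambda)

# E^T Sigma^{-1} E; Sigma is diagonal here, so the inverse is elementwise
m = E[0] ** 2 / Sigma[0][0] + E[1] ** 2 / Sigma[1][1]
c4 = 1 - math.exp(-m / 2)            # Psi_2(m): chi-square cdf with 2 degrees of freedom
print(c4)
```

For general n, any implementation of the regularized lower incomplete gamma function gives Ψ_n; only the quadratic form in the iic means and covariances is model specific.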

Theorem 6. Consider a hierarchical normal model as described above.

i. Let λ be an arbitrary scalar or vector parameter node. If the data splitting satisfies Eq. (13), then c⁴_λ is uniformly distributed preexperimentally.

ii. Suppose the data splitting (Y_p, Y_c) satisfies Ch(Pa(Y_c)) = Y_c. Then, c⁴_{Y_c} is uniformly distributed preexperimentally.
If λ in (i) or Y_c in (ii) is one dimensional, then G is symmetric and unimodal, and therefore, the respective c³-measures are defined and coincide with the c⁴-measures. Gåsemyr and Natvig [10] also show that in that case the c³⁺- and c³⁻-measures are uniformly distributed preexperimentally.
Example 2. Consider the following DAG model, a regression model with randomly varying regression coefficients:

Y_{i,j} ~ N(Xᵀ_{i,j} ξ_i, σ²), ξ_i ~ N(ξ, Ω), j = 1, …, n, i = 1, …, m, π(ξ) ∝ 1.

The m units could be groups of individuals, with y_{i,j} the measurement for a group member with individual covariate vector X_{i,j}, or individuals with the successive y_{i,j} representing repeated measurements over time. In this model, we could check for a possible exceptional behavior of the mth unit by means of the conflict measure c⁴_{ξ_m}. With a data splitting for which Y_c = Y_m = (Y_{m,1}, …, Y_{m,n}), the conditions for Theorem 6, part (i), are satisfied if dim(ξ) ≤ n, and the measure is preexperimentally uniformly distributed.

6. Concluding remarks
The assumption of fixed covariance matrices in the previous subsection is admittedly quite restrictive. In general, the presence of unknown nuisance parameters, such as parameters describing the covariance matrices in a normal model, makes the derivation of exact uniformity at least difficult and often impossible. Promising approximate results are reported in Ref. [9] for the closely related c²_λ-measure. Further empirical studies are needed in order to examine to what extent the conflict measures are approximately uniformly distributed in other situations. As an informal tool to be used in conjunction with subject matter insight, the conflict measure approach does not require exact uniformity in order to be useful.