Open access peer-reviewed chapter

Two Examples of Bayesian Evidence Synthesis with the Hierarchical Meta-Regression Approach

By Pablo Emilio Verde

Submitted: January 10th 2017; Reviewed: June 28th 2017; Published: November 2nd 2017

DOI: 10.5772/intechopen.70231



This is the Information Age. We can expect that for a particular research question that is empirically testable, we should have a collection of evidence which indicates the best way to proceed. Unfortunately, this is not the case in several areas of empirical research and decision making. Instead, when researchers and policy makers ask a specific question, such as “What is the effectiveness of a new treatment?”, the structure of the evidence available to answer this question may be complex and fragmented (e.g. published experiments may have different grades of quality, observational data, subjective judgments, etc.).


  • Bayesian hierarchical models
  • meta-analysis
  • multi-parameters evidence synthesis
  • conflict of evidence
  • randomized control trials
  • retrospective studies

1. Introduction

In today’s information age one can expect that the digital revolution can create a knowledge-based society surrounded by global communications that influence our world in an efficient and convenient way. It is recognized that never in human history have we accumulated such an astronomical amount of data, and we keep on generating data at an alarming rate. A new term, “big data,” was coined to indicate the existence of “oceans of data” from which we may expect to extract useful information for any problem of interest.

In this technological society, one could expect that for a particular research question we should have a collection of high quality evidence which indicates the best way to proceed. Paradoxically, this is not the case in several areas of empirical research and decision making. Instead, when researchers and policy makers ask a specific and important question, such as “What is the effectiveness of a new treatment?”, the structure of the evidence available to answer this question may be complex and fragmented (e.g., published experiments of varying quality, observational data, subjective judgments, etc.). The way researchers interpret this multiplicity of evidence will be the basis for their understanding of reality and will determine their future decisions.

Bayesian meta-analysis, which has its roots in the work of Eddy et al. [1], is a branch of statistical techniques for interpreting and displaying results of different sources of evidence, exploring the effects of biases and assessing the propagation of uncertainty into a coherent statistical model. A gentle introduction to this area can be found in Chap. 8 of Spiegelhalter et al. [2] and a recent review in Verde and Ohmann [3].

In this chapter we present a new method for meta-analysis that we have called “Hierarchical Meta-Regression” (HMR). The aim of HMR is to have an integrated approach for bias modeling when disparate pieces of evidence are combined in meta-analysis, for instance randomized and non-randomized studies or studies of different quality. This is a different application of Bayesian inference from those with which we may be familiar, for instance an intricate regression model, where the available data bear directly upon the question of interest.

We are going to discuss two recent meta-analyses in clinical research. The reason for highlighting these two cases is that they illustrate a main problem in evidence synthesis, which is the presence of a multiplicity of bias in systematic reviews.

1.1. An example of meta-analysis of therapeutic trials

The first example is a meta-analysis of 31 randomized controlled trials (RCTs) comparing two groups of heart disease patients, where the treatment group received bone marrow stem cells and the control group a placebo treatment (Nowbar et al. [4]). The data of this meta-analysis appear in the Appendix, see Table 1. Figure 1 presents the forest plot of these 31 trials, where the treatment effect is measured as the difference in ejection fraction between groups, which measures the improvement of left ventricular function in the heart.

Figure 1.

Meta-analysis results of studies applying treatments based on bone marrow stem cells to improve the left ventricular function.

At the bottom of Figure 1 we see average summaries represented by two diamonds. The first one corresponds to the fixed effect meta-analysis model, which is based on the assumption that studies are identical and the between-study variability is zero. The wider diamond represents the results of a random effects meta-analysis model, which assumes substantial heterogeneity between studies. In this meta-analysis both models indicated a positive treatment effect: a mean difference of 3.95 with 95% CI [3.43; 4.47] for the fixed effect model and of 2.92 with 95% CI [1.47; 4.36] for the random effects model.
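The two pooled summaries can be illustrated with a short Python sketch of inverse-variance pooling. It is an illustration under stated assumptions, not the computation behind Figure 1: the data are the 12 trials listed in the Table 1 excerpt (as read from that table), so the output will not reproduce the values reported for all 31 trials, and the DerSimonian-Laird estimator of the between-study variance is an assumed choice, since the text does not say which estimator was used.

```python
import math

# Effect sizes and standard errors as read from the 12 trials listed in Table 1.
effects = [1.1, -1.7, 7.0, 14.0, 5.4, 2.5, -0.2, 4.0, -0.2, 5.4, -3.9, 10.4]
std_errors = [2.09, 2.91, 0.63, 4.05, 2.44, 3.96, 1.17, 1.28, 1.54, 2.54, 2.62, 1.01]


def pool(effects, std_errors, tau2=0.0):
    """Inverse-variance weighted mean and its standard error.

    tau2 = 0 gives the fixed effect model; tau2 > 0 adds between-study variance.
    """
    weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    total = sum(weights)
    mean = sum(w * y for w, y in zip(weights, effects)) / total
    return mean, math.sqrt(1.0 / total)


def dersimonian_laird_tau2(effects, std_errors):
    """Method-of-moments estimate of the between-study variance (assumed estimator)."""
    weights = [1.0 / se ** 2 for se in std_errors]
    fixed_mean, _ = pool(effects, std_errors)
    q = sum(w * (y - fixed_mean) ** 2 for w, y in zip(weights, effects))
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    return max(0.0, (q - (len(effects) - 1)) / c)


tau2 = dersimonian_laird_tau2(effects, std_errors)
fixed_mean, fixed_se = pool(effects, std_errors)
random_mean, random_se = pool(effects, std_errors, tau2)

for label, mean, se in [("fixed effect", fixed_mean, fixed_se),
                        ("random effects", random_mean, random_se)]:
    print(f"{label}: {mean:.2f}  95% CI [{mean - 1.96 * se:.2f}; {mean + 1.96 * se:.2f}]")
```

Because the random effects weights add the same between-study variance to every trial, precise trials dominate less and the interval widens, which is why the second diamond in a forest plot is the wider one.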

Could we conclude that we have enough evidence to demonstrate the efficacy of the treatment? Unfortunately, these apparently confirming results are completely misleading. The problem is that these 31 studies are very heterogeneous, which results in a wide 95% prediction interval [−4.33; 10.16] that covers no treatment effect, and in the large amount of contradictory evidence displayed in Figure 1.

In order to explain the sources of heterogeneity in this area, Nowbar et al. [4] investigated whether detected discrepancies in published trials might account for the variation in reported effect sizes. They define a discrepancy in a trial as two or more reported facts that cannot both be true because they are logically or mathematically incompatible. In other words, the term discrepancy is a polite way to indicate that a published study suffers from poor reporting, could be implausible or has had its results manipulated. For example, as we see at the bottom of Table 1 in the Appendix, it would be difficult to believe the results of a study with 55 discrepancies. In Section 2 we present an HMR model to analyze a possible link between the risk of biased results and the number of discrepancies.

1.2. An example of meta-analysis of diagnostic trials

The topic of Section 3 is the meta-analysis of diagnostic trials. These trials play a central role in personalized medicine, policy making, healthcare and health economics. Figure 2 presents our example in this area. The scatter plot shows the diagnostic summaries of a meta-analysis investigating the diagnostic accuracy of computer tomography scans in the diagnosis of appendicitis [5]. Each circle identifies the true positive rate vs. the false positive rate of a study, and the size of the circle indicates the study’s sample size. One characteristic of this meta-analysis is the combination of disparate data: of the 51 studies, 22 were retrospective and 29 were prospective, which is indicated by the different grey scales of the circles.

Figure 2.

Display of the meta-analysis results of studies performing computer tomography scans in the diagnosis of appendicitis. Each circle identifies the true positive rate vs. the false positive rate of each study. Different colors are used for different study designs and different diameters for sample sizes.

The main problem in this area is the multiple sources of variability behind those diagnostic results. Diagnostic studies are usually performed under different diagnostic setups and patient populations. For a particular diagnostic technique we may have a small number of studies which may differ in their statistical design, their quality, etc. Therefore, the main question in meta-analysis of diagnostic tests is: How can we combine the multiplicity of diagnostic accuracy rates in a single coherent model? A possible answer to this question is the HMR presented in Section 3. This model was introduced by Verde [5] and is available in the R package bamdit [6].


2. A Hierarchical Meta-Regression model to assess reported bias

Figure 3 shows the reported effect sizes and the 95% confidence intervals of the 31 trials from [4] against the number of discrepancies (on a logarithmic scale). The authors reported a statistically significant positive correlation between the effect size and the number of discrepancies detected in the papers. However, a direct correlation analysis of aggregated results is threatened by ecological bias and may lead to misleading conclusions. The variability shown by the 95% confidence intervals is too large to accept a positive correlation at face value. In this section we present an HMR model to link the risk of reporting bias with the number of reported discrepancies. This model assumes that the connection between discrepancies and effect size could be much more subtle.

Figure 3.

Relationship between effect size and number of discrepancies. The vertical axis corresponds to the effect size, where the treatment group received a treatment based on bone marrow stem cells and the control group a placebo treatment. The horizontal axis corresponds to the number of discrepancies (on the logarithmic scale) found in the publication.

The starting point of any meta-analytic model is the description of a model for the pieces of evidence at face value. In statistical terms, this means the likelihood of the parameter of interest. Let y1, …, yN and SE1, …, SEN be the reported effect sizes and their corresponding standard errors. We assume a normal likelihood for θi, the treatment effect of study i:


If a prior assumption of exchangeability were considered reasonable, a random effects Bayesian model would incorporate all the studies into a single model, where θ1, …, θN are assumed to be a random sample from a prior distribution with unknown parameters. This is known as a hierarchical model.

In this section we assume that exchangeability is unrealistic and we wish to learn how the unobserved treatment effects θ1, …, θN are linked with some observed covariate xi.

Let xi be the number of observed discrepancies on the logarithmic scale. We propose to model the association between the treatment effect θi and the observed discrepancies xi with the following HMR model:


where the non-observable variable Ii indicates whether study i is at risk of bias:


The parameter μ corresponds to the mean treatment effect of studies with low risk of bias. We assume that in our context of application biased studies could report higher effect sizes, so the biased mean μbiased can be expressed as:


In this way, K measures the average amount of bias with respect to the mean effect μ. Eq. (4) also ensures that μ and μbiased are identifiable parameters in this model. The parameter τ measures the between-studies variability in both components of the mixture distribution.

We model the probability that a study is biased as a function of xi as follows:


In Eq. (5) positive values of α1 indicate that an increase in the number of discrepancies is associated with an increased risk of study bias.

In this HMR model the conditional mean is given by


Eqs. (5) and (6) can be calculated as functional parameters over a grid of values of x. Their posterior intervals are calculated at each value of x.
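A minimal Python sketch of this grid calculation follows. It rests on two assumptions: that Eq. (5) is a logistic regression of the bias probability on x, and that Eq. (6) is the mixture mean, which reduces to μ + K Pr(I = 1|x). The "posterior draws" are simulated stand-ins with hypothetical values, not output from the chapter's OpenBUGS run.

```python
import math
import random

random.seed(1)


def inv_logit(v):
    return 1.0 / (1.0 + math.exp(-v))


def quantile(sorted_values, p):
    """Linear-interpolation quantile of an already sorted list."""
    pos = p * (len(sorted_values) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


# Hypothetical stand-ins for MCMC draws of (alpha0, alpha1, mu, K).
draws = [(random.gauss(-2.0, 1.0), random.gauss(1.0, 0.5),
          random.gauss(0.0, 1.0), random.uniform(2.0, 8.0))
         for _ in range(2000)]


def summarize(x):
    """2.5%, 50% and 97.5% posterior quantiles of both functional parameters at x."""
    probs, means = [], []
    for alpha0, alpha1, mu, k in draws:
        p = inv_logit(alpha0 + alpha1 * x)        # assumed form of Eq. (5)
        probs.append(p)
        means.append(p * (mu + k) + (1 - p) * mu)  # assumed form of Eq. (6)
    probs.sort()
    means.sort()
    levels = (0.025, 0.5, 0.975)
    return ([quantile(probs, q) for q in levels],
            [quantile(means, q) for q in levels])


grid = [0.0, 1.0, 2.0, 3.0, 4.0]  # log number of discrepancies
for x in grid:
    prob_q, mean_q = summarize(x)
    print(x, [round(v, 2) for v in prob_q], [round(v, 2) for v in mean_q])
```

The key point is that the functional parameter is evaluated inside the loop over draws, so the interval at each x reflects the joint uncertainty in all parameters.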

This HMR not only quantifies the average bias K and the relationship between bias and discrepancies in Eq. (5), but also allows us to correct the treatment effect θi by its propensity of being biased:


where the amount (θi − K) measures the bias of study i and Pr(Ii = 1|xi) its propensity of being biased.
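As a small numerical illustration, the correction can be sketched in Python as a propensity-weighted average of the shifted effect θi − K and the unshifted effect θi. This form is an assumption reconstructed from the sentence above, and the input values are hypothetical.

```python
def corrected_effect(theta, k, prob_biased):
    """Propensity-weighted bias correction (assumed form of Eq. (7)).

    Algebraically this equals theta - k * prob_biased.
    """
    return (theta - k) * prob_biased + theta * (1.0 - prob_biased)


# Hypothetical values: reported-scale effect 5.4, average bias 4.0, bias probability 0.8.
print(round(corrected_effect(5.4, 4.0, 0.8), 2))
```

In a full Bayesian analysis the same expression would be evaluated at every MCMC draw of θi, K and the regression parameters, which gives a posterior distribution for the corrected effect.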

The HMR model presented above is completed by the following vague hyper-priors. For the regression parameters we use α0, α1 ∼ N(0, 100); for the mean, μ ∼ N(0, 100); and for the bias parameter, K ∼ Uniform(0, 50). Finally, for the variability between studies we use τ ∼ Uniform(0, 100), which represents a vague prior within the range of possible study deviations.
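For reference, the model described in this section can be written compactly in LaTeX as below. This is a reconstruction from the verbal definitions in the text, so the exact notation is an assumption; the numbering follows the equation references used in this section, and the block assumes the amsmath package.

```latex
\begin{align}
y_i \mid \theta_i &\sim N(\theta_i, SE_i^2), \tag{1}\\
\theta_i \mid I_i, x_i &\sim I_i\, N(\mu_{\text{biased}}, \tau^2)
    + (1 - I_i)\, N(\mu, \tau^2), \tag{2}\\
I_i &= \begin{cases} 1 & \text{if study } i \text{ is biased},\\
                     0 & \text{otherwise}, \end{cases} \tag{3}\\
\mu_{\text{biased}} &= \mu + K, \qquad K > 0, \tag{4}\\
\operatorname{logit} \Pr(I_i = 1 \mid x_i) &= \alpha_0 + \alpha_1 x_i, \tag{5}\\
E(\theta_i \mid x_i) &= \Pr(I_i = 1 \mid x_i)\, \mu_{\text{biased}}
    + \{1 - \Pr(I_i = 1 \mid x_i)\}\, \mu, \tag{6}\\
\theta_i^{\text{corrected}} &= (\theta_i - K) \Pr(I_i = 1 \mid x_i)
    + \theta_i \{1 - \Pr(I_i = 1 \mid x_i)\}. \tag{7}
\end{align}
```

Written this way, the constraint K > 0 in (4) makes visible why the two mixture components cannot swap labels, which is the identifiability point made in the text.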

The model presented in this section is not analytically tractable. We approximated the posterior distributions of the model parameters with Markov Chain Monte Carlo (MCMC) techniques implemented in OpenBUGS.

BUGS stands for Bayesian inference Using Gibbs Sampling. The OpenBUGS software constructs a Directed Acyclic Graph (DAG) representation of the posterior distribution of all the model’s parameters. This representation allows the model to be factorized automatically into the distribution of each node (parameter or data) conditional on its parents and children. The software scans each node and proposes a sampling method for it. The kernel of the Gibbs sampler is built upon this algorithm.

Computations were performed with the statistical language R, and the MCMC computations were linked to R with the package R2OpenBUGS. We used two chains of 20,000 iterations and discarded the first 5000 as the burn-in period. Convergence was assessed visually by using the R package coda.
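A numerical check can complement a visual assessment. The Python sketch below computes the Gelman-Rubin potential scale reduction factor for two chains after discarding the burn-in. This is a swapped-in technique, not the check the chapter used, and the chains are simulated stand-ins (independent normal draws), not OpenBUGS output.

```python
import random
import statistics


def gelman_rubin(chains):
    """Potential scale reduction factor for equally long chains of one parameter."""
    n = len(chains[0])
    chain_means = [statistics.fmean(chain) for chain in chains]
    within = statistics.fmean([statistics.variance(chain) for chain in chains])
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


def simulate_chain(rng, center, iterations=20000):
    """Hypothetical stand-in for one MCMC chain: independent normal draws."""
    return [rng.gauss(center, 1.0) for _ in range(iterations)]


rng = random.Random(2017)
burn_in = 5000

# Two chains exploring the same distribution, and two chains stuck in different regions.
mixed = [simulate_chain(rng, 0.0)[burn_in:] for _ in range(2)]
stuck = [simulate_chain(rng, 0.0)[burn_in:], simulate_chain(rng, 3.0)[burn_in:]]

rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
print(f"well-mixed chains: {rhat_mixed:.3f}; chains that disagree: {rhat_stuck:.3f}")
```

Values close to 1 indicate that the between-chain and within-chain variability agree; values well above 1 suggest that the chains have not converged to the same distribution.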

The diagonal panels of Figure 4 summarize the resulting posterior distributions for μ, K, τ, α0 and α1. The posterior of μ clearly covers zero, indicating that the stem cell treatment is not effective. The bias parameter K indicates a considerable over-estimation of the treatment effects reported for some trials. The posterior of α1 is concentrated on positive values, which indicates that an increase in discrepancies is associated with an increase in the risk of reporting bias. The posteriors of α0 and α1 also present a large variability, which is expected when a hidden effect is modeled.

Figure 4.

Posterior distributions for the hyper-parameters of the HMR model. The diagonal displays the posterior distributions, the upper panels the pairwise correlations and the lower panels the pairwise posterior densities.

Further results of the Hierarchical Meta-Regression model appear in Figure 5, where posterior 95% intervals are plotted against the number of discrepancies. On the left panel, we can see the relationship between the number of discrepancies and the probability that a study is biased. We can observe an increase in this probability with an increase in the number of discrepancies, but also a large amount of variability. The right panel shows the conditional mean of the effect size as a function of the number of discrepancies, which corresponds to Eq. (6). Our analysis shows that the 95% posterior intervals of the conditional mean cover the zero effect over most of the range of discrepancies. Only for studies with more than 33 (exp(3.5)) discrepancies does the model predict a positive effect. One interesting result of this analysis is that a horizontal line, which would represent a zero correlation, is also predicted by the model. This means that the regression calculated directly from the aggregated data contains an ecological bias and is misleading. We have added this regression line to the plot to highlight this issue.

Figure 5.

Results of the Hierarchical Meta-Regression model. The posterior median and 95% intervals are displayed as solid lines. Left panel: relationship between the number of discrepancies and probability that a study is biased. Right panel: conditional mean of effect size as a function of the number of discrepancies.

The results presented so far indicate that an increase in the number of discrepancies increases the propensity of bias. The question is: How can we correct a particular study for its bias? Eq. (7) gives the bias correction of the treatment effect in this HMR model.

In Figure 6 we can see HMR bias correction in action. We display two studies which have 21 and 18 discrepancies respectively. The solid lines correspond to the likelihood functions of these studies. These likelihoods represent the information of the effect size at face value. The dashed lines correspond to the posterior treatment effects after bias correction. Clearly, we can see a strong bias correction with the conclusion of no treatment effect.

Figure 6.

Bias correction for two studies with 21 and 18 discrepancies respectively. The solid lines correspond to the likelihood functions of effect sizes. The dashed lines represent the posteriors for treatment effect after bias correction.

3. Hierarchical Meta-Regression analysis for diagnostic test data

In meta-analysis of diagnostic test data, the pieces of evidence that we aim to combine are the results of N diagnostic studies, where the results of the ith study (i = 1, …, N) are summarized in a 2 × 2 table as follows:

Test result | Patient status: with disease | Patient status: without disease
Positive | tpi | fpi
Negative | fni | tni
Total | ni,1 | ni,2

where tpi and fni are the numbers of patients with positive and negative diagnostic results among the ni,1 patients with disease, and fpi and tni are the numbers of positive and negative diagnostic results among the ni,2 patients without disease.

Assuming that ni,1 and ni,2 have been fixed by design, we model the tpi and fpi outcomes with two independent Binomial distributions:


where TPRi is the true positive rate or sensitivity, Sei, of study i and FPRi is the false positive rate or complementary specificity, i.e., 1 − Spi.

At face value, the diagnostic performance of each study is summarized by the empirical true positive rate and the empirical true negative rate or specificity,


and by the complementary empirical false positive and false negative rates,


In this type of meta-analysis we could model TPRi and FPRi (or Spi) separately, but this approach ignores that these rates could be correlated by design. Therefore, it is more sensible to handle TPRi and FPRi jointly.

We define the random effect Di, which represents the study effect associated with the diagnostic discriminatory power:


However, diagnostic results are sensitive to diagnostic settings (e.g., the use of different thresholds) and to populations where the diagnostic procedure under investigation is applied. These issues are associated with the external validity of diagnostic results. To model external validity bias we introduce the random effect Si:


This random effect quantifies variability produced by patients’ characteristics and diagnostic setup, which may produce a correlation between the observed TPR and FPR values. In short, we call Si the threshold effect of study i; it represents an adjustment for external validity in the meta-analysis.
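The face-value summaries and the two study effects can be illustrated with a short Python sketch. It assumes the usual SROC parametrization, in which Di is the difference and Si the sum of the logit-transformed true positive and false positive rates, and it plugs in empirical rates from a hypothetical 2 × 2 table, whereas in the model Di and Si are defined on the latent rates TPRi and FPRi.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def summarize_study(tp, fn, fp, tn):
    """Empirical rates and plug-in study effects for one 2 x 2 table.

    D and S use the assumed definitions D = logit(TPR) - logit(FPR) and
    S = logit(TPR) + logit(FPR); they are undefined if a rate equals 0 or 1.
    """
    n1, n2 = tp + fn, fp + tn      # patients with and without disease
    tpr, fpr = tp / n1, fp / n2
    return {
        "TPR": tpr, "FNR": 1.0 - tpr,
        "FPR": fpr, "Sp": 1.0 - fpr,
        "D": logit(tpr) - logit(fpr),
        "S": logit(tpr) + logit(fpr),
    }


# Hypothetical counts for a single diagnostic study.
study = summarize_study(tp=90, fn=10, fp=5, tn=95)
print({name: round(value, 3) for name, value in study.items()})
```

Under these definitions a large D means that the test separates diseased from non-diseased patients well, while S moves when the positivity threshold shifts both rates in the same direction.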

We could assume exchangeability of the pairs (Di, Si), but study quality is known to be an issue in diagnostic studies. For this reason we model the internal validity of a study by introducing random weights w1, …, wN. Conditional on a study weight wi, the study effects Di and Si are modeled as exchangeable between studies and they follow a scale mixture of bivariate Normal distributions with the following mean and variance:


and scale mixing density


The inclusion of the random weights wi in the model was proposed by [5]. This approach was generalized in [6] in two ways: firstly, by splitting wi into two weights w1,i and w2,i corresponding to the components Di and Si, respectively; secondly, by putting a prior on the degrees of freedom parameter ν, which corresponds to an adaptive robust distribution of the random effects.

The Hierarchical Meta-Regression representation of the model introduced above is based on the conditional distribution of (Di|Si = x) and the marginal distribution of Si. This HMR model was introduced by [7], who followed the stepping stones of the classical Summary Receiver Operating Characteristic (SROC) curve [8].

The conditional mean of (Di|Si = x) is given by:


where the functional parameters A and B are


We define the Bayesian SROC curve (BSROC) by transforming back results from (S, D) to (FPR, TPR) with


where g(p) is the logit(p) transformation, i.e. logit(p) = log(p/(1 − p)).
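The back-transformation can be derived in a few lines, sketched here in LaTeX. The derivation assumes that D and S are the difference and the sum of the logit-transformed rates, and the symbols μ_D, μ_S, σ_D, σ_S for the means and standard deviations of the two random effects are names introduced only for this illustration; the expressions for A and B are the usual intercept and slope of the conditional mean of a bivariate normal distribution.

```latex
\begin{align*}
E(D \mid S = x) &= A + Bx, \qquad
  B = \rho\,\frac{\sigma_D}{\sigma_S}, \qquad A = \mu_D - B\,\mu_S,\\[4pt]
g(\mathrm{TPR}) - g(\mathrm{FPR}) &= A + B\,\{g(\mathrm{TPR}) + g(\mathrm{FPR})\},\\
(1 - B)\, g(\mathrm{TPR}) &= A + (1 + B)\, g(\mathrm{FPR}),\\
\mathrm{TPR} &= g^{-1}\!\left[\frac{A}{1 - B} + \frac{1 + B}{1 - B}\, g(\mathrm{FPR})\right].
\end{align*}
```

The second line substitutes the assumed definitions of D and S into the regression line, and the remaining lines solve for TPR, which requires B ≠ 1.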

The BSROC curve is obtained by calculating TPR over a grid of values of FPR, which gives a posterior distribution conditional on each value of FPR. Therefore, it is straightforward to give credibility intervals for the BSROC at each value of FPR.

One important aspect of the BSROC is that it incorporates the variability of the model’s parameters, which influences the width of its credibility intervals. In addition, given that FPR is modeled as a random variable, the curve is corrected for measurement error bias in FPR.

Finally, we can define a Bayesian Area Under the SROC Curve (BAUC) by numerically integrating the BSROC over a range of values of the FPR:


In some applications it is recommended to restrict the limits of integration to the range of the observed FPR values.
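The curve and its area can be sketched numerically in Python for a single set of parameter values. The sketch assumes that D and S are the difference and sum of the logit rates, so that the curve is TPR = g⁻¹[A/(1 − B) + ((1 + B)/(1 − B)) g(FPR)], and it uses the trapezoidal rule as one possible integration scheme. The parameter values and function names are hypothetical; this is not the bamdit implementation.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(v):
    return 1.0 / (1.0 + math.exp(-v))


def regression_parameters(mu_d, mu_s, sigma_d, sigma_s, rho):
    """Intercept A and slope B of E(D | S = x) = A + B x for a bivariate normal."""
    b = rho * sigma_d / sigma_s
    a = mu_d - b * mu_s
    return a, b


def bsroc(fpr, a, b):
    """TPR on the curve at a given FPR (requires b != 1)."""
    return inv_logit(a / (1.0 - b) + (1.0 + b) / (1.0 - b) * logit(fpr))


def bauc(a, b, lower=0.01, upper=0.99, steps=2000):
    """Trapezoidal integral of the curve over [lower, upper] (not rescaled)."""
    width = (upper - lower) / steps
    heights = [bsroc(lower + i * width, a, b) for i in range(steps + 1)]
    return width * (sum(heights) - 0.5 * (heights[0] + heights[-1]))


# Hypothetical values standing in for one posterior draw.
a, b = regression_parameters(mu_d=3.0, mu_s=-1.0, sigma_d=1.0, sigma_s=1.5, rho=-0.3)
area = bauc(a, b)
print(f"A = {a:.2f}, B = {b:.2f}, area over [0.01, 0.99] = {area:.3f}")
```

Repeating the calculation for every MCMC draw of the parameters yields a posterior sample of the curve at each FPR and of the area, from which credibility intervals follow directly.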

In order to make this complex HMR model applicable in practice, we have implemented the model in the R package bamdit, which uses the following set of hyper-priors:




The correlation parameter ρ is transformed by using the Fisher transformation,


and a Normal prior is used for z:


Modeling priors in this way guarantees that in each MCMC iteration the variance-covariance matrix of the random effects θ1 and θ2 is positive definite. The values of the constants m1, v1, m2, v2, u1, u2, mr and vr have to be given. They can be used to include valid prior information which might be empirically available or they could be the result of expert elicitation. If such information is not available, we recommend setting these parameters to values that represent weakly informative priors. In this work, we use m1 = m2 = mr = 0, v1 = v2 = 1, u1 = u2 = 5 and vr = 1.7 as a weakly informative prior setup.

These values are fairly conservative, in the sense that they induce prior uniform distributions for TPRi and FPRi. They give locally uniform distributions for μ1 and μ2, uniform distributions for σ1 and σ2, and a symmetric distribution for ρ centered at 0.

Figure 7 summarizes the meta-analysis results of fitting the bivariate random-effects model to the computer tomography diagnostic data. The Bayesian predictive surface is presented by contours at different credibility levels, and these contours are compared with the observed data, represented by circles with diameters varying according to the sample size of each study. The scattered points are samples from the predictive posterior and the histograms correspond to the posterior predictive marginals. These results were generated by using the functions metadiag() and plot in the R package bamdit.

Figure 7.

Results of the meta-analysis: Bayesian Predictive Surface by contours at different credibility levels.

Figure 8 displays the posteriors of each component’s weights. The left panel shows that prospective studies number 25 and 33 deviate from the prior mean of 1, while on the right panel we see that a prospective study (number 47) and five retrospective studies (numbers 1, 3, 4, 8 and 29) have substantial variability.

Figure 8.

Posterior distributions of the component weights: it is expected that the posterior is centered at 1. Studies with retrospective design tend to present deviations in FPR.

An important aspect of wi is its interpretation as an estimated bias correction. A priori, all studies included in the review have a mean of E(wi) = 1. We can expect that studies which are unusually heterogeneous will have posteriors substantially greater than 1. Unusual study results could be produced by factors that may affect the quality of the study, such as errors in recording diagnostic results, confounding factors, loss to follow-up, etc. For that reason, the study weights wi can be interpreted as an adjustment for the studies’ internal validity bias.

The BSROC curve and its area under the curve are presented in Figure 9. The left panel shows this HMR as a meta-analytic summary for these data. On the right panel the posterior distribution of the BAUC shows quite a high diagnostic ability of computer tomography scans in the diagnosis of appendicitis.

Figure 9.

Hierarchical Meta-Regression model: the left panel shows the BSROC curve, where the central line corresponds to the posterior median and the upper and lower curves correspond to the 97.5% and 2.5% quantiles, respectively. The right panel displays the posterior distribution of the area under the BSROC curve.

4. Conclusions

In this work we have seen the HMR in action. This approach of meta-analysis is based on a simple strategy: two sub-models are defined in the meta-analysis, one which models the problem of interest, for instance the treatment effect, and one which handles the multiplicity of bias. The meta-analysis is summarized by understanding how these components interact with each other.

The examples presented in this work have shown that we could have misleading conclusions from indirect evidence, if it were analyzed as directly contributing to the problem of interest.

For instance, in the first example, Section 2, we have seen in Figure 1 that pooling studies gave a wrong conclusion about the effect of the stem cell treatment. The positive correlation between the aggregated effect sizes and the number of discrepancies exaggerates their relationship.

Actually, in Figure 5 the HMR has shown that it is possible to simultaneously have a zero correlation between effect size and discrepancies while still having a risk of reporting bias. In addition, the HMR makes it possible to extract the amount of bias in the meta-analysis and to correct the treatment effect at the level of the study (Figure 6).

In the second example, Section 3, biases come from the external validity of diagnostic studies and the internal validity due to their quality. In this example the HMR showed that it was possible to simultaneously model these two types of subtle biases.

To account for internal validity bias, the application of a scale mixture of normal distributions allows us to detect conflicting studies, which can be considered as outliers. The Bayesian Summary Receiver Operating Characteristic curve accounts for the external validity bias due to changes in factors that affected the diagnostic results. In addition, the posterior for its Area Under the Curve (AUC) summarizes the results of the meta-analysis.


This work was supported by the German Research Foundation project DFG VE 896 1/1.


Trial ID | Effect size | SE (effect size) | Sample size | Number of discrepancies | Author or principal investigator | Year | Country
t02 | 1.1 | 2.09 | 100 | 7 | Lunde | 2007 | Norway
t03 | −1.7 | 2.91 | 23 | 7 | Srimahachota | 2011 | Thailand
t06 | 7 | 0.63 | 40 | 4 | Meluzín | 2006 | Czech Republic
t11 | 14 | 4.05 | 20 | 13 | Suárez de Lezo | 2007 | Spain
t12 | 5.4 | 2.44 | 77 | 18 | Huikuri HV | 2008 | Finland
t18 | 2.5 | 3.96 | 20 | 2 | Hendrikx | 2006 | Belgium
t19 | −0.2 | 1.17 | 127 | 3 | Hirch | 2011 | The Netherlands
t25 | 4 | 1.28 | 40 | 3 | Rodrigo | 2012 | The Netherlands
t31 | −0.2 | 1.54 | 183 | 19 | Ribero dos Santos | 2012 | Brazil
t38 | 5.4 | 2.54 | 27 | 15 | Tse | 2007 | Hong Kong
t48 | −3.9 | 2.62 | 40 | 1 | Wöhrle | 2010 | Germany
t49 | 10.4 | 1.01 | 116 | 55 | Yousef (Strauer) | 2009 | Germany

Table 1.

Results from 31 randomized controlled trials of heart disease patients, where the treatment group received bone marrow stem cells and the control group a placebo treatment. The source of this table is Nowbar et al. [4].

© 2017 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite and reference

Pablo Emilio Verde (November 2nd 2017). Two Examples of Bayesian Evidence Synthesis with the Hierarchical Meta-Regression Approach, Bayesian Inference, Javier Prieto Tejedor, IntechOpen, DOI: 10.5772/intechopen.70231.
