Results from 31 randomized controlled trials of heart disease patients, where the treatment group received bone marrow stem cells and the control group a placebo treatment. The source of this table is Nowbar et al.
This is the Information Age. We might expect that, for a particular research question that is empirically testable, we would have a collection of evidence indicating the best way to proceed. Unfortunately, this is not the case in several areas of empirical research and decision making. Instead, when researchers and policy makers ask a specific question, such as “What is the effectiveness of a new treatment?”, the structure of the evidence available to answer this question may be complex and fragmented (e.g. published experiments of varying quality, observational data, subjective judgments, etc.).
- Bayesian hierarchical models
- multi-parameter evidence synthesis
- conflict of evidence
- randomized controlled trials
- retrospective studies
In today’s information age one might expect the digital revolution to create a knowledge-based society, surrounded by global communications that influence our world in an efficient and convenient way. Never in human history have we accumulated such an astronomical amount of data, and we keep on generating data at an alarming rate. A new term, “big data,” was coined to indicate the existence of “oceans of data” from which we may expect to extract useful information for any problem of interest.
In this technological society, one could expect that for a particular research question we should have a collection of high-quality evidence which indicates the best way to proceed. Paradoxically, this is not the case in several areas of empirical research and decision making. Instead, when researchers and policy makers ask a specific and important question, such as “What is the effectiveness of a new treatment?”, the structure of the evidence available to answer this question may be complex and fragmented (e.g., published experiments of varying quality, observational data, subjective judgments, etc.). The way researchers interpret this multiplicity of evidence will be the basis for their understanding of reality and will determine their future decisions.
Bayesian meta-analysis, which has its roots in the work of Eddy et al., is a branch of statistical techniques for interpreting and displaying results from different sources of evidence, exploring the effects of biases, and assessing the propagation of uncertainty within a coherent statistical model. A gentle introduction to this area can be found in Chap. 8 of Spiegelhalter et al., and a recent review in Verde and Ohmann.
In this chapter we present a new method for meta-analysis that we call “Hierarchical Meta-Regression” (HMR). The aim of HMR is to provide an integrated approach to bias modeling when disparate pieces of evidence are combined in a meta-analysis, for instance randomized and non-randomized studies, or studies of differing quality. This is a different application of Bayesian inference from those with which we may be more familiar, for instance an intricate regression model, where the available data bear directly upon the question of interest.
We are going to discuss two recent meta-analyses in clinical research. The reason for highlighting these two cases is that they illustrate a main problem in evidence synthesis, which is the presence of a multiplicity of bias in systematic reviews.
1.1. An example of meta-analysis of therapeutic trials
The first example is a meta-analysis of 31 randomized controlled trials (RCTs) comparing two groups of heart disease patients, where the treatment group received bone marrow stem cells and the control group a placebo treatment (Nowbar et al.). The data of this meta-analysis appear in the Appendix, see Table 1. Figure 1 presents the forest plot of these 31 trials, where the treatment effect is measured as the difference in ejection fraction between groups, which measures the improvement of left ventricular function in the heart.
At the bottom of Figure 1 we see average summaries represented by two diamonds. The first one corresponds to the fixed effect meta-analysis model, which is based on the assumption that the studies are identical and the between-study variability is zero. The wider diamond represents the results of a random effects meta-analysis model, which assumes heterogeneity between studies. In this meta-analysis both models indicated a positive treatment effect, with mean differences of 3.95 (95% CI [3.43; 4.47]) and 2.92 (95% CI [1.47; 4.36]), respectively.
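The two pooled summaries can be illustrated with a minimal sketch. It assumes inverse-variance weighting for the fixed effect model and the DerSimonian-Laird moment estimator for the between-study variance; which estimator produced the diamonds in Figure 1 is not stated here, so this choice is an assumption. The function name `pool_effects` is hypothetical, and the two inputs are only the rows t11 and t31 of Table 1, so the output does not reproduce the pooled values quoted above.

```python
import math


def pool_effects(effects, std_errors):
    """Fixed-effect and DerSimonian-Laird random-effects pooling (illustrative sketch)."""
    k = len(effects)
    # Fixed effect: weights are the inverse sampling variances.
    w = [1.0 / se ** 2 for se in std_errors]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    se_fixed = math.sqrt(1.0 / sw)

    # Cochran's Q and the DerSimonian-Laird estimate of the between-study variance.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random effects: the between-study variance is added to each sampling variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sw_re = sum(w_re)
    random = sum(wi * yi for wi, yi in zip(w_re, effects)) / sw_re
    se_random = math.sqrt(1.0 / sw_re)

    return {"fixed": fixed, "se_fixed": se_fixed,
            "random": random, "se_random": se_random, "tau2": tau2}


# Rows t11 and t31 of Table 1: (effect size, SE).
result = pool_effects([14.0, -0.2], [4.05, 1.54])
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

When the studies disagree, the estimated between-study variance is positive, the random effects weights become more equal, and the random effects standard error is larger than the fixed effect one, which is why the second diamond in a forest plot is the wider of the two.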
Could we conclude that we have enough evidence to demonstrate the efficacy of the treatment? Unfortunately, these apparently confirming results are completely misleading. The problem is that these 31 studies are very heterogeneous, which results in a wide 95% prediction interval, [−4.33; 10.16], that includes no treatment effect, and in a large amount of contradictory evidence displayed in Figure 1.
In order to explain the sources of heterogeneity in this area, Nowbar et al. investigated whether detected discrepancies in published trials might account for the variation in reported effect sizes. They define a discrepancy in a trial as two or more reported facts that cannot both be true because they are logically or mathematically incompatible. In other words, the term discrepancy is a polite way to indicate that a published study suffers from poor reporting, could be implausible, or that its results have been manipulated. For example, as we see at the bottom of Table 1 in the Appendix, it would be difficult to believe the results of a study with 55 discrepancies. In Section 2 we present an HMR model to analyze a possible link between the risk of biased results and the number of discrepancies.
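For reference, a commonly used form of the 95% prediction interval for the effect in a new study is sketched below. The notation ($\hat{\mu}$ for the random effects mean, $\hat{\tau}^{2}$ for the between-study variance, $k$ for the number of studies) is introduced here for illustration, and whether the interval quoted above was computed exactly in this way is an assumption.

```latex
\hat{\mu} \;\pm\; t_{k-2,\,0.975}\,
\sqrt{\hat{\tau}^{2} + \widehat{\mathrm{SE}}(\hat{\mu})^{2}}
```

As a rough back-of-the-envelope check under these assumptions: with $k = 31$ the quantile is $t_{29,\,0.975} \approx 2.045$, the reported random effects interval implies $\widehat{\mathrm{SE}}(\hat{\mu}) \approx (4.36 - 1.47)/(2 \times 1.96) \approx 0.74$, and the half-width of the prediction interval is about $7.25$. This would correspond to $\hat{\tau} \approx 3.5$, a between-study standard deviation larger than the pooled effect itself.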
1.2. An example of meta-analysis of diagnostic trials
The topic of Section 3 is the meta-analysis of diagnostic trials. These trials play a central role in personalized medicine, policy making, healthcare and health economics. Figure 2 presents our example in this area. The scatter plot shows the diagnostic summaries of a meta-analysis investigating the diagnostic accuracy of computed tomography scans in the diagnosis of appendicitis. Each circle identifies the true positive rate vs. the false positive rate of a study, and the size of each circle indicates the study's sample size. One characteristic of this meta-analysis is the combination of disparate data. Of the 51 studies, 22 were retrospective and 29 were prospective, which is indicated by the different grey scales of the circles.
The main problem in this area is the multiple sources of variability behind those diagnostic results. Diagnostic studies are usually performed under different diagnostic setups and in different patient populations. For a particular diagnostic technique we may have a small number of studies which differ in their statistical design, their quality, etc. Therefore, the main question in the meta-analysis of diagnostic tests is: How can we combine the multiplicity of diagnostic accuracy rates in a single coherent model? A possible answer to this question is the HMR presented in Section 3. This model was introduced by Verde and is available in the R package
2. A Hierarchical Meta-Regression model to assess reported bias
Figure 3 shows the reported effect sizes and 95% confidence intervals of the 31 trials from Nowbar et al. against the number of discrepancies (on a logarithmic scale). The authors reported a positive, statistically significant correlation between the effect size and the number of discrepancies detected in the papers. However, a direct correlation analysis of aggregated results is threatened by ecological bias and may lead to misleading conclusions. The amount of variability shown by the 95% confidence intervals is too large to accept a positive correlation at face value. In this section we present an HMR model to link the risk of reporting bias with the number of reported discrepancies. This model assumes that the connection between discrepancies and effect size could be much more subtle.
The starting point of any meta-analytic model is the description of a model for the pieces of evidence at face value. In statistical terms, this means the likelihood of the parameter of interest. Let
If a prior assumption of exchangeability was considered reasonable, a random effects Bayesian model incorporates all the studies into a single model, where the
In this section we assume that exchangeability is unrealistic and we wish to learn how the un-observed treatment effects
where the non-observable variable
In this way,
We model the probability that a study is biased as a function of
In Eq. (5) positive values of
In this HMR model the conditional mean is given by
This HMR not only quantifies the average bias
where the amount (
The HMR model presented above is completed by the following vague hyper-priors: For the regression parameters
The model presented in this section is analytically intractable. We approximated the posterior distributions of the model parameters with Markov chain Monte Carlo (MCMC) techniques implemented in
Computations were performed with the statistical language
The diagonal panels of Figure 4 summarize the resulting posterior distributions for
Further results of the Hierarchical Meta-Regression model appear in Figure 5, where 95% posterior intervals are plotted against the number of discrepancies. In the left panel, we can see the relationship between the number of discrepancies and the probability that a study is biased. This probability increases with the number of discrepancies, but with a large amount of variability. The right panel shows the conditional mean of the effect size as a function of the number of discrepancies, which corresponds to Eq. (6). Our analysis shows that the 95% posterior intervals of the conditional mean cover the zero effect over most of the range of discrepancies. Only for studies with more than 33 (exp(3.5)) discrepancies does the model predict a positive effect. One interesting result of this analysis is that a horizontal line, which would represent a zero correlation, is also compatible with the model. This means that the regression calculated directly from the aggregated data contains an ecological bias and is misleading. We have added this regression line to the plot to highlight this issue.
The results presented so far indicate that an increase in the number of discrepancies increases the propensity for bias. The question is: How can we correct a particular study for its bias? Eq. (7) gives the bias correction of the treatment effect in this HMR model.
In Figure 6 we can see HMR bias correction in action. We display two studies which have 21 and 18 discrepancies respectively. The solid lines correspond to the likelihood functions of these studies. These likelihoods represent the information of the effect size at face value. The dashed lines correspond to the posterior treatment effects after bias correction. Clearly, we can see a strong bias correction with the conclusion of no treatment effect.
3. Hierarchical Meta-Regression analysis for diagnostic test data
In meta-analysis of diagnostic test data, the pieces of evidence that we aim to combine are the results of
|With disease||Without disease|
At face value, diagnostic performance of each study is summarized by the empirical true positive rate and true negative rate or specificity
and the complementary empirical rates of false positive rate and false negative diagnostic results,
In this type of meta-analysis we could separately model TPR
We define the random effect
However, diagnostic results are sensitive to diagnostic settings (e.g., the use of different thresholds) and to populations where the diagnostic procedure under investigation is applied. These issues are associated with the
This random effect quantifies variability produced by patients’ characteristics and diagnostic setup, which may produce a correlation between the observed TPR and FPR. In short, we called
We could assume exchangeability of pairs (
and scale mixing density
The inclusion of the random weights
The Hierarchical Meta-Regression representation of the model introduced above is the model based on the conditional distribution of (
The conditional mean of (
where the functional parameters
We define the
where g(p) is the logit(p) transformation, i.e. logit(p) = log(p/(1 − p)).
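The face-value summaries and the logit transformation can be sketched as follows. The cell names `tp`, `fn`, `fp` and `tn` and the counts used below are hypothetical placeholders for one study's two-by-two table; they are not data from the appendicitis meta-analysis.

```python
import math


def logit(p):
    """g(p) = log(p / (1 - p)); defined only for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def expit(x):
    """Inverse of the logit transformation."""
    return 1.0 / (1.0 + math.exp(-x))


def empirical_rates(tp, fn, fp, tn):
    """Empirical diagnostic rates from one study's two-by-two table."""
    tpr = tp / (tp + fn)   # true positive rate (sensitivity)
    fpr = fp / (fp + tn)   # false positive rate
    return {"TPR": tpr, "FPR": fpr, "TNR": 1.0 - fpr, "FNR": 1.0 - tpr}


# Hypothetical study: 100 patients with disease, 100 without.
rates = empirical_rates(tp=90, fn=10, fp=5, tn=95)
print(rates)
print(logit(rates["TPR"]), logit(rates["FPR"]))
```

Because the logit is undefined at 0 and 1, a study with an empty cell cannot be transformed at face value, which is one reason to model the counts rather than the transformed empirical rates.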
The BSROC curve is obtained by calculating TPR over a grid of values of FPR, which gives a posterior distribution of TPR conditional on each value of FPR. Therefore, it is straightforward to give credibility intervals for the BSROC at each value of FPR.
One important aspect of the BSROC is that it incorporates the variability of the model’s parameters, which influences the width of its credibility intervals. In addition, given that FPR is modeled as a random variable, the curve is corrected for measurement-error bias in FPR.
Finally, we can define a
In some applications it is recommended to restrict the limits of integration to the range of the observed values of FPR.
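The grid-based computation of a summary ROC curve, its pointwise credibility intervals, and the area under the curve can be sketched as follows. This is only an illustration: it assumes a curve that is linear on the logit scale, logit(TPR) = a + b logit(FPR), which is a stand-in for the functional parameters of the model and not the authors' parametrization. The function names and the posterior draws `(a, b)` are hypothetical and were not produced by any fitted model.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def quantile(sorted_vals, q):
    """Quantile of an already sorted list, by linear interpolation."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def trapezoid(x, y):
    """Area under the points (x, y) by the trapezoidal rule."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2.0
               for i in range(len(x) - 1))


def sroc_summary(draws, fpr_grid, level=0.95):
    """One curve per posterior draw; pointwise bands and one AUC per draw."""
    curves = [[expit(a + b * logit(f)) for f in fpr_grid] for a, b in draws]
    alpha = (1.0 - level) / 2.0
    bands = []
    for j in range(len(fpr_grid)):
        column = sorted(curve[j] for curve in curves)
        bands.append((quantile(column, alpha),
                      quantile(column, 0.5),
                      quantile(column, 1.0 - alpha)))
    aucs = [trapezoid(fpr_grid, curve) for curve in curves]
    return bands, aucs


# Grid restricted to a hypothetical observed range of FPR, here 0.01 to 0.99.
fpr_grid = [0.01 + 0.98 * i / 98 for i in range(99)]
# Hypothetical posterior draws of (a, b).
draws = [(2.0, 0.5), (2.5, 0.6), (3.0, 0.4), (2.2, 0.7), (2.8, 0.5)]
bands, aucs = sroc_summary(draws, fpr_grid)
print(bands[49])            # (lower, median, upper) near FPR = 0.5
print(min(aucs), max(aucs))
```

Because the area is computed once per posterior draw, the collection `aucs` is itself a sample from the posterior of the area under the curve, and restricting `fpr_grid` to the observed range of FPR implements the restricted limits of integration.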
In order to make this complex HMR model applicable in practice, we have implemented the model in the R’s package
The correlation parameter
and a Normal prior is used for
Modeling priors in this way guarantees that in each MCMC iteration the variance-covariance matrix of the random effects
These values are fairly conservative, in the sense that they induce prior uniform distributions for
Figure 7 summarizes the meta-analysis results of fitting the bivariate random-effects model to the computed tomography diagnostic data. The Bayesian predictive surface is presented by contours at different credibility levels, which can be compared with the observed data, represented by circles whose diameters vary with the sample size of each study. The scattered points are samples from the predictive posteriors and the histograms correspond to the posterior predictive marginals. This result was generated by using the functions
Figure 8 displays the posteriors of each component’s weights. The left panel shows that prospective studies number 25 and 33 deviate from the prior mean of 1, while in the right panel we see that one prospective study (number 47) and five retrospective studies (numbers 1, 3, 4, 8 and 29) have substantial variability.
An important aspect of
The BSROC curve and its area under the curve are presented in Figure 9. The left panel shows this HMR as a meta-analytic summary for these data. In the right panel, the posterior distribution of the BAUC shows quite a high diagnostic ability for computed tomography scans in the diagnosis of appendicitis.
In this work we have seen the HMR in action. This approach to meta-analysis is based on a simple strategy: two sub-models are defined in the meta-analysis, one which models the problem of interest, for instance the treatment effect, and one which handles the multiplicity of bias. The meta-analysis is summarized by understanding how these components interact with each other.
The examples presented in this work have shown that we could have misleading conclusions from indirect evidence, if it were analyzed as directly contributing to the problem of interest.
For instance, in the first example, Section 2, we have seen in Figure 1 that pooling studies gave a wrong conclusion about the effect of stem cell treatment. The positive correlation between the aggregated effect size and the number of discrepancies exaggerates their relationship.
In fact, in Figure 5 the HMR has shown that it is possible to have a zero correlation between effect size and discrepancies while still having a risk of reporting bias. In addition, the HMR allows us to estimate the amount of bias in the meta-analysis and to correct the treatment effect at the level of the study (Figure 6).
In the second example, Section 3, biases come from the external validity of diagnostic studies and the internal validity due to their quality. In this example the HMR showed that it was possible to simultaneously model these two types of subtle biases.
To account for internal validity bias, the application of a scale mixture of normal distributions allows us to detect conflicting studies, which can be considered as outliers. The Bayesian Summary Receiver Operating Characteristic (BSROC) curve accounts for the external validity bias due to changes in factors that affect the diagnostic results. In addition, the posterior of its area under the curve (BAUC) summarizes the results of the meta-analysis.
This work was supported by the German Research Foundation project DFG VE 896 1/1.
|Trial ID|Effect size|SE (effect size)|Sample size|Number of discrepancies|Author or principal investigator|Year|Country|
|---|---|---|---|---|---|---|---|
|t11|14|4.05|20|13|Suárez de Lezo|2007|Spain|
|t31|−0.2|1.54|183|19|Ribero dos Santos|2012|Brazil|