Bayesian Model Averaging and Compromising in Dose-Response Studies

Steven B. Kim

doi:10.5772/intechopen.68786

Abstract

Dose-response models are applied to animal-based cancer risk assessments and human-based clinical trials usually with small samples. For sparse data, we rely on a parametric model for efficiency, but posterior inference can be sensitive to an assumed model. In addition, when we utilize prior information, multiple experts may have different prior knowledge about the parameter of interest. When we make sequential decisions to allocate experimental units in an experiment, an outcome may depend on decision rules, and each decision rule has its own perspective. In this chapter, we address the three practical issues in small-sample dose-response studies: (i) model-sensitivity, (ii) disagreement in prior knowledge and (iii) conflicting perspective in decision rules.

Keywords

dose-response models
model-sensitivity
model-averaging
prior-sensitivity
consensus prior
Bayesian decision theory
individual-level ethics
population-level ethics
Bayesian adaptive designs
sequential decisions
continual reassessment method
c-optimal design
Phase I clinical trials

Author Information

Show +

Steven B. Kim*
- Department of Mathematics and Statistics, California State University, CA, United States

*Address all correspondence to: stkim@csumb.edu

1. Introduction

Dose-response modeling is often used to learn about the effect of an agent on a particular outcome with respect to dose. It is widely applied to animal-based cancer risk assessments and human-based clinical trials. A sample size is typically small; so many statistical issues can arise from a limited amount of data. The issues include the impact of a misspecified model, prior-sensitivity, and conflicting ethical perspectives in clinical trials. In this chapter, we focus on cases when an outcome variable of interest is binary (a predefined event happened or not) when an experimental unit is exposed to a dose. Main ideas are preserved for cases when an outcome variable is continuous or discrete.

There are two different approaches to statistical inference. One approach is called frequentist inference. In this framework, we often rely on the sampling distribution of a statistic and large-sample theories. Another approach is called Bayesian inference. It is founded on Bayes’ Theorem, and it allows researchers to express prior knowledge independent of data. In a small-sample study, Bayesian inference can be more useful than frequentist inference because we can incorporate both researcher’s prior knowledge and observed data to make inference for the parameter of interest. Bayesian ideas are briefly introduced for dose-response modeling with a binary outcome in Section 2.

In a small-sample study, we often rely on a parametric model to gain statistical efficiency (i.e., less variance in parameter estimation), but our inference can be severely biased by the use of a wrong model. To account for model uncertainty, it is reasonable to specify multiple models and make inference based on “averaged-inference.” In this regard, Bayesian model averaging (BMA) is a useful method to gain robustness [1]. The BMA method has a wide range of application, and we focus its application to animal-based cancer risk assessments in Section 3.

In clinical trials, study participants are real patients, and therefore, we need to carefully consider ethics. There are conflicting perspectives of individual- and population-level ethics in early phase clinical trials. Individual-level ethics focuses on the benefit of trial participants, whereas population-level focuses on the benefit of future patients, which may require some level of sacrifice from trial participants. We compare the two conflicting perspectives in clinical trials based on Bayesian decision theory, and we discuss a compromising method in Section 4 [2, 3].

A sample size for an early phase (Phase I) clinical trial is often less than 30 subjects. Dose allocations for first few patients and statistical inference for future patients heavily depend on researcher’s prior knowledge in sparse data. When multiple researchers have different prior knowledge about a parameter of interest, one compromising approach is to combine their prior elicitations and average them (i.e., consensus prior) [4, 5]. When we average the prior elicitations, there are two different approaches to determine the weight of each prior elicitation, weights determined before observing data and after observing data. We discuss operating characteristics of the two different weighting methods in the context of Phase I clinical trials in Section 5.

2. Bayesian inference

In statistics, we address a research question by a parameter, which is often denoted by θ. We begin Bayesian inference by modeling the prior knowledge about θ. A function, which models the prior knowledge about θ, is called the prior density function of θ, and we denote it by f(θ). It is a non-negative function, which satisfies ∫Ω f(θ)dθ=1, where Ω is the set of all possible values of θ (i.e., parameter space). We then model data y→=(y1,…,yn) given θ. The likelihood function, denoted by f(y→|θ), quantifies the likelihood of observing a particular sample y→=(y1,…,yn) under an assumed probability model. By Bayes’ Theorem, we update our knowledge about θ after observing data y→ as

f(θ|y→)=f(y→|θ) f(θ)f(y→) .E1

The function f(θ|y→) is called the posterior density function of θ given data y→. Since we treat observed data y→=(y1,…,yn) as fixed numbers, we often express Eq. (1) as follows

f(θ|y→)∝f(y→|θ)f(θ)=kf(y→|θ)f(θ) ,E2

where k is the normalizing constant which makes ∫Ω f(θ|y→)dθ=1. We can often realize f(θ|y→) based on the prior density function f(θ) and the likelihood function f(y→|θ) without considering the denominator f(y→)=∫f(y|θ) f(θ)dθ in Eq. (1) which is called the marginal likelihood.

2.1. Example

Suppose we observe n = 20 rats for 2 years. Let π be the parameter of interest, which is interpreted as the probability of developing some type of tumor. Suppose a researcher models the prior knowledge about π using the prior density function

f(π)=Γ(a+b)Γ(a) Γ(b) πa−1 (1−π)b−1 , 0<π<1 .E3

It is known as the beta distribution with shape parameters a > 0 and b > 0. We often denote the beta distribution by π∼Beta(a,b), and the values of a and b must be specified by the researcher independent from data. Let y→=(y1,…,yn) denote observed data, where y_i = 1 if the _i^th rat developed tumor and y_i = 0 otherwise. Assuming y₁,…, y_n are independent observations, the likelihood function is as follows

f(y→|π)=∏i=1n πyi (1−π)1−yi=πs (1−π)n−s ,E4

where s=∑i=1nyi is the total number of rats developed tumor. By Eq. (2), the posterior density function of π is as follows

f(π|y→)=k πa+s−1 (1−π)b+n−s−1 ,E5

where k=Γ(a+b+n)Γ(a+s) Γ(b+n−s) is the normalizing constant, which makes ∫01f(π|y→)dπ=1. We can recognize that π|y→∼Beta(a+s,b+n−s).

If the researcher fixed a = 2 and b =3 and observed s = 9 from a sample of size n = 20, the prior density function is f(π)=k π (1−π)2 with k=Γ(5)Γ(2) Γ(3)=12, and the posterior density function is f(π|y→)=k π10 (1−π)13 with k=Γ(25)Γ(11) Γ(14)=27457584. The prior and posterior distributions are shown in Figure 1. The knowledge about π becomes more certain (less variance) after observing the data.

Figure 1.
The prior f(π) in the dotted curve and the posterior f(π|y→) in the solid curve.

2.2. Example

This example is simplified from Shao and Small [6]. In dose-response studies, we model π as a function of dose x. There are many link functions between π and x used in practice. In this example, we focus on a link function

πx=eβ0+β1x1+eβ0+β1xE6

which is known as a logistic regression model. It is commonly assumed that a dose-response curve increases with respect to dose, so we assume β₁ > 0 (and β₀ can be any real number). There are two regression parameters in Eq. (6), β₀ and β₁, and we denote them as β→=(β0,β1). Figure 2 presents two dose-response curves. The solid curve is generated by β→=(−1,2), and the dotted curve is generated by β→=(−2,5). As β₀ increases, the background risk π0=eβ01+eβ0 increases, where π₀ is interpreted as the probability of tumor development at dose x = 0. The dose-response curve increases when β₁ > 0, and it decreases when β₁ < 0. The rate of change in the dose-response curve is determined by|β₁|.

Figure 2.
Two dose-response curves using the logistic link.

To express prior knowledge about β→, we need to find an appropriate prior density function f(β→). It is not simple because it is difficult to express one’s knowledge on the two-dimensional parameters β→=(β0,β1). For mathematical convenience, some practitioners use a flat prior density function f(β→)∝1. Another way of expressing a lack of prior knowledge about β→ is as follows

f(β→)∝12πσ2 e−β02+β122σ2 Iβ1>0E7

with an arbitrarily large value of σ [6]. When a reliable source of prior information is available, there is a practical method, which is known as the conditional mean prior [7], and it will be discussed in a later section (see Section 4.2). In an experiment, the experimental doses x→=(x1,…,xn) are fixed, and we observe random binary outcomes y→=(y1,…,yn). Given y→ (and fixed x→), the likelihood function is as follows

f(y→|β→)=∏i=1n (eβ0+β1xi1+eβ0+β1xi)yi (11+eβ0+β1xi)1−yi=eβ0s1+β1s2∏i=1n(1+eβ0+β1xi) ,E8

where s1=∑i=1nyi and s2=∑i=1nxiyi. By incorporating both prior and data, the posterior density function is as follows

f(β→|y→)∝f(β→) eβ0s1+β1s2∏i=1n(1+eβ0+β1xi) .E9

In an animal-based studies, one parameter of interest is the median effective dose, which is denoted by ED₅₀. It is the dose, which satisfies

πED50=eβ0+β1ED501+eβ0+β1ED50=.5 ,E10

and it can be shown that ED50=−β0β1 by algebra. In the case of β₀ = −2 and β₁ = 5, we have ED₅₀ = .4 as describe in the figure with the dotted curve. In the case of β₀ = −1 and β₂ = 2, we have ED₅₀ = .5 as described in the figure with the solid curve.

In 1997, International Agency for Research on Cancer classified 2,3,7,8-Tetrachlorodibenzo-p-dioxin (known as TCDD) as a carcinogen for humans based on various empirical evidence [8]. In 1978, Kociba et al. presented the data on male Sprague-Dawley rats at four experimental doses 0, 1, 10 and 100 nanograms per kilogram per day (ng/kg/day) [9]. In the control dose group, nine of 86 rats developed tumor (known as hepatocellular carcinoma); three of 50 rats developed the tumor at dose 1; 18 of 50 rats developed the tumor at dose 10; and 34 of 48 rats developed the tumor at dose 100 [6]. Without loss of generosity, we let x_i = 0 for i = 1,…, 86; x_i = 1 for i = 87,…136; x_i = 10 for i = 137,…, 186; and x_i = 100 for i = 187,…, 234. The given information is sufficient to calculate s1=∑i=1nyi=64 and s2=∑i=1nxiyi=3583. By the use of the flat prior f(β→)∝1 with the restriction β₁ > 0, given the observed sample of size n = 234, we can generate random numbers of β→=(β0,β1) from the posterior density function

f(β→|y→)∝eβ0s1+β1s2∏i=1n(1+eβ0+β1xi) Iβ1>0 ,E11

where Iβ1>0=1 if β₁ > 0 and Iβ1>0=0 otherwise. Using a method of Markov Chain Monte Carlo (MCMC), we can approximate the posterior distribution of β→ as shown in the left panel of Figure 3. By transforming (β₀, β₁) to ED50=−β0β1, we can approximate the posterior distribution of the median effective dose ED₅₀ as shown in the right panel. The posterior mean of ED₅₀ is E(ED50|y→)=64.9 with 95% credible interval (50.8, 82.5), the 2.5th percentile and the 97.5th percentile of the posterior distribution.

Figure 3.
Approximate posterior distributions of (β₀, β₁) and ED50=−β0β1.

3. Bayesian model averaging

In a small sample, we borrow the strength of a parametric model to gain efficiency in parameter estimation. However, an assumed model may not describe the true dose-response relationship adequately. The impact of model misspecification is not negligible particularly in a poor experimental design. In such a limited practical situation, Bayesian model averaging (BMA) can be a useful method to account for model uncertainty. It is widely applied in practice, and in this section, we focus on the application to cancer risk assessment for the estimation of a benchmark dose [1, 6, 10, 11].

Let θ denote a parameter of interest. Suppose we have a set of K candidate models denoted by M={M1,…,MK}. Let β→k denote the vector of regression parameters under model M_k for k =1,…, K. Suppose θ is a function of β→k, and the interpretation of θ must be common across all models. Let f(β→k|Mk) and f(y→|β→k,Mk) denote the prior density function and the likelihood function, respectively, under M_k. By the Law of Total Probability, the posterior density function of θ is as follows

f(θ|y→)=∑k=1Kf(θ|Mk,y→)P(Mk|y→) .E12

In Eq. (12), the posterior density function f(θ|Mk,y→) depends on model M_k, and the posterior model probability P(Mk|y→) quantifies the plausibility of model M_k after observing data, which is given by

P(Mk|y→)=f(y→|Mk) P(Mk)∑j=1Kf(y→|Mj)P(Mj) .E13

In Eq. (13), the prior model probability P(M_k) is determined before observing data such that P(Mk)>0 for k=1,…,K and ∑k=1KP(Mk)=1. The marginal likelihood under M_k requires the integration

f(y→|Mk)=∫ f(y→|β→k,Mk) f(β→k|Mk) dβ→k .E14

In the BMA method, all K models contribute to inference of θ through the averaged posterior density function in Eq. (12), and the weight of contribution is determined by Bayes’ Theorem in Eq. (13).

3.1. Example

This example is continued from the example in Section 2.2. Recall π_x is interpreted as the probability of a toxic event (tumor development) at dose x. In many cancer risk assessments, a parameter of interest is θ_γ at a fixed risk level γ, which is defined as follows

γ=πθγ−π01−π0E15

or equivalently πθγ=π0+(1−π0) γ. In words, θ_γ is a dose corresponding to a fixed increase in the risk level. In frequentist framework, Crump defined a benchmark dose as a lower confidence limit for θ_γ [12]. In Bayesian framework, an analogous definition would be a lower credible bound (i.e., a fixed low percentile of the posterior distribution of θ_γ). The definition is widely applied to the public health protection [13].

In practice, γ is fixed between 0.01 and 0.1. Often, the estimation of θ_γ is highly sensitive to an assumed dose-response model because we have a lack of information at low doses. Shao and Small fixed γ = 0.1 and applied BMA with K = 2 models, logistic model and quantal-linear model [6]. In the quantal-linear model, the probability of tumor development is modeled by

πx=β0+(1−β0)(1−e−β1x) .E16

with the restrictions 0 < β₀ < 1 and β₁ > 0 under the monotonic assumption. The logistic model was given in Eq. (6) of Section 2.2.

Let M₁ denote the logistic model, and let M₂ denote the quantal-linear model. Assume the uniform prior model probabilities P(M1)=P(M2)=.5 and flat priors on the regression parameters. By posterior sampling, we can approximate the posterior model probabilities P(M1|y→)=.049 and P(M2|y→)=.951. Under M₁, the posterior mean of θ_0.1 is 20.95 with the 5th percentile 16.74. Under M₂, the posterior mean is 8.25 with the 5th percentile 5.95. These results are very similar to the results reported by Shao and Small [6]. From these model-specific statistics, we can calculate the model-averaged posterior mean

E(θ0.1|y→)=∑k=12 E(θ0.1|Mk,y→)P(Mk|y→)=20.95 (.049)+8.25 (.951)=8.87 .E17

However, we are not able to calculate the 5th percentile of the model-averaged posterior distribution based on the given statistics. In fact, we need to approximate the posterior distribution f(θ0.1|y→), which is a mixture of f(θ0.1|M1,y→) and f(θ0.1|M2,y→) weighted by P(M1|y→)=.049 and P(M2|y→)=.951, respectively, as shown in Figure 4. In the figure, the left panel shows an approximation of f(θ0.1|M1,y→), the middle panel shows an approximation of f(θ0.1|M2,y→), and the right panel shows an approximation of the averaged posterior f(θ0.1|y→). The averaged posterior density f(θ0.1|y→) is bimodal, but it is very close to f(θ0.1|M2,y→) because the quantal-linear model M₂ fits the data better than the logistic model M₁ by a Bayes factor of P(M2|y→)P(M1|y→)=.951.049=19.4. The 5th percentile of the model-averaged posterior distribution is approximately 5.97, and it is a BMA-BMD based on the BMA method proposed by Raftery et al. [1] and the BMD estimation method suggested by Crump [12].

Figure 4.
Posterior distributions of θ_0.1 from the logistic model (left panel), the quantal-linear model (middle panel), and the Bayesian model averaging (right panel).

4. Application of Bayesian decision theory to Phase I trials

In a Phase I cancer trial, the main objectives are to study the safety of a new chemotherapy and to determine an appropriate dose for future patients. Since trial participants are cancer patients, dose allocations require ethical considerations. Whitehead and Williams discussed several Bayesian approaches to dose allocations [14]. One decision rule is devised from the perspective of trial participants (individual-level ethics), and another decision rule is devised from the perspective of future patients (population-level ethics). However, a decision rule, which is devised from the population-level ethics, is not widely accepted in current practice [15]. Instead, there are some proposed decision rules, which compromise between the individual- and population-level perspectives [3, 16]. In this section, we discuss the two conflicting perspectives in Phase I clinical trials and a compromising method based on Bayesian decision theory.

Assume a dose-response relationship follows a logistic model

πx=eβ0+β1x1+eβ0+β1x,E18

where x is a dose in the logarithmic scale (base e) and π_x is the probability of observing an adverse event due to the toxicity of a new chemotherapy at dose x. The logarithmic transformation on the dose is to satisfy π_x → 0 as x→0. Let x→n=(x1,…,xn) denote a series of decisions for n patients (i.e., allocated doses) and y→n=(y1,…,yn) denote a series of observed responses, where y_i = 1 indicates an adverse event and y_i = 0 otherwise. Let L(β→,xn+1) denote a loss by allocating the next patient at x_n+1. Based on Bayesian Decision Theory, we want to find x_n+1 which minimizes the posterior mean of L(β→,xn+1). If we let A denote an action space, a set of all possible dose allocations for the next patients, the decision rule can be written as follows:

xn+1*=argminxn+1∈A E(L(β→,xn+1) | y→n) .E19

A choice of L has a substantial impact on the operating characteristics of a Phase I trial including (i) the degree of under- and over-dosing in trial, (ii) the observed number of adverse events at the end of a trial, and (iii) the quality of estimation at the end of a trial.

4.1. Parameter of interest: maximum tolerable dose

Let N denote an available sample size for a Phase I clinical trial. A typical sample size is N ≤ 30. Let γ denote a target risk level, the probability of an adverse event. In a cancer study, a typical target risk level γ is fixed between .15 and .35 depending on the severity of an adverse event. Then, the dose corresponding to γ is called a maximum tolerable dose (MTD) at level γ, and we denote it by θ_γ in the logarithmic scale. Under the logistic model in Eq. (18), it is defined as follows

θγ=log(γ1−γ)−β0β1 .E20

At the end of a trial (observing N responses), we estimate θ_γ by the posterior mean θ^γ,N=E(θγ|y→N) for future patients.

4.2. Prior density function: conditional mean priors

A consequence of sequential decisions heavily depends on a prior density function f(β→). In particular, the first decision x₁ must be made based on prior knowledge only because empirical evidence is not observed yet. In addition, the later decisions x₂, x₃,… and the final inference of θ_γ are substantially affected by f(β→) as a Phase I study is typically based on a small sample. In this regard, we want to carefully utilize researchers’ prior knowledge about β→, but it may be difficult to express their prior knowledge directly through f(β→). In this section, we discuss a method of eliciting prior knowledge, which is more tractable than prior elicitation directly on β→.

Suppose a researcher selects two arbitrarily doses, say x₋₁ < x₀. Then, the researcher may express their prior knowledge by two independent beta distributions

πxi=eβ0+β1xi1+eβ0+β1xi∼Beta(ai,bi) , j=−1,0.E21

Using the Jacobian transformation from (πx−1,πx0) to β→=(β0,β1), it can be shown that the prior density function of β→ is given by

f(β→)∝(x0−x−1)∏i=−10(eβ0+β1xi1+eβ0+β1xi)ai(11+eβ0+β1xi)bi.E22

It is known as conditional mean priors under the logistic model [7].

4.3. Posterior density function: conjugacy

For notational convenience, we let y_i = a_i and n_i = a_i + b_i for i = −1,0. By conjugacy, the posterior density function of β→ can be concisely written as follows

f(β→|y→n)∝eβ0s1+β1s2∏i=−1n(1+eβ0+β1xi),E23

where s1=∑i=−1nyi and s2=∑i=−1nxiyi. After observing n responses, the decision rule for the next patient is as follows

xn+1*=argminxn+1∈A ∫L(β→,xn+1) f(β→|y→n)dβ→.E24

4.4. Loss functions for individual- and population-level ethics

A loss function, which reflects the perspective of individual-level ethics, is as follows:

LI(β→,xn+1)=(xn+1−θγ)2.E25

This loss function is analogous to the original continual reassessment method proposed by O’Quigley et al. [17]. The square error loss attempts to treat a trial participant at θ_γ, and the expected square error loss is minimized by the posterior mean of θ_γ.

From the perspective of population-level ethics, Whitehead and Brunier proposed a loss function, which is equal to the asymptotic variance of the maximum likelihood estimator for θ_γ [18]. The Fisher expected information matrix with a sample of size n + 1 is given by

I(β→)=(∑i=1n+1τi∑i=1n+1τixi∑i=1n+1τixi∑i=1n+1τixi2),E26

where τi=πxi(1−πxi). Then, the loss function (the asymptotic variance) is given by

LP(β→,xn+1)=[∇h→(β→)]T [I(β→)]−1 [∇h→(β→)],E27

where

∇h→(β→)=(∂θγ∂β0∂θγ∂β1)=−1β1(1θγ)E28

is the gradient vector, the partial derivatives of θ_γ with respect to β₀ and β₁. Kim and Gillen decomposed the population-level loss function as follows

LP(β→,xn+1)=τn+1(xn+1−θγ)2+sn(0)[(θγ−μn)2+σn2][sn(0)sn(2)−sn(1)sn(1)]+sn(0)τn+1[(xn+1−μn)2+σn2],E29

where

sn(m)=∑i=1nτixim , m=0,1,2,μn=∑i=1nwixi,σn2=∑i=1nwixi2−(∑i=1nwixi)2E30

with the weight defined as wi=τi∑i=1nτj [3]. Eq. (29) has the following important remarks. In fact, LP(β→,xn+1) considers individual-level ethics by including LI(β→,xn+1)=(xn+1−θγ)2 in the numerator. By including (xn+1−μn)2 in the denominator, where μn=∑i=1nwixi, the population-level loss function reduces a loss by allocating the next patient further away from the weighted average of previously allocated doses (i.e., devised from information gain). In long run, LP(β→,xn+1) is devised from a compromise between individual- and population-level ethics, but the compromising process is rather too slow to be implemented in a small-sample Phase I clinical trial [3].

4.5. Loss function for compromising the two perspectives

Kim and Gillen proposed to accelerate the compromising process by modifying LP(β→,xn+1) of Eq. (29) as follows

LB,λ(β→,xn+1)=an(λ) τn+1(xn+1−θγ)2+sn(0)[(θγ−μn)2+σn2][sn(0)sn(2)−sn(1)]+sn(0)τn+1[(xn+1−μn)2+σn2] ,E31

where

an(λ)=(1+nN)λ(1+∑i=1nyiNγ)E32

is an accelerating factor [3]. It has two implications. First, the compromising process is accelerated toward the individual-level ethics as the trial proceeds (i.e., n increases). Second, the compromising process toward the individual-level ethics is accelerated at a faster rate when an adverse event is observed (i.e., ∑i=1nyi increases). The tuning parameter λ controls the rate of acceleration. It imposes more emphasis on population-level ethics as λ → 0 and more emphasis on individual-level ethics as λ → ∞. The choice of λ shall depend on the severity level of an adverse event.

4.6. Simulation

To study the operating characteristics of L_B,_λ with respect to λ, we assume the logistic model with β₀ = −3 and β₁ = .8 as a true dose-response relationship as shown in Figure 5 in the left panel. The target risk level is fixed at γ = .2, so the true MTD is given by θ_.2 = 2.02 in the logarithmic scale. We consider three different priors based on the conditional mean priors given in Eq. (22). For simplicity, we set a₋₁ = 1, b₋₁ = 3, a₀ = 3 and b₀ = 1 for all three priors. Then, we let x−1=−4 and x0=4 for Prior 1; x−1=0 and x0=8 for Prior 2; and x−1=4 and x0=12 for Prior 3. Figure 5 in the right panel shows an approximated f(θ_.2) for each prior. Prior 1 significantly underestimates the true θ.2=2.02 with prior mean E(θ.2)=−1.70, Prior 3 overestimates the truth with E(θ.2)=5.38, and Prior 2 has a prior estimate relatively close to the truth with E(θ.2)=1.40.

Figure 5.
The true dose-response relationship πx=eβ0+β1x1+eβ0+β1x with β₀ = −3 and β₁ = .8 (where x is the dose in the logarithmic scale) in the simulation (left panel) and the three prior distributions of θ_.2 approximated by kernel density (right panel).

Let N = 20 be a fixed sample size. Let Y_i = 1 denote an adverse event observed from the ith patient (Y_i = 0 otherwise), so ∑i=1NYi denotes the total number of adverse events observed at the end of a trial. The sum ∑i=1nYi is random from a trial to another trial, and we want ∑i=1nYi to behave like Binomial(20,.2) which is the case when we treat N = 20 to the true MTD θ_.2. Figure 6 shows three simulated trials under the loss function L_B,_λ with λ = 0,1,5. When λ = 0, the up-and-down scheme has a high degree of fluctuation in order to maximize information about θ_.2. When λ = 1, the up-and-down scheme is stabilized after the first few adverse events, and the stabilization occurs quickly when λ = 5 to treat trial participants near an estimated θ_.2.

Figure 6.
Three simulated trials using the loss function LB,λ with λ=0 (left), λ=1 (middle) and λ=5 (right) with a sample of size N = 20 and assumed parameter values β0=−3, β1=8 and θ.2=2.02.

Let θ^.2=E(θ.2|y→N), the posterior estimate of θ_.2 at the end of a trial, so πθ^.2 implies the true probability of an adverse event at the estimated MTD. We focus on the following criteria: (i) E(πθ^.2) which we desire to be close to γ=.2 for future patients, (ii) V(πθ^.2) which we desire to be as low as possible for future patients, (iii) E[(πθ^.2−.2)2] which we desire to be as low as possible for future patients, (iv) E(∑i=120Yi) which we desire to be close to Nγ=4 for trial participants and (v) P(3≤∑i=120Yi≤5) which we desire to be close to one for trial participants.

Table 1 summarizes simulation results of 10,000 replicates for each prior. For all three priors, we observe similar tendencies. First, E(πθ^.2) gets closer to θ = .2 as λ increases. Second, V(πθ^.2) decreases as λ decreases to zero. The average square distance between πθ^.2 and γ=.2 measures a balance between | E(πθ^.2)−.2 | and V(πθ^.2), and the superiority depends on priors. Lastly, as λ → 0, we have larger P(3≤∑i=120Yi≤5) and more robust E(∑i=120Yi) to prior elicitation.

Prior	λ	E(πθ^.2)	V(πθ^.2)	E[(πθ^.2−.2)2]	E(∑i=120Yi)	P(3≤∑i=120Yi≤5)
1	0	0.0964	0.0019	0.0126	2.4353	0.4318
	.5	0.1034	0.0024	0.0118	2.0997	0.2298
	1	0.1082	0.0028	0.0113	1.8969	0.1714
	2	0.1100	0.0031	0.0112	1.6929	0.1211
	5	0.1157	0.0035	0.0106	1.3128	0.0596
2	0	0.1665	0.0054	0.0065	4.1217	0.9889
	.5	0.1705	0.0056	0.0065	3.9598	0.9877
	1	0.1727	0.0060	0.0068	3.9025	0.9670
	2	0.1751	0.0066	0.0072	3.8707	0.9291
	5	0.1763	0.0067	0.0073	3.8442	0.9068
3	0	0.2743	0.0048	0.0103	6.1875	0.1600
	.5	0.2673	0.0048	0.0093	6.3954	0.1430
	1	0.2606	0.0046	0.0083	6.6194	0.1165
	2	0.2562	0.0045	0.0077	6.8035	0.1020
	5	0.2499	0.0044	0.0068	7.0274	0.0760

Table 1.

Simulation results of 10,000 replicates for λ = 0, .5, 1, 2, 5 and each prior.

In summary, when we emphasize more on population-level ethics, we have a smaller variance in the estimation for future patients (with a greater absolute bias, potentially due to Jensen’s Inequality), and the distribution of ∑i=1nYi becomes more robust to prior elicitations. When we emphasize more on individual-level ethics, we have a larger variance in the estimation, and the distribution of ∑i=1nYi becomes more sensitive to prior elicitations.

5. Consensus prior

In Bayesian inference, researchers are able to utilize information, which is independent of observed data. It allows researchers to incorporate any form of information, such as one’s experience and existing literature, which may be particularly useful in a small-sample study. On the other hand, we concern subjectivity and prior sensitivity in sparse data. Furthermore, it is possible to have disagreement among multiple researchers’ prior elicitations about a parameter θ.

Suppose there are K researchers with their own prior density functions, say f(θ|Qk) for k=1,…,K, and they have the same likelihood function f(y→|θ). Each prior elicitation leads to a unique Bayes estimator

θ^k=E(θ|y→,Qk)=∫θ f(θ|y→,Qk)dθ ,E33

where f(θ|y→,Qk)∝f(y→|θ) f(θ|Qk) is the posterior density function of θ given data y→ and the k^th prior elicitation Q_k. For posterior estimation, one reasonable approach to compromise is a weighted average ∑k=1Kwkθ^k, where w_k > 0 for k = 1,…,K and ∑k=1Kwk=1. In this section, we discuss two different weighting methods. The first method is to fix w_k before observing data (referred to as prior weighting scheme). The second method is to determine wk(y→) after observing data y→ so that wk(y→) increases when the k^th prior elicitation Q_k is better supported by the observed data y→ (referred to as posterior weighting scheme) [5].

For a prior weighting scheme, we denote wk=P(Qk) which quantifies the credibility of the k^th prior elicitation. For a posterior weighting scheme, we consider

where f(y→|Qk)=∫f(y→|θ)f(θ|Qk)dθ is the marginal likelihood from the k^th prior elicitation. This formulation is similar to the BMA method discussed in Section 3. It can be shown that ∑k=1Kwk(y→) θ^k is the Bayes estimator (the posterior mean of θ) when a consensus prior f(θ)=∑k=1Kwk f(θ|Qk) is used with wk=P(Qk) [5].

Samaniego discussed self-consistency when compromised inference is used through the prior weighting scheme ∑k=1Kwkθ^k [4]. Let θ denote a parameter of interest and

E(θ)=∫ θ f(θ)dθ=θ*E35

be the prior expectation, the mean of the prior density function f(θ). Let θ˜ denote a sufficient statistic, which serves as an unbiased estimator for θ. When we satisfy E(θ|θ˜=θ*)=θ*, it is called self-consistency [4].

Self-consistency can be achieved under simple models. For example, let Y→=(Y1,…,Yn) be a random sample, where Yi∼Bernoulli(θ), and assume θ∼Beta(a,b) for prior. It can be shown that the maximum likelihood estimator θ˜=1n∑i=1nYi is a sufficient statistic and an unbiased estimator for θ. The posterior mean is a weighted average between θ^* and θ˜ as follows

E(θ|θ˜=θ*)=c θ*+(1−c) θ˜ ,E36

where c=a+ba+b+n. If we observe θ˜=θ*, we can achieve the self-consistency because E(θ|θ^=θ*)=θ*. In words, when prior estimate and maximum likelihood estimate are identical, the posterior estimate must be consistent with the prior estimate and the maximum likelihood estimate. The self-consistency can be also achieved in the prior weighting scheme under certain conditions as illustrated in the following example.

5.1. Binomial experiment

Let Yi∼Bernoulli(π) for i=1,…,n and assume Y1,…,Yn are independent. Suppose the k^th researcher specifies the prior distribution π|Qk∼Beta(ak,bk) for k=1,…,K. For the prior weighting scheme, let wk=P(Qk), the prior probability for the k^th prior elicitation (fixed before observing data). Since E(π|Qk)=akak+bk and the expectation E (⋅) is a linear operator, the average of “consensus prior” is

E(π)=∫01πf(π)dπ=∫01π(∑k=1Kf(π|Qk)P(Qk))dπ=∑k=1Kwk(∫01πf(π|Qk)dπ)=∑k=1KwkE(π|Qk) .E37

Let E(π)=π* and suppose the K researchers observed the consistent result π˜=1n∑i=1nYi=π*. The individual-specific Bayes estimator is as follows

π^k=E(π|π˜=π*,Qk)=ckE(π|Qk)+(1−ck)π* ,E38

for the k^th researcher, where ck=ak+bkak+bk+n. The compromised Bayes estimator is as follows

E(π|π˜=π*)=∑k=1Kwk π^k=∑k=1Kwk [ck E(π|Qk)+(1−ck) π*] .E39

If we allow individual-specific prior elicitation a_k and b_k with the restriction a_k + b_k = m for all K researchers (i.e., the same strength of prior elicitation), the value ck=mm+n is constant over all researcher. By letting the constant c_k = c,

E(π|π˜=π*)=c(∑k=1Kwk E(π|Qk))+(1−c) π*(∑k=1Kwk)=c E(π)+(1−c) π*=π* ,E40

so the self-consistency is satisfied.

For the posterior weighting scheme given data y→=(y1,…,yn), the marginal likelihood from the k^th prior elicitation is as follows

f(y→|Qk)=∫01 f(y→|π) f(π|Qk)dπ=Γ(ak+bk)Γ(ak) Γ(bk) Γ(ak+s) Γ(bk+n−s)Γ(ak+bk+n) ,E41

where s=∑i=1nyi is an observed sufficient statistic. Then, the posterior weighting scheme becomes ∑k=1Kwk(y→)π^k with

wk(y→)=wk f(y→|Qk)∑j=1Kwjf(y→|Qj),π^k=ak+sak+bk+n.E42

If we desire an equal strength from each researcher’s prior elicitation, we may fix ak+bk=m and wk=1K. In the posterior weighting scheme, it is difficult to achieve the self-consistency.

Whether self-consistency is satisfied, the practical concern is the quality of estimation such as bias, variance and mean square error. Assuming K = 2 researchers have disagreeing prior knowledge and a sample of size n = 10, let us consider three cases. Suppose two researchers express relatively mild disagreement as (a1,b1)=(1,3) and (a2,b2)=(3,1) in Case 1, relatively strong disagreement as (a1,b1)=(2,6) and (a2,b2)=(6,2) in Case 2, and even stronger disagreement as (a1,b1)=(3,9) and (a2,b2)=(9,3) in Case 3. For each case, Figure 7 provides the relative bias, variance and mean square error (MSE) for comparing the posterior weighting scheme ∑k=13wk(y→)π^k to the prior weighting scheme ∑k=13wkπ^k. When a relative MSE is smaller than one, it implies a smaller MSE for the posterior weighting scheme. As the true value of π is well between the two prior guesses E(π|Q1)=.25 and E(π|Q2)=.75, the posterior weighting scheme shows a greater MSE due to greater variance. When the true value of π deviates away from either prior guess, the posterior weighting schemes show a smaller MSE due to smaller bias. The tendency is stronger when the two disagreeing prior elicitations are stronger (i.e., stronger prior disagreement). The bottom line is a clear bias-variance tradeoff when we compare the two weighting schemes. ∑k=13wk(y→) π^k is able to reduce bias when there is strong discrepancy between “consensus prior” and data, but it has larger variance than ∑k=13wk π^k because wk(y→) depends on random data.

Figure 7.
Comparing prior and posterior weighting schemes for different degrees of disagreements.

5.2. Applications to Phase I trials under logistic regression model

In this section, we apply the prior weighting scheme and the posterior weighting scheme to Phase I clinical trials under the logistic regression model. We consider the three priors considered in Section 4.6. We denote Prior 1, 2 and 3 by Q₁, Q₂ and Q₃, respectively. The three priors had the same hyper-parameters a−1,k=1, b−1,k=3, a0,k=3, b0,k=1, but they were different by x−1,k=−4,0,4 and x0,k=4,8,12 for k=1,2,3, respectively. By the use of the conditional mean prior in Eq. (22), the prior density function of β→ for prior Q_k is given by

f(β→|Qk)∝(x0,k−x−1,k) ∏i=−10(eβ0+β1xi1+eβ0+β1xi)ai,k(11+eβ0+β1xi)bi,k.E43

The prior means were E(θ.2|Q1)=−1.70, E(θ.2|Q2)=1.40 and E(θ.2|Q3)=5.38 for Priors 1, 2 and 3, respectively.

For simulation study, we consider three simulation scenarios with sample size N = 20. In Scenario 1, we assume β₀ = −5 and β₁ = .6, so the true MTD is θ.2=6.02, which deviates significantly from all of the three prior means. In Scenario 2, we assume β₀ = −3 and β₁ = .8 as in Section 4.6, so θ.2=2.02 is well surrounded by the three prior means. In Scenario 3, we assume β₀ = −1 and β₁ = 1.2, so θ.2=−.32 is close to the most conservative prior mean E(θ.2|Q1)=−1.70. We consider the loss function LI(β→,xn+1)=(xn+1−θ.2)2 discussed in Section 4.4, which focuses on individual-level ethics. We use the uniform prior probabilities wk=P(Qk)=1/3 for k=1,2,3 for implementing both prior and posterior weighting scheme.

Table 2 provides the simulation results of 10,000 replicates for each scenario under the prior weighting scheme and under the posterior weighting scheme. Since the posterior weighting scheme adaptively updates wk(y→) based on empirical evidence, it can reduce bias, but it has greater variance in the estimation of θ₂. As a consequence, when the true MTD was close to one extreme prior estimate (Scenarios 1 and 3), the use of the posterior weighting scheme yields a smaller E[(πθ^.2−.2)2], E(∑i=120Yi) closer to Nγ=4, and P(3≤∑i=120Yi≤5) closer to one when compared to the use of the prior weighting scheme. In Scenario 3, the average number of adverse events was 4.6 for the posterior weighting scheme, but it was as high as 7.1 in the prior weighting scheme. On the other hand, when the true MTD was well surrounded by the three prior estimates (Scenario 2), the use of the prior weighting scheme yielded more plausible results.

Scenario	Method	E(πθ^.2)	V(πθ^.2)	E[(πθ^.2−.2)2]	E(∑i=120Yi)	P(3≤∑i=120Yi≤5)
1	Prior weighting	0.0967	0.0014	0.0121	1.1090	0.0398
1	Posterior weighting	0.1853	0.0073	0.0075	2.7304	0.5900
2	Prior weighting	0.2018	0.0059	0.0059	3.8432	0.9042
2	Posterior weighting	0.2048	0.0110	0.0110	4.2848	0.8920
3	Prior weighting	0.2929	0.0071	0.0157	7.1090	0.0568
3	Posterior weighting	0.1951	0.0133	0.0133	4.6036	0.8646

Table 2.

Simulation results of 10,000 replicates for the prior and posterior weighting schemes.

The simulation results are analogous to the simpler model in Section 5.1. When the true parameter is not well surrounded by prior guesses, the posterior weighting scheme is preferable with respect to mean square error due to smaller bias. When the true parameter is well surrounded by prior guesses, the prior weighting scheme is beneficial with respect to mean square error due to smaller variance.

As a final comment, we shall be careful about the strength of individual prior elicitations when we implement the posterior weighting scheme in Phase I clinical trials. The strength of individual prior elicitations depends on (i) the hyper-parameters ai,k and bi,k, (ii) the prior weight wk=P(Qk) as well as (iii) the distance between the two arbitrarily chosen doses x0,k−x1,k. It can be seen through the expression

f(β→)=∑k=1Kf(β→|Qk) P(Qk)∝∑k=1Kwk (x0,k−x−1,k) ∏i=−10(eβ0+β1xi1+eβ0+β1xi)ai,k(11+eβ0+β1xi)bi,k.E44

When researchers determine consensus prior elicitations before initiating a trial, the multiplicative term wk (x0,k−x−1,k) shall be carefully considered together with the hyper-parameters ai,k and bi,k [5].

6. Concluding remarks

In this chapter, we have discussed Bayesian inference with averaging, balancing, and compromising in sparse data. In the cancer risk assessment, we have observed that low-dose inference can be very sensitive to an assumed parametric model (Section 3.1). In this case, the Bayesian model averaging can be a useful method. It provides robustness by using multiple models and posterior model probabilities to account for model uncertainty. In the application of Bayesian decision theory to Phase I clinical trials, we have observed that the sequential sampling scheme heavily depends on a loss function. A loss function, which is devised from individual-level ethics, focuses on the benefit of trial participants, and a loss function, which is devised from population-level ethics, focuses on the benefit of future patients. It is possible to balance between the two conflicting perspectives, and we can adjust a focusing point by the tuning parameter (Sections 4.5 and 4.6). Finally, the use of a weighted posterior estimate can be a compromising method when two or more researchers have prior disagreement. We have compared the prior and posterior weighting schemes in a small-sample binomial problem (Section 5.1) and in a small-sample Phase I clinical trial (Section 5.2). The prior weighting scheme (data-independent weights) outperforms when prior estimates surround the truth, and the posterior weighting scheme (data-dependent weights) outperforms when the truth is not well surrounded by prior estimates. One method does not outperform the other method for all parameter values, so it is important to be aware of their bias-variance tradeoff.

References

1. Raftery AE, Madigan D, Hoeting JA. Bayesian model averaging for linear regression models. Journal of the American Statistical Association. 1997;92:171-191
2. Whitehead J, Williamson D. Bayesian decision procedures based on logistic regression models for dose-finding studies. Journal of Biopharmaceutical Statistics. 1998;8:445-467
3. Kim SB, Gillen DL. A Bayesian adaptive dose-finding algorithm for balancing individual- and population-level ethics in Phase I clinical trials, Sequential Analysis. 2016;35(4):423-439
4. Samaniego FJ. A Comparison of the Bayesian and Frequentist Approaches to Estimation. New York: Springer; 2010
5. Kim SB, Gillen DL. An alternative perspective on consensus priors with applications to Phase I clinical trials. Jacobs Journal of Biostatistics. 2016;1(1):006
6. Shao K, Small MJ. Potential uncertainty reduction in model-averaged benchmark dose estimates informed by an additional dose study. Risk Analysis. 2011;31:1156-1175
7. Bedrick EJ, Christensen R, Johnson W. A new perspective on priors for generalized linear models. Journal of the American Statistical Association. 1996;91(436):1450-1460
8. International Agency for Research on Cancer. IARC Monographs on the Evaluation of Carcinogenic Risks to Humans. Vol. 69. Lyon: IARC; 1997. ISBN 92-832-1269-X
9. Kociba RJ, Keyes DG, Beyer JE, Carreon RM, Wade CE, Dit-tenber DA. et al. Results of a two-year chronic toxicity andoncogenicity study of 2,3,7,8-tetrachlorodibenzo-p-dioxin in rats. Toxicology and Applied Pharmacology. 1978;46(2):279-303
10. Hoeting JA, Madigan D, Raftery AE, Volinsky CT. Bayesian model averaging: a tutorial. Statistical Science. 1999;14(4):382-417
11. Simmons SJ, Chen C, Li X, Wang Y, Piegorsch WW, Fang Q, Hu B, Dunn GE. Bayesian model averaging for benchmark dose estimation. Environmental and Ecological Statistics. 2015;22(1):5-16
12. Crump KS. A new method for determining allowable daily intakes. Fundamental and Applied Toxicology. 1984;4:854-871
13. EPA (US Environmental Protection Agency). Benchmark dose technical guidance, EPA/100/R-12/001, Risk Assessment Forum. Washington, DC: U.S. Environmental Protection Agency; 2012
14. Whitehead J, Williamson D. Bayesian decision procedures based on logistic regression models for dose-finding studies. Journal of Biopharmaceutical Statistics. 1998;8:445-467
15. O’Quigley J, Conaway M. Continual reassessment and related dose-finding designs. Statistical Science. 2010;25:202-216
16. Bartroff J, Lai TL. Incorporating individual and collective ethics into Phase I cancer trial designs. Biometrics. 2011;67:596-603
17. O’Quigley J, Pepe M, Fisher L. Continual reassessment method: A practical design for Phase 1 clinical trials in cancer. Biometrics. 1990;46:33-48
18. Whitehead J, Brunier H. Continual reassessment method: Bayesian decision procedures for dose determining experiments. Statistics in Medicine. 1995;14:885-893

[1] 1. Raftery AE, Madigan D, Hoeting JA. Bayesian model averaging for linear regression models. Journal of the American Statistical Association. 1997;92:171-191

[2] 2. Whitehead J, Williamson D. Bayesian decision procedures based on logistic regression models for dose-finding studies. Journal of Biopharmaceutical Statistics. 1998;8:445-467

[3] 3. Kim SB, Gillen DL. A Bayesian adaptive dose-finding algorithm for balancing individual- and population-level ethics in Phase I clinical trials, Sequential Analysis. 2016;35(4):423-439

[4] 4. Samaniego FJ. A Comparison of the Bayesian and Frequentist Approaches to Estimation. New York: Springer; 2010

[5] 5. Kim SB, Gillen DL. An alternative perspective on consensus priors with applications to Phase I clinical trials. Jacobs Journal of Biostatistics. 2016;1(1):006

[6] 6. Shao K, Small MJ. Potential uncertainty reduction in model-averaged benchmark dose estimates informed by an additional dose study. Risk Analysis. 2011;31:1156-1175

[7] 7. Bedrick EJ, Christensen R, Johnson W. A new perspective on priors for generalized linear models. Journal of the American Statistical Association. 1996;91(436):1450-1460

[8] 8. International Agency for Research on Cancer. IARC Monographs on the Evaluation of Carcinogenic Risks to Humans. Vol. 69. Lyon: IARC; 1997. ISBN 92-832-1269-X

[9] 9. Kociba RJ, Keyes DG, Beyer JE, Carreon RM, Wade CE, Dit-tenber DA. et al. Results of a two-year chronic toxicity andoncogenicity study of 2,3,7,8-tetrachlorodibenzo-p-dioxin in rats. Toxicology and Applied Pharmacology. 1978;46(2):279-303

[10] 10. Hoeting JA, Madigan D, Raftery AE, Volinsky CT. Bayesian model averaging: a tutorial. Statistical Science. 1999;14(4):382-417

[11] 11. Simmons SJ, Chen C, Li X, Wang Y, Piegorsch WW, Fang Q, Hu B, Dunn GE. Bayesian model averaging for benchmark dose estimation. Environmental and Ecological Statistics. 2015;22(1):5-16

[12] 12. Crump KS. A new method for determining allowable daily intakes. Fundamental and Applied Toxicology. 1984;4:854-871

[13] 13. EPA (US Environmental Protection Agency). Benchmark dose technical guidance, EPA/100/R-12/001, Risk Assessment Forum. Washington, DC: U.S. Environmental Protection Agency; 2012

[14] 14. Whitehead J, Williamson D. Bayesian decision procedures based on logistic regression models for dose-finding studies. Journal of Biopharmaceutical Statistics. 1998;8:445-467

[15] 15. O’Quigley J, Conaway M. Continual reassessment and related dose-finding designs. Statistical Science. 2010;25:202-216

[16] 16. Bartroff J, Lai TL. Incorporating individual and collective ethics into Phase I cancer trial designs. Biometrics. 2011;67:596-603

[17] 17. O’Quigley J, Pepe M, Fisher L. Continual reassessment method: A practical design for Phase 1 clinical trials in cancer. Biometrics. 1990;46:33-48

[18] 18. Whitehead J, Brunier H. Continual reassessment method: Bayesian decision procedures for dose determining experiments. Statistics in Medicine. 1995;14:885-893

Bayesian Model Averaging and Compromising in Dose-Response Studies

Bayesian Inference

Abstract

Keywords

Author Information

Steven B. Kim*

1. Introduction

2. Bayesian inference

2.1. Example

Figure 1.

2.2. Example

Figure 2.

Figure 3.

3. Bayesian model averaging

3.1. Example

Figure 4.

4. Application of Bayesian decision theory to Phase I trials

4.1. Parameter of interest: maximum tolerable dose

4.2. Prior density function: conditional mean priors

4.3. Posterior density function: conjugacy

4.4. Loss functions for individual- and population-level ethics

4.5. Loss function for compromising the two perspectives

4.6. Simulation

Figure 5.

Figure 6.

Table 1.

5. Consensus prior

5.1. Binomial experiment

Figure 7.

5.2. Applications to Phase I trials under logistic regression model

Table 2.

6. Concluding remarks

References

Two Examples of Bayesian Evidence Synthesis with the Hierarchical Meta-Regression Approach

Your cart

Bayesian Model Averaging and Compromising in Dose-Response Studies

Bayesian Inference

Abstract

Keywords

Author Information

Steven B. Kim*

1. Introduction

2. Bayesian inference

2.1. Example

Figure 1.

2.2. Example

Figure 2.

Figure 3.

3. Bayesian model averaging

3.1. Example

Figure 4.

4. Application of Bayesian decision theory to Phase I trials

4.1. Parameter of interest: maximum tolerable dose

4.2. Prior density function: conditional mean priors

4.3. Posterior density function: conjugacy

4.4. Loss functions for individual- and population-level ethics

4.5. Loss function for compromising the two perspectives

4.6. Simulation

Figure 5.

Figure 6.

Table 1.

5. Consensus prior

5.1. Binomial experiment

Figure 7.

5.2. Applications to Phase I trials under logistic regression model

Table 2.

6. Concluding remarks

References

Continue reading from the same book

Bayesian Inference

Your cart