Open access peer-reviewed chapter

Bayesian vs Frequentist Power Functions to Determine the Optimal Sample Size: Testing One Sample Binomial Proportion Using Exact Methods

Written By

Valeria Sambucini

Submitted: 04 January 2017 Reviewed: 20 June 2017 Published: 02 November 2017

DOI: 10.5772/intechopen.70168

From the Edited Volume

Bayesian Inference

Edited by Javier Prieto Tejedor


Abstract

In order to avoid the drawbacks of sample size determination procedures based on classical power analysis, it is possible to define analogous criteria based on ‘hybrid classical-Bayesian’ or ‘fully Bayesian’ approaches. We review these conditional and predictive procedures and provide an application, when the focus is on a binomial model and the analysis is performed through exact methods. The distinction between analysis and design prior distributions is essential for the practical implementation of the criteria: some guidelines for choosing these priors are discussed, and their impact on the required sample size is examined.

Keywords

  • analysis and design prior distributions
  • binomial proportion
  • Bayesian power functions
  • conditional and predictive approach
  • sample size determination
  • saw-toothed behaviour of power

1. Introduction

The calculation of an adequate sample size is a crucial aspect of the design of experiments. Researchers need to select the number of participants required to ensure ethically and scientifically valid results. If samples are too large, time and resources are wasted, often for minimal gain. On the other hand, samples that are too small may lead to inaccurate results. Therefore, sample size determination (SSD) plays a very important role in the design of studies in many fields, especially in the context of clinical trials where, in addition to economic considerations, investigators have to deal with important ethical implications.

SSD methods, when the focus is on hypothesis testing, are typically related to the concept of a power function. Let us denote the parameter of interest by θ and assume that we are interested in testing H0 : θ ∈ Θ0 versus H1 : θ ∈ Θ1, where Θ0 and Θ1 form a partition of the parameter space Θ. The most widely used frequentist SSD criterion consists in choosing the minimal sample size that guarantees a given power, for a fixed type I error rate, under the assumption that θ is equal to a suitable design value, θD ∈ Θ1. In practice, the idea is to ensure a sufficiently large probability of obtaining a statistically significant result (i.e. of rejecting the null hypothesis) when the true value of θ belongs to the alternative hypothesis and is equal to θD. Many textbooks (see [1–3], among others) provide sample size formulas derived using this procedure for commonly occurring situations, covering different hypothesis testing problems and both categorical and quantitative data.

In the frequentist criterion described above, a crucial role is played by the design value that the trial is designed to detect with high probability, whose uncertainty is not accounted for. In fact, this local optimality is one of the most criticized aspects of the method. Moreover, this frequentist procedure does not allow one to take into account pre-experimental information about θ, for instance information available from previous studies. By adopting a ‘hybrid classical-Bayesian approach’ or a ‘fully Bayesian approach’, it is possible to define analogous criteria for sample size selection that allow the researcher to avoid the problem of local optimality and/or to introduce prior information into the SSD process.

In this chapter, we illustrate how to construct frequentist and Bayesian power functions, based on both conditional and predictive approaches, and how to use them to determine the optimal sample size. An essential element of the method is the use of two different prior distributions for the parameter of interest, which play two distinct roles in the criteria. The importance of this distinction in sample size determination problems has been stressed by several authors (see, for instance, [4–9]). The rest of the chapter is organized as follows: in Section 2, we review the frequentist conditional and predictive procedures based on power analysis to determine the optimal sample size. Section 3 provides a description of analogous methods based on Bayesian power functions. Then, in Section 4, we formalize different SSD criteria that depend on the shape of the power curves as a function of the sample size and, as a consequence, on the nature of the data distributions. Furthermore, in Section 5, we illustrate an application of the frequentist and Bayesian SSD procedures when the parameter of interest is a single binomial proportion. Finally, Section 6 contains a brief concluding discussion.


2. Frequentist power functions and SSD methods

Let us consider a parameter of interest θ and assume that we are interested in testing H0 : θ ∈ Θ0 versus H1 : θ ∈ Θ1, where Θ0 and Θ1 form a partition of the parameter space Θ. Moreover, let Yn be the random result of the experiment, typically a suitable statistic used to summarize the data relevant to the parameter θ; the notation highlights that Yn depends on the sample size n. Finally, we denote by fn(yn|θ) the sampling distribution of Yn.

The power function is defined as the probability of obtaining a statistically significant result, that is, a result that leads to rejection of the null hypothesis H0, when the actual value of the parameter is θ. In a frequentist approach, the investigator is first required to specify a fixed level α for the type I error probability that one is willing to tolerate. This significance level is typically set equal to 0.05 and is used to obtain the rejection region of H0, denoted by R_{H0}, which represents the subset of outcomes that, if observed, lead to the rejection of H0. Therefore, given a frequentist test of size α, Yn is considered a statistically significant result if it belongs to R_{H0}. Consequently, in general terms, the power function is defined as

$$\eta_n(\theta) = P_{\theta}\left(Y_n \in \mathcal{R}_{H_0}\right), \tag{1}$$

where Pθ is the probability measure associated with a suitable distribution of Yn.

In order to exploit the frequentist power function in Eq. (1) for sample size determination purposes, investigators can adopt two different approaches: the conditional and the predictive one. The conditional approach is certainly the most widely known and used when performing sample size calculations based on pre-study power analysis. It requires the specification of a suitable design value for θ, denoted by θD, that belongs to the alternative hypothesis and is considered a relevant value that is important to detect. By assuming that the true value of the parameter is equal to θD, we obtain the frequentist conditional power, given by

$$\eta_F^C(n, \theta_D) = P_{f_n(\cdot|\theta_D)}\left(Y_n \in \mathcal{R}_{H_0}\right), \tag{2}$$

where P_{fn(·|θD)} is the probability measure associated with the sampling distribution of Yn when θ = θD. Since θD has to be selected within the subspace Θ1, the frequentist conditional power can be interpreted as the probability of correctly rejecting H0 when the true value of the parameter belongs to the alternative hypothesis and is exactly equal to θD. The sample size determination criterion then consists in choosing the minimal sample size that guarantees a desired level for η_F^C(n, θD). In practice, the idea is to ensure a sufficiently large probability of rejecting H0 when the true θ belongs to the alternative hypothesis and, more specifically, is equal to θD ∈ Θ1.

The SSD procedure based on the power function in Eq. (2) is strongly affected by the choice of θD. In order to account for uncertainty in the specification of the design value and to avoid local optimality, it is natural to incorporate Bayesian concepts into the sample size determination process. By adopting a ‘hybrid classical-Bayesian approach’, it is possible to model uncertainty on the appropriate design value for θ through the elicitation of a prior distribution, denoted by πD(θ) and called design prior. This prior is used to compute the marginal or prior predictive distribution of the data by averaging the sampling distribution as follows:

$$m_n^D(y_n) = \int_{\Theta} f_n(y_n|\theta)\, \pi_D(\theta)\, d\theta. \tag{3}$$

Therefore, in order for m_n^D(yn) to be well defined, the design prior cannot be an improper non-informative distribution. In any case, the elicitation of a non-informative πD(θ) would not be a reasonable choice. In fact, the design prior is used to introduce uncertainty about the suitable design value for θ that we need to specify when using the SSD procedure previously described, and the possible guessed values have to belong to the subspace Θ1. Thus, πD(θ) serves to describe a design scenario of interest that supports values of θ under the alternative hypothesis: it has to be an informative distribution that assigns a negligible probability to values of θ under the null hypothesis.

Once the design prior has been elicited, the idea is to average the conditional frequentist power with respect to it by computing

$$\int_{\Theta} \eta_F^C(n, \theta)\, \pi_D(\theta)\, d\theta = \int_{\Theta} \left[ \int_{\mathcal{R}_{H_0}} f_n(y_n|\theta)\, dy_n \right] \pi_D(\theta)\, d\theta = \int_{\mathcal{R}_{H_0}} m_n^D(y_n)\, dy_n. \tag{4}$$

This leads to the frequentist predictive power that is given by

$$\eta_F^P(n, \pi_D) = P_{m_n^D}\left(Y_n \in \mathcal{R}_{H_0}\right), \tag{5}$$

where P_{m_n^D} is the probability measure associated with the marginal distribution of Yn obtained using πD(θ). The power function in Eq. (5) expresses the probability of making a correct decision by rejecting H0 when θ actually belongs to the subspace defined under the alternative hypothesis, where we can assume that it is distributed according to the design prior. Therefore, the corresponding SSD criterion requires selecting the minimum n that achieves a desired level for η_F^P(n, πD).

Note that if πD(θ) is chosen as a point mass distribution centred on θD, no uncertainty on the relevant design values is taken into account and the marginal distribution coincides with the sampling one. In this case, there is no difference between the frequentist power functions obtained under the conditional and the predictive approach.


3. Bayesian power functions and SSD methods

In the previous section, we have described how to select the sample size through power functions by assuming that a frequentist analysis will be performed at the end of the study. In both the frequentist conditional and predictive powers, the decision about the two hypotheses is based on the rejection region of H0 of a classical test of fixed size α. A major limitation of the fully classical and the hybrid classical-Bayesian approaches previously introduced is their inability to incorporate past experience and information about the unknown parameter, as well as expert prior opinions. The use of a ‘fully Bayesian approach’ allows one to take such knowledge and beliefs about θ into account when planning the study.

It is well known that the information available before starting the study can be expressed by introducing a prior distribution for θ, πA(θ), which in this context is typically called the analysis prior to distinguish it from the design prior. It is worth pointing out that πA(θ) is the usual prior distribution employed in a Bayesian analysis: it formalizes pre-experimental knowledge, often represented by historical data, together with subjective expert opinions, and is used to compute the posterior distribution of the parameter, π_n^A(θ|yn) ∝ fn(yn|θ) πA(θ). Moreover, it is often chosen as a non-informative distribution to avoid the inclusion of external evidence in the posterior inference.

Let us recall that, in general terms, a power function is defined as the probability of obtaining a significant result, i.e. a result that leads to the rejection of the null hypothesis. To exploit this function as a tool for determining the optimal sample size, we need to compute it under the assumption that the alternative hypothesis is true. In practice, we have to consider a design scenario where the true θ belongs to Θ1, so that the power function represents the probability of making a correct decision. Therefore, to define power functions from a Bayesian point of view, we first need to decide when the null hypothesis is rejected in a Bayesian setting, that is, we have to establish the condition for ‘Bayesian significance’. Following Spiegelhalter et al. [10], we define the result Yn as ‘significant from a Bayesian perspective’ if the corresponding posterior probability that θ belongs to the alternative hypothesis is sufficiently large, that is, if

$$P_{\pi_n^A(\cdot|Y_n)}\left(\theta \in \Theta_1\right) > \lambda, \tag{6}$$

where P_{π_n^A(·|Yn)} denotes the probability measure associated with the posterior distribution of θ computed using the analysis prior, and λ ∈ (0, 1) represents a suitably specified threshold. Let us stress that, since we are dealing with a pre-experimental problem, the posterior probability in Eq. (6) is a random variable, depending on a random result that has not yet been observed. In order to construct Bayesian power functions, we need to compute the probability of obtaining a Bayesian significant result. Similarly to the frequentist case, we can use two alternative distributions of the data, according to the approach we decide to adopt.

The conditional approach realizes the pre-experimental assumption that the alternative hypothesis is true, by fixing a design value θD ∈ Θ1, which is considered relevant and important to detect. Then the sampling distribution of Yn conditional on θD, fn(⋅|θD), is used to compute the probability of getting Bayesian significance. In this way, we obtain the Bayesian conditional power

$$\eta_B^C(n, \theta_D) = P_{f_n(\cdot|\theta_D)}\left( P_{\pi_n^A(\cdot|Y_n)}(\theta \in \Theta_1) > \lambda \right). \tag{7}$$

The predictive approach, instead, aims at avoiding the problem of local optimality in the SSD procedure by introducing a design prior for θ, πD(θ), that accounts for the uncertainty involved in the choice of the design value θD. Then, the prior predictive distribution of Yn, m_n^D, is computed and used in place of the sampling distribution conditional on θD. This leads to the Bayesian predictive power

$$\eta_B^P(n, \pi_D) = P_{m_n^D}\left( P_{\pi_n^A(\cdot|Y_n)}(\theta \in \Theta_1) > \lambda \right). \tag{8}$$

Both the power functions in Eqs. (7) and (8) express the probability of rejecting H0 under a Bayesian framework, assuming that the true θ actually belongs to H1. In fact, we assume either that θ is equal to a specific value under the alternative hypothesis (conditional approach) or that θ lies in the subspace defined under the alternative hypothesis, where we can assume that it is distributed according to the design prior (predictive approach). The sample size determination criteria, therefore, require selecting the minimal sample size that ensures a sufficiently large level for η_B^C(n, θD) or η_B^P(n, πD). Moreover, note that, when the specified design prior distribution assigns the whole probability mass to θD, the two Bayesian power functions coincide, leading to the same optimal sample size.


4. SSD criteria according to the nature of the distribution of Yn

In this section, we explicitly formalize the SSD criteria based on frequentist and Bayesian power functions, according to the nature of the random result Yn. When Yn has a continuous distribution, each of the power functions previously introduced shows a monotonically increasing behaviour as a function of n. In this case, the SSD criteria sensibly select the minimum sample size to guarantee the desired level of power, that is

$$n_F^C = \min\left\{ n \in \mathbb{N} : \eta_F^C(n, \theta_D) > \gamma \right\}, \tag{9}$$
$$n_F^P = \min\left\{ n \in \mathbb{N} : \eta_F^P(n, \pi_D) > \gamma \right\}, \tag{10}$$
$$n_B^C = \min\left\{ n \in \mathbb{N} : \eta_B^C(n, \theta_D) > \gamma \right\}, \tag{11}$$
$$n_B^P = \min\left\{ n \in \mathbb{N} : \eta_B^P(n, \pi_D) > \gamma \right\}, \tag{12}$$

for a conveniently chosen threshold γ ∈ (0, 1]. Let us remark that, in the notation for the optimal sample sizes, as well as in the notation for the power functions, the subscripts specify the approach (frequentist or Bayesian) adopted at the analysis stage. The superscripts, instead, indicate the approach (conditional or predictive) used to represent the design expectations. An application of the criteria formalized above is provided by Gubbiotti and De Santis [11], where it is assumed that the statistic Yn follows a normal distribution with mean equal to θ and known variance.
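For intuition about the monotone case, the following is a small R sketch (our own toy example, loosely inspired by the normal setting of [11]; the function name and numerical values are purely illustrative): it computes the conditional power of a one-sided z-test for a normal mean with known variance and applies criterion (9) directly.

```r
# Toy illustration (not from the chapter): one-sided z-test of
# H0: theta <= 0 vs H1: theta > 0, with Y_n the sample mean of n
# observations from N(theta, sigma^2). Power is monotone in n.
power_FC_normal <- function(n, thetaD, sigma = 1, alpha = 0.05) {
  pnorm(sqrt(n) * thetaD / sigma - qnorm(1 - alpha))
}

n <- 1:60
min(n[power_FC_normal(n, thetaD = 0.5) > 0.8])  # smallest n with power > 0.8: 25
```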

However, it may happen that η_F^C(n, θD), η_F^P(n, πD), η_B^C(n, θD) and η_B^P(n, πD) are not monotonically increasing functions of the sample size: this occurs when dealing with discrete distributions of Yn. In these cases, the power functions show an essentially increasing behaviour as a function of n, but with some small fluctuations. A suitable SSD criterion has to take this kind of behaviour into account. For instance, instead of selecting the smallest sample size that attains the condition of interest, it can be considered more appropriate to select the smallest sample size such that the condition is also fulfilled for all larger sample sizes. Given a threshold γ ∈ (0, 1), the corresponding SSD criteria are

$$n_F^C = \min\left\{ n^* \in \mathbb{N} : \eta_F^C(n, \theta_D) > \gamma, \ \forall n \geq n^* \right\}, \tag{13}$$
$$n_F^P = \min\left\{ n^* \in \mathbb{N} : \eta_F^P(n, \pi_D) > \gamma, \ \forall n \geq n^* \right\}, \tag{14}$$
$$n_B^C = \min\left\{ n^* \in \mathbb{N} : \eta_B^C(n, \theta_D) > \gamma, \ \forall n \geq n^* \right\}, \tag{15}$$
$$n_B^P = \min\left\{ n^* \in \mathbb{N} : \eta_B^P(n, \pi_D) > \gamma, \ \forall n \geq n^* \right\}. \tag{16}$$

In this way, it is possible to avoid the paradox of having the condition of interest fulfilled at the selected sample size but violated for some larger values of n.


5. Single binomial proportion using exact methods

In this section, we focus on exact procedures for a one-sample testing problem with a binary response. For instance, in a clinical context, we could be interested in evaluating the efficacy of a new experimental treatment or drug that is received at the same dose by all the n patients enrolled in the trial. No comparisons with other therapies are involved. A binary response variable is considered, which takes the value 1 if clinicians classify the patient as a responder to the therapy and 0 otherwise; therefore, the parameter of interest θ is the true response rate (i.e. an unknown proportion). In these one-arm studies, θ is compared with a fixed target value, say θ0, that should ideally represent the response rate of the current ‘gold standard’ therapy and that is typically obtained from historical data. Values of θ greater than θ0 suggest that the experimental drug can be considered sufficiently effective and, therefore, the following hypotheses are considered:

$$H_0: \theta = \theta_0 \quad \text{and} \quad H_1: \theta > \theta_0. \tag{17}$$

Such single-arm studies are typically conducted in phase II of clinical trials, whose primary goal is not to definitively assess the efficacy of new drugs, but to screen out those that are ineffective. In practice, in the clinical development process of a new drug, phase II aims at preventing insufficiently promising treatments from reaching phase III, where randomized controlled trials, based on large groups of patients, are generally conducted.

It is important to point out that the power functions based on exact procedures usually do not have explicit forms. Hence, exact formulas for sample size calculations cannot be obtained. However, it is possible to proceed numerically by evaluating the conditions of interest for different increasing or decreasing values of the sample size, until reaching the optimal one. In the following sections, we provide the expressions of the frequentist and Bayesian power functions for non-comparative studies with binary responses. The saw-toothed shape of the power curves as a function of n is shown and, hence, the conservative criteria illustrated in the previous section are adopted. All the graphical and numerical results have been obtained by using the R programming language [12].

5.1. Frequentist conditional power

In the statistical context described above, the number of responders out of the n patients treated with the new drug (i.e. the number of successes in n trials) is the natural statistic Yn we have to consider and its sampling distribution is

$$f_n(y_n|\theta) = \text{bin}(y_n; n, \theta), \qquad y_n = 0, \dots, n, \tag{18}$$

where bin(·; n, θ) denotes the probability mass function of a binomial distribution with parameters n and θ.

Let us consider the two hypotheses in Eq. (17). For a fixed significance level α and assuming that H0 is true, there exists a non-negative integer r between 0 and n such that

$$\sum_{i=r}^{n} \text{bin}(i; n, \theta_0) \leq \alpha \quad \text{and} \quad \sum_{i=r-1}^{n} \text{bin}(i; n, \theta_0) > \alpha. \tag{19}$$

Then, the rejection region at level α is R_{H0} = {yn ∈ {0, 1, …, n} : yn ≥ r}, where the critical value r can be expressed in symbols as

$$r = \min\left\{ k \in \{0, 1, \dots, n\} : \sum_{i=k}^{n} \text{bin}(i; n, \theta_0) \leq \alpha \right\}. \tag{20}$$

For a given design value θD, which has to be specified under the alternative hypothesis, the frequentist conditional power is given by

$$\eta_F^C(n, \theta_D) = P_{f_n(\cdot|\theta_D)}\left(Y_n \in \mathcal{R}_{H_0}\right) = \sum_{y_n = r}^{n} \text{bin}(y_n; n, \theta_D). \tag{21}$$

In practice, η_F^C(n, θD) is obtained as the sum of the probabilities of all the outcomes that belong to R_{H0}, when we assume that the true θ is equal to the design value.
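These quantities are straightforward to compute numerically. The following is a minimal R sketch (the function names are ours; only base R is used) that evaluates the critical value in Eq. (20) and the power in Eq. (21):

```r
# Critical value r of the exact one-sided test, Eq. (20).
crit_value <- function(n, theta0, alpha = 0.05) {
  k <- 0:n
  tail_prob <- 1 - pbinom(k - 1, size = n, prob = theta0)  # P(Y_n >= k | theta0)
  ok <- k[tail_prob <= alpha]
  if (length(ok) == 0) return(n + 1)  # no level-alpha rejection region exists
  min(ok)
}

# Frequentist conditional power, Eq. (21).
power_FC <- function(n, theta0, thetaD, alpha = 0.05) {
  r <- crit_value(n, theta0, alpha)
  if (r > n) return(0)
  1 - pbinom(r - 1, size = n, prob = thetaD)  # P(Y_n >= r | thetaD)
}

power_FC(35, theta0 = 0.2, thetaD = 0.4)  # 0.8048, matching Table 1
```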

Figure 1 shows the behaviour of the frequentist conditional power as a function of n, when θ0 = 0.2, θD = 0.4 and α = 0.05. It is evident that η_F^C(n, θD) is not a monotonically increasing function of the sample size, because of the discrete nature of the sampling distribution of Yn. The reasons for this saw-toothed behaviour are clarified by the numerical results in Table 1. Here, for all the sample sizes between 3 and 50, we provide not only the level of the frequentist conditional power used to obtain Figure 1, but also the corresponding critical value r and the actual type I error probability. Obviously, this latter value is always below the fixed threshold 0.05. Note that whenever the sample size is increased by one unit, the corresponding critical value r may either increase or remain constant. In the latter case, both the actual type I error rate and the frequentist conditional power increase; if instead the critical value also increases by one unit, they both decrease. Reading the table, one can identify blocks of consecutive sample sizes that share the same critical value: within each block, both the power and the actual type I error rate rise monotonically as n increases, but at the first sample size of the subsequent block they both drop. This produces the essentially increasing behaviour of the power as a function of n, with small fluctuations, represented in Figure 1. For additional discussion of the saw-toothed shape of the frequentist power function, the reader is referred to Chernick and Liu [13].

Figure 1.

Behaviour of η_F^C(n, θD) as a function of n, when θ0 = 0.20, θD = 0.4 and α = 0.05.

| n | r | η_F^C(n, θD) | Actual type I error rate | n | r | η_F^C(n, θD) | Actual type I error rate |
|---|---|--------------|--------------------------|---|---|--------------|--------------------------|
| 3 | 3 | 0.0640 | 0.0080 | 27 | 10 | 0.6913 | 0.0304 |
| 4 | 3 | 0.1792 | 0.0272 | 28 | 10 | 0.7412 | 0.0391 |
| 5 | 4 | 0.0870 | 0.0067 | 29 | 10 | 0.7853 | 0.0493 |
| 6 | 4 | 0.1792 | 0.0170 | 30 | 11 | 0.7085 | 0.0256 |
| 7 | 4 | 0.2898 | 0.0333 | 31 | 11 | 0.7546 | 0.0327 |
| 8 | 5 | 0.1737 | 0.0104 | 32 | 11 | 0.7954 | 0.0411 |
| 9 | 5 | 0.2666 | 0.0196 | 33 | 12 | 0.7242 | 0.0216 |
| 10 | 5 | 0.3669 | 0.0328 | 34 | 12 | 0.7669 | 0.0274 |
| 11 | 6 | 0.2465 | 0.0117 | 35 | 12 | 0.8048 | 0.0344 |
| 12 | 6 | 0.3348 | 0.0194 | 36 | 12 | 0.8380 | 0.0424 |
| 13 | 6 | 0.4256 | 0.0300 | 37 | 13 | 0.7783 | 0.0231 |
| 14 | 6 | 0.5141 | 0.0439 | 38 | 13 | 0.8136 | 0.0288 |
| 15 | 7 | 0.3902 | 0.0181 | 39 | 13 | 0.8446 | 0.0355 |
| 16 | 7 | 0.4728 | 0.0267 | 40 | 13 | 0.8715 | 0.0432 |
| 17 | 7 | 0.5522 | 0.0377 | 41 | 14 | 0.8219 | 0.0242 |
| 18 | 8 | 0.4366 | 0.0163 | 42 | 14 | 0.8509 | 0.0298 |
| 19 | 8 | 0.5122 | 0.0233 | 43 | 14 | 0.8762 | 0.0362 |
| 20 | 8 | 0.5841 | 0.0321 | 44 | 14 | 0.8979 | 0.0436 |
| 21 | 8 | 0.6505 | 0.0431 | 45 | 15 | 0.8570 | 0.0250 |
| 22 | 9 | 0.5460 | 0.0201 | 46 | 15 | 0.8807 | 0.0304 |
| 23 | 9 | 0.6116 | 0.0273 | 47 | 15 | 0.9012 | 0.0366 |
| 24 | 9 | 0.6721 | 0.0362 | 48 | 15 | 0.9187 | 0.0437 |
| 25 | 9 | 0.7265 | 0.0468 | 49 | 16 | 0.8851 | 0.0256 |
| 26 | 10 | 0.6358 | 0.0232 | 50 | 16 | 0.9045 | 0.0308 |

Table 1.

Numerical calculations related to Figure 1: sample sizes, corresponding critical values, frequentist conditional power and actual values for the type I error rate, when θ0 = 0.20, θD = 0.4 and α = 0.05.

The problem of which sample size to select now arises because of the non-monotonic behaviour of η_F^C(n, θD). If we set the desired threshold γ for the power equal to 0.8, the smallest sample size that meets the power requirement is n = 35: at that sample size, the critical value is 12 and the power level is 0.8048. For n = 36, the critical value is still 12 and the power increases to 0.8380. However, the power drops below 0.8, to 0.7783, when n = 37, at which r = 13, and rises above 0.8 again when n = 38. From then on, η_F^C(n, θD) never decreases below 0.8 for any sample size greater than or equal to 38. Therefore, instead of selecting the smallest n that attains the power condition, it can be more appropriate to consider the more conservative sample size criterion formalized in Section 4, according to which the optimal sample size is selected as

$$n_F^C = \min\left\{ n^* \in \mathbb{N} : \eta_F^C(n, \theta_D) > \gamma, \ \forall n \geq n^* \right\}. \tag{22}$$

The criterion ensures that the power will not decrease below the desired threshold for any larger sample size: in our specific case, it consists in selecting n = 38, instead of n = 35.
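The conservative search in Eq. (22) is also easy to implement numerically. The helper below (our own, with an arbitrary search bound n_max) reuses power_FC from the previous sketch and will serve for the other power functions as well:

```r
# Conservative SSD criterion: smallest n* such that the power exceeds
# gamma for every n >= n*, searching up to the bound n_max.
ssd_conservative <- function(power_fun, gamma = 0.8, n_max = 200, ...) {
  pow <- sapply(1:n_max, power_fun, ...)
  below <- which(pow <= gamma)
  if (length(below) == 0) return(1)
  n_star <- max(below) + 1
  if (n_star > n_max) stop("power not stably above gamma: increase n_max")
  n_star
}

ssd_conservative(power_FC, gamma = 0.8, theta0 = 0.2, thetaD = 0.4)  # 38
```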

5.2. Frequentist predictive power

In order to model uncertainty in the specification of the design value, we need to adopt the hybrid classical-Bayesian approach described previously. We introduce a beta design prior density for θ, πD(θ) = beta(θ; αD, βD), which is used to obtain the prior predictive distribution of the data. It is well known that, by averaging the binomial sampling distribution fn(yn|θ) with respect to the beta design prior, we obtain the following marginal distribution:

$$m_n^D(y_n) = \text{beta-bin}(y_n; \alpha_D, \beta_D, n), \qquad y_n = 0, \dots, n, \tag{23}$$

where beta-bin(·; αD, βD, n) denotes the probability mass function of a beta-binomial distribution with parameters (αD, βD, n).

The design prior πD(θ) can be elicited in many different ways. One useful possibility consists in (i) setting the prior mode equal to the fixed design value θD, which investigators would choose within the subset under H1 when using the conditional approach, and (ii) regulating the concentration of the distribution around its mode according to the degree of uncertainty one wishes to express. This can be done by setting the hyperparameters of πD(θ) as follows:

$$\alpha_D = n_D \theta_D + 1 \quad \text{and} \quad \beta_D = n_D (1 - \theta_D) + 1, \tag{24}$$

where θD is the prior mode and nD is a design parameter that can be interpreted as a prior sample size. The larger nD, the smaller the variance of the beta design prior; therefore, we need to increase nD if we want to reduce uncertainty about the guessed values of θ. In particular, if we set nD = ∞, the design prior assigns all the probability mass to θD: in this case, no uncertainty is involved and the marginal distribution of the data coincides with the sampling distribution conditional on θD. We must therefore set nD < ∞ to distinguish between the conditional and predictive approaches. Once a prior mode θD has been selected, the researcher can choose nD by ensuring a large level (say, very close to 1) for PπD(θ > θ0), that is, the probability assigned by πD(θ) to the event θ > θ0. Let us assume, for instance, that θ0 = 0.2 and consider three possible choices for θD (namely 0.3, 0.4 and 0.5). For each of them, we compute the smallest nD such that PπD(θ > θ0) is about equal to 0.999; the behaviour of the corresponding design priors is shown in Figure 2(a). Clearly, the closer the prior mode is to θ0, the larger the nD needed to guarantee that PπD(θ > θ0) ≈ 0.999. Moreover, for a fixed prior mode θD, decreasing nD with respect to the value used in the graph would decrease PπD(θ > θ0). In fact, nD has been specified so as to express the minimum degree of prior enthusiasm about the efficacy of the treatment necessary for the prior probability that θ exceeds the target θ0 to reach the chosen level 0.999. An alternative way of proceeding consists in choosing nD so that a fixed prior probability is assigned to a symmetrical interval around the prior mode. For instance, if we set θD = 0.4, the values of nD such that πD(θ) assigns probability about equal to 0.999 to the intervals (0.3, 0.5), (0.25, 0.55) and (0.2, 0.6) are 255, 111 and 60, respectively. The corresponding design prior distributions are shown in Figure 2(b). It is important to point out that all the design densities represented in both panels of Figure 2 express the uncertainty about the suitable design value that it is worthwhile to account for when applying SSD criteria based on power analysis. Thus, all the distributions assign a negligible probability to values of θ smaller than θ0, that is, to the values specified under H0.
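One possible R implementation of the first elicitation rule (the search loop and the function name are ours; the hyperparameters follow Eq. (24)):

```r
# Smallest n_D such that the beta design prior with mode thetaD assigns
# probability at least `level` to the event {theta > theta0}.
choose_nD <- function(thetaD, theta0, level = 0.999, nD_max = 10000) {
  for (nD in 1:nD_max) {
    aD <- nD * thetaD + 1
    bD <- nD * (1 - thetaD) + 1
    if (1 - pbeta(theta0, aD, bD) >= level) return(nD)
  }
  NA  # level not reached within the search bound
}

sapply(c(0.3, 0.4, 0.5), choose_nD, theta0 = 0.2)
# should be close to the values (163, 43, 20) used for Figure 2(a)
```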

Figure 2.

Possible choices of the design prior distribution, when θ0 = 0.2.

Once πD(θ) has been specified, the frequentist predictive power can be obtained by computing the probability of rejecting the null hypothesis at level α with respect to m_n^D(yn). Hence, we have

$$\eta_F^P(n, \pi_D) = P_{m_n^D}\left(Y_n \in \mathcal{R}_{H_0}\right) = \sum_{y_n = r}^{n} \text{beta-bin}(y_n; \alpha_D, \beta_D, n), \tag{25}$$

where r is the critical value provided in Eq. (20). In practice, η_F^P(n, πD) is given by the sum of the probabilities of all the outcomes inside R_{H0}, computed under a design scenario according to which the true θ belongs to the interval (θ0, 1), where it is distributed according to the design prior density. Let us remark again that, if the design prior is a point mass distribution on θD (i.e. nD = ∞), the conditional and predictive frequentist power functions coincide.

Similarly to the frequentist conditional power, the predictive one also presents a saw-toothed shape as a function of n, since m_n^D(yn) is a discrete distribution. Therefore, we suggest adopting the conservative approach previously described and selecting

$$n_F^P = \min\left\{ n^* \in \mathbb{N} : \eta_F^P(n, \pi_D) > \gamma, \ \forall n \geq n^* \right\}, \tag{26}$$

for a fixed desired threshold γ. Figure 3 shows the behaviour of the frequentist predictive power as a function of n for different choices of the design prior, when θ0 = 0.2 and α = 0.05. More specifically, we consider the three πD(θ) plotted in Figure 2(b), which are all centred on θD = 0.4 but have different degrees of concentration regulated by the value of nD. In each graph, we highlight the optimal sample size obtained according to the criterion in Eq. (26) when γ = 0.8. Note that the larger the nD, the smaller the degree of uncertainty we introduce through the design prior and, as a consequence, the smaller the optimal sample size. In fact, we obtain the optimal values 46, 42 and 39 for nD equal to 60, 111 and 255, respectively. If we set nD = ∞, we would retrieve the conditional criterion in Eq. (22), where no uncertainty is considered in specifying the design value, and the optimal n would be equal to 38 (see Figure 1). Moreover, let us again fix θ0 = 0.2, α = 0.05 and γ = 0.8 and consider the three design prior distributions in Figure 2(a), which are characterized by different prior modes. The evident difference between the prior scenarios represented by these design priors clearly affects the optimal sample size: we obtain the optimal values 157, 46 and 23 for (θD, nD) = (0.3, 163), (0.4, 43) and (0.5, 20), respectively.
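These predictive calculations can be reproduced with a short R sketch (helper names are ours; crit_value and ssd_conservative come from the earlier snippets, and the beta-binomial pmf is written directly via beta functions rather than taken from an external package):

```r
# Beta-binomial pmf, computed on the log scale for numerical stability.
dbetabinom <- function(y, n, a, b) {
  exp(lchoose(n, y) + lbeta(y + a, n - y + b) - lbeta(a, b))
}

# Frequentist predictive power, Eq. (25).
power_FP <- function(n, theta0, thetaD, nD, alpha = 0.05) {
  aD <- nD * thetaD + 1                # design prior hyperparameters, Eq. (24)
  bD <- nD * (1 - thetaD) + 1
  r <- crit_value(n, theta0, alpha)    # same critical value as in Eq. (20)
  if (r > n) return(0)
  sum(dbetabinom(r:n, n, aD, bD))
}

ssd_conservative(power_FP, gamma = 0.8, theta0 = 0.2, thetaD = 0.4, nD = 111)
# should reproduce the optimal value 42 reported above
```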

Figure 3.

Behaviour of η_F^P(n, πD) as a function of n for different choices of the design prior distribution, when θ0 = 0.2 and α = 0.05.

5.3. Bayesian conditional power

When we decide to adopt a Bayesian approach to establish the statistical significance of the result, we need to introduce an analysis prior distribution for θ. In our specific case, it is computationally convenient to specify a beta analysis prior, πA(θ) = beta(θ; αA, βA): in this way, by conjugate analysis, the corresponding posterior distribution is still a beta density with updated parameters,

$$\pi_n^A(\theta|y_n) = \text{beta}(\theta; \alpha_A + y_n, \beta_A + n - y_n). \tag{27}$$

Through πA(θ), the researcher can incorporate into the SSD procedure pre-experimental knowledge, as well as sceptical or enthusiastic expert prior opinions about the efficacy of the experimental treatment. However, one of the most common ways of proceeding is to choose a non-informative density, or one based on very weak information, so that the posterior distribution is driven almost entirely by the evidence in the data. We could, therefore, specify πA(θ) = beta(θ; 1, 1) or consider the non-informative Jeffreys prior. Alternatively, if we want to use informative analysis prior distributions, we can express the hyperparameters in terms of the prior mode θA and the prior sample size nA, that is,

$$\alpha_A = n_A \theta_A + 1 \quad \text{and} \quad \beta_A = n_A (1 - \theta_A) + 1. \tag{28}$$

In this way, for instance, it is possible to express scepticism or optimism about large treatment effects by setting θA smaller or larger than the target θ0, respectively. Obviously, when θA < θ0, the larger the nA, the stronger the degree of scepticism expressed; when θA > θ0, larger values of nA increase the degree of enthusiasm taken into account. The value nA = 1 is often used to obtain a weakly informative prior distribution. The upper panel of Figure 4 shows three possible choices for the analysis prior when θ0 = 0.2. These distributions are obtained by fixing the prior mode θA and then selecting nA so that PπA(θ > θ0), the probability assigned by πA(θ) to the event θ > θ0, is about equal to a desired level. More specifically, we have considered (i) a sceptical prior mode θA = 0.1 with PπA(θ > θ0) ≈ 0.4, (ii) a neutral prior mode θA = 0.2 with PπA(θ > θ0) ≈ 0.6 and (iii) an enthusiastic prior mode θA = 0.3 with PπA(θ > θ0) ≈ 0.8. The corresponding values of nA are 7, 14 and 4, respectively. These densities will be used to illustrate how the optimal sample sizes based on Bayesian powers are affected by the information formalized through the analysis priors.
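A possible sketch of this elicitation in R (our own helper; it picks the nA whose implied prior probability of θ > θ0 is closest to the desired level):

```r
# Grid search for the n_A whose prior probability of {theta > theta0},
# under the beta prior with mode thetaA (hyperparameters as in Eq. (28)),
# is closest to `target`.
choose_nA <- function(thetaA, theta0, target, nA_max = 200) {
  nA <- 1:nA_max
  p <- 1 - pbeta(theta0, nA * thetaA + 1, nA * (1 - thetaA) + 1)
  nA[which.min(abs(p - target))]
}

mapply(choose_nA, thetaA = c(0.1, 0.2, 0.3), target = c(0.4, 0.6, 0.8),
       MoreArgs = list(theta0 = 0.2))
# should be in the neighbourhood of the values (7, 14, 4) used in Figure 4
```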

Figure 4.

Upper panel: possible choices of the analysis prior distribution, when θ0 = 0.2. Lower panel: behaviour of η_B^C(n, θD) as a function of n for each of the analysis prior distributions represented in the upper panel, when θ0 = 0.2, θD = 0.4 and λ = 0.9.

The random result Yn is defined as ‘significant’ from a Bayesian perspective if the corresponding posterior probability that θ > θ0 is sufficiently large. In symbols, we decide to reject the null hypothesis, on the basis of the result Yn, if the following condition is satisfied:

$$P_{\pi_n^A(\cdot|Y_n)}(\theta > \theta_0) > \lambda, \tag{29}$$

where P_{π_n^A(·|Yn)} is the probability measure associated with the posterior distribution in Eq. (27) and λ ∈ (0, 1) is a pre-specified threshold. It is worth noting that, for a given value of n, the posterior quantity P_{π_n^A(·|Yn)}(θ > θ0) is an increasing function of Yn. As a consequence, we can find a non-negative integer r̃ between 0 and n such that

$$P_{\pi_n^A(\cdot|\tilde{r})}(\theta > \theta_0) > \lambda \quad \text{and} \quad P_{\pi_n^A(\cdot|\tilde{r}-1)}(\theta > \theta_0) \leq \lambda, \tag{30}$$

and we can claim that H0 is rejected if the observed number of responders yn is equal to or greater than r̃. In practice, r̃ represents the smallest number of successes such that the condition for Bayesian significance is satisfied; in symbols, it can be expressed as

$$\tilde{r} = \min\left\{ k \in \{0, 1, \dots, n\} : P_{\pi_n^A(\cdot|k)}(\theta > \theta_0) > \lambda \right\}. \tag{31}$$

By considering a fixed design value θD greater than θ0, the Bayesian conditional power is therefore obtained as

$$\eta_B^C(n, \theta_D) = P_{f_n(\cdot|\theta_D)}\left( P_{\pi_n^A(\cdot|Y_n)}(\theta > \theta_0) > \lambda \right) = \sum_{y_n = \tilde{r}}^{n} \text{bin}(y_n; n, \theta_D). \tag{32}$$

Essentially, it is given by the sum of the probabilities of all the Bayesian significant results, computed assuming that the true θ is equal to θD.
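In R, the Bayesian critical value of Eq. (31) follows from the beta posterior in Eq. (27), and the power of Eq. (32) is then a binomial tail sum. A sketch with our own function names (the analysis prior defaults to the uniform beta(1, 1), which is just one neutral choice; the chapter does not state which analysis prior underlies Table 2):

```r
# Smallest number of successes with P(theta > theta0 | k) > lambda, Eq. (31).
crit_value_bayes <- function(n, theta0, aA = 1, bA = 1, lambda = 0.9) {
  k <- 0:n
  post_prob <- 1 - pbeta(theta0, aA + k, bA + n - k)  # posterior of Eq. (27)
  ok <- k[post_prob > lambda]
  if (length(ok) == 0) return(n + 1)  # Bayesian significance unattainable
  min(ok)
}

# Bayesian conditional power, Eq. (32).
power_BC <- function(n, theta0, thetaD, aA = 1, bA = 1, lambda = 0.9) {
  r_tilde <- crit_value_bayes(n, theta0, aA, bA, lambda)
  if (r_tilde > n) return(0)
  1 - pbinom(r_tilde - 1, size = n, prob = thetaD)
}

power_BC(36, theta0 = 0.2, thetaD = 0.4)  # conditional power at n = 36
```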

Since we are dealing with discrete data, this power function, too, is not monotonically increasing as a function of n. Let us assume that θ0 = 0.20, θD = 0.4 and λ = 0.9. The detailed calculations shown in Table 2 help to understand why η_B^C(n, θD) has the typical saw-toothed behaviour. For each sample size between 3 and 50, the table provides the corresponding value of r̃, the level of the Bayesian conditional power and the posterior probability that θ exceeds θ0 conditional on the result r̃. Clearly, these latter values are always larger than the threshold λ = 0.9. Blocks of consecutive sample sizes share the same value of r̃: when the sample size grows but r̃ remains constant, the posterior probability at r̃ decreases while η_B^C(n, θD) increases. However, when both n and r̃ are simultaneously increased by one unit, the posterior probability jumps up, while the Bayesian power drops.

| n | r̃ | η_B^C(n, θD) | Posterior P(θ > θ0) at r̃ | n | r̃ | η_B^C(n, θD) | Posterior P(θ > θ0) at r̃ |
|---|----|--------------|----------------------------|---|----|--------------|----------------------------|
| 3 | 3 | 0.0640 | 0.9263 | 27 | 9 | 0.8161 | 0.9077 |
| 4 | 4 | 0.0256 | 0.9703 | 28 | 10 | 0.7412 | 0.9464 |
| 5 | 4 | 0.0870 | 0.9558 | 29 | 10 | 0.7853 | 0.9354 |
| 6 | 4 | 0.1792 | 0.9377 | 30 | 10 | 0.8237 | 0.9230 |
| 7 | 4 | 0.2898 | 0.9159 | 31 | 10 | 0.8566 | 0.9092 |
| 8 | 5 | 0.1737 | 0.9618 | 32 | 11 | 0.7954 | 0.9460 |
| 9 | 5 | 0.2666 | 0.9476 | 33 | 11 | 0.8310 | 0.9356 |
| 10 | 5 | 0.3669 | 0.9304 | 34 | 11 | 0.8617 | 0.9239 |
| 11 | 5 | 0.4672 | 0.9102 | 35 | 11 | 0.8877 | 0.9110 |
| 12 | 6 | 0.3348 | 0.9559 | 36 | 12 | 0.8380 | 0.9460 |
| 13 | 6 | 0.4256 | 0.9422 | 37 | 12 | 0.8667 | 0.9362 |
| 14 | 6 | 0.5141 | 0.9260 | 38 | 12 | 0.8911 | 0.9252 |
| 15 | 6 | 0.5968 | 0.9075 | 39 | 12 | 0.9118 | 0.9131 |
| 16 | 7 | 0.4728 | 0.9518 | 40 | 13 | 0.8715 | 0.9464 |
| 17 | 7 | 0.5522 | 0.9388 | 41 | 13 | 0.8945 | 0.9371 |
| 18 | 7 | 0.6257 | 0.9237 | 42 | 13 | 0.9140 | 0.9267 |
| 19 | 7 | 0.6919 | 0.9065 | 43 | 13 | 0.9305 | 0.9153 |
| 20 | 8 | 0.5841 | 0.9491 | 44 | 13 | 0.9441 | 0.9028 |
| 21 | 8 | 0.6505 | 0.9367 | 45 | 14 | 0.9164 | 0.9381 |
| 22 | 8 | 0.7102 | 0.9226 | 46 | 14 | 0.9320 | 0.9284 |
| 23 | 8 | 0.7627 | 0.9067 | 47 | 14 | 0.9450 | 0.9176 |
| 24 | 9 | 0.6721 | 0.9474 | 48 | 14 | 0.9558 | 0.9059 |
| 25 | 9 | 0.7265 | 0.9357 | 49 | 15 | 0.9336 | 0.9394 |
| 26 | 9 | 0.7745 | 0.9225 | 50 | 15 | 0.9460 | 0.9301 |

Table 2.

Numerical calculations explaining the saw-toothed behaviour of η_B^C(n, θD) as a function of n: sample sizes, the corresponding value of r̃, the Bayesian conditional power and the posterior probability that θ > θ0 when the observed result is equal to r̃ successes, for θ0 = 0.20, θD = 0.4 and λ = 0.9.

Because of the saw-toothed nature of the power curve, for a fixed threshold γ, the optimal sample size is selected using the conservative criterion, that is

$$n_B^C = \min\left\{ n^* \in \mathbb{N} : \eta_B^C(n, \theta_D) > \gamma, \ \forall n \geq n^* \right\}. \tag{33}$$

The lower panel of Figure 4 shows the behaviour of the Bayesian conditional power as a function of n for each of the three analysis prior densities plotted in the upper panel, when θ0 = 0.2, θD = 0.4 and λ = 0.9. In each graph, the optimal sample size according to the criterion in Eq. (33) for γ = 0.8 is indicated. As expected, as we move from sceptical prior opinions towards more enthusiastic beliefs about the efficacy of the experimental treatment, the required sample size decreases.

5.4. Bayesian predictive power

Besides introducing pre-experimental information, if we also wish to model uncertainty on the design value, we have to consider the Bayesian predictive power. Therefore, as described in Section 5.3, we elicit an analysis prior distribution to obtain the beta posterior density π_n^A(θ|yn). Moreover, following the indications provided in Section 5.2, we introduce a design prior distribution to construct the marginal distribution m_n^D(yn).

The Bayesian predictive power is obtained by adding the probabilities of all the Bayesian significant results, evaluated under the design scenario expressed through the design prior. Thus, we have

$$\eta_B^P(n, \pi_D) = P_{m_n^D}\left( P_{\pi_n^A(\cdot|Y_n)}(\theta > \theta_0) > \lambda \right) = \sum_{y_n = \tilde{r}}^{n} \text{beta-bin}(y_n; \alpha_D, \beta_D, n), \tag{34}$$

where r̃ is given in Eq. (31). Obviously, η_B^P(n, πD) also shows the typical saw-toothed behaviour as a function of n, because of the discrete nature of the beta-binomial marginal distribution of Yn. Therefore, given a desired threshold γ and following the conservative approach previously used, we select the optimal sample size as

$$n_B^P = \min\left\{ n^* \in \mathbb{N} : \eta_B^P(n, \pi_D) > \gamma, \ \forall n \geq n^* \right\}. \tag{35}$$
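Putting the pieces together, here is a sketch of Eqs. (34) and (35) in R, reusing dbetabinom, crit_value_bayes and ssd_conservative from the earlier snippets (function and argument names are ours):

```r
# Bayesian predictive power, Eq. (34).
power_BP <- function(n, theta0, thetaD, nD, aA = 1, bA = 1, lambda = 0.9) {
  aD <- nD * thetaD + 1                 # design prior hyperparameters, Eq. (24)
  bD <- nD * (1 - thetaD) + 1
  r_tilde <- crit_value_bayes(n, theta0, aA, bA, lambda)
  if (r_tilde > n) return(0)
  sum(dbetabinom(r_tilde:n, n, aD, bD))
}

# Neutral analysis prior (thetaA = 0.2, nA = 14) with the design prior of
# mode 0.4 and nD = 111:
ssd_conservative(power_BP, gamma = 0.8, theta0 = 0.2, thetaD = 0.4, nD = 111,
                 aA = 14 * 0.2 + 1, bA = 14 * 0.8 + 1)
# should be close to the value 31 reported in Table 3(b)
```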

In Table 3, we provide the values of n_B^P for different choices of the analysis and design prior densities. More specifically, we consider the three analysis priors plotted in the upper panel of Figure 4 and the design prior distributions represented in both panels of Figure 2, when θ0 = 0.2 and λ = 0.9. Similarly to what we have seen for the Bayesian conditional power, the sample sizes obtained under the sceptical analysis prior are uniformly larger than those obtained under the more enthusiastic distributions. As regards the impact of the design priors, it is straightforward to see that the stronger the degree of uncertainty about the appropriate design value expressed by πD(θ), the larger the required sample size. For instance, for a fixed prior mode of the design prior, n_B^P increases as nD gets smaller (see Table 3(b), where θD = 0.4). However, let us note that more evident changes in the sample size can be appreciated when we compare the effects of design priors based on different prior modes (see the results in Table 3(a), where the design priors represent very distant design scenarios).

(a) Design prior distributions in Figure 2(a)

| θD | nD | θA = 0.1, nA = 7 | θA = 0.2, nA = 14 | θA = 0.3, nA = 4 |
|----|----|------------------|-------------------|------------------|
| 0.3 | 163 | 120 | 109 | 94 |
| 0.4 | 43 | 37 | 31 | 22 |
| 0.5 | 20 | 21 | 18 | 11 |

(b) Design prior distributions in Figure 2(b)

| θD | nD | θA = 0.1, nA = 7 | θA = 0.2, nA = 14 | θA = 0.3, nA = 4 |
|----|----|------------------|-------------------|------------------|
| 0.4 | 60 | 37 | 31 | 22 |
| 0.4 | 111 | 33 | 31 | 22 |
| 0.4 | 255 | 33 | 27 | 22 |

Table 3.

n_B^P for different choices of the analysis and the design priors, when θ0 = 0.2 and λ = 0.9.

These Bayesian predictive SSD procedures, which include the conditional ones as a special case, have been exploited in Ref. [8] to construct single-arm two-stage designs for phase II clinical trials based on binary data. In Ref. [14], an extension to the randomized case is presented, while in Ref. [15] the same procedures are implemented with the additional possibility of taking into account uncertainty in the historical response rate.


6. Conclusions

Especially in clinical research, the pre-experimental power analysis is one of the most commonly used methods for sample size calculations. It is tacitly implied that the power function is constructed under a frequentist framework. However, it is possible to introduce Bayesian concepts in the power analysis to provide more flexibility to the sample size determination process.

When the power function is used as a tool to obtain the appropriate sample size, the general idea is to ensure a large probability of correctly rejecting the null hypothesis H0 when it is actually false because the true θ belongs to H1. Therefore, the conjecture that the alternative hypothesis is true represents an essential element of the method. It can be realized by assuming that the true θ is equal to a fixed design value θD, suitably selected inside H1 (conditional approach); alternatively, we can introduce uncertainty about the guessed design value through a design prior distribution that assigns negligible probability to values of θ under H0 (predictive approach). Moreover, the decision about the rejection of H0 can be made under a frequentist framework or by performing a Bayesian analysis. In the latter case, it is possible to incorporate any available pre-experimental information into the methodology through the specification of an analysis prior distribution. By combining the frequentist and Bayesian procedures of analysis with both the conditional and predictive approaches, we obtain the four power functions described in this chapter. Let us remark that the Bayesian predictive power is the one that adds the most flexibility to the sample size calculations: it lets the researcher take into account prior knowledge as well as uncertainty on the design value. At the same time, design uncertainty can be removed, if desired, by considering a point-mass design distribution; on the other hand, if no information is available, it is possible to elicit a non-informative analysis prior and let the analysis be based entirely on the data.

References

  1. Ryan TP. Sample Size Determination and Power. Hoboken: Wiley; 2013
  2. Chow SC, Wang H, Shao J. Sample Size Calculations in Clinical Research. 2nd ed. Boca Raton: Chapman and Hall/CRC; 2008
  3. Julious SA. Sample Sizes for Clinical Trials. Boca Raton: Chapman and Hall/CRC; 2010
  4. Wang F, Gelfand AE. A simulation-based approach to Bayesian sample size determination for performance under a given model and for separating models. Statistical Science. 2002;17(2):193-208. DOI: 10.1214/ss/1030550861
  5. De Santis F. Sample size determination for robust Bayesian analysis. Journal of the American Statistical Association. 2006;101(473):278-291. DOI: 10.1198/016214505000000510
  6. Sahu SK, Smith TMF. A Bayesian method of sample size determination with practical applications. Journal of the Royal Statistical Society: Series A. 2006;169:235-253. DOI: 10.1111/j.1467-985X.2006.00408.x
  7. Brutti P, De Santis F, Gubbiotti S. Robust Bayesian sample size determination in clinical trials. Statistics in Medicine. 2008;27(13):2290-2306. DOI: 10.1002/sim.3175
  8. Sambucini V. A Bayesian predictive two-stage design for phase II clinical trials. Statistics in Medicine. 2008;27(8):1199-1224. DOI: 10.1002/sim.3021
  9. Sambucini V. A Bayesian predictive strategy for an adaptive two-stage design in phase II clinical trials. Statistics in Medicine. 2010;29(13):1430-1442. DOI: 10.1002/sim.3800
  10. Spiegelhalter DJ, Abrams KR, Myles JP. Bayesian Approaches to Clinical Trials and Health-Care Evaluation. New York: Wiley; 2004
  11. Gubbiotti S, De Santis F. Classical and Bayesian power functions: Their use in clinical trials. Biomedical Statistics and Clinical Epidemiology. 2008;2(3):201-211
  12. R Core Team. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing; 2016. Available from: http://www.R-project.org
  13. Chernick MR, Liu CY. The saw-toothed behavior of power versus sample size and software solutions: Single binomial proportion using exact methods. The American Statistician. 2002;56(2):149-155. DOI: 10.1198/000313002317572835
  14. Cellamare M, Sambucini V. A randomized two-stage design for phase II clinical trials based on a Bayesian predictive approach. Statistics in Medicine. 2015;34(6):1059-1078. DOI: 10.1002/sim.6396
  15. Matano F, Sambucini V. Accounting for uncertainty in the historical response rate of the standard treatment in single-arm two-stage designs based on Bayesian power functions. Pharmaceutical Statistics. 2016;15(6):517-530. DOI: 10.1002/pst.1788
