Biostatistics Advance Access originally published online on October 9, 2006
Biostatistics 2007 8(3):595-608; doi:10.1093/biostatistics/kxl031
Screening designs for drug development
Department of Biostatistics & Applied Mathematics, The University of Texas, M. D. Anderson Cancer Center, Houston, TX 77030, USA and Department of Statistics, Rice University, Houston, TX 77005, USA
Department of Statistics, Rice University, Houston, TX 77005, USA pmueller{at}mdanderson.org
* To whom correspondence should be addressed.
| SUMMARY |
|---|
We propose drug screening designs based on a Bayesian decision-theoretic approach. The discussion is motivated by screening designs for phase II studies. The proposed screening designs allow consideration of multiple treatments simultaneously. In each period, new treatments can arise and currently considered treatments can be dropped. Once a treatment is removed from the phase II screening trial, a terminal decision is made about abandoning the treatment or recommending it for a future confirmatory phase III study. The decision about dropping treatments from the active set is a sequential stopping decision. We propose a solution based on decision boundaries in the space of marginal posterior moments for the unknown parameter of interest that relates to each treatment. We present a Monte Carlo simulation algorithm to implement the proposed approach. We provide an implementation of the proposed method as an easy to use R library available for public domain download (http://www.stat.rice.edu/~rusi/ or http://odin.mdacc.tmc.edu/~pm/).
Keywords: backward induction; Bayesian optimal design; clinical trial design; forward simulation; utility function
| 1. INTRODUCTION |
|---|
We develop a Bayesian decision-theoretic approach to screening designs for drug development. The proposed process is appropriate for a sequence of phase II studies targeting the same disease area, carried out at the same institution, competing for the same pool of potentially eligible patients, and subject to common resource constraints. For example, at large institutions dedicated to clinical research in cancer, such as the University of Texas, M. D. Anderson Cancer Center, a large number of new agents or new combinations of anticancer agents undergo evaluation for activity. The process is typically carried out through separate phase II studies with only informal learning between studies, even if the studies draw patients with similar disease characteristics. We develop an approach that considers such a sequence of studies as one large encompassing screening design and borrows information between studies. An easy-to-use implementation, as an R library, allows interested readers to implement the proposed algorithm with minimal effort.
Most screening designs for culling active therapies from many new agents that are in development consider each study in isolation, even though investigators recognize the need for reproducibility of results (Simon, 1987). After several similar phase II studies have appeared, one is left to combine the information informally and arrive at a decision whether to move ahead with the treatment or not. The question of how many repeat studies to complete is also left informal. In particular, one intuitively would think that the number of replicate phase II studies might depend on the strength of evidence already available concerning the activity of the new agent. Currently, however, decision making does not incorporate such quantitative information in a formal way.
Yao and others (1996) proposed a formal way to screen multiple agents for activity in a series of phase II vaccine trials. For each treatment being considered, a single-arm clinical study is carried out. The decision concerns choosing the sample size for each phase II study and a threshold to minimize the overall expected sample size (or time) needed until an active agent is identified. The decision problem is discussed in the frequentist paradigm, in which the type I and type II error probabilities are prespecified and preserved over the sequence of experiments. The formal setup in Yao and others (1996) considers one treatment at a time and assumes independent binary outcomes. In later work, Yao and Venkatraman (1998), Wang and Leung (1998), and Leung and Wang (2001) consider a variety of extensions leading to 2-stage designs and fully sequential designs in the same setup. Strauss and Simon (1995) consider a generalization based on 2-armed randomized trials for each new treatment. One arm is the new treatment, and the other arm is the best treatment found so far. At the end of this sequence of randomized studies, one chooses the "winner" that will be compared to a standard regimen in a randomized comparative trial. Stout and Hardwick (2005) discuss the above-mentioned approaches as special cases of a more general setup.
In this paper, we build on these methods to develop a sequential decision-theoretic design for drug screening. We introduce two important directions of generalization. First, we allow for multiple treatments to be considered at any given time. New treatments can arise, and existing ones can be dropped at any time if the current evidence suggests that it is optimal to abandon further development of them or that it is optimal to move them to phase III. Second, we cast drug screening as a decision problem. Using a simulation-based solution allows us to consider essentially arbitrarily complex utility functions and probability models. Also, the proposed approach includes the possibility to restrict the action space, for example by considering only designs with certain type I and type II error probabilities.
We propose a probability model that allows borrowing information between treatments, which is appropriate when treatments target the same disease and are likely to be based on similar mechanisms. We consider a utility function that includes terms related to sampling cost and a final payoff that is realized if the future phase III trial shows a statistically significant improvement over the standard of care. The use of a utility function that is based on the sampling cost and the final payoff means that we focus on the perspective of the drug developer or the investigator who is carrying out the trial. We propose to accommodate the interests of regulators and patients by restricting consideration to rules that satisfy constraints on type I and type II error probabilities. For comparison, we also consider a utility of the form proposed by Yao and others (1996), who seek to minimize the number of patients before the first treatment is recommended for phase III. The decision criterion for the screening trial is the expected utility, appropriately marginalizing with respect to the unknown true success probability and the future outcomes in the phase III study. For an extensive discussion of utility functions for clinical trials, see Gittins and Pezeshk (2002).
In Section 2, we formally state the drug screening process as a decision problem by defining a probability model, an action space, and a utility function that serves as the decision criterion. In Section 3, we discuss a simulation-based approach for solving the decision problem. In Section 4, we show results for a simulated example. In Section 5, we assess the uncertainty and robustness of these results. In Section 6, we compare our approach with that of Yao and Venkatraman (1998) in a clinical immunology problem. Finally, we conclude in Section 7 with a discussion of features and limitations of the proposed approach.
| 2. DRUG SCREENING |
|---|
Our approach is based on casting the screening process as a formal decision problem. The basic ingredients of a decision-theoretic setup are an action space D of possible decisions d ∈ D, a probability model p(θ,y) for all relevant random variables, including parameters θ and future data y, and a utility function u(d,θ,y). The probability model is conveniently factored into a prior probability model p(θ) and a sampling model p(y|θ). It can be argued (DeGroot, 2004) that a rational decision maker should choose the decision d to maximize the expectation of u. The expectation is with respect to p, conditioning on all data observed at the time of decision making, and marginalizing over all parameters and all future data. Sometimes, the action space is restricted to decisions that satisfy certain constraints, for example prespecified bounds on type I and type II errors (false-positive and false-negative rates). In such cases, the maximization is carried out over the restricted set.
Let yti be the outcome at time t = 1,...,T for treatment i ∈ At, where At is the set of treatments being considered at time t. We assume a finite time horizon T for the entire screening process, and we allow for a random number of treatments at any given time t.
After observing the outcomes yti, i ∈ At, we make a sequential stopping decision dti for each treatment. We denote with dti = 0 the action of removing treatment i from At and with dti = 1 the action of continuing recruitment for treatment i. If we decide dti = 0, then a terminal second-step decision ai indicates whether to abandon treatment i (ai = 0) or whether to recommend to proceed with a confirmatory phase III study (ai = 1).
Finally, before the next decision at time t + 1, new treatments might be proposed and added to the set At+1. Let Δnt denote the number of new treatments arising in period t and denote with πj = Pr(Δnt = j), for j = 0,1,..., its probability distribution. In the last period, T, continuation is not possible. That is, dTi = 0 for all i ∈ AT. Figure 1 illustrates the sequence of decisions and observations.
The formal definition of the decision problem requires a probability model for all involved random variables. We assume binomial sampling. We make this assumption mainly for ease of exposition. With minor modifications, the proposed approach can be adapted to other sampling models. Thus, without major loss of generality, we assume
$$y_{ti} \mid \theta_i \sim \mathrm{Bin}(N_{ti}, \theta_i), \quad i \in A_t, \qquad (2.1)$$
with known Nti. In particular, accrual rates can vary across treatments. The unknown success probabilities arise from a common prior distribution, possibly involving a regression on treatment-specific covariates. We use a Beta prior, θi ~ Be(u,v), with random hyperparameters (u,v) that allow borrowing of information between treatments. As prior distribution on these hyperparameters, we assume Gamma distributions, subject to a bound on u + v,
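$$u \sim \mathrm{Ga}(A_u, B_u), \quad v \sim \mathrm{Ga}(A_v, B_v), \quad \text{subject to } u + v \le 10. \qquad (2.2)$$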
The restriction limits the extent of borrowing of strength across treatments. That is, no matter how many treatments and patients we have observed, the data will never provide more information about a new treatment than the equivalent of 10 patients. The choice of 10 is arbitrary. Any alternative bound, or no bound, could be used without any change in the following discussion. In the context of phase II trials with typically small sample sizes, we consider 10 to be a reasonable choice.
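The probability model above can be sketched in a few lines of code. The following is a minimal illustration, not the authors' R library: the function names are hypothetical, the Gamma distribution is assumed to be parameterized by shape and rate (the text does not say), the default hyperparameter values are those used later in the simulation example, and the bound on u + v is imposed by simple rejection sampling.

```python
import random

def draw_hyperparameters(rng, a_u=3.0, b_u=1.0, a_v=3.0, b_v=1.0, bound=10.0):
    """Draw (u, v) from independent Gamma priors, rejecting draws with u + v > bound."""
    while True:
        # random.gammavariate takes (shape, scale); treating B as a rate is an
        # assumption. With B = 1, rate and scale coincide.
        u = rng.gammavariate(a_u, 1.0 / b_u)
        v = rng.gammavariate(a_v, 1.0 / b_v)
        if u + v <= bound:
            return u, v

def draw_success_probability(rng, u, v):
    """Draw a treatment-specific success probability theta_i ~ Be(u, v)."""
    return rng.betavariate(u, v)

def draw_outcomes(rng, theta, cohort_size=2, periods=5):
    """Draw binomial outcomes y_ti ~ Bin(N_ti, theta_i) for a fixed cohort size."""
    return [
        sum(rng.random() < theta for _ in range(cohort_size))
        for _ in range(periods)
    ]

if __name__ == "__main__":
    rng = random.Random(1)
    u, v = draw_hyperparameters(rng)
    theta = draw_success_probability(rng, u, v)
    print(u, v, theta, draw_outcomes(rng, theta))
```

Because all treatments share the same (u, v), data on earlier treatments carry information about a new one, while the bound caps that information at the equivalent of u + v = 10 prior observations.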
Finally, we include a bound Nmax for the number of eligible patients who can be recruited for enrollment at time t. Setting Nmax = ∞ defines the problem without recruitment limits. We assume without loss of generality that Nmax remains the same across t. When a new treatment arises and no patients are available, data collection for the new treatment has to wait until one of the existing treatments is dropped. We do not consider adaptive allocation to treatments.
Let nT be the overall number of treatments considered in the screening process, and let d = (dti, t = 1,2,3,...,T; i ∈ At) and a = (ai, i = 1,...,nT) denote the sequence of decisions. Recall that dti denotes the stopping decision and ai denotes the terminal decision upon stopping enrollment for treatment i. Let y = (yti, t = 1,2,...,T, i ∈ At), and let θ = (θi; i = 1,...,nT) denote the parameters of the sampling model for y.
The utility function u(d,a,θ,y) formalizes preferences across possible outcomes corresponding to assumed responses y, parameters θ, and decisions (d,a), that is it reports the value of a hypothetical realization (y,θ,a,d) of the entire trial. An important advantage of the proposed simulation-based solution is that we are free to specify a utility function that reflects the scientific problem, without constraints to convenient analytic properties.
In our implementation, we use a utility function that includes sampling cost plus a payoff for every treatment that is recommended for phase III and is approved at the end of a future confirmatory phase III study that compares the experimental therapy versus the standard of care. The payoff is weighted by the size of the advantage over the standard of care. Regulatory approval is formalized as a statistically significant treatment effect at the conclusion of the confirmatory trial. We build a utility function for the entire process in steps, leading eventually to the utility function stated in (2.3).
First, suppose that for treatment i we start recruitment at time t0i and we stop recruitment at time t1i, that is dti = 0 at time t = t1i. If the treatment is abandoned (ai = 0), then we only record a linear sampling cost c1·Σt Nti = c1·N·i, where the sum runs over t = t0i,...,t1i. Here, N·i is the total number of patients assigned to treatment i.
If we proceed with a phase III trial (ai = 1), then we record the sampling cost c1n3, where n3 is the sample size of the future study, and we add a payoff for a significant phase III result, weighted by the estimated size of the advantage over the standard of care. Let θ0 denote the success probability for the standard of care. Let θ̂i and θ̂0 denote the maximum likelihood estimates for θi and θ0 at the end of the phase III trial, and let B denote the event of observing a significant result. Let c2 denote the reward for recommending a treatment that shows a significant treatment effect in the confirmatory trial, that is the reward for a successful drug development. The reward is scaled by the estimated size of the advantage over placebo and the probability of B. We record c2·Pr(B|y1,...,yt1i)·E(θ̂i − θ̂0 | B,y1,...,yt1i). Putting everything together, we have
$$u(d,a,\theta,y) = \sum_{i=1}^{n_T} \Big\{ -c_1 N_{\cdot i} + a_i \big[ -c_1 n_3 + c_2 \Pr(B \mid y_1,\ldots,y_{t_{1i}})\, E(\hat\theta_i - \hat\theta_0 \mid B, y_1,\ldots,y_{t_{1i}}) \big] \Big\}. \qquad (2.3)$$
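To make the structure of the utility concrete, the sketch below computes the contribution of a single treatment from the quantities named in the text. It is an illustration under stated assumptions: the function and argument names are hypothetical, the phase II sampling cost is assumed to be charged whether or not the treatment is recommended, and the probability of a significant result and the expected advantage are taken as given inputs rather than computed.

```python
def treatment_utility(n_phase2, recommend, n_phase3=0,
                      prob_significant=0.0, expected_advantage=0.0,
                      c1=1.0, c2=10000.0):
    """Utility contribution of one treatment.

    n_phase2           -- total patients on the treatment in the screening trial
    recommend          -- terminal decision a_i (True: proceed to phase III)
    n_phase3           -- phase III sample size n3
    prob_significant   -- Pr(B | data), probability of a significant phase III result
    expected_advantage -- expected estimated advantage over standard of care, given B
    """
    cost = c1 * n_phase2  # assumed to be incurred under either terminal decision
    if not recommend:
        return -cost
    payoff = c2 * prob_significant * expected_advantage
    return -cost - c1 * n_phase3 + payoff

def total_utility(treatments):
    """Sum the per-treatment contributions over all treatments in the process."""
    return sum(treatment_utility(**t) for t in treatments)

if __name__ == "__main__":
    print(treatment_utility(20, False))
    print(treatment_utility(20, True, n_phase3=200,
                            prob_significant=0.8, expected_advantage=0.1))
```

Under this form, recommending a treatment is worthwhile only when the expected payoff exceeds the additional phase III sampling cost.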
We now discuss the evaluation of n3, Pr(B|y1,...,yt1i), and E(θ̂i − θ̂0 | B,y1,...,yt1i). Let (mti,sti) denote the posterior mean and standard deviation for θi at time t, and let (m·i,s·i) denote their value at time t1i. The phase III sample size n3 is chosen for a test comparing H0: θi = θ0 versus an alternative H1: θi = m·i, for a given significance level α3 and power 1 − β3. Let
, and let zp denote the (1 − p) standard normal quantile. We assume that the final test is carried out as a z-test to compare two binomial proportions. Assuming known θ0, we approximate the phase III sample size as
|
|
Next, we evaluate the posterior predictive probability p(B|y1,...,yt1i). The event B is defined by the z-statistic falling in the rejection region in favor of the experimental arm. Thus,
|
|
Using a normal approximation to the posterior predictive distribution
, we can approximate p(B|y1,...,yt1i). Denote by µ
and 
the moments of this normal approximation,
|
|
Finally, we evaluate E(θ̂i − θ̂0 | B,y1,...,yt1i), the posterior predictive expectation for the size of the advantage over standard of care, conditional on B. This conditional expectation is evaluated as the expected value of a normal random variable left truncated at the critical value that defines the rejection region B (Jawitz, 2004).
The described action set, probability model, and utility function formally define the decision problem. We now proceed to find the optimal solution by maximizing the utility u(d,a,θ,y) as a function of the decisions, marginalizing with respect to θ and all future data that are unknown at the time of a decision and conditioning on all available data.
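We first discuss the terminal decision ai, the indicator for recommending a phase III trial. The terminal decision is carried out at time t = t1i. From (2.3), we find that ai = 1 is optimal if and only if the expected payoff exceeds the phase III sampling cost,
$$c_2 \Pr(B \mid y_1,\ldots,y_{t_{1i}})\, E(\hat\theta_i - \hat\theta_0 \mid B, y_1,\ldots,y_{t_{1i}}) \ge c_1 n_3. \qquad (2.4)$$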
If m·i < θ0, it is not possible to achieve the desired power in the phase III trial, and we set ai = 0. This solves the choice for the terminal decision ai, once we have decided to stop enrollment in treatment i.
The continuation decision dti is complicated by its sequential nature. To find the optimal solution at time t, we need to compare expected utilities under dti = 0 and dti = 1. To find the expected utility under continuation, dti = 1, we need to know the decision for t + 1, etc. A full solution involves the use of backward induction. But the computational cost of backward induction makes a full solution infeasible even in fairly simple situations. DeGroot (2004), Brockwell and Kadane (2003), and Berry and others (2001) discuss alternative, computationally intensive approaches that allow one to approximate full backward induction. Many Bayesian clinical trial designs avoid the difficult problem of optimal sequential decisions by stopping short of a formal decision-theoretic approach. Instead, many methods include a combination of posterior inference for the probability model with reasonable but ad hoc rules for the desired decisions. A typical example is the approach proposed in Thall and others (1995). The method proceeds by evaluating posterior probabilities of clinically meaningful events. When these probabilities cross predefined boundaries, certain decisions are indicated. The boundaries are fixed to achieve desired frequentist properties. Spiegelhalter and others (2004) refer to such decision rules as proper Bayes. The main problem with such approaches is the large number of arbitrary choices. The major advantage is the ease of implementation.
We propose rules that are derived as optimal Bayes rules by maximizing expected utility. But we avoid the prohibitive computational cost of backward induction by appealing to an approximation. Instead of a full backward induction solution, we use decision boundaries in the space of marginal posterior moments (log(sti),mti) to approximate the optimal sequential decision. See Figure 2 for an example. The decision boundary is defined by two line segments starting at (s0,b0) and going through (s1,b1), with b1 > b0, and (s1,b2), with b2 < b0, respectively. The two values s0 and s1 are fixed, leaving b = (b0,b1,b2) to identify the decision boundary. At the end of each period t, we compare the marginal moments (log(sti),mti) with the decision boundaries. If mti lies between the two lines, then we continue to accrue patients for treatment i, that is dti = 1. If not, we drop treatment i (dti = 0). In summary,
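$$d_{ti}(b) = \begin{cases} 1 & \text{if } \log(s_{ti}) > s_0 \text{ and } b_0 + (b_2 - b_0)\dfrac{\log(s_{ti}) - s_0}{s_1 - s_0} < m_{ti} < b_0 + (b_1 - b_0)\dfrac{\log(s_{ti}) - s_0}{s_1 - s_0}, \\ 0 & \text{otherwise.} \end{cases} \qquad (2.5)$$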
We write d = d(b) to highlight the nature of d as a rule determined by the decision boundary b.
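The boundary rule described above can be written as a short function. This is a sketch under assumptions: the function names are hypothetical, s1 is taken to be larger than s0 so that the continuation region is widest at high posterior variance and closes at log(sti) = s0, and the numerical values in the usage example are illustrative only.

```python
import math

def boundary_lines(log_s, b, s0, s1):
    """Return (lower, upper) boundary values at a given log posterior SD.

    Both lines start at (s0, b0); the upper line passes through (s1, b1)
    and the lower line through (s1, b2), with b2 < b0 < b1.
    """
    b0, b1, b2 = b
    frac = (log_s - s0) / (s1 - s0)
    upper = b0 + (b1 - b0) * frac
    lower = b0 + (b2 - b0) * frac
    return lower, upper

def continue_accrual(m, s, b, s0, s1):
    """Return 1 to keep accruing patients for a treatment, 0 to drop it."""
    log_s = math.log(s)
    if log_s <= s0:
        return 0  # enough information has accumulated: always stop
    lower, upper = boundary_lines(log_s, b, s0, s1)
    return 1 if lower < m < upper else 0

if __name__ == "__main__":
    b = (0.5, 0.8, 0.3)    # illustrative (b0, b1, b2)
    s0, s1 = -3.0, -1.0    # illustrative fixed offsets on the log(SD) scale
    print(continue_accrual(0.5, math.exp(-1.0), b, s0, s1))
    print(continue_accrual(0.9, math.exp(-1.0), b, s0, s1))
```

A treatment whose posterior mean leaves the wedge above the upper line or below the lower line is dropped, and the terminal decision then determines whether it is recommended or abandoned.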
Figure 2 shows the decision boundaries for a specific choice of (b0,b1,b2). The (fixed) offset s0 determines the log(sti) value where the two half lines join. We always stop accruing patients when log(sti) < s0. This has the desirable implication of imposing an upper bound on the amount of information, as measured by posterior variance, before making a stopping decision. The stopping decision is followed by the terminal decision ai, as described earlier.
Using decision boundaries as in (2.5) reduces the solution of the sequential decision problem to finding the optimal parameters b = (b0,b1,b2). The optimal choice is determined by maximizing expected utility
$$b^* = \arg\max_b U(b), \qquad U(b) = E\{u(d(b), a, \theta, y)\}. \qquad (2.6)$$
The expectation is over θ ~ p(θ) and yti ~ p(yti|θ), plugging in the optimal terminal decisions ai.
| 3. EXPECTED UTILITY MAXIMIZATION BY SIMULATION |
|---|
We resort to forward simulation to evaluate expected utility
$$U(b) = \int u(d(b), a, \theta, y)\, dp(\theta, y) \qquad (3.1)$$
using the optimal terminal rule a. Forward simulation was introduced in Carlin and others (1998) to solve sequential decision problems that can be described by decision boundaries. We simulate once, up-front, possible realizations, j = 1,...,M, of the screening process, keeping all treatments in the trial until a final horizon T. That is, we do not include stopping in the simulation. The arrival of new treatments is simulated using the multinomial probabilities πj.
To evaluate expected utility U(b) for a decision boundary described by b, we look through the file of saved simulations. Let ui denote the ith term in (2.3). Whenever a treatment hits the decision boundary b, it is removed from the current set. When this happens, we compute the optimal terminal decision ai using (2.4) and record the realized utility ui for this treatment. Summing ui over all treatments we get a realization of the utility (2.3). Averaging over all simulated realizations, j = 1,...,M, we obtain an estimate Û(b) of the expected utility U(b). In other words, we use the Monte Carlo average $\hat{U}(b) = \frac{1}{M} \sum_{j=1}^{M} u(d(b), a, \theta^{(j)}, y^{(j)})$ to evaluate the expected utility integral (3.1). Similarly, we can evaluate the expected value of other summary statistics, such as the number of patients tested with each treatment or the probabilities of type I and II errors. Finally, evaluating Û(b) over a grid on b, we find the optimal decision boundary b*.
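The key computational idea, simulating once without stopping and then reusing the saved trajectories for every candidate boundary, can be sketched as follows. This is a deliberately simplified illustration and not the authors' implementation: each simulated path is a single treatment rather than a whole screening process, posterior moments come from a fixed conjugate Beta prior rather than the hierarchical model, the offsets s0 and s1 are arbitrary, and `toy_utility` is a hypothetical stand-in for the utility (2.3).

```python
import math
import random

def simulate_paths(rng, n_paths, horizon, cohort=2, prior=(1.0, 1.0)):
    """Simulate posterior-moment trajectories up to the horizon, with no stopping."""
    paths = []
    for _ in range(n_paths):
        theta = rng.betavariate(*prior)          # simulation truth
        a, b = prior
        trajectory = []
        for _ in range(horizon):
            y = sum(rng.random() < theta for _ in range(cohort))
            a, b = a + y, b + cohort - y         # conjugate Beta update
            mean = a / (a + b)
            sd = math.sqrt(mean * (1.0 - mean) / (a + b + 1.0))
            trajectory.append((math.log(sd), mean))
        paths.append((theta, trajectory))
    return paths

def stop(trajectory, bounds, s0, s1):
    """Return (stopping period, posterior mean at stopping) for one boundary."""
    b0, b1, b2 = bounds
    for t, (log_s, mean) in enumerate(trajectory, start=1):
        frac = (log_s - s0) / (s1 - s0)
        lower = b0 + (b2 - b0) * frac
        upper = b0 + (b1 - b0) * frac
        if log_s <= s0 or not (lower < mean < upper):
            return t, mean
    return len(trajectory), trajectory[-1][1]    # forced stop at the horizon

def toy_utility(periods, mean, theta, cohort=2, theta0=0.5, c1=1.0, c2=1000.0):
    """Hypothetical utility: sampling cost plus a payoff if recommended."""
    payoff = c2 * (theta - theta0) if mean > theta0 else 0.0
    return payoff - c1 * cohort * periods

def estimated_utility(paths, bounds, s0, s1):
    """Monte Carlo average of the utility over the saved simulations."""
    total = 0.0
    for theta, trajectory in paths:
        periods, mean = stop(trajectory, bounds, s0, s1)
        total += toy_utility(periods, mean, theta)
    return total / len(paths)

def best_boundary(paths, grid, s0, s1):
    """Grid search for the boundary with the highest estimated utility."""
    return max(grid, key=lambda bounds: estimated_utility(paths, bounds, s0, s1))

if __name__ == "__main__":
    paths = simulate_paths(random.Random(0), n_paths=200, horizon=50)
    grid = [(0.5, hi, lo) for hi in (0.6, 0.7, 0.8) for lo in (0.2, 0.3, 0.4)]
    print(best_boundary(paths, grid, s0=-3.0, s1=-1.0))
```

Because the same saved paths are used for every b, differences in estimated utility between neighboring designs are not obscured by independent simulation noise.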
Evaluation of U(b) as a sample average Û(b) does not exploit assumed regularities of the expected utility surface as a function of b. That is, we ignore that we could learn about b also by looking at close-by designs b'. This is formalized by fitting a smooth surface Ũ(b) to the observed sample averages Û(b) as a function of b. Such smoothing was proposed in Müller and Parmigiani (1996) as a generic method to improve expected utility evaluations. We propose to define a smooth surface Ũ(b) as a locally weighted linear regression of Û(b) on b, using only main effects for b0, b1, and b2.
The described algorithm requires the evaluation of (mti,sti) for a large number of times, treatments, and simulations. This can be very computationally intensive when no closed form is available, as is the case for the model defined by (2.1) and (2.2). We implemented instead an empirical Bayes approximation to (mti,sti) as proposed, for example, in Gelman and others (1995).
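One possible reading of such an empirical Bayes approximation is sketched below: estimate the hyperparameters (u, v) from the observed response proportions, then plug them into the conjugate Beta posterior, which has closed-form moments. The moment-matching step, the function names, and the way the bound on u + v is applied are assumptions made for illustration; the source does not specify these details.

```python
import math

def moment_match_beta(proportions, bound=10.0):
    """Crude moment-matching estimate of (u, v) from observed proportions.

    Uses mean = u / (u + v) and var = mean * (1 - mean) / (u + v + 1),
    then caps u + v at `bound`.
    """
    n = len(proportions)
    mean = sum(proportions) / n
    var = sum((p - mean) ** 2 for p in proportions) / (n - 1)
    if not 0.0 < mean < 1.0 or var <= 0.0:
        raise ValueError("need non-degenerate proportions")
    total = min(mean * (1.0 - mean) / var - 1.0, bound)  # implied u + v
    if total <= 0.0:
        raise ValueError("observed variance too large for a Beta fit")
    return mean * total, (1.0 - mean) * total

def posterior_moments(u, v, successes, n_patients):
    """Posterior mean and SD of theta_i given plug-in (u, v) and binomial data."""
    a = u + successes
    b = v + n_patients - successes
    mean = a / (a + b)
    sd = math.sqrt(mean * (1.0 - mean) / (a + b + 1.0))
    return mean, sd

if __name__ == "__main__":
    u, v = moment_match_beta([0.4, 0.5, 0.6])
    print(u, v, posterior_moments(u, v, successes=2, n_patients=4))
```

The plug-in step replaces the integration over (u, v) with a point estimate, so the moments for every treatment, period, and simulation are available without numerical integration.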
| 4. SIMULATION EXAMPLE |
|---|
We implemented the described method for the following problem. We assume a standard of care with success probability θ0 = 0.5, sampling in cohorts of Nti = 2 patients, and multinomial probabilities (π0,...,π3) = (0.7,0.2,0.05,0.05) for the arrival of new treatments Δnt. We specify no limit on the number of available patients, that is Nmax = ∞.
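The arrival process for new treatments is easy to simulate directly from the stated probabilities. The sketch below uses hypothetical function names and only the probabilities (0.7, 0.2, 0.05, 0.05) given in the text.

```python
import random

ARRIVAL_PROBS = (0.7, 0.2, 0.05, 0.05)  # Pr(0, 1, 2, 3 new treatments per period)

def simulate_arrivals(rng, horizon, probs=ARRIVAL_PROBS):
    """Draw the number of new treatments arriving in each period."""
    counts = range(len(probs))
    return rng.choices(counts, weights=probs, k=horizon)

def expected_arrivals(probs=ARRIVAL_PROBS):
    """Expected number of new treatments per period."""
    return sum(j * p for j, p in enumerate(probs))

if __name__ == "__main__":
    arrivals = simulate_arrivals(random.Random(0), horizon=100)
    print(sum(arrivals), expected_arrivals())
```

With these probabilities the expected number of arrivals is 0.2 + 2(0.05) + 3(0.05) = 0.45 per period, so a screening process with a horizon of 100 periods sees about 45 new treatments on average.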
We set the prior parameters to Au = 3, Bu = 1, Av = 3, and Bv = 1, corresponding to a prior mean E(θi) = 0.5 and standard deviation SD(θi) = 0.27. The experimenter expects new treatments to be as good as the standard of care on the average, but the actual performance of individual treatments can vary considerably. For the utility function, we use relative weights c1 = 1 and c2 = 10000, that is the final payoff for a successful drug is 10 000 times the sampling cost for one patient. The value of c2 was chosen to achieve a power of approximately 80%. See below for a definition of power and type II error in the context of this simulation. The time horizon is assumed as T = 100. We investigated the impact of T on the solution by considering a doubling of the time horizon to T = 200. Comparing the reported optimal rules we found no significant change, leading us to interpret T = 100 as a reasonable approximation for a process with infinite horizon.
We add one more important feature to the decision problem. Let 1 − β denote the probability of an effective treatment, that is a treatment with simulation truth θi > θ0, being recommended for phase III. The probability is over repeated simulations and averaging with respect to the prior over all θi > θ0. We refer to β as the false-negative probability (type II error) and 1 − β as power. Similarly, we define α as the false-positive probability (type I error). We constrain the set of allowable decision rules b to such rules that imply α ≤ 0.05 and β ≤ 0.20, that is power of at least 80%. The motivation for adding the constraint is that the utility function (2.3) could be criticized as being too narrowly focused on the perspective of the investigator and drug developer only. In the simulation, the constraint is imposed by restricting the grid search for the optimal rule b* to decision rules b that satisfy the conditions. To evaluate α, we find the relative frequency of simulated treatments with θi < θ0 in the forward simulation that are recommended for phase III. To evaluate β, we count the treatments with θi > θ0 that are not recommended for phase III. All results are based on M = 1000 forward simulations.
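The error rates defined above reduce to simple counts over the simulated treatments. The sketch below is one way to compute them; the record format and function names are hypothetical, and each rate is assumed to be taken relative to the number of ineffective or effective treatments, respectively.

```python
def error_rates(records, theta0=0.5):
    """Estimate (alpha, beta) from simulated treatments.

    records -- iterable of (theta, recommended) pairs, where theta is the
               simulation truth and recommended is the terminal decision.
    """
    ineffective = [rec for theta, rec in records if theta < theta0]
    effective = [rec for theta, rec in records if theta > theta0]
    # Type I error: ineffective treatments that are recommended for phase III.
    alpha = sum(1 for rec in ineffective if rec) / len(ineffective) if ineffective else 0.0
    # Type II error: effective treatments that are not recommended.
    beta = sum(1 for rec in effective if not rec) / len(effective) if effective else 0.0
    return alpha, beta

def satisfies_constraints(alpha, beta, max_alpha=0.05, max_beta=0.20):
    """Check whether a decision rule is admissible in the constrained problem."""
    return alpha <= max_alpha and beta <= max_beta

if __name__ == "__main__":
    records = [(0.3, False), (0.4, True), (0.2, False),
               (0.6, True), (0.7, True), (0.8, False)]
    alpha, beta = error_rates(records)
    print(alpha, beta, satisfies_constraints(alpha, beta))
```

In the grid search, rules that fail this check are simply excluded before the maximization.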
We evaluate the expected utility U(b) in (3.1) over a 3-dimensional grid, as described in Section 3. Figure 3(a), (b), and (c) plots the surface Û(b) for several values of b0. The flat nature of the surface with respect to b1 indicates that a wide range of b1 values yield similar expected utility.
Figure 2 shows the optimal decision boundaries subject to α ≤ 0.05 and β ≤ 0.2, some simulated trajectories, and the terminal decisions. For each treatment in the simulated trial, we plot the trajectory of (mti,sti). We follow each treatment from right (high prior variance) to left until the trajectory crosses a decision boundary. This defines the stopping time t = t1i and the terminal decision ai.
Rows 1 and 2 of Table 1 provide the solution to both the unconstrained and the constrained optimization problems. The optimal unconstrained decision has higher expected utility than the optimal constrained decision, but it requires a larger average number of patients and it implies higher α and β.
Finally, we fit a smooth surface Ũ(b) to Û(b) by locally weighted linear regression, as proposed in Section 3. The optimal bandwidth is selected by leaving 1/3 of the grid points out of the model fit and minimizing the mean square error of the predictions for those points. Figure 3(c), (d), and (e) shows the fit. The optimal decisions, shown in rows 3 and 4 of Table 1, are very similar to those obtained without smoothing. The fact that the optimal design changed little confirms that the chosen Monte Carlo sample size, M = 1000, was sufficiently large for this optimal design problem. For smaller M, the advantages of smoothing should be more noticeable.
| 5. UNCERTAINTY AND SENSITIVITY OF THE OPTIMAL DECISION |
|---|
We consider two sources of uncertainty in the final solution b*. First, numerical errors in evaluating the expected utilities imply uncertainty about the location of the maximum b*. We refer to this uncertainty as "numerical uncertainty." Second, even if we identify the correct mode of the expected utility surface, there may be other designs with almost equally high expected utility. We refer to a set of designs b with expected utility U(b) within a small neighborhood of U(b*) as "almost optimal designs." While it is possible to reduce the first source of uncertainty by more extensive simulation, the latter uncertainty is inherent in the problem. We can only aim to honestly describe it.
To evaluate the numerical uncertainty in b*, we select designs b within a neighborhood of b*. We then approximate the expected utility U(b) in that neighborhood with a quadratic response surface, U(b) = Q(b;γ) + ε. Here, γ are the regression coefficients of the quadratic function and ε are independent normal residuals. The posterior distribution on γ implies a posterior distribution on the mode b*(γ) of the response surface. We report 95% posterior intervals for b*(γ) to summarize the numerical uncertainty in the optimization. The results are shown in Table 2. The table is based on a neighborhood of b* defined by ||b − b*|| ≤ 0.01. We judge the reported uncertainties to be negligible based on the comparison with the suboptimal set of designs discussed below. The small size of the reported numerical uncertainties confirms that the chosen Monte Carlo sample size M = 1000 was sufficiently large.
Next, we find the set of almost optimal designs. In the forward simulation, 20^3 = 8000 triples b = (b0,b1,b2) were considered, that is we estimated the expected utility and type I and type II error rates for 8000 possible values of b. Of these, 55 designs satisfied the constraint α ≤ 0.05 and β ≤ 0.20 and had an expected utility greater than 95% of the utility under the optimal design b*, that is Û(b) > 0.95·Û(b*). Here, b* denotes the optimal rule for the constrained problem. We refer to these 55 designs as the almost optimal designs. They are suboptimal but only by a negligible difference in expected utility. The range of almost optimal designs is reported in Table 2. Since the decision problem is invariant with respect to any additive shift of the utility function, it is not possible to recommend a universal threshold like the 95% chosen here. The choice depends on the problem. The reported range of almost optimal designs is a useful diagnostic to help interpret, critique, and modify the proposed solution. Typically, the utility function is only a stylized description of the decision problem. The range of suboptimal designs allows the investigator to consider adjustments of the proposed solution to accommodate secondary goals and nuances of the decision problem that were not included in the formal utility function. For example, a large range on b1 or b2 might lead an investigator to propose designs with a narrower continuation region than b*, that is shorter total time for each treatment under consideration.
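Selecting the almost optimal designs from the grid results amounts to a filter followed by a threshold. The sketch below assumes a hypothetical record format with one dictionary per design, and it assumes the best constrained utility is positive so that a relative threshold such as 95% is meaningful.

```python
def almost_optimal(designs, max_alpha=0.05, max_beta=0.20, fraction=0.95):
    """Return (best design, list of almost optimal designs).

    designs -- list of dicts with keys "b", "utility", "alpha", "beta",
               one per grid point.
    """
    feasible = [d for d in designs
                if d["alpha"] <= max_alpha and d["beta"] <= max_beta]
    if not feasible:
        return None, []
    best = max(feasible, key=lambda d: d["utility"])
    near = [d for d in feasible if d["utility"] > fraction * best["utility"]]
    return best, near

def parameter_ranges(designs):
    """Range of each boundary parameter (b0, b1, b2) across a set of designs."""
    return [(min(d["b"][k] for d in designs), max(d["b"][k] for d in designs))
            for k in range(3)]

if __name__ == "__main__":
    designs = [
        {"b": (0.50, 0.70, 0.30), "utility": 100.0, "alpha": 0.04, "beta": 0.18},
        {"b": (0.50, 0.75, 0.30), "utility": 97.0, "alpha": 0.03, "beta": 0.19},
        {"b": (0.55, 0.80, 0.25), "utility": 120.0, "alpha": 0.09, "beta": 0.10},
        {"b": (0.45, 0.60, 0.35), "utility": 80.0, "alpha": 0.02, "beta": 0.15},
    ]
    best, near = almost_optimal(designs)
    print(best["b"], parameter_ranges(near))
```

Reporting the parameter ranges of the retained designs gives the kind of summary described for Table 2.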
We assess the sensitivity of the solutions with respect to the choice of the main features of the decision problem: the utility function, the prior probability model, the parameterization of the decision boundary, and the maximum number of patients enrolled across all trials at each time (Nmax).
We first consider changes to the utility function defined in (2.3). We leave the general form of the utility unchanged, but we now weight the payoff for a significant phase III result by the true advantage over placebo (θi − θ0), rather than the estimated advantage. We define the utility function
![]() | (5.1) |
The optimal designs under the corresponding expected utility U2(b) are shown in rows 5-6 of Table 1 (smooth version only). Compared to the solution under the original utility function, b0 in the solution of the unconstrained problem decreases, the expected sample size increases, and the type I error probability decreases slightly. The solution of the constrained problem is robust with respect to the change in the utility.
Next, we consider changes in the prior probability model. In (2.2), we defined a hierarchical model with hyperparameters that allow the pooling of information between treatments. We investigate the change in the optimal design if, at the time of analysis, we ignore the hierarchy and use independent beta priors. We continue to use the hierarchical model as the simulation truth. For a meaningful comparison, we use U2 since it does not depend on model-based estimates of θi. Table 1 presents the optimal decisions for both the constrained and unconstrained problems. Without pooling information, the expected number of patients per treatment is increased. The change is most extreme in the constrained problem. The expected utility of the optimal design changes only a little. We conclude that by using the hierarchical prior, we can gain the same payoff with fewer patients. Of course, this conclusion is only valid if the true sampling process does in fact include dependence across treatments.
Next, we consider changes in the parameterization of the decision boundaries. In (2.5), we imposed that the boundaries be linear in log(sti). We investigate changing the boundaries to be linear in sti, that is in (2.5), we replace log(sti) by sti. Results are shown in Table 1. The optimal design, its expected utility, and the expected number of patients per treatment are similar to the results for the log-scale parameterization. Lack of such robustness would be a concern. It would indicate that the optimal sequential rule is very poorly approximated by boundaries on the chosen grid.
Finally, we consider changing Nmax. We investigate the solution for only Nmax = 10 eligible patients available to enroll at each time (across all treatments in At). The optimal rule, shown in Table 1, results in smaller sample sizes and reduced utility, especially under the unconstrained problem.
| 6. A CLINICAL IMMUNOLOGY EXAMPLE |
|---|
|
|
|---|
We apply the proposed approach to a screening design for vaccines in a clinical immunology scenario. The same example was analyzed in Yao and others (1996). In the 2-stage design used for comparison, N1 patients are accrued first, and the treatment is discarded if the number of successes is ≤ K1. Otherwise, N2 more patients are accrued, and the treatment is discarded if the overall number of successes is ≤ K2 and recommended otherwise. The same process is then repeated with the second treatment, and so forth. The decision parameters are K1, K2, N1, and N2. The method also includes a truncation, that is, stopping the accrual before observing N1 patients if the number of failures is already ≥ N1 − K1, and stopping before N2 patients if we already have more than N2 − K2 failures. Truncation can significantly reduce the expected number of patients.
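The 2-stage rule with truncation described above can be illustrated with a short simulation. This is only a sketch under stated assumptions: the function names, the Beta(0.5, 0.5) prior, and the design values (N1, K1, N2, K2) = (5, 2, 5, 6) are hypothetical, and the code assumes that a treatment is discarded when the number of successes is at most K1 (stage 1) or at most K2 overall (stage 2). Truncation is implemented as stopping as soon as the remaining patients can no longer lift the success count above the threshold.

```python
import random


def two_stage_trial(theta, n1, k1, n2, k2, rng):
    """Simulate one treatment under a 2-stage rule with truncation.

    Assumed rule: discard after stage 1 if successes <= k1; otherwise
    accrue n2 more patients and discard if overall successes <= k2.
    Returns (recommended, patients_enrolled).
    """
    successes = failures = 0
    for stage_total, k in ((n1, k1), (n1 + n2, k2)):
        while successes + failures < stage_total:
            # Truncation: the threshold k can no longer be exceeded.
            if failures >= stage_total - k:
                return False, successes + failures
            if rng.random() < theta:
                successes += 1
            else:
                failures += 1
        if successes <= k:
            return False, stage_total
    return True, n1 + n2


def patients_per_recommendation(n_treatments, a, b, design, seed=0):
    """Total patients divided by the number of recommended treatments.

    Success probabilities are drawn from a Beta(a, b) prior (hypothetical
    values); design is the tuple (n1, k1, n2, k2).
    """
    rng = random.Random(seed)
    patients = recommended = 0
    for _ in range(n_treatments):
        theta = rng.betavariate(a, b)
        rec, n = two_stage_trial(theta, *design, rng)
        patients += n
        recommended += rec
    return patients / recommended if recommended else float("inf")


if __name__ == "__main__":
    print(patients_per_recommendation(1000, 0.5, 0.5, (5, 2, 5, 6)))
```

With a success probability of 0, the example design stops after 3 patients rather than 5, which is the kind of saving that truncation provides.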
In our sequential approach, we define the utility function to be the average number of patients needed to recommend one treatment. This allows us to compare the results across methods. Specifically, we define the utility function to be the ratio of the total number of patients enrolled across all treatments to the number of treatments recommended for phase III. Let N·i = Σ_t Nti denote the total number of patients on treatment i. The utility is then the ratio of Σ_i N·i to the number of treatments recommended for phase III.
Like Yao and others, we set the success probability for the standard of care to be θ0 = 0.5, and we use a Beta prior with parameters chosen to match the moments based on historical data. The prior gives a high probability to success probabilities close to 0 or 1.
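Matching a Beta prior to a given mean and variance is a standard method-of-moments calculation, sketched below. The function names and the numerical moments (mean 0.4, variance 0.12) are hypothetical and are not the values used in the paper. They are chosen only so that both parameters fall below 1, which produces the U-shaped density concentrating mass near 0 and 1 described above.

```python
def beta_from_moments(mean, var):
    """Return (a, b) of the Beta distribution with the given mean and variance."""
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("variance must be below mean * (1 - mean)")
    # For Beta(a, b): mean = a / (a + b), var = mean * (1 - mean) / (a + b + 1).
    common = mean * (1.0 - mean) / var - 1.0  # equals a + b
    return mean * common, (1.0 - mean) * common


def beta_moments(a, b):
    """Return (mean, variance) of a Beta(a, b) distribution."""
    total = a + b
    return a / total, a * b / (total * total * (total + 1.0))


if __name__ == "__main__":
    a, b = beta_from_moments(0.4, 0.12)  # hypothetical historical moments
    print(a, b)                # both parameters are below 1
    print(beta_moments(a, b))  # round trip back to the input moments
```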
We evaluate designs with M = 1000 simulations on a grid with 20 equally spaced values of b0 in [0.3,0.7], b1 in [0.3,0.8], and b2 in [0.2,0.6]. We use cohorts of N = 2 patients. After each batch, the posterior moments are evaluated and the decision to stop is taken according to (2.5). For the terminal decision, we use a fixed rule. Upon stopping the enrollment, a treatment is recommended when stopping was indicated by crossing the upper boundary, and a treatment is abandoned if stopping was indicated by crossing the lower boundary. We then select b to maximize the Monte Carlo sample average utility in the forward simulation. Again, the maximization is restricted to designs b that satisfy the constraints α ≤ αmax and β ≤ βmax.
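The forward simulation for a single candidate design can be sketched as follows. The Beta posterior moments are exact, but everything else is an illustrative assumption: the boundary function `funnel_bounds` is a hypothetical stand-in that is linear in the log of the posterior standard deviation and is not the parameterization in (2.5), the prior Beta(0.5, 0.5) is a placeholder, and the cap `max_patients` is added only to guarantee termination for arbitrary boundary functions. The error-rate constraints are omitted.

```python
import math
import random


def posterior_moments(a, b, successes, failures):
    """Mean and standard deviation of the Beta(a + successes, b + failures) posterior."""
    a_post, b_post = a + successes, b + failures
    total = a_post + b_post
    mean = a_post / total
    sd = math.sqrt(a_post * b_post / (total * total * (total + 1.0)))
    return mean, sd


def funnel_bounds(center, slope, sd_stop):
    """Hypothetical boundaries, linear in log(sd), that close once sd <= sd_stop."""
    def bounds(sd):
        half_width = max(0.0, slope * (math.log(sd) - math.log(sd_stop)))
        return center - half_width, center + half_width
    return bounds


def run_treatment(theta, a, b, bounds, rng, cohort=2, max_patients=200):
    """Enroll cohorts until the posterior mean crosses a boundary.

    Crossing the upper boundary recommends the treatment; crossing the
    lower boundary abandons it. Returns (recommended, patients_enrolled).
    """
    successes = failures = 0
    while successes + failures < max_patients:
        for _ in range(cohort):
            if rng.random() < theta:
                successes += 1
            else:
                failures += 1
        mean, sd = posterior_moments(a, b, successes, failures)
        lower, upper = bounds(sd)
        if mean >= upper:
            return True, successes + failures
        if mean <= lower:
            return False, successes + failures
    return False, successes + failures  # assumption: abandon if still undecided


def average_patients_per_recommendation(bounds, a, b, n_sim=1000, seed=0):
    """Monte Carlo estimate of patients enrolled per recommended treatment."""
    rng = random.Random(seed)
    patients = recommended = 0
    for _ in range(n_sim):
        theta = rng.betavariate(a, b)
        rec, n = run_treatment(theta, a, b, bounds, rng)
        patients += n
        recommended += rec
    return patients / recommended if recommended else math.inf


if __name__ == "__main__":
    design = funnel_bounds(center=0.5, slope=0.2, sd_stop=0.08)
    print(average_patients_per_recommendation(design, 0.5, 0.5))
```

Repeating the last call for every boundary parameter vector on a grid, and discarding designs that violate the error-rate constraints, would give a search of the kind described in the text.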
Table 3 shows the optimal decision boundaries for several values of αmax and βmax, and compares them with the 2-stage optimal design with truncation proposed in Yao and Venkatraman (1998). The fully sequential approach with the optimal decision boundaries yields a reduction between 27% and 57% in the expected number of patients necessary to recommend a treatment for phase III evaluation. In some cases, the actual α and β are lower than the upper bounds imposed by the constraints. The reduced sample size is a natural consequence of the fully sequential setup and does not reflect on any deficiency in the other method.
| 7. DISCUSSION |
|---|
|
|
|---|
We have proposed a Bayesian decision-theoretic approach to optimal screening designs for phase II studies. Its main strength is the generality of the simulation-based solution which allows for a wide range of probability models and essentially any utility function. Another advantage is the possibility of optimizing within a subset of rules that satisfy certain properties.
As a decision-theoretic approach, the proposed method inherits the usual limitations of expected utility maximization. In particular, it requires the specification of a utility function and a prior probability model.
The use of decision boundaries to solve the sequential design problem greatly reduces the computational burden of finding the optimal sequential decision. At the same time, however, it restricts the possible actions to those described by such decision boundaries. Instead of decision boundaries, one could consider the optimal decision for all possible values of a suitable summary statistic, in our case (sti,mti), on a finite grid. This is explored in Ding (2006).
The basic framework developed in this paper allows many generalizations. The prior model could easily be generalized to include a regression of success probabilities on treatment-specific covariates. For example, we might learn that treatments that target a specific molecular mechanism are more successful than others. Another important direction of generalization is the sampling model. Consider problems with a continuous response or an outcome that reports the time to some clinically meaningful event. The algorithm can still be used in such problems as long as we can define a single parameter upon which to base the inference. For example, when analyzing time until tumor progression, one could define the summaries (mti,sti) as posterior moments of a log hazard ratio for treatment relative to the standard of care. The nature of the event time as a delayed response would cause no difficulty in the optimal design scheme. Delayed responses are accounted for in the definition of the posterior moments. In particular, the definition of the likelihood function would include different factors for censored observations and for observed event times, as usual in posterior inference for event time data.
| ACKNOWLEDGMENTS |
|---|
Research was supported by National Institutes of Health/National Cancer Institute grants R33 CA97534-01 and R01 CA075981. We thank Raquel Montes Díaz and Roberto Carta for work on earlier prototypes of the proposed method. Conflict of Interest: None declared. Funding to pay the Open Access publication charges for this article was provided by M. D. Anderson Cancer Center.
| REFERENCES |
|---|
|
|
|---|
Berry DA, Mueller P, Grieve AP, Smith M, Parke T, Blazek R, Mitchard N, Krams M. Adaptive Bayesian designs for dose-ranging drug trials. In: Case Studies in Bayesian Statistics, Volume V. Gatsonis C, Kass RE, Carlin B, Carriquiry A, Gelman A, Verdinelli I, West M, eds. (2001) New York: Springer. 99-182. Lecture Notes in Statistics.
Brockwell AE, Kadane JB. A gridding method for Bayesian sequential decision problems. Journal of Computational and Graphical Statistics (2003) 12:566-584.
Carlin B, Kadane J, Gelfand A. Approaches for optimal sequential decision analysis in clinical trials. Biometrics (1998) 54:964-975.
DeGroot M. Optimal Statistical Decisions (2004) New York: Wiley-Interscience.
Ding M. Bayesian optimal design for phase II screening trials [PhD thesis]. (2006) Houston, TX: M.D. Anderson Cancer Center and Rice University.
Gelman A, Carlin J, Stern H, Rubin D. Bayesian Data Analysis (1995) Boca Raton, FL: Chapman & Hall.
Gittins J, Pezeshk H. A decision-theoretic approach to sample size determination in clinical trials. Journal of Biopharmaceutical Statistics (2002) 12:535-551.
Jawitz J. Moments of truncated continuous univariate distributions. Advances in Water Resources (2004) 27:269-281.
Leung DHY, Wang YG. Optimal designs for evaluating a series of treatments. Biometrics (2001) 57:168-171.
Müller P, Parmigiani G. Optimal design via curve fitting of Monte Carlo experiments. Journal of the American Statistical Association (1996) 90:1322-1330.
Simon R. How large should a phase II trial of a new drug be? Cancer Treatment Reports (1987) 71:1079-1085.
Spiegelhalter DJ, Abrams KR, Myles JP. Bayesian Approaches to Clinical Trials and Health Care Evaluation (2004) Chichester, UK: John Wiley and Sons.
Stout Q, Hardwick J. Optimal screening designs with flexible cost and constraint structures. Journal of Statistical Planning and Inference (2005) 132:149-162.
Strauss N, Simon R. Investigating a sequence of randomized phase-II trials to discover promising treatments. Statistics in Medicine (1995) 14:1479-1489.
Thall P, Simon R, Estey E. Bayesian sequential monitoring designs for single-arm clinical trials with multiple outcomes. Statistics in Medicine (1995) 14:357-379.
Wang YG, Leung DHY. An optimal design for screening trials. Biometrics (1998) 54:243-250.
Yao TJ, Begg CB, Livingston PO. Optimal sample size for a series of pilot trials of new agents. Biometrics (1996) 52:992-1001.
Yao TJ, Venkatraman ES. Optimal two-stage design for a series of pilot trials of new agents. Biometrics (1998) 54:1183-1189.
Received April 25, 2005; revised September 1, 2006; accepted for publication October 3, 2006.





