Biostatistics Advance Access originally published online on June 20, 2006
Biostatistics 2007 8(2):306-322; doi:10.1093/biostatistics/kxl011
Adaptive design: estimation and inference with censored data in a semiparametric model
Department of Biostatistics, M. D. Anderson Cancer Center, Houston, TX 77030, USA yshen{at}mdanderson.org
Department of Mathematical Sciences, Indiana University at South Bend, IN 46634, USA
* To whom correspondence should be addressed.
| SUMMARY |
|---|
In this article, we provide a method of estimation for the treatment effect in the adaptive design for censored survival data with or without adjusting for risk factors other than the treatment indicator. Within the semiparametric Cox proportional hazards model, we propose a bias-adjusted parameter estimator for the treatment coefficient and its asymptotic confidence interval at the end of the trial. The method for obtaining an asymptotic confidence interval and point estimator is based on a general distribution property of the final test statistic from the weighted linear rank statistics at the interims with or without considering the nuisance covariates. The computation of the estimates is straightforward. Extensive simulation studies show that the asymptotic confidence intervals have reasonable nominal probability of coverage, and the proposed point estimators are nearly unbiased with practical sample sizes.
Keywords: Confidence interval; Cox model; Martingale; Nuisance covariates; Point estimator; Self-designing
| 1. INTRODUCTION |
|---|
In a typical phase III trial, especially in studies of cancer and other chronic diseases, the primary goal is often to compare the duration of time to an event between patient groups on different treatment arms. The time to an event of interest may be censored due to loss to follow-up or the event not having occurred at the time of analysis. During the course of these long-term clinical trials, design adaptations based on the observed data are sometimes necessary, and they must achieve the study goals without inflating the type I error rate. While attractive for its flexibility, the adaptive design makes it challenging to perform the final analyses, including the estimation of a treatment difference and its confidence interval at the end of the trial. As expected, conventional analytic methods often lead to biased estimators due to the interim decisions within an adaptive design (Cheng and Shen, 2004).
Similarly, the estimation of the parameters is often difficult after the classical group sequential tests even with a fixed sample size. The classical group sequential designs were proposed to stop trials early for highly significant positive results in the fixed sample design framework (Pocock, 1977; O'Brien and Fleming, 1979; Lan and DeMets, 1983). The basic motivations of the two types of group sequential designs are different. However, similar difficulties in estimation are found with either type of design due to sequential sampling. Following the classical sequential tests, Todd and Whitehead (1996) proposed a bias-adjusted MLE based on the triangular test. Others, including Siegmund (1978), Tsiatis and others (1984), and Emerson and Fleming (1990), developed some orderings on the outcome space, either by the sample mean or by the analysis time at which the study terminates. These methods of estimation for the classical sequential designs with a fixed maximum sample size rely on a prespecified outcome space and, therefore, are not directly applicable to the adaptive design with a random maximum sample size.
Most adaptive designs and corresponding methods of analysis have been developed mainly for instantaneously observed continuous or binary outcomes; thus, the methods are not directly applicable to survival responses with staggered entry. In general, for time-to-event outcomes, the adaptive design and analysis procedures can be considerably complicated. With staggered entry and long-term follow-up, partial but increasing information becomes available from the study participants at successive monitoring times. The information is contained not only in the sample size but also in the number of events observed during the study. Within group sequential designs, Schäfer and Müller (2001) proposed a conditional rejection error probability function to make design modifications in an ongoing trial with censored survival data. More recently, Shen and Cai (2003) generalized the self-designing trial of Fisher (1998) and Shen and Fisher (1999) from immediately observed outcomes to censored survival outcomes with staggered entry for comparison of two treatment arms. However, the attention is still restricted to testing the equality of two survival distributions. As recognized by Schäfer and Müller (2001) and others, the problem of estimating the parameter of interest and its confidence interval at the end of the adaptive trials remains unsolved, especially with censored outcomes. Another limitation in the existing designs and inference is that the constructed test statistics do not consider other risk factors that are commonly available at baseline in clinical trials. When the effects of nuisance covariates are substantial, the sample size estimation based on the design without considering the nuisance covariates can be less efficient (Lagakos and Schoenfeld, 1984; Self and Mauritsen, 1988), and the estimator of the treatment effect can be biased (Gail and others, 1984; Struthers and Kalbfleisch, 1986).
Here, we describe the adaptive design and inference method within the framework of the Cox proportional hazards model, in the presence of covariates other than the treatment indicator in a randomized clinical trial. We construct the bias-adjusted estimator of the hazard ratio between the treatment arms and its confidence interval at the end of the trial, with or without other covariates adjustment in the model. The final statistic for testing the treatment effect can be derived as a weighted average of linear rank statistics, while adjusting for the effects of nuisance covariates via the semiparametric Cox proportional hazards function. In Section 2, we present the estimation and inference procedures for the adaptive design with and without nuisance covariates. We summarize the simulation studies in Section 3, and illustrate an example in Section 4. We make general remarks in Section 5.
| 2. INFERENCE AND ESTIMATION OF TREATMENT EFFECT |
|---|
We consider the setting where patient entry is staggered, random, and independent of the survival and censoring times. After an initial accrual period of $t_a$, patients are followed for an additional period of time, $t_f$. Assume that the two treatment groups under study are entered sequentially and allocated symmetrically. The total number of interim analyses $m$ does not have to be fixed in advance. Let $A_i$ denote the time of entry, $C_i$ denote the time from entry to censoring, $V_i$ denote the time from entry to failure, $Z_i$ be the indicator variable for the treatment group, and $W_i$ be a vector of nuisance variables for the $i$th patient, $i = 1, 2, \ldots$. A key assumption for the method is that the time of entry into the study is statistically independent of the time to failure (or censoring), and the random variables $A$, $V$, and $C$ are conditionally independent given covariates (Tsiatis, 1981; Tsiatis and others, 1985). Data collected on the $i$th individual are described by independent and identically distributed vectors, $\{(A_i, V_i, C_i, Z_i, W_i), i = 1, \ldots\}$.
When data are reviewed at time $t$, we observe the time to failure or censoring, $X_i(t)$, and the failure indicator $\Delta_i(t)$ for the $i$th individual, where $X_i(t) = \max\{\min(V_i, t - A_i, C_i), 0\}$, and $\Delta_i(t) = 1$ if $V_i \le \min(t - A_i, C_i)$ and 0 otherwise. The following variables can be represented as identically and independently distributed random vectors $\{X_i(t), \Delta_i(t), Z_i, W_i\}$ for $i = 1, 2, \ldots$. Let $\mathcal{F}_t = \sigma\{X_i(u), \Delta_i(u), Z_i, W_i, 0 < u < t, i = 1, \ldots\}$ define a filtration at time $t$. Thus, $\mathcal{F}_{t_j}$ is the $\sigma$-field generated by the cumulated data for each block up to the $j$th look, where $j = 1, 2, \ldots, m$.
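As a minimal sketch of the observation scheme just described, the following function computes the observed time and the failure indicator for one patient at a calendar review time. The function name `observe_at` and its argument names are ours, chosen for illustration only.

```python
def observe_at(t, entry, failure, censor):
    """Observed data for one patient when the trial is reviewed at calendar time t.

    entry   : calendar time of entry (A_i)
    failure : time from entry to failure (V_i)
    censor  : time from entry to censoring (C_i)
    Returns (x, delta): observed time on study and failure indicator.
    """
    follow_up = t - entry  # time on study available at the review
    # Observed time is the earliest of failure, review cut-off and censoring,
    # truncated at 0 for patients who have not yet entered.
    x = max(min(failure, follow_up, censor), 0.0)
    # A failure is observed only if it precedes both the cut-off and censoring.
    delta = 1 if failure <= min(follow_up, censor) else 0
    return x, delta
```

A patient who has not yet entered at time `t` contributes an observed time of 0 and no event, which matches the outer maximum in the definition of the observed time.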
First, we consider the simple two-arm randomized design without nuisance covariates. To measure the treatment effect, we assume the semiparametric proportional hazards model (Cox, 1972)
|
| (2.1) |
where $Z = 1$ for a patient in the intervention arm, $Z = 0$ in the control, and $\theta$ is the log-relative risk of the event. The one-sided hypothesis to be tested is $H_0: \theta = \theta_0$ against the alternative $H_1: \theta > \theta_0$ with type I error rate $\alpha$ and power $1 - \beta$ at $\theta = \theta_a$, where $\theta_a > 0$. Without loss of generality, let $\theta_0 = 0$.
Shen and Cai (2003) introduced a weighted sum of rank test statistics that can update the sample size based on the observed survival data with staggered entry. The asymptotic distribution had been derived for the proposed test statistic under sequential monitoring in this setting. The trial is assessed periodically after observing every $B_j$ events between $t_{j-1}$ and $t_j$, where $(t_1 < t_2 < \cdots < t_m)$ are determined as the time when one observes $B_j$ events since $t_{j-1}$, and $B_j$ is often a prefixed constant integer. At the $j$th interim analysis at time $t_j$, $n_j$ individuals have entered into the study, and $B_j$ events have been observed from $t_{j-1}$; a linear rank test statistic based on all cumulated data up to time $t_j$, $U(t_j, \theta_0)$, is computed under the null hypothesis. Specifically, the statistic at calendar time $t_j$ can be written as
![]() |
where $N_i(t,u) = I(X_i(t) \le u, \Delta_i(t) = 1)$,
, and $Y_i(t,u) = I\{X_i(t) \ge u\}$.
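For orientation, a linear rank statistic of this kind can be sketched at the null value of the treatment coefficient as the familiar logrank score: for each observed failure, the treatment indicator of the failing patient minus the mean treatment indicator among patients still at risk. This is an illustrative sketch under our own assumptions, not the exact display from the article: the function name `logrank_score` is ours, ties are handled naively, and the sign convention (observed minus expected failures in the arm coded 1) may differ from the one used in the text.

```python
def logrank_score(times, events, groups):
    """Logrank-type score at a zero treatment coefficient.

    times  : observed times X_i(t) at the review
    events : failure indicators (1 = failure observed, 0 = censored)
    groups : treatment indicators Z_i (0 or 1)
    """
    score = 0.0
    for x_i, d_i, z_i in zip(times, events, groups):
        if d_i != 1:
            continue  # censored observations contribute no jump
        # Risk set: everyone whose observed time is at least x_i.
        at_risk = [z for x, z in zip(times, groups) if x >= x_i]
        score += z_i - sum(at_risk) / len(at_risk)
    return score
```

The double loop is quadratic in the number of patients, which is adequate for a sketch; sorting by observed time would be the usual choice for larger data.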
It has been proven that the increments $U(t_1, \theta_0), U(t_2, \theta_0) - U(t_1, \theta_0), \ldots, U(t_j, \theta_0) - U(t_{j-1}, \theta_0)$ are asymptotically independent for prefixed times (Tsiatis, 1981; Slud, 1984, and others). Gail and others (1982) found that the asymptotic joint distribution of the linear rank statistics without nuisance covariates, evaluated at times defined by observing a prespecified number of events, has the same structure of independent increments as at prefixed times. Tsiatis and others (1985) further confirmed this independent increment structure under the Cox proportional hazards model with nuisance covariates. A theoretical justification using stopping times can be found in Appendix A for the asymptotic independent increments property of $U(t_j, \cdot)$ for $j = 1, \ldots, m$.
Under the null hypothesis, the distribution of
converges to a normal distribution with mean 0 and variance 

, where 
is the variance of
. Note that 
can be estimated using the formula of Fleming and Harrington (1991, p. 261). Thus, under $H_0$ the standardized information cumulated during $(t_{j-1}, t_j)$ can be expressed as
|
| (2.2) |
which is asymptotically normal, with mean 0 and $\mathrm{Var}(S(t_j)) = 1$ by Slutsky's theorem.
Recall that the final test statistic $T_m$ in a self-designing trial is constructed based on the weighted standardized $S(t_j)$ as
$$T_m = \sum_{j=1}^{m} w_j S(t_j),$$
where $w_j = w_j(S(t_1), \ldots, S(t_{j-1}))$ is a nonnegative weight function consisting of data up to the $(j-1)$th step, and $\sum_{j=1}^{m} w_j^2 = 1$ (Shen and Cai, 2003). Under the null hypothesis, $T_m$ is asymptotically normally distributed with mean 0 and variance 1.
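The combination step can be sketched as follows, taking the weights and the standardized increments as given inputs. In the design itself each weight is computed from data observed before its step; that part is not reproduced here, and the function name `final_statistic` is ours.

```python
def final_statistic(weights, increments, tol=1e-9):
    """Weighted sum of standardized increments.

    weights    : nonnegative weights w_1..w_m whose squares sum to 1
    increments : standardized increments S(t_1)..S(t_m)
    """
    if len(weights) != len(increments):
        raise ValueError("weights and increments must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be nonnegative")
    # All of the weight must be spent by the final step.
    if abs(sum(w * w for w in weights) - 1.0) > tol:
        raise ValueError("squared weights must sum to 1")
    return sum(w * s for w, s in zip(weights, increments))
```

The constraint on the squared weights is what gives the combined statistic unit variance when the increments are independent with unit variance.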
In a self-designing clinical trial, it is important to assess futility at the interims for ethical and economic reasons. For this purpose, we use the Wald-type constant likelihood futility boundary proposed by Shen and Cai (2003) at time $t_j$ defined by
|
| (2.3) |
If the linear rank test statistic at $t_j$ satisfies $U(t_j, \theta_0) \le f(t_j)$, the trial is terminated and the null hypothesis is accepted, because of concerns that the intervention may be ineffective or inferior to the standard one in prolonging the survival time of patients. Such an early stopping boundary can be useful not only to constrain an unnecessary increase of sample size in the self-designing setting under the null hypothesis but also to preserve the type I error rate. Given the sequential procedure, we can explicitly express the probability of rejecting the null hypothesis at stage $m$ as a probability of not terminating the trial for futility at any of the first $(m - 1)$ interim analyses, and the final test statistic being greater than the critical value $z_\alpha$, which follows
|
|
under the null hypothesis. Thus, the probability of rejecting the null hypothesis under $H_0$ is below the nominal level, $\alpha$.
If the trial is not terminated by futility, i.e. U(tj,
0) > f(tj), the recruitment is continued, and a weight function is estimated based on the cumulated data prior to the jth step, as long as 
w
< 1. The procedure is iterated until step m, if 
w
< 1 and 
w
1. In other words, all the weight function is spent at step m, and the trial is terminated at the calendar time tm. Specifically, the weight function is constructed as an inverse function of the additional number of events required to achieve the specified power (1 ß) at step j(j > 1), denoted by d(t
) d(tj 1); thus,
![]() | (2.4) |
where t
is the calendar time to exit the trial. Note that t
is different from tj in general. By solving the conditional power equation, we obtain
![]() |
where $z_\alpha$ and $z_\beta$ denote the upper $\alpha$- and $\beta$-standard normal percentiles,
is the estimated variance of treatment indicator Z, and
is the estimated hazard ratio based on the partial likelihood using data up to the $(j - 1)$th block. The weight at the first step is often assigned a fixed fraction based on the design parameters (Shen and Fisher, 1999; Cheng and Shen, 2004).
With the fixed sample design, the maximum partial likelihood estimate $\hat{\theta}$ solved from the score equation, $U(\theta, t_m) = 0$, at the end of the trial is a consistent estimate of $\theta$. However, with the adaptive sequential designs, this estimator can be biased due to interim stopping rules. To derive an asymptotically unbiased estimator and its confidence interval for $\theta$, we need to construct a "pivotal" quantity of the unknown parameter $\theta$ and the final test statistic, so that the asymptotic distribution of the "pivot" is known both under the null and under the alternative hypotheses (Cheng and Shen, 2004).
One difficulty involving censored survival data is that the final test statistic is proposed through the score statistic at each stage. Applying the martingale central limit theorem, the cumulated score statistic assessed at time tj, n
U(tj,
), asymptotically behaves like a Gaussian process with mean value nj

and variance 
. The tightness of the process n
U(tj,
) has been established for the Cox model with covariates by Bilias and others (1997)
. Under the alternatives (for any
> 0), the normalized process of
, converges weakly to a zero-mean Gaussian process. We, therefore, define the pivot as
![]() |
Through a proof similar to that of Shen and Cai (2003), we can verify that $V_m$ asymptotically converges to a standard normal distribution for any $\theta$ (under both the null and alternative hypotheses), using the properties of the martingale theorem and the characteristic function. Based on the above distribution property for $V_m$, a two-sided asymptotic confidence interval for $\theta$ at the end of the trial can be constructed at confidence level $100(1 - 2\alpha)\%$ from
![]() |
and leads to the form
![]() |
The distribution of $V_m$ also yields a bias-adjusted moment estimator for $\theta$, which is also the center of the constructed confidence interval,
![]() |
Using arguments similar to those in Cheng and Shen (2004), we can prove the consistency of the bias-adjusted estimator in terms of block size. The practical implication of this property is that the block size for the adaptive design should be moderate to have a consistent estimator.
To ensure the martingale property of $V_m$, one more block of data (or at least one failure) should be observed to spend all the remaining weight at the $m$th step ($m = j + 1$), when a trial is terminated at the $j$th step due to futility, or when the conditional power at the $j$th step is greater than $1 - \beta$. If we allow the trial to stop at the $j$th step and accept $H_0$ due to futility, the event $\{m = j\}$ may no longer be $\mathcal{F}_{t_{m-1}}$ measurable. Therefore, $V_m$ with $m = j$ does not have an asymptotic standard Gaussian distribution. As the associate editor suggested, it is possible to use empirical process theory instead of martingale theory to derive the asymptotic distribution of $V_m$ using data observed up to step $j$. However, the variance of the estimator may be increased slightly without the extra step. For the futility termination, the decision to accept $H_0$ should remain the same even with the extra block of observed data to maintain the logical consistency, as discussed by Cheng and Shen (2004).
It has been widely recognized that the use of important risk factors can increase the power of the inference in the design and reduce bias in the estimation of the treatment effect in the analysis of clinical trials. Therefore, an important generalization of the adaptive design for censored survival outcomes is directed to the comparison of the survival distributions with adjustment for other covariates. The Cox regression model (Cox, 1972) is the most commonly used semiparametric model for adjusting for nuisance covariates, while the primary interest is to assess the treatment effect.
Let
define the covariate coefficient vector for the nuisance covariate vector, W. Under the proportional hazards model assumption, the hazard function, given all available covariates, is as follows:
|
| (2.5) |
With staggered entry, the asymptotic theory has been developed for the joint distribution of a sequence of score statistics evaluated at different times under the Cox model for the classical group sequential designs (Jennison and Turnbull, 1997; Gu and Ying, 1995; Tsiatis and others, 1985). Specifically, the sequence of score statistics has an asymptotic joint multivariate normal distribution and an independent increment property. At the $j$th interim analysis at time $t_j$, the score process for testing the null hypothesis, $\theta = \theta_0$, is denoted by
, where
![]() |
and
is solved from the constrained partial likelihood equation with
=
0 based on data observed up to the review time tj. That is,
![]() |
Denote
0 to be the true value of
. Tsiatis and others (1985)
showed that the finite-dimensional distributions of
and n 1/2U(tj,
0,
0) are asymptotically equivalent and the limiting process is a time-rescaled Brownian motion, if Zi and Wi are independent. The results have been extended by Gu and Ying (1995)
to more general cases with dependent Z and W. Based on these established properties,
has independent increments with any type of joint distribution for Z and W. At two different review times tj and tk,
converge in distribution to a bivariate Gaussian process with mean 0 and covariance function 
(tj) for tj < tk, and
![]() |
where
and Yi(t,u) are defined in Section 2.1, and
|
|
The consistent estimator of 
(tj) can be obtained by substituting
0 and
0(u) in the above formulae by
and
![]() |
The proof for the consistency of
was outlined in Gu and Ying (1995)
.
The construction of the final test statistic in this setting then becomes straightforward. We can follow a procedure similar to that described in Section 2.2 without nuisance covariates,
![]() |
and
depends on all the information, including nuisance covariate data observed up to tj 1. We will outline the derivation of
next. Note that the constructed test statistic, Tm(W), corresponds to testing the null hypothesis for
=
0, while the variance estimator,
, contains information from both the nuisance covariates and the treatment indicator.
The weight function,
, for j > 1 is estimated iteratively using observed data prior to the jth look. For j = 1, as usual, we often assign a fixed fraction based on the design parameters. To ensure the specified power 1 ß given the accumulated data up to step (j 1), we will solve the additional number of events needed, d(t
) d(tj 1), from the conditional power equation,
![]() |
where S(t
,
) is the standardized increment of the covariate-adjusted score statistic between tj 1 and t
. We replace
and
by the estimated
and
, respectively, using the observed data up to step (j 1) on the right side of the following inequality:
![]() |
where
can be approximated by
, and 
is the variance of treatment indicator Z (Tsiatis and others, 1985
). This approximation is based on a condition that the treatment assignment is independent of nuisance covariates W. In randomized controlled clinical trials, this condition is easily satisfied. Furthermore, this approximation is used to construct the weight function only. The consistent estimator of 
(tj) used in the final test statistic is general enough to handle a possible dependence between Z and W.
Assuming that the jth step is the last step of the trial, then
. By solving the above conditional power equation, we obtain d(t
) d(tj 1) as a function depending on data observed prior to the jth step
![]() |
With an estimated additional number of events required between tj 1 and t
, the weight function can be constructed exactly the same as in (A.1).
Using arguments similar to those used for cases without nuisance covariates, we can prove the asymptotic properties of Tm(W). (More details are provided in Appendix B.) Specifically, the pivotal function,
![]() |
follows an asymptotic standard normal distribution for any
. Thus, the bias-adjusted moment estimate for
and its corresponding 100(1
)% confidence interval can be derived as
![]() |
Under the null hypothesis,
= 0, Vm(W) reduces to the test statistic, Tm(W), which follows an asymptotic standard normal distribution. The futility stopping rule is derived the same way as in Section 2.2, replacing
j by
z(tj) in (2.3).
| 3. SIMULATION STUDY |
|---|
We conduct a series of simulation studies to evaluate the performance of the proposed adaptive design inferences, including the test and estimation under either the null or the alternative hypotheses. For different underlying survival distributions, we compare the power/size properties of the test with or without a nuisance covariate adjustment using the proportional hazards model in the adaptive designs, and compare the properties with the corresponding standard test statistic with or without a nuisance covariate in the fixed sample designs. We also obtain the naive estimates for both the treatment effect and the nuisance covariate in the adaptive designs, the adjusted estimates for the treatment effect with and without the nuisance covariate at the end of the adaptive trial, and compare them with the corresponding naive estimates in the fixed sample design, which should be asymptotically unbiased estimates under the correctly specified models.
The empirical type I error rates and power estimates are based on simulating 1000 independent clinical trials under different designs. All tests performed are one-sided. We simulate patient survival times from the proportional hazards model with one treatment indicator variable, Z, and one nuisance covariate, W. The two covariates are dichotomous, independent, and follow a Bernoulli distribution with probability of 0.5. The relative risk of the covariate Z is
, and the relative risk of W is
. Assume that patients enter the trial according to a Poisson process at a fixed rate per unit time. The baseline survival follows an exponential distribution.
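The data-generating mechanism described above can be sketched with the standard library alone. This is an illustrative sketch, not the simulation code used for the tables: the function and argument names are ours, censoring other than the analysis cut-off is omitted, and the sign convention is an assumption (the hazard is multiplied by the exponential of each coefficient times its covariate, so a negative `log_hr_z` represents a beneficial treatment).

```python
import math
import random

def simulate_trial(n_patients, accrual_rate, base_hazard, log_hr_z, log_hr_w, seed=None):
    """Simulate entry times, covariates and failure times for one trial.

    accrual_rate : expected number of patients entering per unit time
    base_hazard  : constant baseline hazard (exponential survival)
    log_hr_z     : log hazard ratio for the treatment indicator Z (assumed sign)
    log_hr_w     : log hazard ratio for the nuisance covariate W
    """
    rng = random.Random(seed)
    patients = []
    entry_time = 0.0
    for _ in range(n_patients):
        # Poisson-process arrivals: exponential gaps between entries.
        entry_time += rng.expovariate(accrual_rate)
        z = int(rng.random() < 0.5)  # Bernoulli(0.5) treatment indicator
        w = int(rng.random() < 0.5)  # Bernoulli(0.5) nuisance covariate
        hazard = base_hazard * math.exp(log_hr_z * z + log_hr_w * w)
        failure_time = rng.expovariate(hazard)  # time from entry to failure
        patients.append({"entry": entry_time, "failure": failure_time, "z": z, "w": w})
    return patients
```

Passing a fixed `seed` makes a simulated trial reproducible, which is convenient when comparing analysis methods on the same data.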
As investigated by Shen and Cai (2003), different approaches to adjusting the number of events in a clinical trial include varying the accrual duration, accrual rate, or follow-up duration after accrual. Due to limited space, we consider only the first type of adjustment in the simulations. Specifically, this adaptive design allows for the accrual duration to be changed from 2 to 4 years, whereas the fixed sample design is based on an accrual duration of 2 years and an additional follow-up period of 2 years after accrual. Assume a fixed accrual rate of 126 patients per year. This setting requires 88 events (252 patients) to achieve 90% power with a type I error rate of 0.025 to detect an expected log-relative risk of
= log(2) = 0.693 (baseline hazard rate of 0.2 in the control arm). With the fixed hazard rate in the control group, we assume various hazard rates in the treatment group, which leads to the log-hazard ratio,
, varying from 0.600 to 0.693. We consider the effect of the nuisance covariate, described by the log-hazard ratio,
, taking values from 0, 0.5, and 0.8 to 1.2.
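The figure of 88 events quoted for the fixed sample design is consistent with the standard Schoenfeld approximation for the logrank test under equal allocation, $d = (z_\alpha + z_\beta)^2 / \{p(1-p)\theta^2\}$ with $p = 0.5$. The text does not state which formula was used, so treating this as its source is an assumption, and the function name `required_events` is ours.

```python
import math
from statistics import NormalDist

def required_events(log_hazard_ratio, alpha, power, allocation=0.5):
    """Approximate number of events for a one-sided logrank test.

    alpha      : one-sided type I error rate
    power      : target power at the given log hazard ratio
    allocation : proportion of patients randomized to one arm
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    variance_z = allocation * (1.0 - allocation)  # variance of the treatment indicator
    return (z_alpha + z_beta) ** 2 / (variance_z * log_hazard_ratio ** 2)

# One-sided 0.025 level, 90% power, log hazard ratio log(2):
events_needed = math.ceil(required_events(math.log(2), 0.025, 0.90))
```

With an accrual rate of 126 patients per year over 2 years, the 252 patients quoted in the text follow directly from the accrual assumptions rather than from this formula.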
The number of events in the first interim analysis is specified as 40% of the number of events required in the fixed sample design (denoted as $B_1$). For the subsequent analyses, we use a constant block size of events of 5 or 10. In each simulation, a usual logrank statistic without adjusting for the nuisance covariate and a statistic adjusting for the covariate in the Cox model for testing the treatment effect are computed based on the fixed sample design at the prespecified termination time, and compared with the adaptive test statistics with and without adjusting for the covariate in terms of the type I error rate, power, and average number of events. We evaluate the performance of the proposed point estimate (denoted by "Adj. Est,
" in the tables) and the corresponding coverage probability of the 95% confidence interval (denoted by "Adj. 95% CP,
" in the tables) with or without adjusting the nuisance covariate, and compare the estimate with the corresponding naive estimate of
, which is obtained by simply solving the score equation at the end of the trial without considering the interim analyses (denoted by "Naive Est,
" in the tables).
Table 1 summarizes the empirical type I error rates and the performance of the estimates for
and
under the null hypothesis. The results show that the proposed adaptive statistics are quite conservative compared to the standard tests with or without adjusting the nuisance covariate in terms of type I error rate. As expected, the naive estimators of
obtained from the score equations at the end of the adaptive trials are somewhat biased compared to the adjusted estimators. The coverage probability of 95% confidence intervals given in Section 2 is fairly accurate across all cases with or without adjusting for the nuisance covariate. The empirical results are consistent with the observations of Struthers and Kalbfleisch (1986)
that the estimators of
under both (2.1) and (2.5) are correct when
= 0. In other words, if
= 0, omitting the nuisance covariate, W, has little impact on the inference for the treatment effect when W and Z are independent. Therefore, the corresponding bias-adjusted estimators and their confidence intervals for the adaptive designs are comparable with or without adjusting for the nuisance covariate. At the end of the adaptive design with the adjustment for the nuisance covariate, we also calculate the naive estimates for
, which are summarized in Table 1 at the last row in each panel. They are asymptotically unbiased, and comparable to the estimator of
under the fixed sample design (the last column and the last row of each panel).
When the alternative hypothesis is correctly specified in Table 2,
=
, the proposed tests achieve power similar to that from the fixed sample design, when there is no effect for the nuisance covariate (
= 0). The power of the usual logrank test or Wald test based on the fixed sample designs can be much lower than the expected one, when the effect of the nuisance covariate increases, even when the fixed sample design relies on a correctly specified treatment effect,
(Lagakos and Schoenfeld, 1984
). As expected, it is important to adjust for the nuisance covariate, when the effect
is large (relative to
). The covariate-adjusted estimators often have somewhat less bias than the estimators ignoring the nuisance covariate, and a larger block size also leads to a smaller bias. When the effect of the nuisance covariate is minor or moderate, the statistics based on the logrank and the Cox model perform similarly. We also calculate the naive estimator of
with or without adjusting for the covariate under the fixed sample design. The results are in accord with the earlier findings of Gail and others (1984) that the estimator of the treatment effect can be biased when covariates are omitted from the proportional hazards model under the fixed sample design. When ignoring the nuisance covariate that has a strong covariate effect (with
= 1.2) in the adaptive design, the proposed estimator of
can be biased due to the misspecified model. The magnitude of the biases is comparable to that of the naive estimator of
in the corresponding fixed sample design omitting the nuisance covariate.
As suggested by the editors, we included additional comparisons of power and estimation between the adaptive design and a classical group sequential design with fixed sample sizes. With the same type I and type II error rates as in the fixed sample designs, we planned a maximum number of five interim analyses, with roughly equal numbers of events observed between two consecutive analyses for the O'Brien & Fleming (OBF) design in Table 3. The results were summarized under the column "OBF" for this design. Table 3 shows the results when data are generated with a smaller log-hazard ratio than the assumed one (
= 0.60). It is clear that both logrank-based and Cox-model-based test statistics of the adaptive design have a substantial gain in power compared with the fixed sample tests with or without interim analyses. This is not surprising because the classical group sequential designs do not allow the maximum sample size to be modified. In addition, because of the interim efficacy tests and potential early termination of the trials, OBF designs have a smaller average number of events (and lower power) compared to the fixed sample design without interim analyses. Similar to the naive estimator of
under the adaptive design, the naive estimator of
under OBF is also overestimated. For example, when
= 0.5 and
= 0.6, the two naive estimators are 0.679 and 0.650 under Cox model for the adaptive and OBF designs. As expected, the naive estimators for
under both adaptive and OBF designs are unbiased when Z and W are independent.
When the effect of the nuisance covariate is large (relative to
), the Cox-model-based test statistic leads to improved power compared to the test statistic that ignores the nuisance covariate in both the adaptive and fixed sample designs. Moreover, the coverage probability of the proposed confidence interval at the final analysis is closer to the nominal level with the nuisance covariate adjustment than without the adjustment. In general, the bias-adjusted estimators have smaller bias compared to the naive estimators across all scenarios. The naive estimator of
under the fixed sample design in all tables is unbiased, and the coverage probabilities under the correctly specified models should be close to the nominal level, because there is no sequential sampling involved in the fixed sample design. However, the power of testing can be lower than expected when the log-hazard ratio is overestimated under the fixed sample design.
| 4. EXAMPLE |
|---|
To illustrate the method, we use data from a completed, multicenter randomized clinical trial comparing treatments for colon cancer. We show what would have happened in terms of inference and estimation, had the trial been adaptively designed. In the trial, patients with resected stages B and C colorectal carcinoma were randomized to receive either placebo, adjuvant levamisole (lev), or adjuvant levamisole plus fluorouracil (lev + 5-FU) for 1 year after surgery. One of the primary goals was to evaluate the efficacy of levamisole, either alone or in combination with 5-FU, to reduce recurrence of the disease. Detailed analyses of the trial are given by Moertel and others (1990).
Among a total of 619 patients who entered the trial between the years of 1984 and 1987, we removed 12 patients from our analysis due to a lack of information about their lymph nodes. Accrual in the actual trial lasted from 1984 to 1987, with a median follow-up of 7.5 years. Using the usual Cox proportional hazards model in the fixed sample design with all observed data, the estimated log-hazard ratio (relative risk) is 0.49 (statistically significant with a one-sided p-value < 0.001) for disease-free survival between patients in the placebo arm and the lev + 5-FU arm, while adjusting for the nodal status.
Suppose that the investigators would like to detect a 50% decrease of the hazard rate in the treatment group, compared with the placebo group, in which patients have a baseline hazard rate of 0.23, assuming an exponential distribution. These assumptions are reasonably close to the protocol for a fixed sample design (Laurie and others, 1989) and the observed data. Given an accrual duration of 4 years and an additional 5 years of follow-up, a sample size of 619 would have about 80% power to detect a 50% decrease in the hazard rate of recurrence and/or death.
With a fixed accrual rate and additional follow-up, we assume that the accrual duration can be extended from 2 to 4 years, if necessary. We apply the proposed adaptive design procedure with a block size of 20 at each look (except B1 = 50); the trial would have been terminated with the null hypothesis rejected after observing 90 events, i.e. upon completion of the third interim analysis. In this case, a total of 448 patients would have been entered into the study, and the total study duration would have been 5 years. Thus, the trial would have stopped earlier under the adaptive design than under the fixed sample design and would have claimed the efficacy of levamisole together with 5-FU, compared with the placebo. The adjusted log-relative risk is estimated to be 0.52 with a 95% confidence interval of (0.06, 0.98). The inference based on the self-designing trial results in a test statistic of 2.2 with a p-value of 0.01. The naive estimate of the log-relative risk between the two treatment arms is 0.71 in the Cox proportional hazards model at the end of the third interim analysis, which is a substantial overestimate compared to the value estimated at the end of the trial (0.49) under the fixed sample design.
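The quantities reported in this example can be put on more familiar scales with a few lines of arithmetic. The following is a minimal sketch and not part of the original analysis. It assumes that the final test statistic is referred to a standard normal distribution for a one-sided test, that the block sizes count events, and that the log-relative risk is expressed as placebo relative to lev + 5-FU. The function and variable names are illustrative.

```python
import math


def one_sided_p_value(z):
    """Upper-tail probability P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def to_hazard_ratio(log_relative_risk):
    """Convert a log-relative risk (log-hazard ratio) to a hazard ratio."""
    return math.exp(log_relative_risk)


# Values quoted in the example.
block_sizes = [50, 20, 20]         # B1 = 50, then blocks of 20 (assumed to count events)
events_at_stop = sum(block_sizes)  # cumulative events at the third look
test_statistic = 2.2
adjusted, ci_low, ci_high = 0.52, 0.06, 0.98
naive_interim, fixed_design = 0.71, 0.49

# A 50% hazard reduction on treatment corresponds to log(2)
# when the ratio is taken as placebo relative to treatment (assumed convention).
design_log_rr = math.log(1.0 / 0.5)

p_value = one_sided_p_value(test_statistic)
summary = {
    "events_at_stop": events_at_stop,
    "one_sided_p": p_value,
    "design_log_rr": design_log_rr,
    "adjusted_hr": to_hazard_ratio(adjusted),
    "adjusted_hr_ci": (to_hazard_ratio(ci_low), to_hazard_ratio(ci_high)),
    "naive_interim_hr": to_hazard_ratio(naive_interim),
    "fixed_design_hr": to_hazard_ratio(fixed_design),
}

for name, value in summary.items():
    print(name, value)
```

Under these assumptions the one-sided p-value is about 0.014, in line with the reported 0.01, and the bias-adjusted estimate (0.52) lies much closer to the fixed-design estimate (0.49) than the naive interim estimate (0.71) does.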
| 5. DISCUSSION |
|---|
|
|
|---|
We have presented a general inference and estimation theory for the self-designing trials with censored survival data under the commonly used semiparametric proportional hazards model. Shen and Cai (2003)
As demonstrated in the literature for other sequential designs and for fixed sample designs (Lagakos and Schoenfeld, 1984; Tsiatis and others, 1985; Jennison and Turnbull, 1997), as well as in our study, the use of a test ignoring nuisance covariates often results in a loss of power when the nuisance covariate effects are strong, even though the fixed sample design is based on a correctly specified treatment effect. The use of covariate-adjusted tests at the interim analyses as well as at the final analysis of the adaptive design leads to a moderate gain in power and some improvement in efficiency for the estimates. The empirical results in our study show that the estimator of a positive treatment effect is asymptotically biased toward zero if the nuisance covariate is omitted from the model in the adaptive design. This is analogous to the findings of Struthers and Kalbfleisch (1986) under the fixed sample design. When the effect of a nuisance covariate is mild or moderate compared to the primary treatment effect, the overall performance of the inference using the Cox model is similar to that of the inference using a logrank-type test (omitting the nuisance covariates) under both the adaptive design and the fixed sample design.
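One way to see why omitting a strong covariate pulls the treatment estimate toward zero is to look at the marginal hazard ratio in a simple exponential model. The sketch below is an illustration of the mechanism only and is not the authors' simulation. It assumes a binary omitted covariate W with prevalence `p_w`, conditional hazard `base_hazard * exp(beta * z + gamma * w)`, and no censoring; all names and parameter values are hypothetical.

```python
import math


def marginal_hazard(t, z, base_hazard, beta, gamma, p_w=0.5):
    """Hazard at time t in arm z after averaging over an omitted binary covariate W."""
    numerator = 0.0
    denominator = 0.0
    for w, prob in ((0, 1.0 - p_w), (1, p_w)):
        rate = base_hazard * math.exp(beta * z + gamma * w)  # conditional exponential hazard
        survival = math.exp(-rate * t)                       # conditional survival at time t
        numerator += prob * rate * survival                  # mixture density
        denominator += prob * survival                       # mixture survival
    return numerator / denominator


def marginal_log_hazard_ratio(t, base_hazard, beta, gamma, p_w=0.5):
    """Log of the marginal hazard ratio (arm 1 vs arm 0) at time t."""
    h1 = marginal_hazard(t, 1, base_hazard, beta, gamma, p_w)
    h0 = marginal_hazard(t, 0, base_hazard, beta, gamma, p_w)
    return math.log(h1 / h0)


# Conditional log-hazard ratio 0.7 and a strong omitted covariate (gamma = 1.0).
for t in (0.0, 1.0, 5.0, 10.0):
    print(t, round(marginal_log_hazard_ratio(t, 0.1, 0.7, 1.0), 3))
```

At time zero the marginal log-hazard ratio equals the conditional value, but for later times it falls below it because high-risk subjects are depleted faster in the arm with the larger hazard. A model that omits W effectively averages these attenuated ratios, which is consistent with the bias toward zero described above. With `gamma = 0` the attenuation disappears.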
The asymptotic properties for the sequential test statistics with the adjustment of nuisance covariates are mainly based on the work of Gu and Ying (1995). The weighted score process for testing the treatment effect while adjusting for other covariates in the Cox model has an asymptotic normal distribution under the null hypothesis. Similarly, the distribution property for the proposed pivot has been derived using the corresponding variance-covariance matrix for all covariates. These basic results lead to the estimation of the treatment effect and its confidence interval at the end of an adaptive trial. The accuracy of the approximation under a variety of scenarios is fairly good.
It is often of interest to estimate the effect of nuisance covariates on the primary outcome at the end of an adaptive trial. Intuitively, the naive estimators of the nuisance covariate effects at the end of the trial should be unbiased when the randomization of patients to treatment arms is independent of the nuisance covariates: the interim decision rules, which induce the bias in the estimation of the treatment effect at the end of the adaptive trial, are devised to monitor the treatment effect only, not the nuisance covariates. The empirical results, as shown in the tables, confirm this intuition. On the other hand, the validity of the estimation procedure for the treatment effect is still maintained when Z and W are not independent, since trial monitoring and subsequent decisions of trial conduct are not based on testing the nuisance covariates in the design. However, the naive estimator of the nuisance covariate effect at the end of the study can be biased in this case.
| APPENDIX A |
|---|
|
|
|---|
As suggested by one reviewer, the asymptotically independent increment structure of $U(t_j, \cdot)$ can be derived if $t_1, \ldots, t_j, \ldots, t_m$ are stopping times.
By definition (e.g. Fleming and Harrington, 1991, p 52), a stopping time $t_j$ for the filtration $\{\mathcal{F}_t : t \ge 0\}$ is a nonnegative random variable with the property that, for each $t \ge 0$, the event $\{t_j \le t\}$ lies in $\mathcal{F}_t$. Suppose that, for the counting processes defined in Section 2.1, $\mathcal{F}_t = \sigma\{X_i(u), \Delta_i(u), Z_i, W_i, 0 \le u \le t, i = 1, \ldots\}$. The filtration $\mathcal{F}_{t_j}$ at time $t_j$ is the $\sigma$-field generated by the cumulated data up to the $j$th look, where $j = 1, 2, \ldots, m$. Intuitively, the time $t_j$ is a stopping time if at any fixed time $t$ one can observe whether or not $B_j$ events have already occurred, where $B_j$ is prefixed (Andersen and others, 1995, p 61). Thus, it suffices to show that $t_m$ is a stopping time, where $m$ is a random integer determined by the data observed up to the $(m-1)$th step.
Proving that $t_m$ is a stopping time is equivalent to showing that $\{t_m \le t\} \in \mathcal{F}_t$ for any $t > 0$. By decomposing the event $\{t_m \le t\}$ according to $\{m = k\}$, we reexpress the event $\{t_m \le t\}$ as
|
| (A.1) |
The last equality holds because $t_1 < t_2 < \cdots < t_{k-1} < t_k < \cdots$. Furthermore,
|
| (A.2) |
where $N_i(u, t) = I(X_i(u) \le t, \Delta_i(u) = 1)$ and $u \le t$. Recall that $\{m = k\} \in \mathcal{F}_{t_{k-1}}$ from Shen and Fisher (1999), which implies
|
| (A.3) |
Combining (A.1) through (A.3) shows that $t_m$ is a stopping time. For any fixed $j$ ($< m$), the result is a special case of the above derivation. Therefore, $t_1, \ldots, t_j, \ldots, t_m$ are stopping times.
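The intuition behind the stopping-time property can be sketched in code: whether the jth look has occurred by calendar time t is decided using only the events observed up to t. This is a minimal illustration under the assumption that look j is triggered when the cumulative event count reaches B1 + ... + Bj. The helper names and the event times are hypothetical, and the data-dependent choice of the number of looks m is not modeled here.

```python
from itertools import accumulate


def look_thresholds(block_sizes):
    """Cumulative event counts that trigger each look."""
    return list(accumulate(block_sizes))


def look_has_occurred(event_times, block_sizes, j, t):
    """Indicator of {t_j <= t}, computed only from events observed by calendar time t."""
    observed = [s for s in event_times if s <= t]  # data available at time t
    return len(observed) >= look_thresholds(block_sizes)[j - 1]


def look_time(event_times, block_sizes, j):
    """Calendar time of the j-th look, or None if too few events ever occur."""
    needed = look_thresholds(block_sizes)[j - 1]
    ordered = sorted(event_times)
    return ordered[needed - 1] if len(ordered) >= needed else None


# Hypothetical calendar times (in years) of 100 events; blocks as in Section 4.
event_times = [k / 10 for k in range(1, 101)]
block_sizes = [50, 20, 20]
third_look = look_time(event_times, block_sizes, 3)
print(third_look, look_has_occurred(event_times, block_sizes, 3, 8.9))
```

The function `look_has_occurred` never inspects events after time t, which mirrors the requirement that the event {t_j ≤ t} be determined by the information available at t.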
| APPENDIX B |
|---|
|
|
|---|
Following the same arguments as in Gu and Ying (1995)
,
defined in Section 2.3 can be decomposed as
|
|
where both U(tj,
0,
0) and H(tj,
0,
0) can be expressed as the sum of i.i.d. martingale integrals with independent increments. Thus,
converges to a Gaussian process with independent increments and variance-covariance function 
(tj) under the null hypothesis. Under the alternatives, n
U(tj,
) behaves like a Gaussian process asymptotically with mean value nj

(tj) and variance 
(tj). To have a more precise mathematical interpretation, n
U(tj,
) can be normalized under a sequence of local alternatives of
n = n 1/2
1, and converges to B(
z(tj)) +
z(tj)
1, where B is the standard Brownian motion process. By the independent increment property of the score process of
,
is asymptotically independent of
for j
k. Consider the normalized score process,
, which, by Slutsky's theorem, can be written as
|
where op(1) is uniform in tj. This, together with the independent increment property of
, leads to the asymptotic independence of
and
(for j
k). Under the contiguous alternatives, the limiting distribution of
is normal with variance 1 and mean
1(
(tj) 
(tj 1))1/2. Hence, the limit of
has a standard normal distribution under the alternatives. Under the null hypothesis, the second term is zero. Under the general alternatives, the normalized process,
, converges weakly to a zero-mean Gaussian process. Thus, the asymptotic normality of Vm(W) follows from an application of the theorem of Shen and Fisher (1999). Since the limiting distribution property of Vm in Section 2.2 is a special case of Vm(W) when
= 0, we do not repeat that proof.
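The limiting statement for the normalized increments can be spelled out in a short calculation. The notation here is placeholder notation introduced for illustration and is not the authors': $v_j$ denotes the value of the limiting variance function at the $j$th look (treated as fixed), $c$ the local-alternative parameter, and $S$ the limit of the normalized score process, taken to be a Brownian motion with linear drift as described above.

```latex
\begin{align*}
S(t_j) &= B(v_j) + c\,v_j, \qquad 0 = v_0 < v_1 < \cdots < v_m,\\
D_j &= S(t_j) - S(t_{j-1})
     = \{B(v_j) - B(v_{j-1})\} + c\,(v_j - v_{j-1})
     \sim N\bigl(c\,(v_j - v_{j-1}),\; v_j - v_{j-1}\bigr),\\
Z_j &= \frac{D_j}{(v_j - v_{j-1})^{1/2}}
     \sim N\bigl(c\,(v_j - v_{j-1})^{1/2},\; 1\bigr),\\
\operatorname{Cov}(Z_j, Z_k) &= 0 \qquad (j \neq k).
\end{align*}
```

The independence of the $Z_j$ follows from the independent increments of Brownian motion over disjoint intervals, and setting $c = 0$ recovers the standard normal limit under the null hypothesis. In the adaptive design the look times are random, so this fixed-$v_j$ calculation is only a heuristic for the argument given in the appendix.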
| ACKNOWLEDGMENTS |
|---|
The authors thank the Associate Editor and a referee for helpful suggestions. This research was partially supported by National Cancer Institute Grant 2R01 CA079466. The research was done while the second author was visiting the Department of Biostatistics and Applied Mathematics, M. D. Anderson Cancer Center. Conflict of Interest: None declared.
| REFERENCES |
|---|
|
|
|---|
Andersen PK, Borgan O, Gill RD, Keiding N. (1995) Statistical Models Based on Counting Processes (Springer, Berlin, Germany).
Bilias Y, Gu M, Ying Z. (1997) Towards a general asymptotic theory for Cox model with staggered entry. The Annals of Statistics 25:662–82.
Cheng Y and Shen Y. (2004) Estimation of a parameter and its exact confidence interval following sequential sample size re-estimation trials. Biometrics 60:910–8.
Cox DR. (1972) Regression models and life-tables (with discussion). Journal of the Royal Statistical Society, Series B 34:187–220.
Emerson SS and Fleming TR. (1990) Parameter estimation following sequential hypothesis testing. Biometrika 77:875–92.
Fisher L. (1998) Self-designing clinical trials. Statistics in Medicine 17:1551–62.
Fleming TR and Harrington DP. (1991) Counting Processes and Survival Analysis (John Wiley & Sons, New York).
Gail MH, Demets DL, Slud EV. (1982) Simulation studies on increments of the two-sample logrank score test for survival time data, with application to group sequential boundaries. In Crowley J and Johnson RA (Eds.). Survival Analysis (Institute of Mathematical Statistics, Hayward, CA).
Gail MH, Wieand S, Piantadosi S. (1984) Biased estimates of treatment effect in randomized experiments with nonlinear regressions and omitted covariates. Biometrika 71:431–44.
Gu M and Ying Z. (1995) Group sequential methods for survival data using partial likelihood score processes with covariate adjustment. Statistica Sinica 5:793–804.
Jennison C and Turnbull BW. (1997) Group-sequential analysis incorporating covariate information. Journal of the American Statistical Association 92:1330–41.
Lagakos SW and Schoenfeld DA. (1984) Properties of proportional-hazards score tests under misspecified regression models. Biometrics 40:1037–48.
Lan KKG and Demets DL. (1983) Discrete sequential boundaries for clinical trials. Biometrika 70:659–63.
Laurie JA, Moertel CG, Fleming TR, Wieand HS, Leigh JE, Rubin J, McCormack GW, Gerstner JB, Krook JE, Malliard J and others. (1989) Surgical adjuvant therapy of large-bowel carcinoma: an evaluation of levamisole and the combination of levamisole and fluorouracil. Journal of Clinical Oncology 7:1447–56.
Moertel CG, Fleming TR, Macdonald JS, Haller DG, Laurie JA, Goodman PJ, Ungerleider JS, Emerson WA, Tormey DC, Glick JH. (1990) Levamisole and fluorouracil for adjuvant therapy of resected colon carcinoma. The New England Journal of Medicine 322:352–8.
O'Brien PC and Fleming TR. (1979) A multiple testing procedure for clinical trials. Biometrics 35:549–56.
Pocock SJ. (1977) Group sequential methods in the design and analysis of clinical trials. Biometrika 64:191–9.
Schäfer H and Müller HH. (2001) Modification of the sample size and the schedule of interim analyses in survival trials based on data inspections. Statistics in Medicine 20:3741–51.
Self SG and Mauritsen RH. (1988) Power/sample size calculations for generalized linear models. Biometrics 44:79–86.
Shen Y and Cai J. (2003) Sample size reestimation for clinical trials with censored survival data. Journal of the American Statistical Association 98:418–26.
Shen Y and Fisher L. (1999) Statistical inference for self-designing clinical trials with a one-sided hypothesis. Biometrics 55:190–7.
Siegmund D. (1978) Estimation following sequential tests. Biometrika 65:341–9.
Slud EV. (1984) Sequential linear rank tests for two-sample censored survival data. The Annals of Statistics 12:551–71.
Struthers CA and Kalbfleisch JD. (1986) Misspecified proportional hazard models. Biometrika 73:363–9.
Todd S and Whitehead J. (1996) Point and interval estimation following a sequential clinical trial. Biometrika 83:453–61.
Tsiatis AA. (1981) The asymptotic joint distribution of the efficient scores test for the proportional hazards model calculated over time. Biometrika 68:311–5.
Tsiatis AA, Rosner GL, Mehta CR. (1984) Exact confidence intervals following a group sequential test. Biometrics 40:797–803.
Tsiatis AA, Rosner GL, Tritchler DL. (1985) Group sequential tests with censored survival data adjusting for covariates. Biometrika 72:365–73.
Yang J, Chen P, Lu K. (2005) Adaptive design for censored survival data adjusting for covariates. International Biometric Society ENAR Program & Abstracts, 2005 (International Biometric Society, Washington, DC).
Received December 5, 2005; revised April 7, 2006; accepted for publication June 16, 2006.