Biostatistics Advance Access originally published online on November 10, 2005
Biostatistics 2006 7(2):252-267; doi:10.1093/biostatistics/kxj005

© The Author 2005. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org.

One- and two-sample nonparametric inference procedures in the presence of a mixture of independent and dependent censoring

Yuhyun Park

Department of Biostatistics, Harvard University, 677 Huntington Avenue, Boston, MA 02115, USA

Lu Tian

Department of Preventive Medicine, Northwestern University, 680 North Lake Shore Drive, Suite 1102, Chicago, IL 60611, USA

L. J. Wei*

Department of Biostatistics, Harvard University, 677 Huntington Avenue, Boston, MA 02115, USA; wei@sdac.harvard.edu

* To whom correspondence should be addressed.


    SUMMARY
In survival analysis, the event time T is often subject to dependent censorship. Without assuming a parametric model between the failure and censoring times, the parameter $\Theta$ of interest, for example, the survival function of T, is generally not identifiable. On the other hand, the collection $\Omega$ of all attainable values for $\Theta$ may be well defined. In this article, we present nonparametric inference procedures for $\Omega$ in the presence of a mixture of dependent and independent censoring variables. By varying the criteria for classifying censoring into the dependent or independent category, our proposals can be quite useful for the so-called sensitivity analysis of censored failure times. The case in which the failure time is subject to possibly dependent interval censorship is also discussed. The new proposals are illustrated with data from two clinical studies on HIV-related diseases.

Keywords: Competing risks; Martingale; Sensitivity analysis; Simultaneous confidence interval; Survival analysis


    1. INTRODUCTION
In survival analysis, the time to the event of interest is often subject to dependent right censorship. For example, in a double-blind clinical trial, AIDS Clinical Trials Group 175 (ACTG 175), conducted by the ACTG, 2467 patients were randomly assigned to one of four daily regimens (Hammer et al., 1996). The primary end point was the time T from randomization to one of the following events: a ≥50% decline in the CD4 cell count, development of AIDS, or death. One thousand nine hundred and two event times were censored. Although the majority of these event times were censored administratively, 663 patients went off treatment without having any of the above clinical events because of, for example, toxicity or a request from the patient or investigator, reasons that are likely related to the primary end point. As indicated by Tsiatis (1975) and Peterson (1976) for the general dependent competing risks problem, applying the standard inference procedures of survival analysis to a study such as ACTG 175 may lead to misleading conclusions, for instance, about the survival function of the primary end point.

With various parametric assumptions on the dependence structure between the event and censoring times, novel inference procedures and sensitivity analyses were proposed, for example, by Fisher and Kanarek (1974), Slud and Rubinstein (1983), Klein and Moeschberger (1988), Klein et al. (1992), Moeschberger and Klein (1995), Zheng and Klein (1995), Lin et al. (1996), DiRienzo and Lagakos (2001), and DiRienzo (2003). When auxiliary variables are available, innovative research has been done, for example, by Robins and Rotnitzky (1992), Robins (1993), Robins and Finkelstein (2000), Satten et al. (2001), and Scharfstein and Robins (2002).

In this article, we consider the case in which the failure time T may be censored by either a dependent or an independent censoring variable, without assuming a parametric or semiparametric dependence structure between the failure and censoring times. Although the parameter $\Theta$ of interest may not be identifiable in this case, the collection $\Omega$ of all possible values of $\Theta$ is often well defined. For example, if $\Theta$ is the survival function of T, $\Omega$ is the collection of nonincreasing functions bounded by the Peterson bounds (Peterson, 1976). For the two-sample problem under the proportional hazards assumption, $\Theta$ is the ratio of the two hazard functions, a scalar parameter, and $\Omega$ is the set of all possible positive values of $\Theta$ under the above nonparametric setting. In this paper, we propose inference procedures for $\Omega$ under various one- and two-sample settings. Specifically, we present a consistent estimate $\hat{\Omega}$ and a $(1-\alpha)$ confidence set $\hat{\Omega}_\alpha$ for $\Omega$ such that $\lim_{n\to\infty}\mathrm{pr}(\Omega \subseteq \hat{\Omega}_\alpha) \ge 1-\alpha$, where $0 < \alpha < 1$. Such confidence ‘interval’ estimation provides more information than ‘point’ estimation alone. Moreover, by varying the criteria for classifying censoring into the dependent or independent category, our proposal can be quite useful for sensitivity analysis of censored failure time observations. The new proposals are illustrated with data from the aforementioned ACTG 175 study. To the best of our knowledge, under the nonparametric setting for the relationship between the survival and censoring variables, no confidence interval estimation procedures are available for the set of all attainable values of the parameter of interest in the presence of a mixture of dependent and independent censoring.

Finally, we discuss the case in which the failure time is subject to dependent interval censorship and present certain one- and two-sample inference procedures. The procedures are illustrated with data from a well-known study of HIV-1 infection incidence among hemophilia patients.


    2. INFERENCES WITH RIGHT-CENSORED OBSERVATIONS

2.1 One-sample problems

2.1.1 Confidence bands for all possible values of the survival function.

Let T be the continuous failure time of interest, D the continuous dependent censoring variable, and C the independent censoring variable. Also, let $\{(T_i, D_i, C_i), i = 1, \ldots, n\}$ be n independent copies of (T, D, C). For the ith subject, one can only observe $(X_i, \eta_i)$, where $X_i = \min(T_i, D_i, C_i)$ and

$$\eta_i = \begin{cases} 1, & \text{if } X_i = T_i,\\ 2, & \text{if } X_i = D_i,\\ 0, & \text{if } X_i = C_i. \end{cases}$$

First, suppose that we are interested in making inferences about the survival function S(t) of T. In the presence of censoring, S(t) generally cannot be estimated well nonparametrically for small or large t; here we let the parameter $\Theta$ be the function $S(\cdot)$ defined on a predetermined, finite interval $\mathcal{L} = [\tau_1, \tau_2]$, where $\tau_1$ and $\tau_2$ are known constants such that $\mathrm{pr}(X \le \tau_1,\ T < D \wedge C) > 0$ and $\mathrm{pr}(X \ge \tau_2) > 0$. Without assuming a parametric dependence structure between T and D, $S(\cdot)$ is not identifiable. On the other hand, the set $\Omega$ of all attainable values of $\{S(t), t \in \mathcal{L}\}$ is the collection of nonincreasing functions bounded below by $S_L(t)$ and above by $S_U(t)$, where $S_L(t) = \mathrm{pr}(T \wedge D \ge t)$ and

$$1 - S_U(t) = \mathrm{pr}(T < t,\ T < D), \tag{2.1}$$

$t \in \mathcal{L}$ (Peterson, 1976). In the competing risks literature, the right-hand side of (2.1) is the so-called cumulative incidence function.
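As a purely illustrative check of these bounds, suppose, hypothetically, that T and D are independent exponential variables with rates $a$ and $b$. Then $S_L(t) = e^{-(a+b)t}$, the cumulative incidence is $\{a/(a+b)\}\{1 - e^{-(a+b)t}\}$, and the true survival function of T is $e^{-at}$. The lower bound corresponds to every dependently censored subject failing at its censoring time, and the upper bound to none of them ever failing, so $S_U(t)$ levels off at $\mathrm{pr}(D < T) = b/(a+b)$. The function name and the rates below are assumptions of this sketch, not part of the method.

```python
import math


def peterson_bounds_indep_exp(t, rate_t, rate_d):
    """Peterson bounds (S_L(t), S_U(t)) for hypothetical independent
    exponential T ~ Exp(rate_t) and D ~ Exp(rate_d)."""
    total = rate_t + rate_d
    s_lower = math.exp(-total * t)                           # pr(T ^ D >= t)
    cum_inc = rate_t / total * (1.0 - math.exp(-total * t))  # pr(T < t, T < D)
    s_upper = 1.0 - cum_inc                                  # from (2.1)
    return s_lower, s_upper


if __name__ == "__main__":
    for t in (0.5, 1.0, 2.0):
        lo, up = peterson_bounds_indep_exp(t, rate_t=0.5, rate_d=1.0)
        true_s = math.exp(-0.5 * t)  # S(t) of T in this particular model
        print(f"t={t}: {lo:.3f} <= {true_s:.3f} <= {up:.3f}")
```

Under dependence between T and D the true S(t) changes, but it always stays between the same two curves, which is what makes $\Omega$ well defined.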

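Note that

$$S_L(t) = \exp\{-\Lambda_T(t) - \Lambda_D(t)\}, \qquad S_U(t) = 1 - \int_0^t S_L(u)\,d\Lambda_T(u), \tag{2.2}$$

where $\Lambda_m(t) = \int_0^t \lambda_m(u)\,du$, $m = T, D$,

$$\lambda_T(t) = \lim_{h \downarrow 0} \frac{1}{h}\,\mathrm{pr}(t \le T \wedge D < t + h,\ T < D \mid T \wedge D \ge t),$$

and

$$\lambda_D(t) = \lim_{h \downarrow 0} \frac{1}{h}\,\mathrm{pr}(t \le T \wedge D < t + h,\ D < T \mid T \wedge D \ge t)$$

are the cause-specific hazard functions with respect to T and D, respectively (Aalen, 1978; Kalbfleisch and Prentice, 2002, p. 251). To obtain a consistent estimate $\hat{\Omega}$ of $\Omega$, one needs to estimate $S_L(t)$ and $S_U(t)$. To this end, let

$$N_{mi}(t) = I\{X_i \le t,\ \eta_i = I(m = T) + 2\,I(m = D)\}, \qquad Y_i(t) = I(X_i \ge t),$$

where $I(\cdot)$ is the indicator function and $m = T, D$. Then the Aalen–Nelson estimator

$$\hat{\Lambda}_m(t) = \int_0^t \frac{\sum_{i=1}^n dN_{mi}(u)}{\sum_{i=1}^n Y_i(u)}$$

is a consistent estimator of $\Lambda_m(t)$, and a consistent estimator $\hat{\Omega}$ of $\Omega$ is the set of nonincreasing functions bounded above by $\hat{S}_U(t)$ and below by $\hat{S}_L(t)$, where $\hat{S}_L(t)$ and $\hat{S}_U(t)$ are obtained by replacing $\Lambda_m(t)$ in (2.2) with $\hat{\Lambda}_m(t)$.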
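The estimators above can be sketched in a few lines of Python, assuming no tied observed times. One deliberate substitution: rather than plugging the Nelson–Aalen estimates into the exponential form of $S_L$, this sketch uses the asymptotically equivalent product-limit form $\hat S_L(t) = \prod_{X_i \le t,\ \eta_i \ne 0}\{1 - 1/Y(X_i)\}$, where $Y(u) = \sum_j Y_j(u)$, which keeps $\hat S_L \le \hat S_U$ in finite samples. $\hat S_U$ is the plug-in of $1 - \int_0^t S_L(u)\,d\Lambda_T(u)$, with $\hat S_L$ taken just before each jump. The function name `peterson_estimates` is hypothetical.

```python
import math


def peterson_estimates(x, eta):
    """Nelson-Aalen cumulative hazards and plug-in Peterson bounds.

    x   : observed times X_i = min(T_i, D_i, C_i), assumed free of ties
    eta : 1 if X_i = T_i, 2 if X_i = D_i, 0 if X_i = C_i
    Returns one tuple (t, Lambda_T_hat, Lambda_D_hat, S_L_hat, S_U_hat) per
    observed time, with each estimate evaluated at t (jumps at t included).
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    cum_t = cum_d = 0.0   # Nelson-Aalen estimates of Lambda_T and Lambda_D
    s_l = 1.0             # product-limit estimate of S_L = pr(T ^ D >= t)
    cif_t = 0.0           # plug-in estimate of 1 - S_U(t) = pr(T < t, T < D)
    rows = []
    for rank, i in enumerate(order):
        at_risk = n - rank            # Y(X_i) = #{j : X_j >= X_i}
        if eta[i] == 1:
            cum_t += 1.0 / at_risk
            cif_t += s_l / at_risk    # S_L just before X_i times dLambda_T
        elif eta[i] == 2:
            cum_d += 1.0 / at_risk
        if eta[i] in (1, 2):
            s_l *= 1.0 - 1.0 / at_risk
        rows.append((x[i], cum_t, cum_d, s_l, 1.0 - cif_t))
    return rows
```

A dependent-censoring event lowers $\hat S_L$ but not $\hat S_U$, while a failure lowers both by the same amount; this is why the gap between the two curves grows only with dependent censoring.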

Note that in the presence of both independent and dependent censoring, it may seem natural to replace each dependently censored observation with a value beyond the largest observed or censored event time in the data set and then to estimate the upper bound $S_U(\cdot)$ of the survival function with the standard Kaplan–Meier (KM) estimator. Unfortunately, this naive estimator may not be consistent for $S_U(\cdot)$, and it generally yields a larger estimate than ours because the independent censoring assumption underlying such a KM estimator is violated.

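Now, to obtain a $(1-\alpha)$ confidence set $\hat{\Omega}_\alpha$ of $\Omega$, one needs the joint distribution of the process $\{\hat{S}_L(s), \hat{S}_U(t)\}$, $s, t \in \mathcal{L}$. To this end, note that

$$n^{1/2}\{\hat{\Lambda}_m(t) - \Lambda_m(t)\} = n^{-1/2}\sum_{i=1}^n \int_0^t \frac{dM_{mi}(u)}{n^{-1}\sum_{j=1}^n Y_j(u)} + o_p(1),$$

where

$$M_{mi}(t) = N_{mi}(t) - \int_0^t Y_i(u)\,d\Lambda_m(u), \qquad m = T, D.$$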

Since $M_{Ti}(\cdot)$ and $M_{Di}(\cdot)$ are orthogonal martingales (Fleming and Harrington, 1991, p. 42), it follows that the processes $n^{1/2}\{\hat{\Lambda}_T(s) - \Lambda_T(s)\}$ and $n^{1/2}\{\hat{\Lambda}_D(t) - \Lambda_D(t)\}$, $s, t \in \mathcal{L}$, converge jointly to a two-dimensional Gaussian process as $n \to \infty$. To relax the constraint that a cumulative hazard function is nonnegative, one usually reparametrizes it through a log-transformation. By the functional $\delta$-method, for large n, the distribution of the process, indexed by (s, t),

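$$\left[n^{1/2}\{\log\hat{\Lambda}_T(s) - \log\Lambda_T(s)\},\ n^{1/2}\{\log\hat{\Lambda}_D(t) - \log\Lambda_D(t)\}\right]$$

can be well approximated by that of the process

$$\left[\frac{n^{-1/2}}{\hat{\Lambda}_T(s)}\sum_{i=1}^n \int_0^s \frac{dM_{Ti}(u)}{n^{-1}\sum_{j=1}^n Y_j(u)},\ \frac{n^{-1/2}}{\hat{\Lambda}_D(t)}\sum_{i=1}^n \int_0^t \frac{dM_{Di}(u)}{n^{-1}\sum_{j=1}^n Y_j(u)}\right]. \tag{2.3}$$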

Generally, the distribution of a function of (2.3) may be rather difficult to obtain analytically. Instead, we may approximate the distribution of (2.3) using a simple perturbation technique proposed by Lin et al. (1993). To this end, let $\{G_1, \ldots, G_n\}$ be a random sample from the standard normal distribution, independent of the data. Consider the process obtained by replacing $M_{mi}(t)$ in (2.3) with $G_i\,I\{X_i \le t,\ \eta_i = I(m = T) + 2\,I(m = D)\}$, $m = T, D$. Then, for large n, conditional on the data, the distribution of the resulting process

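$$\left[\frac{n^{-1/2}}{\hat{\Lambda}_T(s)}\sum_{i=1}^n \int_0^s \frac{G_i\,dN_{Ti}(u)}{n^{-1}\sum_{j=1}^n Y_j(u)},\ \frac{n^{-1/2}}{\hat{\Lambda}_D(t)}\sum_{i=1}^n \int_0^t \frac{G_i\,dN_{Di}(u)}{n^{-1}\sum_{j=1}^n Y_j(u)}\right] \tag{2.4}$$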

gives a good approximation to the unconditional distribution of (2.3). Note that the only random quantities in (2.4) are $G_i$, $i = 1, \ldots, n$. Also note that the two components of (2.4) use the same multipliers $G_i$. Since T and D are assumed to be continuous random variables, no subject contributes an event to both components, so conditional on the data the two components of (2.4) are uncorrelated. To approximate the distribution of (2.4), one may generate a large number, say N, of independent random samples $\{G_i, i = 1, \ldots, n\}$ and obtain N realizations of (2.4).
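A minimal sketch of this multiplier scheme follows. It assumes no ties and assumes (2.4) has the log-scale form in which each component is $n^{1/2}\hat\Lambda_m(t)^{-1}\sum_{X_i \le t} G_i\,\Delta N_{mi}(X_i)/Y(X_i)$, i.e. the integrals reduce to sums over observed jump times. The multipliers are passed in explicitly so that the same vector $G$ feeds both components; because each subject carries a single cause code, no subject appears in both sums. Function names are hypothetical, and the grid must avoid times at which $\hat\Lambda_T$ or $\hat\Lambda_D$ is still zero.

```python
import math
import random


def perturbation_realization(x, eta, grid, g):
    """One realization of the two multiplier-perturbed components on `grid`.

    x, eta : observed times and cause codes (1 = T, 2 = D, 0 = C); ties assumed absent
    grid   : times in L at which both Nelson-Aalen estimates are already positive
    g      : one multiplier per subject, shared by the T and D components
    Returns [component for T, component for D], each a list over `grid`.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    # For each cause: (jump time, Nelson-Aalen increment 1/Y, perturbed increment G_i/Y).
    # A subject has a single cause code, so it never appears in both lists.
    jumps = {1: [], 2: []}
    for rank, i in enumerate(order):
        if eta[i] in jumps:
            at_risk = n - rank                 # Y(X_i) = #{j : X_j >= X_i}
            jumps[eta[i]].append((x[i], 1.0 / at_risk, g[i] / at_risk))
    components = []
    for code in (1, 2):
        values = []
        for t in grid:
            lam_hat = sum(inc for (xi, inc, _) in jumps[code] if xi <= t)
            pert = sum(w for (xi, _, w) in jumps[code] if xi <= t)
            # n^{-1/2} sum_i G_i dN_i / (n^{-1} Y) equals n^{1/2} sum_i G_i dN_i / Y
            values.append(math.sqrt(n) * pert / lam_hat)
        components.append(values)
    return components


def perturbation_realizations(x, eta, grid, n_rep=1000, seed=0):
    """N = n_rep independent realizations, each with a fresh standard normal G vector."""
    rng = random.Random(seed)
    return [perturbation_realization(x, eta, grid, [rng.gauss(0.0, 1.0) for _ in x])
            for _ in range(n_rep)]
```

Setting every $G_i = 1$ makes each component equal $n^{1/2}$ exactly, which is a convenient sanity check on the bookkeeping.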

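For convenience, define two random processes $\Lambda^*_T(\cdot)$ and $\Lambda^*_D(\cdot)$ such that

$$\log\Lambda^*_m(t) = \log\hat{\Lambda}_m(t) + \frac{1}{\hat{\Lambda}_m(t)}\sum_{i=1}^n \int_0^t \frac{G_i\,dN_{mi}(u)}{\sum_{j=1}^n Y_j(u)}, \qquad m = T, D.$$

It follows that the distribution of the process $[n^{1/2}\{\hat{S}_L(s) - S_L(s)\},\ n^{1/2}\{\hat{S}_U(t) - S_U(t)\}]$ is asymptotically Gaussian and that it can be approximated by the conditional distribution of the process $[n^{1/2}\{S^*_L(s) - \hat{S}_L(s)\},\ n^{1/2}\{S^*_U(t) - \hat{S}_U(t)\}]$, where $S^*_L(t)$ and $S^*_U(t)$ are obtained by replacing $\Lambda_m(t)$ in (2.2) with $\Lambda^*_m(t)$, $m = T, D$. Let $\sigma_L(s)$ and $\sigma_U(t)$ be the estimated standard errors of $\hat{S}_L(s)$ and $\hat{S}_U(t)$, respectively. These two standard errors can be obtained via the sample variances of the above N realizations of $\{S^*_L(s), S^*_U(t)\}$.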

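A $(1-\alpha)$ confidence set $\hat{\Omega}_\alpha$ for $\Omega$ is the collection of nonincreasing functions $S(\cdot)$ that satisfy

$$\hat{S}_L(t) - c\,\sigma_L(t) \le S(t) \le \hat{S}_U(t) + c\,\sigma_U(t), \tag{2.5}$$

where $t \in \mathcal{L}$ and c is chosen to satisfy

$$\mathrm{pr}\left[\sup_{t \in \mathcal{L}} \frac{S^*_L(t) - \hat{S}_L(t)}{\sigma_L(t)} \le c,\ \sup_{t \in \mathcal{L}} \frac{\hat{S}_U(t) - S^*_U(t)}{\sigma_U(t)} \le c\right] = 1 - \alpha. \tag{2.6}$$

Note that the probability measure in (2.6) is generated by $\{G_i, i = 1, \ldots, n\}$, conditional on the data.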
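Given N perturbation realizations of $S^*_L$ and $S^*_U$ on a grid over $\mathcal L$, the standard errors, the cutoff c, and the band limits can be computed as sketched below. The sketch assumes the one-sided form of (2.6), in which c makes both $\sup_t\{S^*_L(t)-\hat S_L(t)\}/\sigma_L(t)$ and $\sup_t\{\hat S_U(t)-S^*_U(t)\}/\sigma_U(t)$ fall below c with probability $1-\alpha$; c is then the empirical $(1-\alpha)$ quantile of the larger of the two suprema. Clipping the band to $[0, 1]$ is an added convenience, and the function name is hypothetical.

```python
import math
import statistics


def band_from_realizations(s_l_hat, s_u_hat, star_l, star_u, alpha=0.05):
    """Standard errors, cutoff c and band limits from N perturbation realizations.

    s_l_hat, s_u_hat : estimated bounds S_L_hat(t), S_U_hat(t) on a grid over L
    star_l, star_u   : N realizations (lists) of S*_L(t) and S*_U(t) on the same grid
    Returns (c, sigma_l, sigma_u, band_lower, band_upper).
    """
    m = len(s_l_hat)
    # sigma_L(t), sigma_U(t): sample standard deviations across the N realizations
    sigma_l = [statistics.stdev([r[j] for r in star_l]) for j in range(m)]
    sigma_u = [statistics.stdev([r[j] for r in star_u]) for j in range(m)]
    # Per realization: the larger of the two one-sided sup statistics
    stats = []
    for rl, ru in zip(star_l, star_u):
        sup_l = max((rl[j] - s_l_hat[j]) / sigma_l[j] for j in range(m))
        sup_u = max((s_u_hat[j] - ru[j]) / sigma_u[j] for j in range(m))
        stats.append(max(sup_l, sup_u))
    stats.sort()
    # Empirical (1 - alpha) quantile: smallest statistic with at least (1 - alpha) N at or below it
    k = max(0, math.ceil((1.0 - alpha) * len(stats) - 1e-9) - 1)
    c = stats[k]
    band_lower = [max(0.0, s - c * sd) for s, sd in zip(s_l_hat, sigma_l)]
    band_upper = [min(1.0, s + c * sd) for s, sd in zip(s_u_hat, sigma_u)]
    return c, sigma_l, sigma_u, band_lower, band_upper
```

Taking the maximum of the two suprema before the quantile is what makes the band simultaneous for both boundaries, rather than calibrating each one separately.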

Now, we use the data from the ACTG 175 study to illustrate the above inference procedures for the survival function. Although there were four treatment groups in the study, for illustration we only compare AZT (zidovudine) monotherapy with the other three treatments combined. Six hundred and nineteen of the 2467 patients were randomly assigned to AZT monotherapy. There were 423 and 1479 censored failure times in the AZT and combined groups, respectively. In Table 1, we list the reasons for censoring. Here, for illustration, we let D be the dependent censoring time at which the patient went off treatment without reaching the primary clinical event because of toxicity or the request of the investigator or patient. There were 157 and 506 such dependently censored observations in the AZT and combined groups, respectively. For each group, $\tau_1$ and $\tau_2$ are chosen to be approximately equal to the lower and upper fifth percentiles of the observed failure times, respectively: $\tau_1 = 140$ and $\tau_2 = 950$ (days) for the AZT group, and $\tau_1 = 170$ and $\tau_2 = 995$ for the combined group. The standard error estimates $\sigma_L(t)$ and $\sigma_U(t)$ and the cutoff point c are obtained from N = 1000 realizations of $S^*_L(\cdot)$ and $S^*_U(\cdot)$. In Figures 1(a) and (b), the collection of nonincreasing functions whose upper and lower bounds are denoted by the solid lines is the point estimate $\hat{\Omega}$, and the region bounded by the dotted lines is the 0.95 confidence set $\hat{\Omega}_\alpha$. These figures are quite informative. For example, on Day 700, the estimated survival probability for the combined treatment group lies between 0.82 and 0.87, with a confidence band of (0.79, 0.89). For the monotherapy group, the survival probability lies approximately between 0.71 and 0.76, with a confidence band of (0.65, 0.81).


Table 1. Numbers of patients off treatment without primary clinical efficacy events

 

Fig. 1. (a) and (b): $\hat{\Omega}$ and 0.95 confidence set $\hat{\Omega}_\alpha$ for possible values of $S(\cdot)$; (c) and (d): $\hat{\Omega}$ and 0.95 confidence set $\hat{\Omega}_\alpha$ for possible values of the pth quantiles, with data from ACTG 175 (solid lines are the boundaries for $\hat{\Omega}$; dotted lines are the boundaries for $\hat{\Omega}_\alpha$).

 
Note that $\mathcal{L} = [\tau_1, \tau_2]$ may be chosen based on clinical interest. For the present data set, the choice of this interval does not seem critical with respect to the cutoff value c in (2.6). For example, if we let $\tau_1$ and $\tau_2$ be the lower and upper 1st, 5th, 10th, and 20th percentiles of the observed failure times for the AZT group, with N = 1000, the corresponding cutoff points c are 2.9, 2.7, 2.7, and 2.6, respectively.

2.1.2 Confidence bands for all possible values of the quantile function.

Suppose that we are interested in making inferences about the quantile process of the survival function, for example, the median or the upper and lower quartiles of T. To this end, let $t_p$ be the pth quantile of the survival function $S(\cdot)$, that is, $1 - S(t_p) = p$. Here, the parameter $\Theta$ is the function $t_p$ of $p \in \mathcal{M} = [p_1, p_2]$, a predetermined interval over which the quantiles below are well defined. Let $t^l_p$ and $t^u_p$ be the ‘pth’ quantiles of $S_L(\cdot)$ and $S_U(\cdot)$, respectively, that is, $1 - S_L(t^l_p) = p$ and $1 - S_U(t^u_p) = p$. Since $S_L \le S_U$, the set $\Omega$ of all possible values of $\Theta$ consists of nondecreasing functions $t_p$, $p \in \mathcal{M}$, bounded below by $t^l_p$ and above by $t^u_p$. A consistent estimator $\hat{\Omega}$ can be obtained easily via estimators $\hat{t}^l_p$ and $\hat{t}^u_p$ of $t^l_p$ and $t^u_p$, found by solving the equations $1 - \hat{S}_L(\hat{t}^l_p) = p$ and $1 - \hat{S}_U(\hat{t}^u_p) = p$.
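Because $\hat S_L$ and $\hat S_U$ are step functions, the equations $1 - \hat S(t) = p$ generally have no exact root. A common convention, assumed in this sketch rather than taken from the text, is the generalized inverse $\hat t_p = \inf\{t : 1 - \hat S(t) \ge p\}$. The helper names below are hypothetical.

```python
def step_quantile(times, surv, p):
    """inf{t : 1 - S(t) >= p} for a right-continuous step function S.

    times : increasing jump times; surv[j] is S(t) for times[j] <= t < times[j + 1]
    Returns None when 1 - S never reaches p on the observed range.
    """
    for t, s in zip(times, surv):
        if 1.0 - s >= p:
            return t
    return None


def quantile_bounds(times, s_l_hat, s_u_hat, p):
    """(t^l_p, t^u_p) estimates; since S_L_hat <= S_U_hat, t^l_p <= t^u_p when both exist."""
    return step_quantile(times, s_l_hat, p), step_quantile(times, s_u_hat, p)
```

A `None` for the upper quantile signals that $1 - \hat S_U$ never reaches p, which is one practical reason for restricting p to a predetermined interval $\mathcal M$.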

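One may use the aforementioned perturbation technique to obtain a $(1-\alpha)$ confidence set $\hat{\Omega}_\alpha$ for $\Omega$. Since the processes $n^{1/2}\{\hat{S}_L(t) - S_L(t)\}$ and $n^{1/2}\{\hat{S}_U(t) - S_U(t)\}$ are tight, it follows that the asymptotic distribution of the process $[n^{1/2}\{\hat{S}_L(t^l_p) - S_L(t^l_p)\},\ n^{1/2}\{\hat{S}_U(t^u_r) - S_U(t^u_r)\}]$, indexed by (p, r), is the same as the conditional distribution of $[n^{1/2}\{S^*_L(\hat{t}^l_p) - \hat{S}_L(\hat{t}^l_p)\},\ n^{1/2}\{S^*_U(\hat{t}^u_r) - \hat{S}_U(\hat{t}^u_r)\}]$. Conditional on the data, let $t^{l*}_p$ and $t^{u*}_r$ be the random variables such that

$$1 - S^*_L(t^{l*}_p) = p, \qquad 1 - S^*_U(t^{u*}_r) = r.$$

Then, using the results of Goldwasser et al. (2004), for large n, the distribution of the process $[n^{1/2}\{\log\hat{t}^l_p - \log t^l_p\},\ n^{1/2}\{\log\hat{t}^u_r - \log t^u_r\}]$, indexed by (p, r), can be approximated well by that of the process $[n^{1/2}\{\log t^{l*}_p - \log\hat{t}^l_p\},\ n^{1/2}\{\log t^{u*}_r - \log\hat{t}^u_r\}]$, where $p, r \in \mathcal{M}$. Let $\varphi_{lp}$ and $\varphi_{up}$ be the estimated standard errors of $\log\hat{t}^l_p$ and $\log\hat{t}^u_p$, respectively. Then $\hat{\Omega}_\alpha$ consists of all nondecreasing functions $t_p$ such that

$$\hat{t}^l_p\exp(-c\,\varphi_{lp}) \le t_p \le \hat{t}^u_p\exp(c\,\varphi_{up}), \tag{2.7}$$

$p \in \mathcal{M}$. Here, c is chosen to satisfy

$$\mathrm{pr}\left[\sup_{p \in \mathcal{M}} \frac{\log t^{l*}_p - \log\hat{t}^l_p}{\varphi_{lp}} \le c,\ \sup_{p \in \mathcal{M}} \frac{\log\hat{t}^u_p - \log t^{u*}_p}{\varphi_{up}} \le c\right] = 1 - \alpha. \tag{2.8}$$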

Again, we use the data from ACTG 175 to illustrate the above procedure. In Figures 1(c) and (d), we present the point estimates $\hat{\Omega}$ and the 0.95 interval estimates $\hat{\Omega}_\alpha$ for the corresponding pth quantiles based on (2.7) and (2.8), with $\mathcal{M} = [0.04, 0.32]$ for the AZT group and $\mathcal{M} = [0.03, 0.21]$ for the combined group. Here, $\hat{\Omega}$ is the region bounded by the solid lines and $\hat{\Omega}_\alpha$ is bounded by the dotted lines. For example, with p = 0.15, the estimated 15th percentile for the combined group lies between 607 and 786, with a confidence band of (493, 967). For the AZT group, the estimated 15th percentile lies between 395 and 476, with a band of (268, 696).

2.2 Two-sample problems

In this section, we present nonparametric and semiparametric inference procedures for various parameters that quantify the relative merit of two independent groups of failure times in the presence of dependent censoring. To this end, all the theoretical and empirical quantities of Section 2.1 are subindexed by their group membership k, k = 1, 2. For example, the data now consist of $\{(X_{ki}, \eta_{ki}), i = 1, \ldots, n_k;\ k = 1, 2\}$.

2.2.1 Confidence bands for all possible differences of two survival functions.

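Suppose that we are interested in $\Theta = \{S_2(t) - S_1(t), t \in \mathcal{L}\}$, the difference of the two underlying survival functions, where $\mathcal{L}$ is a predetermined interval $[\tau_1, \tau_2]$ such that $\mathrm{pr}(X_{k1} \le \tau_1,\ T_{k1} < D_{k1} \wedge C_{k1}) > 0$ and $\mathrm{pr}(X_{k1} \ge \tau_2) > 0$, k = 1, 2. Note that $\Omega$ consists of functions $S_2(\cdot) - S_1(\cdot)$ that satisfy

$$S_{2L}(t) - S_{1U}(t) \le S_2(t) - S_1(t) \le S_{2U}(t) - S_{1L}(t), \tag{2.9}$$

$t \in \mathcal{L}$. A consistent estimator $\hat{\Omega}$ of $\Omega$ can be obtained by replacing $S_{kL}(t)$ and $S_{kU}(t)$ in the lower and upper bounds of (2.9) with their empirical counterparts, k = 1, 2. A $(1-\alpha)$ confidence set $\hat{\Omega}_\alpha$ is the collection of functions of t that belong to the intervals

$$\left[\hat{S}_{2L}(t) - \hat{S}_{1U}(t) - c\{\xi^2_{2L}(t) + \xi^2_{1U}(t)\}^{1/2},\ \hat{S}_{2U}(t) - \hat{S}_{1L}(t) + c\{\xi^2_{2U}(t) + \xi^2_{1L}(t)\}^{1/2}\right],$$

where $\xi_{kL}(t)$ and $\xi_{kU}(t)$ are the estimated standard errors of $S^*_{kL}(t)$ and $S^*_{kU}(t)$, k = 1, 2, and c is chosen such that

$$\mathrm{pr}\left[\sup_{t \in \mathcal{L}} \frac{\{S^*_{2L}(t) - S^*_{1U}(t)\} - \{\hat{S}_{2L}(t) - \hat{S}_{1U}(t)\}}{\{\xi^2_{2L}(t) + \xi^2_{1U}(t)\}^{1/2}} \le c,\ \sup_{t \in \mathcal{L}} \frac{\{\hat{S}_{2U}(t) - \hat{S}_{1L}(t)\} - \{S^*_{2U}(t) - S^*_{1L}(t)\}}{\{\xi^2_{2U}(t) + \xi^2_{1L}(t)\}^{1/2}} \le c\right] = 1 - \alpha.$$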

Again, an approximation to the above cutoff point c can be obtained via the perturbation technique discussed in Section 2.1.
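The two-sample bounds reduce to simple arithmetic on the one-sample estimates. The sketch below computes the estimated bounds of (2.9) and band limits for given standard errors and cutoff c; it assumes, because the groups are independent, that the half-widths combine the two groups' standard errors in quadrature, e.g. $c\{\xi_{2L}^2(t) + \xi_{1U}^2(t)\}^{1/2}$ for the lower limit. The function name is hypothetical.

```python
import math


def difference_band(s1l, s1u, s2l, s2u, xi1l, xi1u, xi2l, xi2u, c):
    """Estimated bounds (2.9) and band limits for S_2(t) - S_1(t) on a common grid.

    s*: estimated survival bounds per group; xi*: their standard errors; c: cutoff.
    Returns (est_lower, est_upper, band_lower, band_upper).
    """
    est_lower = [b2 - a1 for b2, a1 in zip(s2l, s1u)]      # S_2L - S_1U
    est_upper = [b2 - a1 for b2, a1 in zip(s2u, s1l)]      # S_2U - S_1L
    # Independent groups: standard errors combined in quadrature (an assumption here)
    band_lower = [e - c * math.hypot(u, v) for e, u, v in zip(est_lower, xi2l, xi1u)]
    band_upper = [e + c * math.hypot(u, v) for e, u, v in zip(est_upper, xi2u, xi1l)]
    return est_lower, est_upper, band_lower, band_upper
```

Note the crossing of indices: the lower bound for the difference pairs group 2's lower bound with group 1's upper bound, and vice versa.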

We use the data from ACTG 175 to illustrate the above proposal. To this end, we let $S_2(t)$ and $S_1(t)$ be the survival functions for the combined and AZT groups, respectively. First, we assume that the dependent censoring is due to toxicity or the request from the patient or investigator. In Figure 2(a), we present a point estimate $\hat{\Omega}$ and a 0.95 interval estimate $\hat{\Omega}_\alpha$ for $\Omega$ with $\mathcal{L} = [170, 950]$, based on N = 1000 sets of realizations of $\{S^*_{kL}(\cdot), S^*_{kU}(\cdot), k = 1, 2\}$; $\hat{\Omega}$ is composed of the functions bounded by the solid lines, and $\hat{\Omega}_\alpha$ is the set of functions bounded by the dotted lines. For example, on Day 700, the estimated set of all possible values of the difference between the two survival probabilities is (0.03, 0.19), with a 0.95 confidence band of (−0.03, 0.24). In Figure 2(b), we present a similar plot but assume that the dependent censoring event is due only to the request from the patient or investigator. There are 90 and 290 such events in the AZT and combined groups, respectively. In this case, on Day 700, the point estimate for the difference between the two groups is (0.06, 0.16), with a confidence band of (0.004, 0.22). Lastly, we assume that all the censoring variables are independent of T, and in Figure 2(c) we provide the KM estimate, denoted by the solid line, and a 0.95 confidence set $\hat{\Omega}_\alpha$ whose boundaries are the dotted lines. In this case, on Day 700, the point estimate for the difference between the two groups is 0.09, with a confidence band of (0.04, 0.13). The plots in Figure 2 provide valuable information on the sensitivity of the conclusions to the censoring assumptions.


Fig. 2. $\hat{\Omega}$ and 0.95 confidence set $\hat{\Omega}_\alpha$ for possible values of $S_2(\cdot) - S_1(\cdot)$ with data from ACTG 175 under various independent censoring assumptions (solid lines are the boundaries for $\hat{\Omega}$; dotted lines are the boundaries for $\hat{\Omega}_\alpha$).

 
2.2.2 Interval estimation for possible values of the proportionality parameter $\Theta$ under the proportional hazards model.

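Suppose that there exists an unknown constant $\Theta$ such that

$$S_2(t) = \{S_1(t)\}^{\exp(\Theta)}, \qquad t > 0, \tag{2.10}$$

a two-sample proportional hazards model. We are interested in making inferences about $\Theta$. Note that for $t \in \mathcal{L}$,

$$\Theta_L(t) = \log\{-\log S_{2U}(t)\} - \log\{-\log S_{1L}(t)\} \le \Theta \le \log\{-\log S_{2L}(t)\} - \log\{-\log S_{1U}(t)\} = \Theta_U(t).$$

Let $\Theta_L = \sup_{t \in \mathcal{L}} \Theta_L(t)$ and $\Theta_U = \inf_{t \in \mathcal{L}} \Theta_U(t)$. It is not difficult to show that any member of the interval $\Omega = [\Theta_L, \Theta_U]$ is an attainable value for $\Theta$ in Model (2.10). Let $\hat{\Theta}_L(t)$ and $\hat{\Theta}_U(t)$ be the estimators obtained by replacing S(t) with $\hat{S}(t)$ in $\Theta_L(t)$ and $\Theta_U(t)$, respectively. Similarly, $\Theta^*_L(t)$ and $\Theta^*_U(t)$ are obtained with S(t) replaced by $S^*(t)$. A consistent point estimator for $\Omega$ is $\hat{\Omega} = [\hat{\Theta}_L, \hat{\Theta}_U]$, where $\hat{\Theta}_L = \sup_{t \in \mathcal{L}} \hat{\Theta}_L(t)$ and $\hat{\Theta}_U = \inf_{t \in \mathcal{L}} \hat{\Theta}_U(t)$.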
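Under a proportional hazards model written as $S_2(t) = \{S_1(t)\}^{\exp(\Theta)}$, one has $\Theta = \log\{-\log S_2(t)\} - \log\{-\log S_1(t)\}$ at every t, so bounds on $S_1$ and $S_2$ translate into bounds on $\Theta$ through the complementary log-log transform. The sketch below assumes that parametrization and survival values strictly inside (0, 1); the function names are hypothetical.

```python
import math


def cloglog(s):
    """Complementary log-log transform log{-log s}, for 0 < s < 1."""
    return math.log(-math.log(s))


def ph_bounds(s1l, s1u, s2l, s2u):
    """Theta_L(t), Theta_U(t) under S_2 = S_1^{exp(Theta)}, plus [sup Theta_L, inf Theta_U].

    Inputs: estimated survival bounds for groups 1 and 2 on a common grid in L.
    If the returned lower value exceeds the upper one, the estimated set is empty.
    """
    theta_l = [cloglog(a) - cloglog(b) for a, b in zip(s2u, s1l)]
    theta_u = [cloglog(a) - cloglog(b) for a, b in zip(s2l, s1u)]
    return theta_l, theta_u, max(theta_l), min(theta_u)
```

When the lower and upper survival bounds coincide, both curves collapse to the same constant $\Theta$, which makes a simple correctness check.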

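Unfortunately, it is rather difficult, if not impossible, to obtain the joint distribution of $\hat{\Theta}_L$ and $\hat{\Theta}_U$ analytically or numerically, which would be needed to derive a $(1-\alpha)$ confidence set $\hat{\Omega}_\alpha$ directly. Instead, consider the following class of interval estimates for $\Omega$, indexed by time $t \in \mathcal{L}$:

$$[\hat{\Theta}_L(t) - c\,\psi_L(t),\ \hat{\Theta}_U(t) + c\,\psi_U(t)], \tag{2.11}$$

where $\psi_L(t)$ and $\psi_U(t)$ are the estimated standard errors of $\hat{\Theta}_L(t)$ and $\hat{\Theta}_U(t)$. Note that for any ‘predetermined’ $t \in \mathcal{L}$, an interval (2.11) with $c \approx 1.96$ is a valid 0.95 confidence set for $\Omega$, because $\Theta_L(t) \le \Theta_L$ and $\Theta_U(t) \ge \Theta_U$. However, such an interval for $\Omega$ can be quite wide. To obtain a robust interval estimate, we first choose the cutoff point c in (2.11) such that

$$\mathrm{pr}\left[\sup_{t \in \mathcal{L}} \frac{\Theta^*_L(t) - \hat{\Theta}_L(t)}{\psi_L(t)} \le c,\ \sup_{t \in \mathcal{L}} \frac{\hat{\Theta}_U(t) - \Theta^*_U(t)}{\psi_U(t)} \le c\right] = 1 - \alpha. \tag{2.12}$$

With this threshold, which is larger than 1.96, the set of intervals (2.11) is a $(1-\alpha)$ simultaneous confidence band for $\Omega$ across $t \in \mathcal{L}$. Thus, the ‘narrowest’ interval obtainable from this band is a valid $(1-\alpha)$ confidence set for $\Omega$. For example, a possible choice for $\hat{\Omega}_\alpha$ is the interval

$$\left[\sup_{t \in \mathcal{L}}\{\hat{\Theta}_L(t) - c\,\psi_L(t)\},\ \inf_{t \in \mathcal{L}}\{\hat{\Theta}_U(t) + c\,\psi_U(t)\}\right]. \tag{2.13}$$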
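Because the band holds simultaneously over $\mathcal L$, its tightest valid interval is the intersection of all its members: the largest lower limit and the smallest upper limit, possibly attained at different times. The sketch below assumes that form of (2.13) and also reports the attaining grid points; the function name is hypothetical.

```python
def narrowest_interval(grid, theta_l_hat, theta_u_hat, psi_l, psi_u, c):
    """Intersection of the band intervals, with the grid points attaining its limits.

    Returns ((lower, upper), t maximizing the lower limit, t minimizing the upper limit).
    """
    lower = [tl - c * sl for tl, sl in zip(theta_l_hat, psi_l)]
    upper = [tu + c * su for tu, su in zip(theta_u_hat, psi_u)]
    i_lo = max(range(len(grid)), key=lambda j: lower[j])
    i_up = min(range(len(grid)), key=lambda j: upper[j])
    return (lower[i_lo], upper[i_up]), grid[i_lo], grid[i_up]
```

Evaluating on a fine grid over $\mathcal L$ approximates the supremum and infimum; a coarse grid can only make the interval wider, never invalid.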

Now, we use the data set from ACTG 175 to illustrate procedure (2.13). First, assume that the dependent censoring event is due to toxicity or the request from the patient or investigator. In this case, with $\mathcal{L} = [170, 950]$, the cutoff point c based on (2.12) is 2.48. In Figure 3, we present a 0.95 simultaneous confidence band (2.11) with c = 2.48. The minimizer of $\hat{\Theta}_U(t) + c\,\psi_U(t)$ is t = 844, and the maximizer of $\hat{\Theta}_L(t) - c\,\psi_L(t)$ is t = 812. It follows from (2.13) that a 0.95 confidence interval for $\Theta$ is (−1.21, 0.03), indicating that, even without assuming a parametric model between the failure and dependent censoring times, patients in the combined group were doing better than those in the AZT group. It is interesting to note that for any predetermined $t \in [600, 800]$, the corresponding pointwise 0.95 confidence interval $[\hat{\Theta}_L(t) - 1.96\,\psi_L(t),\ \hat{\Theta}_U(t) + 1.96\,\psi_U(t)]$ is almost identical to our interval (−1.21, 0.03). On the other hand, if one chooses t < 500, the resulting interval for $\Theta$ is quite wide. For example, when t = 200, the pointwise interval is (−2.21, 0.50), which is much wider than ours. Lastly, if one assumes that all censoring is independent of T, the maximum partial likelihood estimate of $\Theta$ is −0.60 and the corresponding 0.95 interval for $\Theta = \Omega$ is (−0.78, −0.43).


Fig. 3. 0.95 simultaneous confidence band for $[\Theta_L(t), \Theta_U(t)]$ under the proportional hazards model with data from ACTG 175.

 
2.2.3 Interval estimation for possible values of the scale-change parameter under the accelerated failure time model.

Now, suppose that there exists an unknown $\Theta$ such that $S_2(e^{\Theta}t) = S_1(t)$, t > 0, the so-called two-sample accelerated failure time model (Kalbfleisch and Prentice, 2002, pp. 217–246). We are interested in making inferences about $\Theta$. Note that

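$$\Theta_L(p) = \log t^l_{2p} - \log t^u_{1p} \le \Theta \le \log t^u_{2p} - \log t^l_{1p} = \Theta_U(p)$$

for $p \in \mathcal{M} = [p_1, p_2]$, where $t^l_{kp}$ and $t^u_{kp}$ are the lower and upper boundaries of the set of all possible values of the pth quantile of $S_k(\cdot)$, k = 1, 2. Let $\Theta_L = \sup_{p \in \mathcal{M}} \Theta_L(p)$ and $\Theta_U = \inf_{p \in \mathcal{M}} \Theta_U(p)$. Then $\Omega = [\Theta_L, \Theta_U]$. Let $\hat{\Theta}_L(p) = \log\hat{t}^l_{2p} - \log\hat{t}^u_{1p}$ and $\hat{\Theta}_U(p) = \log\hat{t}^u_{2p} - \log\hat{t}^l_{1p}$. Also, let $\Theta^*_L(p) = \log t^{l*}_{2p} - \log t^{u*}_{1p}$ and $\Theta^*_U(p) = \log t^{u*}_{2p} - \log t^{l*}_{1p}$. The point estimate is $\hat{\Omega} = [\hat{\Theta}_L, \hat{\Theta}_U]$, where $\hat{\Theta}_L = \sup_{p \in \mathcal{M}} \hat{\Theta}_L(p)$ and $\hat{\Theta}_U = \inf_{p \in \mathcal{M}} \hat{\Theta}_U(p)$ are the empirical counterparts of $\Theta_L$ and $\Theta_U$, respectively. Moreover, it follows from an argument similar to that in Section 2.1 that the distribution of $\{\hat{\Theta}_L(p) - \Theta_L(p),\ \hat{\Theta}_U(r) - \Theta_U(r)\}$ can be approximated well by the conditional distribution of $\{\Theta^*_L(p) - \hat{\Theta}_L(p),\ \Theta^*_U(r) - \hat{\Theta}_U(r)\}$, where $p, r \in \mathcal{M}$. As for the proportional hazards model discussed above, a $(1-\alpha)$ confidence interval $\hat{\Omega}_\alpha$ of $\Omega$ is

$$\left[\sup_{p \in \mathcal{M}}\{\hat{\Theta}_L(p) - c\,\psi_L(p)\},\ \inf_{p \in \mathcal{M}}\{\hat{\Theta}_U(p) + c\,\psi_U(p)\}\right], \tag{2.14}$$

where $\psi_L(p)$ and $\psi_U(p)$ are the estimated standard errors of $\hat{\Theta}_L(p)$ and $\hat{\Theta}_U(p)$, and the cutoff point c is chosen such that

$$\mathrm{pr}\left[\sup_{p \in \mathcal{M}} \frac{\Theta^*_L(p) - \hat{\Theta}_L(p)}{\psi_L(p)} \le c,\ \sup_{p \in \mathcal{M}} \frac{\hat{\Theta}_U(p) - \Theta^*_U(p)}{\psi_U(p)} \le c\right] = 1 - \alpha.$$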
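Under $S_2(e^{\Theta}t) = S_1(t)$, the pth quantiles satisfy $t_{2p} = e^{\Theta}t_{1p}$, so $\Theta = \log t_{2p} - \log t_{1p}$ for every p, and quantile bounds give bounds on $\Theta$ on the log scale. The sketch below assumes positive quantile bounds on a common grid of p values in $\mathcal M$; the function name is hypothetical.

```python
import math


def aft_bounds(t1l, t1u, t2l, t2u):
    """Theta_L(p), Theta_U(p) under S_2(e^Theta t) = S_1(t), plus [sup, inf] over the p-grid.

    Inputs: lower/upper p-th quantile bounds for groups 1 and 2 on a common grid of p.
    """
    theta_l = [math.log(a) - math.log(b) for a, b in zip(t2l, t1u)]
    theta_u = [math.log(a) - math.log(b) for a, b in zip(t2u, t1l)]
    return theta_l, theta_u, max(theta_l), min(theta_u)
```

The structure mirrors the proportional hazards case, with the log-quantile scale playing the role of the complementary log-log survival scale.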

For the ACTG 175 study, we considered the case in which the dependent censoring was due to toxicity or the request from the patient or investigator. With $\mathcal{M} = [0.04, 0.21]$, a 0.95 confidence interval (2.14) for $\Omega$ is (−0.20, 1.07).


    3. INFERENCES WITH INTERVAL-CENSORED DATA
Suppose that for each $T_i$, one cannot observe $T_i$ directly but only an interval $(E_{Li}, E_{Ui})$ that contains $T_i$, i = 1, ..., n. When $E_{Li}$ and $E_{Ui}$ are ‘independent’ of $T_i$, nonparametric estimation procedures for S(t) have been proposed, for example, by Peto (1973), Turnbull (1976), and Gentleman and Geyer (1994). Regression methods have been studied, for example, by Bacchetti (1990), Rabinowitz et al. (1995), Rosenberg (1995), Huang (1996, 1999), Huang and Wellner (1997), Kooperberg and Clarkson (1997), Joly et al. (1998), Betensky et al. (2001, 2002), and Cai and Betensky (2003).

Unlike the case of dependent right censorship, for interval-censored data, even if one can identify which interval censorings are informative and which are not, it is not clear how to use this valuable information to obtain ‘sharp’ theoretical bounds, such as the Peterson bounds, for S(t). Here, we propose inference procedures that are valid even when all interval censorings are informative. To this end, let $S_L(t)$ and $S_U(t)$ be the survival functions of $E_{Li}$ and $E_{Ui}$, respectively. The parameter $\Theta$ is $\{S(t), t \in \mathcal{L}\}$, where $\mathcal{L}$ is the predetermined interval $[\tau_1, \tau_2]$ such that $\mathrm{pr}(E_{Ui} \le \tau_1) > 0$ and $\mathrm{pr}(E_{Li} \ge \tau_2) > 0$. Since $E_{Li} \le T_i \le E_{Ui}$, $\Omega$ consists of the nonincreasing functions S(t) such that $S_L(t) \le S(t) \le S_U(t)$, $t \in \mathcal{L}$. The functions $S_L(t)$ and $S_U(t)$ can be estimated well by the empirical survival functions $\hat{S}_L(t) = n^{-1}\sum_{i=1}^n I(E_{Li} \ge t)$ and $\hat{S}_U(t) = n^{-1}\sum_{i=1}^n I(E_{Ui} \ge t)$, and a consistent estimator $\hat{\Omega}$ of $\Omega$ can be obtained accordingly.
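The interval-censored bounds need only the empirical survival functions of the two endpoints. The sketch below assumes each observation is a pair $(E_L, E_U)$ with $E_L \le E_U$ and, as an illustrative convention not stated in the text, encodes a subject not yet infected at the last visit with $E_U$ set to `float('inf')`. The function name is hypothetical.

```python
import bisect


def interval_censored_bounds(intervals, grid):
    """Empirical survival functions of the left and right endpoints E_L, E_U on `grid`.

    intervals : list of (E_L, E_U) pairs with E_L <= E_U (E_U may be float('inf'))
    Returns (S_L_hat, S_U_hat) lists, where S_hat(t) = #{E >= t} / n.
    """
    n = len(intervals)
    left = sorted(e_l for e_l, _ in intervals)
    right = sorted(e_u for _, e_u in intervals)
    # bisect_left counts the endpoints strictly below t, so n minus it counts E >= t
    s_l = [(n - bisect.bisect_left(left, t)) / n for t in grid]
    s_u = [(n - bisect.bisect_left(right, t)) / n for t in grid]
    return s_l, s_u
```

Sorting once and using binary search keeps each evaluation at $O(\log n)$, which matters only when the grid is fine.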

To obtain a (1 – {alpha}) confidence set Formula 2 of {Omega}, note that for large n, the distribution of the process

Formula 2

can be approximated well by the conditional distribution of

Formula 3(3.15)

where Formula 3 Now, let SL*(s) and SU*(t) be the random processes such that

Formula 3

A (1 – {alpha}) confidence set of {Omega} is exactly like (2.5), where {sigma}L(t) and {sigma}U(t) are the estimated standard errors for Formula 3 and Formula 3 via (3.1), and the cutoff point c is obtained via (2.6) with the current Formula 3 and Formula 3
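The perturbed curves for interval-censored data are sketched below under the assumption that (3.1) uses the multiplier form $S^*(t) = \hat S(t) + n^{-1}\sum_i G_i\{I(E_i \ge t) - \hat S(t)\}$, applied separately to the left and right endpoints with the same $G_i$. The resulting realizations can then be fed to the band calibration of (2.5) and (2.6). Function names are hypothetical.

```python
import random


def perturbed_ic_curves(intervals, grid, g):
    """S*_L(t) and S*_U(t) on `grid` for multipliers g (one per subject).

    S*(t) = S_hat(t) + n^{-1} sum_i g_i {I(E_i >= t) - S_hat(t)}, applied to the left
    endpoints E_Li (for S*_L) and the right endpoints E_Ui (for S*_U).
    """
    n = len(intervals)
    lefts = [e_l for e_l, _ in intervals]
    rights = [e_u for _, e_u in intervals]

    def perturb(ends, t):
        s_hat = sum(e >= t for e in ends) / n            # empirical survival at t
        return s_hat + sum(gi * ((e >= t) - s_hat) for gi, e in zip(g, ends)) / n

    return [perturb(lefts, t) for t in grid], [perturb(rights, t) for t in grid]


def draw_realizations(intervals, grid, n_rep=1000, seed=0):
    """N = n_rep realizations with fresh standard normal multipliers each time."""
    rng = random.Random(seed)
    return [perturbed_ic_curves(intervals, grid, [rng.gauss(0.0, 1.0) for _ in intervals])
            for _ in range(n_rep)]
```

With all multipliers equal to one, the centred terms sum to zero and the perturbed curves reproduce the empirical ones, which is a quick check that the centring is right.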

Now, for comparing two independent groups of failure times $\{T_{ki}, i = 1, \ldots, n_k;\ k = 1, 2\}$ with interval-censored data $\{(E_{kLi}, E_{kUi})\}$, let us assume that the two failure times follow a proportional hazards model with hazard ratio $e^{\Theta}$. Using the arguments of (2.11)–(2.13) with the current estimates $\hat{S}_{kL}(\cdot)$, $\hat{S}_{kU}(\cdot)$ and their perturbed counterparts $S^*_{kL}(\cdot)$, $S^*_{kU}(\cdot)$, k = 1, 2, one can obtain a $(1-\alpha)$ confidence interval (2.13) for $\Theta$.

We use the so-called ‘five-center cohort’ data set from a well-known multicenter study of HIV-1 infection incidence among hemophilia patients to illustrate the above interval estimation procedures for $\Omega$ (Kroner et al., 1994; Betensky et al., 2002). During the 1980s, persons with hemophilia had a high risk of HIV infection because of their need for infusions of factor VIII or factor IX concentrate, products manufactured from donor plasma. In this five-center cohort, patients were enrolled without regard to their HIV antibody status. For each patient, repeated serum samples were taken between early 1978 and early 1987, and HIV seroconverters were individuals with both a last negative and a first positive serum sample. Thus, each infected subject was assigned a ‘window’ of time in which he or she seroconverted. It is not clear from the literature whether the sampling times for each patient were independent of the underlying T. In Figure 4, the solid lines are the upper and lower boundaries of the point estimate $\hat{\Omega}$, and the 0.95 interval estimate $\hat{\Omega}_\alpha$ for the collection of possible $S(\cdot)$ is the region bounded by the dashed lines. Here, we let $\tau_1 = 1000$ and $\tau_2 = 5000$ (days). Note that the dotted line in the center is the estimated $S(\cdot)$ under the assumption of independent interval censoring.


Fig. 4. $\hat{\Omega}$ and 0.95 confidence set $\hat{\Omega}_\alpha$ for possible values of $S(\cdot)$ with data from the hemophilia study (solid lines are the boundaries for $\hat{\Omega}$; dashed lines are the boundaries for $\hat{\Omega}_\alpha$; the dotted line is the estimated $S(\cdot)$ under the assumption of independent interval censoring).

 
One of the goals of the study was to examine whether the patient's average annual dose of non-heat-treated factor VIII concentrate used from 1978 (or birth) to 1984 was related to the time of seroconversion. In all analyses of this study in the literature, the dose level was classified as high (>20 000 U), low (1–20 000 U), or none. Let us assume that the failure times $T_{2i}$ for the high-dose group and $T_{1i}$ for the group not using factor VIII concentrate have a proportional hazards structure with proportionality parameter $\Theta$. First, we obtain the two bounds corresponding to (2.11) for 2500 ≤ t ≤ 4500 and c = 2.67. Then, it follows from (2.13) that a 0.95 confidence interval $\hat{\Omega}_\alpha$ is (2.1, 3.6), indicating that the high-dose group of patients tended to have a much higher HIV incidence rate than the group of patients who did not use this particular concentrate.

Under the assumption of independent interval censoring, the estimated log hazard ratio via nonparametric maximum likelihood estimation for the proportional hazards model (Huang and Wellner, 1997) is 3.1, with a 0.95 confidence interval of (2.7, 3.5), which is not markedly different from ours.


    4. REMARKS
In this article, the approach we took to handling dependent censoring is quite different from those in the literature. In most clinical studies with an event time as the end point, the censoring variable is a mixture of dependent and independent censoring times. Almost all existing methods assume parametric or semiparametric dependence structures between the failure and the mixed censoring times. As one of the referees kindly pointed out, in practice it seems rather difficult to quantify such a dependence relationship in order to implement the resulting inference procedures. Our proposal does not require a parametric model, but it does require information about the causes of censoring.

Under the current setting, a sensitivity analysis consists of various subanalyses corresponding to different classifications of censoring as either dependent or independent. For example, in Figure 2, we present results from three distinct censoring classifications for ACTG 175. In Figure 2(a), inferences about the difference between the two survival functions were made assuming that censoring due to toxicity or to the patient's or primary physician's request for withdrawal was dependent. The most common reason for such a request was that the patient did not respond favorably to the assigned treatment with respect to certain efficacy-related markers. This type of dropout was likely related to the time to the event of interest. The relationship between the occurrence of toxicity and the event time is less clear. One could argue that the treatment might be so ‘potent’ that it caused toxicity. In that case, a patient who developed toxicity might have much lower HIV-RNA values (viral load) during the study and, consequently, a longer expected time to event than other patients. On the other hand, toxicity could be an indicator of generally poor health, and therefore of a shorter expected time to event. Of course, it is also possible that toxicity is not correlated with the primary end point at all. In Figure 2(b), we present the results obtained under the assumption that toxicity leads to independent censoring. The confidence band for the difference of the two survival functions is slightly tighter in this case than under classification (a). This suggests that misclassifying toxicity as a cause of independent or dependent censoring has no significant impact on the conclusion about the treatment difference. Lastly, in Figure 2(c), we present the results obtained under the assumption that all censoring is independent of the underlying event times. This routine analysis likely exaggerates the treatment difference.
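The Python sketch below is meant only to make the mechanics of such a sensitivity analysis concrete, for a single group rather than a two-group difference and without confidence bands. It assumes Peterson-type bounds adapted to the mixture setting: writing, for this sketch only, C for dependent and D for independent censoring, and assuming D is independent of (T, C), $S_T(t) = P(T > t)$ is bounded below by $P(\min(T, C) > t)$, estimated by Kaplan–Meier with D as censoring, and above by $1 - P(T \le t, T \le C)$, estimated by an Aalen–Johansen cumulative incidence. This is an illustrative construction, not necessarily the estimator used in the paper; the function `bounds`, the reason labels, and the toy records are all hypothetical.

```python
"""Peterson-type bounds on S_T(t) = P(T > t) under several censoring classifications.

Sketch only: dependent censoring C is left unmodelled, while independent
censoring D is assumed independent of (T, C) and handled by
Kaplan-Meier / Aalen-Johansen weighting.
"""
from collections import defaultdict

EVENT = "event"  # hypothetical label for an observed failure time


def bounds(records, dependent_reasons, t):
    """Return (lower, upper) estimated bounds on S_T(t).

    records: list of (time, reason) pairs; reason is EVENT or a censoring cause.
    dependent_reasons: censoring causes treated as dependent (C); others act as D.
    """
    # counts[s] = [independent censorings, failures, dependent censorings] at time s
    counts = defaultdict(lambda: [0, 0, 0])
    for time, reason in records:
        if reason == EVENT:
            counts[time][1] += 1
        elif reason in dependent_reasons:
            counts[time][2] += 1
        else:
            counts[time][0] += 1

    at_risk = len(records)
    surv = 1.0  # Kaplan-Meier estimate of P(min(T, C) > s), with D as censoring
    cif = 0.0   # Aalen-Johansen estimate of P(T <= s, T <= C)
    for s in sorted(counts):
        if s > t:
            break
        n_indep, n_fail, n_dep = counts[s]
        cif += surv * n_fail / at_risk            # surv is still the left limit S(s-)
        surv *= 1.0 - (n_fail + n_dep) / at_risk
        at_risk -= n_indep + n_fail + n_dep       # subjects censored at s were at risk at s
    return surv, 1.0 - cif


# Hypothetical reason labels mirroring the three classifications of Figure 2.
SCENARIOS = {
    "(a) toxicity and withdrawal requests dependent": {"toxicity", "request"},
    "(b) only withdrawal requests dependent": {"request"},
    "(c) all censoring independent": set(),
}

# Made-up toy records for illustration only (not ACTG 175 data).
TOY = [(2, EVENT), (3, "toxicity"), (4, "admin"), (5, EVENT),
       (6, "request"), (7, EVENT), (8, "admin"), (9, EVENT)]

if __name__ == "__main__":
    for name, dependent in SCENARIOS.items():
        lo, hi = bounds(TOY, dependent, t=7)
        print(f"{name}: [{lo:.3f}, {hi:.3f}]")
```

In the toy data the intervals narrow as causes move into the independent category, and with no dependent causes the two bounds coincide with the ordinary Kaplan–Meier estimate.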

As suggested by an associate editor and the editors, a sensitivity analysis under our setting should consist of three components: (a) providing rationales for classifying each source of censoring as either dependent or independent, (b) specifying any uncertainty involved in this classification, and (c) clearly describing how the sensitivity analyses are performed. Furthermore, we strongly encourage investigators of clinical studies to carefully document each patient's reasons for going off treatment or off study, so that rational and informative sensitivity analyses can be performed at interim looks as well as at the end of the study.

We have compared our proposal with a typical parametric method from the literature. Specifically, we applied the novel procedure for the one-sample problem proposed by Slud and Rubinstein (1983) to analyze the data set from ACTG 175. Slud & Rubinstein introduced a function $\phi(t)$, which reflects the relationship between $T$ and the censoring variable $C^*$, where

Formula 3

and under the present setting $C^* = \min(C, D)$. For each $\phi(t)$, the survival function of $T$ is identifiable and can be estimated in the presence of the dependent censoring variable $C^*$. They suggested specifying two functions $\phi_1(t)$ and $\phi_2(t)$ to obtain two bounds on the nonidentifiable survival function. Note that when $\phi_1(t) = 0$ and $\phi_2(t) = \infty$, the resulting survival functions correspond to the lower and upper Peterson bounds with the dependent censoring variable $C^*$. For the comparison with our method, we assumed that dependent censoring was due to toxicity or to a withdrawal request from the patient or investigator. In Figure 5, for various $\phi_1$ and $\phi_2$, we present the estimated Slud–Rubinstein bounds (dashed lines) and our point estimates of the bounds (solid lines). Their bounds are markedly narrower than ours in Figure 5(a), but much wider in Figure 5(d). Since Slud & Rubinstein used $\phi(t)$ to model the dependence between the failure time and a ‘mixture’ of the dependent and independent censoring times, it is not clear which choices of $\phi(t)$ would reproduce our estimated upper and lower bounds of all possible values of the underlying survival function. A natural generalization of this type of parametric method would be to model the relationship, say via $\phi(t)$, only between the dependent censoring and failure times, and to derive inference procedures in the presence of an additional independent censoring variable.
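The Peterson (1976) bounds mentioned above follow from two elementary set inclusions. Writing $X = \min(T, C^*)$, $\delta = I(T \le C^*)$ and $S_T(t) = P(T > t)$ (notation introduced here for this derivation), with only $(X, \delta)$ observed:

$$
\begin{aligned}
\{X > t\} \subseteq \{T > t\} &\;\Longrightarrow\; P(X > t) \le S_T(t), \\
\{X \le t,\ \delta = 1\} = \{T \le \min(t, C^*)\} \subseteq \{T \le t\} &\;\Longrightarrow\; S_T(t) \le 1 - P(X \le t,\ \delta = 1).
\end{aligned}
$$

The lower bound is the survival function of the observed time $X$, and the upper bound is one minus the cumulative incidence of observed failures; both depend only on the distribution of $(X, \delta)$, so they can be estimated without any assumption about the dependence between $T$ and $C^*$.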


Fig. 5. Comparisons between the Slud & Rubinstein bounds (dashed lines) and our estimated bounds (solid lines) for various $\phi_1$ and $\phi_2$ with data from the AZT group of ACTG 175.

 
Extending our proposals to general regression problems appears quite challenging, because it is difficult to identify the possible values of the regression parameters under dependent censorship in a nonparametric setting.


    ACKNOWLEDGMENTS
 
The authors are very grateful to two referees, an associate editor, and the editors for insightful comments on the paper. This research was partially supported by grants from the US National Institutes of Health.


    REFERENCES

    AALEN, O. (1978). Nonparametric estimation of partial transition probabilities in multiple decrement models. Annals of Statistics 6, 534–545.

    BACCHETTI, P. (1990). Estimating the incubation period of AIDS by comparing population infection and diagnosis patterns. Journal of the American Statistical Association 85, 1002–1008.

    BETENSKY, R. A., LINDSEY, J. C., RYAN, L. M. AND WAND, M. P. (2002). A local likelihood proportional hazards model for interval censored data. Statistics in Medicine 21, 263–275.

    BETENSKY, R. A., RABINOWITZ, D. AND TSIATIS, A. A. (2001). Computationally simple accelerated failure time regression for interval censored data. Biometrika 88, 703–711.

    CAI, T. AND BETENSKY, R. A. (2003). Hazard regression for interval censored data with penalized spline. Biometrics 59, 570–579.

    DIRIENZO, A. G. (2003). Nonparametric comparison of two survival-time distributions in the presence of dependent censoring. Biometrics 59, 497–504.

    DIRIENZO, A. G. AND LAGAKOS, S. W. (2001). Bias correction for score tests arising from misspecified proportional hazards regression models. Biometrika 88, 421–434.

    FISHER, L. AND KANAREK, P. (1974). Presenting censored survival data when censoring and survival times may not be independent. In Proschan, F. and Serfling, R. (eds), Reliability and Biometry: Statistical Analysis of Lifelength. Philadelphia, PA: SIAM, pp. 303–326.

    FLEMING, T. R. AND HARRINGTON, D. P. (1991). Counting Processes and Survival Analysis. New York: Wiley.

    GENTLEMAN, R. AND GEYER, C. J. (1994). Maximum likelihood for interval censored data: consistency and computation. Biometrika 81, 618–623.

    GOLDWASSER, M. A., TIAN, L. AND WEI, L. J. (2004). Statistical inference for infinite dimensional parameters via asymptotically pivotal estimating functions. Biometrika 91, 81–94.

    HAMMER, S. M., KATZENSTEIN, D. A., HUGHES, M. D., GUNDACKER, H., SCHOOLEY, R. T., HAUBRICH, R. H., HENRY, W. K., LEDERMAN, M. M., PHAIR, J. P., NIU, M. et al. (1996). A trial comparing nucleoside monotherapy with combination therapy in HIV-infected adults with CD4 cell counts from 200 to 500 per cubic millimeter. New England Journal of Medicine 335, 1081–1090.

    HUANG, J. (1996). Efficient estimation for the proportional hazards model with interval censoring. Annals of Statistics 24, 540–568.

    HUANG, J. (1999). Asymptotic properties of nonparametric estimation based on partly interval-censored data. Statistica Sinica 9, 501–519.

    HUANG, J. AND WELLNER, J. A. (1997). Interval censored survival data: a review of recent progress. Proceedings of the First Seattle Symposium in Biostatistics: Survival Analysis. New York: Springer.

    JOLY, P., COMMENGES, D. AND LETENNEUR, L. (1998). A penalized likelihood approach for arbitrarily censored and truncated data: application to age-specific incidence of dementia. Biometrics 54, 185–194.

    KALBFLEISCH, J. D. AND PRENTICE, R. L. (2002). The Statistical Analysis of Failure Time Data, 2nd edition. New York: Wiley.

    KLEIN, J. P. AND MOESCHBERGER, M. L. (1988). Bounds on net survival probabilities for dependent competing risks. Biometrics 44, 529–538.

    KLEIN, J. P., MOESCHBERGER, M. L., LI, Y. H. AND WANG, S. T. (1992). Estimating random effects in the Framingham heart study (with discussion). In Klein, J. and Goel, P. (eds), Survival Analysis: State of the Art. Dordrecht, The Netherlands: Kluwer, pp. 99–120.

    KOOPERBERG, C. AND CLARKSON, D. B. (1997). Hazard regression with interval-censored data. Biometrics 53, 1485–1494.

    KRONER, B. L., ROSENBERG, P. S., ALEDORT, L. M., ALVORD, W. G. AND GOEDERT, J. J. (1994). HIV-1 infection incidence among persons with hemophilia in the United States and Western Europe, 1978–1990. Journal of Acquired Immune Deficiency Syndromes 7, 279–286.

    LIN, D. Y., ROBINS, J. M. AND WEI, L. J. (1996). Comparing two failure time distributions in the presence of dependent competing risks. Biometrika 83, 381–393.

    LIN, D. Y., WEI, L. J. AND YING, Z. (1993). Checking the Cox model with cumulative sums of martingale-based residuals. Biometrika 80, 557–572.

    MOESCHBERGER, M. L. AND KLEIN, J. P. (1995). Statistical methods for dependent competing risks. Lifetime Data Analysis 1, 195–204.

    PETERSON, A. V. (1976). Bounds for a joint distribution function with fixed sub-distribution functions: application to competing risks. Proceedings of the National Academy of Sciences of the United States of America 73, 11–13.

    PETO, R. (1973). Experimental survival curves for interval-censored data. Applied Statistics 22, 86–91.

    RABINOWITZ, D., TSIATIS, A. A. AND ARAGON, J. (1995). Regression with interval-censored data. Biometrika 82, 501–513.

    ROBINS, J. M. (1993). Information recovery and bias adjustment in proportional hazards regression analysis of randomized trials using surrogate markers. In Proceedings of the Biopharmaceutical Section, American Statistical Association, pp. 24–33.

    ROBINS, J. M. AND FINKELSTEIN, D. H. (2000). Correcting for non-compliance and dependent censoring in an AIDS clinical trial with inverse probability of censoring weighted (IPCW) log-rank test. Biometrics 56, 779–788.

    ROBINS, J. M. AND ROTNITZKY, A. (1992). Recovery of information and adjustment for dependent censoring using surrogate markers. In Jewell, N. and Dietz, K. (eds), AIDS Epidemiology: Methodological Issues. Boston, MA: Birkhauser, pp. 297–331.

    ROSENBERG, P. S. (1995). Hazard function estimation using B-splines. Biometrics 51, 874–887.

    SATTEN, G. A., DATTA, S. AND ROBINS, J. M. (2001). An estimator for the survival function when data are subject to dependent censoring. Statistics and Probability Letters 54, 397–403.

    SCHARFSTEIN, D. O. AND ROBINS, J. M. (2002). Estimation of the failure time distribution in the presence of informative censoring. Biometrika 89, 617–634.

    SLUD, E. V. AND RUBINSTEIN, L. V. (1983). Dependent competing risks and summary survival curves. Biometrika 70, 643–649.

    TSIATIS, A. A. (1975). A nonidentifiability aspect of the problem of competing risk. Proceedings of the National Academy of Sciences of the United States of America 72, 20–22.

    TURNBULL, B. W. (1976). The empirical distribution function with arbitrarily grouped, censored and truncated data. Journal of the Royal Statistical Society Series B: Statistical Methodology 38, 290–295.

    ZHENG, M. AND KLEIN, J. P. (1995). Estimates of marginal survival for dependent competing risks based on an assumed copula. Biometrika 82, 127–138.

    Received April 13, 2004; revised August 8, 2005; revised October 24, 2005; accepted for publication October 26, 2005.



    This article has been cited by other articles:


    J. Zhang and D. F. Heitjan
    Impact of nonignorable coarsening on Bayesian inference
    Biostat., October 1, 2007; 8(4): 722 - 743.


    S. W. Lagakos
    Time-to-Event Analyses for Long-Term Treatments -- The APPROVe Trial
    N. Engl. J. Med., July 13, 2006; 355(2): 113 - 117.

