Skip Navigation


Biostatistics Advance Access originally published online on May 2, 2006
Biostatistics 2007 8(2):197-211; doi:10.1093/biostatistics/kxl001
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow All Versions of this Article:
8/2/197    most recent
kxl001v2
kxl001v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Right arrow Disclaimer
Google Scholar
Right arrow Articles by Song, X.
Right arrow Articles by Zhou, X.-H.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Song, X.
Right arrow Articles by Zhou, X.-H.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2006. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org.

A semiparametric approach for the nonparametric transformation survival model with multiple covariates

Xiao Song* and Shuangge Ma

Department of Biostatistics, University of Washington, Box 357232, 1705 Northeast Pacific Street, Seattle, WA 98195, USA songx{at}u.washington.edu

Jian Huang

Department of Statistics and Actuarial Science and Program in Public Health Genetics, University of Iowa, 241 Schaeffer Hall, Iowa City, IA 52242, USA

Xiao-Hua Zhou

Department of Biostatistics, University of Washington, Box 357232, 1705 Northeast Pacific Street, Seattle, WA 98195, USA and Puget Sound Health Care System, 1660 South Columbian Way, Seattle, WA 98018, USA

* To whom correspondence should be addressed.


    SUMMARY
 TOP
 SUMMARY
 1. INTRODUCTION
 2. MODEL DEFINITION
 3. ESTIMATION
 4. ASYMPTOTIC PROPERTIES
 5. SIMULATION STUDIES
 6. VETERANS ADMINISTRATION LUNG...
 7. DISCUSSION
 A. APPENDIX
 REFERENCES
 
The nonparametric transformation model makes no parametric assumptions on the forms of the transformation function and the error distribution. This model is appealing in its flexibility for modeling censored survival data. Current approaches for estimation of the regression parameters involve maximizing discontinuous objective functions, which are numerically infeasible to implement with multiple covariates. Based on the partial rank (PR) estimator (Khan and Tamer, 2004), we propose a smoothed PR estimator which maximizes a smooth approximation of the PR objective function. The estimator is shown to be asymptotically equivalent to the PR estimator but is much easier to compute when there are multiple covariates. We further propose using the weighted bootstrap, which is more stable than the usual sandwich technique with smoothing parameters, for estimating the standard error. The estimator is evaluated via simulation studies and illustrated with the Veterans Administration lung cancer data set.

Keywords: Nonparametric transformation model; Partial rank estimator; Survival analysis; Weighted bootstrap


    1. INTRODUCTION
 TOP
 SUMMARY
 1. INTRODUCTION
 2. MODEL DEFINITION
 3. ESTIMATION
 4. ASYMPTOTIC PROPERTIES
 5. SIMULATION STUDIES
 6. VETERANS ADMINISTRATION LUNG...
 7. DISCUSSION
 A. APPENDIX
 REFERENCES
 
Semiparametric models are widely used to characterize the relationship between survival time and covariates. Popular semiparametric models include the proportional hazards model (Cox, 1972Go), the additive hazards model (Aalen, 1980Go), and the accelerated failure time model (Kalbfleisch and Prentice, 2002Go). Although these semiparametric models are more flexible than parametric models, they may still be restrictive in practice. Our study is partly motivated by examples like the Veterans Administration lung cancer data set described in Kalbfleisch and Prentice (2002Go, pp. 71–2), which includes measurements from a clinical trial of 137 patients with advanced inoperable lung cancer. The patients were randomized to either a standard or a test chemotherapy. The primary end point for therapy comparison was time to death. There were 128 events. Possible heterogeneity may exist for some baseline covariates, such as disease extent and pathology, previous treatment of the disease, demographic background, and initial health status. It is a common practice to adjust for such covariates in assessing the treatment effect on time to death. Therneau and Grambsch (2000Go, Chapter 6.3) assessed the proportional hazards assumption for this data set and indicated that the proportionality might not hold. A more flexible model is needed to account for such non-proportionality.

One feasible alternative is the transformation model, which postulates that an unknown monotone increasing transformation of the survival time depends on the covariates through a linear model. Cheng and others (1995)Go, Fine and others (1998)Go, and Chen and others (2002)Go studied this model with a known error distribution and an unspecified transformation function, and Cai and others (2005)Go explored the situation of a specific parametric transformation (Box–Cox transformation) with no parametric assumptions on the error; the former includes the proportional hazards model as a special case and the latter includes the accelerated failure time model as a special case. A more flexible form is the nonparametric transformation model that makes no parametric assumptions on either the transformation function or the error. This model includes the aforementioned models as special cases and is especially preferred when, for example, the Cox model cannot be properly justified.

Several approaches have been proposed for estimating the regression parameters in the nonparametric transformation model with uncensored outcomes, including the maximum rank correlation estimator (Han, 1987Go) and the monotone rank estimator (Cavanagh and Sherman, 1998Go). Both estimators can be extended to the case of censored survival outcomes using the inverse censoring probability weighting technique (Khan and Tamer, 2004Go). However, this requires that the censoring time is independent of the survival time and covariates, and that the support of censoring time contains that of the survival time. Such restrictions may be unrealistic even for randomized clinical trials. Recently, Khan and Tamer (2004)Go proposed an appealing partial rank (PR) approach that relaxes these assumptions. This approach allows the censoring time to depend on the covariates as long as it is conditionally independent of the survival time given the covariates, and there is no restriction on the support of the censoring time. However, like the maximum rank correlation and monotone rank estimators, the PR estimator is based on maximization of a discontinuous objective function. It is practically impossible to compute this estimator when there are multiple covariates.

In this paper, based on the PR estimator of Khan and Tamer (2004)Go, we propose a new estimator, called the smoothed partial rank (SPR) estimator, for estimating the regression parameters in the nonparametric transformation model. The proposed estimator maximizes a SPR objective function. Smoothing makes it feasible to adapt the rank-based approach to data with multiple covariates, without loss of asymptotic efficiency. We further propose using the weighted bootstrap for computation of the variance by analogy to that introduced in Jin and others (2001)Go.

Similar approximation with a rank estimator has been investigated in Ma and Huang (2005)Go in a binary classification study, which focuses on variable selection with microarray data. Compared to Ma and Huang (2005)Go, our study makes the following new contributions. First, although the approximation idea has been proposed in other areas, it has not been fully explored in survival analysis—theoretical proof and finite sample performance are not known. Second, the asymptotic properties of such approximated estimators have not been established in previous studies. Moreover, the inference based on the weighted bootstrap has not been investigated.

The rest of this paper is organized as follows: We give the model definition in Section 2. The estimator is derived in Section 3. We show the asymptotic properties and propose the weighted bootstrap method in Section 4. The finite sample properties of the estimator are assessed by simulation studies in Section 5. We apply the approach to the Veterans Administration lung cancer data in Section 6. The paper concludes with a discussion in Section 7.


    2. MODEL DEFINITION
 TOP
 SUMMARY
 1. INTRODUCTION
 2. MODEL DEFINITION
 3. ESTIMATION
 4. ASYMPTOTIC PROPERTIES
 5. SIMULATION STUDIES
 6. VETERANS ADMINISTRATION LUNG...
 7. DISCUSSION
 A. APPENDIX
 REFERENCES
 
Let T denote the survival time, C denote the censoring time, and Z be a length d vector of covariates. Under right censoring, the observed survival data are V = min(T,C) and {Delta} = I(T ≤ C). Assume that the survival time depends on the covariate through the nonparametric transformation model,


Formula (2.1)

where g(·) is an unspecified monotone increasing function, e is the random error term with an unknown distribution independent of Z, ß is a length d vector of regression coefficients, and ß' denotes the transpose of ß. This model is very flexible and includes many of the popular models as special cases. For example, the proportional hazards model and the proportional odds model are two special cases of (2.1) with e following the standard extreme value and logistic distributions, respectively; the accelerated failure time model is a special case of (2.1) with g(·) = log(·). However, the additive hazards model (Aalen, 1980Go) is not a transformation model.

The nonparametric transformation model is location and scale invariant. To avoid identifiability problem and without loss of generality, the first element of ß is restricted to 1, that is, ß = (1,{theta}')'. This indicates a positive correlation between the outcome and the first covariate (the covariate can be recoded if necessary to achieve this). Our interest focuses on estimation and inference of {theta}.


    3. ESTIMATION
 TOP
 SUMMARY
 1. INTRODUCTION
 2. MODEL DEFINITION
 3. ESTIMATION
 4. ASYMPTOTIC PROPERTIES
 5. SIMULATION STUDIES
 6. VETERANS ADMINISTRATION LUNG...
 7. DISCUSSION
 A. APPENDIX
 REFERENCES
 
Suppose that the observed data (Vi,{Delta}i,Zi), i = 1,...,n, are independent and identically distributed as (V,{Delta},Z). The PR approach (Khan and Tamer, 2004Go) is based on the PRs of the observed survival times, the event indicators, and the linear predictors. Specifically, the PR estimator has the form


Formula (3.1)

where I(·) is the indicator function and Formula. The objective function Formula can be considered as a generalization of Kendall's {tau}-correlation statistic to censored data, which is a U-statistic of order 2. Loosely speaking Formula seeks to maximize the Kendall's {tau}-correlation between the survival time and a linear combination of covariates, with a proper consideration of the censoring. Under some regularity conditions similar to those given in the Appendix, Khan and Tamer (2004)Go proved that the PR estimator is Formula consistent and asymptotically normal, using the standard U-statistic theory developed in Han (1987)Go and Sherman (1993)Go. However, we note that the objective function Formula can be viewed as a weighted sum of indicator functions and hence discontinuous. Maximization is extremely difficult in practice, if not impossible, when there are multiple covariates. When d ≥ 2, a brutal search is usually needed to obtain the estimator, which is very time consuming. Although the Nelder–Mead method (Nelder and Mead, 1965Go) may be an alternative faster optimization method, it may even fail to reach the local maxima, especially when d is relatively large.

To tackle this difficulty, we propose to use a continuous differentiable function to approximate the indicator function containing ß in (3.1). Then the objective function will be a smooth function of ß. Specifically, we propose using the sigmoid function s(u) = 1/{1 + exp( – u)}. For large |u|, s(u) is a good approximation to I(u > 0). However, for u close to 0, this approximation is not accurate enough and thus may lead to unsatisfactory estimator of ß. An effective way motivated by the sieve estimate to improve accuracy is to introduce a sequence of strictly positive and decreasing numbers {sigma}n satisfying limn->{infty}{sigma}n = 0, and use the family of functions sn(u) = s(u/{sigma}n) to approximate I(u > 0) in (3.1). The SPR estimator Formula is given by


Formula (3.2)

Here Formula. Following the same arguments as in Khan and Tamer (2004)Go, we can show that On(ß) is also a U-statistic of order 2.

The objective function On is continuously differentiable. So commonly used algorithms, for example, the Newton–Raphson algorithm or the gradient search method, can be used to obtain Formula. Compared to brutal search as needed for maximizing Formula, those algorithms are fast and relatively insensitive to the number of covariates.

A similar approach was proposed by Horowitz (1992)Go in the context of maximum score estimator for the binary response model. Horowitz considered a general class of distribution-like kernel functions for approximation of I(u > 0). Let K(u) be a differentiable distribution function on the real line such that it is nondecreasing and satisfies limu->{infty}K(u) = 0 and limu-> + {infty}K(u) = 1. Then Kn(u) = K(u/{sigma}n) can be used to approximate I(u > 0). Since the sigmoid function is also the logistic distribution function, sn is a special case of Kn. Because of the good approximation property and simplicity of sn, we focus on sn in this paper. Most theoretical and computational results hold for general Kn with minor modifications.


    4. ASYMPTOTIC PROPERTIES
 TOP
 SUMMARY
 1. INTRODUCTION
 2. MODEL DEFINITION
 3. ESTIMATION
 4. ASYMPTOTIC PROPERTIES
 5. SIMULATION STUDIES
 6. VETERANS ADMINISTRATION LUNG...
 7. DISCUSSION
 A. APPENDIX
 REFERENCES
 

4.1 Consistency and asymptotic normality

We now investigate the asymptotic properties of the proposed SPR estimator. We state the main results here and relegate the regularity conditions and the proofs to the Appendix.

THEOREM 4.1 Under Assumptions A1–A6 given in the Appendix, if {sigma}n->0 as n->{infty}, then Formula almost surely as n->{infty}.

Theorem 4.1 establishes the consistency of the proposed SPR estimator. We now investigate the asymptotic distribution of the SPR estimator. Recall that ß = (1,{theta}')'. To indicate that {theta} is the actual parameter, write ß({theta}) = (1,{theta}')'. Let X = (V,{Delta},Z), x = (v,{delta},z). First, we define


Formula

The asymptotic distribution of Formula is given in the following theorem.

THEOREM 4.2 Under Assumptions A1–A8 given in the Appendix, if {sigma}n->0 as n->{infty}, Formula, where {Sigma} = A 1B{A – 1}', A = – E{{triangledown}2{tau}(x,ß({theta}0))}/2, and B = E{{triangledown}1{tau}(x,ß({theta}0)){triangledown}1{tau}'(x,ß({theta}0))}.

Theorem 4.2 shows that the SPR estimator is asymptotically equivalent to the PR estimator. The proposed smoothing approach leads to a computationally affordable estimate with no loss of efficiency. We refer to Khan and Tamer (2004)Go for the asymptotic properties of the PR estimator. The results hold generically when the sigmoid function s(·) is replaced by any symmetric distribution function with a continuous second-order derivative.

4.2 Inference

We can estimate the variance–covariance matrix {Sigma} with the plug-in estimator


Formula

However, this plug-in estimator can be unstable and sensitive to the choice of the tuning parameter {sigma}n, especially for small sample size cases. Simulation studies show that the empirical coverage probability of the 95% confidence interval built with the plug-in variance estimate can be even smaller than 50% for sample size as large as 400. We refer to Section 5 for numerical studies. Alternatively, we consider using the weighted bootstrap by analogy to that used in Jin and others (2001)Go and Cai and others (2005)Go. Specifically, consider a stochastic perturbation of On(ß({theta})) which has the form of


Formula

where Wi, i = 1,...,n, are independent realizations of a positive random variable W, which has a known distribution, and h is a known function specified in the following theorem.

THEOREM 4.3 Assume Assumptions A1–A8 hold and {sigma}n->0 as n->{infty}. Let Formula be the maximizer of OFormula(ß({theta})). If

C1. W has known mean µ > 0 and variance 4µ2 and h(Wi,Wj) = Wi + Wj; or
C2. W has mean 1 and variance 1 and h(Wi,Wj) = WiWj,

then conditional on the data {(Vi,{Delta}i,Zi), i = 1,...,n}, Formula.

Theorem 4.3 implies that the unconditional distribution of Formula can be approximated by the conditional distribution of Formula. Thus in practice, we can generate a large sample of {Wi,i = 1,...,n}. For each realized sample, compute Formula. Then the asymptotic variance of Formula can be approximated by the sample variance of Formula. To distinguish the two weighted bootstrap methods, the former (C1) is termed type I and the latter (C2) is termed type II. The validity of the type I weighted bootstrap follows directly from the U-statistic format of the objective function On, the asymptotic normality result in Theorem 2 and Proposition A3 of Jin and others (2001)Go. Validity of the type II weighted bootstrap can be proved using similar arguments as those in Cai and others (2005)Go.

Inference based on the PR estimator is not considered in detail in Khan and Tamer (2004)Go. It can be shown that the weighted bootstrap methods in Theorem 4.3 can be applied to the PR estimator as well, due to the U-statistic format of objective function Formula and the asymptotic normality of the PR estimator. See Section 5 for numerical results.


    5. SIMULATION STUDIES
 TOP
 SUMMARY
 1. INTRODUCTION
 2. MODEL DEFINITION
 3. ESTIMATION
 4. ASYMPTOTIC PROPERTIES
 5. SIMULATION STUDIES
 6. VETERANS ADMINISTRATION LUNG...
 7. DISCUSSION
 A. APPENDIX
 REFERENCES
 
Extensive simulation studies are conducted to assess the performance of the SPR estimator. First, we compare the performance of the SPR estimator and the PR estimator in the case of two covariates with the dimension of {theta} equal to 1. We consider this simple setting to save computational cost for the PR estimator, which demands a computationally expensive brutal search. We assume that the two covariates follow a bivariate normal distribution with mean (1,0.5), variance 1 for each covariate, and correlation – 0.2 between the two covariates. The survival time T depends on Z1 and Z2 through a proportional hazards model with the regression coefficients equal to ( – 1, – 1) and the baseline hazard equal to 1. This model corresponds to the transformation model (2.1) with {theta} = 1. The censoring time C is generated from an exponential distribution with mean 4, leading to a censoring rate of 52%. The SPR and PR estimates are computed for 100 data sets with sample size n = 200. The standard errors are estimated using the sandwich method and the two weighted bootstrap methods with 100 samples of {Wi,i = 1,...,n}. The 95% Wald confidence intervals are calculated correspondingly. For the two weighted bootstrap methods, Wi are generated as follows: Wi/10 follows Beta(0.125,1.125) for the type I weighted bootstrap, and (Formula – 1)Wi/Formula follows Beta(Formula – 1,1) for the type II weighted bootstrap. The sandwich variance estimator of the PR estimator depends on the selection of the smoothing parameters, say {xi}1 for A and {xi}2 for B, respectively (Sherman, 1993Go). The SPR estimator depends on the choice of the smoothing parameter {sigma}n. To assess the impact of the smoothing parameters on the performance of the estimators, we conduct simulations with {sigma}n = cn – 1/2, {xi}1 = cn – 1/4, and {xi}2 = cn – 1/6, where c takes the values 1/9, 1, and 3. The results are shown in Table 5. Both the PR and the SPR estimators show negligible biases. The weighted bootstrap methods perform well for both estimators: the standard errors track the sampling standard deviation reasonably well with better performance for the type I method, and the coverage probabilities are close to the nominal level. In contrast, the sandwich method is sensitive to the choices of the smoothing parameters: it performs well when c = 3, but when c = 1/9, the standard errors seriously underestimate the standard deviations and the coverage probabilities are well below the nominal level; the performance for c = 1 lies between those for c = 1/9 and 3.

Next we consider the case of three covariates with the dimension of {theta} equal to 2. The covariates are generated from a multivariate normal distribution with mean (0,1,0.3), variance 1 for each covariate, and correlation 0.2 between any two covariates. The survival time T depends on the covariates through a proportional hazards model with the regression coefficients equal to ( – 1,0.5, – 0.5) and baseline hazard equal to 1. This model corresponds to the transformation model (2.1) with {theta} = ({theta}1,{theta}2) = ( – 0.5,0.5). The censoring time C is generated as above, leading to a censoring rate of 36%. We fit the model using only the SPR approach because it is very difficult to implement the PR method in this case. Table 5 presents the results from 100 simulated data sets with n = 200 and the smoothing parameter {sigma} = cn – 1/2, c = 1/9,1,3. The performance of the SPR estimator is similar to that observed for one estimable parameter except that the type II bootstrap method performs worse for c = 1/9 and 1 with the coverage probabilities for {theta}2 reaching 1.

We have also conducted simulations under other survival models such as the accelerated failure time model and observed similar results. The type I bootstrap method outperforms the type II method in all the cases. We note that as sample size increases, the performance of the type II method improves and is able to provide satisfactory inference results with moderate to large sample size cases. We also note that the bootstrap distribution may be skewed if the sample size is not large and there may exist some "outliers" due to the failure of reaching the global maxima for some bootstrap samples. In contrast, the normalized median absolute deviation of the estimates obtained from the bootstrap data sets is relatively stable and close to the empirical standard deviation (see Tables 1 and 2). We suggest using the normalized median absolute deviation from the type I bootstrap method to estimate the standard error.


View this table:
[in this window]
[in a new window]

 
Table 1. Simulation results in the case of one estimable regression parameter

 

View this table:
[in this window]
[in a new window]

 
Table 2. Simulation results in the case of two estimable regression parameters

 
The performance of the SPR approach depends on the shape of the objective function On(ß), if the gradient search or the Newton methods are used. One concern is that there may exist some local maximizers which impede the optimization procedure. A simple solution is to search over a wide range of starting values (Gammerman, 1996Go). Theoretically speaking, the empirical objective function of the SPR estimate converges to its expectation when n->{infty}, which converges to the expectation of the PR objective function as {sigma}n->0. The expectation of the PR objective function has a unique maximizer and has a negative definite Hessian matrix in a neighborhood of the true parameter value (Khan and Tamer, 2004Go). Thus, as long as the sample size is large enough, the empirical objective function of the SPR estimate has a unique maximizer with probability converging to 1 in a neighborhood of the true parameter value. In our empirical studies, local maximizers do not cause a serious problem. When there are two estimable parameters, the objective function generally has a shape similar to that shown in Figure 1, which is obtained from a simulated data set with sample size 200 and has a unique maximizer and the maximizer can be achieved with the gradient search or the Newton methods. This ensures that the good performance of the SPR approach.


Figure 1
View larger version (33K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1. Objective function On(ß) for one simulated data set.

 
The accuracy of the sigmoid approximation depends on the tuning parameter {sigma}n. Theoretically speaking, the smaller the {sigma}n is, the better the sigmoid approximation is. Extensive simulation studies show that as long as {sigma}n is small enough, the SPR estimate is insensitive to the choice of {sigma}n. However, numerical studies also show that for extremely small {sigma}n, the maximization procedure may be unstable. In data analyses, a rule of thumb for choosing {sigma}n is to guarantee a majority of |ß'(ZiZj)/{sigma}n| > 5 (Gammerman, 1996Go). We propose the following approach for choosing {sigma}n. Initialize {sigma}Formula = an, where an is user specified and data independent, satisfying an->0 as n->{infty}. In our data analysis, we use an = 1/Formula. Construct the SPR estimate Formula with {sigma}Formula, under the identifiability constraint. Theoretically speaking, the estimator with {sigma}Formula = an is consistent. Denote {sigma}Formula as the largest constant such that 95% of the Formula is greater than 5. Set {sigma}n = min({sigma}Formula,{sigma}Formula). With the proposed procedure, the asymptotic requirement {sigma}n = o(1) is met; meanwhile the rule of thumb for choosing {sigma}n is also satisfied. Extensive simulation studies show that the estimation and inference results are relatively not sensitive to the choice of an, as long as it is small enough.

An alternative, possibly more efficient, way is to update the tuning parameter in a manner similar to the above approach at each iteration of the gradient search or the Newton methods. However, this demands extra computation by searching {sigma}n that satisfies Formula at each iteration. So the total computational efficiency does not differ much from the one-step approach above. In our study, we use the one-step approach.

Compared to most smoothing methods, the SPR has a weaker requirement on the smoothing parameter {sigma}n: we only require asymptotically {sigma}n->0. No special rate restriction is posed. So for the SPR, the tuning parameter selection is less crucial. However, we do note that if the initial choice of {sigma}n is too small or too large, then the proposed tuning parameter selection may not work well. This can usually be detected if there are many extremely large or small Formula, which warrant a modification of the initial guess an.


    6. VETERANS ADMINISTRATION LUNG CANCER STUDY
 TOP
 SUMMARY
 1. INTRODUCTION
 2. MODEL DEFINITION
 3. ESTIMATION
 4. ASYMPTOTIC PROPERTIES
 5. SIMULATION STUDIES
 6. VETERANS ADMINISTRATION LUNG...
 7. DISCUSSION
 A. APPENDIX
 REFERENCES
 
As illustration, we apply the proposed approach to the Veterans Administration lung cancer data described in Section 1. As in Therneau and Grambsh (2000, Chapter 6.3), we consider the effect of treatment, karnofsky score (karno), time in months from diagnosis to randomization (diagtime), age in years (age), prior therapy (prior, with 0 for no therapy and 10 otherwise), and histological type of the tumor (celltype: squamous, small cell, adeno, large cell) on the time to death. Specifically, d = 8 covariates are included, which are denoted by Z1 = karno/10, Z2 = diagtime/100, Z3 = I(treatment = test chemotherapy), Z4 = age/100, Z5 = prior/10, Z6 = I(cell type = small), Z7 = I(celltype = squamous), and Z8 = I(celltype = large). The nonparametric transformation model is fitted using the SPR approach. An important issue in fitting the nonparametric model is to choose the covariate with the coefficient fixed at 1.

One possible way is to fit a proportional hazards model with all d covariates and choose the continuous covariate with the smallest p-value. Alternatively, for each continuous covariate, we can fit a proportional hazards model with the covariate as the single one in the model and choose the covariate with the smallest p-value. We take the first approach for this specific data set and the coefficient for Z1 = karno/10 is fixed at 1. The standard errors are computed using the type I resampling method as suggested at the end of Section 5. Since the sample size is small, the standard errors are large. The only significant covariate besides karnofsky score is I(celltype = large).

For the purpose of comparison, we also give in Table 6 the results from the Cox proportional hazards model and the accelerated failure time model with the same covariates. If the Cox model fits the data well, we would expect the ratios of the corresponding coefficients from the two models to be close to a constant, which is not the case for this data set. This indicates that the proportional hazards assumption may not hold, which conforms to the results in Therneau and Grambsch (2000Go, Chapter 6.3). For similar reasons, the accelerated failure time model may not fit the data well.

Note that for the Cox model, the accelerated failure time model, and the nonparametric transformation model, the estimated linear combination of the covariates ß'Z may be used as a predictor for survival. The predictive capacity may be assessed using the areas under the survival ROC curve (AUC) proposed by Heagerty and others (2000)Go. Figure 2 shows the AUCs for the estimated linear combinations for these three models over a range of times. The AUCs from the nonparametric transformation model are greater than those from the Cox model and the accelerated failure time model except at a few times after 150 days. Thus, the nonparametric transformation model appears to provide a better fit to this data set. However, no significant difference is detected due to the large variances of the AUCs. A larger sample size may be needed for more reliable comparison of the models.


Figure 2
View larger version (8K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 2. Estimated AUC for the Veterans Administration data. Solid line, nonparametric transformation model; dotted line, Cox model; dashed line, accelerated failure time model.

 
For practical use, the above results only provide an estimate of "relative risk," i.e we only have the relative risk score ß'Z. To get a more comprehensive description of the conditional hazard or the survival time itself, we may need to estimate the link function g and/or the error distribution. This may be achieved by methods similar to those in the absence of censoring (e.g. Horowitz, 1996Go; Chen, 2002Go), but is beyond the scope of our study. For a large number of medical studies, especially those targeting at evaluation of specific covariate effects, we may be more interested in the relative risk or the relative effects of different covariates and the SPR estimate itself is enough in such cases.

We also note that with the PR and SPR estimates, one estimated coefficient needs to be kept fixed. So it is not feasible to evaluate the significance of its corresponding covariate. A possible ad hoc solution is to fix another coefficient for a relatively important continuous covariate, and redo the estimation and inference. By rotating the fixed coefficient, we can evaluate the significance of all covariates. Simulation study shows that as long as the chosen covariate is moderately associated with the outcome, the effect of choosing different covariates for identifiability is ignorable.


    7. DISCUSSION
 TOP
 SUMMARY
 1. INTRODUCTION
 2. MODEL DEFINITION
 3. ESTIMATION
 4. ASYMPTOTIC PROPERTIES
 5. SIMULATION STUDIES
 6. VETERANS ADMINISTRATION LUNG...
 7. DISCUSSION
 A. APPENDIX
 REFERENCES
 
We have proposed a SPR estimator for the nonparametric transformation model for survival data with no parametric assumptions on both the transformation function and the error distribution. The proposed estimator is asymptotically equivalent to the PR estimator, but is much easier to compute in the case of multiple covariates. Combined with the flexibility of the transformation model, the proposed approach provides a valuable alternative for analyzing censored survival data. The idea of smoothing the objective function can be extended to estimate the transformation function and will be investigated in our future research. The proposed SPR estimate can be easily achieved using existing software. R code is available upon request.

This paper advances previous publications, especially Ma and Huang (2005)Go, by comprehensively investigating the sigmoid approximation approach in survival studies under the transformation model. The asymptotic properties and the inference based on the weighted bootstrap are established. The proposed approach makes the transformation model more applicable in medical studies. Several important issues, like the merits of the transformation model and when to use it, have been investigated in the literature (see Khan and Tamer, 2004Go, and the references therein).

A more general model is T = g{h(ß'Z,e)}, where h is a strictly increasing function of each of its components. This model includes the additive hazards model as a special case, which does not belong to the framework of model (2.1). The estimation and inference procedure in this paper can be extended to this general model and will be pursued in a separate study.


    A. APPENDIX
 TOP
 SUMMARY
 1. INTRODUCTION
 2. MODEL DEFINITION
 3. ESTIMATION
 4. ASYMPTOTIC PROPERTIES
 5. SIMULATION STUDIES
 6. VETERANS ADMINISTRATION LUNG...
 7. DISCUSSION
 A. APPENDIX
 REFERENCES
 

A.1 Technical details

We note that the only special properties of the sigmoid function we use in the proof of Theorems 4.1 and 4.2 are its symmetry, s(u) + s( – u) = 1, and that it is smooth and has a continuous second derivative. Therefore, the proofs below are valid when we use any scaled symmetric distribution function with a continuous second derivative as an approximation to the indicator function I(u > 0).

A.2 Regularity conditions

Let R be the support operator, for example, R(Z) denotes the support of Z. Denote ß0 = (1,{theta}Formula)' as the true value of ß. Let {chi} denote the last d – 1 components of the covariate vector Z, let g0(w|r) denote the conditional density of W = ßFormulaZ given {chi} = r, and let p0({delta},y,w|r) denote the conditional density of ({Delta},Y,W) given {chi} = r. We assume the following conditions:

A1. The set {ZisinR(Z):Pr({Delta} = 1|Z) > 0} has a positive measure.
A2. The random error {epsilon} is independent of C and Z.
A3. The first component of Z has everywhere positive Lebesgue density, conditional on other components.
A4. The parameter space B containing ß0 is a compact subset of Rd.
A5. R(Z) is not contained in any proper linear subspace of Rd.
A6. T and C are conditionally independent given Z.
A7. (a) For each x, the function {tau}(x,ß({theta})) is twice differentiable with respect to {theta} in a neighborhood of {theta}0 with the kth derivative {triangledown}k{tau}(x,ß({theta})), k = 1,2. The second derivative {triangledown}2{tau}(x,ß({theta})) satisfies the Lipschitz condition. (b) The partial derivatives of g0(w|r) and p0({delta},y,w|r) with respect to t exist and are bounded.
A8. E||{triangledown}1{tau}(x,ß({theta}0))||2 and E||{triangledown}2{tau}(x,ß({theta}0))|| are finite, and E{{triangledown}2{tau}(x,ß({theta}0))} is nonsingular.

Assumptions A1–A6 are relatively mild and close to their counterparts in Khan and Tamer (2004)Go. Particularly, Assumptions A1, A2, and A6 are usually made for semiparametric models with censored survival data; Assumptions A3–A5 are needed for identifiability. Assumptions A7 and A8 are made to guarantee the Formula consistency and asymptotic normality of the PR estimator in Khan and Tamer (2004)Go.

A.3 Proof of Theorem 4.1.

Since it has been proved in Khan and Tamer (2004)Go that the PR estimator is consistent, it suffices to prove that Formula.

For any {eta} > 0, we have


Formula

where


Formula

On the set {|u| > {eta}}, we have |sn(u) – I(u > 0)| ≤ exp( – |u|/{sigma}n) < exp( – {eta}/{sigma}n). Thus, when {sigma}n->0, sn(u)->I(u > 0) uniformly on the set {|u| > {eta}}. Therefore, Tn1 converges to 0 uniformly over {Theta}. The second term T2n ≤ {n(n – 1)} – 1{sum}i!=jI(|ß'(ZiZj)| < {eta}). Since the class of indicator functions {I(|ß'(ZiZj)| < {eta}):ßisinB} is manageable, by uniform convergence of U-processes (Nolan and Pollard, 1987Go, Theorem 7), the right-hand side converges almost surely to P(|ß'(ZiZj)| < {eta}) over B. However, under Assumption A2, it can be proved in a similar way as in Lemma 4 of Horowitz (1992)Go, P(|ß'(Zi Zj)| < {eta}) converges to 0 uniformly over B as {eta}->0. This completes the proof of consistency.

A.4 Proof of Theorem 4.2.

Theorem 4.2 can be proved following the method of Sherman (1993)Go. Using the same notations as those in Section 4.1, we first define


Formula

where


Formula

Note that fn is the smooth approximation of the function f. In Section 4.1, we defined {tau}(x,ß({theta})) = E{f*(x,X,ß({theta}))}. We now consider its smoothed counterpart E{{tau}n(x,ß({theta}))} with {tau}n(x,ß({theta})) = E{fFormula(x,X,ß({theta}))} and the expectation is over X. Denote Pn as the empirical measure and Un as the U-process operator as in Sherman (1993)Go. Consider the function {Gamma}n({theta}) = On({theta}) – On({theta}0).

First, we have


Formula (A.1)

These equations can be proved based on condition A7 and by noting that sn(u) + sn( – u) = 1, rewriting {tau}n as an integral, changing variables in the integral, and using the Taylor expansion.

We can write {Gamma}n({theta}) = {Gamma}n0({theta}) + Pngn(·,{theta}) + Unhn(·,·,{theta}), where


Formula

Next we show that


Formula (A.2)


Formula (A.3)

where Gn = FormulaPn{triangledown}1{tau}(·,{theta}0)->dN(0,B), and


Formula (A.4)

uniformly in an op(1) neighborhood of {theta}0.

To prove (A.2), we first note that


Formula (A.5)

where Formula{triangledown}2Formula. This can be proved by following Theorem 4 of Sherman (1993)Go. We then conclude (A.2) by noting (A.1) and the assumption that {sigma}n->0.

Equation (A.3) can also be proved following Theorem 4 of Sherman (1993)Go and (A.1).

To prove (A.4), consider the function


Formula

Since {sigma}n->0, it suffices to show that Unh(·,·,ß({theta}),{sigma}) = op(n – 1) uniformly over an op(1) neighborhood of (ß0,0). First, by Assumption A3 and using the dominated convergence theorem,


Formula

Consider S = {h(x,X,ß({theta}),{sigma}n):ßisinB,{sigma}nisin(0,1]}. Note that S is Euclidean by Lemma 22 (ii) of Nolan and Pollard (1987)Go and it is bounded by 4. Therefore, (A.4) follows from Theorem 3 of Sherman (1993)Go or Corollary 8 of Sherman (1993)Go.

Finally, based on (A.2), (A.3), (A.4), and the assumption that {sigma}n->0, by Theorem 1 of Sherman (1993)Go, Formula. The asymptotic normality now follows from Theorem 2 of Sherman (1993)Go. See also Theorem 3.2.16 of Van der Vaart and Wellner (1996Go, Chapter 3.2).


View this table:
[in this window]
[in a new window]

 
Table 3. Results for the Veterans Administration data

 


    ACKNOWLEDGMENTS
 
We thank the Editors, the Associate Editor, and a referee for helpful comments. This research is supported in part by the National Institutes of Health grants N01-HC-95159 (Ma) and HL72288 (Huang), and U.S. Department of Veterans Affairs, Veterans Affairs Health Administration, HSR&D grant ECI-03-206 (Zhou and Song). This paper presents the findings and conclusions of the authors. It does not necessarily represent those of Veterans Administration HSR&D Service. Conflict of Interest: None declared.


    REFERENCES
 TOP
 SUMMARY
 1. INTRODUCTION
 2. MODEL DEFINITION
 3. ESTIMATION
 4. ASYMPTOTIC PROPERTIES
 5. SIMULATION STUDIES
 6. VETERANS ADMINISTRATION LUNG...
 7. DISCUSSION
 A. APPENDIX
 REFERENCES
 

    Aalen OO. (1980) A model for nonparametric regression analysis of counting processes. In Klonecki W, Kozek A, Rosinski J (Eds.). Mathematical Statistics and Probability Theory, Lecture Notes in Statistics(Springer, New York) Volume 2: pp. 1–25.

    Cai T, Tian L, Wei LJ. (2005) Semi-parametric Box-Cox power transformation models for censored survival observations. Biometrika 92:619–32.[Abstract/Free Full Text]

    Cavanagh C and Sherman RP. (1998) Rank estimators for monotonic index models. Journal of Econometrics 84:351–81.[CrossRef][Web of Science]

    Chen K, Jin Z, Ying Z. (2002) Semiparametric analysis of transformation models with censored data. Biometrika 89:659–68.[Abstract/Free Full Text]

    Chen S. (2002) Rank estimation of transformation model. Econometrica 70:1683–97.[CrossRef][Web of Science]

    Cheng SC, Wei LJ, Ying Z. (1995) Analysis of transformation models with censored data. Biometrika 82:835–45.[Abstract/Free Full Text]

    Cox DR. (1972) Regression models and life tables (with discussion). Journal of the Royal Statististical Society, Series B 34:187–200.

    Fine JP, Ying Z, Wei LJ. (1998) On the linear transformation model with censored data. Biometrika 85:980–6.[Abstract/Free Full Text]

    Gammerman A. (1996) Computational Learning and Probabilistic Reasoning(Wiley, New York).

    Han AK. (1987) Non-parametric analysis of a generalized regression model. Journal of Econometrics 35:303–16.[CrossRef][Web of Science]

    Heagerty PJ, Lumley T, Pepe MS. (2000) Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics 56:337–44.[CrossRef][Web of Science][Medline]

    Horowitz JL. (1992) A smoothed maximum score estimator for the binary response model. Econometrica 60:505–31.[CrossRef][Web of Science]

    Horowitz JL. (1996) Semiparametric estimation of a regression model with an unknown transformation of the dependent variable. Econometrica 64:103–37.[CrossRef][Web of Science]

    Jin Z, Ying Z, Wei LJ. (2001) A simple resampling method by perturbing the minimand. Biometrika 88:381–90.[Abstract/Free Full Text]

    Kalbfleisch JD and Prentice RL. (2002) The Statistical Analysis of Failure Time Data(Wiley, New York).

    Khan S and Tamer E. (2004) Partial rank estimation of duration models with general forms of censoring. Journal of Econometrics (in press).

    Ma S and Huang J. (2005) Regularized ROC method for disease classification and biomarker selection with microarray data. Bioinformatics 21:4356–62.[Abstract/Free Full Text]

    Nelder JA and Mead R. (1965) A simplex algorithm for function minimization. Computer Journal 7:308–13.

    Nolan D and Pollard D. (1987) U-processes: rates of convergence. Annals of Statististics 15:780–99.

    Sherman R. (1993) The limiting distribution of the maximum rank correlation estimator. Econometrica 61:123–37.[CrossRef][Web of Science]

    Therneau T and Grambsch P. (2000) Modelling Survival Data(Springer, New York).

    Van der Vaart AW and Wellner JA. (1996) Weak Convergence and Empirical Processes: With Applications to Statistics(Springer, New York).

    Received December 20, 2005; revised April 24, 2006; accepted for publication April 28, 2006.


    Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?



    This Article
    Right arrow Abstract Freely available
    Right arrow FREE Full Text (PDF) Freely available
    Right arrow All Versions of this Article:
    8/2/197    most recent
    kxl001v2
    kxl001v1
    Right arrow Alert me when this article is cited
    Right arrow Alert me if a correction is posted
    Services
    Right arrow Email this article to a friend
    Right arrow Similar articles in this journal
    Right arrow Similar articles in PubMed
    Right arrow Alert me to new issues of the journal
    Right arrow Add to My Personal Archive
    Right arrow Download to citation manager
    Right arrowRequest Permissions
    Right arrow Disclaimer
    Google Scholar
    Right arrow Articles by Song, X.
    Right arrow Articles by Zhou, X.-H.
    Right arrow Search for Related Content
    PubMed
    Right arrow PubMed Citation
    Right arrow Articles by Song, X.
    Right arrow Articles by Zhou, X.-H.
    Social Bookmarking
     Add to CiteULike   Add to Connotea   Add to Del.icio.us  
    What's this?