Biostatistics Advance Access originally published online on April 5, 2006
Biostatistics 2006 7(4):599-614; doi:10.1093/biostatistics/kxj028
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow All Versions of this Article:
7/4/599    most recent
kxj028v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Right arrow Disclaimer
Google Scholar
Right arrow Articles by Sparling, Y. H.
Right arrow Articles by Bautista, O. M.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Sparling, Y. H.
Right arrow Articles by Bautista, O. M.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2006. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org.

Parametric survival models for interval-censored data with time-dependent covariates

Yvonne H. Sparling, Naji Younes and John M. Lachin*

The Biostatistics Center, Department of Biostatistics and Epidemiology, School of Public Health and Health Services, The George Washington University, 6110 Executive Boulevard, Suite 750, Rockville, MD 20852, USA jml{at}biostat.bsc.gwu.edu

Oliver M. Bautista

Merck and Company, Blue Bell, PA 19422, USA

* To whom correspondence should be addressed.


    SUMMARY
We present a parametric family of regression models for interval-censored event-time (survival) data that accommodates both fixed (e.g. baseline) and time-dependent covariates. The model employs a three-parameter family of survival distributions that includes the Weibull, negative binomial, and log-logistic distributions as special cases, and can be applied to data with left, right, interval, or non-censored event times. Standard methods, such as Newton–Raphson, can be employed to estimate the model, and the resulting estimates are asymptotically normally distributed about the true values with a covariance matrix that is consistently estimated from the observed information. The deviance function is described to assess model fit, and a robust sandwich estimate of the covariance may also be employed to provide asymptotically robust inferences when the model assumptions do not apply. Spline functions may also be employed to allow for non-linear covariate effects. The model is applied to data from a long-term study of type 1 diabetes to describe the effects of longitudinal measures of glycemia (HbA1c) over time (the time-dependent covariate) on the risk of progression of diabetic retinopathy (eye disease), an interval-censored event-time outcome.

Keywords: Interval-censored data; Parametric models; Time-dependent covariate


    1. INTRODUCTION
In medical and biological research, the analysis of event-time or survival data aims to describe the risk (hazard) function of event times in a population, the associated survival or cumulative incidence functions, and the effects of covariates on risk. When event times are not observed exactly, these times are censored. The event time is "right censored" when follow-up is curtailed without observing the event. "Left censoring" arises when the event occurs at some unknown time prior to an individual's inclusion in a cohort. The event time is considered "interval censored" when an event occurs within some interval of time but the exact time of the event is unknown (cf. Kalbfleisch and Prentice, 2002).

A variety of models have been developed for interval-censored data. Finkelstein and Wolfe (1985) present a semiparametric model based on factoring the joint likelihood function for a random interval, treating the distinct endpoints that make up the intervals as nuisance parameters. Finkelstein (1986) also develops a method for fitting the proportional hazards model to interval-censored data in which the values of the baseline survival function at the distinct endpoints are treated as nuisance parameters. Seaman and Bird (2001) extend the interval-censored proportional hazards model to accommodate time-dependent covariates. The baseline hazard function is defined as piecewise constant between a specified finite set of time points and is estimated over each interval using the EM algorithm. Betensky and others (2002) describe a proportional hazards model for interval-censored data using local likelihood estimation, allowing an arbitrarily smooth covariate function to describe the effect of the covariate vector on the hazard function.

Goetghebeur and Ryan (2000) develop a semiparametric regression model for interval-censored data that employs the EM algorithm. Here, the E-step requires estimating the risk set sizes and number of events that occurred at each set of possible event times and the M-step estimates the regression coefficients. Rabinowitz and others (2000) use conditional logistic regression to fit proportional odds models to interval-censored data, and they assume that the conditional distribution of the interval endpoints given the covariates follows a semiparametric proportional odds regression model.

Parametric models have also been proposed. Odell and others (1992) describe a Weibull regression model for interval-censored data with fixed (e.g. baseline) covariates. Rabinowitz and others (1995) extend the accelerated failure time model to the interval-censored case. They present a class of score statistics for estimating the regression coefficients without specifying the distribution function of the residuals or the joint distribution of the covariates and the interval times. Younes and Lachin (1997) present a link-based model that can be applied to the interval-censored case. The model employs a link function to describe the manner by which the covariates act upon the survival times, and it uses B-splines to approximate the background hazard function. Kooperberg and Clarkson (1997) apply the hazard regression methodology of Kooperberg and others (1995) to interval-censored data and time-dependent covariates, and they estimate the logarithm of the conditional hazard function using splines and tensor products.

All of these methods for interval-censored data that allow for time-dependent covariates are either computationally intensive or of high dimension because of the many nuisance parameters. As an alternative, we present a family of parametric survival models for left-, right-, and interval-censored data with fixed and time-dependent covariates. This approach provides a direct computational solution with only a few model parameters in addition to the covariate effects. Furthermore, the proposed family includes a proportional hazards model and a proportional odds model as special cases. As noted by Lindsey (1998), parametric regression models for heavily interval-censored data are robust and are generally more informative than the corresponding non-parametric models.

In Section 2, we present a three-parameter family of event-time distributions that includes the Weibull, negative binomial, and log-logistic, among others, as special cases, and the associated hazard and cumulative hazard functions. We then describe the likelihood function for the family in terms of linear functions of fixed and time-dependent covariates, model estimation, and inferences. In Section 3, we present model diagnostics to examine the event time distribution, the functional form of the covariates, and overall fit of the model. An example is provided in Section 4, followed by caveats and discussion in Section 5.


    2. MODEL SPECIFICATION
Odell and others (1992), among others, describe a general likelihood function that allows for left, right, and interval censoring. For the ith subject Formula, let Formula be the event time, Formula be the observed event, left- or right-censoring time, and for interval-censored observations, let Formula be the left-censoring time and Formula the right-censoring time. Indicator functions for the ith observation are defined as follows:

Formula 2(2.1)

Note that Formula 2.

A subject is right censored Formula 2 when the subject was last known to be event free at time Formula 2. A subject is left censored Formula 2 when the subject is known to have had the event sometime prior to Formula 2 but it is not known when the subject was previously event free or when the event occurred. A subject is interval censored Formula 2 when the subject is known to have been event free at time Formula 2 and to have had the event at some time up to Formula 2. When time has a definite origin at Formula 2, a left-censored observation can be viewed as an interval-censored observation with Formula 2 and Formula 2. A subject's event time is known exactly Formula 2 at time Formula 2 when the subject is known to have been event free immediately prior to Formula 2, i.e. Formula 2, Formula 2.
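Under independent censoring, each of these kinds of observation contributes a different factor to the likelihood: the density f(t) for an exactly observed time, S(t) for a right-censored time, F(t) = 1 − S(t) for a left-censored time, and S(t_L) − S(t_R) for an event in (t_L, t_R]. The minimal Python sketch below computes one such log-likelihood contribution. The function name, the `(kind, times)` encoding and the exponential survivor function in the usage lines are illustrative assumptions, not part of the original method.

```python
import math

def log_lik_contribution(obs, surv, dens):
    """Log-likelihood contribution of one observation under independent censoring.

    obs is (kind, times): ("exact", t), ("right", t), ("left", t) or
    ("interval", (t_left, t_right)); surv(t) = S(t) and dens(t) = f(t).
    """
    kind, times = obs
    if kind == "exact":       # event observed at t: f(t)
        return math.log(dens(times))
    if kind == "right":       # event free at t: S(t)
        return math.log(surv(times))
    if kind == "left":        # event before t: F(t) = 1 - S(t)
        return math.log(1.0 - surv(times))
    if kind == "interval":    # event in (t_left, t_right]: S(t_left) - S(t_right)
        t_left, t_right = times
        return math.log(surv(t_left) - surv(t_right))
    raise ValueError(f"unknown observation type: {kind}")

# Assumed exponential model with rate 0.5, for illustration only
S = lambda t: math.exp(-0.5 * t)
f = lambda t: 0.5 * math.exp(-0.5 * t)
sample = [("exact", 1.2), ("right", 3.0), ("left", 0.8), ("interval", (1.0, 2.0))]
print(sum(log_lik_contribution(o, S, f) for o in sample))
```

With time starting at zero, a left-censored time t and an interval (0, t] give the same contribution, since S(0) = 1; this mirrors the remark that a left-censored observation can be treated as interval censored.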

Let Formula 2 denote the probability density function of the event times and Formula 2 denote the distribution function. Under the assumption of independent censoring, the likelihood function for a sample of n independent observations is

Formula 2(2.2)

(Odell and others, 1992). To accommodate covariate effects, both quantitative and qualitative, let Formula 2 be a vector of p fixed (e.g. baseline) covariates for the ith subject. Assume also that for the ith subject, additional time-dependent covariates are updated at a sequence of update times Formula 2, where Formula 2 is the time at which the subject enters follow-up (usually zero, as in a clinical trial). The set of update times Formula 2 may differ among subjects. At the jth update time Formula 2 of the ith subject, let Formula 2 denote the vector of q time-dependent covariate values that are updated at that time. The covariate vector at the jth update time can also be denoted as Formula 2. Then let Formula 2 denote the complete sequence of time-dependent covariate values over time for the ith subject, and let Formula 2 denote the sequence up to time t. Note that under this model the covariates are updated at discrete points in time. Different expressions are required if covariate values change continuously, for example when a covariate is a function of time itself (see Section 5).

The likelihood then becomes

Formula 2(2.3)

where Formula 2 is the event density for a specified event-time distribution conditional on the fixed covariate values Formula 2 and the sequence of time-dependent covariate values Formula 2 up to time Formula 2. The function Formula 2 is the corresponding cumulative distribution function. Specification of the hazard function in terms of the covariates leads to a specification of the cumulative hazard and survival function probabilities, conditional on the fixed covariates and the history of the time-dependent covariate processes for a given subject.

Let Formula 2 and Formula 2 be the coefficient vectors for the fixed and time-dependent covariates Formula 2 and Formula 2, respectively, so that

Formula 2(2.4)

Let Formula 2 designate a rate parameter conditional on the covariate values at update time Formula 2. Note that with no time-dependent covariates, Formula 2.

A left-censored observation is assumed to have an initial value of the time-dependent covariate at time Formula 2, assuming that the subject is known to be event free at that time. Otherwise, left-censored observations must be excluded from the analysis. However, left-censored observations could be employed in a model with only fixed covariate values, assuming that those values were determined prior to the event (e.g. gender).

We now introduce a general form for the hazard function that with an additional parameter can span a family of distributions such as the Weibull and log-logistic distributions, among others, as special cases. For subject i, conditional on covariates measured at baseline and at time Formula 2, this hazard can be expressed as

Formula 2(2.5)

where Formula 2 and Formula 2 are general hazard function parameters. By construction, the rate parameter Formula 2 is assumed to be constant in the interval Formula 2, Formula 2, for the ith subject.

Specific values for Formula 2 and Formula 2 yield a specific distribution. In particular, Formula 2 and Formula 2 yield a negative binomial distribution for event times. More generally, Formula 2 yields a Weibull hazard, and the parameter vectors Formula 2 and Formula 2 are the change in the log relative risk per unit increase in Formula 2 and Formula 2, respectively. The hazard function will be decreasing for Formula 2, constant for Formula 2, and increasing for Formula 2. Selecting Formula 2 yields a log-logistic hazard, and Formula 2 and Formula 2 are then the change in the log odds ratio of the cumulative incidence. This hazard is decreasing for Formula 2. For Formula 2 and fixed Formula 2, the hazard increases to a maximum at time Formula 2, then decreases to zero as time approaches infinity. For Formula 2 and Formula 2, the hazard increases rapidly, plateaus, and then begins to decline slowly, similar to the hazard of the lognormal distribution. Thus, this family of hazards indexed by the additional parameter Formula 2 encompasses a wide range of survival distributions.
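Because the general form (2.5) is not legible in this version of the text, the sketch below covers only the two named special cases, under the common parameterizations S(t) = exp(−λt^α) for the Weibull and S(t) = 1/(1 + λt^α) for the log-logistic; these parameterizations and all function names are assumptions. It checks two claims from the paragraph above: for α > 1 the log-logistic hazard peaks at t = ((α − 1)/λ)^(1/α) under this parameterization, and when a covariate multiplies λ by e^β, the odds of cumulative incidence are multiplied by e^β at every time.

```python
import math

# Assumed parameterizations (not necessarily the paper's notation):
#   Weibull:      S(t) = exp(-lam * t**alpha)
#   log-logistic: S(t) = 1 / (1 + lam * t**alpha)

def weibull_hazard(t, lam, alpha):
    return lam * alpha * t ** (alpha - 1)

def loglogistic_hazard(t, lam, alpha):
    return lam * alpha * t ** (alpha - 1) / (1.0 + lam * t ** alpha)

def loglogistic_peak_time(lam, alpha):
    """Time of maximum hazard for alpha > 1; None when the hazard is monotone decreasing."""
    if alpha <= 1:
        return None
    return ((alpha - 1) / lam) ** (1.0 / alpha)

def loglogistic_incidence_odds(t, lam, alpha):
    """Odds of cumulative incidence F(t) / S(t), which equals lam * t**alpha."""
    s = 1.0 / (1.0 + lam * t ** alpha)
    return (1.0 - s) / s

lam0, alpha, beta = 0.1, 2.0, 0.7
lam1 = lam0 * math.exp(beta)  # covariate effect multiplies the rate parameter
peak = loglogistic_peak_time(lam0, alpha)  # sqrt(10) for these values
for t in (1.0, 5.0):
    # Odds ratio of cumulative incidence equals exp(beta), up to rounding, at every t
    print(t, loglogistic_incidence_odds(t, lam1, alpha) / loglogistic_incidence_odds(t, lam0, alpha))
```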

For any time Formula 2, let

Formula 2(2.6)

for which the hazard function can be conveniently expressed as

Formula 2(2.7)

This simplifies the expressions for the score equations and Hessian.

To describe the expression for the cumulative hazard, we impose the condition that the set of update times Formula 2 for the ith subject includes the event or censoring time Formula 2 for that subject. If in fact, as will most often be the case, the time-dependent covariate values are not updated exactly at an event or censoring time, then the interval between two update times can be split into two intervals with an added update time equal to the event or censoring time.
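As a minimal sketch of this interval-splitting device, the function below computes the cumulative hazard for the Weibull member of the family only. It assumes the hazard λ_ij α u^(α−1) with λ_ij = exp(x_i′β + γ z_ij) held constant between update times and a single scalar time-dependent covariate, so that the increment over an interval (a, b] is λ_ij (b^α − a^α). The function name, the scalar covariate and the exponential link are assumptions for illustration; the general expressions (2.8) and (2.9) for the full family are not reproduced.

```python
import math

def weibull_cum_hazard(t, alpha, eta_fixed, update_times, z_values, gamma):
    """Cumulative hazard at time t when the rate exp(eta_fixed + gamma * z_j)
    is constant from update time u_j until the next update (or until t).

    update_times: increasing, with update_times[0] the entry time (e.g. 0.0)
    z_values:     covariate value set at each update time
    eta_fixed:    fixed-covariate linear predictor x'beta, precomputed
    """
    # Keep the update times before t and add t itself as the final breakpoint,
    # i.e. split the interval that contains the event or censoring time.
    grid = [u for u in update_times if u < t] + [t]
    total = 0.0
    for j in range(len(grid) - 1):
        rate = math.exp(eta_fixed + gamma * z_values[j])
        # Integral of rate * alpha * u**(alpha - 1) over (grid[j], grid[j + 1]]
        total += rate * (grid[j + 1] ** alpha - grid[j] ** alpha)
    return total

# alpha = 1 (exponential); rates 1, 2, 1 on (0, 1], (1, 2], (2, 2.5]: 1 + 2 + 0.5
print(weibull_cum_hazard(2.5, 1.0, 0.0, [0.0, 1.0, 2.0], [0.0, 1.0, 0.0], math.log(2.0)))
```

When the covariate never changes, the increments telescope to exp(x′β + γz) t^α, the fixed-covariate Weibull cumulative hazard.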

Let Formula 2 denote the indicator function where Formula 2 if S is true, 0 otherwise. Then, the cumulative hazard at time Formula 2 for the ith subject is

Formula 2(2.8)

where at time u

Formula 2(2.9)

For Formula 2 the latter term is undefined. However, in this case, the antiderivative of the hazard is

Formula 2(2.10)

and using l'Hospital's rule with implicit differentiation it follows that Formula 2. The resulting expressions for the hazard and survivor function equal those for the log-logistic model. Thus, the expression in (2.10) should be employed to compute the gradient and Hessian in cases where Formula 2 is specified to be 1, or the interim estimate in an iterative computation yields a value close to 1, i.e. Formula 2 for some small Formula 2.
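The exact form of (2.9) is not legible in this version of the text. To illustrate the numerical device only, the sketch below evaluates a generic term (x^(1−c) − 1)/(1 − c), which is undefined at c = 1 and tends to log x as c → 1 by l'Hospital's rule. That this generic term matches (2.9), the function name and the tolerance `eps` are all assumptions. The function switches to the limiting expression when |1 − c| < eps, and otherwise uses `math.expm1` to limit cancellation for c near 1.

```python
import math

def power_log_term(x, c, eps=1e-8):
    """Evaluate (x**(1 - c) - 1) / (1 - c) for x > 0.

    The expression is 0/0 at c = 1; its limit there is log(x) (l'Hospital's
    rule), which is used whenever |1 - c| < eps.
    """
    d = 1.0 - c
    if abs(d) < eps:
        return math.log(x)
    # x**d - 1 written as expm1(d * log x) to limit cancellation near c = 1
    return math.expm1(d * math.log(x)) / d

for c in (0.5, 0.999999, 1.0, 1.000001):
    print(c, power_log_term(2.5, c))
```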

For a subject with an observed event time or a right- or left-censored event time, the cumulative hazard is evaluated at Formula 2; and for an interval-censored observation Formula 2 at both Formula 2 and at Formula 2. In the simple case with no time-dependent covariates, the term Formula 2 is constant over time for the ith individual. The cumulative hazard function for the ith subject evaluated at time t is then expressed as

Formula 2(2.11)

From the general expression for the likelihood in (2.3) and that for the hazard function with fixed and time-dependent covariates in (2.5), the score equations and Hessian matrix can be derived; the expressions are presented in the Appendix. These can then be used to obtain the maximum likelihood estimates of the model parameters, and the estimated variance of the estimates, using an iterative procedure such as the Newton–Raphson algorithm or a variation thereof. Alternatively, a derivative-free iterative procedure may be employed to fit the model and estimate the Hessian. The program available from the authors obtains the model estimates using the Newton–Raphson ridge optimization method (cf. Press and others, 1992) to maximize the log-likelihood function through the SAS IML function NLPNRR (SAS, 1999). Initial values are obtained by fitting a Weibull accelerated failure time model using SAS PROC LIFEREG and then transforming the time acceleration parameter estimates to Weibull risk model estimates (cf. Lachin, 2000). At each iteration, the Hessian is estimated by the SAS IML function NLPFDD, which uses the algorithm of Gill and others (1983) based on finite differences with central difference approximations. The final estimate of the Hessian at convergence is then used to estimate the observed information matrix and the covariance matrix of the coefficient estimates.

Using the theorem from Lehmann (1983, pp. 429–30), Sparling (2002) provides a proof that the resulting estimates are asymptotically normally distributed about the true values with a covariance matrix that can be consistently estimated from the estimated observed information matrix. The resulting estimates then provide a basis for confidence interval estimates and Wald or likelihood ratio tests of significance.
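The estimation scheme can be illustrated with a deliberately stripped-down sketch: a two-parameter Weibull model with no covariates, fitted to interval- and right-censored data by Newton–Raphson, with a ridge term subtracted from the Hessian whenever the plain Newton step fails to increase the log-likelihood, and with the gradient and Hessian obtained by central finite differences. This is not the authors' SAS IML macro and does not reproduce NLPNRR or NLPFDD; the parameterization (log λ, log α), the data encoding, the ridge schedule and the toy data are assumptions. The covariance estimate is the inverse of the observed information, −H, at convergence.

```python
import math

def loglik(theta, data):
    """Weibull log-likelihood with S(t) = exp(-lam * t**alpha), theta = (log lam, log alpha).

    data: list of (left, right); right=None means right-censored at left,
    otherwise the event lies in (left, right].
    """
    try:
        lam, alpha = math.exp(theta[0]), math.exp(theta[1])
        total = 0.0
        for left, right in data:
            if right is None:
                total -= lam * left ** alpha  # log S(left)
            else:
                diff = math.exp(-lam * left ** alpha) - math.exp(-lam * right ** alpha)
                if diff <= 0.0:
                    return -math.inf
                total += math.log(diff)  # log[S(left) - S(right)]
        return total
    except OverflowError:  # extreme trial values during the search
        return -math.inf

def _shifted(x, moves):
    y = list(x)
    for i, d in moves:
        y[i] += d
    return y

def gradient(f, x, h=1e-5):
    """Central-difference gradient of f at x."""
    return [(f(_shifted(x, [(i, h)])) - f(_shifted(x, [(i, -h)]))) / (2 * h)
            for i in range(len(x))]

def hessian(f, x, h=1e-4):
    """Central-difference Hessian of f at x."""
    n = len(x)
    return [[(f(_shifted(x, [(i, h), (j, h)])) - f(_shifted(x, [(i, h), (j, -h)]))
              - f(_shifted(x, [(i, -h), (j, h)])) + f(_shifted(x, [(i, -h), (j, -h)])))
             / (4 * h * h) for j in range(n)] for i in range(n)]

def solve2(A, b):
    """Solve the 2x2 system A d = b by Cramer's rule; None if A is singular."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if abs(det) < 1e-300:
        return None
    return [(A[1][1] * b[0] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]

def fit_weibull(data, theta=(0.0, 0.0), tol=1e-10, max_iter=200):
    """Ridge-stabilized Newton-Raphson; returns (theta_hat, covariance)."""
    f = lambda th: loglik(th, data)
    theta = list(theta)
    for _ in range(max_iter):
        g, H, f0 = gradient(f, theta), hessian(f, theta), f(theta)
        mu, step = 0.0, None
        while mu < 1e12:
            # Solve (H - mu I) d = -g; larger mu moves d toward the gradient direction
            d = solve2([[H[0][0] - mu, H[0][1]], [H[1][0], H[1][1] - mu]], [-g[0], -g[1]])
            if d is not None and f([theta[0] + d[0], theta[1] + d[1]]) > f0:
                step = d
                break
            mu = max(2.0 * mu, 1e-3)
        if step is None:  # no ridge value improves the fit: treat as converged
            break
        theta = [theta[0] + step[0], theta[1] + step[1]]
        if max(abs(s) for s in step) < tol:
            break
    H = hessian(f, theta)
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    cov = [[-H[1][1] / det, H[0][1] / det], [H[1][0] / det, -H[0][0] / det]]  # (-H)^-1
    return theta, cov

# Hypothetical toy data: five interval-censored and three right-censored subjects
data = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3), (1, None), (2, None), (3, None)]
theta_hat, cov = fit_weibull(data)
print(theta_hat, cov)
```

Working on the log scale keeps λ and α positive without constraints; the ridge term plays the role that the ridge modification plays in the Newton–Raphson ridge method cited above, guaranteeing an uphill step away from concave regions.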


    3. MODEL DIAGNOSTICS
The shape of the assumed hazard function is determined by the values of Formula 2 and Formula 2. Specific hypotheses such as Formula 2 (i.e. a Weibull model) or Formula 2 (i.e. a log-logistic model) can be tested using a Wald or likelihood ratio test. If, for example, it is desired to employ a Weibull model and the corresponding test is not significant, then that model can be fit by setting Formula 2 in all the equations.

Alternatively, the model can be fit for other values of Formula 2, and the adequacy of those values can be assessed by examining the values of the log likelihood function, or the optimal value Formula 2 could be estimated from the model. In this case, however, the coefficient estimates Formula 2 or Formula 2 no longer have a convenient interpretation as the log relative hazards (Formula 2) or as the log cumulative incidence odds ratios (Formula 2).
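The Wald and likelihood ratio tests of a specific value of the shape parameter reduce to simple computations once the estimate, its standard error and the two maximized log-likelihoods are available. The sketch below handles a single constrained parameter (1 degree of freedom), using the identity that the upper tail of a χ²₁ variable at x equals erfc(√(x/2)). The function names and the input values in the usage lines are hypothetical, not results from the paper.

```python
import math

def wald_test(estimate, se, null_value):
    """Two-sided Wald test of H0: parameter = null_value; returns (z, p)."""
    z = (estimate - null_value) / se
    return z, math.erfc(abs(z) / math.sqrt(2.0))

def likelihood_ratio_test(loglik_full, loglik_restricted):
    """LR test of one constraint; returns (X2, p) with p from chi-square(1)."""
    x2 = 2.0 * (loglik_full - loglik_restricted)
    # P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    return x2, math.erfc(math.sqrt(max(x2, 0.0) / 2.0))

# Hypothetical inputs: shape estimate 0.4 (SE 0.5) tested against 0
print(wald_test(0.4, 0.5, 0.0))
print(likelihood_ratio_test(-1200.3, -1200.6))
```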

The functional form of the covariate effects can be explored using spline functions (Smith, 1979; Ramsay, 1988).
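One simple way to explore functional form, shown here as an assumption rather than the specific spline families of Smith (1979) or Ramsay (1988), is a linear truncated-power basis: a covariate x is replaced by x, (x − k1)+, ..., (x − km)+ for chosen knots, and each column enters the linear predictor as an additional fixed or time-dependent covariate. A test that the coefficients of all (x − k)+ terms are zero is then a test of linearity. The function name and the knots below are hypothetical.

```python
def linear_spline_basis(x, knots):
    """Return [x, (x - k1)+, ..., (x - km)+] for the given knots."""
    return [x] + [max(x - k, 0.0) for k in knots]

# Hypothetical knots at HbA1c values of 7%, 8% and 10%
print(linear_spline_basis(9.0, [7.0, 8.0, 10.0]))  # [9.0, 2.0, 1.0, 0.0]
```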

Following Therneau and others (1990), the deviance for the family of models herein can be described as

Formula 3(3.1)

where h is the set of subject-specific implied parameters. Let Formula 3, Formula 3, and Formula 3 denote the hazard, survival, and cumulative hazard functions, respectively, from the saturated model; and let Formula 3, Formula 3, and Formula 3 denote the estimates from the fitted model. Let Formula 3 be the individual per-subject estimates of the parameter vector Formula 3.

The Appendix shows that Formula 3 for left-censored, right-censored and interval-censored observations. For non-censored observations where Formula 3 and Formula 3 is the event time for individual i, then

Formula 3(3.2)

The first two terms are not readily obtained because the subject-specific estimate of the cumulative hazard function at time Formula 3 is a function of the subject-specific estimate of the hazard function at time Formula 3. However, an approach similar to that in Therneau and others (1990) can be used. For the non-censored case in the current model, the derivatives of the log-likelihood function with respect to each element of the parameter vector Formula 3 can be set to zero and solved, subject to the condition that the second derivatives are negative. This approach involves solving a set of simultaneous equations for each subject by numerical methods such as Newton–Raphson ridge optimization.

The deviance statistic may indicate that the model does not fit the data well because important covariates have been omitted, or because components of the model are mis-specified, such as the variance of the responses as a function of the conditional expectation. In the latter case, the "information sandwich" can be used to provide an estimate of the covariance matrix that is robust to mis-specification (Royall, 1986).
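To show how the information sandwich differs from the model-based variance, the sketch below uses the simplest possible case, a constant-hazard (exponential) model for right-censored data, rather than the paper's family; this swap, the function name and the toy data are assumptions. With per-subject scores U_i and observed information I, the model-based variance is 1/I and the sandwich variance is I⁻¹(ΣU_i²)I⁻¹.

```python
def exponential_sandwich(times, events):
    """Constant-hazard MLE for right-censored data with model-based and
    sandwich variance estimates.

    times:  follow-up time for each subject
    events: 1 if the event was observed at that time, 0 if right censored
    """
    d, total_time = sum(events), sum(times)
    lam = d / total_time  # MLE: events / total exposure
    # Per-subject score U_i = d_i / lam - t_i from log L_i = d_i log(lam) - lam t_i
    scores = [e / lam - t for t, e in zip(times, events)]
    info = d / lam ** 2  # observed information, -d2 log L / d lam2
    model_var = 1.0 / info
    robust_var = sum(u * u for u in scores) / info ** 2  # I^-1 (sum U_i^2) I^-1
    return lam, model_var, robust_var

print(exponential_sandwich([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1]))
```

For these toy data the hazard estimate is 3/10, the model-based variance is 0.3²/3 = 0.03 and the sandwich variance is 0.009; a large gap between the two would suggest the model-based variance is unreliable.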


    4. GLUCOSE EXPOSURE AND THE RISK OF RETINOPATHY IN DIABETES
In longitudinal studies or clinical trials, subjects may undergo an examination or procedure at regularly scheduled follow-up visits to determine whether disease progression has occurred. In this case, the time to specific outcomes is interval censored by the schedule of follow-up assessments. Furthermore, covariates related to the outcome may also be assessed periodically over time.

For example, the study of the Epidemiology of Diabetes Interventions and Complications (EDIC) is a follow-up observational study of the subjects who had previously participated in the Diabetes Control and Complications Trial (DCCT). Men and women aged 13–39 years with type 1 diabetes mellitus (T1DM) were enrolled in the DCCT between 1983 and 1989. Patients were recruited into two cohorts: a primary prevention cohort with no pre-existing complications and a secondary intervention cohort with minimal complications present.

Patients were randomized to either intensive or conventional treatment and were followed for an average of 6.5 years. Intensive therapy was aimed at maintaining near normal levels of blood glucose while conventional therapy had no such glucose target. The DCCT Research Group (1993) showed that intensive therapy markedly reduced the risk of progression of diabetes complications, principally retinopathy (diabetic retinal abnormalities, potentially leading to blindness), which was assessed from a retinal evaluation every 6 months.

The level of glucose exposure (glycemia) over the preceding 6–8 weeks is provided by the hemoglobin A1c (HbA1c), expressed as the percentage of all hemoglobin (red cells) that has been glycosylated through exposure to glucose molecules in blood, the half-life of hemoglobin being 6–8 weeks. The history of glycemia before study entry was represented by the level of HbA1c on initial screening and the pre-existing duration of diabetes, and the history of glycemia during the study by the mean level of HbA1c during the study and the duration of follow-up. The DCCT Research Group (1995) showed that the lifetime history of glycemia represented by all four factors was the dominant determinant of the risk of complications, and that the group differences in the updated current mean HbA1c (a time-dependent covariate) during the study explained virtually all the effect of DCCT treatment group on complications.

At the close of the DCCT in 1993, all subjects were referred to their personal physicians for care and were followed annually during EDIC, when HbA1c was assessed. During EDIC, the levels of HbA1c were approximately equal in the two former DCCT treatment groups. Retinopathy was assessed in about one-quarter of the patients at years 1–3 timed in relation to the original date of entry (i.e. 4, 8, or 12 years since entry), and in all subjects at year 4. Thus, the times of progression of retinopathy are interval censored with staggered intervals. One objective of EDIC is to assess the long-term effects of levels of glycemia during the DCCT and EDIC on risk of further progression of retinopathy from the levels present at the end of the DCCT. The DCCT/EDIC Research Group (2000) showed that former intensive therapy reduced the risks of further progression of retinopathy during EDIC. The remaining question is the extent to which glycemic levels during DCCT and EDIC are associated with risk of further retinopathy progression during EDIC.

Fixed (baseline) covariates include primary versus secondary cohort at EDIC baseline (1 if in the primary cohort, 0 if secondary); the duration of T1DM in months and the HbA1c level (%) on initial screening, which represent the pre-DCCT level of glycemia; and the mean HbA1c (%) during the DCCT and the duration (months) of follow-up in the DCCT, which represent the level of glycemia during the DCCT. The time-dependent covariate is the updated current mean HbA1c during EDIC, i.e. the value at year 1, the mean of years 1 and 2, then of years 1–3, updated at the time of each successive measurement. Where the EDIC annual HbA1c is missing (not measured), the mean value from the previous visit is carried forward. Nine patients are excluded because all their HbA1c values are missing. Of the remaining 1316 subjects, 1085 are right censored and 231 have interval-censored times of progression.
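The updated current mean with carry-forward can be sketched as a running mean over the visits observed so far. In this hypothetical helper, a missing visit (None) repeats the previous mean, and later means average only the observed values; whether the original analysis computed later means in exactly this way is an assumption.

```python
def updated_current_mean(values):
    """Running mean of the observed values at each visit; a missing visit
    (None) carries the previous mean forward. Visits before the first
    observed value return None."""
    means, total, count = [], 0.0, 0
    for v in values:
        if v is not None:
            total += v
            count += 1
        means.append(total / count if count else None)
    return means

# Hypothetical annual HbA1c values (%), with year 2 missing
print(updated_current_mean([8.0, None, 9.0, 7.6]))
```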

Approximately half the subjects within each treatment group were enrolled (by design) into the primary and secondary cohorts. On entry, the mean duration of diabetes was 69 ± 49 (SD) months and the mean level of HbA1c was 9.0 ± 1.6%. The mean duration of follow-up during the DCCT was 73 ± 20 months and the mean level of HbA1c was 8.1 ± 1.4% during the DCCT. The mean level of HbA1c during EDIC was 8.2 ± 1.3%.

Prior knowledge about the distribution of progression of retinopathy suggests that the event times might follow a Weibull distribution. Figure 1 presents the cumulative incidence of retinopathy progression estimated using the Turnbull (1976) empirical estimate and using the Weibull model-based estimate separately within each treatment group and shows that the two estimates are superimposable. Under a Weibull distribution assumption, the regression model coefficients have a log hazard ratio or log relative risk interpretation. Analyses using regression splines to represent the effect of the EDIC HbA1c showed that a simple linear effect was satisfactory (see Sparling, 2002, for details).
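The Turnbull (1976) empirical estimate can be sketched with the self-consistency (EM) algorithm over the "innermost" intervals formed by the observed endpoints. The plain-Python sketch below assumes half-open intervals (L, R], right censoring coded as R = ∞ (passed as None), no exactly observed event times and a fixed convergence tolerance; it is not the implementation used for Figure 1. The estimated cumulative incidence at the right end of an innermost interval is the sum of the masses up to and including that interval.

```python
import math

def turnbull(intervals, tol=1e-10, max_iter=10000):
    """Self-consistency (EM) estimate for interval-censored data.

    intervals: list of (left, right), meaning the event lies in (left, right];
    right=None means right-censored at left. Exact event times are not handled.
    Returns a list of (q, p, mass) over Turnbull's innermost intervals (q, p].
    """
    obs = [(lo, math.inf if hi is None else hi) for lo, hi in intervals]
    # Flag 1 = left endpoint, 0 = right endpoint; at tied values a right endpoint
    # sorts first, because (., v] ends at v while (v, .] starts just after v.
    ends = sorted([(lo, 1) for lo, _ in obs] + [(hi, 0) for _, hi in obs])
    # Innermost intervals: a left endpoint immediately followed by a right endpoint
    support = [(a[0], b[0]) for a, b in zip(ends, ends[1:]) if a[1] == 1 and b[1] == 0]
    # alpha[i][j] = 1 if innermost interval j lies within observation i
    alpha = [[1.0 if lo <= q and p <= hi else 0.0 for q, p in support] for lo, hi in obs]
    n, m = len(obs), len(support)
    mass = [1.0 / m] * m
    for _ in range(max_iter):
        new = [0.0] * m
        for row in alpha:
            denom = sum(a * s for a, s in zip(row, mass))
            for j in range(m):
                new[j] += row[j] * mass[j] / (denom * n)
        change = max(abs(a - b) for a, b in zip(new, mass))
        mass = new
        if change < tol:
            break
    return [(q, p, s) for (q, p), s in zip(support, mass)]

print(turnbull([(0, 1), (1, 2), (0, 2), (1, None)]))
```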


Fig. 1 Cumulative incidence of retinopathy progression over 4 years of EDIC within the intensive and conventional groups, estimated using the Turnbull (1976) empirical estimate and using a Weibull model.

 
Table 1 presents the maximum likelihood estimates of the parameters, their variances, Wald tests, and likelihood ratio tests for the Weibull model. The deviance for the fitted Weibull model is Formula 3 with Formula 3 and Formula 3.624, which indicates adequate fit of the model. Further, the ratio deviance/df Formula 3 0.99 does not indicate any overdispersion. Thus, a robust estimate of the variance–covariance matrix is not employed.


Table 1 Weibull model estimates and tests

 
Primary versus secondary cohort and duration of follow-up in the DCCT do not have any meaningful effects on progression of retinopathy adjusted for the other covariates. T1DM duration and the HbA1c on entry have significant effects, whereas the effect of the duration of follow-up in the DCCT is not significant. By far the greatest effect is contributed by the mean level of HbA1c during the DCCT, with a 2.04-fold increase in the risk of progression of retinopathy per unit increase in DCCT mean HbA1c percent (95% CI: 1.64, 2.54, Formula 3). Interestingly, the time-dependent EDIC mean HbA1c over the 4 years following the DCCT has a significant effect on risk, but with a much smaller 1.19-fold increase in risk per unit increase in HbA1c percent (95% CI: 1.02, 1.38, Formula 3). The finding that the glycemic exposure during the DCCT persists and outweighs the level of HbA1c during the first 4 years of EDIC has led to the hypothesis of "metabolic memory", i.e. that the effects of hyperglycemia are long lasting.
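Under the Weibull model the reported fold increases are exp(β̂) and the confidence limits are exp(β̂ ± 1.96 SE). The hypothetical helpers below apply that relation in both directions. Applied to the reported DCCT mean HbA1c limits (1.64, 2.54), they recover a log-scale coefficient of about 0.71 (a hazard ratio of about 2.04) and a standard error of about 0.11, up to the rounding of the published limits; these back-calculated values are approximations, not numbers taken from Table 1.

```python
import math

Z95 = 1.959963984540054  # standard normal 0.975 quantile

def hazard_ratio_ci(beta, se, z=Z95):
    """Hazard ratio exp(beta) with the Wald limits exp(beta -/+ z * se)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def beta_se_from_ci(lower, upper, z=Z95):
    """Recover the log-scale coefficient and SE from a CI that is symmetric on the log scale."""
    beta = (math.log(lower) + math.log(upper)) / 2.0
    se = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return beta, se

beta, se = beta_se_from_ci(1.64, 2.54)
print(beta, se, hazard_ratio_ci(beta, se))
```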

An unrestricted model provides a shape parameter estimate Formula 3.3247 with Formula 3 and 95% confidence limits Formula 3. The Wald test of the hypothesis Formula 3 yields Formula 3, the likelihood ratio test yields Formula 3. These results indicate that the distribution of event times conditional on covariates does not significantly deviate from a Weibull distribution. In this model, Formula 3 with Formula 3, close to the values in the Weibull model. Further, Formula 3. Because the observed cumulative incidence at the close of follow-up in the cohort is still low (Formula 3), there is little information to reliably estimate the shape parameter Formula 3.

Similar covariate effects were obtained in a logistic regression analysis of the subset of subjects who were assessed at 4 years (The DCCT/EDIC Research Group, 2000). That analysis described the cross-sectional association of the mean HbA1c over 4 years with the prevalence of progression at 4 years, whereas the analysis herein shows the more desirable prospective association between the time-dependent HbA1c levels and the incidence (risk) of progression using all visits in all patients.


    5. DISCUSSION
We describe a family of parametric regression models for survival data that allows for fixed (e.g. baseline) and/or time-dependent covariates with mixtures of left, right, and interval censoring. The model is fit by standard maximum likelihood estimation from the full likelihood, for which all the conditions for convergence have been rigorously proven in Sparling (2002). Inferences based on the model therefore rely on large-sample approximations, and the model should be used with caution when the sample size or the total number of non-censored observations (events) is small.

The family of models is characterized by an additional parameter Formula 3 that allows fitting a Weibull model Formula 3 or a log-logistic model Formula 3 as special cases, or allows the optimal value of this parameter to be estimated from the observed data. In the latter case, however, the ability to differentiate a Weibull from a log-logistic model, or to estimate the optimal value accurately, will be roughly proportional to the observed cumulative incidence. If the observed cumulative incidence is low, as in the example herein (Figure 1), there is inadequate information about the true shape of the hazard function to allow precise estimation of Formula 3.

The family of models presented herein allows for the incorporation of time-dependent covariates whose values are updated or change at discrete points in time. For a time-dependent covariate that changes continuously over time, such as a function of time itself, the integrals in the cumulative hazard function may not be expressible in closed form. For example, rather than having a vector of discrete time-dependent covariates Formula 3 with distinct values at each of the update times Formula 3, a covariate that is a function of time itself would have continuously changing values specified as Formula 3, Formula 3. The rate parameter is then no longer constant over intervals of time. Consequently, the cumulative hazard at time t for the ith subject is

Formula 5(5.1)

While such an expression may not have a closed form, the expression could be evaluated numerically.
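Equation (5.1) can be evaluated with any standard quadrature rule. As a minimal sketch, the function below applies composite Simpson's rule to a user-supplied hazard. The example hazard, a Weibull-type hazard whose rate depends on a covariate equal to time itself (z(u) = u, with parameters eta, gamma and alpha), and both function names are hypothetical illustrations. For α = 1 the result can be checked against the closed form e^η (e^(γt) − 1)/γ.

```python
import math

def cumulative_hazard_numeric(t, hazard, n=200):
    """Approximate the integral of hazard(u) over [0, t] by composite Simpson's rule.

    n (the number of subintervals) must be even; hazard must be finite on [0, t],
    which excludes Weibull shapes alpha < 1 at u = 0.
    """
    if n % 2:
        raise ValueError("n must be even")
    h = t / n
    total = hazard(0.0) + hazard(t)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * hazard(k * h)
    return total * h / 3.0

def weibull_hazard_time_covariate(eta, gamma, alpha):
    """Hazard exp(eta + gamma * z(u)) * alpha * u**(alpha - 1) with z(u) = u."""
    return lambda u: math.exp(eta + gamma * u) * alpha * u ** (alpha - 1)

print(cumulative_hazard_numeric(2.0, weibull_hazard_time_covariate(0.1, 0.3, 1.5)))
```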

In principle, many covariates are functions of time, including biochemical or biological measures that change from day to day, such as the HbA1c measured in the EDIC example above. In practice, however, as in the EDIC, only the covariate's values at a finite set of update times are known. In this situation, the time-dependent covariate effect must be interpreted in the context of the study design and the follow-up schedule of assessments at which changes in the covariate are observed. For the EDIC example, the estimated relative risk per unit increase in HbA1c is that associated with differences between values updated approximately annually, the specified schedule during EDIC. The relative risks (or hazard ratios) then have a prospective interpretation when applied to other settings with the same (or a similar) schedule of update times.

As with any model that employs fixed and time-dependent covariates, caution is needed in interpreting a fixed (e.g. baseline) effect, such as treatment group, when the time-dependent covariate is influenced by that fixed effect (see, for example, Kalbfleisch and Prentice, 2002). In this situation, the model describes the risk of the event given the time-dependent covariate values and treatment group. If the effect of treatment group is predominantly reflected in the time-dependent covariate process, this type of analysis would show a minimal treatment group effect, whereas an analysis that does not include the time-dependent covariate might show a substantial one. In this case, the proper interpretation is that treatment group has an effect on risk and also an effect on the time-dependent covariate Y, but that after adjusting for Y, group has little further effect. This suggests that factors related to Y reflect the underlying mechanism by which treatment has an effect on risk. Indeed, such analyses and findings are useful for illustrating the mechanism by which fixed baseline covariates such as treatment group affect risk.

A SAS/IML macro written by Oliver Bautista and Yvonne Sparling is available from www.bsc.gwu.edu under the link to downloadable software.


    APPENDIX

A.1 Fitting the model

From (2.3), the log likelihood is

Formula 1(A.1)
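Each subject's likelihood contribution depends on how its event time was observed. As a rough sketch, assuming independent subjects, a fixed linear predictor eta for each subject, and an illustrative Weibull survival function S(t) = exp{-exp(eta) t^gamma} (not the paper's three-parameter family), the contributions can be coded as below; the function names are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def weibull_survival(t: float, eta: float, gamma: float) -> float:
    """Illustrative S(t) = exp(-exp(eta) * t**gamma); eta is the linear predictor."""
    return math.exp(-math.exp(eta) * t ** gamma)


def log_lik_contribution(left: Optional[float], right: Optional[float],
                         eta: float, gamma: float) -> float:
    """Log-likelihood contribution of one subject.

    left is None   -> left-censored:     log(1 - S(right))
    right is None  -> right-censored:    log S(left)
    left == right  -> exact event time:  log hazard(t) + log S(t)
    left < right   -> interval-censored: log(S(left) - S(right))
    """
    if left is None and right is None:
        raise ValueError("at least one endpoint is required")
    if left is None:
        return math.log(1.0 - weibull_survival(right, eta, gamma))
    if right is None:
        return math.log(weibull_survival(left, eta, gamma))
    if left == right:
        t = left
        log_hazard = eta + math.log(gamma) + (gamma - 1.0) * math.log(t)
        return log_hazard - math.exp(eta) * t ** gamma
    if left > right:
        raise ValueError("left endpoint must not exceed right endpoint")
    return math.log(weibull_survival(left, eta, gamma)
                    - weibull_survival(right, eta, gamma))


def log_likelihood(intervals: Sequence[Tuple[Optional[float], Optional[float]]],
                   etas: Sequence[float], gamma: float) -> float:
    """Sum of the independent subjects' contributions."""
    return sum(log_lik_contribution(lo, hi, e, gamma)
               for (lo, hi), e in zip(intervals, etas))
```

Note that an interval starting at 0 gives the same contribution as left-censoring, because S(0) = 1.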

To simplify notation, the conditioning on Formula 1 is omitted from the expressions for Formula 1 and Formula 1. For any parameter Formula 1, where Formula 1, the score equation is

Formula 2(A.2)

The derivatives of Formula 2, of the log hazard, and of the increments in the cumulative hazard with respect to each parameter Formula 2 at covariate update time u are

Formula 3(A.3)

where Formula 3 is the value of the Formula 3th time-dependent covariate for the ith subject at update time u. The derivatives of terms involving the cumulative hazard at event or censoring time t are then provided by

Formula 4(A.4)
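As an illustration of the structure of such derivatives, suppose (an assumption for this sketch) that a time-dependent covariate enters the hazard through exp(beta y(u)) and is held constant between update times. The derivative of the log hazard with respect to beta at update time u is then simply y(u), and each increment of the cumulative hazard is multiplied by the same factor, so the derivative of the cumulative hazard is the covariate-weighted sum of its increments. The hypothetical Python sketch below codes that identity; comparing it against a finite difference is a cheap check on hand-derived derivatives.

```python
import math
from typing import List, Sequence


def increments(t: float, update_times: Sequence[float], values: Sequence[float],
               alpha: float, beta: float, gamma: float) -> List[float]:
    """Cumulative-hazard increments over [u_j, min(u_{j+1}, t)), assuming the
    hypothetical hazard exp(alpha + beta * y_j) * gamma * u**(gamma - 1) with y
    held at y_j between update times (update_times[0] must be 0)."""
    out = []
    for j, (start, y) in enumerate(zip(update_times, values)):
        if start >= t:
            break
        end = min(update_times[j + 1], t) if j + 1 < len(update_times) else t
        out.append(math.exp(alpha + beta * y) * (end ** gamma - start ** gamma))
    return out


def cumulative_hazard(t, update_times, values, alpha, beta, gamma) -> float:
    return sum(increments(t, update_times, values, alpha, beta, gamma))


def dcumhaz_dbeta(t, update_times, values, alpha, beta, gamma) -> float:
    """Analytic derivative: d(increment_j)/d(beta) = y_j * increment_j."""
    inc = increments(t, update_times, values, alpha, beta, gamma)
    return sum(y * d for y, d in zip(values, inc))
```

Under the same assumption, the derivative with respect to alpha is just the cumulative hazard itself, since every increment carries the factor exp(alpha).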

The score equation for parameter Formula 4 is then obtained by evaluating the respective derivatives for each subject. The maximum likelihood estimates of the model parameters are the values Formula 4 at which all of the score equations are simultaneously equal to zero.
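Newton–Raphson, mentioned in the Summary, is one way to solve the score equations. The generic Python sketch below iterates x <- x - H(x)^{-1} U(x); unlike the analytic second derivatives tabulated in this appendix, it approximates the score U and Hessian H by central finite differences, and all helper names are hypothetical. The example model is a log-linear exponential model for right-censored data, chosen only because its maximum likelihood estimates have a closed form.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
LogLik = Callable[[Vector], float]


def solve(a: List[List[float]], b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def _eval(f: LogLik, x: Vector, i: int, di: float, j: int, dj: float) -> float:
    """Evaluate f after shifting coordinate i by di and coordinate j by dj."""
    y = list(x)
    y[i] += di
    y[j] += dj
    return f(y)


def score(f: LogLik, x: Vector, h: float = 1e-5) -> Vector:
    """Central-difference approximation to the score vector U(x)."""
    return [(_eval(f, x, i, h, i, 0.0) - _eval(f, x, i, -h, i, 0.0)) / (2 * h)
            for i in range(len(x))]


def hessian(f: LogLik, x: Vector, h: float = 1e-4) -> List[List[float]]:
    """Central-difference approximation to the Hessian H(x)."""
    n = len(x)
    return [[(_eval(f, x, i, h, j, h) - _eval(f, x, i, h, j, -h)
              - _eval(f, x, i, -h, j, h) + _eval(f, x, i, -h, j, -h)) / (4 * h * h)
             for j in range(n)] for i in range(n)]


def newton_raphson(f: LogLik, start: Sequence[float],
                   tol: float = 1e-8, max_iter: int = 50) -> Vector:
    """Iterate x <- x - H(x)^{-1} U(x) until every step component is below tol."""
    x = list(start)
    for _ in range(max_iter):
        step = solve(hessian(f, x), [-u for u in score(f, x)])
        x = [xi + si for xi, si in zip(x, step)]
        if max(abs(s) for s in step) < tol:
            return x
    raise RuntimeError("Newton-Raphson did not converge")


def exp_loglik(params: Vector, times: Sequence[float],
               events: Sequence[int], z: Sequence[float]) -> float:
    """Log-linear exponential model, hazard exp(alpha + beta * z), right-censored data."""
    alpha, beta = params
    return sum(d * (alpha + beta * zi) - math.exp(alpha + beta * zi) * t
               for t, d, zi in zip(times, events, z))
```

For a binary z, the estimates should equal log(d0/T0) and log{(d1/T1)/(d0/T0)}, where d and T are the event counts and total follow-up in each group. In practice, analytic derivatives such as those in Tables 2–4 are faster and more accurate than finite differences.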

The Hessian for any parameters Formula 4 and Formula 4 has elements

Formula 5(A.5)

The partial second derivatives with respect to Formula 5, Formula 5, and Formula 5 are presented in Tables 2–4, respectively.


Table 2. The second derivatives of the term πij(u) with respect to the elements of the parameter vector, ∂²πij(u)/(∂φ ∂ψ)

 

Table 4. The second derivatives of the terms µij(u) required for the second derivative of the cumulative hazard function with respect to the elements of the parameter vector, ∂²µij(u)/(∂φ ∂ψ)

 
The estimated Hessian matrix then has elements Formula 5 based on the vector of parameter estimates Formula 5, and the observed information is Formula 5. If Formula 5 is the Formula 5th term in the parameter vector Formula 5, then the estimated variance of Formula 5 is obtained as

Formula 6(A.6)

where Formula 6 is the (Formula 6) term of Formula 6.
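Following the description above, the estimated variance of each parameter estimate is the corresponding diagonal element of the inverse of the observed information evaluated at the estimates. A minimal Python sketch of that computation is below, for a hypothetical two-parameter log-linear exponential model whose observed information has a simple closed form; it is not the authors' SAS/IML macro, and the data in the check are invented.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def observed_information(alpha: float, beta: float,
                         times: Sequence[float], z: Sequence[float]) -> Matrix:
    """Observed information -H(alpha, beta) for the hypothetical model
    l = sum_i [d_i (alpha + beta z_i) - exp(alpha + beta z_i) t_i].
    The second derivatives do not involve the event indicators d_i."""
    i11 = i12 = i22 = 0.0
    for t, zi in zip(times, z):
        w = math.exp(alpha + beta * zi) * t  # this subject's expected event count
        i11 += w
        i12 += w * zi
        i22 += w * zi * zi
    return [[i11, i12], [i12, i22]]


def inverse_2x2(m: Matrix) -> Matrix:
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0.0:
        raise ValueError("information matrix is singular")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def standard_errors(alpha: float, beta: float,
                    times: Sequence[float], z: Sequence[float]) -> List[float]:
    """Square roots of the diagonal of the inverse observed information."""
    cov = inverse_2x2(observed_information(alpha, beta, times, z))
    return [math.sqrt(cov[k][k]) for k in range(2)]
```

For the invented data in the check (3 events in 10 units of follow-up with z = 0, 4 events in 5 units with z = 1), the maximum likelihood estimates are alpha = log(0.3) and beta = log(0.8/0.3), and the inverse information is [[1/3, -1/3], [-1/3, 7/12]], so the estimated variance of beta is 1/3 + 1/4.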

A.2 The deviance

To simplify the derivation, we use Formula 6 to refer to Formula 6. In the left-censored case (Formula 6 and Formula 6 is the left-censoring time for individual i),

Formula 7(A.7)

For the right-censored case (Formula 7 and Formula 7 is the right-censoring time for individual i),

Formula 8(A.8)

In the interval-censored case (Formula 8 and Formula 8 is the left endpoint of the censoring interval and Formula 8 is the right endpoint of the censoring interval for individual i),

Formula 9(A.9)

Since Formula 9, it follows that Formula 9 and

Formula 10(A.10)
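A rough sketch of a deviance computation for the three censoring patterns above follows. It assumes, as an illustration and not necessarily the exact definition in (A.7)–(A.10), that the saturated model can place all of its probability on each subject's observed censoring set, so that each subject's deviance contribution is -2 times its log-likelihood contribution under the fitted model. The Weibull survival function and the function names are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def survival(t: float, eta: float, gamma: float) -> float:
    """Illustrative Weibull survival S(t) = exp(-exp(eta) * t**gamma)."""
    return math.exp(-math.exp(eta) * t ** gamma)


def deviance_contribution(left: Optional[float], right: Optional[float],
                          eta: float, gamma: float) -> float:
    """-2 * log-likelihood contribution, assuming the saturated model puts
    probability 1 on the observed censoring set (saturated log-likelihood 0)."""
    if left is None and right is None:
        raise ValueError("at least one endpoint is required")
    if left is None:          # left-censored: event before `right`
        prob = 1.0 - survival(right, eta, gamma)
    elif right is None:       # right-censored: event after `left`
        prob = survival(left, eta, gamma)
    elif left < right:        # interval-censored: event in (left, right]
        prob = survival(left, eta, gamma) - survival(right, eta, gamma)
    else:
        raise ValueError("interval-censored data need left < right")
    return -2.0 * math.log(prob)


def deviance(observations: Sequence[Tuple[Optional[float], Optional[float]]],
             etas: Sequence[float], gamma: float) -> float:
    """Total deviance: sum of the subjects' contributions."""
    return sum(deviance_contribution(lo, hi, e, gamma)
               for (lo, hi), e in zip(observations, etas))
```

Because each probability is at most 1, every contribution under this assumption is non-negative, and larger values flag subjects whose censoring sets the fitted model finds improbable.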


Table 3. The second derivatives of the log hazard function with respect to the elements of the parameter vector, ∂² ln λ(u | zi, yij)/(∂φ ∂ψ)

 


    ACKNOWLEDGMENTS
 
This work was supported by a grant from the National Cancer Institute and by a contract from the National Institute of Diabetes and Digestive and Kidney Diseases for the EDIC study. Conflict of Interest: None declared.


    REFERENCES

    Betensky RA, Lindsey JC, Ryan LM, Wand MP. (2002) A local likelihood proportional hazards model for interval censored data. Statistics in Medicine 21:263–75.

    Finkelstein DM. (1986) A proportional hazards model for interval-censored failure time data. Biometrics 42:845–54.

    Finkelstein DM and Wolfe RA. (1985) A semiparametric model for regression analysis of interval-censored failure time data. Biometrics 41:933–45.

    Gill PE, Murray W, Saunders MA, Wright MH. (1983) Computing forward-difference intervals for numerical optimization. SIAM Journal on Scientific and Statistical Computing 4:310–21.

    Goetghebeur E and Ryan L. (2000) Semiparametric regression analysis of interval-censored data. Biometrics 56:1139–44.

    Kalbfleisch JD and Prentice RL. (2002) The Statistical Analysis of Failure Time Data, 2nd edition (John Wiley and Sons, Inc., New York).

    Kooperberg C and Clarkson DB. (1997) Hazard regression with interval-censored data. Biometrics 53:1485–94.

    Kooperberg C, Stone CJ, Truong YK. (1995) Hazard regression. Journal of the American Statistical Association 90:78–94.

    Lachin JM. (2000) Biostatistical Methods: The Assessment of Relative Risks (John Wiley and Sons, Inc., New York).

    Lehmann EL. (1983) Theory of Point Estimation (John Wiley and Sons, Inc., New York).

    Lindsey JK. (1998) A study of interval censoring in parametric regression models. Lifetime Data Analysis 4:329–54.

    Odell PM, Anderson KM, D'Agostino RB. (1992) Maximum likelihood estimation for interval-censored data using a Weibull-based accelerated failure time model. Biometrics 48:951–9.

    Press WH, Vetterling WT, Teukolsky SA, Flannery BP. (1992) Numerical Recipes in C (Cambridge University Press, London).

    Rabinowitz D, Betensky R, Tsiatis AA. (2000) Using conditional logistic regression to fit proportional odds models to interval censored data. Biometrics 56:511–8.

    Rabinowitz D, Tsiatis A, Aragon J. (1995) Regression with interval-censored data. Biometrika 82:501–13.

    Ramsay JO. (1988) Monotone regression splines in action (with discussion). Statistical Science 3:425–61.

    Royall RM. (1986) Model robust inference using maximum likelihood estimators. International Statistical Review 54:221–6.

    SAS Institute Inc. (1999) SAS/IML User's Guide, Version 8 (SAS Institute Inc., Cary, NC).

    Seaman SR and Bird SM. (2001) Proportional hazards model for interval-censored failure times and time-dependent covariates: application to hazard of HIV infection of injecting drug users in prison. Statistics in Medicine 20:1855–70.

    Smith PL. (1979) Splines as a useful and convenient statistical tool. The American Statistician 33:57–62.

    Sparling YH. (2002) Parametric survival models for interval-censored data with time-dependent covariates (Doctoral Dissertation, The George Washington University, Washington, DC).

    The DCCT/EDIC Research Group. (2000) Retinopathy and nephropathy in patients with type 1 diabetes four years after a trial of intensive therapy. The New England Journal of Medicine 342:381–9.

    The DCCT Research Group. (1993) The effect of intensive treatment of diabetes on the development and progression of long-term complications in insulin-dependent diabetes mellitus. The New England Journal of Medicine 329:977–86.

    The DCCT Research Group. (1995) The relationship of glycemic exposure (HbA1c) to the risk of development and progression of retinopathy in the diabetes control and complications trial. Diabetes 44:968–83.

    Therneau TM, Grambsch PM, Fleming TR. (1990) Martingale-based residuals for survival models. Biometrika 77:147–60.

    Turnbull BW. (1976) The empirical distribution function with arbitrarily grouped, censored and truncated data. Journal of the Royal Statistical Society, Series B 38:290–5.

    Younes N and Lachin J. (1997) Link-based models for survival data with interval and continuous time censoring. Biometrics 53:1199–211.

    Received March 31, 2005; revised January 27, 2006; accepted for publication March 8, 2006.

