Biostatistics Advance Access originally published online on July 13, 2006
Biostatistics 2007 8(2):345-356; doi:10.1093/biostatistics/kxl014
Identifiability assumptions for missing covariate data in failure time regression models
Department of Health Studies, University of Chicago, 5841 South Maryland Avenue, MC 2007, Chicago, IL 60637, USA prathouz{at}uchicago.edu
* To whom correspondence should be addressed.
| SUMMARY |
|---|
Methods in the literature for missing covariate data in survival models have relied on the missing at random (MAR) assumption to render regression parameters identifiable. MAR means that missingness can depend on the observed exit time, and whether or not that exit is a failure or a censoring event. By considering ways in which missingness of covariate X could depend on the true but possibly censored failure time T and the true censoring time C, we attempt to identify missingness mechanisms which would yield MAR data. We find that, under various reasonable assumptions about how missingness might depend on T and/or C, additional strong assumptions are needed to obtain MAR. We conclude that MAR is difficult to justify in practical applications. One exception arises when missingness is independent of T, and C is independent of the value of the missing X. As alternatives to MAR, we propose two new missingness assumptions. In one, the missingness depends on T but not on C; in the other, the situation is reversed. For each, we show that the failure time model is identifiable. When missingness is independent of T, we show that the naive complete record analysis will yield a consistent estimator of the failure time distribution. When missingness is independent of C, we develop a complete record likelihood function and a corresponding estimator for parametric failure time models. We propose analyses to evaluate the plausibility of either assumption in a particular data set, and illustrate the ideas using data from the literature on this problem.
Keywords: Identifiability; Missing at random; Parametric model; Survival analysis
| 1. INTRODUCTION |
|---|
Failure time data (Cox and Oakes, 1984
A concern in other settings, however, is that missingness of covariate values is related to future failure times. For example, a certain covariate may be measured only via an expensive or invasive test. Clinical staff may choose to order the test for a patient who, in their judgment, is reasonably likely to test positive. But, that choice may also reflect the patient's severity of illness. Alternatively, the test may not be safe for patients with poor prognosis. In either case, the presence or absence of the test predicts the future failure times.
In other settings, the covariate missingness may be related to the censoring time. Suppose a study enrolls patients over several years and censoring for all subjects occurs on the same calendar date. Suppose further that covariate missingness is related to enrollment time, due to secular shifts in clinical practice or to modifications in study protocol over time. In this case, the presence or absence of a covariate will predict the censoring time.
When missingness of a covariate X is thought to depend on the response Y in a regression model, including failure time models, a common identifying assumption is "missing at random" (MAR; Little and Rubin, 2002). Under MAR, missingness of X is independent of the value of X given observed covariates Z and response Y. Several papers, reviewed in Section 2, have proposed MAR-based methodologies for missing covariate data in survival analysis problems. By analogy to other regression methods, the natural MAR assumption for failure time data with no censoring would be that missingness of X is independent of the value of X given Z and the failure time T, i.e. $R \perp X \mid (T,Z)$, where $R = I(X \text{ is observed})$. With censoring, however, this does not correspond to conditioning on the observed covariates and response. Rather, the MAR-like assumption in survival analysis problems replaces failure time T with "response" (Y,D), where Y = min(T,C), $D = I(T \le C)$, and C is the censoring time. This assumption,

$$R \perp X \mid (Y,D,Z), \tag{1.1}$$

is what is generally termed MAR in survival analysis methods (e.g. Schluchter and Jackson, 1989; Lipsitz and Ibrahim, 1996; Pugh and others, 1993).
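To make the observed-data structure behind (1.1) concrete, the following minimal Python sketch maps the latent quantities (T, C, X) and the missingness outcome to the record that the analyst actually sees. The names `Record` and `observe` are hypothetical and introduced only for illustration; in real data, T and C are never both observed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    y: float            # observed exit time, Y = min(T, C)
    d: int              # failure indicator, D = I(T <= C)
    r: int              # non-missingness indicator, R = I(X is observed)
    x: Optional[float]  # covariate value; None when X is missing


def observe(t: float, c: float, x: float, x_is_observed: bool) -> Record:
    """Map latent (T, C, X) and the missingness outcome to the observed record."""
    r = 1 if x_is_observed else 0
    return Record(y=min(t, c), d=1 if t <= c else 0, r=r, x=x if r == 1 else None)
```

MAR (1.1) is a statement about how `r` may relate to `x` given only `y`, `d`, and the fully observed covariates, which is why it is awkward to express in terms of T and C separately.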
The first purpose of this paper is to critically examine MAR assumption (1.1) in the context of failure time regression models. In Section 2.1, we briefly review the recent literature exploiting this assumption. In Section 2.2, we consider ways in which MAR can arise. Our goal is to understand how joint dependence of the missingness process on survival time T and censoring time C, as well as possible dependence on covariate X, might yield MAR (1.1). One obvious way in which it will do so is when missingness is by design and explicitly dependent on (Y,D); examples include case-cohort designs and nested case-control studies. When missingness is by chance, the missingness mechanisms require further consideration. We posit three different scenarios, characterized by dependence of missingness on the failure time, on the censoring time, or on both, and show that in each one, MAR requires additional strong assumptions that may be undesirable. We conclude that, unless one assumes very special relationships between the missingness, the failure time, the censoring process, and the missing X, the MAR assumption (1.1) is either questionable or difficult to interpret.
A second purpose of this paper, elaborated in Section 3, is to propose alternative assumptions under which failure time regression models with missing covariates are identifiable. The first of these assumptions, $R \perp (C,X) \mid (T,Z)$, which we call "censoring-ignorable missingness at random" (CIMAR), allows the missingness process to depend explicitly on the failure time, but assumes that it is independent of the censoring process. CIMAR has the conceptual advantage over MAR in that, given the covariates (X,Z), it maintains the separation between the failure and the censoring processes, mirroring the independent censoring assumption commonly made in survival analysis. Moreover, CIMAR reduces to the MAR assumption ($R \perp X \mid T,Z$) in the special case of no censoring.
In Section 3.1, we develop a bias-corrected likelihood for inference under CIMAR. Based on a complete-case analysis, this likelihood is motivated by considering the survival function for T conditional on (R = 1,X,Z). It depends on a model for R, but does not require modeling the distributions of X or C. We also develop a likelihood for the missingness model by conditioning on the failure event being observed. This likelihood permits inferences on the missingness model, estimates of which are plugged into the complete-case likelihood for the failure time model.
An alternative assumption $R \perp (T,X) \mid (C,Z)$, presented in Section 3.2, which we call "failure-ignorable missingness at random" (FIMAR), reverses the roles of censoring and failure in the missingness process. This assumption reflects, for example, the situation where missingness is related to enrollment time, but unrelated to the severity of disease. We show that under FIMAR, naive complete-case analysis is valid.
The ideas presented in Sections 2 and 3 are expanded upon in Section 4 where we propose analysis tools to evaluate the plausibility of either CIMAR or FIMAR in a particular data set. Of course, as with MAR, both CIMAR and FIMAR are fundamentally uncheckable in the sense that one cannot confirm the independence of the missingness and the value of X. However, certain contradictions of CIMAR and FIMAR can be detected with the data. Section 5 illustrates the ideas on an example data set which has appeared in the earlier literature on this problem; a second example is presented in supplementary material available at Biostatistics online.
| 2. EXISTING LITERATURE AND ASSUMPTIONS |
|---|
Suppose subjects i = 1,...,n are independently sampled from a reference population, and data collected on each subject include $(Z_i, R_i, Y_i, D_i)$. Here, $Z_i$ is a vector of covariates observed for all subjects in the sample; $Y_i = \min(T_i, C_i)$, where $T_i$ and $C_i$ are the failure and censoring times for the ith subject and $D_i = I(T_i \le C_i)$ is a failure event indicator. In addition, we observe the indicator $R_i = I(X_i \text{ is observed})$ and, when $R_i = 1$, the covariate vector $X_i$. In what follows, we omit the subscript i when possible. Interest lies in making inferences about the density f(t|X,Z) of (T|X,Z). Throughout, we make the independent censoring assumption

$$T \perp C \mid (X,Z). \tag{2.1}$$
To facilitate the development that follows, let

$$\pi(T,C,X,Z) = \mathrm{pr}(R = 1 \mid T,C,X,Z),$$
and similarly

$$\tilde{\pi}(Y,D,X,Z) = \mathrm{pr}(R = 1 \mid Y,D,X,Z).$$
We will use the notational convention of replacing arguments with · to indicate conditional independence. For example, $R \perp X \mid (T,C,Z)$ would also be denoted as $\pi(T,C,\cdot,Z)$.
Most existing methods for handling missing X in failure time models rely on a model for the distribution of (X|Z). Inferences are based on a fully specified likelihood for the observed data. The survival model can be parametric (e.g. Schluchter and Jackson, 1989; Lipsitz and Ibrahim, 1996), or semi-parametric, leaving the distribution of (X|Z), the baseline hazard, or both to be nonparametric (e.g. Paik and Tsai, 1997; Lipsitz and Ibrahim, 1998, 2000; Chen and Little, 1999; Herring and Ibrahim, 2001; Chen, 2002; Herring and others, 2002). In these approaches, MAR (1.1) is assumed to hold and ensures that consistent inferences will be obtained by maximizing the likelihood for the observed data without specifying a model for the missingness process. In order to avoid specifying a model for the censoring process, these authors, excepting Paik and Tsai (1997), additionally assume that the censoring distribution does not depend on the missing values X, i.e.

$$C \perp X \mid Z. \tag{2.2}$$

Note that (2.1) and (2.2) together imply $C \perp T \mid Z$.
A different approach avoids modeling (X|Z), instead relying on a model for the missingness process, expressed via $\mathrm{pr}(R = 1 \mid Y,D,Z)$. Estimation is carried out via inverse probability weighted estimating equations that are variations on the partial likelihood estimating equations in the original Cox proportional hazards model (Pugh and others, 1993). Wang and Chen (2001) extend this idea to construct an augmented estimator that is doubly robust (Robins and others, 2000). These approaches also require MAR (1.1) in order to identify $\mathrm{pr}(R = 1 \mid Y,D,Z)$, but they do not require Assumption (2.2). These methods and that of Paik and Tsai (1997) demonstrate that, while (2.2) is avoidable, (1.1) is essential for model identifiability in methods in the current literature.
According to the MAR assumption (1.1), R may depend on C and/or T through (Y,D). But in many settings, one would not expect the censoring distribution and the missingness mechanism to be related, while in others, one would expect the missingness mechanism and the failure distribution to be independent. As such, it is interesting to consider ways that R might depend on T, C, and possibly X, in addition to Z, which would give rise to Assumption (1.1). We consider three potential scenarios, and show that none of these give rise to MAR without additional strong assumptions. The first scenario captures the situation where missingness is related to failure, but not to censoring; the second reverses this. The third scenario considers joint dependence of missingness on failure and censoring.
Scenario 1.
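One way to consider how (1.1) might arise is via the CIMAR assumption,

$$R \perp (C,X) \mid (T,Z). \tag{2.3}$$

To examine whether (2.3) will yield (1.1), note that, if D = 1 and (2.3) is true,

$$\mathrm{pr}(R = 1 \mid Y, D = 1, X, Z) = \frac{\int_Y^\infty \pi(Y,\cdot,\cdot,Z)\, g(c \mid X,Z)\, dc}{\int_Y^\infty g(c \mid X,Z)\, dc}, \tag{2.4}$$

where g(·|X,Z) is the density of C given (X,Z). Expression (2.4) is simply equal to $\pi(T = Y,\cdot,\cdot,Z)$ and is free of X. On the other hand, if D = 0,

$$\mathrm{pr}(R = 1 \mid Y, D = 0, X, Z) = \frac{\int_Y^\infty \pi(t,\cdot,\cdot,Z)\, f(t \mid X,Z)\, dt}{\int_Y^\infty f(t \mid X,Z)\, dt}. \tag{2.5}$$

This last expression will, in general, depend on X. One obvious way in which it will not do so is if

$$T \perp X \mid Z, \tag{2.6}$$

but this is an overly restrictive assumption in light of the analytic goal of quantifying the dependence of T on (X,Z).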
Alternatively, we might ask if special cases of $\pi(t,\cdot,\cdot,Z)$ would render (2.5) independent of X. Theorem A.1 in the online appendix (http://www.biostatistics.oxfordjournals.org) provides an answer to that question. Suppose that T is an absolutely continuous random variable, X takes at least two distinct values, x1 and x2 say, densities f(t|X = x1,Z) and f(t|X = x2,Z) satisfy the properties in Theorem A.1, and $\pi(t,\cdot,\cdot,Z)$ is everywhere differentiable in t. While these assumptions may not hold in all settings, it is almost always desirable to allow for the possibility that they do. Then, the theorem establishes that in order for (2.5) to be free of X, it must be that

$$R \perp T \mid Z. \tag{2.7}$$

But this last assumption is undesirable, since presumably one motivation for addressing missingness and positing (2.3) is the concern that subjects' failure times T are associated with whether or not X is missing. Together with (2.3), (2.7) is equivalent to assuming $R \perp (T,C,X) \mid Z$, i.e. that X is MCAR. Thus, Assumption (2.3), by itself or even in conjunction with (2.2), will not yield (1.1).
A modification of (2.7) that may be plausible in some settings is if there exists some value $\tau > 0$ such that no one is censored before $\tau$ and the missingness model $\pi(t,\cdot,\cdot,Z)$ does not depend on t for $t > \tau$. These assumptions are formalized as

$$\mathrm{pr}(C \ge \tau \mid X,Z) = 1 \quad \text{and} \quad \pi(t,\cdot,\cdot,Z) = \pi(\tau,\cdot,\cdot,Z) \ \text{for all } t > \tau. \tag{2.8}$$

This may be reasonable in some settings, especially where censoring time is constant. Examination of (2.5) reveals that (2.8) together with (2.3) yields (1.1).
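The argument in Scenario 1 can be checked numerically. The sketch below is a small Monte Carlo illustration under assumptions that are ours, not the paper's: X is binary, T given X is exponential with hypothetical rates 0.5 and 2.0, censoring occurs at a constant time c0 = 1, and the non-missingness probability depends on T only through the hypothetical function `pi_cimar`, which is not constant beyond c0. Under this CIMAR mechanism, the observed fraction among subjects failing in a narrow window of Y should be nearly the same for both values of X, whereas among censored subjects (all with Y = c0) it should differ by X, so that (1.1) fails.

```python
import math
import random


def pi_cimar(t: float) -> float:
    """Hypothetical non-missingness probability that depends on T only."""
    return t * t / (t * t + math.e)  # equivalent to logit(pi) = 2*log(t) - 1


def observed_fractions(n: int = 200_000, c0: float = 1.0,
                       rates=(0.5, 2.0), seed: int = 1) -> dict:
    """Estimate pr(R = 1 | stratum, X = x) for censored subjects and for
    subjects whose failure falls in the narrow window 0.4 <= Y <= 0.6."""
    rng = random.Random(seed)
    counts = {("censored", 0): [0, 0], ("censored", 1): [0, 0],
              ("failed", 0): [0, 0], ("failed", 1): [0, 0]}
    for _ in range(n):
        x = rng.randrange(2)
        t = rng.expovariate(rates[x])                 # T | X = x is exponential
        r = 1 if rng.random() < pi_cimar(t) else 0    # R depends on T only
        if t > c0:                                    # D = 0 and Y = c0
            key = ("censored", x)
        elif 0.4 <= t <= 0.6:                         # D = 1 and Y = T in the window
            key = ("failed", x)
        else:
            continue
        counts[key][0] += r
        counts[key][1] += 1
    return {k: obs / tot for k, (obs, tot) in counts.items()}


fractions = observed_fractions()
gap_censored = abs(fractions[("censored", 0)] - fractions[("censored", 1)])
gap_failed = abs(fractions[("failed", 0)] - fractions[("failed", 1)])
print(f"censored gap: {gap_censored:.3f}, failed gap: {gap_failed:.3f}")
```

Replacing `pi_cimar` with a function that is constant for t > c0 would mimic (2.8), in which case the gap among censored subjects should vanish up to simulation error.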
Scenario 2.
A second proposal for attempting to satisfy (1.1) is to consider the FIMAR assumption,

$$R \perp (T,X) \mid (C,Z). \tag{2.9}$$

It can be shown via symmetric arguments to those in Scenario 1 (swapping the roles of C and T) that (2.2) and (2.9) together imply (1.1). This may be acceptable in some settings, while in others it may be difficult to justify how C could depend on whether X is missing or not (2.9), but not on the value of X itself (2.2). As in Scenario 1, the only way in which (2.9) will satisfy (1.1) in the absence of (2.2) is if the data are MCAR, which is overly restrictive.
Scenario 3.
A third possibility is that R depends on (T,C,X) as well as Z, and that $\pi(T,C,X,Z)$ is constructed as some function of f and g such that (1.1) is satisfied. Then we have the unappealing fact that the allowable missingness models would vary depending on f and g. To see one way in which this dependence operates, again consider two distinct values x1 and x2 of X. Suppose that T and C are absolutely continuous random variables satisfying the conditions in Theorem A.3 of the online appendix and that $\pi(T,c,X,Z)$ is continuous in c. Then, the theorem shows that in order to satisfy MAR, we require
|
| (2.10) |
This yields the odd result that, if g is such that (2.2) holds, then the fraction in (2.10) is one, and $\pi(T,c,X = x_1,Z) = \pi(T,c,X = x_2,Z)$, i.e.

$$R \perp X \mid (T,C,Z). \tag{2.11}$$
On the other hand, if g does depend on X, then (2.11) cannot hold. It is difficult to imagine applications where such a modeling framework would be desirable or yield easily interpretable results.
In summary, MAR (1.1) is critical for identifiability in many methods in the literature, while $C \perp X \mid Z$ (2.2) serves more to simplify matters in approaches that rely on models for the distribution of (X|Z) by avoiding the need to model the censoring distribution. When missingness is thought to be related to censoring but not to failure, MAR is for practical purposes equivalent to assuming FIMAR (2.9) along with (2.2). While this is probably the most convincing pair of assumptions one could make to obtain (1.1), they may be overly restrictive in some settings. Additionally, (2.2) is unappealing because it alters the traditional survival analysis modeling framework, which would allow censoring to depend on both X and Z. Furthermore, as we show in Sections 3 and 4, under (2.9), the complete record likelihood analysis remains consistent, albeit inefficient, even without (2.2), and in addition (2.2) is checkable with the data. Existing methods do not exploit these facts.
An alternative pair of assumptions yielding MAR is CIMAR (2.3) along with either (2.6), (2.7), or (2.8). While (2.3) may be acceptable, (2.6) and (2.7) are overly restrictive. MAR, therefore, appears difficult to justify when missingness is thought to be related to failure but not to censoring time. In particular, suppose that the analyst believes a priori that missingness is independent of C and X, and subsequently finds in the data that missingness depends on (Y,D) as well as Z. Then, he/she would have to conclude that neither MCAR nor MAR can hold. An exception arises when censoring is delayed for all subjects, in which case (2.8) may apply and MAR would be justified.
| 3. ALTERNATIVES TO MAR |
|---|
In contrast to MAR, two appealing and interpretable assumptions are CIMAR (2.3) and FIMAR (2.9). CIMAR is appealing, especially given that we may have already assumed (2.2). Indeed, we will show that (2.2) and (2.3) together imply ($R \perp C \mid Z$), capturing the case that censoring is independent of both X and the fact that X is missing, given Z. CIMAR neither implies nor is implied by MAR, as can be seen by the various scenarios presented in the foregoing section. FIMAR, on the other hand, allows for missingness to be related to the censoring distribution. Because FIMAR and (2.2) together imply MAR, FIMAR is in a practical sense a weaker assumption than MAR. In what follows, we exploit these new assumptions to propose approaches to modeling failure time data when X may be missing. We refer to independence results enumerated and proved in online Appendix B (http://www.biostatistics.oxfordjournals.org).
Instead of MAR, consider starting with CIMAR assumption (2.3). There are two important implications of (2.3). First, $\pi(t,\cdot,\cdot,Z)$ is identifiable from the data (R|T,Z) among subjects with $T \le C$. To see this, note that the likelihood for $\pi$ generated by data from these subjects is conditional on $T \le C$. One subject's contribution to this likelihood can therefore be written as

$$\mathrm{pr}(R = 1 \mid T, T \le C, Z)^{R}\,\{1 - \mathrm{pr}(R = 1 \mid T, T \le C, Z)\}^{1-R}.$$

But from (2.3), this is just equal to

$$\pi(T,\cdot,\cdot,Z)^{R}\,\{1 - \pi(T,\cdot,\cdot,Z)\}^{1-R}. \tag{3.1}$$

Therefore, a model for $\pi(t,\cdot,\cdot,Z)$ can be estimated by fitting it to the data from subjects with uncensored failure events.
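As an illustration of fitting the missingness model to uncensored subjects only, the sketch below simulates data under a CIMAR mechanism and recovers the parameters of a logistic model for R on log(T) by Newton-Raphson. Everything specific here is an assumption made for illustration: the logit-linear form in log(T), the exponential distributions of T and C, the true values a = 0.5 and b = 1.0, and the helper names `fit_logistic` and `simulate_uncensored`.

```python
import math
import random


def expit(v: float) -> float:
    """Numerically stable inverse logit."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def fit_logistic(z, r, n_iter: int = 25):
    """Newton-Raphson fit of logit pr(R = 1) = a + b*z; returns (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for zi, ri in zip(z, r):
            p = expit(a + b * zi)
            w = p * (1.0 - p)
            g0 += ri - p                 # score for the intercept
            g1 += (ri - p) * zi          # score for the slope
            h00 += w                     # information matrix entries
            h01 += w * zi
            h11 += w * zi * zi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det  # Newton step via the 2x2 inverse
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def simulate_uncensored(n: int = 20_000, a: float = 0.5, b: float = 1.0,
                        seed: int = 7):
    """Return (log T, R) for subjects with T <= C under a CIMAR mechanism."""
    rng = random.Random(seed)
    z, r = [], []
    for _ in range(n):
        t = max(rng.expovariate(1.0), 1e-12)   # failure time T
        c = rng.expovariate(0.5)               # censoring time C, independent of T
        ri = 1 if rng.random() < expit(a + b * math.log(t)) else 0
        if t <= c:                             # keep only uncensored failures
            z.append(math.log(t))
            r.append(ri)
    return z, r


z, r = simulate_uncensored()
a_hat, b_hat = fit_logistic(z, r)
print(round(a_hat, 2), round(b_hat, 2))
```

Because R is independent of C given T in this simulation, restricting to subjects with T ≤ C does not distort the fitted relationship between R and T, which is the point of likelihood (3.1).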
The second implication is that given (2.3) and independent censoring (2.1),

$$T \perp C \mid (R = 1, X, Z) \tag{3.2}$$

(Result B.2). That is, independent censoring holds, conditionally on R = 1.
Now suppose we undertake analysis using only records with observed X. In such a complete-case analysis, the likelihood is conditional on R = 1, i.e.

$$L_{\text{complete}} = \prod_{i:\,R_i = 1} \mathrm{pr}(Y_i, D_i \mid R_i = 1, X_i, Z_i).$$

Under (3.2) and the usual parameter separation for the failure and censoring models, $L_{\text{complete}}$ can be factored into a piece involving f(t|R = 1,X,Z) and a piece involving g(c|R = 1,X,Z). As $R \perp C \mid (X,Z)$ (Result B.1), this second piece contains no parameters of interest and can be ignored. So, for inferences about the distribution of (T|X,Z),

$$L_{\text{complete}} \propto \prod_{i:\,R_i = 1} f(Y_i \mid R_i = 1, X_i, Z_i)^{D_i}\, S(Y_i \mid R_i = 1, X_i, Z_i)^{1 - D_i},$$

where $S(t) = \int_t^\infty f(u)\,du$ is the survival function of T.
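Now note that

$$f(t \mid R = 1, X, Z) = \frac{\pi(t,\cdot,\cdot,Z)\, f(t \mid X,Z)}{\mathrm{pr}(R = 1 \mid X,Z)}$$

and

$$S(t \mid R = 1, X, Z) = \frac{\mathrm{pr}(R = 1 \mid T > t, X, Z)\, S(t \mid X,Z)}{\mathrm{pr}(R = 1 \mid X,Z)},$$

where

$$\mathrm{pr}(R = 1 \mid T > t, X, Z) = \frac{\int_t^\infty \pi(u,\cdot,\cdot,Z)\, f(u \mid X,Z)\, du}{S(t \mid X,Z)}$$

and

$$\mathrm{pr}(R = 1 \mid X, Z) = \int_0^\infty \pi(u,\cdot,\cdot,Z)\, f(u \mid X,Z)\, du.$$

Therefore, for purposes of inferences about (T|X,Z), $L_{\text{complete}}$ depends only on a model for f(t|X,Z) and a model for $\pi(t,\cdot,\cdot,Z)$. The fact that $\pi(t,\cdot,\cdot,Z)$ is estimable means that likelihood $L_{\text{complete}}$ can be used to estimate f(t|X,Z) via data (Y,D) given (X,Z) among the complete-case subjects. Therefore, f(t|X,Z) is identifiable under independent censoring (2.1) and CIMAR (2.3). Also, it is not necessary to specify models for C or X. In Section 5, we illustrate these ideas via application to parametric model estimation of survival time distribution among lymphoma patients. Remarks on extension of these ideas to semi-parametric survival models are given in Section 6.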
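The complete-case likelihood under CIMAR can be sketched for a single subject as follows. The code assumes, for illustration only, an exponential failure time with rate `lam` and a user-supplied non-missingness function `pi_fn` of time; the helper names `tail_weight` and `cimar_loglik` are hypothetical. It uses the fact that the density and survival function of T given R = 1 are the π-weighted versions of f and S, each divided by the integral of π times f over all times, and evaluates the integrals by a midpoint rule after the substitution s = exp(-lam·u).

```python
import math


def tail_weight(t: float, lam: float, pi_fn, m: int = 4000) -> float:
    """Midpoint-rule value of the integral from t to infinity of
    pi(u) * lam * exp(-lam*u) du, using s = exp(-lam*u), s in (0, exp(-lam*t))."""
    upper = math.exp(-lam * t)
    h = upper / m
    return h * sum(pi_fn(-math.log((k + 0.5) * h) / lam) for k in range(m))


def cimar_loglik(y: float, d: int, lam: float, pi_fn) -> float:
    """Log-likelihood contribution of one complete case (R = 1) under CIMAR
    for an exponential failure time with rate lam."""
    norm = tail_weight(0.0, lam, pi_fn)   # pr(R = 1 | X, Z)
    if d == 1:                            # log f(y | R = 1, X, Z)
        return math.log(pi_fn(y)) + math.log(lam) - lam * y - math.log(norm)
    # log S(y | R = 1, X, Z)
    return math.log(tail_weight(y, lam, pi_fn)) - math.log(norm)


def rising(t: float) -> float:
    """Hypothetical missingness model: X is more often observed for late failures."""
    return t / (1.0 + t)


# Survival probability at t = 1 among complete cases when lam = 1.
s_complete = math.exp(cimar_loglik(1.0, 0, 1.0, rising))
print(s_complete > math.exp(-1.0))
```

When `pi_fn` is constant, the correction cancels and the contribution reduces to the usual exponential log-likelihood, which corresponds to the naive (MCAR) analysis. In practice, `pi_fn` would be replaced by a fitted missingness model and the summed contributions maximized over `lam`; neither step is shown here.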
Now turn to the FIMAR assumption (2.9) as an alternative to either MAR or CIMAR. Analogous to the development above, (2.9) and (2.1) again imply (3.2), i.e. independent censoring holds conditionally on R (Result B.4). Additionally, $R \perp T \mid (X,Z)$ (Result B.3), so the complete data likelihood for inferences about the distribution of (T|X,Z) is

$$L_{\text{complete}} \propto \prod_{i:\,R_i = 1} f(Y_i \mid X_i, Z_i)^{D_i}\, S(Y_i \mid X_i, Z_i)^{1 - D_i}.$$
The implication of Lcomplete is that, under FIMAR, the naive complete record analysis yields valid likelihood-based inferences for the distribution of (T|X,Z), ignoring the missingness process.
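A small simulation can illustrate this contrast. The sketch below assumes exponential failure times with hypothetical rates 1.0 and 2.0 for a binary X, exponential censoring independent of (T, X), and a non-missingness probability of the form v/(1+v), where v is either the censoring time (a FIMAR mechanism) or the failure time (a mechanism that violates FIMAR). The naive complete-case estimator of each rate is the number of events divided by person-time. The function name `complete_case_rates` and all numerical settings are ours.

```python
import random


def complete_case_rates(mechanism: str, n: int = 200_000,
                        rates=(1.0, 2.0), seed: int = 11):
    """Naive exponential MLE (events / person-time) by X among records with R = 1.
    mechanism="fimar": pr(R = 1) depends on C only; otherwise it depends on T only."""
    rng = random.Random(seed)
    events = [0, 0]
    exposure = [0.0, 0.0]
    for _ in range(n):
        x = rng.randrange(2)
        t = rng.expovariate(rates[x])   # failure time, T | X = x
        c = rng.expovariate(0.5)        # censoring time, independent of (T, X)
        driver = c if mechanism == "fimar" else t
        if rng.random() < driver / (1.0 + driver):   # R = 1: X is observed
            events[x] += 1 if t <= c else 0
            exposure[x] += min(t, c)
    return [events[x] / exposure[x] for x in (0, 1)]


fimar_rates = complete_case_rates("fimar")
t_dependent_rates = complete_case_rates("t-dependent")
print([round(v, 2) for v in fimar_rates], [round(v, 2) for v in t_dependent_rates])
```

Under the FIMAR mechanism the complete-case rates should be close to the true values, while under the T-dependent mechanism they should be attenuated, because subjects who fail early are less likely to have X observed.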
| 4. CONSISTENCY CHECKS FOR CIMAR AND FIMAR |
|---|
In the context of a specific data analysis, suppose investigators decide that either CIMAR or FIMAR is a reasonable identifiability assumption. It would then be useful to check whether the data are consistent with this assumption. As with MAR, CIMAR or FIMAR cannot be outright confirmed using available data in that one cannot ever check whether R depends on X, even if the sample size is very large. However, we may perform tests to detect inconsistencies with either assumption by positing CIMAR or FIMAR as a working assumption and then looking for evidence to the contrary in the data. In this section, we propose such consistency checks for the special situation wherein Assumption (2.2) that $C \perp X \mid Z$ holds; we also provide checks for that assumption. In so doing, we refer to independence results given in online Appendix B. When (2.2) is not justifiable, checking assumptions is more challenging and is beyond the scope of this paper.
Assuming either CIMAR (2.3) or FIMAR (2.9) as a working assumption, we first consider the role of X in predicting C. Results B.1 and B.2 show that CIMAR and $C \perp X \mid Z$ (2.2) together imply $C \perp T \mid (R = 1,X,Z)$ and $C \perp X \mid (R = 1,Z)$. Results B.3, B.4, and B.6 show the same thing given FIMAR and $C \perp X \mid Z$, although the development is not symmetric. The implication is that, under either CIMAR or FIMAR, one can test whether $C \perp X \mid Z$ by modeling C as a function of (X,Z) in the subsample for which R = 1, treating T as the "censoring" time. Note that this is valid because we have independent censoring given R = 1.
Now, we consider directly the dependence of C or T on R. Result B.5 shows that CIMAR and $C \perp X \mid Z$ together imply $C \perp T \mid (R,Z)$ and $C \perp R \mid Z$. The implication is that, under $C \perp X \mid Z$, one can test CIMAR by modeling C as a function of (R,Z), treating T as the censoring time. Analogously, Results B.7 and B.8 show that FIMAR and $C \perp X \mid Z$ together imply $C \perp T \mid (R,Z)$ and $T \perp R \mid Z$. So, under $C \perp X \mid Z$, one can test FIMAR by modeling T as a function of (R,Z), treating C as the censoring time.
The foregoing results suggest the following analysis plan for checking the data for consistency with the CIMAR or FIMAR assumptions:
- Based on scientific considerations, posit either CIMAR or FIMAR as a working assumption.
- Treating C as the failure time and T as the censoring time in the subset for which X is observed, evaluate whether C is independent of X given Z.
- If so, evaluate whether C is independent of R given Z, which should hold under CIMAR, or whether T is independent of R given Z, which should hold under FIMAR.
This procedure is illustrated in Section 5.
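The role swap at the heart of this plan can be sketched in a few lines. The code below is a deliberately simplified stand-in for the hazard models one would fit in practice: it assumes exponentially distributed event times within each group and no covariates Z, and compares two groups with a Wald z statistic for the log rate ratio. The function names `swap_roles` and `rate_ratio_z` are hypothetical.

```python
import math


def swap_roles(y, d):
    """Treat C as the 'failure' time and T as the 'censoring' time:
    the exit time is unchanged and the event indicator becomes 1 - D."""
    return y, [1 - di for di in d]


def rate_ratio_z(y, event, group):
    """Wald z statistic for the log ratio of event rates (group 1 vs. group 0),
    assuming exponentially distributed event times within each group."""
    n_events = [0, 0]
    exposure = [0.0, 0.0]
    for yi, ei, gi in zip(y, event, group):
        n_events[gi] += ei
        exposure[gi] += yi
    log_ratio = math.log((n_events[1] / exposure[1]) / (n_events[0] / exposure[0]))
    return log_ratio / math.sqrt(1.0 / n_events[0] + 1.0 / n_events[1])
```

With exit times `y`, failure indicators `d`, and missingness indicators `r`, the call `rate_ratio_z(*swap_roles(y, d), r)` examines whether the censoring rate differs by R (relevant to CIMAR), and `rate_ratio_z(y, d, r)` examines whether the failure rate differs by R (relevant to FIMAR). Applying the first call with X in place of R, within the subset having R = 1 and a binary X, corresponds to the second step of the plan.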
| 5. EXAMPLE: SURVIVAL OF LYMPHOMA PATIENTS |
|---|
We illustrate the foregoing results for exploring missing covariate assumptions and for estimation of failure time models under the CIMAR assumption using previously analyzed data from a study of non-Hodgkin's lymphoma. A second example illustrating FIMAR using the well-known primary biliary cirrhosis data is provided in the online Appendix C (http://www.biostatistics.oxfordjournals.org). These data were analyzed by Chen (2002)
Survival on a sample of 79 male patients with stage 4 non-Hodgkin's lymphoma was analyzed by Schluchter and Jackson (1989) and Dinse (1982). Interest lies in whether survival is associated with the presence of symptoms (X) at the start of treatment; there are no additional covariates. Thirty-eight subjects, of whom 26 (68%) fail, have known symptom status; 41, of whom 26 (63%) fail, are missing this information. All censoring occurs beyond the median survival time. Early in the follow-up period, the survival experience in the group with missing symptom status was worse than that in the other two groups; later on, it falls between the two groups (Figure 1). Schluchter and Jackson (1989), noting these differences, stated that "with uncensored data, a mechanism where missingness depended on the underlying true survival time ... would be ignorable ...." This suggests that CIMAR is a more appropriate missingness assumption than FIMAR or MAR; we proceed with CIMAR as a working assumption.
There are 12 censoring events, 11 in asymptomatic patients, in the group with R = 1. Under independence of symptom status and censoring, the expected number of censoring events among asymptomatic subjects was 11.05, so there is no evidence that censoring is related to X. Modeling failure as a function of R, including a time-varying interaction of R with log(t), yielded a test statistic of $X^2 = 3.48$ on 2 df, with a higher hazard for failure at early times for those missing symptom status. While this is not statistically significant, the sample size is small, and the missingness is substantial. Modeling censoring as a function of R revealed no relationship. We conclude that a conservative approach is to assume CIMAR in modeling these data.
Following the results of Section 3.1, we fit a parametric failure time model to the complete record data, allowing for CIMAR symptom status data. Due to the small sample size, we choose to model the survival distribution as exponential with a constant hazard ratio comparing symptomatic to asymptomatic groups. Schluchter and Jackson (1989) fitted a piecewise exponential model with constant hazard ratio, but found that the baseline hazards did not differ significantly over time. The first step is to model $\pi$, the non-missingness probability, as a function of time among failed subjects using likelihood (3.1). We used logistic regression with log-transformed time, yielding estimated intercept and slope (SD) of 3.89(2.01) and 0.800(0.406) with Z = 1.97 for the effect of log(T) on R. Because a partial residual plot for log(T) (Figure 2; O'Hara Hines and Carter, 1993) suggested that the effect is nonlinear on the logit scale, we refit the model adding a term for $\{\log(T)\}^2$, yielding coefficients (SD) of 14.3(13.4), 5.34(5.75), and 0.483(0.606). Although the squared term is not significant, the point here is to appropriately capture the relationship of R to T, so we retain it in the model.
We now employ $L_{\text{complete}}$ to fit the exponential failure time model $f(t \mid X_i) = \lambda_i \exp(-\lambda_i t)$ to the complete-case data, where $\log(\lambda_i) = \beta_0 + \beta_x X_i$. The naive analysis, equivalent to assuming MCAR in $L_{\text{complete}}$, yielded estimated death rates in the asymptomatic and symptomatic groups of 2.26 and 5.60 deaths/1000 person-weeks, with a risk ratio of 2.48. The bias-corrected analysis using the fitted model for $\pi$ as a function of log(T) and its square resulted in 2.41 and 7.95 deaths/1000 person-weeks, with a risk ratio of 3.30. The larger rate in the symptomatic group is due to the fact that the subjects with missing symptom status who failed early were most likely symptomatic. Schluchter and Jackson (1989)
Interestingly, dropping the $\{\log(T)\}^2$ term from the model for $\pi$ yielded estimated rates of 3.27 and 8.09 deaths/1000 person-weeks in the two groups. These rates are irreconcilable with the overall hazard rate in the entire sample of only 2.90 deaths/1000 person-weeks. When $\{\log(T)\}^2$ is included in the non-missingness model, the group-wise rates come into line with the overall rate.
| 6. DISCUSSION |
|---|
In this paper, we have explored various ways in which missingness of a covariate X can depend on the failure time T and the censoring time C in order to yield MAR. We conclude that MAR is difficult to justify in most settings unless one assumes that missingness is independent of failure time, and censoring time is independent of the missing value. However, the second of these assumptions is not necessary in order for the model to be identifiable. We proposed two new identifiability assumptions, formalizing the notions that missingness can depend on failure, but not on censoring, or vice-versa, and we showed how to fit failure time models using the complete-case data under these new assumptions.
We make two brief remarks. First, with regard to FIMAR and CIMAR, our goal here is only to show identifiability. Complete-case estimators in Section 3 are given so as to demonstrate the principle of model fitting rather than as any claim to optimality. Work remains to develop efficient estimators by modeling X as a function of (T,Z) under CIMAR or as a function of (C,Z) under FIMAR, and to properly account for uncertainty in these models and in the missingness models.
Second, an interesting set of problems arises when one applies these ideas to semi-parametric failure time models, or to the case where the missingness is modeled nonparametrically in t. First, extension of the identifiability results in Section 3.1 to such settings requires introduction of a maximum time $\tau$ above which $\pi$ is constant in order to facilitate the integrals that appear in f(t|R = 1,X,Z) and S(t|R = 1,X,Z). With that as a starting point, developing semi-parametric efficient estimators and standard errors under CIMAR or FIMAR for the proportional hazards and related models remains an open problem.
| ACKNOWLEDGMENTS |
|---|
This material is based upon work partially supported by the National Science Foundation under Grant No. 0096412. The author thanks Ronald Thisted and Glen Satten for constructive comments on an earlier draft. Conflict of Interest: None declared.
| REFERENCES |
|---|
Chen CY and Little RJA. (1999) Proportional hazards regression with missing covariates. Journal of the American Statistical Association 94:896-908.
Chen HY. (2002) Double-semiparametric method for missing covariates in Cox regression models. Journal of the American Statistical Association 97:565-76.
Cox DR and Oakes D. (1984) Analysis of Survival Data (Chapman and Hall, London).
Dinse GE. (1982) Nonparametric estimation for partially-complete time and type of failure data. Biometrics 38:417-31.
Herring AH and Ibrahim JG. (2001) Likelihood-based methods for missing covariates in the Cox proportional hazards model. Journal of the American Statistical Association 96:292-302.
Herring AH, Ibrahim JG, Lipsitz SR. (2002) Frailty models with missing covariates. Biometrics 58:98-109.
Lin DY and Ying Z. (1993) Cox regression with incomplete covariate measurements. Journal of the American Statistical Association 88:1341-9.
Lipsitz SR and Ibrahim JG. (1996) Using the EM-algorithm for survival data with incomplete categorical covariates. Lifetime Data Analysis 2:5-14.
Lipsitz SR and Ibrahim JG. (1998) Estimating equations with incomplete categorical covariates in the Cox model. Biometrics 54:1002-13.
Lipsitz SR and Ibrahim JG. (2000) Estimation with correlated censored survival data with missing covariates. Biostatistics 1:315-27.
Little RJA and Rubin DB. (2002) Statistical Analysis with Missing Data 2nd edition (John Wiley and Sons, Hoboken, NJ).
O'Hara Hines RJ and Carter EM. (1993) Improved added variable and partial residual plots for the detection of influential observations in generalized linear models. Applied Statistics 42:3-20.
Paik MC and Tsai W-Y. (1997) On using the Cox proportional hazards model with missing covariates. Biometrika 84:579-93.
Pugh M, Robins JM, Lipsitz S, Harrington D. (1993) Inference in the Cox proportional hazards model with missing covariate data. Technical report, Department of Biostatistics, Harvard School of Public Health http://www.biostat.harvard.edu/
robins/research.html.
Robins JM, Rotnitzky A, Van Der Laan M. (2000) Comment on `On profile likelihood', by SA Murphy and AW van der Vaart. Journal of the American Statistical Association 95:477-82.
Schluchter MD and Jackson KL. (1989) Log-linear analysis of censored survival data with partially observed covariates. Journal of the American Statistical Association 84:42-52.
Wang CY and Chen HY. (2001) Augmented inverse probability weighted estimator for Cox missing covariate regression. Biometrics 57:414-9.
Zhou H and Pepe MS. (1995) Auxiliary covariate data in failure time regression. Biometrika 82:139-49.
Received February 16, 2006; revised July 5, 2006; accepted for publication July 10, 2006.