Skip Navigation


Biostatistics Advance Access originally published online on August 3, 2005
Biostatistics 2006 7(2):182-197; doi:10.1093/biostatistics/kxi047
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow All Versions of this Article:
7/2/182    most recent
kxi047v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Right arrow Disclaimer
Google Scholar
Right arrow Articles by Cai, T.
Right arrow Articles by Jenny, N. S.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Cai, T.
Right arrow Articles by Jenny, N. S.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2005. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oupjournals.org.

The sensitivity and specificity of markers for event times

Tianxi Cai*

Department of Biostatistics, Harvard University, Boston, MA 02115, USA tcai{at}hsph.harvard.edu

Margaret Sullivan Pepe and Yingye Zheng

Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA

Thomas Lumley

Department of Biostatistics, University of Washington, Seattle, WA 98195, USA

Nancy Swords Jenny

Department of Pathology, College of Medicine, University of Vermont, Burlington, VT 05405, USA

* To whom correspondence should be addressed.


    SUMMARY
 TOP
 SUMMARY
 1. INTRODUCTION
 2. MODELING AND ESTIMATION...
 3. GENERAL SEMIPARAMETRIC...
 4. SIMULATION STUDIES AND...
 5. REMARKS
 REFERENCES
 
The statistical literature on assessing the accuracy of risk factors or disease markers as diagnostic tests deals almost exclusively with settings where the test, Y, is measured concurrently with disease status D. In practice, however, disease status may vary over time and there is often a time lag between when the marker is measured and the occurrence of disease. One example concerns the Framingham risk score (FR-score) as a marker for the future risk of cardiovascular events, events that occur after the score is ascertained. To evaluate such a marker, one needs to take the time lag into account since the predictive accuracy may be higher when the marker is measured closer to the time of disease occurrence. We therefore consider inference for sensitivity and specificity functions that are defined as functions of time. Semiparametric regression models are proposed. Data from a cohort study are used to estimate model parameters. One issue that arises in practice is that event times may be censored. In this research, we extend in several respects the work by Leisenring et al. (1997) that dealt only with parametric models for binary tests and uncensored data. We propose semiparametric models that accommodate continuous tests and censoring. Asymptotic distribution theory for parameter estimates is developed and procedures for making statistical inference are evaluated with simulation studies. We illustrate our methods with data from the Cardiovascular Health Study, relating the FR-score measured at enrollment to subsequent risk of cardiovascular events.

Keywords: Biomarker; Classification accuracy; Time-dependent discriminatory measure; Transformation models


    1. INTRODUCTION
 TOP
 SUMMARY
 1. INTRODUCTION
 2. MODELING AND ESTIMATION...
 3. GENERAL SEMIPARAMETRIC...
 4. SIMULATION STUDIES AND...
 5. REMARKS
 REFERENCES
 
The use of clinical and laboratory data to predict future patient events is a very popular idea in medicine at present. Biomarkers are under development to detect cancer before the onset of clinical disease (Pepe et al., 2001Go). Gene expression profiles of tumor tissue promise to be predictive of survival in cancer patients (van't Veer et al., 2002Go). Clinical scores such as the Framingham risk score (FR-score) (Wilson et al., 1998Go) are considered predictive of myocardial infarction and stroke. It is critical to evaluate the sensitivity and specificity of such predictors or markers before adopting them for use in clinical practice.

The literature on evaluating the accuracy of a marker, predictor, or diagnostic test result, Y, deals primarily with settings where Y is measured concurrently with the gold standard disease variable D. The true positive rate (TPR) and false positive rate (FPR) functions are

Formula

where D = 1 indicates the presence of disease, D = 0 denotes its absence, and the threshold y is used to define a positive test result as Y ≥ y. If Y is continuous, a Receiver Operating Characteristic (ROC) curve that plots TPR(y) versus FPR(y) for all possible values of y is often used to describe the discriminatory capacity of Y.

The notions of TPR and FPR must be extended when the outcome is an event time random variable T and the time at which Y is measured relative to T can vary. Indeed, the timing of the measurement, denoted by s, is likely to impact on the capacity of Y(s) to predict T. Measurements made closer to the event time are likely to be more predictive. We define the time-dependent TPR function:

Formula 1(1.1)

where the time lag is t and {tau} is some prespecified suitably large time. For a subject that has an event at T, TPRt, s(y) is the probability of test positive with the marker at t time units prior to the event. Typically, the TPR function will be a monotone decreasing function of t. The FPR, or 1 – specificity, relates to subjects without events, or at least event free by {tau}, after the marker is measured. Therefore, we define

Formula 2(1.2)

These definitions are consistent with those used by Balasubramanian and Lagakos (2001)Go, Leisenring et al. (1997)Go, and Etzioni et al. (1999)Go and are commonly used in practice. In breast cancer screening research for example {tau} = 2 years is typically used. The rationale is that it can be assumed that if clinical disease does not emerge by 2 years after screening, the subject was free of subclinical disease at the time of screening. Heagerty et al. (2000)Go define cumulative incidence-based TPR and FPR functions:

Formula 3(1.3)


Formula 4(1.4)

However, the definitions in (1.1) and (1.2) lead to more straightforward regression modeling procedures and are easier to interpret as functions of time than their cumulative incidence-based counterparts. For extensive discussion, see Pepe (2003Go, pp. 259–265). Moreover, we can calculate (1.3) and (1.4) from (1.1) and (1.2) with knowledge of the event time distribution. Thus, here we concentrate on (1.1) and (1.2) initially, and return to (1.3) and (1.4) in the example.

In this paper we consider models for the time-dependent (TPR, FPR) functions in (1.1) and (1.2) and procedures to make inference about them from prospective cohort studies. The marker Y may be measured at multiple times for a subject, covariates Z that affect TPR and FPR may be available, and the event time T can be right censored. We extend the marginal regression modeling approach of Leisenring et al. (1997)Go which deals only with binary markers Y and uncensored failure times. We develop our method in a simplified setting in Section 2. The more general setting is discussed in Section 3. Results from simulation studies described in Section 4 suggest that the procedures work well in finite samples when the assumptions hold. In the second part of Section 4 we apply the methods to data from the Cardiovascular Health Study (CHS), a prospective cohort study of older adults (Fried et al., 1991Go). We investigate the sensitivity (TPR) and specificity (1 – FPR) of the FR-score as a marker for future cardiovascular events in this population. As expected, we show that the score is better at discriminating short-term than long-term risk and that it works better in females than males. However, the score is not a very accurate predictor in any subgroup studied. We close in Section 5 with a discussion of alternative approaches to the evaluation of markers for event time data.


    2. MODELING AND ESTIMATION IN A SIMPLE SETTING
 TOP
 SUMMARY
 1. INTRODUCTION
 2. MODELING AND ESTIMATION...
 3. GENERAL SEMIPARAMETRIC...
 4. SIMULATION STUDIES AND...
 5. REMARKS
 REFERENCES
 
We only consider marker measurements made prior to the event time, Y(s) for s ≤ T. The event time can be right censored by a censoring variable C that is independent of T conditional on the covariates. Define the observation time X = min( T, C) and {delta} = I(T ≤ C). First, we consider the simple scenario with Y binary and measured at baseline, s = 0, for each person. The data for analysis are

Formula 4

where Zi is the p x 1 covariate vector. We assume the following models for TPR and FPR:

Formula 5(2.1)


Formula 6(2.2)

where Formula 6 and Formula 6 are specified inverse link functions. The dependence of the TPR on time is through the parametric function {eta}{alpha}0(t) = {alpha}'0{eta}(t), where {eta} is a vector of polynomials or spline basis functions. The FPR is not a function of time. The TPR and FPR, (2.1) and (2.2), are mathematically distinct functions. Equation (2.1) conditions on T = t and (2.2) accumulates over all subjects for whom T > {tau}. This is analogous to the approach taken by Hogan and Laird (1997)Go in the mixture model framework where they reserved a multinomial category for subjects who have not yet experienced an event by the end of the study. Note that we make no assumption about P(Y = 1|T = t, Z) for t > {tau}. Separate modeling of TPR and FPR is often undertaken with non-time-dependent disease because the behavior of the marker may be very different in cases from in controls. See Carney et al. (2003)Go for an example. In our setting, cases are subjects who have an event in [0, {tau}], while controls are subjects who have an event in ({tau}, {infty}) or never have an event in their lifetimes (denoted by T = {infty}). For events such as cancer incidence or cardiovascular events, a sizable fraction of the population would fall in the latter category. We may write the FPR as

Formula 6

Thus, we observe that even if the model (2.1) were extrapolated to hold over t > {tau}, this information could not usefully enter into the specification of (2.2) as all the remaining components of FPR{tau}, Z are not separately identifiable. We can never know which subjects will remain event free forever, their biomarker distributions (that are likely to be different from those who have an event in their lifetime), or the event time distribution beyond the end of the study. The composite probability FPR{tau}, Z, however, is identifiable and we therefore model it separately from (2.1). We also note that even if P(T = {infty}) = 0, it is not realistic to extrapolate (2.1) over ({tau}, {infty}) because typically censoring does not allow estimation of TPR or the survivor distribution over (0, {infty}).

Denote all the parameters in (2.1) and (2.2) by {psi}0. Recall that (2.1) models the distribution of Yi conditional on {Ti, Zi} when Ti ≤ {tau}, and (2.2) models the distribution of Yi conditional on {Ti > {tau}, Zi}. Therefore, the likelihood of the data is

Formula 7(2.3)

where the conditional probability pi({psi}0) = P(Yi = 1|Xi {wedge} {tau}, I(Xi ≤ {tau}), {Delta}i, Zi) is

Formula 8(2.4)

and SZ(t) = P(T > t|Z). We can think of the first and third groups as cases and controls, respectively, while the case/control status of the second group, i.e. those censored in [0, {tau}], is unknown.

A consistent estimator of {psi}0 can be obtained by using only data from subjects in the first and third groups. This is the approach taken by Leisenringet al. (1997)Go. To incorporate observations from subjects who are censored before {tau}, one needs to estimate SZ(·) since the likelihood contributions from these subjects involve SZ(·). To this end, we assume a proportional hazards model for Ti|Zi:

Formula 9(2.5)

where {lambda}(t|Zi) is the hazard function for the ith subject and the baseline hazard function {lambda}0(t) is unspecified. Then, SZi(t) = P(T ≥ t|Zi) = exp { – {Lambda}0(t) exp( {gamma}0'Zi)}, where {Lambda}0(·) is the baseline cumulative hazard function. Under the Cox model (2.5), the survival function SZ(t) can be consistently estimated as

Formula 9

where Formula 9 is the maximum partial likelihood estimator based on {Xi, {delta}i, Zi; i = 1, ..., n}, and Formula 9 is the Breslow estimate of the cumulative baseline hazard function {Lambda}0(t) (Fleming and Harrington, 1991Go). Plugging in the estimated survival function to (2.4), we obtain approximate conditional probabilities, Formula 9. Then, {psi}0 can be estimated by maximizing the approximated likelihood function, or equivalently, as the solution to the approximated score equation,

Formula 10(2.6)

We note that other regression models for T can be used. Estimation of {psi}0 requires only a consistent estimate of Sz(t) and does not rely on the proportional hazards model assumption.


    3. GENERAL SEMIPARAMETRIC FRAMEWORK
 TOP
 SUMMARY
 1. INTRODUCTION
 2. MODELING AND ESTIMATION...
 3. GENERAL SEMIPARAMETRIC...
 4. SIMULATION STUDIES AND...
 5. REMARKS
 REFERENCES
 

3.1 Model assumptions

We now generalize to a continuous marker, Y, and allow measurements at various times sik, k = 1, ..., Ki, for subject i. We model the marginal probability associated with (Yik, Tik, Zi), where Yik is the marker measured at sik and Tik = Tisik is the time lag between the measurement time and the occurrence of the event. Consider the following marginal probability models:

Formula 11(3.1)


Formula 12(3.2)

where h0 and c0 are baseline functions of the threshold y that are completely unspecified. The TPR function is allowed to depend on both the time lag Tik and the measurement time sik. The FPR function depends on only the measurement time and includes all such measurements for which an event did not occur in at least the subsequent {tau} time units. The dependence on time is through the parametric functions {eta}{alpha}0(t, s) = {alpha}0'{eta}(t, s), {xi}{alpha}0(s) = {alpha}0'{xi}(s), where {eta} and {xi} are vectors of polynomial or spline basis functions. In many applications, the sensitivity TPRt, s, z(·) of the marker depends only on the time lag t. However, in some applications the absolute time of the marker measurement may also affect the TPR functions. Examples include settings where s = 0 denotes entry into an intervention study or if s denotes the subject's age which is associated with the marker distribution. In the previous section, where sik is a constant for all observations (e.g. sik = 0), we set {xi}{alpha}0(s) = 0. Similar to the binary setting, (3.1) models the distribution of Y(s) conditioning on T = s + t where the time lag t ≤ {tau} and (3.2) models the distribution of Y(s) averaged over subjects with time lag greater than {tau}, T > s + {tau}, and we make no assumptions about Y(s) conditioning on Ts = t for t > {tau}. Different measurements taken for each subject may contribute to the TPR or FPR functions depending on how close the measurement is taken to the disease onset time.

The models written in (3.1) and (3.2) assume that covariate effects are additive and do not depend on the thresholding value y. However, one can allow covariate effects to vary with the thresholding value by including interactions between covariates and parametric functions of y. This is similar to relaxing the proportional hazards assumption of the Cox model for failure time data by including interactions between covariate and parametric functions of time. Our estimating procedures apply to the more general model, but to keep notations simple we work with the simpler model here.

The nonparametric baseline functions of y, h0(·), and c0(·) essentially define the shape and location of the sensitivity and specificity functions, while the parameters ß0 and b0 quantify covariate effects on them and {eta}{alpha}0(·, ·) and {xi}{alpha}0(·) quantify time effects. Note that this type of model corresponds to the marginal semiparametric transformation model (Dabrowska and Doksum, 1988aGo,bGo; Cheng et al., 1995Go, 1997Go; Scharfstein et al., 1998Go; Cai et al., 2000Go). That is, models (3.1) and (3.2) can be represented as

Formula 12

where P({varepsilon}Dik ≥ y|Tik, Zi, sik) = gD(y) and Formula 12

3.2 Estimating the model components

Let {psi}0 = {H0(·) = [h0(·), c0(·)]', {theta}0 = [{alpha}0', ß0', {alpha}0', b0']'} denote all the unknown parameters. We base inference for {psi}0 on indicator variables I(Yik ≥ y) since models (3.1) and (3.2) essentially relate its conditional expectation pik(y;{psi}0) to the parameters of interest {psi}0. Similar to the binary case, the probability pik(y;{psi}) depends on the case/control/censored status of the observation and can be derived from the models (3.1) and (3.2). To estimate the nonparametric baseline functions, H0(y), we consider the marginal binomial likelihood based on the binary variable I(Yik ≥ y), Formula 12, and solve the corresponding score equation

Formula 13(3.3)

where Formula 13, {rho} is a prespecified nonnegative weight, {delta}ik = {delta}iI(Xisik ≤ {tau}) + I(Xisik > {tau}) and Formula 13, Formula 13 are predetermined constants such that P(Yik < Formula 13) and P(Yik > Formula 13) are both positive. To estimate {theta}0, we propose to solve

Formula 14(3.4)

where Formula 14 Formula 14 is some increasing function that can depend on the data but converges asymptotically to a deterministic function v(y) uniformly in y isin [Formula 14, Formula 14]. The basic idea then is to solve (3.3) and (3.4) simultaneously to estimate the parameters in the models (3.1) and (3.2).

Observe that we now include in the estimating equations a weighting factor {rho} that dictates the extent to which the censored observations in [0, {tau}] enters into the analysis (we have Formula 14 for known cases and controls because 1 – {delta}ik = 0 for them). If {rho} is set to 0, then censored observations are excluded entirely from the analysis which corresponds to the Leisenring et al. (1997)Go approach. The score equation given in (2.6) for the binary case essentially sets {rho} to 1. Increasing {rho} allows censored observations to have more influence on estimation. A variety of values for {rho} are investigated later in simulation studies. Again, when {rho} > 0, an estimate of the survivor function SZi(·) is required in order to approximate the probabilities pik(y;{psi}) for censored observations. In summary, we propose a two-step approach to estimate {psi}0:

(1) Estimation of {psi}0 when {rho} = 0.
Let Formula 14 denote the solution to the estimating (3.3) and (3.4) when {rho} = 0, i.e. when the censored observations with Xisik ≤ {tau} and {delta}i = 0 are ignored. {{alpha}0, ß0, h0(·)} are estimated using only the cases, observations with Xisik ≤ {tau} and {delta}i = 1, and {{alpha}0, b0, c0(·)} are estimated using only the controls, i.e. observations with Xisik > {tau}. We show in Appendix A of Cai et al. (2003)Go that Formula 14 is a consistent estimator of {psi}0 for y isin [ Formula 14, Formula 14].
(2) Estimation of {psi}0 when {rho} > 0.
To include observations with {delta}ik = 0 in estimating {psi}0, we set {rho} > 0 and approximate pik(y;{psi}) for those censored subjects by using the estimated survivor function Sz(·) as in the binary case, assuming a proportional hazards model (2.5). Then we solve (3.3) and (3.4) using the weight functions WikH(y;{psi}) and Wik{theta}(y;{psi}) evaluated at Formula 14. Let Formula 14 denotes the final estimator for {psi}0.

3.3 Inference in large samples

We show in Appendix A of Cai et al. (2003)Go that Formula 14 is unique for large n and is consistent. Furthermore, Formula 14 is asymptotically equivalent to Formula 14, where A and Uij are defined in Appendices A and B of Cai et al. (2003)Go, respectively. It follows from properties of U-statistics (Serfling, 1980Go) that Formula 14 is approximately normal with mean 0 and variance Formula 14. Now, let Formula 14 and Formula 14 be the matrices obtained by replacing all the theoretical quantities in Formula 14 and {Sigma} with their empirical counterparts. Then, the covariance matrix of Formula 14 can be approximated by Formula 14.

We now turn to inference about the TPR and FPR functions, which depend on time and on covariate z. Substituting Formula 14 into (3.1) and (3.2), we propose estimating TPRt, s, z(y) and FPR{tau}, s, z(y) as

Formula 14

To obtain point-wise and simultaneous confidence intervals for the TPR and FPR functions, we show in Appendix B of Cai et al. (2003)Go that the process

Formula 14

is asymptotically equivalent to

Formula 14

where Hij(y) is defined in Appendix B of Cai et al. (2003)Go. This allows us to approximate the distribution of the process Q(y;t, s, z) using resampling techniques (Parzen et al., 1994Go) in practice. This technique avoids the need to derive explicit analytic expressions for variance–covariance processes, which seem intractable in our setting. Moreover, relative to other resampling methods such as the bootstrap, the computational burden is minimal. A detailed description of the procedure for constructing confidence bands based on the resampling method can be found in Cai and Pepe (2002)Go.


    4. SIMULATION STUDIES AND EXAMPLE
 TOP
 SUMMARY
 1. INTRODUCTION
 2. MODELING AND ESTIMATION...
 3. GENERAL SEMIPARAMETRIC...
 4. SIMULATION STUDIES AND...
 5. REMARKS
 REFERENCES
 

4.1 Simulation studies

Simulation studies were performed to examine the finite sample properties of the estimation procedures proposed in the previous sections and to investigate the impact of {rho} on their efficiencies. The results suggest that our methods provide reasonably unbiased estimates of both the model parameters and (TPR, FPR) functions. The choice of {rho} affects the efficiency of all parameters of interest, but seems to have the most impact on Formula 14, less on Formula 14, and very little on Formula 14 in the settings we considered. Recall that when {rho} = 0, censored observations are ignored for estimation. By including censored observations, i.e. using {rho} > 0, we find that the estimates are almost always more precise than those calculated with {rho} = 0. See Cai et al. (2003)Go for details on simulation results.

To choose an optimal weighting of censored data in practice, one could use an ‘optimization procedure’ to minimize the total mean-squared error Formula 14. That is, one selects the value of {rho} that corresponds to the smallest value for the estimated sum of squared errors. Using this criterion, in our simulations the weight of choice is {rho} = 1 in both configurations. We note that for the binary case, weight {rho} = 1 corresponds to the likelihood and thus is the optimal weight. It is less clear what the optimal weight is in the continuous case based on semiparametric models. If the baseline functions were known, {rho} = 1 would still correspond to the likelihood and thus is optimal. However, when the baseline functions are unknown and are estimated based on estimating equations, it is not clear what the ‘optimal’ weight {rho} is and thus we suggest choosing {rho} to minimize the mean-squared error. Alternatively, one could minimize other quantities such as the total coefficient of variation Formula 14, where P is the dimension of {theta} and Formula 14 is the jth component of Formula 14. We also note that the estimators are consistent regardless of the choice of {rho}.

4.2 Example: the CHS

The CHS is a population-based observational prospective study of elderly adults (age ≥ 65 at enrollment) in the United States. A full description of the design of CHS is reported in Fried et al. (1991)Go. The analysis here includes 3967 subjects (1531 males) between 65 and 95 years old who were free of clinical cardiovascular events at enrollment. There were 585 (14.7%) who experienced a cardiovascular event during the study. Follow-up on subjects without cardiovascular events averaged 6.75 years (standard deviation = 1.58 years). Here we consider the FR-score as a predictor for cardiovascular events. The FR-score is a widely used clinical prediction score used to quantify risk for cardiovascular events (Grundy et al., 1998Go, 2001Go). It includes information on age, cigarette smoking, blood pressure, diabetes mellitus, blood cholesterol, and high-density lipoprotein cholesterol. Separate score sheets are used for men and women.

The FR-score was evaluated for all subjects at enrollment, and we only include this baseline assessment in this analysis. Thus, s = 0 for all observations. We considered gender and medication for hypertension as covariates that might have a substantial influence on the predictive accuracy of the FR-score. We fit the following models for the time-dependent TPR and FPR functions:

Formula 14


Formula 14

where Z1 = 1 for subjects on medication for hypertension at enrollment and 0 otherwise, Z2 = 1 for males and 0 otherwise, and we drop the subscript ‘s’ relative to the general model forms (3.1) and (3.2) because Y, the FR-score, is measured only at baseline, s = 0. We choose {tau} = 7 years in the analysis. That is, we investigated the predictive accuracy of the FR-score for events during the 7 years subsequent to enrollment. Subjects without an event by the 7th year of observation after the FR-score is measured are considered to be controls for the purposes of calculating the FPR.

There is little loss to follow-up in CHS. However, 487 subjects in the sample died from other causes without a cardiovascular failure. Rather than censoring these survival times at death which would imply that they were subsequently at risk for cardiovascular events, we censor them at the end of the trial. Since 1 – SZ(t) represents the marginal probability of a cardiovascular event by time t, it is appropriate to include them as definite nonevents in the estimation of SZ(t) (Pepe and Mori, 1993Go).

The estimated regression coefficients and their estimated standard errors are shown in Table 1. The estimated FPR and TPR functions for different groups at t = 1 and t = 5 years are shown in Figure 1 and their corresponding ROC curves are presented in Figure 2. The lack of a gender effect on the FPR functions (b2 = 0) indicate that in order to control the FPR at a particular value, the same threshold value y can be used for males and females. The positive value of ß2 indicates, however, that for a given threshold value, the FR-score appears to have higher sensitivity in females than in males for detecting subsequent cardiovascular events but similar specificity. The estimated FPR and TPR functions shown in Figure 1 illustrate this. Medication use appears to be associated with higher FPR and TPR since both the coefficient estimates b1 and ß1 are negative. The medication effect on the TPR is less than that on the FPR. This translates into the ROC curve for those on medication being less than for those who are not on medication, as illustrated in Figure 2.


View this table:
[in this window]
[in a new window]
 
Table 1. Estimated regression coefficients and their estimated standard errors for the TPR and FPR models of the FR-score as a predictor for cardiovascular events within 7 years

 

Figure 1
View larger version (16K):
[in this window]
[in a new window]
 
Fig. 1. Estimated TPR/FPR functions of the FR-score for all four groups: female subjects (solid curves)/male subjects (dashed curves) who are on medication (thicker curves)/not on medication (thinner curves).

 

Figure 2
View larger version (13K):
[in this window]
[in a new window]
 
Fig. 2. Plots of the ROC curve: the TPR function versus the FPR function at t = 1 and t = 5 years. Shown are plots for females (solid curves) and for males (dashed curves). Thicker curves are for those on medication and thinner curves are for those not on medication.

 
In other words, the FR-score is better at distinguishing between subjects who will and will not have a cardiovascular event when they are not on medication. We have noted that given the same FPR, the corresponding TPR is substantially higher in females than in males. As shown in Table 2, the area under the ROC curve (AUC) in females is about 0.07 (sd = 0.02) higher than that in males in both medication groups at year 1. The gender effect is statistically significant. However, the medication effect is not. The AUC in subjects on medication is only about 0.03(sd = 0.02) lower than that in subjects not on medication (for both male and female) at year 1.


View this table:
[in this window]
[in a new window]
 
Table 2. Estimated AUCs and their standard errors. Shown also are the differences in AUCs between groups and between years. {Delta}Year, {Delta}Gender, and {Delta}Med denote the difference in AUC between year 1 and year 5, between female and male, and between no medication and with medication, respectively

 
Perhaps the most important and disappointing result of our analysis, however, is that the FR-score is a very inaccurate marker for cardiovascular events. The ROC curves in Figure 2 demonstrate that the benefit of a high TPR can only be achieved at the expense of an accompanying high FPR and vice versa. It does not have adequate sensitivity and specificity (at any threshold) for accurate individual-level prediction of cardiovascular events. Figure 3 displays the estimated TPR functions for events at t = 1 and 5 years after the FR-score is measured in male subjects who are not on hypertension medication. These curves indicate that for any positivity threshold y, the sensitivity of the FR-score is higher for events that occur at 1 year after enrollment than at 5 years after enrollment. In particular for these men, the threshold criterion FR-score > 10 which identifies 45% of subjects with events at 1 year after enrollment, identifies only 36% of subjects with events at 5 years. The FPR functions for those still without events at 7 years after enrollment is also shown in Figure 3. Of those subjects, 30% also meet criterion that FPR-score > 10.


Figure 3
View larger version (13K):
[in this window]
[in a new window]
 
Fig. 3. Estimated TPR/FPR functions of the FR-score for male subjects who are not on medication. The sensitivities (TPR) for events at t = 1 and t = 5 years after the FR-score is measured are displayed. Shown also are their 95% simultaneous confidence bands (shaded regions).

 
We next calculate the cumulative incidence-based TPR and FPR functions defined in Section 1 and derive the area under the corresponding ROC curves. Estimators for Formula 14 and Formula 14 can be derived from our proposed models in (3.1) and (3.2) noting the following identities:

Formula 14


Formula 14

The definitions of the cumulative incidence-based TPR and FPR functions specify cases and controls at time t as subjects with T ≤ t and subjects with T > t, respectively. The results are presented in Figure 4 and Table 3. At any given positivity threshold, both the TPR and FPR of the FR-score are higher for t = 1 year since enrollment than for t = 5 years since enrollment. The FR-score is better at distinguishing subjects who fail within 1 year from those who do not than distinguishing subjects who fail within 5 years from those who do not. In particular, for female subjects who are on hypertension medication, the AUC for the cumulative incidence ROC curve is 0.68 (sd = 0.03) when t = 1 year and is 0.61 (sd = 0.02) when t = 5 years. Comparing the AUC between 1 and 5 years for this subgroup, we have a difference of 0.07 with standard error 0.02. Shown also in Figure 4 are the nonparametric estimates (Heagerty et al., 2000Go) of the ROC curves as well as the FPR functions using data from each subgroup. The nonparametric estimates are reasonably close to the model-based estimates suggesting that the assumed semiparametric models are reasonable.


Figure 4
View larger version (20K):
[in this window]
[in a new window]
 
Fig. 4. Semiparametric (solid curves) and nonparametric estimates (dashed curves) of the cumulative incidence-based ROC curve for t = 1 and t = 5 years. Shown also are the semiparametric (solid curves) and nonparametric estimates (dashed curves) of the FPR functions.

 

View this table:
[in this window]
[in a new window]
 
Table 3. Estimated AUCs for the cumulative incidence-based ROC curves and their standard errors. Shown also are the differences in AUCs between groups and between years. {Delta}Year, {Delta}Gender, and {Delta}Med denote the difference in AUC between year 1 and year 5, between female and male, and between no medication and with medication, respectively

 

    5. REMARKS
 TOP
 SUMMARY
 1. INTRODUCTION
 2. MODELING AND ESTIMATION...
 3. GENERAL SEMIPARAMETRIC...
 4. SIMULATION STUDIES AND...
 5. REMARKS
 REFERENCES
 
Statistical models for the joint analysis of longitudinal biomarkers and time to disease onset have been studied extensively in the past few decades (e.g. Pawitan and Self, 1993Go; De Gruttola and Tu, 1994Go; Tsiatis et al., 1995Go; Faucett and Thomas, 1996Go; Wulfsohn and Tsiatis, 1997Go; Hogan and Laird, 1997Go; Henderson et al., 2000Go; Skates et al., 2001Go; Wang and Taylor, 2001Go; Henderson et al., 2002Go). See Hogan and Laird (1997)Go for discussion of two broad classes of models, namely selection models and pattern mixture models. Most of the existing methods in this area require parametric modeling of the marker process over time and a joint parametric model for the distribution of the event time. To induce models for the association between the marker and event time process, both mixture models and selection models rely on specification of distributional assumptions for random effects or latent stochastic processes. In contrast, we use marginal semiparametric models for the marker distribution given the event time and for the event time distribution. The approach does not model marker processes and hence is more flexible. We estimate the regression parameters and the nonparametric baseline functions simultaneously based on estimating equations and incorporate censoring by integrating over time.

There are several directions in which the methodology proposed here should be enhanced. First, procedures to assess model fit and to assist in model selection need development. Although the models are semiparametric, there is an assumption that covariate effects are additive on the scale of the link function and we did not test the adequacy of this assumption in our data analysis. Second, we have restricted to time-independent covariates. It would be of interest to generalize the models to include time-varying covariates. This is straightforward if covariates are external but more complicated when covariates are internal in the sense defined by Kalbfleisch and Prentice (2002)Go. Third, we have restricted to biomarkers measured prior to the occurrence of the event. However, in some applications, such as in infectious disease research, there would be interest in using the biomarker to detect the event as soon as possible after T. Fourth, we do not allow censoring of T to depend on the marker Y conditional on the covariates. To do so would induce bias into the estimated TPR and FPR, commonly referred to as ‘verification bias’ in the diagnostic testing literature (Begg and Greenes, 1983Go). Studies that seek to evaluate the sensitivity and specificity of a marker should be designed so that follow-up does not depend on the marker. However, extensions that relax this requirement somewhat could be possibly developed using inverse probability weighting, for example (Robins et al., 1995Go). Finally, our methods can be used to assess the relative performance of two markers. One would fit separate models for each of the markers and plot the induced ROC curves. Comparisons could then be made using AUC statistics, for example. It would be interesting to explore this further with real data.


    ACKNOWLEDGMENTS
 
The authors are grateful to reviewers for constructive comments that lead to substantial improvement to the paper. Support for this research was provided by National Institutes of Health grants GM-54438 and AI29168. Participating institutions and principal investigators for CHS are: Wake Forest University School of Medicine: Gregory L. Burke, MD; Wake Forest University–ECG Reading Center: Pentti M. Rautaharju, MD, PhD; University of California, Davis: John Robbins, MD, MHS; The Johns Hopkins University: Linda P. Fried, MD, MPH; The Johns Hopkins University–MRI Reading Center: Nick Bryan, MD, PhD, Norman J. Beauchamp, MD; University of Pittsburgh: Lewis H. Kuller, MD, DrPH; University of California, Irvine–Echocardiography Reading Center (baseline): Julius M. Gardin, MD; Georgetown Medical Center–Echocardiography Reading Center (follow-up): John S. Gottdiener, MD; New England Medical Center, Boston–Ultrasound Reading Center: Daniel H. O'Leary, MD; University of Vermont–Central Blood Analysis Laboratory: Russell P. Tracy, PhD; University of Arizona, Tucson–Pulmonary Reading Center: Paul Enright, MD; Retinal Reading Center–University of Wisconsin: Ronald Klein, MD; University of Washington–Coordinating Center: Richard A. Kronmal, PhD; NHLBI Project Office: Jean Olson, MD, MPH. The CHS research was supported by contracts N01-HC-85079 through N01-HC-85086, N01-HC-35129, and N01-HC-15103 from the National Heart, Lung, and Blood Institute.


    REFERENCES
 TOP
 SUMMARY
 1. INTRODUCTION
 2. MODELING AND ESTIMATION...
 3. GENERAL SEMIPARAMETRIC...
 4. SIMULATION STUDIES AND...
 5. REMARKS
 REFERENCES
 

    BALASUBRAMANIAN, R. AND LAGAKOS, S. (2001). Estimation of the timing of perinatal transmission of HIV. Biometrics 57, 1048–1058.[Medline]

    BEGG, C. AND GREENES, R. (1983). Assessment of diagnostic tests when disease verification is subject to selection bias. Biometrics 39, 207–215.[CrossRef][Web of Science][Medline]

    CAI, T. AND PEPE, M. S. (2002). Semiparametric receiver operating characteristic analysis to evaluate biomarkers for disease. Journal of the American Statistical Association 97, 1099–1107.[CrossRef]

    CAI, T., PEPE, M. S., LUMLEY, T., ZHENG, Y. AND JENNEY, N. S. (2003). The sensitivity and specificity of markers for event times. UW Biostatistics Working Paper Series.

    CAI, T., WEI, L. J. AND WILCOX, M. (2000). Semi-parametric regression analysis for clustered failure time data. Biometrika 87, 867–878.[Abstract/Free Full Text]

    CARNEY, P., MIGLIORETTI, D., YANKASKAS, B., KERLIKOWSKE, K., ROSENBERG, R. AND RUTTER, C. (2003). Individual and combined effects of age, breast density, and hormone replacement therapy use on the accuracy of screening. Mammography 138, 168–175.

    CHENG, S. C., WEI, L. J. AND YING, Z. (1995). Analysis of transformation models with censored data. Biometrika 82, 835–845.[Abstract/Free Full Text]

    CHENG, S. C., WEI, L. J. AND YING, Z. (1997). Prediction of survival probabilities with semi-parametric transformation models. Journal of the American Statistical Association 92, 227–235.

    DABROWSKA, D. AND DOKSUM, K. (1988a). Estimation and testing in a two-sample generalized odds-rate model. Journal of the American Statistical Association 83, 744–749.

    DABROWSKA, D. AND DOKSUM, K. (1988b). Partial likelihood in transformation models with censored data. Scandinavian Journal of Statistics 15, 1–23.

    DE GRUTTOLA, V. AND TU, X. M. (1994). Modelling progression of Cd4-lymphocyte count and its relationship to survival time. Biometrics 50, 1003–1014.[CrossRef][Web of Science][Medline]

    ETZIONI, R., PEPE, M. S., LONGTON, G., HU, C. AND GOODMAN, G. (1999). Incorporating the time dimension in receiver operating characteristic curves: a case study of prostate cancer. Medical Decision Making 19, 242–251.[Abstract/Free Full Text]

    FAUCETT, C. L. AND THOMAS, D. C. (1996). Simultaneously modelling censored survival data and repeatedly measured covariates: a Gibbs sampling approach. Statistics in Medicine 15, 1663–1685.[CrossRef][Web of Science][Medline]

    FLEMING, T. R. AND HARRINGTON, D. P. (1991). Counting Processes and Survival Analysis. New York: Wiley.

    FRIED, L. P., BORHANI, N., ENRIGHT, P., FURBERG, C., GARDIN, J., KRONMAL, R., KULLER, L., MANOLIO, T., MITTELMARK, M., NEWMAN, A. et al. (1991). The cardiovascular health study: design and rationale. Annals of Epidemiology 1, 263–276.[Medline]

    GRUNDY, S., BALADY, G., CRIQUI, M., FLETCHER, G., GREENLAND, P., HIRATZKA, L., HOUSTON-MILLER, N., KRIS-ETHERTON, P., KRUMHOLZ, H., LAROSA, J. et al. (1998). Primary prevention of coronary heart disease: guidance from Framingham—a statement for healthcare professionals from the AHA Task Force on risk reduction. Circulation 97, 1876–1887.[Free Full Text]

    GRUNDY, S., D'AGOSTINO, R., MOSCA, L., BURKE, G., WILSON, P., RADER, D., CLEEMAN, J., ROCCELLA, E., CUTLER, J. AND FRIEDMAN, L. (2001). Cardiovascular risk assessment based on US cohort studies—findings from a National Heart, Lung, and Blood Institute Workshop. Circulation 104, 491–496.[Free Full Text]

    HEAGERTY, P. J., LUMLEY, T. AND PEPE, M. S. (2000). Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics 56, 337–344.[CrossRef][Web of Science][Medline]

    HENDERSON, R., DIGGLE, P. AND DOBSON, A. (2000). Joint modelling of longitudinal measurements and event time data. Biostatistics 1, 465–480.[Abstract]

    HENDERSON, R., DIGGLE, P. AND DOBSON, A. (2002). Identification and efficacy of longitudinal markers for survival. Biostatistics 3, 33–50.[Abstract]

    HOGAN, J. W. AND LAIRD, N. M. (1997). Model-based approaches to analysing incomplete longitudinal and failure time data. Statistics in Medicine 16, 259–272.[CrossRef][Web of Science][Medline]

    KALBFLEISCH, J. AND PRENTICE, R. (2002). The Statistical Analysis of Failure Time Data. New York: John Wiley.

    LEISENRING, W., PEPE, M. S. AND LONGTON, G. L. (1997). A marginal regression modelling framework for evaluating medical diagnostic tests. Statistics in Medicine 16, 1263–1281.[CrossRef][Web of Science][Medline]

    PARZEN, M. I., WEI, L. J. AND YING, Z. (1994). A resampling method based on pivotal estimating functions. Biometrika 81, 341–350.[Abstract/Free Full Text]

    PAWITAN, Y. AND SELF, S. (1993). Modeling disease marker processes in AIDS. Journal of the American Statistical Association 88, 719–726.[CrossRef]

    PEPE, M. S. (2003). The Statistical Evaluation of Medical Tests for Classification and Prediction. New York: Oxford University Press.

    PEPE, M. S., ETZIONI, R., FENG, Z. D., POTTER, J., THOMPSON, M., THORNQUIST, M., WINGET, M. AND YASUI, Y. (2001). Phases of biomarker development for early detection of cancer. Journal of the National Cancer Institute 93, 1054–1061.[Free Full Text]

    PEPE, M. S. AND MORI, M. (1993). Kaplan–Meier, marginal or conditional probability curves in summarizing competing risks failure time data? Statistics in Medicine 12, 737–751.[Web of Science][Medline]

    ROBINS, J., ROTNITZKY, A. AND ZHAO, L. P. (1995). Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association 90, 106–121.[CrossRef]

    SCHARFSTEIN, D. O., TSIATIS, A. A. AND GILBERT, P. (1998). Semi-parametric efficient estimation in the generalized odds-rate class of regression models for right-censored time to event data. Lifetime Data Analysis 4, 355–391.[CrossRef][Web of Science][Medline]

    SERFLING, R. J. (1980). Approximation Theorems of Mathematical Statistics. New York: Wiley.

    SKATES, S. J., PAULER, D. K. AND JACOBS, I. J. (2001). Screening based on the risk of cancer calculation from bayesian hierarchical change-point models of longitudinal markers. Journal of the American Statistical Association 96, 429–439.[CrossRef]

    TSIATIS, A. A., DEGRUTTOLA, V. AND WULFSOHN, M. S. (1995). Modeling the relationship of survival to longitudinal data measured with error. Applications to survival and Cd4 counts in patients with AIDS. Journal of the American Statistical Association 90, 27–37.[CrossRef]

    VAN'T VEER, L., DAI, H., VAN DE VIJVER, M., HE, Y., HART, A., MAO, M., PETERSE, H., VAN DER KOOY, K., MARTON, M. AND WITTEVEEN, A. (2002). Gene expression profiling predicts clinical outcome of breast cancer. Nature 415, 530–535.[CrossRef][Medline]

    WANG, Y. AND TAYLOR, J. M. G. (2001). Joint modelling longitudinal and event time data with application to acquired immunodeficiency syndrome. Journal of the American Statistical Association 96, 895–905.[CrossRef]

    WILSON, P., D'AGOSTINO, R., LEVY, D., BELANGER, A., SILBERSHATZ, H. AND KANNEL, W. (1998). Prediction of coronary heart disease using risk factor categories. Circulation 97, 1837–1847.[Medline]

    WULFSOHN, M. S. AND TSIATIS, A. A. (1997). A joint model for survival and longitudinal data measured with error. Biometrics 53, 330–339.[CrossRef][Web of Science][Medline]

    Received August 25, 2004; revised February 24, 2005; revised July 26, 2005; accepted for publication July 27, 2005.


    Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


    This article has been cited by other articles:


    Home page
    BiostatisticsHome page
    D. Zeng and D. Y. Lin
    Efficient resampling methods for nonsmooth estimating functions
    Biostat., April 1, 2008; 9(2): 355 - 363.
    [Abstract] [Full Text] [PDF]


    Home page
    BiostatisticsHome page
    T. Cai and S. Cheng
    Robust combination of multiple diagnostic tests for classifying censored event times
    Biostat., April 1, 2008; 9(2): 216 - 233.
    [Abstract] [Full Text] [PDF]


    This Article
    Right arrow Abstract Freely available
    Right arrow FREE Full Text (PDF) Freely available
    Right arrow All Versions of this Article:
    7/2/182    most recent
    kxi047v1
    Right arrow Alert me when this article is cited
    Right arrow Alert me if a correction is posted
    Services
    Right arrow Email this article to a friend
    Right arrow Similar articles in this journal
    Right arrow Similar articles in PubMed
    Right arrow Alert me to new issues of the journal
    Right arrow Add to My Personal Archive
    Right arrow Download to citation manager
    Right arrowRequest Permissions
    Right arrow Disclaimer
    Google Scholar
    Right arrow Articles by Cai, T.
    Right arrow Articles by Jenny, N. S.
    Right arrow Search for Related Content
    PubMed
    Right arrow PubMed Citation
    Right arrow Articles by Cai, T.
    Right arrow Articles by Jenny, N. S.
    Social Bookmarking
     Add to CiteULike   Add to Connotea   Add to Del.icio.us  
    What's this?