Biostatistics Advance Access originally published online on June 22, 2005
Biostatistics 2006 7(1):58-70; doi:10.1093/biostatistics/kxi040
Published by Oxford University Press 2005.

A simple meta-analytic approach for using a binary surrogate endpoint to predict the effect of intervention on true endpoint

Stuart G. Baker

Biometry Research Group, Division of Cancer Prevention, National Cancer Institute, Bethesda, MD 20892-7354, USA sb16i{at}nih.gov


    SUMMARY
 TOP
 SUMMARY
 1. INTRODUCTION
 2. ESTIMATION
 3. VALIDATION
 4. SIMULATIONS
 5. EXAMPLE
 6. DISCUSSION
 APPENDIX A
 APPENDIX B
 REFERENCES
 
A surrogate endpoint is an endpoint that is obtained sooner, at lower cost, or less invasively than the true endpoint for a health outcome and is used to make conclusions about the effect of intervention on the true endpoint. In this approach, each previous trial with surrogate and true endpoints contributes an estimated predicted effect of intervention on true endpoint in the trial of interest based on the surrogate endpoint in the trial of interest. These predicted quantities are combined in a simple random-effects meta-analysis to estimate the predicted effect of intervention on true endpoint in the trial of interest. Validation involves comparing the average prediction error of the aforementioned approach with (i) the average prediction error of a standard meta-analysis using only true endpoints in the other trials and (ii) the average clinically meaningful difference in true endpoints implicit in the trials. Validation is illustrated using data from multiple randomized trials of patients with advanced colorectal cancer in which the surrogate endpoint was tumor response and the true endpoint was median survival time.

Keywords: Colorectal cancer; Random effects; Randomized trials


    1. INTRODUCTION
 
A surrogate endpoint is an endpoint that is obtained sooner, at lower cost, or less invasively than the true endpoint and is used to make conclusions about the effect of intervention on a true endpoint for a health outcome. An application trial is a trial in which the surrogate, but not the true, endpoint is observed and used to evaluate the effect of intervention on true endpoint. Before a surrogate endpoint can be confidently used in an application trial, it must be validated using a validation trial in which both the surrogate and true endpoints are observed. In a general sense, a surrogate endpoint is validated if the surrogate endpoint approach and an approach using the true endpoint yield similar conclusions about the effect of intervention on the true endpoint in a validation trial. More precise definitions of surrogate endpoint validation depend on the methodology.

One general framework for the validation and application of surrogate endpoints involves testing the null hypothesis of no effect of intervention on the surrogate endpoint. In this context, a surrogate endpoint is validated if rejection of the null hypothesis for the surrogate endpoint implies rejection of the null hypothesis for the true endpoint in a validation trial. Equivalently, a surrogate endpoint is validated if the contrapositive statement holds, namely the null hypothesis for the true endpoint implies the null hypothesis for the surrogate endpoint in the validation trial. Prentice (1989) formulated two criteria that guarantee the validity of the surrogate endpoint in the context of hypothesis testing: (i) the distribution of the true endpoint conditional on the surrogate endpoint is the same for both arms of a randomized trial and (ii) an extra condition ensuring that the null hypothesis for the true endpoint implies the null hypothesis for the surrogate endpoint, rather than vice versa as would be obtained if only (i) were satisfied. Buyse and Molenberghs (1998) showed that, with binary surrogate and true endpoints, criterion (ii) reduces to the criterion that the surrogate endpoint is associated with the true endpoint. Generally, the focus of attention is on criterion (i), which is also known as the Prentice Criterion (Begg and Leung, 2000). If the Prentice Criterion is rejected for an appropriate sample size, one can conclude that the surrogate endpoint is not valid for hypothesis testing. The main challenge with validating surrogate endpoints in a hypothesis testing framework is evaluating the surrogate endpoint when the Prentice Criterion cannot be rejected. To address this challenge, various measures have been proposed that are related to how much the surrogate endpoint contributes to the prediction of the effect of intervention on true endpoint (e.g. Buyse et al., 2000a; Freedman et al., 1992; Wang and Taylor, 2002). The downside is that it may be difficult to decide what levels of these measures indicate an appropriately validated surrogate endpoint.

Another general framework for the validation and application of surrogate endpoints involves estimation of the effect of intervention on the true endpoint via a model relating surrogate and true endpoints in one or more previous trials. Although the Prentice Criterion is central to hypothesis testing, it is not required for estimation (Baker and Kramer, 2003; Baker et al., 2005), though it is sometimes postulated.

One type of model in this estimation framework involves individual-level associations between surrogate and true endpoints in a single previous trial (Morrison, 1991; Day and Duffy, 1996). A controversy arose from paradoxical results (Day and Duffy, 1996; Begg and Leung, 2000). However, Baker et al. (2005) recently resolved these paradoxes, giving the method a firm inferential foundation. Nevertheless, this approach has a salient disadvantage: it does not capture the variability of parameters over multiple trials.

A second type of model involves the association between principal strata of the surrogate endpoint (the pair of potential outcomes for the surrogate endpoint if each subject were assigned to each of the two arms) and true endpoint (Frangakis and Rubin, 2002). Because the potential outcomes are not observed, some assumptions are required to obtain unique maximum likelihood estimates of parameters. These assumptions are analogous to those needed with potential outcome models for noncompliance (e.g. Baker and Kramer, 2005). One standard assumption is that the probability of the true endpoint given a pair of identical potential surrogate endpoints does not depend on randomization group. However, this assumption would not hold if there were multiple pathways to the true endpoint. Another disadvantage is that the current formulation only involves a single previous trial, so it does not capture the variability of parameters over previous trials.

A third type of model estimates the associations between summary (trial-level) statistics for surrogate and true endpoints among multiple previous trials. Unlike a simple regression on summary statistics, it incorporates the within-trial correlations and variability between surrogate and true endpoints. When there are separate summary statistics for each randomization group (Gail et al., 2001; Buyse et al., 2000a), the estimation is computationally difficult, particularly with binary outcomes (Renard et al., 2002). Although estimation is computationally simpler when the summary statistic is a difference in intervention effects (Daniels and Hughes, 1997; Korn et al., 2005), less information is used than with separate models for each arm of the trial, and there may be logical difficulties when an intervention is in the control arm of one trial and in the experimental arm of another trial (Freedman, 2005).

This paper proposes a fourth type of model that extends the first type to the meta-analysis of multiple previous trials. The method is a meta-analysis over previous trials of the predicted effect of intervention on true endpoint based on the data in each previous trial. The approach has the desirable features of being simple to implement and easy to understand. This paper also proposes a new criterion for validation, the average prediction error, which is the average absolute difference between the overall predicted effect of intervention on true endpoint and the observed effect of intervention on true endpoint. A desirable feature of this criterion is that it can be compared with an important quantity, the clinically meaningful difference.


    2. ESTIMATION
 
The same estimation procedure is used for application and validation scenarios. For the application scenario, trial $j$ denotes the application trial with only a surrogate endpoint, and $i$ indexes the previous trials with both surrogate and true endpoints. The goal of the application scenario is to estimate the predicted effect of intervention on true endpoint in application trial $j$. For the validation scenario, from a set of trials with surrogate and true endpoints, one trial $j$ is selected as the validation trial and $i \neq j$ indexes the other trials, which take the role of previous trials. The first step in validation is to estimate the predicted effect of intervention on true endpoint in each validation trial $j$. (Later steps in validation are described in the next section.) The basic idea of the estimation procedure for both application and validation trials is to (a) separately estimate the predicted effect of intervention on true endpoint in trial $j$ using data from each previous trial with surrogate and true endpoints and data on the surrogate endpoint in the trial of interest, and then (b) take a weighted average of these estimated predicted intervention effects with weights derived from a random-effects model. The paper focuses on a binary surrogate endpoint, although the method could be extended to a categorical surrogate endpoint, and on a true endpoint that is either binary or the median survival time, although the methodology could be applied to other true endpoints such as the mean.

Let $Z$ denote randomization group, $S$ denote a binary surrogate endpoint with realizations of 0 or 1, and $T$ denote a true endpoint. Let $\theta_{jzs}$ denote a parameter relating surrogate and true endpoints in group $z$ of trial $j$. If the true endpoint is binary, $\theta_{jzs} = \mathrm{pr}(T = 1 \mid S = s, Z = z, \text{trial } j)$. If the true endpoint is the median survival time, $\theta_{jzs}$ is the median survival time among subjects in arm $z$ of trial $j$ with surrogate endpoint $s$. Also, let $\pi_{jz} = \mathrm{pr}(S = 1 \mid Z = z, \text{trial } j)$, namely the probability that the surrogate endpoint equals 1 in arm $z$ of trial $j$.

Let $\Delta_{(\mathrm{obs})j}$ denote the effect of intervention on the true endpoint in trial $j$, where the subscript (obs) indicates that the true endpoint is observed. If the true endpoint is binary, $\Delta_{(\mathrm{obs})j} = \mathrm{pr}(T = 1 \mid Z = 1, \text{trial } j) - \mathrm{pr}(T = 1 \mid Z = 0, \text{trial } j)$. If the true endpoint is the median survival time, $\Delta_{(\mathrm{obs})j}$ is the difference in median survival times in the two arms of trial $j$. To motivate further derivations, it is helpful to rewrite $\Delta_{(\mathrm{obs})j}$ as the following mathematical identity that sums over the two possible states for the binary surrogate endpoint,

$$\Delta_{(\mathrm{obs})j} = \{\theta_{j11}\pi_{j1} + \theta_{j10}(1 - \pi_{j1})\} - \{\theta_{j01}\pi_{j0} + \theta_{j00}(1 - \pi_{j0})\}. \tag{2.1}$$

The goal is to predict (2.1) using only data from the surrogate endpoint in the validation trial and data on surrogate and true endpoints in previous trials. The surrogate endpoint in trial $j$ provides data to estimate $\pi_{jz}$. For previous trial $i$ ($i \neq j$), $\theta_{izs}$ is used to predict $\theta_{jzs}$. Substituting $\theta_{izs}$ for $\theta_{jzs}$ in (2.1) gives

$$\Delta_{ij} = \{\theta_{i11}\pi_{j1} + \theta_{i10}(1 - \pi_{j1})\} - \{\theta_{i01}\pi_{j0} + \theta_{i00}(1 - \pi_{j0})\}, \tag{2.2}$$

which is called the predicted intervention effect (for the true endpoint in trial $j$) from previous trial $i$.

The predicted intervention effect from trial $i$ is estimated as follows. Let $n_{jzs}$ denote the counts in trial $j$ for group $z$ and surrogate endpoint at level $s$. The parameter $\pi_{jz}$ is estimated by $\hat\pi_{jz} = n_{jz1}/n_{jz+}$, where '+' indicates summation over the indicated subscript. For a binary true endpoint, $\hat\theta_{izs} = n_{izs1}/n_{izs+}$ with $\mathrm{var}(\hat\theta_{izs}) = \hat\theta_{izs}(1 - \hat\theta_{izs})/n_{izs+}$, where $n_{izst}$ denotes the number of subjects in trial $i$ for group $z$, surrogate endpoint at level $s$, and true endpoint at level $t$. When the median survival time is the true endpoint, $\hat\theta_{izs}$ is the median of the survival times for trial $i$, group $z$, and surrogate at level $s$. An approximate variance is derived in Appendix A. Substituting these estimates into (2.2) gives the estimated predicted intervention effect from previous trial $i$,

$$\hat\Delta_{ij} = \{\hat\theta_{i11}\hat\pi_{j1} + \hat\theta_{i10}(1 - \hat\pi_{j1})\} - \{\hat\theta_{i01}\hat\pi_{j0} + \hat\theta_{i00}(1 - \hat\pi_{j0})\}. \tag{2.3}$$
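The calculation in (2.3) for a binary true endpoint can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function names and the nested-list layout of the counts (`n_i[z][s][t]` for a previous trial, `n_j[z][s]` for the trial of interest) are assumptions made for the example, and the counts are invented.

```python
def estimate_pi(n_j):
    """Estimate pi_jz = pr(S = 1 | Z = z) from counts n_j[z][s] (assumed layout)."""
    return [n_j[z][1] / (n_j[z][0] + n_j[z][1]) for z in (0, 1)]


def estimate_theta(n_i):
    """Estimate theta_izs = pr(T = 1 | S = s, Z = z) from counts n_i[z][s][t]."""
    return [
        [n_i[z][s][1] / (n_i[z][s][0] + n_i[z][s][1]) for s in (0, 1)]
        for z in (0, 1)
    ]


def predicted_effect(theta_i, pi_j):
    """Predicted intervention effect in trial j from previous trial i, as in (2.3)."""
    def arm_value(z):
        # Average theta over the surrogate distribution observed in trial j.
        return theta_i[z][1] * pi_j[z] + theta_i[z][0] * (1 - pi_j[z])

    return arm_value(1) - arm_value(0)


# Invented counts for one previous trial i and one trial of interest j.
n_i = [[[40, 10], [20, 30]],   # control arm: [s=0: (t=0, t=1)], [s=1: (t=0, t=1)]
       [[35, 15], [15, 35]]]   # experimental arm
n_j = [[60, 40], [50, 50]]     # surrogate counts only: (s=0, s=1) per arm

delta_ij = predicted_effect(estimate_theta(n_i), estimate_pi(n_j))
print(round(delta_ij, 4))
```

Only the surrogate counts of trial $j$ enter the calculation, which is what allows the same function to serve the application scenario.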

The estimate of the predicted intervention effect in trial $j$ is a weighted average of the estimated predicted intervention effects from each previous trial,

$$\hat\Delta_j = \frac{\sum_{i \neq j} w_{ij}\,\hat\Delta_{ij}}{\sum_{i \neq j} w_{ij}},$$

where

$$w_{ij} = \frac{1}{V_{iij}}, \qquad V_{iij} = \widehat{\mathrm{var}}(\hat\Delta_{ij}). \tag{2.4}$$

The weights, $w_{ij}$, for $i \neq j$, are analogous to those for a standard fixed-effects meta-analysis because they equal the reciprocal of the variance. These weights would minimize the variance of $\hat\Delta_j$ if the $\hat\Delta_{ij}$ were independent (Kendall and Stuart, 1961). Although the $\hat\Delta_{ij}$ are correlated, this simple formula is attractive because it is difficult to compute nonnegative weights that minimize the variance of $\hat\Delta_j$ when the $\hat\Delta_{ij}$ are correlated.
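The reciprocal-variance weighting can be illustrated as follows. This is a sketch under stated assumptions: the explicit variance expression in `delta_method_variance` is a first-order (delta-method) approximation written for this example, treating the estimates from trial $i$ and trial $j$, and the two arms, as independent; it is not taken from the source and may differ from the paper's exact formula. The function names are hypothetical.

```python
def delta_method_variance(theta_i, var_theta_i, pi_j, n_j_arm):
    """Approximate var of the predicted effect from previous trial i (ASSUMED form).

    theta_i[z][s], var_theta_i[z][s]: estimates and variances from trial i.
    pi_j[z]: estimated surrogate probability in arm z of trial j.
    n_j_arm[z]: number of subjects in arm z of trial j.
    """
    total = 0.0
    for z in (0, 1):
        var_pi = pi_j[z] * (1 - pi_j[z]) / n_j_arm[z]  # binomial variance
        total += (
            pi_j[z] ** 2 * var_theta_i[z][1]
            + (1 - pi_j[z]) ** 2 * var_theta_i[z][0]
            + (theta_i[z][1] - theta_i[z][0]) ** 2 * var_pi
        )
    return total


def inverse_variance_average(estimates, variances):
    """Weighted average with weights equal to the reciprocal of each variance."""
    weights = [1.0 / v for v in variances]
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)


# Two invented predicted effects; the more precise one dominates the average.
print(inverse_variance_average([0.1, 0.3], [0.01, 0.03]))
```

The averaging function deliberately ignores the correlation among the per-trial predictions, mirroring the simplification described in the text.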

In all meta-analytic approaches for surrogate endpoints, an important issue is the variability of parameters over trials due to variations in population characteristics or pathways from the surrogate to the true endpoint among various interventions. For the simple meta-analytic model, one could view the $\theta_{izs}$ parameters as having a distribution over previous trials. However, explicitly modeling the variability among the $\theta_{izs}$ parameters is difficult. A simpler approach is to model the variability of the $\Delta_{ij}$ parameters, which indirectly incorporates the variability of the $\theta_{izs}$ parameters. The underlying assumption is that $\Delta_{ij}$ follows a normal distribution with mean $\Delta_j$ and variance $\sigma_j^2$. Let $\boldsymbol{\Delta}_j$ denote a vector of $\Delta_{ij}$ for all $i \neq j$, and let $\hat{\boldsymbol{\Delta}}_j$ denote the corresponding vector of estimates. The variance of $\hat{\boldsymbol{\Delta}}_j$ conditional on $\boldsymbol{\Delta}_j$ is the $(k - 1) \times (k - 1)$ matrix $V_j$, where $k$ is the number of trials, the diagonal elements are given by (2.4), and the off-diagonal elements ($i \neq i'$) are

$$V_{ii'j} = \widehat{\mathrm{cov}}(\hat\Delta_{ij}, \hat\Delta_{i'j}). \tag{2.5}$$

Under the random-effects model, the variance of $\hat{\boldsymbol{\Delta}}_j$ is $V_j + \sigma_j^2 I$, where $I$ is a $(k - 1) \times (k - 1)$ identity matrix. By analogy with a standard random-effects meta-analysis of true endpoints, the estimated predicted intervention effect adjusted for random effects is the following weighted average of the estimated predicted intervention effects for each previous trial $i$,

$$\hat\Delta_{(\mathrm{pred})j} = \frac{\sum_{i \neq j} w^*_{ij}\,\hat\Delta_{ij}}{\sum_{i \neq j} w^*_{ij}}, \qquad w^*_{ij} = \frac{1}{\hat\sigma_j^2 + V_{iij}}, \tag{2.6}$$

where the subscript (pred) indicates the predicted intervention effect. The new weights $w^*_{ij}$, for $i \neq j$, are the reciprocals of the overall variance for trial $i$, which equals the variance from the random effects plus the sampling variance for trial $i$. Estimation of $\sigma_j^2$ is based on a method-of-moments procedure similar to that for a standard meta-analysis for true endpoints (DerSimonian and Laird, 1986) but accounting for the correlation among the $\hat\Delta_{ij}$. Let $\mathbf{w}_j$ denote a vector of $w_{ij}$ for all $i \neq j$. As derived in Appendix B, $\sigma_j^2$ is estimated by

where

(2.7)
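For orientation, the random-effects weighting can be sketched in Python. One caution: the between-trial variance below is computed with the standard DerSimonian and Laird (1986) method-of-moments estimator, which assumes independent estimates. It is shown only as the familiar baseline and is not the correlation-adjusted estimator that the paper derives in Appendix B. Function names are hypothetical.

```python
def dersimonian_laird_tau2(estimates, variances):
    """Standard DerSimonian-Laird between-trial variance (independence assumed)."""
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_w
    # Cochran's Q: weighted squared deviations from the fixed-effects mean.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    scale = total_w - sum(w * w for w in weights) / total_w
    return max(0.0, (q - (len(estimates) - 1)) / scale)


def random_effects_average(estimates, variances, tau2):
    """Weighted average with weights 1 / (sampling variance + random-effects variance)."""
    weights = [1.0 / (v + tau2) for v in variances]
    return sum(w * y for w, y in zip(weights, estimates)) / sum(weights)


# Invented inputs: per-trial predicted effects and their sampling variances.
effects = [0.05, 0.20, 0.12]
sampling_vars = [0.002, 0.004, 0.003]
tau2 = dersimonian_laird_tau2(effects, sampling_vars)
print(random_effects_average(effects, sampling_vars, tau2))
```

As the random-effects variance grows relative to the sampling variances, the weights become more nearly equal and the result moves toward the unweighted mean of the per-trial predictions.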

Let $\hat\Delta_{(\mathrm{meta})j}$ denote the estimated effect of intervention on the true endpoint in trial $j$ from a standard random-effects meta-analysis (DerSimonian and Laird, 1986) based on data from the true endpoint in trials other than trial $j$. If the surrogate and true endpoints are unrelated, $\theta_{iz1} = \theta_{iz0} \equiv \theta_{iz} = \mathrm{pr}(T = 1 \mid z)$, $\Delta_{ij} = \theta_{i1} - \theta_{i0}$, $V_{ii'j} = 0$ for $i \neq i'$, and thus $\hat\Delta_{(\mathrm{pred})j} = \hat\Delta_{(\mathrm{meta})j}$. If the surrogate and true endpoints are related, $\hat\Delta_{(\mathrm{pred})j}$ will differ from $\hat\Delta_{(\mathrm{meta})j}$ because it contains more information for predicting the effect of intervention on true endpoint.

Figure 1 illustrates some of the calculations for a binary true endpoint and a comparison of $\hat\Delta_{(\mathrm{meta})j}$ and $\hat\Delta_{(\mathrm{pred})j}$. Thick and thin lines indicate different randomization groups. In the left panel, each point denotes trial-level data, namely the fraction of subjects with surrogate and true endpoints for each arm of each previous trial, with dashed lines indicating 95% confidence intervals. The boxes indicate the fractions of subjects with surrogate and true endpoints in each arm of trial $j$. The small horizontal line segments to the left of the plot indicate the fraction of subjects with the true endpoint for each arm of each previous trial. The weighted average, over all trials excluding trial $j$, of the differences between pairs of thick and thin small horizontal line segments corresponding to a particular trial equals $\hat\Delta_{(\mathrm{meta})j}$. Thus, trial $j$ contributes no information for computing $\hat\Delta_{(\mathrm{meta})j}$.



Fig. 1. Graphical illustration of computations for a standard meta-analysis based on other true endpoints (left panel) and the simple meta-analytic approach for surrogate endpoints (right panel).

 
The right panel shows the prediction lines, namely the lines that predict the effect of intervention on true endpoint from each previous trial, which are used in the simple meta-analytic approach for surrogate endpoints. For each previous trial $i$, the thin prediction line connects the points $(0, \hat\theta_{i00})$ and $(1, \hat\theta_{i01})$ and the thick prediction line connects the points $(0, \hat\theta_{i10})$ and $(1, \hat\theta_{i11})$. Each thin horizontal line segment on the left of the plot equals $\hat\theta_{i01}\hat\pi_{j0} + \hat\theta_{i00}(1 - \hat\pi_{j0})$ and is graphically computed by finding the vertical point on the thin prediction line for trial $i$ evaluated at the horizontal point $\hat\pi_{j0}$ corresponding to the box with the thin outline. Similarly, each thick horizontal line segment on the left of the plot equals $\hat\theta_{i11}\hat\pi_{j1} + \hat\theta_{i10}(1 - \hat\pi_{j1})$ and is graphically computed by finding the vertical point on the thick prediction line for trial $i$ evaluated at the horizontal point $\hat\pi_{j1}$ corresponding to the box with the thick outline. The difference between the thick and thin horizontal line segments for trial $i$ equals $\hat\Delta_{ij}$. The weighted average of the $\hat\Delta_{ij}$ gives the estimated predicted intervention effect $\hat\Delta_{(\mathrm{pred})j}$. Because the surrogate endpoint is predictive of true endpoint, there is a greater difference between the thick and thin horizontal line segments on the right plot corresponding to $\hat\Delta_{(\mathrm{pred})j}$ than on the left plot corresponding to $\hat\Delta_{(\mathrm{meta})j}$.


    3. VALIDATION
 
Consider the validation scenario in which all trials have surrogate and true endpoints. Let $\hat\Delta_{(\mathrm{obs})j}$ denote the observed effect of intervention on true endpoint in trial $j$. When the true endpoint is binary, $\hat\Delta_{(\mathrm{obs})j}$ is the difference in the probability of true endpoint in the two arms of trial $j$. When the true endpoint is the median survival time, $\hat\Delta_{(\mathrm{obs})j}$ is the difference in median survival times in the two arms of trial $j$. The average prediction error for the predicted effect of intervention (APEP) is

$$\mathrm{APEP} = \frac{\sum_{j} n_{j++}\,\bigl|\hat\Delta_{(\mathrm{pred})j} - \hat\Delta_{(\mathrm{obs})j}\bigr|}{\sum_{j} n_{j++}}, \tag{3.1}$$

which is the weighted sum of the absolute values of the differences between the predicted and observed intervention effects, where the weights are proportional to sample size. Absolute values are needed because otherwise large differences with different signs would cancel and so incorrectly indicate a small average prediction error.
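A short Python sketch of the sample-size-weighted average prediction error, with invented numbers, shows why the absolute value matters. The function name is hypothetical.

```python
def average_prediction_error(predicted, observed, sizes):
    """Sample-size-weighted mean of |predicted - observed| over trials."""
    total = sum(sizes)
    return sum(n * abs(p - o) for p, o, n in zip(predicted, observed, sizes)) / total


# Invented example: errors of +0.10 and -0.10 in two equally sized trials.
predicted = [0.15, -0.05]
observed = [0.05, 0.05]
sizes = [200, 200]

signed_mean = sum(n * (p - o) for p, o, n in zip(predicted, observed, sizes)) / sum(sizes)
print(signed_mean)                                            # errors cancel, close to zero
print(average_prediction_error(predicted, observed, sizes))  # close to 0.10
```

The same function gives APEO when the predictions come from a standard meta-analysis of the true endpoints in the other trials.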

APEP is a useful measure because it can be directly compared with the average clinically meaningful difference (ACMD) over all trials,

(3.2)

The contribution to ACMD from each trial is the implicit alternative hypothesis for the true endpoint and is computed assuming (i) the standard error of the observed effect of intervention on true endpoint equals the anticipated standard error used in designing the trial, (ii) the power is 0.8, and (iii) the two-sided type I error is 0.05.
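The per-trial contribution can be sketched with the usual design relationship in which the detectable difference equals the sum of the two normal quantiles times the standard error. That formula, and the sample-size weighting used for the average, are assumptions of this sketch rather than expressions taken from the source. The function names are hypothetical.

```python
from statistics import NormalDist


def clinically_meaningful_difference(standard_error, power=0.8, alpha=0.05):
    """Implicit alternative: (z_{1 - alpha/2} + z_{power}) * standard error (assumed form)."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * standard_error


def average_cmd(standard_errors, sizes):
    """Sample-size-weighted average of per-trial differences (weighting is assumed)."""
    total = sum(sizes)
    return sum(
        n * clinically_meaningful_difference(se) for se, n in zip(standard_errors, sizes)
    ) / total


# Invented standard errors of the observed effect and trial sizes.
print(average_cmd([0.05, 0.08], [400, 150]))
```

With power 0.8 and a two-sided type I error of 0.05, the multiplier on the standard error is roughly 2.8.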

If APEP is large compared with ACMD, the error in using the surrogate endpoint to predict the effect of intervention on true endpoint swamps the clinically meaningful difference. However, if APEP is small compared with ACMD, any error from using the surrogate-based predicted effect of intervention is likely to have little consequence on the conclusions. The recommended comparison is between the upper bound of the 95% confidence interval for APEP and ACMD. The 95% confidence interval for APEP can be computed using a bootstrap approach (Efron and Gong, 1983) in which the data in each randomization group in each trial are randomly sampled. Because of the absolute value function, the bootstrap mean of the APEP can differ considerably from the point estimate.

There is one important caveat to comparing APEP and ACMD. If all trials have similar effects of intervention on true endpoint, APEP could be much smaller than ACMD, yet the surrogate could be unrelated to true endpoint, which would greatly lower one's confidence in its application to a new trial. To investigate this situation, it is helpful to compute the average prediction error of other true endpoints (APEO),

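$$\mathrm{APEO} = \frac{\sum_{j} n_{j++}\,\bigl|\hat\Delta_{(\mathrm{meta})j} - \hat\Delta_{(\mathrm{obs})j}\bigr|}{\sum_{j} n_{j++}}, \tag{3.3}$$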

which is the average prediction error for a standard meta-analysis involving true endpoints in other trials. If APEP and APEO are similar, it would imply similar effects of intervention on true endpoint in all trials, and so there would be little information about the predictive value of the surrogate endpoint in an application trial with a different effect of intervention on true endpoint. The meta-analytic method is most useful relative to a standard meta-analysis of true endpoints when the association between surrogate and true endpoints is the same (with some random variability) over all trials, but the effect of intervention on true endpoint differs over trials.

In summary, there are two requirements for validating a surrogate endpoint in this framework. First, the upper bound of APEP should be smaller than ACMD to indicate that errors arising from using the surrogate endpoint to predict intervention effect are small relative to the intervention effect the investigators are hoping to detect. Second, APEP should be smaller than APEO to indicate that the small value of APEP is not simply a consequence of similar effects of intervention on true endpoints.

Once the surrogate endpoint is validated, one can more confidently apply the method in an application trial ($j = 0$) in which the surrogate, but not the true endpoint, is observed. One would estimate the mean and 95% confidence interval for $\hat\Delta_{(\mathrm{pred})0}$ in (2.6) based on bootstrapping. However, see Section 6 for caveats.


    4. SIMULATIONS
 
Simulations were conducted to further investigate the new meta-analytic approach and compare it with a previous meta-analytic approach based on summary statistics in each trial (Gail et al., 2001; Buyse et al., 2000a). Let $\hat\Delta_{(\mathrm{summary})j}$ denote the estimated effect of intervention on true endpoint in trial $j$ based on a model involving summary statistics. Because previous computational methods for $\hat\Delta_{(\mathrm{summary})j}$ with binary endpoints (Renard et al., 2002) are difficult to implement, a simpler method-of-moments approach was used. See technical report at http://www.cancer.gov/prevention/bb/baker.html for details. The average prediction error of the summary statistic (APES) approach for surrogate endpoints is

$$\mathrm{APES} = \frac{\sum_{j} n_{j++}\,\bigl|\hat\Delta_{(\mathrm{summary})j} - \hat\Delta_{(\mathrm{obs})j}\bigr|}{\sum_{j} n_{j++}}.$$

The simulations are based on 1000 iterations for 10 trials with 100 subjects in each randomization group. There is an underlying true prediction line for each randomization group $z$, specified by $\theta_{(\mathrm{true})zs}$, which is graphically displayed for six scenarios in Figure 2. For each trial $i$, randomization group $z$, and iteration, prediction lines are generated assuming $\theta_{izs} \sim N(\theta_{(\mathrm{true})zs}, 0.04)$ with the constraint that $0.01 \le \theta_{izs} \le 0.99$. In addition, on each iteration the probability of surrogate outcome given group $z$ and trial $i$ is generated by $\pi_{iz} \sim N(\pi_{(\mathrm{true})iz}, 0.05)$, where $\pi_{(\mathrm{true})i0} = i/10 - 0.05$ and $\pi_{(\mathrm{true})i1} = 1.05 - i/10$, with the constraint that $0.01 \le \pi_{iz} \le 0.99$. On each iteration, sampling error is incorporated by multinomial sampling from the 100 subjects per randomization group.
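One iteration of this data-generating scheme might be sketched as below. Several choices are assumptions of the sketch: the second argument of each normal distribution is treated as a standard deviation (the text does not say whether it is a standard deviation or a variance), the constraints are imposed by clipping, and the values in `theta_true` are invented because the scenario lines in Figure 2 are not given numerically. Function names are hypothetical.

```python
import random


def clip(x, lo=0.01, hi=0.99):
    """Keep a probability inside [0.01, 0.99]."""
    return min(hi, max(lo, x))


def simulate_trials(theta_true, seed=0, n_trials=10, n_per_arm=100,
                    spread_theta=0.04, spread_pi=0.05):
    """Return counts[i][z][s][t] for one simulated iteration (spreads ASSUMED to be SDs)."""
    rng = random.Random(seed)
    trials = []
    for i in range(1, n_trials + 1):
        pi_true = [i / 10 - 0.05, 1.05 - i / 10]  # control arm, experimental arm
        arms = []
        for z in (0, 1):
            pi = clip(rng.gauss(pi_true[z], spread_pi))
            theta = [clip(rng.gauss(theta_true[z][s], spread_theta)) for s in (0, 1)]
            cells = [[0, 0], [0, 0]]
            # Drawing each subject's (s, t) pair is multinomial sampling of the 4 cells.
            for _ in range(n_per_arm):
                s = 1 if rng.random() < pi else 0
                t = 1 if rng.random() < theta[s] else 0
                cells[s][t] += 1
            arms.append(cells)
        trials.append(arms)
    return trials


# Hypothetical 'true' prediction lines theta_true[z][s], not taken from Figure 2.
theta_true = [[0.2, 0.6], [0.3, 0.7]]
simulated = simulate_trials(theta_true, seed=1)
print(simulated[0])
```

Because the surrogate probabilities move in opposite directions across trials in the two arms, the simulated trials differ in intervention effect even though they share the same underlying prediction lines.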



Fig. 2. ‘True’ prediction lines for simulation results in Table 1. The two lines represent different randomization groups. The lines in Scenario D coincide.

 

Table 1. Results from a simulation using scenarios in Figure 2

 
Let $\Delta_{\mathrm{true}(j)}$ denote the 'true' effect of intervention on true endpoint computed from $\theta_{(\mathrm{true})zs}$ and $\pi_{(\mathrm{true})jz}$. Let

$$\mathrm{APEP}^* = \frac{\sum_{j} n_{j++}\,\bigl|\hat\Delta_{(\mathrm{pred})j} - \Delta_{\mathrm{true}(j)}\bigr|}{\sum_{j} n_{j++}},$$

which is the APEP relative to the 'true' value. Similarly, define APES* and APEO*. Also, let

$$\mathrm{AOBS}^* = \frac{\sum_{j} n_{j++}\,\bigl|\hat\Delta_{(\mathrm{obs})j} - \Delta_{\mathrm{true}(j)}\bigr|}{\sum_{j} n_{j++}},$$

which is the average prediction error for the observed true endpoint relative to the true value. The averages of APEO*, APES*, and APEP* over the iterations provide a measure of error related to the mean-squared error but more easily compared with the averages of APEO, APES, and APEP over the iterations.

As shown in Table 1, the average values (over simulations) of APEO*, APES*, and APEP* are slightly smaller than the average values of APEO, APES, and APEP, respectively, because the estimated intervention effect on true endpoint in each trial is more variable than the underlying 'true' intervention effect for the set of trials. The average of APEP* is smaller than the average of AOBS* because APEP*, unlike AOBS*, averages over the random-effect realizations of prediction lines. In this simulation, the simple meta-analysis for surrogate endpoints performs slightly better than the previous analytic methods using summary statistics. Under Scenario F, the surrogate endpoint is not related to true endpoint, so APEO is similar to APES and APEP. Under the other scenarios, the surrogate endpoint is related to true endpoint, and APES and APEP are much smaller than APEO.


    5. EXAMPLE
 
The method was used to investigate the validity of a surrogate endpoint in data from multiple randomized trials for patients with advanced colorectal cancer (Burzykowski et al., 2004; Buyse et al., 2000b). The binary surrogate endpoint is an indicator of tumor response where S = 0 if there was complete or partial response and S = 1 if the disease was stable or progressive. The true endpoint is median survival time in weeks. Following Burzykowski et al. (2004), every two-arm comparison is regarded as a separate trial, and the trial with no surrogate endpoint at one level in one arm was excluded, leaving 26 trials with available data. Also, following Burzykowski et al. (2004), a landmark analysis was used to reduce length bias that arises when subjects die before tumor response is observed. In particular, the analysis only uses data from subjects alive at 3 months and assumes that tumor response was observed prior to 3 months.

To visually distinguish between estimates from different randomization groups, separate plots are presented for each randomization group, although the analysis is based on differences between randomization groups within each trial. The upper plots in Figure 3 show the mean and 95% confidence intervals of the surrogate and true endpoints for each arm of each trial. Although there was substantial variation in the surrogate endpoint, the true endpoints fell within a small range, except for the smallest trial involving only 15 subjects. The middle plots show the lines used to predict the effect of intervention on true endpoint using the simple meta-analytic approach. The lower left plot shows estimated prediction errors and 95% confidence intervals for the predicted effect of intervention on true endpoint for each trial. Size refers to the number of subjects alive at 3 months. The lower right plot shows APEO and APEP with 95% confidence intervals, compared with ACMD. Although APEP is small relative to ACMD, it is similar to APEO. Therefore, tumor response is not a good surrogate endpoint for predicting the effect of treatment on the difference in median survival times. This conclusion agrees with that of Burzykowski et al. (2004).



Fig. 3. Analysis of data from 26 trials of advanced colorectal cancer. The surrogate endpoint is the fraction of subjects without tumor response and the true endpoint is the median survival in weeks. APEP is the weighted average of the absolute value of the prediction errors in the bottom left plot.

 

    6. DISCUSSION
 
The simple meta-analytic approach requires the same relationship of the surrogate to true endpoints (but allowing extra variability from random effects) within a given arm of all the randomized trials. This is a less-stringent requirement than the Prentice Criterion, which requires the same relationship of surrogate to true endpoints among both arms in all trials (Baker and Kramer, 2003; Baker et al., 2005). If it were possible to stratify the data by a baseline covariate, an even less-stringent requirement would be needed, namely the same relationship of the surrogate and true endpoints within each arm and stratum of all randomized trials (again allowing extra variability from random effects). Sometimes an intervention is used in the control arm of one trial and the experimental arm of another trial. In that case, one should require the same relationship of surrogate to true endpoints (allowing extra variability from random effects) among both arms of all trials, as with the Prentice Criterion. It is not necessary to explicitly test the Prentice Criterion because the better the relationship between surrogate and true endpoints approximates the Prentice Criterion, the more likely the surrogate will be validated.

The ACMD provides a benchmark for the validation statistic APEP. The advantage of APEP over previous summary measures for validation is that, unlike the other measures, one can compare its magnitude relative to a well-defined and important quantity, namely the ACMD. There may be concern that ACMD explicitly depends on sample size while APEP does not. However, APEP implicitly depends on sample size because, for a given trial, the largest value of the observed intervention effect and hence the largest value of the predicted intervention effect would typically not be much larger than the clinically meaningful difference. Because one would not typically validate a surrogate endpoint using highly different types of intervention, such as for both prevention and treatment, the prediction lines among a set of trials should be within the same order of magnitude. The ideal situation for validating a surrogate endpoint is when the prediction lines are similar but the surrogate endpoints differ across trials.

Once a surrogate endpoint is validated, it can be more confidently used in an application trial. However, there are two caveats. First, a validated surrogate endpoint may not give the correct conclusion in a new setting simply because the relationship of surrogate to true endpoint may differ in the new trial and in the validation trial. In other words, there is always some extrapolation when using a surrogate endpoint for inference in an application trial. Second, a validated surrogate endpoint for a true endpoint associated with benefit generally provides no information about harms that would occur after its observation.

Given these caveats, whether or not one should rely on the validated surrogate endpoint for conclusions about the effect of intervention on true endpoint in an application depends on the particular situation. There are two situations when reliance on the validated surrogate endpoint for an application trial would be most appropriate. One situation is when the surrogate endpoint evaluates an intervention in a preliminary phase of development, and the purpose is to try to identify the most promising interventions for more definitive evaluation with a true endpoint. A second situation arises when the surrogate endpoint evaluates a particular timing or dose of an intervention that was previously shown to be effective at another dose or timing. For example, Day and Duffy (1996) proposed using surrogate endpoints to evaluate breast cancer screening every 3 years because randomized trials with mortality endpoint showed benefits of breast cancer screening every year.

To implement this approach, it is necessary to obtain data from previous trials that report surrogate and true endpoints. Hopefully, the simplicity of this approach will spur clinical trialists to routinely collect high-quality data on surrogate endpoints that could be used in this type of meta-analysis.


    APPENDIX A

Variance of estimated median survival time

The estimated large sample variance of the median survival time is $0.25\,\mathrm{var}(S)/(S^2 f^2)$, where $S$ is the estimated probability of surviving to the median survival time and $f$ is the estimated density function at the median survival time (Klein and Moeschberger, 1997). For simplicity, survival time is assumed to follow an exponential distribution. Because the variance is only used for the weights, misspecification is not a major concern. For the surrogate at level $s$ in group $z$ of trial $i$, the median survival time is indexed by $izs$. Let $d_{izs}$ and $r_{izs}$ denote the number of deaths and total of all survival times, respectively, among subjects with surrogate at level $s$ in arm $z$ of trial $i$. Therefore, $S_{izs} = \exp(-h_{izs} \times \mathrm{median}_{izs})$ and $f_{izs} = h_{izs} S_{izs}$, where $h_{izs} = d_{izs}/r_{izs}$ is the estimated hazard. Using the delta method, where
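The calculation described in this appendix can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's code: it reads the variance expression as 0.25 var(S)/(S^2 f^2) (the squared density is assumed on dimensional grounds), it assumes the usual exponential-model approximation var(h) = h^2/d for the estimated hazard, and it applies the delta method to S = exp(-h × median) with the median held fixed. The function and argument names are hypothetical.

```python
import math


def median_survival_variance(deaths: float, total_time: float, median: float) -> float:
    """Approximate variance of a median survival time under an exponential model.

    Assumptions (not taken verbatim from the source):
      * var(h) is approximated by h**2 / deaths,
      * var(S) follows from the delta method applied to S = exp(-h * median),
      * the final expression is 0.25 * var(S) / (S**2 * f**2).
    """
    if deaths <= 0 or total_time <= 0 or median <= 0:
        raise ValueError("deaths, total_time and median must all be positive")

    hazard = deaths / total_time              # h = d / r
    survival = math.exp(-hazard * median)     # S = exp(-h * median)
    density = hazard * survival               # f = h * S

    var_hazard = hazard ** 2 / deaths         # assumed approximation for var(h)
    # Delta method: dS/dh = -median * S, so var(S) ~ (median * S)^2 * var(h)
    var_survival = (median * survival) ** 2 * var_hazard

    return 0.25 * var_survival / (survival ** 2 * density ** 2)


def inverse_variance_weight(deaths: float, total_time: float, median: float) -> float:
    """Weight for the median: reciprocal of its approximate variance."""
    return 1.0 / median_survival_variance(deaths, total_time, median)
```

Under these assumptions, when the reported median equals the exponential median log(2)/h, the survival probability is 0.5 and the expression reduces to median^2/d, so groups with more deaths receive larger weights.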


    APPENDIX B

Random-effects computation

The component of variance from the random effects, is estimated by the method of moments using the following formulas. Recall that the variance of conditional on is and Therefore,

Also, Extending the method of DerSimonian and Laird (1986), the method-of-moments estimator is derived from setting $Q_j = E(Q_j)$ and solving for where

and
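For orientation, the standard DerSimonian and Laird (1986) method-of-moments estimator, which this appendix extends, can be sketched as follows. This is the textbook univariate form, not the extended estimator derived here: it equates the weighted sum of squares Q to its expectation and solves for the between-trial variance, truncating at zero. The function names and inputs are hypothetical.

```python
from typing import Sequence, Tuple


def dersimonian_laird_tau2(estimates: Sequence[float], variances: Sequence[float]) -> float:
    """Standard DerSimonian-Laird method-of-moments between-trial variance."""
    k = len(estimates)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two estimates with matching variances")

    weights = [1.0 / v for v in variances]           # fixed-effect weights
    sum_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum_w

    # Q statistic: weighted squared deviations from the fixed-effect mean
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))

    # Under the random-effects model, E(Q) = (k - 1) + tau2 * c
    c = sum_w - sum(w * w for w in weights) / sum_w
    return max(0.0, (q - (k - 1)) / c)


def random_effects_mean(
    estimates: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float, float]:
    """Random-effects pooled mean, its variance, and the estimated tau2."""
    tau2 = dersimonian_laird_tau2(estimates, variances)
    weights = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    sum_w = sum(weights)
    mean = sum(w * y for w, y in zip(weights, estimates)) / sum_w
    return mean, 1.0 / sum_w, tau2
```

For example, `random_effects_mean([0.0, 2.0], [1.0, 1.0])` gives a between-trial variance of 1.0, which doubles each trial's total variance and widens the uncertainty of the pooled mean relative to a fixed-effect analysis.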


    ACKNOWLEDGMENTS
 
The author thanks Laurence Freedman, Mitchell Gail, Barnett Kramer, Ruth Pfeiffer, Philip Prorok, and anonymous referees for helpful comments. The author also thanks Tomasz Burzykowski, Marc Buyse, Pascal Piedbois, and Geert Molenberghs for helpful comments and for help in formatting the data which were kindly made available by the Meta-Analysis Group in Cancer.


    REFERENCES

    BAKER, S. G., IZMIRLIAN, G. AND KIPNIS, V. (2005). Resolving paradoxes involving surrogate endpoints. Journal of the Royal Statistical Society, Series A. doi:10.1111/j.1467-985X.2005.00373.x.

    BAKER, S. G. AND KRAMER, B. S. (2003). A perfect correlate does not a surrogate make. BMC Medical Research Methodology 3, 16.

    BAKER, S. G. AND KRAMER, B. S. (2005). Simple maximum likelihood estimates of efficacy in randomized trials and before-and-after studies, with implications for meta-analysis. Statistical Methods in Medical Research 14, 349–367.

    BEGG, C. B. AND LEUNG, D. H. Y. (2000). On the use of surrogate endpoints in randomized trials (with discussion). Journal of the Royal Statistical Society, Series A 163, 15–28.

    BURZYKOWSKI, T., MOLENBERGHS, G. AND BUYSE, M. (2004). The validation of surrogate endpoints using data from randomized clinical trials: a case-study in advanced colorectal cancer. Journal of the Royal Statistical Society, Series A 167, 103–124.

    BUYSE, M. AND MOLENBERGHS, G. (1998). Criteria for the validation of surrogate endpoints in randomized experiments. Biometrics 54, 1014–1029.

    BUYSE, M., MOLENBERGHS, G., BURZYKOWSKI, T., RENARD, D. AND GEYS, H. (2000a). The validation of surrogate endpoints in meta-analyses of randomized trials. Biostatistics 1, 49–67.

    BUYSE, M., THIRION, P., CARLSON, R. W., BURZYKOWSKI, T., MOLENBERGHS, G., PIEDBOIS, P., FOR THE META-ANALYSIS GROUP IN CANCER (2000b). Relation between tumour response to first-line chemotherapy and survival in advanced colorectal cancer: a meta-analysis. Lancet 356, 373–378.

    DANIELS, M. J. AND HUGHES, M. D. (1997). Meta-analysis for the evaluation of potential surrogate markers. Statistics in Medicine 16, 1965–1982.

    DAY, N. E. AND DUFFY, S. W. (1996). Trial design based on surrogate end points: application to comparison of different breast screening frequencies. Journal of the Royal Statistical Society, Series A 159, 49–60.

    DERSIMONIAN, R. AND LAIRD, N. M. (1986). Meta-analysis in clinical trials. Controlled Clinical Trials 7, 177–188.

    EFRON, B. AND GONG, G. (1983). A leisurely look at the bootstrap, the jackknife, and cross-validation. American Statistician 37, 36–48.

    FRANGAKIS, C. E. AND RUBIN, D. B. (2002). Principal stratification in causal inference. Biometrics 58, 21–29.

    FREEDMAN, L. S. (2005). Commentary on assessing surrogates as trial endpoints using mixed models by E. L. Korn, P. S. Albert and L. M. McShane. Statistics in Medicine 24, 183–185.

    FREEDMAN, L. S., GRAUBARD, B. I. AND SCHATZKIN, A. (1992). Statistical validation of intermediate endpoints for chronic disease. Statistics in Medicine 11, 167–178.

    GAIL, M. H., PFEIFFER, R., HOUWELINGEN, H. C. AND CARROLL, R. J. (2001). On meta-analytic assessment of surrogate outcomes. Biostatistics 3, 231–246.

    KENDALL, M. G. AND STUART, A. (1961). The Advanced Theory of Statistics. Inference and Relationship, Volume 2, 3rd edition. London: Charles Griffin, p. 33, Ex. 17.21.

    KLEIN, J. P. AND MOESCHBERGER, M. L. (1997). Survival Analysis: Techniques for Censored and Truncated Data. New York: Springer.

    KORN, E. L., ALBERT, P. S. AND MCSHANE, L. M. (2005). Assessing surrogates as trial endpoints using mixed models (with discussion). Statistics in Medicine 24, 163–187.

    MORRISON, A. S. (1991). Intermediate determinants of mortality in the evaluation of screening. International Journal of Epidemiology 20, 642–650.

    PRENTICE, R. L. (1989). Surrogate endpoints in clinical trials: definitions and operational criteria. Statistics in Medicine 8, 431–440.

    RENARD, D., GEYS, H., MOLENBERGHS, G., BURZYKOWSKI, T. AND BUYSE, M. (2002). Validation of surrogate endpoints in multiple randomized clinical trials with discrete outcomes. Biometrical Journal 44, 921–935.

    WANG, Y. AND TAYLOR, J. M. G. (2002). A measure of the proportion of intervention effect explained by a surrogate marker. Biometrics 58, 803–812.

    Received October 13, 2004; revised March 22, 2005; revised June 6, 2005; accepted for publication June 13, 2005.

