Biostatistics Advance Access originally published online on December 6, 2006
Biostatistics 2007 8(4):689-694; doi:10.1093/biostatistics/kxl040
© The Author 2006. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org.

The influence of competing-risks setting on the choice of hypothesis test for treatment effect

P. R. Williamson*, R. Kolamunnage-Dona and C. Tudur Smith

Centre for Medical Statistics and Health Evaluation, Shelley's Cottage, Brownlow Street, University of Liverpool, Liverpool L69 3GS, UK p.r.williamson{at}liverpool.ac.uk

* To whom correspondence should be addressed


    SUMMARY
 TOP
 SUMMARY
 1. INTRODUCTION
 2. METHODS
 3. RESULTS
 4. DISCUSSION
 REFERENCES
 
There is considerable debate regarding the choice of test for treatment difference in a randomized clinical trial in the presence of competing risks. This question arose in the study of standard and new antiepileptic drugs (SANAD) trial comparing new and standard antiepileptic drugs. This paper provides simulation results for the log-rank test comparing cause-specific hazard rates and Gray's test comparing cause-specific cumulative incidence curves. To inform the analysis of the SANAD trial, competing-risks settings were considered where both events are of interest, events may be negatively correlated, and the degree of correlation may differ in the 2 treatment groups. In settings where there are effects in opposite directions for the 2 event types, a likely situation for the SANAD trial, Gray's test has greater power to detect treatment differences than log-rank analysis. For the epilepsy application, conclusions were qualitatively similar for both log-rank and Gray's tests.

Keywords: Anti-epileptic drugs; Competing risks; Cumulative incidence; Hypothesis test; Logrank test


    1. INTRODUCTION
 
In the statistical literature, the situation where there are several reasons why an event can occur is known as "competing risks." For example, the International League Against Epilepsy has recommended retention time, defined as the time to withdrawal of the randomized drug or addition of another, be one of the primary endpoints for clinical trials of antiepileptic drugs (AEDs) (Commission on Antiepileptic Drugs, 1998). Patients may decide to withdraw from such treatment because of unacceptable adverse effects (UAE) or switch to an alternative AED because of inadequate seizure control (ISC). The reduction in side effects may be at the expense of a reduction in seizure control, but it would be hoped that the latter would be within the defined margin of equivalence. Overall analysis of retention time may miss such differential effects of AEDs on the reasons for withdrawal, which may differ in terms of their relative importance for patients. An analysis of one event type which censors those patients who suffer a different event type, at the time they experience that other event, may be misleading because such an analysis assumes that the competing risks are independent or equivalently that such censoring is noninformative (Kalbfleisch and Prentice, 1980).

If only administrative censoring, that is, censoring due to the end of the period of observation, occurs within a study, the correlation between censoring and events would be zero, and standard methods for handling censoring are appropriate. However, censoring may also occur for other reasons, for example, reasons related to the treatment received, which may induce correlation between censoring and the event of interest. In this situation, there are similar concerns about the assumption of noninformative censoring in standard methods, and censoring should also be considered a competing risk.

The motivation for this research is the study of standard and new antiepileptic drugs (SANAD) trial, which included time to overall withdrawal as one of its primary outcomes (Marson and others, 2006). The analysis of this trial involved a second competing-risks situation related to a further outcome, time to first seizure. Since newer AEDs have been developed with the aim of providing similar seizure control, SANAD was designed as an equivalence trial; thus, the analysis includes both an intention-to-treat and a per-protocol approach. In the latter, for patients withdrawn from their randomized AED, first seizure was censored at the point of AED change. Censoring will be due to withdrawal for UAE only. The likely correlation between the event types in both of these competing-risks settings is not clear, and it may be that the correlation differs between treatment groups. However, the data collected do not provide any information regarding these correlations since only data relating to the first event are observed.

Two common methods for testing treatment effects in competing-risks situations are the log-rank test, corresponding to a test of equality of cause-specific hazards, and Gray's test based on the cumulative incidence approach (Fine and Gray, 1999). The log-rank test censors other event types at the time they occur and assumes that the competing risks are independent. The approach based on cumulative incidence makes no such assumption.

This paper has 2 objectives: first, to provide simulation results for the log-rank and Gray's tests for a variety of competing-risks settings where both events are of interest, events may be negatively correlated, and the degree of correlation may differ in the 2 groups being compared; second, to demonstrate the approach to testing of treatment effects in the SANAD trial, as a result of this simulation study.


    2. METHODS
 

2.1 Hypothesis tests for treatment effect

Consider a trial comparing a new treatment (A say) to a standard (B say) in the presence of 2 types of competing causes of events, so that each patient has an associated random vector $(T_1, T_2)$ corresponding to time to type 1 and type 2 events. In practice, only the first event, occurring at $T = \min(T_1, T_2)$, and the event type indicator, $C = 1$ if $T_1 < T_2$ and $C = 2$ if $T_1 > T_2$, will be observed. The overall survival function is associated with the time to the first event, $S(t) = P[T > t]$. The cause-specific hazard is given by

\[ h_i(t) = \lim_{\Delta t \to 0} \frac{P[t \le T < t + \Delta t,\ C = i \mid T \ge t]}{\Delta t} \]

and the crude cumulative incidence is

\[ I_i(t) = P[T \le t,\ C = i], \]

where $i$ denotes the event type.
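To make the crude cumulative incidence concrete, the following Python sketch estimates it from observed first-event data. It is an illustration only and not part of the original analysis: the function name, the coding of censored observations as 0, and the toy data are assumptions, and the estimator shown is the usual nonparametric one, in which the increment at each event time is the overall event-free survival just before that time multiplied by the proportion of patients at risk who experience an event of the type of interest.

```python
from collections import Counter


def cumulative_incidence(times, causes, cause):
    """Nonparametric estimate of the crude cumulative incidence for one cause.

    times  : observed time for each patient (first event or censoring)
    causes : 0 = censored (assumed coding), 1 or 2 = type of the first event
    cause  : the event type of interest (1 or 2)
    Returns a list of (event time, estimated cumulative incidence) pairs.
    """
    counts = Counter(zip(times, causes))
    n_at_risk = len(times)
    surv = 1.0  # overall event-free survival just before the current time
    cif = 0.0
    out = []
    for t in sorted(set(times)):
        d_cause = counts[(t, cause)]
        d_all = counts[(t, 1)] + counts[(t, 2)]
        leaving = d_all + counts[(t, 0)]
        if d_all > 0:
            cif += surv * d_cause / n_at_risk
            surv *= 1.0 - d_all / n_at_risk
            out.append((t, cif))
        n_at_risk -= leaving
    return out


# Hypothetical toy data: five patients, one of them censored at time 3.
toy_times = [1, 2, 3, 4, 5]
toy_causes = [1, 2, 0, 1, 2]
for event_type in (1, 2):
    print(event_type, cumulative_incidence(toy_times, toy_causes, event_type))
```

When there is no censoring, the estimate at time t reduces to the simple proportion of patients whose first event is of the given type and occurs by t.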

Possible null hypotheses for comparing treatments A and B are the global null hypothesis $H_0: S_A(t) = S_B(t)$, corresponding to no relative benefit on overall events, and the cause-specific null hypotheses $H_0^1: S_A^1(t) = S_B^1(t)$ and $H_0^2: S_A^2(t) = S_B^2(t)$ for each event type. The notation follows Freidlin and Korn (2005). The null hypothesis $H_0: S_A(t) = S_B(t)$ is usually tested using the log-rank test on the time to the first event, denoted here by $W_0$. The cause-specific null hypotheses can be tested by (1) comparing cause-specific hazard rates $h_i^A(t)$ and $h_i^B(t)$ using the log-rank test on times to event type $i$, censoring the other event types at the times they occur, denoted by $W_{h_i}$, or (2) comparing cumulative incidence curves $I_i^A(t)$ and $I_i^B(t)$ using Gray's test, denoted by $W_{I_i}$.
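The cause-specific log-rank comparison described in (1) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the software used for the paper: the function name, the group labels "A" and "B", and the coding of censored observations as 0 are hypothetical, and Gray's test is not implemented here. Events of the other type are simply treated as censored at the time they occur, and the statistic is referred to a chi-square distribution on 1 degree of freedom.

```python
import math


def logrank_cause_specific(times, causes, groups, cause):
    """Two-group log-rank test for one event type.

    times  : observed time for each patient
    causes : 0 = censored (assumed coding), 1 or 2 = type of the first event
    groups : "A" or "B" for each patient (assumed labels)
    cause  : event type of interest; other event types count as censored
    Returns (chi-square statistic, p-value on 1 degree of freedom).
    """
    data = sorted(zip(times, causes, groups))
    n_a = sum(1 for g in groups if g == "A")
    n_b = len(groups) - n_a
    obs_minus_exp = 0.0
    variance = 0.0
    i = 0
    while i < len(data):
        t = data[i][0]
        d_a = d_b = leave_a = leave_b = 0
        j = i
        while j < len(data) and data[j][0] == t:  # handle tied times together
            _, c, g = data[j]
            if g == "A":
                leave_a += 1
                d_a += int(c == cause)
            else:
                leave_b += 1
                d_b += int(c == cause)
            j += 1
        d, n = d_a + d_b, n_a + n_b
        if d > 0 and n > 1:
            obs_minus_exp += d_a - d * n_a / n
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
        n_a -= leave_a
        n_b -= leave_b
        i = j
    if variance == 0.0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df
    return chi2, p_value
```

Calling the function once with `cause=1` and once with `cause=2` gives the two cause-specific tests; passing every event as the same cause gives the overall test on the time to the first event.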

2.2 Simulation study

A simulation study was undertaken to evaluate both tests, $W_{h_i}$ and $W_{I_i}$, for detecting treatment differences. As both event types were important in the SANAD trial, results are also presented in terms of the proportion of simulations where correct decisions were made for both event types, that is, rejection of the null hypothesis if a real effect was simulated and no rejection if a null effect was simulated. Data were simulated from a bivariate exponential distribution as described by Freidlin and Korn (2005). Event times were generated by applying a monotone increasing marginal transformation (chosen to achieve the required exponential distribution) to bivariate normal random variables.
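The data-generating step can be sketched as follows. This is not the original simulation program; it is a minimal Python illustration that assumes the correlation parameter applies to the underlying bivariate normal variables, and the function name, sample sizes, baseline rates, and the particular correlation values in the usage lines are hypothetical. Each normal variate is mapped to a uniform variate through the standard normal distribution function and then to an exponential time, which is a monotone increasing transformation.

```python
import math
import random
from statistics import NormalDist


def simulate_group(n, rate1, rate2, rho, rng):
    """Simulate n patients with exponential margins and latent correlation rho.

    Returns a list of (T, C) pairs, where T = min(T1, T2) and C is the type
    (1 or 2) of the first event. No censoring is generated in this sketch.
    """
    phi = NormalDist().cdf
    out = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # 1 - Phi(z) = Phi(-z), so -log(Phi(-z)) / rate is exponential with
        # the given rate and increases monotonically in z.
        t1 = -math.log(phi(-z1)) / rate1
        t2 = -math.log(phi(-z2)) / rate2
        out.append((min(t1, t2), 1 if t1 < t2 else 2))
    return out


# Hypothetical usage: standard group B with unit rates; new group A with a
# hazard ratio of 1.2 for type 1 and 0.833 for type 2 events, and a
# different latent correlation in each group.
rng = random.Random(2006)
group_b = simulate_group(200, 1.0, 1.0, -0.1, rng)
group_a = simulate_group(200, 1.2, 0.833, -0.3, rng)
```

Because the transformation is nonlinear, the correlation between the resulting exponential times is not identical to the correlation of the normal variables, so the latter should be read as a dependence parameter rather than as the exact correlation of the event times.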

Hazard ratios of 1.2 and 1.6 were chosen to reflect small and moderate adverse effects, with values of 0.833 and 0.625 reflecting small and moderate beneficial effects of the new treatment relative to the standard on cause-specific time to drug withdrawal. No previous trials have analyzed the competing risks of AED withdrawal; thus, the choice of values was based on clinical guidance suggesting that hazard ratios of 1.1–1.2 may be considered to reflect equivalence, and effects larger than 1.6 were thought to be unlikely. The degree of correlation between drug withdrawal reasons may vary by AED, and hence data were simulated allowing the degree of correlation to differ in the 2 groups. Correlations were selected from the following: -0.8, -0.5, -0.3, -0.2, -0.1, 0, 0.1, 0.2, 0.3, 0.5, 0.8. The extreme values are not anticipated in the epilepsy setting but are provided for completeness. Various settings have been simulated. These include no effect on either failure type, no effect on one failure type and an effect on the other, adverse effect on both failure types, and adverse effect on one failure type and beneficial effect on the other.


    3. RESULTS
 
Table 1 shows the results most relevant to the SANAD trial: settings 7 and 8, defining withdrawal due to ISC as failure type 1 and withdrawal due to UAE as failure type 2, in which drug effects in opposite directions are anticipated for the 2 event types. Negative correlation may be anticipated here since side effects are more likely at higher doses, whereas seizure control is poorer at low doses. The overall analysis ignoring event type has low power to detect differences between treatments when correlations are small and similar in the 2 groups. For both settings, Gray's test is to be preferred in terms of power unless an extreme difference between correlations is anticipated.


 
Table 1. Simulation results for settings 7 (small adverse effect on type 1 failure, small beneficial effect on type 2 failure) and 8 (small adverse effect on type 1 failure, moderate beneficial effect on type 2 failure). Values given are the proportion of tests where the null hypothesis of no difference between treatments is rejected. A denotes new treatment, B standard. $W_{h_i}$ and $W_{I_i}$ denote the log-rank and Gray's test for event type $i$, respectively; $W_0$ denotes the log-rank test on the time to the first event irrespective of type. In the final 2 columns, values given are the proportion of simulated data sets where cause-specific tests for the 2 event types gave the correct result for both

 
Full results are available at http://www.biostatistics.oxfordjournals.org. Results confirm that both tests achieve the correct size when the correlation between event types is equal for both groups. For no effect on the first event type, a beneficial effect on the other, and independent events, the log-rank test achieves its nominal size, whereas Gray's test inflates the type 1 error for the first event type. A beneficial effect of one treatment on the second event type results in more events of type 1 being observed in that group, subsequently creating an apparent difference between the cumulative incidence for the 2 groups for the first event type. When event types are dependent in either treatment group, the size of both tests for the first event type is affected. The performance of Gray's test improves with increasing negative correlation, which has the effect of separating the time distributions for the 2 events. For similarly sized adverse effects on both event types, overall analysis ignoring the event type is the most powerful.
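The mechanism behind the apparent difference in cumulative incidence can be illustrated with a small calculation. The sketch below is not taken from the simulation study: it assumes independent exponential event times, for which the crude cumulative incidence of type 1 events with rates $\lambda_1$ and $\lambda_2$ is $\frac{\lambda_1}{\lambda_1 + \lambda_2}\left(1 - e^{-(\lambda_1 + \lambda_2)t}\right)$, and the unit baseline rates and time points are hypothetical.

```python
import math


def cum_inc_type1(t, rate1, rate2):
    """Crude cumulative incidence of type 1 events at time t, assuming
    independent exponential event times with the given rates."""
    total = rate1 + rate2
    return rate1 / total * (1.0 - math.exp(-total * t))


# Hypothetical rates: group B has unit hazards for both event types; group A
# has the same type 1 hazard but a type 2 hazard ratio of 0.625.
for t in (0.5, 1.0, 2.0):
    i1_b = cum_inc_type1(t, 1.0, 1.0)
    i1_a = cum_inc_type1(t, 1.0, 0.625)
    print(f"t={t}: I1 in B = {i1_b:.3f}, I1 in A = {i1_a:.3f}")
```

Although the type 1 cause-specific hazard is identical in the two groups, the type 1 cumulative incidence is higher in group A at every time point, because fewer patients are removed by type 2 events before a type 1 event can be observed.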

Table 2 shows results for both log-rank and Gray's tests for comparisons of AEDs in the SANAD trial. Conclusions are qualitatively similar. There was no evidence of a difference in drug withdrawal due to poor seizure control; however, lamotrigine was superior to carbamazepine in terms of withdrawal due to side effects. Topiramate was superior to gabapentin for drug withdrawal due to poor seizure control but inferior for withdrawal due to side effects. The P-values were considered to be sufficiently small to declare differences as significant despite the potential issues regarding multiple testing.


 
Table 2. Summary of results from SANAD trial. PP = per protocol

 

    4. DISCUSSION
 
Previous authors have suggested that the comparison of treatment effects should involve application of Gray's test (Marubini and Valsecchi, 2004, p. 357). However, others disagree (Freidlin and Korn, 2005). These latter authors concluded from a simulation study that the log-rank test was more appropriate due to the inflated type 1 error of Gray's test for a single event of interest when treatment affects only the other event type. Their simulation study was motivated by examples from the cancer field and examined the situation where only one event type was of interest and the events may be positively correlated. Although such a situation will apply in many clinical areas, settings such as those described in this paper in the epilepsy field, in which both event types are of interest and events may be negatively correlated, were not investigated. More recently, one group has commented "it is usually wise to use both approaches in order to avoid missing any important feature" (Gichangi and Vach, 2006).

We have undertaken a simulation study extending the nature of competing-risks settings investigated beyond those considered by previous authors. In particular, given the epilepsy application, we were interested in understanding the performance of both tests in settings where both event types are of interest, events may be negatively correlated, and the degree of correlation may differ in the 2 treatment groups.

Competing risks of drug withdrawal due to lack of efficacy and lack of tolerability have widespread applicability across other conditions such as Parkinson's disease, migraine, and neuropathic pain. Interest in more than one event type also extends to other areas such as bipolar disorder, where the effect of interventions on the type of first event, manic or depressive, is important. For the past 100 years, a unidimensional model has dominated perspectives on bipolar disorder, with depression and mania being assumed to exist at opposite ends of a single continuum (Gottschalk and others, 1995), implying that the 2 types of first event, manic or depressive, may be negatively correlated.

Differing correlation in the 2 groups is important to consider, with Gichangi and Vach suggesting that in a cancer trial "one therapy allows a moderate progression, such that, the risk of masking a relapse by a death is small, whereas the other therapy implies a rapid progression such that a relapse is likely to be masked by death." In the epilepsy example, the degree of negative association between efficacy and tolerability due to dose may differ between AEDs. The data collected do not provide any information regarding these correlations, and it is important to explore the effect of differing correlation on the performance of the hypothesis tests considered.

Gray's test has greater power than log-rank analysis to detect treatment differences in settings where there are opposite effects for the 2 event types, a likely situation for the SANAD trial. For this epilepsy application, conclusions were qualitatively similar for both log-rank and Gray's tests in all AED comparisons.

In addition to the overall analysis ignoring event type, it is recommended that both log-rank and Gray's tests be performed for the different event types, unless there is a strong rationale a priori in favor of one approach or the other, based either on previous knowledge or on a more precise definition of the question of interest regarding treatment effect. Graphical displays of the cause-specific hazard and cumulative incidence functions should be provided and should assist the explanation in circumstances where conclusions from the 2 tests differ. If a particular competing-risks setting is not covered by this or previous work, it is recommended that researchers undertake a simulation study to investigate how both tests perform in their setting of interest.


    ACKNOWLEDGMENTS
 
The authors would like to thank Boris Freidlin for providing his original simulation program, Anthony Marson and David Chadwick on behalf of the SANAD collaborators for providing the data, and a referee for their helpful comments. Ruwanthi Kolamunnage-Dona is funded on Medical Research Council Grant G0400615. Conflict of Interest: None declared.


    REFERENCES
 

    Commission on Antiepileptic Drugs. Considerations on designing clinical trials to evaluate the place of new antiepileptic drugs in the treatment of newly diagnosed and chronic patients with epilepsy. Epilepsia (1998) 39:799–803.

    Fine JP, Gray RJ. A proportional hazards model for the subdistribution of a competing risk. Journal of the American Statistical Association (1999) 94:496–509.

    Freidlin B, Korn EL. Testing treatment effects in the presence of competing risks. Statistics in Medicine (2005) 24:1703–1712.

    Gichangi A, Vach W. The analysis of competing risks data: a guided tour. Statistics in Medicine (Forthcoming 2006).

    Gottschalk A, Bauer MS, Whybrow PC. Evidence of chaotic mood variation in bipolar disorder. Archives of General Psychiatry (1995) 52:947–959.

    Kalbfleisch JD, Prentice R. The Analysis of Failure Time Data (1980) New York: John Wiley and Sons.

    Marson AG, Al-Kharusi AM, Alwaidh M, Appleton R, Baker GA, Chadwick DW, Cramp C, Cockerell OC, Cooper PN, Doughty J, and others. Carbamazepine, gabapentin, lamotrigine, oxcarbazepine or topiramate for epilepsy: results from arm A of the SANAD trial. Lancet (Forthcoming 2006).

    Marubini E, Valsecchi MG. Analysing Survival Data from Clinical Trials and Observational Studies (2004) New York: John Wiley and Sons.

    Received July 1, 2006; revised October 13, 2006; revised November 22, 2006; accepted for publication December 1, 2006.




