Biostatistics Advance Access originally published online on December 6, 2006
Biostatistics 2007 8(4):689-694; doi:10.1093/biostatistics/kxl040
The influence of competing-risks setting on the choice of hypothesis test for treatment effect
Centre for Medical Statistics and Health Evaluation, Shelley's Cottage, Brownlow Street, University of Liverpool, Liverpool L69 3GS, UK p.r.williamson{at}liverpool.ac.uk
* To whom correspondence should be addressed
| SUMMARY |
|---|
There is considerable debate regarding the choice of test for treatment difference in a randomized clinical trial in the presence of competing risks. This question arose in the study of standard and new antiepileptic drugs (SANAD) trial comparing new and standard antiepileptic drugs. This paper provides simulation results for the log-rank test comparing cause-specific hazard rates and Gray's test comparing cause-specific cumulative incidence curves. To inform the analysis of the SANAD trial, competing-risks settings were considered where both events are of interest, events may be negatively correlated, and the degree of correlation may differ in the 2 treatment groups. In settings where there are effects in opposite directions for the 2 event types, a likely situation for the SANAD trial, Gray's test has greater power to detect treatment differences than log-rank analysis. For the epilepsy application, conclusions were qualitatively similar for both log-rank and Gray's tests.
Keywords: Anti-epileptic drugs; Competing risks; Cumulative incidence; Hypothesis test; Logrank test
| 1. INTRODUCTION |
|---|
In the statistical literature, the situation where there are several reasons why an event can occur is known as "competing risks." For example, the International League Against Epilepsy has recommended that retention time, defined as the time to withdrawal of the randomized drug or addition of another, be one of the primary endpoints for clinical trials of antiepileptic drugs (AEDs) (Commission on Antiepileptic Drugs, 1998). Patients may decide to withdraw from such treatment because of unacceptable adverse effects (UAE) or switch to an alternative AED because of inadequate seizure control (ISC). A reduction in side effects may come at the expense of a reduction in seizure control, although it would be hoped that the latter would fall within the defined margin of equivalence. Overall analysis of retention time may miss such differential effects of AEDs on the reasons for withdrawal, which may differ in their relative importance for patients. An analysis of one event type that censors patients who suffer a different event type, at the time they experience that other event, may be misleading because such an analysis assumes that the competing risks are independent or, equivalently, that such censoring is noninformative (Kalbfleisch and Prentice, 1980).
If only administrative censoring, that is, censoring due to the end of the period of observation, occurs within a study, the correlation between censoring and events would be zero, and standard methods for handling censoring are appropriate. However, censoring may occur for reasons related to the treatment received, for example, which may induce correlation between censoring and the event of interest. In this situation, there are similar concerns about the assumption of noninformative censoring in standard methods, and censoring should also be considered a competing risk.
The motivation for this research is the study of standard and new antiepileptic drugs (SANAD) trial, which included time to overall withdrawal as one of its primary outcomes (Marson and others, 2006). The analysis of this trial involved a second competing-risks situation related to a further outcome, time to first seizure. Since newer AEDs have been developed with the aim of providing similar seizure control, SANAD was designed as an equivalence trial; thus, the analysis includes both an intention-to-treat and a per-protocol approach. In the latter, for patients withdrawn from their randomized AED, first seizure was censored at the point of AED change. Censoring will be due to withdrawal for UAE only. The likely correlation between the event types in both of these competing-risks settings is not clear, and it may be that the correlation differs between treatment groups. However, the data collected do not provide any information regarding these correlations since only data relating to the first event are observed.
Two common methods for testing treatment effects in competing-risks situations are the log-rank test, corresponding to a test of equality of cause-specific hazards, and Gray's test based on the cumulative incidence approach (Fine and Gray, 1999). The log-rank test censors other event types at the time they occur and assumes that the competing risks are independent. The approach based on cumulative incidence makes no such assumption.
This paper has 2 objectives: first, to provide simulation results for the log-rank and Gray's tests for a variety of competing-risks settings where both events are of interest, events may be negatively correlated, and the degree of correlation may differ in the 2 groups being compared; second, to demonstrate the approach to testing of treatment effects in the SANAD trial, as a result of this simulation study.
| 2. METHODS |
|---|
Consider a trial comparing a new treatment (A, say) with a standard (B, say) in the presence of 2 types of competing causes of events, so that each patient has an associated random vector $(T_1, T_2)$ corresponding to the times to type 1 and type 2 events. In practice, only the first event, occurring at $T = \min(T_1, T_2)$, and the event type indicator, $C = 1$ if $T_1 < T_2$ and $C = 2$ if $T_1 > T_2$, will be observed. The overall survival function is associated with the time to the first event, $S(t) = P[T > t]$. The cause-specific hazard is given by

$$h_i(t) = \lim_{\Delta t \to 0} \frac{P[t \le T < t + \Delta t,\; C = i \mid T \ge t]}{\Delta t}$$

and the crude cumulative incidence is

$$I_i(t) = P[T \le t,\; C = i] = \int_0^t h_i(u)\, S(u)\, du,$$

where $i$ denotes the event type.
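The crude cumulative incidence of a given event type, that is, the probability that the first event occurs by time t and is of that type, can be estimated nonparametrically from the observed pairs (T, C). The following is a minimal Python sketch of the standard product-limit style estimator, offered as an illustration rather than as the code used in this paper. The function name `cumulative_incidence` and the convention that cause 0 marks a censored observation are assumptions made for the example.

```python
from itertools import groupby


def cumulative_incidence(times, causes, cause):
    """Estimate the crude cumulative incidence curve for one event type.

    times  : observed time of first event or censoring for each patient
    causes : event type indicator (assumed coding: 0 = censored, 1 or 2 = event type)
    cause  : the event type whose cumulative incidence is wanted
    Returns a list of (time, estimate) pairs at each distinct observed time.
    """
    records = sorted(zip(times, causes), key=lambda r: r[0])
    at_risk = len(records)
    surv = 1.0  # Kaplan-Meier estimate of P[T > t] just before the current time
    cif = 0.0
    curve = []
    for t, tied in groupby(records, key=lambda r: r[0]):
        tied = list(tied)
        d_cause = sum(1 for _, c in tied if c == cause)
        d_any = sum(1 for _, c in tied if c != 0)
        # Probability of surviving all causes up to t, then failing from `cause` at t
        cif += surv * d_cause / at_risk
        # Update overall survival using events of any type
        surv *= 1.0 - d_any / at_risk
        at_risk -= len(tied)
        curve.append((t, cif))
    return curve
```

Without censoring, the final value of the curve equals the observed proportion of patients whose first event was of the chosen type, and the estimates for the two event types plus the overall survival estimate sum to 1 at every time point.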
Possible null hypotheses for comparing treatments A and B are the global null hypothesis $H_0: S_A(t) = S_B(t)$, corresponding to no relative benefit on overall events, and the cause-specific null hypotheses $H_{01}: S_{1A}(t) = S_{1B}(t)$ and $H_{02}: S_{2A}(t) = S_{2B}(t)$ for each event type. The notation follows Freidlin and Korn (2005). The null hypothesis $H_0: S_A(t) = S_B(t)$ is usually tested using the log-rank test on the time to the first event, denoted here by $W_0$. The cause-specific null hypotheses can be tested by (1) comparing cause-specific hazard rates $h_{iA}(t)$ and $h_{iB}(t)$ using the log-rank test on times to event type $i$, censoring the other event types at the times they occur, denoted by $W_i^{\mathrm{LR}}$, or (2) comparing cumulative incidence curves $I_{iA}(t)$ and $I_{iB}(t)$ using Gray's test, denoted by $W_i^{\mathrm{G}}$.
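The first of these approaches, the log-rank test on one event type with the other event type treated as censored, can be sketched in a few lines of Python. This is a minimal illustration of the standard two-group log-rank chi-square statistic with 1 degree of freedom, not the program used for the simulations. The function name `cause_specific_logrank`, the group labels "A" and "B", and the coding of cause 0 as censored are assumptions made for the example. Gray's test is not implemented here.

```python
import math
from itertools import groupby


def cause_specific_logrank(times, causes, groups, cause):
    """Two-group log-rank test for one event type.

    Observations whose cause differs from `cause` (another event type, or the
    assumed code 0 for censoring) leave the risk set without counting as events.
    Returns (chi-square statistic, two-sided P-value on 1 degree of freedom).
    """
    records = sorted(zip(times, causes, groups), key=lambda r: r[0])
    n_a = sum(1 for g in groups if g == "A")
    n_b = len(records) - n_a
    observed_a = expected_a = variance = 0.0
    for _, tied in groupby(records, key=lambda r: r[0]):
        tied = list(tied)
        n = n_a + n_b
        d = sum(1 for _, c, _ in tied if c == cause)
        d_a = sum(1 for _, c, g in tied if c == cause and g == "A")
        if d > 0 and n > 1:
            observed_a += d_a
            expected_a += d * n_a / n
            # Hypergeometric variance of the number of events in group A
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
        # Everyone observed at this time leaves the risk set afterwards
        left_a = sum(1 for _, _, g in tied if g == "A")
        n_a -= left_a
        n_b -= len(tied) - left_a
    if variance == 0.0:
        return 0.0, 1.0
    stat = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom
    return stat, math.erfc(math.sqrt(stat / 2.0))
```

Calling the function once with `cause=1` and once with `cause=2` gives the two cause-specific log-rank tests; recoding every event as a single cause gives the overall test on time to first event.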
A simulation study was undertaken to evaluate both tests, the log-rank test $W_i^{\mathrm{LR}}$ and Gray's test $W_i^{\mathrm{G}}$, for detecting treatment differences. As both event types were important in the SANAD trial, results are also presented in terms of the proportion of simulations where correct decisions were made for both event types, that is, rejection of the null hypothesis if a real effect was simulated and no rejection if a null effect was simulated. Data were simulated from a bivariate exponential distribution as described by Freidlin and Korn (2005). Event times were generated by applying a monotone increasing marginal transformation (chosen to achieve the required exponential distribution) to bivariate normal random variables.
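The data-generating step described above can be sketched as follows. This is a hedged illustration using only the Python standard library, not the original simulation program: correlated standard normal pairs are mapped through the normal distribution function to uniforms and then through the inverse exponential distribution function, both of which are monotone increasing. The function name `simulate_competing_risks`, the parameter names, and the reading of the correlation as applying on the normal scale are assumptions.

```python
import math
import random


def simulate_competing_risks(n, rate1, rate2, rho, seed=None):
    """Simulate n observations (T, C) from correlated exponential latent times.

    rate1, rate2 : marginal exponential hazards for event types 1 and 2
    rho          : correlation of the underlying bivariate normal pair
                   (assumed here to be the scale on which correlation is set)
    Returns a list of (time_of_first_event, event_type) tuples.
    """
    rng = random.Random(seed)
    observations = []
    for _ in range(n):
        # Correlated standard normal pair
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        latent = []
        for z, rate in ((z1, rate1), (z2, rate2)):
            # Standard normal distribution function gives a uniform variate
            u = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
            # Guard against 0 or 1 from floating-point rounding
            u = min(max(u, 1e-12), 1.0 - 1e-12)
            # Inverse exponential distribution function, increasing in u
            latent.append(-math.log1p(-u) / rate)
        t1, t2 = latent
        # Only the first event and its type are observed
        observations.append((min(t1, t2), 1 if t1 < t2 else 2))
    return observations
```

In this sketch a treatment effect on one event type could be imposed on the marginal scale by multiplying the corresponding rate by the chosen hazard ratio in the new-treatment group, and a different `rho` could be passed for each group to allow the correlation to differ between treatments.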
Hazard ratios of 1.2 and 1.6 were chosen to reflect small and moderate adverse effects, with the reciprocal values of 0.833 and 0.625 reflecting small and moderate beneficial effects of the new treatment relative to the standard on cause-specific time to drug withdrawal. No previous trials have analyzed the competing risks of AED withdrawal; thus, the choice of values was based on clinical guidance suggesting that hazard ratios of 1.1–1.2 may be considered to reflect equivalence, and effects larger than 1.6 were thought to be unlikely. The degree of correlation between drug withdrawal reasons may vary by AED, and hence data were simulated allowing the degree of correlation to differ in the 2 groups. Correlations were selected from the following: −0.8, −0.5, −0.3, −0.2, −0.1, 0, 0.1, 0.2, 0.3, 0.5, 0.8. The extreme values are not anticipated in the epilepsy setting but are provided for completeness. The simulated settings include no effect on either failure type, no effect on one failure type and an effect on the other, an adverse effect on both failure types, and an adverse effect on one failure type and a beneficial effect on the other.
| 3. RESULTS |
|---|
Table 1 shows the results most relevant to the SANAD trial: settings 7 and 8, defining withdrawal due to ISC as failure type 1 and withdrawal due to UAE as failure type 2, in which drug effects in opposite directions are anticipated for the 2 event types. Negative correlation may be anticipated here since side effects are more likely at higher doses, whereas seizure control is poorer at low doses. The overall analysis ignoring event type has low power to detect differences between treatments when correlations are small and similar in the 2 groups. For both settings, Gray's test is to be preferred in terms of power unless an extreme difference between correlations is anticipated.
Full results are available at http://www.biostatistics.oxfordjournals.org. Results confirm that both tests achieve the correct size when the correlation between event types is equal for both groups. For no effect on the first event type, a beneficial effect on the other, and independent events, the log-rank test achieves its nominal size, whereas Gray's test has an inflated type I error for the first event type. A beneficial effect of one treatment on the second event type results in more events of type 1 being observed in that group, which creates an apparent difference between the cumulative incidence curves of the 2 groups for the first event type. When event types are dependent in either treatment group, the size of both tests for the first event type is affected. The performance of Gray's test improves with increasing negative correlation, which has the effect of separating the time distributions for the 2 events. For similarly sized adverse effects on both event types, overall analysis ignoring the event type is the most powerful.
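The mechanism behind the inflated size of Gray's test can be made explicit in the simplest case. As an illustrative sketch, assume independent latent event times with constant hazards $\lambda_1$ and $\lambda_2$ for event types 1 and 2 (symbols introduced here for illustration, not taken from the simulation design). The overall survival function is then $S(t) = e^{-(\lambda_1 + \lambda_2) t}$, and the cumulative incidence of type 1 events is

$$I_1(t) = \int_0^t \lambda_1 e^{-(\lambda_1 + \lambda_2) u}\, du = \frac{\lambda_1}{\lambda_1 + \lambda_2}\left(1 - e^{-(\lambda_1 + \lambda_2) t}\right).$$

For fixed $\lambda_1$ and $t > 0$, this expression increases as $\lambda_2$ decreases. A treatment that lowers only the type 2 hazard therefore raises $I_1(t)$ even though the cause-specific hazard for type 1 is identical in both groups. A test of the cause-specific hazard is unaffected in this independent setting, whereas a test of cumulative incidence registers a difference.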
Table 2 shows results for both log-rank and Gray's tests for comparisons of AEDs in the SANAD trial. Conclusions are qualitatively similar. There was no evidence of a difference in drug withdrawal due to poor seizure control; however, lamotrigine was superior to carbamazepine in terms of withdrawal due to side effects. Topiramate was superior to gabapentin for drug withdrawal due to poor seizure control but inferior for withdrawal due to side effects. The P-values were considered to be sufficiently small to declare differences as significant despite the potential issues regarding multiple testing.
| 4. DISCUSSION |
|---|
Previous authors have suggested that the comparison of treatment effects should involve application of Gray's test (Marubini and Valsecchi, 2004).
We have undertaken a simulation study extending the nature of competing-risks settings investigated beyond those considered by previous authors. In particular, given the epilepsy application, we were interested in understanding the performance of both tests in settings where both event types are of interest, events may be negatively correlated, and the degree of correlation may differ in the 2 treatment groups.
Competing risks of drug withdrawal due to lack of efficacy and lack of tolerability have widespread applicability across other conditions such as Parkinson's disease, migraine, and neuropathic pain. Interest in more than one event type also extends to other areas such as bipolar disorder, where the effect of interventions on the type of first event, manic or depressive, is important. For the past 100 years, a unidimensional model has dominated perspectives on bipolar disorder, with depression and mania being assumed to exist at opposite ends of a single continuum (Gottschalk and others, 1995), implying that the 2 types of first event, manic or depressive, may be negatively correlated.
Differing correlation in the 2 groups is important to consider, with Gichangi and Vach suggesting that in a cancer trial "one therapy allows a moderate progression, such that, the risk of masking a relapse by a death is small, whereas the other therapy implies a rapid progression such that a relapse is likely to be masked by death." In the epilepsy example, the degree of negative association between efficacy and tolerability due to dose may differ between AEDs. The data collected do not provide any information regarding these correlations, and it is important to explore the effect of differing correlation on the performance of the hypothesis tests considered.
Gray's test has greater power than log-rank analysis to detect treatment differences in settings where there are opposite effects for the 2 event types, a likely situation for the SANAD trial. For this epilepsy application, conclusions were qualitatively similar for both log-rank and Gray's tests in all AED comparisons.
In addition to the overall analysis ignoring event type, it is recommended that both log-rank and Gray's tests be performed for the different event types, unless there is a strong rationale a priori in favor of one approach or the other, based either on previous knowledge or on a more precise definition of the question of interest regarding treatment effect. Graphical displays of the cause-specific hazard and cumulative incidence functions should be provided and should assist the explanation in circumstances where conclusions from the 2 tests differ. If a particular competing-risks setting is not covered by this or previous work, it is recommended that researchers undertake a simulation study to investigate how both tests perform in their setting of interest.
| ACKNOWLEDGMENTS |
|---|
The authors would like to thank Boris Freidlin for providing his original simulation program, Anthony Marson and David Chadwick on behalf of the SANAD collaborators for providing the data, and a referee for their helpful comments. Ruwanthi Kolamunnage-Dona is funded by Medical Research Council Grant G0400615. Conflict of Interest: None declared.
| REFERENCES |
|---|
Commission on Antiepileptic Drugs. Considerations on designing clinical trials to evaluate the place of new antiepileptic drugs in the treatment of newly diagnosed and chronic patients with epilepsy. Epilepsia (1998) 39:799–803.
Fine JP, Gray RJ. A proportional hazards model for the subdistribution of a competing risk. Journal of the American Statistical Association (1999) 94:496–509.
Freidlin B, Korn EL. Testing treatment effects in the presence of competing risks. Statistics in Medicine (2005) 24:1703–1712.
Gichangi A, Vach W. The analysis of competing risks data: a guided tour. Statistics in Medicine (Forthcoming 2006).
Gottschalk A, Bauer MS, Whybrow PC. Evidence of chaotic mood variation in bipolar disorder. Archives of General Psychiatry (1995) 52:947–959.
Kalbfleisch JD, Prentice R. The Analysis of Failure Time Data (1980) New York: John Wiley and Sons.
Marson AG, Al-Kharusi AM, Alwaidh M, Appleton R, Baker GA, Chadwick DW, Cramp C, Cockerell OC, Cooper PN, Doughty J, and others. Carbamazepine, gabapentin, lamotrigine, oxcarbazepine or topiramate for epilepsy: results from arm A of the SANAD trial. Lancet (Forthcoming 2006).
Marubini E, Valsecchi MG. Analysing Survival Data from Clinical Trials and Observational Studies (2004) New York: John Wiley and Sons.
Received July 1, 2006; revised October 13, 2006; revised November 22, 2006; accepted for publication December 1, 2006.