

Biostatistics Advance Access originally published online on April 14, 2005
Biostatistics 2005 6(3):374-394; doi:10.1093/biostatistics/kxi014

© The Author 2005. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oupjournals.org.

Failure time analysis of HIV vaccine effects on viral load and antiretroviral therapy initiation

Peter B. Gilbert*

Department of Biostatistics, University of Washington and Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, Seattle, WA 98109, USA pgilbert{at}scharp.org

Yanqing Sun

Department of Mathematics and Statistics, University of North Carolina at Charlotte, 9201 University City Boulevard, Charlotte, NC 28223, USA

* To whom correspondence should be addressed.


    SUMMARY
 TOP
 SUMMARY
 1. INTRODUCTION
 2. METHOD FOR CONSTRUCTING...
 3. SIMULATIONS
 4. EXAMPLE
 5. COMPLEMENTARY ASSESSMENTS OF...
 6. DISCUSSION
 APPENDIX
 REFERENCES
 
The world's first efficacy trial of a preventive HIV vaccine was completed in 2003. Study participants who became HIV infected were followed for 2 years and monitored for HIV viral load and initiation of antiretroviral therapy (ART). In order to determine if vaccination may have altered HIV progression in persons who acquired HIV, a pre-specified objective was to compare the time until a composite endpoint between the vaccine and placebo arms, where the composite endpoint is the first event of ART initiation or viral failure (HIV viral load exceeds a threshold xvl copies/ml). Specifically, with vaccine efficacy, VE(τ, xvl), defined as one minus the ratio (vaccine/placebo) of the cumulative probability of the composite endpoint (with failure threshold xvl) occurring by τ months, the aim was to estimate the four parameters {VE(τ, xvl): xvl ∈ {1500, 10 000, 20 000, 55 000} copies/ml} with simultaneous 95% confidence bands. A Gaussian multipliers simulation method is devised for constructing confidence bands for VE(τ, xvl) with xvl spanning multiple discrete values or a continuous range. The new method is evaluated in simulations and is applied to the vaccine trial data set.

Keywords: Gaussian multipliers technique; HIV vaccine efficacy trial; Kaplan–Meier estimator; Simultaneous confidence bands


    1. INTRODUCTION
 
Development of a preventive HIV vaccine (administered to HIV uninfected persons) is a global public health priority. A preventive vaccine may reduce morbidity and mortality due to HIV infection in at least three ways: (1) lower susceptibility to acquiring HIV infection, (2) decrease secondary transmission of HIV from vaccine recipients who become infected and (3) ameliorate HIV disease progression in vaccine recipients who become infected. Classically designed Phase III vaccine efficacy trials allow evaluation of vaccine effect (1), but do not allow direct evaluation of (2) and (3). This is the case because secondary transmission events are not observed, and the numbers of AIDS and death endpoints are low due to the several-year disease progression period of HIV and the ethical mandate to provide antiretroviral therapy (ART) to trial participants who acquire HIV (UNAIDS, 2001). Nonetheless, it is important to attempt to evaluate vaccine effects (2) and (3) within classically designed trials, because most licensed vaccines protect through these mechanisms (Murphy and Chanock, 1996; Clemens et al., 1997; Halloran et al., 1997; Clements-Mann, 1998), and a series of candidate vaccines are under development that are designed specifically to ameliorate transmission and disease post-acquisition of HIV (Nabel, 2001; Shiver et al., 2002; HVTN, 2004; IAVI, 2004).

In this article, we consider an indirect approach to assessing (2) and (3) within a classically designed efficacy trial. This objective is important because (i) classical designs are simpler and cheaper than augmented partners (Longini et al., 1996) and cluster-randomized (Halloran et al., 1997; Hayes, 1998) designs that would permit direct assessments of (2), (ii) no design is available for assessing (3) directly within a 2–4 year time frame and (iii) the first two completed HIV vaccine efficacy trials (rgp120 HIV Vaccine Study Group, 2004) and the ongoing efficacy trial in Thailand use a classical design.

In the indirect approach, we consider methods for evaluating vaccine effects on a biomarker variable measured post-HIV infection that is putatively a surrogate endpoint for secondary transmission and/or progression to clinical disease. The level of plasma HIV RNA (viral load) is an important putative surrogate endpoint, since it has been found to be highly prognostic for both of these endpoints in observational studies (cf. Mellors et al., 1997; HIV Surrogate Marker Collaborative Group, 2000; Quinn et al., 2000; Gray et al., 2001), and has been used as a primary endpoint in many ART trials (Gilbert et al., 2001). The completed and ongoing efficacy trials use viral load measurements as the basis for assessing vaccine effects on transmission and disease.

In addition to the complication that a vaccine effect to reduce biomarker levels may not predict a vaccine effect to reduce the rate of clinical endpoints (cf. Fleming, 1992; Fleming and DeMets, 1996; Albert et al., 1998), the assessment of viral load is complicated by the fact that some trial participants will likely receive ART following the diagnosis of HIV infection (DHHS Guidelines, 2002). The therapy will suppress viral replication to undetectable levels in many treated persons (DHHS Guidelines, 2002). Consequently, the comparison of viral load between vaccine and placebo recipients is confounded by the effect of ART. This complication can be avoided by basing the analysis only on viral load measurements made on blood samples drawn soon after the diagnosis of HIV infection and before the initiation of ART. Though useful, this analysis provides no direct information about the durability of the vaccine effect. Initial suppression of virus by vaccine may wane over time due to emergent HIV vaccine resistance mutations; this phenomenon has been observed in monkey challenge studies that evaluated leading HIV vaccine approaches (Barouch et al., 2002, 2003), and is a major potential problem for HIV vaccines in humans (Lukashov et al., 2002). Therefore, it is important to analyze study endpoints that capture longer-term vaccine effects on viral load.

In the first HIV vaccine efficacy trial, VAX004, the Statistical Analysis Plan (SAP) specified the main post-infection study endpoint as a composite endpoint, defined as either virologic failure (a rise in HIV viral load above a pre-specified failure threshold xvl copies/ml) or initiation of ART, whichever occurs first. This endpoint has recently been proposed for use as a co-primary endpoint (together with HIV infection) for efficacy trials of HIV vaccines designed to ameliorate viremia (Gilbert et al., 2003). The composite endpoint is directly tied to clinical events, because virologic failure places a subject at increased risk for AIDS and HIV transmission to others, and initiating ART exposes a patient to drug toxicities, drug resistance and the loss of future ART options (Hirsch et al., 2000; DHHS Guidelines, 2002). The composite endpoint measures the magnitude of viremic control through the choice of failure threshold xvl, with a vaccine effect on the endpoint with a lower threshold indicating greater suppression. In addition, the endpoint measures the durability of the vaccine effect by counting events during a sufficiently long period following the diagnosis of HIV. Virologic failure has been used as a primary endpoint in many clinical trials of ARTs for HIV infected persons (Gilbert et al., 2000, 2001).

An analytic advantage of the composite endpoint is that it can be assessed validly using standard survival analysis techniques such as Kaplan–Meier curves and log-rank tests. Such methods would yield biased inferences if applied to assess the time to virologic failure with censoring of subjects who initiate ART, because ART initiation is almost certainly associated with the risk of virologic failure, since physicians use information on viral load in decisions to prescribe ART (DHHS Guidelines, 2002).

The SAP for VAX004 specified analyzing a vaccine efficacy parameter, VE(τ, xvl), defined as one minus the ratio (vaccine/placebo) of cumulative probabilities of the composite endpoint occurring by τ = 12 months post-infection diagnosis. VE(τ, xvl) is interpreted as the percent reduction (vaccine versus placebo) in the cumulative risk of the composite endpoint by τ months. A parameter based on cumulative rather than instantaneous incidence rates was used in order to capture durability of the vaccine effect to 12 months. VE(τ, xvl) can be estimated using Kaplan–Meier estimates of the composite endpoint survival curves for the vaccine and placebo groups. The SAP specified making inferences on VE(τ, xvl) at the four thresholds xvl = 1500, 10 000, 20 000, 55 000. These thresholds were selected based on an HIV-discordant heterosexual partners study in Uganda, which showed that persons with viral load <1500 copies/ml rarely transmit (Quinn et al., 2000; Gray et al., 2001), and on the Multicenter AIDS Cohort Study (MACS), which demonstrated that the viral thresholds 1500, 10 000, 20 000 and 55 000 discriminated the risk of progressing to AIDS within 3 years after infection (DHHS Guidelines, 2002). Furthermore, the MACS population of men who have sex with men (MSM) is similar to the VAX004 study population, which was 94.3% MSM, and the MACS data provided the basis for the recent U.S. recommendations for when to initiate ART (DHHS Guidelines, 2002).

To control the Type I error rate, the SAP specified calculation of simultaneous confidence intervals for VE(τ, xvl), xvl = 1500, 10 000, 20 000 and 55 000, with 95% joint coverage probability. To our knowledge no solution to this problem exists in the literature, and we develop a solution here. Given that these four particular thresholds are not validated as important thresholds for measuring HIV vaccine effects, and that typically scant information is available a priori to predict how low the tested vaccine may be capable of suppressing viral load, it is also important to compute simultaneous confidence bands for VE(τ, xvl) with xvl varying over a continuous range. Such bands convey a full picture of the magnitude of vaccine efficacy, for example allowing identification of the threshold (if any) at which the lower simultaneous confidence limit crosses zero. Making inferences over a pre-specified interval of thresholds also avoids the need to guess at the discrete set of most important thresholds, and prevents post hoc cheating, i.e. selective reporting of VE(τ, xvl) estimates at the thresholds that yield the largest estimates. We develop a general procedure for constructing confidence bands that applies to both cases of xvl spanning discrete levels and a continuous range. Work related to the problem addressed here includes methodology for constructing confidence bands for a functional of two survival curves (Parzen et al., 1997) or of two cause-specific cumulative incidence functions (McKeague et al., 2001). These procedures approximated the distribution of interest using the Gaussian multipliers technique introduced by Lin et al. (1993); we also apply this technique.

How should the estimated VE(τ, xvl) curve be interpreted over a range of xvl values? First, note that the lower the threshold xvl at which there is efficacy, the more potent (and efficacious) the vaccine, as greater viral suppression predicts greater reductions in both disease progression and HIV transmissibility to others. Therefore, the lowest threshold at which the lower simultaneous confidence limit for VE(τ, xvl) exceeds 0 indicates the greatest potency of viral suppression that the vaccine provides with high confidence. Second, inference on VE(τ, xvl) at the threshold xvl at which starting ART is recommended (and offered/provided to trial participants) has important policy implications, because the efficacy parameter at this threshold has interpretation as the percent vaccine reduction in the fraction of persons who need ART by time τ. Third, albeit with interpretation complicated by ART initiation, the shape of the estimated curve VE(τ, xvl) reflects the mechanism by which vaccination impacts viral load. In the clearest case that trial participants adhere to the ART guidelines used in the trial, if the vaccine operates by lowering viral loads at all levels by a constant amount (i.e. a location-shift effect), then VE(t, xvl) is positive for all xvl. Under other mechanisms of vaccine effects, the efficacy can vanish to zero above a certain threshold xvl; for example this may occur if vaccination only impacts viral loads below a certain level.

This article is organized as follows. The procedure for generating simultaneous confidence bands is developed in Section 2, and is studied in simulations in Section 3. Section 4 applies the methods to the VAX004 data. Section 5 discusses alternative and complementary approaches to studying the composite endpoint. Section 6 provides discussion on how to apply the new method in future vaccine trials, and an Appendix contains theoretical details of the method.


    2. METHOD FOR CONSTRUCTING SIMULTANEOUS CONFIDENCE BANDS
 

2.1. Preliminaries and the estimand

With $\tau$ a fixed time point and $x_{vl}$ a fixed virologic failure threshold, define

$$ \mathrm{VE}(\tau, x_{vl}) = 1 - \frac{F_1(\tau, x_{vl})}{F_2(\tau, x_{vl})}, $$

where $F_1(\tau, x_{vl})$ ($F_2(\tau, x_{vl})$) is the cumulative probability that a vaccinated (placebo) subject fails virologically or starts treatment by $\tau$ months post-infection diagnosis. Let $T_{ki}$ be the times between infection diagnosis and treatment initiation and $Y_{ki}(t)$ be the viral loads at time $t$ for the $n_k$ infected subjects in group $k$ ($k = 1$, vaccine; $k = 2$, placebo). Assume that $\{Y_{ki}(t), T_{ki}\}$, $i = 1, \ldots, n_k$, are independent, identically distributed (iid) within each group, and the two samples are independent of one another. We also assume that $F_k(t, x_{vl})$ is continuous for $k = 1, 2$. The total number of infected subjects is $n = n_1 + n_2$. Let $n_k/n \to \rho_k$ with $0 < \rho_k < 1$. The goal is to construct simultaneous confidence bands for $\mathrm{VE}(\tau, x_{vl})$ for $x_{vl}$ spanning a pre-specified range of thresholds. The widest possible range is the one whose endpoints equal the lower and upper quantification limits of the viral load assay.

The time for subject $i$ in group $k$ to fail virologically given the virologic failure threshold $x_{vl}$ or to start treatment, whichever comes first, is $\tilde T_{ki}(x_{vl}) = \min\{\inf\{t : \sup_{0 \le s \le t} Y_{ki}(s) \ge x_{vl}\}, T_{ki}\}$. Let $C_{ki}$ be the censoring time for subject $i$ in group $k$, $X_{ki}(x_{vl}) = \min\{\tilde T_{ki}(x_{vl}), C_{ki}\}$, and $\delta_{ki}(x_{vl}) = I(\tilde T_{ki}(x_{vl}) \le C_{ki})$. We assume $\tilde T_{ki}(x_{vl})$ and $C_{ki}$ are independent for each $k$.

Throughout this article we define the time of virologic failure as the time of the first study visit at which the viral load is observed to equal or exceed xvl. Alternatively, this event time could be taken to be the true time at which viral load first exceeds xvl. This event time is interval censored, and the estimation of VE(t, xvl) could be biased if interval censoring is ignored. We restrict attention to the observable viral failure detection time because (i) it is clinically relevant to define failure at the clinic visit of failure detection, because this is the event observed by physicians that influences treatment decisions; (ii) the time of ART initiation is defined by the clinic visit at which ART is prescribed, so that using the clinic visit time for viral failure creates a cohesive definition of the composite endpoint event time and (iii) there is greatest interest in assessing VE(t, xvl) at the latest time point t = τ, and inferences on VE(τ, xvl) are minimally susceptible to bias from interval censoring, since interval censoring up to the last visit time prior to τ does not impact estimates of the proportion failing by τ.
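The visit-based definition of the composite endpoint can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, argument names and the representation of a subject's record as parallel lists of visit times and viral loads are hypothetical and do not come from the trial's analysis code.

```python
import math
from typing import Optional, Sequence, Tuple


def composite_endpoint(
    visit_times: Sequence[float],
    viral_loads: Sequence[float],
    art_time: Optional[float],
    threshold: float,
    censor_time: float,
) -> Tuple[float, int]:
    """Return (observed time, event indicator) for one subject.

    Virologic failure is dated at the first visit whose viral load is
    at or above `threshold`; the composite event is the earlier of that
    visit and ART initiation (hypothetical data layout).
    """
    # First visit with viral load >= threshold, or infinity if none.
    failure_time = next(
        (t for t, vl in zip(visit_times, viral_loads) if vl >= threshold),
        math.inf,
    )
    art = math.inf if art_time is None else art_time
    event_time = min(failure_time, art)
    # Observed time is the earlier of the event and the censoring time.
    observed = min(event_time, censor_time)
    return observed, int(event_time <= censor_time)
```

Raising the threshold can only delay or remove the virologic-failure component, so for a fixed subject the observed composite time is non-decreasing in the threshold.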

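For fixed $x_{vl}$, the cumulative probability that an infected subject in group $k$ fails virologically or starts treatment by time $\tau$ is equal to

$$ F_k(\tau, x_{vl}) = P\Big\{ \sup_{0 \le s \le \tau} Y_{ki}(s) \ge x_{vl} \ \text{or} \ T_{ki} \le \tau \Big\}. $$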

2.2. Estimation

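Let $S_k(\tau, x_{vl}) = 1 - F_k(\tau, x_{vl})$ be the survival function at time $\tau$ of the composite endpoint time $\tilde T_{ki}(x_{vl})$ (virologic failure at threshold $x_{vl}$ or treatment initiation, whichever comes first), and let $\hat S_k(\tau, x_{vl})$ be the Kaplan–Meier estimator of $S_k(\tau, x_{vl})$ based on $\{X_{ki}(x_{vl}), \delta_{ki}(x_{vl})\}$ for $i = 1, \ldots, n_k$, where $X_{ki}(x_{vl}) = \min\{\tilde T_{ki}(x_{vl}), C_{ki}\}$ is the observed time and $\delta_{ki}(x_{vl}) = I(\tilde T_{ki}(x_{vl}) \le C_{ki})$ is the event indicator. Then $\hat F_k(\tau, x_{vl}) = 1 - \hat S_k(\tau, x_{vl})$. Let $\hat\Lambda_k(\tau, x_{vl})$ be the Nelson–Aalen estimator for the cumulative hazard function $\Lambda_k(\tau, x_{vl}) = -\log S_k(\tau, x_{vl})$. For explicit forms of these estimators, we introduce the following notation. Let $N_{ki}(t, x_{vl}) = I(X_{ki}(x_{vl}) \le t, \delta_{ki}(x_{vl}) = 1)$, $R_{ki}(t, x_{vl}) = I(X_{ki}(x_{vl}) \ge t)$, $M_{ki}(t, x_{vl}) = N_{ki}(t, x_{vl}) - \int_0^t R_{ki}(s, x_{vl})\, d\Lambda_k(s, x_{vl})$, $N_k(t, x_{vl}) = \sum_{i=1}^{n_k} N_{ki}(t, x_{vl})$ and $R_k(t, x_{vl}) = \sum_{i=1}^{n_k} R_{ki}(t, x_{vl})$. Let $r_k(t, x_{vl}) = P\{X_{ki}(x_{vl}) \ge t\}$. The Nelson–Aalen estimator for the given $x_{vl}$ is then

$$ \hat\Lambda_k(\tau, x_{vl}) = \int_0^\tau \frac{dN_k(s, x_{vl})}{R_k(s, x_{vl})}. $$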
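As a minimal sketch of the point estimate, the following Python computes a Kaplan–Meier survival estimate at a fixed time and plugs the two groups into the definition of vaccine efficacy as one minus the ratio of cumulative failure probabilities. The function names are hypothetical, the inputs are the observed times and event indicators for one threshold, and no variance is computed here.

```python
from typing import Sequence


def km_survival(times: Sequence[float], events: Sequence[int], tau: float) -> float:
    """Kaplan-Meier estimate of the survival probability at time tau."""
    surv = 1.0
    # Distinct event times up to tau, in increasing order.
    event_times = sorted({t for t, d in zip(times, events) if d == 1 and t <= tau})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        failed = sum(1 for u, d in zip(times, events) if u == t and d == 1)
        surv *= 1.0 - failed / at_risk
    return surv


def vaccine_efficacy(
    vac_times: Sequence[float],
    vac_events: Sequence[int],
    plc_times: Sequence[float],
    plc_events: Sequence[int],
    tau: float,
) -> float:
    """VE = 1 - F1/F2, with Fk = 1 - KM survival in group k at tau.

    Raises ZeroDivisionError if no placebo failure is estimated by tau.
    """
    f_vaccine = 1.0 - km_survival(vac_times, vac_events, tau)
    f_placebo = 1.0 - km_survival(plc_times, plc_events, tau)
    return 1.0 - f_vaccine / f_placebo
```

Evaluating `vaccine_efficacy` over a grid of thresholds, with the observed times and indicators recomputed for each threshold, traces out an estimated efficacy curve.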

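It is well known that for the given value of $x_{vl}$, we have the following martingale representation for the Kaplan–Meier estimator (Fleming and Harrington, 1991):

$$ \sqrt{n_k}\,\{\hat S_k(\tau, x_{vl}) - S_k(\tau, x_{vl})\} = -S_k(\tau, x_{vl})\, n_k^{-1/2} \sum_{i=1}^{n_k} \int_0^\tau \frac{dM_{ki}(s, x_{vl})}{r_k(s, x_{vl})} + o_p(1). \qquad (2.1) $$

It is shown in the Appendix that (2.1) holds uniformly in $x_{vl}$ over the threshold range and that the left-hand side of (2.1) converges in distribution to a mean-zero normal random variable with variance equal to $\sigma_k^2(\tau, x_{vl}) = S_k^2(\tau, x_{vl}) \int_0^\tau d\Lambda_k(s, x_{vl}) / r_k(s, x_{vl})$. In the absence of censoring, $\sigma_k^2(\tau, x_{vl})$ reduces to $F_k(\tau, x_{vl})\, S_k(\tau, x_{vl})$. The asymptotic variance can be consistently estimated by

$$ \hat\sigma_k^2(\tau, x_{vl}) = n_k\, \hat S_k^2(\tau, x_{vl}) \int_0^\tau \frac{dN_k(s, x_{vl})}{R_k^2(s, x_{vl})}, $$

where $N_k$ and $R_k$ are the sums over subjects in group $k$ of the counting processes $N_{ki}$ and the at-risk indicators $R_{ki}$.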
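The no-censoring reduction can be verified with a short worked derivation. It is a sketch that assumes the asymptotic variance has the standard Kaplan–Meier form $S_k^2 \int_0^\tau d\Lambda_k / r_k$ (the label $\sigma_k^2$ is chosen here for illustration), and it uses two facts: without censoring the at-risk probability $r_k(s, x_{vl})$ equals $S_k(s, x_{vl})$, and for a continuous distribution $d\Lambda_k = -dS_k / S_k$.

```latex
\begin{aligned}
\sigma_k^2(\tau, x_{vl})
  &= S_k^2(\tau, x_{vl}) \int_0^\tau \frac{d\Lambda_k(s, x_{vl})}{S_k(s, x_{vl})}
   = S_k^2(\tau, x_{vl}) \int_0^\tau \frac{-\,dS_k(s, x_{vl})}{S_k^2(s, x_{vl})} \\
  &= S_k^2(\tau, x_{vl}) \left\{ \frac{1}{S_k(\tau, x_{vl})} - 1 \right\}
   = S_k(\tau, x_{vl})\, F_k(\tau, x_{vl}).
\end{aligned}
```

This is the familiar binomial variance of a proportion, as expected when every subject's event status at $\tau$ is observed.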

For ease of notation, in what follows, we drop the first component {tau} in the functions. Then

(2.2)

uniformly in $x_{vl}$ over the pre-specified threshold range.

2.3. Pointwise confidence bands for VE(xvl)

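It follows from the central limit theorem that for each fixed $x_{vl}$, $U(x_{vl})$ converges in distribution to a mean-zero normal random variable with variance $\sigma^2(x_{vl})$, which can be estimated by $\hat\sigma^2(x_{vl})$ obtained by replacing $\rho_k$ with $n_k/n$, $F_k(x_{vl})$ with $\hat F_k(x_{vl})$ and the asymptotic variance $\sigma_k^2(x_{vl})$ of the Kaplan–Meier estimator with its consistent estimator $\hat\sigma_k^2(x_{vl})$. Let $\widehat{\mathrm{VE}}(x_{vl}) = 1 - \hat F_1(x_{vl}) / \hat F_2(x_{vl})$. Large sample $100(1 - \alpha)\%$ pointwise confidence bands for $\mathrm{VE}(x_{vl})$ at $x_{vl}$ are given by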

(2.3)

where $z_{\alpha/2}$ is the upper $\alpha/2$ quantile of a standard normal distribution.
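As a sketch of where the pointwise variance and band come from, the delta method can be applied to the ratio $F_1/F_2$. The display below is a reconstruction under assumptions, not a quotation of the original formulas: it takes $U(x_{vl})$ to be $\sqrt{n}\,\{\hat F_1(x_{vl})/\hat F_2(x_{vl}) - F_1(x_{vl})/F_2(x_{vl})\}$ and $\sigma_k^2(x_{vl})$ to be the asymptotic variance of $\sqrt{n_k}\,\{\hat F_k(x_{vl}) - F_k(x_{vl})\}$, so that $\sqrt{n}\,\{\hat F_k - F_k\}$ has asymptotic variance $\sigma_k^2/\rho_k$.

```latex
\begin{aligned}
\sigma^2(x_{vl})
  &= \frac{\sigma_1^2(x_{vl})}{\rho_1\, F_2^2(x_{vl})}
   + \frac{F_1^2(x_{vl})\, \sigma_2^2(x_{vl})}{\rho_2\, F_2^4(x_{vl})}, \\
\widehat{\mathrm{VE}}(x_{vl}) &\pm n^{-1/2}\, z_{\alpha/2}\, \hat\sigma(x_{vl}).
\end{aligned}
```

The two terms are the squared partial derivatives of the ratio with respect to $F_1$ and $F_2$ (namely $1/F_2$ and $-F_1/F_2^2$) multiplied by the corresponding variances; independence of the two groups removes any cross term.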

2.4. Simultaneous confidence bands for VE(xvl)

From (2.1) and (2.2), we have

(2.4)

Let Z1i, Z2j, i = 1, ..., n1, j = 1, ..., n2, be iid standard normal random variables. Let

(2.5)

It is shown in the Appendix that $U(x_{vl})$ converges weakly to a mean-zero Gaussian process over the threshold range and that, conditional on the observed data, the process $U^*(x_{vl})$ converges weakly to the same limiting Gaussian process as $U(x_{vl})$. Also, by the uniform almost sure convergence of $\hat\sigma(x_{vl})$ to $\sigma(x_{vl})$ over the threshold range, it follows that

(2.6)

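where $P^*\{A\}$ is the conditional probability of $A$ given the observed data sequence. Let $c_{\alpha/2}$ be the asymptotic $1 - \alpha$ quantile of the supremum of $|U(x_{vl})/\sigma(x_{vl})|$ over the threshold range. Let $U^*_b(x_{vl})$, $b = 1, \ldots, B$, be $B$ independent copies of $U^*(x_{vl})$, obtained by repeatedly generating independent sets of iid standard normal random variables $\{Z_{1i}, Z_{2j}, i = 1, \ldots, n_1, j = 1, \ldots, n_2\}$ while holding the observed data fixed. The quantile $c_{\alpha/2}$ can be estimated consistently by the $1 - \alpha$ quantile of the set of suprema of $|U^*_b(x_{vl})/\hat\sigma(x_{vl})|$ over the threshold range, $b = 1, \ldots, B$. Large sample $100(1 - \alpha)\%$ uniform confidence bands for $\mathrm{VE}(x_{vl})$ over the threshold range are then given by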

(2.7)
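The resampling step can be illustrated with a generic Gaussian multiplier sketch in Python. Several things here are assumptions rather than the article's formulas: the inputs `infl1` and `infl2` are hypothetical per-subject contributions to the process at each threshold of a grid (the article's own expression for the multiplier process is not reproduced), the band is taken to have the form estimate plus or minus critical value times standard error over root n, and the supremum is approximated by a maximum over the grid.

```python
import math
import random
from typing import List, Sequence, Tuple


def multiplier_critical_value(
    infl1: Sequence[Sequence[float]],
    infl2: Sequence[Sequence[float]],
    sigma_hat: Sequence[float],
    alpha: float = 0.05,
    n_draws: int = 1000,
    seed: int = 0,
) -> float:
    """Estimate the 1 - alpha quantile of max_j |U*(x_j)| / sigma_hat[j].

    infl1[i][j] and infl2[i][j] are hypothetical per-subject terms whose
    multiplier-weighted sum gives U*(x_j); the data stay fixed and only
    the standard normal multipliers are redrawn.
    """
    rng = random.Random(seed)
    sups: List[float] = []
    for _ in range(n_draws):
        z1 = [rng.gauss(0.0, 1.0) for _ in infl1]
        z2 = [rng.gauss(0.0, 1.0) for _ in infl2]
        sup = 0.0
        for j, sd in enumerate(sigma_hat):
            u_star = sum(row[j] * z for row, z in zip(infl1, z1))
            u_star += sum(row[j] * z for row, z in zip(infl2, z2))
            sup = max(sup, abs(u_star) / sd)
        sups.append(sup)
    sups.sort()
    index = min(n_draws - 1, math.ceil((1.0 - alpha) * n_draws) - 1)
    return sups[index]


def simultaneous_band(
    ve_hat: Sequence[float], sigma_hat: Sequence[float], n: int, crit: float
) -> List[Tuple[float, float]]:
    """Band of the assumed form VE_hat +/- crit * sigma_hat / sqrt(n)."""
    return [
        (v - crit * s / math.sqrt(n), v + crit * s / math.sqrt(n))
        for v, s in zip(ve_hat, sigma_hat)
    ]
```

With a single threshold the critical value should be close to the usual normal quantile; with several correlated thresholds it is larger, but typically smaller than a Bonferroni-adjusted value.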


    3. SIMULATIONS
 
A complicated question is how to simulate viral loads and the times to treatment initiation in the most realistic way. The time to treatment initiation depends heavily on the current science on when to start ART and on the policy that is used to provide treatment for infected trial participants; these factors vary over time and with the geographic region of the trial. Current science suggests that individuals with high viral load and/or low CD4 cell counts should start treatment. In particular, U.S. guidelines recommend starting treatment when viral load > 55 000 copies/ml or when CD4 < 350 cells/mm3 (DHHS Guidelines, 2002). For trials in developed countries, considerable heterogeneity in treatment initiation among infected individuals is expected; some will follow the guidelines and others will start treatment apart from the guidelines. In contrast, trials in developing countries are expected to operate under strict standardized guidelines that are adhered to by most or all infected participants.

3.1. Simulation model setup

We develop a simulation model based on the viral load and treatment initiation data from the VAX004 trial:

  1. n = 347 infected subjects, n1 = 225 in group 1 (vaccine) and n2 = 122 in group 2 (placebo).
  2. Subjects are followed for 24 months after the diagnosis of HIV infection.
  3. 20% random dropout prior to the composite endpoint by 24 months for each group.
  4. Viral loads are measured from samples drawn at times near nine scheduled visits at Months 0.5, 1, 2, 4, 8, 12, 16, 20 and 24 post-infection diagnosis, denoted by tj, 1 ≤ j ≤ 9. The actual visit times in months for each individual are normally distributed with means at the scheduled times. Specifically, for the ith individual in group k, the jth visit time tkij is distributed as N(tj, σj²), where σ1 = 0.05, σ2 = 0.06, σ3 = 0.10 and σj = 0.12 for j = 4, ..., 9.
  5. The viral loads (log10 transformed) from a subject in the placebo group satisfy a standard linear mixed effects (lme) model,

(3.1)

where (ß0, ß1, ß2, ß3, ß4)T = (4.3884, –0.2808, 0.0363, –0.0019, 0.000035)T are fixed effects parameters. The random effects (r0i, r1i)T have a bivariate normal distribution with mean 0 and covariance matrix given by Var(r0i) = 0.4745, Var(r1i) = 0.00233 and Cov(r0i, r1i) = –0.0138. The measurement errors εi(tkij) are iid with mean 0 and variance 0.4977.

The viral load processes for the vaccine group are simulated in three ways:

(a) null model (denoted by NULL) where the viral load processes follow (3.1);
(b) constant mean shift model (denoted by CONS) with a mean shift of svl at all 9 time points, lower in vaccine than placebo. We take svl = 0.33 and 0.5 on the log10 scale;
(c) non-constant mean shift model (denoted by NCONS) with a mean shift of svl lower at Months 0.5, 1 and 2, mean 0.5svl lower at Month 4 and 0 lower at Months 8, 12, 16, 20, 24. For this scenario the vaccine initially lowers viral load, but then vaccine resistance develops, which ruins the suppression.
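The viral load simulation can be sketched as follows. The functional form is an assumption: the five fixed effects are read here as the coefficients of a quartic polynomial in months since diagnosis, with a random intercept and a random slope, which is one natural reading of the parameter list but is not confirmed by the text. The visit-time parameters are treated as standard deviations, the errors are drawn as Gaussian, and the `shift` argument covers the NULL model (shift 0) and the CONS model (constant shift svl); the NCONS model is not implemented.

```python
import math
import random
from typing import List, Tuple

# Parameter values quoted in the simulation setup.
BETA = (4.3884, -0.2808, 0.0363, -0.0019, 0.000035)
VAR_R0, VAR_R1, COV_R01 = 0.4745, 0.00233, -0.0138
VAR_EPS = 0.4977
SCHEDULE = (0.5, 1, 2, 4, 8, 12, 16, 20, 24)
VISIT_SD = (0.05, 0.06, 0.10) + (0.12,) * 6  # assumed to be standard deviations


def mean_log10_vl(t: float, shift: float = 0.0) -> float:
    """Population mean log10 viral load at month t (assumed quartic form)."""
    return sum(b * t**p for p, b in enumerate(BETA)) - shift


def simulate_subject(rng: random.Random, shift: float = 0.0) -> List[Tuple[float, float]]:
    """Return [(visit time, log10 viral load)] for one simulated subject."""
    # Correlated random intercept and slope via a 2x2 Cholesky factor.
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    r0 = math.sqrt(VAR_R0) * z1
    r1 = COV_R01 / math.sqrt(VAR_R0) * z1
    r1 += math.sqrt(VAR_R1 - COV_R01**2 / VAR_R0) * z2
    visits = []
    for t_sched, sd in zip(SCHEDULE, VISIT_SD):
        t = rng.gauss(t_sched, sd)  # actual visit time near the scheduled one
        y = mean_log10_vl(t, shift) + r0 + r1 * t + rng.gauss(0.0, math.sqrt(VAR_EPS))
        visits.append((t, y))
    return visits
```

For example, `simulate_subject(random.Random(1), shift=0.5)` draws one vaccine-group subject under the CONS model with svl = 0.5.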

Once the simulation process for viral load is set, the time to treatment initiation is generated in one of two ways: (i) (INDEP) independent of viral load and CD4 cell count and (ii) (DEP) dependent on viral load and CD4 cell count.

(i) INDEP of biomarkers. The times to treatment initiation are simulated from exponential distributions in each group with approximate probability of starting treatment by 24 months, 0.5 in the placebo group and 0.5 (null case) or 0.25 (alternative cases) in the vaccine group.
(ii) DEP on biomarkers. Based on the U.S. treatment guidelines (DHHS Guidelines, 2002), subjects whose CD4 counts decline to low levels (<350 cells/mm3) have a high chance of starting ART, subjects whose CD4 counts decline to moderate levels (<500 cells/mm3) have a moderate chance of starting ART, subjects whose viral load becomes high (>55 000 copies/ml) have a moderate chance of starting treatment and subjects whose CD4 stays above 500 cells/mm3 and viral load stays below 55 000 copies/ml have a low chance of starting treatment. These ideas can be formalized by first simulating a CD4 process for each subject. Fitting a simple lme model to the real CD4 count data from VAX004 yields the following setup. There are two fixed effects parameters, the intercept ß0 = 627.9 and slope ß1 = –0.203. There are two random effects that represent subject-specific intercepts (b0) and slopes (b1), which have a bivariate normal distribution with mean 0 and Var(b0) = 41375.0, Var(b1) = 102.9 and Cov(b0, b1) = –635.6. The Gaussian error ε has mean 0 and variance 15724.9.
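For the INDEP model in item (i), the exponential rate that gives a target probability p of starting treatment by 24 months follows from 1 − exp(−24λ) = p, so λ = −log(1 − p)/24. The small Python sketch below, with hypothetical function names, computes that rate and draws a treatment initiation time; it gives about 0.0289 per month for p = 0.5 and about 0.0120 per month for p = 0.25.

```python
import math
import random


def exponential_rate(prob_by_horizon: float, horizon_months: float = 24.0) -> float:
    """Rate (per month) so that P(T <= horizon) equals prob_by_horizon."""
    return -math.log(1.0 - prob_by_horizon) / horizon_months


def draw_art_time(
    rng: random.Random, prob_by_horizon: float, horizon_months: float = 24.0
) -> float:
    """Draw a time to treatment initiation, independent of the biomarkers."""
    return rng.expovariate(exponential_rate(prob_by_horizon, horizon_months))
```

Times beyond 24 months simply mean that the subject does not start treatment during follow-up.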

For simulations with a vaccine effect to lower viral load by mean 0.33 (0.5), we assume a vaccine effect to increase the mean CD4 count by 100 (150) cells/mm3. Simulation configurations with no vaccine effect on viral load also have no vaccine effect on CD4 cell count.

At each visit time, the probability of starting ART within the next month (for visits at Months 0.5, 1, 2) and within the next 2 months (for visits at Months 4, 8, 12, 16, 20) is set as a function of the current CD4 count and viral load. Specifically, the probabilities of ART initiation at a visit during the next 1 or 2 month interval are fixed as follows:

CD4 count (cells/mm3)    Viral load (copies/ml)    Probability of ART initiation
CD4 ≤ 350                VL > 55 000               0.7
CD4 ≤ 350                VL ≤ 55 000               0.3
350 < CD4 ≤ 500          VL > 55 000               0.1
350 < CD4 ≤ 500          VL ≤ 55 000               0.05
CD4 > 500                VL > 55 000               0.02
CD4 > 500                VL ≤ 55 000               0.01

Under this scenario, about 40% in the placebo group and 25–40% in the vaccine group start treatment by 24 months.
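The DEP initiation rule in the table above can be written as a small lookup. The sketch below is illustrative only: the function names are hypothetical, and the helper that walks through the visits assumes one independent Bernoulli draw per visit using that visit's CD4 count and viral load.

```python
import random
from typing import Optional, Sequence


def art_start_probability(cd4: float, viral_load: float) -> float:
    """Probability of starting ART in the interval following a visit."""
    high_vl = viral_load > 55_000
    if cd4 <= 350:
        return 0.7 if high_vl else 0.3
    if cd4 <= 500:
        return 0.1 if high_vl else 0.05
    return 0.02 if high_vl else 0.01


def draw_art_visit(
    rng: random.Random, cd4_by_visit: Sequence[float], vl_by_visit: Sequence[float]
) -> Optional[int]:
    """Index of the first visit after which ART starts, or None if never."""
    for j, (cd4, vl) in enumerate(zip(cd4_by_visit, vl_by_visit)):
        if rng.random() < art_start_probability(cd4, vl):
            return j
    return None
```

Because the probability depends on the simulated biomarkers, a vaccine effect on viral load or CD4 count carries over into the treatment initiation component of the composite endpoint.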

For a single data set randomly generated under each scenario defined by the INDEP and DEP models of ART initiation crossed with the NULL, CONS(2) and NCONS(2) models of viral load ((2) denotes a mean shift of svl = 0.5), Figure 1 illustrates 95% confidence bands for VE(14, xvl) for xvl ∈ [1500, 55 000].



 
Fig. 1. For a single data set randomly generated under the scenarios (a) INDEP, NULL; (b) INDEP, CONS(2); (c) INDEP, NCONS(2); (d) DEP, NULL; (e) DEP, CONS(2) and (f) DEP, NCONS(2) described in Section 3.1, the plots show the estimate of VE(14, xvl) (solid lines) with 95% pointwise (dotted lines) and simultaneous (dashed lines) confidence bands, for xvl ∈ [1500, 55 000] on the log10 scale.

 
3.2. Coverage probability and empirical power

To evaluate the coverage probability of the confidence bands and the ability of the bands to identify non-zero vaccine efficacy, we consider testing the following hypotheses:

H0i: VE(14, xvl) = 0 for all xvl ∈ Ri versus Hai: VE(14, xvl) ≠ 0 for some xvl ∈ Ri,

where Ri, i = 1, ..., 8, represent the following ranges of xvl: R1, xvl ∈ [1500, 55 000]; R2, xvl ∈ [10 000, 55 000]; R3, xvl ∈ {1500, 10 000, 20 000, 55 000}; R4, xvl ∈ {10 000, 55 000}; R5, xvl = 1500; R6, xvl = 10 000; R7, xvl = 20 000; R8, xvl = 55 000. Since the null hypothesis H0i is rejected if and only if the confidence bands for xvl ∈ Ri exclude zero at one or more thresholds xvl, assessing these eight scenarios informs on the coverage probability of the confidence bands. In addition, evaluating these scenarios informs the power/precision trade-offs for various ways of conducting the analysis. When designing the real trial we struggled with the question of what was the best range of thresholds to study.

We propose two types of test statistics for testing H0i versus Hai: a supremum statistic Si, taken over xvl ∈ Ri, and a sum/integrated square statistic Qi, which sums or integrates over Ri depending on whether Ri is a finite set or a continuous interval. The null hypothesis is rejected for large values of the test statistics. The supremum tests Si are known to be omnibus but may have lower power because of lack of specificity to particular alternatives. The sum/integrated square tests Qi combine information across thresholds and are more powerful against monotone alternatives where the vaccine always improves over the placebo.

For the pointwise tests corresponding to i = 5, 6, 7, 8, the tests Si and Qi at significance level α are equivalent to a normal test with the test statistic Z = U(xvl)/σ̂(xvl) and rejection region |Z| > zα/2. For simultaneous tests corresponding to i = 1, 2, 3, 4, the critical values ciα/2 for Si are estimated by the 1 − α quantile of the values of the supremum statistic computed from the B multiplier replicates of U*(xvl). The critical values for Qi are estimated by the 1 − α quantile of the corresponding sum or integrated square statistics computed from the same replicates, depending on whether Ri is a finite set or continuous interval. When Ri is discrete with m thresholds, a computationally simple alternative to the above Gaussian multiplier approach is to use m normal statistics Z (one for each xvl in Ri) and to apply the Bonferroni correction to determine significance. Such a procedure is likely to be conservative, resulting in wider intervals and reduced power, especially when Ri contains a large number of threshold values. Table 1 describes the empirical sizes and powers of the supremum tests and the sum/integrated square tests using Gaussian multiplier critical values, and Table 2 shows the results for the normal tests, with Bonferroni correction when m > 1. Each entry in Tables 1 and 2 is calculated based on 1000 repetitions and B = 1000.
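The cost of the Bonferroni alternative is easy to quantify. As a minimal sketch (the function name is hypothetical), the two-sided critical value for m simultaneous normal tests at overall level α is the upper α/(2m) standard normal quantile; for m = 4 and α = 0.05 this is roughly 2.50, compared with 1.96 for a single test.

```python
from statistics import NormalDist


def bonferroni_critical_value(alpha: float, m: int) -> float:
    """Two-sided normal critical value with Bonferroni correction for m tests."""
    return NormalDist().inv_cdf(1.0 - alpha / (2.0 * m))
```

A multiplier-based critical value for the same m thresholds accounts for the correlation among the statistics and so would typically fall between the single-test value and the Bonferroni value.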


 
Table 1. Empirical sizes and powers × 100 of the supremum tests Si and sum/integrated square tests Qi: the models under which the data were simulated [NULL, CONS(1), CONS(2), NCONS(1) and NCONS(2)] are defined in Section 3.1. svl = 0.33 in models CONS(1) and NCONS(1) and svl = 0.5 in models CONS(2) and NCONS(2)

 

 
Table 2. Empirical sizes and powers × 100 of the normal tests, with Bonferroni correction for R3 and R4: the models under which the data were simulated [NULL, CONS(1), CONS(2), NCONS(1) and NCONS(2)] are defined in Section 3.1. svl = 0.33 in models CONS(1) and NCONS(1) and svl = 0.5 in models CONS(2) and NCONS(2)

 
Based on the NULL simulations, the confidence band procedures consistently have sizes near the nominal 0.05 level. An exception is for the null hypothesis R5 in Table 2, for which the empirical size is 0.008 and 0.015 for the INDEP and DEP cases, respectively. The low size occurs because when xvl = 1500, almost every subject fails by τ = 14 months, so that the risk sets R1(t, xvl) and R2(t, xvl) in formulas (2.4) and (2.5) are very small near τ. The tiny risk sets cause the asymptotic approximation to be unreliable.

Based on the non-null simulations, the following observations were made regarding the comparative power for evaluating VE(14, xvl) in the ranges R1, ..., R8. First, the sum/integrated square test has slightly higher power than the supremum test for thresholds in R1, R2, R3 or R4. For R3 and R4, both tests show greater power than the normal tests with Bonferroni correction. Second, power is comparable for R1 through R4 under each test; therefore, in practice fixed thresholds can be added without appreciably compromising power. Third, for hypothesis tests at single threshold values R5 through R8, the power increases with the magnitude of the threshold. Along the lines described above, this result occurs because almost all subjects fail by time τ when the threshold xvl is relatively small. Power was consistently lower for the DEP versus INDEP simulations, which occurs because the alternative hypothesis is closer to the null hypothesis for the DEP simulations. Finally, as expected, power was consistently higher for the simulations with true viral load mean shift of svl = 0.5 compared to svl = 0.33.


    4. EXAMPLE
The world's first HIV vaccine efficacy trial (VAX004) was conducted in North America and the Netherlands from 1998 to 2003 (rgp120 HIV Vaccine Study Group, 2004). Participants were randomized in a 2:1 ratio to receive the subunit protein vaccine AIDSVAX (3598 subjects) or a blinded placebo (1805 subjects). Participants were immunized at Months 0, 1, 6, 12, 18, 24 and 30 post-randomization, and were tested for HIV infection at Months 6, 12, 18, 24, 30 and 36. Subjects diagnosed with HIV infection were re-consented and followed on a Month 0.5, 1, 2, 4, 8, 12, 16, 20 and 24 post-infection diagnosis visit schedule. At each of these visits, HIV viral load and status of ART initiation were recorded. The comprehensive results of the analyses of the data in VAX004 will be presented in clinical journals (including rgp120 HIV Vaccine Study Group, 2004); here we present the subset of the results needed to demonstrate and apply the statistical methodology developed in this article.

The primary objective of the trial was to assess whether vaccination reduced the rate of HIV infection. Unfortunately, it did not, as 7% of participants were infected in each study arm (vaccine: 241/3598 infected; placebo: 127/1805 infected). The secondary objective, of interest for this article, was to assess whether vaccination altered the course of HIV progression. Of the 368 infected subjects, 347 enrolled into the post-infection cohort and are analyzable for post-infection endpoints, 225 and 122 in the vaccine and placebo groups, respectively. The composite endpoint was analyzed for the entire randomized cohort as well as for the cohort of HIV infected subjects. Analyses of the infected subcohort are important because vaccine effects on HIV pathogenesis are most clearly measured in infected subjects, and it is feasible to monitor this subcohort intensively for several years. However, this analysis is not intent-to-treat (ITT) and is susceptible to post-randomization selection bias (Hudgens et al., 2003; Gilbert et al., 2003), and therefore, it is important to also conduct unbiased ITT analyses of the composite endpoint in all randomized subjects. The ITT analyses evaluate the time between randomization and the composite endpoint, and approximate a classical assessment of vaccine efficacy to prevent clinically significant disease (Clements-Mann, 1998). A drawback of the ITT approach is that the follow-up period for capturing endpoints is restricted to the interval during which the entire cohort is followed.

Viral load tends to be highly variable in the first few weeks following HIV infection (Schacker et al., 1998). A small fraction of infected trial participants may have a Month 0.5 viral load value that was measured in this acute phase. For such subjects vaccination may be efficacious to control viral load, but suppression is not yet achieved. To eliminate the influence of possibly unstable Month 0.5 values, measurements at this visit were not used for determining composite endpoints. Therefore, composite endpoints were registered at the earliest date of ART initiation or virologic failure based on a viral load measurement at the Month 1 visit or later. For analyses of the infected subcohort, subjects who did not experience the composite endpoint by 14 months post-infection diagnosis were censored at 14 months, and for randomized cohort analyses, subjects who did not experience the composite endpoint within 36 months of randomization were censored at 36 months. In both analyses subjects lost to follow-up were censored at the date of last contact.
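The endpoint rule in the preceding paragraph can be sketched in a few lines of Python for the infected-subcohort analysis. This is only an illustrative sketch: the function name `composite_endpoint`, its arguments and the data layout (visit month paired with viral load in copies/ml) are assumptions made for the example and do not come from the trial's analysis code.

```python
from typing import List, Optional, Tuple


def composite_endpoint(
    visits: List[Tuple[float, float]],
    art_start: Optional[float],
    threshold: float,
    last_contact: float,
    tau: float = 14.0,
    first_eligible_month: float = 1.0,
) -> Tuple[float, bool]:
    """Return (time, event_observed) for one infected subject.

    visits: (month post-infection diagnosis, viral load in copies/ml) pairs.
    art_start: month of ART initiation, or None if ART was never started.
    All names and the data layout are hypothetical.
    """
    # Virologic failure: first viral load above the threshold, skipping
    # measurements taken before the Month 1 visit (the Month 0.5 value).
    failure_times = sorted(
        month
        for month, load in visits
        if month >= first_eligible_month and load > threshold
    )
    candidates = failure_times[:1]
    if art_start is not None:
        candidates.append(art_start)

    # Censor at tau, or earlier at the date of last contact.
    censor_time = min(tau, last_contact)
    if candidates and min(candidates) <= censor_time:
        return min(candidates), True
    return censor_time, False
```

For example, a subject with a high Month 0.5 value, no later value above 1500 copies/ml, and ART started at Month 3 would be recorded as an event at Month 3; the Month 0.5 value is ignored by construction.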

For each cohort and by study arm, Figure 2 shows Kaplan–Meier curves of the time to ART initiation, and Figure 3 shows the pre-ART measurements of viral load. A Cox model analysis verified strongly dependent censoring of pre-ART viral profiles by ART initiation, with estimated hazard ratio 1.88 (95% CI 1.51–2.34, p < 0.0001) for each log10 higher value of most recent pre-ART viral load. This result implies that a Kaplan–Meier analysis of the time-to-viral failure with censoring by ART would be severely biased, and motivates analysis of the composite endpoint. For the four pre-specified virologic failure thresholds xvl = 1500, 10 000, 20 000, 55 000 copies/ml, Figure 4 shows Kaplan–Meier curves of the time to the composite endpoint. In the ITT analysis, 290 randomized subjects reached the composite endpoint with xvl = 1500, 227 (78.3%) of whom failed virologically, and 211 subjects reached the composite endpoint with xvl = 55 000, 117 (55.5%) of whom failed virologically. For the infected cohort, 320 subjects reached the composite endpoint with xvl = 1500, 261 (81.6%) of whom failed virologically, and 237 subjects reached the composite endpoint with xvl = 55 000, 144 (60.8%) of whom failed virologically. Figure 4 suggests comparable distributions of time-to-composite endpoints in the vaccine and placebo arms.
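To make the link between the Kaplan–Meier curves and the efficacy parameter concrete, the following minimal sketch computes a Kaplan–Meier estimate of the cumulative probability of the composite endpoint by a given time and then forms one minus the vaccine/placebo ratio, as VE is defined in the Summary. The function names and the (time, event) data layout are assumptions for illustration; the tiny data set in the usage note is made up and is not trial data.

```python
from typing import List, Tuple


def km_cumulative_probability(data: List[Tuple[float, bool]], tau: float) -> float:
    """Kaplan-Meier estimate of P(T <= tau) from (time, event_observed) pairs."""
    survival = 1.0
    # Distinct observed event times up to and including tau.
    event_times = sorted({t for t, event in data if event and t <= tau})
    for t in event_times:
        at_risk = sum(1 for s, _ in data if s >= t)
        failures = sum(1 for s, event in data if event and s == t)
        survival *= 1.0 - failures / at_risk
    return 1.0 - survival


def vaccine_efficacy(
    vaccine: List[Tuple[float, bool]],
    placebo: List[Tuple[float, bool]],
    tau: float,
) -> float:
    """One minus the ratio (vaccine/placebo) of cumulative failure probabilities.

    Assumes at least one placebo event by tau, so the denominator is positive.
    """
    return 1.0 - (
        km_cumulative_probability(vaccine, tau)
        / km_cumulative_probability(placebo, tau)
    )
```

With the toy inputs `vaccine = [(2, True), (5, False), (8, True), (14, False)]` and `placebo = [(1, True), (3, True), (6, True), (14, False)]`, the estimated cumulative probabilities by 14 months are 0.625 and 0.75, giving an efficacy estimate of about 0.17.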



Fig. 2. For the VAX004 trial data, the figure shows Kaplan–Meier curves of (a) the time between randomization and ART initiation and (b) the time between HIV infection diagnosis and ART initiation.

 


Fig. 3. For the VAX004 trial data, the figure shows the pre-ART measurements of viral load for the (a) vaccine group and (b) placebo group, as a function of the time of sampling post-infection diagnosis. For pre-ART viral loads sampled at the Month 0.5, 1, 2, 4, 8, 12, 16, 20 and 24 visits, the solid lines are mean estimates and the dotted lines are pointwise 95% confidence intervals.

 


Fig. 4. For the four pre-specified virologic failure thresholds xvl = log10 1500, 10 000, 20 000 and 55 000 copies/ml (i.e. levels 3.18, 4.00, 4.30 and 4.74) in the VAX004 trial, the figure shows Kaplan–Meier curves of (left panel) the time between randomization and the composite endpoint, and of (right panel) the time between infection diagnosis and the composite endpoint. The solid (dotted) line denotes the vaccine (placebo) group.

 
For formal inferences, the parameter VE(14, xvl) was assessed for xvl ranging between 1500 and 55 000 copies/ml, where a 14-month time frame post-infection diagnosis was chosen so as to capture all events occurring by the Month 12 visit. Since most subjects failed by the Month 12 visit, an analysis that would use a longer follow-up duration would provide little additional information over the 12-month analysis. For ITT inferences, the parameter VEITT(36, xvl) was assessed for xvl spanning the same values as for the infected subcohort analysis, where VEITT(36, xvl) is one minus the ratio (vaccine/placebo) of the cumulative incidence of the composite endpoint occurring between randomization and 36 months. Inference on VEITT(36, xvl) evaluates the combined effects of vaccination to reduce the infection rate and composite endpoint rate. Figure 5 shows estimates of VEITT(36, xvl) and VE(14, xvl), with pointwise and simultaneous 95% confidence interval estimates. Bold vertical segments indicate the simultaneous 95% confidence intervals for the four fixed values of xvl. The confidence coefficient c{alpha}/2 for each of the bands was obtained by generating B = 1000 copies of U*(xvl).
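The idea of obtaining a simultaneous critical value from B simulated copies of a Gaussian-multiplier process can be illustrated in a deliberately simplified setting. The sketch below is not the U*(xvl) of Section 2: it assumes no censoring before {tau}, so that each cumulative probability is a simple proportion, and it works on the log ratio of those proportions. The function name, the 0/1 input layout and the variance formula are assumptions made for this example only.

```python
import math
import random
from typing import List, Sequence


def multiplier_critical_value(
    vaccine: Sequence[Sequence[int]],
    placebo: Sequence[Sequence[int]],
    alpha: float = 0.05,
    n_copies: int = 1000,
    seed: int = 0,
) -> float:
    """Simulated critical value for a simultaneous band over thresholds.

    vaccine[i][j] is 1 if subject i reached the composite endpoint by tau
    under threshold j, else 0 (same layout for placebo). Simplifying
    assumptions: no censoring before tau, and every arm has at least one
    event and one non-event at every threshold.
    """
    rng = random.Random(seed)
    n_thresholds = len(vaccine[0])

    def influence(arm: Sequence[Sequence[int]]) -> List[List[float]]:
        # rows[j][i]: contribution of subject i to log F(tau, x_j).
        n = len(arm)
        rows = []
        for j in range(n_thresholds):
            p = sum(subject[j] for subject in arm) / n
            rows.append([(subject[j] - p) / (n * p) for subject in arm])
        return rows

    inf_v, inf_p = influence(vaccine), influence(placebo)
    # Standard deviation of the log-ratio estimate at each threshold.
    sd = [
        math.sqrt(sum(a * a for a in inf_v[j]) + sum(b * b for b in inf_p[j]))
        for j in range(n_thresholds)
    ]

    sup_stats = []
    for _ in range(n_copies):
        # One standard normal multiplier per subject, shared across thresholds.
        g_v = [rng.gauss(0.0, 1.0) for _ in vaccine]
        g_p = [rng.gauss(0.0, 1.0) for _ in placebo]
        sup_stats.append(
            max(
                abs(
                    sum(g * a for g, a in zip(g_v, inf_v[j]))
                    - sum(g * b for g, b in zip(g_p, inf_p[j]))
                )
                / sd[j]
                for j in range(n_thresholds)
            )
        )
    sup_stats.sort()
    index = min(n_copies - 1, math.ceil((1.0 - alpha) * n_copies) - 1)
    return sup_stats[index]
```

Because the same multipliers are reused across thresholds, the simulated supremum reflects the correlation between thresholds, which is why the resulting critical value is typically smaller than a Bonferroni-type value when the thresholds are closely spaced. With a single threshold the value should be close to the usual normal quantile of about 1.96.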



Fig. 5. For the VAX004 trial data, (a) shows 95% pointwise (dashed lines) and simultaneous (dotted lines) confidence intervals about VEITT(36, xvl) for xvl ranging between 1500 and 55 000 copies/ml on the log10 scale. Solid lines denote estimates of VEITT(36, xvl). Bold vertical segments are 95% simultaneous confidence intervals for VEITT(36, xvl) for xvl set at log10 1500, 10 000, 20 000 and 55 000 and (b) shows the comparable analysis of VE(14, xvl) for the failure time measured from infection diagnosis.

 
The point estimates of VEITT(36, xvl) varied between 0.03 (at xvl = 7586; 3.88 log10) and 0.27 (at xvl = 43 652; 4.64 log10) over the range of thresholds xvl. The 95% simultaneous bands included zero at all thresholds xvl, indicating no significant differences in the risk of composite endpoints between the groups. The fact that the point estimates of VEITT(36, xvl) were consistently above zero is explained by the trend toward a longer time until ART initiation in the vaccine group (p = 0.07, Figure 2(a)).

The point estimates of VE(14, xvl) varied between –0.05 and 0.05 and steadily increased with xvl. The simultaneous confidence bands included zero at all threshold values, and were most narrow for xvl = 1500, –0.12 to 0.05, and steadily widened with xvl, with span –0.24 to 0.30 at xvl = 55 000. This pattern occurred because the number of events decreased with xvl, from 320 events for xvl = 1500 to 237 events for xvl = 55 000. In comparing the analyses of the randomized and infected cohorts, the simultaneous confidence bands were substantially narrower for the latter analysis, with average half-width 0.39 and 0.16, respectively. This result occurred in part because there were fewer endpoints for the ITT analysis (since composite endpoints occurring beyond 36 months post-randomization were excluded in the ITT inferences; Figure 4).

Notice that for both the ITT and infected cohort analyses, the simultaneous confidence intervals at the four fixed thresholds are substantially narrower than the simultaneous bands computed over the continuous range of thresholds. This result suggests that one reasonable strategy for future vaccine trials is to apply the procedure using a fixed set of several discrete thresholds that have clinical relevance, if available.


    5. COMPLEMENTARY ASSESSMENTS OF POST-INFECTION VACCINE EFFECTS
Alternative approaches to studying vaccine effects on viral load and ART initiation include assessments based on marginal distributions, cause-specific hazard functions or cumulative incidence functions. We consider the value of these approaches. First, the assessment of the vaccine effect on the marginal distribution of the time to ART initiation provides important interpretable information, since ART initiation itself, regardless of reason, is a clinically significant endpoint. This marginal analysis should be done in addition to the composite endpoint analysis. Second, the assessment of the vaccine effect on the marginal distribution of the time-to-viral failure is of little value unless the post-infection follow-up period is very long, because very few viral failure events will occur after ART initiation within a 1–2 year time period. Third, given the arguments made above for focusing inferences on vaccine efficacy parameters that are cumulative rather than instantaneous in time, assessment of cumulative incidence functions is more pertinent than assessment of cause-specific hazard functions. It is informative to study the cumulative incidence functions for viral failure, defined through a failure-type indicator that equals 1 if failure is due to viral load > xvl and 0 if failure is due to ART initiation. The methods developed here can be adapted to provide simultaneous confidence intervals for the corresponding efficacy parameter VEvl({tau}, xvl) in xvl. Plotting estimates of both VE({tau}, xvl) and VEvl({tau}, xvl) provides information on the degree to which vaccine efficacy to prevent the composite endpoint is due to prevention of viral failure.
In addition, a related parameter can be shown to equal the relative probability (vaccine versus placebo) that a composite endpoint failure event by time t was due to viral failure. This ratio can be interpreted as the proportion of the efficacy to prevent the composite endpoint attributable to prevention of viral failure, and estimates of it can also be plotted alongside those of VE({tau}, xvl) and VEvl({tau}, xvl) to provide complementary information.
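The relative-probability interpretation can be checked with a short calculation. The notation here is assumed rather than taken from the text: F_k denotes the cumulative probability of the composite endpoint in arm k (k = 1 vaccine, k = 2 placebo), F_k^{vl} the cumulative incidence of composite endpoints due to viral failure, VE = 1 − F_1/F_2, and VE^{vl} is taken to be defined analogously with F_k^{vl} in place of F_k. Under those assumptions:

```latex
\[
\frac{1-\mathrm{VE}^{vl}(t,x_{vl})}{1-\mathrm{VE}(t,x_{vl})}
  = \frac{F_1^{vl}(t,x_{vl})\,/\,F_2^{vl}(t,x_{vl})}{F_1(t,x_{vl})\,/\,F_2(t,x_{vl})}
  = \frac{F_1^{vl}(t,x_{vl})\,/\,F_1(t,x_{vl})}{F_2^{vl}(t,x_{vl})\,/\,F_2(t,x_{vl})} .
\]
```

Each ratio F_k^{vl}/F_k is the conditional probability, in arm k, that a composite endpoint occurring by time t was due to viral failure, so the right-hand side is a vaccine-versus-placebo ratio of such probabilities. Whether this is exactly the parameter the authors display is an assumption of this sketch.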


    6. DISCUSSION
Future HIV vaccine efficacy trials are planned to operate under standardized ART initiation guidelines based on viral load and/or CD4 cell count criteria. The guidelines used, and adherence to these guidelines, influence the interpretation of the composite endpoint analysis and the choice of virologic failure thresholds xvl. Based on the current U.S. guidelines (DHHS Guidelines, 2002), it is sensible to assess the composite endpoint for thresholds ranging up to xvl = 55 000 copies/ml. Because pre-ART virologic failure above xvl for xvl ≤ 55 000 usually precedes pre-ART CD4 decline < 350 cells/mm3 (in fact the contrary event never occurred in VAX004), if this guideline is followed, then estimates of VE({tau}, xvl) for xvl ≤ 55 000 have clear interpretations as vaccine effects on the virologic failure rate with little or no confounding by treatment. Although an upper threshold xvl = 55 000 copies/ml was selected for VAX004 based on the current U.S. guidelines, it should be noted that this choice is somewhat arbitrary, because standardized guidelines were not used for this trial, and prevailing opinions about when to start treatment evolved during the 5-year period of the trial.

For future planned trials that will use standardized ART initiation guidelines, achieving high rates of adherence to the guidelines will make the composite endpoint analysis easier to interpret. In the world's second HIV vaccine efficacy trial, conducted by VaxGen in intravenous drug users in Thailand from 1998 to 2003, the Thai government freely provided ART to infected participants whose CD4 declined below a threshold. Adherence to this national guideline was perfect in that no participant initiated ART prior to meeting the threshold. If there is substantial non-adherence to treatment initiation criteria in an efficacy trial, then the value of the composite endpoint analysis erodes with the degree of non-adherence. In the case that a large fraction of infected subjects start ART prior to meeting treatment criteria, the composite endpoint analysis would contribute little independent information beyond the marginal analysis of ART initiation.

The method developed here applies for analyzing a general composite endpoint defined as the first event of ART initiation or any biomarker-defined endpoint. The method has been applied to assess the first event of CD4 count failure (CD4 count < xCD4, with xCD4 ∈ [200, 500] cells/mm3) or ART initiation in the VaxGen Thai trial (unpublished data). Like viral load, CD4 cell count is highly prognostic for progression to AIDS and death (cf. Mellors et al., 1997; HIV Surrogate Marker Collaborative Group, 2000), and based on some studies may be a better predictor of AIDS than viral load near the time of AIDS (Lyles et al., 2000; HIV Surrogate Marker Collaborative Group, 2000). In many developing countries including Botswana, South Africa, Thailand and Uganda, criteria for providing ART through national programs are based on CD4 cell count thresholds but do not consider viral load information. In trials where such treatment policies are operative, analysis of the CD4 cell count/ART initiation composite endpoint may be easier to interpret and have a more direct link to progression to AIDS/death than the analysis of the viral load/ART initiation composite endpoint. A drawback of the CD4-based composite endpoint is that in some trial populations the rate of CD4 cell count decline is quite low (this result was observed in VAX004, with 26% of infected subjects reaching CD4 < 350 cells/mm3 by 24 months), which restricts the power of the composite endpoint analysis. However, in some populations (e.g. in developing countries) CD4 cells may decline quickly enough to give the analysis reasonably high power; in the VaxGen Thai trial 55% of infected subjects reached CD4 < 350 cells/mm3 by 24 months. In any case, any efficacy trial is expected to collect data on both viral load and CD4 cell counts, and analyses of composite endpoints based on both biomarkers will likely be useful for inferring HIV vaccine effects on HIV progression and transmission.
The ongoing HIV vaccine efficacy trial in Thailand is using a composite endpoint that includes all three events, ART initiation, viral failure and CD4 failure.

This article has focused on studying VE(t, xvl) at the latest time point of follow-up after infection diagnosis t = {tau}. This analysis has greatest importance, because efficacy at later time points predicts greater clinical benefit, and implies greater robustness of the vaccine's efficacy to the possible development of vaccine resistance. It is also of interest to study VE(t, xvl) over time t, to assess if and how efficacy wanes over time. For a fixed threshold xvl, the procedure of Parzen et al. (1997) can be applied to obtain simultaneous confidence intervals for VE(t, xvl) for t in an interval [t1, t2]. Since the method of Parzen et al. (1997) is based on the same technique used in this article (a martingale approximation and Gaussian multipliers), and our convergence result (A.4) in the Appendix is uniform in t and xvl, it should be possible to combine the two methods into a procedure for constructing a confidence region for VE(t, xvl) simultaneously in both t ∈ [t1, t2] and xvl over a range of thresholds. This is left as an open problem.

Finally, note that the proposed procedure can be used to construct simultaneous confidence bands for Fk({tau}, xvl) or for any continuous functional of F1({tau}, xvl) and F2({tau}, xvl); for example in some applications it may be of interest to study F1({tau}, xvl) – F2({tau}, xvl).


    APPENDIX
We prove (2.1) and the weak convergence of U(xvl) and U*(xvl). Note that ki(xvl) increases as xvl increases. Thus, Vk(xvl) = max_{1≤i≤nk} ki(xvl) increases as xvl increases. We have as nk → {infty}. By Corollary 3.2.1 (Fleming and Harrington, 1991, p. 98), it follows that

(A.1)

where

Note that both Sk(t, xvl) and k(t, xvl) increase as xvl increases and that Sk(t, xvl) is continuous on Further, it is known (Fleming and Harrington, 1991) that pointwise for By some elementary analysis, we have as nk → {infty}. Similar arguments lead to the convergence of Rk(t, xvl)/nk to rk(t, xvl) in probability, uniformly in

Next, applying modern empirical process theory (van der Vaart, 1998), we show that converges weakly to a mean-zero Gaussian process with continuous paths. Let

We have Let be the class of coordinate projections such that for Let 0 = t0 < t1 < t2 < ··· < tR = {tau} and By the monotone properties of Nki(t, xvl), Rki(t, xvl) and {Lambda}k(t, xvl) on each coordinate, we have, for (t, x) ∈ [tr–1, tr] × [xm–1, xm], and For any {epsilon} > 0, we can take the number of grids, R and M, in t and in x to be of the order of 1/{epsilon} such that under the continuity assumptions on the distributions. Hence the bracketing number is of the polynomial order (1/{epsilon})5, following the arguments in the proof of Theorem 19.5 and Example 19.6 (van der Vaart, 1998). Therefore, the bracketing integral J[](1,, L2(