


Biostatistics Advance Access published online on July 11, 2007

Biostatistics, doi:10.1093/biostatistics/kxm023

© The Author 2007. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org.

An alternative model for bivariate random-effects meta-analysis when the within-study correlations are unknown

Richard D. Riley*

Centre for Medical Statistics and Health Evaluation, Faculty of Medicine, University of Liverpool, Shelley's Cottage, Brownlow Street, Liverpool, England L69 3GS richard.riley{at}liv.ac.uk

John R. Thompson and Keith R. Abrams

Centre for Biostatistics and Genetic Epidemiology, Department of Health Sciences, University of Leicester, Second Floor, Adrian Building, University Road, Leicester, England LE1 7RH

* To whom correspondence should be addressed.


    SUMMARY
 TOP
 SUMMARY
 1. INTRODUCTION
 2. MODELS FOR BRMA
 3. COMPARISON OF THE...
 4. APPLICATIONS
 5. DISCUSSION
 FUNDING
 REFERENCES
 
Multivariate meta-analysis models can be used to synthesize multiple, correlated endpoints such as overall and disease-free survival. A hierarchical framework for multivariate random-effects meta-analysis includes both within-study and between-study correlation. The within-study correlations are assumed known, but they are usually unavailable, which limits the multivariate approach in practice. In this paper, we consider synthesis of 2 correlated endpoints and propose an alternative model for bivariate random-effects meta-analysis (BRMA). This model maintains the individual weighting of each study in the analysis but includes only one overall correlation parameter, $\rho$, which removes the need to know the within-study correlations. Further, the only data needed to fit the model are those required for a separate univariate random-effects meta-analysis (URMA) of each endpoint, currently the common approach in practice. This makes the alternative model immediately applicable to a wide variety of evidence synthesis situations, including studies of prognosis and surrogate outcomes. We examine the performance of the alternative model through analytic assessment, a realistic simulation study, and application to data sets from the literature. Our results show that, unless $\hat{\rho}$ is very close to 1 or $-1$, the alternative model produces appropriate pooled estimates with little bias that (i) are very similar to those from a fully hierarchical BRMA model where the within-study correlations are known and (ii) have better statistical properties than those from separate URMAs, especially given missing data. The alternative model is also less prone to estimation at parameter space boundaries than the fully hierarchical model and thus may be preferred even when the within-study correlations are known. It also suitably estimates a function of the pooled estimates and their correlation; however, it only provides an approximate indication of the between-study variation. The alternative model greatly facilitates the utilization of correlation in meta-analysis and should allow an increased application of BRMA in practice.

Keywords: Correlation; Evidence synthesis; Multiple outcomes; Multivariate random-effects meta-analysis; Systematic review


    1. INTRODUCTION
 
Meta-analysis methods combine quantitative data from several studies to produce pooled results that aid evidence-based decision making. Multiple pooled results are required whenever there are multiple outcomes (Berkey and others, 1998) or multiple treatment groups (Hasselblad, 1998). For example, in diagnostic studies both sensitivity and specificity are of interest (Reitsma and others, 2005; Harbord and others, 2006), while in prognostic studies often both overall and disease-free survival are important (Riley and others, 2007a). In such situations, the common practice is to perform a univariate meta-analysis for each endpoint independently. This approach is simple but ignores the potential correlation between endpoints. On the other hand, a multivariate meta-analysis jointly synthesizes the endpoints and can incorporate their correlation (Raudenbush and others, 1988; Becker and others, 2000; Van Houwelingen and others, 2002). This can improve efficiency over separate univariate syntheses (Riley and others, 2007a) and allows the association between endpoints to be modeled. This facilitates the identification of surrogate outcomes (Daniels and Hughes, 1997) and the production of joint confidence or prediction regions (Reitsma and others, 2005).

In multivariate random-effects meta-analysis, both within-study and between-study correlation can be incorporated. The "within-study correlation" indicates the association between endpoint estimates within a study. In some situations, this might be assumed zero, for example, where the 2 endpoints are sensitivity and specificity which are calculated using separate sets of patients (Reitsma and others, 2005); however, for structurally dependent endpoints, like overall and disease-free survival, the within-study correlation is likely to be nonzero (Riley and others, 2007a). The "between-study correlation" indicates how the underlying true endpoint values are related across studies, perhaps because of differences across studies in patient-level characteristics, such as age, or changes in study-level characteristics, such as the threshold level in diagnostic studies. Both within-study and between-study correlation can influence the meta-analysis results (Riley and others, 2007a). The within-study correlation is most influential in the analysis when the within-study variation (i.e. the sampling error of study estimates) is large relative to the between-study variation in the underlying true study values, and the converse is true for the between-study correlation.

The within-study correlations are usually assumed known but in practice they may be difficult to obtain, especially from published information (Becker and others, 2000). Calculation of the within-study correlation may also be nontrivial, perhaps requiring bootstrap methods (Daniels and Hughes, 1997), and study authors may be unable to provide the correlation even on request. A number of articles consider the problem of unavailable within-study correlations. Berkey and others (1996) assess how their results change for a range of different within-study correlation values, while Nam and others (2003) perform sensitivity analyses using a range of different prior distributions for the unknown correlations. Where the multiple endpoints are survival proportions, Dear (1994) suggests a method for retrospectively estimating the within-study correlations. Raudenbush and others (1988) suggest using the known correlation from external data as an approximation, while Berrington and Cox (2003) limit the range of possible values for the unknown correlation between multiple relative risks.

Another issue in multivariate random-effects meta-analysis is the estimation of between-study correlation. Riley and others (2007b) consider maximum likelihood estimation and show that the between-study correlation is often estimated as 1 or –1, even when the within-study correlations are known. This occurs because the maximum likelihood estimator truncates the between-study covariance matrix on the boundary of its parameter space, which happens most often when the within-study variation is relatively large or the number of studies is small. Such boundary estimates are associated with an upward bias in the between-study variance estimates, which can inflate the mean-square error and standard error of pooled estimates. Thompson and others (2005) suggest a reparameterized model that avoids estimating the between-study correlation, at the cost of making strong within-study and between-study correlation assumptions.

In this paper, we consider synthesis of 2 correlated endpoints and propose an alternative model for bivariate random-effects meta-analysis (BRMA) to help alleviate the aforementioned problems associated with the within-study and between-study correlation. Our new model includes only one overall correlation parameter, which removes the need to know the within-study correlations or estimate the between-study correlation, and we show that the alternative model produces appropriate pooled estimates that are superior to those from separate univariate syntheses. In Section 2, we introduce our alternative model in relation to the general BRMA model proposed by Van Houwelingen and others (2002). In Section 3, we compare analytically the models and perform a realistic simulation study to examine the statistical properties of their estimates. In Section 4, we then illustrate the benefits and limitations of the alternative model through application to data sets from the literature. Section 5 contains a critical discussion and makes recommendations for practice and for future research priorities.


    2. MODELS FOR BRMA
 

2.1 The general model for BRMA

Suppose that 2 endpoints, $j = 1$ or 2, are available from each of $i = 1$ to $n$ studies. Each study supplies summary measures, $Y_{ij}$, and associated standard errors, $s_{ij}$, for each endpoint. For instance, for prognostic studies Riley and others (2007a) set $Y_{i1}$ and $Y_{i2}$ to be the log-hazard ratio for disease-free survival and overall survival, respectively. Each summary statistic ($Y_{ij}$) is assumed to be an estimate of a true value ($\theta_{ij}$) in each study, and in a hierarchical structure each $\theta_{ij}$ is assumed to be drawn from a distribution with mean, or "pooled," value $\beta_j$ and between-study variance $\tau_j^2$. If $Y_{ij}$ and $\theta_{ij}$ are both assumed normally distributed, then the general BRMA model can be written as (Van Houwelingen and others, 2002)

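$$
\begin{pmatrix} Y_{i1} \\ Y_{i2} \end{pmatrix} \sim N\left( \begin{pmatrix} \theta_{i1} \\ \theta_{i2} \end{pmatrix}, \delta_i \right), \qquad
\delta_i = \begin{pmatrix} s_{i1}^2 & \rho_{Wi}\, s_{i1} s_{i2} \\ \rho_{Wi}\, s_{i1} s_{i2} & s_{i2}^2 \end{pmatrix},
$$
$$
\begin{pmatrix} \theta_{i1} \\ \theta_{i2} \end{pmatrix} \sim N\left( \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}, \Omega \right), \qquad
\Omega = \begin{pmatrix} \tau_1^2 & \rho_B\, \tau_1 \tau_2 \\ \rho_B\, \tau_1 \tau_2 & \tau_2^2 \end{pmatrix}. \tag{2.1}
$$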

In this general model, $\delta_i$ and $\Omega$ are the within-study and between-study covariance matrices, respectively, and the model is equivalent to 2 independent univariate random-effects meta-analyses (URMAs) when the within-study correlations $\rho_{Wi}$ and the between-study correlation $\rho_B$ are all zero. In (2.1), it is common to assume that the $s_{ij}^2$ and the $\rho_{Wi}$ are known (Berkey and others, 1998; Van Houwelingen and others, 2002). The usual objective of the BRMA is to estimate $\beta_1$ and $\beta_2$ or some function of these pooled values, for example, $\beta_1 - \beta_2$. However, the estimate of correlation between $\beta_1$ and $\beta_2$ may also be of importance, for example, when assessing whether one endpoint could be a surrogate for the other (Daniels and Hughes, 1997) or calculating joint confidence regions. The $\theta_{ij}$ in (2.1) are often considered nuisance parameters and are rarely of interest. Inference and estimation are thus usually based on the marginal model, which can be written as

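$$
\begin{pmatrix} Y_{i1} \\ Y_{i2} \end{pmatrix} \sim N\left( \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}, V_i \right), \qquad
V_i = \delta_i + \Omega = \begin{pmatrix} s_{i1}^2 + \tau_1^2 & \rho_{Wi}\, s_{i1} s_{i2} + \rho_B\, \tau_1 \tau_2 \\ \rho_{Wi}\, s_{i1} s_{i2} + \rho_B\, \tau_1 \tau_2 & s_{i2}^2 + \tau_2^2 \end{pmatrix}. \tag{2.2}
$$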

In this paper, we estimate the between-study parameters ($\tau_1^2$, $\tau_2^2$, and $\rho_B$) and the 2 pooled values $\beta_1$ and $\beta_2$ iteratively using restricted maximum likelihood (REML) in SAS Proc Mixed, as described by Van Houwelingen and others (2002). We also use Cholesky decomposition (Gentle, 1998) of $\Omega$ to ensure that this matrix is estimated to be positive semi-definite and therefore that the between-study correlation estimate, $\hat{\rho}_B$, is in the range $[-1, 1]$.
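To illustrate why a Cholesky parameterization keeps the correlation estimate inside $[-1, 1]$, the following is a minimal Python sketch, not the SAS Proc Mixed implementation used in the paper. The function names and the numerical values are hypothetical. Any real lower-triangular factor gives a positive semi-definite $\Omega$, and setting the second diagonal element of the factor to zero lands exactly on the boundary of the parameter space.

```python
import numpy as np

def omega_from_cholesky(l11, l21, l22):
    """Build a 2x2 between-study covariance matrix Omega = L @ L.T
    from the three free elements of a lower-triangular factor L."""
    L = np.array([[l11, 0.0],
                  [l21, l22]])
    return L @ L.T

def implied_parameters(omega):
    """Recover tau_1^2, tau_2^2 and rho_B from Omega.
    rho_B is undefined when either variance is exactly zero."""
    tau1_sq, tau2_sq = omega[0, 0], omega[1, 1]
    rho_b = omega[0, 1] / np.sqrt(tau1_sq * tau2_sq)
    return tau1_sq, tau2_sq, rho_b

# Hypothetical values: l22 = 0 makes Omega singular, so rho_B sits on the boundary.
omega = omega_from_cholesky(0.5, -0.3, 0.0)
tau1_sq, tau2_sq, rho_b = implied_parameters(omega)
print(tau1_sq, tau2_sq, rho_b)
```

Optimizing over the elements of the factor, rather than over the variances and correlation directly, means no explicit constraints are needed during estimation.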

2.2 An alternative model for BRMA

The general BRMA model of (2.1) partitions the observed variation of the $Y_{i1}$ and $Y_{i2}$ into within-study and between-study variation using a fully hierarchical structure. Similarly, (2.1) partitions the observed correlation between the $Y_{i1}$ and the $Y_{i2}$ into within-study and between-study correlation. Given known within-study correlations, one can then, at least in principle, estimate the between-study correlation. However, the within-study correlations are unlikely to be available in practice and so application of the general model may be difficult. Consider now an alternative model for BRMA, where we continue to partition the overall variation but now do not partition the overall correlation. That is, rather than partitioning the overall correlation into within-study and between-study parameters, we propose a "single" parameter, $\rho$, to model directly the overall correlation. This situation is specified in (2.3), which is essentially a modification of the marginal model in (2.2) to take into account the overall correlation parameter:

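$$
\begin{pmatrix} Y_{i1} \\ Y_{i2} \end{pmatrix} \sim N\left( \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}, \Phi_i \right), \qquad
\Phi_i = \begin{pmatrix} \psi_1^2 + s_{i1}^2 & \rho \sqrt{(\psi_1^2 + s_{i1}^2)(\psi_2^2 + s_{i2}^2)} \\ \rho \sqrt{(\psi_1^2 + s_{i1}^2)(\psi_2^2 + s_{i2}^2)} & \psi_2^2 + s_{i2}^2 \end{pmatrix}. \tag{2.3}
$$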

By modeling the overall correlation directly, we obtain the desirable property that the within-study correlations are not required, unlike in the general model. The only data required to fit (2.3) are the $Y_{ij}$ and the $s_{ij}^2$, that is, the same information needed to fit a URMA for each endpoint independently. Equation (2.3) thus provides a way to synthesize 2 endpoints and utilize their correlation, even when the within-study correlations are unknown. It can also include studies that provide only one of the 2 endpoints under a missing at random assumption.

Note that the within-study variances (i.e. the $s_{ij}^2$) are still specified and assumed known in the alternative model in order to preserve the individual weighting of each study. The additional variation beyond sampling error is indicated by $\psi_j^2$. However, $\psi_j^2$ is not directly equivalent to $\tau_j^2$, the between-study variance in the general model, although in some circumstances they may be similar (see Sections 3.1 and 3.2). The reason they are different is that by not partitioning the overall correlation, the alternative model does not have a fully hierarchical structure. In Sections 3 and 4, we will examine what effect this has on the pooled estimates from the alternative model and investigate if and how they differ from the pooled estimates from the general model. In this paper, we estimate $\beta_1$, $\beta_2$, $\psi_1^2$, $\psi_2^2$, and $\rho$ in (2.3) using a self-written program in Stata that implements the "maximize" procedure (see supplementary material available at Biostatistics online, http://www.biostatistics.oxfordjournals.org). It is not possible to use SAS Proc Mixed due to the nonhierarchical structure of the alternative model. Our Stata program uses the Newton–Raphson procedure to maximize iteratively the restricted log-likelihood, which can be specified as

Formula (2.4)

where $n$ is the number of studies, $k = 2$ as there are 2 endpoints, $Y$ is a vector of the $Y_{ij}$, $X$ is the design matrix, and $\Phi$ is a block-diagonal matrix with diagonal blocks $\Phi_i$. Our Stata program ensures that $\Phi$ is estimated to be positive definite by modeling $\psi_1^2$ and $\psi_2^2$ on the log-scale (see supplementary material available at Biostatistics online, http://www.biostatistics.oxfordjournals.org).

We verified our Stata program by also specifying and estimating the general model in this way and found that the REML estimates were practically identical to those obtained from fitting the general model in SAS Proc Mixed.


    3. COMPARISON OF THE ALTERNATIVE MODEL WITH THE GENERAL MODEL
 

3.1 Analytic comparison of the covariance

It is important to compare the alternative model directly to the general model as the latter is likely to be more realistic because of its hierarchical structure. We first compare the covariance between $Y_{i1}$ and $Y_{i2}$ from each model, that is, the off-diagonal components of $V_i$ and $\Phi_i$ in (2.2) and (2.3):

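$$
\text{General model:} \quad \operatorname{cov}(Y_{i1}, Y_{i2}) = \rho_{Wi}\, s_{i1} s_{i2} + \rho_B\, \tau_1 \tau_2 \tag{3.1}
$$

$$
\text{Alternative model:} \quad \operatorname{cov}(Y_{i1}, Y_{i2}) = \rho \sqrt{(s_{i1}^2 + \psi_1^2)(s_{i2}^2 + \psi_2^2)} \tag{3.2}
$$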

As the additional variation beyond sampling error increases so that the $s_{ij}^2$ become relatively small, $\operatorname{cov}(Y_{i1}, Y_{i2})$ will tend to $\rho_B \tau_1 \tau_2$ in the general model and to $\rho \psi_1 \psi_2$ in the alternative model. These are likely to be similar because the within-study parameters have little influence and so the overall correlation, $\rho$, will be based mainly on $\rho_B$, the between-study correlation; similarly, $\psi_j^2$ is also likely to be very similar to $\tau_j^2$ in this situation. This suggests that the alternative model will closely approximate the general model when the within-study variability is relatively small.

Conversely, as the within-study variability becomes increasingly large, the 2 models are unlikely to be similar. In this situation, $\operatorname{cov}(Y_{i1}, Y_{i2})$ will tend to $\rho_{Wi} s_{i1} s_{i2}$ in the general model and to $\rho s_{i1} s_{i2}$ in the alternative model. These are likely to differ as the $\rho_{Wi}$ in the general model can vary across the $i = 1$ to $n$ studies, whereas the alternative model specifies a common $\rho$. Further, $\rho$ is an amalgam of the between-study correlation and the within-study correlations as measured by the observed correlation of the $Y_{i1}$ and $Y_{i2}$; however, each $\rho_{Wi}$ relates to the correlation within a particular study, which is unobservable without the raw study data. Thus, when there is relatively large within-study variability it would seem particularly important to know the within-study correlations.
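The two limiting cases can be checked numerically. The short Python sketch below uses hypothetical parameter values and function names. It computes the overall correlation implied by the general model for a single study, taking the covariance to be $\rho_{Wi} s_{i1} s_{i2} + \rho_B \tau_1 \tau_2$ and the total variances to be $s_{ij}^2 + \tau_j^2$. The implied value moves from roughly $\rho_B$ to roughly $\rho_{Wi}$ as the within-study standard errors grow, which is why a single common $\rho$ is only an approximation when studies differ in precision or in $\rho_{Wi}$.

```python
import numpy as np

def cov_general(s1, s2, tau1, tau2, rho_w, rho_b):
    """Covariance of (Y_i1, Y_i2) under the general model."""
    return rho_w * s1 * s2 + rho_b * tau1 * tau2

def cov_alternative(s1, s2, psi1, psi2, rho):
    """Covariance of (Y_i1, Y_i2) under the alternative model."""
    return rho * np.sqrt((s1**2 + psi1**2) * (s2**2 + psi2**2))

def implied_overall_correlation(s1, s2, tau1, tau2, rho_w, rho_b):
    """Correlation of (Y_i1, Y_i2) implied by the general model for one study."""
    total_sd = np.sqrt((s1**2 + tau1**2) * (s2**2 + tau2**2))
    return cov_general(s1, s2, tau1, tau2, rho_w, rho_b) / total_sd

# Hypothetical values: rho_W = 0.3, rho_B = 0.8, tau_1 = tau_2 = 1.
for s in (0.01, 1.0, 10.0):
    print(s, implied_overall_correlation(s, s, 1.0, 1.0, 0.3, 0.8))
```

With these values the implied correlation is close to 0.8 at `s = 0.01`, exactly halfway between the two correlations at `s = 1.0`, and close to 0.3 at `s = 10.0`.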

3.2 Comparison of the models through simulation

To compare the statistical properties of estimates from the general and alternative models, we performed a simulation study using a number of realistic scenarios. Each scenario related to a different specification of the general model of (2.1), from which we generated 1000 meta-analysis data sets for subsequent analysis. We chose to generate data from the general model due to its realistic hierarchical structure. The scenarios differed in 4 factors considered important in practice, namely, (a) the number of studies in the meta-analysis, (b) whether complete data were available for both endpoints or whether some data were missing for endpoint j = 2 (Rubin, 1976), (c) the size of the within-study variation relative to the between-study variation, and (d) the size of the within-study correlations relative to the between-study correlation. Nine such scenarios are described in Tables 1 and 2 as scenarios (i)–(ix). As with any simulation study, these cannot cover all eventualities, but we would claim that they are sufficiently wide ranging to give useful insights into the comparative performance of the different models.



 
Table 1 Simulation results for a selection of complete data scenarios

 


 
Table 2 Simulation results for a selection of missing data scenarios

 
Each of the 1000 meta-analysis data sets generated in each scenario was analyzed separately by fitting (i) 2 separate URMAs (i.e. as (2.1) but assuming zero within-study and between-study correlation), (ii) the general BRMA model of (2.1), with the within-study correlations known, and (iii) the alternative BRMA model of (2.3), which does not require the within-study correlations. REML estimation was used to fit each model, and the mean bias, mean standard error, mean-square error, and coverage of pooled estimates $\hat{\beta}_1$, $\hat{\beta}_2$, and $\hat{\beta}_1 - \hat{\beta}_2$ were then compared across models. Details of our simulation exercise are shown elsewhere (Riley and others, 2007b); we now summarize the key findings.
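As a rough illustration of how one meta-analysis data set might be generated from the hierarchical general model, the following Python sketch first draws the true study values from the between-study distribution and then draws the observed estimates around them. The function name and all parameter values are hypothetical and do not correspond to any scenario in Tables 1 and 2.

```python
import numpy as np

def simulate_general_brma(n, beta, tau, rho_b, s, rho_w, rng):
    """Draw one meta-analysis data set from the hierarchical general model.

    s is an (n, 2) array of within-study standard errors and rho_w a length-n
    array of within-study correlations. Returns the (n, 2) array of Y_ij.
    """
    between_cov = rho_b * tau[0] * tau[1]
    omega = np.array([[tau[0]**2, between_cov],
                      [between_cov, tau[1]**2]])
    # True study values theta_i, drawn around the pooled values beta.
    theta = rng.multivariate_normal(beta, omega, size=n)
    y = np.empty((n, 2))
    for i in range(n):
        within_cov = rho_w[i] * s[i, 0] * s[i, 1]
        delta_i = np.array([[s[i, 0]**2, within_cov],
                            [within_cov, s[i, 1]**2]])
        # Observed estimates, drawn around the true values for study i.
        y[i] = rng.multivariate_normal(theta[i], delta_i)
    return y

# Hypothetical example: 10 studies with a common within-study correlation.
rng = np.random.default_rng(2007)
s = np.full((10, 2), 0.5)
rho_w = np.full(10, 0.8)
y = simulate_general_brma(10, [0.0, 2.0], [0.5, 0.5], 0.8, s, rho_w, rng)
```

Repeating the call many times per scenario, and fitting each model to every generated data set, gives the Monte Carlo estimates of bias, standard error, mean-square error, and coverage.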

Estimation issues.

When the within-study variation was relatively large, or when the number of studies in the meta-analysis was small (e.g. $n = 5$), the general model often did not converge or estimated the between-study correlation, $\rho_B$, as 1 or $-1$, a phenomenon detailed elsewhere (Riley and others, 2007b). Also in these situations, the alternative model often did not converge or gave estimates of the overall correlation, $\rho$, very close to 1 or $-1$, which were associated with unstable pooled estimates and standard errors. As $\hat{\rho}$ becomes close to 1 or $-1$, the determinant of $\Phi_i$ in (2.3) becomes close to zero, causing unstable maximum likelihood solutions; for example, for one of the simulated data sets from scenario (iv), where $\rho$ was estimated as 0.999, one of the pooled estimates changes rapidly from about $-0.2$ to $-0.45$ as $\rho$ moves from 0.95 to 1. Interestingly though, a $\hat{\rho}$ very close to 1 or $-1$ in the alternative model occurred less frequently than a $\hat{\rho}_B$ equal to 1 or $-1$ in the general model. For example, in scenario (i), 20 of the 1000 simulations estimated $\hat{\rho}_B$ as 1 in the general model, whereas $\rho$ was always estimated to be less than 0.95 in the alternative model with no evidence of unstable solutions. Thus, even when the within-study correlations are known, those practitioners who would rather estimate a correlation away from 1 or $-1$ may prefer to fit the alternative model and estimate the overall correlation rather than the between-study correlation (see Section 4.1 for an example).

The problem of non-convergence and unstable estimates when $\rho$ is close to 1 or $-1$ is a clear limitation of the alternative model. Given this, and to aid comparison across models, the results in Tables 1 and 2 only consider those simulated data sets which (a) gave converged estimates for both the general model and the alternative model and (b) gave a value of $|\hat{\rho}| < 0.95$ in the alternative model. The ±0.95 limit was chosen because an inspection of results where $\hat{\rho}$ was within this range did not raise concerns about unstable estimates. Hence, Tables 1 and 2 essentially indicate the statistical properties of the alternative model when it does fit adequately; what to do when $\hat{\rho}$ falls outside this range is discussed later in the paper.

Within-study variance small relative to between-study variance.

When the within-study variation is relatively small (scenarios (i) and (iii)), the alternative model and the general model perform well and produce similar pooled estimates, as reasoned in Section 3.1, with small bias relative to the Monte Carlo standard error of the simulations (Table 1). The estimates of $\psi_1^2$ and $\psi_2^2$, the additional variation in the alternative model, are also close to the estimates of $\tau_1^2$ and $\tau_2^2$, the between-study variation in the general model. In comparison to 2 independent URMAs, for complete data there are benefits for estimating $\beta_1 - \beta_2$; for example, in scenario (i) the mean standard error (coverage) of $\hat{\beta}_1 - \hat{\beta}_2$ is 0.13 (93%) in the general model, 0.14 (94%) in the alternative model, and 0.28 (100%) in the URMA. For the individual pooled estimates themselves, the complete data scenarios indicate that the alternative model offers no benefit over URMA in terms of their standard error and mean-square error; however, it does provide a good estimate of their correlation which may itself be of interest. For example, in scenario (iii), the correlation between $\hat{\beta}_1$ and $\hat{\beta}_2$ is 0.71 in both the alternative model and the general model, but it is of course zero in the URMA.
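The gain for a function of the pooled values can be seen from a standard variance identity. The worked equation below is a sketch that assumes the function of interest is the difference $\beta_1 - \beta_2$; the symbol $r$ for the correlation between the pooled estimates is introduced here only for illustration.

$$
\operatorname{Var}(\hat{\beta}_1 - \hat{\beta}_2) = \operatorname{Var}(\hat{\beta}_1) + \operatorname{Var}(\hat{\beta}_2) - 2\, r \sqrt{\operatorname{Var}(\hat{\beta}_1)\operatorname{Var}(\hat{\beta}_2)}, \qquad r = \operatorname{Corr}(\hat{\beta}_1, \hat{\beta}_2).
$$

Two separate URMAs effectively set $r = 0$. If the two pooled estimates have equal variance $v$, the standard error of the difference is $\sqrt{2v}$ when $r = 0$ and $\sqrt{2v(1 - r)}$ otherwise, so a positive correlation shrinks the standard error by a factor of $\sqrt{1 - r}$. As a purely illustrative calculation, $r = 0.71$ gives a factor of about 0.54. This also suggests why ignoring a positive correlation leads to standard errors that are too large and coverage above the nominal level.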

For missing data, there are large benefits for estimating $\beta_1 - \beta_2$ and noticeable benefits for estimating $\beta_2$ due to the missing data for endpoint $j = 2$. For example, in scenario (vii), where there is large correlation and data missing completely at random in 5 out of 10 studies for endpoint 2, the alternative model reduces the standard error and mean-square error of $\hat{\beta}_2$ from the URMA by about 11% and 24%, respectively.

Within-study variance similar to between-study variance.

When the within-study variances were similar in size to the between-study variances (scenarios (ii), (iv)–(ix)), the general model and the alternative model again gave similar pooled estimates with little bias (Tables 1 and 2). On the whole, both BRMA models also produced pooled estimates with better statistical properties than those from URMA (Tables 1 and 2), in the same manner as described above. However, unlike the scenarios where the between-study variance was relatively large, the estimates of $\psi_1^2$ and $\psi_2^2$ are less similar to the estimates of $\tau_1^2$ and $\tau_2^2$ here, due to the within-study variation being more influential as discussed in Section 3.1.

Discrepant within-study correlations across studies.

Scenarios (v) and (viii) involve within-study correlations that vary substantially across studies, being 0.8 in some and $-0.8$ in others. The alternative model again gives appropriate estimates with little bias in this situation. For example, in scenario (v), the coverage of $\hat{\beta}_1 - \hat{\beta}_2$ is 96% in the alternative model, which is closer to 95% than the general model (93%) and the URMA (99%). Interestingly, the overall correlation in scenarios (v) and (viii) is much lower than in other scenarios due to the negative within-study correlations. This causes the correlation between $\hat{\beta}_1$ and $\hat{\beta}_2$ to be quite small in the alternative model, less than 0.30, and thus there are only very small benefits over URMA. For example, in scenario (viii), where there were data missing completely at random for endpoint 2, the standard error of $\hat{\beta}_1 - \hat{\beta}_2$ is 0.15 in the alternative model and 0.18 in the URMA, but the mean standard error of $\hat{\beta}_2$ is actually the same in both models. This indicates that, as one would expect, the use of the alternative model rather than a URMA is more important when the overall correlation is large.

Extreme missing data scenario.

Scenario (ix) considers a special case of missing data where, after generating complete data from the general model, we deleted the data for endpoint $j = 2$ if $Y_{i2} < 0$, i.e. non-ignorable, rather than completely random missingness (Rubin, 1976). This is akin to authors or journals not reporting negative results for endpoint $j = 2$. In this situation, the URMA gives estimates of $\beta_2$ which are upwardly biased by 0.52 on average (Table 2). However, the alternative model and the general model both "borrow strength" from the data for endpoint $j = 1$ (Riley and others, 2007a) and considerably reduce this bias by about 38% to 0.32. Similarly, $\hat{\beta}_1 - \hat{\beta}_2$ is less biased in the BRMA models. In scenarios like this in practice, it is conceivable that, due to the reduction in bias, the alternative model may even lead to different clinical or scientific conclusions than URMA.
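The deletion rule in this scenario, and the upward bias it induces in a naive analysis of endpoint 2, can be sketched in a few lines of Python. The covariance matrix, sample size, and function name below are hypothetical and are chosen only to make the effect visible; they are not the scenario (ix) settings.

```python
import numpy as np

def delete_negative_endpoint2(y):
    """Return a copy of y with Y_i2 set to NaN wherever Y_i2 < 0."""
    out = np.array(y, dtype=float, copy=True)
    out[out[:, 1] < 0, 1] = np.nan
    return out

rng = np.random.default_rng(2007)
cov = [[1.0, 0.8], [0.8, 1.0]]           # hypothetical total covariance of (Y_i1, Y_i2)
y = rng.multivariate_normal([0.0, 0.0], cov, size=5000)

y_obs = delete_negative_endpoint2(y)
kept = ~np.isnan(y_obs[:, 1])
naive_mean2 = y_obs[kept, 1].mean()       # ignores why the data are missing
print(naive_mean2)                        # well above the true mean of 0
```

Endpoint 1 remains fully observed, and because it is correlated with endpoint 2 it carries information about the deleted values. That is the sense in which a bivariate model can borrow strength to reduce, though not remove, the bias.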

Assuming the within-study correlations are all zero.

For all the scenarios shown in Tables 1 and 2, we also assessed how well the general model performed when, regardless of their true value, we assumed the within-study correlations to be zero (results available upon request). This approach frequently estimates the between-study correlation to be 1 or $-1$, especially when the true within-study correlations were far from zero, and gives a large upward bias in the estimates of between-study variance; this in turn greatly increases the standard error and mean-square error of pooled estimates and makes them larger than those from the alternative model on average. Thus, as more suitable estimates are available from the alternative model, simply setting the within-study correlations to zero is not generally recommended unless they truly are zero or close to zero as highlighted in some previous applications (Reitsma and others, 2005; Daniels and Hughes, 1997; Korn and others, 2005; Thompson and others, 2005; Van Houwelingen and others, 2002). For example, in scenario (ix), the alternative model reduces the bias of $\hat{\beta}_2$ to 0.32, with coverage 12%, but fitting the general model assuming zero within-study correlation reduces the bias of $\hat{\beta}_2$ to only 0.42 with coverage 1%.


    4. APPLICATIONS
 
We now apply the alternative model to 3 data sets from the medical literature, all of which have previously been considered for BRMA. The first example is a rare case where the within-study correlations are known and so the general and alternative models can be directly compared; the other 2 examples involve nonzero but unavailable within-study correlations, the more common situation in practice.

4.1 Surrogate outcomes—Daniels and Hughes data

Daniels and Hughes (1997) assess whether the change in CD4 cell count is a surrogate for time to either development of AIDS or death in drug trials of patients with HIV. They consider between–treatment arm log-hazard ratios of time to onset of AIDS or death ($Y_{i1}$) and between–treatment arm differences in mean changes in CD4 count ($Y_{i2}$) from pretreatment baseline to about 6 months. Fifteen relevant trials were identified. Some of the trials involved 3 or 4 treatment arms, but to enable application to BRMA we only consider outcome differences between the control arm and the first treatment arm in the reported data set (Daniels and Hughes, 1997). All 15 studies provided complete data, including the within-study correlations. These were quite small, varying between $-0.22$ and 0.17 with a mean of $-0.08$. The within-study variances for endpoint 2 had a mean value of 97 and in some studies were so large that endpoint 2 was akin to being missing; this makes a BRMA particularly appealing to borrow strength in the estimation of $\beta_2$ (Riley and others, 2007a). The general model estimates the between-study correlation as $-1$, at the boundary of its parameter space (Table 3), and thus the between-study variances will be inflated (Riley and others, 2007b), which likely explains why the standard errors of the pooled estimates are larger than for 2 URMAs. The alternative model, however, estimates a more well-defined overall correlation of $-0.76$ and produces pooled estimates with greater precision than those from URMA; for example, the standard error of $\hat{\beta}_2$ was 4.87 in the alternative model compared to 5.56 in the URMA, a reduction of about 12%. Furthermore, the correlation between $\hat{\beta}_1$ and $\hat{\beta}_2$ was estimated as $-0.70$ in the alternative model, which is comparable to the estimate of $-0.76$ in the general model and more realistic than the estimate of zero from URMA. This estimate of association may itself help facilitate decisions regarding whether CD4 should be used as a surrogate of disease-free survival. This example shows that the alternative model may be considered worthwhile even when the within-study correlations are known.



 
Table 3 Results from the applied examples

 
4.2 Prognostic studies—Riley data

A systematic review in neuroblastoma sought to establish the prognostic importance of MYCN, a proto-oncogene (Riley and others, 2004a). In 17 studies, a log-hazard ratio estimate for "amplified" versus "non-amplified" MYCN was available for both disease-free survival ($Y_{i1}$) and overall survival ($Y_{i2}$); however, no studies reported the within-study correlations, which are likely to be strongly positive due to the structural relationship between these endpoints. Further, there were 64 studies which provided data for only one of the 2 endpoints. If one assumes the missing endpoints are missing completely at random, then a BRMA is desirable to increase precision and reduce mean-square error. However, the general model cannot be applied unless one makes some assumptions about the within-study correlation values (Riley and others, 2007a). The alternative model was thus applied, and the overall correlation was estimated as 0.80 (Table 3), which causes the standard error of pooled estimates to be smaller in the alternative model than in the URMA (Table 3); for example, the standard error of one of the pooled estimates is 0.113 in the alternative model compared to 0.127 in the URMA (a reduction of 11%). The pooled estimates themselves are very similar; the alternative model has therefore increased our confidence in the meta-analysis results provided that we can assume endpoints were missing completely at random. Whether this assumption holds is open to debate, especially as publication bias may be affecting this data set (Riley, Sutton and others, 2004), and so the BRMA results should perhaps be treated with caution here. However, it is important to note that the URMA also makes the missing at random assumption, and simulation scenario (ix) suggests that the alternative model is preferable to 2 URMAs even if the data are not missing at random.

4.3 Passive smoking studies—Nam data

Nam and others (2003) consider BRMA where the 2 endpoints are the log-odds ratio for developing asthma (Yi1) and the log-odds ratio for developing lower respiratory disease (Yi2), comparing children exposed and unexposed to passive smoking. Fifty-nine relevant studies were identified, of which 8 reported both endpoints but without the within-study correlations. As 51 studies reported data for only 1 endpoint, a BRMA is appealing here in order to borrow strength (Riley and others, 2007a). However, the general model is not applicable without either making some assumptions about the missing within-study correlations or performing sensitivity analyses (Nam and others, 2003). We applied the alternative model, but the overall correlation was poorly estimated as 0.997 and spurious standard errors were evident for the pooled estimates (Table 3). Further, small changes in the overall correlation caused large changes in the pooled results, and thus the alternative model results should not be used here. This failure is likely due to the within-study variances being large relative to the between-study variances. For instance, the between-study variance for endpoint 2 is estimated as 0.019 in the URMA, whereas the mean within-study variance for endpoint 2 is 0.23, about 12 times larger. In this situation, the alternative model is less appropriate (see Section 3.1) and another approach for dealing with the missing within-study correlations is required; for example, one could perform sensitivity analyses by fitting the general model using either imputed within-study correlation values (Berkey and others, 1996) or a range of different prior distributions for the unknown correlations (Nam and others, 2003).
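
The comparison of within-study and between-study variation for endpoint 2 can be reproduced from the two figures quoted above. The sketch below is illustrative only: the variable names are ours, and the "between-study share" is simply a convenient summary for a study with average within-study variance, not a quantity reported in the paper.

```python
# Values quoted in the text for endpoint 2 of the Nam data.
between_study_var = 0.019     # URMA estimate of the between-study variance
mean_within_study_var = 0.23  # mean within-study variance

# How many times larger the typical within-study variance is.
ratio = mean_within_study_var / between_study_var

# Share of the total variance that is between-study for an "average" study
# (an illustrative summary, not taken from the paper).
between_share = between_study_var / (between_study_var + mean_within_study_var)

print(f"Within-study / between-study variance ratio: {ratio:.1f}")
print(f"Between-study share of total variance: {between_share:.1%}")
```

A ratio of about 12 means that sampling error dominates, which is the setting in which the text reports the overall correlation being poorly estimated.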


    5. DISCUSSION
 
The Campbell Collaboration states that meta-analysts "should not ignore the dependence among study outcomes" and "should use some procedure to deal with dependence" (Becker and others, 2004). Multivariate meta-analysis models can facilitate this requirement as they enable the synthesis of multiple, correlated endpoints of interest. However, application to settings involving nonzero within-study correlations is difficult as studies do not usually report the correlation between endpoint estimates; this is also the case for other measures of correlation like the intra-class correlation coefficient in cluster trials (Campbell and others, 2004). Unfortunately, without the within-study correlations it is difficult to fit the general multivariate meta-analysis model. For this reason, application of multivariate meta-analysis "may require data imputation, which can be both complex and problematic" (Becker and others, 2004). Various articles have considered how to deal with unavailable within-study correlations, such as Berkey and others (1996) and Nam and others (2003). In this paper, we have introduced an alternative model for BRMA, which provides a further option for synthesizing 2 endpoints when their within-study correlations are unknown. Our new model maintains the individual weighting of each study in the analysis but includes only one overall correlation parameter, which can be considered a hybrid measure of the within-study and between-study correlations. Importantly, this removes the need to know the within-study correlations, and the data required to fit the model are the same as are needed for a separate URMA of each endpoint. Section 4 highlights some areas of potential application; others may include education (Becker and others, 2000), psychology (Raudenbush and others, 1988), and genetics (Thompson and others, 2005).
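
For orientation, the structure described above can be written out as a single marginal model per study. This is a sketch in notation that we assume matches Section 2, which is not reproduced in this section: $s_{i1}^2$ and $s_{i2}^2$ denote the within-study variances (treated as known), $\psi_1^2$ and $\psi_2^2$ denote additional variation beyond sampling error, and $\rho$ denotes the single overall correlation.

```latex
\begin{pmatrix} Y_{i1} \\ Y_{i2} \end{pmatrix}
\sim N\!\left( \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}, \Phi_i \right),
\qquad
\Phi_i =
\begin{pmatrix}
\psi_1^2 + s_{i1}^2 &
\rho \sqrt{(\psi_1^2 + s_{i1}^2)(\psi_2^2 + s_{i2}^2)} \\
\rho \sqrt{(\psi_1^2 + s_{i1}^2)(\psi_2^2 + s_{i2}^2)} &
\psi_2^2 + s_{i2}^2
\end{pmatrix}
```

Written this way, each study keeps its own weight through $s_{i1}^2$ and $s_{i2}^2$, while the only inputs are the endpoint estimates and their variances, that is, the same data as for 2 separate URMAs.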

An important assumption of both the alternative and the general models is that the endpoints are normally distributed. This assumption is commonly made in meta-analysis, for example, where the Yij are log-odds ratios (Nam and others, 2003), mean differences (Berkey and others, 1998), or log-event rates (Arends and others, 2003). Yet, it is not always appropriate. For example, for synthesis of logit-sensitivity and logit-specificity from diagnostic studies, a bivariate generalized model is preferred as the normality assumption breaks down when the proportions are close to 0 or 1 (Chu and Cole, 2006; Harbord and others, 2006). Where the Yij can be assumed normally distributed, our simulation study shows that when the estimated overall correlation is not very close to 1 or –1 the alternative model produces pooled estimates with little bias that are similar to those obtained from the general BRMA model when the within-study correlations are known. The alternative model also offers benefits over 2 URMAs, in a manner similar to that identified elsewhere for the general model (Riley and others, 2007b). That is, the mean-square error is smaller and the standard error more appropriate for a contrast of the pooled estimates, for example, Formula. Also, when some data are missing completely at random, the alternative model produces, on average, a smaller standard error and mean-square error for the individual pooled estimates themselves. Such a missing data assumption may be hard to justify in practice (Rubin, 1976), but scenario (ix) shows that the alternative model can outperform URMA even for non-ignorable missing data. The benefits of the alternative model increase as the overall correlation increases. Also, an estimate of the correlation between the 2 pooled estimates from the alternative model may be useful for making predictions (Daniels and Hughes, 1997) or calculating joint confidence regions (Reitsma and others, 2005).
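
To illustrate why the correlation between the pooled estimates matters for a contrast, the sketch below computes the standard error of a difference between 2 pooled estimates with and without the covariance term. The numbers are hypothetical and are not taken from Table 3; the function name is ours.

```python
import math


def se_of_difference(se1, se2, corr):
    """Standard error of (estimate1 - estimate2), given each standard error
    and the correlation between the two estimates."""
    variance = se1**2 + se2**2 - 2.0 * corr * se1 * se2
    return math.sqrt(variance)


# Hypothetical standard errors and correlation (illustrative only).
se1, se2, corr = 0.10, 0.12, 0.8

# Two separate URMAs give no correlation estimate, so it is implicitly zero.
se_ignoring = se_of_difference(se1, se2, 0.0)
# A bivariate analysis supplies the correlation between the pooled estimates.
se_bivariate = se_of_difference(se1, se2, corr)

print(f"SE of difference, correlation ignored: {se_ignoring:.4f}")
print(f"SE of difference, correlation used:    {se_bivariate:.4f}")
```

With a positive correlation the standard error of a difference shrinks, whereas for a sum it would grow; either way, ignoring the correlation gives a standard error that does not reflect the joint uncertainty.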

If practitioners are fortunate enough to have the within-study correlations available, or if these can be assumed zero (Thompson and others, 2005; Arends and others, 2003), then we recommend that they still perform a BRMA using the general model, as this has a more realistic, hierarchical structure and allows estimation of the between-study variances, $\tau_1^2$ and $\tau_2^2$, whereas the alternative model only provides an approximate indication of these (see Section 3.1). However, even when the within-study correlations are known, the general model may still estimate the between-study correlation as 1 or –1, which leads to upwardly biased between-study variance estimates and thus an increase in the standard error and mean-square error of pooled estimates (Riley and others, 2007b). In this situation, one might still apply the alternative model as the overall correlation is often estimated more easily and away from the edge of its parameter space (see Section 4.1). Occasionally, estimation issues also arise for the alternative model (see Section 4.3); in particular, when the overall correlation is very close to 1 or –1 the pooled estimates may be unstable. This occurs most often when the within-study variation is relatively large, in which case a bivariate fixed-effects meta-analysis may be more appropriate (Berkey and others, 1995). Indeed, in those scenarios where the alternative model has difficulties, it seems more important to specify the within-study correlations, in which case imputing a range of values within the general model may be the best approach to take (Berkey and others, 1996). We restricted our use of the alternative model to when the estimated overall correlation was between –0.95 and 0.95, within which we did not observe problems of unstable estimates; this may have been conservative as most problems arose when Formula. A cautious suggestion is that practitioners who obtain a high overall correlation, say > 0.9 in absolute value, should assess the robustness of pooled results to small changes in the overall correlation as a sensitivity analysis.
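
One simple way to carry out the sensitivity analysis suggested above is sketched below. It is not the estimation procedure used in the paper. It assumes the alternative model's marginal covariance has variances (psi_j^2 + s_ij^2) and covariance rho times the square root of their product, holds psi_1^2 and psi_2^2 fixed at hypothetical values rather than re-estimating them for each rho, and uses made-up data for studies that report both endpoints.

```python
import math


def pooled_estimates(studies, psi1_sq, psi2_sq, rho):
    """Generalized least squares pooled estimates and standard errors for two
    endpoints, treating psi1_sq, psi2_sq and rho as fixed (an assumption)."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for y1, s1_sq, y2, s2_sq in studies:
        v1 = psi1_sq + s1_sq
        v2 = psi2_sq + s2_sq
        c = rho * math.sqrt(v1 * v2)
        det = v1 * v2 - c * c
        # Inverse of the study's 2x2 covariance matrix [[v1, c], [c, v2]].
        w11, w12, w22 = v2 / det, -c / det, v1 / det
        a11 += w11
        a12 += w12
        a22 += w22
        b1 += w11 * y1 + w12 * y2
        b2 += w12 * y1 + w22 * y2
    # Solve the 2x2 system A * beta = b; the inverse of A is the covariance of beta.
    det_a = a11 * a22 - a12 * a12
    beta1 = (a22 * b1 - a12 * b2) / det_a
    beta2 = (a11 * b2 - a12 * b1) / det_a
    se1 = math.sqrt(a22 / det_a)
    se2 = math.sqrt(a11 / det_a)
    return beta1, beta2, se1, se2


# Hypothetical data: (y1, within-study variance 1, y2, within-study variance 2).
studies = [
    (0.50, 0.04, 0.60, 0.05),
    (0.20, 0.10, 0.35, 0.09),
    (0.80, 0.02, 0.70, 0.03),
    (0.40, 0.15, 0.55, 0.20),
]

# Hypothetical fixed values for the additional variance terms.
psi1_sq, psi2_sq = 0.02, 0.02

for rho in (0.90, 0.95, 0.99):
    beta1, beta2, se1, se2 = pooled_estimates(studies, psi1_sq, psi2_sq, rho)
    print(f"rho={rho:.2f}: beta1={beta1:.3f} (SE {se1:.3f}), "
          f"beta2={beta2:.3f} (SE {se2:.3f})")
```

If the pooled estimates or their standard errors move markedly across such a grid, the results are sensitive to the overall correlation. A fuller analysis would re-estimate the remaining parameters at each fixed correlation value rather than holding them constant.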

It is always preferable to explain the between-study variability where possible (Thompson, 1994). The alternative model can be extended to a bivariate meta-regression to include additional study-level covariates (Berkey and others, 1998). However, this approach will reduce the between-study variation and make the within-study variation relatively large; this will in turn make estimating the overall correlation difficult. A multivariate meta-regression approach may thus be difficult without the within-study correlations. The ideal way to examine heterogeneity is to obtain and directly model the individual patient data for each study (Lambert and others, 2002), and this itself would negate the problem of unavailable within-study correlations. Yet, individual patient data may not always be available (Riley, Look and others, 2007) and in practice meta-analysts tend to reduce their available individual patient data to aggregate data for synthesis, suggesting that they are more comfortable with traditional aggregate data techniques (Simmonds and others, 2005). For future research, it is important to assess if and how the alternative model may help identify surrogate outcomes (Daniels and Hughes, 1997), make predictions, or calculate predictive regions. Extension to 3 or more correlated endpoints would also be interesting (Berkey and others, 1996). The underlying assumptions regarding the multivariate models perhaps also require more critical evaluation; for example, are Yi1 and Yi2 truly sampled from the same distribution, and can one really assume that the $s_{i1}^2$ and $s_{i2}^2$ are known? A Bayesian approach may overcome the latter issue as it accounts for all parameter uncertainty (Nam and others, 2003). The impact of dissemination bias within multivariate meta-analysis also warrants attention (Riley, Sutton and others, 2004).


    FUNDING
 
Department of Health's National Coordinating Centre for Research Capacity Development (RSES C2/ PDA/015) to R.R. as a Research Scientist in Evidence Synthesis.


    ACKNOWLEDGMENTS
 
We would like to thank Alex Sutton and Paul Lambert for helpful discussions regarding bivariate meta-analysis models. We also thank Peter Diggle and the 2 referees, whose comments and suggestions have greatly improved the content of this paper. Conflicts of Interest: None declared.


    REFERENCES
 

    Arends LR, Voko Z, Stijnen T. Combining multiple outcome measures in a meta-analysis: an application. Statistics in Medicine (2003) 22:1335–1353.

    Becker BJ, Hedges LV, Pigott TD. Campbell Collaboration Statistical Analysis Policy Brief. A Campbell Collaboration Resource Document. (2004) Available at http://www.campbellcollaboration.org/ECG/policy_stat.asp.

    Becker BJ, Tinsley HEA, Brown S. Multivariate Meta-Analysis (2000) San Diego, CA: Academic Press.

    Berkey CS, Anderson JJ, Hoaglin DC. Multiple-outcome meta-analysis of clinical trials. Statistics in Medicine (1996) 15:537–557.

    Berkey CS, Antczak-Bouckoms A, Hoaglin DC, Mosteller F, Pihlstrom BL. Multiple-outcomes meta-analysis of treatments for periodontal disease. Journal of Dental Research (1995) 74:1030–1039.

    Berkey CS, Hoaglin DC, Antczak-Bouckoms A, Mosteller F, Colditz GA. Meta-analysis of multiple outcomes by regression with random effects. Statistics in Medicine (1998) 17:2537–2550.

    Berrington A, Cox DR. Generalized least squares for the synthesis of correlated information. Biostatistics (2003) 4:423–431.

    Campbell MK, Elbourne DR, Altman DG. CONSORT statement: extension to cluster randomised trials. British Medical Journal (2004) 328:702–708.

    Chu H, Cole SR. Bivariate meta-analysis for sensitivity and specificity with sparse data: a generalized linear mixed model approach (letter to the. Journal of Clinical Epidemiology (2006) 59:1331–1332.

    Daniels MJ, Hughes MD. Meta-analysis for the evaluation of potential surrogate markers. Statistics in Medicine (1997) 16:1965–1982.

    Dear KB. Iterative generalized least squares for meta-analysis of survival data at multiple times. Biometrics (1994) 50:989–1002.

    Gentle JE. Cholesky factorization. Numerical Linear Algebra for Applications in Statistics (1998) Berlin, Germany: Springer.

    Harbord RM, Deeks JJ, Egger M, Whiting P, Sterne JA. A unification of models for meta-analysis of diagnostic accuracy studies. Biostatistics (2007) 8:239–251.

    Hasselblad V. Meta-analysis of multitreatment studies. Medical Decision Making (1998) 18:37–43.

    Korn EL, Albert PS, Mcshane LM. Assessing surrogates as trial endpoints using mixed models. Statistics in Medicine (2005) 24:163–182.

    Lambert PC, Sutton AJ, Abrams KR, Jones DR. A comparison of summary patient-level covariates in meta-regression with individual patient data meta-analysis. Journal of Clinical Epidemiology (2002) 55:86–94.

    Nam IS, Mengersen K, Garthwaite P. Multivariate meta-analysis. Statistics in Medicine (2003) 22:2309–2333.

    Raudenbush SW, Becker BJ, Kalaian H. Modeling multivariate effect sizes. Psychological Bulletin (1988) 103:111–120.

    Reitsma JB, Glas AS, Rutjes AW, Scholten RJ, Bossuyt PM, Zwinderman AH. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. Journal of Clinical Epidemiology (2005) 58:982–990.

    Riley RD, Abrams KR, Lambert PC, Sutton AJ, Thompson JR. An evaluation of bivariate random-effects meta-analysis for the joint synthesis of two correlated outcomes. Statistics in Medicine (2007a) 26:78–97.

    Riley RD, Abrams KR, Sutton AJ, Lambert PC, Thompson JR. Bivariate random-effects meta-analysis and the estimation of between-study correlation. BMC Medical Research Methodology (2007b) 7:3.

    Riley RD, Heney D, Jones DR, Sutton AJ, Lambert PC, Abrams KR, Young B, Wailoo AJ, Burchill SA. A systematic review of molecular and biological tumor markers in neuroblastoma. Clinical Cancer Research (2004) 10:4–12.

    Riley RD, Look MP, Simmonds MC. Combining individual patient data and aggregate data in evidence synthesis: a systematic review identified current practice and possible methods. Journal of Clinical Epidemiology (2007) 60:431–439.

    Riley RD, Sutton AJ, Abrams KR, Lambert PC. Sensitivity analyses allowed more appropriate and reliable meta-analysis conclusions for multiple outcomes when missing data was present. Journal of Clinical Epidemiology (2004) 57:911–924.

    Rubin DB. Inference and missing data. Biometrika (1976) 63:581–592.

    Simmonds MC, Higgins JPT, Stewart LA, Tierney JF, Clarke MJ, Thompson SG. Meta-analysis of individual patient data from randomized trials: a review of methods used in practice. Clinical Trials (2005) 2:209–217.

    Thompson JR, Minelli C, Abrams KR, Tobin MD, Riley RD. Meta-analysis of genetic studies using Mendelian randomization—a multivariate approach. Statistics in Medicine (2005) 24:2241–2254.

    Thompson SG. Why sources of heterogeneity in meta-analysis should be investigated. British Medical Journal (1994) 309:1351–1355.

    Van Houwelingen HC, Arends LR, Stijnen T. Advanced methods in meta-analysis: multivariate approach and meta-regression. Statistics in Medicine (2002) 21:589–624.

    Received October 4, 2006; revised January 31, 2007; revised March 5, 2007; accepted for publication April 25, 2007.

