Biostatistics Advance Access originally published online on October 27, 2006
Biostatistics 2007 8(3):625-631; doi:10.1093/biostatistics/kxl034
Sample size determination for matched-pair equivalence trials using rate ratio
Department of Statistics, Yunnan University, Kunming 650091, China
Department of Mathematics, Hong Kong Baptist University, Kowloon, Hong Kong mltang{at}math.hkbu.edu.hk
Department of Statistics, Yunnan University, Kunming 650091, China
* To whom correspondence should be addressed.
| SUMMARY |
|---|
In this article, we compare Wald-type, logarithmic transformation, and Fieller-type statistics for the classical 2-sided equivalence testing of the rate ratio under matched-pair designs with a binary end point. These statistics can be implemented through sample-based, constrained least squares estimation and constrained maximum likelihood (CML) estimation methods. Sample size formulae based on the CML estimation method are developed. We consider formulae that control a prespecified power or confidence width. Our simulation studies show that statistics based on the CML estimation method generally outperform other statistics and methods with respect to actual type I error rate and average width of confidence intervals. Also, the corresponding sample size formulae are valid asymptotically in the sense that the exact power and actual coverage probability for the estimated sample size are generally close to their prespecified values. The methods are illustrated with a real example from a clinical laboratory study.
Keywords: Constrained maximum likelihood estimation method; Equivalence study; Sample size formula; Score test statistic
| 1. INTRODUCTION |
|---|
Many new treatments have been developed because they offer advantages such as better safety profiles, easier administration or lower cost, while maintaining efficacies similar to those of standard treatments. This changes the nature of the clinical investigation from a superiority to a noninferiority or an equivalence trial design (Hauck and Anderson, 1999).
An example is a clinical laboratory study of several radioallergosorbent test (RAST) methods (Garcia and others, 1997). Briefly, the rate of production of in vitro immunoglobulin E (IgE) antibodies to the benzylpenicilloyl (BPO) determinant is a useful tool for evaluating subjects with suspected penicillin allergy. BPO conjugated to human serum albumin (HSA) is usually considered the standard test, whereas BPO conjugated to an aminospacer (SP) has only recently been proposed. The objective of this trial is to demonstrate that the ratio of the true success rates lies between a pair of clinically acceptable equivalence margins $\delta_0$ and $\delta_1$, with $\delta_0 < \delta_1$. However, only 60 serum samples were obtained. It was reported that this could be an undersized study and that a proper sample size determination was needed once $\delta_0$ and $\delta_1$ were fixed (Tang, 2003).
Sample size determination for assessing the equivalence/noninferiority of 2 treatments via a rate ratio under a matched-pair design has only recently been studied (see Tang, 2003, and references therein). Adopting the logarithmic transformation and Fieller-type statistics based on sample-based and constrained least squares estimation of nuisance parameters, Lui and Cumberland (2001) developed sample size formulae for noninferiority testing of the rate ratio in matched-pair designs. Nam and Blackwelder (2002) derived sample size formulae based on a Wald-type statistic and the constrained maximum likelihood (CML) Fieller-type statistic. Tang and others (2002) developed sample size formulae that control the desired power and confidence width based on a score-type statistic. Tang (2003) found that the score-type statistic of Tang and others is identical to Nam and Blackwelder's CML statistic and that the sample size formulae based on the CML estimation method are valid asymptotically. However, all these studies have mainly been concerned with noninferiority testing; systematic evaluations of equivalence trials via rate ratios have not been carried out.
In this article, we consider the problem of testing equivalence via a rate ratio. We compare the performance of Wald-type, logarithmic transformation, and Fieller-type statistics. These statistics are implemented through sample-based, constrained least squares estimation, and CML estimation methods. We discuss both significance testing and confidence interval approaches. We then consider sample size formulae for different tests and approaches based on the CML estimation method. Simulations are conducted to demonstrate the asymptotic validity of the proposed formulae. We illustrate our methodologies with the aforementioned clinical laboratory study. Finally, we give a brief discussion.
| 2. PROCEDURES FOR EQUIVALENCE HYPOTHESIS TESTS |
|---|
We assume that the disease status (i.e. diseased or nondiseased) of a given subject can be determined by a gold standard. A random sample of $n_g$ subjects is drawn from each of the diseased ($g = d$) and nondiseased ($g = \bar{d}$) populations. A reference diagnostic test and a new diagnostic test are then administered to each of these $n_g$ sampled subjects in random order. We define "concordant" as a positive test result on a diseased subject or a negative test result on a nondiseased subject. Let $i = 1$ ($j = 1$) if a subject shows a concordant result on the new (reference) test; otherwise $i = 0$ ($j = 0$). Let $X_{ijg}$ be the number of subjects that show result $(i, j)$ ($i, j = 0, 1$) in the $g$th population ($g = d, \bar{d}$), and let $x_{ijg}$ be the corresponding observed value of $X_{ijg}$. The 4 outcomes and probabilities in population $g$ are summarized in Table 1, where $0 \le p_{ijg} \le 1$ denotes the response probability of cell $(i, j)$, $p_{i+g} = p_{i1g} + p_{i0g}$ and $p_{+jg} = p_{1jg} + p_{0jg}$; similarly, $x_{i+g} = x_{i1g} + x_{i0g}$ and $x_{+jg} = x_{1jg} + x_{0jg}$, for $i, j = 0, 1$. Note that $n_g = x_{1+g} + x_{0+g} = x_{+1g} + x_{+0g}$ for $g = d, \bar{d}$. By the definition of concordance, the sensitivities of the new and reference diagnostic tests are $p_{1+d}$ and $p_{+1d}$, respectively, and their specificities are $p_{1+\bar{d}}$ and $p_{+1\bar{d}}$, respectively. Let $\hat{p}_{ijg} = x_{ijg}/n_g$, $\hat{p}_{i+g} = x_{i+g}/n_g$, and $\hat{p}_{+jg} = x_{+jg}/n_g$ be the sample-based estimates of $p_{ijg}$, $p_{i+g}$, and $p_{+jg}$, respectively, for $i, j = 0, 1$ and $g = d, \bar{d}$. The ratio $p_{1+d}/p_{+1d}$ (or $p_{1+\bar{d}}/p_{+1\bar{d}}$) provides a measure for assessing the equivalence of 2 test procedures in terms of sensitivity (or specificity), and the vector $(X_{11g}, X_{10g}, X_{01g})$ follows a multinomial distribution with $n_g$ trials and response probabilities $(p_{11g}, p_{10g}, p_{01g})$. The equivalence between the new and reference diagnostic procedures can be described by the following interval hypotheses:
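$$H_{0g}: \frac{p_{1+g}}{p_{+1g}} \le \delta_{0g} \ \text{or} \ \frac{p_{1+g}}{p_{+1g}} \ge \delta_{1g} \quad \text{versus} \quad H_{1g}: \delta_{0g} < \frac{p_{1+g}}{p_{+1g}} < \delta_{1g}, \tag{2.1}$$
where $\delta_{0g} < \delta_{1g}$ are predetermined, clinically meaningful lower and upper equivalence limits.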
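The sample-based quantities above are simple functions of the four cell counts of one population. The following Python sketch computes the proportions $x_{11g}/n_g$, $x_{1+g}/n_g$, $x_{+1g}/n_g$ and the estimated ratio $x_{1+g}/x_{+1g}$; the function name `sample_estimates` and the example counts are hypothetical and are not the study data.

```python
import math


def sample_estimates(x11, x10, x01, x00):
    """Sample-based estimates for one population g (diseased or nondiseased).

    x_ij counts subjects with result i on the new test and j on the reference
    test, where 1 means a concordant result.
    """
    n = x11 + x10 + x01 + x00
    p1_plus = (x11 + x10) / n   # new test: sensitivity (g = d) or specificity
    p_plus1 = (x11 + x01) / n   # reference test: the same quantity
    # The ratio is undefined when x_{+1g} = 0 (the article then adds 0.5).
    ratio = p1_plus / p_plus1 if p_plus1 > 0 else math.nan
    return {"n": n, "p11": x11 / n, "p1+": p1_plus, "p+1": p_plus1, "ratio": ratio}


# Hypothetical counts for one population (not the study data).
est = sample_estimates(x11=20, x10=5, x01=3, x00=2)
print(est["n"], round(est["ratio"], 4))   # 30 1.087
```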
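To test the interval hypothesis $H_{0g}$ in (2.1), we adopt the widely used two one-sided tests (TOST) approach (see Dunnett and Gent, 1977). This consists of testing the following 1-sided hypotheses:

$$H_{0lg}: \frac{p_{1+g}}{p_{+1g}} \le \delta_{0g} \quad \text{versus} \quad H_{1lg}: \frac{p_{1+g}}{p_{+1g}} > \delta_{0g}, \tag{2.2}$$

$$H_{0ug}: \frac{p_{1+g}}{p_{+1g}} \ge \delta_{1g} \quad \text{versus} \quad H_{1ug}: \frac{p_{1+g}}{p_{+1g}} < \delta_{1g}, \tag{2.3}$$

and taking the p value of the equivalence test for (2.1) to be the maximum of the p values of the 2 one-sided tests for (2.2) and (2.3) (Berger and Hsu, 1996).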
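As a small illustration of the TOST rule, the sketch below converts the two one-sided z statistics into p values under the standard normal approximation and returns the larger one. It assumes the lower statistic tends to be large when the ratio exceeds $\delta_{0g}$ and the upper statistic tends to be small when the ratio is below $\delta_{1g}$; the function name is hypothetical. Up to ties at the critical values, the result is below $\alpha$ exactly when both one-sided nulls are rejected at level $\alpha$.

```python
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def tost_p_value(z_lower, z_upper):
    """p value of the TOST equivalence test: the larger one-sided p value.

    z_lower tests H0l (ratio <= delta0) and is large when the ratio exceeds
    delta0; z_upper tests H0u (ratio >= delta1) and is small when the ratio
    is below delta1.
    """
    p_lower = 1.0 - _Z.cdf(z_lower)  # P(Z >= z_lower)
    p_upper = _Z.cdf(z_upper)        # P(Z <= z_upper)
    return max(p_lower, p_upper)


# Hypothetical statistics: both one-sided nulls are rejected at alpha = 0.05.
print(round(tost_p_value(2.1, -1.8), 4))   # 0.0359
```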
Let $\tilde{p}^{(k)}_{+1g}$ and $\tilde{p}^{(k)}_{11g}$ be any estimates of $p_{+1g}$ and $p_{11g}$, respectively, under the null hypothesis $H_{0kg}$, for $k = l, u$, and let $z_\alpha$ be the upper $100\alpha$ percentile of the standard normal distribution. For sufficiently large $n_g$, we consider the following equivalence tests for hypotheses (2.2) and (2.3) at level $\alpha$. All derivations are presented in the supplementary material available at Biostatistics online. Reject $H_{0g}$ at level $\alpha$ if
(T1) Wald-type tests based on the measure $\theta_g = p_{1+g}/p_{+1g}$:
(2.4)
(T2) Logarithmic transformation tests based on the measure $\log(\theta_g) = \log(p_{1+g}/p_{+1g})$:
(2.5)
(T3) Fieller-type tests based on the measure $p_{1+g} - \theta_g p_{+1g}$:
(2.6)
As reported by Farrington and Manning (1990) and Tang (2003), the choice of the estimates of $p_{+1g}$ and $p_{11g}$ under $H_{0lg}$ and $H_{0ug}$ has a substantial impact on the performance of the test statistics in (2.4), (2.5), and (2.6). Here, we consider 3 methods of estimating these quantities: the sample-based method (M1), the constrained least squares method (M2), and the CML method (M3) (see the supplementary material available at Biostatistics online).
Note that the statistic $T_{1lg}$ based on the CML method is identical to the CML Fieller-type statistic and the score-type statistic (see Nam and Blackwelder, 2002; Tang and others, 2002). Also, $T_{1lg}$ and $T_{2lg}$ could be undefined when $\hat{p}_{+1g} = x_{+1g}/n_g = 0$; to overcome this, we add 0.5 to $x_{+1g}$.
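The CML method (M3) is only named here, with details deferred to the supplementary material. One way to compute constrained maximum likelihood estimates of the cell probabilities under $p_{1+g} = \delta\, p_{+1g}$ for a given margin $\delta$ is via a Lagrange multiplier: the stationarity conditions give $p_c = x_c/(n_g + \mu a_c)$ with $a = (1-\delta, 1, -\delta, 0)$ for cells $(11, 10, 01, 00)$, and the scalar $\mu$ solves a strictly monotone equation that bisection handles reliably. The sketch below uses that derivation (ours, not reproduced from the supplementary material) and plugs the estimates into the null variance $\delta[(1+\delta)p_{+1g} - 2p_{11g}]/n_g$ of $\hat{p}_{1+g} - \delta\hat{p}_{+1g}$ to form a Fieller-type statistic. The function names and counts are hypothetical; it assumes $x_{10g} > 0$ and $x_{01g} > 0$, omits the 0.5 adjustment, and is not guaranteed to match the article's formulas numerically.

```python
import math


def cml_cells(x11, x10, x01, x00, delta, tol=1e-12):
    """CML estimates of (p11, p10, p01, p00) subject to p1+ = delta * p+1.

    Lagrange stationarity gives p_c = x_c / (n + mu * a_c) with
    a = (1 - delta, 1, -delta, 0) for cells (11, 10, 01, 00). The multiplier
    mu solves g(mu) = sum_c a_c * p_c = 0; g is strictly decreasing on the
    feasible interval (-n, n / delta), so bisection finds the unique root.
    Sketch assumption: x10 > 0, x01 > 0 and delta > 0.
    """
    if x10 <= 0 or x01 <= 0 or delta <= 0:
        raise ValueError("sketch assumes x10 > 0, x01 > 0 and delta > 0")
    n = x11 + x10 + x01 + x00
    weighted = ((x11, 1.0 - delta), (x10, 1.0), (x01, -delta))

    def g(mu):
        return sum(a * x / (n + mu * a) for x, a in weighted)

    # g -> +inf as mu -> -n (cell 10) and g -> -inf as mu -> n/delta (cell 01);
    # the cell-11 denominator stays positive on this whole interval.
    lo, hi = -float(n), n / delta
    while hi - lo > tol * n:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    mu = 0.5 * (lo + hi)
    return (x11 / (n + mu * (1.0 - delta)), x10 / (n + mu),
            x01 / (n - mu * delta), x00 / n)


def fieller_cml_stat(x11, x10, x01, x00, delta):
    """Fieller-type statistic for p1+ / p+1 = delta with a CML-based variance."""
    n = x11 + x10 + x01 + x00
    p11, p10, p01, _ = cml_cells(x11, x10, x01, x00, delta)
    diff = ((x11 + x10) - delta * (x11 + x01)) / n   # p1+hat - delta * p+1hat
    # Variance of diff at the CML point; under the constraint it equals
    # delta * ((1 + delta) * p+1 - 2 * p11) / n.
    var = ((1.0 - delta) ** 2 * p11 + p10 + delta ** 2 * p01) / n
    return diff / math.sqrt(var)


# Hypothetical counts (x11, x10, x01, x00) and margins 0.9 and 1/0.9.
counts = (20, 5, 3, 2)
t_lower = fieller_cml_stat(*counts, 0.9)       # compare with +z_alpha
t_upper = fieller_cml_stat(*counts, 1 / 0.9)   # compare with -z_alpha
print(round(t_lower, 3), round(t_upper, 3))
```

When $\delta$ equals the sample ratio, the constraint already holds at the unconstrained estimates, so $\mu = 0$ and the CML cells reduce to the sample proportions; this is a convenient sanity check.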
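In some applications, one may want to assess simultaneously the equivalence of the sensitivity and the specificity of a new test and a reference test. In this case, we can establish equivalence between the new and the reference tests only when we are able to reject both $H_{0d}: p_{1+d}/p_{+1d} \le \delta_{0d}$ or $p_{1+d}/p_{+1d} \ge \delta_{1d}$ and $H_{0\bar{d}}: p_{1+\bar{d}}/p_{+1\bar{d}} \le \delta_{0\bar{d}}$ or $p_{1+\bar{d}}/p_{+1\bar{d}} \ge \delta_{1\bar{d}}$. Hence, the intersection-union test discussed by Berger and Hsu (1996) can be used. That is, the equivalence of a new test to a reference test can be established at level $\alpha$ if $T_{kld} \ge z_\alpha$, $T_{kud} \le -z_\alpha$, $T_{kl\bar{d}} \ge z_\alpha$, and $T_{ku\bar{d}} \le -z_\alpha$, where $k \in \{1, 2, 3\}$ indexes the chosen statistic.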
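The intersection-union rule simply requires all four one-sided rejections, each at the unadjusted level $\alpha$. A minimal sketch, with hypothetical function and argument names, taking as inputs the four one-sided statistics computed with one chosen statistic $k$:

```python
from statistics import NormalDist


def iut_equivalent(t_l_sens, t_u_sens, t_l_spec, t_u_spec, alpha=0.05):
    """Intersection-union test for equivalence in sensitivity AND specificity.

    Each one-sided statistic is compared with z_alpha itself; the
    intersection-union principle needs no multiplicity adjustment.
    """
    z = NormalDist().inv_cdf(1.0 - alpha)  # upper 100*alpha percentile
    sensitivity_ok = t_l_sens >= z and t_u_sens <= -z
    specificity_ok = t_l_spec >= z and t_u_spec <= -z
    return sensitivity_ok and specificity_ok


# Hypothetical statistics: specificity fails its lower one-sided test.
print(iut_equivalent(2.0, -2.0, 1.5, -1.7))   # False
```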
Often, the estimation of the treatment difference is of more interest than the testing of specific hypotheses. It is well known that an equivalence hypothesis can be tested via the confidence interval approach (Tang and others, 2002; Liu and others, 2002). Briefly, equivalence between the test and reference procedures can be established at the $\alpha$ level of significance if and only if the corresponding $100(1-2\alpha)$ percent confidence interval lies entirely within the interval $(\delta_{0g}, \delta_{1g})$. It then follows from (2.4) to (2.6) that the $100(1-2\alpha)$ percent asymptotic test-based confidence intervals for $p_{1+g}/p_{+1g}$ are obtained by inverting the tests $T_{k'}$, with $k' = 1, 2, 3$: each interval consists of the values of the ratio that are rejected by neither of the corresponding 1-sided level-$\alpha$ tests.
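To illustrate the confidence interval route, the sketch below builds a simple Wald-type $100(1-2\alpha)$ percent interval for the ratio from sample proportions, using the delta-method variance $[(1-r)^2\hat{p}_{11g} + \hat{p}_{10g} + r^2\hat{p}_{01g}]/(n_g\hat{p}_{+1g}^2)$ with $r = \hat{p}_{1+g}/\hat{p}_{+1g}$, and then checks whether the interval lies inside $(\delta_{0g}, \delta_{1g})$. This is one assumed variant (roughly a Wald-type interval with sample-based estimates), not necessarily any of the intervals defined in the article; the function names and counts are hypothetical.

```python
import math
from statistics import NormalDist


def wald_ci_ratio(x11, x10, x01, x00, alpha=0.05):
    """Wald-type 100(1 - 2*alpha)% CI for p1+/p+1 from sample proportions."""
    n = x11 + x10 + x01 + x00
    p11, p10, p01 = x11 / n, x10 / n, x01 / n
    p_plus1 = p11 + p01
    ratio = (p11 + p10) / p_plus1
    # Delta-method variance of the estimated ratio (assumed form for this sketch).
    var = ((1 - ratio) ** 2 * p11 + p10 + ratio ** 2 * p01) / (n * p_plus1 ** 2)
    half = NormalDist().inv_cdf(1.0 - alpha) * math.sqrt(var)   # z_alpha * SE
    return ratio - half, ratio + half


def equivalent_by_ci(ci, delta0, delta1):
    """Equivalence is concluded iff the interval lies inside (delta0, delta1)."""
    lower, upper = ci
    return delta0 < lower and upper < delta1


# Hypothetical counts: the same proportions with 30 and with 3000 subjects.
for counts in ((20, 5, 3, 2), (2000, 500, 300, 200)):
    ci = wald_ci_ratio(*counts)
    print(counts, [round(v, 3) for v in ci], equivalent_by_ci(ci, 0.9, 1 / 0.9))
```

With these illustrative counts, the 30-subject interval extends below 0.9, so equivalence cannot be concluded, whereas the 100-fold larger sample yields an interval inside $(0.9, 1/0.9)$.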
| 3. SAMPLE SIZE FORMULAE |
|---|
Determining the appropriate sample size for an equivalence trial is an essential step in any statistical design. In general, sample size planning can be approached from 2 different perspectives, namely the significance testing and confidence interval approaches. Depending on the goals of a particular study, it may be pertinent to plan the sample size from the significance testing approach, the confidence interval approach, or a combination of the two. Derivations of the approximate sample size formulae for the 3 proposed statistics (T1, T2, and T3) and the 2 approaches (significance testing and confidence interval) based on the CML method (M3) are presented in the supplementary material available at Biostatistics online.
The supplementary material available at Biostatistics online also compares the performances of the 3 proposed statistics (T1, T2, and T3) using the 3 different estimation methods (M1, M2, and M3). The results can be summarized as follows:
- In general, T2 and T3, using M3, give consistently good performance.
- In particular, T3 demonstrates robust behavior in almost all settings, whereas T2 is conservative when the sample size is small or moderate. We therefore recommend T3 using M3.
- All sample size formulae for controlling a prespecified power are asymptotically valid in the sense that the exact power for the estimated sample size is close to the prespecified power level.
- Similarly, all sample size formulae for controlling the confidence interval width are asymptotically valid in the sense that both the prespecified coverage and half-width are well controlled.
- Required sample sizes are generally smaller for T3 than for T1 or T2.
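The statement that a formula is asymptotically valid can be checked by simulation: draw many multinomial samples of the planned size under assumed true cell probabilities, apply the TOST rule, and record the rejection rate. The sketch below does this for a Fieller-type statistic that uses the unconstrained sample variance of $\hat{p}_{1+g} - \delta\hat{p}_{+1g}$. That choice of variance, the function names, and the cell probabilities in the example are our assumptions, so the numbers it produces are not the article's results.

```python
import math
import random
from statistics import NormalDist


def tost_rejects(x11, x10, x01, x00, delta0, delta1, z):
    """TOST with Fieller-type statistics and an unconstrained sample variance."""
    n = x11 + x10 + x01 + x00
    p11, p10, p01 = x11 / n, x10 / n, x01 / n

    def stat(delta):
        diff = (p11 + p10) - delta * (p11 + p01)       # p1+hat - delta * p+1hat
        var = (1 - delta) ** 2 * p11 + p10 + delta ** 2 * p01 - diff ** 2
        return diff / math.sqrt(var / n) if var > 0 else math.nan

    # Comparisons involving nan are False, so degenerate samples never reject.
    return stat(delta0) >= z and stat(delta1) <= -z


def simulated_power(probs, n, delta0, delta1, alpha=0.05, reps=2000, seed=1):
    """Monte Carlo TOST power for cell probabilities probs = (p11, p10, p01, p00)."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - alpha)   # z_alpha
    rejections = 0
    for _ in range(reps):
        draws = rng.choices(range(4), weights=probs, k=n)   # cells 11, 10, 01, 00
        counts = [draws.count(cell) for cell in range(4)]
        rejections += tost_rejects(*counts, delta0, delta1, z)
    return rejections / reps


# Illustrative true cells with ratio 1 (p1+ = p+1 = 0.6); margins 0.9 and 1/0.9.
cells = (0.5, 0.1, 0.1, 0.3)
for n in (100, 300, 600):
    print(n, simulated_power(cells, n, 0.9, 1 / 0.9, reps=500))
```

Comparing such simulated rejection rates with the target power at the sample size a formula returns is one way to judge how close "exact power" is to the prespecified level.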
| 4. ANALYSIS OF THE LABORATORY STUDY |
|---|
In this section, we illustrate our proposed methodology using the clinical laboratory study described in Section 1. Briefly, 30 positive control sera (serum samples from penicillin-allergic subjects with a positive clinical history and a positive penicillin skin test) and 30 negative control sera (sera from subjects with no history of penicillin allergy and a negative skin test) were tested for BPO determinant-specific IgE antibodies by RAST using different conjugates coupled to the solid phase. The standard procedure uses BPO conjugated to HSA (BPO-HSA), and the new procedure uses BPO conjugated to SP (BPO-SP). The results are summarized in Table 2.
Suppose that we want to show that BPO-SP is equivalent to BPO-HSA on the basis of specificity and/or sensitivity. The null hypothesis of interest is

$$H_{0g}: \frac{p_{1+g}}{p_{+1g}} \le \delta_{0g} \ \text{or} \ \frac{p_{1+g}}{p_{+1g}} \ge \delta_{1g},$$

where $\delta_{0g} = 0.9$ and $\delta_{1g} = 1/\delta_{0g} \approx 1.11$ for $g = d$ or $g = \bar{d}$. We applied the 9 equivalence tests presented in this paper (3 statistics, each with 3 estimation methods) to this data set. With each of the 2 one-sided tests performed at the 0.05 significance level, all tests except the Wald-type test based on the CML estimation method yield p values greater than the nominal 0.05 level; hence, the equivalence of BPO-SP to BPO-HSA in terms of sensitivity and/or specificity cannot be established.
Suppose that we focus now on the CML method. We want to plan a trial with 
, and $p_{10g} = p_{1+g} - p_{11g} = \theta_g p_{+1g} - p_{11g}$ for $g = d$ or $g = \bar{d}$, where $\theta_g = p_{1+g}/p_{+1g}$. The sample sizes required to achieve 80% power at the 0.05 significance level are 122, 121, and 117 for the Wald-type, logarithmic transformation, and Fieller-type statistics, respectively, on the basis of sensitivity, and 56, 55, and 52, respectively, on the basis of specificity. The sample sizes obtained from the different statistics thus do not differ substantially in this example.
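The planning relation above fixes the remaining cells once $\theta_g$, $p_{+1g}$, and $p_{11g}$ are chosen: $p_{10g} = \theta_g p_{+1g} - p_{11g}$, $p_{01g} = p_{+1g} - p_{11g}$, and $p_{00g} = 1 - p_{11g} - p_{10g} - p_{01g}$. A small helper (hypothetical name) can compute them and reject combinations that imply a negative probability. The illustrative input $\theta = 1.4$, $p_{+1} = 0.6$, $p_{11} = 0.54$ is the combination implied by the settings used for the diseased population in the confidence interval example that follows ($p_{10d} = 0.3$ gives $p_{11d} = 1.4 \times 0.6 - 0.3 = 0.54$).

```python
def planning_cells(theta, p_plus1, p11):
    """Cell probabilities (p11, p10, p01, p00) implied by planning values.

    Uses p1+ = theta * p+1, p10 = p1+ - p11, p01 = p+1 - p11 and
    p00 = 1 - p11 - p10 - p01.
    """
    p10 = theta * p_plus1 - p11
    p01 = p_plus1 - p11
    p00 = 1.0 - p11 - p10 - p01
    cells = (p11, p10, p01, p00)
    if min(cells) < -1e-12:   # small tolerance for floating-point round-off
        raise ValueError(f"planning values give a negative cell: {cells}")
    return cells


print(planning_cells(theta=1.4, p_plus1=0.6, p11=0.54))
```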
Suppose now that another investigator wants to rerun the experiment under similar settings, but with the aim of estimating the rate ratios of the sensitivities and specificities (i.e. $\theta_d = p_{1+d}/p_{+1d}$ and $\theta_{\bar{d}} = p_{1+\bar{d}}/p_{+1\bar{d}}$) and constructing the corresponding 90% confidence intervals with both half-widths controlled at prescribed values. The sample estimates of $p_{1+d}$, $p_{+1d}$, $p_{1+\bar{d}}$, and $p_{+1\bar{d}}$, and hence of $\theta_d$ and $\theta_{\bar{d}}$, are obtained from the data in Table 2. Based on these, we set $\theta_d = 1.4$, $p_{+1d} = 0.6$, and $p_{10d} = 0.3$, together with corresponding planning values of $\theta_{\bar{d}}$, $p_{+1\bar{d}}$, and $p_{10\bar{d}}$ for the nondiseased population. The corresponding sample sizes for the 90% confidence intervals with half-width controlled at $w = 0.05$ and at $w = 0.1$ are reported in Table 3. In all cases, the sample sizes based on the logarithmic transformation statistic are slightly smaller than those based on the Wald-type (or Fieller-type) statistic.
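For orientation only, a Wald-type interval for $\theta$ has half-width approximately $z_\alpha\sqrt{\sigma^2/n}$ with per-subject variance $\sigma^2 = \theta[(1+\theta)p_{+1} - 2p_{11}]/p_{+1}^2$, which gives $n = \lceil z_\alpha^2\sigma^2/w^2 \rceil$. With $\theta_d = 1.4$, $p_{+1d} = 0.6$, and $p_{10d} = 0.3$ (so $p_{11d} = 0.84 - 0.3 = 0.54$), this yields $\sigma^2 = 1.4 \times (2.4 \times 0.6 - 1.08)/0.36 = 1.4$. The sketch below (hypothetical function name) evaluates this crude delta-method approximation; it is not the article's CML-based formula, so its output need not agree with Table 3.

```python
import math
from statistics import NormalDist


def n_for_half_width(theta, p_plus1, p10, w, conf=0.90):
    """Smallest n whose Wald-type CI for the ratio has half-width <= w.

    Crude delta-method approximation: half-width = z * sqrt(sigma2 / n) with
    sigma2 = theta * ((1 + theta) * p+1 - 2 * p11) / p+1**2.
    """
    p11 = theta * p_plus1 - p10            # from p1+ = theta * p+1 = p11 + p10
    sigma2 = theta * ((1.0 + theta) * p_plus1 - 2.0 * p11) / p_plus1 ** 2
    z = NormalDist().inv_cdf(0.5 + conf / 2.0)   # about 1.645 for 90%
    return math.ceil(z * z * sigma2 / (w * w))


# Planning values for the diseased population from the example above.
for w in (0.05, 0.1):
    print(w, n_for_half_width(theta=1.4, p_plus1=0.6, p10=0.3, w=w))
# 0.05 1516
# 0.1 379
```

Halving the half-width roughly quadruples the required sample size, as expected from the $1/w^2$ dependence.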
| 5. CONCLUSION |
|---|
Our findings in this article are consistent with those for comparative binomial trials (see Farrington and Manning, 1990).
We consider sensitivity and specificity separately, although there are summary indices that combine both. Two common choices for this purpose are Youden's index and the likelihood ratio of a positive (or negative) test (see Biggerstaff, 2000). Extension of the present work to these indices is under consideration.
| ACKNOWLEDGMENTS |
|---|
The first author's work was sponsored by the National Natural Science Foundation of China (Project no. 10561008) and the Natural Science Fund of Yunnan Province (Project no. 2004A0002M). The second author's work was fully supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region (Project no. CUHK4371/04M). The authors are grateful to the editor and referees for their valuable suggestions, which greatly enhanced the manuscript, and to Professor N. Balakrishnan for reading the article for us. The second author would like to thank Ms Chow Hoi-Sze Daisy for her kind encouragement during the preparation of the manuscript. Conflict of Interest: None declared.
| REFERENCES |
|---|
Berger RL, Hsu JC. Bioequivalence trials, intersection-union tests and equivalence confidence sets (with discussion). Statistical Science (1996) 11:283-319.
Biggerstaff BJ. Comparing diagnostic tests: a simple graphic using likelihood ratios. Statistics in Medicine (2000) 19:649-663.
Dunnett CW, Gent M. Significance testing to establish equivalence between treatments, with special reference to data in the form of 2x2 tables. Biometrics (1977) 33:593-602.
Farrington CP, Manning G. Test statistics and sample size formulae for comparative binomial trials with null hypothesis of non-zero risk difference or non-unity relative risk. Statistics in Medicine (1990) 9:1447-1454.
Garcia JJ, Blanca M, Moreno F, Vega JM, Mayorga C, Fernandez J, Juarez C, Romano A, De Ramon E. Determination of IgE antibodies to the benzylpenicilloyl determinant: a comparison of the sensitivity and specificity of three radio allergo sorbent test methods. Journal of Clinical Laboratory Analysis (1997) 11:251-257.
Hauck WW, Anderson S. Some issues in the design and analysis of equivalence trials. Drug Information Journal (1999) 33:109-118.
Liu JP, Hsueh HM, Hsieh E, Chen JJ. Tests for equivalence or non-inferiority for paired binary data. Statistics in Medicine (2002) 21:231-245.
Lui KJ, Cumberland WG. Sample size determination for equivalence test using rate ratio of sensitivity and specificity in paired sample data. Controlled Clinical Trials (2001) 22:373-389.
Nam J, Blackwelder WC. Analysis of the ratio of marginal probabilities in a matched-pair setting. Statistics in Medicine (2002) 21:689-699.
Schuirmann DJ. A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. Journal of Pharmacokinetics and Biopharmaceutics (1987) 15:657-680.
Tang ML. Matched-pair non-inferiority trials using rate ratio: a comparison of current methods and sample size refinement. Controlled Clinical Trials (2003) 24:364-377.
Tang ML, Tang NS, Chan ISF, Chan BPS. Sample size determination for establishing equivalence/noninferiority via ratio of two proportions in matched-pair design. Biometrics (2002) 58:957-963.
Received June 21, 2005; revised December 8, 2005; revised March 25, 2006; revised April 28, 2006; revised July 10, 2006; revised September 17, 2006; accepted for publication October 20, 2006.