Biostatistics Advance Access originally published online on March 10, 2006
Biostatistics 2006 7(4):585-598; doi:10.1093/biostatistics/kxj027

Published by Oxford University Press 2006.

Pooling biospecimens and limits of detection: effects on ROC curve analysis

Sunni L. Mumford

Division of Epidemiology, Statistics & Prevention, NICHD, NIH, DHHS, Bethesda, MD 20892, USA and Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA

Enrique F. Schisterman*, Albert Vexler and Aiyi Liu

Division of Epidemiology, Statistics & Prevention, NICHD, NIH, DHHS, Bethesda, MD 20892, USA schistee{at}mail.nih.gov

* To whom correspondence should be addressed.


    SUMMARY
Epidemiological studies frequently face two restrictions in the evaluation of biomarkers: cost and instrument sensitivity. Costs can hamper the evaluation of the effectiveness of new biomarkers. In addition, many assays are subject to a limit of detection (LOD) determined by the instrument's sensitivity. Two common cost-cutting strategies are taking a random sample of the available specimens and pooling biospecimens. We compare these two sampling strategies when an LOD is present by examining the efficiency of receiver operating characteristic (ROC) curve analysis, specifically the estimation of the area under the ROC curve (AUC) for normally distributed markers. We propose and examine a method to estimate the AUC from pooled and unpooled samples subject to an LOD. We conclude that pooling is the most efficient cost-cutting strategy when the LOD affects less than 50% of the data. However, when much more than 50% of the data are affected, the pooling design is not recommended.

Keywords: Limit of detection; Maximum likelihood; Pooling design; Receiver operating characteristics; Sampling


    1. INTRODUCTION
New biomarkers are continually being researched and developed to detect and prevent various chronic and acute diseases. Biomarkers are distinctive biochemical indicators of biological processes or events that help measure the progress of disease or the effects of treatment. At times, the high cost of evaluating these biomarkers can prohibit further investigation. For example, a single assay measuring polychlorinated biphenyls (PCBs) costs between $500 and $1000, so only small studies have been able to examine whether PCBs are associated with cancer and with endometriosis (Laden and others, 2001; Laden and Hunter, 1998; Louis and others, 2005). In addition to cost constraints, the development of biomarkers for PCBs is constrained by instrument sensitivity (e.g. Finkelstein and Verma, 2001; Hornung and Reed, 1990; Lubin and others, 2004). The most common sensitivity limitation occurs when a proportion of study participants have levels at or below the limit of detection (LOD). Under these circumstances, biomarker values above the LOD are measured and reported, whereas values below the LOD cannot be measured and are unobservable.

A critical step in biomarker development is the evaluation of the marker's discriminating ability in terms of receiver operating characteristic (ROC) curves (e.g. Faraggi and Reiser, 2002; Shapiro, 1999; Wieand and others, 1989). The most commonly used global index of diagnostic accuracy is the area under the ROC curve (AUC). Bamber (1975) showed that $\mathrm{AUC} = P(X > Y)$, where $X$ and $Y$ denote the marker values of diseased and healthy individuals, respectively. This can be interpreted as the probability that, in a randomly selected pair of one healthy and one diseased individual, the diagnostic marker value is higher for the diseased subject. Values of AUC close to 1.0 indicate that the marker has high diagnostic accuracy, whereas a value of 0.5 indicates a noninformative marker that does no better than a fair coin toss.
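To make the interpretation $\mathrm{AUC} = P(X > Y)$ concrete, the Python sketch below computes the nonparametric (Mann-Whitney) estimate: the fraction of case/control pairs in which the case value is larger, with ties counted as one half. It is only an illustration of the probability interpretation, not the parametric estimator developed in this paper, and it needs every individual value, so it cannot be applied directly when values below an LOD are missing. The function name and the marker values are hypothetical.

```python
from itertools import product

def empirical_auc(cases, controls):
    """Nonparametric estimate of AUC = P(X > Y).

    Counts, over all case/control pairs, how often the case value exceeds
    the control value; ties contribute one half (Mann-Whitney convention).
    """
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    score = 0.0
    for x, y in product(cases, controls):
        if x > y:
            score += 1.0
        elif x == y:
            score += 0.5
    return score / (len(cases) * len(controls))

# Hypothetical marker values: 7 of the 9 case/control pairs have case > control
print(empirical_auc([2.1, 3.4, 1.8], [1.0, 2.5, 0.7]))
```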

In evaluating the discriminating ability of PCBs, for example, analysis is restricted by the high cost of assays, and instrument sensitivity is limited by an LOD. Estimation of the AUC will clearly be biased if the LOD is ignored. We propose and evaluate methodology for AUC estimation under different sampling strategies when faced with both LOD and cost limitations. Two common sampling strategies used to ease cost restrictions are pooling biospecimens and taking a simple random sample. Pooling involves physically combining individual blood samples and has been found to be a useful way to cut costs and evaluate biomarkers (e.g. Faraggi and others, 2003; Liu and Schisterman, 2003; Schisterman and others, 2005; Weinberg and Umbach, 1999). The pooling strategy reasonably assumes that the measurement of a pooled sample equals the average of the individual unpooled samples that form it, so that it has the properties of a mean of individual measurements. One advantage of pooling is that it increases the amount of information per assay while the number of assays, and hence the cost, remains fixed. Taking a random sample, by contrast, reduces the number of assays that need to be performed but uses only a fraction of the available information. In general, however, the density of a measured pooled biospecimen is a complex convolution of the density functions of the individual biological samples, and likelihood methods based on pooled data may not be feasible. The random sample, on the other hand, has the same density function as the original data, which avoids the complexities introduced by pooling.

The effect of pooling on ROC curve analysis without an LOD has been investigated by Faraggi and others (2003). They showed that, for normally distributed markers, the estimator of the mean based on pooled data is equivalent to that based on the full sample, whereas the variance estimator based on pooled data is less efficient than that based on the full sample. For example, Faraggi and others (2003) showed (in terms of efficiency of AUC estimation) that for a true value of Formula, 200 assays of unpooled samples are equivalent to 110 assays of pooled samples of group size 2. They also demonstrated that the loss of efficiency due to pooling is not of practical importance for Formula and for Formula when the pooling group size is 2. However, AUC estimation based on pooled data subject to an LOD has not been addressed in the biostatistical literature.

Schisterman and others (2005b) discussed the benefits of pooling when data are affected by an LOD in the context of estimating the mean of a single population. They showed that, for normally distributed data, there is always an interval of LOD values in which the pooling strategy is more efficient than a random sample, and sometimes even more efficient than the full sample, because the pooling design yields more numerical (uncensored) observations. In this study, we apply maximum likelihood methodology to investigate the joint effect of pooling and LOD on the AUC, i.e. in the context of discriminating between two populations. We examine the efficiency of AUC estimation as a function of the LOD for various sampling strategies. Efficiency is assessed by comparing the variances of the estimators based on the pooled data and on the random sample, and loss of information is measured by the change in root mean-squared error (RMSE) of the AUC estimate. We examine the extent of this loss in a simulation study, in which we also investigate the sensitivity of our methodology to departures from normality.

This paper is organized as follows. In Section 2, we formalize the problem and introduce notation. Section 3 presents the maximum likelihood estimator of the AUC and derives its asymptotic distribution. Section 4 analyzes the efficiency of AUC estimation under each sampling strategy for various LOD levels. Section 5 presents a real data example, and Section 6 gives some concluding remarks.


    2. FORMALIZATION OF STATED PROBLEM
Let $X$ and $Y$ denote the diagnostic marker measurements for diseased and healthy individuals, respectively. We assume that these measurements follow normal distributions, i.e.

$$X \sim N(\mu_x, \sigma_x^2), \qquad Y \sim N(\mu_y, \sigma_y^2).$$

When the samples $X_1, \ldots, X_N$ and $Y_1, \ldots, Y_M$ are completely observed, standard estimates of the unknown parameters $\mu_x$, $\mu_y$, $\sigma_x^2$, and $\sigma_y^2$ are easily obtained. The AUC can then be calculated by replacing the unknown parameters in the following formula with their estimated values:

$$\mathrm{AUC} = \Phi\!\left(\frac{\mu_x - \mu_y}{\sqrt{\sigma_x^2 + \sigma_y^2}}\right), \qquad (2.1)$$

where $\Phi$ denotes the standard normal cumulative distribution function. However, when an LOD is in effect, only biomarker values above some threshold $d$ ($d$ is the value of the LOD) are observed, such that

$$X_i^{*} = \begin{cases} X_i, & X_i \ge d,\\ \text{N/A}, & X_i < d, \end{cases} \qquad Y_i^{*} = \begin{cases} Y_i, & Y_i \ge d,\\ \text{N/A}, & Y_i < d, \end{cases}$$

where N/A (not applicable) represents values less than the threshold value $d$. Thus, estimating $\mu_x$, $\mu_y$, $\sigma_x^2$, and $\sigma_y^2$ while ignoring the LOD leads to biased results. To address this, we use maximum likelihood estimation (MLE) for censored normal samples as proposed by Gupta (1952); details are given in Section 3 (see (3.1)). Substituting the MLEs of the means and variances into (2.1) then yields the AUC estimate.
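The Python sketch below evaluates (2.1) and illustrates why ignoring the LOD is a problem: it simulates a hypothetical binormal setting (cases N(1, 1), controls N(0, 1), LOD d = 0.5, all assumed values), discards the N/A values, and plugs the moments of the remaining values into (2.1). This naive plug-in is exactly what the text warns against, not the proposed method; the function name is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, pvariance

def binormal_auc(mu_x, mu_y, var_x, var_y):
    """AUC under the binormal model, formula (2.1): Phi((mu_x - mu_y) / sqrt(var_x + var_y))."""
    return NormalDist().cdf((mu_x - mu_y) / sqrt(var_x + var_y))

# Hypothetical setting: X ~ N(1, 1) (diseased), Y ~ N(0, 1) (healthy), LOD d = 0.5
random.seed(3)
x = [random.gauss(1.0, 1.0) for _ in range(10_000)]
y = [random.gauss(0.0, 1.0) for _ in range(10_000)]
d = 0.5

true_auc = binormal_auc(1.0, 0.0, 1.0, 1.0)  # about 0.760

# Naive approach: drop the N/A values and use the moments of what is left
x_obs = [v for v in x if v >= d]
y_obs = [v for v in y if v >= d]
naive_auc = binormal_auc(fmean(x_obs), fmean(y_obs), pvariance(x_obs), pvariance(y_obs))
print(round(true_auc, 3), round(naive_auc, 3))  # the naive value is clearly too low here
```

Truncation shrinks both samples toward the LOD and reduces their variances, so in this setting the naive plug-in understates the separation between the groups.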

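Pooled samples are obtained by randomly grouping individuals with the same disease status into groups of size $g$. The specimens in each group are combined into a single pooled sample, which is tested as a single observation, and the pooled measurement is taken to be the average of the individual measurements. Suppose that $n$ and $m$ pooled observations are available for cases and controls, formed from groups of size $g_x$ and $g_y$, respectively. Let $X_1^{p}, \ldots, X_n^{p}$ denote the pooled cases and $Y_1^{p}, \ldots, Y_m^{p}$ the pooled controls, such that

$$X_i^{p} = \frac{1}{g_x}\sum_{k=(i-1)g_x+1}^{i g_x} X_k, \quad i = 1, \ldots, n = N/g_x, \qquad Y_i^{p} = \frac{1}{g_y}\sum_{k=(i-1)g_y+1}^{i g_y} Y_k, \quad i = 1, \ldots, m = M/g_y$$

(with both $n$ and $m$ being integers). By the additive property of the normal distribution,

$$X_i^{p} \sim N\!\left(\mu_x, \frac{\sigma_x^2}{g_x}\right), \qquad Y_i^{p} \sim N\!\left(\mu_y, \frac{\sigma_y^2}{g_y}\right).$$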

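For simplicity, let $g_x$ and $g_y$ be equal, $g_x = g_y = g$. As with the unpooled data, the detection limit leads to observed samples of the form

$$X_i^{p*} = \begin{cases} X_i^{p}, & X_i^{p} \ge d,\\ \text{N/A}, & X_i^{p} < d, \end{cases} \qquad Y_i^{p*} = \begin{cases} Y_i^{p}, & Y_i^{p} \ge d,\\ \text{N/A}, & Y_i^{p} < d. \end{cases}$$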
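A minimal Python sketch of the pooling and censoring steps, assuming (as in the text) that a pool's measurement is exactly the average of its members, with no pooling error. The helper names `pool` and `apply_lod` are hypothetical, and `None` stands in for N/A. With individual values drawn from N(0, 1) and g = 4, the pooled values should have variance close to 1/4, consistent with the $\sigma^2/g$ variance of a pooled measurement.

```python
import random
from statistics import NormalDist, fmean, pvariance

def pool(values, g):
    """Average consecutive groups of g specimens (averaging assumption, no pooling error)."""
    if len(values) % g != 0:
        raise ValueError("number of specimens must be a multiple of the pool size g")
    return [fmean(values[i:i + g]) for i in range(0, len(values), g)]

def apply_lod(values, d):
    """Replace measurements below the limit of detection d with None (the text's N/A)."""
    return [v if v >= d else None for v in values]

random.seed(1)
g = 4
individual = [random.gauss(0.0, 1.0) for _ in range(40_000)]
random.shuffle(individual)  # random assignment of specimens to pools
pooled = pool(individual, g)

print(len(pooled), round(pvariance(pooled), 3))  # variance should be near 1/g = 0.25
censored = apply_lod(pooled, d=-1.0)
share_na = sum(v is None for v in censored) / len(censored)
# empirical share of N/A values vs the theoretical P(pooled value < -1)
print(round(share_na, 3), round(NormalDist(0.0, 1.0 / g ** 0.5).cdf(-1.0), 3))
```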

Since the pooled data ($X_i^{p}$ and $Y_i^{p}$) follow normal distributions, Gupta's (1952) technique for estimating the unknown parameters from censored normal samples still applies. Thus, the AUC can be estimated by substituting the maximum likelihood estimators based on $X_i^{p*}$ and $Y_i^{p*}$ for the unknown parameters $\mu_x$, $\mu_y$, $\sigma_x^2$, and $\sigma_y^2$. We use the subscript $j$ to indicate whether the estimators are computed from the full sample, the pooled sample, or the random sample. Thus, $\hat{\mu}_{xj}$, $\hat{\mu}_{yj}$, $\hat{\sigma}^2_{xj}$, and $\hat{\sigma}^2_{yj}$ denote the maximum likelihood estimators based on the censored full data ($X_i^{*}$, $Y_i^{*}$), the censored pooled data ($X_i^{p*}$, $Y_i^{p*}$), or the censored random sample, respectively.


    3. MLE UNDER POOLING AND LOD
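For $k \in \{x, y\}$ and each sampling scheme $j$, let $r_{kj}$ denote the number of observed (non-N/A) measurements in the corresponding censored data set, and let $c_{kj}$ denote the number of unobserved (censored) measurements in it. Write $s_{kj} = \sigma_k/\sqrt{g_j}$, where $g_j = g$ for the pooled data and $g_j = 1$ for the full data and the random sample. Depending on $j$ and $k$, the log-likelihood functions (up to an additive constant) based on the full data, the pooled data, and the random sample are

$$\ell_{kj}(\mu_k, \sigma_k) = -r_{kj}\log s_{kj} - \frac{1}{2 s_{kj}^2}\sum_{i=1}^{r_{kj}} (z_{kji} - \mu_k)^2 + c_{kj}\log\Phi\!\left(\frac{d - \mu_k}{s_{kj}}\right), \qquad (3.1)$$

where $z_{kji}$ is an individual observed data point (not N/A) of the data set under consideration. Therefore, the likelihood equations are

$$\sum_{i=1}^{r_{kj}} (z_{kji} - \mu_k) - c_{kj}\, s_{kj}\, \lambda(\alpha_{kj}) = 0, \qquad \sum_{i=1}^{r_{kj}} (z_{kji} - \mu_k)^2 - r_{kj}\, s_{kj}^2 - c_{kj}\, s_{kj}^2\, \alpha_{kj}\, \lambda(\alpha_{kj}) = 0,$$

where

$$\alpha_{kj} = \frac{d - \mu_k}{s_{kj}}, \qquad \lambda(\alpha) = \frac{\phi(\alpha)}{\Phi(\alpha)},$$

and $\phi$ denotes the standard normal density.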
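The Python sketch below computes maximum likelihood estimates of $\mu$ and $\sigma$ for one normal sample left-censored at $d$. Instead of solving the censored-normal likelihood equations of Gupta (1952) directly (for example by Newton-Raphson), it uses the EM algorithm, a swapped-in technique whose fixed point satisfies the same likelihood equations; it is a sketch, not the authors' implementation, and the function name and tolerance settings are assumptions. For pooled data with group size $g$, the same function can be applied to the pooled values; it then estimates the pooled standard deviation $\sigma/\sqrt{g}$, so $\sigma$ is recovered by multiplying by $\sqrt{g}$.

```python
import random
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal: pdf and cdf

def censored_normal_mle(observed, n_censored, d, tol=1e-10, max_iter=10_000):
    """EM estimate of (mu, sigma) for a normal sample left-censored at d.

    observed   : measured values (all >= d)
    n_censored : number of values reported as N/A (below d)
    """
    r = len(observed)
    if r < 2:
        raise ValueError("need at least two detected values")
    n = r + n_censored
    s1 = sum(observed)
    s2 = sum(v * v for v in observed)
    # Starting values: moments of the detected values only (biased, but a fine start)
    mu = s1 / r
    sigma = sqrt(max(s2 / r - mu * mu, 1e-12))
    if n_censored == 0:
        return mu, sigma  # no censoring: ordinary normal MLE
    for _ in range(max_iter):
        a = (d - mu) / sigma
        lam = STD.pdf(a) / STD.cdf(a)                    # phi(a) / Phi(a)
        e1 = mu - sigma * lam                            # E[X | X < d]
        var_t = sigma ** 2 * (1.0 - a * lam - lam ** 2)  # Var[X | X < d]
        e2 = var_t + e1 ** 2                             # E[X^2 | X < d]
        # M-step: complete-data MLE with censored values replaced by their moments
        new_mu = (s1 + n_censored * e1) / n
        new_sigma = sqrt((s2 + n_censored * e2) / n - new_mu ** 2)
        converged = abs(new_mu - mu) < tol and abs(new_sigma - sigma) < tol
        mu, sigma = new_mu, new_sigma
        if converged:
            break
    return mu, sigma

random.seed(2)
sample = [random.gauss(0.0, 1.0) for _ in range(20_000)]
d = 0.0  # roughly half of the values fall below the LOD
detected = [v for v in sample if v >= d]
mu_hat, sigma_hat = censored_normal_mle(detected, len(sample) - len(detected), d)
print(round(mu_hat, 2), round(sigma_hat, 2))      # should be close to 0 and 1
print(round(sum(detected) / len(detected), 2))   # naive mean of detected values: biased upward
```

At a fixed point of the iteration, the mean update reduces to $\sum_i (z_i - \mu) = c\,\sigma\,\lambda(\alpha)$, which is the first likelihood equation for a censored normal sample.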

Solving this system of equations yields the MLEs of $\mu_k$ and $\sigma_k$, adjusted for pooling and LOD. Naturally, the statistical properties of the estimators depend on the number of observations above the LOD. Because pooling reduces variability, if $\mu_x > d$, the probability that $X_i^{p} < d$ is smaller than the probability that $X_i < d$. Therefore, when $\mu_x > d$, a pooled sample is more likely to be observed than an individual sample; the situation reverses when $\mu_x < d$. Thus, pooled data are less affected by an LOD when the mean is larger than the LOD (Schisterman and others, 2005b). To illustrate, consider an unpooled sample $X_i \sim N(0, 1)$. With a pooling group size of $g = 4$, the pooled sample has distribution $X_i^{p} \sim N(0, 1/4)$. When $d = -1$, 16% of the unpooled observations are censored, whereas only 2% of the pooled observations are censored. When $d = 1$, 84% of the unpooled observations are censored, whereas 98% of the pooled observations are censored. We show below that this gain in information for $d < \mu_x$ under the pooling strategy leads to improvements in efficiency.
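The censoring percentages in the preceding paragraph follow directly from the normal distribution function. The short Python sketch below reproduces them, assuming an individual measurement distributed as N(0, 1), a pool size of g = 4 (so a pooled measurement is N(0, 1/4)), and LODs d = -1 and d = 1, values consistent with the quoted percentages. The helper name `censored_fraction` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def censored_fraction(mu, sigma, d, g=1):
    """P(measurement < d) when the measurement averages g specimens, i.e. is N(mu, sigma^2 / g)."""
    return NormalDist(mu, sigma / sqrt(g)).cdf(d)

for d in (-1.0, 1.0):
    unpooled = censored_fraction(0.0, 1.0, d)
    pooled = censored_fraction(0.0, 1.0, d, g=4)
    print(d, round(unpooled, 2), round(pooled, 2))
# -1.0 0.16 0.02
# 1.0 0.84 0.98
```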

3.1 Asymptotic distribution of the AUC estimator

In this section we examine the asymptotic distributions of the AUC estimators obtained by maximum likelihood. Denote the total sample size Formula and assume that

Formula (3.2)

We write the estimator of AUC obtained from (2.1) as $\widehat{\mathrm{AUC}}_j$, where $j$ corresponds to estimation based on the full data, the pooled sample, and the random sample, respectively. Its asymptotic distribution is derived from the following proposition.

PROPOSITION 3.1 Let (3.2) hold and Formula be finite. Then Formula has an asymptotic normal distribution (as the total sample size tends to infinity) with mean zero and covariance matrix $V_j$, which is defined in the Appendix, where the proof is also given.

Thus, the confidence interval (CI) for AUC is constructed using the following formula:

Formula

where Formula is an estimator of Formula
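The Appendix obtains the variance of the AUC estimator from a first-order Taylor expansion (the delta method) of (2.1). The Python sketch below shows that step together with a Wald-type interval of the form estimate ± z·SE, given a covariance matrix for the estimates of $(\mu_x, \mu_y, \sigma_x^2, \sigma_y^2)$. The covariance in the usage lines is the textbook complete-data one ($\sigma^2/n$ for a mean, about $2\sigma^4/n$ for a variance, no LOD), an assumption for illustration; under an LOD or pooling it would instead come from the inverse Fisher information of the censored likelihood. Function names are hypothetical, and the 90% default mirrors the level used in Section 5.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()

def auc_gradient(mu_x, mu_y, var_x, var_y):
    """Partial derivatives of Phi((mu_x - mu_y) / sqrt(var_x + var_y))
    with respect to (mu_x, mu_y, var_x, var_y)."""
    total = var_x + var_y
    delta = (mu_x - mu_y) / sqrt(total)
    dens = STD.pdf(delta)
    d_mu = dens / sqrt(total)              # dAUC/dmu_x; dAUC/dmu_y is its negative
    d_var = -dens * delta / (2.0 * total)  # dAUC/dvar_x = dAUC/dvar_y
    return [d_mu, -d_mu, d_var, d_var]

def auc_wald_ci(theta, cov, level=0.90):
    """Delta-method Wald confidence interval for the binormal AUC.

    theta : estimates (mu_x, mu_y, var_x, var_y)
    cov   : 4x4 covariance matrix of those estimates (list of lists)
    Returns (lower, estimate, upper).
    """
    grad = auc_gradient(*theta)
    var_auc = sum(grad[i] * cov[i][j] * grad[j] for i in range(4) for j in range(4))
    auc = STD.cdf((theta[0] - theta[1]) / sqrt(theta[2] + theta[3]))
    z = STD.inv_cdf(0.5 + level / 2.0)
    half_width = z * sqrt(var_auc)
    return auc - half_width, auc, auc + half_width

# Usage with the complete-data (no LOD) covariance, n = m = 100, unit variances
n = m = 100
theta = (1.0, 0.0, 1.0, 1.0)
cov = [[1.0 / n, 0.0, 0.0, 0.0],
       [0.0, 1.0 / m, 0.0, 0.0],
       [0.0, 0.0, 2.0 / n, 0.0],
       [0.0, 0.0, 0.0, 2.0 / m]]
print([round(v, 3) for v in auc_wald_ci(theta, cov)])
```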

For different values of $d$, we present the asymptotic variance of $\widehat{\mathrm{AUC}}_j$ graphically. Figures 1(a) and (b) are based on 500 cases and 500 controls (full data), where Formula and Formula, corresponding to an Formula. The pool size is set to Formula, and the results are compared with those from a simple random sample of Formula. As expected, and as shown in Figure 1(a), AUC estimates based on pooled data have lower asymptotic variance than those based on the random sample up to about Formula. For the pooled sample, this corresponds to 75% of the Xs and 92% of the Ys falling below the LOD; for the random sample, it corresponds to 63% of the Xs and 76% of the Ys. In some cases (Figure 1(b)), the variance of the AUC estimator from the pooled data is even smaller than that from the full sample. This agrees with the results of Schisterman and others (2005b): there is an interval in which the pooling strategy is more efficient than the full sample, because the pooling design yields more numerical (uncensored) observations.


Figure 1
Fig. 1 Asymptotic variance of the AUC estimator based on full data (curve Formula), pooled data (curve —), and a random sample (curve - - - - -). Based on Formula, Formula. (b) Interval where variance based on pooled sample is smaller than variance based on full data.

 
Consider another application of Proposition 3.1 in which the two populations have fixed sizes Formula. For a fixed Formula, we consider the asymptotic variance of Formula as a function of the pooling group size $g$. This variance is determined by Formula and the sample sizes Formula. If Formula, we have that Formula. Hence, depending on $d$, the value of $g$ that minimizes the variance of Formula can be recommended. Let, for example, Formula and Formula, corresponding to an Formula and Formula.

Figure 2 plots the asymptotic variance of Formula for Formula and Formula. According to these graphs, classical individual measurement of biological samples (i.e. $g = 1$) minimizes the variance of the maximum likelihood AUC estimator only if Formula or Formula. This makes intuitive sense: when there is no LOD ($d = -\infty$), or when the LOD is above the mean, the full sample contains more information than the pooled sample. For this reason, the panels of Figure 2 corresponding to Formula and Formula show a similar pattern.


Figure 2
 
Fig. 2 Asymptotic variance of Formula as a function of pooling group size and LOD for the case of Formula and Formula.

 

    4. SIMULATION STUDY
A simulation study was carried out to examine the combined effects of pooling and LOD on the AUC estimator. Normally distributed data were generated for both cases and controls at varying levels of separation (Formula), with fixed $\mu_y$, $\sigma_x^2$, and $\sigma_y^2$, and with the mean $\mu_x$ obtained by inverting (2.1):

$$\mu_x = \mu_y + \Phi^{-1}(\mathrm{AUC})\sqrt{\sigma_x^2 + \sigma_y^2}.$$
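Inverting the binormal AUC formula (2.1) gives the case mean that produces a target AUC for fixed $\mu_y$, $\sigma_x^2$, and $\sigma_y^2$. The sketch below, with a hypothetical function name and illustrative targets (not necessarily the grid used in Table 1), computes $\mu_x$ and checks the round trip through (2.1).

```python
from math import sqrt
from statistics import NormalDist

def mu_x_for_auc(target_auc, mu_y, var_x, var_y):
    """Invert (2.1): mu_x = mu_y + Phi^{-1}(AUC) * sqrt(var_x + var_y)."""
    return mu_y + NormalDist().inv_cdf(target_auc) * sqrt(var_x + var_y)

# Illustrative targets with mu_y = 0 and unit variances (assumed values)
for auc in (0.6, 0.7, 0.8, 0.9):
    mu_x = mu_x_for_auc(auc, 0.0, 1.0, 1.0)
    check = NormalDist().cdf((mu_x - 0.0) / sqrt(1.0 + 1.0))  # plug back into (2.1)
    print(auc, round(mu_x, 4), round(check, 4))
```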

The data were then pooled into groups of size $g = 2$ and $g = 4$, and an LOD was applied. LODs were defined so that a specified percentage of the control population was censored (Formula). Random samples of the data were also taken, and the LOD was applied in the same manner. The findings of the simulation study are presented in Table 1. Following Faraggi and others (2003), we considered two general conditions regarding the availability of samples in an experimental setting: the first fixes the number of study subjects (Formula), and the second fixes the number of assays (Formula). Results for Formula and Formula are not included due to space limitations. We generated 5000 Monte Carlo samples for each set of parameters. The relative RMSE was calculated with respect to the estimates based on the full data as follows:


 
Table 1 Efficiency of sampling schemes when the number of subjects or assays is fixed

 

$$\text{Rel. RMSE} = \frac{\mathrm{RMSE}_s}{\mathrm{RMSE}_f},$$

where $\mathrm{RMSE}_f$ is the RMSE for the total population (full data) and $\mathrm{RMSE}_s$ is the RMSE for pooled data with pooling size 2 or 4 when Formula, or the RMSE for a random sample of size Formula when Formula. Coverage was calculated as the percentage of the 5000 CIs for each set of conditions that contained the true AUC.
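The Monte Carlo summaries behind Table 1 can be sketched as follows. The ratio direction (reduced design over full data, so values above 1 indicate a loss of efficiency) is inferred from how Table 1 is discussed, for example a Rel. RMSE of 1.42 for the random sample. The function names and the replicate estimates below are hypothetical toy values, not simulation output.

```python
from math import sqrt

def rmse(estimates, truth):
    """Root mean-squared error of Monte Carlo AUC estimates around the true AUC."""
    return sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))

def relative_rmse(scheme_estimates, full_estimates, truth):
    """RMSE of a reduced design (pooled or random sample) divided by the full-data RMSE."""
    return rmse(scheme_estimates, truth) / rmse(full_estimates, truth)

def coverage(intervals, truth):
    """Proportion of (lower, upper) confidence intervals that contain the true AUC."""
    return sum(lower <= truth <= upper for lower, upper in intervals) / len(intervals)

# Toy illustration with made-up replicate estimates (true AUC = 0.75)
full = [0.74, 0.76, 0.75, 0.77]
pooled = [0.73, 0.77, 0.75, 0.78]
print(round(relative_rmse(pooled, full, 0.75), 3))
print(coverage([(0.70, 0.80), (0.76, 0.82), (0.71, 0.79)], 0.75))
```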

First, suppose the number of available subjects is fixed. As the pooling group size increases (Formula), the number of tested samples Formula decreases, and so does the quality of the AUC estimator. As expected, the relative RMSE increased with pool size. For LODs below 60%, no considerable difference was found between the RMSE from unpooled data ($g = 1$) and pooled data ($g = 2$). However, when 80% of the control samples were censored, the relative loss of efficiency was about 25%. For $g = 4$, the relative loss of efficiency was three times that for pairs at all LODs, which is to be expected when the sample is reduced by 75%. The loss of efficiency of the random sample relative to the unpooled data was about 40%. For LODs below 60%, pooling was consistently more efficient than random sampling. Coverage tended to decrease as the AUC increased and was more conservative when more than 50% of the control samples were censored. Bias was negligible for all levels of discrimination and pooling and is not reported in the table due to space limitations.

In terms of cost, when the number of subjects is fixed, pooling or taking a random sample reduces cost by 50% (when $g = 2$). Table 1 can be used to compare the efficiency of the sampling schemes for various values of AUC and LOD. For example, for a fixed number of subjects Formula, if we assume that Formula and an LOD that affects 40% of the control subjects, the Rel. RMSE is 1.01 for the pooling strategy but 1.42 for the random sample, both relative to the full data. Thus, pooling cuts cost by 50% with essentially no loss in efficiency relative to the full data, and with a substantial gain in efficiency over a random sample. If the LOD affects 80% of the controls, however, the pooling strategy is not as efficient as the full sample, and pooling is not recommended.

When the number of assays is fixed, the benefits of pooling are readily apparent. For example, using 40 pooled samples (Formula) instead of 40 unpooled samples leads to a 30% gain in efficiency. This gain increases with the pooling group size and is consistent for LODs below 60%. These results are particularly useful when the cost of assaying greatly exceeds the cost of obtaining samples, because pooling then yields a substantial gain in efficiency at the same overall cost.

Robustness.

The simulations thus far assumed that the samples followed normal distributions. To examine the robustness of our methodology, we performed the following Monte Carlo simulations. Suppose one believes the observations are normally distributed and uses the AUC estimation method proposed in Sections 2 and 3, whereas the true diagnostic markers satisfy

Formula

where Formula, Formula are t-distributed random variables with df degrees of freedom. Thus, the true AUC is

Formula

For example, if Formula, then Formula, respectively, when the putative Formula. We ran 5000 repetitions of the sample Formula, with parameters Formula at each Formula, Formula ($d$ is the value of the LOD), and Formula ($g$ is the pool size). We then examined the proposed AUC estimation under the misspecified (normal) distributional assumption. Figure 3 corresponds to the case Formula.
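The Python sketch below estimates the true $\mathrm{AUC} = P(X > Y)$ by Monte Carlo under heavy-tailed markers. It assumes one form of the robustness model, $X = \mu_x + \sigma_x T_1$ and $Y = \mu_y + \sigma_y T_2$ with independent Student t variables $T_1, T_2$; the function names and parameter values are hypothetical. Student t draws are built from the standard representation $Z/\sqrt{\chi^2_{df}/df}$, using `random.gammavariate` (shape df/2, scale 2) for the chi-square part. A t variable with df > 2 has variance df/(df - 2), not 1.

```python
import random
from math import sqrt

def student_t(df, rng):
    """One Student t draw: Z / sqrt(chi2_df / df), with chi2_df ~ Gamma(shape=df/2, scale=2)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    return z / sqrt(chi2 / df)

def mc_true_auc(mu_x, mu_y, sigma_x, sigma_y, df, n=50_000, seed=0):
    """Monte Carlo estimate of P(X > Y) under the assumed t location-scale model."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n):
        x = mu_x + sigma_x * student_t(df, rng)
        y = mu_y + sigma_y * student_t(df, rng)
        wins += x > y  # True counts as 1
    return wins / n

for df in (3, 5, 30):
    print(df, round(mc_true_auc(1.0, 0.0, 1.0, 1.0, df), 3))
```

Comparing these values with the binormal value from (2.1) for the same means and scale parameters shows how far the putative normal-model AUC can be from the true one.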


Figure 3
 
Fig. 3 Evaluation of AUC estimators based on full data (curve Formula), pooled data (curve —), and a random sample (curve - - - - - -), plotted against d for Formula and different df. (a) Monte Carlo averages of the AUC estimators. Lines (Formula) correspond to the true values of AUC. (b) Monte Carlo estimators of Formula.

 
From these results we conclude that the proposed methodology performs reasonably even when the data are not exactly normal. However, the accuracy of the considered estimators is poor when Formula (see Figure 3(b)). Note that, although the AUC estimator based on the pooled data uses only Formula observations, its efficiency is close to that of the AUC estimator based on the full data (Figure 3(b)). Moreover, for some values of the LOD, the estimator based on the pooled sample is the most robust and accurate. However, the bias Formula when Formula is based on pooled samples appears to be the largest for some values of $d$ (although the differences between the biases Formula are relatively small; see Figure 3(a)). This may be partly because the assumed normal distribution of the pooled data Formula is less plausible than the assumed normal distribution of the individual markers. Similar results were observed for Formula and 5.


    5. EXAMPLE
Evaluating cholesterol as a marker is important because there is evidence that cholesterol may contribute to the development of coronary heart disease. The pooling strategy described above was applied to individual cholesterol measurements from 80 volunteers. Forty individuals who had recently survived a myocardial infarction were defined as cases; the remaining 40 subjects served as controls. The blood specimens were randomly pooled in groups of 2, separately for cases and controls, and remeasured. Using the same data, Faraggi and others (2003) showed that the assumption that a pooled sample measurement equals the average of the corresponding individual measurements is justified. Because of the costs involved, such confirmatory evidence for the averaging assumption will generally not be available.

Distributional assumptions were also checked, and the data were found to fit the normal model well. The mean (± SD) cholesterol levels in the unpooled control and case samples were 205.5 (± 42.3) and 226.8 (± 41.7), respectively. An artificial LOD was applied to the cholesterol data so that 20% of the control samples were censored, and the AUC was then estimated using the method described above. Table 2 presents the estimated AUC with the corresponding 90% CIs. The pooled sample and a random sample were also used to estimate the AUC. The estimated variance of the AUC estimator based on the random sample was twice that based on the original and pooled samples, which is consistent with the findings of the simulation study. The pooled point estimate of the AUC Formula was closer to the AUC based on the full data with no LOD effect Formula than was the random-sample estimate Formula. On further investigation, the pooled data were found to contain two outliers, which resulted from variability introduced by the pooling process itself. The methods presented in this paper rely on the assumption that the value of a pooled sample is the average of the individual unpooled samples. In practice, however, physically pooling biological specimens can introduce additive pooling errors (Schisterman and others, 2005b), so care must be taken during the pooling process not to introduce additional variability. To complete our analysis, we removed the outliers and repeated the analysis. After the two points were removed, the point estimate moved closer to the true AUC (from 0.584 to 0.605) Formula. More importantly, this pooled analysis shows that cholesterol has discriminating ability, as does the analysis of the original data; the random-sample analysis does not.
However, the largest improvement in point estimation was obtained when we used the original data to calculate the theoretical pooled values (pooling mathematically rather than physically). This resulted in an AUC point estimate of 0.634. Because the physical pooling process may introduce variability, biospecimens must be pooled carefully so that no additional error is introduced; otherwise, the benefits of pooling may be lost entirely.
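As a sketch of how an artificial LOD that censors 20% of the controls can be set, the Python lines below assume the unpooled control values are exactly normal with the reported mean 205.5 and SD 42.3 and take the 20th percentile of that distribution. This is an illustration under that assumption, not necessarily the LOD used by the authors; the empirical 20th percentile of the 40 observed control values (for example via `statistics.quantiles`) could differ somewhat.

```python
from statistics import NormalDist

# Normal model for the unpooled controls, using the reported mean and SD (an assumption)
controls = NormalDist(mu=205.5, sigma=42.3)
d = controls.inv_cdf(0.20)        # value below which 20% of this normal distribution lies
print(round(d, 1))                # 169.9
print(round(controls.cdf(d), 3))  # 0.2 by construction
```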


 
Table 2 Estimated AUC and variance for cholesterol based on different sample assumptions

 

    6. CONCLUSIONS
In this paper, we have presented a method to estimate the AUC from pooled or unpooled data affected by an LOD. We have shown that using pooled specimens yields a substantial gain in efficiency over taking a random sample when the LOD affects less than 50% of the control samples. In this case, more pooled observations fall above the LOD, which improves the quality of the estimator. Pooling is therefore a statistically viable cost-saving approach. However, estimating the AUC from a pooled sample requires that certain distributional assumptions be met, and the process of mixing biospecimens may be a potential source of additional variability. Therefore, careful attention must be paid to instrument sensitivity and to the pooling process itself. This paper develops the methodology for normally distributed biomarkers; however, other distributions, such as the gamma distribution, could be handled in a similar manner.


    Appendix

A.1 Definition of the covariance matrix $V_j$

The covariance matrix has the following form:

Formula

and Formula.

Proof of Proposition 3.1. By standard maximum likelihood theory, the maximum likelihood estimator Formula has an asymptotic normal distribution with covariance matrix Formula, for Formula. The covariance matrix can be found by inverting the asymptotic Fisher information matrix divided by N (if Formula) or M (if Formula), as Formula. Thus, by applying the results of Gupta (1952), we obtain

Formula

where Formula. The estimator Formula of the AUC can be considered a function of Formula. Therefore, the usual Taylor expansion (delta method) around the points Formula can be used to analyze the asymptotic distribution of Formula; this technique is presented by Kotz and others (2003). Applying their results completes the proof of Proposition 3.1.


    ACKNOWLEDGMENTS
 
We are grateful to the editor, associate editor, and referee for their helpful comments that clearly improved this paper. This work was supported by the Intramural Research Program of the National Institutes of Health, National Institute of Child Health and Human Development.


    REFERENCES

    Bamber DC. (1975) The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. Journal of Mathematical Psychology 12:387–415.

    Faraggi D and Reiser B. (2002) Estimation of the area under the ROC curve. Statistics in Medicine 21:3093–106.

    Faraggi D, Reiser B, Schisterman E. (2003) ROC curve analysis for biomarkers based on pooled assessments. Statistics in Medicine 22:2515–27.

    Finkelstein M and Verma D. (2001) Exposure estimation in the presence of nondetectable values: another look. American Industrial Hygiene Association Journal 62:195–8.

    Gupta AK. (1952) Estimation of the mean and standard deviation of a normal population from a censored sample. Biometrika 39:260–73.

    Hornung R and Reed L. (1990) Estimation of average concentration in the presence of nondetectable values. Applied Occupational and Environmental Hygiene 5:46–51.

    Kotz S, Lumelskii Y, Pensky M. (2003) The Stress-Strength Model and Its Generalizations (World Scientific, London).

    Laden F, Hankinson SE, Wolff MS, Colditz GA, Willett WC, Speizer FE, Hunter DJ. (2001) Plasma organochlorine levels and the risk of breast cancer: an extended follow-up in the Nurses' Health Study. International Journal of Cancer 91:568–74.

    Laden F and Hunter DJ. (1998) Environmental risk factors and female breast cancer. Annual Review of Public Health 19:101–23.

    Liu A and Schisterman E. (2003) Comparison of diagnostic accuracy of biomarkers with pooled assessments. Biometrical Journal 45:631–44.

    Louis GM, Weiner JM, Whitcomb BW, Sperrazza R, Schisterman EF, Lobdell DT, Crickard K, Greizerstein H, Kostyniak PJ. (2005) Environmental PCB exposure and risk of endometriosis. Human Reproduction 20:279–85.

    Lubin JH, Colt JS, Camann D, Davis S, Cerhan JR, Severson RK, Bernstein L, Hartge P. (2004) Epidemiologic evaluation of measurement data in the presence of detection limits. Environmental Health Perspectives 112:1691–6.

    Schisterman EF, Perkins NJ, Liu A, Bondell H. (2005a) Optimal cut-point and its corresponding Youden Index to discriminate individuals using pooled blood samples. Epidemiology 16:73–81.

    Schisterman EF, Vexler A, Liu A. (2005b) To pool or not to pool: from whether to when: applications of pooling to biospecimens with incomplete measurements. Statistics in Medicine (submitted).

    Shapiro DE. (1999) The interpretation of diagnostic tests. Statistical Methods in Medical Research 8:113–34.

    Weinberg CR and Umbach DM. (1999) Using pooled exposure assessment to improve efficiency in case-control studies. Biometrics 55:718–26.

    Wieand S, Gail MH, James BR, James KL. (1989) A family of non-parametric statistics for comparing diagnostic markers with paired or unpaired data. Biometrika 76:585–92.

    Received September 13, 2005; revised February 9, 2006; revised February 28, 2006; accepted for publication March 6, 2006.

