Biostatistics Advance Access published online on April 13, 2006
Biostatistics, doi:10.1093/biostatistics/kxj036
1 Biometric Research Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD
* To whom correspondence should be addressed.

Many gene expression studies attempt to develop a predictor of pre-defined diagnostic or prognostic classes. If the classes are similar biologically, then the number of genes that are differentially expressed between the classes is likely to be small compared with the total number of genes measured. This motivates a two-step process for predictor development: a subset of differentially expressed genes is selected for use in the predictor, and then the predictor is constructed from these genes. Both steps introduce variability into the resulting classifier, so both must be incorporated in sample size estimation. We introduce a methodology for sample size determination for prediction in the context of high-dimensional data that captures variability in both steps of predictor development. The methodology is based on a parametric probability model, but permits sample size computations to be carried out in a practical manner without extensive requirements for preliminary data. We find that many prediction problems do not require a large training set of arrays for classifier development.
Received October 21, 2005
Revised March 31, 2006
Accepted April 7, 2006
Article
Sample size planning for developing classifiers using high dimensional DNA microarray data
Kevin K. Dobbin 1 *
and
Richard M. Simon 1
Kevin K. Dobbin, E-mail: dobbinke{at}mail.nih.gov



