Biostatistics (2004), 5, 3, pp. 427-443
Biostatistics Vol. 5 No. 3 © Oxford University Press 2004; all rights reserved.
Classification of gene microarrays by penalized logistic regression

Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA
jizhu{at}umich.edu
Department of Statistics, Stanford University, Stanford, CA 94305, USA
To whom correspondence should be addressed.
Classification of patient samples is an important aspect of cancer diagnosis and treatment. The support vector machine (SVM) has been successfully applied to microarray cancer diagnosis problems. However, one weakness of the SVM is that given a tumor sample, it only predicts a cancer class label but does not provide any estimate of the underlying probability. We propose penalized logistic regression (PLR) as an alternative to the SVM for the microarray cancer diagnosis problem. We show that when using the same set of genes, PLR and the SVM perform similarly in cancer classification, but PLR has the advantage of additionally providing an estimate of the underlying probability. Often a primary goal in microarray cancer diagnosis is to identify the genes responsible for the classification, rather than class prediction. We consider two gene selection methods in this paper, univariate ranking (UR) and recursive feature elimination (RFE). Empirical results indicate that PLR combined with RFE tends to select fewer genes than other methods and also performs well in both cross-validation and test samples. A fast algorithm for solving PLR is also described.
Keywords: Cancer diagnosis; Feature selection; Logistic regression; Microarray; Support vector machines
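
The two ingredients named in the abstract, PLR and RFE, can be illustrated with a minimal sketch. The abstract does not specify the penalty, the elimination criterion or the fast algorithm, so the following are assumptions made only for illustration: a quadratic (ridge) penalty with an unpenalized intercept, a plain Newton-Raphson solver, and removal of the gene with the smallest squared coefficient at each RFE step. The function names (`fit_plr`, `predict_proba`, `rfe_plr`) and the simulated data are hypothetical and are not taken from the paper.

```python
import numpy as np


def _sigmoid(eta):
    # Clip the linear predictor to avoid overflow in exp().
    return 1.0 / (1.0 + np.exp(-np.clip(eta, -30.0, 30.0)))


def fit_plr(X, y, lam=1.0, n_iter=50, tol=1e-8):
    """Fit logistic regression with a quadratic penalty by Newton-Raphson.

    X is an (n, p) expression matrix, y an (n,) vector of 0/1 labels.
    Assumption: the penalty is (lam / 2) * ||beta||^2 and the intercept
    is left unpenalized. Returns (intercept, beta).
    """
    n, p = X.shape
    Z = np.hstack([np.ones((n, 1)), X])   # prepend intercept column
    P = lam * np.eye(p + 1)
    P[0, 0] = 0.0                         # no penalty on the intercept
    w = np.zeros(p + 1)
    for _ in range(n_iter):
        prob = _sigmoid(Z @ w)
        grad = Z.T @ (prob - y) + P @ w   # gradient of penalized loss
        weights = prob * (1.0 - prob)
        hess = Z.T @ (Z * weights[:, None]) + P
        step = np.linalg.solve(hess, grad)
        w = w - step
        if np.max(np.abs(step)) < tol:
            break
    return w[0], w[1:]


def predict_proba(X, intercept, beta):
    """Estimated probability of class 1 for each row of X."""
    return _sigmoid(intercept + X @ beta)


def rfe_plr(X, y, lam=1.0, n_keep=1):
    """Recursive feature elimination around fit_plr.

    Assumption: at each step the gene with the smallest squared
    coefficient is dropped. Returns the original column indices kept.
    """
    active = list(range(X.shape[1]))
    while len(active) > n_keep:
        _, beta = fit_plr(X[:, active], y, lam)
        active.pop(int(np.argmin(beta ** 2)))
    return active


# Simulated data (hypothetical): 60 samples, 8 genes, only gene 3 informative.
rng = np.random.default_rng(0)
y = np.repeat([0.0, 1.0], 30)
X = rng.normal(size=(60, 8))
X[:, 3] += 3.0 * y

selected = rfe_plr(X, y, lam=1.0, n_keep=2)
intercept, beta = fit_plr(X[:, selected], y, lam=1.0)
probs = predict_proba(X[:, selected], intercept, beta)
```

Unlike a class label alone, `probs` gives an estimated class probability for each sample, which is the advantage of PLR over the SVM that the abstract emphasizes. In practice the penalty `lam` and the number of retained genes would be chosen by cross-validation rather than fixed as they are here.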



