Biostatistics Advance Access originally published online on May 11, 2006
Biostatistics 2007 8(2):212-227; doi:10.1093/biostatistics/kxl002
Averaged gene expressions for regression
Google Inc., 1600 Amphitheatre Parkway, Mountain View, CA 94043, USA meeyoung{at}google.com
Department of Statistics and Department of Health Research & Policy, Stanford University, CA 94305, USA
Department of Health Research & Policy and Department of Statistics, Stanford University, CA 94305, USA
* To whom correspondence should be addressed.
Although averaging is a simple technique, it plays an important role in reducing variance. We exploit this property of averaging in regression on DNA microarray data, which poses the challenge of having far more features than samples. In this paper, we introduce a two-step procedure that combines (1) hierarchical clustering and (2) the Lasso. By averaging the genes within the clusters obtained from hierarchical clustering, we define supergenes and use them to fit regression models, thereby attaining both concise interpretation and accuracy. Our methods are supported by theoretical justifications and demonstrated on simulated and real data sets.
Keywords: Averaging; Hierarchical clustering; Lasso; Variance reduction
Received January 3, 2006; revised April 27, 2006; accepted for publication May 8, 2006.
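The two-step procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `supergene_lasso`, the choice of average linkage with correlation distance, the fixed number of clusters, the penalty value `alpha`, and the simulated data are all assumptions made for illustration. The sketch clusters genes hierarchically, averages the genes within each cluster to form supergenes, and fits a Lasso regression on those supergenes.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.linear_model import Lasso


def supergene_lasso(X, y, n_clusters, alpha):
    """Hypothetical helper: cluster genes, average within clusters, fit Lasso.

    X is an (n_samples, n_genes) expression matrix; y is the response.
    """
    # Step 1: hierarchical clustering of genes (columns of X).
    # Average linkage on correlation distance is an assumption of this sketch.
    Z = linkage(X.T, method="average", metric="correlation")
    # Cut the tree into at most n_clusters flat clusters (labels start at 1).
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    cluster_ids = np.unique(labels)

    # Supergenes: the average expression of the genes in each cluster.
    S = np.column_stack([X[:, labels == c].mean(axis=1) for c in cluster_ids])

    # Step 2: Lasso regression of the response on the supergenes.
    model = Lasso(alpha=alpha).fit(S, y)
    return labels, S, model


# Simulated data (illustrative only): 3 latent factors, each driving 10 genes.
rng = np.random.default_rng(0)
n_samples, genes_per_group = 40, 10
factors = rng.normal(size=(n_samples, 3))
X = np.repeat(factors, genes_per_group, axis=1)
X = X + 0.3 * rng.normal(size=X.shape)
# The response depends only on the first latent factor.
y = 2.0 * factors[:, 0] + 0.5 * rng.normal(size=n_samples)

labels, S, model = supergene_lasso(X, y, n_clusters=3, alpha=0.1)
print("cluster label per gene:", labels)
print("Lasso coefficients on supergenes:", model.coef_)
```

Averaging within a cluster reduces the noise variance of each supergene relative to any single gene in that cluster, which is the property of averaging the abstract refers to. In practice the number of clusters and the Lasso penalty would need to be chosen by some data-driven criterion such as cross-validation; the fixed values here are placeholders.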
This article has been cited by other articles:
K. Sameith, P. Antczak, E. Marston, N. Turan, D. Maier, T. Stankovic, and F. Falciani. Functional modules integrating essential cellular functions are predictive of the response of leukaemia cells to DNA damage. Bioinformatics, November 15, 2008; 24(22): 2602-2607.
Y. Luan and H. Li. Group additive regression models for genomic data analysis. Biostat., January 1, 2008; 9(1): 100-113.

