Biostatistics Advance Access published online on June 6, 2008
Biostatistics, doi:10.1093/biostatistics/kxn015
Genomic outlier profile analysis: mixture models, null hypotheses, and nonparametric estimation
Department of Statistics and Department of Public Health Sciences, Pennsylvania State University, University Park, PA 16802, USA
Department of Pathology and Department of Urology, Michigan Center for Translational Pathology, University of Michigan, Ann Arbor, MI 48109, USA
ghoshd@psu.edu
Most analyses of large-scale genomic data sets assess differential expression by testing for a difference in mean expression between 2 groups. Tomlins and others (2005) recently reported a different pattern of differential expression, in which only a fraction of samples in one group show overexpression relative to samples in the other group. In this work, we describe a general mixture model framework for assessing this type of expression, an approach termed outlier profile analysis. We first consider the single-gene situation and establish results on identifiability. We propose 2 nonparametric estimation procedures that have natural links to familiar multiple testing procedures. We then develop multivariate extensions of this methodology to handle genome-wide measurements. The proposed methodologies are compared using simulation studies as well as data from a prostate cancer gene expression study.
Keywords: Bonferroni correction; DNA microarray; False discovery rate; Goodness of fit; Multiple comparisons; Uniform distribution
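To make the single-gene mixture idea concrete, here is a minimal sketch under assumptions that are not taken from the paper. It assumes cancer-group expression Y follows a two-component mixture (1 − π)F + πG, where F is the normal-group distribution and G is shifted toward overexpression, so π is the fraction of "outlier" samples. Each cancer sample gets an upper-tail empirical p-value against the normal samples; under the null component these p-values are uniform, which suggests a Storey-type estimate of the null proportion 1 − π. The function names `empirical_pvalues` and `outlier_fraction` and the tuning parameter `lam` are hypothetical, and this estimator is a swapped-in illustration, not necessarily either of the authors' 2 procedures.

```python
from typing import Sequence


def empirical_pvalues(normal: Sequence[float], cancer: Sequence[float]) -> list[float]:
    """Upper-tail p-value of each cancer sample against the normal group:
    p_j = (1 + #{i : x_i >= y_j}) / (m + 1).

    If y_j comes from the same continuous distribution as the normal
    samples (the null component F), p_j is discretely uniform on
    {1/(m+1), ..., 1}.
    """
    m = len(normal)
    if m == 0:
        raise ValueError("need at least one normal sample")
    return [(1 + sum(1 for x in normal if x >= y)) / (m + 1) for y in cancer]


def outlier_fraction(normal: Sequence[float], cancer: Sequence[float], lam: float = 0.5) -> float:
    """Hypothetical estimate of pi, the outlier-component fraction.

    Assumption: p-values above lam come mostly from the null component,
    whose p-values are uniform, so the null proportion is estimated as
    #{p_j > lam} / (n * (1 - lam)) and truncated at 1.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    pvals = empirical_pvalues(normal, cancer)
    n = len(pvals)
    if n == 0:
        raise ValueError("need at least one cancer sample")
    pi0 = sum(1 for p in pvals if p > lam) / (n * (1.0 - lam))
    return 1.0 - min(1.0, pi0)


if __name__ == "__main__":
    normal = list(range(1, 21))  # toy normal-group values 1..20
    cancer = [2.5, 5.5, 10.5, 15.5, 18.5, 100, 200, 300]  # toy cancer values, 3 extreme
    print(outlier_fraction(normal, cancer))
```

On this toy input, 3 of the 8 p-values exceed 0.5, so the sketch estimates the null proportion as 0.75 and π as 0.25. The estimate is conservative here because some moderately large cancer values still receive p-values below `lam`. Applying such a per-gene estimate across thousands of genes is where multiple testing considerations, like those named in the keywords, would enter.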
* To whom correspondence should be addressed.
Received December 13, 2007; revised April 3, 2008; accepted for publication April 29, 2008.