Biostatistics Advance Access originally published online on June 6, 2008
Biostatistics 2009 10(1):60-69; doi:10.1093/biostatistics/kxn015
Genomic outlier profile analysis: mixture models, null hypotheses, and nonparametric estimation
Department of Statistics and Department of Public Health Sciences, Pennsylvania State University, University Park, PA 16802, USA, ghoshd{at}psu.edu
Department of Pathology and Department of Urology, Michigan Center for Translational Pathology, University of Michigan, Ann Arbor, MI 48109, USA
* To whom correspondence should be addressed.
In most analyses of large-scale genomic data sets, differential expression is assessed by testing for differences in the means of the distributions between 2 groups. Tomlins and others (2005) recently identified a different pattern of differential expression, in which a fraction of the samples in one group show overexpression relative to the samples in the other group. In this work, we describe a general mixture model framework for assessing this type of expression, an approach called outlier profile analysis. We start by considering the single-gene situation and establishing results on identifiability. We propose 2 nonparametric estimation procedures that have natural links to familiar multiple testing procedures. We then develop multivariate extensions of this methodology to handle genome-wide measurements. The proposed methodologies are compared using simulation studies as well as data from a prostate cancer gene expression study.
Keywords: Bonferroni correction; DNA microarray; False discovery rate; Goodness of fit; Multiple comparisons; Uniform distribution
Received December 13, 2007; revised April 3, 2008; accepted for publication April 29, 2008.