Biostatistics Advance Access originally published online on December 8, 2008
Biostatistics 2009 10(2):352-363; doi:10.1093/biostatistics/kxn042
Microarray background correction: maximum likelihood estimation for the normal–exponential convolution
Bioinformatics Division, Walter and Eliza Hall Institute, Parkville 3050, Victoria, Australia and Department of Biostatistics, University of Copenhagen, Øster Farimagsgade 5, Entrance B, PO Box 2099, DK-1014 Copenhagen K, Denmark j.silver{at}biostat.ku.dk
Department of Oncology, University of Cambridge, Cambridge CB2 0RE, UK
Bioinformatics Division, Walter and Eliza Hall Institute, Parkville 3050, Victoria, Australia smyth{at}wehi.edu.au
* To whom correspondence should be addressed.
Background correction is an important preprocessing step for microarray data that attempts to adjust the data for the ambient intensity surrounding each feature. The "normexp" method models the observed pixel intensities as the sum of 2 random variables, one normally distributed and the other exponentially distributed, representing background noise and signal, respectively. Ritchie and others (2007) found normexp, fitted using a saddle-point approximation, to be the best background correction method for 2-color microarray data. This article develops the normexp method further by improving the estimation of the parameters. A complete mathematical development of the normexp model and the associated saddle-point approximation is given. Some subtle numerical programming issues, which caused the original normexp method to fail occasionally when applied to unusual data sets, are solved. A practical and reliable algorithm is developed for exact maximum likelihood estimation (MLE) using high-quality optimization software, with the saddle-point estimates as starting values. MLE is shown to outperform heuristic estimators proposed by other authors, both in terms of estimation accuracy and in terms of performance on real data. The saddle-point approximation is an adequate replacement in most practical situations. The performance of normexp for assessing differential expression is improved by adding a small offset to the corrected intensities.
Keywords: 2-color microarray; Background correction; Maximum likelihood; Nelder-Mead algorithm; Newton-Raphson algorithm; Normal-exponential convolution
Received December 21, 2007; revised July 28, 2008; revised October 1, 2008; accepted for publication October 17, 2008.
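To make the model in the abstract concrete, the following is a minimal sketch of the normal-exponential convolution, not the authors' implementation. The parameter names mu and sigma (mean and standard deviation of the normal background) and alpha (mean of the exponential signal), and all function names, are assumptions introduced here for illustration. The sketch uses the standard closed-form density of a normal plus an independent exponential variable, and the conditional mean of the signal given the observed intensity, which is the mean of a normal distribution truncated at zero. It makes no attempt to handle the numerical issues the abstract mentions: for intensities far below the background mean the normal tail probability underflows.

```python
import math
import random


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal distribution function, via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def normexp_pdf(x, mu, sigma, alpha):
    """Density of X = B + S with B ~ N(mu, sigma^2) and S ~ Exponential(mean alpha)."""
    mu_sx = x - mu - sigma ** 2 / alpha
    scale = math.exp(sigma ** 2 / (2.0 * alpha ** 2) - (x - mu) / alpha) / alpha
    return scale * norm_cdf(mu_sx / sigma)


def normexp_loglik(xs, mu, sigma, alpha):
    """Log-likelihood of observed intensities xs (naive: no underflow protection)."""
    return sum(math.log(normexp_pdf(x, mu, sigma, alpha)) for x in xs)


def normexp_signal(x, mu, sigma, alpha):
    """Conditional mean E[S | X = x], used as the background-corrected intensity."""
    mu_sx = x - mu - sigma ** 2 / alpha
    z = mu_sx / sigma
    return mu_sx + sigma * norm_pdf(z) / norm_cdf(z)


def simulate(n, mu, sigma, alpha, seed=0):
    """Draw n intensities from the model (random.expovariate takes a rate, 1/alpha)."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) + rng.expovariate(1.0 / alpha) for _ in range(n)]
```

Maximum likelihood estimation then amounts to maximizing normexp_loglik over (mu, sigma, alpha) with a general-purpose optimizer. The corrected intensity returned by normexp_signal is always positive, even when the observed intensity falls below the estimated background mean.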