Biostatistics Advance Access first published online on February 27, 2008
This version published online on March 18, 2008

Biostatistics, doi:10.1093/biostatistics/kxm053

© 2008 The Authors
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Efficient p-value estimation in massively parallel testing problems

Rafal Kustra*

Department of Public Health Sciences, University of Toronto, Toronto, ON, Canada M5T 3M7 r.kustra{at}utoronto.ca

Xiaofei Shi

Department of Public Health Sciences, University of Toronto, Toronto, ON, Canada M5T 3M7 and Genetics and Genome Biology, Hospital for Sick Children 15-706, Toronto, ON, Canada M5G 1L7

Duncan J. Murdoch

Department of Statistical and Actuarial Sciences, University of Western Ontario, London, ON, Canada N6A 5B7

Celia M. T. Greenwood

Department of Public Health Sciences, University of Toronto, Toronto, ON, Canada M5T 3M7 and Genetics and Genome Biology, Hospital for Sick Children 15-706, Toronto, ON, Canada M5G 1L7

Jagadish Rangrej

Genetics and Genome Biology, Hospital for Sick Children 15-706, Toronto, ON, Canada M5G 1L7

* To whom correspondence should be addressed.

We present a new method to efficiently estimate very large numbers of p-values using empirically constructed null distributions of a test statistic. The need to evaluate a very large number of p-values is increasingly common with modern genomic data, and when interaction effects are of interest, the number of tests can easily run into the billions. When the asymptotic distribution is not easily available, permutations are typically used to obtain p-values, but these can be computationally infeasible in large problems. Our method constructs a prediction model to obtain a first approximation to the p-values and uses Bayesian methods to choose a fraction of these to be refined by permutations. We apply and evaluate our method on the study of association between 2-way interactions of genetic markers and colorectal cancer using the data from the first phase of a large, genome-wide case–control study. The results show enormous computational savings compared with evaluating a full set of permutations, with little decrease in accuracy.

Keywords: Bayesian testing; Genome-wide association studies; Interaction effects; Permutation distribution; p-value distribution; Random Forest

Received November 27, 2006; revised September 14, 2007; accepted for publication November 5, 2007.
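
As a rough illustration of the permutation baseline that the abstract describes as computationally infeasible at scale, the following is a minimal sketch of a single permutation p-value. It is not the authors' method: the function name `permutation_p_value`, the difference-in-means statistic, and the add-one estimator are all assumptions chosen for illustration, and the prediction-model and Bayesian selection steps are not shown.

```python
import random


def permutation_p_value(cases, controls, n_perm=1000, seed=0):
    """Estimate a two-sided permutation p-value for a difference in means.

    Hypothetical helper for illustration only; the statistic and the
    estimator are assumptions, not taken from the article.
    """
    rng = random.Random(seed)
    pooled = list(cases) + list(controls)
    n_cases = len(cases)

    def abs_mean_diff(values):
        # Test statistic: absolute difference between the two group means.
        first, second = values[:n_cases], values[n_cases:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    observed = abs_mean_diff(pooled)
    exceed = 0
    for _ in range(n_perm):
        # Shuffling the pooled values breaks any link between value and group.
        rng.shuffle(pooled)
        if abs_mean_diff(pooled) >= observed:
            exceed += 1
    # Add-one estimator: the estimate is never exactly zero.
    return (exceed + 1) / (n_perm + 1)


p = permutation_p_value([10, 11, 12, 13, 14], [0, 1, 2, 3, 4], n_perm=200)
print(p)
```

Each call costs `n_perm` evaluations of the statistic, so the total cost grows as the number of tests times `n_perm`. With billions of tests, that product is what makes refining only a selected fraction of p-values by permutation attractive.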

