Biostatistics Advance Access originally published online on July 15, 2009
Biostatistics 2009 10(4):680-693; doi:10.1093/biostatistics/kxp023
SHARE: an adaptive algorithm to select the most informative set of SNPs for candidate genetic association
Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue N, M2-C200, Seattle, WA 98109, USA jdai{at}fhcrc.org
Department of Epidemiology, Cardiovascular Health Research Unit, University of Washington, Seattle, WA 98195, USA
Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue N, M2-C200, Seattle, WA 98109, USA
* To whom correspondence should be addressed.
Association studies have been widely used to identify genetic liability variants for complex diseases. Scanning a chromosomal region one single nucleotide polymorphism (SNP) at a time may not fully explore linkage disequilibrium, whereas haplotype analyses tend to require a fairly large number of parameters and may therefore lose power. Clustering algorithms, such as the cladistic approach, have been proposed to reduce the dimensionality, yet they have important limitations. We propose a SNP-Haplotype Adaptive REgression (SHARE) algorithm that seeks the most informative set of SNPs for genetic association in a targeted candidate region. The algorithm grows and shrinks haplotypes by one SNP at a time in a stepwise fashion and compares the prediction errors of the resulting models via cross-validation. Depending on the evolutionary history of the disease mutations and the markers, this set may contain a single SNP or several SNPs that lay a foundation for haplotype analyses. Haplotype phase ambiguity is accounted for by treating haplotype reconstruction as part of the learning procedure. Simulations and a data application show that our method has improved power over existing methodologies and that the results are informative in the search for disease-causal loci.
Keywords: Adaptive regression; Haplotype; Multilocus analysis; SNP
Received July 18, 2008; revised February 25, 2009; accepted for publication June 22, 2009.
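The stepwise search summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several simplifying assumptions: each subject is described by unphased genotype codes (0, 1, or 2) at each SNP, so haplotype phase reconstruction is skipped entirely; the "model" for a SNP subset is simply the mean training outcome within each multilocus genotype pattern; and prediction error is the squared error. The function names `cv_error` and `stepwise_select` are hypothetical.

```python
import random
from collections import defaultdict


def cv_error(genotypes, outcomes, snps, folds):
    """Cross-validated mean squared prediction error for one SNP subset.

    Assumption: the prediction for a held-out subject is the mean training
    outcome among subjects sharing its genotype pattern at `snps`, falling
    back to the overall training mean for patterns unseen in training.
    """
    total = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(len(outcomes)) if i not in held_out]
        overall = sum(outcomes[i] for i in train) / len(train)
        sums = defaultdict(lambda: [0.0, 0])  # pattern -> [outcome sum, count]
        for i in train:
            key = tuple(genotypes[i][s] for s in snps)
            sums[key][0] += outcomes[i]
            sums[key][1] += 1
        for i in fold:
            key = tuple(genotypes[i][s] for s in snps)
            pred = sums[key][0] / sums[key][1] if key in sums else overall
            total += (outcomes[i] - pred) ** 2
    return total / len(outcomes)


def stepwise_select(genotypes, outcomes, n_folds=5, seed=0):
    """Grow or shrink the SNP set by one SNP per step while CV error falls."""
    n_snps = len(genotypes[0])
    order = list(range(len(outcomes)))
    random.Random(seed).shuffle(order)
    folds = [order[k::n_folds] for k in range(n_folds)]  # fixed across models

    current = frozenset()
    best = cv_error(genotypes, outcomes, (), folds)
    while True:
        # Neighbouring models: one more SNP, or one fewer SNP.
        neighbours = [current | {s} for s in range(n_snps) if s not in current]
        neighbours += [current - {s} for s in current]
        scored = [
            (cv_error(genotypes, outcomes, tuple(sorted(c)), folds), sorted(c))
            for c in neighbours
        ]
        if not scored:
            return sorted(current), best
        err, cand = min(scored)
        if err >= best:  # no neighbour improves the CV error: stop
            return sorted(current), best
        current, best = frozenset(cand), err
```

Calling `stepwise_select(genotypes, outcomes)` with a list of per-subject genotype lists and a list of numeric outcomes returns the selected SNP indices and their cross-validated error. The same folds are reused for every candidate model so that errors are comparable across SNP sets. A faithful version would replace the pattern-mean predictor with a haplotype regression model fitted under phase uncertainty, as the abstract describes.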