Biostatistics Advance Access originally published online on October 23, 2006
Biostatistics 2007 8(3):632-653; doi:10.1093/biostatistics/kxl035
A hierarchical clustering method for estimating copy number variation
Genetics and Genome Biology, Hospital for Sick Children, Toronto, Ontario, Canada and Samuel Lunenfeld Research Institute of Mount Sinai Hospital, Toronto, Ontario, Canada
Genetics and Genome Biology, Hospital for Sick Children, Toronto, Ontario, Canada and Department of Public Health Sciences, University of Toronto, Toronto, Ontario, Canada celia.greenwood{at}utoronto.ca
Samuel Lunenfeld Research Institute of Mount Sinai Hospital, Toronto, Ontario, Canada and Department of Public Health Sciences, University of Toronto, Toronto, Ontario, Canada
* To whom correspondence should be addressed.
Microarray technologies allow simultaneous measurement of DNA copy number at thousands of positions in a genome. Gains and losses of DNA sequences reveal themselves through characteristic patterns of hybridization intensity. To identify change points along the chromosomes, we develop a marker clustering method that consists of 2 parts. First, a "circular clustering tree test statistic" attaches to each marker a statistic that measures the likelihood that the marker is a change point. Second, outlier detection approaches are applied to these marker statistics to identify the change points. The method provides a new way to build a binary tree that can accurately capture change-point signals and is easy to perform. A simulation study shows good performance in change-point detection, and cancer cell line data are used to illustrate performance when regions of true copy number changes are known.
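The 2-part structure described above (a per-marker statistic, then outlier detection on those statistics) can be illustrated with a minimal Python sketch. This is not the circular clustering tree test statistic, whose construction the abstract does not give. As a stand-in, the sketch uses the absolute difference between the mean intensities in a window to the left and to the right of each marker, and it flags outliers with a median/MAD robust z-score. The function names, the window width of 5, and the threshold of 3.5 are all assumptions made for illustration.

```python
from statistics import median


def marker_statistics(values, window=5):
    """Stand-in per-marker statistic (NOT the paper's clustering-tree statistic).

    For each marker i with a full window on both sides, return the absolute
    difference between the mean of values[i:i+window] and values[i-window:i].
    """
    stats = {}
    for i in range(window, len(values) - window + 1):
        left = values[i - window:i]
        right = values[i:i + window]
        stats[i] = abs(sum(right) / window - sum(left) / window)
    return stats


def robust_scores(stats):
    """Robust z-score of each statistic, based on the median and the MAD."""
    if not stats:
        return {}
    xs = list(stats.values())
    med = median(xs)
    mad = median([abs(x - med) for x in xs])
    if mad > 0:
        scale = mad / 0.6745
    else:
        # Fallback when more than half the statistics are identical:
        # use the mean absolute deviation instead of the MAD.
        scale = 1.253314 * sum(abs(x - med) for x in xs) / len(xs)
    if scale == 0:
        return {i: 0.0 for i in stats}
    return {i: (x - med) / scale for i, x in stats.items()}


def detect_change_points(values, window=5, threshold=3.5):
    """Flag outlying marker statistics and keep one marker per flagged run.

    A reported index i means the first marker of the new segment.
    """
    stats = marker_statistics(values, window)
    scores = robust_scores(stats)
    flagged = sorted(i for i, z in scores.items() if z > threshold)
    change_points, run = [], []
    for i in flagged:
        if run and i != run[-1] + 1:
            # The run of adjacent flagged markers ended: keep its peak.
            change_points.append(max(run, key=stats.get))
            run = []
        run.append(i)
    if run:
        change_points.append(max(run, key=stats.get))
    return change_points
```

Collapsing each run of adjacent flagged markers to its peak is a design choice of this sketch: a windowed statistic rises and falls around a true change point, so several neighbouring markers exceed the threshold for a single underlying change.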
Keywords: Array CGH; Change-point; Genomic copy number; Outlier detection; Permutation
Received March 20, 2006; revised September 1, 2006; accepted for publication October 16, 2006.