Skip Navigation

Biostatistics 2004 5(4):515-529; doi:10.1093/biostatistics/kxh005
This Article
Right arrow FREE Full Text (PDF) Freely available
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Right arrow Disclaimer
Google Scholar
Right arrow Articles by Albert, P. S.
Right arrow Articles by Taylor, P. R.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Albert, P. S.
Right arrow Articles by Taylor, P. R.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

Biostatistics Vol. 5 No. 4 © Oxford University Press 2004; all rights reserved.

Identifying multiple changepoints in heterogeneous binary data with an application to molecular genetics

Paul S. Albert* and Sally A. Hunsberger

Biometric Research Branch, National Cancer Institute, 6130 Executive Blvd, Room 8136, Bethesda, MD 20892, USA

Nan Hu and Philip R. Taylor

Cancer Prevention Studies Branch, National Cancer Institute, 6116 Executive Blvd, Room 705, Bethesda, MD 20892, USA

* To whom correspondence should be addressed.

Identifying changepoints is an important problem in molecular genetics. Our motivating example is from cancer genetics where interest focuses on identifying areas of a chromosome with an increased likelihood of a tumor suppressor gene. Loss of heterozygosity (LOH) is a binary measure of allelic loss in which abrupt changes in LOH frequency along the chromosome may identify boundaries indicative of a region containing a tumor suppressor gene. Our interest was on testing for the presence of multiple changepoints in order to identify regions of increased LOH frequency. A complicating factor is the substantial heterogeneity in LOH frequency across patients, where some patients have a very high LOH frequency while others have a low frequency. We develop a procedure for identifying multiple changepoints in heterogeneous binary data. We propose both approximate and full maximum-likelihood approaches and compare these two approaches with a naive approach in which we ignore the heterogeneity in the binary data. The methodology is used to estimate the pattern in LOH frequency on chromosome 13 in esophageal cancer patients and to isolate an area of inflated LOH frequency on chromosome 13 which may contain a tumor suppressor gene. Using simulations, we show that our approach works well and that it is robust to departures from some key modeling assumptions.

Keywords: Change point detection; Correlated binay data; Heterogeneity; Loss of heterozygosity; Repeated binary data; Spectral correlation


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?




Disclaimer: Please note that abstracts for content published before 1996 were created through digital scanning and may therefore not exactly replicate the text of the original print issues. All efforts have been made to ensure accuracy, but the Publisher will not be held responsible for any remaining inaccuracies. If you require any further clarification, please contact our Customer Services Department.