Biostatistics Advance Access published online on March 18, 2008

Biostatistics, doi:10.1093/biostatistics/kxm058

© The Author 2008. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org.

A transdimensional Bayesian model for pattern recognition in DNA sequences

Sierra M. Li

Division of Oncology Biostatistics, Sidney Kimmel Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD 21205-2013, USA

Jon Wakefield*

Department of Biostatistics, University of Washington, Seattle, WA 98195-7232, USA. E-mail: jonno@u.washington.edu

Steve Self

Fred Hutchinson Cancer Research Center, Seattle, WA 98109-1024, USA

* To whom correspondence should be addressed.

Identification of transcription factor binding sites (TFBSs) is essential to elucidate gene regulatory networks. This article focuses on the recognition of overrepresented short patterns, called "motifs," that may correspond to regulatory binding sites in the DNA sequences upstream of genes. An integrated Bayesian model is proposed to incorporate all unknown characteristics in motif discovery, including the number of motifs, motif widths, motif compositions, the number of motif sites, and the locations of motif sites. Reversible jump Markov chain Monte Carlo is used to obtain posterior inference in the transdimensional parameter space. We present a number of suggestions for graphical summarization of the posterior distribution over the complex parameter space. The basic model is extended by using a third-order Markov structure for nonmotif bases and by allowing positions within a motif to switch between 2 types: "conserved" and "degenerate." We evaluate the prediction accuracy on simulated data with 3 motifs and apply the model to upstream sequences in high signal-to-noise regions in a human ChIP-chip study. The performance of the Bayesian model is assessed using yeast data sets with various numbers of sequences and background structures, with and without true TFBSs. The performance is also compared with that of other computational methods, including 2 statistical approaches, AlignACE and multiple expectation maximization for motif elicitation (MEME), and 1 word enumeration–based approach, yeast motif finder (YMF).

Keywords: Bayesian model; Dimensional change; Gene regulation; Motif discovery; Reversible jump Markov chain Monte Carlo

Received December 18, 2006; revised July 26, 2007; revised October 4, 2007; accepted for publication November 20, 2007.
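
To make the "third-order Markov structure for nonmotif bases" mentioned in the abstract concrete, the following is a minimal Python sketch of such a background model, in which each base depends on the 3 bases before it. It is an illustration only, not the authors' implementation: it uses simple smoothed frequency estimates rather than the Bayesian treatment of the article, and the names `fit_background`, `log_likelihood`, and the `pseudocount` parameter are hypothetical.

```python
from itertools import product
import math

BASES = "ACGT"
ORDER = 3  # third-order: each base depends on the 3 preceding bases


def fit_background(sequences, pseudocount=1.0):
    """Estimate P(base | previous ORDER bases) with additive smoothing.

    The pseudocount is an assumption of this sketch; it keeps unseen
    transitions from receiving probability zero.
    """
    counts = {"".join(ctx): dict.fromkeys(BASES, pseudocount)
              for ctx in product(BASES, repeat=ORDER)}
    for seq in sequences:
        for i in range(ORDER, len(seq)):
            ctx, base = seq[i - ORDER:i], seq[i]
            # Skip windows containing ambiguous symbols such as N.
            if ctx in counts and base in BASES:
                counts[ctx][base] += 1
    return {ctx: {b: c / sum(row.values()) for b, c in row.items()}
            for ctx, row in counts.items()}


def log_likelihood(seq, model):
    """Log-probability of seq[ORDER:] given its first ORDER bases.

    Assumes seq contains only the symbols A, C, G, and T.
    """
    return sum(math.log(model[seq[i - ORDER:i]][seq[i]])
               for i in range(ORDER, len(seq)))


if __name__ == "__main__":
    model = fit_background(["ACGTACGTACGT"])
    print(model["ACG"])                   # T is favored after the context ACG
    print(log_likelihood("ACGT", model))  # log P(T | ACG)
```

In a motif finder, a score of this kind would typically serve as the baseline against which a candidate motif site is compared; how the article combines the background with the motif model is not specified in the abstract.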

