Biostatistics Advance Access published online on May 22, 2008
Biostatistics, doi:10.1093/biostatistics/kxn011
Time-synchronized clustering of gene expression trajectories
Division of Biostatistics, Center for Devices and Radiological Health, Food and Drug Administration, Rockville, MD 20850, USA rong.tang@fda.hhs.gov
Department of Statistics, University of California–Davis, One Shields Avenue, Davis, CA 95616, USA
Current clustering methods are routinely applied to gene expression time course data to find genes with similar activation patterns and ultimately to understand the dynamics of biological processes. As the dynamic unfolding of a biological process often involves the activation of genes at different rates, successful clustering in this context requires dealing with varying time and shape patterns simultaneously. This motivates combining a novel pairwise warping with a suitable clustering method to discover expression shape clusters. We develop a clustering method that uses an initial pairwise curve alignment to adjust for time variation within likely clusters. In simulations and on yeast and human fibroblast data sets, this cluster-specific time synchronization method outperforms standard clustering methods in terms of cluster quality measures. In the yeast example, the discovered clusters show high concordance with known biological processes.
Keywords: Clustering; Gene expression analysis; Microarray; Time warping
* To whom correspondence should be addressed.
Received November 29, 2007; revised March 24, 2008; accepted for publication April 8, 2008.