Skip Navigation


Biostatistics Advance Access originally published online on February 22, 2006
Biostatistics 2006 7(4):569-584; doi:10.1093/biostatistics/kxj026
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow Supplementary material
Right arrow All Versions of this Article:
7/4/569    most recent
kxj026v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Right arrow Disclaimer
Google Scholar
Right arrow Articles by Leng, X.
Right arrow Articles by Müller, H.-G.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Leng, X.
Right arrow Articles by Müller, H.-G.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2006. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org.

Time ordering of gene coexpression

Xiaoyan Leng*

Division of Public Health Sciences, Department of Biostatistical Sciences, Wake Forest University School of Medicine, Medical Center Boulevard, MRI-3, Winston-Salem, NC 27157, USA ileng{at}wfubmc.edu

Hans-Georg Müller

Department of Statistics, University of California, One Shields Avenue, Davis, CA 95616, USA

* To whom correspondence should be addressed.


    SUMMARY
 TOP
 SUMMARY
 1. INTRODUCTION
 2. PAIRWISE CURVE ALIGNMENT
 3. GLOBAL TIME SHIFT...
 4. SIMULATION STUDIES
 5. TIME ORDERING FOR...
 6. TIME ORDERING FOR...
 7. DISCUSSION AND CONCLUSIONS
 REFERENCES
 
Temporal microarray gene expression profiles allow characterization of gene function through time dynamics of gene coexpression within the same genetic pathway. In this paper, we define and estimate a global time shift characteristic for each gene via least squares, inferred from pairwise curve alignments. These time shift characteristics of individual genes reflect a time ordering that is derived from ob- served temporal gene expression profiles. Once these time shift characteristics are obtained for each gene, they can be entered into further analyses, such as clustering. We illustrate the proposed methodology using Drosophila embryonic development and yeast cell-cycle gene expression profiles, as well as simulations. Feasibility is demonstrated through the successful recovery of time ordering. Estimated time shifts for Drosophila maternal and zygotic genes provide excellent discrimination between these two categories and confirm known genetic pathways through the time order of gene expression. The application to yeast cell-cycle data establishes a natural time order of genes that is in line with cell-cycle phases. The method does not require periodicity of gene expression profiles. Asymptotic justifications are also provided.

Keywords: Curve alignment; Functional data analysis; Gene expression profiles; Microarray; Time dynamics; Warping


    1. INTRODUCTION
 TOP
 SUMMARY
 1. INTRODUCTION
 2. PAIRWISE CURVE ALIGNMENT
 3. GLOBAL TIME SHIFT...
 4. SIMULATION STUDIES
 5. TIME ORDERING FOR...
 6. TIME ORDERING FOR...
 7. DISCUSSION AND CONCLUSIONS
 REFERENCES
 
The thousands of genes of an organism or a cell must be expressed in a regulated manner to enable the organism or the cell to utilize the biological information contained within the genome. One of the most important dimensions of gene regulation is temporal control of gene expression within genetic pathways or biological processes, through which groups of genes are thought to be coexpressed coherently across different time periods. These processes are related to the function of the proteins encoded by these genes. In general, the timing of mRNA expression for a given gene correlates well with the function of the resultant protein (Bozdech et al., 2003Go; Bähler, 2005Go).

The temporal control of gene expression extends over many time scales, such as development and maturation, over the entire life span on the organismal level or cell cycles on the cellular level. For example, embryogenesis is an example of a time-sensitive biological process, whereby the embryo forms and develops. A fertilized egg of the fruit fly (Drosophila melanogaster) undergoes cleavage, blastoderm, gastrulation, germ band elongation and retraction, head involution and dorsal closure, and differentiation over about 24 h. The coordination in time and space of the expressions of maternal and zygotic genes guarantees normal embryogenesis (Lodish et al., 2000Go; Weigmann et al., 2003Go; Pollard and Earnshaw, 2004Go). Another example is provided by the timing characteristics of the yeast cell cycle, which can be modeled as ‘a line of dominoes’ or dependent pathway, whereby a gene will not start to express until certain other genes have already expressed (Pollard and Earnshaw, 2004Go).

Uncovering the time ordering of gene expression could therefore be expected to aid the elucidation of gene function, and to provide a context for interpreting organismal/cellular responses to drugs, growth conditions, or environmental perturbations. This lends motivation to employ increasingly collected temporal microarray gene expression data, when aiming at detection of the time ordering of gene expression. The assumption that genes with similar temporal expression patterns are part of a common genetic pathway provides further motivation for global time-ordering analysis of gene expression profiles.

Previous time-ordering analyses have aimed at pairwise gene expression profile alignment. Butte et al. (2001)Go utilized digital signal-processing tools, including power spectral densities, coherence, transfer gain, and phase shift, to find pairwise gene associations based on periodically expressed time-invariant gene profiles. Qian et al. (2001)Go adopted a local clustering approach to determine optimal pairwise alignment based on dynamic programming, while Aach and Church (2001)Go implemented both simple and interpolative time warping based on dynamic programming to identify an optimal time alignment of two gene expression time series. Expanding this approach, Liu and Müller (2003)Go proposed a non-parametric time-warping technique to construct modes of temporal structure for a sample of gene expression profiles, adapting a time synchronization approach (Liu and Müller, 2004Go). For general warping and curve alignment procedures, we refer to Wang and Gasser (1997)Go and Gervini and Gasser (2004)Go. Bar-Joseph et al. (2003aGo, 2003b) developed yet another approach and aligned temporal data sets under varying conditions, extracting shift and stretch parameters for each data set. In a very general approach, Arkin and Ross (1995)Go and Arkin et al. (1997)Go devised models with time shifts for chemical reaction pathways, and proposed to determine pairwise time shifts by maximizing correlation. A related global time shift model for functional data was considered by Silverman (1995)Go.

In this paper, we define and infer global time shift characteristics for genes that are based on observations of optimal pairwise curve alignments, which then are symmetrized through the minimization of a functional distance via a least squares step. We show that a conditional Formula distance between two curves is minimized near the true underlying pairwise time shift. The resulting global time shifts for genes lead to the proposed time-order characteristics. Further analysis such as gene grouping/clustering within the same genetic pathway can then be based on the estimated time-order characteristics in a subsequent step. The proposed methodology may play an auxiliary role in determining a genetic pathway.

The organization of the paper is as follows: We introduce pairwise alignment in Section 2, including asymptotic consistency results for identifying underlying time shifts based on minimizing Formula distance. The connection to global time shifts is made in Section 3, followed by simulations (Section 4) and the analysis of the time ordering for Drosophila embryonic development (Section 5) and yeast cell cycle (Section 6). The paper ends with discussion and conclusions (Section 7). Theoretical results and proofs can be found in the Appendix which is posted at the journal's website.


    2. PAIRWISE CURVE ALIGNMENT
 TOP
 SUMMARY
 1. INTRODUCTION
 2. PAIRWISE CURVE ALIGNMENT
 3. GLOBAL TIME SHIFT...
 4. SIMULATION STUDIES
 5. TIME ORDERING FOR...
 6. TIME ORDERING FOR...
 7. DISCUSSION AND CONCLUSIONS
 REFERENCES
 
Assume we observe a collection of K gene expression profiles with expression levels Formula measured at times Formula, Formula, Formula, Formula, with

Formula

where Formula are the underlying expression profiles, which we view as realizations of a stochastic process, and Formula are iid errors with zero mean and finite variance. Furthermore, assume

Formula

where Formula is a non-random mean curve and Formula are iid realizations of a stochastic process Z, s.t. Formula and Formula. It is not necessary to specify the covariance function of Z. If, on the other hand, it is known, some efficiency might be gained. The Formula are random time shifts and Formula is a small constant, so that Formula might be viewed as a small random perturbation. For a pair of random curves Formula and Formula, the relative time shift is Formula, Formula. Note that this pairwise time shift is symmetric in the sense that Formula.

We are interested in identifying the time shifts Formula, which are associated with specific gene profiles Formula, and also in the inherent ordering of the Formula, Formula. The time shifts Formula are characteristics of individual genes associated with their expression trajectories. They reflect the inherent time order of the ensemble of genes and therefore have a clear biological interpretation.

Given a random trajectory Formula, we align other trajectories Formula against Formula on the interval Formula for some constants Formula by minimizing a distance Formula with regard to s. We aim at the minimizers

Formula (2.1)

Similarly, we define Formula and Formula by exchanging i and j. We assume implicitly that Formula is defined for all Formula, for a Formula, by extending the domain of the Formula suitably beyond Formula . For distance d in function space, we consider the Formula pseudodistance

Formula 2(2.2)

All functions are pre-normalized by the transformation Formula 2, aiming to de-emphasize differences in amplitude and to emphasize differences in horizontal shift. Other distances could also be used, such as correlation and rank correlation (Arkin and Ross, 1995Go; Arkin et al., 1997Go; Heckman and Zamar, 2000Go).

THEOREM 2.1 For two random functions Formula 2 and Formula 2, let

Formula

Under Conditions (A1)–(A4) (see Appendix), for sufficiently small Formula 2,

Formula 2

Theorem 2.1 shows that, under suitable assumptions, Formula 2 asymptotically tracks the true value Formula 2. The proof is in the Appendix.

After pairwise alignment, we obtain two matrices: the minimum distance matrix Formula 2 and the relative time shift matrix Formula 2, where Formula 2 and Formula 2, Formula 2. We note that the matrices Formula 2 and Formula 2 are generally asymmetric. The elements of S consist of the pairwise relative time shifts Formula 2, which serve as responses in a global time shift model that is discussed in the next section.


    3. GLOBAL TIME SHIFT MODEL AND INFERENCE FOR TIME SHIFTS
 TOP
 SUMMARY
 1. INTRODUCTION
 2. PAIRWISE CURVE ALIGNMENT
 3. GLOBAL TIME SHIFT...
 4. SIMULATION STUDIES
 5. TIME ORDERING FOR...
 6. TIME ORDERING FOR...
 7. DISCUSSION AND CONCLUSIONS
 REFERENCES
 
For each pair of random curves Formula 2, Formula 2, Formula 2, the relative shift of Formula 2 with respect to Formula 2 is expected to be close to Formula 2, if Formula 2 is relatively small, since Formula 2 for Formula 2. We note that, in general, for the pairwise time shifts Formula 2 (2.1), Formula 2, so that a reasonable algorithm needs to include a symmetrization step. Without loss of generality, let Formula 2. For K gene expression trajectories, the equation

Formula 3(3.1)

corresponds to a linear system

Formula 3(3.2)

where Formula 3 is a Formula 3 vector of stacked pairwise relative time shifts, A the corresponding design matrix corresponding to (3.1), and Formula 3 the shift parameter vector, where Formula 3 denotes the transpose of a (column or row) vector x; specifically, Formula 3, where Formula 3 is a Formula 3 vector of Formula 3, Formula 3, Formula 3. Note that A is always of full rank by design.

Gene expression profiles Formula 3 are often not continuously observed but are rather observed over a discrete grid of measurement times Formula 3, giving rise to discrete observations Formula 3 per profile. Since the measurement times can be irregular, there seems no obvious way to obtain the distance between any two observed profiles without making distributional assumptions, which we prefer to avoid; compare Yao et al. (2005)Go for the implementation of Gaussian assumptions for irregularly observed functional data. Smooth trajectories for profiles Formula 3 can be obtained from the discrete measurements by applying a linear scatterplot smoother to the scatterplot Formula 3, denoted by Formula 3, when evaluated at point t. While various smoothing methods are available (Fan and Gijbels, 1996Go), we use a class of kernel smoothers (Gasser and Müller, 1984Go) that are well suited for our purpose; further details about these smoothers are provided in the Appendix.

THEOREM 3.1 Using kernel smoothers (A1, see Appendix), for any two random functions Formula 3 and Formula 3 and their smoothed estimates Formula 3 and Formula 3, let

Formula (3.3)

Under Assumptions (A1)–(A6) (see Appendix), it holds that

Formula 3(3.4)

For the proof, we refer to the Appendix. Theorem 3.1 says that the minimizers Formula 3 asymptotically track the minimizing pairwise time shifts Formula 3 and therefore according to Theorem 2.1 also the true shifts Formula 3 up to an asymptotically small error. Together with (3.1) and (3.2), this suggests a least squares approach.

With relative time shifts Formula 3, Model (3.2) becomes

Formula 3(3.5)

where Formula 3 is the vector of estimated pairwise relative time shifts obtained from (3.3) and Formula 3 is the corresponding vector of errors. The least squares estimator Formula 3 of Formula 3 is then

Formula 3(3.6)

COROLLARY 3.2 The least squares time shifts Formula 3 satisfy

Formula 3(3.7)

The proof is in the Appendix. Relative time shift estimates are practically implemented based on the aligned smoothed trajectories Formula 3, Formula 3 via

Formula (3.8)

which are entered on the right-hand side of the least squares equation (3.5). Then the least squares solution is Formula 3, where Formula 3 is defined from (3.8) in analogy to Formula 3 as defined in (3.2).

The gene-specific time shifts can alternatively be estimated through weighted least squares. Introducing the estimated squared distances Formula resulting from (3.8), we define case weights Formula 3. Then the weighted least squares solution is

Formula 3(3.9)

where Formula 3 is a diagonal matrix with diagonal elements Formula 3. Updated estimates for pairwise time shifts Formula 3 are then obtained by Formula 3. These estimates are symmetric as Formula 3.

In the analysis of temporal gene expression profiles, one may be interested in finding clusters of genes, which can be defined as genes with similar time shifts. Reaching high expression levels at about the same time might imply that such genes are related in function and may be involved in a common genetic pathway. Cluster analyses, based on various methods such as k-means and hierarchical or non-parametric density estimation-based clustering (Fukunaga and Hostetler, 1975Go; Wong and Lane, 1983Go), may be performed in a second step, based on estimated time shifts Formula 3.


    4. SIMULATION STUDIES
 TOP
 SUMMARY
 1. INTRODUCTION
 2. PAIRWISE CURVE ALIGNMENT
 3. GLOBAL TIME SHIFT...
 4. SIMULATION STUDIES
 5. TIME ORDERING FOR...
 6. TIME ORDERING FOR...
 7. DISCUSSION AND CONCLUSIONS
 REFERENCES
 
To compare the performance of distance and other similarity criteria, we conducted two small simulation studies.

Simulation 1 We generated a set of curves (Formula 3) with known non-random time shifts Formula 3, Formula 3, Formula 3, on the interval Formula 3:

Formula 3

where Formula 3, Formula 3, Formula 3, and data were sampled on an equi-spaced measurement grid Formula 3 on Formula 3, Formula 3. In this setup, the mean function is Formula 3, Formula 3, where Formula 3 and Formula 3 and Formula 3 are iid errors. The number of Monte Carlo runs was Formula 3.

The normalized smoothed curves for one data set are shown in a time-ordered manner in the left panel of Figure 1. We aligned the normalized smoothed curves using distances d (2.2), and also distances based on correlation and rank correlation. For each criterion, we obtained the least squares solution for the time shifts, assuming Formula 3 = 0 in the model. The performance of each criterion was measured by the observed mean squared prediction errors (MSPE) for Formula 3, Formula 3, Formula 3.


Figure 1
View larger version (31K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1 Illustrating time shifts for a set of simulated data. Left panel: normalized smoothed curves plotted in a vertical sequence corresponding to increasing true time shifts Formula 3 from top to bottom, so that timewise leading profiles appear on the top and lagged profiles at the bottom; right panel: normalized smoothed curves plotted in the same way but with estimated time shifts Formula 3 (3.9).

 
Both correlation and rank correlation measures (Heckman and Zamar, 2000), as alternative distances, led to larger MSPEs as compared to the proposed distances d (2.2). We also compared the least squares estimates computed with and without case weights. The introduction of case weights reduced MSPE for all criteria, especially for the proposed L2 distance d. The first part of Table 1 provides the comparisons for various similarity measures with regard to MSPE.

The right panel of Figure 1 shows estimated time order using weighted least squares based on distance d for one set of simulated curves. It is evident that the original time order is well recovered.

Simulation 2 We also carried out a second simulation to investigate the behavior for less smooth trajectories Z than those considered in Simulation 1, with Z constructed as moving averages of Gaussian white noise Formula 3 with varying span sizes. The mean curve Formula 3 and errors were defined as in Simulation 1. We report the results for three different levels of smoothness of Z in the second part of Table 1. The results are quite similar to those of Simulation 1. The distance d (2.2) again led to smaller MSPEs, as compared to correlation and rank correlation measures, and the introduction of case weights reduced MSPE for all criteria, especially for distance d, which emerged as the overall best distance for time alignment in both simulation studies and was used in the following applications. We conclude that our procedure is robust with regard to differing levels of smoothness of Z.


View this table:
[in this window]
[in a new window]

 
Table 1 Comparisons of different measures with or without case weights

 

    5. TIME ORDERING FOR Drosophila EMBRYONIC DEVELOPMENTAL GENES
 TOP
 SUMMARY
 1. INTRODUCTION
 2. PAIRWISE CURVE ALIGNMENT
 3. GLOBAL TIME SHIFT...
 4. SIMULATION STUDIES
 5. TIME ORDERING FOR...
 6. TIME ORDERING FOR...
 7. DISCUSSION AND CONCLUSIONS
 REFERENCES
 
Arbeitman et al. (2002)Go reported cDNA array gene expression patterns for nearly one-third of all 4028 Drosophila melanogaster genes during a complete time course of development, covering 66 sequential time periods beginning at fertilization and spanning the embryonic, larval, and pupal periods and the first 30 days of adulthood. In the first hours of embryonic development between fertilization and gastrulation (about 6–7 h after fertilization), gene expression is highly dynamic (Brody, 1999Go; Weigmann et al., 2003Go). Two broad categories of transcripts are present at this time: those deposited into the egg during oogenesis (produced by maternal genes) and those that are expressed only after fertilization (produced by zygotic genes). To illustrate the proposed methodology, we time ordered 27 strict maternal genes (including swallow) and 21 transiently expressed zygotic genes identified by Arbeitman et al. (2002)Go. We also included the maternal anterior group genes, namely, bicoid, swallow, and exuperantia (no data were available for staufen and exuperantia-like) (Brody, 1999Go). The gene expression patterns of these 50 genes for the first Formula 3 h of the fly embryo stage are displayed in Figure 2.


Figure 2
View larger version (31K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 2 Drosophila gene expression profiles during the first Formula 3 h. Solid: strict maternal genes (including maternal anterior genes); dotted: transient zygotic genes.

 
Here, it is natural to set the time shift of the first strict maternal gene as zero. The obtained time orders for the Drosophila genes are given in Table 2. The gene expression patterns are depicted in Figure 3, ordered by their estimated time shifts. The genes with complete peaks in the lower part of the figure are zygotic and the genes with ‘half’ peaks in the upper half are maternal. We note that the time shift for zygotic genes starts at about 2 h after fertilization, in accordance with the time when zygotic transcription initiates the switch from maternal to zygotic control of mitotic cycles (Foe et al., 1993Go).


View this table:
[in this window]
[in a new window]

 
Table 2 Estimated relative time shifts for Drosophila maternal (m) and zygotic (z) genes

 

Figure 3
View larger version (28K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 3 Time-ordered maternal and zygotic gene expression profiles of Drosophila. The curves in the figure are normalized and smoothed gene expression profiles, ordered vertically from top to bottom according to increasing estimated time shifts. Profiles of early-expressed genes appear near the top and those of late-expressed genes near the bottom. Solid: maternal genes; dotted: zygotic genes.

 
We also demonstrate another application of the proposed methodology by including the anterior genes in the alignment. The anterior system is one of the four maternal systems for assuring proper polarity of the oocyte prior to fertilization. Bicoid is the principal protein organizing anterior development in Drosophila. Directional action of exuperantia, exuperantia-like, swallow, and staufen are required during the process, in which bicoid mRNA is transported along the microtubule network of the oocyte to its anterior pole (Brody, 1999Go). The recovered early peakings of swallow and exuperantia prior to bicoid confirm the known pathway in terms of time order of expression.

The time-order analysis also reveals that one gene [CG1624 or DPLD (dappled)] previously identified as maternal displays a relatively large lagged shift, which is a typical feature of zygotic genes. The pattern of this gene is very similar to that of zygotic genes and it likely has been misclassified in previous analysis as a maternal gene.


    6. TIME ORDERING FOR YEAST CELL-CYCLE GENES
 TOP
 SUMMARY
 1. INTRODUCTION
 2. PAIRWISE CURVE ALIGNMENT
 3. GLOBAL TIME SHIFT...
 4. SIMULATION STUDIES
 5. TIME ORDERING FOR...
 6. TIME ORDERING FOR...
 7. DISCUSSION AND CONCLUSIONS
 REFERENCES
 
We illustrate the proposed methods for a set of time course microarray expression profiles of 90 yeast genes (Formula 3 factor synchronized), which have been identified as cell-cycle–regulating genes using traditional methods, such as northern blot analysis (Spellman et al., 1998Go). The expression level of each gene was measured during a period from 0 to 119 min with 7-min intervals. The smoothed normalized gene expression profiles for these 90 genes are shown in Figure 4. Although the cell cycle is a continuous process resulting from a sequence of biochemical events, it has traditionally been divided into four phases: G1, S, G2, and M. Of the 90 genes, 18 were previously known to be related to M/G1 phase regulation, 44 to G1 phase regulation, 8 to S phase regulation, 6 to S/G2 phase regulation, and 14 to G2/M phase regulation of the yeast cell cycle.


Figure 4
View larger version (28K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 4 Normalized smoothed yeast cell-cycle gene expression profiles.

 
Without employing knowledge of phase assignment, when applying our algorithm to obtain time- order characteristics for these genes, the phases were successfully recovered in the natural time order: G2/M Formula 3 M/G1 Formula 3 G1 Formula 3 S Formula 3 S/G2 Formula 3 G2/M, as illustrated by the dot plot in Figure 5. A complete list of time orders for the cell-cycle genes is in Table 3. Furthermore, we would expect that relatively small time shifts between two genes indicate that they are closely coregulated. The pairwise time-order characteristics are visualized in Figure 6. One may discern four time clusters (darker squares) corresponding to G2/M, M/G1, G1, and S–S/G2 genes.


Figure 5
View larger version (7K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 5 Dot plot of estimated time shifts Formula 3 (3.9) ordered according to cell-cycle phases. The time order is G2/M Formula 3 M/G1 Formula 3 G1 Formula 3 S Formula 3 S/G2.

 

View this table:
[in this window]
[in a new window]

 
Table 3 Estimated relative time shifts for yeast cell-cycle genes

 

Figure 6
View larger version (41K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 6 Relative time shift image plot for time-ordered yeast cell-cycle gene expression profiles. From top to bottom and from left to right, genes are time ordered by increasing estimated gene-specific time shifts Formula 3 (3.9). The plot shows estimated relative time shifts between pairs of genes as indicated by the gray scale (darker indicates smaller relative time shifts, see grayscale bar on the right). The darker squares correspond to time clusters of genes, i.e. genes whose expression profiles have similar time shifts.

 
The gene expression profiles can be aligned in their time order as shown in Figure 7. We note that no sharp cutoff occurs between genes in different phases, i.e. there is a continuum of timing of expression. Assignment of a ‘phase boundary’ gene to a certain cycle phase therefore appears subjective (see also Cooper and Shedden, 2003Go). Further studies on these ‘boundary genes’ might reveal gene-regulating mechanisms during phase transitions.


Figure 7
View larger version (47K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 7 Time-ordered normalized smoothed yeast cell-cycle gene expression profiles. From bottom to top, gene expression profiles are ordered according to increasing estimated time shifts Formula 3 (3.9). The gene expression level is gray scale coded (see gray scale bar on the right). Note the shifting patterns in peak locations.

 
Besides contributing to the issue of boundary genes, the time-order analysis points to a few genes as likely having been misclassified into underlying cycle phases by previous methods. These include G1 phase genes YGL225W, YDR113C, and YJL173C, for which the time shifts fall within the cluster of S–S/G2 phase genes, and YDL102W, which according to its time shift belongs to the cluster of M/G1 genes. In addition, two M/G1 genes, namely, YER111C and YNR044W, fall into the cluster of G1 genes. Moreover, the time shifts of genes YDR150W and YPR141C, previously classified as S/G2, are found to be considerably closer to those of G2/M and S genes, respectively. This demonstrates the utility of the inferred time order for clustering and classification.


    7. DISCUSSION AND CONCLUSIONS
 TOP
 SUMMARY
 1. INTRODUCTION
 2. PAIRWISE CURVE ALIGNMENT
 3. GLOBAL TIME SHIFT...
 4. SIMULATION STUDIES
 5. TIME ORDERING FOR...
 6. TIME ORDERING FOR...
 7. DISCUSSION AND CONCLUSIONS
 REFERENCES
 
The timing of individual gene expression offers insights into the function of the proteins they encode. A recent study on the transcriptome of the intraerythrocytic developmental cycle (IDC) of the malaria parasite Plasmodium falciparum led Bozdech et al. (2003)Go to conclude that the ‘expression profiles for developmentally regulated genes in the IDC transcriptome reveal an orderly timing of key cellular functions’ to ensure the completion of the P. falciparum IDC and ‘groups of functionally related genes share common expression profiles’. The timing of gene expression suggests potential target genes for drug discovery and new vaccine therapies as also pointed out by Bozdech et al. These findings indicate there is a broad scope for applications of the proposed time-ordering methodology.

Recognizing the importance of timing characteristics of gene expression, in this paper, we developed methods to uncover the underlying time ordering. We define and estimate a global time shift for each gene that is rooted in pairwise curve alignments and is obtained in a symmetrization step through minimization of a functional distance via least squares. The proposed approach is easy to implement. In practice, the interval of integration Formula can be determined by searching over different ranges and choosing the range that leads to a ‘best global alignment’, e.g. minimizing the sum of all pairwise distances Formula Although we sought to specifically emphasize the global temporal order of gene expression, finding time clusters for classifying sets of genes by their time dynamics might also be of interest in some studies and can be based at least in part on the proposed time ordering.

We demonstrated the methodology for Drosophila embryonic development and yeast cell-cycle gene expression profiles, as well as simulations. Feasibility of the approach was demonstrated by successful recovery of time ordering. While recovering time ordering of genes in itself is not sufficient to construct gene regulatory networks, it contains valuable information toward that goal and may serve as an informative guiding tool for biologists to further explore and discover relevant regulatory relationships within certain genetic pathways. For example, the timeline of early embyonic development of Drosophila has been well studied and the estimated time shifts of participating genes will assign them into various embyonic sub-stages, thereby informing about their respective functions.


    ACKNOWLEDGMENTS
 
We wish to thank an associate editor and two referees for helpful comments, and Ms. Karen Klein at Office of Research, Wake Forest University School of Medicine, for assistance in text editing. This research was supported in part by National Science Foundation grants DMS03-5448 and DMS05-05537. Conflict of Interest: None declared.


    REFERENCES
 TOP
 SUMMARY
 1. INTRODUCTION
 2. PAIRWISE CURVE ALIGNMENT
 3. GLOBAL TIME SHIFT...
 4. SIMULATION STUDIES
 5. TIME ORDERING FOR...
 6. TIME ORDERING FOR...
 7. DISCUSSION AND CONCLUSIONS
 REFERENCES
 

    Aach J and Church GM. (2001) Aligning gene expression time series with time warping algorithms. Bioinformatics 17:495–508.[Abstract/Free Full Text]

    Arbeitman MN, Furlong EEM, Imam F, Johnson E, Null BH, Baker BS, Krasnow MA, Scott MP, Davis RW, White KP. (2002) Gene expression during the life cycle of Drosophila melanogaster. Science 297:2270–2275.[Abstract/Free Full Text]

    Arkin A and Ross J. (1995) Statistical construction of chemical-reaction mechanisms from measured time-series. Journal of Physical Chemistry 99:970–979.[CrossRef]

    Arkin A, Shen P, Ross J. (1997) A test case of correlation metric construction of a reaction pathway from measurements. Science 277:1275–1279.[Abstract/Free Full Text]

    Bähler J. (2005) Cell cycle control of gene expression in budding and fission yeast. Annual Review of Genetics 39:69–94.[CrossRef][Web of Science][Medline]

    Bar-Joseph Z, Gerber G, Gifford DK, Jaakkola TS, Simon I. (2003a) Continuous representation of time-series gene expression data. Journal of Computational Biology 10:2–4.

    Bar-Joseph Z, Gerber G, Simon I, Gifford DK, Jaakkola TS. (2003b) Comparing the continuous representation of time-series expression profiles to identify differently expressed genes. Proceedings of the National Academy of Sciences of the United States of America 100:10146–10151.[Abstract/Free Full Text]

    Bozdech Z, Llinas M, Pulliam BL, Wong ED, Zhu J, Derisi JL. (2003) The transcriptome of the intraerythrocytic developmental cycle of Plasmodium falciparum. PLoS Biology 1:85–100.

    Brody TB. (1999) The Interactive Fly: Gene networks, development and the Internet. Trends in Genetics 15:333–334.[CrossRef][Web of Science][Medline]

    Butte AJ, Bao L, Reis BY, Watkins TW, Kohane IS. (2001) Comparing the similarity of time-series gene expression using signal processing metrics. Journal of Biomedical Informatics 34:396–405.[CrossRef][Web of Science][Medline]

    Cooper S and Shedden K. (2003) Microarray analysis of gene expression during the cell cycle. Cell & Chromosome 2:1–12.[Medline]

    Fan J and Gijbels I. (1996) Local Polynomial Modelling and its Applications(Chapman and Hall., London).

    Foe VE, Odell GM, Edgar BA. (1993) Mitosis and morphogenesis in the Drosophila embryo. In Bate M and Martinez-Arias A (Eds.). The Development of Drosophila melanogaster(Cold Spring Harbor Laboratory Press., Long Island, NY).

    Fukunaga K and Hostetler LD. (1975) Estimation of the gradient of a density-function, with applications in pattern recognition. IEEE Transactions on Information Theory 21:32–40.[CrossRef]

    Gasser T and Müller HG. (1984) Estimating regression functions and their derivatives by the kernel method. Scandinavian Journal of Statistics 11:171–185.

    Gervini D and Gasser T. (2004) Self-modelling warping functions. Journal of the Royal Statistical Society, Series B 66:959–971.[CrossRef]

    Heckman NE and Zamar RH. (2000) Comparing the shapes of regression functions. Biometrika 87:135–144.[Abstract/Free Full Text]

    Liu XL and Müller HG. (2003) Modes and clustering for time-warped gene expression profile data. Bioinformatics 19:1937–1944.[Abstract/Free Full Text]

    Liu XL and Müller HG. (2004) Functional convex averaging and synchronization for time-warped random curves. Journal of the American Statistical Association 99:687–699.

    Lodish H, Berk A, Zipursky LS, Matsudaira P, Baltimore D, Darnell J. (2000) Molecular Cell Biology 4th edition (W.H. Freeman and Company, New York).

    Pollard TD and Earnshaw WC. (2004) Cell Biology 1st edition (W.B. Saunders., Philadelphia, PA).

    Qian J, Dolled-Filhart M, Lin J, Yu H, Gerstein M. (2001) Beyond synexpression relationships: local clustering of time-shifted and inverted gene expression profiles identified new, biologically relevant interactions. Journal of Molecular Biology 314:1053–1066.[CrossRef][Web of Science][Medline]

    Silverman BW. (1995) Incorporating parametric effects into functional principal components analysis. Journal of the Royal Statistical Society, Series B 57:673–689.

    Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B. (1998) Comprehensive identification of cell cycle-regulated gene of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell 9:3273–3297.[Abstract/Free Full Text]

    Wang FM and Gasser T. (1997) Alignment of curves by dynamic time warping. Annals of Statistics 25:1251–1276.[CrossRef]

    Weigmann K, Klapper R, Strasser T, Rickert C, Technau GM, Jäckle H, Janning W, Klämbt C. (2003) FlyMove—a new way to look at development of Drosophila. Trends in Genetics 19:310–311.[CrossRef][Web of Science][Medline]

    Wong MA and Lane T. (1983) A k-th nearest neighbor clustering procedure. Journal of the Royal Statistical Society, Series B 45:362–368.

    Yao F, Müller HG, Wang JL. (2005) Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association 100:577–590.[CrossRef]

    Received August 5, 2005; revised November 28, 2005; revised February 14, 2006; accepted for publication February 16, 2006.


    Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


    This article has been cited by other articles:


    Home page
    BiometrikaHome page
    R. Tang and H.-G. Muller
    Pairwise curve synchronization for functional data
    Biometrika, December 1, 2008; 95(4): 875 - 889.
    [Abstract] [PDF]


    Home page
    Nucleic Acids ResHome page
    D. Sahoo, D. L. Dill, R. Tibshirani, and S. K. Plevritis
    Extracting binary signals from microarray time-course data
    Nucleic Acids Res., June 28, 2007; 35(11): 3705 - 3712.
    [Abstract] [Full Text] [PDF]


    This Article
    Right arrow Abstract Freely available
    Right arrow FREE Full Text (PDF) Freely available
    Right arrow Supplementary material
    Right arrow All Versions of this Article:
    7/4/569    most recent
    kxj026v1
    Right arrow Alert me when this article is cited
    Right arrow Alert me if a correction is posted
    Services
    Right arrow Email this article to a friend
    Right arrow Similar articles in this journal
    Right arrow Similar articles in PubMed
    Right arrow Alert me to new issues of the journal
    Right arrow Add to My Personal Archive
    Right arrow Download to citation manager
    Right arrowRequest Permissions
    Right arrow Disclaimer
    Google Scholar
    Right arrow Articles by Leng, X.
    Right arrow Articles by Müller, H.-G.
    Right arrow Search for Related Content
    PubMed
    Right arrow PubMed Citation
    Right arrow Articles by Leng, X.
    Right arrow Articles by Müller, H.-G.
    Social Bookmarking
     Add to CiteULike   Add to Connotea   Add to Del.icio.us  
    What's this?