Skip Navigation


Biostatistics Advance Access originally published online on March 28, 2006
Biostatistics 2007 8(1):53-71; doi:10.1093/biostatistics/kxj033
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow All Versions of this Article:
8/1/53    most recent
kxj033v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Right arrow Disclaimer
Google Scholar
Right arrow Articles by Beerenwinkel, N.
Right arrow Articles by Drton, M.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Beerenwinkel, N.
Right arrow Articles by Drton, M.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2006. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org.

A mutagenetic tree hidden Markov model for longitudinal clonal HIV sequence data

Niko Beerenwinkel*

Department of Mathematics, University of California, 1073 Evans Hall, Berkeley, CA 94720 USA niko{at}math.berkeley.edu

Mathias Drton

Department of Statistics, The University of Chicago, 5734 S. University Avenue, Chicago, IL 60637, USA

* To whom correspondence should be addressed.


    SUMMARY
 TOP
 SUMMARY
 1. INTRODUCTION
 2. LONGITUDINAL CLONAL HIV...
 3. MUTAGENETIC TREE HIDDEN...
 4. RESULTS
 5. LIMITATIONS AND CONCLUSIONS
 APPENDIX A
 REFERENCES
 
RNA viruses provide prominent examples of measurably evolving populations. In human immunodeficiency virus (HIV) infection, the development of drug resistance is of particular interest because precise predictions of the outcome of this evolutionary process are a prerequisite for the rational design of antiretroviral treatment protocols. We present a mutagenetic tree hidden Markov model for the analysis of longitudinal clonal sequence data. Using HIV mutation data from clinical trials, we estimate the order and rate of occurrence of seven amino acid changes that are associated with resistance to the reverse transcriptase inhibitor efavirenz.

Keywords: EM algorithm; Graphical model; Hidden Markov model; HIV drug resistance; Longitudinal data; Measurably evolving populations; Mutagenetic tree


    1. INTRODUCTION
 TOP
 SUMMARY
 1. INTRODUCTION
 2. LONGITUDINAL CLONAL HIV...
 3. MUTAGENETIC TREE HIDDEN...
 4. RESULTS
 5. LIMITATIONS AND CONCLUSIONS
 APPENDIX A
 REFERENCES
 
In the developed world, patients infected with human immunodeficiency virus (HIV) are treated with combinations of several antiretroviral drugs in order to maximally suppress virus replication. However, the development of drug-resistant virus variants limits the success of this intervention (Clavel and Hance, 2004Go). The genetic basis for the emergence of drug resistance is the high mutation rate of HIV, which is a result of the absence of a proof-reading mechanism for reverse transcription of RNA into DNA. The frequent mutations may alter the genetic composition of drug targets and create drug-resistant mutants that will outcompete the wild-type virus and eventually dominate the intra-host virus population. The applied drug cocktail then becomes ineffective and the patient experiences an increase in virus load.

The accumulation of resistance-conferring mutations is a stochastic process that typically follows one of several co-existing evolutionary pathways (Boucher and others, 1992Go; Molla and others, 1996Go; Shafer and Schapiro, 2005Go). Understanding the pathways to drug resistance is important for the design of effective drug combinations. For the study of these pathways, two types of DNA sequence data, based on either population or clonal sequencing, are available.

In routine clinical diagnostics, viral drug targets are sequenced after therapy failure and prior to a therapy switch. Typically, population sequencing is applied, which results in one DNA sequence that represents the mixture of viruses in the population. This approach detects only those mutations that are present in at least 20% of the strains. Furthermore, the phase information, i.e. the knowledge whether polymorphisms at different sites actually co-occur on the same genome, is lost in population sequencing. Abundant in public databases (Rhee and others, 2003Go), cross-sectional data obtained from population sequencing have been used to demonstrate correlations between the occurrences of different mutations (Gonzales and others, 2003Go; Sing and others, 2005Go). Mutagenetic tree models, a subclass of Bayesian networks based on directed trees (Beerenwinkel and others, 2005bGo), take such correlation analysis a step further by providing a tool particularly suited to infer evolutionary pathways to drug resistance from cross-sectional data (Beerenwinkel and others, 2005aGo).

A more elaborate alternative to population sequencing is clonal sequencing. In this approach, multiple viral genomes are independently amplified by polymerase chain reaction (PCR) and sequenced. This strategy has two major advantages over population sequencing. First, since mutations may occur in PCR reactions, the use of multiple independent PCR reactions reduces the impact of early and erroneous introduction of mutations at this step of the analysis. Second, in clonal sequencing, the genotype of a single strain is determined. From the resulting haplotypes the linkage between mutations is readily assessed.

While haplotypes provide exact information on the correlation between mutations, we can gain additional insight about the order in which substitutions are fixed into the population from longitudinal data obtained by sequencing viruses from the same patient at multiple time points. In this paper, we propose an extension of the mutagenetic tree model to a model for longitudinal clonal HIV sequence data. The new model captures both the time structure per patient and the clonal variation per time point, and thus allows to employ time structure in the estimation of substitution rates of resistance-conferring mutations. Knowledge of these substitution rates, which can be interpreted as waiting times, provides the basis for rational therapy planning. Indeed, using viral evolutionary information encoded in mutagenetic trees has recently been shown to significantly improve predictions of in vivo virological response to antiretroviral combination therapies (Beerenwinkel and others, 2005d).

Rather than modeling viral evolution in full generality, we intend to describe a specific phase of evolution in terms of a specific set of genetic events. More precisely, we model amino acid changes that confer drug resistance under the constant selective pressure of a given drug. The data analyzed in this paper comprise a total of 3350 clones that have been collected from 163 patients at different time points during three phase II clinical studies of the reverse transcriptase (RT) inhibitor efavirenz (Bacheler and others, 2000Go, 2001Go). We focus on seven amino acid substitutions in the HIV-1 RT that are associated with resistance to efavirenz (cf. Table 1, where each observed clone is coded as a binary vector of length seven). Working on protease sequences obtained from the same efavirenz studies, Foulkes and DeGruttola (2003)Go have modeled evolutionary pathways to drug resistance by grouping the observed clones into five clusters and estimating transition rates between clusters. In that approach, however, neither the dependence structure between single mutations nor their rates of occurrence is modeled explicitly. In addition, clonal variation is not incorporated explicitly.


View this table:
[in this window]
[in a new window]

 
Table 1 Mutation data and inferred Viterbi path for patient 22. Presence and absence of mutations are encoded as "1" and "0", respectively. At each time point (week), the first row represents the inferred population state (in italics), all other rows are unordered and correspond to observed clones

 
Under treatment with efavirenz, virus strains harboring one or more of the seven considered resistance mutations have a significant fitness advantage. As a result, these mutations accumulate in the population during the course of therapy (Boucher and others, 1992Go; Molla and others, 1996Go; Crandall and others, 1999Go). When modeling this accumulative evolutionary process, we make the following three assumptions:
(A1) Substitutions do not occur independently. There are preferred evolutionary pathways in which mutations are fixed.
(A2) The fixation of mutations into the populations is definite, i.e. substitutions are non-reversible.
(A3) At each time point, the virus population is dominated by a single strain and clones are independent and (possibly erroneous) copies of this genotype.

As mentioned above, previous studies have provided overwhelming evidence for strong dependencies between resistance mutations, as stated in Assumption (A1). In fact, the fixation of mutations typically exhibits order constraints that reflect properties of the underlying fitness landscape (Beerenwinkel and others, 2006Go). For example, if two advantageous mutations show positive epistasis, then they interact synergistically and the double mutation will be fixed more often than expected from the frequencies of the individual substitutions (Michalakis and Roze, 2004Go). In other words, the mutations are more likely to occur on the same mutational pathway. On the other hand, if advantageous mutations exhibit negative epistasis, then they interact antagonistically, resulting in mutants that are less fit than expected or even not viable at all. The mutagenetic tree model accounts for lethal mutational patterns by excluding some binary vectors from its state space, which thus encodes only viable mutant types. Mutational patterns in the state space arise from mutation accumulation along one of the pathways specified in the tree and we call these patterns "compatible" with the tree.

Assumption (A2) refers to the non-reversibility of substitutions. It is motivated by the strong selective advantage that each resistance mutation may confer. Due to this strong selective advantage, we interpret the fixation of each resistance mutation as a selective sweep, after which virtually all viable viruses exhibit the mutation. Together with Assumption (A1), this implies that resistance mutations accumulate along different pathways by successive sweeps and that both back substitutions and incompatible states result in viruses whose relative fitness is so low that they can be ignored.

The third assumption (A3) ties in with our point of view of a series of selective sweeps. Indeed, immediately after such an incident the population will consist of the new advantageous mutant type and its recent descendants, which are therefore phylogenetically independent. Under this assumption, a virus population should exhibit very low genetic diversity. In light of the phylogenetic independence, the aim of our work is different from that of work employing population genetic or phylogenetic methods (Drummond and others, 2002Go, 2003Go). Those approaches explicitly model the mutation and selection process and typically aim at inferring global population parameters, such as the global rate of adaption (Williamson, 2003Go) or the effective population size (Seo and others, 2002Go). By contrast, we model the joint probability distribution resulting from this evolutionary process directly, with the goal of inferring individual substitution rates, as the presence of specific mutational patterns determines the therapeutic options of each patient.

The three assumptions discussed above lead us to a hidden Markov model (HMM) whose state space consists of the compatible states of a given mutagenetic tree. We consider unobservable population states and assume a simple error process involving only a false-positive and a false-negative rate that generates clones from this hidden state. The fixation of each mutation is modeled by an independent Poisson process. Thus, at each time point, we assume the virus population to be dominated by a single mutational pattern and all clonal variation of this unobserved state is modeled as resulting from erroneous copying of this strain.

The remaining part of this paper is organized as follows: In Section 2, we present the data on the seven considered mutations associated with resistance to efavirenz and perform simple permutation tests to assess the validity of Assumptions (A2) and (A3). In Section 3, we develop the above outlined mutagenetic tree hidden Markov model (mtree-HMM). We present the results from fitting the mtree-HMM to the efavirenz data in Section 4 and conclude in Section 5, where we discuss some of the limitations of our approach.


    2. LONGITUDINAL CLONAL HIV SEQUENCE DATA AND MODEL ASSUMPTION
 TOP
 SUMMARY
 1. INTRODUCTION
 2. LONGITUDINAL CLONAL HIV...
 3. MUTAGENETIC TREE HIDDEN...
 4. RESULTS
 5. LIMITATIONS AND CONCLUSIONS
 APPENDIX A
 REFERENCES
 

2.1 HIV data

We subsequently analyze a set of longitudinal clonal HIV sequences that have been collected during three clinical studies of efavirenz combination therapy (DMP 266-003, -004, -005). This data set is publicly available at the Stanford HIV drug resistance database (Rhee and others, 2003Go). Selection of samples, RNA amplification, cloning, and sequencing have been described in detail by Bacheler and others (2000Go, 2001)Go. All patients received the non-nucleoside RT inhibitor efavirenz.

Bacheler and others (2000)Go identified a list of amino acid changes in the HIV RT associated with resistance to efavirenz. In particular, they described two alternative pathways to efavirenz resistance, one initiated by mutation K103N (103-pathway) and the other by Y188L (188-pathway). In the present data set, for each patient, at most one of these two pathways occurs. We focus here on the 103-pathway because mutations associated with the 188-pathway were found in only 7 out of 170 patients, which we excluded from further consideration. A total of 3350 clones have been derived from the remaining 163 patients at between 1 and 11 different time points (median = 3, inter-quartile range 1–5). From the list of efavirenz resistance mutations associated with the 103-pathway, we selected those that occur in at least 2% of clones. This strategy resulted in the seven mutations 103N (frequency 48.1%), 225H (8.0%), 108I (4.9%), 101Q (2.7%), 190S (2.7%), 101E (2.5%), and 100I (2.3%).

As an example for mutational patterns found in the data set, Table 1 shows the clones observed in patient 22, which is also highlighted in Bacheler and others (2000, Table 2) (Viterbi paths such as the one given in Table 1 are discussed in Section 4). This patient is atypical in that before the onset of therapy (week 0), two clones had mutation 103N. In fact, among the 1144 baseline clones only very few already featured the mutations considered here, i.e. mutations 103N (frequency 1.9%), 190S (1.3%), 101Q (0.7%), 101E (0.35%), and 108I (0.08%). Mutations 100I and 225H were not present in any baseline clone. Only six clones (0.52%) harbored a double mutation, namely 103N+190S.

According to Table 1, the first selective sweep, introducing mutation 103N in the virus population of patient 22, has occurred by week 48. By week 70, mutation 225H shows definite signs of manifestation. This behavior is in support of the Assumptions (A2) and (A3) put forward in Section 1. For a quantitative analysis of these assumptions, we perform randomization tests.

2.2 Statistical tests

Let N = 163 be the number of patients in our data set, and let Kij be the number of clones observed in patient i isin [N] := {1,...,N} at the jth time point tij. Moreover, let yijkm isin {0, 1} be the indicator of the presence of mutation m isin [M]: ={1,...,7} in clone k isin [Kij] derived from patient i isin [N] at time point tij. We use two test statistics for evaluating the Assumptions (A2) and (A3), respectively. In both cases, we treat different patients as independent observations.

The non-reversibility of substitutions (A2) is tested by tracing the change in mutant allele frequency for each patient over time. This frequency is

Formula

For each mutation m, our test statistic Am counts how often its frequency decreases from one time point to the next. Thus,

Formula

where I denotes the indicator function. We estimate the distribution of Am under the null hypothesis of equal probability of increase and decrease of allele frequencies by randomizing the temporal order of the clone populations. Under non-reversibility of substitutions, the observed value of the test statistic Am should be small compared to the randomization-based values of Am.

In our data, mutation losses occur in only 1.5–9.4% of the cases (Figure 1). In other words, for all seven considered mutations, the frequency of the mutant increases or stays constant in over 90% of pairs of subsequent virus populations. For most mutations, this value was significantly smaller than expected (p100I = 0.011, p101E = 0.013, p103N < 0.001, p108I < 0.001, p225H < 0.001) with the exceptions of mutations 101Q (p101Q = 0.056) and 190S (p190S = 0.465). We conclude that Assumption (A2) of non-reversible substitutions is reasonable for the majority of the data.


Figure 1
View larger version (11K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1 Non-reversibility of substitutions. For each mutation, the logarithm of the number of times for which its frequency in the population strictly decreases between successive time points is displayed (bold discs). The distribution of this test statistic was estimated by shuffling 1000 times the order of time points (box plots).

 
For assessing the validity of Assumption (A3), we measure the genetic diversity among mutational patterns by the Hamming distance. If c and c' are two clones, their Hamming distance is defined as

Formula

The diversity of a population of clones c1,...,cK is the expected Hamming distance between two clones drawn at random from that population,

Formula

Our test statistic is the expected population diversity

Formula

We consider the null hypothesis of observing the full diversity present in all clones from one patient at each single time point. The distribution of B under this null is estimated by randomly permuting the assignment of clones to time points. Observed values at the left tail of this distribution indicate less genetic diversity per time point than expected from the total diversity present in the clones. Large population diversities would put into question our assumption of a single population state.

For our data, grouping clones according to their sampling time reduces the genetic diversity more than expected from a random grouping. We find that on average clones from the same population differ in only 1 out of 1900 of the considered mutations (Figure 2). For random clone groupings, the expected value is 1 in 300. This difference is highly significant (p < 0.001). Thus, most genetic diversity occurs between time points rather than within time points, and most of the observed populations indeed look like having gone through a selective sweep. We therefore conclude that our assumptions described in Section 1 and formalized in Section 3, which is described next, are in reasonable agreement with the clonal HIV mutation data considered here.


Figure 2
View larger version (11K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 2 Genetic diversity. The logarithm of the average Hamming distance between clones from the same time point is displayed (bold disc). The distribution of this test statistic was estimated by randomizing 1000 times the assignment of clones to time points (histogram).

 

    3. MUTAGENETIC TREE HIDDEN MARKOV MODEL
 TOP
 SUMMARY
 1. INTRODUCTION
 2. LONGITUDINAL CLONAL HIV...
 3. MUTAGENETIC TREE HIDDEN...
 4. RESULTS
 5. LIMITATIONS AND CONCLUSIONS
 APPENDIX A
 REFERENCES
 
We begin this section by recalling the basic definition and properties of the mutagenetic tree model for cross-sectional data. We first extend this basic model to a mutagenetic tree Markov chain model for multiple observations per patient in time. Second, we introduce the mtree-HMM), which assumes the evolutionary state of the virus population to be unobservable. An expectation maximization (EM) algorithm for parameter estimation is presented, the details of which are given in the Appendix.

3.1 Mutagenetic tree model

Consider M genetic events of interest, which we also call "mutations." We treat these events as random and use binary random variables to indicate their occurrence. For each mutation m isin [M] := {1,...,M}, let Xm be a binary random variable taking values in {0, 1} such that {Xm = 1} and {Xm = 0} indicate the occurrence and non-occurrence of mutation m, respectively. In the HIV application, [M] represents the set of seven amino acid changes in the HIV RT described in Section 2 that confer resistance to efavirenz, and Formula indicates the fixation of mutation m into the virus population. For convenience, we additionally introduce a random variable Formula with Formula. In this setup, the mutational pattern representing the evolutionary state of the virus population corresponds to the binary vector Formula that takes values in the state space Formula.

A mutagenetic tree T on M events is a connected branching on the set of vertices Formula, rooted at node 0 (Beerenwinkel and others, 2005bGo); in the cancer literature, such a tree has been termed oncogenetic (Desper and others, 1999Go; von Heydebreck and others, 2004Go). In particular, this tree is an acyclic directed graph and thus induces a directed graphical model for the joint distribution of the random vector X (Lauritzen, 1996Go). The joint distributions for X comprised in this graphical model factor as

Formula

where Formula, Formula, and pa(m) = paT(m) denotes the parent of m in T. The mutagenetic tree model induced by the tree T is a submodel of the usual graphical model obtained by restricting the conditional probability matrices to the form

Formula

where Formula. The mutagenetic tree model imposes the same conditional independence structure as the graphical model but in addition imposes the constraint that an event can occur only if all of its ancestor events have already occurred (Beerenwinkel and Drton, 2005Go, Theorem 14.6). This fact follows from the first row of the transition matrix. As a consequence, mutations can only occur in specific pathways. More precisely, each mutational pattern corresponds to a subtree of the mutagenetic tree and the pathways correspond to sequences of subtrees related by inclusion and increasing by exactly one vertex (mutation). The extreme cases are the star and the path. In the star, each vertex has the root vertex as parent and all mutational pathways are possible. In the path, the vertices form a chain starting with the root vertex. In this case, there is only one pathway.

A mutational pattern Formula is called compatible with the tree T, if x is a state that may occur with non-zero probability in the mutagenetic tree model induced by T. Hence, x is compatible with T, if there exist parameters Formula such that

Formula

The set Formula of states that are compatible with the tree T forms a finite distributive lattice Formula, where Formula and Formula denote the component-wise maximum and minimum operator, respectively (Beerenwinkel and Drton, 2005Go, Lemma 14.3), cf. Figure 3(a) and (b). The so-called Hasse diagram in Figure 3(b) illustrates the above-mentioned pathways in that any such pathway corresponds to a path from the (wild-type) state Formula to the state Formula with all mutations present.


Figure 3
View larger version (7K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 3 (a) Mutagenetic tree on five vertices, and (b) its induced lattice of compatible states.

 
In a timed mutagenetic tree model, mutations occur according to independent Poisson processes. If Formula denotes the rate of this process on the edge Formula, then the probability that mutation m occurs during a time period of length Formula is

Formula

Mutagenetic trees can be reconstructed from cross-sectional data by solving the maximum weight branching problem in the complete graph on the set of vertices V with an efficient combinatorial algorithm (Desper and others, 1999Go). This algorithm is implemented, for example, in the "Mtreemix" software package (Beerenwinkel and others, 2005cGo). Here, we use the algorithm for obtaining the topology (but not the parameters) of the tree. For this purpose, the longitudinal data were treated as cross-sectional.

3.2 Markov chain model

Assume now that we can observe the viral mutational pattern within one patient at more than one time point. More precisely, let Formula be a binary random variable indicating the occurrence of mutation m at time point Formula, Formula, in the patient's virus population. We assume that the evolutionary process starts at time 0 in the wild-type state, i.e. without any mutation, which is the case for the majority of our data (cf. Section 2.1). Thus, the initial population state distribution follows the timed mutagenetic tree model. In particular, Formula for all Formula with Formula.

For the temporal evolution of the mutational patterns Formula, Formula, we make the assumption that these vectors form a Markov chain observed at fixed time points Formula. Furthermore, we assume that a mutation occurs at the time it is observed. In the HIV application, observations can only be made if the virus load is above a detectable limit resulting in censored observations. However, prior work of Foulkes and DeGruttola (2003)Go suggests that the impact of the censoring is minor (cf. Section 5).

The development of mutation m at a time point Formula with Formula which is encoded in the random variable Formula depends on past mutational events as well as current ancestral mutational events only through two indicators. The first one is Formula and indicates whether the mutation was already present at the immediately preceding time point Formula. The second one is Formula and indicates whether the parent mutation is present at time point Formula. These dependencies arise from modeling the presence of mutation m at time Formula as resulting either from its introduction along the edge Formula at time point Formula or from its earlier non-reversible introduction into the population and hence its presence at time point Formula. The dependency structure among the variables Formula can be encoded in an acyclic directed graph. For example, the subgraph of the graph shown in Figure 4 induced by the unshaded vertices represents the mutagenetic tree Markov chain based on the tree in Figure 3(a).


Figure 4
View larger version (16K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 4 Acyclic directed graph representing an mtree-HMM based on the tree from Figure 3(a) with Formula time points, two clones at time point Formula, one clone at Formula, and three clones at Formula. Clear vertices correspond to hidden random variables, shaded vertices to observable variables.

 
The dynamics of this Markov chain model are encoded by the transition matrices

Formula (3.1)

whose rows are indexed by pairs Formula in Formula. The asterisks indicate entries that need not be specified because no event m can occur before its parent event Formula. With these matrices, we define the mutagenetic tree Markov chain model as the family of joint distributions of the form

Formula

where Formula and Formula. It follows that

Formula

3.3 Mutagenetic tree hidden Markov model

We now turn to the case of observed clonal samples, rather than population states, at different time points. We model these data by assuming that the clones are erroneous copies of a hidden mutational pattern that evolves according to a mutagenetic tree Markov chain. Hence, Formula is now a hidden binary random variable. The observed data are instances of binary random variables Formula, Formula, that indicate whether mutation m is present in the kth clone sampled from the virus population at time point Formula. Clones are conditionally independent given the hidden states Formula. The resulting graphical model is referred to as the mutagenetic tree hidden Markov model (mtree-HMM). Its graph is shown in Figure 4.

Let Formula and Formula be parameter vectors that contain the mutation-specific probabilities of observing a false positive and a false negative, respectively. False positives (negatives) are mutations (wild-type residues) observed in clones derived from a virus population that is in the wild-type (mutant) state at that time point. The false-positive and false-negative rates summarize differences from the population state that can arise from mutations in the PCR reactions, or from erroneous viral copying of the dominating strain. Thus, these parameters quantify the expected genetic diversity of the virus population. Conditionally upon the hidden state Formula, the probabilities of observing mutation m in clone k at time point Formula are

Formula

The entries of this matrix are the conditional probabilities

Formula

The different clones Formula, Formula, are thus modeled as independent and identically distributed. Let

Formula

denote all clonal sequence observations. The mtree-HMM is the family of joint distributions of Y given by

Formula

where we have summed over the hidden states. The graphical model structure of the mtree-HMM is illustrated in Figure 4.

We remark that on an abstract level, mtree-HMMs are related to phylogenetic hidden Markov models (phylo-HMMs) (McAuliffe and others, 2004Go; Siepel and Haussler, 2004Go). Phylo-HMMs are probabilistic models of sequence evolution that combine phylogenetic trees at different sites of aligned genomes with an HMM that allows for dependence across sites. Thus, in phylo-HMMs the tree models arise through dependence in time, and the HMM represents a dependence in space (along the genome). While mtree-HMMs are similarly structured in that tree-based models are combined with HMMs, the roles of space and time are reversed. The mutagenetic trees arise from dependence among genome sites, and the HMM introduces dependence across time.

3.4 Parameter estimation

Consider now clonal sequence data for a set of patients indexed by Formula. For each patient Formula, we have observations at time points Formula. We denote by Formula the indicator of occurrence of mutation m in the viral population state of patient i at time point Formula. The random variable Formula indicates the occurrence of mutation m in clone Formula of patient i at time point Formula. We denote the transition matrices (3.1) by Formula. For example,

Formula

We assume patients to be independent and each patient to follow the mtree-HMM based on a fixed tree T. Then the resulting model for the observations

Formula

made at time points Formula, has the likelihood function

Formula

where we set Formula and Formula for all Formula and Formula. The outer sums make the likelihood function hard to maximize. In order to solve this optimization problem and to obtain maximum likelihood estimates, we apply an EM algorithm.

Let Formula be values of the hidden variables Formula that are compatible with the unobserved mutagenetic tree Markov chain model. Then it must hold that Formula whenever Formula and Formula whenever Formula. Let I be the indicator function and set

Formula

for Formula. The log-likelihood function Formula of the hidden model is the sum over all Formula, Formula, and Formula of the expressions

Formula

Maximization of Formula is feasible since it does not involve the sums that are present in Formula.

The EM algorithm iteratively alternates between its E-step, in which the expectations of the indicators Formula and Formula are computed for fixed parameter estimates, and its M-step, in which these expectations are used to maximize the expectation of the log-likelihood function Formula and to obtain new parameter estimates. Details of the EM algorithm, including the choice of starting solutions and the bootstrap procedure for confidence intervals, are given in the Appendix.


    4. RESULTS
 TOP
 SUMMARY
 1. INTRODUCTION
 2. LONGITUDINAL CLONAL HIV...
 3. MUTAGENETIC TREE HIDDEN...
 4. RESULTS
 5. LIMITATIONS AND CONCLUSIONS
 APPENDIX A
 REFERENCES
 

4.1 Tree reconstruction

Employing the cross-sectional tree reconstruction method yielded the mutagenetic tree on eight vertices displayed in Figure 5 with parameters Formula shown in Figure 6(a). In this mutagenetic tree model, predominantly mutation 103N initiates the development of efavirenz resistance and is followed by 100I, 101Q, 108I, or 225H. A parallel but strongly retarded pathway involves mutation 190S followed by 101E. Based on this cross-sectional analysis, the expected waiting time for mutation 103N to occur is estimated to be Formula weeks of efavirenz combination therapy. By contrast, mutation 190S is estimated to occur after 113 weeks.


Figure 5
View larger version (7K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 5 Mutagenetic tree model for the development of resistance to efavirenz in the HIV-1 RT. Vertices are labeled with amino acid substitutions.

 

Figure 6
View larger version (13K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 6 Bootstrap estimates of the parameters of the mtree-HMM for the development of efavirenz resistance based on the tree in Figure 5. (a) Progression rates Formula of mutations, reported on a log scale. Diamonds indicate the cross-sectional estimate. (b) False-positive rates Formula. (c) False-negative rates Formula.

 
The mutagenetic tree model has 51 compatible states. The transition matrix of the corresponding HMM has 588 non-zero entries out of Formula (cf. Appendix).

4.2 Inferred population states

Using the fixed tree topology, we applied the EM algorithm to the longitudinal clonal data. The Viterbi algorithm was used to obtain maximum a posteriori estimates of the hidden variables, i.e. of the viral population states, resulting in one Viterbi path for each patient. Given the model parameters, the Viterbi path is the most probable sequence of states to explain both the progression in time of the virus population along the lattice of compatible states and the observed clones at the respective time points. For example, the Viterbi path for patient 22 is illustrated in Table 1. We estimated mutation 103N to enter the virus population of this patient at week 48 and mutation 225H to be introduced after 70 weeks of therapy. Mutation 103N had already been present in 2 out of 16 clones before onset of therapy (week 0), but was most likely not established in the population at this time, because viral replication was suppressed for 48 more weeks (Bacheler and others, 2000, Figure 1). Likewise, mutation 225H appears first at week 59 in one out of seven clones, 11 weeks before its estimated fixation. Conversely, we estimated the introduction of this mutation at week 70 despite the fact that four out of eight clones did not yet harbor it at this time.

4.3 Rates of progression

The parameters Formula are conditional rates of fixation of mutations in the population, associated with the edges of the mutagenetic tree. Estimation of these rates from cross-sectional data depends on the assumption of independent observations and of an exponentially distributed sampling time ((A.1) in the Appendix), both of which are not met with the present data set (Formula, Kolmogorov–Smirnov test). Estimates of Formula in the mtree-HMM are shown in Figure 6(a).

Mutation 103N is by far the earliest mutation to occur with an expected waiting time of 19 weeks, as compared to 190S with 478 weeks. Subsequent to 103N, mutation 225H is most likely to occur next, followed by 108I, 101Q, and 100I. Mutation 101E appears shortly after 190S. These findings are in accordance with in vitro measurements of efavirenz resistance (Bacheler and others, 2001). Engineered single amino acid substitutions result in 36- and 97-fold reduced efavirenz susceptibility for 103N and 190S, respectively, as compared to the wild type. By contrast, all other mutations cause significantly less efavirenz resistance on their own (1.2- to 24-fold, median 5.6). Remarkably however, all the double mutants 103N+100I, 103N+101Q, 103N+108I, 103N+225H as well as the triple mutant 103N+108I+225H were measured highly resistant to efavirenz (84- to 2400-fold). Thus, phenotypic resistance increases along the mutagenetic tree, and the ordered accumulation appears to result from a specific phenotypic gradient.

Comparing the cross-sectional estimates of Formula to their longitudinal counterparts obtained from bootstrapping reveals consistent overestimation of progression rates by the cross-sectional approach (Figure 6(a)). This effect may be due to the high number of clones that have been sequenced prior to therapy start at week 0, which leads to an estimated sampling time of only 22 weeks in the cross-sectional approach.

4.4 Clonal variation

The variation of clones is modeled by mutation-specific parameters Formula and Formula that denote the probability of a false-positive and a false-negative mutation, respectively. Figures 6(b) and (c) show the bootstrap estimates of these parameters. False positives occur when a mutation is not yet fixed into the population, but is already present on single clones. We found very low false-positive rates (Formula) for all mutations.

We obtained higher estimates for Formula, the rates of false negatives. False negatives occur when a mutation is estimated to be fixed into the virus population, but does not appear in subsequent clones. This happens more frequently because the model assumes non-reversibility of mutations in the population. The false-negative rate of 103N was by far the lowest, presumably due to its strong selective advantage both as a single mutation and in combination with others. Mutations 190S and 101E showed high false-negative rates but these estimates were also subject to a considerable variance. In the case of 190S and also 101Q, the high false-negative rate is not as surprising given that the randomization tests in Section 2.2 indicated that the assumption of non-reversibility of substitution is less warranted for these two mutations.


    5. LIMITATIONS AND CONCLUSIONS
 TOP
 SUMMARY
 1. INTRODUCTION
 2. LONGITUDINAL CLONAL HIV...
 3. MUTAGENETIC TREE HIDDEN...
 4. RESULTS
 5. LIMITATIONS AND CONCLUSIONS
 APPENDIX A
 REFERENCES
 
RNA viruses provide prominent examples of measurably evolving populations, and precise predictions of the outcome of this evolutionary process are a prerequisite for the rational design of antiretroviral treatment protocols. Insight in the evolutionary process can be obtained from longitudinal genomic data. For analysis of such data, we have presented a mtree-HMM. We applied this model in an analysis of longitudinal clonal sequence data on the evolution of drug resistance in HIV. Our focus was on estimating the order and rate of occurrence of seven single amino acid changes that are associated with resistance to the drug efavirenz.

Our model of clonal variation assumes that the clones are possibly erroneous copies of a fixed but unobserved population state. Thus, we regard the virus population as consisting of a single dominating strain and a cloud of closely related mutants. While this assumption is justified by the strong selective pressure exerted by antiretroviral therapy, it does not allow for explicit modeling of different lineages following different evolutionary pathways. Such parallel developments were very rare in the present data set. For example, at week 59 in patient 126, three clones each displayed the mutational patterns 101E + 225H and 103N + 225H, respectively. The estimated population state was the quadruple mutant 101E+103N+190S+225H, but none of the observed clones actually displayed this pattern. In fact, these parallel lineages contribute to the elevated false-negative rates of mutations 101E and 190S. Possible ways of extending the mtree-HMM to account for this effect include modeling the population state as more than one strain, and grouping clones into different pathways.

Evolution in our model is assumed to result from mutation and selection. Incorporating recombination would alter our setup considerably. However, the mutations we considered lie no more than 125 codons, or 375 nucleotides, apart. Given the HIV genome of length 10 000 base pairs, recombination is unlikely to occur in this small region, but cannot be ruled out entirely.

Finally, we emphasize that throughout we have ignored the interval censoring inherent in the data and assumed that mutations occur at the time they are observed. However, ignoring the interval censoring is unlikely to have a strong effect on the parameter estimates because of the small interval lengths. In a related analysis of data with the same time interval structure, Foulkes and DeGruttola (2003)Go have found no qualitative and little quantitative difference when using the midpoint of time intervals as the time at which mutations occur. Nevertheless, modeling interval censoring will become more important for larger time intervals, for example due to longer periods of suppressed virus replication below detectable limits.


    APPENDIX A
 TOP
 SUMMARY
 1. INTRODUCTION
 2. LONGITUDINAL CLONAL HIV...
 3. MUTAGENETIC TREE HIDDEN...
 4. RESULTS
 5. LIMITATIONS AND CONCLUSIONS
 APPENDIX A
 REFERENCES
 

EM algorithm

We provide here a detailed description of the EM algorithm from Section 3.4 that is used to estimate the parameters of the mtree-HMM.

A.1 M-step

The log-likelihood function Formula of the hidden model decomposes into a sum involving only Formula and another sum involving only Formula and Formula. Let

Formula

be the conditional expectation of Formula given the clonal observations Y. For maximum likelihood estimation of Formula, we have to maximize the function

Formula

This task is complicated by the Formula terms, but can be accomplished numerically by simple general optimization methods, such as the simplex method of Nelder and Mead (1965)Go.

For estimating Formula and Formula, let

Formula

be the conditional expectation of the corresponding sum of Formula given the clonal observations Y. Then the maximum likelihood estimates are

Formula

A.2 E-step

The mtree-HMM is a submodel of a standard HMM with hidden state space equal to the set Formula of states compatible with the mutagenetic tree T. Therefore, we can compute the conditional expectations Formula and Formula by applying, separately for each patient, the forward–backward equations of the standard HMM (Durbin and others, 1998), (Hallgrímsdóttir and others, 2005). In the standard HMM, only certain transitions between compatible states can occur with non-zero probability, namely those that respect the partial order "Formula" of the lattice induced by T. The probability Formula of transitioning from state Formula to state Formula is given by

Formula

where Formula and Formula is a random vector distributed according to the timed mutagenetic tree model based on the tree T and the time interval Formula. The number of compatible states depends on the topology of T but is bounded by Formula. Thus, the number of non-zero transition probabilities also depends on the topology of T. For example, the transition matrix arising from the tree T shown in Figure 3(a) is of the form

Formula

where asterisks indicate entries that are not restricted to zero.

The remaining structure of the standard HMM is determined by the probability of emitting clones Formula from a hidden state Formula, which is equal to

Formula

Applying the forward–backward equations to this HMM yields, for each patient i and time point Formula, the conditional expectation of the number of transitions Formula from state Formula to state Formula and the conditional expectation of the number of emissions Formula of clone Formula from state Formula. From these conditional expectations, we can obtain the quantities

Formula

that are needed in the M-step of the EM algorithm for the mtree-HMM.

If several vertices of the mutagenetic tree share the root vertex 0 as parent, additional computational saving is possible by employing that variables in different subtrees rooted at 0 are mutually independent. In this case, several standard HMMs with smaller state spaces can replace the one standard HMM.

A.3 Starting solutions

As initial parameter values in the mtree-HMM, we used Formula for all error probabilities Formula and Formula. The initial rates Formula were derived from the cross-sectional approach by assuming an exponential distribution with rate Formula for the sampling times. Under this assumption, Formula can be computed from

Formula (A.1)

where both Formula and Formula were estimated directly from the cross-sectional data (Rahnenführer and others, 2005).

A.4 Confidence intervals

We obtained confidence intervals for all model parameters Formula, Formula, and Formula from 100 bootstrap runs of the EM algorithm. Resampling of the data was performed in two steps. First, a set of 163 patients was drawn at random. Second, for each patient and each time point, the corresponding set of clones was used for resampling the respective number of clones. Thus, the number of patients, the structure of sampling times, and the number of clones equaled those of the original data set in all bootstrap runs.


    ACKNOWLEDGMENTS
 
Niko Beerenwinkel is supported by Deutsche Forschungsgemeinschaft (DFG) under grant No. BE 3217/1-1. Mathias Drton acknowledges support from National Institutes of Health grant R01-HG02362-03 and National Science Foundation grant DMS-0505612. Conflict of Interest: None declared.


    REFERENCES
 TOP
 SUMMARY
 1. INTRODUCTION
 2. LONGITUDINAL CLONAL HIV...
 3. MUTAGENETIC TREE HIDDEN...
 4. RESULTS
 5. LIMITATIONS AND CONCLUSIONS
 APPENDIX A
 REFERENCES
 

    Bacheler L, Jeffrey S, Hanna G, D'Aquila R, Wallace L, Logue K, Cordova B, Hertogs K, Larder B, Buckery R. and others. (2001) Genotypic correlates of phenotypic resistance to efavirenz in virus isolates from patients failing nonnucleoside reverse transcriptase inhibitor therapy. Journal of Virology 75:4999–5008 http://dx.doi.org/10.1128/JVI.75.11.4999-5008.2001.[Abstract/Free Full Text]

    Bacheler LT, Anton ED, Kudish P, Baker D, Bunville J, Krakowski K, Bolling L, Aujay M, Wang XV, Ellis D. and others. (2000) Human immunodeficiency virus type 1 mutations selected in patients failing efavirenz combination therapy. Antimicrobial Agents and Chemotherapy 44:2475–84.[Abstract/Free Full Text]

    Beerenwinkel N, Däumer M, Sing T, Rahnenführer J, Lengauer T, Selbig J, Hoffmann D, Kaiser R. (2005a) Estimating HIV evolutionary pathways and the genetic barrier to drug resistance. Journal of Infectious Diseases 191:1953–60 http://dx.doi.org/10.1086/430005.[CrossRef][Web of Science][Medline]

    Beerenwinkel N and Drton M. (2005) Mutagenetic tree models. In Pachter L and Sturmfels B (Eds.). Algebraic Statistics for Computational Biology(Cambridge University Press, Cambridge, UK) pp. 278–90.

    Beerenwinkel N, Eriksson N, Sturmfels B. (2006) Evolution on distributive lattices. Journal of Theoretical Biology in press.

    Beerenwinkel N, Rahnenführer J, Däumer M, Hoffmann D, Kaiser R, Selbig J, Lengauer T. (2005b) Learning multiple evolutionary pathways from cross-sectional data. Journal of Computational Biology 12:584–98 http://dx.doi.org/10.1089/cmb.2005.12.584.[CrossRef][Web of Science][Medline]

    Beerenwinkel N, Rahnenführer J, Kaiser R, Hoffmann D, Selbig J, Lengauer T. (2005c) Mtreemix: a software package for learning and using mixture models of mutagenetic trees. Bioinformatics 21:2106–7 http://dx.doi.org/10.1093/bioinformatics/bti274.[Abstract/Free Full Text]

    Beerenwinkel N, Sing T, Lengauer T, Rahnenführer J, Roomp K, Savenkov I, Fischer R, Hoffmann D, Selbig J, Korn K. and others. (2005d) Computational methods for the design of effective therapies against drug resistant HIV strains. Bioinformatics 21:3943–50 http://dx.doi.org/10.1093/bioinformatics/bti654.[Abstract/Free Full Text]

    Boucher CA, O'Sullivan E, Mulder JW, Ramautarsing C, Kellam P, Darby G, Lange JM, Goudsmit J, Larder BA. (1992) Ordered appearance of zidovudine resistance mutations during treatment of 18 human immunodeficiency virus-positive subjects. Journal of Infectious Diseases 165:105–10.[Web of Science][Medline]

    Clavel F and Hance AJ. (2004) HIV drug resistance. New England Journal of Medicine 350:1023–35 http://dx.doi.org/10.1056/NEJM2ra025195.[Free Full Text]

    Crandall KA, Kelsey CR, Imamichi H, Lane HC, Salzman NP. (1999) Parallel evolution of drug resistance in HIV: failure of nonsynonymous/synonymous substitution rate ratio to detect selection. Molecular Biology and Evolution 16:372–82.[Abstract]

    Desper R, Jiang F, Kallioniemi OP, Moch H, Papadimitriou CH, Schaffer AA. (1999) Inferring tree models for oncogenesis from comparative genome hybridization data. Journal of Computational Biology 6:37–51.[Web of Science][Medline]

    Drummond AJ, Nicholls GK, Rodrigo AG, Solomon W. (2002) Estimating mutation parameters, population history and genealogy simultaneously from temporally spaced sequence data. Genetics 161:1307–20.[Abstract/Free Full Text]

    Drummond AJ, Pybus OG, Rambaut A, Forsberg R, Rodrigo AG. (2003) Measurably evolving populations. Trends in Ecology & Evolution 18:481–8.[CrossRef]

    Durbin R, Eddy S, Krogh A, Mitchison G. (1998) Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids(Cambridge University Press, Cambridge, UK).

    Foulkes A and DeGruttola V. (2003) Characterizing the progression of viral mutations over time. Journal of the American Statistical Association 98:859–67.[CrossRef]

    Gonzales MJ, Wu TD, Taylor J, Belitskaya I, Kantor R, Israelski D, Chou S, Zolopa AR, Fessel WJ, Shafer RW. (2003) Extended spectrum of HIV-1 reverse transcriptase mutations in patients receiving multiple nucleoside analog inhibitors. AIDS 17:791–9 http://dx.doi.org/10.1097/01.aids.0000050860.71999.23.[CrossRef][Web of Science][Medline]

    Hallgrímsdóttir I, Milowski R, Yu J. (2005) The EM algorithm for hidden Markov models. In Pachter L and Sturmfels B (Eds.). Algebraic Statistics for Computational Biology(Cambridge University Press, Cambridge, UK) pp. 248–61.

    Lauritzen S. (1996) Graphical Models(Clarendon Press, Oxford, UK).

    McAuliffe JD, Pachter L, Jordan MI. (2004) Multiple-sequence functional annotation and the generalized hidden Markov phylogeny. Bioinformatics 20:1850–60 http://dx.doi.org/10.1093/bioinformatics/bth153.[Abstract/Free Full Text]

    Michalakis Y and Roze D. (2004) Evolution. Epistasis in RNA viruses. Science 306:1492–3 http://dx.doi.org/10.1126/science.1106677.[Abstract/Free Full Text]

    Molla A, Korneyeva M, Gao Q, Vasavanonda S, Schipper PJ, Mo HM, Markowitz M, Chernyavskiy T, Niu P, Lyons N. and others. (1996) Ordered accumulation of mutations in HIV protease confers resistance to ritonavir. Nature Medicine 2:760–6.[CrossRef][Web of Science][Medline]

    Nelder J and Mead R. (1965) A simplex method for function minimization. Computer Journal 7:308–13.

    Rahnenführer J, Beerenwinkel N, Schulz WA, Hartmann C, von Deimling A, Wullich B, Lengauer T. (2005) Estimating cancer survival and clinical outcome based on genetic tumor progression scores. Bioinformatics 21:2438–46 http://dx.doi.org/10.1093/bioinformatics/bti312.[Abstract/Free Full Text]

    Rhee S-Y, Gonzales MJ, Kantor R, Betts BJ, Ravela J, Shafer RW. (2003) Human immunodeficiency virus reverse transcriptase and protease sequence database. Nucleic Acids Reserch 31:298–303.

    Seo T-K, Thorne JL, Hasegawa M, Kishino H. (2002) Estimation of effective population size of HIV-1 within a host: a pseudomaximum-likelihood approach. Genetics 160:1283–93.[Abstract/Free Full Text]

    Shafer RW and Schapiro JM. (2005) Drug resistance and antiretroviral drug development. Journal of Antimicrobial Chemotherapy 55:817–20 http://dx.doi.org/10.1093/jac/dki127.[Abstract/Free Full Text]

    Siepel A and Haussler D. (2004) Combining phylogenetic and hidden Markov models in biosequence analysis. Journal of Computational Biology 11:413–28 http://dx.doi.org/10.1089/1066527041410472.[CrossRef][Web of Science][Medline]

    Sing T, Svicher V, Beerenwinkel N, Ceccherini-Silberstein F, Kaiser MDR, Walter H, Korn K, Hoffmann D, Oette M, Rockstroh JK. and others. (2005) Characterization of novel HIV drug resistance mutations using clustering, multidimensional scaling, and SVM-based feature ranking. In Jorge A, Torgo L, Brazdil P, Camacho R, Gama J (Eds.). Knowledge Discovery in Databases: PKDD 2005, 9th European Conference on Principles and Practice of Knowledge Discovery in Databases, Porto, Portugal, October 3–7, 2005, Proceedings. Lecture Notes in Computer Science 3721, Springer 2005 pp. 285–286.

    von Heydebreck A, Gunawan B, Füzesi L. (2004) Maximum likelihood estimation of oncogenetic tree models. Biostatistics 5:545–56 http://biostatistics.oupjournals.org/cgi/content/abstract/5/4/545?etoc.[Abstract]

    Williamson S. (2003) Adaptation in the env gene of HIV-1 and evolutionary theories of disease progression. Molecular Biology and Evolution 20:1318–25 http://dx.doi.org/10.1093/molbev/msg144.[Abstract/Free Full Text]

    Received September 1, 2005; revised March 13, 2006; accepted for publication March 27, 2006.


    Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


    This article has been cited by other articles:


    Home page
    BioinformaticsHome page
    M. Gerstung, M. Baudis, H. Moch, and N. Beerenwinkel
    Quantifying cancer progression with conjunctive Bayesian networks
    Bioinformatics, November 1, 2009; 25(21): 2809 - 2815.
    [Abstract] [Full Text] [PDF]


    Home page
    BioinformaticsHome page
    P. Buendia, B. Cadwallader, and V. DeGruttola
    A phylogenetic and Markov model approach for the reconstruction of mutational pathways of drug resistance
    Bioinformatics, October 1, 2009; 25(19): 2522 - 2529.
    [Abstract] [Full Text] [PDF]


    Home page
    BiometrikaHome page
    N. Beerenwinkel and S. Sullivant
    Markov models for accumulating mutations
    Biometrika, September 1, 2009; 96(3): 645 - 661.
    [Abstract] [PDF]


    Home page
    BioinformaticsHome page
    M. C. F. Prosperi, R. D'Autilia, F. Incardona, A. De Luca, M. Zazzi, and G. Ulivi
    Stochastic modelling of genotypic drug-resistance for human immunodeficiency virus towards long-term combination therapy optimization
    Bioinformatics, April 15, 2009; 25(8): 1040 - 1047.
    [Abstract] [Full Text] [PDF]


    Home page
    Brief BioinformHome page
    G. Tesler
    Algebraic Statistics for Computational Biology. * Edited by Lior Pachter and Bernd Sturmfels
    Brief Bioinform, March 1, 2007; 8(2): 138 - 139.
    [Full Text] [PDF]


    This Article
    Right arrow Abstract Freely available
    Right arrow FREE Full Text (PDF) Freely available
    Right arrow All Versions of this Article:
    8/1/53    most recent
    kxj033v1
    Right arrow Alert me when this article is cited
    Right arrow Alert me if a correction is posted
    Services
    Right arrow Email this article to a friend
    Right arrow Similar articles in this journal
    Right arrow Similar articles in PubMed
    Right arrow Alert me to new issues of the journal
    Right arrow Add to My Personal Archive
    Right arrow Download to citation manager
    Right arrowRequest Permissions
    Right arrow Disclaimer
    Google Scholar
    Right arrow Articles by Beerenwinkel, N.
    Right arrow Articles by Drton, M.
    Right arrow Search for Related Content
    PubMed
    Right arrow PubMed Citation
    Right arrow Articles by Beerenwinkel, N.
    Right arrow Articles by Drton, M.
    Social Bookmarking
     Add to CiteULike   Add to Connotea   Add to Del.icio.us  
    What's this?