Biostatistics Advance Access originally published online on February 16, 2006
Biostatistics 2006 7(4):503-514; doi:10.1093/biostatistics/kxj022

© The Author 2006. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org.

On estimating the polyclonal fraction in lineage-marker studies of tumor origin

Michael A. Newton

Department of Statistics, University of Wisconsin—Madison, 1300 University Avenue, Madison, WI 53706, USA and Department of Biostatistics and Medical Informatics, University of Wisconsin—Madison, 600 Highland Avenue, Madison, WI 53792, USA newton{at}biostat.wisc.edu


    SUMMARY
Insight into the biology of tumor formation is provided by studies which demonstrate through the use of cell-lineage markers that some tumors have a polyclonal origin. Novelli et al. (1996) proposed to use the proportion of heterotypic tumors among the tumors that are either heterotypic or pure and of the minority marker type as a lower bound on the marginal fraction of polyclonal tumors. Generally, Novelli's ratio does not provide a valid lower bound for the marginal polyclonal fraction, as we demonstrate by analyzing relevant conditional probabilities. Estimation of the polyclonal fraction requires modeling assumptions on the distribution of the number of involved clones. Using three elementary models, we develop maximum likelihood estimation of the polyclonal fraction. We establish robustness of our estimates to misspecification of the clone-marking process, though the estimates are sensitive to assumptions about polyclonal mechanisms. On data from several published studies, our estimates of the polyclonal fraction are substantially smaller than Novelli's ratio.

Keywords: Cancer biology; Conditional probability; Novelli's ratio


    1. INTRODUCTION
A tumor has a monoclonal origin if early in development its constituent cells descend from a single ancestral cell that is aberrant relative to normal tissue. Otherwise, it has a polyclonal origin. Cellular events at the genesis of tumor growth are naturally difficult to measure; clonality studies have provided significant insights, but questions persist about the frequency and functional role of polyclonality. Many studies present evidence supporting the prevailing view, which is that polyclonality is the exception rather than the rule in tumor formation (e.g. Linder and Gartler, 1967; Vogelstein et al., 1985; Fearon et al., 1987). Other studies suggest that polyclonality may have an important role, especially for certain intestinal tumors (e.g. Beutler et al., 1967; Hsu et al., 1983; Novelli et al., 1996; Merritt et al., 1997; Thliveris et al., 2005).

The premise of clonality studies is that cells presenting different states of a binary lineage marker belong to different clones. Thus, tumors presenting both states of the lineage marker presumably have polyclonal origin. Due to X-chromosome inactivation early in development, each tissue in a female who is heterozygous at a marker locus is a mosaic of cells presenting one or the other variant of the marker. X-chromosome-inactivation markers have been used in many clonality studies. A different marker was used in Novelli et al. (1996). Intestinal tumors (microadenomas) had been measured in an unusual patient who not only had inherited a defective tumor suppressor gene, making him susceptible to intestinal cancer, but also whose tissues were mosaic with respect to the presence of the Y-chromosome. The presence (XY) or absence (XO) of Y could be measured in cells, and this formed a binary lineage marker. Aggregation chimeras enable lineage marking in recent clonality studies using mouse models of intestinal cancer (Merritt et al., 1997; Thliveris et al., 2005). Briefly, two early mouse embryos (morulae) are fused and ultimately produce a single mouse in which the tissue is a mosaic of contributions from both embryos. One embryo is designed to carry a certain reporter gene in order to easily evaluate the embryonic origin of a cell of interest in the adult chimeric mouse.

In large part, evidence regarding the clonality of tumor origin has been inconclusive owing to limitations of lineage marking and possible measurement errors. Fearon et al. (1987) noted some problems in previously reported studies. For example, multiple clones would appear to exist in a monoclonal tumor if normal epithelial or stromal cells happened to contaminate the tumor sample; and in enzyme polymorphism studies, the level of expression would not necessarily be uniform among different clones. The Fearon et al. study applied a DNA-based assay to 50 intestinal tumors. Every tumor presented a single state of the binary marker, in support of monoclonality. Subsequent calculations, however, showed that the Fearon et al. study had low power to detect polyclonality because a very small fraction of the tissue was near patch boundaries in the X-inactivation mosaic (Novelli et al., 2003). Patch structure in the Novelli et al. (1996) (XO/XY) case was finer grained and thus provided a greater opportunity to detect polyclonality. However, in that study, it was possible that the Y-chromosome could be lost sporadically; though the estimated rate was low, there was a small chance that tumors presenting both marker variants were actually monoclonal. Subsequent mouse chimera studies, on the other hand, did not suffer from problems with marker fidelity and clearly demonstrated polyclonality. The background rate of adenoma formation was relatively high in the Merritt et al. (1997) study. Thliveris et al. (2005) used similar methods but engineered the mice to have many fewer tumors overall. In spite of substantial challenges in measuring early events of tumor formation, there is now clear evidence supporting the polyclonal origin of a class of intestinal tumors.

The mechanisms responsible for polyclonality are not well understood. Some polyclonal tumors may emerge simply by the close proximity of distinct initiated clones without a requirement for clonal cooperation. Tumor multiplicity was quite low in the Thliveris et al. (2005) study of murine intestinal adenomas, and so this so-called random collision hypothesis was considered unlikely. If polyclonality is necessary for certain tumors to grow, then there are intercellular interactions of importance to the initiation and maintenance of the tumor. The existence of such interactions would strain the standard model which holds that a tumor develops according to a monoclonal cell lineage within which genetic damage accumulates (e.g. Nowell, 1976). As improved methods are applied to study the earliest events of tumor growth, the precise role of polyclonality will be clarified. A pervasive statistical question in this effort is how to estimate the fraction of polyclonal tumors from data obtained in lineage-marker studies. This question is the focus of the present paper.

Regarding statistical concerns, there is the problem that the fraction of polyclonal tumors may be different from the fraction of tumors that appear to have polyclonal origin according to lineage-marker data. Take the data from Novelli et al. (1996) as an illustration. The patient presented with 263 microadenomas in his intestinal tract; 4 of these were pure (homotypic) and of the minority XO type, 246 of these were homotypic and of the majority XY type, and the remaining 13 were heterotypic in that they contained cells of both marker types. These 13 tumors were overtly polyclonal; assuming fidelity of the marker, none could have formed as cells descendant from a single initiated aberrant cell. Quite possibly, covertly polyclonal tumors were among the 250 homotypic tumors, though the actual number of such polyclonal tumors cannot be assessed because a binary lineage marker does not have the resolution to distinguish different clones within a tumor that happen to have the same marker type. All heterotypic tumors are polyclonal, but not all polyclonal tumors are heterotypic.

Recognizing the inherent missing-data structure, Novelli et al. (1996) proposed, as a lower bound on the fraction of polyclonal tumors, the proportion of heterotypic tumors among those that are either heterotypic or homotypic of the minority marker type. That became $13/(13 + 4) \approx 76\%$ for these data. It is a rather impressive inference since we know with confidence only that the polyclonal fraction exceeds the heterotypic fraction, estimated at $13/263 \approx 4.9\%$. Merritt et al. (1997) used the same ratio technique to bound the polyclonal fraction in tumor count data from mouse aggregation chimeras. The rates estimated by this Novelli ratio technique have been reported in various reviews (e.g. Playford, 1998; Garcia et al., 1999). Through an analysis of conditional probabilities, we show that the Novelli ratio technique is flawed. In doing so, we identify two key stochastic components of lineage-marker data, and further show that model-free estimates of the polyclonal fraction cannot improve the proportion of heterotypic tumors as a lower bound on the polyclonal fraction. Model-based methods are developed, and we show that these are robust to certain forms of model violation, but not to others. These findings have guided some of the statistical calculations in Thliveris et al. (2005), and would seem to have relevance in future clonality studies.
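
The two quantities contrasted above can be recomputed directly from the tumor counts reported for the Novelli et al. (1996) patient. The short sketch below does only that arithmetic; the variable names are illustrative and not taken from any of the cited studies.

```python
# Tumor counts reported for the Novelli et al. (1996) patient.
n_minority_homotypic = 4    # pure XO tumors
n_majority_homotypic = 246  # pure XY tumors
n_heterotypic = 13          # tumors containing both marker types

n_total = n_minority_homotypic + n_majority_homotypic + n_heterotypic

# Novelli's ratio: heterotypic among (heterotypic or minority-homotypic).
novelli_ratio = n_heterotypic / (n_heterotypic + n_minority_homotypic)

# Heterotypic fraction: the bound that needs only marker fidelity.
heterotypic_fraction = n_heterotypic / n_total

print(f"Novelli's ratio:      {novelli_ratio:.3f}")
print(f"Heterotypic fraction: {heterotypic_fraction:.3f}")
```

The large difference between the two numbers is what makes the validity of the ratio as a lower bound worth examining.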


    2. THE INFERENCE PROBLEM
Of interest are tumors that originate within intestinal epithelial tissue, though none of the statistical reasoning is restricted to this site. We suppose that cells in the tissue can be classified as either normal or abnormal; for our purposes, the detailed distinctions among abnormal cells (e.g. adenomas/carcinomas) are not important, and we consider that abnormal cells populate tumors. Classification of cells may be based on histopathology to detect abnormal cell morphology or immunohistochemistry to detect certain proteins produced in tumor cells (e.g. Merritt et al., 1997). To study tumor origin, one needs to consider initiation events, each of which irreversibly transforms a normal cell into an abnormal state. We equate a tumor clone with the full set of extant cells that descend from such an initiated cell via cell proliferation within the tumor. Cells comprising a tumor either form a single clone or partition the tumor mass into multiple clones (owing to multiple initiation events). Thus, in the population of intestinal tumors under study, a fraction $f(c)$ of tumors are formed from exactly c clones, for $c = 1, 2, \ldots$. This forms the probability mass function of C, the number of clones in a randomly sampled tumor. Underlying f is a stochastic process governing how clones are bound together to form tumors. Three elementary, mechanistic models of this ‘clone-binding process’ are presented in Section 4.

The sampled tumor is monoclonal if $C = 1$, otherwise it is polyclonal. The polyclonal fraction

$$\theta = P(C > 1) = 1 - f(1)$$

is the parameter of primary interest. Ideally, we can consistently estimate $\theta$ from available data. Because any evidence that $\theta > 0$ is in conflict with the standard theory of monoclonal tumor origin, an informative lower bound is useful in conjunction with any point estimate of $\theta$.

Lineage-marker studies provide partial information about the clonal structure of tumors and thus enable inference about the polyclonal fraction $\theta$. Each cell in the tissue assumes one of a finite number of marker types which marks the cell and any descendant cells. All studies to date have used two types, say {1, 2}, though more are biologically possible and could be readily considered in our statistical analysis. Fidelity of the marker through cell proliferation is essential, otherwise we cannot, from measurements on extant cells, conclude much of anything about the type of ancestral cells that existed at the time of tumor initiation. Auxiliary data may support the marker-fidelity hypothesis, and we adopt this hypothesis in what follows.

For a tumor sampled from the population under study, let $N_t$ denote the number of clones of type t. Naturally $N_1 + N_2 = C$ in the case involving binary types. The tumor is homotypic of type t if $N_t = C$. It is heterotypic if it is not homotypic for either type. We can observe which of the three mutually exclusive events has occurred:

$$\text{hom}_1 = \{N_1 = C\}, \qquad \text{hom}_2 = \{N_2 = C\}, \qquad \text{het} = \{N_1 < C \text{ and } N_2 < C\}.$$

Current measurements do not allow us to know C or $N_t$ for either t; they simply indicate the value of a trinomial random variable for each tumor. Some sort of clone-marking process characterizes the conditional distribution of $N_t$ given $C$. Possible models for the clone-marking process are discussed in Section 5.
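
To make the data-generating structure concrete, the following is a minimal simulation sketch of the two stochastic components just described: clone binding (a draw of the clone count C) and clone marking (the types of the bound clones). The clone-count distribution `f`, the type-1 frequency `p1`, and the use of independent marking are all illustrative assumptions, not values or choices taken from any study.

```python
import random

def sample_phenotype(f, p1, rng):
    """Draw one tumor and return (clone_count, phenotype).

    f   : dict mapping clone count c to probability f(c)  (clone binding)
    p1  : probability that a clone is of type 1            (clone marking)
    Clones are marked independently here, which is an assumption.
    """
    counts = list(f)
    c = rng.choices(counts, weights=[f[k] for k in counts])[0]
    n1 = sum(rng.random() < p1 for _ in range(c))  # clones of type 1
    if n1 == c:
        return c, "hom1"
    if n1 == 0:
        return c, "hom2"
    return c, "het"

rng = random.Random(0)
f = {1: 0.7, 2: 0.2, 3: 0.1}  # hypothetical clone-count distribution
tumors = [sample_phenotype(f, 0.1, rng) for _ in range(5000)]

# Only the phenotype is observable; the clone count is hidden in real data.
observed = {"hom1": 0, "hom2": 0, "het": 0}
for _, phenotype in tumors:
    observed[phenotype] += 1
print(observed)
```

In the simulated population every heterotypic tumor has more than one clone, while many polyclonal tumors are homotypic, which is the missing-data structure that the inference has to contend with.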

Lineage-marker studies of polyclonality offer two classifications of a tumor population: (1) clonality, i.e. whether $C = 1$ or $C > 1$, and (2) phenotype, i.e. whether $N_t = C$ for some t or not. Table 1 shows the cross classification of such a population in terms of these factors. Tumor count data provide direct information on the marginal row proportions, but complete data are not available on entries inside the table. Assuming marker fidelity, no tumors can be both heterotypic and monoclonal, and this forces a structural zero in the table.


Table 1 Cross classification of a tumor population in terms of clonality and phenotype

 
In summary, trinomial phenotype data are available from tumors sampled from a relevant population. Stochastic processes governing the biology of clone binding and clone marking affect the distribution of these data. There is substantial missing information, and also there are structural constraints which relate parameters and guide inference about the polyclonal fraction $\theta$.


    3. NO MODEL-FREE LOWER BOUND IMPROVES P(HET)
Evidently $\theta \ge P(\text{het})$ because all heterotypic tumors are polyclonal (Table 1). This assertion relies on the marker-fidelity assumption, but it requires no assumptions on either the process by which clones are bound into tumors or the process by which clones attain marks. In this sense it is model free. Though valid, the bound $P(\text{het})$ is not tight when a substantial fraction of the homotypic tumors is also polyclonal.

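First Novelli et al. (1996) and then Merritt et al. (1997) used a certain ratio aiming to produce a tighter lower bound for $\theta$. From a sample of tumors, ‘Novelli's ratio’ is the proportion of heterotypic tumors among those that are either heterotypic or homotypic and of the minority type. The empirical value is

$$\hat\rho = \frac{\#\{\text{het}\}}{\#\{\text{het}\} + \#\{\text{hom}_1\}} \qquad (3.1)$$

where $\#\{\text{het}\}$ and $\#\{\text{hom}_1\}$ count the heterotypic and type-1 homotypic tumors in the sample, and type $t = 1$ homotypic tumors are less frequent than type $t = 2$ homotypic tumors. This estimates the population quantity

$$\rho = \frac{P(\text{het})}{P(\text{het}) + P(\text{hom}_1)} \qquad (3.2)$$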

A clear rationale for the claim that $\rho \le \theta$, with $\rho$ the population version of Novelli's ratio (3.2), was not provided in Novelli et al. (1996), but evidently there was no appeal to particular modeling assumptions. The idea may have been simply this: among the heterotypic and minority-homotypic tumors, the polyclonal fraction is

$$P(C > 1 \mid \text{het} \cup \text{hom}_1) = \frac{P(\text{het}) + P(C > 1, \text{hom}_1)}{P(\text{het}) + P(\text{hom}_1)} = \rho + \frac{P(C > 1, \text{hom}_1)}{P(\text{het}) + P(\text{hom}_1)} \qquad (3.3)$$

Here $\text{hom}_1$ is the event that the tumor is homotypic of the minority type 1, and the development uses $P(C > 1, \text{het}) = P(\text{het})$ as noted in Table 1. The term $P(C > 1, \text{hom}_1)$ is liable to be small if the minority cell type is a small proportion of the whole, since multiple clones from that minority component have to somehow interact to form each tumor. Regardless of the magnitude of $P(C > 1, \text{hom}_1)$, we have a valid bound $\rho \le P(C > 1 \mid \text{het} \cup \text{hom}_1)$. Thus, Novelli's ratio $\rho$ does bound a certain polyclonal fraction, but it is not $\theta$, the marginal polyclonal fraction of interest; rather $\rho$ is a lower bound on the rate $P(C > 1 \mid \text{het} \cup \text{hom}_1)$ of polyclonality among the heterotypic and minority-homotypic tumors. Were there some sort of conditional independence, it would follow that the bound also holds marginally. This is not so. In fact, in the population of heterotypic and minority-homotypic tumors, polyclonality is more frequent than in the whole population of tumors (see theorem below). There is a positive gap between $\theta$ and the larger $P(C > 1 \mid \text{het} \cup \text{hom}_1)$, which creates a problem; for if $\rho$ lies in this gap, then it is not a lower bound for the marginal polyclonal fraction $\theta$ (see Figure 1). Further, whether or not $\rho$ lies in the gap depends on details of the stochastic processes generating the data, and so $\rho$ cannot be a general-purpose, model-free, lower bound.


Fig. 1 On the gap and Novelli's ratio $\rho$: $\rho$ would be a valid bound if $\theta = P(C > 1 \mid \text{het} \cup \text{hom}_1)$, or at least if the gap between $\theta$ and $P(C > 1 \mid \text{het} \cup \text{hom}_1)$ is small (case 2), because $\rho \le P(C > 1 \mid \text{het} \cup \text{hom}_1)$. The gap is of an unknown size, and may be large (case 1), in which case $\rho$ is not smaller than $\theta$.

 
The gap affecting Novelli's ratio is always non-negative. Some conditions are required to establish strict positivity. For one, we require $0 < \theta < 1$. But this is innocuous; if $\theta = 1$ then any quoted rate would provide a valid lower bound for $\theta$; on the other hand, if $\theta = 0$ then all tumors would be homotypic and the question of polyclonality would not have surfaced in the first place. We make no specific assumptions about clone binding or clone marking. However, we do require a weak technical assumption about the latter. Consider that in a population of tumors comprising monoclonal tumors and, for various $c > 1$, tumors originating by the interaction of c clones, we have an overall proportion $p_t$ of clones that are of type t. More formally,

$$p_t = \frac{E(N_t)}{E(C)} = \frac{\sum_{c} f(c) \sum_{n} n\, P(N_t = n \mid C = c)}{\sum_{c} c\, f(c)} \qquad (3.4)$$

which arises from consideration of size-biased sampling, as long as $E(C) < \infty$ (e.g. Patil and Rao, 1978). Probabilities $P(N_t = n \mid C = c)$ for various c and n reflect the possibly complex clone-marking process.

DEFINITION 3.1 The clone-marking process is 'regular' if for each clonal type t, Formula 3, and also if for each Formula 3 for which Formula 3, Formula 3.

Roughly speaking, regularity means that homotypic type t tumors are more frequent among monoclonal tumors than they are among polyclonal tumors. The assumption holds for a range of plausible stochastic processes, such as those in which the marking is neutral and thus independent, in a certain sense, from the clone-binding process. We take up the point shortly. First, we state the main theoretical result which is key to the flaw in Novelli's ratio.

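GAP THEOREM 3.1 If $0 < \theta < 1$ and the clone-marking process is regular, then $\theta < P(C > 1 \mid \text{het} \cup \text{hom}_1)$, the polyclonal fraction among the heterotypic and minority-homotypic tumors.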

The value of lineage-marker studies derives in part from the possibility that the marking process itself does not alter the polyclonal structure. This concept of neutrality is stronger than the concept of regularity required in the gap theorem. To be specific, reconsider $N_t$, the number of clones of type t that are bound together in a sampled tumor. One definition of neutral marking is to have that the expected proportion of type t clones in clonality-c tumors does not depend on c, i.e. $E(N_t / C \mid C = c) = p_t$. Owing to discreteness, we cannot have that $N_t$ is independent of C, but we can ask that, on average, the proportion of type t clones in a tumor matches the proportion of type t clones overall. If the marking process is neutral and allows heterotypic tumors, i.e. if $P(\text{het} \mid C = c) > 0$ for all $c > 1$ for which $f(c) > 0$, then from (3.4), it follows routinely that the marking process is also regular. Thus, neutrality implies regularity.

As an example of a non-regular marking process, suppose that tumors can be either monoclonal (with probability $1 - \theta$) or biclonal (with probability $\theta$). Suppose further that all monoclonal tumors are marked with type 1, and all biclonal tumors are marked with type 2. The phenotypes and the clonality are highly dependent in this case and seem far from a neutral marking process. Overall among clones, a proportion $2\theta/(1 + \theta)$ are of type 2, yet $P(N_2 = 1 \mid C = 1) = 0$, $P(N_2 = 2 \mid C = 2) = 1$, which clearly violates the definition of a regular marking process.
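
The overall proportion of clones of a given type in this example can be checked numerically. The sketch below assumes that the overall proportion is the expected number of type-t clones per tumor divided by the expected number of clones per tumor (tumors weighted by clone count, as in size-biased sampling); the function name and the value of `theta` are hypothetical.

```python
def clone_type_proportion(f, mean_type_count):
    """Overall proportion of clones of a given type across a tumor population.

    f               : dict, clone count c -> f(c)
    mean_type_count : dict, clone count c -> expected number of clones of
                      the type in a tumor with c clones
    Tumors are weighted by their clone count (size-biased sampling).
    """
    expected_type_clones = sum(f[c] * mean_type_count[c] for c in f)
    expected_clones = sum(f[c] * c for c in f)
    return expected_type_clones / expected_clones

theta = 0.3  # hypothetical biclonal probability
f = {1: 1 - theta, 2: theta}
# Non-regular marking: monoclonal tumors are all type 1, biclonal all type 2.
type2_mean = {1: 0.0, 2: 2.0}

p2 = clone_type_proportion(f, type2_mean)
hom2_given_monoclonal = 0.0  # no monoclonal tumor is type 2
hom2_given_biclonal = 1.0    # every biclonal tumor is pure type 2
print(p2, 2 * theta / (1 + theta))
```

Here type-2 homotypic tumors are more frequent among polyclonal tumors than among monoclonal ones, the reverse of what regularity asks for.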

An elementary, though useful, neutral marking process entails independent-type assignments according to distribution $(p_1, p_2)$ over types. Independence requires few parameters, but it conflicts with the spatial patterning evident in real tissue that is a mosaic of different types (Griffiths et al., 1989; Novelli et al., 2003; Thliveris et al., 2005). Neutral marking can respect this sort of patterning through positive association by boosting the homotypic rate $P(N_t = c \mid C = c)$ above the independence homotypic rate $p_t^c$.

Taking these concepts to a concrete example, consider a simplified model in which tumors are monoclonal with probability $1 - \theta$ or are formed from two clones, and thus are biclonal with probability $\theta$. Tumor-bound clones are marked independently by one of the two types, with the minority type $t = 1$ having frequency $p_1$. Evaluating (3.3), the proportion of biclonal tumors among the heterotypic or pure type-1 tumors is

$$P(C = 2 \mid \text{het} \cup \text{hom}_1) = \frac{2\theta p_1 (1 - p_1) + \theta p_1^2}{2\theta p_1 (1 - p_1) + \theta p_1^2 + (1 - \theta) p_1} = \frac{\theta (2 - p_1)}{1 + \theta (1 - p_1)}$$

As ensured by (3.3), Novelli's ratio $\rho$ does provide a lower bound for a certain conditional polyclonal fraction. However, there is a gap between that conditional fraction $P(C = 2 \mid \text{het} \cup \text{hom}_1)$ and the smaller marginal polyclonal fraction $\theta$ of interest, and so the bound $\rho \le \theta$ can fail. Figure 2 charts the difference $\theta - \rho$ for different polyclonal fractions and different minority-type frequencies $p_1$. When both $\theta$ and $p_1$ are large, Novelli's ratio provides a legitimate bound because $\rho \le \theta$. The bound fails when $\rho > \theta$; in this model $\rho = 2\theta(1 - p_1)/[1 + \theta(1 - p_1)]$, so failure occurs exactly when $\theta < (1 - 2p_1)/(1 - p_1)$. In terms of state-space area, the bound fails for most scenarios. The error is particularly extreme in the realistic situation where the minority fraction is small.
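
The comparison in this example is easy to reproduce numerically. The following sketch evaluates, for the monoclonal/biclonal model with independent marking, the population value of Novelli's ratio and the conditional biclonal fraction, and reports whether the ratio stays below the marginal polyclonal fraction. The function names and the three parameter pairs are illustrative assumptions.

```python
def mb_probabilities(theta, p):
    """Phenotype probabilities in the monoclonal/biclonal model.

    theta : probability that a tumor is biclonal
    p     : frequency of the minority type (type 1); clones marked independently
    """
    het = theta * 2 * p * (1 - p)
    hom1 = (1 - theta) * p + theta * p**2
    hom1_and_biclonal = theta * p**2
    return het, hom1, hom1_and_biclonal

def novelli_ratio(theta, p):
    het, hom1, _ = mb_probabilities(theta, p)
    return het / (het + hom1)

def conditional_biclonal_fraction(theta, p):
    het, hom1, hom1_bi = mb_probabilities(theta, p)
    return (het + hom1_bi) / (het + hom1)

for theta, p in [(0.1, 0.05), (0.5, 0.05), (0.9, 0.45)]:
    rho = novelli_ratio(theta, p)
    cond = conditional_biclonal_fraction(theta, p)
    verdict = "valid" if rho <= theta else "fails"
    print(f"theta={theta:.2f} p={p:.2f} rho={rho:.3f} cond={cond:.3f} -> {verdict}")
```

The ratio never exceeds the conditional fraction, but with a small minority frequency it exceeds the marginal fraction, which is the failure described in the text.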


Fig. 2 Discrepancy between the polyclonal fraction $\theta$ and Novelli's ratio $\rho$ in the monoclonal/biclonal, independent-marking model as function of the minority fraction $p_1$ and the biclonal fraction $\theta$. In the lower left of the plot Novelli's ratio fails to bound the polyclonal fraction. The light gray-shaded region corresponds to case 1 (Figure 1) and the dark to case 2.

 
What statistical recourse is there for inference about $\theta$? The weak lower bound $P(\text{het}) \le \theta$ is the best one can do without adopting modeling assumptions on clone binding and marking. Mathematically, for example, it is possible that tumors are either monoclonal or polyclonal of some large degree c, and are marked by some simple marking scheme. If this were the case, virtually all the polyclonal tumors would be heterotypic, and so the simple lower bound $P(\text{het})$ would be tight.

Curiously, there is a modification of the Novelli ratio which provides a valid lower bound for $\theta$ in the special monoclonal/biclonal model, though not generally. Peter Sasieni (personal communication) proposed to replace the denominator in Novelli's ratio (3.1) with the number of tumors that are heterotypic plus twice the number of minority-homotypic tumors.
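
A quick numerical check of the modified ratio is sketched below, assuming the monoclonal/biclonal model with independent marking and minority-type frequency `p`. Under those assumptions the population version of the modified ratio simplifies algebraically to $(1 - p)\theta$, which cannot exceed $\theta$; the grid of parameter values is arbitrary.

```python
def sasieni_ratio(theta, p):
    """Modified ratio het / (het + 2 * hom1) in the monoclonal/biclonal model
    with independent marking; p is the minority-type (type 1) frequency."""
    het = theta * 2 * p * (1 - p)
    hom1 = (1 - theta) * p + theta * p**2
    return het / (het + 2 * hom1)

grid = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95

# Smallest value of theta minus the modified ratio over the grid (p <= 1/2).
worst_margin = min(theta - sasieni_ratio(theta, p)
                   for theta in grid for p in grid if p <= 0.5)
print(worst_margin)
```

For the Novelli et al. (1996) counts the empirical modified ratio is $13/(13 + 2 \times 4) = 13/21 \approx 0.62$.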


    4. MODEL-BASED INFERENCE IS SENSITIVE TO CLONE-BINDING ASSUMPTIONS
Three elementary models of clone binding are as follows:

  1. Monoclonal/Biclonal: As in Section 3, polyclonality is equivalent to biclonality. This is the simplest form of polyclonality. One justification is parsimony; the model is a minimal representation of interacting clones.
  2. Conditional Poisson: The number C of clones in a tumor has probability mass function

    $$f(c) = \frac{e^{-\lambda} \lambda^{c}}{c!\,(1 - e^{-\lambda})}$$

    for $c = 1, 2, \ldots$ and $\lambda > 0$. This is a Poisson distribution conditioned on at least one clone, and could be justified under some model of random collision or random collision followed by selection if there is sufficient tumorigenic potential (Newton et al., 2006). Here, the polyclonal fraction is $\theta = 1 - \lambda/(e^{\lambda} - 1)$.

  3. Geometric: The number C of clones in a tumor has probability mass function

    $$f(c) = (1 - q)\, q^{c - 1}$$

    for $c = 1, 2, \ldots$ and $0 < q < 1$. This model might be justified if aberrant clones engage in some sort of recruitment and conversion of additional clones (Shih et al., 2001). Here the polyclonal fraction is $\theta = q$.
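
The three clone-binding models can be written down as probability mass functions and the polyclonal fraction read off as one minus the mass at a single clone. The sketch below assumes a Poisson rate `lam` for the conditional Poisson model and, for the geometric model, a parameter `q` interpreted as the probability of adding one more clone; both names and the numerical values are illustrative.

```python
import math

def f_monoclonal_biclonal(c, theta):
    # Mass 1 - theta on one clone and theta on two clones.
    return {1: 1 - theta, 2: theta}.get(c, 0.0)

def f_conditional_poisson(c, lam):
    # Poisson(lam) conditioned on at least one clone.
    return math.exp(-lam) * lam**c / (math.factorial(c) * (1 - math.exp(-lam)))

def f_geometric(c, q):
    # Geometric on {1, 2, ...}; q is the probability of adding another clone.
    return (1 - q) * q ** (c - 1)

def polyclonal_fraction(pmf, param):
    """Polyclonal fraction P(C > 1) = 1 - f(1)."""
    return 1 - pmf(1, param)

lam, q = 0.5, 0.2  # hypothetical parameter values
print(polyclonal_fraction(f_conditional_poisson, lam))  # 1 - lam / (e**lam - 1)
print(polyclonal_fraction(f_geometric, q))              # equals q
```

Matching the polyclonal fraction across models still leaves different tail behavior in the clone count, which is one reason inference can be sensitive to the choice of binding model.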

Likelihood-based inference for $\theta$ is possible if we invoke a clone-marking model on top of the clone-binding model. The simplest one is to mark the clones that are bound in a tumor independently and according to a common distribution over types $(p_1, p_2)$. A better model would entail some positive association among bound clones since they are constrained spatially and there may be a semiregular patchwork pattern of lineage markers within the tissue. However, it could be computationally challenging to incorporate detailed information about positive association. Maximum likelihood estimation has some validity even in the absence of independent marking. We argue in Section 5 that the maximum likelihood estimate obtained under the independent-marking assumption is conservatively biased, in the sense of converging to a lower bound on $\theta$, regardless of the positive association among clones bound in a polyclonal tumor.

Likelihood-based inference requires the marginal probability of a homotypic tumor of type t, which is obtained by summing over the unknown clonality C. For the three binding models presented above, and with independent marking, these sums can be solved explicitly.

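  1. Monoclonal/Biclonal: $P(\text{hom}_t) = (1 - \theta)\, p_t + \theta\, p_t^2$,
  2. Conditional Poisson: $P(\text{hom}_t) = (e^{\lambda p_t} - 1)/(e^{\lambda} - 1)$, with $\lambda$ the Poisson rate,
  3. Geometric: $P(\text{hom}_t) = (1 - q)\, p_t/(1 - q\, p_t)$, with $q$ the geometric parameter (equal to the polyclonal fraction).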

The tumor sample is viewed as a multinomial draw according to these type probabilities, allowing for the heterotypic class to have a probability equal to the complement of the sum of these homotypic class probabilities. We have not found a closed form expression for the maximum likelihood estimates, but they may be obtained routinely by numerical methods. Either one may use external estimates of the clonal marker frequencies $(p_1, p_2)$ or these may be also estimated from the count data.
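
One simple numerical route is an exhaustive grid search over the polyclonal fraction and the minority-type frequency. The sketch below does this for two of the binding models, using the Novelli et al. (1996) counts; it is a crude illustration rather than the optimization used for the published estimates. It assumes independent marking, and the geometric model is parameterized directly by the polyclonal fraction, which is a choice made here for convenience. The conditional Poisson model could be added in the same way by searching over its rate.

```python
import math

def hom_prob(model, theta, p):
    """P(homotypic of a type with clone frequency p), independent marking."""
    if model == "MB":   # monoclonal/biclonal
        return (1 - theta) * p + theta * p**2
    if model == "Geo":  # geometric, parameterized by theta = P(C > 1)
        return (1 - theta) * p / (1 - theta * p)
    raise ValueError(model)

def log_likelihood(model, theta, p1, counts):
    n_hom1, n_hom2, n_het = counts
    h1 = hom_prob(model, theta, p1)
    h2 = hom_prob(model, theta, 1 - p1)
    het = 1 - h1 - h2  # heterotypic class takes the remaining probability
    return n_hom1 * math.log(h1) + n_hom2 * math.log(h2) + n_het * math.log(het)

def grid_mle(model, counts, steps=400):
    """Crude maximum likelihood by exhaustive search on an interior grid."""
    best = (-math.inf, None, None)
    for i in range(1, steps):
        theta = i / steps
        for j in range(1, steps // 2):
            p1 = j / steps  # type 1 is the minority, so p1 < 1/2
            ll = log_likelihood(model, theta, p1, counts)
            if ll > best[0]:
                best = (ll, theta, p1)
    return best

counts = (4, 246, 13)  # minority-pure, majority-pure, heterotypic tumors
estimates = {}
for model in ("MB", "Geo"):
    ll, theta_hat, p1_hat = grid_mle(model, counts)
    estimates[model] = (theta_hat, p1_hat)
    print(model, round(theta_hat, 3), round(p1_hat, 3))
```

A grid search is slow and coarse compared with a proper optimizer, but it has no tuning parameters beyond the grid resolution and makes the shape of the likelihood easy to inspect.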

Table 2 shows the maximum likelihood estimates of $\theta$ for data from Novelli et al. (1996) and Merritt et al. (1997). The estimates are rather different from Novelli's ratio in these examples. We obtained approximate 95% confidence intervals by first computing a profile likelihood function in each case (optimizing numerically over the nuisance clone-frequency parameter) and then normalizing the profile likelihood to be an approximate marginal posterior distribution for $\theta$. Confidence intervals mark the central 95% of these distributions. Results from one data set are amplified in Figure 3, which reveals the lack of robustness of estimates for $\theta$ to changes in the clone-binding model.
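
The interval construction described here can be sketched for the monoclonal/biclonal model as follows. The grids, their resolution, and the helper names are assumptions made for illustration; the code profiles out the minority-type frequency by maximization, normalizes the exponentiated profile over a grid of polyclonal fractions, and reads off the central 95%.

```python
import math

COUNTS = (4, 246, 13)  # Novelli et al. (1996): minority-pure, majority-pure, heterotypic

def log_lik(theta, p1, counts=COUNTS):
    """Trinomial log-likelihood, monoclonal/biclonal model, independent marking."""
    h1 = (1 - theta) * p1 + theta * p1**2
    h2 = (1 - theta) * (1 - p1) + theta * (1 - p1) ** 2
    het = theta * 2 * p1 * (1 - p1)
    n1, n2, nh = counts
    return n1 * math.log(h1) + n2 * math.log(h2) + nh * math.log(het)

steps = 200
thetas = [(i + 0.5) / steps for i in range(steps)]        # midpoints in (0, 1)
p_grid = [(j + 0.5) / (2 * steps) for j in range(steps)]  # midpoints in (0, 1/2)

# Profile out the nuisance frequency p1 by maximization at each theta.
profile = [max(log_lik(t, p) for p in p_grid) for t in thetas]

# Normalize exp(profile) so that it sums to one over the theta grid.
peak = max(profile)
weights = [math.exp(v - peak) for v in profile]
total = sum(weights)
density = [w / total for w in weights]

def quantile(prob):
    """Smallest grid value whose cumulative normalized weight reaches prob."""
    running = 0.0
    for t, d in zip(thetas, density):
        running += d
        if running >= prob:
            return t
    return thetas[-1]

lower, upper = quantile(0.025), quantile(0.975)
theta_mode = thetas[profile.index(peak)]
print(lower, theta_mode, upper)
```

Subtracting the peak before exponentiating keeps the weights in a safe numerical range without changing the normalized result.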


 
Table 2 Estimation of polyclonal fraction $\theta$. For three data sets shown on the left, reported are MLEs and approximate 95% confidence intervals (CI) using three different models for how clones are bound into tumors: monoclonal/biclonal (MB), conditional Poisson (CP), and geometric (Geo). Also shown are values of the Novelli ratio $\hat\rho$ and the naive lower bound (LB) which is simply the observed proportion of heterotypic tumors. All proportions are shown as percentages

 

Fig. 3 A comparison of three model-based estimates of $\theta$ using the clonality count data from Novelli et al. (1996): Plotted are profile likelihood functions that are normalized to integrate to one, and thus serve as approximate posterior distributions. The nuisance parameter, the proportion of blue clones, was removed by maximization in each case. Maximum likelihood estimates are noted for each polyclonality model: monoclonal/biclonal (MB), conditional Poisson (CP), and geometric (Geo). Approximate confidence intervals were computed as equi-tail 95% posterior intervals. The black triangle indicates the naive lower bound for $\theta$ which is the observed proportion of heterotypic tumors ($13/263$). The grey triangle indicates the Novelli ratio $\hat\rho$. Model-based inference about $\theta$ is highly sensitive to assumptions about the process by which clones are bound into tumors. In the models considered, the probability is high that $\theta$ is less than the supposed lower bound $\hat\rho$.

 

    5. MODEL-BASED INFERENCE IS ROBUST TO CLONE-MARKING ASSUMPTIONS
Maximum likelihood estimates obtained under the independent-marking model will be biased if there is positive association among the types of the bound clones. Such positive association is expected owing to the typical patchy structure of mosaic tissue. However, we show that this bias is expected to be conservative (i.e. the estimates ought to be low), since independent marking puts more probability mass on heterotypic tumors than would a more realistic positive-association marking. To establish the conservative bias, suppose that clone-type frequencies $(p_1, p_2)$ are known or can be consistently estimated. Under independent marking, a tumor will be homotypic type t with probability $\sum_c f(c)\, p_t^c$. Positive association of clonal marking amounts to an increased homotypic rate $P(N_t = c \mid C = c) \ge p_t^c$. The rate of heterotypic tumors under independent marking is $h_{\mathrm{ind}}(\theta)$ and under positive association is $h_{\mathrm{pos}}(\theta)$, both positive by regular marking, and satisfying $h_{\mathrm{pos}}(\theta) \le h_{\mathrm{ind}}(\theta)$. Both functions are in 1-1 correspondence with the polyclonal fraction $\theta$, and so either could be used to parameterize a likelihood computation for the independent-marking model. Suppose that the maximum likelihood estimate for $\theta$ is derived from a binomial model on the heterotypic frequency. Even though the independent-marking model is incorrect, the independent-marking estimate of $P(\text{het})$ will be consistent for this population heterotypic frequency; but in fitting closely to the data, an incorrect value $\theta_{\mathrm{ind}} = h_{\mathrm{ind}}^{-1}(P(\text{het}))$ will be converged upon. The correct polyclonal fraction is what we would have converged to using the positive-association model, namely, $\theta = h_{\mathrm{pos}}^{-1}(P(\text{het}))$. Since $h_{\mathrm{ind}}(\theta_{\mathrm{ind}}) = h_{\mathrm{pos}}(\theta) \le h_{\mathrm{ind}}(\theta)$, and $h_{\mathrm{ind}}$ increases with $\theta$ in each of the three binding models of Section 4, the value $\theta_{\mathrm{ind}}$ to which the independent-marking estimator converges must be no greater than the true polyclonal fraction $\theta$.
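
The direction of the bias can be illustrated in the monoclonal/biclonal model with a single association parameter. In the sketch below `kappa` is a hypothetical parameter, not one from the paper: the two clones of a biclonal tumor share a type more often than under independence as `kappa` grows, while the marginal type frequencies are unchanged. The minority-type frequency `p` is treated as known.

```python
def het_rate(theta, p, kappa=0.0):
    """Heterotypic rate in a monoclonal/biclonal model.

    theta : true biclonal (polyclonal) fraction
    p     : frequency of type 1 among clones
    kappa : hypothetical association parameter in [0, 1]; the two clones of a
            biclonal tumor share a type with probability
            p**2 + (1 - p)**2 + kappa * 2 * p * (1 - p).
            kappa = 0 is independent marking.
    """
    return theta * 2 * p * (1 - p) * (1 - kappa)

def theta_from_independent_fit(het, p):
    """Invert the independent-marking heterotypic rate (p treated as known)."""
    return het / (2 * p * (1 - p))

theta_true, p = 0.4, 0.1
for kappa in (0.0, 0.3, 0.6):
    het = het_rate(theta_true, p, kappa)  # population heterotypic rate
    theta_ind = theta_from_independent_fit(het, p)
    print(f"kappa={kappa:.1f} limit of independent-marking estimate={theta_ind:.3f}")
```

In this toy setting the limit of the independent-marking estimate is the true fraction shrunk by the factor one minus `kappa`, so it never overstates the polyclonal fraction.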


    6. CONCLUSIONS
 
The cellular and molecular events that characterize the earliest stages of intestinal tumor development are not fully understood. In particular, the question of tumor clonality (does a tumor derive from more than one initiated cell?) has remained somewhat elusive. Lineage-marker studies provide the approach to address clonality, but many factors affect the information that can be usefully extracted from lineage-marker data: (1) the marker must have fidelity, that is, it must be transmitted faithfully through cell division. Ideally, the marker is not affected in any way by tumor growth and simply records lineages (e.g. this fails if marker variants are created in subclones inside a developing tumor); (2) the marker's mosaic pattern in tissue must be fine grained so that truly polyclonal tumors have sufficient opportunity to be heterotypic, otherwise there is insufficient power; (3) the measurements must be taken early in tumor development, else a dominant clone may grow out and mask earlier polyclonal structure (e.g. Bühler, 1967); and (4) measurements must be taken with great care to ensure that normal clones do not contaminate the tumor and lead to a false heterotypic determination. Not all clonality studies have satisfied these requirements, but existing data do indicate the polyclonal origin of a class of intestinal tumors.

Even the ideal lineage-marker study entails a statistical inference problem. We have discussed aspects of the problem of estimating the polyclonal fraction and have shown through an analysis of conditional probabilities that Novelli's ratio does not provide a valid lower bound for this fraction. Without assumptions on the process by which clones are bound into a tumor, the heterotypic fraction is the best lower bound. Maximum likelihood estimates may be derived using simplified model assumptions, and under certain conditions, these simplified estimates are robust. Though precise estimation of the polyclonal fraction is difficult, other parameters describing tumor initiation can be inferred when tumor-count data are combined with spatial information about the mosaic patch structure in lineage-marker studies. For example, the extent of spatial interaction among clones was estimated in Thliveris et al. (2005).
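To make the contrast between the two summary statistics concrete, the following sketch computes the heterotypic fraction and Novelli's ratio from tumor counts in a two-type marker system. The function names and the counts are hypothetical and are not taken from any study discussed here.

```python
def heterotypic_fraction(n_hetero, n_pure_majority, n_pure_minority):
    """Proportion of heterotypic tumors among all scored tumors."""
    total = n_hetero + n_pure_majority + n_pure_minority
    if total == 0:
        raise ValueError("no tumors scored")
    return n_hetero / total


def novelli_ratio(n_hetero, n_pure_minority):
    """Proportion of heterotypic tumors among tumors that are either
    heterotypic or pure and of the minority marker type."""
    denom = n_hetero + n_pure_minority
    if denom == 0:
        raise ValueError("no heterotypic or pure minority-type tumors")
    return n_hetero / denom


# Hypothetical counts, for illustration only.
counts = {"n_hetero": 10, "n_pure_majority": 80, "n_pure_minority": 10}

print(heterotypic_fraction(**counts))
print(novelli_ratio(counts["n_hetero"], counts["n_pure_minority"]))
```

Because Novelli's ratio drops the pure majority-type tumors from the denominator, it can never be smaller than the heterotypic fraction computed from the same counts, and it can be much larger when the majority type is common.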


    APPENDIX
 

A.1 Proof of gap theorem

We prove something slightly more general than is stated. Let t denote any one of the clonal types, and reconsider the event Formula 3 which has probability Formula 3 by regularity. Observe that the polyclonal fraction Formula 3 is a weighted average

Formula 3

where Formula 3 is the complement of Formula 3. In a two-type system where t is the majority type, Formula 3, for example. Via convexity, it is sufficient to prove Formula 3. By Bayes's rule, this is equivalent to Formula 3. Now the marginal Formula 3 is decomposed into non-zero terms according to clonality:

Formula 3

where Formula 3 is the marginal rate of type t clones and Formula 3 has to do with the clone-marking process. Thus, the difference

Formula 3

We have assumed Formula 3 and Formula 3 in the statement of the theorem, so the theorem is true if Formula 3. Considering the possible levels of polyclonality C when Formula 3,

Formula 3

where Formula 3 is the fraction of tumors comprising c clones. The last inequality follows from the definition of a regular marking process. Formula 3


    ACKNOWLEDGMENTS
 
A draft of this work was presented in University of Wisconsin Statistics Department Technical Report no. 1099. The work grew from a project in W. F. Dove's laboratory and from meetings with L. Clipson, R. Halberg, R. Sullivan, S. Stanhope, and A. Thliveris, and was supported by grants from the National Cancer Institute: R01 CA63464 (principal investigator M. A. Newton) and R37 CA63677 (principal investigator W. F. Dove). Conflict of Interest: None declared.


    REFERENCES
 

    Beutler E, Collins Z, Irwin L. (1967) Value of genetic variants of glucose-6-phosphate dehydrogenase in tracing the origin of malignant tumors. New England Journal of Medicine 276:389–391.

    Bühler WJ. (1967) Single cell against multicell hypotheses of tumor formation. Fifth Berkeley Symposium on Mathematical Statistics and Probability (University of California Press, Berkeley, CA) Volume IV: pp. 635–637.

    Fearon ER, Hamilton SR, Vogelstein B. (1987) Clonal analysis of human colorectal tumors. Science 238:193–197.

    Garcia SB, Park HS, Novelli M, Wright NA. (1999) Field cancerization, clonality, and epithelial stem cells: the spread of mutated clones in epithelial sheets. Journal of Pathology 187:61–81.

    Griffiths D, Sacco D, Williams GT, Williams ED. (1989) The clonal origin of experimental large bowel tumors. British Journal of Cancer 59:385–387.

    Hsu SH, Luk GD, Krush AJ, Hamilton SR, Hoover HH. (1983) Multiclonal origin of polyps in Gardner's syndrome. Science 221:951–953.

    Linder D and Gartler SM. (1967) Problem of single cell versus multicell origin of a tumor. Fifth Berkeley Symposium on Mathematical Statistics and Probability (University of California Press, Berkeley, CA) Volume IV: pp. 625–633.

    Merritt AJ, Gould KA, Dove WF. (1997) Polyclonal structure of intestinal adenomas in ApcMin/+ mice with concomitant loss of Apc+ from all tumor lineages. Proceedings of the National Academy of Sciences of the United States of America 94:13927–13931.

    Newton MA, Clipson LC, Thliveris AT, Halberg RB. (2006) A statistical test of the hypothesis that polyclonal intestinal tumors arise by random collision of initiated clones. Biometrics 62: doi:10.1111/j.1541-0420.2006.00522.x.

    Novelli M, Williamson JA, Tomlinson IPM, Elia G, Hodgson SV, Talbot IC, Bodmer WF, Wright NA. (1996) Polyclonal origin of colonic adenomas in an XO/XY patient with FAP. Science 272:1187–1190.

    Novelli MR, Cossu A, Oukrif D, Quaglia A, Lakhani S, Poulsom R, Sasieni P, Carta P, Contini M, Pasca A, et al. (2003) X-inactivation patch size in human female tissue confounds the assessment of tumor clonality. Proceedings of the National Academy of Sciences of the United States of America 100:3311–3314.

    Nowell PC. (1976) The clonal evolution of tumor cell populations. Science 194:23–28.

    Patil GP and Rao CR. (1978) Weighted distributions and size biased sampling with application to wildlife populations and human families. Biometrics 34:179–189.

    Playford RJ. (1998) Tales from the crypt—intestinal stem cell repertoire and the origins of human cancer. Journal of Pathology 185:119–122.

    Shih I, Wang TL, Traverso G, Romans K, Hamilton SR, Ben-Sasson S, Kinzler KW, Vogelstein B. (2001) Top-down morphogenesis of colorectal tumors. Proceedings of the National Academy of Sciences of the United States of America 98:2640–2645.

    Thliveris AT, Halberg RB, Clipson LC, Dove WF, Sullivan R, Washington MK, Stanhope S, Newton MA. (2005) Polyclonality of familial murine adenoma: analyses of chimeras at low tumor multiplicity reveal short-range interactions. Proceedings of the National Academy of Sciences of the United States of America 102:6960–6965.

    Vogelstein B, Fearon ER, Hamilton SR, Feinberg AP. (1985) Use of restriction fragment length polymorphisms to determine the clonal origin of human tumors. Science 227:642–645.

    Received June 21, 2005; revised January 5, 2006; accepted for publication February 8, 2006.

