Biostatistics Advance Access originally published online on June 20, 2006
Biostatistics 2007 8(2):323-336; doi:10.1093/biostatistics/kxl012
PLASQ: a generalized linear model-based procedure to determine allelic dosage in cancer cells from SNP array data
Department of Medical Oncology, Dana-Farber Cancer Institute, MA 02115, USA tlafram{at}broad.mit.edu
Department of Biostatistics, Harvard School of Public Health, MA 02115, USA
Department of Medical Oncology, Dana-Farber Cancer Institute, MA 02115, USA
* To whom correspondence should be addressed.
| SUMMARY |
|---|
Human cancer is largely driven by the acquisition of mutations. One class of such mutations is copy number polymorphisms, comprised of deviations from the normal diploid two copies of each autosomal chromosome per cell. We describe a probe-level allele-specific quantitation (PLASQ) procedure to determine copy number contributions from each of the parental chromosomes in cancer cells from single-nucleotide polymorphism (SNP) microarray data. Our approach is based upon a generalized linear model that takes advantage of a novel classification of probes on the array. As a result of this classification, we are able to fit the model to the data using an expectation-maximization algorithm designed for the purpose. We demonstrate a strong model fit to data from a variety of cell types. In normal diploid samples, PLASQ is able to genotype with very high accuracy. Moreover, we are able to provide a generalized genotype in cancer samples (e.g. CCCCT at an amplified SNP). Our approach is illustrated on a variety of lung cancer cell lines and tumors, and a number of events are validated by independent computational and experimental means. An R software package containing the methods is freely available.
Keywords: Allelic imbalance; Cancer genomics; DNA copy number; Expectation-maximization algorithm; Generalized linear model; Single nucleotide polymorphism array
| 1. INTRODUCTION |
|---|
Over the course of the past decade, high throughput probe-based microarray technology has become a vital tool in genomic research. These microarrays contain thousands of unique nucleotide probe sequences, each designed to hybridize to a "target" nucleic acid molecule. When a DNA or RNA sample is properly prepared and applied to the array, specialized equipment can produce a measure of the intensity of hybridization between each probe and its target in the sample. The underlying principle is that the hybridization intensity depends upon the amount of target DNA or RNA in the sample, as well as the affinity between target and probe. Extensive processing and analysis of these raw intensity measures give estimates of some characteristic of the target sequences in the sample. The subject of this paper is the analysis of data from a specific array type, the single-nucleotide polymorphism (SNP) array.
The GeneChip Mapping 100K Set (Affymetrix, 2004) is a pair of arrays able to interrogate over 100 000 human SNPs. Herein, we shall refer to this pair simply as the SNP array. The original aim of the SNP array was to identify which of the two SNP alleles (arbitrarily labeled allele A and allele B) occurs for each chromosome copy (maternal and paternal) at each SNP in an individual's genome. Thus, the individual can be genotyped at a SNP as either homozygous AA, homozygous BB, or heterozygous AB. More recently, it has been demonstrated that these arrays may be used to identify loss-of-heterozygosity (LOH) (Lindblad-Toh and others, 2000; Lin and others, 2004), as well as to produce a measure of genomic copy number at each SNP (Bignell and others, 2004; Zhao and others, 2005), in cancer samples. Regions of LOH are loci at which one of the two parental copies of a chromosome is deleted. Typically, one may use SNP array data to detect LOH at SNPs where the cancer cell is homozygous, but its matched normal (same individual) counterpart is heterozygous. In copy number inference, the goal is to identify chromosomal regions in which the number of copies deviates from the normal diploid two. These lesions include amplifications (copy number greater than two), heterozygous deletions (copy number one), and homozygous deletions (copy number zero).
The SNP array is designed so that each probe is a sequence of length 25 bases, and is a member of a probe set comprised of 40 unique sequences. Within a probe set, half of all probes are "perfect match" (PM) probes. All PM probes within the set are perfectly complementary to some 25-base subsegment of the same target DNA fragment. Additionally, every PM probe has a corresponding "mismatch" (MM) probe that is identical to its PM counterpart, save that the central (13th) base is altered so as not to be perfectly complementary to the target sequence. The PM probes are complementary to either the A or the B allele of the SNP, and thus the SNP array probes have been typically classified as either PMA, PMB, MMA, or MMB. In fact, the probes on the array may be grouped as quartets comprised of one of each of these four classes, with each quartet interrogating the same 25-base subsequence of the target genomic DNA fragment.
In this paper, we provide a generalization of the three applications of SNP arrays: genotyping, LOH detection, and copy number inference. Specifically, we present a probe-level allele-specific quantitation (PLASQ) procedure to infer allele-specific copy number (ASCN) and parent-specific copy number (PSCN). The ASCN is a generalization of both genotype and copy number at a SNP, in that all sample SNPs are assigned a genotype, regardless of copy number. Thus, ASCNs for normal (diploid) regions are simply the usual AA, AB, or BB. However, a SNP in an amplified region may have ASCN AAAAB; a SNP in a heterozygously deleted region may have ASCN B. PSCN, on the other hand, refers to the contributions to copy number of each of the two parental chromosomes. Within this framework, for example, we may more precisely identify LOH as a region in which the PSCNs are (c,0) for some positive integer c.
Our PLASQ procedure is rooted in a generalized linear model for the behavior of probe intensities, exploiting a novel classification of the SNP array probes that is fundamentally different from the usual PMA, MMA, PMB, MMB classification. An earlier version of the procedure (LaFramboise and others, 2005), also termed PLASQ, used a simpler general linear model, and its performance with regard to genotyping and copy number determination was inferior to the version we present here. In the present work, we analyze statistical properties (which were not discussed in our earlier paper) of this updated model, demonstrating the improvements in fit and performance. In light of these improvements, our intent is that the current PLASQ replace the version described in our previous work.
After specifying our model in Section 2, we detail in Section 3 its fitting via an expectation-maximization (EM) algorithm (Dempster and others, 1977) that takes advantage of the inherently discrete nature of the quantity being measured. In Section 4, we apply our approach to a variety of cell types, demonstrating the ability to (a) very accurately genotype over 100 000 SNPs in normal samples as either AA, AB, or BB; (b) determine copy number, genome wide, at a very high resolution in cancer samples; (c) reveal the contributions of each of the two parental chromosomes to the amplifications and deletions in these aberrant samples; and (d) infer ASCNs at each of the SNPs on the array. We provide statistical justification for the suitability of our model, and our in silico results are validated using a variety of independent in silico and in vitro methods. We conclude in Section 5 with a discussion of the relevance of our results in cancer genomics research.
| 2. ARRAY DESIGN AND MODEL SPECIFICATION |
|---|
Studies employing SNP arrays have focused almost exclusively on the PMA, MMA, PMB, and MMB probe classification. However, another classification is relevant. A PM/MM pair may either be centered precisely so that the middle (13th) base of the PM probe is complementary to the SNP site or be offset (by between one and four bases in either direction). The three dichotomizations of the probe set, therefore, leave us with eight probe types: $PM_A^c$, $MM_A^c$, $PM_B^c$, $MM_B^c$, $PM_A^o$, $MM_A^o$, $PM_B^o$, and $MM_B^o$, where the superscript denotes centered (c) or offset (o). Our method focuses on the nucleotide-level affinities between each probe and the two target DNA sequences (corresponding to the two SNP alleles). We can count the number of bases at which each probe mismatches each of the target alleles; indeed, this information is encoded in the .CDF (Chip Definition File) provided by the manufacturer. Each probe mismatches each of the two target alleles by either 0, 1, or 2 bases, and the eight probe classes completely determine these counts. See Supplementary Figure 1, available at Biostatistics online, for a specific example of a probe set.
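The mapping from probe class to mismatch counts can be sketched in a few lines of Python. This is an illustrative reading of the paragraph above, not code from the PLASQ package: the function name and argument encoding are hypothetical, and the sketch assumes that the altered central base of a centered MM probe is complementary to neither allele.

```python
def mismatch_counts(match_type, allele, position):
    """Return (mismatches vs. allele A target, mismatches vs. allele B target).

    match_type: "PM" or "MM"; allele: "A" or "B" (the allele the probe was
    designed for); position: "c" (centered) or "o" (offset).
    Hypothetical helper; the real counts are encoded in the .CDF file.
    """
    if match_type not in ("PM", "MM") or allele not in ("A", "B") or position not in ("c", "o"):
        raise ValueError("unknown probe class")
    # Against its own allele, a PM probe matches perfectly and an MM probe
    # differs only at the altered central (13th) base.
    own = 0 if match_type == "PM" else 1
    if match_type == "MM" and position == "c":
        # Centered MM: the altered base is the SNP base itself, so the two
        # potential mismatches against the other allele coincide (assumption:
        # the altered base matches neither allele).
        other = 1
    else:
        # Otherwise the SNP base adds one further mismatch against the other allele.
        other = own + 1
    return (own, other) if allele == "A" else (other, own)


# Example: an offset MM probe designed for allele A
print(mismatch_counts("MM", "A", "o"))
```

Under these assumptions every class yields counts in {0, 1, 2}, in line with the statement that the eight classes determine the counts.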
Our model is motivated by the following set of principles. First, the relationship between the target quantity and probe intensity is approximately linear (with an additive term) on a log-log scale, as demonstrated in studies involving known quantities of RNA (Irizarry and others, 2003) and genomic DNA (Huang and others, 2004). Second, Irizarry and others (2003) justified, via spike-in studies, a multiplicative stochastic error term on the standard (non-log) scale, as evidenced by larger probe variance at higher intensity levels. Third, within a probe set, each probe is complementary to a subsegment of either the forward or the reverse strand in the target DNA fragment. This "forward" or "reverse" distinction is referred to as the probe's orientation, and empirical evidence indicates differences in hybridization intensities between the orientations. Finally, it is reasonable that, aside from orientation, the main factor determining probe/target hybridization affinity within the same probe set would be the number of bases at which the probe mismatches the target. More specifically, we assume that the hybridization affinity of a target for a probe is a decreasing function of the number of bases at which the probe is not complementary to the target. The exception to this assumption arises in differences in the hybridization affinities of the A and B target fragments. Since the A and B difference represents the only potential significant difference in guanine/cytosine content between the probes in a set, we have accommodated target-allele-specific differences in hybridization affinity in our model.
In an array with J probe sets/SNPs (so J > 100 000 in our case), let $C_{ij}^A$ and $C_{ij}^B$ denote the number of copies of the alleles A and B, respectively, in the ith sample at the jth SNP site (j = 1, ..., J). The model we propose for the normalized, log-transformed intensity $Y_{ijk}$ of probe k in the probe set for SNP j in an array interrogating sample i is

$$Y_{ijk} = \log\left(\gamma_j^{O_{jk}} + \alpha_{jA_{jk}}^{O_{jk}}\, C_{ij}^A + \beta_{jB_{jk}}^{O_{jk}}\, C_{ij}^B\right) + \varepsilon_{ijk}. \qquad (2.1)$$

Here $O_{jk}$ = F (forward) or R (reverse) denotes the orientation of the probe, $A_{jk}$, $B_{jk}$ = 0, 1, or 2 indicate the number of bases at which the probe mismatches the A and B allele targets, respectively, and $\gamma_j^F$, $\gamma_j^R$ represent the unwanted background contributions of optical noise and nonspecific binding to the forward and reverse orientation probe intensities, respectively. One may think of these last terms as representing the signal from a probe whose target is completely absent. The independent, normally distributed, mean zero error terms $\varepsilon_{ijk}$ are meant to capture additional sources of variation. They are assumed to have standard deviation $\sigma_j^F$ when $O_{jk}$ = F and $\sigma_j^R$ when $O_{jk}$ = R. The distributions of these error terms are the same for any fixed values of j and $O_{jk}$, but are allowed to vary for different probe sets and different orientations within the same probe sets. Finally, we have found in practice that hybridization intensities between probes and targets that mismatch at two bases are indistinguishable from background noise, and thus we fix

$$\alpha_{j2}^F = \alpha_{j2}^R = \beta_{j2}^F = \beta_{j2}^R = 0.$$

Thus, the parameters of interest for each probe set/SNP j are $\gamma_j^F$, $\gamma_j^R$, $\alpha_{j0}^F$, $\alpha_{j1}^F$, $\alpha_{j0}^R$, $\alpha_{j1}^R$, $\beta_{j0}^F$, $\beta_{j1}^F$, $\beta_{j0}^R$, and $\beta_{j1}^R$.
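To make the structure of the model concrete, the following minimal Python sketch evaluates the mean of the log-transformed intensity for a single probe. It assumes the reading of the model described in the text: a background term plus mismatch-dependent coefficients multiplied by the allele copy numbers, all inside a logarithm. The function name, argument names, and numerical values are hypothetical and chosen only for illustration.

```python
import math


def expected_log_intensity(c_a, c_b, mm_a, mm_b, background, coef_a, coef_b):
    """Mean log intensity of one probe under the assumed model form.

    c_a, c_b   : copies of allele A and allele B in the sample
    mm_a, mm_b : bases at which the probe mismatches the A and B targets (0, 1, 2)
    background : background term for the probe's orientation (raw scale, > 0)
    coef_a, coef_b : dicts mapping mismatch count (0 or 1) to a coefficient;
                     two-mismatch coefficients are fixed at zero, as in the text
    """
    a = 0.0 if mm_a == 2 else coef_a[mm_a]
    b = 0.0 if mm_b == 2 else coef_b[mm_b]
    return math.log(background + a * c_a + b * c_b)


# Illustrative (made-up) parameter values for one orientation of one SNP
coef_a = {0: 800.0, 1: 200.0}
coef_b = {0: 750.0, 1: 180.0}

# A PM probe for allele A (0 mismatches to A, 1 to B) in a heterozygous AB sample
print(expected_log_intensity(1, 1, 0, 1, 100.0, coef_a, coef_b))
```

With zero copies of both alleles the mean reduces to the log of the background term, which matches the interpretation of that term as the signal from a probe whose target is completely absent.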
| 3. MODEL FITTING AND COPY NUMBER INFERENCE |
|---|
Equation (2.1) models mean log-transformed probe intensity as a log-linear function of copy number. There are some complications to fitting the model. First, the log transformation on the right side of the equation precludes the use of ordinary least squares. However, the model is a generalized linear model (McCullagh and Nelder, 1989), and may therefore be fit by iteratively reweighted least squares (IRLS). Second, we do not know the copy numbers $C_{ij}^A$ and $C_{ij}^B$ a priori. We do know that, in a normal sample, each SNP is in one of three states: AA, AB, or BB. This implies three different covariate combinations, and therefore, an EM algorithm is a natural approach to fitting the model to diploid data. The first step is to quantile normalize (Bolstad and others, 2003) the probe intensities.
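For SNP arrays, normal samples provide a convenient basis for model fitting and testing, as the pairwise ASCN sums $C_{ij}^A + C_{ij}^B$ are known to be two. We exploit this fact to find estimates $\hat\gamma_j^F$, $\hat\gamma_j^R$, $\hat\alpha_{j0}^F$, $\hat\alpha_{j1}^F$, $\hat\alpha_{j0}^R$, $\hat\alpha_{j1}^R$, $\hat\beta_{j0}^F$, $\hat\beta_{j1}^F$, $\hat\beta_{j0}^R$, and $\hat\beta_{j1}^R$ of $\gamma_j^F$, $\gamma_j^R$, $\alpha_{j0}^F$, $\alpha_{j1}^F$, $\alpha_{j0}^R$, $\alpha_{j1}^R$, $\beta_{j0}^F$, $\beta_{j1}^F$, $\beta_{j0}^R$, and $\beta_{j1}^R$, respectively. Model (2.1) may be fit to (normalized) probe intensities using an EM algorithm, and the genotyping inferences automatically result. Details of this procedure are given in the Appendix.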
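The quantile normalization applied to the probe intensities before fitting can be sketched as follows. This is the generic textbook form of the method (each array is forced to share the same empirical distribution, namely the mean of the sorted columns), written with NumPy; it is not the PLASQ implementation, the function name is hypothetical, and ties are broken arbitrarily rather than averaged.

```python
import numpy as np


def quantile_normalize(intensities):
    """Quantile normalize a (probes x arrays) matrix of intensities.

    Each column (array) is replaced by the reference distribution obtained
    by averaging the sorted columns, assigned back according to rank.
    """
    x = np.asarray(intensities, dtype=float)
    order = np.argsort(x, axis=0)          # indices that sort each column
    ranks = np.argsort(order, axis=0)      # rank of every entry within its column
    reference = np.sort(x, axis=0).mean(axis=1)  # mean of the k-th smallest values
    return reference[ranks]


# Three probes measured on two arrays
raw = np.array([[5.0, 4.0],
                [2.0, 1.0],
                [3.0, 6.0]])
print(quantile_normalize(raw))
```

After normalization every column contains the same set of values, so differences between arrays reflect rank changes of individual probes rather than array-wide intensity shifts.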
Supplementary Figure 2, available at Biostatistics online, gives a diagrammatic overview of the procedure to obtain ASCNs and PSCNs from (normalized) probe-level data from tumor sample $i_0$. We assume that parameters have been estimated as above from a battery of normal samples, and we replace the parameters in the model with these estimates at each SNP. Our model becomes

$$Y_{i_0jk} = \log\left(\hat\gamma_j^{O_{jk}} + \hat\alpha_{jA_{jk}}^{O_{jk}}\, C_{i_0j}^A + \hat\beta_{jB_{jk}}^{O_{jk}}\, C_{i_0j}^B\right) + \varepsilon_{i_0jk}. \qquad (3.1)$$
We may now obtain raw ASCN inferences ($\hat C_{i_0j}^A$, $\hat C_{i_0j}^B$) via IRLS as applied to Model (3.1). In effect, we are treating the covariates $C_{i_0j}^A$ and $C_{i_0j}^B$ as parameters to be estimated. The ASCN inferences at this stage are "raw" because we have not yet taken advantage of the fact that total copy number is locally constant; that is, chromosomal copy number aberrations occur in discrete segments, typically spanning many consecutive SNP sites. We may therefore apply a smoothing or break point procedure to the pairwise sums of the raw ASCNs, mapped to their genomic locations. For our study, we have employed the GLAD algorithm (Hupé and others, 2004) because of its sensitivity, specificity, and computational efficiency. GLAD attempts to detect chromosomal segments with constant total copy number using an adaptive weights smoothing (Polzehl and Spokoiny, 2000) break point-detection algorithm. Our inferred total copy number $T_{i_0s}$ for a GLAD-determined segment s is the rounded median of the pairwise raw ASCN sums in the segment.
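The raw ASCN step, in which the two copy numbers at one SNP are treated as unknowns with all other parameters held at their estimates, can be illustrated with a small NumPy sketch. It uses a damped Gauss-Newton iteration, which for a Gaussian-error model of this kind is closely related to the IRLS fit named above, although this sketch is not the authors' implementation. The function name, the starting value, the non-negativity clipping, and the coefficient values are all assumptions made for illustration.

```python
import numpy as np


def fit_raw_ascn(y, background, coef_a, coef_b, max_iter=100, tol=1e-10):
    """Estimate (C_A, C_B) at one SNP from log intensities y of its probes.

    background, coef_a, coef_b hold, per probe, the estimated background term
    and the A- and B-allele coefficients (raw intensity scale).
    """
    y = np.asarray(y, dtype=float)
    background = np.asarray(background, dtype=float)
    X = np.column_stack([coef_a, coef_b]).astype(float)

    def sse(c):
        return float(np.sum((y - np.log(background + X @ c)) ** 2))

    c = np.array([1.0, 1.0])  # start from one copy of each allele (assumption)
    for _ in range(max_iter):
        eta = background + X @ c          # mean on the raw intensity scale
        jac = X / eta[:, None]            # derivative of log(eta) w.r.t. c
        resid = y - np.log(eta)
        step, *_ = np.linalg.lstsq(jac, resid, rcond=None)
        t, c_new = 1.0, c
        while t > 1e-8:                   # halve the step until the fit improves
            trial = np.clip(c + t * step, 0.0, None)  # copy numbers are non-negative
            if sse(trial) <= sse(c):
                c_new = trial
                break
            t /= 2.0
        converged = np.max(np.abs(c_new - c)) < tol
        c = c_new
        if converged:
            break
    return c


# Noise-free toy probe set: coefficients depend on mismatch counts (made-up values)
coef_a = np.array([800, 200, 200, 200, 800, 200, 200, 0], dtype=float)
coef_b = np.array([200, 200, 800, 200, 200, 0, 800, 200], dtype=float)
y = np.log(100.0 + coef_a * 3 + coef_b * 1)   # true ASCN is (3, 1)
print(fit_raw_ascn(y, 100.0, coef_a, coef_b))
```

The estimates are real-valued; in the procedure described above they are only rounded after segmentation, once the total copy number of the segment is known.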
Next, we infer PSCNs in each segment s from inferred total copy number $T_{i_0s}$ and raw ASCNs as follows. First, if the inferred total copy number is 0 or 1, then our PSCN calls are necessarily (major chromosome, minor chromosome) = (0,0) or (1,0), respectively. If not, we next decide whether LOH has occurred. When a matched normal sample is available, this is easily determined by checking for homozygosity at SNPs that are heterozygous in the matched normal. In the absence of a matched normal sample, we make use of the fact that the average heterozygosity rate for SNPs on the array is approximately 30% (Affymetrix, 2004). Therefore, we may think of the number of homozygous SNPs in a segment with m SNPs as an approximate Binomial(m, 0.7) variable. Making a Bonferroni correction for the number S of segments, we call LOH for segments in which the number of homozygous SNPs is greater than the 1 − 0.05/S quantile of the Binomial(m, 0.7) distribution (here a SNP j is assumed to be homozygous when the rounded minimum($\hat C_{i_0j}^A$, $\hat C_{i_0j}^B$) is less than one). If LOH is deemed to have occurred, our PSCNs for the segment are ($T_{i_0s}$, 0). Otherwise, we ignore homozygous SNPs, as they are noninformative with regard to PSCN, and our PSCN call is ($T_{i_0s} - \hat m_{i_0s}$, $\hat m_{i_0s}$), where

$$\hat m_{i_0s} = T_{i_0s} \cdot \frac{\sum_j \min\left(\hat C_{i_0j}^A, \hat C_{i_0j}^B\right)}{\sum_j \left(\hat C_{i_0j}^A + \hat C_{i_0j}^B\right)}$$

rounded to the nearest integer. Both sums in this expression are taken over all heterozygous SNPs j in segment s.
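The Bonferroni-corrected binomial rule for calling LOH without a matched normal can be written out directly. The sketch below uses only the Python standard library; the function names are hypothetical, while the 0.7 homozygosity rate and the 0.05 level are the values quoted in the text.

```python
from math import comb


def binomial_quantile(q, m, p):
    """Smallest k such that P(X <= k) >= q for X ~ Binomial(m, p)."""
    cdf = 0.0
    for k in range(m + 1):
        cdf += comb(m, k) * p**k * (1 - p) ** (m - k)
        if cdf >= q:
            return k
    return m


def call_loh(n_homozygous, m, n_segments, hom_rate=0.7, alpha=0.05):
    """Call LOH when the homozygous count exceeds the 1 - alpha/S quantile."""
    threshold = binomial_quantile(1 - alpha / n_segments, m, hom_rate)
    return n_homozygous > threshold


# A 10-SNP segment, with a single segment tested
print(binomial_quantile(0.95, 10, 0.7))
print(call_loh(10, 10, 1), call_loh(9, 10, 1))
```

One consequence visible in this example is that short segments can only be called LOH when essentially every SNP is homozygous, because the upper tail of Binomial(m, 0.7) is heavy for small m.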
Finally, we determine ASCNs from PSCNs and raw ASCNs at each SNP j. If the SNP is heterozygous, then the ASCNs are the same as the PSCNs, with the copy number of the major SNP allele (as determined by raw ASCNs) identical to that of the major parental chromosome segment. If the SNP is homozygous, the allele with the higher raw ASCN is assigned ASCN T(i0s), and the other 0.
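The last two bookkeeping steps, taking the rounded median of the pairwise raw ASCN sums in a segment and then converting PSCNs into per-SNP ASCNs, are simple enough to sketch in plain Python. Function names are hypothetical, and the choice to round halves upward and to break exact ties in favor of allele A are assumptions the text does not specify.

```python
import math
from statistics import median


def round_half_up(x):
    """Round to the nearest integer, with halves rounded upward (assumption)."""
    return int(math.floor(x + 0.5))


def segment_total_copy_number(raw_pairs):
    """Rounded median of the pairwise raw ASCN sums over the SNPs in a segment."""
    return round_half_up(median(a + b for a, b in raw_pairs))


def assign_ascn(raw_a, raw_b, pscn, heterozygous):
    """Return (ASCN of allele A, ASCN of allele B) at one SNP.

    pscn is the segment's (major, minor) parent-specific copy number pair.
    """
    major, minor = pscn
    if heterozygous:
        # The allele with the larger raw ASCN sits on the major chromosome.
        return (major, minor) if raw_a >= raw_b else (minor, major)
    # Homozygous SNP: all copies carry the allele with the larger raw ASCN.
    total = major + minor
    return (total, 0) if raw_a >= raw_b else (0, total)


segment = [(2.1, 0.8), (3.2, 1.1), (2.7, 0.9)]
print(segment_total_copy_number(segment))
print(assign_ascn(0.9, 3.1, (3, 1), heterozygous=True))
print(assign_ascn(3.8, 0.2, (3, 1), heterozygous=False))
```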
| 4. APPLICATION TO NORMAL AND CANCER DATA |
|---|
The SNP array data are encoded in a pair of .cel files (one for each chip type) for each sample. We employed data from 21 normal samples in our study. These data include 24 .cel files from Zhao and others (2005) that corresponded to all the normal samples in that study, as well as 18 .cel files (corresponding to samples NA6985, NA6991, NA6993, NA12707, NA12716, NA12717, NA12801, NA12812, and NA12813) that were generated as part of the International HapMap Project (http://www.hapmap.org). The latter samples, which we refer to as the HapMap data set, are available for download at the Affymetrix web site (http://www.affymetrix.com). For cancer samples, we used .cel files from 12 lung tumors and cell lines (see Tables 2 and 3) that were generated in Zhao and others (2005).
To validate the assumptions of our model, we first fit it to the HapMap data set. We examined the residuals from the model to check the assumption of normally distributed error terms. Note that, although the error terms are assumed to be identically distributed within same-orientation subsets of a probe set, their variances are allowed to differ across probe sets and orientations. We, therefore, constructed a normal quantile-quantile (qq) plot (Figure 1a) of the standardized residuals, with the understanding that the model implies a standard normal distribution for these across all probe sets. For clarity, we randomly selected 10 000 such residuals to plot. To demonstrate the necessity of the log-log transformation, we also plotted the standardized residuals resulting from fitting the linear model

$$Y_{ijk} = \gamma_j^{O_{jk}} + \alpha_{jA_{jk}}^{O_{jk}}\, C_{ij}^A + \beta_{jB_{jk}}^{O_{jk}}\, C_{ij}^B + \varepsilon_{ijk}, \qquad (4.1)$$

where $Y_{ijk}$ now denotes the normalized, but untransformed, probe intensity. We note that the model in LaFramboise and others (2005) was similar to (4.1), but even simpler: it did not allow for different coefficients for the $C^A$ and $C^B$ terms, and thus forced the coefficients of the two terms to be equal for each j = 1, ..., J and k = 1, ..., 40. We fit (4.1) using the EM algorithm as with Model (2.1), except that the M-step involves ordinary least squares rather than IRLS. The resulting qq plot (Figure 1b) clearly shows a severe departure from normality. This demonstrates the improvement of our new generalized linear model-based approach over the previous work.
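For readers who wish to reproduce this kind of diagnostic, the coordinates of a normal qq plot can be computed with the Python standard library alone. The sketch below pairs sorted standardized residuals with standard normal quantiles at the plotting positions (i + 0.5)/n; that choice of plotting position, the function name, and the example residuals are assumptions, and the figures in this paper were not produced with this code.

```python
from statistics import NormalDist


def qq_points(standardized_residuals):
    """Return (theoretical quantile, sample quantile) pairs for a normal qq plot."""
    sample = sorted(standardized_residuals)
    n = len(sample)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, sample))


# Hypothetical standardized residuals
for theo, obs in qq_points([0.3, -1.2, 0.9, -0.1, 1.8]):
    print(f"{theo:+.3f}  {obs:+.3f}")
```

If the normality assumption holds, the points fall close to the identity line; systematic curvature in the tails is the kind of departure reported for the untransformed linear model.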
As mentioned above, probes on SNP arrays have traditionally been classified in PM/MM or A allele/B allele terms. The advantage of our approach, classifying probes by base mismatch count, can be seen in Figure 2. The first scatterplot shows the mean $MM_A^c$ intensity versus the mean $MM_A^o$ intensity across 10 782 HapMap sample SNPs. Each point represents one orientation (F or R) of one SNP for one sample. The means are taken over all $MM_A^c$ or $MM_A^o$ probe intensities (one probe type per axis) for the given orientation/SNP/sample. Each point is colored according to HapMap genotype. Although the traditional classification treats these two probe types as being equivalent measures, there is clearly a separation of the three genotypes visible in the plot. As expected, the centered probes generally have a greater affinity for the B target than the offset probes, and both types have roughly the same affinity for the A target. This effect is even more dramatic when the estimated background term is subtracted, as shown in Figure 2b. These figures show that the practice of ignoring MM probes, as some approaches do, in fact discards relevant information. Moreover, if we construct a similar plot for $MM_A^c$ versus $MM_B^c$ (Figure 2c), no separation of the genotypes is discernible, even though the traditional classification would treat these two intensities as being measures of separate quantities.
Many of the SNPs in the HapMap data set have been independently genotyped, using a variety of genotyping platforms. Of these, 1198 were genotyped by at least two different HapMap centers. Calls that were concordant among at least two different centers may be considered as being very close to ground truth, and we employed these as the "gold standard" data set against which we compared our PLASQ method. As shown in Table 1, our method performs quite well. The rate of agreement between PLASQ and the HapMap concordant calls is similar to the HapMap Project's concordance rate, and our No Call rate is considerably lower. We should note that the 16 sample SNPs in Table 1 for which PLASQ called AA and the HapMap effort called BB are all from the same two SNP loci. Close inspection of the raw array data from these SNPs reveals a strong AA signal (data not shown). Thus, we suspect that this is simply a case of an error being made by Affymetrix when the "A" and "B" labels were assigned to the nucleotide residues. In any case, the results in the table clearly indicate that the model captures the relevant aspects of the data, and underscore the validity of our EM fitting approach.
We applied our PLASQ method to SNP array data from 12 lung cancer samples, using the 12 diploid samples from the same study as normal references on which to train the model. Figure 3 shows an example of a genome-wide view of PSCN for one of these samples, the cell line H2087. Note that LOH is clearly identifiable as a region comprised of only the major chromosome (all green). For example, all of one copy of chromosome 13 appears to be lost, though the total copy number remains at two. This phenomenon is referred to as copy-neutral LOH.
To assess the accuracy of our method, we compared our results to polymerase chain reaction (PCR) -based copy number estimates. A total of 16 deletions and 10 amplifications in our 12 lung cancer samples were previously PCR-measured in Zhao et al. (2005)
| 5. DISCUSSION |
|---|
Human cancer is driven by the acquisition of genomic changes in the cell. One extremely important class of such changes is amplifications and deletions, that is, deviations from the normal two copies of each chromosome in a cell. Regions of amplification may harbor cancer-causing oncogenes, while deletions often contain tumor suppressor genes. The localization of such alterations is, therefore, a central goal in cancer research. We have presented a procedure, PLASQ, for determining the copy numbers of SNP alleles and parental chromosomes in cancer cells from SNP array data. Our SNP allele copy number result is particularly of interest in LOH determination, since existing methods often mistakenly call LOH where in fact allelic imbalance (due to amplification of one allele) has occurred, resulting in apparent (though false) homozygosity. We avoid these false LOH calls by taking into account the contribution to copy number from both alleles. Two recent papers (Ishikawa and others, 2005
Finally, we should mention two potential weaknesses of our approach. First, we are assuming a diploid copy number two in autosomal chromosomes of normal cells. Recent studies (Iafrate and others, 2004; Sebat and others, 2004) have uncovered copy number polymorphisms in normal cells. Given that our approach (and all others that we are aware of) compares signal intensities to normal references, this could in theory present a problem. In practice, however, we feel that this problem is mitigated by the fact that we use a sizable collection of normal reference samples, and that polymorphic genomic regions common to most normal reference samples are likely rare, small in length, or both. A second concern is our practice of fitting the model to normal samples and then applying the result to data from tumors. We are implicitly assuming that the model parameters are appropriate outside of the range of covariates with which they were estimated. Although this is indeed a concern (and may be partially to blame for copy number underestimation of high-level amplifications), we would argue that the results shown in Tables 2 and 3 demonstrate the value of the model, even for aberrant copy numbers.
All procedures described herein are available in an R (R Development Core Team, 2006) package, freely downloadable at http://genome.dfci.harvard.edu/
tlaframb/PLASQ.
| APPENDIX A |
|---|
We describe in detail the EM approach to fitting Model (2.1) to probe-level SNP array data from normal samples.
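Fix an arbitrarily chosen SNP $j_0$. Suppose that we have N normal samples. For i = 1, ..., N and l = 0, 1, 2, let $Z_{ij_0l}$ denote the (unobserved) indicator variable $I(C_{ij_0}^A = l)$. Model (2.1) may be rewritten, using this notation, as

$$Y_{ij_0k} = \log\left(\gamma_{j_0}^{O_{j_0k}} + \sum_{l=0}^{2} Z_{ij_0l}\left[\, l\,\alpha_{j_0A_{j_0k}}^{O_{j_0k}} + (2-l)\,\beta_{j_0B_{j_0k}}^{O_{j_0k}}\right]\right) + \varepsilon_{ij_0k}. \qquad (A.1)$$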
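We think of the $Z_{ij_0l}$ as missing data, whose values provide the genotypes of our samples. Let $\phi(x \mid \mu, \sigma)$ denote the density function of the normal distribution with mean $\mu$ and variance $\sigma^2$, and let $Y_{ij_0}$ denote the data vector $(Y_{ij_0k})_{k=1,\ldots,40}$ from probe set $j_0$ for sample i. For l = 0, 1, 2, let $p_{j_0l}$ denote the (unknown) proportion of samples for which $C_{ij_0}^A$ is l at the SNP $j_0$. We consider the $p_{j_0l}$ to be part of the set $\theta$ of parameters (which also includes the $\alpha$, $\beta$, $\gamma$, and $\sigma$ model parameters) to be estimated during the M-step. It follows from (A.1) that the density function for $Y_{ij_0}$ is

$$f(y \mid \theta) = \sum_{l=0}^{2} p_{j_0l} \prod_{k=1}^{40} \phi\left(y_k \mid \mu_{j_0kl},\, \sigma_{j_0}^{O_{j_0k}}\right),$$

where

$$\mu_{j_0kl} = \log\left(\gamma_{j_0}^{O_{j_0k}} + l\,\alpha_{j_0A_{j_0k}}^{O_{j_0k}} + (2-l)\,\beta_{j_0B_{j_0k}}^{O_{j_0k}}\right).$$

We refer to the vector $(Y_{ij_0}, Z_{ij_0}) = (Y_{ij_0k}, Z_{ij_0l})_{k=1,\ldots,40;\; l=0,1,2}$ as the complete data vector. The complete data density is

$$f_c(y, z \mid \theta) = g(z \mid \theta)\, h(y \mid z, \theta), \qquad (A.2)$$

where

$$g(z \mid \theta) = \prod_{l=0}^{2} p_{j_0l}^{\,z_l}, \qquad h(y \mid z, \theta) = \prod_{l=0}^{2} \left[\prod_{k=1}^{40} \phi\left(y_k \mid \mu_{j_0kl},\, \sigma_{j_0}^{O_{j_0k}}\right)\right]^{z_l},$$

and $z = (z_0, z_1, z_2)$.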
We have found our procedure to be somewhat sensitive to starting values for the missing data. Therefore, rather than randomly assigning these values as a first step, we use a reasonable yet crude t-test approach to provide initial values $z_{ij_0l}^{(0)}$ of the expectations of the $Z_{ij_0l}$. For each i, a one-sided t-test is performed for the null hypothesis that the mean of the (normalized, log-transformed) PMA probe intensities is larger than that of the PMB probes. Let P denote the resulting P-value. If P
0.5, we assign initial probabilities
, we assign
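For the mth M-step, we consider the complete data log-likelihood, assuming the current expectations $z_{ij_00}^{(m-1)}$, $z_{ij_01}^{(m-1)}$, and $z_{ij_02}^{(m-1)}$ for the values of the missing data along with the observed data $Y_{ij_0} = y_{ij_0}$. By the factorization in expression (A.2), this log-likelihood can be written as

$$\ell(\theta) = \sum_{i=1}^{N} \sum_{l=0}^{2} z_{ij_0l}^{(m-1)} \log p_{j_0l} \;+\; \sum_{i=1}^{N} \sum_{l=0}^{2} z_{ij_0l}^{(m-1)} \sum_{k=1}^{40} \log \phi\left(y_{ij_0k} \mid \mu_{j_0kl},\, \sigma_{j_0}^{O_{j_0k}}\right),$$

with $\mu_{j_0kl}$ the mean of $Y_{ij_0k}$ implied by (A.1) when $Z_{ij_0l} = 1$.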
On the right side of this equation, the $p_{j_0l}$ appear only in the first term, while the $\alpha$, $\beta$, $\gamma$, and $\sigma$ parameters appear only in the second term. Thus, we may maximize each term separately. It is easy to see that the first expression, subject to the constraint $p_{j_00} + p_{j_01} + p_{j_02} = 1$, is maximized at the values

$$\hat p_{j_0l}^{(m)} = \frac{1}{N} \sum_{i=1}^{N} z_{ij_0l}^{(m-1)}, \qquad l = 0, 1, 2.$$

The maximum likelihood estimates for the model parameters may be computed using IRLS, as applied to Model (A.1) with the $Z_{ij_0l}$ replaced by $z_{ij_0l}^{(m-1)}$ and the $Y_{ij_0}$ by $y_{ij_0}$.
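We find the expected values $z_{ij_0l}^{(m)}$ of the $Z_{ij_0l}$ based on the mth M-step parameter estimates $\hat\theta^{(m)}$. Given that the value of $Z_{ij_0l}$ is either 0 or 1, we have

$$z_{ij_0l}^{(m)} = E\left[Z_{ij_0l} \mid Y_{ij_0} = y_{ij_0}\right] = P\left(Z_{ij_0l} = 1 \mid Y_{ij_0} = y_{ij_0}\right).$$

By Bayes' Theorem and (A.2), we have

$$z_{ij_0l}^{(m)} = \frac{p_{j_0l} \prod_{k=1}^{40} \phi\left(y_{ij_0k} \mid \mu_{j_0kl},\, \sigma_{j_0}^{O_{j_0k}}\right)}{\sum_{l'=0}^{2} p_{j_0l'} \prod_{k=1}^{40} \phi\left(y_{ij_0k} \mid \mu_{j_0kl'},\, \sigma_{j_0}^{O_{j_0k}}\right)},$$

where the density functions use $\hat\theta^{(m)}$ for their parameter values.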
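The E-step and the closed-form part of the M-step amount to a standard three-component Gaussian mixture update, which the following NumPy sketch illustrates for a single SNP. It is not the PLASQ package code: the function names, array layout, and toy numbers are assumptions, the state means are supplied directly rather than derived from fitted model parameters, and the computation is done on the log scale purely for numerical stability.

```python
import numpy as np


def e_step(y, mu, sigma, p):
    """Posterior genotype-state probabilities for each sample at one SNP.

    y     : (N, K) log intensities, N samples by K probes
    mu    : (3, K) probe means under each of the three diploid states
    sigma : (K,)   probe standard deviations (orientation-specific)
    p     : (3,)   current state proportions, all assumed positive
    Returns an (N, 3) array whose rows sum to one.
    """
    y, mu = np.asarray(y, float), np.asarray(mu, float)
    sigma, p = np.asarray(sigma, float), np.asarray(p, float)
    resid = (y[:, None, :] - mu[None, :, :]) / sigma
    log_phi = -0.5 * resid**2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)
    log_joint = np.log(p)[None, :] + log_phi.sum(axis=2)
    log_joint -= log_joint.max(axis=1, keepdims=True)   # guard against underflow
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)


def update_proportions(z):
    """M-step update for the state proportions: the mean expectation per state."""
    return np.asarray(z, float).mean(axis=0)


# Toy example with four probes and three samples, one in each state
mu = np.array([[6.0, 6.0, 8.0, 8.0],
               [7.0, 7.0, 7.0, 7.0],
               [8.0, 8.0, 6.0, 6.0]])
sigma = np.full(4, 0.2)
z = e_step(mu.copy(), mu, sigma, np.full(3, 1 / 3))
print(np.round(z, 3))
print(update_proportions(z))
```

The remaining part of the M-step, re-estimating the model coefficients by IRLS with the expectations used as weights, is omitted here.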
The E- and M-steps are alternated repeatedly until the changes in the estimates are very small, say after $m_0$ steps. In this way, we obtain two important results. First, model parameter estimates are produced, which can be used in (2.1) to fit to SNP data from any sample, producing raw ASCN estimates at the SNP as demonstrated in Section 3.2. Second, the $z_{ij_0l}^{(m_0)}$ may be used to infer genotypes for the normal samples. If a call is desired for sample i, a simple rule would be

$$\text{genotype of sample } i = \begin{cases} \text{BB} & \text{if } \arg\max_l z_{ij_0l}^{(m_0)} = 0, \\ \text{AB} & \text{if } \arg\max_l z_{ij_0l}^{(m_0)} = 1, \\ \text{AA} & \text{if } \arg\max_l z_{ij_0l}^{(m_0)} = 2. \end{cases}$$

This scheme automatically provides a way to measure uncertainty in the genotype calls. The researcher may set a threshold for the value of $\max_l\left(z_{ij_0l}^{(m_0)}\right)$, below which the call is considered uncertain and a "No Call" determination is given. In practice, we have found 99% to be a suitable such threshold.
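The calling rule with a No Call threshold is short enough to state as code. In this sketch the function name is hypothetical, and the indexing convention (state l is the number of copies of allele A, so l = 0, 1, 2 map to BB, AB, AA) is an assumption about the notation; the 0.99 default is the threshold quoted above.

```python
def call_genotype(z_row, threshold=0.99):
    """Call a genotype from the three converged state expectations of one sample.

    z_row[l] is the expectation for state l, taken here to mean l copies of
    allele A (assumed convention). Returns "NoCall" if the largest expectation
    falls below the threshold.
    """
    labels = ("BB", "AB", "AA")
    best = max(range(3), key=lambda l: z_row[l])
    if z_row[best] < threshold:
        return "NoCall"
    return labels[best]


print(call_genotype([0.001, 0.004, 0.995]))
print(call_genotype([0.600, 0.400, 0.000]))
```

Raising the threshold trades a higher No Call rate for fewer erroneous calls, which is the trade-off reflected in the comparison with the HapMap calls.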
| ACKNOWLEDGMENTS |
|---|
We wish to acknowledge the contributions of the referees and editors, whose insightful comments resulted in a much improved paper. We also thank Matthew Meyerson for support and guidance during the early development of this work. David Harrington was supported by National Institute of Allergy and Infectious Diseases grant 2R01 AI052817. Conflict of Interest: None declared.
| REFERENCES |
|---|
Affymetrix. (2004) GeneChip Human Mapping 100K Set Data Sheet (Affymetrix Inc, Santa Clara, CA).
Bignell GR, Huang J, Greshock J, Watt S, Butler A, West S, Grigorova M, Jones KW, Wei W, Stratton MR and others. (2004) High-resolution analysis of DNA copy number using oligonucleotide microarrays. Genome Research 14:287-95.
Bolstad BM, Irizarry RA, Astrand M, Speed TP. (2003) A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19:185-93.
Dempster AP, Laird NM, Rubin DB. (1977) Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39:1-38.
Huang J, Wei W, Zhang J, Liu G, Bignell GR, Stratton MR, Futreal PA, Wooster R, Jones KW, Shapero MH. (2004) Whole genome DNA copy number changes identified by high density oligonucleotide arrays. Human Genomics 1:287-99.
Hupé P, Stransky N, Thiery JP, Radvanyi F, Barillot E. (2004) Analysis of array CGH data: from signal ratio to gain and loss of DNA regions. Bioinformatics 20:3413-22.
Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y, Scherer SW, Lee C. (2004) Detection of large-scale variation in the human genome. Nature Genetics 36:949-51.
Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP. (2003) Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4:249-64.
Ishikawa S, Komura D, Tsuji S, Nishimura K, Yamamoto S, Panda B, Huang J, Fukayama M, Jones KW, Aburatani H. (2005) Allelic dosage analysis with genotyping microarrays. Biochemical and Biophysical Research Communications 333:1309-14.
LaFramboise TL, Weir BA, Zhao X, Beroukhim R, Li C, Harrington D, Sellers WR, Meyerson M. (2005) Allele-specific amplification in cancer revealed by SNP array analysis. PLoS Computational Biology 1:e65.
Lin M, Wei LJ, Sellers WR, Lieberfarb M, Wong WH, Li C. (2004) dChipSNP: significance curve and clustering of SNP-array-based loss-of-heterozygosity data. Bioinformatics 20:1233-40.
Lindblad-Toh K, Tannenbaum DM, Daly MJ, Winchester E, Lui WO, Villapakkam A, Stanton SE, Larsson C, Hudson TJ, Johnson BE and others. (2000) Loss-of-heterozygosity analysis of small-cell lung carcinomas using single-nucleotide polymorphism arrays. Nature Biotechnology 18:1001-5.
McCullagh P and Nelder JA. (1989) Generalized Linear Models, 2nd edition (CRC Press, Boca Raton, FL).
Naef F, Socci ND, Magnasco M. (2003) A study of accuracy and precision in oligonucleotide arrays: extracting more signal at large concentrations. Bioinformatics 19:178-84.
Nannya Y, Sanada M, Nakazaki K, Hosoya N, Wang L, Hangaishi A, Kurokawa M, Chiba S, Bailey DK, Kennedy GC and others. (2005) A robust algorithm for copy number detection using high-density oligonucleotide single nucleotide polymorphism genotyping arrays. Cancer Research 65:6071-9.
Polzehl J and Spokoiny V. (2000) Adaptive weights smoothing with applications to image restoration. Journal of the Royal Statistical Society, Series B 62:335-54.
R Development Core Team. (2006) R: a language and environment for statistical computing (R Foundation for Statistical Computing, Vienna, Austria). http://www.R-project.org.
Sebat J, Lakshmi B, Troge J, Alexander J, Young J, Lundin P, Maner S, Massa H, Walker M, Chi M and others. (2004) Large-scale copy number polymorphism in the human genome. Science 305:525-8.
Zhao X, Weir BA, LaFramboise T, Lin M, Beroukhim R, Garraway L, Beheshti J, Lee JC, Naoki K, Richards WG and others. (2005) Homozygous deletions and chromosome amplifications in human lung carcinomas revealed by single nucleotide polymorphism array analysis. Cancer Research 65:5561-70.
Received January 26, 2006; revised May 11, 2006; accepted for publication June 16, 2006.