Skip Navigation


Biostatistics Advance Access originally published online on May 18, 2007
Biostatistics 2008 9(1):18-29; doi:10.1093/biostatistics/kxm013
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow All Versions of this Article:
9/1/18    most recent
kxm013v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Right arrow Disclaimer
Google Scholar
Right arrow Articles by Tibshirani, R.
Right arrow Articles by Wang, P.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Tibshirani, R.
Right arrow Articles by Wang, P.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2007. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org.

Spatial smoothing and hot spot detection for CGH data using the fused lasso

Robert Tibshirani*

Deptartments of Health, Research & Policy, and Statistics, Stanford University Stanford, CA 94305, USA tibs{at}stat.stanford.edu

Pei Wang

Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, M2-B500, PO Box 19024, Seattle, WA 98109, USA

* To whom correspondence should be addressed.


    SUMMARY
 TOP
 SUMMARY
 1. INTRODUCTION
 2. THE FUSED LASSO...
 2.4 Computational considerations
 3. COMPARISON TO OTHER...
 4. SOME RESULTS
 5. DISCUSSION AND FUTURE...
 REFERENCES
 
We apply the "fused lasso" regression method of (TSRZ2004) to the problem of "hot- spot detection", in particular, detection of regions of gain or loss in comparative genomic hybridization (CGH) data. The fused lasso criterion leads to a convex optimization problem, and we provide a fast algorithm for its solution. Estimates of false-discovery rate are also provided. Our studies show that the new method generally outperforms competing methods for calling gains and losses in CGH data.

Keywords: DNA copy number; Signal detection


    1. INTRODUCTION
 TOP
 SUMMARY
 1. INTRODUCTION
 2. THE FUSED LASSO...
 2.4 Computational considerations
 3. COMPARISON TO OTHER...
 4. SOME RESULTS
 5. DISCUSSION AND FUTURE...
 REFERENCES
 
In this paper, we apply the fused lasso method (Tibshirani and others, 2004) to the "hot-spot" detection problem in comparative genomic hybridization (CGH) data. The CGH signal is approximated by a piecewise function that has relatively sparse areas with nonzero values. Hence, the method is useful for determining which areas of the signal are likely to be nonzero.

CGH is a technique for measuring DNA copy numbers of selected genes on the genome. In cells with cancer, mutations can cause a gene to be either deleted from the chromosome or amplified, that is, there are extra DNA copies of the gene. The CGH array experiments return the log2 ratio between the number of DNA copies of the gene in the tumor cells and the number of DNA copies in the reference cells. A value greater than zero indicates a possible gain in DNA copies of that gene, while a value less than zero suggests possible losses.

The results of a CGH experiment are often interpreted by a biologist, but this is time consuming and not necessarily very accurate. In recent years, a number of algorithms have been developed for automatic interpretation, including Fridlyand and others (2004)Go, Olshen & Venkatraman (2004)Go, Myers and others (2004)Go, Wang and others (2005)Go, Lipson and others (2005)Go, and Picard and others (2005)Go; a comprehensive comparison is given by Lai and others (2005)Go. Approaches include successive top–down splitting, bottom–up agglomeration along the chromosome, and hidden Markov models.

The left panel of Figure 1 shows an example.


Figure 1
View larger version (11K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1. Fused lasso applied to some GBM data. The data are shown in the left panel, and the solid line in the right panel represents the inferred copy number Formula from the fused lasso. The grey line is for y = 0. 1

 
The data represent CGH measurements from 2 glioblastoma multiforme (GBM) tumors (see Section 4.2 for details).

In this paper, we apply the fused lasso method of Tibshirani and others (2004) to spatial smoothing and the CGH detection problem. The solid line in the right panel of Figure 1 shows the result of the fused lasso method applied to these data. The method has successfully detected the narrow regions of gain and the wide regions of loss.


    2. THE FUSED LASSO AND THE PROPOSED METHOD FOR HOT-SPOT DETECTION
 TOP
 SUMMARY
 1. INTRODUCTION
 2. THE FUSED LASSO...
 2.4 Computational considerations
 3. COMPARISON TO OTHER...
 4. SOME RESULTS
 5. DISCUSSION AND FUTURE...
 REFERENCES
 
The "fused lasso" (Tibshirani and others, 2004) is a generalization of the lasso (Tibshirani, 1996Go) and is defined by

Formula (2.1)

The fused lasso is developed for the situations when xi1,xi2...,xip have some kind of natural ordering. The additional fused constraint encourages the flatness of the coefficient profiles βj as a function of j.

Here, we apply the fused lasso in the special case in which {xij} is the identity matrix. Hence, we are smoothing the sequence y1,y2,...,yn along the 1-dimensional index i.

Denote the log2 ratio measurement of a chromosome (or chromosome arm) as Y = {yi}Formula; we are interested in finding coefficients Formula satisfying

Formula (2.2)

and Formula is the inferred DNA copy number for gene j. Here, s1 controls the overall DNA copy number alteration amount of the target chromosome (or chromosome arm), while s2 controls the frequency of the alterations in the target region.

We investigated 2 different methods for estimating s1,s2 and calling gains and losses.

2.1 Method 1

The first method estimates s1 and s2 from pre-smoothed version of the data, applies the fused lasso with these values, and then thresholds the estimate to determine regions of gains or losses.

Since the overall DNA copy number alteration amount is contributed mainly by the large gain/loss regions, we derive the value of s1 by using a heavily smoothed version of Y:

1) Apply "lowess" to Y with fraction parameter f = max(1/50,50/p), where p is the length of Y. (The lowess window will be at least 50.)
2) Denote the lowess result by Formula. Define Formula.

On the other hand, s2 controls the frequency of the alterations in the target region. Thus, we use moderately smoothed Y to infer s2:

1) Apply lowess to Y with fraction parameter f = min(1/20,10/p). (The lowess window will be at most 10.)
2) Denote the result as Formula and calculate the median absolute deviation of {di}:

Formula

where µd = median({di}i).

3) It is reasonable to assume that di with absolute values greater than 4{delta} corresponds to copy number alteration break points on the genome. Thus, we define Formula.

After we obtain Formula and Formula, we apply the fused lasso to estimate the underlying signals Formula Finally, we threshold Formula by a value {theta} to obtain the final regions of gain or loss. The choice of {theta} is discussed in Section 2.3, where it is used to control the false-discovery rate (FDR).

2.2 Method 2

Here, we vary both s1 and s2 in a systematic way. Define S1(X) = {sum}i|xi|. We calculate s1 = S1(lowess(Y,f = w/p)) and s2 = 2s1/w, where w represents the average length of alteration regions. The value of w is varied and then the fused lasso is computed for the resulting values of s1 and s2. Figure 2 shows an example with some simulated data from a normal and tumor array. As w increases, the estimate does a better job of isolating the gain regions in the tumor sample. However, there is still a large region of low but nonzero values that would require further thresholding before gains and losses can be called. Hence, this approach seems too complicated, and we therefore settle on method 1.


Figure 2
View larger version (25K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 2. Simulated data: the effect of parameter w (the average length of alterations) on the fused lasso solution. The panels of the left column represent a chromosome with no copy number alteration; the panels of the right column represent a chromosome with an amplification region of genes 40–60. The solid lines illustrate the estimated copy numbers of fused lasso; a horizontal line is drawn at y = 0.

 
2.3 Estimation of FDR

For 1 array, when we are trying to decide whether the ith gene/clone has significant DNA copy number alteration, we are actually doing a hypothesis testing with


Formula

Here, rises the issue of multiple hypothesis testing, for, in each array, tens of thousands of genes/clones need to be considered simultaneously.

Although we do not have independent Hi for each gene in our problem, we can still use

Formula (2.3)

as a rough estimator for (FDR) (Benjamini & Hochberg, 1995Go), (Chu and others, 2002Go), (Storey, 2002Go), (Efron & Tibshirani, 2002). We assume that the denominator is greater than 0 with probability 1 (i.e. the event that no significant genes are selected is rare).

We estimate the FDR in 2 different ways according to the availability of normal reference arrays.

Estimation of FDR when normal reference arrays are available.

Suppose there are M normal reference arrays {Rm}Formula, where Rm = {rFormula}Formula. We first scale the normal reference arrays according to the target tumor array and then use the copy number inferred from the reference arrays to approximate the null distribution of Formula.

1) The contiguous genes/clones with the same Formula are defined as 1 segment. Suppose there are K segments resulting from Formula and denote them as Sk = {ik – 1 + 1,...,ik}. We then calculate the within-segment standard deviation of the target array

Formula

where µk = mean(yi:iisinSk).

2) Normalize each reference array according to Formula:

Formula


3) Estimate s1 and s2 for each Formula and then apply fused lasso on it. Denote the resulting coefficients by Formula .
4) For a given threshold {theta}, FDR of the procedure is estimated as a function of {theta}:

Formula (2.4)

Usually, the data analyst pre-chooses a target FDR and vary {theta} over a range to seek the solution with Formula closest to the target.

Estimation of FDR when normal reference arrays are not available.

In real experiments, there are often situations where appropriate normal reference arrays are not available due to sample/lab limitations. Thus, there is a need to control the FDR in the absence of reference arrays.

We approach this by considering segments as the hypothesis unit first. For 1 segment Sk, the hypothesis of interest can be stated as

Formula

This makes it natural to examine statistics Formula where nk is the size of Sk. To take advantage of the inferred copy number Formula , we further define Formula .

We approximate the null distribution of Formula with standard normal and define pk = P(Z > |Formula |), where Z~N(0,1).

If we assume Formula are independent from each other, then, for a given qisin(0,1), a conservative estimator of the genome-wide FDR in (2.3) is

Formula (2.5)

Here, genes/clones in the segments with pk ≤ q are considered to have copy number alterations.

Again, data analyst can vary q over a range and choose the solution with Formula closest to the preselected value.


    2.4 Computational considerations
 TOP
 SUMMARY
 1. INTRODUCTION
 2. THE FUSED LASSO...
 2.4 Computational considerations
 3. COMPARISON TO OTHER...
 4. SOME RESULTS
 5. DISCUSSION AND FUTURE...
 REFERENCES
 
The optimization problem for the fused lasso is a quadratic programming problem. Criterion (2.1) leads to a quadratic programming problem. For large n, the problem is difficult to solve and special care must be taken to avoid the use of p2 storage elements. We use the approach for the general fused lasso problem outlined in Tibshirani and others (2004): the 2-phase active set algorithm "sqopt" of Gill and others (1999)Go, which is designed for quadratic programming problems with sparse linear constraints.

Figure 3 shows a log-log plot of the number of seconds required for the computation as a function of the number of clones (genes), on a Linux computer. We simulated 10 data sets of each size, and the figure shows the mean plus or minus standard error; the computation took on average about 30 s for 4000 clones. Beyond n = 1000 clones, the plot is roughly linear with a slope of about 2. This suggests that the computation is of order n2. The estimated computation time for a data set with 100000 clones is about 135 min.


Figure 3
View larger version (8K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 3. Number of CPU seconds required for the fused lasso algorithm as a function of the number of clones.

 
While this may be sufficiently fast for practical use, clearly it would be preferable to reduce the computation time for large data sets. We are currently working with Jerome Friedman and Trevor Hastie on specialized algorithms for the fused lasso, and the initial results show considerable promise.


    3. COMPARISON TO OTHER SMOOTHING METHODS
 TOP
 SUMMARY
 1. INTRODUCTION
 2. THE FUSED LASSO...
 2.4 Computational considerations
 3. COMPARISON TO OTHER...
 4. SOME RESULTS
 5. DISCUSSION AND FUTURE...
 REFERENCES
 
General smoothing methods are not typically used for analyzing CGH data because their results can be difficult to interpret (Lai and others, 2005Go). This is illustrated in Figure 5, where 2 popular smoothing methods—lowess (Becker and others, 1988) and penalized smoothing splines (Ruppert and others, 2003Go)—are applied to the example data (see Section 4.2 for details). We used R function lowess (smooth window = 10) and "spm" (default parameters) to compute the results. As we can see from the figure, the 2 smoothing methods do not provide direct calls for copy number gains/losses, and thus require additional thresholding for identifying regions with significant alterations. In addition, the smoothing curves do not catch the piecewise constant shape of copy number changes, which raises additional challenges for controlling the FDR. Moreover, copy number alterations can be both large chromosome segmentation gain/loss and also abrupt local amplification/deletion. Therefore, different degrees of smoothness are needed for different chromosome regions, which adds another layer of complexity to the kernel- and spline-based approaches.

In the fused lasso, the fused term in the loss function can be viewed as the first derivative of the coefficient profiles, and thus the method can also be deemed as a "smoothing" approach. However, the use of L1-norm on the fused penalty term enables the method to capture both the piecewise flatness patterns and the abrupt local jumps at the same time (see Figure 5). In addition, the control on the overall sparsity of the coefficient solution helps to screen away the "cold"-spot regions. These make the fused lasso a more attractive approach for analyzing CGH data.


    4. SOME RESULTS
 TOP
 SUMMARY
 1. INTRODUCTION
 2. THE FUSED LASSO...
 2.4 Computational considerations
 3. COMPARISON TO OTHER...
 4. SOME RESULTS
 5. DISCUSSION AND FUTURE...
 REFERENCES
 

4.1. Simulated data

We apply the proposed method on artificial chromosomes simulated by Lai and others (2005)Go (downloaded from http://www.chip.org/~ppark/Supplements/Bioinformatics05b.html). We consider the most challenging situation where the signal-to-noise ratio is equal to 1. For each of the 4 different aberration widths (5, 10, 20, and 40 probes), there are 100 independently simulated chromosomes with 100 probes in total.

We first compare the inferred copy number Formula by fused lasso with the estimated copy numbers from 3 other popular programs "CGHseg" (Picard and others, 2005Go), "CBS" (Olshen & Venkatraman, 2004Go), and "CLAC" (Wang and others, 2005Go). The receiver operating curves of different methods shown in Figure 4 suggest that fused lasso better captures the true DNA copy number alterations than the other 3 methods, especially when the aberration width is small.


Figure 4
View larger version (21K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 4. TPR = (the number of probes within the aberration width that is above a threshold)/(the total number of probes within the aberration width); FPR = (the number of probes outside the aberration width that is above a threshold)/(the total number of probes outside the aberration width).

 
We then investigate the 2 FDR estimators proposed in Section 2.3. Formula is computed based on 10 simulated reference arrays with no copy number alterations (i.i.d. from N(0,1)). We then control Formula and Formula at the level of 0.01 and calculate the true FDR for the selected solutions. The result is summarized in Table 1. Overall, when we control Formula, 80.75% simulated cases have true FDR less than 0.01; when we control Formula, 84.5% simulated cases have true FDR less than 0.01. Generally, Formula is more conservative than Formula, for the former is derived on the assumption that segments are independent from each other.


View this table:
[in this window]
[in a new window]

 
Table 1 The median, 75% quantile, and 90% quantile of the true FDR values

 
4.2 GBM data

The glioma data from Bredel and others (2005)Go contain samples representing primary GBMs, a particular malignant type of brain tumor. We investigate the performance of various methods on the array CGH profiles of the GBM samples examined in Lai and others (2005)Go. To generate a more challenging situation where both local amplification and large region loss exit in the same chromosome, we paste together the following 2 array regions: (1) chromosome 7 in GBM29 from 40 to 65 Mb and (2) chromosome 13 in GBM31. The performance of different methods on this pseudo chromosome is illustrated in Figure 5. We can see that the proposed method using fused lasso successfully identified both the local amplification and the big chunk of copy number loss.


Figure 5
View larger version (23K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 5. Chromosome 7 and chromosome 13 from 2 GBM tumors.

 
4.3 Breast tumor data

In the study conducted by Pollack and others (2002)Go, cDNA microarray CGH was profiled across 6691 mapped human genes in 44 breast tumor samples and 10 breast cancer cell lines. The scanned raw data were downloaded from the Stanford Microarray Database (http://smd.stanford.edu). We pick the breast cancer cell line (MDA157) as an example, which has a large degree of copy number alterations, and apply our proposed method as well as the other 3 methods on it to estimate the underlying copy number changes. From the results shown in Figure 6, we can see that fused lasso successfully recognized varies copy number alterations.


Figure 6
View larger version (40K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 6. Array CGH profile of 10 chromosomes of breast cancer cell line MDA157. Panels in the same row are for the same chromosome. The integers at the beginning of each row are chromosome indexes. The dark black line in each panel represents the estimated copy number of a particular method, whose name is shown on the top of each column. The black horizontal line in each panel represents y = 0.

 
CGHseg appears to be very sensitive to outlier measurements and thus will be more suitable for detecting single-gene copy number mutations of high-quality arrays. CLAC is conservative in handling outliers with opposite signs in 1 alteration region and therefore tends to break large alteration segments into small blocks. CBS provides clean solutions for segmentations but has the limitation to detect break points whose alteration signals are weak (e.g. chromosome 7 and 15 of the selected cell line).


    5. DISCUSSION AND FUTURE WORK
 TOP
 SUMMARY
 1. INTRODUCTION
 2. THE FUSED LASSO...
 2.4 Computational considerations
 3. COMPARISON TO OTHER...
 4. SOME RESULTS
 5. DISCUSSION AND FUTURE...
 REFERENCES
 
The fused lasso can be generalized to other analysis besides the calling of gains and losses in CGH data. For example, biologists have great interest in understanding the interactions between copy number alterations and mRNA expression levels. To study this, a commonly used method is to identify gene pairs, of which one gene's copy number and the other gene's expression level are significantly correlated. However, genome-wide screening for such pairs is quite challenging: (1) the large dimensionality of the problem raises the requirement of controlling the sparsity in the solutions; (2) the strong spacial correlations of gene copy number changes need to be taken into account. The fused lasso can be used again to tackle these challenges.

Denote the expression measurements of a target gene as Y = (y1,y2,...,yJ), where J represents the total number of samples. Denote the CGH measurement of the ith gene across the J samples as Xi = (xFormula,...,xFormula). Suppose Y and {Xi} have all been normalized to mean 0 and variance 1. cor(Y,Xi) is then exactly the least-squares coefficient of the linear model Y~βXi. Thus, we can infer the correlation coefficients by solving

Formula (5.1)

where a nonzero βj suggests that Y and Xi are significantly correlated.

In addition, for CGH data, our paper focuses on the calling of gains and losses from a single array. Typically, the researcher collects data on a few dozen arrays, often divided into normal and diseased groups. Ideally, one would like to carry out a joint analysis of the data set, finding regions of gains and losses that occur commonly among the patients and also ones that occur differentially in the normal and diseased groups. One interesting proposal along these lines is the STAC procedure of Diskin and others (2006)Go. This method projects the calls of gains and losses for individual arrays along chromosome, measuring the frequency and "footprint" of calls in each local region. The STAC procedure could be applied to the gains and losses called by the fused lasso. However, a more integrated approach might be possible. For example, given data on arrays j = 1,2,...,J, one could first apply the fused lasso to the data from each array. Suppose that we parse the resulting coefficient vectors into a collection of nonzero basis functions Formula, k = 1,2,...,K. Each function is nonzero over a single interval of the real line. Then, we can do some kind of joint regression of the data from all arrays on this collection, allowing the arrays to share the basis functions as efficiently as possible. This would reveal basis functions (areas of the chromosome) that are common among many or all arrays and also those that are unique to only a few. This is a topic for further study.

An R language package for the fused lasso will be freely available at http://www.stat.stanford.edu/~tibs/cghFLasso.


    ACKNOWLEDGMENTS
 
We would like to thank Michael Saunders for his expert advice on computational issues and would like to thank the referees and editors for helpful comments that led to improvements in this manuscript. Tibshirani was partially supported by National Science Foundation Grant DMS-9971405 and National Institutes of Health Contract N01-HV-28183. Conflict of Interest: None declared.


    REFERENCES
 TOP
 SUMMARY
 1. INTRODUCTION
 2. THE FUSED LASSO...
 2.4 Computational considerations
 3. COMPARISON TO OTHER...
 4. SOME RESULTS
 5. DISCUSSION AND FUTURE...
 REFERENCES
 

    Becker RA, Chambers JM, Wilks AR. The New S Language (1988) Pacific Grove, CA: Wadsworth Brooks Cole.

    Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B (1995) 85:289–300.

    Bredel M, Bredel C, Juric D, Harsh GR, Vogel H, Recht LD, Sikic BI. High-resolution genome-wide mapping of genetic alterations in human glial brain tumors. Cancer Research (2005) 65:4088–4096.[Abstract/Free Full Text]

    Chu G, Narasimhan B, Tibshirani R, Tusher V. Significance Analysis of Microarrays (SAM) Software. (2002) http://www-stat.stanford.edu/~tibs/SAM/. Accessed July 16, 2003.

    Diskin SJ, Eck T, Greshock J, Mosse YP, Naylor T, Stoeckert CJ Jr, Weber BL, Maris JM, Grant GR. Stac: a method for testing the significance of DNA copy number aberrations across multiple array-CGH experiments. Genome Research (2006) 16:1149–1158.[Abstract/Free Full Text]

    Efron B, Tibshirani R. Microarrays, empirical Bayes methods, and false discovery rates. Genetic Epidemiology (2002) 1:70–86.

    Fridlyand J, Snijders AM, Pinkel D, Albertson DG, Jain AN. Hidden Markov models approach to the analysis of array CGH data. Journal of Multivariate Analysis (2004) 90:132–153.[CrossRef][Web of Science]

    Gill P, Murray W, Saunders M. Users guide for SQOPT 5.3: a Fortran package for large-scale linear and quadratic programming. In: Technical Report (1999) Palo Alto, CA: Stanford University.

    Lai WR, Johnson MD, Kucherlapati R, Park PJ. Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data. Biostatistics (2005) 21:3763–3770.

    Lipson D, Aumann Y, Ben-Dor A, Linial N, Yakhini Z. Efficient calculation of interval scores for DNA copy number data analysis. Journal of Computational Biology (2005) 13:215–228.[CrossRef][Web of Science]

    Myers CL, Dunham MJ, Kung SY, Troyanskaya OG. Accurate detection of aneuploidies in array CGH and gene expression microarray data. Bioinformatics (2004) 20:3533–3543.[Abstract/Free Full Text]

    Olshen A, Venkatraman E. A faster circular binary segmentation algorithm for the analysis of array CGH data. Biostatistics (2004) 5:557–572.[Abstract]

    Picard F, Robin S, Lavielle M, Vaisse C, Daudin J. A statistical approach for array CGH data analysis. BMC Bioinformatics (2005) 11:6–27.

    Pollack JR, Sórlie T, Perou CM, Rees CA, Jeffrey SS, Lonning PE, Tibshirani R, Botstein D, Bórresen-Dale A-L, Brown PO. Microarray analysis reveals a major direct role of DNA copy number alteration in the transcriptional program of human breast tumors. Proceedings of the National Academy of Sciences USA (2002) 99:12963–12968.[Abstract/Free Full Text]

    Ruppert D, Wand MP, Carroll R. Semiparametric Regression (2003) New York: Cambridge University Press.

    Storey JD. A direct approach to false discovery rates. Journal of the Royal Statistical Society, Series B (2002) 64:479–498.[CrossRef]

    Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (1996) 58:267–288.

    Tibshirani R, Saunders M, Rosset S, Zhu J, Knight K. Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society, Series B (2005) 67:91–108.[CrossRef]

    Wang P, Kim Y, Pollack J, Narasimhan B, Tibshirani R. A method for calling gains and losses in array CGH data. Biostatistics (2005) 6:45–58.[Abstract]

    Received November 28, 2006; revised March 15, 2007; accepted for publication March 21, 2007.


    Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


    This article has been cited by other articles:


    Home page
    BioinformaticsHome page
    K. Salari, R. Tibshirani, and J. R. Pollack
    DR-Integrator: a new analytic tool for integrating DNA copy number and gene expression data
    Bioinformatics, February 1, 2010; 26(3): 414 - 416.
    [Abstract] [Full Text] [PDF]


    Home page
    GeneticsHome page
    B. Daines, H. Wang, Y. Li, Y. Han, R. Gibbs, and R. Chen
    High-Throughput Multiplex Sequencing to Discover Copy Number Variants in Drosophila
    Genetics, August 1, 2009; 182(4): 935 - 941.
    [Abstract] [Full Text] [PDF]


    Home page
    BiostatisticsHome page
    D. M. Witten, R. Tibshirani, and T. Hastie
    A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis
    Biostat., July 1, 2009; 10(3): 515 - 534.
    [Abstract] [Full Text] [PDF]


    Home page
    BioinformaticsHome page
    E. Budinska, E. Gelnarova, and M. G. Schimek
    MSMAD: a computationally efficient method for the analysis of noisy array CGH data
    Bioinformatics, March 15, 2009; 25(6): 703 - 713.
    [Abstract] [Full Text] [PDF]


    Home page
    BioinformaticsHome page
    E. Ben-Yaacov and Y. C. Eldar
    A fast and flexible method for the segmentation of aCGH data
    Bioinformatics, August 15, 2008; 24(16): i139 - i145.
    [Abstract] [Full Text] [PDF]


    This Article
    Right arrow Abstract Freely available
    Right arrow FREE Full Text (PDF) Freely available
    Right arrow All Versions of this Article:
    9/1/18    most recent
    kxm013v1
    Right arrow Alert me when this article is cited
    Right arrow Alert me if a correction is posted
    Services
    Right arrow Email this article to a friend
    Right arrow Similar articles in this journal
    Right arrow Similar articles in PubMed
    Right arrow Alert me to new issues of the journal
    Right arrow Add to My Personal Archive
    Right arrow Download to citation manager
    Right arrowRequest Permissions
    Right arrow Disclaimer
    Google Scholar
    Right arrow Articles by Tibshirani, R.
    Right arrow Articles by Wang, P.
    Right arrow Search for Related Content
    PubMed
    Right arrow PubMed Citation
    Right arrow Articles by Tibshirani, R.
    Right arrow Articles by Wang, P.
    Social Bookmarking
     Add to CiteULike   Add to Connotea   Add to Del.icio.us  
    What's this?