Biostatistics Advance Access originally published online on April 17, 2009
Biostatistics 2009 10(3):515-534; doi:10.1093/biostatistics/kxp008

© The Author 2009. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org.

A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis

Daniela M. Witten*

Department of Statistics, Stanford University, Stanford, CA 94305, USA
dwitten{at}stanford.edu

Robert Tibshirani

Department of Health Research & Policy and Department of Statistics, Stanford University, Stanford, CA 94305, USA

Trevor Hastie

Department of Statistics, Stanford University, Stanford, CA 94305, USA

* To whom correspondence should be addressed.

We present a penalized matrix decomposition (PMD), a new framework for computing a rank-$K$ approximation for a matrix. We approximate the matrix $X$ as $\hat{X} = \sum_{k=1}^{K} d_k u_k v_k^T$, where $d_k$, $u_k$, and $v_k$ minimize the squared Frobenius norm of $X - \hat{X}$, subject to penalties on $u_k$ and $v_k$. This results in a regularized version of the singular value decomposition. Of particular interest is the use of L1-penalties on $u_k$ and $v_k$, which yields a decomposition of $X$ using sparse vectors. We show that when the PMD is applied using an L1-penalty on $v_k$ but not on $u_k$, a method for sparse principal components results. In fact, this yields an efficient algorithm for the "SCoTLASS" proposal (Jolliffe and others 2003) for obtaining sparse principal components. This method is demonstrated on a publicly available gene expression data set. We also establish connections between the SCoTLASS method for sparse principal component analysis and the method of Zou and others (2006). In addition, we show that when the PMD is applied to a cross-products matrix, it results in a method for penalized canonical correlation analysis (CCA). We apply this penalized CCA method to simulated data and to a genomic data set consisting of gene expression and DNA copy number measurements on the same set of samples.
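The abstract does not spell out how each rank-one factor is computed. What follows is a minimal NumPy sketch under stated assumptions, not the authors' code. It assumes the L1-constrained form of a single factor (maximize $u^T X v$ subject to $\|u\|_2 \le 1$, $\|u\|_1 \le c_1$, $\|v\|_2 \le 1$, $\|v\|_1 \le c_2$). That problem is solved by alternating soft-thresholding, with the threshold located by bisection, and further factors are obtained by deflating $X$. The helper names (`soft_threshold`, `sparse_unit_vector`, `rank_one_pmd`, `pmd`, `penalized_cca`), the SVD initialization, and the iteration counts and tolerances are all illustrative assumptions. Setting $c_1 = \sqrt{n}$ makes the bound on $u$ inactive, which mimics the sparse principal components variant (L1-penalty on $v_k$ only). Applying the rank-one routine to the cross-products matrix $X^T Y$ of column-standardized data gives a simple penalized CCA.

```python
import numpy as np


def soft_threshold(a, delta):
    """Elementwise soft-thresholding: sign(a) * max(|a| - delta, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)


def l2_normalize(a):
    """Scale a to unit L2 norm (a zero vector is returned unchanged)."""
    norm = np.linalg.norm(a)
    return a / norm if norm > 0 else a


def sparse_unit_vector(a, c, n_steps=50):
    """Return S(a, delta) / ||S(a, delta)||_2 with (approximately) the smallest
    delta >= 0 whose result has L1 norm <= c. Assumes 1 <= c <= sqrt(len(a))."""
    out = l2_normalize(a)
    if np.abs(out).sum() <= c:
        return out  # delta = 0 already satisfies the L1 bound
    lo, hi = 0.0, np.max(np.abs(a))  # lo infeasible, hi feasible
    for _ in range(n_steps):
        mid = (lo + hi) / 2
        if np.abs(l2_normalize(soft_threshold(a, mid))).sum() > c:
            lo = mid
        else:
            hi = mid
    return l2_normalize(soft_threshold(a, hi))


def rank_one_pmd(X, c1, c2, n_iter=100, tol=1e-8):
    """Alternating updates for one (d, u, v) factor under L1 bounds c1, c2."""
    v = np.linalg.svd(X, full_matrices=False)[2][0]  # assumed initialization
    u = np.zeros(X.shape[0])
    for _ in range(n_iter):
        u_new = sparse_unit_vector(X @ v, c1)
        v_new = sparse_unit_vector(X.T @ u_new, c2)
        converged = np.linalg.norm(v_new - v) < tol
        u, v = u_new, v_new
        if converged:
            break
    d = u @ X @ v
    return d, u, v


def pmd(X, K, c1, c2):
    """Rank-K decomposition by deflation: subtract d_k u_k v_k^T after each factor."""
    residual = np.array(X, dtype=float)
    factors = []
    for _ in range(K):
        d, u, v = rank_one_pmd(residual, c1, c2)
        factors.append((d, u, v))
        residual = residual - d * np.outer(u, v)
    return factors


def penalized_cca(X, Y, c1, c2):
    """Sparse canonical vectors from the cross-products matrix X^T Y,
    after standardizing the columns of X and Y (a sketch assumption)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    return rank_one_pmd(Xs.T @ Ys, c1, c2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((30, 20))
    X = X - X.mean(axis=0)  # column-center before a PCA-style decomposition
    # Sparse-PCA-style call: c1 = sqrt(n) leaves u unpenalized, c2 sparsifies v.
    d, u, v = rank_one_pmd(X, c1=np.sqrt(X.shape[0]), c2=2.0)
    print(d, np.count_nonzero(v), "of", v.size, "loadings nonzero")
```

Smaller bounds give sparser vectors; $c = 1$ keeps a single nonzero entry, and $c = \sqrt{\text{length}}$ imposes no sparsity. The bisection only ever accepts thresholds that satisfy the L1 bound, so the returned vectors respect $c_1$ and $c_2$ even if the search stops early.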

Keywords: Canonical correlation analysis; DNA copy number; Integrative genomic analysis; L1; Matrix decomposition; Principal component analysis; Sparse principal component analysis; SVD

Received July 28, 2008; revised December 23, 2008; revised February 4, 2009; accepted for publication February 24, 2009.
