Biostatistics Advance Access originally published online on June 13, 2006
Biostatistics 2007 8(2):265-284; doi:10.1093/biostatistics/kxl007
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Nonparametric pathway-based regression models for analysis of genomic data
Genomics and Computational Biology Graduate Group, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA
Genomics and Computational Biology Graduate Group, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA and Department of Biostatistics and Epidemiology, University of Pennsylvania School of Medicine, Philadelphia, PA 19104-6021, USA hli{at}cceb.upenn.edu
* To whom correspondence should be addressed.
High-throughout genomic data provide an opportunity for identifying pathways and genes that are related to various clinical phenotypes. Besides these genomic data, another valuable source of data is the biological knowledge about genes and pathways that might be related to the phenotypes of many complex diseases. Databases of such knowledge are often called the metadata. In microarray data analysis, such metadata are currently explored in post hoc ways by gene set enrichment analysis but have hardly been utilized in the modeling step. We propose to develop and evaluate a pathway-based gradient descent boosting procedure for nonparametric pathways-based regression (NPR) analysis to efficiently integrate genomic data and metadata. Such NPR models consider multiple pathways simultaneously and allow complex interactions among genes within the pathways and can be applied to identify pathways and genes that are related to variations of the phenotypes. These methods also provide an alternative to mediating the problem of a large number of potential interactions by limiting analysis to biologically plausible interactions between genes in related pathways. Our simulation studies indicate that the proposed boosting procedure can indeed identify relevant pathways. Application to a gene expression data set on breast cancer distant metastasis identified that Wnt, apoptosis, and cell cycle-regulated pathways are more likely related to the risk of distant metastasis among lymph-node-negative breast cancer patients. Results from analysis of other two breast cancer gene expression data sets indicate that the pathways of Metalloendopeptidases (MMPs) and MMP inhibitors, as well as cell proliferation, cell growth, and maintenance are important to breast cancer relapse and survival. We also observed that by incorporating the pathway information, we achieved better prediction for cancer recurrence.
Keywords: Additive models; Gene set enrichment analysis; Gradient descent boosting; Microarray; SNPs; Tree
Received February 9, 2006; revised April 19, 2006; revised June 9, 2006; accepted for publication June 12, 2006.
![]()
CiteULike
Connotea
Del.icio.us What's this?
This article has been cited by other articles:
![]() |
Y. Luan and H. Li Group additive regression models for genomic data analysis Biostat., January 1, 2008; 9(1): 100 - 113. [Abstract] [Full Text] [PDF] |
||||
![]() |
F. Tai and W. Pan Incorporating prior knowledge of gene functional groups into regularized discriminant analysis of microarray data Bioinformatics, December 1, 2007; 23(23): 3170 - 3177. [Abstract] [Full Text] [PDF] |
||||
![]() |
F. Tai and W. Pan Incorporating prior knowledge of predictors into penalized classifiers with multiple penalty terms Bioinformatics, July 15, 2007; 23(14): 1775 - 1782. [Abstract] [Full Text] [PDF] |
||||

