Biostatistics Advance Access originally published online on May 11, 2006
Biostatistics 2007 8(2):212-227; doi:10.1093/biostatistics/kxl002
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Averaged gene expressions for regression
Google Inc., 1600 Amphitheatre Parkway, Mountain View, CA 94043, USA meeyoung{at}google.com
Department of Statistics and Department of Health Research & Policy, Stanford University, CA 94305, USA
Department of Health Research & Policy and Department of Statistics, Stanford University, CA 94305, USA
* To whom correspondence should be addressed.
| SUMMARY |
|---|
|
|
|---|
Although averaging is a simple technique, it plays an important role in reducing variance. We use this essential property of averaging in regression of the DNA microarray data, which poses the challenge of having far more features than samples. In this paper, we introduce a two-step procedure that combines (1) hierarchical clustering and (2) Lasso. By averaging the genes within the clusters obtained from hierarchical clustering, we define supergenes and use them to fit regression models, thereby attaining concise interpretation and accuracy. Our methods are supported with theoretical justifications and demonstrated on simulated and real data sets.
Keywords: Averaging; Hierarchical clustering; Lasso; Variance reduction
| 1. INTRODUCTION |
|---|
|
|
|---|
In this paper, we present a way to improve the regression of gene expression measurements through coupling with the hierarchical clustering method. The DNA microarray data consist of several thousands of genes (predictors) and only a few hundreds of or less than a hundred experiments (observations). Since we focused on supervised methods, we assumed that there is also a response variable; a regression was performed using the continuous response, such as the survival time. Analyzing such data requires special treatment, in particular, overcoming the collinearity among the predictors, which results in large variance of the estimates and inaccurate prediction. We propose a simple yet efficient method of averaging to solve this problem. By averaging, we could also extract a subset of genes with essential predictive power and partition the subset into groups, within which the genes are coherent.
An overview of our method is as follows: We first applied hierarchical clustering (Eisen and others, 1998
or Chapter 14 of Hastie and others, 2001b
) to the genes to obtain a dendrogram that reveals their nested correlation structure. At each level of the hierarchy, we created a unique set of genes and supergenes by computing the average expression of the current clusters. We then used the different sets of genes and supergenes as inputs for regression, in particular, Lasso (Tibshirani, 1996
).
Hierarchical clustering is an especially attractive clustering method in our approach because it provides multiple levels at which the supergenes can be formed. Due to the simple fact that the Euclidean distance measure among the genes is a monotone function of their correlation (when the genes are properly standardized), hierarchical clustering provides flexibility in model selection in such a way that the genes are merged into supergenes in order of their correlation. Lasso yields a sparse fit; the useful property of automatic gene selection motivated us to present Lasso as an ideal procedure for regression. We achieved clearer interpretation and accuracy by combining an unsupervised method with a supervised method. As an alternative to hierarchical clustering, we may also use known facts about the gene association. We present an example of grouping the genes based on the Gene Ontology (GO) (available at http://www.geneontology.org and introduced in various publications, such as Ashburner and others, 2000
; GO-Consortium, 2004
) and using the principal components as supergenes. Although we did not significantly expand on this method, the GO database provides useful information that may be helpful in various applications.
Defining the average expression from a certain cluster to be a new feature, we in effect forced every component of the cluster to play the same role in prediction; in other words, all their coefficients were constrained to be the same. This restriction, depending on the correlation structure of the predictors, reduces the variance in prediction, and this idea has been explored in various settings. For instance, ridge regression (Hoerl and Kennard, 1970
), by penalizing the size of the L2 norm of the coefficients, allows the predictors with strong correlation to bear similar coefficients. In Hastie and others (2001a)
, the authors proposed the "tree harvesting" method that used 2p 1 (where p is the number of genes) average expression profiles from the hierarchical clustering dendrogram as potential features. In the forward selection stage, the model was sequentially augmented by adding a new univariate feature or an interaction term to the preexisting ones; in the following backward deletion stage, the feature causing the least improvement was removed, thereby generating an order for the final model selection. We used the same strategy of obtaining the averaged features from hierarchical clustering, but applied different methods to handle them in a model. Zou and Hastie (2005)
proposed "elastic net," an automatic way to let the correlated, important variables have comparable coefficients and to leave out unimportant variables. The elastic net regression is a regularization scheme with a penalty that combined that of ridge regression and Lasso. Bair and others (2004)
introduced the supervised principal component (SuperPC) method by which the principal component directions were found using only the predictors related to the outcome variable; through the principal component analysis, the correlated variables were automatically collected and their coefficients were constrained to be similar. Yu (2005)
also suggested different approaches to forming overlapping/nonoverlapping gene clusters; the resulting information was used for providing groups of genes as predictors in regression.
In the following sections, we illustrate and support our approach in more detail with examples and justifications. We explore the use of averaged gene expressions for regression in Section 2 and illustrate the method with a microarray data example in Section 3. In Section 4, we present several experiments that used the GO. We conclude with a summary and possible extensions of our studies in Section 5. The appendix contains the proof.
| 2. HIERARCHICAL CLUSTERING AND AVERAGING FOR REGRESSION |
|---|
|
|
|---|
In this section, we investigate the hierarchical clustering and averaging method in regression settings. In particular, we focus on Lasso (Tibshirani, 1996
Let (xi, yi) for i = 1,...,n denote pairs of a gene expression profile (xi
p) and the corresponding response variable (yi
). First we apply hierarchical clustering of the genes to yield their nested correlation structure. With p different levels of hierarchy, we create a unique set of genes and supergenes at each level by averaging the gene expressions within the current clusters. We regress y on every set of the predictors (genes and supergenes) using Lasso. For each fit of Lasso, we obtain a set of solution paths of the coefficients to which we usually apply cross-validation and select the optimal degree of shrinkage. Algorithm 1 summarizes the steps.
Algorithm 1: Hierarchical Clustering and Averaging for Regression
- Apply hierarchical clustering of the genes to yield the nested correlation structure.
- At each level of hierarchy, create supergenes by averaging the gene expressions at each cluster. This gives p different sets of genes and supergenes that represent each level.
- For every set of the predictors (genes and supergenes), fit Lasso, using y as the response variable.
- Using cross-validation, find the optimal degree of shrinkage and level of hierarchy.
We have a two-way factor that affects the goodness of the fit; a bias-variance trade-off occurs as the granularity of clustering or the amount of shrinkage changes. While this bias-variance trade-off occurs, we observe a monotonicity in the minimizing criterion with respect to the two factors. Lasso minimizes the following loss + penalty measure:
|
| (2.1) |
Let us compare the following objective function values:
|
| (2.2) |
|
| (2.3) |
|
| (2.4) |
It is easily seen that M1
M2, while M2
M3 for
1
2. This monotonicity will be demonstrated with data sets in Sections 2.5 and 3.
To find the most desired fit, we optimize the two parameters simultaneously by drawing a one-dimensional path on the two-dimensional space. If we start from the topmost level of hierarchy with
large enough to shrink the single coefficient close to zero, move in a descending direction of the objective function, and end at the bottommost level with
= 0, infinitely many paths can be drawn. Once the path is specified, we cross-validate along the curve, hoping that a near-optimal point will be found on the way. Finding a path so that either the correlation factor representing the hierarchical level increases (i.e. the height decreases) or the shrinkage factor (
) decreases along the curve ensures that the path is searching in a descending direction. We define an appropriate path by adjusting the relative rate between the increment in the correlation factor and the decrement in the shrinkage factor.
Our method is advantageous when there exist multiple variables with strong positive correlations. In fact, in the case of ordinary least-squares (OLS) method, if the sample correlations of the predictors are high enough, an averaged predictor yields the coefficient estimates with lower expected squared error than the raw predictors. The following theorem illustrates this fact.
THEOREM 2.1 Let X1,X2,...,Xm, columns of X, be predictors with the sample correlation structure, corr(Xj,Xk) =
> 0 for j
k. Without loss of generality, assume that the predictors are standardized so that XTX is a symmetric matrix with the diagonal elements being 1 and all the off-diagonal elements being
. Let y be the response variable such that yi = 
ßjXji +
i where
is are iid with mean 0 and variance
2. Let
be the OLS estimates of the coefficients
|
|
Let
be the OLS estimate of the coefficient when y is regressed on the sum of the m predictors.
denotes the corresponding vector of estimates for the original predictors
![]() |
Then
if and only if
![]() | (2.5) |
This theorem claims that if the true coefficients of the predictors are similar, thereby making the ratio
small, then the range of
to improve the fit by averaging is large. Figure 1 illustrates the range. The curves represent different numbers of variables (m), and for each curve, we expect improvement in the upper left-hand region. Although
yields a larger bias than
the former is a more accurate estimate due to a smaller variance.
|
The theorem can easily be generalized to a block-diagonal correlation structure. The average features within each block may yield a more accurate fit than the individual predictors.
We present a data structure for which the averaged predictors are the optimal features to regress on. Assume the following scenario: U1 and U2 are independent random variables; Xj are (fixed) linear functions of U1 for m1 different j's and linear functions of U2 for m2 other j's, and y, the response, is a linear combination of U1 and U2. Both y and X are functions of U's, which are unknown. If 
(the error variance of X) is small enough, any linear combination 
wjXj (without loss of generality, assume that wj
0 and
wj = 1) may be a reasonable predictor replacing U1, and the same applies to 
wjXj for U2. However, we must choose wj = 1/m1 (wj = 1/m2 for the latter) to minimize the variance of the weighted average. We present a realization of this scenario and its result through a simulation.
A data set with n = 100 and p = 200 was generated according to the scheme described above. Among 200 predictors, 90 were significant; they were divided into three blocks (30 each), predictors in the three blocks being functions of the latent variables U1,U2, and U3, respectively. The response variable y was a function of all three latent variables: y = 2U1 2U2 + 3U3 +
. This simulation yields the correlation structure shown in Table 1. Each entry of the matrix is the average sample correlation between two blocks. The average correlation within the first block was high enough that we expected all 30 components to be present in a tight cluster; however, the components in the third block would naturally be split into subgroups.
|
Figure 2 is the dendrogram from the hierarchical clustering of the predictors based on the average linkage. The nodes belonging to the groups 1,2, and 3 are labeled with 1,2, and 3, respectively; the noisy variables are unlabeled. The horizontal dotted line (where 98 clusters are formed) indicates the optimal level selected based on the cross-validation. We next describe the cross-validation procedure.
|
The left-hand side of Figure 3 is the contour plot of the loss + penalty (2.1) values in terms of
and the height. We used height2, since it is linear in the corresponding correlation score. Six different paths were drawn on the plot so that 
(height2)d with d = 0.5,1,1.5,3,5,7 from the left to the right, respectively. We noticed that the paths with larger d favored instances with higher levels of hierarchy. The contours of the cross-validation errors were illustrated on the same two-dimensional plane, on the right-hand side of Figure 3. From the contours, we anticipated that the optimal model would be selected somewhere along the paths with d = 5,7. In practice, we would avoid assessing the cross-validation errors all over the plane, but restrict our search to the given paths. We have shown the level sets all over the plane for a clearer demonstration of our strategy.
|
The six solid curves in Figure 4 connect the cross-validation errors through the respective paths shown in Figure 3. The dotted curves connect the mean squared errors along the same set of paths. We found the minimum from all the values of the cross-validation errors shown on the plot and selected the optimal model (optimal
and height) according to the one standard error rule. The chosen
and the height are marked by a solid dot on both Figures 3 and 4.
|
Figure 5 presents the coefficient paths for the optimal models selected: the optimal Lasso fits on the averaged predictors and on the original predictors. The (dotted) vertical lines denote the chosen shrinkage levels. As for the fit on the averaged predictors, the three coefficients with relatively large absolute values were assigned to the three blocks with positive correlations; all 30 predictors from the first block, 27 from the second block, and only 7 from the third block were averaged to form the three major features. The other nonzero coefficients were for single or double predictors belonging to the third group. Few noisy variables were in the active set. On the other hand, for the fit on the original variables, 45 coefficients were nonzero, 8 of which were assigned to noisy variables.
|
The results are summarized in Table 2. The optimal Lasso fit on the averaged predictors correctly identified 78(30/27/21 from the three groups) out of 90 true predictors, and the 78 were grouped into 18 clusters; however, regular Lasso only recovered 37(8/14/15 from the three groups) significant predictors. The comparison of R2 and the P-values on the test data of size 200 shows that the fit on the averaged predictors not only yields a model that is more interpretable but also explains a much higher percentage of the variation in response.
|
| 3. MICROARRAY DATA EXAMPLE |
|---|
|
|
|---|
In this section, we discuss the implementation of our method in a microarray data set. We used the data set of van't Veer and others (2002)
To reduce the number of genes to a reasonable size and to remove insignificant genes, we first fit 24 481 univariate Cox proportional hazards models and selected the 3017 most significant genes based on the rankings of their P-values. The choice of this cutoff P-value could be another parameter for the whole procedure, but here we did not tune it separately. Instead, we used a large cutoff value (0.15) so that we do not eliminate any features that are potentially significant.
After standardizing the genes, we applied hierarchical clustering and fit Lasso. As can be seen in Figure 6, the loss + penalty (with squared error loss) quantity has a monotone pattern in terms of both
and the correlation score. We investigated five different paths, as indicated on the same contour plot: for these curves, 
(height2)d with d = 2,6,7,9,13.
|
In Figure 7, cross-validation errors are connected along each path. The dotted curves are the mean squared errors along the same five paths. Based on this display of the cross-validation errors, we found the minimum cross-validation error and applied one standard error rule; we selected the model that fits the clusters defined at the height of 1.30 with
= 6.57. The optimal height and
are marked by a solid dot on Figures 6 and 7.
|
We compared this model with the regular Lasso fit using 3017 individual genes and, in addition, tested the model by fitting the Cox proportional hazards model with test data. Table 3 summarizes the results. When tested by fitting the Cox proportional hazards model on 148 test samples, R2 was larger, and the P-value was smaller for the fit with averaged genes than the original predictors.
|
In addition to applying the averaging strategy to Lasso, we did so to the Cox proportional hazards model with L1 regularization as well. As proposed in Tibshirani (1997)
We also compared the performance to those of the tree harvesting (Hastie and others, 2001a
), elastic net (Zou and Hastie, 2005
), and SuperPC (Bair and others, 2004
) methods. Although tree harvesting is similar to our method, its forward stepwise selection procedure searches over a large model space in a greedy manner compared to Lasso. It only selected one cluster of size 32, and augmenting the model further increased the cross-validated deviance score. Although tree harvesting performed comparable to our method, the number of gene clusters it might discover was limited because of its selection method which is less smooth. Elastic net, by adding the L2 penalty term as a grouping device, showed an improvement from Lasso with original predictors. The SuperPC method performed slightly better than our method, using two leading principal components. However, to discover gene clusters through the SuperPC analysis, one must make a strong assumption that all the groups form orthogonal directions with one another. If this assumption is not true, then the SuperPC model is likely to identify only a few of the groups.
Table 4 contains more information on the 17 supergenes in the active set of the optimal model. As a way to validate coherence among the genes forming each cluster, especially the large ones, we used the GO. The GO database (available at http://www.geneontology.org and introduced in various publications, such as Ashburner and others, 2000
; GO-Consortium, 2004
) contains information on genes or the interconnected gene products that form a directed acyclic graph (DAG) structure. There are three nonoverlapping ontologies: molecular function, biological process, and cellular component. Using a related device, Go-TermFinder (Boyle and others, 2004
), we investigated whether each of our clusters is significantly associated with any node in GO, which in turn would imply that there is a common factor within the cluster. For a given cluster, Go-TermFinder gathers all the nodes in which the genes of our cluster are relatively abundant compared to the background of all the other genes. It computes (corrected) P-values based on the hypergeometric distribution for all the possible nodes. All the clusters in Table 4 that are larger than 13 were significantly abundant (Bonferroni corrected P-value < 0.05) in at least one of the GO annotations, although it was hard to expect a statistical significance for groups of size 2 or 3.
|
| 4. USING THE GO DATABASE |
|---|
|
|
|---|
We previously used hierarchical clustering under the assumption that the genes with highly correlated expression levels are likely to be involved in the same functions or processes. As an alternative to the hierarchical clustering method, we propose a more direct way of grouping, using the GO introduced in Section 3. We used the GO annotations to replace the clustering step in our original algorithm.
Using the GO database, we found all the clusters or the gene products representing a function, process or a cellular component, in which more than one of the genes in our data set were included. In an ideal situation, every gene in our list would have been annotated in GO, but it was not true. Thus, we illustrated our strategy using all the genes that belonged to at least one of the GO clusters. Since the genes in the predefined GO clusters were not necessarily highly (positively) correlated in their expression levels, we used the first principal component as a supergene instead of the average expression.
In this application, we fit regression models using the principal components from all the clusters simultaneously. Because the gene products in GO form a DAG structure instead of a nested structure, as in hierarchical clustering, we did not choose any one level of granularity. However, we filtered the clusters based on the following:
- Size of the clusters,
- Variation along the first principal component direction. Once the principal components were achieved, we fit either a Lasso or a Cox proportional hazards model with variable selection.
Table 5 summarizes the number of genes in our data set that were found in three different ontologies, as well as the number of clusters to which they were assigned. The other rows of Table 5 show how many genes were left after the two-stage filtering and how many of the coefficients were nonzero in Lasso fit/Cox Lasso fit with penalization.
|
The performance of Lasso (the first row) and penalized Cox model (the second row) on the test data is summarized in Table 6 to provide a comparison. The third row contains the result from fitting Lasso on the union of the genes that belonged to the clusters that we used. The figures in Table 6 can be compared to the results in Table 3, Section 3. The first two fits using the clustering information from the GO (the first two rows) yield P-values that are comparable to the previous method of using hierarchical clustering. Lasso fits on several thousand individual genes (the third row) yield larger P-values than all other methods with grouping procedures.
|
Using the GO not only provided an automatic grouping of the genes but also revealed to which functions/processes/components the selected groups were associated with. Table 7 lists the eight GO terms from the biological process ontology whose corresponding principal components had nonzero coefficients in the Lasso fit. The column labeled "size" contains pairs of numbers, indicating the number of components that our data contained and the total numbers of genes in the nodes of the GO. Lasso selected the clusters of rather small sizes, which means that the clusters of larger sizes contained many noisy genes, possibly assigning spurious loadings to the first principal component. Although the regression fits using the GO showed reasonable performances, it would have been more impressive if larger clusters had been chosen. The relationship among the genes contained in a node of the GO is reflected in the gene expression, for instance through high correlations among the elements; however, the correlations may be insignificant depending on the granularity of the node.
|
| 5. DISCUSSION |
|---|
|
|
|---|
In this study, we introduced a simple, but effective, method of combining multiple variables into a representative feature and using the new feature in regression. We first used hierarchical clustering to obtain the sets of correlated variables; we averaged the variables within each cluster and input the averages as regressors to Lasso. When the variables were measured in comparable units and were positively correlated, their average was a strong feature, yielding a fit with lower variance than the individual variables.
The gene expression data often satisfy the conditions for improving the fit through averaging. Microarray data are composed of a large number of genes (predictors) that are often divided into blocks; within each block, the genes are highly correlated. These data are the main target of our technique. In the following paragraphs, we describe several directions in which our method may be modified for wider applications.
Although we applied hierarchical clustering with the average linkage for all the analyses, two other popular choices are the complete linkage and the single linkage. As illustrated in Theorem 2.1, averaging reduces the variance but increases the bias of the coefficient estimates; the amount of change depends on the group sizes and the correlations. The complete linkage tends to yield smaller clusters with higher correlations among the elements compared to the single linkage; therefore, these different dissimilarity measures will cause a different amount of changes in variance and bias in resulting Lasso models. Complete linkage is favorable because it generates clusters with stronger correlations; but because of the small group sizes, the variance reduction might not be as large as in the case of the single linkage. We used the average linkage as a compromise.
All three dissimilarity measures of the hierarchical clustering method mentioned above only detect positively correlated elements or groups of elements. The dissimilarity may be characterized more generally so that the elements with highly negative correlations are merged simultaneously. However, before averaging two negatively correlated variables, one of them must be multiplied by 1; an analogous adjustment is necessary when averaging more than two variables with negative correlations.
Another way to modify the clustering scheme is to incorporate the relationship of the predictors with y in the dissimilarity measure. By adding a flavor of supervised learning, one can form the clusters considering the correlations of the predictors with y in addition to the correlations among the predictors.
We proposed using the averaged features as inputs for Lasso a regression method. The idea can be extended to other situations, such as classification and clustering. By averaging the initial variables, we can provide a smaller number of features with lower dimensions for classification and clustering.
In Section 4, we used the clustering result available in the GO rather than discovering the structure from the data. Similarly, other qualitative facts about the genes may replace or accompany the clustering step in our procedure.
| APPENDIX |
|---|
|
|
|---|
It can be easily shown that
is equivalent to the least-squares estimate of ß when all m elements are constrained to be the same. Thus, defining an mx(m 1) matrix
![]() |
and
|
|
For any positive
,
![]() |
Letting 

,
|
|
Letting Z = (XTX) 1J(JT(XTX) 1J) 1JT,
![]() |
The above equations imply that
if and only if
![]() |
| ACKNOWLEDGMENTS |
|---|
Trevor Hastie was partially supported by grant DMS-0505676 from the National Science Foundation, and grant 2R01 CA 72028-07 from the National Institutes of Health. Robert Tibshirani was partially supported by National Science Foundation Grant DMS-9971405 and National Institutes of Health Contract N01-HV-28183. Conflict of Interest: None declared.
| REFERENCES |
|---|
|
|
|---|
-
Ashburner M, Ball C, Blake J, Botstein D, Butler H, Cherry J, Davis A, Dolinski K, Dwight S, Eppig J. (2000) Gene ontology: tool for the unification of biology. The gene ontology consortium. Nature Genetics 25:259.[CrossRef][Web of Science][Medline]
Bair E, Hastie T, Paul D, Tibshirani R. (2004) Prediction by supervised principal components. Journal of the American Statistical Association 101:119137.
Boyle E, Weng S, Gollub J, Jin H, Botstein D, Cherry J, Sherlock G. (2004) Go::termfinder-open source software for accessing gene ontology information and finding significantly enriched gene ontology terms associated with a list of genes. Bioinformatics 20:37105.
Efron B, Hastie T, Johnstone I, Tibshirani R. (2004) Least angle regression. Annals of Statistics 32:40799.[CrossRef][Web of Science]
Eisen M, Spellman P, Brown P, Botstein D. (1998) Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences of the United States of America 95:148638.
GO-Consortium D. (2004) The gene ontology (go) database and informatics resource. Nucleic Acids Research 32:23561.
Hastie T, Tibshirani R, Botstein D, Brown P. (2001a) Supervised harvesting of expression trees. Genome Biology 2:112.[Medline]
Hastie T, Tibshirani R, Friedman J. (2001b) Elements of Statistical Learning; Data Mining, Inference, and Prediction(Springer, New York).
Hoerl AE and Kennard R. (1970) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12:5567.[CrossRef][Web of Science]
Park M and Hastie T. (2006) l1 Regularization path algorithm for generalized linear models. Technical Report(Stanford University, Stanford).
Tibshirani R. (1996) Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B 58:26788.
Tibshirani R. (1997) The lasso method for variable selection in the Cox model. Statistics in Medicine 16:38595.[CrossRef][Web of Science][Medline]
van't Veer L, Dai H, van de Vijver M, He Y, Hart A, Mao M, Peterse H, van der Kooy K, Marton M, Witteveen A. (2002) Gene expression profiling predicts clinical outcomes of breast cancer. Nature 415:5306.[CrossRef][Medline]
Verweij P and Van Houwelingen H. (1993) Cross-validation in survival analysis. Statistics in Medicine 12:230514.[Web of Science][Medline]
Yu X. (2005) Regression methods for microarray data, [PhD. Thesis](Stanford University, Stanford) pp. 2339.
Zou H and Hastie T. (2005) Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B 67:30120.[CrossRef]
Received January 3, 2006; revised April 27, 2006; accepted for publication May 8, 2006.
![]()
CiteULike
Connotea
Del.icio.us What's this?
This article has been cited by other articles:
![]() |
K. Sameith, P. Antczak, E. Marston, N. Turan, D. Maier, T. Stankovic, and F. Falciani Functional modules integrating essential cellular functions are predictive of the response of leukaemia cells to DNA damage Bioinformatics, November 15, 2008; 24(22): 2602 - 2607. [Abstract] [Full Text] [PDF] |
||||
![]() |
Y. Luan and H. Li Group additive regression models for genomic data analysis Biostat., January 1, 2008; 9(1): 100 - 113. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||














