Skip Navigation


Biostatistics Advance Access originally published online on April 7, 2006
Biostatistics 2007 8(1):86-100; doi:10.1093/biostatistics/kxj035
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow Data Supplement
Right arrow All Versions of this Article:
8/1/86    most recent
kxj035v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Right arrow Disclaimer
Google Scholar
Right arrow Articles by Guo, Y.
Right arrow Articles by Tibshirani, R.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Guo, Y.
Right arrow Articles by Tibshirani, R.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2006. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org.

Regularized linear discriminant analysis and its application in microarrays

Yaqian Guo*

Department of Statistics, Stanford University, Stanford, CA 94305, USA yaqiang{at}stanford.edu

Trevor Hastie

Department of Statistics, Stanford University, Stanford, CA 94305, USA

Robert Tibshirani

Department of Health Research and Policy, Redwood Building, Room T101C, Stanford University, Stanford, CA 94305, USA

* To whom correspondence should be addressed.


    SUMMARY
 TOP
 SUMMARY
 1. INTRODUCTION
 2. SHRUNKEN CENTROIDS RDA
 3. COMPARISON BASED ON...
 4. SIMULATION STUDY
 5. FEATURE SELECTION BY...
 6. DISCUSSION
 REFERENCES
 
In this paper, we introduce a modified version of linear discriminant analysis, called the "shrunken centroids regularized discriminant analysis" (SCRDA). This method generalizes the idea of the "nearest shrunken centroids" (NSC) (Tibshirani and others, 2003Go) into the classical discriminant analysis. The SCRDA method is specially designed for classification problems in high dimension low sample size situations, for example, microarray data. Through both simulated data and real life data, it is shown that this method performs very well in multivariate classification problems, often outperforms the PAM method (using the NSC algorithm) and can be as competitive as the support vector machines classifiers. It is also suitable for feature elimination purpose and can be used as gene selection method. The open source R package for this method (named "rda") is available on CRAN (http://www.r-project.org) for download and testing.

Keywords: Classification; Discriminant analysis; Microarray; Prediction analysis of microarrays (PAM); Regularization; Shrunken centriods


    1. INTRODUCTION
 TOP
 SUMMARY
 1. INTRODUCTION
 2. SHRUNKEN CENTROIDS RDA
 3. COMPARISON BASED ON...
 4. SIMULATION STUDY
 5. FEATURE SELECTION BY...
 6. DISCUSSION
 REFERENCES
 
Discriminant analysis (DA) is widely used in classification problems. The traditional way of doing DA was introduced by R. Fisher, known as the linear discriminant analysis (LDA). For the convenience, we first describe the general setup of this method so that we can follow the notation used here throughout this paper.

Suppose there are G different populations, each assumed to have a multivariate normal distribution with a common covariance matrix Formula of dimension Formula and mean vectors Formula (Formula), both of which are assumed known for the time being. Now, suppose we have a random sample of n observations from these populations with their true group labels being unknown. The question is how to correctly identify the population from which each observation comes. To be more explicit, let Formula be observations from population 1, Formula from population 2, and so on. Thus, Formula

Formula

The idea of LDA is to classify observation Formula to a population Formula which minimizes Formula, i.e.

Formula

Under the above multivariate normal assumptions, this is equivalent to finding the population that maximizes the likelihood of the observation. More often, people have some prior knowledge as to the proportion of each population. For example, let Formula be the proportion of population g such that Formula. Then, instead of maximizing the likelihood, we maximize the posterior probability the observation belongs to a particular group, i.e.

Formula

The linearity of this DA method comes from the assumption of common covariance matrix, which simplifies the above criterion as

Formula (1.1)

where

Formula

is the so-called discriminant function.

In reality, both Formula and Formula are unknown and therefore need to be estimated from the sample. Almost always, one uses the maximum likelihood estimates for these parameters,

Formula

where X is a Formula matrix with columns corresponding to the observations and Formula is a matrix of the same dimensions with each column corresponding to the sample mean vector of the population that column belongs to. Therefore, in a more practical form, the sample version discriminant function is usually used in the LDA,

Formula (1.2)

When the assumption of common covariance matrix is not satisfied, one uses an individual covariance matrix for each group and this leads to the so-called quadratic discriminant analysis (QDA) as the discriminating boundaries are quadratic curves. There is also an intermediate method between LDA and QDA, which is a regularized version of discriminant analysis (RDA) proposed by Friedman (1989)Go. However, the regularization used in that method is different from the one we will propose here. A detailed source about the LDA, QDA, and Friedman's RDA methods can be found in the book by Hastie and others (2001)Go. As we can see, the concept of DA certainly embraces a broader scope. But in this paper, our main focus will be solely the LDA part and henceforth the term DA will stand for LDA unless otherwise emphasized.

This paper is arranged as follows. In Section 2, we will first discuss in detail our version of regularization in DA, its statistical properties, and some computational issues (Section 2.1). Then, we introduce the "shrunken centroids regularized discriminant analysis" (SCRDA) method based on this regularization (Section 2.2). In Section 3, we compare our SCRDA method against other classification approaches through several publicly available real life microarray data sets. We also discuss an important issue about how to choose the optimal parameter pairs Formula for our methods (Section 3.4). Section 4 is devoted to a simulation study, where we generate data sets under different scenarios to evaluate the performance of our SCRDA method. In Section 5, we briefly discuss the feature selection property of SCRDA method. Section 6 is the discussion.


    2. SHRUNKEN CENTROIDS RDA
 TOP
 SUMMARY
 1. INTRODUCTION
 2. SHRUNKEN CENTROIDS RDA
 3. COMPARISON BASED ON...
 4. SIMULATION STUDY
 5. FEATURE SELECTION BY...
 6. DISCUSSION
 REFERENCES
 

2.1 Regularization in DA

LDA is straightforward in the cases where the number of observations is greater than the dimensionality of each observation, i.e. Formula. In addition to being easy to apply, it also has nice properties, like robustness to deviations from model assumptions and almost-"Bayes" optimality. However, it becomes a serious challenge to use this method in the microarray analysis settings, where Formula is always the case. There are two major concerns here. First, the sample covariance matrix estimate is singular and cannot be inverted. Although we may use the generalized inverse instead, the estimate will be very unstable due to lack of observations. Actually, the performance of LDA in high-dimensional situations is far from being optimal (Dipillo, 1976), (, dipillo2). Second, high dimensionality makes direct matrix operation formidable, hence hindering the applicability of this method. Therefore, we will make some changes to the original LDA to overcome these problems. First, to resolve the singularity problem, instead of using Formula directly, we use

Formula (2.1)

for some Formula, Formula. Some other forms of regularization on Formula can be

Formula (2.2)

or

Formula (2.3)

for some Formula, Formula. It is easy to see that if we ignore the prior constant, the three forms of regularization are equivalent in terms of the discriminant function. In this paper, the form (2.1) will be applied in all our computational examples due to its convenience to use.

The formulations above are not entirely new and actually have been frequently seen in situations like the ridge regression (Hoerl and Kennard, 1970Go), where the correlation between predictors is high. By introducing a slightly biased covariance estimate, we not only resolve the singularity problem but also stabilize the sample covariance estimate. For example, the discriminant function (2.5) below is the main formula that we will be using in this paper. It utilizes the regularization form (2.1). Figures 1 and 2Go show how the discriminant function (2.5) behaves in a two-class DA setup, where data are generated according to our model assumptions in Section 1. As we can see, regularization both stabilizes the variance and reduces the bias of the discriminant function (2.5). And as a result, the prediction accuracy is improved. In Figure 1, the identity covariance matrix is used to generate the data. From the plot, we can see that the optimal regularization parameter Formula is very close to 0, which means the optimal regularized covariance matrix tends to look like the true identity covariance matrix, regardless the sample size and data dimension. On the other hand, in Figure 2, an auto-regressive covariance structure (4.2) is used and the optimal Formula now lies somewhere between 0 and 1. Furthermore, the sample size and data dimension now have effect on choosing the optimal regularization parameter Formula. When the sample size is greater than the data dimension as is the case in the two top plots of Figure 2, the sample covariance matrix can be accurately estimated and since it is also unbiased for the true population covariance matrix, it is more favored in this case. When sample size is less than data dimension as is the case in the two bottom plots of Figure 2, the sample covariance matrix estimate will be highly variable. Although it is still unbiased for the true covariance matrix, a biased estimator will be favored due to the bias-variance trade-off. Hence, the optimal value of Formula is shifted toward 0. These plots give indications on the conditions when the regularized DA can work well. We will discuss this point later by using more data examples.


Figure 1
View larger version (13K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1 The two plots on the left show the variance and bias of the discriminant function (2.5) as a function of the regularization parameter Formula in (2.1) for different sample sizes and dimensions. The large difference between two plots is due to different Formula ratios. The two plots on the right show the prediction error of the discriminant function for the corresponding conditions on the left. The data points are generated from a p-dimensional multivariate normal distribution with Formula.

 

Figure 2
View larger version (14K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 2 The data are generated from a p-dimensional multivariate normal distribution with an auto-regressive correlation matrix similar as in (4.2). The auto-correlation is {rho} = 0.6.

 
Another intuitive way of regularization modifies the sample correlation matrix Formula in the same way,

Formula (2.4)

where Formula is the diagonal matrix taking the diagonal elements of Formula. Then, we compute the regularized sample covariance matrix by Formula. In this paper, we will consider both cases and their performance will be compared. Now, having introduced the regularized covariance matrix, we can define the corresponding regularized discriminant function as

Formula (2.5)

where the Formula can be from either (2.1) or (2.4).

Our next goal is to facilitate the computation of this new discriminant function. We have addressed the issue that direct matrix manipulation is impractical in microarray settings. But if we employ the singular value decomposition (SVD) trick to compute the matrix inversion, we can get around this trouble. This enables a very efficient way of computing the discriminant function and reduces the computation complexity from the order of Formula to Formula, which will be a significant saving when Formula. For more details about the algorithm, please refer to Hastie and others (2001)Go.

2.2 Shrunken centroids RDA

In this section, we define our new method SCRDA based on the regularized discriminant function (2.5) in Section 2.1. The idea of this method is similar to the "nearest shrunken centroids" (NSC) method (Tibshirani and others, 2003Go), which we will describe briefly first. In microarray analysis, a widely accepted assumption is that most genes do not have differential expression level among different classes. In reality, the differences we observe are mostly due to random fluctuation. The NSC method removes the noisy information from such fluctuation by setting a soft threshold. This will effectively eliminate many non-contributing genes and leave us with a small subset of genes for scientific interpretation and further analysis. In the NSC method, the group centroids of each gene are shrunken individually. This is based on the assumption that genes are independent of each other, which however, for most of the time is not totally valid. Note that after shrinking the group centroids of a particular gene g, they compute the following gene-specific score for an observation Formula:

Formula (2.6)

where Formula is the gth component of the Formula vector Formula, Formula is the shrunken centroid of group k for gene g, and Formula is the pooled standard deviation of gene g. Then Formula is classified to group k if k minimizes the sum of the scores over all genes (if the prior information is available, a term Formula should be included.), i.e.

Formula

which is also equivalent to

Formula

given Formula. This is similar to the discriminant function (2.5) except that we replace Formula with the diagonal matrix Formula and the centroid vector Formula with the shrunken centroid vector Formula. Therefore, a direct modification in the regularized discriminant function (2.5) to incorporate the idea of the NSC method is to shrink the centroids in (2.5) before calculating the discriminant score, i.e.

Formula (2.7)

However, in addition to shrinking the centroids directly, there are also two other possibilities. One is to shrink Formula, i.e.

Formula (2.8)

and the other is to shrink Formula, i.e.

Formula (2.9)

Although, it has been shown in our numerical analysis that all three shrinking methods have very good classification performance, only (2.8) will be the main focus of this paper as it also possesses the feature elimination property, which we will discuss later. Hence, we will refer to the DA resulted from (2.8) as SCRDA without differentiating whether Formula comes from (2.1) or (2.4). We will say more specifically which method is actually used when such a distinction is necessary.


    3. COMPARISON BASED ON REAL MICROARRAY DATA
 TOP
 SUMMARY
 1. INTRODUCTION
 2. SHRUNKEN CENTROIDS RDA
 3. COMPARISON BASED ON...
 4. SIMULATION STUDY
 5. FEATURE SELECTION BY...
 6. DISCUSSION
 REFERENCES
 
In this section, we will investigate how well our new method works on some real data sets. A few existing classification methods are used as standards for comparison. The first two methods are the penalized logistic regression (PLR) and the support vector machines (SVM), both via univariate ranking (UR) and recursive feature elimination (RFE). They are proposed by Zhu and Hastie (2004)Go and are shown to have good classification performance. Two data sets, the "Tamayo" and "Golub" data sets from their paper are also used (Sections 3.1 and 3.2). The third example uses the "Brown" data set. SVM via UR is used as comparison. In addition, as the sibling method of SCRDA, PAM, whose core idea is NSC, is naturally included in all three examples. In the supplementary material available at Biostatistics online (http://www.biostatistics.oxfordjournals.org), we will give more real data examples showing the performance of different methods.

3.1 Tamayo data

The Tamayo data set (Ramaswamy and others, 2001Go; Zhu and Hastie, 2004) is divided into a training subset, which contains 144 samples and a test subset of 54 samples. They consist of totally 14 different types of cancers and the number of genes in each array is 16 063. Since there are two tuning parameters in the SCRDA method, i.e. the regularization parameter Formula and the shrinkage parameter Formula, we choose the optimal pairs Formula for Formula and Formula using cross-validation on the training samples. And then we calculate the test error based on the tuning parameter pairs we chose and compare it with the results from Zhu and Hastie (2004)Go. The results are summarized in Table 1. Based on how the covariance matrix is regularized in (2.8), two different forms of SCRDA are considered. In the table, we use "SCRDA" to denote the one from regularization on the covariance matrix and "SCRDAFormula" for the regularization on the correlation matrix. We can see that SCRDA clearly dominates PAM and slightly outperforms the last four methods in the table. Meanwhile, it also does a good job on selecting informative gene subset.


View this table:
[in this window]
[in a new window]

 
Table 1 Tamayo data. The last four rows are excerpted from Zhu and Hastie (2004)Go for comparison. The SCRDA and SCRDAr methods correspond to the situations where Formula in (2.5) comes from (2.1) and (2.4), respectively

 
3.2 Golub data

The Golub data (Golub and others, 1999Go; Zhu and Hastie, 2004) consist of 38 training samples and 34 test samples from two cancer classes. The number of genes on each array is 7129. As there are only two groups to predict, this data set is much easier to analyze than the Tamayo data. The classification performance is generally impressive for most methods such that the difference among them is almost negligible. The result is summarized in Table 2.


View this table:
[in this window]
[in a new window]

 
Table 2 Golub data. This table is similar to Table 1

 
3.3 Brown data

The Brown data set, similar to the Tamayo data, is also a complex cancer data set. It contains a large number of samples (Formula) and classes (Formula, 14 cancer types and one normal type). The number of genes in the arrays is smaller (Formula) than the Tamayo data (Formula). As we can see (Table 3), the SCRDA method dominates PAM by a large margin and is as good as SVM.


View this table:
[in this window]
[in a new window]

 
Table 3 Brown data (n = 349, p = 4718, G = 15)

 
3.4 Choosing the optimal parameters ({alpha}, {Delta}) in SCRDA

In this section, we discuss the issue on how to choose the optimal tuning parameters. We have mentioned briefly that cross-validation should be used in determining the parameters. However, in practice, this process can be somewhat confusing. The main problem is that there are many possible tuning parameter pairs giving the same cross-validation error rate. Yet, the test error rate based on them may vary differently. Therefore, how to choose the best parameter pairs is an essential issue in evaluating the performance of the SCRDA method. Therefore, we will suggest two rules for choosing the parameters based on our experience.

First let us take a look at how the classification errors and the number of genes remained are distributed across the varying scopes of the tuning parameters Formula. The two plots in Figure 3 show the cross-validation error and test error given by the SCRDA method for the Tamayo data. Formula is chosen to lie between 0 and 1 while Formula between 0 and 3. Figure 4 shows the number of genes remained for the same range of the parameters. In all three plots, the stars correspond to the parameter pairs that yield the minimal cross-validation error. The round dots correspond to the 1 standard error boundary around the minimal cross-validation error points. The diamond dot in the test error plot (Figure 3, right) corresponds to the parameter pair that gives the minimal test error in the whole parameter space.


Figure 3
View larger version (28K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 3 Distribution maps the cross-validation error (left) and test error (right) of SCRDA for the Tamayo data. In each plot, the x and y axes correspond to Formula and Formula, respectively. The gray scale at the bottom shows the magnitude of the errors. The stars correspond to the parameter pairs that yield the minimal cross-validation error. The round dots correspond to the 1 standard error boundary around the minimal cross-validation error points. The diamond in the test error plot (right) corresponds to the parameter pair that gives the minimal test error in the whole parameter space.

 

Figure 4
View larger version (28K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 4 Distribution map of the number of genes by SCRDA remained for the Tamayo data. The stars and round dots have the same notations as in Figure 3.

 
The most significant pattern we can observe in Figure 4 is the decreasing gradient approximately along the 45° diagonal line, i.e. the number of genes remained decreases as Formula increases or Formula decreases. This makes sense by intuition and has been consistently observed for all the real data and simulation data we have worked on. On the other hand, the distribution of the classification errors (both CV and test) as in Figure 3 does not have such a clearly consistent pattern. They may change dramatically from one data set to another. Further, as it is not possible to establish a unified correspondence between the distribution map of the classification error and the number of genes remained, we need to consider two distribution maps at the same time to achieve improved classification performance using a reasonably reduced subset of genes. We hence suggest the following two rules for identifying the optimal parameter pair(s) Formula: the "Min-Min" rule and the "One Standard Error" rule.

The "Min-Min" rule

  1. First, find all the pairs Formula that correspond to the minimal cross-validation error from training samples.
  2. Select the pair or (pairs) that use the minimal number of genes.

The "One Standard Error" rule

  1. First, identify all the pairs Formula that correspond to the minimal cross-validation error from the training samples.
  2. Find the one standard error boundary points Formula.
  3. On the boundary, find the pair Formula that gives the smallest number of genes remained.

If there is more than one optimal pair, it is recommended to calculate the averaged test error based on all the pairs chosen as we did in this paper. There is no theoretical justification yet why these two rules are suggested. But from our empirical experience, both methods worked and provided no significant difference in terms of classification accuracy. For consistency, we have been using the "Min-Min" rule throughout this paper.


    4. SIMULATION STUDY
 TOP
 SUMMARY
 1. INTRODUCTION
 2. SHRUNKEN CENTROIDS RDA
 3. COMPARISON BASED ON...
 4. SIMULATION STUDY
 5. FEATURE SELECTION BY...
 6. DISCUSSION
 REFERENCES
 
It was encouraging to see how our new method performs on some real microarray data sets. In this section, we investigate the performance of the SCRDA method in a more controlled manner. We deliberately construct three different simulation setups to study the conditions under which our SCRDA method would work well. Particularly, we choose the PAM method as our competitor.

4.1 Two-group independence structure

The first setup is the simplest. There are two classes of multivariate normal distributions: Formula and Formula, each of dimension Formula. The true covariance structure is the independence structure, i.e. Formula. Also, for simplicity, all components of Formula are assumed to be 0 and for Formula, the first 100 components are Formula while the rest are all 0 as well, i.e. Formula and Formula For each class, we generate Formula training samples and Formula test samples.

This is not a situation where we see much advantage of the SCRDA method over the PAM method (Table 4). In this situation, all methods produce basically the same results. PAM seems to be even slightly better than the SCRDA method. However, it is hard to declare PAM as a clear winner in this case as the margin of betterment is still within the range of error fluctuation. There are two reasons. First, the number of classes is only two, the simplest case in all classification problems. As people are aware of, it is much easier for most classification methods to work well in the two-group classification problems and often it is hard to really observe the advantage of one method over another, e.g. the Golub data example. Second, the data is generated from the covariance structure of identity matrix. This suits exactly the assumption in the PAM method to make it work well. As we can see in the next two examples, when the number of classes increases, especially when data is more correlated, the SCRDA method will start to show true advantage over the PAM method.


View this table:
[in this window]
[in a new window]

 
Table 4 Setup I—two groups with independent structure

 
4.2 Multi-group independence structure

The second simulation setup is slightly more complicated than the first one. We generate a multiple groups (Formula) classification scenario. Again each class is assumed to have distribution Formula with dimension of Formula. Formula is still assumed to be Formula as in setup I. The components of each mean vector Formula is assumed to be all 0 except for Formula components, which are set to be Formula. The positions of the non-zero components are selected randomly for each mean vector and they do not overlap. Again a total of Formula training samples and Formula test samples are generated with equal probabilities for each class. This time, we start to observe noticable differences among these methods (Table 5). Particularly, the SCRDA method starts to outperform the PAM method as we would expect.


View this table:
[in this window]
[in a new window]

 
Table 5 Setup II—multiple groups with independent structure

 
4.3 Two-group dependence structure

In the last case, we produce a scenario that more resembles the real microarray data. The simulation structure is as follows. We again consider a two-group classification problem as in setup I. Two distributions are still Formula and Formula with Formula. Formula is assumed to be the same as in setup I while Formula is slightly different. Formula is no longer the identity matrix. Instead, we assume the following block diagonal structure:

Formula (4.1)

with each diagonal block being the following auto-regressive structure:

Formula (4.2)

The block size is Formula and there are totally Formula blocks. We assume the auto-correlation within each block is Formula and we set alternating signs for each block. A total of Formula training samples and Formula test samples are generated with half from each class.

This simulation setup does have sound basis in real microarray data. It is common knowledge that genes are networked together in pathways. Although it is true that weak connections between groups may exist, independence between groups is usually a reasonable assumption. Also, within each group, genes are either positively or negatively correlated and due to their relative distance in the regulatory pathway, the further apart two genes, the less correlation between them. These are exactly the reasons why we use the above simulation model. From the results in Table 6, we can clearly see that the SCRDA method outperforms PAM by a significant margin. Considering this is only a two-group classification problem mimicking the real microarray data, we should expect that the difference will be more significant when the number of classes is large as we have observed for the Tamayo and Brown data.


View this table:
[in this window]
[in a new window]

 
Table 6 : Setup III—two groups with dependent structure

 

    5. FEATURE SELECTION BY THE SCRDA METHOD
 TOP
 SUMMARY
 1. INTRODUCTION
 2. SHRUNKEN CENTROIDS RDA
 3. COMPARISON BASED ON...
 4. SIMULATION STUDY
 5. FEATURE SELECTION BY...
 6. DISCUSSION
 REFERENCES
 
Remember that the discriminant function (2.5) is linear in X with coefficients vector Formula. Now, if the ith element of the coefficients vector is 0 for all Formula, then it means gene i does not contribute to our classification purpose and hence can be eliminated. Therefore, SCRDA potentially can be used for the gene selection purpose. For example, as shown in Table 7, the number of genes that are truly differentially expressed is 100, 280, and 200, respectively, in the three simulation setups above. Correspondingly, the SCRDA method picks out 240, 785, and 282 genes in each setup. Among these genes, 82, 204, and 138 are truly differentially expressed, respectively. The detection rate is at least Formula in all situations. However, the false-positive rate is also high, especially when the number of classes is large. For now, there is not a good way to adjust this high false-positive rate. Therefore, SCRDA can be conservatively used as gene selection method.


View this table:
[in this window]
[in a new window]

 
Table 7 Feature selection by SCRDA

 

    6. DISCUSSION
 TOP
 SUMMARY
 1. INTRODUCTION
 2. SHRUNKEN CENTROIDS RDA
 3. COMPARISON BASED ON...
 4. SIMULATION STUDY
 5. FEATURE SELECTION BY...
 6. DISCUSSION
 REFERENCES
 
Through comparisons using both real microarray data sets and simulated data sets, we have shown that the SCRDA method can be a promising classification tool. Particularly, it is consistently better than its sibling method, PAM in many problems. This new method is also very competitive to some other methods, e.g. SVM.

This new method is not only empirically useful in terms of classification performance, it also has some interesting theoretical implications, which we will discuss carefully in a future paper. For example, it can be shown that the regularized discriminant function (2.5) is equivalent to the penalized log-likelihood method and in some special cases, our new method SCRDA can be related to another recently proposed new method called "elastic net" (Zou and Hastie, 2005Go). These results are interesting since they not only give different perspectives of statistical methods but also provide new computational approaches. For example, an alternative method for estimating the shrunken regularized centroids other than the way we have proposed in this paper is to solve the solution of the mixed Formula-Formula penalty function. This has been made possible as the problem will convert to the usual LASSO (Tibshirani, 1996Go) solution. And with the emergence of the new algorithm LARS (Efron and others, 2004Go), efficient numerical solution is also available.

As mentioned before, choosing the optimal parameter pairs for the SCRDA method is not as straightforward as in PAM and in some cases, the process can be somewhat tedious. The guidelines given in Section 3.4 work generally well, at least for all the data examples provided in this paper. However, it may require some experience with the SCRDA method to get the best result. Also, the computation in the SCRDA method is not as fast as in PAM due to two reasons. First, we have two parameters Formula to optimize over a two-dimensional grid rather than the one-dimensional one in PAM. Second, although the SVD algorithm is very efficient, the computation still involves large matrix manipulation in practice, while only vector operations are involved in PAM. On the other hand, as shown in this paper, PAM does not always do a worse job than the SCRDA method. In some situations, e.g. when the number of classes is small or the covariance structure is nearly diagonal, PAM is both accurate in prediction and computationally efficient. Therefore, we recommend using the SCRDA method only when PAM cannot perform well in classification.

Also, the SCRDA method can be used directly for gene selection proposes. As we have seen in Section 5, the selection process of SCRDA is rather conservative, tending to include many more genes unnecessary. But overall speaking, it is not generally worse than PAM. And since it includes most of the genes that are truly differentially expressed, it is a safer way of including the ones we really should detect.


    REFERENCES
 TOP
 SUMMARY
 1. INTRODUCTION
 2. SHRUNKEN CENTROIDS RDA
 3. COMPARISON BASED ON...
 4. SIMULATION STUDY
 5. FEATURE SELECTION BY...
 6. DISCUSSION
 REFERENCES
 

    Dipillo P. (1976) The application of bias to discriminant analysis. Communication in Statistics Theory and Methodology A5:843–54.

    Dipillo P. (1977) Further application of bias discriminant analysis. Communication in Statistics Theory and Methodology A6:933–43.

    Efron B, Hastie T, Johnstone I, Tibshirani R. (2004) Least angle regression. Annals of Statistics 32:407–99.[CrossRef][Web of Science]

    Friedman J. (1989) Regularized discriminant analysis. Journal of the American Statistical Association 84:165–75.[CrossRef]

    Golub T, Slonim D, Tamayo P, Huard C, Gaasenbeek M, Mesirov J, Coller H, Loh M, Downing J, Caligiuri M. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286:531–6.[Abstract/Free Full Text]

    Hastie T, Tibshirani R, Friedman J. (2001) The Elements of Statistical Learning(Springer, New York).

    Hoerl A and Kennard R. (1970) Ridge regression: biased estimation for non-orthogonal problems. Technometrics 12:55–67.

    Ramaswamy S, Tamayo P, Rifkin R, Mukherjee S, Yeang C, Angelo M, Ladd C, Reich M, Latulippe E, Mesirov J. and others. (2001) Multiclass cancer diagnosis using tumor gene expression signature. Proceedings of the National Academy of Sciences of the United States of America 98:15149–54.[Abstract/Free Full Text]

    Tibshirani R. (1996) Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B 58:267–88.

    Tibshirani R, Hastie T, Narashimhan B, Chu G. (2003) Class prediction by nearest shrunken centroids with applications to dna microarrays. Statistical Science 18:104–17.[CrossRef]

    Zhu J and Hastie T. (2004) Classification of gene microarrays by penalized logistic regression. Biostatistics 5:427–43.[Abstract]

    Zou H and Hastie T. (2005) Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B 67:301–20.[CrossRef]

    Received November 1, 2005; revised March 27, 2006; accepted for publication March 29, 2006.


    Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


    This article has been cited by other articles:


    Home page
    BioinformaticsHome page
    V. Zuber and K. Strimmer
    Gene ranking and biomarker discovery under correlation
    Bioinformatics, October 15, 2009; 25(20): 2700 - 2707.
    [Abstract] [Full Text] [PDF]


    Home page
    BioinformaticsHome page
    J. Gertheiss and G. Tutz
    Supervised feature selection in mass spectrometry-based proteomic profiling by blockwise boosting
    Bioinformatics, April 15, 2009; 25(8): 1076 - 1077.
    [Abstract] [Full Text] [PDF]


    Home page
    BioinformaticsHome page
    F. Tai and W. Pan
    Incorporating prior knowledge of gene functional groups into regularized discriminant analysis of microarray data
    Bioinformatics, December 1, 2007; 23(23): 3170 - 3177.
    [Abstract] [Full Text] [PDF]


    This Article
    Right arrow Abstract Freely available
    Right arrow FREE Full Text (PDF) Freely available
    Right arrow Data Supplement
    Right arrow All Versions of this Article:
    8/1/86    most recent
    kxj035v1
    Right arrow Alert me when this article is cited
    Right arrow Alert me if a correction is posted
    Services
    Right arrow Email this article to a friend
    Right arrow Similar articles in this journal
    Right arrow Similar articles in PubMed
    Right arrow Alert me to new issues of the journal
    Right arrow Add to My Personal Archive
    Right arrow Download to citation manager
    Right arrowRequest Permissions
    Right arrow Disclaimer
    Google Scholar
    Right arrow Articles by Guo, Y.
    Right arrow Articles by Tibshirani, R.
    Right arrow Search for Related Content
    PubMed
    Right arrow PubMed Citation
    Right arrow Articles by Guo, Y.
    Right arrow Articles by Tibshirani, R.
    Social Bookmarking
     Add to CiteULike   Add to Connotea   Add to Del.icio.us  
    What's this?