Biostatistics Advance Access first published online on January 22, 2007
This version published online on February 16, 2007
Biostatistics, doi:10.1093/biostatistics/kxm002
A moment-based method for estimating the proportion of true null hypotheses and its application to microarray gene expression data
Department of Statistics and Biostatistics Center, The George Washington University, Washington, DC 20052, USA
ylai{at}gwu.edu
| SUMMARY |
|---|
Due to advances in experimental technologies, it is feasible to collect measurements for a large number of variables. When these variables are simultaneously screened by a statistical test, it is necessary to adjust for multiple hypothesis testing. The false discovery rate has been proposed and widely used to address this issue. A related problem is the estimation of the proportion of true null hypotheses. A long-standing difficulty in this problem is the identifiability of the nonparametric model. In this study, we propose a moment-based method coupled with sample splitting for estimating this proportion. If the p values from the alternative hypothesis are homogeneously distributed, then the proposed method resolves the identifiability problem and attains its best performance. When the p values from the alternative hypothesis are heterogeneously distributed, we propose to approximate this mixture distribution so that identifiability can be achieved. Theoretical aspects of the approximation error are discussed. The proposed estimation method is completely nonparametric and simple, with an explicit formula. Simulation studies show the favorable performance of the proposed method when it is compared with other existing methods. Two microarray gene expression data sets are considered for applications.
Keywords: Microarray; Moment estimator; Proportion of true null hypothesis
| 1. INTRODUCTION |
|---|
Due to advances in experimental technologies, it is feasible to collect measurements for a large number of variables. These data include microarray gene expression data (Hedenfalk and others, 2001), mass spectrometry data (Wu and others, 2003), and nuclear magnetic resonance spectral data (Tadesse and others, 2005). The sample sizes of these data sets are usually small because of their relatively high costs. These data sets can be collected for multiple sample groups, and a typical interest is to identify variables that significantly distinguish these groups, such as normal against disease groups. Statistically, we conduct a multisample comparison test for each of the measured variables. Because numerous variables are simultaneously screened, it is necessary to adjust for multiple hypothesis testing. The false discovery rate (FDR) has been proposed and widely used to address this issue (Benjamini and Hochberg, 1995). A related problem is the estimation of the proportion of true null hypotheses, $\pi_0$. For microarray data, it is equivalent to estimate the proportion of differentially expressed genes. This quantity is also crucial for the sample-size calculation in microarray experiment designs (Jung, 2005).
Many statistical methods have been proposed to estimate $\pi_0$, such as a mixture model proposed by Allison and others (2002), QVALUE (Storey and Tibshirani, 2003), BUM (Pounds and Morris, 2003), SPLOSH (Pounds and Cheng, 2004), and LBE (Dalmasso and others, 2005). These methods are not always efficient. They may give accurate estimation results in some cases but fail in other cases. If the distributions of test statistics or the related p-value distributions can be specified in parametric forms for both the null and the alternative hypotheses, then the model-based estimation approach, such as the mixture model proposed by Allison and others (2002) or BUM proposed by Pounds and Morris (2003), should provide favorable performances. However, it is generally difficult to validate these distribution assumptions, especially when sample sizes are small. For the nonparametric approach, a long-standing difficulty is the model identifiability (unique solution of model parameters), because observations are sampled from mixed distributions from the null and the alternative hypotheses. QVALUE (Storey and Tibshirani, 2003) and SPLOSH (Pounds and Cheng, 2004) first smooth the empirical p-value distribution and then estimate an upper bound of $\pi_0$. LBE proposed by Dalmasso and others (2005) estimates the upper bound of $\pi_0$ through a moment-based method. Recently, Pawitan and others (2005a,b) discussed the bias in the estimation of $\pi_0$ and the influence from sample sizes.
Moment-based estimation methods usually require no independence assumptions. Explicit formulas can generally be derived. The requirement of large sample sizes, which is necessary for the statistical efficiency of these methods, limits their usefulness in practice. However, when estimating $\pi_0$ for "omics" data, the sample size is the number of variables and is usually large. Therefore, we consider a moment-based method coupled with sample splitting for estimating $\pi_0$. By splitting the sample, we are able to understand the p-value distribution under different hypotheses by establishing the conditional independence structure of the joint p-value distribution. If the p values from the alternative hypothesis are homogeneously distributed, then the proposed method resolves the model identifiability problem and attains its best performance. When the p values from the alternative hypothesis are heterogeneously distributed, we propose to approximate this mixture distribution so that the model identifiability can be achieved. The proposed method is completely nonparametric and simple, with an explicit formula.
In the following sections, we first propose the method for estimating $\pi_0$. Theoretical aspects of the approximation error are also presented. Then, we present analysis results for several simulated and experimental data sets to compare the performances of the proposed method and the other existing methods. Finally, the advantages and disadvantages of the proposed method are discussed.
| 2. A MOMENT-BASED ESTIMATION METHOD |
|---|
A typical situation when multiple hypothesis testing is performed for omics data (microarray data, mass spectrometry data, etc.) is that numerous p values are generated. A proportion of these p values are consistent with the null hypothesis and the rest are consistent with the alternative hypothesis. Our interest in this study is to estimate $\pi_0$, the proportion of true null hypotheses. To provide an illustrative example for our proposed method, we simulate 2 independent data sets. Both data sets have the same 3000 variables and 2 sample groups with 5 samples in each group. In each data set, the first 1200 variables are independently simulated from 2 different normal distributions for the first and the second sample groups, respectively (40% nonnull), and the remaining 1800 variables are independently simulated from the same normal distribution for both groups (60% null). p values from the 2-sample Student's t-test are calculated for these simulated variables.
The marginal histograms in Figure 1(a) give illustrations of the p-value distributions based on one data set. From these histograms, one may realize the problem of identifiability when estimating $\pi_0$. Although the null distribution is known to be uniform on $[0, 1]$, the nonnull distribution is unknown. Without imposing any parametric or other assumptions on the nonnull distribution, we cannot obtain a unique solution for $\pi_0$ if only one data set is considered.
However, if we have 2 independent data sets such that both data sets contain the same variables, then the pairs of p values can be obtained for all variables, and these pairs are actually conditionally independent. The scatter plot in Figure 1(a) gives an illustration. From this plot, one may realize that it is possible to solve the identifiability problem and obtain a unique solution for $\pi_0$ under certain conditions. In the following subsections, we first introduce an estimation method when 2 independent data sets are available. When there is only one data set, we propose a procedure to generate 2 independent data sets. A bootstrap procedure for confidence intervals and some theoretical aspects are also discussed.
At the beginning, we consider 2 independent data sets. Both data sets contain the same m variables and g sample groups. Their sample sizes may be different. Test statistics are chosen to test some specific hypotheses for each variable, such as $H_0$: the variable has the same population means in different sample groups versus $H_1$: the variable has different population means in different sample groups. (For simplicity, we skip the mathematical description of data structure and the related test statistics.) The goal is to estimate $\pi_0$, the proportion of variables consistent with the null hypothesis.
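Suppose a test statistic T is chosen to test a specified hypothesis. Without loss of generality, we assume that T is continuous. For each variable, we can obtain 2 corresponding p values from the 2 data sets. For data set k, $k = 1, 2$, the p value $P_k$ follows a uniform distribution $U[0, 1]$ under the null hypothesis $H_0$. Under the alternative hypothesis $H_1$, there may be various distribution components (except $U[0, 1]$) for the p-value distribution. We use $I$ to denote the set containing the indices representing different nonnull distribution components.
Generally, the set $I$ may contain many different components (we write $|I|$ for the number of elements in $I$). We propose that the null component and the different nonnull components can be approximated by 2 components: a null component and a nonnull component. Under this approximation, there is an approximated proportion of true null hypotheses $\tilde{\pi}_0$, which may be different from $\pi_0$ (however, if $|I| = 1$, then $\tilde{\pi}_0 = \pi_0$). Considering the moments of p values, we have
$$\nu_k = \tilde{\pi}_0 \mu_{0k} + (1 - \tilde{\pi}_0) \tilde{\mu}_k, \quad k = 1, 2, \qquad \nu_{12} = \tilde{\pi}_0 \mu_{01} \mu_{02} + (1 - \tilde{\pi}_0) \tilde{\mu}_1 \tilde{\mu}_2.$$
Here $\mu_{0k}$, $\tilde{\mu}_k$, and $\nu_k$ are the expected values of p value following the null, nonnull, and marginal distributions in data set k, $k = 1, 2$, respectively. $\nu_{12}$ is the expected value of the product of $P_1$ and $P_2$ under the marginal joint distribution; the product form of the last equation relies on the conditional independence of $P_1$ and $P_2$ within each component. Note that $\mu_{0k} = 1/2$ because the null distribution is known as $U[0, 1]$. Furthermore, $\nu_1$, $\nu_2$, and $\nu_{12}$ can be estimated from the data (using the corresponding sample moments). Then, there are only 3 unknown parameters: $\tilde{\pi}_0$, $\tilde{\mu}_1$, and $\tilde{\mu}_2$. With the above 3 equations, we can obtain an explicit formula
$$\tilde{\pi}_0 = \frac{\nu_{12} - \nu_1 \nu_2}{\nu_{12} - (\nu_1 + \nu_2)/2 + 1/4}.$$
The mathematical proof is given as Lemma 1 in supplementary material available at Biostatistics online. Therefore, an estimator for $\pi_0$ is proposed as
$$\hat{\pi}_0 = \min\left\{1, \max\left\{0, \frac{m^{-1} \sum_{j=1}^{m} p_{j1} p_{j2} - \bar{p}_1 \bar{p}_2}{m^{-1} \sum_{j=1}^{m} p_{j1} p_{j2} - (\bar{p}_1 + \bar{p}_2)/2 + 1/4}\right\}\right\}, \qquad (2.1)$$
where $p_{jk}$ is the calculated p value of the jth variable in data set k and $\bar{p}_k = m^{-1} \sum_{j=1}^{m} p_{jk}$, $j = 1, \ldots, m$, $k = 1, 2$. Boundary constraints are imposed since the proportion $\pi_0$ must be within $[0, 1]$.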
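As a minimal sketch of the explicit moment formula described above, the following Python code computes the ratio of the sample covariance of the paired p values to the sample mean of $(p_{j1} - 1/2)(p_{j2} - 1/2)$ and truncates it to $[0, 1]$. The function names, the zero-denominator fallback, and the Beta(0.2, 1) nonnull distribution used for the check are illustrative assumptions, not part of the paper.

```python
import random


def moment_pi0(p1, p2):
    """Moment-based estimate of the proportion of true nulls from paired p values."""
    m = len(p1)
    if m == 0 or m != len(p2):
        raise ValueError("p1 and p2 must be non-empty and of equal length")
    mean1 = sum(p1) / m
    mean2 = sum(p2) / m
    mean12 = sum(a * b for a, b in zip(p1, p2)) / m
    numerator = mean12 - mean1 * mean2                  # sample covariance of the pairs
    denominator = mean12 - 0.5 * (mean1 + mean2) + 0.25
    if denominator == 0.0:
        # Assumption (not from the paper): return the upper boundary
        # when the ratio is undefined.
        return 1.0
    # Boundary constraints: the proportion must lie in [0, 1].
    return min(1.0, max(0.0, numerator / denominator))


def simulate_pairs(m, pi0, a, rng):
    """Hypothetical generator: null p values are U[0, 1], nonnull ones Beta(a, 1).

    The two p values of a pair are drawn independently given the null status,
    which mimics the conditional independence of the two data sets.
    """
    p1, p2 = [], []
    for _ in range(m):
        if rng.random() < pi0:
            p1.append(rng.random())
            p2.append(rng.random())
        else:
            p1.append(rng.betavariate(a, 1.0))
            p2.append(rng.betavariate(a, 1.0))
    return p1, p2


rng = random.Random(2007)
p1, p2 = simulate_pairs(50000, 0.6, 0.2, rng)
estimate = moment_pi0(p1, p2)
print(round(estimate, 3))
```

Because the simulated nonnull p values share one distribution, the estimate should land close to the true proportion 0.6 used in the simulation.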
To estimate $\pi_0$ for a given data set, which contains m variables and g sample groups, we can first divide the data set into 2 parts and then use the method described above. The following procedure is proposed.
PROCEDURE 1
- 1) For a given variable, randomly divide its observations in each sample group into 2 parts with (approximately) equal sample sizes;
- 2) With a given test statistic T, calculate the p value for each part;
- 3) Repeat steps 1 and 2 for all variables and obtain the set of paired p values;
- 4) Use (2.1) to estimate $\pi_0$;
- 5) Repeat steps 1-4 R times and obtain R estimates of $\pi_0$;
- 6) Return the median of these R estimates.
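The steps of Procedure 1 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the author's R implementation: a two-sided z-test with known unit variance stands in for the 2-sample t-test so that only the standard library is needed, and the names `procedure1`, `split_halves`, and `z_test_pvalue` as well as the simulated data are hypothetical.

```python
import math
import random
import statistics


def moment_pi0(p1, p2):
    """Moment estimator with boundary constraints (see the estimator described above)."""
    m = len(p1)
    mean1, mean2 = sum(p1) / m, sum(p2) / m
    mean12 = sum(a * b for a, b in zip(p1, p2)) / m
    denominator = mean12 - 0.5 * (mean1 + mean2) + 0.25
    if denominator == 0.0:
        return 1.0  # assumption: fallback when the ratio is undefined
    return min(1.0, max(0.0, (mean12 - mean1 * mean2) / denominator))


def z_test_pvalue(x, y):
    """Two-sided z-test p value assuming unit variance (stand-in for the t-test)."""
    z = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(1.0 / len(x) + 1.0 / len(y))
    return math.erfc(abs(z) / math.sqrt(2.0))


def split_halves(values, rng):
    """Step 1: randomly divide one group's observations into 2 parts."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def procedure1(group1, group2, pvalue_fn, n_repeats, rng):
    """group1[j] and group2[j] hold the observations of variable j in each group."""
    estimates = []
    for _ in range(n_repeats):                      # step 5
        p1, p2 = [], []
        for x, y in zip(group1, group2):            # step 3: loop over variables
            xa, xb = split_halves(x, rng)           # step 1, done per variable
            ya, yb = split_halves(y, rng)
            p1.append(pvalue_fn(xa, ya))            # step 2
            p2.append(pvalue_fn(xb, yb))
        estimates.append(moment_pi0(p1, p2))        # step 4
    return statistics.median(estimates)             # step 6


# Hypothetical data: 4000 variables, 10 samples per group, 40% nonnull.
rng = random.Random(1)
m, n, n_nonnull, shift = 4000, 10, 1600, 1.5
group1 = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
group2 = [[rng.gauss(shift if j < n_nonnull else 0.0, 1.0) for _ in range(n)]
          for j in range(m)]
estimate = procedure1(group1, group2, z_test_pvalue, 5, rng)
print(round(estimate, 3))
```

The number of repetitions is kept at 5 here only to keep the example fast; it is a free parameter of the sketch.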
There may be complicated dependence structures among the different variables in the data set. We perform the data division step (step 1) separately for each variable to reduce the impact of these dependence structures (see Figure 1(b) for an illustration). Although the proposed method is moment based and does not require any independence assumptions, it is still necessary to reduce this impact so that the estimation can be more statistically efficient. Because different random divisions of the data set result in different estimates, we repeat steps 1-4 R times to obtain a resample distribution of estimates. (In this study, we use $R = 25$. Based on some simulation studies [data not shown], 25 is an appropriate choice for the balance between estimation accuracy and computational burden.) Then, the median is reported for robustness.
Theoretically, we can apply the Delta method (Casella and Berger, 2002, p. 240) to obtain formulas for the large sample variance and confidence intervals. However, these formulas may be invalid because of complicated dependence structures among the variables in omics data. Therefore, we use the bootstrap method (Efron, 1979) to obtain confidence intervals. For QVALUE, BUM, SPLOSH, and LBE, we can simply repeat sampling p values and estimating $\pi_0$ B times to obtain a resample distribution. For the proposed method, a resample distribution of estimates can be similarly obtained by the following procedure.
PROCEDURE 2
- 1) Run the following 3 steps R times to obtain R sets of paired p values:
- a) For a given variable, randomly divide its observations in each sample group into 2 parts with (approximately) equal sample sizes;
- b) With a given test statistic T, calculate the p value for each part;
- c) Repeat steps a and b for all variables and obtain the set of paired p values.
- 2) Sample m integer numbers $j_1, \ldots, j_m$ with replacement from the set $\{1, \ldots, m\}$, each with probability $1/m$.
- 3) Perform the following 2 steps for each set of paired p values:
- a) Form a new set by selecting the $j_1$th, $\ldots$, $j_m$th paired p values;
- b) Use (2.1) to estimate $\pi_0$.
- 4) Record the median of these R estimates of $\pi_0$.
- 5) Return a resample distribution by repeating steps 2-4 B times.
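A minimal Python sketch of the resampling part of Procedure 2 (steps 2 to 5) is given below, assuming the R sets of paired p values from step 1 are already available. The function name `bootstrap_pi0`, the simulated p values, and the use of percentile limits to summarize the resample distribution are assumptions for illustration; the paper does not state how the interval is read off the resample distribution.

```python
import random
import statistics


def moment_pi0(p1, p2):
    """Moment estimator with boundary constraints (see the estimator described above)."""
    m = len(p1)
    mean1, mean2 = sum(p1) / m, sum(p2) / m
    mean12 = sum(a * b for a, b in zip(p1, p2)) / m
    denominator = mean12 - 0.5 * (mean1 + mean2) + 0.25
    if denominator == 0.0:
        return 1.0  # assumption: fallback when the ratio is undefined
    return min(1.0, max(0.0, (mean12 - mean1 * mean2) / denominator))


def bootstrap_pi0(paired_sets, n_boot, rng):
    """paired_sets: R lists of (p1, p2) pairs, one list per random division."""
    m = len(paired_sets[0])
    resampled = []
    for _ in range(n_boot):                              # step 5
        idx = [rng.randrange(m) for _ in range(m)]       # step 2: same indices for all sets
        estimates = []
        for pairs in paired_sets:                        # step 3
            p1 = [pairs[j][0] for j in idx]
            p2 = [pairs[j][1] for j in idx]
            estimates.append(moment_pi0(p1, p2))
        resampled.append(statistics.median(estimates))   # step 4
    return resampled


# Hypothetical input: R = 3 sets of paired p values for the same m = 2000
# variables, the first 800 of which are nonnull with Beta(0.2, 1) p values.
rng = random.Random(7)
m, n_nonnull, n_sets = 2000, 800, 3
paired_sets = []
for _ in range(n_sets):
    pairs = []
    for j in range(m):
        if j < n_nonnull:
            pairs.append((rng.betavariate(0.2, 1.0), rng.betavariate(0.2, 1.0)))
        else:
            pairs.append((rng.random(), rng.random()))
    paired_sets.append(pairs)

resampled = bootstrap_pi0(paired_sets, 40, rng)
cuts = statistics.quantiles(resampled, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
lower, upper = cuts[0], cuts[-1]
print(round(lower, 3), round(upper, 3))
```

Drawing one index vector per bootstrap replicate and reusing it across all R sets keeps the same resampled variables in every set, which follows the order of steps 2 and 3.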
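The proposed estimation method is derived based on the approximated $\tilde{\pi}_0$. It is necessary to study the approximation error. We can show that
$$\tilde{\pi}_0 - \pi_0 = \frac{\frac{1}{2} \sum_{i \in I} \sum_{j \in I} \pi_i \pi_j (\mu_{i1} - \mu_{j1})(\mu_{i2} - \mu_{j2})}{\sum_{i \in I} \pi_i (1/2 - \mu_{i1})(1/2 - \mu_{i2})}, \qquad (2.2)$$
where $\pi_i$ is the proportion of the nonnull distribution component $i \in I$ (so that $\pi_0 + \sum_{i \in I} \pi_i = 1$) and $\mu_{ik}$ is the expected value of p value following the nonnull distribution component $i$ in data set k. The mathematical proof is given as Lemma 2 in supplementary material available at Biostatistics online.
The approximation will be close if $\mu_{ik} \approx \mu_{jk}$ for all $i, j \in I$ and any $k \in \{1, 2\}$. An ideal case is that all p values from the alternative hypothesis follow only one distribution ($|I| = 1$). In this situation, we have $\mu_{ik} = \mu_{jk}$ for all $i, j \in I$ and any $k \in \{1, 2\}$, and therefore $\tilde{\pi}_0 = \pi_0$.
The approximation will also be close if $\mu_{ik} \approx 0$ for all $i \in I$ and any $k \in \{1, 2\}$. An ideal case is that the number of samples in each group goes to infinity, in which we have $\mu_{ik} \to 0$ for all $i \in I$ and any $k \in \{1, 2\}$, and therefore $\tilde{\pi}_0 \to \pi_0$.
To better understand the approximation error when the p values from the alternative hypothesis are heterogeneously distributed, we have the following discussion. If the number of samples in each group in the first data set is the same as the corresponding one in the second data set, then we have $\mu_{i1} = \mu_{i2} = \mu_i$ for all $i \in I$ and
$$\tilde{\pi}_0 - \pi_0 = \frac{\frac{1}{2} \sum_{i \in I} \sum_{j \in I} \pi_i \pi_j (\mu_i - \mu_j)^2}{\sum_{i \in I} \pi_i (1/2 - \mu_i)^2} \geq 0.$$
Since moment estimators are generally consistent, $\pi_0$ will be asymptotically overestimated. An upper bound can be further derived:
$$\tilde{\pi}_0 - \pi_0 \leq (1 - \pi_0) \, \frac{(\max_{i \in I} \mu_i - \min_{i \in I} \mu_i)^2}{4 \min_{i \in I} (1/2 - \mu_i)^2}.$$
Based on this upper bound, the following conclusions can be drawn:
- The approximation error depends on the "factor" $1 - \pi_0$ (the smaller the better). It will be small if $\pi_0 \approx 1$. The estimation bias will be larger if $\pi_0$ is closer to 0 (or if the proportion of differentially expressed genes is larger).
- The approximation error depends on the "numerator" (the smaller the better). It will be small if $\max_{i \in I} \mu_i - \min_{i \in I} \mu_i \approx 0$ or, equivalently, $\mu_i \approx \mu_j$ for all $i, j \in I$. This case has been discussed above.
- The approximation error depends on the "denominator" (the larger the better). For p values from the alternative hypothesis, we have $\mu_i < 1/2$. Since $\mu_i \geq 0$, $(1/2 - \mu_i)^2 \leq 1/4$. Therefore, the approximation error will be small if $\mu_i \approx 0$ for all $i \in I$. This case has also been discussed above.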
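The approximation error discussed above can be checked numerically at the population level. The sketch below assumes a mixture of a $U[0, 1]$ null component with several nonnull components whose paired p values are conditionally independent, computes the two-component approximation from the mixture moments, and compares the resulting error with a closed-form expression (pairwise mean differences over the weighted product of distances from 1/2). The function names and the example weights and means are hypothetical.

```python
def approx_pi0(pi0, weights, mu1, mu2):
    """Two-component approximation of pi0 computed from exact mixture moments.

    weights[i] is the proportion of nonnull component i; mu1[i] and mu2[i]
    are its mean p values in data sets 1 and 2. The null mean is 1/2.
    """
    nu1 = 0.5 * pi0 + sum(w * a for w, a in zip(weights, mu1))
    nu2 = 0.5 * pi0 + sum(w * b for w, b in zip(weights, mu2))
    # Conditional independence: E[P1 * P2] factorizes within each component.
    nu12 = 0.25 * pi0 + sum(w * a * b for w, a, b in zip(weights, mu1, mu2))
    return (nu12 - nu1 * nu2) / (nu12 - 0.5 * (nu1 + nu2) + 0.25)


def approx_error(weights, mu1, mu2):
    """Closed-form approximation error in terms of the component means."""
    n = len(weights)
    numerator = 0.5 * sum(
        weights[i] * weights[j] * (mu1[i] - mu1[j]) * (mu2[i] - mu2[j])
        for i in range(n) for j in range(n)
    )
    denominator = sum(w * (0.5 - a) * (0.5 - b) for w, a, b in zip(weights, mu1, mu2))
    return numerator / denominator


# Hypothetical heterogeneous alternative: two nonnull components.
pi0 = 0.6
weights = [0.1, 0.3]   # proportions of the nonnull components (sum to 1 - pi0)
means = [0.05, 0.30]   # same means in both data sets (equal sample sizes)

error_from_moments = approx_pi0(pi0, weights, means, means) - pi0
error_from_formula = approx_error(weights, means, means)
print(error_from_moments, error_from_formula)

# Homogeneous alternative: a single nonnull component gives no error.
print(approx_pi0(pi0, [0.4], [0.1], [0.1]))
```

In this example the approximation overestimates the true proportion, while the single-component case recovers it, in line with the discussion of homogeneous and heterogeneous alternatives.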
| 3. SIMULATIONS AND APPLICATIONS |
|---|
A typical application of the proposed method is to estimate the proportion of differentially expressed genes in a given microarray gene expression data set. This proportion is actually $1 - \pi_0$. Therefore, it is equivalent to estimate $\pi_0$, which is the proportion of nondifferentially expressed genes. Many statistical methods have been proposed to estimate $\pi_0$, such as QVALUE (Storey and Tibshirani, 2003), BUM (Pounds and Morris, 2003), SPLOSH (Pounds and Cheng, 2004), and LBE (Dalmasso and others, 2005). In this section, we compare the proposed method with these existing statistical methods through simulations and applications. The simulations are conducted based on a microarray gene expression data set for a breast cancer study. We use the 2-sample Student's t-test for hypothesis testing. For the experimental data set, we observe from quantile-quantile plots that the p values given by the t-distribution and the permutation procedure are consistent (data not shown). Therefore, we choose to use the t-distribution to assess p values because it gives unique results.
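Statistical efficiencies can be compared in simulation studies since we know the truth. With a given $\pi_0$, we repeat simulation and estimation procedures $B = 100$ times. Note that the proposed method requires much more computation time than these existing methods because of its repetition of random data division ($R = 25$). Although $B = 100$ is a relatively small number, it is adequate to compare the performances of different methods. The root mean square error (RMSE), Bias, and standard deviation (SD) are used to compare different methods (estimators) including the proposed one. For an estimator $\hat{\pi}_0$, let $\hat{\pi}_0^{(i)}$ be the calculated estimate in the ith simulation and let $\bar{\pi}_0 = B^{-1} \sum_{i=1}^{B} \hat{\pi}_0^{(i)}$. The Bias, SD, and RMSE are defined as:
$$\mathrm{Bias} = \bar{\pi}_0 - \pi_0, \qquad \mathrm{SD} = \sqrt{\frac{1}{B - 1} \sum_{i=1}^{B} \left(\hat{\pi}_0^{(i)} - \bar{\pi}_0\right)^2},$$
and
$$\mathrm{RMSE} = \sqrt{\frac{1}{B} \sum_{i=1}^{B} \left(\hat{\pi}_0^{(i)} - \pi_0\right)^2}.$$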
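The three comparison criteria can be computed as in the short Python sketch below. It assumes the usual definitions: Bias as the mean estimate minus the true value, SD as the sample standard deviation with divisor B - 1, and RMSE as the root of the mean squared deviation from the true value. The divisor used for SD and the function name `summarize` are assumptions, and the three estimates in the example are made-up numbers for illustration.

```python
import math
import statistics


def summarize(estimates, true_pi0):
    """Return (bias, sd, rmse) of a list of estimates of a known true value."""
    bias = statistics.fmean(estimates) - true_pi0
    sd = statistics.stdev(estimates)  # sample SD, divisor B - 1 (assumption)
    rmse = math.sqrt(statistics.fmean([(e - true_pi0) ** 2 for e in estimates]))
    return bias, sd, rmse


# Made-up estimates from three hypothetical simulation runs with true value 0.6.
bias, sd, rmse = summarize([0.5, 0.6, 0.7], 0.6)
print(bias, sd, rmse)
```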
In general, there are complicated dependence structures in a microarray gene expression data set. Therefore, we conduct the following simulation studies with covariance matrices constructed based on a microarray gene expression data set (the first data set in Section 3.3). A gene expression data set is simulated with $m = 3000$ genes and 2 sample groups, with a smaller group sample size in simulation studies 1 and 2 and a group sample size of 50 in simulation study 3. Data are simulated from normal distributions with an assumed proportion $1 - \pi_0$ of differentially expressed genes. Genes are grouped into 30 blocks with 100 genes in each block. For each block, we randomly select 100 genes from the experimental data set and calculate the correlation matrices $\Sigma_1$ and $\Sigma_2$ in the first and the second groups, respectively. For blocks of differentially expressed genes, we simulate data from multivariate normal distributions with these correlation matrices and with mean vectors that differ between the first and the second sample groups. For the remaining blocks, we simulate data from multivariate normal distributions with these correlation matrices and with the same mean vector for both sample groups. Here, the mean vectors are either 0 or (random) vectors. For each configuration, we repeat simulation and estimation procedures $B = 100$ times. Different statistical methods are used to estimate $\pi_0$. We run QVALUE, BUM, SPLOSH, and LBE with their default settings. For the proposed method, we divide each sample group into 2 parts with equal sample sizes, which gives 25 samples per part in simulation study 3. The results are summarized in Figure 2, in which RMSE, Bias, and SD are compared. We also compare boxplots of the estimation results from the different methods at a selected value of $\pi_0$.
The first simulation study considers the situation in which there is only one p-value distribution component for differentially expressed genes. We fix the mean difference of the differentially expressed genes and vary $\pi_0$. Generally, the sample size of a microarray data set is relatively small, and the group sample size is set accordingly. As shown in Figure 2, for $\pi_0$ around 0.2, only BUM gives smaller RMSEs than the proposed method. For other values of $\pi_0$, the proposed method gives the lowest RMSEs. Note that the behavior of BUM is not stable: it gives the highest RMSEs at some values of $\pi_0$. For different values of $\pi_0$, the proposed method consistently gives relatively low biases and the second lowest SDs.
The second simulation study considers a general situation in which the p values of differentially expressed genes may follow different distribution components. The mean differences of the differentially expressed genes are randomly sampled from a uniform distribution. As shown in Figure 2, for larger values of $\pi_0$, the proposed method gives the lowest RMSEs. For $\pi_0$ around 0.2, only BUM gives lower RMSEs than the proposed method. Note again that the behavior of BUM is not stable: it gives the highest RMSEs at some values of $\pi_0$. For $\pi_0$ around 0.1, QVALUE gives the lowest RMSEs, and the proposed method gives slightly higher RMSEs. For different values of $\pi_0$, the proposed method consistently gives relatively low biases and the second lowest SDs.
The third simulation study considers the situation in which the sample size of a microarray data set is relatively large. Therefore, we set the group sample size to 50. We still consider a general situation in which the p values of differentially expressed genes may follow different distribution components, with the mean differences randomly sampled from a uniform distribution. As shown in Figure 2, the proposed method always gives the lowest RMSEs and biases and the second lowest SDs for different values of $\pi_0$.
Simulations for other configurations are also considered. Generally, the proposed method gives comparably favorable performances. However, if the sample size is very small, the proposed method performs poorly. This is not surprising. If the sample size of a given data set is very small, then the sample size of a divided subset will be even smaller, which significantly reduces the power to detect differential expression. This fact has also been discussed by Pawitan and others (2005a,b). Therefore, while enjoying the model identifiability through data division, we lose certain statistical efficiency in estimation.
The above theoretical and simulation studies show the favorable performances of the proposed method especially when (i) the sample size is relatively large, (ii) the p values from the alternative hypothesis are homogeneously distributed, or (iii) the proportion of differentially expressed genes is relatively small. In practice, it is difficult to find a microarray data set for the second or the third situation. However, there are many microarray data sets with relatively large sample sizes.
We consider 2 data sets for applications. The first one is the famous microarray gene expression data set for a breast cancer study. Hedenfalk and others (2001) used microarrays to compare 3226 gene expression profiles between 7 BRCA1 samples and 8 BRCA2 samples. The data set is publicly available at http://research.nhgri.nih.gov/microarray/NEJM_Supplement. A total of 56 genes were filtered out, because they had one or more expression measurements exceeding 20, which were considered not trustworthy (Storey and Tibshirani, 2003). Therefore, 3170 gene expression measurements for 15 samples are used in this study.
The second data set has a relatively large sample size. Wiestner and others (2003) used lymphochips to compare 12 447 gene expression profiles between 79 Ig-mutated and 28 Ig-unmutated samples with chronic lymphocytic leukemia. The data set is publicly available at http://llmpp.nih.gov/cll/. We use the k-nearest neighbors method (R package impute; Troyanskaya and others, 2001) to impute the missing values in the data set.
We use different statistical methods to estimate $\pi_0$. QVALUE, BUM, SPLOSH, and LBE are run with their default settings. For the proposed method, we divide each data set into 2 subsets with (approximately) equal sample sizes in each sample group. We bootstrap B times to obtain the resample distributions of estimates (see Section 2 for details). Since the p values from the null hypothesis follow a uniform distribution $U[0, 1]$, $\pi_0$ is expected to be under the curve of the underlying empirical p-value distribution. ($f = \pi_0 f_0 + (1 - \pi_0) f_1 \geq \pi_0 f_0 = \pi_0$, where $f_0$, $f_1$, and f are the null, nonnull, and marginal distributions of p value, respectively.)
For the first data set, Figure 1(c) shows a histogram of p values and boxplots to compare estimates from different methods. Only the proposed method and BUM give estimates under the histogram. The proposed method gives the smallest estimated $\pi_0$. Among these 5 methods, BUM gives a relatively small variance and the other 4 give comparatively high variances. However, from the simulation studies (e.g. boxplots in Figure 2), some confidence intervals given by BUM do not contain the true value and are not meaningful. Therefore, the proposed method may give more reliable estimation results.
For the second data set, Figure 1(d) shows a histogram of p values and boxplots to compare estimates from different methods. Not only does the proposed method give the smallest estimates, but its whole boxplot also lies under the histogram. Furthermore, its variance is relatively small among these 5 methods.
In the above simulation studies and applications, the variances of BUM are always the lowest among these 5 estimation methods. This comes from the simple model of BUM: the mixture of a beta distribution and a uniform distribution. However, it is difficult to validate this model in practice.
| 4. DISCUSSION |
|---|
In the problem of estimating the proportion of true null hypotheses, the number of variables is the sample size of the study. Microarrays and other high-throughput technologies enable us to collect measurements for a large number of variables. With these data, moment-based estimation methods can be considered, because they are generally asymptotically efficient. In this study, we proposed a moment-based estimation method coupled with sample splitting and discussed its theoretical properties. The simulation studies and the applications to microarray data showed the favorable performances of the proposed method when it was compared with the other existing methods. Since the t-test requires at least 2 samples in each group, the proposed method cannot be applied when a group sample size is less than 4. In such a situation, other statistical methods, such as QVALUE, should be considered. From the above analyses, we observe that each method achieves its optimal performance only in certain situations. New methods for estimating $\pi_0$ are being proposed (Langaas and others, 2005). It is necessary to conduct more comprehensive reviews and systematic comparisons of different $\pi_0$-estimation methods.
We recently proposed a likelihood-based method coupled with an EM algorithm for estimating $\pi_0$ (Lai, 2006). Random data division was also used to achieve the model identifiability. Through simulations and applications to microarray gene expression data, we showed the favorable performances of this method (Lai, 2006). However, there are 2 disadvantages: (i) The method is likelihood based and assumes independence among different genes, which is unlikely to be true because genes interact with each other during cellular processes. (ii) The method uses an EM algorithm, which may provide unreliable estimation when the likelihood function is not regular. The moment-based method proposed in this study requires no independence assumption. In addition to its favorable performances, it is completely nonparametric and simple, with an explicit formula that gives a unique solution.
A future research topic is to generalize the proposed method so that estimation efficiencies can be further improved. As shown in the simulation studies, the estimation variance tends to increase when the true proportion increases (Figure 2). In the second simulation study for the heterogeneous alternative, there is a considerable estimation bias when the true proportion is relatively small (Figure 2). It is necessary to pursue both theoretical and simulation studies so that more efficient estimation methods can be developed.
| ACKNOWLEDGMENTS |
|---|
I am grateful to Prof. Tapan Nayak, the editors, associate editors, and the anonymous reviewers for their helpful comments and suggestions. This work was partially supported by a start-up fund from the George Washington University and the National Institutes of Health grant DK-75004. The R codes are available at http://home.gwu.edu/~ylai/research/RDPM. Conflict of Interest: None declared.
| REFERENCES |
|---|
Allison DB, Gadbury GL, Heo M, Fernandez JR, Lee C-K, Prolla TA, Weindruch R. (2002) A mixture model approach for the analysis of microarray gene expression data. Computational Statistics and Data Analysis 39:1-20.
Benjamini Y and Hochberg Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B 57:289-300.
Casella G and Berger RL. (2002) Statistical Inference 2nd edition (Duxbury, Pacific Grove, CA).
Dalmasso C, Broët P, Moreau T. (2005) A simple procedure for estimating the false discovery rate. Bioinformatics 21:660-668.
Efron B. (1979) Bootstrap methods: another look at the jackknife. Annals of Statistics 7:1-26.
Hedenfalk I, Duggan D, Chen Y, Radmacher M, Bittner M, Simon R, Meltzer P, Gusterson B, Esteller M, Kallioniemi OP. and others. (2001) Gene-expression profiles in hereditary breast cancer. The New England Journal of Medicine 344:539-548.
Jung S-H. (2005) Sample size for FDR-control in microarray data analysis. Bioinformatics 21:3097-3104.
Lai Y. (2006) A statistical method for estimating the proportion of differentially expressed genes. Computational Biology and Chemistry 30:193-202.
Langaas M, Lindqvist BH, Ferkingstad E. (2005) Estimating the proportion of true null hypotheses, with application to DNA microarray data. Journal of the Royal Statistical Society, Series B 67:555-572.
Pawitan Y, Michiels S, Koscielny S, Gusnanto A, Ploner A. (2005a) False discovery rate, sensitivity and sample size for microarray studies. Bioinformatics 21:3017-3024.
Pawitan Y, Murthy KRK, Michiels S, Ploner A. (2005b) Bias in the estimation of false discovery rate in microarray studies. Bioinformatics 20:3865-3872.
Pounds S and Cheng C. (2004) Improving false discovery rate estimation. Bioinformatics 20:1737-1745.
Pounds S and Morris SW. (2003) Estimating the occurrence of false positives and false negatives in microarray studies by approximating and partitioning the empirical distribution of p-values. Bioinformatics 19:1236-1242.
Storey JD and Tibshirani R. (2003) Statistical significance for genomewide studies. Proceedings of the National Academy of Sciences of the United States of America 100:9440-9445.
Tadesse MG, Ibrahim JG, Vannucci M, Gentleman R. (2005) Wavelet thresholding with Bayesian false discovery rate control. Biometrics 61:25-35.
Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, Botstein D, Altman RB. (2001) Missing value estimation methods for DNA microarrays. Bioinformatics 17:520-525.
Wang S-J and Chen JJ. (2004) Sample size for identifying differentially expressed genes in microarray experiments. Journal of Computational Biology 11:714-726.
Wiestner A, Rosenwald A, Barry TS, Wright G, Davis RE, Henrickson SE, Zhao H, Ibbotson RE, Orchard JA, Davis Z. and others. (2003) ZAP-70 expression identifies a chronic lymphocytic leukemia subtype with unmutated immunoglobulin genes, inferior clinical outcome, and distinct gene expression profile. Blood 101:4944-4951.
Wu B, Abbott T, Fishman D, McMurray W, Mor G, Stone K, Ward D, Williams K, Zhao H. (2003) Comparison of statistical methods for classification of ovarian cancer using mass spectrometry data. Bioinformatics 19:1636-1643.
Received August 4, 2006; revised January 5, 2007; accepted for publication January 17, 2007.