Skip Navigation


Biostatistics Advance Access originally published online on August 11, 2006
Biostatistics 2007 8(2):402-413; doi:10.1093/biostatistics/kxl018
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow Supplementary Material
Right arrow All Versions of this Article:
8/2/402    most recent
kxl018v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Right arrow Disclaimer
Google Scholar
Right arrow Articles by Ghosh, D.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Ghosh, D.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2006. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org.

Incorporating monotonicity into the evaluation of a biomarker

Debashis Ghosh*

Department of Biostatistics, University of Michigan, 1420 Washington Heights Ann Arbor, Michigan 48105, USA ghoshd{at}umich.edu

* To whom correspondence should be addressed.


    SUMMARY
 TOP
 SUMMARY
 1. INTRODUCTION
 2. DATA BACKGROUND
 3. BIOMARKER EVALUATION:...
 4. SEMIPARAMETRIC REGRESSION...
 5. AMACR DATA
 6. DISCUSSION
 REFERENCES
 
In the assessment of clinical utility of biomarkers, case–control studies are often undertaken based on existing serum samples. A common assumption made in these studies is that higher levels of the biomarker are associated with increased disease risk. In this article, we consider methods of analysis in which monotonicity is incorporated in associating the biomarker and the clinical outcome. We consider the roles of discrimination versus association and assess methods for both goals. In addition, we propose a semiparametric isotonic regression model for binary data and describe a simple estimation procedure as well as attendant inferential procedures. We apply the various methodologies to data from a prostate cancer study involving a serum biomarker.

Keywords: Generalized linear model; Mixed model; Monotone regression; Pooled adjacent violators algorithm; Smoothing spline


    1. INTRODUCTION
 TOP
 SUMMARY
 1. INTRODUCTION
 2. DATA BACKGROUND
 3. BIOMARKER EVALUATION:...
 4. SEMIPARAMETRIC REGRESSION...
 5. AMACR DATA
 6. DISCUSSION
 REFERENCES
 
There has been extensive work done on the development of methodology for diagnostic testing and screening (Pepe, 2003Go). The primary scientific goal in this area is to determine the discriminatory power of a biomarker for detecting disease.

Given the recent proliferation of candidate biomarkers from various genetic technologies (e.g. microarrays, single nucleotide polymorphisms and methylation assays), there is great interest in the development and validation of biomarkers that may be used for early detection of cancer (Pepe and others, 2001). The case–control study is a popular design in which one might wish to study the operating characteristics of a biomarker. Here, we will consider data from a case–control study of Formula-methylacyl coA racemase (AMACR). AMACR may serve to improve operating characteristics in prostate cancer detection and prognosis. We give further details of the study and the data in Section 2.

For much of the work on statistical methodology on screening and diagnostic tests utilizing biomarkers, it has been assumed that increasing levels of the biomarker are associated with increased disease risk. Typically, this effect has been specified in a parametric manner. We give two examples of this. The first is linear discriminant analysis, in which the distribution of the biomarker in the diseased and undiseased populations is assumed to be Gaussian. The second is logistic regression, in which the effect of the biomarker on the probability of developing disease is assumed to be linear on the logit scale. For these procedures, if the parametric form is misspecified, the estimate of association between biomarker levels and disease risk may be biased.

In this article, we explore various ways of accounting for the monotonic relationship between biomarker and disease risk while adjusting for other covariates, without making strong parametric assumptions. If the goal is to assess the discriminatory power of the biomarker, then regression models for the receiver operating characteristic (ROC) curve are an appropriate tool. We outline their use in Section 3. In Section 4 of this article, we consider the use of semiparametric binary regression models with a nonparametric monotonic component. While nonparametric isotonic procedures have been studied in the literature (Robertson and others, 1988), their semiparametric counterparts have not been explored as much. One proposal for this setting was given by Morton-Jones and others (2000); we describe its application to the prostate cancer data in Section 4.2. In addition, we propose a new estimation algorithm, which is quite simple and can be implemented using standard statistical software. The algorithm, along with the attendant inference procedures, is given in Section 4. We apply the proposed methods to the AMACR case–control data in Section 5. We conclude with some brief remarks in Section 6.


    2. DATA BACKGROUND
 TOP
 SUMMARY
 1. INTRODUCTION
 2. DATA BACKGROUND
 3. BIOMARKER EVALUATION:...
 4. SEMIPARAMETRIC REGRESSION...
 5. AMACR DATA
 6. DISCUSSION
 REFERENCES
 
AMACR is an enzyme at the merging point of two important pathways of lipid metabolism: elimination of methyl-branched fatty acids and synthesis of bile acids. AMACR is regarded as obligatory for these processes. It was identified in a series of microarray experiments as being upregulated in prostate cancer tissue relative to noncancerous tissue (Rubin and others, 2002). In order to study the potential utility of AMACR as a serum biomarker for prostate cancer, a case–control study was performed to study the association between AMACR and risk of prostate cancer (Sreekumar and others, 2004). While the microarray experiments measured AMACR at a messenger RNA level, for the purposes of diagnosing disease it is important to assay the biomarker at a protein level. This is what motivated the study of Sreekumar and others (2004).

We now describe the design of the case–control study. The cases were men who had biopsy-proven clinically localized prostate cancer diagnosed at the University of Michigan between January 1995 and January 2003; they are incident cases of prostate cancer. Serum samples were taken from these men before radical prostatectomy (surgical removal of the prostate). The controls were a random sample of subjects who had been at the University of Michigan Hospital between May 2001 and May 2003 who had no known history of any cancer.

Because elevated prostate-specific antigen (PSA) levels and/or an increase in PSA over time contribute to a positive diagnosis, the distribution of PSA in these cases represents a biased sample. The ideal study for determining whether or not AMACR has better operating characteristics as a screening biomarker than PSA would be one in which both measurements were available before the start of the prostate cancer diagnosis process. Nevertheless, the current study can still provide information on the discriminating ability of AMACR.

An IgG2-specific secondary antibody was used to detect the presence of AMACR immune response in 326 subjects (212 controls and 114 cases). In this analysis, we take S to be the AMACR measurement, and Z to be the age at biopsy. We transform the biomarker to the logarithmic scale in order to reduce skewness because of outlying observations. It is important to include age at biopsy as it may be a confounding factor in the assessment of the association between AMACR and risk of prostate cancer. Finally, D denotes disease status, 0 for control and 1 for cancer (case).


    3. BIOMARKER EVALUATION: DISCRIMINATION VERSUS ASSOCIATION
 TOP
 SUMMARY
 1. INTRODUCTION
 2. DATA BACKGROUND
 3. BIOMARKER EVALUATION:...
 4. SEMIPARAMETRIC REGRESSION...
 5. AMACR DATA
 6. DISCUSSION
 REFERENCES
 
In this section, we address the issue of the appropriateness of studying discrimination versus association and how monotonicity affects each. In order purely to assess the discriminatory ability of a biomarker, the ROC curve (Baker, 2003Go) is a useful summary measure. The ROC curve is a graphical presentation of Formula, where Formula and Formula are the false- and true-positive rates for S, respectively, based on a cutoff point c. The ROC curve for AMACR is given in Figure 1. Comparing the ROC curve to the 45Formula line suggests that there is some, but not very much, discriminative ability of AMACR. The area under the ROC curve is 0.609.


Figure 1
View larger version (5K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1. ROC curve for AMACR in prostate cancer study.

 
As discussed by Pepe (2000)Go, an alternative definition of the ROC curve is given by the formula

Formula (3.1)

for Formula, where Formula and Formula are S for Formula and Formula and t is a false-positive rate. Equation (3.1) represents the conditional probability that the biomarker for a randomly chosen case is higher than that of a randomly chosen control, conditioning on the fact that the biomarker measurement was standardized to the values in the population without disease. Inspection of (3.1) reveals that it will be invariant to monotone transformations of S. Thus, the ROC curve is a probabilistic quantity that directly incorporates monotonicity. It is commonly assumed that diseased populations will have larger values of the biomarker.

There are several ways to assess the effects of covariates on the ROC curve. A very simple approach is given by Alonzo and Pepe (2002). The model being fitted is the following:

Formula

where Formula, g is a monotone increasing function on the unit interval, and Formula is a regression coefficient summarizing the effect of AGE on the ROC curve. Following the procedure of Alonzo and Pepe (2002), we take g to be the inverse probit link, Formula and Formula, where Formula is the standard normal cumulative distribution function. The estimation algorithm can be summarized as follows:

  1. Select a set of false-positive rates Formula Formula. Here, we take Formula and Formula.
  2. For FormulaFormula , estimate Formula, the t age-specific quantile of the survivor distribution of the nondiseased test results.
  3. Calculate Formula for Formula and FormulaFormula . Note that Formula is the number of cases (i.e. number of individuals with Formula).
  4. Fit the model

Formula

using standard software for generalized linear models for binary data.

Note that we could use other links (e.g. logit) and other functions Formula (e.g. splines) in step 4; those extensions are not pursued here. The algorithm provides an estimate of Formula. In order to get the standard error of the estimate (SEE) of Formula, we bootstrap this algorithm 500 times, as recommended by Alonzo and Pepe (2002). Applying this method to the AMACR data, the estimate of Formula is Formula with a 95% confidence interval (CI) of Formula. Thus, age has no effect on the discriminatory ability of AMACR to separate cancer cases from controls.

An alternative approach to considering discrimination of a biomarker is to consider its association with case/control status, potentially adjusting for important prognostic factors and confounders. While Pepe and others (2004) show that strong association as measured through an odds ratio does not imply good discrimination, we argue that such association measures for biomarkers should still be considered. Potentially, AMACR could be used in combination with PSA to increase the specificity of screening for prostate cancer. Then information on association of AMACR with disease status, possibly adjusting for other variables, may still be of use. As discussed by Sidransky (2002)Go, it will be the case that cancer diagnostic and screening panels in the future will contain multiple biomarkers. Thus, it is still useful to consider associations of AMACR with cancer status. We next examine strategies for modeling associations where the monotonicity of the biomarker effect is incorporated.


    4. SEMIPARAMETRIC REGRESSION METHODOLOGIES
 TOP
 SUMMARY
 1. INTRODUCTION
 2. DATA BACKGROUND
 3. BIOMARKER EVALUATION:...
 4. SEMIPARAMETRIC REGRESSION...
 5. AMACR DATA
 6. DISCUSSION
 REFERENCES
 

4.1 Semiparametric model and interpretation

We formulate the following class of regression models:

Formula (4.1)

where Formula is the regression coefficient for age to be estimated, g is a link function, and f is an unspecified monotone function. The function f quantifies the effect of the biomarker on the probability of disease, adjusting for Formula. We assume f to be monotone in S. For ease of exposition, we will use the logit link, although the proposed methods can work with other links as well. While (4.1) refers to the AMACR example being considered, the model extends generally to handle multidimensional Z. The data consist of n independent and identically distributed observations Formula, Formula.

We are interested in fitting the model to case–control data. The parametric components of the Model (4.1) retain the same interpretations as if the data had been collected from a cohort study. For example, if the link function is the logit, then Formula describes the odds ratios associated with Formula, adjusting for the biomarker. A complication arises in interpreting the function f in (4.1). In particular, while the shape and curvature of f will be the same with case–control data as with cohort data, one cannot make any inference about the intercept of f with case–control data.

4.2 Algorithm of Morton-Jones and others (2000)

One approach to fitting (4.1) can be found in the work of Morton-Jones and others (2000). They describe an algorithm which is an extension of a cyclical pooled adjacent violators algorithm (PAVA) (Bachetti, 1989) that allows for adjustment for covariates. The algorithm proceeds by iterating between the isotonic regression for the nonparametric component and iteratively weighted least squares for the parametric component. When we tried to implement the algorithm, the nonparametric component failed to converge after 1000 iterations. In the nonparametric case, Bachetti (1989) suggests that convergence of this type of algorithm may be quite slow in practice.

In addition to the practical problems, there are issues as to how to perform inference. While Morton-Jones and others (2000) propose a simulation-based method to determine pointwise (CIs) for the nonparametric component, they give no inferential procedure for the parametric component. Thus, it would be desirable to have a computationally simple procedure that allows for relatively easy inference on both aspects of the model.

4.3 Proposed numerical fitting procedure

Assume that there is no constraint for f in (4.1). Note that we can use either a retrospective or a prospective likelihood for estimation because the two are equivalent up to the intercept with case–control data (Prentice and Pyke, 1979Go). For the sake of computational simplicity, we use the prospective likelihood. A natural method of estimation is smoothing splines (Green and Silverman, 1994Go; Eubank, 1999Go). This can be formulated as maximizing the following penalized log-likelihood:

Formula (4.2)

where Formula is a smoothing parameter, Formula, and Formula is the second derivative of f. We take a and b to define the range of Formula. By utilizing the smoothing spline representation of f, maximizing (4.2) is equivalent to maximizing

Formula (4.3)

where F is the Formula vector of f evaluated at the r unique values of Formula and Formula is the corresponding nonnegative definite smoothing matrix (Green and Silverman 1994Go, Section 2.3). Differentiating (4.3) with respect to Formula and f yields estimating equations that we set equal to zero in order to solve for the estimators Formula and Formula.

Following the arguments in Lin and Zhang (1999)Go, we can re-express the estimation procedure above using the mixed model framework. Similar results can be found in Verbyla and others (1999) and Ruppert and others (2003). In particular, we can estimate Formula and f using the following generalized linear mixed effects model:

Formula (4.4)

where a is a normal random variable with mean zero and variance Formula, Formula, Formula, where Formula is a Formula full rank matrix that satisfies the following two conditions:

  1. Formula;
  2. Formula
We can then re-express (4.3) as

Formula (4.5)

where Formula. The log-likelihood (4.5) corresponds precisely to that for a binary generalized linear mixed effects model with fixed effects given by Formula and random effects represented by Formula. This implies that standard generalized linear mixed effects software, such as PROC GLIMMIX in SAS (Wolfinger, 1996Go), can be used to fit the semiparametric binary regression model to the data. The estimates of Formula and Formula are calculated by Newton–Raphson procedures. Based on the resulting solutions, Formula and Formula, we can estimate the parametric and nonparametric components of (4.1) where no constraints are placed on f. The estimate of f is given by

Formula

The details of the required computations can be found in Lin and Zhang (1999)Go. Note that the smoothing parameter Formula can be estimated automatically (Lin and Zhang, 1999Go). This is because it is represented as a variance component in the mixed effects model formulation and is optimized using an approximate restricted quasi-likelihood, which is an approximation to the likelihood for the binomial random effects model (4.4), integrated over the random effects.

We next show how to modify the procedure in the presence of the monotonicity constraint for (2.1). It is obvious that if the resulting estimate of f is monotonic, then no further modification is required. Otherwise, we are proposing an extension of the method proposed for nonparametric models by Mammen and others (2001) to the semiparametric model (4.1) proposed here. We first rephrase the optimization problem (4.2) and incorporate the monotonicity constraint

Formula (4.6)

where C is the set of monotonic functions. If we appeal to the iterated weighted least squares solution of f and Formula in (4.1) in the absence of the constraint, then by following the arguments in Mammen and others (2001), the solution of (4.6) can be obtained using the following two-stage procedure:

  1. At the first stage, estimate f and Formula jointly ignoring the constraint using the likelihood-based algorithm in Lin and Zhang (1999)Go.
  2. Based on the estimated f from (1), project it onto the space of monotonic functions, using the PAVA algorithm as described by Robertson and others (1988). The algorithm finds Formula that minimizes

Formula

such that Formula, where Formula are the ordered distinct values of the biomarker.

The solution of the PAVA algorithm is equivalently given by the maximin formula

Formula

This estimation procedure can be thought of a semiparametric analogue of the "smooth, then isotonize" estimators considered by Kelly and Rice (1990)Go and Mammen (1991)Go and is very simple to implement.

In terms of inference about the parametric and nonparametric components of (4.1), the standard errors (SEs) for Formula are estimated using an approximated doubly penalized quasi-likelihood (Lin and Zhang, 1999Go), which is performed in step 1 of the algorithm. Because of the constraint on f, getting a handle on the asymptotic behavior is difficult so that the standard errors from the Lin and Zhang (1999)Go method may not be appropriate. This issue is taken up further in the Section 4.4.

4.4 Inference procedures

In fitting semiparametric models such as (4.1), Murphy and Van der Vaart (1997) note that inference based on Wald-type statistics tends to be unstable, and have developed an elegant theory of semiparametric likelihood ratio inference. However, their focus is on the finite-dimensional component in the model and not on the infinite-dimensional component. In a later paper by Murphy and Van der Vaart (2000), inference for the parametric component of the model is made by profiling out the infinite-dimensional component, which is regarded as a nuisance parameter, and the profile likelihood function is shown to have some of the crucial properties of usual parametric likelihoods. In the cancer setting that motivated this work, we have joint interest in both the parametric and nonparametric components of the model.

By appealing to the profile likelihood arguments of Murphy and Van der Vaart (2000), we have that the likelihood ratio-based statistic for the parametric component will have a Formula limiting distribution. Inference for the nonparametric component in (4.1) is more complicated. Assume that the data came from a cohort study; then f is identifiable. In the case of the monotonicity constraint, an argument combining Taylor series expansions with those in Banerjee and Wellner (2001)Go would yield a nonstandard limiting distribution for the likelihood ratio statistic for testing Formula. Our interest is in testing if there is a biomarker effect, which effectively places an infinite number of constraints on f in (4.1) under the null hypothesis. Maximizing the likelihood with an infinite number of constraints under the null hypothesis renders the theory of Banerjee and Wellner (2001)Go inapplicable. In addition, we have the complication that the data come from a case–control study so that f is not completely identified. We take an alternative approach and utilize subsampling techniques (Politis and others, 1999). The algorithm proceeds as follows:

  1. Sample without replacement Formula from the controls and Formula from the cases, where Formula and Formula are the numbers of the subsampled controls and cases in the dataset.
  2. Perform the two-stage estimation procedure in Section 4.3.
  3. Repeat steps 1 and 2 several times.
The procedure yields an empirical distribution for f from which we can construct test statistics for testing for a biomarker effect. Note that we subsample from cases and controls without replacement. It turns out that the bootstrap is not applicable here because the nonparametric component has a limiting distribution that converges at a slower than Formula rate. In practice, we take Formula and Formula to be such that the ratio of the two matches that of the original number of controls to cases. Testing the null hypothesis that the biomarker has no effect is equivalent here to testing for a zero first derivative. We thus compute

Formula

where Formula is the derivative based on the isotonized estimate of f, and use the empirical distribution of T from the subsampled datasets to test the hypothesis. The derivative is calculated using the methods of Press and others (1992).

To examine the finite-sample properties of the proposed methodology, we performed some simulation studies. First, data were generated from (4.1). Two choices were considered for the joint distribution of Formula. In the first, we assumed independence of Z and S, with S being normally distributed with mean 0 and variance 1, whereas Z has an exponential distribution with mean 1 and truncated to [0.5, 1.5]. In the second case, S has the same distribution as in the first case but conditional on S, Z has an exponential distribution with mean Formula. Sample sizes Formula and 1000 were considered. To mimic the sampling design of the prostate cancer data, we used case–control sampling with on average a third of the sample being cases. We took Formula, and 0.5. The choice for f was taken as Formula. Likelihood ratio and Wald statistics for Formula were calculated using the proposed estimation procedure; the results are given in Table 1. Based on the table, we see that the procedure yields unbiased estimators for all scenarios considered. In addition, the 95% likelihood ratio-based CIs tend to perform slightly better than those based on the Wald statistic, although the discrepancy diminishes in larger sample sizes. We also studied the power of the subsampling procedure; the results can be found in the supplementary material available at Biostatistics online to this paper. Based on the results there, the proposed method appears to have satisfactory operating characteristics.


View this table:
[in this window]
[in a new window]

 
Table 1. Simulation results for ß

 

    5. AMACR DATA
 TOP
 SUMMARY
 1. INTRODUCTION
 2. DATA BACKGROUND
 3. BIOMARKER EVALUATION:...
 4. SEMIPARAMETRIC REGRESSION...
 5. AMACR DATA
 6. DISCUSSION
 REFERENCES
 
We now return to the prostate cancer case–control study described in Section 2. We first fit univariate logistic regression models for S and Z. These results are summarized in Table 2. Based on these crude analyses, we find that AMACR is highly predictive of risk of prostate cancer and that age is not significantly associated with disease risk. A multivariable regression model with both AMACR and age was also considered; this is given in Table 2. From that, we find that the association of AMACR and prostate cancer risk remains even after adjusting for age, and that the confounding effects of age on AMACR appear to be minimal. We then explored the assumption of linearity for the effect of AMACR. This was done in two ways. First, we looked at residuals from the multivariate model in Table 2. This showed evidence of nonlinearity of the effect of AMACR for values beyond 1.2. Second, we fit generalized additive models (Hastie and Tibshirani, 1990Go) for disease as a smooth function of AMACR and age. The plot for AMACR is given in Figure 2; the assumption of linearity (constant effect of S) seems clearly to be violated based on the picture. The test for nonlinearity from the generalized additive model indicated strong evidence for nonlinearity (p-value Formula). While the picture appears to be inconsistent with monotonicity, we also note that the plot appears to be sensitive to three influential points for AMACR.


View this table:
[in this window]
[in a new window]

 
Table 2. Univariate and multivariate logistic regression results for Sreekumar and others (2004) prostate cancer data

 

Figure 2
View larger version (8K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 2. Estimated effect of AMACR (solid line) and pointwise CIs (dotted lines) from fitting a generalized additive model of AMACR on probability of prostate cancer. Tick marks indicate data points.

 
We next used the proposed algorithm from Section 4 to fit (4.1). The log odds ratio for age was 0.016, with an associated standard error of 0.013. The 95% likelihood ratio-based interval was Formula, which suggests that after adjusting for AMACR, there is no significant association between age and risk of prostate cancer. The unconstrained and constrained estimates of the AMACR effect are given in Figure 3. The percentile-based 95% CI for T based on 10000 subsamplings is (0.5, 3.2). We thus have strong evidence for a biomarker effect in this model, adjusting for age. Combining the results, we find strong evidence that modeling a linear effect of AMACR is fine for values between 0.2 and 1.2 but not over the full range of AMACR values. This was confirmed by residual plots from fitting the multivariate logistic regression models over the restricted range of values for AMACR. In addition, if we wish to assume that increasing values of AMACR are associated with higher risk of cancer, then this monotonicity will substantially alter the estimate of the AMACR effect.


Figure 3
View larger version (6K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 3. Unconstrained (solid line) and constrained (dashed line) estimate of AMACR effect from fitting (2.1) to prostate cancer data.

 
Combining these results with those in Section 3, we find that while AMACR does not have good ability to discriminate between prostate cancer and noncancerous tissues, it is nevertheless strongly associated with disease status. This suggests that AMACR might potentially have a role in combination with PSA as a biomarker panel for screening and diagnosis of prostate cancer. This hypothesis is currently being tested using prospectively collected samples (M. Sanda, personal communication).


    6. DISCUSSION
 TOP
 SUMMARY
 1. INTRODUCTION
 2. DATA BACKGROUND
 3. BIOMARKER EVALUATION:...
 4. SEMIPARAMETRIC REGRESSION...
 5. AMACR DATA
 6. DISCUSSION
 REFERENCES
 
For the current application, it makes sense to consider monotonicity. If a biomarker were to be adopted for use in a clinical setting, then one imagines a diagnosis rule of the biomarker being above or below a threshold value. It should be pointed out that in risk assessment and other contexts, J-shaped risk relationships have been found (Goetghebeur and Pocock, 1995Go). For such a setting, incorporating monotonicity might be quite misleading.

One extension of (4.1) would be to allow for multiple biomarkers. Given this setting, one could imagine formulating monotone effects for each individual one. This leads to an extension of (4.1) in which there are several nonparametric components that must satisfy a monotonicity condition. Assuming no interactions and in the absence of a parametric component for the model, we would have a generalized additive model with isotonic components. One approach to fitting such a model is the backfitting algorithm (Hastie and Tibshirani, 1990Go) combined with the PAVA to guarantee monotonicity. If we include a parametric component to the model, then estimation of the two classes of components becomes more challenging. This is currently under investigation.


    ACKNOWLEDGMENTS
 
The author would also like to thank the editor and two reviewers whose comments substantially improved the manuscript. This research is supported in part by the National Institutes of Health. Conflict of Interest: None declared.


    REFERENCES
 TOP
 SUMMARY
 1. INTRODUCTION
 2. DATA BACKGROUND
 3. BIOMARKER EVALUATION:...
 4. SEMIPARAMETRIC REGRESSION...
 5. AMACR DATA
 6. DISCUSSION
 REFERENCES
 

    Alonzo TA and Pepe MS. (2002) Distribution-free ROC analysis using binary regression techniques. Biostatistics 3:421–32.[Abstract]

    Bacchetti P. (1989) Additive isotonic models. Journal of the American Statistical Association 84:289–94.[CrossRef][Web of Science]

    Baker SG. (2003) The central role of receiver operating characteristic curves in evaluating tests for the early detection of cancer. Journal of the National Cancer Institute 95:511–5.[Free Full Text]

    Banerjee M and Wellner JA. (2001) Likelihood ratio tests for monotone functions. Annals of Statistics 29:1699–731.[CrossRef][Web of Science]

    Eubank RL. (1999) Nonparametric Regression and Spline Smoothing 2nd edition (Marcel Dekker, New York).

    Goetghebeur E and Pocock S. (1995) Detection and estimation of J-shaped risk-response relationships. Journal of the Royal Statistical Society, Series A 158:107–21.

    Green PJ and Silverman BW. (1994) Nonparametric Regression and Generalized Linear Models(Chapman and Hall, London).

    Hastie T and Tibshirani R. (1990) Generalized Additive Models(Chapman and Hall, London).

    Kelly C and Rice J. (1990) Monotone smoothing with application to dose-response curves and the assessment of synergism. Biometrics 46:1071–85.[CrossRef][Web of Science][Medline]

    Lin X and Zhang D. (1999) Inference in generalized additive mixed models by using smoothing splines. Journal of the Royal Statistical Society, Series B 61:381–400.[CrossRef]

    Mammen E. (1991) Estimating a smooth monotone regression function. Annals of Statistics 19:724–40.[Web of Science]

    Mammen E, Marron JS, Turlach BA, Wand MP. (2001) A general projection framework for constrained smoothing. Statistical Science 16:232–48.[CrossRef][Web of Science]

    Morton-Jones T, Diggle P, Parker L, Dickinson HO, Binks K. (2000) Additive isotonic regression models in epidemiology. Statistics in Medicine 19:849–59.[CrossRef][Web of Science][Medline]

    Murphy SA and Van Der Vaart AW. (1997) Semiparametric likelihood ratio inference. Annals of Statistics 25:1471–509.[CrossRef]

    Murphy SA and Van Der Vaart AW. (2000) On profile likelihood (with discussion). Journal of the American Statistical Association 95:449–85.[CrossRef][Web of Science]

    Pepe MS. (2000) An interpretation for the ROC curve and inference using GLM procedures. Biometrics 56:352–9.[CrossRef][Web of Science][Medline]

    Pepe MS. (2003) The Statistical Evaluation of Medical Tests for Classification and Prediction(Oxford University Press, Oxford).

    Pepe MS, Etzioni R, Feng Z, Potter JD, Thompson ML, Thornquist M, Winget M, Yasui Y. (2001) Phases of biomarker development for early detection of cancer. Journal of the National Cancer Institute 93:1054–61.[Free Full Text]

    Pepe MS, Janes H, Longton G, Leisenring W, Newcomb P. (2006) Limitations of the odds ratio in gauging the performance of a diagnostic, prognostic, or screening marker. American Journal of Epidemiology 159:882–90.[CrossRef]

    Politis DN, Romano JP, Wolf M. (1999) Subsampling(Springer-Verlag, New York).

    Prentice RL and Pyke R. (1979) Logistic disease incidence models and case-control studies. Biometrika 66:403–11.[Abstract/Free Full Text]

    Press WH, Teukolsky SA, Vetterling WT, Flanner BP. (1992) Numerical Recipes in C(Cambridge University Press, Cambridge).

    Robertson T, Wright FT, Dykstra RL. (1988) Order Restricted Statistical Inference(Wiley, New York).

    Rubin MA, Zhou M, Dhanasekaran SM, Varambally S, Barrette TR, Sanda MG, Pienta KJ, Ghosh D, Chinnaiyan AM. (2002) {alpha}-Methylacyl coenzyme A racemase as a tissue biomarker for prostate cancer. Journal of the Americal Medical Association 287:1662–70.[CrossRef]

    Ruppert D, Wand MP, Carroll RJ. (2003) Semiparametric Regression(Cambridge University Press, Cambridge).

    Sidransky D. (2002) Emerging molecular markers of cancer. Nature Reviews Cancer 2:210–9.[CrossRef][Web of Science][Medline]

    Sreekumar A, Laxman B, Rhodes D, Bhagavathula S, Giacherio D, Ghosh D, Sanda MG, Rubin M, Chinnaiyan AM. (2004) Humoral immune response to alpha-methylacyl-CoA racemase and prostate cancer. Journal of the National Cancer Institute 96:834–43.[Abstract/Free Full Text]

    Verbyla AP, Cullis BR, Kenward MG, Welham SJ. (1999) The analysis of designed experiments and longitudinal data by using smoothing splines. Applied Statistics 48:269–311.

    Wolfinger R. (1996) The GLIMMIX SAS Macro(SAS Institute, Cary, NC).

    Received August 12, 2005; revised February 2, 2006; revised July 9, 2006; revised August 1, 2006; accepted for publication August 9, 2006.


    Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?



    This Article
    Right arrow Abstract Freely available
    Right arrow FREE Full Text (PDF) Freely available
    Right arrow Supplementary Material
    Right arrow All Versions of this Article:
    8/2/402    most recent
    kxl018v1
    Right arrow Alert me when this article is cited
    Right arrow Alert me if a correction is posted
    Services
    Right arrow Email this article to a friend
    Right arrow Similar articles in this journal
    Right arrow Similar articles in PubMed
    Right arrow Alert me to new issues of the journal
    Right arrow Add to My Personal Archive
    Right arrow Download to citation manager
    Right arrowRequest Permissions
    Right arrow Disclaimer
    Google Scholar
    Right arrow Articles by Ghosh, D.
    Right arrow Search for Related Content
    PubMed
    Right arrow PubMed Citation
    Right arrow Articles by Ghosh, D.
    Social Bookmarking
     Add to CiteULike   Add to Connotea   Add to Del.icio.us  
    What's this?