Biostatistics Advance Access originally published online on October 24, 2006
Biostatistics 2007 8(3):654-673; doi:10.1093/biostatistics/kxl036
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Generalized monotonic regression based on B-splines with an application to air pollution data
Department of Statistics, Ludwig-Maximilians-Universität München, Akademiestraße 1, 80799 München, Germany florian.leitenstorfer{at}stat.uni-muenchen.de
* To whom correspondence should be addressed.
| SUMMARY |
|---|
|
|
|---|
In many studies, it is known that one or more of the covariates have a monotonic effect on the response variable. In these circumstances, standard fitting methods for generalized additive models (GAMs) generate implausible results. A fitting procedure is proposed that incorporates monotonicity assumptions on one or more smooth components within a GAM framework. The algorithm uses the monotonicity restriction for B-spline coefficients and provides componentwise selection of smooth components. Stopping criteria and approximate pointwise confidence bands are derived. The method is applied to the data from a study conducted in the metropolitan area of São Paulo, Brazil, where the influence of several air pollutants like SO2 on respiratory mortality is investigated.
Keywords: Air pollution data; Generalized additive models; Likelihood-based boosting; Monotonic regression
| 1. INTRODUCTION |
|---|
|
|
|---|
In many biometrical problems where generalized smooth regression methods are used, a monotonic relationship between one or more explanatory variables and the response variable has to be assumed. A typical problem of this type which will be considered more closely arises in studies, where the influence of air pollution on mortality or illness is investigated (see, e.g. Schwartz, 1994b
Starting from the pool adjacent violators algorithm (PAVA) (see, e.g. Robertson and others, 1988
) which produces a step function, a variety of methods have been developed to smooth the PAVA results, so as to obtain a smooth estimate of the underlying monotonic function. Details of such approaches, which are mainly based on kernel regression techniques, are given in Friedman and Tibshirani (1984)
, Mukerjee (1988)
, or Mammen and others (2001)
. Alternative approaches, which will be pursued in the following, are based on the expansion of a monotonic function into a sum of basis functions, that is, f =
j
jBj. To assure monotonicity of the estimate, constraints have to be put on the coefficients
j. Ramsay (1988)
suggests the use of monotonic basis functions (integrated splines), while Kelly and Rice (1991)
propose a B-spline basis. As the B-spline approach has become very popular in nonparametric regression (see Eilers and Marx, 1996
), we will focus on the latter.
Many publications on monotonic regression focus on unidimensional smoothing problems with a Gaussian response variable y (see, e.g. Ramsay, 1998
, Zhang, 2004
, or Turlach, 2005
). In the example considered here, as in many ecological or biometrical applications, one has multiple covariates x' = (x1,...,xp), and only for some of the covariates, a monotonic relationship to E(y|x) has to be assumed. Furthermore, the response variables are typically binary or count data, which are considered as binomial or Poisson distributed. Because little work has been done on monotonic regression in a generalized linear model (GLM) context, least-squares approaches have often been used in such cases (see, e.g. Kelly and Rice, 1991
), which lead to unsatisfactory results. Flexible modeling tools are needed where monotonicity restrictions can easily be incorporated into a generalized additive model (GAM) framework.
Recently, boosting approaches became increasingly important in nonparametric regression (see, e.g. Bühlmann and Yu, 2003
). As demonstrated by Tutz and Leitenstorfer (2006)
, monotonicity restrictions are easy to include in likelihood-based algorithms for generalized response problems by componentwise boosting of monotonic basis functions in each step. In the present paper, we suggest boosting based on B-spline basis functions, rather than using monotonic basis functions as in Tutz and Leitenstorfer (2006)
or Ramsay (1988)
. When using B-splines, the monotonicity condition of the estimate is preserved in a different way. A special update scheme for the basis coefficients is proposed which shows good performance. It should be noted that the proposed method avoids the use of algorithms which handle inequality constraints. Procedures of this type typically are computationally burdensome and often yield unstable estimates. From a Bayesian perspective, a B-spline approach to monotonic regression has been suggested by Brezger and Steiner (2004)
.
We illustrate generalized monotonic regression techniques on a data set that has previously been analyzed by Conceição and others (2001)
, Singer and others (2002)
, and Einbeck and others (2004)
. The data have been collected to evaluate the association between mortality due to respiratory causes and the concentration of various air pollutants in the city of São Paulo, Brazil, from 1994 to 1997. The analysis presented here focuses on the effect on people older than 65 years, which is one of the most susceptible population segments. GAMs are frequently used for the analysis of such data, where one first aims at controlling for seasonal and weather effects and then includes an air pollution variable in the model (see Schwartz, 1994b
). There are numerous publications that suggest that respiratory mortality increases with the concentration of air pollutants (see, e.g. Schwartz, 1994a
, for a review). Under this assumption, it seems sensible to estimate smooth effects of pollution concentration on mortality under a monotonicity constraint, resulting in more reliable fits.
In Section 2, the concept of monotonic likelihood boosting based on B-splines is introduced, and an extension to multiple covariate settings is given. In Section 3, the performance of our approach is evaluated in various simulation studies. In Section 4, we take a closer look on the data set mentioned above. Note that throughout the paper, we take monotonic to mean nondecreasing.
| 2. BOOSTING B-SPLINES IN GENERALIZED MONOTONIC REGRESSION |
|---|
|
|
|---|
First, we consider a generalized smooth monotonic regression problem with dependent variable y that can be non-Gaussian and a single covariate x. As in GLMs (e.g. McCullagh and Nelder, 1989
), it is assumed that yi|xi has a distribution from a simple exponential family f(yi|xi) = exp{[yi
i b(
i)]/
+ c(yi,
)}, where
i is the canonical parameter and
denotes the dispersion parameter. The link between µi = E(yi|xi) and the explanatory variable xi is determined by µi = h(
i), where h is a given response function which is strictly monotone (the inverse of the link function g = h 1), and the predictor
i =
(xi) is a function of x. While in GLMs,
(x) is assumed to be a linear predictor, here more generally it is assumed that
(x) = f(x) is a smooth function that satisfies the monotonicity condition
|
| (2.1) |
Obviously, monotonicity in
transforms into monotonicity in the means.
Due to their flexibility, smoothing methods based on B-splines are a common tool in statistics (see, e.g. Eilers and Marx, 1996
). Such approaches are based on an expansion of f into B-spline basis functions, where a sequence of knots {tj} is placed equidistantly within the range [xm in,xmax]. With
denoting the number of interior knots, one obtains the linear term
![]() | (2.2) |
where q denotes the degree of the B-splines and
(the number of basis functions). An algorithm for the computation of B-splines of degree q is given in De Boor (1978)
. Monotonicity can be assured in the following way. Suppose we have B-splines of degree q
1. Let h be the distance between the equally spaced knots. Then the derivative
'(x) = 
(x)/
x can be written as
|
|
for a proof see De Boor (1978)
. Since Bj(x,q 1)
0, it follows from
|
| (2.3) |
that
'(x)
0 holds. In other words, since (2.3) is a sufficient condition for the monotonicity of
(x), the sequence of coefficients
j has to be nondecreasing in order to obtain monotonic functions. This property of B-splines has been previously exploited by Kelly and Rice (1991)
and Brezger and Steiner (2004)
in a monotonic regression setting. Note that throughout this paper, we work with B-splines that are centered by their corresponding integral.
Boosting was originally introduced within the machine learning community (e.g. Schapire, 1990
) for classification problems. More recently, the approach has been extended to regression modeling with a continuous dependent variable (e.g. Bühlmann and Yu, 2003
, Bühlmann, 2006
). The basic idea is to fit a function iteratively by fitting in each stage a "weak" learner to the current residual. In componentwise boosting as proposed by Bühlmann and Yu (2003)
, only the contribution of one variable is updated in one step. In contrast to these approaches, we propose to update a specific simplification of the predictor which makes it easy to control the monotonicity restriction.
For simplicity, in the following the degree q of the B-splines is suppressed. In matrix notation, the data are given by y = (y1,...,yn)' and x = (x1,...,xn)'. Based on the expansion into basis functions, the data set may be collected in matrix form (y,B), where B = (B1(x),...,Bm(x)),Bj(x) = (Bj(x1),...,Bj(xn))'.
The residual model that is fitted by weak learners in one iteration step uses a grouping of B-splines. One considers for r = 1,...,m 1 the simplified model with predictor
![]() | (2.4) |
When fitting model (2.4), the monotonicity constraint is easily checked by comparing the estimates
and
, since monotonicity follows from
. Given an estimate from the previous fitting
![]() |
refitting is performed by
![]() |
It is obvious that
is monotonic if the estimates fulfill
, provided that the previous estimate
was monotonic. The grouping of basis functions into B1,...,Br and Br + 1,...,Bm, the effect of which is adapted by the amount
1(r) in the first and
2(r) in the second group, allows monotonicity to be controlled in a simple way. Fitting a full model with m new parameters would imply much more computational effort. Furthermore, if individual constraints are imposed on each of the m parameters, problems with convergence might occur. Instead, the possible groupings (r = 1,...,m 1) are evaluated and by analogy with componentwise boosting, the best refit is selected. The grouping of B-splines can be derived as a restricted model in the sense of restricted least-squares estimators in linear models (see Theil and Goldberger, 1961
). In the usual form of a smoothed estimate based on B-splines, model (2.4) is given by
![]() | (2.5) |
with the constraints
1 =
=
r =
1(r),
r + 1 =
=
m =
2(r). The constraints specify that blocks of r and m r parameters have to be identical.
Before giving the algorithm, which is based on likelihood-based boosting strategies as proposed by Tutz and Binder (2006)
, the fit of model (2.4) is embedded into the framework of penalized likelihood estimation. Moreover, the model is generalized to a model that contains parametric effects in addition to the smooth monotonic effects. Thus, the intercept term is replaced by z'
0, where z is a vector of covariates and
0 is an unknown parameter vector (possibly specifying only the intercept). It is assumed that zi always contains an intercept. In order to avoid identifiability issues which arise for B-splines in connection with an intercept, the update step is split up into 2 parts. In the first part, the smooth component is updated and in the second, the parametric term. In the first part, one fits by penalized likelihood. Therefore, one considers
![]() |
with 0r, 1r denoting the vectors of length r containing 0s and 1s only. Then the linear predictor may be represented in matrix form by
(x) = B(r)
(r), where B(r) = BR(r) and
(r) = (
1(r),
2(r))'. It is proposed that in each boosting step, the model is estimated by the 1-step Fisher scoring based on generalized ridge regression (Marx and others, 1992
). Standard ridge regression maximizes the penalized log-likelihood
|
|
where li(
(r)) = li(h(B(r)
(r))) is the usual log-likelihood contribution of the ith observation and P(
(r)) = (
/2)

(r) represents the penalty term with ridge parameter
. Model (2.5) is asymmetric in a specific sense. If, for example, r = 2, the first constraint
1 =
2 concerns only 2 parameters, whereas the second constraint
3 =
=
m concerns m 2 parameters, which for m = 22 means 20 parameters are restricted. To account for this asymmetry, we tried the modified penalty
where the parameters are weighted by the number of parameters that are implicitly considered as identical. However, numerical examples showed that the performance of the modified penalty is nearly the same as for the usual ridge penalty
. Hence, in the following we use this simpler scheme, which in matrix form leads to the penalized log-likelihood
|
|
where
= diag{1,1} and
> 0 represents the ridge parameter. Derivation yields the corresponding penalized score function
|
| (2.6) |
with W(
) = D2(
)
(
) 1, D(
) = diag{
h(
1)/
,...,
h(
n)/
},
(
) = diag{
,...,
}, 
= var(yi), all of them evaluated at the current value of
. The monotonicity constraint from (2.3) is incorporated by taking into account only estimates which fulfill
. It is easily seen that the update scheme given below yields the desired nondecreasing sequences of estimates
in each boosting iteration. The update of the parametric term z'
0 is performed in the same way but without penalization and with the design matrix determined by Z = (1,z1,...,zu), where z1,...,zu denote the observed covariate values.
Monotonic Likelihood Boosting for B-splines
Step 1: (Initialization)
Set
,
,
, and
.
Step 2: (Iteration)
For l = 1,2,...
- 1) Fitting step, monotone component
For r = 1,
,m 1, compute the modified ridge estimate based on the 1-step Fisher scoring,

(2.7) where
,
,
, and
. Let
denote the candidates that fulfill the monotonicity constraint. If A =
, stop. Otherwise continue with Step 2.
- 2) Selection step and update, monotone component
Compute the potential update of the linear predictor,
, r
{1,
,m 1}. Choose rl
A such that the deviance is minimized, that is 

(2.8)
- 3) Fitting step and update, parametric term
Based on the 1-step Fisher scoring, one obtains
where
and
. Set 
- 2) Selection step and update, monotone component
When using boosting techniques, the number of iterations l plays the role of a smoothing parameter. Therefore, in order to prevent overfitting, a stopping criterion is necessary. A quite common measure of the complexity of a smooth regression fit is the hat matrix. Consequently, Bühlmann and Yu (2003)
and Bühlmann (2006)
developed a hat matrix for L2-boosting with continuous dependent variable. In the case of likelihood boosting, for more general exponential-type distributions, the hat matrix has to be approximated. For integrated splines, Tutz and Leitenstorfer (2006)
give an approximation based on first-order Taylor expansions, which shows satisfying properties. It is straightforward to derive the hat matrix for the present case along the lines of Tutz and Leitenstorfer (2006)
. With
, Ml = 
W
B(rl)(B
WlB(rl) + 
) 1B
W

, where
and
, and
where
and
, l = 1,2,
, the approximate hat matrix is given by
![]() | (2.9) |
with
. By considering tr(Hl) as the degree of freedom of the smoother, we use as potential stopping criteria an information criterion proposed by Akaike (AIC) and the bayesian information criterion (BIC) criteria, AIC(l) = Devl + 2tr(Hl) and BIC(l) = Devl + log(n)tr(Hl), where
denotes the deviance of the model in the lth boosting step. The optimal number of boosting iterations is defined by l
= argminlAIC(l) or l
= argminlBIC(l). Since the BIC (Schwarz, 1978
) penalizes the complexity of the fit stronger, usually sparser models result. A more extensive treatment of stopping criteria for boosting algorithms is given in Bühlmann and Yu (2006)
.
In biometrical or ecological problems, one is usually interested in the effect of several smooth predictor variables, some of which might have monotonic influence on y. In the following, we demonstrate that the concept given above can easily be extended to a GAM setting (see, e.g. Hastie and Tibshirani, 1990
, or Marx and Eilers, 1998
). Let
|
| (2.10) |
where for some of the p unknown smooth functions (say f1,...,fv, v
p) monotonicity constraints are assumed to hold. Using the matrix notation from above, we have a design matrix (Z,X), with the matrix of linear terms Z and the matrix of smooth components X = (x1,...,xp), where xs = (x1s,...,xns)'. Componentwise expansion of X into B-spline basis functions leads to the matrix (B(1),...,B(p)), where B(s) refers to the sth predictor.
It is essential to distinguish between components that are or are not under monotonicity restrictions. For the former, grouping of basis functions is done within each component in the same way as described in (2.4). For the unconstrained components, we follow Bühlmann and Yu (2003)
and Tutz and Binder (2006)
and use penalized regression splines (P-splines, cf. Eilers and Marx, 1996
) as weak learners for the chosen component. Thereby, the second-order differences of the B-spline coefficients are penalized. For simplicity, it is assumed that the same number of basis functions m is used for all fs. The vector of basis coefficients for all smooth terms in the model is then given by
= (
11,...,
1m,...,
p1,...,
pm)'. Thus, Step 2 (iteration) of the algorithm described above is extended as follows:
Step 2: (Iteration)
For l = 1,2,...
- 1. Fitting step, smooth components
For s = 1,...,p,- If s
{1,...,v} (the components under monotonicity constraint), compute the estimates from (2.7) componentwise for the possible groupings r = 1,...,m 1, with

(2.11) The set of indices for components s and split points r that satisfy the monotonicity constraint is given by

- If s
{v + 1,
,p} (the components without constraints), compute the 1-step Fisher scoring estimate of the P-spline,

(2.12) where

denotes the matrix representation of the second-order differences. Since the P-spline fit (2.12) does not distinguish between split points r
{1,...,m 1}, for convenience of notation we set r = 0 and extend the selection set by A2 = {(s,0),s
{v + 1,...,p}}, yielding

(2.13)
- 2. Selection step and update, smooth components
Compute the potential update of the linear predictor, which only for the monotonic coefficients s
v depends on the split point r. Otherwise, r is set to 0, indicating that
is not affected by r. Choose (sl,rl)
A such that the deviance is minimized, that is 
In each iteration, only the basis coefficients belonging to the chosen component sl are refitted. That is, if the selected sl is in {1,...,v}, then
, j = 1,...,m, are updated by the refitting scheme (2.8). If sl > v, then update
with
from (2.12).
- 3. Fitting step and update, parametric terms. See above.
- If s
By using B
from (2.11), along with B(s) and the penalty matrices for the P-spline estimates, the hat-matrix approximation and the corresponding AIC and BIC stopping criteria can be extended to the additive setting. One obtains the hat matrix from (2.9), where
![]() |
with l = 1,2,.... In the case of many predictors, it might occur that boosting stops before a certain component has been chosen. Thus, the extended approach has the nice effect of doing variable selection for smooth components, similar to the methods proposed by Bühlmann and Yu (2003)
. This additional strength is important only in data sets with a large number p of covariates, where only some of them are influential.
A derivation of standard deviations for function and parameter estimates is given in Appendix A.1.
| 3. SIMULATION RESULTS |
|---|
|
|
|---|
In order to evaluate the performance of the proposed method, we conduct some simulation studies. In a first setting, a unidimensional Poisson regression model is considered, with response yi generated from P(exp(
i)), where
i =
(xi) is specified by a monotonic function. The xi are drawn from a U[0,5]-distribution. We investigate a step function,
(x) = 2cI(x > 2.5), and a plateau function,
(x) = c(2/{1 + exp[ 10(x 1)]} + 2/{1 + exp[ 5(x 4)]} 1). The strength of the signal is controlled by the constant c.
For Generalized Monotonic B-spline Boosting (GMBBoost), a B-spline basis of degree q = 3 is used, with 20 equally spaced knots in the domain of the data, which results in m = 22 basis functions. The ridge parameter has been chosen as
= 300, and the maximum number of boosting iterations is limited to L = 500. Note that for better comparability, the predictor variable is always rescaled to [0,1] before proceeding further.
GMBBoost is compared to unconstrained penalized regression splines as implemented in the R library "mgcv", where the penalization parameter is determined by the unbiased risk estimation (UBRE) criterion (see Wood, 2000
, Wood, 2001
, Wood, 2003
). In order to obtain comparability with the proposed approach, a cubic regression spline basis with a dimension of k = 22 is used. Furthermore, following a suggestion of a referee, we consider a monotonicity-constrained version of this approach, based on quadratic programming. This involves embedding a monotone smoother in an iteratively reweighted least-squares loop. When using regression splines, this monotone smoother can be obtained by the function "pcls" in mgcv which solves a penalized least-squares quadratic programming problem. For details, see Wood (1994)
. The smoothing parameter is chosen by UBRE applied to the unconstrained model. Ordinary PAVA is also included, since in 1-parameter exponential families, it is the restricted maximum likelihood estimate for isotonic regression (see Robertson and others, 1988
).
A criterion for the performance of the fitting methods is the averaged KullbackLeibler (AKL) distance,
|
| (3.1) |
where in the case of Poisson regression we have
, with
and µi = exp[
(xi)]. The means of AKL over S = 250 simulated data sets are given in Table 1 for selected sample sizes and noise levels. For the step function example, it is seen that GMBBoost is a strong competitor that clearly outperforms the unconstrained and constrained MGCV fits. Note that especially in the lower noise case, in about 10% of the simulated data sets, no fit could be obtained by the restricted version of MGCV. PAVA, which might be thought of being appropriate for this example due to its noncontinuous character, does better only in the case of a stronger signal and n = 100. For the plateau function, monotonicity-restricted MGCV is the best performer, but the performance of GMBBoost is very similar. It is obvious that using PAVA is not a good idea when the true underlying function is smooth. Note that the differences between the stopping criteria are marginal. For the step function, BIC does a little better, whereas in the plateau example, AIC has the advantage.
|
To assess the performance of the GMBBoost extension to the GAM framework, we conduct simulation studies from 2 settings with multiple covariates. We investigate again a Poisson model with logarithmic link, where the linear predictor is given by
![]() | (3.2) |
and c relates to the strength of the signal. First, p = 2 and 2 monotonic increasing functions,
|
|
and
|
|
are considered (
0 = 0.5). The covariates xi1 and xi2 are drawn independently from a U[0,1]-distribution. We compare GMBBoost with both smooth components under a monotonicity restriction to the additive extension of MGCV based on cubic regression splines, where the multiple smoothing parameters are selected by UBRE based on Newton's method in multidimensions (see, e.g. Gu and Wahba, 1991
, and Wood, 2000
). As above, for each smooth component, B-splines of degree 3 with 20 equidistant knots within the domain of the data and a ridge parameter of
= 300 are used. We use a dimension of k = 22 in each component for MGCV. Table 2 shows the results of the mean AKL over S = 250 for 2 different levels of the signal strength and sample sizes of n = 100 and 200. In all settings, GMBBoost yields considerably better estimates than the unconstrained GAM. The differences between AIC- and BIC-stopped GMBBoost are again small, with no clear preference for either criterion.
|
Finally, we consider a setting in higher dimensions, where only some of the components are under monotonicity constraints. A Poisson model with a log-link and a linear predictor as given in (3.2) is considered, where p = 5 and the last 3 functions are assumed to be monotonic. See Figure 1 for the shape of the functions (
0 = 0). The algebraic expressions of these functions are given in Appendix A.2. The covariates are drawn from a
5(0,
)-distribution, where a compound symmetry correlation structure of the design is obtained by using
=
11' + (1
)I, we chose
= 0.4. In order to keep the strength of the signal comparable to the example above, rather small values of c = 0.2 and 0.3 are chosen.
|
In this case, GMBBoost selects between 2 different types of weak learners: grouped basis functions with restrictions and P-splines (see Section 2.3). Simulations suggest that different ridge parameters should be used for these 2 types of learners. In the examples presented above, we observed that a ridge parameter of
= 300 for the grouped B-splines is large enough to yield an appropriate fit. We also tried larger values which required a higher number of iterations, but only with marginally differing results. Hence,
= 300 is also applied here. However, we allow for more flexibility in the choice of the P-spline penalty parameter
P. In each setting, GMBBoost cycles for 3 different values of
P, and the combination of
P and number of iterations l is chosen that minimizes AIC or BIC. A maximum number of L = 120 boosting iterations for each cycle suffices in most cases, which makes the procedure computationally feasible. Since small values of
P may lead to premature stopping, we consider 3 different values of
P
{1000,3000,5000}. The choice of basis functions and knots is the same as in the example with p = 2. We again compare GMBBoost to MGCV which uses the same settings as before. In Figure 2, the logarithm of the AKL is given for the various settings. It is seen that GMBBoost outperforms MGCV in most considered settings, with a distinct dominance in the higher noise case c = 0.2 (left panel). Interestingly, BIC seems to stop boosting prematurely in the higher dimensional case, yielding an inferior performance compared to AIC-stopped GMBBoost. Thus, we recommend to use the AIC criterion in problems where also nonrestricted smooth components have to be estimated.
|
| 4. AIR POLLUTION IN SÃO PAULO |
|---|
|
|
|---|
In the following, the air pollution data mentioned in Section 1 are investigated more closely. The objective is to evaluate the association between mortality for respiratory causes and the concentration of SO2, CO, PM10, and O3. Previous analyses of this data set (see, e.g. Conceição and others, 2001
![]() | (4.1) |
The model includes a nonspecified function of time (in days) to control for long-term seasonality. We also tried models where temperature (daily minimum, 2-day lagged) and humidity were considered as smooth functions. Since we found no evidence of nonlinearity, we decided to include these covariates as linear terms. In addition, day-of-week dummies are included to control for short-term seasonality and the number of deaths by nonrespiratory causes as a linear term. The basic strategy to investigate the effect of a specific pollutant is to take only this pollutant into the model. In the following, we will focus exclusively on the concentration of SO2, given in daily mean values of µg/m3 (2-day lagged), considering the predictor
|
| (4.2) |
where f2(·) is monotonic increasing (for a discussion see concluding remarks). To account for that assumption, model (4.2) has been fitted by the boosting procedure described in Section 2 (GMBBoost), where f2 was estimated under the monotonicity constraint. We used a B-spline basis of degree 3 with
equidistant interior knots for each of the smooth component and concentrate on AIC-stopped GMBBoost. When using the same values for the penalty parameters
and
P as in the simulation study, no distinct minimal AIC-value was attained after L = 400 boosting iterations. To keep computation feasible, we decided to use considerably lower values of
= 30 and
P
{30,50,100}. Then, AIC-stopped GMBBoost chose
P = 30 at an optimal number of lopt = 71 iterations. It should be noted that the results for the other values of
P are very similar.
The fixed effects were reestimated in each iteration. For comparison, we also fitted a GAM by the use of the R package mgcv with the same adjustments as in the simulations.
Figure 3 shows the estimated curves
and
for the various fitting procedures. It is seen from
(left panel) that a strong seasonal pattern in mortality is evident for both fitting methods. The more interesting result is the difference in the fits for the concentration of SO2 (right panel). The MGCV fit shows a rather wiggly curve up to a concentration of 35 µg/m3. Interestingly, the fitted curve decreases for very small concentrations, which is counterintuitive. In contrast, GMBBoost shows different behavior. Since monotonicity is assumed for this component, one obtains a fit that increases first slowly (up to about 40 µg/m3) and then steeply (up to 60 µg/m3), where it remains constant on a rather high level of mortality for high pollutant concentrations.
|
In Table 3, the parameter estimates for the fixed effects, controlling for temperature, humidity, long-term seasonality, and nonrespiratory deaths, are given for the different fitting methods, along with the corresponding standard errors. It is seen that the estimates are rather stable across fitting procedures. We also give the deviance and the effective degrees of freedom for both models. It is seen that the MGCV fit is closer to the data, whereas GMBBoost yields a sparser model due to the restriction. Note that for GMBBoost, the effective degrees of freedom have to be interpreted with caution, since they depend on the approximation of the hat matrix.
|
The editor pointed out with good reason that overdispersion may be a problem in Poisson models. We therefore estimated the scale parameter,
|
|
for both models, where edf denotes the effective degrees of freedom. From Table 3, it is seen that
is in both cases fairly close to one, indicating that overdispersion should not be an issue in this application. In addition, we computed the partial serial correlation of residuals. The inclusion of days of the week as covariates had the effect that partial serial correlation is rather weak. The maximal value was 0.062. For significance level 0.05 for single lags, 2 lags out of 30 (1 and 13) were just above the critical value. When adjusting for multiple tests, serial correlation seems hardly relevant. Similar results were reported by Singer and others (2002)
.
Figure 4 shows the curves fitted by AIC-stopped GMBBoost and approximate 0.95 pointwise confidence bands as derived in Appendix A.1. Although confidence intervals are rather large for a SO2 concentration above 40 µg/m3, the confidence intervals show that the increase should be taken seriously.
|
In studies of the type presented here, one is often interested in the risk of death at a certain pollutant concentration, relative to the risk of death at the minimum concentration of that pollutant (see, e.g. Singer and others, 2002
,n, and SO2(min) the minimum concentration recorded, then the relative risk of death is defined by
![]() |
In Figure 5, the estimated relative risk curve is given for 2 fitting methods. It is obvious that the nonmonotonic fit yields unreliable results with a fit which indicates that the relative risk of death is below 1 up to 45 µg/m3. In contrast, the GMBBoost fit shows monotonic increase over the whole risk curve. For concentrations higher than 60 µg/m3, the risk seems to remain on a fairly high level for the AIC-stopped boosting.
|
| 5. CONCLUDING REMARKS |
|---|
|
|
|---|
A procedure is proposed that incorporates monotonicity constraints for one or more components within a GAM. By using monotonicity, the procedure yields stable estimates which in contrast to GAM fitting avoids overfitting and questionable wiggly estimates. Einbeck and others (2004)
The results provided by the proposed methodology go with other findings regarding the relationship between air pollution and mortality in the São Paulo area. In an earlier study, Saldiva and others (1995)
investigated all causes of mortality of elderly people. They found a significant effect of SO2 by using a parametric Poisson model, controlling for seasonality by dummy variables for months and years. Later, Pereira and others (1998)
investigated the association between air pollution and mortality of fetuses with a parametric Poisson model, reported a weakly increasing risk of death for SO2, and derived a doseresponse relationship between the pollutant and intrauterine mortality. In the study by Conceição and others (2001)
, the respiratory mortality of children under 5 was investigated by using GAM techniques and the same database as this paper. They found an increase in relative risk of death of about a 25% on the most polluted days (compared to the least polluted days) for SO2. This is comparable to the relative risk of death that we observed for elderly people using the GMBBoost procedure, see Figure 5. Braga and others (2001)
explored hospital admissions due to respiratory causes of children and adolescents under 20. They observed a significant effect of SO2 on infants under 2 but no significance in other age groups.
To conclude, previous investigations of adverse health effects of SO2 in São Paulo are supported by our approach. Besides the known effects on children's health (Conceição and others (2001)
), SO2 seems also to affect the health of elderly people. A particular characteristic of the effects of the pollutant SO2 in São Paulo is that most of the pollution in this city is caused by auto engine exhaust gases. SO2 (as well as CO) may be seen as good proxies for automotive emissions.
A referee raised the problem that there is not necessarily an increasing relationship between high air pollution concentration and mortality. When ambient air pollution is at a very high level, people may adjust their behavior, for example they may stay inside on particularly polluted days (see Bresnahan and others, 1997
, or Neidell, 2002
, for a detailed discussion). Nevertheless, there is empirical evidence that high pollutant concentrations result in an increasing mortality. Thus, we think that it is reasonable to consider ambient air pollution as a proxy for air pollution exposure.
An implementation in R of the approach outlined in this paper is available at http://www.statistik.lmu.de/institut/lehrstuhl/semsto/Software/software.htm.
| Appendix |
|---|
|
|
|---|
In order to obtain standard deviations for function estimates, we suggest starting from the approximate hat matrix given in (2.9). Consider the model from (2.10), where components 1,...,v are estimated under the monotonicity constraint and components v + 1,...,p are not; the linear predictor after l boosting iterations is given by
|
| (A.1) |
where
and
results from updating the linear terms in each iteration. Let sk be the smooth component chosen in the kth boosting iteration, one has from the update step of the extended algorithm,
![]() | (A.2) |
where, according to (2.10) and (2.12),
![]() | (A.3) |
From (A.3), it becomes apparent that the type of the update of the chosen smooth component depends on the presence or absence of a monotonicity restriction. Using the indicator function I(·), (A.2) can be written in the closed form
|
|
With the approximation of the hat matrix, one has
, which leads to
|
|
and hence, in a recursive fashion,
|
|
where
![]() |
Approximate confidence intervals for the estimate of the smooth component fs after l boosting iterations are then found from
|
|
where
, with
.
Standard deviations for the fixed effects may be obtained in a similar way. The vector of fixed parameters after l boosting iterations is given by
|
| (A.4) |
Let
and use the approximation of the hat matrix,
, where
, then (A.4) may be expressed in a recursive way as
![]() |
where
and
. With
, it immediately follows that
|
|
The smooth functions used in the simulation study of the 5-dimensional additive Poisson model (see Figure 1) are given in algebraic terms by
![]() |
The data stem from a study conducted between January 1994 and December 1997. The original database can be found on the web at
http://www.ime.usp.br/jmsinger/Polatm9497.zip
(the data file is called master.xls). The concentrations of several air pollutants were recorded daily by 13 monitoring stations of the São Paulo state air pollution controlling agency. For SO2, 24-h mean values (in µg/m3) are given (variables SO21 to SO221). We took the mean over all monitoring station where records are available to get an overall variable for SO2 concentration. The daily minimum temperature (TEMPMIN) and relative humidity (HUMIDMED) are provided by the University of São Paulo. We follow Conceição and others (2001)
and consider the measurements taken 2 days before as influential. This refers to the measurements of the SO2 concentration and the minimum temperature, as well as the measurement of the concurrent day is used for relative humidity. Daily records of mortality were recorded from the municipal mortality information improvement program. For several age groups, the total number of deaths is given, as well as a more detailed partition into deaths due to respiratory diseases, cardiovascular, and other causes. In the present analysis, the respiratory mortality of population at an age of 65 years or older is investigated. We use the daily number of deaths in this age group due to respiratory diseases (RES65) as response variable. The mortality due to nonrespiratory causes, which is used as a covariate, contains the deaths due to cardiovascular (CAR65) and other (OTH65) causes. For a more detailed description of the classification of mortality, we refer to Conceição and others (2001)
. The preprocessing results in a data set of n = 1351 complete observations.
| ACKNOWLEDGMENTS |
|---|
We gratefully acknowledge support from the Deutsche Forschungsgemeinschaft (SFB 386, "Statistical Analysis of Discrete Structures"). We thank the "Laboratorio de Poluição Atmosférica Experimental, Faculdade de Medicina, Universidade de São Paulo, Brazil", Jochen Einbeck, and Julio M. Singer for letting us use the data on air pollution in São Paulo. We also thank Harald Binder and 2 unknown referees for constructive comments. Conflict of Interest: None declared.
| REFERENCES |
|---|
|
|
|---|
-
Braga ALF, Saldiva PHN, Pereira LAA, Menezes JJC, Conceição GMS, Lin CA, Zanobetti A, Schwartz J, Dockery DW. Health effects of air pollution exposure on children and adolescents in São Paulo, Brazil. Pediatric Pulmonology (2001) 31:106113.[CrossRef][Web of Science][Medline]
Bresnahan B, Dickie M, Gerking S. Averting behavior and urban air pollution. Land Economics (1997) 73:340357.[CrossRef][Web of Science]
Brezger A, Steiner WJ. Monotonic regression based on Bayesian P-splines: an application to estimating price response functions from store-level scanner data. SFB Discussion Paper 331 (2004) Munich, Germany: LMU München.
Bühlmann P. Boosting for high-dimensional linear models. Annals of Statistics (2006) 34:559583.[CrossRef][Web of Science]
Bühlmann P, Yu B. Boosting with the L2-loss: regression and classification. Journal of the American Statistical Association (2003) 98:324339.[CrossRef][Web of Science]
Bühlmann P, Yu B. Sparse boosting. Journal of Machine Learning Research (2006) 7:10011024.[Web of Science]
Conceição GMS, Miraglia SGEK, Kishi HS, Saldiva PHN, Singer JM. Air pollution and children mortality: a time series study in São Paulo, Brazil. Environmental Health Perspectives (2001) 109:347350.[CrossRef][Web of Science][Medline]
De Boor C. A Practical Guide to Splines (1978) New York: Springer.
Dilleen M, Heimann G, Hirsch I. Non-parametric estimators of a monotonic dose-response curve and bootstrap confidence intervals. Statistics in Medicine (2003) 22:869882.[CrossRef][Web of Science][Medline]
Ducharme GR, Fontez B. A smooth test of goodness-of-fit for growth curves and monotonic nonlinear regression models. Biometrics (2004) 60:977986.[CrossRef][Web of Science][Medline]
Eilers PHC, Marx BD. Flexible smoothing with B-splines and penalties. Statistical Science (1996) 11:89121.[CrossRef][Web of Science]
Einbeck J, Andre CDS, Singer JM. Local smoothing with robustness against outlying predictors. Environmetrics (2004) 15:541554.[CrossRef][Web of Science]
Fan J, Gijbels I. Local Polynomial Modelling and Its Applications (1996) London: Chapman & Hall.
Friedman JH, Tibshirani R. The monotone smoothing of scatterplots. Technometrics (1984) 26:243250.[Medline]
Green DJ, Silverman BW. Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach (1994) London: Chapman & Hall.
Gu C, Wahba G. Minimizing GCV/GML scores with multiple smoothing parameters via the Newton method. SIAM Journal of Scientific and Statistical Computing (1991) 12:383398.[CrossRef]
Hastie T, Tibshirani R. Generalized Additive Models (1990) London: Chapman & Hall.
Kelly C, Rice J. Monotone smoothing with application to doseresponse curves and the assessment of synergism. Biometrics (1991) 46:10711085.[Web of Science]
Lee C-IC. On estimation for monotone dose-response curves. Journal of the American Statistical Association (1996) 91:11101119.[CrossRef][Web of Science]
Mammen E, Marron J, Turlach B, Wand M. A general projection framework for constrained smoothing. Statistical Science (2001) 16:232248.[CrossRef][Web of Science]
Marx B, Eilers P, Smith E. Ridge likelihood estimation for generalized linear regression. In: Statistical Modellingvan der Heijden R, Jansen W, Francis B, Seeber G, eds. (1992) Amsterdam: North-Holland. 227238.
Marx BD, Eilers PHC. Direct generalized additive modelling with penalized likelihood. Computational Statistics & Data Analysis (1998) 28:193209.[CrossRef][Web of Science]
McCullagh P, Nelder JA. Generalized Linear Models (1989) 2nd edition. New York: Chapman & Hall.
Mukerjee H. Monotone nonparametric regression. Annals of Statistics (1988) 16:741750.[Web of Science]
Neidell M. Air pollution, health and socio-economic status: the effect of outdoor air quality on childhood asthma. Journal of Health Economics (2002) 23:12091236.[CrossRef]
Pereira LAA, Loomis D, Conceição GMS, Braga ALF, Arcas RM, Kishi HS, Singer JM, Böhm GM, Saldiva PHN. Association between air pollution and intrauterine mortality in São Paulo, Brazil. Environmental Health Perspectives (1998) 106:325329.[Web of Science][Medline]
Ramsay JO. Monotone regression splines in action. Statistical Science (1988) 3:425461.
Ramsay JO. Estimating smooth monotone functions. Journal of the Royal Statistical Society B (1998) 60:365375.[CrossRef]
Robertson T, Wright FT, Dykstra RL. Order-Restricted Statistical Inference (1988) New York: Wiley.
Saldiva PHN, Pope, III CA, Schwartz J, Dockery DW, Lichtenfels AJFC, Salge JM, Barone IA, Böhm GM. Air pollution and mortality in elderly people: a time-series study in São Paulo, Brazil. Archives of Environmental Health (1995) 50:159163.[Web of Science][Medline]
Schapire RE. The strength of weak learnability. Machine Learning (1990) 5:197227.[Web of Science]
Schwartz J. Air pollution and daily mortality: a review and meta analysis. Environmental Research (1994a) 64:3652.[Medline]
Schwartz J. Nonparametric smoothing in the analysis of air pollution and respiratory illness. Canadian Journal of Statistics (1994b) 22:471487.[CrossRef]
Schwarz G. Estimating the dimension of a model. Annals of Statistics (1978) 6:461464.[Web of Science]
Singer JM, Andre CDS, Lima PL, Conçeicão GMS. Association between atmospheric pollution and mortality in São Paulo, Brazil: regression models and analysis strategy. In: Statistical Data Analysis Based on the L1 Norm and Related MethodsDodge Y, ed. (2002) Berlin: Birkhäuser. 439450.
Theil H, Goldberger AS. On pure and mixed estimation in econometrics. International Economic Review (1961) 2:6578.[CrossRef][Web of Science]
Turlach BA. Shape constrained smoothing using smoothing splines. Computational Statistics (2005) 20:81103.[Web of Science]
Tutz G, Binder H. Generalized additive modelling with implicit variable selection by likelihood based boosting. Biometrics (2006) (in press).
Tutz G, Leitenstorfer F. Generalized smooth monotonic regression in additive modeling. Journal of Computational and Graphical Statistics (2006) in press.
Wood SN. Monotonic smoothing splines fitted by cross validation. SIAM Journal on Scientific Computing (1994) 15:11261133.[CrossRef][Web of Science]
Wood SN. Modelling and smoothing parameter estimation with multiple quadratic penalties. Journal of the Royal Statistical Society B (2000) 62:413428.[CrossRef]
Wood SN. mgcv: GAMs and generalized ridge regression for R. R News (2001) 1:2025.
Wood SN. Thin plate regression splines. Journal of the Royal Statistical Society B (2003) 65:95114.[CrossRef]
Zhang JT. A simple and efficient monotone smoother using smoothing splines. Journal of Nonparametric Statistics (2004) 16:779796.[CrossRef][Web of Science]
Received June 2, 2005; revised March 21, 2006; revised July 27, 2006; accepted for publication October 20, 2006.
![]()
CiteULike
Connotea
Del.icio.us What's this?
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||






. Set








).



