Biostatistics Advance Access originally published online on October 26, 2005
Biostatistics 2006 7(2):235-251; doi:10.1093/biostatistics/kxj004
Covariate-adjusted varying coefficient models
D. Şentürk*
Department of Statistics, Pennsylvania State University, University Park, PA 16802, USA dsenturk{at}stat.psu.edu
* To whom correspondence should be addressed.
| SUMMARY |
|---|
Covariate-adjusted regression was recently proposed for situations where both predictors and response in a regression model are not directly observed, but are observed after being contaminated by unknown functions of a common observable covariate. The method has been appealing because of its flexibility in targeting the regression coefficients under different forms of distortion. We extend this methodology proposed for regression into the framework of varying coefficient models, where the goal is to target the covariate-adjusted relationship between longitudinal variables. The proposed method of covariate-adjusted varying coefficient model (CAVCM) is illustrated with an analysis of a longitudinal data set containing calcium absorption and intake measurements on 188 subjects. We estimate the age-dependent relationship between these two variables adjusted for the covariate body surface area. Simulation studies demonstrate the flexibility of CAVCM in handling different forms of distortion in the longitudinal setting.
Keywords: Covariate-adjusted regression; Local polynomial regression; Longitudinal data; Multiplicative effects; Smoothing
| 1. INTRODUCTION |
|---|
Covariate-adjusted regression (CAR) has been proposed by Şentürk and Müller (2005a) for regression settings in which the response and the predictor are observed only after distortion by a common covariate. In the simple case of one predictor with multiplicative distortion, the observed variables are

$$\tilde{Y}_i = \psi(U_i)\,Y_i, \qquad \tilde{X}_i = \phi(U_i)\,X_i,$$

for a sample of n subjects, where $\psi(\cdot)$ and $\phi(\cdot)$ are unknown smooth functions of the observed covariate U. Here, Y and X denote the underlying unobserved parts of the observed response and predictor that are thought to be measured in a scale that does not depend on U. CAR uncovers the regression relationship adjusted for the covariate by giving consistent estimates for the parameters in the underlying, unobserved regression model

$$Y_i = \gamma_0 + \gamma_1 X_i + e_i,$$

based on the observed data $(U_i, \tilde{X}_i, \tilde{Y}_i)$, i = 1, ..., n.
The procedure utilizes the identifiability condition of no average distortion, i.e. $E(\tilde{Y}) = E(Y)$ and $E(\tilde{X}) = E(X)$. Under this identifiability condition, CAR gives consistent estimates regardless of the form of distortion considered, as long as the distortions on the response and the predictors are of the same form (see Şentürk and Müller, 2005a, for justification). More specifically, CAR yields consistent estimates under multiplicative (i.e. $\tilde{Y} = \psi(U)Y$, $\tilde{X} = \phi(U)X$), additive (i.e. $\tilde{Y} = Y + \psi(U)$, $\tilde{X} = X + \phi(U)$), and no distortion (i.e. $\tilde{Y} = Y$, $\tilde{X} = X$). This makes CAR a very flexible adjustment where even the form of the distortion need not be known.
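As a short worked check of what the no-average-distortion condition requires under each form, write $\psi$ for the distorting function acting on the response and assume, as in Section 2, that U is independent of the underlying response Y with $E(Y) \neq 0$. The same argument applies to each predictor.

```latex
\begin{align*}
\text{multiplicative:}\quad & E(\tilde{Y}) = E\{\psi(U)\,Y\} = E\{\psi(U)\}\,E(Y) = E(Y)
  \;\Longleftrightarrow\; E\{\psi(U)\} = 1,\\
\text{additive:}\quad & E(\tilde{Y}) = E(Y) + E\{\psi(U)\} = E(Y)
  \;\Longleftrightarrow\; E\{\psi(U)\} = 0,\\
\text{no distortion:}\quad & \tilde{Y} = Y, \text{ so the condition holds trivially.}
\end{align*}
```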
We propose an extension of the CAR algorithm for longitudinal data, where the measurements taken on the response and the predictor are time dependent. We focus on the simple case of a cross-sectional covariate; however, the proposed method can also be applied to the case of a longitudinal covariate, as will be discussed in Section 6. One example is a study on 188 subjects, where the longitudinal relationship between calcium intake and absorption is of interest (Davis, 2002). The covariate to be adjusted for is body surface area (BSA) of the subjects, and a new adjustment procedure is needed to uncover the time-dependent relationship between the underlying variables adjusted for this covariate. Denote the response and the predictor measurements for the ith subject, i = 1, ..., n, taken at time $t_{ij}$, j = 1, ..., $T_i$, as

$$\tilde{Y}_i(t_{ij}) = \psi(U_i)\,Y_i(t_{ij}), \qquad \tilde{X}_i(t_{ij}) = \phi(U_i)\,X_i(t_{ij}).$$

Here, Y and X denote the underlying unobserved longitudinal variables assumed to be related through the varying coefficient model

$$Y_i(t_{ij}) = \beta_0(t_{ij}) + \beta_1(t_{ij})\,X_i(t_{ij}) + e_i(t_{ij}) \tag{1.1}$$
(Hastie and Tibshirani, 1993). Varying coefficient models are an extension of regression models in which the coefficients are allowed to vary as smooth functions of a covariate possibly different from the predictors. They have been especially popular in applications to longitudinal data, where the coefficient functions vary as functions of time. They reduce the modeling bias with their unique structure while also avoiding the curse of dimensionality. Wu and Yu (2002) give an overview of applications to longitudinal data, where the proposed estimation procedures include Hoover et al. (1998), Wu and Chiang (2000), Fan and Zhang (2000), and Wu et al. (2000) on local least squares, Hoover et al. (1998) and Chiang et al. (2001) on smoothing splines, and Huang et al. (2004) on basis approximations.
The central goal of this paper is the estimation of the smooth coefficient functions $\beta_0(\cdot)$ and $\beta_1(\cdot)$ in (1.1), based on observations of the covariate $U_i$ and the contaminated observations on the response and the predictor, $\tilde{Y}_i(t_{ij})$ and $\tilde{X}_i(t_{ij})$. A key observation to reach this goal is that regressing the observed response $\tilde{Y}$ on the observed predictor $\tilde{X}$ leads to another varying coefficient model, where the coefficient functions depend both on time and the covariate U. This is illustrated in Section 2, where more details on the covariate-adjusted varying coefficient model (CAVCM) with multiple predictors are provided. A two-step procedure is proposed for estimation in the CAVCM in Section 3, motivated by the two-step procedure of Fan and Zhang (2000) proposed for estimation in varying coefficient models. As also argued by Fan and Zhang, a common feature of many longitudinal studies is that the measurements are collected at the same time points for all subjects, with possibly missing values at a few time points for some subjects. Let $\{t_j, j = 1, \ldots, T\}$ be the distinct time points among all $t_{ij}$, i = 1, ..., n, j = 1, ..., $T_i$. In the first step, the proposed algorithm targets raw estimates of $\beta_0(t_j)$ and $\beta_1(t_j)$, j = 1, ..., T, by fitting CAR to the data collected at each distinct time point $t_j$. The CAR algorithm is appropriate for this step, since the data observed at each time point $t_j$ are independent, being collected from different subjects. The final estimates of the coefficient functions in (1.1) are obtained in the second step by smoothing the scatter plot of the raw estimates against the time points $t_j$, for each component r, r = 0, 1, separately. The second step consists of only one-dimensional smoothing procedures, and can be carried out with any smoothing technique. Therefore, the proposed estimation procedure is fast, intuitive, and easy to implement with any standard software containing least-squares procedures.
The proposed estimation procedure for CAVCM shares the main attraction of CAR in that it targets the coefficient functions regardless of the form of distortion, under the identifiability conditions for CAVCM discussed in Section 2. In other words, the proposed estimation procedure targets the coefficient functions not only for multiplicative distortion (i.e. $\tilde{Y}(t) = \psi(U)Y(t)$, $\tilde{X}(t) = \phi(U)X(t)$) but also for additive distortion (i.e. $\tilde{Y}(t) = Y(t) + \psi(U)$, $\tilde{X}(t) = X(t) + \phi(U)$) and no distortion (i.e. $\tilde{Y}(t) = Y(t)$, $\tilde{X}(t) = X(t)$), as demonstrated through simulation studies in Section 5. Another advantage of the proposed estimation procedure, shown in Section 5, is that it can handle missing values easily. If there are not enough subjects observed at a given time point $t_j$ to fit CAR, the missing raw estimate at $t_j$ is imputed through the smoothing in the second step. Application of the proposed method to the longitudinal calcium data can be found in Section 4.
| 2. COVARIATE-ADJUSTED VARYING COEFFICIENT MODELS |
|---|
Consider the general case of an underlying varying coefficient model with p predictors,

$$Y_i(t_j) = \beta_0(t_j) + \sum_{r=1}^{p} \beta_r(t_j)\,X_{ri}(t_j) + e_i(t_j), \tag{2.1}$$

evaluated at T distinct time points, $t_j$, j = 1, ..., T. Here $e_i(t_j)$ is a zero-mean stochastic process with covariance function $\mathrm{cov}\{e_i(t_j), e_i(t_{j'})\}$, and $\beta_0(\cdot), \beta_1(\cdot), \ldots, \beta_p(\cdot)$ are the unknown coefficient functions of interest. In the varying coefficient model (2.1), Y and $X_r$ are not observable. Instead, one observes distorted versions $\{\tilde{Y}_i(t_j), \tilde{X}_{ri}(t_j)\}$ along with a univariate covariate U, where

$$\tilde{Y}_i(t_j) = \psi(U_i)\,Y_i(t_j), \qquad \tilde{X}_{ri}(t_j) = \phi_r(U_i)\,X_{ri}(t_j), \tag{2.2}$$

for r = 1, ..., p, and $\phi_r(\cdot)$ and $\psi(\cdot)$ are unknown smooth functions of U. The identifiability conditions considered are an extension of the no-average-distortion condition used for CAR. They entail no average distortion at the distinct time points $t_j$, i.e. $E\{\tilde{Y}(t_j)\} = E\{Y(t_j)\}$ and $E\{\tilde{X}_r(t_j)\} = E\{X_r(t_j)\}$ for j = 1, ..., T. The identifiability conditions can equivalently be written as conditions on the unknown smooth distortion functions as

$$E\{\psi(U)\} = 1, \qquad E\{\phi_r(U)\} = 1, \quad r = 1, \ldots, p. \tag{2.3}$$

Model (2.1)–(2.3) will be referred to as the CAVCM.
A central goal is to obtain estimators of the coefficient functions in model (2.1), given the observations of the covariate U and the distorted observations $\{\tilde{Y}_i(t_j), \tilde{X}_{ri}(t_j)\}$ in (2.2). The key to the estimation of the targeted regression functions $\{\beta_r(\cdot)\}$ is to express the regression of the observed response $\tilde{Y}_i(t_j)$ on the observed predictors $\tilde{X}_{ri}(t_j)$ as another varying coefficient model. More precisely, under the assumption that $(e(\cdot), U, X_r(\cdot))$ (r = 1, ..., p) are mutually independent at each fixed time point, the regression of $\tilde{Y}_i(t_j)$ on $\tilde{X}_{ri}(t_j)$ can be expressed as
![]() |
where
![]() |
![]() | (2.4) |
with the error term given by $\psi(U_i)\,e_i(t_j)$, the product of the response distorting function in (2.2) and the underlying error. The assumption that the underlying predictors, $X_r(\cdot)$, and response, $Y(\cdot)$, are independent of the contaminating variable U defines the proposed contamination setting by defining these unobserved, underlying variables, and for that reason it cannot be checked in practice. Thus, the question to be answered in practice is whether these independence conditions help define interpretable latent variables of interest from their observable counterparts. In the calcium data analyzed, the latent variables are interpreted as BSA-adjusted calcium intake and absorption.
In the varying coefficient model given in (2.4), the observed variables vary according to two variables instead of one, the covariate U and time, resulting in two-dimensional coefficient functions. Since the observed response and predictors, the covariate $U_i$, and the time points are all observable, we first target these estimable two-dimensional coefficient functions through their one-dimensional projections at each time point $t_j$. The underlying one-dimensional coefficient functions of interest, $\beta_r(\cdot)$, are then targeted using estimates of the two-dimensional coefficient functions and the identifiability conditions given in (2.3).
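The step from the underlying model to a varying coefficient model in the observed variables can be worked out directly. The derivation below is a sketch under multiplicative distortion, writing $\psi$ and $\phi_r$ for the distorting functions of the response and of the rth predictor in (2.2) (symbol names assumed here for illustration) and using the independence of U from the underlying variables.

```latex
\begin{align*}
\tilde{Y}_i(t_j) &= \psi(U_i)\,Y_i(t_j)
   = \psi(U_i)\Big\{\beta_0(t_j) + \sum_{r=1}^{p}\beta_r(t_j)\,X_{ri}(t_j) + e_i(t_j)\Big\}\\
 &= \underbrace{\beta_0(t_j)\,\psi(U_i)}_{\text{intercept function of }(U_i,\,t_j)}
   + \sum_{r=1}^{p}\underbrace{\beta_r(t_j)\,\frac{\psi(U_i)}{\phi_r(U_i)}}_{\text{slope function of }(U_i,\,t_j)}\,
     \tilde{X}_{ri}(t_j)
   + \psi(U_i)\,e_i(t_j),\\[4pt]
E\{\beta_0(t_j)\,\psi(U)\} &= \beta_0(t_j),\\
E\Big\{\beta_r(t_j)\,\frac{\psi(U)}{\phi_r(U)}\,\tilde{X}_r(t_j)\Big\}
 &= \beta_r(t_j)\,E\{\psi(U)\}\,E\{X_r(t_j)\}
  = \beta_r(t_j)\,E\{\tilde{X}_r(t_j)\}.
\end{align*}
```

The last two lines use $E\{\psi(U)\} = E\{\phi_r(U)\} = 1$ and show why averaging the fitted coefficient functions over the observed covariate values can recover $\beta_0(t_j)$ and $\beta_r(t_j)$.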
| 3. TWO-STEP ESTIMATION PROCEDURE |
|---|
The proposed estimation algorithm is based on an idea similar to the two-step procedure proposed by Fan and Zhang (2000). Even though this two-step estimation procedure is easy to implement, involving only linear regression fits and one-dimensional smoothing procedures, it is not applicable when the longitudinal response Y and predictors $X_r$ are not observed directly. In addition, regressing the observed distorted response $\tilde{Y}(t_j)$ on the observed predictors $\tilde{X}_r(t_j)$ in the first step of the algorithm will not target $\beta_r(t_j)$ of the underlying varying coefficient model under the CAVCM. More specifically, it follows from equation (2.4) of Şentürk and Müller (2005a) that under the multiplicative distorting effects given in (2.2), the raw estimates of Fan and Zhang for $\beta_1(t_j)$, evaluated at each time point $t_j$ in the simple case of one predictor, target
![]() | (3.1) |
instead of ß1(tj). It is also shown that
can assume any real value, resulting possibly in arbitrarily large biases for the raw estimates of Fan and Zhang if the distortion covariate is ignored within the CAVCM. The bias created by the final smooth estimates of Fan and Zhang will be investigated in Section 5, through simulation studies.
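The kind of bias that arises when the distorting covariate is ignored can be illustrated with a small simulation at a single time point. This is only a sketch: the distorting function $\psi(U) = \phi(U) = U + 0.5$, the parameter values, and the function names are hypothetical choices made for illustration, not taken from the paper.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_naive_slope(n=20000, beta0=5.0, beta1=2.0, seed=1):
    """Regress the distorted response on the distorted predictor, ignoring U."""
    rng = random.Random(seed)
    x_obs, y_obs = [], []
    for _ in range(n):
        u = rng.random()                 # covariate U ~ Uniform[0, 1]
        x = 1.0 + rng.random()           # latent predictor X ~ Uniform[1, 2]
        y = beta0 + beta1 * x + rng.gauss(0.0, 0.1)
        psi = phi = u + 0.5              # hypothetical distortion with mean 1
        x_obs.append(phi * x)            # observed (distorted) predictor
        y_obs.append(psi * y)            # observed (distorted) response
    return ols_slope(x_obs, y_obs)


naive = simulate_naive_slope()
print("true slope: 2.0, naive slope on distorted data:", round(naive, 2))
```

Under these assumed settings the population value of the naive slope is $\beta_1 + \beta_0 E(X)\,\mathrm{var}\{\psi(U)\}/\mathrm{var}\{\psi(U)X\} = 2 + 2.25$, so the unadjusted fit overstates the slope by more than a factor of two even though the distortion has mean one.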
Our proposal is geared towards handling distortions on the response and predictors, as commonly encountered in medical data. Rather than fitting a linear regression between the observed response $\tilde{Y}(t_j)$ and the observed predictors $\tilde{X}_r(t_j)$ at each time point $t_j$, we fit CAR between $\tilde{Y}(t_j)$ and $\tilde{X}_r(t_j)$, adjusting for U, to obtain the raw estimates of $\beta_0(t_j), \ldots, \beta_p(t_j)$ in the first step. This is motivated by the fact that a different one-dimensional varying coefficient model holds at each time point $t_j$ in the two-dimensional varying coefficient model given in (2.4). This one-dimensional varying coefficient model can be expressed as
![]() | (3.2) |
where only data observed at a fixed time point tj is used, and the coefficient functions vary depending on U only. Here the one-dimensional coefficient functions will be related to the constants ß0(tj), ß1(tj), ..., ßp(tj) through the equations
![]() | (3.3) |
where not necessarily all n subjects but, say, $n_j$ subjects may be observed at time $t_j$, i = 1, ..., $n_j$. We first target the coefficient functions in (3.2) through local linear fits, and then arrive at the raw estimate of $\beta_r(t_j)$ through a weighted average of the estimated coefficient functions, making use of the relations in (3.3) and the identifiability conditions. We use local polynomial regressions to fit CAR at each time point in the first step of the algorithm, as they have been shown through simulation studies to have the best small-sample performance, yielding the smallest mean squared error compared to binning algorithms (Şentürk and Nguyen, 2005). The second step similarly consists of smoothing the scatter plot of the raw estimates of each coefficient component against the time points $t_j$ to obtain the final estimates.
Denote the available (observed) data at time tj by
i = 1, ..., nj, for a sample of size nj, where
are the p-dimensional predictors. The function
rj(U) in (3.2) can be approximated based on local polynomial modeling as
![]() |
for U in a neighborhood of u. Here,
denotes the kth derivative of
rj(·). Consider the local linear least-squares estimator of
rj through the minimization of
![]() | (3.4) |
with respect to
and
for a specified kernel function K with bandwidth h where Kh(·) = K(·/h)/h. We choose to consider local linear fits for computational simplicity, as they are comparable to local cubic fits for all practical purposes in implementation. Note that
corresponding to the intercept function
0j(U). Minimization of criterion (3.4) is a weighted least-squares problem. Assuming that
is nonsingular, the solution is
![]() |
where
is the following nj x 2(p + 1) matrix
|
|
![]() |
The local least-squares estimator of
rj(u) is given by
![]() |
where e2r+1, 2(p+1) is a unit vector of length 2(p + 1) with 1 in position 2r + 1.
The estimators of the targeted regression parameters,
are then obtained by averaging over the raw estimates,
rj(Ui), this time evaluated at the original observations of the covariate
More precisely,
![]() |
where
and Wi are
and W with u = Ui. This leads to the following raw estimates:
|
| (3.5) |
where
The estimates are motivated by two relations that follow directly from (2.3) and (3.3): the expectation over U of the intercept function in (3.2) equals $\beta_0(t_j)$, and the expectation of the rth slope function in (3.2) multiplied by $\tilde{X}_r(t_j)$ equals $\beta_r(t_j)\,E\{\tilde{X}_r(t_j)\}$. An important assumption here is that $E\{\tilde{X}_r(t_j)\}$, or equivalently $E\{X_r(t_j)\}$, is not equal to 0, since it is targeted by the denominator of the raw estimate of $\beta_r(t_j)$ in (3.5). The consistency of the raw estimates for $\beta_r(t_j)$ has been shown in Şentürk and Nguyen (2005). As outlined by Şentürk and Nguyen (2005), we utilize the generalized cross-validation (GCV) criterion proposed by Wahba (1977) and Craven and Wahba (1979) for the selection of the bandwidth h in the first step of the proposed algorithm with local polynomial modeling. Note that other criteria can be used for bandwidth selection at this step. The literature includes studies of Zhang et al. (1998) and Zhang (2004) that utilized the double-penalized quasi-likelihood approach to estimate the smoothing parameters and the nonparametric functions simultaneously in a mixed model framework.
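The first step at a single time point can be sketched as follows for one predictor. This is a minimal illustration, not the author's implementation: the function names, the Epanechnikov kernel, the fixed bandwidth (no GCV search), the simulated distortions, and the column order of the local design matrix (stacked here rather than interleaved) are all assumptions.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    m = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, m))
        x[r] = (aug[r][m] - s) / aug[r][r]
    return x


def car_raw_estimates(y_obs, x_obs, u, h):
    """Raw estimates of (beta0, beta1) at one time point, single predictor."""
    n = len(u)
    intercepts, slopes = [], []
    for i in range(n):                       # local linear fit centred at U_i
        xtwx = [[0.0] * 4 for _ in range(4)]
        xtwy = [0.0] * 4
        for k in range(n):
            d = u[k] - u[i]
            z = d / h
            if abs(z) >= 1.0:
                continue
            w = 0.75 * (1.0 - z * z) / h     # Epanechnikov kernel K_h
            row = (1.0, x_obs[k], d, x_obs[k] * d)
            for a in range(4):
                xtwy[a] += w * row[a] * y_obs[k]
                for b in range(4):
                    xtwx[a][b] += w * row[a] * row[b]
        coef = solve(xtwx, xtwy)
        intercepts.append(coef[0])           # intercept function at U_i
        slopes.append(coef[1])               # slope function at U_i
    x_bar = sum(x_obs) / n
    beta0 = sum(intercepts) / n
    beta1 = sum(s * x for s, x in zip(slopes, x_obs)) / n / x_bar
    return beta0, beta1


def simulate(n=300, beta0=2.0, beta1=3.0, seed=0):
    """Multiplicatively distorted data with hypothetical distorting functions."""
    rng = random.Random(seed)
    y_obs, x_obs, u = [], [], []
    for _ in range(n):
        ui = rng.random()
        x = 1.0 + rng.random()
        y = beta0 + beta1 * x + rng.gauss(0.0, 0.2)
        psi = ui + 0.5                       # E{psi(U)} = 1
        phi = 0.75 * (1.0 + ui * ui)         # E{phi(U)} = 1
        u.append(ui)
        x_obs.append(phi * x)
        y_obs.append(psi * y)
    return y_obs, x_obs, u


y_obs, x_obs, u = simulate()
b0_hat, b1_hat = car_raw_estimates(y_obs, x_obs, u, h=0.3)
print("raw estimates:", round(b0_hat, 2), round(b1_hat, 2))
```

The slope estimate is a weighted average of the fitted slope function, with weights proportional to the observed predictor values, which is why a sample mean of the observed predictor close to zero would make the estimate unstable.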
The final smooth estimates of ßr(t) are computed in the second step of the algorithm for each component r, r = 0, 1, ..., p, separately as
![]() |
The weights $w_r(t_j, t)$ in the above expression can be obtained from any linear smoothing technique, such as local polynomial smoothing used by Fan and Zhang (2000) or spline smoothing used by Wu et al. (2000). This additional smoothing step is needed to bring in information from neighboring time points, improving the efficiency of the estimates. It can be easily carried out using any convenient software, as it only involves one-dimensional smoothing. Another benefit of this one-dimensional smoothing procedure is that it is easier to choose a suitable bandwidth for each component separately through visualization of the data. This second step is also crucial in providing flexibility in dealing with missing values. If there are not enough patients observed at a particular time point to fit CAR (a minimum of roughly 20–30 observations is needed to fit CAR with one predictor, as determined through simulation studies), raw estimates at that time point will be considered missing. These missing values can be imputed in the second smoothing step.
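The second step, including the imputation of a missing raw estimate, can be sketched with a local linear smoother. The function name, the Epanechnikov kernel, the bandwidth, and the linear test curve are illustrative assumptions; any linear smoother could be substituted.

```python
def local_linear_smooth(t_grid, raw, t_eval, h):
    """Local linear smoother of raw estimates; None entries are treated as missing."""
    pts = [(t, b) for t, b in zip(t_grid, raw) if b is not None]
    out = []
    for t0 in t_eval:
        s0 = s1 = s2 = r0 = r1 = 0.0
        for t, b in pts:
            d = t - t0
            z = d / h
            if abs(z) >= 1.0:
                continue
            w = 0.75 * (1.0 - z * z)         # Epanechnikov kernel weight
            s0 += w
            s1 += w * d
            s2 += w * d * d
            r0 += w * b
            r1 += w * d * b
        # Intercept of the weighted least-squares line fitted around t0.
        out.append((s2 * r0 - s1 * r1) / (s0 * s2 - s1 * s1))
    return out


# Twenty equidistant time points on [0, 1] and a hypothetical linear curve.
t_grid = [j / 19 for j in range(20)]
raw = [1.0 + 11.2 * t for t in t_grid]
raw[7] = None                                # no raw estimate at this time point
smooth = local_linear_smooth(t_grid, raw, t_grid, h=0.2)
```

Because a local linear fit reproduces straight lines exactly, the smoothed value at the time point with the missing raw estimate coincides with the underlying line in this noise-free example, which shows how the second step fills in time points where CAR could not be fitted.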
| 4. APPLICATION TO LONGITUDINAL DATA: CALCIUM ABSORPTION |
|---|
The relationship between calcium absorption and calcium intake is of interest in addressing the problem of calcium deficiency. Heaney et al. (1989)
The coefficient functions from the underlying varying coefficient model
![]() | (4.1) |
have been estimated adjusted for BSA through fitting a CAVCM to the longitudinal measurements of calcium absorption and intake, as proposed in Section 3. Three subjects have been removed before analysis, as their BSA values were outliers. A total of 20 age points, 36, 39, 41, 42, ..., 55, 56, 58, 61, have been considered to fit the CAVCM, where the data observed at ages (35, 36, 37), (38, 39, 40), (57, 58, 59), and (60, 61, 62) have been collapsed to groups concentrated at 36, 39, 58, and 61, respectively. This grouping was carried out in order to have enough subjects observed at each age point to fit CAR. The number of subjects per age point ranged from 20 to 39, where all observations came from different subjects even after the collapsing of age points, since longitudinal measurements per subject were taken at 5-year intervals. The raw estimates of $\beta_0(\cdot)$ and $\beta_1(\cdot)$ given in Figure 1 (top panels, dots) have been obtained using the weighted averaging described in Section 3, using a GCV bandwidth choice of 0.2. Overlaying the raw estimates are the proposed smooth estimates (solid) and Fan and Zhang's smooth estimates (dashed) of the two coefficient functions, both obtained through local polynomial smoothing with a bandwidth choice of 7. The 90% pointwise bootstrap confidence intervals (dotted) are also displayed in Figure 1.
The bootstrap confidence intervals in Figure 1 are based on the $(\alpha/2)B$th and $(1 - \alpha/2)B$th percentiles of the bootstrap estimates of $\beta_0(\cdot)$ and $\beta_1(\cdot)$, obtained from B = 1000 bootstrap samples generated from the original data. The bootstrap estimates are obtained through the same two-step procedure used for CAVCM estimates. In the first step, the raw bootstrap estimates for each time point are computed through CAR based on the bootstrap sample. In the second step, smooth bootstrap estimates are obtained as linear combinations of the raw bootstrap estimates
![]() |
where the weights $w_r(t_j, t_{j'})$ are the ones used in obtaining the CAVCM smooth estimates.
The estimated nonparametric densities of the standardized 1000 bootstrap estimates of $\beta_0(t_j)$ (top panel) and $\beta_1(t_j)$ (bottom panel), evaluated at the 20 time points $t_j$, j = 1, ..., 20, are given in Figure 2, overlaying the standard normal density. The estimates at all the time points seem to be reasonably close to normal for both functions, enabling the use of the percentile bootstrap method. We also examine the estimated coverage levels of the proposed bootstrap confidence intervals in the simulation setting described in Section 5, Model I. One thousand data sets have been generated under the varying coefficient model given in (5.1) below, where 1000 bootstrap samples have been generated from each data set. Figure 3 gives the estimated coverage values of the proposed confidence intervals for the three coefficient functions in (5.1) at each time point, corresponding to nominal confidence levels of 0.80 and 0.90. The estimated coverage values are roughly on target. The cross-sectional mean estimates and mean confidence intervals for the three coefficient functions, averaged over the 1000 Monte Carlo runs, are also shown in Figure 3, overlaying the true coefficient functions.
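The percentile step can be sketched as below for the bootstrap estimates at one time point. The function name and the rounding of the ranks are assumptions made for illustration; the input would be the B smooth bootstrap estimates of one coefficient function at a fixed $t_j$.

```python
import random


def percentile_interval(boot_estimates, alpha):
    """Interval from the (alpha/2)B-th and (1 - alpha/2)B-th ordered bootstrap values."""
    ordered = sorted(boot_estimates)
    B = len(ordered)
    lo_rank = max(1, round(alpha / 2 * B))
    hi_rank = min(B, round((1 - alpha / 2) * B))
    return ordered[lo_rank - 1], ordered[hi_rank - 1]


# Toy input: the integers 1..1000 in random order stand in for B = 1000 estimates.
toy = list(range(1, 1001))
random.Random(0).shuffle(toy)
lower, upper = percentile_interval(toy, alpha=0.10)
```

With B = 1000 and a 90% level, the interval endpoints are the 50th and 950th ordered bootstrap values.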
As is seen from Fan and Zhang's estimates, when unadjusted for BSA, the inverse effect of calcium intake on absorption seems to be declining with age. However, when adjusted for BSA with CAVCM, the inverse effect of calcium intake seems to stay at about the same level as age increases. Even though the bootstrap confidence intervals from the CAVCM include the unadjusted estimates as well, the smooth estimates for BSA-adjusted and unadjusted models seem to differ, especially after the age of 45. In order to discover the precise nature of the effect of BSA on the relationship between calcium intake and absorption, we fitted varying coefficient models to the two groups of data observed at ages before and after 45. The two models fitted have the following form
|
|
where the longitudinal measurements of intake and absorption in the two age groups are collapsed together and treated as independent for the sake of the argument.
For both age groups, the inverse effect of intake on absorption declines in general as BSA increases, as seen in Figure 1 (bottom panels). Nevertheless, one difference between the two age groups is that for those subjects over 45 with BSA greater than 1.6, the general inverse effect stays constant, whereas for those under 45, the effect keeps declining after BSA of 1.8. Also given in Figure 1 are the slope estimates from the linear regressions of observed absorption on observed intake, unadjusted for BSA. Notice that a general CAR model targeting the underlying coefficients in the regression model
![]() |
adjusted for BSA would get its estimates by averaging the smooth estimates of the coefficients in model (4.2). An important observation is that this average, and therefore the estimates of such a CAR model, would have lower values than their unadjusted linear regression counterparts for subjects over 45, but would be roughly at the same level for those under 45. This also explains the fact that the BSA-adjusted varying slope estimates from model (4.1) are lower than the unadjusted estimates for ages over 45. Hence, adjusting for or stratifying by BSA, the stronger inverse effect of intake on absorption for subjects with lower BSA values becomes more influential in forming the BSA-adjusted regression coefficients. Thus, adjusted for BSA, this inverse effect does not decline, but stays at about the same level as the age of the subject considered increases.
| 5. SIMULATION STUDY |
|---|
In this section we compare the performance of CAVCM and Fan and Zhang's estimation procedure under three distortion models: multiplicative distortion, additive distortion, and no distortion.
We consider a simulation setup that mimics the calcium absorption data, with 185 subjects and up to four repeated measurements. We consider the following underlying varying coefficient model

$$Y_i(t_j) = \beta_0(t_j) + \beta_1(t_j)\,X_{1i}(t_j) + \beta_2(t_j)\,X_{2i}(t_j) + e_i(t_j), \tag{5.1}$$

where i = 1, ..., 185 and the time points $t_j$, j = 1, ..., 20, are chosen to be equidistant between 0 and 1. The number of repeated measurements for each subject is chosen randomly to be 1, 2, 3, or 4 with probabilities 0.025, 0.025, 0.05, and 0.90, respectively. Thus, an unequal number of observations is taken on each subject, and 80% or more of the data are missing. This yields 20–45 subjects observed at each time point, where the expected number of observations per time point is around 35. In (5.1), the three targeted coefficient functions are chosen to represent three different types of curves: $\beta_0(t) = 15 + 8.7\sin(2\pi t)$, $\beta_1(t) = 1 + 11.2t$, and $\beta_2(t) = 1 + 2t^2 + 11.3(1 - t)^3$. The predictor $X_1(t)$ is a uniform random variable over the time-dependent interval [t/4, 0.6 + t/4], and $X_2(t)$, when conditioning on $X_1(t)$, is a normal random variable with mean 1.5 and conditional variance $\mathrm{var}\{X_2(t) \mid X_1(t)\} = \{1 + X_1(t)\}/\{8 + X_1(t)\}$. The error process e(t) is sampled independently from the predictors from a stationary Gaussian process with mean zero and a decaying exponential covariance function $\mathrm{cov}\{e(t_j), e(t_{j'})\} = 5.27\exp(-0.5\,|t_j - t_{j'}|)$. The covariate U is generated from a uniform [0, 1] distribution.
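The stated figures for the amount of data per time point can be checked with a few lines of arithmetic. The sketch assumes that each subject's measurement times are spread evenly over the 20 time points, which is not spelled out in the text; the variable names are illustrative.

```python
# Probabilities of having 1, 2, 3, or 4 repeated measurements per subject.
probs = {1: 0.025, 2: 0.025, 3: 0.05, 4: 0.90}
n_subjects, n_times = 185, 20

# Expected number of measurements per subject: 3.825.
mean_per_subject = sum(k * p for k, p in probs.items())

# Expected observations per time point, assuming times are spread evenly.
expected_per_time = n_subjects * mean_per_subject / n_times

# A subject with the maximum of 4 measurements still misses 16 of 20 time points.
min_missing_fraction = 1 - max(probs) / n_times

print(round(expected_per_time, 2), min_missing_fraction)
```

The expected count per time point comes out at about 35.4, in line with the value of around 35 quoted above, and the missing fraction is at least 0.8 for every subject.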
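For the first model, with multiplicative distortion, the observed response and predictors are modeled as

$$\tilde{Y}_i(t_j) = \psi(U_i)\,Y_i(t_j), \qquad \tilde{X}_{ri}(t_j) = \phi_r(U_i)\,X_{ri}(t_j), \quad r = 1, 2, \tag{Model I}$$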
where the distorting functions considered are
![]() |
The constants a = 12.33, b = 1.71, and c = 4.08 are chosen such that the distorting functions satisfy the identifiability constraint of no average distortion in (2.3), namely $E\{\psi(U_i)\} = 1$ and $E\{\phi_r(U_i)\} = 1$.
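Normalizing constants of this kind can be obtained by numerical integration over the distribution of U. The sketch below uses a hypothetical distorting shape $(u + 3)^2$ and a midpoint rule; neither the shape nor the function name is taken from the paper.

```python
def mean_over_uniform(g, m=100000):
    """Midpoint-rule approximation of E{g(U)} for U ~ Uniform[0, 1]."""
    return sum(g((k + 0.5) / m) for k in range(m)) / m


def shape(u):
    """Hypothetical unnormalized distorting shape."""
    return (u + 3.0) ** 2


a = mean_over_uniform(shape)          # normalizing constant, equal to 37/3 here


def psi(u):
    """Distorting function rescaled so that E{psi(U)} = 1."""
    return shape(u) / a


check = mean_over_uniform(psi)        # should be 1 up to numerical error
print(round(a, 2), round(check, 6))
```

Dividing by the constant gives a multiplicative distorting function with mean one; subtracting the same constant instead gives an additive distorting function with mean zero.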
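For the second model, with additive distortion, the observed response and predictors are modeled as

$$\tilde{Y}_i(t_j) = Y_i(t_j) + \psi(U_i), \qquad \tilde{X}_{ri}(t_j) = X_{ri}(t_j) + \phi_r(U_i), \quad r = 1, 2, \tag{Model II}$$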
where the distorting functions are
![]() |
The constants a, b, and c have the same values as in Model I, but this time are subtracted from the specified functions of U, so that the distorting functions satisfy the identifiability constraint of no average distortion, $E\{\tilde{Y}(t_j)\} = E\{Y(t_j)\}$ and $E\{\tilde{X}_r(t_j)\} = E\{X_r(t_j)\}$. The identifiability condition entails $E\{\psi(U_i)\} = 0$ and $E\{\phi_r(U_i)\} = 0$ in the additive distortion model.
As the last model, we consider no distortion, in which case the observed and underlying response and predictors are the same:

$$\tilde{Y}_i(t_j) = Y_i(t_j), \qquad \tilde{X}_{ri}(t_j) = X_{ri}(t_j), \quad r = 1, 2. \tag{Model III}$$
For all the above models, the proposed CAVCM smooth estimates of $\beta_0(\cdot)$, $\beta_1(\cdot)$, and $\beta_2(\cdot)$ have been obtained through local polynomial smoothing with cross-validation bandwidth choices of 0.12, 0.20, and 0.14, respectively. The GCV bandwidth choice was 0.5 in obtaining the CAVCM raw estimates in the first step. Fan and Zhang's smooth estimates have also been obtained for the three coefficient functions in the above three models, through local polynomial smoothing with cross-validation bandwidth choices of 0.14, 0.20, and 0.14, respectively. Smooth estimates of both methods from a single Monte Carlo run are displayed overlaying the true coefficient functions in Figures 4, 5, and 6 for Models I, II, and III, respectively. The bias of Fan and Zhang's raw estimates under the multiplicative distortion model has been given in (3.1). Thus, even though the smooth estimates of Fan and Zhang are on target for Model III of no distortion, they have considerable bias, as seen in Figures 4 and 5 for the distortion Models I and II, as expected. The CAVCM smooth estimates are on target for all three models. This shows that the CAVCM method is a very flexible adjustment method, where the form or even the existence of the distortion need not be known.
Another comparative measure of the performance of the fits obtained by the two methods is mean absolute deviation error (MADE), or weighted average-squared error (WASE), defined as
![]() |
where
are the smooth estimates for both methods and range(ßr) is the range of the function ßr(t). We also consider unweighted average-squared error (UASE) which is defined in the same way as WASE, but without any weights in the denominator. The box plots of the MADE, WASE, and UASE ratios of the proposed CAVCM method over Fan and Zhang's estimates from 1000 Monte Carlo runs are given in Figure 7, upper panel for Model I, middle panel for Model II, and lower panel for Model III. These plots indicate that the proposed estimates indeed handle the multiplicative and additive distortion models much better than Fan and Zhang's estimates. Even though CAVCM estimates target the true coefficient functions also under the case of no distortion, they are outperformed by Fan and Zhang's estimates in case of Model III. This result is not surprising since the simple linear regression fits utilized in the first step of Fan and Zhang's algorithm are more efficient than CAR estimates in obtaining the raw estimates at each time point, for the specific case of no distortion.
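The three error measures can be computed as sketched below. The exact formulas are an assumption here: they follow the usual form of these measures in the varying coefficient literature, with absolute or squared errors averaged over all coefficient functions and time points and scaled by the range of each true function (or its square), and with UASE dropping the range weights. The function name and the toy inputs are illustrative.

```python
def error_measures(truth, estimate):
    """Return (MADE, WASE, UASE) for lists of curves evaluated at t_1, ..., t_T.

    Assumed definitions: errors are averaged over all curves and time points;
    MADE divides absolute errors by range(beta_r), WASE divides squared errors
    by range(beta_r) squared, and UASE uses unweighted squared errors.
    """
    n_curves, T = len(truth), len(truth[0])
    made = wase = uase = 0.0
    for beta, beta_hat in zip(truth, estimate):
        rng = max(beta) - min(beta)          # range of the true function
        for b, bh in zip(beta, beta_hat):
            made += abs(b - bh) / rng
            wase += (b - bh) ** 2 / rng ** 2
            uase += (b - bh) ** 2
    scale = n_curves * T
    return made / scale, wase / scale, uase / scale


# Toy example with two curves on three time points.
truth = [[0.0, 1.0, 2.0], [0.0, 2.0, 4.0]]
estimate = [[0.0, 1.0, 3.0], [0.0, 2.0, 2.0]]
made, wase, uase = error_measures(truth, estimate)
```

Ratios of these measures for two competing methods, computed run by run, are what box plots such as those in Figure 7 summarize; a ratio below one favors the method in the numerator.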
| 6. REMARKS |
|---|
The proposed method of CAVCM provides a covariate-adjusted analysis for the regression relation between longitudinal variables. The two-step procedure is flexible in two different ways. First, it is flexible in handling different forms of distortion, as illustrated through simulation studies. Neither the form nor even the existence of the distortion needs to be known. This feature of the algorithm is particularly appealing in the case of a multiple varying coefficient model, where different predictors may be believed to have different relations with the covariate. Note, however, that there are some restrictions if the distortions on the response and the predictors are not of the same form. More specifically, CAR yields consistent estimates under a model having an additive distortion in the response with predictors having either multiplicative or additive distortion. On the other hand, it will not give consistent estimates under a model with multiplicative distortion on the response and additively distorted predictors.
The second flexibility of the proposed method is its applicability to most longitudinal data structures. Assuming that the data are collected on the same set of time points for different subjects, the proposed methodology can handle a large percentage of missing values, including cases with only one measurement for some subjects. The only limitation comes from the use of the CAR algorithm in the first step, which requires more than 20 subjects to be observed at most of the time points considered. However, the subjects observed at different time points do not need to be the same or of the same number. The values of the estimated coefficient functions at those time points where there are not enough subjects to fit CAR are imputed through the smoothing procedure in the second step.
The CAVCM algorithm can be applied to cases where longitudinal measurements are collected on the covariate as well. The methodology has been described for the case of cross-sectional covariate so far for simplicity of notation. For the case of a cross-sectional covariate, a fixed subject will have one reading on the covariate variable across time, whereas for the case of a longitudinal covariate, the readings will vary across time even for the same subject. However, in both cases, the observed measurements at a fixed time point come from different subjects which is a key observation enabling the application of CAR in the first step of the proposed estimation procedure. Thus, there is no change in the algorithm in case of a longitudinal covariate. The only difference between the two cases is one in model assumptions that the identifiability conditions on the distorting functions given in (2.3) need to hold at each time point for the case of the longitudinal covariate.
Fan and Zhang (2000) provide expressions for the asymptotic bias and variance of their smooth estimates obtained in the second step, conditional on the predictor processes and the observed time points. These results give some insight into the optimal choice of bandwidth for the second smoothing step. The asymptotic bias and variance of the smooth CAVCM estimates can similarly be obtained once the bias and variance expressions for the CAR estimates in the first step of the estimation procedure are worked out. Another idea for future research would be to look for ways of incorporating the correlation structure of the longitudinal data into the proposed covariate-adjusted varying coefficient estimator to improve its efficiency.
| ACKNOWLEDGMENTS |
|---|
We are extremely grateful to an anonymous referee, the associate editor, and the editors for many helpful remarks that improved the exposition of the paper. We also want to acknowledge Danh Nguyen for his valuable feedback during the preparation of the manuscript.
| REFERENCES |
|---|
CHIANG, C., RICE, J. A. AND WU, C. O. (2001). Smoothing spline estimation for varying coefficient models with repeatedly measured dependent variables. Journal of the American Statistical Association 96, 605–617.
CRAVEN, P. AND WAHBA, G. (1979). Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Numerische Mathematik 31, 377–403.
DAVIS, C. S. (2002). Statistical Methods for the Analysis of Repeated Measurements. New York: Springer, p. 336.
FAN, J. AND ZHANG, J. (2000). Two-step estimation of functional linear models with applications to longitudinal data. Journal of the Royal Statistical Society, Series B 62, 303–322.
HASTIE, T. AND TIBSHIRANI, R. (1993). Varying coefficient models. Journal of the Royal Statistical Society, Series B 55, 757–796.
HEANEY, R. P. (2003). Normalizing calcium intake: projected population effects for body weight. Journal of Nutrition 133, 268S–270S.
HEANEY, R. P., RECKER, R. R., STEGMAN, M. R. AND MOY, A. J. (1989). Calcium absorption in women: relationships to calcium intake, estrogen status, age. Journal of Bone and Mineral Research 4, 469–475.
HOOVER, D. R., RICE, J. A., WU, C. O. AND YANG, L.-P. (1998). Nonparametric smoothing estimates of time-varying coefficient models with longitudinal data. Biometrika 85, 809–822.
HUANG, J. Z., WU, C. O. AND ZHOU, L. (2004). Polynomial spline estimation and inference for varying coefficient models with longitudinal data. Statistica Sinica 14, 763–788.
KAYSEN, G. A., DUBIN, J. A., MÜLLER, H. G., MITCH, W. E., ROSALES, L. M., LEVIN, N. W. AND THE HEMO STUDY GROUP (2003). Relationship among inflammation nutrition and physiologic mechanisms establishing albumin levels in hemodialysis patients. Kidney International 61, 2240–2249.
ŞENTÜRK, D. AND MÜLLER, H. G. (2005a). Covariate adjusted regression. Biometrika 92, 59–74.
ŞENTÜRK, D. AND MÜLLER, H. G. (2005b). Inference for covariate adjusted regression via varying coefficient models. Annals of Statistics (in press).
ŞENTÜRK, D. AND NGUYEN, D. V. (2005). Estimation in covariate adjusted regression. Computational Statistics and Data Analysis (in press).
WAHBA, G. (1977). A survey of some smoothing problems and the method of generalized cross-validation for solving them. In Krishnaiah, P. R. (ed.), Applications of Statistics. North Holland: Amsterdam, pp. 507–523.
WU, C. O. AND CHIANG, C. T. (2000). Kernel smoothing on varying coefficient models with longitudinal dependent variable. Statistica Sinica 10, 433–456.
WU, C. O. AND YU, K. F. (2002). Nonparametric varying coefficient models for the analysis of longitudinal data. International Statistical Review 70, 373–393.
WU, C. O., YU, K. F. AND CHIANG, C. T. (2000). A two-step smoothing method for varying-coefficient models with repeated measurements. Annals of the Institute of Statistical Mathematics 25, 519–543.
ZHANG, D. (2004). Generalized linear mixed models with varying coefficients for longitudinal data. Biometrics 60, 8–15.
ZHANG, D., LIN, X., RAZ, J. AND SOWERS, M. F. (1998). Semiparametric stochastic mixed models for longitudinal data. Journal of the American Statistical Association 93, 710–719.
Received April 26, 2005; revised September 2, 2005; revised October 5, 2005; accepted for publication October 21, 2005.
Fig. 2. Estimated densities of the standardized bootstrap estimates of $\beta_0(t_j)$ (dotted, top panel) and $\beta_1(t_j)$ (dotted, lower panel) used in forming 90% pointwise confidence intervals of the varying coefficient functions in the analysis of calcium absorption data. For both coefficients, 20 densities are presented, corresponding to the 20 time points $t_j$ at which the bootstrap estimates are evaluated. The standard normal density (solid) is also given in both panels. A fine binning procedure is followed by local least-squares fits with bandwidth choices of 0.5 to obtain the nonparametric densities.