Biostatistics Advance Access originally published online on October 5, 2005
Biostatistics 2006 7(2):225-234; doi:10.1093/biostatistics/kxj003
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Functional mixed-effects model for periodic data
Statistical Center for HIV/AIDS Research and Prevention, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA lqin{at}scharp.org
Department of Biostatistics and Epidemiology University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA
* To whom correspondence should be addressed.
| SUMMARY |
|---|
|
|
|---|
Periodic data are frequently collected in biomedical experiments. We consider the underlying periodic curves giving rise to these data, and account for the periodicity in their functional model to improve estimation and inference. We propose to incorporate the periodic constraint in the functional mixed-effects model setting. Both the fixed functional effects and random functional effects are modeled in the same periodic functional space, hence the population-average estimates and subject-specific predictions are all periodic. An efficient algorithm is given to estimate the proposed model by an O(N) modified Kalman filtering and smoothing algorithm. The proposed method is evaluated in different scenarios through simulations. Treatments to none-full period data and missing observations along the period are also given. Analysis of a cortisol data set obtained from a study on fibromyalgia is conducted as illustration.
Keywords: Functional data analysis; Kalman filter; Periodic constraint; Periodic spline; Smoothing spline; State space model
| 1. INTRODUCTION |
|---|
|
|
|---|
Data in many biomedical experiments arise as curves, so it is natural to consider a curve as the basic observational unit in the data analysis, which is termed functional data analysis (Ramsay and Silverman, 1997
An important periodic pattern of experimental data is the circadian rhythm, which is the periodic behavior of the data as a function of 24 h. It is well known that many hormones including cortisol have circadian rhythms. Figure 1 shows the 24 h cortisol profiles of 11 fibromyalgia (FM) patients and 11 healthy subjects, collected in a study conducted at the University of Michigan Medical Center (Guo, 2002
). FM is a stress-related syndrome that may relate to cortisol through a complex feedback system. The objective of the study is to investigate the relationship between the circadian patterns of cortisol, in normal subjects and patients with FM. As an initial attempt, we are interested in estimating the cortisol profiles of FM patients and normal individuals, and identifying the time course when the two groups are different in terms of their cortisol concentration. This may help unveil the mystery behind the unknown mechanism of FM. A key step of the analysis is to incorporate the circadian patterns in the model. We propose to incorporate it by modeling both functional fixed effects and functional random effects as periodic splines; consequently, both the group-average profiles and subject-specific predictions are of 24-h cycle. More details about this example will be given in Section 5.
|
Substantial developments have been accomplished in functional models. Reviews of the early work can be found in Ramsay and Silverman (1997)
Models for periodic curves can be viewed as a special case of constrained functional models, which can be estimated by different smoothing techniques. Mammen et al. (2001)
introduced a general framework of constrained smoothing via projection. A periodic curve can be modeled through projecting the data onto the smoothed periodic space. Readers are referred to Mammen et al. (2001)
and the references therein for other types of constrained smoothing, and to Wahba (1990
, Chapter 2) for periodic smoothing splines as projections. Wang and Brown (1996)
considered a periodic spline model for hormone circadian rhythms without random effects and Zhang et al. (2000)
used periodic cubic splines to model the fixed-effects functions and modeled the serial correlation by a nonhomogeneous Ornstein Uhlenbeck process. In general, these approaches require the inversion of large dimensional matrices and thus are very computationally demanding. An alternative low-rank smoother for the curves is P-spline (Ruppert et al. 2003
), while it is only an approximation to smoothing splines. In this paper, we take a different approach than projection, in which numerical constraints are imposed on the usual smoothing splines, resulting in a significant computational saving. We extend the numerical constrained single-curve model of Ansley et al. (1993)
to a mixed-effects functional model setting, enabling unified estimation and inference for the functional effects. We represent the whole proposed functional model as a multivariate state space model and adopt the fast algorithm of Koopman and Durbin (2000)
. This approach can estimate the exact smoothing splines in order O(N).
The rest of the paper is structured as follows. In Section 2, we propose the model specification for the functional effects with periodic constraint. In Section 3, we present an efficient estimation procedure. A simulation study can be found in Section 4. We illustrate the proposed method through the application to the cortisol data in Section 5.
| 2. THE MODELS |
|---|
|
|
|---|
Suppose yij (j = 1, ..., Ti, i = 1, ..., n) is the observation on the ith curve at index tij. For simplicity, we assume t
[0, 1] (or can be re-scaled to [0, 1]). We consider the following general functional mixed-effects model
![]() | (2.1) |
where
is a p x 1 vector of functional fixed effects,
is a q x 1 vector of functional random effects, which are modeled as realizations of the q x 1 vector of zero mean Gaussian process A(t) = {a1(t), ..., aq(t)}T, Xij = {Xij[1], ..., Xij[p]} is the design matrix for the fixed effects, Zij = {Zij[1], ..., Zij[q]} is the design matrix for the random effects, and eij is the measurement error.
i(t) with periodic constraint
Suppose that ßk(t) and
il(t) are periodic functions on [0, 1] with period 1 (or can be mapped to period 1), with continuous up to the (m 1)th derivatives. Similar to Ansley et al. (1993)
, we can model
and
by periodic polynomial smoothing splines of order 2m 1:
![]() | (2.2) |
where
and
are the
th derivatives of ßk(t) and
il(t), {bk0, ..., bk, ,m1}T
N(0,
I) with
and {ail0, ..., ail, m1}T
N(0,
).
Remarks:
- (1) We use polynomial smoothing splines with numerical constraints to account for the periodicity of the functional effects. This is to reduce the computational load through the connection with state space models. Other smoothing splines Wahba(1990)
; Gu(2002)
such as splines on the circle can also be used, while with a more intensive computation procedure.
- (2) The constraints put on the functional effects essentially constrain the curves and their variances to be periodic functions.
- (2) The constraints put on the functional effects essentially constrain the curves and their variances to be periodic functions.
| 3. ESTIMATION |
|---|
|
|
|---|
Due to the connection of smoothing splines and linear mixed-effects models (see a recent review in Guo, 2004), our proposed model can be rewritten as linear mixed-effects type of model and estimated using standard software. However, this approach is very computationally intensive because of the need to invert large dimensional matrices. Therefore, we represent the proposed model in a state space form and adopt the O(N) algorithm of (Koopman and Durbin(2000)
To simplify the presentation, we first assume that observations on different curves were collected at the same design points (i.e. tij = tkj = tj, for k
i and Ti = T for all i). This assumption is relaxed later.
Define
and
We have the following state equations corresponding to models (2.2):
![]() | (3.3) |
where 
and
.
We adopt the idea of (Ansley et al.(1993)
) to numerically constrain
and
The state space model without constraint is given by
![]() | (3.4) |
where
The n x m(p + qn) matrix
where
if r = m(k 1) + 1, k = 1, ..., p, and 0 otherwise;
if r = m(l 1) + 1, l = 1, ..., q, and 0 otherwise;
and
The periodic constraint
is then added to the above unconstrained model by augmenting
by
:
and adding m(p + qn) pseudo observations zero at t = 1 to numerically enforce
. The pseudo observations form [m(p + qn)/n] + 1 columns of pseudo data vectors, where [x] denotes the largest integer that is no greater than x. Note that the last column does not need to be a full column because we adopt the univariate algorithm of (Koopman and Durbin(2000)
. The augmented model is
![]() | (3.5) |
where T* = T + mq + [mp/n] + 1,
when j = 1, ..., T,
and
for the added pseudo observations,
with
![]() |
The state transition matrix
and the variancecovariance matrix
of
are defined as
![]() |
The initial state
with
where P0 denotes the variancecovariance matrix for
.
The parameters in the state space model (3.5) include smoothing parameters and variance parameters. The unknown parameters can be estimated by the restricted maximum likelihood, which is equivalent to the generalized maximum likelihood (Wahba, 1985
). The generalized likelihood can be sequentially calculated by the Kalman filter and maximized through numerical optimization routines. Given the estimates of the unknown parameters, a smoothing algorithm (Koopman and Durbin, 2000
) is used to obtain the posterior estimates
and
where Y denotes all the observations and Yi represents data from the ith curve (i = 1, ..., n). The posterior means are estimates of the functional effects, and Bayesian confidence intervals (Wahba, 1983
) for the functional estimates can be constructed using the posterior variances.
Because yj is an n x 1 observational vector, the standard vector filtering and smoothing algorithm still requires inversion of the n x n variancecovariance matrices. We adopt the univariate approach of Koopman and Durbin (2000)
for efficient estimation. The basic idea of this algorithm is to convert the multivariate state space model into a univariate model of series y11, ..., yn1, y12, ..., yn2, ..., y1T, ..., ynT, and apply univariate filtering and smoothing to avoid matrix inversion. Readers are referred to Koopman and Durbin (2000)
for more details on the algorithm. The algorithm for our model is implemented in Matlab 6.12.
We have assumed tij = tj and Ti = T in the estimation procedure of this section, for presentation purpose only. In general, measurements may be taken at different indices (e.g. time points) from different curves. Examples are unequally spaced data, irregularly collected data, and none-full period data. The proposed algorithm can be applied with data augmentation, where a collection of all possible indices serves as the observational points tj for all curves. This artificially creates some missing data points yij, which are regarded as zero with the corresponding
and
set to zero in model (3.5).
| 4. SIMULATION |
|---|
|
|
|---|
We conducted a simulation study to evaluate the performance of the proposed model and estimation procedure. More importantly, we hope to highlight the advantage of using the proposed model over the most commonly used cubic smoothing spline model for the curves, and assess the difference when data are available only on a partial period.
Periodic data yij(i = 1, ..., 30, j = 1, ..., 30) were generated on a full period from yij = wiß(tij) +
i(tij) +
ij, where
ij
N(0, 1), wi
U(0, 1), ß(tij) = A sin 2
tij, and
i(tij)
N(0, b
), where
with
jk = B4(tj tk)/4! and
. We assume
so both ß(t) and
i(t) are periodic curves on [0, 1]. We specified two settings to simulate data (1) A = 5 and b = 25, and (2) A = 2.5 and b = 6.25. We also considered none-full period data by taking off the last 10 data points from each individual's profile and treating them as unobserved in the models. Cubic spline model and the proposed model with periodic constraint were used to fit the data. A total of 100 simulations were conducted.
Figure 2 displays the boxplots of the mean square errors (MSEs) and the coverage probabilities of the 95% Bayesian confidence intervals (Wahba, 1983
) for ß(t). The MSE is defined as the overall mean of the sum-of-squares of the subject-specific prediction bias. It is shown that in all the settings, the cubic spline estimates (Cubic) have less precision than the periodic constraint approach (Periodic), and on average, the coverage from the periodic approach is closer to the nominal level. When data were only available on partial period, the periodic estimates still outperform the cubic spline estimates because more information was gained by explicitly modeling the periodicity. We can also see that as the amplitudes of the underlying curves increase, the MSEs of both models increase. This is due to the shrinkage toward the mean effect of the random curve predictions. Because the subject-specific deviations are modeled by random effects with zero mean Gaussian prior, the posterior means from these models shrunk toward zero. The absolute shrinkage is proportional to the the amplitudes of the underlying curves.
|
With the current sample size and chosen smoothing parameters, each simulation run takes 33.7 s (on Dual AMD Opteron 250 processors running at 2.4 GHz and 8 Gigabytes SDRAM). Timing results of different sample sizes were obtained to show that the computing times increase linearly with the sample size of the data set. Details are omitted here due to space limit.
| 5. APPLICATION |
|---|
|
|
|---|
In this section, we apply the proposed method to the cortisol data shown in Figure 1. We use m = 2 in this section. The following model is fitted:
![]() | (5.6) |
where yij (j = 1, ..., 24, i = 1, ..., 22) is the cortisol concentration of the ith subject at the jth hour, I(·) is the indicator function, ß1(t) is the group-average curve of the FM group and ß2(t) is that of the normal group,
i(t) is the subject-specific deviation for the ith subject, and eij is the measurement error.
We fit the circadian pattern model specified in Section (2.2). Figure 3 shows the estimates of the group-average curves together with their 95% Bayesian confidence intervals (Wahba, 1983
) for both the FM group and the normal group. While there are overlaps between the two group-average curves, it can be seen that FM patients tend to release more cortisol late afternoon (4 pm9 pm) and early morning (6 am8 am). This result is similar to the finding by Guo (2002)
. However, the confidence intervals are in general narrower by incorporating the periodic constraint, which enables us to differentiate the two groups during the early mornings. In the cubic spline approach of Guo (2002)
, the confidence intervals are wider at both ends because they are estimated with fewer observations. In our periodic approach, the confidence intervals at the boundaries are of similar widths as those in the middle. This is due to the periodic assumption which enables us to borrow the observations from the other end in estimating the curves at the boundary, so there is no boundary points in our case. The data and the fitted subject-specific curves with their 95% Bayesian confidence intervals are shown in Figure 4. The subject-specific curves generally fit the data well. Similar to the fitted group-average curves, the Bayesian confidence intervals of the estimated subject-specific curves are narrower than those in Guo (2002)
.
|
|
| ACKNOWLEDGMENTS |
|---|
The authors wish to thank the editors, the associate editor, and a referee for helpful comments and suggestions. In addition, Li Qin is grateful to Steve Self, Ross Prentice, and Peter Gilbert for providing helpful comments on an earlier draft of the paper. Li Qin's research was supported by U.S. National Institute of Health (NIH) Grants NIH 5 U01 AI46703 and NIH 2 R37 AI29168. Wensheng Guo's research was supported by U.S. NIH Grant R01 CA 84438.
| REFERENCES |
|---|
|
|
|---|
-
ANSLEY, C. F., KOHN, R. AND WONG, C. (1993). Nonparametric spline regression with prior information. Biometrika 80, 7588.
BRUMBACK, B. A. AND RICE, J. A. (1998). Smoothing spline models for the analysis of nested and crossed samples of curves. Journal of the American Statistical Association 93, 961994.[CrossRef][Web of Science]
GU, C. (2002). Smoothing Spline ANOVA Models. New York: Springer.
GUO, W. (2002). Functional mixed effects models. Biometrics 58, 121128.[CrossRef][Web of Science][Medline]
GUO, W. (2004). Functional data analysis in longitudinal settings using smoothing splines. Statistical Methods in Medical Research 13, 114.
HASTIE, T. AND TIBSHIRANI, R. (1993). Varying-coefficient models (with discussions). Journal of the Royal Statistical Society, Series B 55, 757796.
KOOPMAN, S. J. AND DURBIN, J. (2000). Fast filtering and smoothing for multivariate state space models. Journal of Time Series Analysis 21, 281296.[CrossRef][Web of Science]
MAMMEN, E., MARRON, J. S., TURLACH, B. A. AND WAND, M. P. (2001). A general projection framework for constrained smoothing. Statistical Science 16, 232248.[CrossRef][Web of Science]
RAMSAY, J. O. AND SILVERMAN, B. W. (1997). Functional Data Analysis. Berlin: Springer.
RUPPERT, D., WAND, M. P. AND CARROLL, R. J. (2003). Semiparametric Regression. New York: Cambridge University Press.
VERBYLA, P., CULLIS, B. R., KENWARD, M. G. AND WELHAM, S. J. (1999). The analysis of designed experiments and longitudinal data by using smoothing splines (with discussions). Applied Statistics 48, 269311.
WAHBA, G. (1983). Bayesian confidence intervals for the cross-validated smoothing spline. Journal of the Royal Statistical Society, Series B 45, 133150.
WAHBA, G. (1985). A comparison of GCV and GML for choosing the smoothing parameters in the generalized spline smoothing problem. The Annals of Statistics 13, 13781402.[CrossRef]
WAHBA, G. (1990). Spline models for observational data. CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 59. Philadelphia: SIAM.
WANG, Y. AND BROWN, M. M. (1996). A flexible model for human circadian rhythms. Biometrics 52, 588596.[CrossRef][Web of Science][Medline]
ZHANG, D., LIN, X., RAZ, J. AND SOWERS, M. (1998). Semiparametric stochastic mixed models for longitudinal data. Journal of the American Statistical Association 93, 710719.[CrossRef][Web of Science]
ZHANG, D., LIN, X. AND SOWERS, M. (2000). Semiparametric regression for periodic longitudinal hormone data from multiple menstrual cycles. Biometrics 56, 3139.[CrossRef][Web of Science][Medline]
Received June 13, 2005; revised September 9, 2005; accepted for publication September 29, 2005.
![]()
CiteULike
Connotea
Del.icio.us What's this?
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||











