Skip Navigation


Biostatistics Advance Access originally published online on October 5, 2005
Biostatistics 2006 7(2):225-234; doi:10.1093/biostatistics/kxj003
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow All Versions of this Article:
7/2/225    most recent
kxj003v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Right arrow Disclaimer
Google Scholar
Right arrow Articles by Qin, L.
Right arrow Articles by Guo, W.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Qin, L.
Right arrow Articles by Guo, W.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2005. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org.

Functional mixed-effects model for periodic data

Li Qin*

Statistical Center for HIV/AIDS Research and Prevention, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA lqin{at}scharp.org

Wensheng Guo

Department of Biostatistics and Epidemiology University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA

* To whom correspondence should be addressed.


    SUMMARY
 TOP
 SUMMARY
 1. INTRODUCTION
 2. THE MODELS
 3. ESTIMATION
 4. SIMULATION
 5. APPLICATION
 REFERENCES
 
Periodic data are frequently collected in biomedical experiments. We consider the underlying periodic curves giving rise to these data, and account for the periodicity in their functional model to improve estimation and inference. We propose to incorporate the periodic constraint in the functional mixed-effects model setting. Both the fixed functional effects and random functional effects are modeled in the same periodic functional space, hence the population-average estimates and subject-specific predictions are all periodic. An efficient algorithm is given to estimate the proposed model by an O(N) modified Kalman filtering and smoothing algorithm. The proposed method is evaluated in different scenarios through simulations. Treatments to none-full period data and missing observations along the period are also given. Analysis of a cortisol data set obtained from a study on fibromyalgia is conducted as illustration.

Keywords: Functional data analysis; Kalman filter; Periodic constraint; Periodic spline; Smoothing spline; State space model


    1. INTRODUCTION
 TOP
 SUMMARY
 1. INTRODUCTION
 2. THE MODELS
 3. ESTIMATION
 4. SIMULATION
 5. APPLICATION
 REFERENCES
 
Data in many biomedical experiments arise as curves, so it is natural to consider a curve as the basic observational unit in the data analysis, which is termed functional data analysis (Ramsay and Silverman, 1997Go). In many situations, measurements taken at different time points (or indices) display a periodic pattern, or the underlying curves are periodic. This information is the intrinsic nature of the functional space residing the curves. In this paper, we propose a new class of functional mixed-effects models that can incorporate the periodicity in both the fixed effects and random effects. The periodicity is imposed by constraining the underlying functional space of the model. The functional effects are modeled through a unified approach and are estimated by an efficient algorithm.

An important periodic pattern of experimental data is the circadian rhythm, which is the periodic behavior of the data as a function of 24 h. It is well known that many hormones including cortisol have circadian rhythms. Figure 1 shows the 24 h cortisol profiles of 11 fibromyalgia (FM) patients and 11 healthy subjects, collected in a study conducted at the University of Michigan Medical Center (Guo, 2002Go). FM is a stress-related syndrome that may relate to cortisol through a complex feedback system. The objective of the study is to investigate the relationship between the circadian patterns of cortisol, in normal subjects and patients with FM. As an initial attempt, we are interested in estimating the cortisol profiles of FM patients and normal individuals, and identifying the time course when the two groups are different in terms of their cortisol concentration. This may help unveil the mystery behind the unknown mechanism of FM. A key step of the analysis is to incorporate the circadian patterns in the model. We propose to incorporate it by modeling both functional fixed effects and functional random effects as periodic splines; consequently, both the group-average profiles and subject-specific predictions are of 24-h cycle. More details about this example will be given in Section 5.


Figure 1
View larger version (19K):
[in this window]
[in a new window]
 
Fig. 1. Hourly cortisol concentration data profiles from 11 FM patients and 11 normal subjects for 24 h.

 
Substantial developments have been accomplished in functional models. Reviews of the early work can be found in Ramsay and Silverman (1997)Go. A review of the recent developments using smoothing splines was given by Guo (2004)Go. For example, Hastie and Tibshirani (1993)Go proposed varying-coefficient models and used cubic splines to model the functional coefficients. Zhang et al. (1998)Go considered semiparametric mixed models with nonparametric fixed effects and parametric random effects. Brumback and Rice (1998)Go modeled nested curves by cubic splines. Verbyla et al. (1999)Go used cubic splines to model fixed effects. Guo (2002)Go proposed functional mixed-effects models with nonparametric fixed effects and nonparametric random effects. However, all these papers use cubic splines that do not incorporate the periodicity of the functions.

Models for periodic curves can be viewed as a special case of constrained functional models, which can be estimated by different smoothing techniques. Mammen et al. (2001)Go introduced a general framework of constrained smoothing via projection. A periodic curve can be modeled through projecting the data onto the smoothed periodic space. Readers are referred to Mammen et al. (2001)Go and the references therein for other types of constrained smoothing, and to Wahba (1990Go, Chapter 2) for periodic smoothing splines as projections. Wang and Brown (1996)Go considered a periodic spline model for hormone circadian rhythms without random effects and Zhang et al. (2000)Go used periodic cubic splines to model the fixed-effects functions and modeled the serial correlation by a nonhomogeneous Ornstein Uhlenbeck process. In general, these approaches require the inversion of large dimensional matrices and thus are very computationally demanding. An alternative low-rank smoother for the curves is P-spline (Ruppert et al. 2003Go), while it is only an approximation to smoothing splines. In this paper, we take a different approach than projection, in which numerical constraints are imposed on the usual smoothing splines, resulting in a significant computational saving. We extend the numerical constrained single-curve model of Ansley et al. (1993)Go to a mixed-effects functional model setting, enabling unified estimation and inference for the functional effects. We represent the whole proposed functional model as a multivariate state space model and adopt the fast algorithm of Koopman and Durbin (2000)Go. This approach can estimate the exact smoothing splines in order O(N).

The rest of the paper is structured as follows. In Section 2, we propose the model specification for the functional effects with periodic constraint. In Section 3, we present an efficient estimation procedure. A simulation study can be found in Section 4. We illustrate the proposed method through the application to the cortisol data in Section 5.


    2. THE MODELS
 TOP
 SUMMARY
 1. INTRODUCTION
 2. THE MODELS
 3. ESTIMATION
 4. SIMULATION
 5. APPLICATION
 REFERENCES
 

2.1 The general model

Suppose yij (j = 1, ..., Ti, i = 1, ..., n) is the observation on the ith curve at index tij. For simplicity, we assume t isin [0, 1] (or can be re-scaled to [0, 1]). We consider the following general functional mixed-effects model

Formula 1(2.1)

where Formula 1 is a p x 1 vector of functional fixed effects, Formula 1is a q x 1 vector of functional random effects, which are modeled as realizations of the q x 1 vector of zero mean Gaussian process A(t) = {a1(t), ..., aq(t)}T, Xij = {Xij[1], ..., Xij[p]} is the design matrix for the fixed effects, Zij = {Zij[1], ..., Zij[q]} is the design matrix for the random effects, and eij is the measurement error.

2.2 Model specification for ß(t) and {alpha}i(t) with periodic constraint

Suppose that ßk(t) and {alpha}il(t) are periodic functions on [0, 1] with period 1 (or can be mapped to period 1), with continuous up to the (m – 1)th derivatives. Similar to Ansley et al. (1993)Go, we can model Formula 1 and Formula 1 by periodic polynomial smoothing splines of order 2m – 1:

Formula 2(2.2)

where Formula 2 and Formula 2 are the {nu}th derivatives of ßk(t) and {alpha}il(t), {bk0, ..., bk, ,m–1}T ~ N(0, {kappa}I) with Formula 2 and {ail0, ..., ail, m–1}T ~ N(0, Formula 2).

Remarks:

(1) We use polynomial smoothing splines with numerical constraints to account for the periodicity of the functional effects. This is to reduce the computational load through the connection with state space models. Other smoothing splines Wahba(1990)Go; Gu(2002)Go such as ‘splines on the circle’ can also be used, while with a more intensive computation procedure.
(2) The constraints put on the functional effects essentially constrain the curves and their variances to be periodic functions.


    3. ESTIMATION
 TOP
 SUMMARY
 1. INTRODUCTION
 2. THE MODELS
 3. ESTIMATION
 4. SIMULATION
 5. APPLICATION
 REFERENCES
 
Due to the connection of smoothing splines and linear mixed-effects models (see a recent review in Guo, 2004), our proposed model can be rewritten as linear mixed-effects type of model and estimated using standard software. However, this approach is very computationally intensive because of the need to invert large dimensional matrices. Therefore, we represent the proposed model in a state space form and adopt the O(N) algorithm of (Koopman and Durbin(2000)Go) for estimation.

To simplify the presentation, we first assume that observations on different curves were collected at the same design points (i.e. tij = tkj = tj, for k != i and Ti = T for all i). This assumption is relaxed later.

3.1 State space representation for periodic splines

Define Formula 2 and Formula 2 We have the following state equations corresponding to models (2.2):

Formula 3(3.3)

where Formula 3Formula 3 and Formula 3.

We adopt the idea of (Ansley et al.(1993)Go) to numerically constrain Formula 3 and Formula 3 The state space model without constraint is given by

Formula 4(3.4)

where Formula 4 Formula 4 The n x m(p + qn) matrix Formula 4 where Formula 4 if r = m(k – 1) + 1, k = 1, ..., p, and 0 otherwise; Formula 4 if r = m(l – 1) + 1, l = 1, ..., q, and 0 otherwise; Formula 4 and Formula 4

The periodic constraint Formula 4 is then added to the above unconstrained model by augmenting Formula 4 by Formula 4: Formula 4 and adding m(p + qn) pseudo observations zero at t = 1 to numerically enforce Formula 4. The pseudo observations form [m(p + qn)/n] + 1 columns of pseudo data vectors, where [x] denotes the largest integer that is no greater than x. Note that the last column does not need to be a full column because we adopt the univariate algorithm of (Koopman and Durbin(2000)Go. The augmented model is

Formula 5(3.5)

where T* = T + mq + [mp/n] + 1, Formula 5 Formula 5 when j = 1, ..., T, Formula 5 and Formula 5 for the added pseudo observations, Formula 5 with

Formula 5

The state transition matrix Formula 5 and the variance–covariance matrix Formula 5 of Formula 5 are defined as

Formula 5

The initial state Formula 5 with Formula 5 where P0 denotes the variance–covariance matrix for {gamma}.

3.2 Parameter estimation and algorithm

The parameters in the state space model (3.5) include smoothing parameters and variance parameters. The unknown parameters can be estimated by the restricted maximum likelihood, which is equivalent to the generalized maximum likelihood (Wahba, 1985Go). The generalized likelihood can be sequentially calculated by the Kalman filter and maximized through numerical optimization routines. Given the estimates of the unknown parameters, a smoothing algorithm (Koopman and Durbin, 2000Go) is used to obtain the posterior estimates Formula 5 Formula 5 Formula 5 and Formula 5 where Y denotes all the observations and Yi represents data from the ith curve (i = 1, ..., n). The posterior means are estimates of the functional effects, and Bayesian confidence intervals (Wahba, 1983Go) for the functional estimates can be constructed using the posterior variances.

Because yj is an n x 1 observational vector, the standard vector filtering and smoothing algorithm still requires inversion of the n x n variance–covariance matrices. We adopt the univariate approach of Koopman and Durbin (2000)Go for efficient estimation. The basic idea of this algorithm is to convert the multivariate state space model into a univariate model of series y11, ..., yn1, y12, ..., yn2, ..., y1T, ..., ynT, and apply univariate filtering and smoothing to avoid matrix inversion. Readers are referred to Koopman and Durbin (2000)Go for more details on the algorithm. The algorithm for our model is implemented in Matlab 6.12.

We have assumed tij = tj and Ti = T in the estimation procedure of this section, for presentation purpose only. In general, measurements may be taken at different indices (e.g. time points) from different curves. Examples are unequally spaced data, irregularly collected data, and none-full period data. The proposed algorithm can be applied with data augmentation, where a collection of all possible indices serves as the observational points tj for all curves. This artificially creates some missing data points yij, which are regarded as zero with the corresponding Formula 5 and Formula 5 set to zero in model (3.5).


    4. SIMULATION
 TOP
 SUMMARY
 1. INTRODUCTION
 2. THE MODELS
 3. ESTIMATION
 4. SIMULATION
 5. APPLICATION
 REFERENCES
 
We conducted a simulation study to evaluate the performance of the proposed model and estimation procedure. More importantly, we hope to highlight the advantage of using the proposed model over the most commonly used cubic smoothing spline model for the curves, and assess the difference when data are available only on a partial period.

Periodic data yij(i = 1, ..., 30, j = 1, ..., 30) were generated on a full period from yij = wiß(tij) + {alpha}i(tij) + {epsilon}ij, where {epsilon}ij ~ N(0, 1), wi ~ U(0, 1), ß(tij) = A sin 2{pi}tij, and {alpha}i(tij) ~ N(0, b{Sigma}), where Formula 5 with {sigma}jk = –B4(tjtk)/4! and Formula 5. We assume Formula 5 so both ß(t) and {alpha}i(t) are periodic curves on [0, 1]. We specified two settings to simulate data (1) A = 5 and b = 25, and (2) A = 2.5 and b = 6.25. We also considered none-full period data by taking off the last 10 data points from each individual's profile and treating them as unobserved in the models. Cubic spline model and the proposed model with periodic constraint were used to fit the data. A total of 100 simulations were conducted.

Figure 2 displays the boxplots of the mean square errors (MSEs) and the coverage probabilities of the 95% Bayesian confidence intervals (Wahba, 1983Go) for ß(t). The MSE is defined as the overall mean of the sum-of-squares of the subject-specific prediction bias. It is shown that in all the settings, the cubic spline estimates (‘Cubic’) have less precision than the periodic constraint approach (‘Periodic’), and on average, the coverage from the periodic approach is closer to the nominal level. When data were only available on partial period, the periodic estimates still outperform the cubic spline estimates because more information was gained by explicitly modeling the periodicity. We can also see that as the amplitudes of the underlying curves increase, the MSEs of both models increase. This is due to the shrinkage toward the mean effect of the random curve predictions. Because the subject-specific deviations are modeled by random effects with zero mean Gaussian prior, the posterior means from these models shrunk toward zero. The absolute shrinkage is proportional to the the amplitudes of the underlying curves.


Figure 2
View larger version (13K):
[in this window]
[in a new window]
 
Fig. 2. MSEs and coverage probabilities (CPs). The upper two plots correspond to the simulation model with A = 5 and b = 25, and the lower two plots have A = 2.5 and b = 6.25. All 30 data points were used to obtain the full period results. Only the first 20 data points were included to get partial period estimates.

 
With the current sample size and chosen smoothing parameters, each simulation run takes 33.7 s (on Dual AMD Opteron 250 processors running at 2.4 GHz and 8 Gigabytes SDRAM). Timing results of different sample sizes were obtained to show that the computing times increase linearly with the sample size of the data set. Details are omitted here due to space limit.


    5. APPLICATION
 TOP
 SUMMARY
 1. INTRODUCTION
 2. THE MODELS
 3. ESTIMATION
 4. SIMULATION
 5. APPLICATION
 REFERENCES
 
In this section, we apply the proposed method to the cortisol data shown in Figure 1. We use m = 2 in this section. The following model is fitted:

Formula 6(5.6)

where yij (j = 1, ..., 24, i = 1, ..., 22) is the cortisol concentration of the ith subject at the jth hour, I(·) is the indicator function, ß1(t) is the group-average curve of the FM group and ß2(t) is that of the normal group, {alpha}i(t) is the subject-specific deviation for the ith subject, and eij is the measurement error.

We fit the circadian pattern model specified in Section (2.2). Figure 3 shows the estimates of the group-average curves together with their 95% Bayesian confidence intervals (Wahba, 1983Go) for both the FM group and the normal group. While there are overlaps between the two group-average curves, it can be seen that FM patients tend to release more cortisol late afternoon (4 pm–9 pm) and early morning (6 am–8 am). This result is similar to the finding by Guo (2002)Go. However, the confidence intervals are in general narrower by incorporating the periodic constraint, which enables us to differentiate the two groups during the early mornings. In the cubic spline approach of Guo (2002)Go, the confidence intervals are wider at both ends because they are estimated with fewer observations. In our periodic approach, the confidence intervals at the boundaries are of similar widths as those in the middle. This is due to the periodic assumption which enables us to borrow the observations from the other end in estimating the curves at the boundary, so there is no boundary points in our case. The data and the fitted subject-specific curves with their 95% Bayesian confidence intervals are shown in Figure 4. The subject-specific curves generally fit the data well. Similar to the fitted group-average curves, the Bayesian confidence intervals of the estimated subject-specific curves are narrower than those in Guo (2002)Go.


Figure 3
View larger version (10K):
[in this window]
[in a new window]
 
Fig. 3. Group-average estimates from the circadian model. The solid lines are the fitted group-average curves with 95% Bayesian confidence intervals for the FM group. The dotted lines are those for the normal group.

 

Figure 4
View larger version (16K):
[in this window]
[in a new window]
 
Fig. 4. Subject-specific estimates from the circadian model. The solid lines are observed data. The dotted lines are fitted subject-specific curves with 95% Bayesian confidence intervals. The upper two rows are for FM patients and the lower two rows are for normal subjects.

 

    ACKNOWLEDGMENTS
 
The authors wish to thank the editors, the associate editor, and a referee for helpful comments and suggestions. In addition, Li Qin is grateful to Steve Self, Ross Prentice, and Peter Gilbert for providing helpful comments on an earlier draft of the paper. Li Qin's research was supported by U.S. National Institute of Health (NIH) Grants NIH 5 U01 AI46703 and NIH 2 R37 AI29168. Wensheng Guo's research was supported by U.S. NIH Grant R01 CA 84438.


    REFERENCES
 TOP
 SUMMARY
 1. INTRODUCTION
 2. THE MODELS
 3. ESTIMATION
 4. SIMULATION
 5. APPLICATION
 REFERENCES
 

    ANSLEY, C. F., KOHN, R. AND WONG, C. (1993). Nonparametric spline regression with prior information. Biometrika 80, 75–88.[Abstract/Free Full Text]

    BRUMBACK, B. A. AND RICE, J. A. (1998). Smoothing spline models for the analysis of nested and crossed samples of curves. Journal of the American Statistical Association 93, 961–994.[CrossRef][Web of Science]

    GU, C. (2002). Smoothing Spline ANOVA Models. New York: Springer.

    GUO, W. (2002). Functional mixed effects models. Biometrics 58, 121–128.[CrossRef][Web of Science][Medline]

    GUO, W. (2004). Functional data analysis in longitudinal settings using smoothing splines. Statistical Methods in Medical Research 13, 1–14.[Free Full Text]

    HASTIE, T. AND TIBSHIRANI, R. (1993). Varying-coefficient models (with discussions). Journal of the Royal Statistical Society, Series B 55, 757–796.

    KOOPMAN, S. J. AND DURBIN, J. (2000). Fast filtering and smoothing for multivariate state space models. Journal of Time Series Analysis 21, 281–296.[CrossRef][Web of Science]

    MAMMEN, E., MARRON, J. S., TURLACH, B. A. AND WAND, M. P. (2001). A general projection framework for constrained smoothing. Statistical Science 16, 232–248.[CrossRef][Web of Science]

    RAMSAY, J. O. AND SILVERMAN, B. W. (1997). Functional Data Analysis. Berlin: Springer.

    RUPPERT, D., WAND, M. P. AND CARROLL, R. J. (2003). Semiparametric Regression. New York: Cambridge University Press.

    VERBYLA, P., CULLIS, B. R., KENWARD, M. G. AND WELHAM, S. J. (1999). The analysis of designed experiments and longitudinal data by using smoothing splines (with discussions). Applied Statistics 48, 269–311.

    WAHBA, G. (1983). Bayesian confidence intervals for the cross-validated smoothing spline. Journal of the Royal Statistical Society, Series B 45, 133–150.

    WAHBA, G. (1985). A comparison of GCV and GML for choosing the smoothing parameters in the generalized spline smoothing problem. The Annals of Statistics 13, 1378–1402.[CrossRef]

    WAHBA, G. (1990). Spline models for observational data. CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 59. Philadelphia: SIAM.

    WANG, Y. AND BROWN, M. M. (1996). A flexible model for human circadian rhythms. Biometrics 52, 588–596.[CrossRef][Web of Science][Medline]

    ZHANG, D., LIN, X., RAZ, J. AND SOWERS, M. (1998). Semiparametric stochastic mixed models for longitudinal data. Journal of the American Statistical Association 93, 710–719.[CrossRef][Web of Science]

    ZHANG, D., LIN, X. AND SOWERS, M. (2000). Semiparametric regression for periodic longitudinal hormone data from multiple menstrual cycles. Biometrics 56, 31–39.[CrossRef][Web of Science][Medline]

    Received June 13, 2005; revised September 9, 2005; accepted for publication September 29, 2005.


    Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?



    This Article
    Right arrow Abstract Freely available
    Right arrow FREE Full Text (PDF) Freely available
    Right arrow All Versions of this Article:
    7/2/225    most recent
    kxj003v1
    Right arrow Alert me when this article is cited
    Right arrow Alert me if a correction is posted
    Services
    Right arrow Email this article to a friend
    Right arrow Similar articles in this journal
    Right arrow Similar articles in PubMed
    Right arrow Alert me to new issues of the journal
    Right arrow Add to My Personal Archive
    Right arrow Download to citation manager
    Right arrowRequest Permissions
    Right arrow Disclaimer
    Google Scholar
    Right arrow Articles by Qin, L.
    Right arrow Articles by Guo, W.
    Right arrow Search for Related Content
    PubMed
    Right arrow PubMed Citation
    Right arrow Articles by Qin, L.
    Right arrow Articles by Guo, W.
    Social Bookmarking
     Add to CiteULike   Add to Connotea   Add to Del.icio.us  
    What's this?