

Biostatistics Advance Access originally published online on January 30, 2007
Biostatistics 2007 8(4):756-771; doi:10.1093/biostatistics/kxm003

© The Author 2007. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org.

Identifying latent clusters of variability in longitudinal data

Michael R. Elliott*

Department of Biostatistics, University of Michigan, 1420 Washington Heights, Ann Arbor, MI 48109, USA and Institute for Social Research, University of Michigan, 426 Thompson Street, Ann Arbor, MI 48106, USA mrelliot{at}umich.edu

* To whom correspondence should be addressed.


    SUMMARY
 TOP
 SUMMARY
 1. INTRODUCTION
 2. LATENT CLUSTER MODELS...
 3. ESTIMATION
 4. APPLICATION
 5. DISCUSSION
 APPENDIX A
 REFERENCES
 
Means or other central tendency measures are by far the most common focus of statistical analyses. However, as Carroll (2003) noted, "systematic dependence of variability on known factors" may be "fundamental to the proper solution of scientific problems" in certain settings. We develop a latent cluster model that relates underlying "clusters" of variability to baseline or outcome measures of interest. Because estimation of variability is inextricably linked to estimation of trend, assumptions about underlying trends are minimized by using nonparametric regression estimates. The resulting residual errors are then clustered into unobserved clusters of variability that are in turn related to subject-level predictors of interest. An application is made to psychological affect data.

Keywords: Cubic spline; Heteroscedasticity; Longitudinal profiles; Nonparametric regression; Variance function


    1. INTRODUCTION
 
Symptoms of psychiatric illnesses are usually evaluated by retrospective assessments with patients or other informants. However, symptom counts based on recall may miss sporadic occurrences of subthreshold symptoms that may be associated with significant disability or that might signal increased risk for the development of the full threshold diagnosis of major depression. For example, only 38% of persons in a community sample of adults recalled a lifetime history of dysphoric mood that they had reported 13 years earlier (Thompson and others, 2004). Consequently, an alternative approach increasingly used in research is to ask subjects to rate their affect (measures of positive or "happy" mood and negative or "sad" mood) through time using daily diaries rather than depend on recall of affect (Walls and Schafer, 2006). However, methods requiring daily reports of mood are limited by conceptual and practical difficulties in data analysis (Schwartz and Stone, 1998). Elliott and others (2005) developed generalized growth mixture models (GGMMs) to relate continuous mood (affect) and discrete event patterns over time to minor or subthreshold depression. But it may be that, rather than "mean" levels or trends over time, day-to-day "variability" of affect measures predicts psychiatric disturbances such as minor depression. Consider in Figure 1 the positive affect scores of 4 patients who had experienced a myocardial infarction within the past year and were in treatment at a University of Pennsylvania cardiology clinic. Note that some subjects have stable day-to-day positive affect measures around their long-term trends, while others are highly variable day-to-day even after any long-term trend has been accounted for. Hence, we might wish to consider whether short-term, within-subject variation in positive affect might encode information about mental health status.


Fig. 1. Examples of longitudinal positive affect scores for MI recovery subjects. Circles give observed positive affect scores; lines indicate underlying mean trends estimated via nonparametric regression estimator (2.4) under hierarchical model using age as a covariate. SD = standard deviation of positive affect data; sigma = estimate derived from $E(\sigma_i^2 \mid y_i)$ as given in (4.6).

 
So-called "blunting" or reduction in variability of affect has been considered as an issue in psychiatric health. For certain disorders, blunting of affect can be therapeutic (reducing violent outbursts or pathological episodes of laughter or crying), for others harmful (increasing apathy in depressed patients). However, because of a lack of standard statistical methods to explore differences in affect variability across subjects, little research has explored whether these differences encode clinically relevant information. (One exception is Furlan and others, 2004, who found that normal elderly subjects randomly assigned to selective serotonin reuptake inhibitors (SSRIs) had similar between-subject variability as subjects assigned to placebo after fitting linear random-effects models for affect trends, addressing a concern that SSRI use might blunt responses to positive events.) Hence, this manuscript introduces a latent cluster model to investigate whether subjects with differing levels of daily variability in affect measures can be categorized into (unobserved) clusters of subjects that can then be related to baseline patient measures such as age, gender, and depression status. Estimation of variability is inextricably linked to estimation of mean trend (expected value of a subject's observation at each point in time). At one extreme, the mean trend could be estimated at each observed value, leaving zero residual variance. At the other, assuming an unchanging mean trend would suggest estimating each subject's affect variability using the standard deviation or log standard deviation of affect measure for the subject. We propose estimating each subject's mean trend by assuming only that it is a smooth, twice-differentiable function of time. Under this assumption, we estimate this mean trend using a cubic spline regression model. 
We then cluster the resulting residual variance estimates into latent clusters; subject-level covariates of interest can then be related to the latent clusters in order to obtain predictive models of latent cluster membership. This approach requires no assumptions to be made (other than smoothness) about the mean trends in affect measure for each subject and admits the large inherent measurement error in the affect measures by focusing on latent clusters underlying the variance measures rather than the variance measures themselves. It also provides a unified model from which joint inferences can be made, in contrast to a 2-stage approach.

The method presented here can be viewed as a method to explore "systematic dependence of variability on known factors," as described in Carroll (2003). While variances are sometimes modeled to accommodate heteroscedasticity or hierarchical covariance models (Barnard and others, 2000), treating the variance of the outcome as being of primary interest and the mean as a nuisance parameter is far less common than methods that consider dependence of a mean on known factors and treat variance as a nuisance parameter. One example is Harlow and others (2000), where the association between variance of the mean menstrual cycle and age of the woman is considered. As in this example, our focus is on variability within subjects, not across subjects, and further on short-term variability: the residual variance that remains after accounting for longer-term trends via cubic spline regression, not the variability (wiggliness) of the splines themselves. Section 2 describes both manifest and hierarchical models for residual variance that can be related to baseline covariates. Section 3 describes estimation and the choice of the number of clusters. Section 4 applies these models to psychological affect data, relating affect variability to age, gender, and depression status among recovering myocardial infarction (MI) patients. Section 5 concludes with a discussion and outline of future extensions.


    2. LATENT CLUSTER MODELS FOR RESIDUAL VARIANCE
 
We first describe removing the mean trend from a subject-level longitudinal profile via a nonparametric estimate that only assumes that the mean trend is a smooth, twice-differentiable function, where the resulting residual variance is allowed to differ by subject. We then assume that these residual variances belong to one or more unobserved (latent) clusters, either manifestly or through a hierarchical model whose second-stage parameters are a function of cluster membership. Membership in the clusters is then modeled via a multinomial model as a function of baseline covariates of interest.

2.1 Estimating a mean trend using nonparametric regression

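Let the observed positive affect measure for subject $i$ at time $t$ be denoted by $y_{it}$, $i = 1, \ldots, n$, $t = 1, \ldots, n_i$. We model the positive affect score by

$$y_{it} = f_i(t) + \varepsilon_{it}, \tag{2.1}$$

where $\varepsilon_{it} \sim N(0, \sigma_i^2)$ independently, $f_i(t)$ is a twice-differentiable smooth function of $t$,

$$\int \{f_i''(t)\}^2 \, dt < \infty,$$

and $f_i(t)$ minimizes the residual sum of squares plus a roughness penalty parameterized by $\lambda_i$:

$$\frac{1}{n_i} \sum_{t=1}^{n_i} \{y_{it} - f_i(t)\}^2 + \lambda_i \int \{f_i''(t)\}^2 \, dt. \tag{2.2}$$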

It can be shown (Wahba, 1978, Hastie and Tibshirani, 1990) that, for a given value of $\lambda_i$, the $\hat{f}_i$ that minimizes (2.2) is given by a natural cubic spline with knots at the interior points of $t$ ($t = 2, \ldots, n_i - 1$). As $\lambda_i \to 0$, $\hat{f}_i$ is given by the cubic spline that interpolates $y_{it}$ (i.e. $\hat{f}_i(t) = y_{it}$); as $\lambda_i \to \infty$, $\hat{f}_i$ is given as the least-squares linear regression line (i.e. $\hat{f}_i(t) = \hat{\beta}_{0i} + \hat{\beta}_{1i} t$, where $\hat{\beta}_{0i}$ and $\hat{\beta}_{1i}$ are the ordinary least-squares intercept and slope). Consequently, one can rewrite (2.1) and (2.2) as a mixed-effect linear model (Wahba, 1990, Speed, in discussion of Robinson, 1991, p. 42–44, Ruppert and others, 2003)


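$$y_i = T_i \beta_i + Z_i u_i + \varepsilon_i, \tag{2.3}$$

where $u_i \sim N(0, G_i)$, $\varepsilon_i \sim N(0, \sigma_i^2 I_{n_i})$, $T_i$ is the $n_i \times 2$ fixed-effect design matrix (intercept and time), $\beta_i$ is a $2 \times 1$ vector of fixed-effect parameters, $Z_i$ is an $n_i \times (n_i - 1)$ random-effect design matrix such that $Z_i Z_i' = \Omega_i$, where $\Omega_i$ is a cubic spline basis matrix with knots at each of the interior points $(2, \ldots, n_i)$ given by $\Omega_{ihk} = \int_0^1 ((h-1)/(n_i-1) - v)_+ ((k-1)/(n_i-1) - v)_+ \, dv$, $h, k = 1, \ldots, n_i$, and $G_i = \{\sigma_i^2 / (n_i \lambda_i)\} I_{n_i - 1}$. The function $(x)_+$ is defined as $(x)_+ = x$ if $x \ge 0$ and $(x)_+ = 0$ if $x < 0$.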

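If $\sigma_i^2$ and $G_i$ are estimated via restricted maximum likelihood (REML), the estimator given by the fitted values of (2.3) corresponds to the natural cubic spline with knots at the interior points of $t$ estimated by (2.1) (Wahba, 1985, Green, 1987, Wang, 1998):

$$\hat{f}_i = T_i \hat{\beta}_i + Z_i \hat{u}_i, \tag{2.4}$$

where $\hat{\beta}_i = (T_i' Q_{\lambda_i}^{-1} T_i)^{-1} T_i' Q_{\lambda_i}^{-1} y_i$ and $\hat{u}_i = (n_i \lambda_i)^{-1} Z_i' Q_{\lambda_i}^{-1} (y_i - T_i \hat{\beta}_i)$. This allows us to model the observed data as

$$y_i \sim N(T_i \beta_i, \sigma_i^2 Q_{\lambda_i}) \tag{2.5}$$

for $Q_{\lambda_i} = (n_i \lambda_i)^{-1} Z_i Z_i' + I_{n_i}$.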
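The mixed-model form of the smoother can be illustrated numerically. The following is a minimal sketch, not the author's implementation: it assumes equally spaced observation times rescaled to [0, 1], uses the closed form of the integral defining the spline basis matrix, and takes the fixed effects to be the generalized least-squares solution under the covariance structure Q. The function names `spline_kernel` and `fit_trend` are hypothetical.

```python
import numpy as np


def spline_kernel(n):
    """Cubic spline basis matrix on an equally spaced grid s_h = (h-1)/(n-1).

    Uses the closed form of integral_0^1 (s - v)_+ (t - v)_+ dv, which equals
    lo^2 * (3*hi - lo) / 6 with lo = min(s, t) and hi = max(s, t).
    """
    s = np.arange(n) / (n - 1.0)
    lo = np.minimum.outer(s, s)
    hi = np.maximum.outer(s, s)
    return lo**2 * (3.0 * hi - lo) / 6.0, s


def fit_trend(y, lam):
    """Fitted trend for one subject at a fixed smoothing parameter lam.

    Returns the fitted values, the fixed-effect estimates, and the quadratic
    form r' Q^{-1} r of the residuals r = y - T beta (assumed GLS fit).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    omega, s = spline_kernel(n)
    T = np.column_stack([np.ones(n), s])           # intercept and time
    Q = omega / (n * lam) + np.eye(n)              # Q = (n*lam)^{-1} Z Z' + I
    Qinv_T = np.linalg.solve(Q, T)
    Qinv_y = np.linalg.solve(Q, y)
    beta = np.linalg.solve(T.T @ Qinv_T, T.T @ Qinv_y)
    resid = y - T @ beta
    Qinv_r = np.linalg.solve(Q, resid)
    # Z u_hat = (n*lam)^{-1} Z Z' Q^{-1} r, so Z itself is never needed.
    fitted = T @ beta + (omega / (n * lam)) @ Qinv_r
    return fitted, beta, float(resid @ Qinv_r)
```

With a very large `lam` the fitted values should approach the ordinary least-squares line, and with a small `lam` they should track the data closely, which mirrors the two limiting cases described for the penalty parameter.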

2.2 Manifest models

Denote the unobserved variance cluster for subject $i$ by $C_i = k$, $k = 1, \ldots, K$. A manifest model assumes that all subjects within cluster $k$ have identical subject-level variances $\sigma_i^2 \equiv \sigma_k^2$ for all $i$ such that $C_i = k$:

Formula


Formula


Formula


Formula

where $X \sim \mathrm{MULTI}(n, p_k)$ is drawn from the multinomial distribution with $n$ assignments and $K$ categories: $P(X_1 = n_1, \ldots, X_K = n_K; n, p_1, \ldots, p_K) = \frac{n!}{n_1! \cdots n_K!} p_1^{n_1} \cdots p_K^{n_K}$, where $\sum_k n_k = n$ and $\sum_k p_k = 1$. The parameters $\delta_k$ allow us to relate the variance clusters to observed subject-level baseline covariates $x_i$.

We can rewrite (2.4) as Formula a function of the penalty parameter {lambda}i only. Thus, for a fixed number of latent variance clusters K, the underlying mean trend will be identical for all latent variance cluster assignments for subject i. This assists in identifying the variance clusters and allows the variance cluster parameters to be interpreted separately from the underlying mean trends.
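As an illustration of how covariates can drive cluster membership, the sketch below computes multinomial logistic membership probabilities for one subject. It assumes a softmax link with the last cluster as the reference category (its coefficient vector set to zero); this parameterization and the function name `cluster_probs` are assumptions made for the example, since the exact form of the model is not shown here.

```python
import math


def cluster_probs(x, deltas):
    """Membership probabilities p_1, ..., p_K for one subject.

    x      : covariate vector, including a leading 1 for the intercept
    deltas : list of K coefficient vectors; by assumption the reference
             cluster carries a vector of zeros
    """
    eta = [sum(xj * dj for xj, dj in zip(x, d)) for d in deltas]
    m = max(eta)                        # subtract the max for stability
    w = [math.exp(e - m) for e in eta]
    total = sum(w)
    return [v / total for v in w]
```

For example, `cluster_probs([1.0, 1.0], [[0.0, 15.0], [0.0, 0.0]])` puts almost all probability on the first cluster, which shows why coefficients of very large magnitude indicate near separation of clusters by a covariate.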

2.3 Hierarchical models

A less restrictive model assumes that each subject has a unique residual variance, drawn from one of $K$ latent cluster distributions. We assume distinct conjugate inverse gamma prior distributions on the variances within each of the clusters:

$$\sigma_i^2 \mid C_i = k \sim \text{Inv-}\chi^2_{\nu_k}(\xi_k^2),$$

where $X \sim \text{Inv-}\chi^2_{\nu}(\xi^2)$ is drawn from the inverse chi-square distribution with $\nu$ degrees of freedom and scale parameter $\xi^2$. Under this approach, the primary parameter of interest to describe the variance cluster is the mode of the inverse chi-square distribution, $\theta_k = \nu_k \xi_k^2 / (\nu_k + 2)$. The cluster memberships are modeled using the same multinomial logistic form as for the manifest model.
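The mode used to summarize each cluster can be checked numerically. The sketch below assumes the scaled inverse chi-square parameterization whose density is proportional to x^(-(nu/2 + 1)) exp(-nu xi^2 / (2x)); under that assumption the mode is nu xi^2 / (nu + 2). The function names are hypothetical.

```python
import math


def inv_chi2_logpdf(x, nu, xi2):
    """Log density of a scaled inverse chi-square with nu df and scale xi2."""
    a = nu / 2.0
    return (a * math.log(a * xi2) - math.lgamma(a)
            - (a + 1.0) * math.log(x) - nu * xi2 / (2.0 * x))


def inv_chi2_mode(nu, xi2):
    """Closed-form mode under the assumed parameterization."""
    return nu * xi2 / (nu + 2.0)


def grid_mode(nu, xi2, step=0.001, upper=20.0):
    """Locate the mode by brute-force search over a grid, as a check."""
    n_points = int(upper / step)
    grid = [0.01 + step * i for i in range(n_points)]
    return max(grid, key=lambda x: inv_chi2_logpdf(x, nu, xi2))
```

Comparing `grid_mode(5, 4.0)` with `inv_chi2_mode(5, 4.0)` gives values that agree to within the grid spacing.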


    3. ESTIMATION
 
We propose a joint maximum likelihood estimation procedure for the penalized likelihood parameter $\lambda_i$ and the variance cluster parameters Formula for the manifest models and Formula for the hierarchical models. We describe an expectation-conditional maximization (ECM) algorithm (Meng and Rubin, 1993) to obtain maximum likelihood estimates (MLEs), maximizing first the penalized likelihood parameter $\lambda_i$ for each subject conditional on the variance cluster parameters, then the variance cluster parameters conditional on the penalized likelihood and the cluster probability parameters, and finally the cluster probability parameters conditional on the penalized likelihood and variance cluster parameters. In all these conditional maximization steps, the indicators of cluster membership in the log-likelihood are replaced with their expected values, namely, the subject-level posterior probabilities of cluster membership conditional on the previous iteration of the maximization algorithm. The random effects $u_i$ have been integrated out of the complete data model (see 2.5); hence, they are not required except when computing $\hat{f}_i$ as in (2.4). Similarly, Formula is a linear function of Formula and does not need to be estimated separately. Details of the ECM algorithms for both the manifest and the hierarchical models are found in Appendix A.
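To make the alternation between expectation and conditional maximization concrete, here is a deliberately simplified sketch for the manifest model. It is not the algorithm of Appendix A. It assumes that the smoothing parameters are held fixed, so that each subject contributes only a residual quadratic form `rss[i]` and a number of observations `n_obs[i]`; that the restricted likelihood kernel in the cluster variance is (sigma2)^(-(n_i - 2)/2) exp(-rss_i / (2 sigma2)); and that cluster probabilities are free proportions with no covariates. All function names are hypothetical.

```python
import math


def e_step(rss, n_obs, sigma2, probs):
    """Posterior cluster membership probabilities for each subject."""
    post = []
    for r, n in zip(rss, n_obs):
        # Log weight per cluster; terms that do not depend on k cancel.
        logw = [math.log(max(p, 1e-300))
                - 0.5 * (n - 2) * math.log(s2) - r / (2.0 * s2)
                for p, s2 in zip(probs, sigma2)]
        m = max(logw)
        w = [math.exp(v - m) for v in logw]
        total = sum(w)
        post.append([v / total for v in w])
    return post


def cm_step(rss, n_obs, post):
    """Update cluster variances and proportions given posterior weights."""
    n_clusters = len(post[0])
    sigma2, probs = [], []
    for k in range(n_clusters):
        wk = [p[k] for p in post]
        num = sum(w * r for w, r in zip(wk, rss))
        den = sum(w * (n - 2) for w, n in zip(wk, n_obs))
        sigma2.append(num / den)
        probs.append(sum(wk) / len(rss))
    return sigma2, probs


def fit_variance_clusters(rss, n_obs, sigma2_start, n_iter=200):
    """Iterate the two steps from a given start point (no convergence test)."""
    n_clusters = len(sigma2_start)
    sigma2 = list(sigma2_start)
    probs = [1.0 / n_clusters] * n_clusters
    post = e_step(rss, n_obs, sigma2, probs)
    for _ in range(n_iter):
        sigma2, probs = cm_step(rss, n_obs, post)
        post = e_step(rss, n_obs, sigma2, probs)
    return sigma2, probs, post
```

The sketch omits the update of the smoothing parameters and the regression of cluster membership on covariates, and it is not hardened against empty clusters; running it from several start points is advisable for the reasons discussed in the text.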

Inference can be obtained by bootstrapping (resampling with replacement among the $n$ subjects). In mixture models such as these, ridges or multiple modes in the likelihood are common, particularly in small samples, so that alternatives for inference such as profile likelihood or the negative of the inverse of the observed or expected observed information matrix that rely on the quadratic approximation to the normal likelihood may no longer be accurate. In addition, multiple start points for the ECM algorithm are required to ensure convergence to a global maximum. In the application, we used (1,...,K), (4,...,4K), and (10,...,10K) as 3 start points for $\sigma_k^2$ in the manifest model and (1,...,K), (2,...,2K), and (3,...,3K) and (1,...,10K), (2,...,2 x 10K), and (3,...,3 x 10K) as 3 start points for $\xi_k^2$ and $\nu_k$, respectively, to try to find local maxima. However, in our example, all start points converged to the same REML estimate for a given $K$, which appears to be the global maximum.
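The subject-level resampling can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the application: it uses percentile intervals (the interval type is not specified in the text), and `estimator` stands for any hypothetical function that refits the model to a resampled list of subjects and returns a scalar of interest.

```python
import random


def bootstrap_ci(subjects, estimator, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval, resampling whole subjects."""
    rng = random.Random(seed)
    n = len(subjects)
    stats = []
    for _ in range(n_boot):
        # Draw n subjects with replacement, keeping each profile intact.
        resample = [subjects[rng.randrange(n)] for _ in range(n)]
        stats.append(estimator(resample))
    stats.sort()
    lo = stats[int((1.0 - level) / 2.0 * n_boot)]
    hi = stats[min(n_boot - 1, int((1.0 + level) / 2.0 * n_boot))]
    return lo, hi
```

Resampling at the subject level preserves the within-subject dependence of each longitudinal profile, which is why the unit of resampling is the subject and not the individual daily observation.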

3.1 Choosing the number of clusters

The above models assume that the true number of latent clusters $K$ is known. In practice, this is not the case. A number of methods are available to choose the number of clusters, although their accuracy in small sample size settings is often less than ideal. For the manifest models, we report the Bayesian Information Criterion (BIC) of Schwarz (1978) using the REML estimates. The BIC measure is given by $-2 l_r + p_K \log n$, where

Formula

$p_K$ is the number of free parameters in the $K$-cluster model, and $n$ is the number of independent subjects in the sample.
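The BIC comparison across candidate numbers of clusters amounts to a few lines of arithmetic. In the sketch below the restricted log-likelihood and the parameter count for each candidate are taken as inputs, since how they are computed depends on the fitted model; the function names and the numbers in the usage note are hypothetical.

```python
import math


def bic(restricted_loglik, n_params, n_subjects):
    """BIC = -2 * l_r + p_K * log(n); smaller values indicate better fit."""
    return -2.0 * restricted_loglik + n_params * math.log(n_subjects)


def choose_k(fits, n_subjects):
    """Pick the number of clusters with the smallest BIC.

    fits maps K to a (restricted_loglik, n_params) pair.
    """
    scores = {k: bic(ll, p, n_subjects) for k, (ll, p) in fits.items()}
    return min(scores, key=scores.get), scores
```

For instance, with made-up inputs `{1: (-2670.0, 2), 2: (-2500.0, 5), 3: (-2480.0, 8)}` and 32 subjects, `choose_k` selects the 3-cluster model because its gain in log-likelihood outweighs the larger penalty.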

For the hierarchical models, we report the Deviance Information Criterion (DIC) of Spiegelhalter and others (2002). The BIC penalty assumes that the number of parameters is a known quantity; the DIC measure accounts for the fact that, in a hierarchical framework, the number of effective parameters may be unclear: the random effects associated with each subject may "count" as approximately one parameter if the between-variance estimates are large (small degree of shrinkage) and as nearly zero parameters if the between-variance estimates are small (large degree of shrinkage). DIC estimates the number of effective parameters by Formula where Formula and Formula for the restricted likelihood deviance Formula Formula . The DIC measure is then given by Formula. Because we do not entertain a fully Bayesian hierarchical model in this manuscript, we treat the second-stage model parameters $\nu$ and $\xi$ that govern the distribution of $\sigma_i^2$ as known and replace them with their empirical Bayes estimates to obtain the DIC value.
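In its general form, DIC takes the effective number of parameters to be the posterior mean deviance minus the deviance evaluated at the posterior mean of the parameters, and adds that quantity to the posterior mean deviance. The sketch below computes this from a list of deviance values; it is a generic illustration that assumes such values are available (for example from posterior draws), whereas the approach described here evaluates the required expectations analytically. The function name is hypothetical.

```python
def dic(deviance_values, deviance_at_mean):
    """Return (DIC, p_D) from deviance values and the plug-in deviance.

    deviance_values  : deviances evaluated over the posterior distribution
    deviance_at_mean : deviance evaluated at the posterior mean parameters
    """
    d_bar = sum(deviance_values) / len(deviance_values)
    p_d = d_bar - deviance_at_mean      # effective number of parameters
    return d_bar + p_d, p_d
```

A heavily shrunk set of subject-level variances gives a plug-in deviance close to the mean deviance and hence a small effective parameter count, which is the behavior the criterion is meant to capture.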

The posterior distribution of $\sigma_i^2$ given membership in cluster $k$ is Formula; also, for $X \sim \text{Inv-}\chi^2_{\nu}(\xi^2)$ we have Formula and $E(\log X) = \log(\nu/2) + \log \xi^2 - \Psi(\nu/2)$, where $\Psi$ is the digamma function. Thus,

Formula

and

Formula

for Formula

Use of the DIC measure has been criticized in mixture models for underpenalizing complex models (Richardson, in discussion of Spiegelhalter and others, 2002, p. 626–627). We retain it here because of the problem of over/undercounting the random subject-level variance effects in the hierarchical setting.


    4. APPLICATION
 
Positive affect scores were observed in 35 patients who had experienced a myocardial infarction within the past year and were in treatment at a University of Pennsylvania cardiology clinic. These patients were recruited to participate in a pharmacological and neuroimaging study of elderly patients and included both subjects who met Structured Clinical Interview for DSM-IV criteria for threshold minor depressive disorder and those without depression (Kumar and others, 1997). Positive affect scores were collected for up to 35 consecutive days. Complete data were available for 20 subjects, while 11 had 31–34 days of observations, 1 had 22 days of observations, and 3 had 5–7 days of observations. We excluded the 3 subjects for whom fewer than 22 days of observations were available, in order to ensure stable estimates of $\lambda_i$. Scores $y_{it}$ ranged from 5 to 25, with a mean of 14.8 and a standard deviation of 4.6; the mean within-subjects positive affect scores ranged from 7.0 to 23.5. Figure 1 plots $y_{it}$ for 4 example subjects; estimates of mean trend $\hat{f}_i$ given by (2.4) are the solid lines shown for each subject in Figure 1. Regressing the standard deviation of the subject-level positive affect scores against the subject-level means shows no evidence of a linear (p = 0.58) or a quadratic (p = 0.4) trend, suggesting that neither "floor" nor "ceiling" effects are inducing associations between mean trends and daily variability.

We consider whether the variability of the positive affect measures is associated with baseline measures of age (56% over 65 years), gender (84% male), and/or depression (6%). A preliminary 2-stage analysis of variance (ANOVA) using the log of the estimated variance for each subject showed lower variability among older subjects (difference of –0.37 on the log scale, p = 0.17), among males (difference of –0.85, p = 0.018), and among the nondepressed (difference of –0.20, p = 0.73), although only the gender difference is statistically significant.

Table 1 reports the results of the cluster size selection procedures (BIC for the manifest model; DIC for the hierarchical model). The 3-cluster model is always favored, both for the manifest and for the hierarchical models, as well as for each of the specific regression models (gender, age, and depression status). We focus the remainder of our analysis on the 3-cluster model. Because of the small number of subjects (32) and the trinomial outcome, we include only one covariate at a time in the regression model for predictors of cluster status.


 
Table 1. Recovering MI patients: BIC measures under manifest models and DIC measures under hierarchical models. BIC for a 1-cluster model is 5356.32; DIC for a 1-class model is 4996.88. Smaller values mean better fit; best fit values in bold

 
4.1 Manifest model

Table 2 reports the results of a latent cluster analysis for a 3-cluster manifest model. Because some of the MLEs for the probability of a cluster membership were converging to 0 or 1, the ECM algorithm was stopped when $\max_{k,l} |\delta_{kl}| \ge 15$; this was indicative of near separation of the clusters with respect to gender. This had no effect on the reported results, since both cluster membership probabilities and variance cluster parameters had converged. The 3-cluster model finds clusters with standard deviations slightly greater than 1, slightly less than 2, and somewhat more than 3. (The model appears to be picking up the fact that affect is integer valued rather than truly continuous: the 4-cluster model adds a fourth cluster with a standard deviation of approximately 4.) Depressed subjects were less likely to be in the lowest variance cluster (<1% versus 20% of nondepressed, with a 95% CI of –35% to –5% for the "difference" between depressed and nondepressed in the probability of membership in the lowest variance cluster). Males and older persons are associated with the lower variability affect clusters, although only the gender difference is statistically significant: an estimated 26% of men belong to the lowest variance cluster, versus 1% of women (95% CI 6%–31% for the difference between males and females in the probability of membership in the lowest variance cluster).



 
Table 2. Recovering MI patients: REML estimates of variance clusters $\sigma_k^2$ and cluster membership probabilities from 3-class manifest model, and REML estimates of variance cluster posterior mode $\theta_k = \nu_k \xi_k^2 / (\nu_k + 2)$ and cluster membership probabilities from 3-class hierarchical model (95% confidence intervals via bootstrap in subscript)

 
Figure 2 plots the posterior probabilities of belonging to each cluster under the 3-cluster manifest model using age as a predictor of cluster status, replacing $\lambda_i$, $\delta_k$, and $\sigma_k^2$ in (A.1) with their MLEs. (Results using gender and depression as predictors were similar.) Subjects with high posterior probability of belonging to a given cluster are close to the X in the figure: the large majority of subjects belong to one cluster and only one cluster with a very high posterior probability for both models.


 
Fig. 2. Recovering MI patients: MLEs of posterior probability of cluster membership by subject under manifest and hierarchical models, using age as a covariate: 3-cluster model. Distances from X's proportional to posterior probability of cluster membership.

 
4.2 Hierarchical model

Table 2 reports the results of a latent cluster analysis for a 3-cluster hierarchical model. As with the manifest model, convergence to the boundary of the parameter space sometimes occurred due to near separation of the clusters with respect to gender and depression status, so the ECM algorithm was stopped when $\max_{k,l} |\delta_{kl}| \ge 15$. As with the manifest model, both cluster membership probabilities and variance cluster parameters had converged when the ECM algorithm was stopped. The central tendency measure for the variance is given by $\theta_k = \nu_k \xi_k^2 / (\nu_k + 2)$, the mode of an $\text{Inv-}\chi^2_{\nu_k}(\xi_k^2)$ random variable.

The hierarchical class model again centers the variances around clusters of 1, 4, and 9, although there is a larger fraction of subjects in the smaller variance cluster than in the manifest model. Subjects are also somewhat less "cleanly" identified than in the manifest model, as Figure 2 shows, with intermediate variance subjects not as well defined as in the manifest class. However, use of the hierarchical model has sharpened associations between covariates and the cluster type, with depressed subjects being more likely to belong to an intermediate variability cluster than nondepressed subjects (>99% versus 22%, 95% CI for difference of 55%–90%) and less likely to belong to a cluster of low variability (<1% versus 28%, 95% CI for difference of –50% to –12%) or high variability (<1% versus 50%, 95% CI for difference of –64% to –13%). As in the manifest model, there is a positive association between males and older persons and the lower variability class, although these associations are not significant.

Although the ECM algorithm avoids computation of the individual $\sigma_i^2$ values, the posterior distribution of $\sigma_i^2$ given membership in cluster $k$ is Formula. Thus,

Formula (4.6)

(see Appendix A.2). Figure 1 shows the nonparametric underlying trend estimated by (2.4) under the hierarchical model using age as a covariate, together with (a) the standard deviation of the positive affect measures over the period and (b) estimates of $E(\sigma_i^2 \mid y_i)$ obtained by replacing the parameters in (4.6) with the REML estimates. "Detrending" the positive affect measures has little effect on estimates of day-to-day variance when trend lines are flat but yields substantial differences when longer-term trends appear to be present.

Figure 3 compares the REML estimates of the smoothing parameter $\lambda$ under the manifest and hierarchical models on the log scale. For 14 of the subjects under the manifest model and 13 of the subjects under the hierarchical model, the REML estimate is $\hat{\lambda}_i = \infty$, corresponding to a linear model relating time to positive affect; we set these values to $e^{10}$ in order to facilitate plotting. Generally, the values of $\hat{\lambda}_i$ were similar under the 2 approaches, except for 1 subject for whom $\hat{\lambda}_i = \infty$ only under the hierarchical model and 2 subjects for whom $\hat{\lambda}_i = \infty$ only under the manifest model; in the former case, a linear trend would be fit only for the hierarchical model, while the manifest model would suggest substantial nonlinearity, whereas the reverse would be true in the latter cases. A visual inspection of these subjects suggests they are somewhat difficult to classify with respect to nonlinearity. An example of a subject for which $\hat{\lambda}_i = \infty$ only under the manifest model is shown in Figure 1(c); an example of a subject for which $\hat{\lambda}_i = \infty$ only under the hierarchical model is shown in Figure 1(d).


 
Fig. 3. $\log(\hat{\lambda}_i)$ under manifest model versus $\log(\hat{\lambda}_i)$ under hierarchical model.

 

    5. DISCUSSION
 
We consider whether the daily variability of positive affect measures may be related to depression or other covariates of interest. Our interest is in the day-to-day variability within a subject, not in variability across subjects or even in long-term variability within subjects, so we proceed by treating long-term mean trends as nuisance parameters, modeled by nonparametric cubic splines. This removes linear or smooth nonlinear trends and allows a more accurate measure of the day-to-day changes in positive affect. The resulting residuals were then classified according to a latent cluster model that considered whether clusters of residual variance could be identified and then related to baseline covariates of interest. We considered positive affect measures from a sample of recovering MI patients. We found that depressed subjects were more likely than nondepressed subjects to belong to intermediate levels of variability rather than low or high levels. We also found that men were more likely to belong to clusters of low daily affect variability. Older subjects were also associated with low daily affect variability, but evidence for this association was relatively weak.

The results of the latent cluster analysis were generally consistent with alternative preliminary analyses using a 2-step ANOVA after log-transforming the standard errors of each subject's positive affect. Advantages of the latent cluster analysis approach over a 2-step ANOVA or regression approach in this context include (1) joint estimation of the smoothing and variance cluster parameters, (2) avoiding overinterpretation of the "resolution" of affect measures for clinically relevant information, and (3) distinguishing situations where categorical covariates may be associated with both very low and very high levels of variability.

Nearly half of the subjects had some missing data during their follow-up. Since the model assumes that the day-to-day variability in positive affect for each subject is constant over time, such missing data will reduce the efficiency of the variance cluster parameter estimation, but should not introduce bias unless the model is misspecified (i.e. later observations tend to have either increased or decreased variability). In the example considered, the intermittent dropout that characterizes most of the missing observations in the example should have very limited impact even in the presence of model misspecification.

We consider both manifest and hierarchical variance mixture models, which parallel the manifest (Roeder and others, 1999) and hierarchical (Muthen and Shedden, 1999) GGMMs. In this manuscript, however, instead of the growth curve for each subject either belonging to a fixed-effect class (manifest) or being a random effect drawn from a fixed distribution with a class of prior parameters (hierarchical), it is the residual variances for each subject that either belong to a fixed class (manifest) or are drawn from a distribution with a class of prior parameters (hierarchical). While the hierarchical models contain the manifest models as a special case (as $\nu_k \to \infty$), the small sample size means that the variance cluster parameters will be less well estimated in the hierarchical than in the manifest model. Both manifest and hierarchical approaches have been considered to better illustrate the underlying ideas of the method.

An alternative analysis would be to replace the nonparametric regression estimator (2.1) with a standard linear mixed model. This could easily be accomplished by replacing the matrix $Z_i$ in (2.3) with $T_i$, allowing a separate (random) slope and intercept to be estimated for each subject. However, this would assume that the underlying trends in affect score are linear, which appears to be contradicted by the observed data. Since the nonparametric model includes the linear model as a special case, little would be gained by this approach, and subjects with highly nonlinear trends would tend to have their residual variance overestimated due to underfitting of the linear regression estimator. In cases where the number of observations per subject is limited ($n_i < 30$), such an approach may be required, although it may be extended to consider higher-order polynomials. A linear mixed model would also allow identification of time-dependent within-subject variance, again at the cost of making prespecified assumptions about the underlying mean trend.

Other extensions of the method discussed here are possible. Alternative priors on the subject-level variance such as a normal distribution on $\log \sigma_i^2$ were considered; they yielded similar results but were less analytically tractable for matters such as DIC computation. A fully Bayesian approach that posits known hyperpriors on the smoothing and variance cluster parameters is also possible, if more computationally intensive. Finally, our analysis has focused on classifying day-to-day variability in affect, treating the affect variability resulting from longer-term underlying smooth trends as a nuisance parameter. One might instead consider relating both the mean and variance to subject-level covariates of interest. Thus, it may be of interest to consider parametric forms for the mean to improve the interpretability of the results. Alternatively, both the daily residual variance and the underlying smoothing parameter could be assigned to a single "stability" cluster defined by the latent variable $C_i$: $(\sigma_i^2, \lambda_i) \equiv (\sigma_k^2, \lambda_k)$ for all $i$ such that $C_i = k$; such an analysis would combine information about daily variability and variability in longer-term mean trends into a single measure.


    APPENDIX A

A.1 ECM algorithm for manifest model

Assuming that the latent cluster membership is known, the complete-data restricted log-likelihood under the manifest model is given by

Formula

where

Formula

for Formula and Formula Formula

The E-step of the ECM algorithm for the manifest model involves computation of the posterior probability of cluster membership estimated using the previous iteration of the parameters:

Formula (A.1)
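The E-step can be sketched in code under simplifying assumptions. The sketch below treats a subject's residuals from the smooth fit as independent $N(0, \sigma_k^2)$ draws within cluster $k$ and ignores the REML adjustment in the restricted likelihood, so it illustrates the structure of (A.1) rather than reproducing it. The function name and arguments are hypothetical.

```python
import math

def e_step_posteriors(residuals, mix_probs, sigma2):
    """Posterior cluster-membership probabilities for one subject.

    residuals : residuals about the subject's fitted trend (assumed given)
    mix_probs : prior cluster probabilities, e.g. from the model for delta
    sigma2    : cluster-specific residual variances from the previous iteration
    """
    n = len(residuals)
    rss = sum(e * e for e in residuals)
    # Log of (prior probability x Gaussian likelihood) for each cluster.
    log_w = [
        math.log(p) - 0.5 * n * math.log(2.0 * math.pi * s2) - 0.5 * rss / s2
        for p, s2 in zip(mix_probs, sigma2)
    ]
    # Normalise on the log scale to avoid underflow for long series.
    m = max(log_w)
    w = [math.exp(v - m) for v in log_w]
    total = sum(w)
    return [v / total for v in w]
```

For example, `e_step_posteriors([0.1, -0.2, 0.15], [0.5, 0.5], [0.04, 4.0])` places nearly all posterior mass on the low-variance cluster, whereas a subject with large residuals is pushed toward the high-variance cluster.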

The maximization step involves three conditional maximizations. The restricted log-likelihood involving $\lambda_i$, with expectation taken with respect to the cluster membership indicators at step $r-1$, is given by

Formula

where summation over the indicators of the cluster membership is replaced with their expectation obtained at the E-step. The score equation for $\lambda_i$ is then given by

Formula

where Formula. The rth maximization (M-step) of $\lambda_i$ conditional on $\sigma^2$ is obtained by solving the score equation via a modified bisection method that ensures we are maximizing $l_r(\lambda_i)$. We define the lower and upper endpoints for the bisection method at the rth iteration as the previous estimate of $\lambda_i$ divided by 100 and multiplied by 100, respectively. If the score is positive at the lower endpoint and negative at the upper endpoint, the endpoints encompass the root of the score equation, and the standard bisection method is iterated to obtain the updated estimate of $\lambda_i$. If the score is positive at both endpoints, both endpoints are smaller than the root of the score equation, and the updated estimate is set to the upper endpoint; similarly, if the score is negative at both endpoints, both endpoints are larger than the root, and the updated estimate is set to the lower endpoint. Over successive iterations, the endpoints quickly move to encompass the maximizing value of $\lambda_i$.
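The modified bisection update can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used for the analysis: `score` is a hypothetical callable standing in for the score function of $\lambda_i$, and the source does not say what is done when the bracket surrounds a minimum, so the sketch simply raises an error in that case.

```python
def update_lambda(score, lam_prev, tol=1e-8, max_iter=200):
    """One conditional maximisation of a smoothing parameter.

    score    : callable giving the derivative of the objective at lam
               (hypothetical stand-in for the score function)
    lam_prev : estimate from the previous iteration, must be positive
    """
    lo, hi = lam_prev / 100.0, 100.0 * lam_prev
    s_lo, s_hi = score(lo), score(hi)
    if s_lo > 0 and s_hi > 0:
        return hi      # objective still increasing: maximiser lies above hi
    if s_lo < 0 and s_hi < 0:
        return lo      # objective decreasing: maximiser lies below lo
    if s_lo < 0 < s_hi:
        # Sign change from negative to positive marks a minimum (assumption:
        # treated as an error here, since the source gives no rule for it).
        raise ValueError("bracket surrounds a minimum, not a maximum")
    # Standard bisection: score is >= 0 at lo and <= 0 at hi.
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol * mid:
            break
    return 0.5 * (lo + hi)
```

Calling `update_lambda` repeatedly, each time passing the previous return value as `lam_prev`, shifts the bracket by a factor of 100 per call until it contains the maximiser, after which ordinary bisection takes over.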

The rth maximization (M-step) for $\sigma_k^2$ conditional on $\lambda$ is the standard REML estimator of the variance

Formula

The rth maximization (M-step) for $\delta$ is obtained using the Newton–Raphson algorithm:

Formula (A.2)

where

Formula

The constant $A$ is used to adjust the length of the Newton–Raphson step to ensure that the rth iteration of the algorithm increases $l(\delta^{(r)})$.
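A step-length-adjusted Newton–Raphson update of this kind can be sketched for the simplest case. The sketch assumes two clusters, a single covariate, and a logistic model for cluster membership with the E-step posterior probabilities as fractional responses. Halving $A$ until the objective does not decrease is one common choice and is an assumption here, as are all function names.

```python
import math

def sigmoid(eta):
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

def loglik(delta, x, p):
    """Expected log-likelihood for delta = (intercept, slope)."""
    total = 0.0
    for xi, pi in zip(x, p):
        eta = delta[0] + delta[1] * xi
        # log(1 + exp(eta)) computed without overflow
        total += pi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return total

def newton_step(delta, x, p, max_halvings=30):
    """One Newton-Raphson update with step length A halved until no decrease."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, pi in zip(x, p):
        mu = sigmoid(delta[0] + delta[1] * xi)
        w = mu * (1.0 - mu)
        g0 += pi - mu                 # gradient, intercept
        g1 += (pi - mu) * xi          # gradient, slope
        h00 += w                      # entries of the negative Hessian
        h01 += w * xi
        h11 += w * xi * xi
    det = h00 * h11 - h01 * h01
    # Newton direction: (negative Hessian)^{-1} x gradient, 2 x 2 case.
    d0 = (h11 * g0 - h01 * g1) / det
    d1 = (h00 * g1 - h01 * g0) / det
    a, base = 1.0, loglik(delta, x, p)
    for _ in range(max_halvings):
        cand = (delta[0] + a * d0, delta[1] + a * d1)
        if loglik(cand, x, p) >= base:
            return cand
        a *= 0.5
    return delta                      # no improving step found
```

With more than two clusters or several covariates the same idea applies, with the 2 x 2 solve replaced by a general linear solve.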

A.2 ECM algorithm for hierarchical model

Under the hierarchical model, the complete-data restricted log-likelihood is given by

Formula

where $f(y_i; n_i, \lambda_i, \nu_k, \xi_k)$ is obtained by integrating the subject-level variance $\sigma_i^2$ out of the joint distribution of the residuals conditional on the subject-level variance and the prior for the subject-level variance:

Formula
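The effect of integrating out the subject-level variance can be sketched in closed form under simplifying assumptions. The sketch below assumes that the residuals are independent $N(0, \sigma_i^2)$ given $\sigma_i^2$, that the prior is a scaled inverse chi-square distribution with $\nu_k$ degrees of freedom and scale $\xi_k^2$, and that the REML adjustment is ignored. None of these details is restated in this appendix, so the function is illustrative and its name is hypothetical.

```python
import math

def log_marginal_density(residuals, nu, xi2):
    """Log density of the residuals after integrating out sigma_i^2.

    Assumes sigma_i^2 ~ scaled inverse chi-square(nu, xi2) and
    residuals | sigma_i^2 ~ iid N(0, sigma_i^2).
    """
    n = len(residuals)
    rss = sum(e * e for e in residuals)
    return (
        -0.5 * n * math.log(2.0 * math.pi)
        + 0.5 * nu * math.log(0.5 * nu * xi2)
        - math.lgamma(0.5 * nu)
        + math.lgamma(0.5 * (n + nu))
        - 0.5 * (n + nu) * math.log(0.5 * (rss + nu * xi2))
    )

def log_normal_density(residuals, sigma2):
    """Log density of iid N(0, sigma2) residuals, for comparison."""
    n = len(residuals)
    rss = sum(e * e for e in residuals)
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - 0.5 * rss / sigma2
```

Under these assumptions, `log_marginal_density(e, nu, xi2)` approaches `log_normal_density(e, xi2)` as `nu` grows, which is the sense in which the manifest model arises as the limiting case $\nu_k \to \infty$ of the hierarchical model.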

The E-step computes the posterior probability of cluster membership as

Formula

The conditional rth maximization for $\lambda_i$ maximizes

Formula

whose score equation is given by

Formula

We solve Formula via the modified bisection method described in Appendix A.1.

The conditional rth maximization for $\nu_k, \xi_k^2$ requires a Newton–Raphson step:

Formula

where

Formula

for Formula and Formula. The rth maximization (M-step) for $\delta$ proceeds as under the manifest model (see (A.2)).


    ACKNOWLEDGMENTS
 
The author would like to thank the editors, the associate editor, and an anonymous reviewer, as well as Mary Sammel, Thomas Ten Have, Joseph Gallo, and Hilary Bogner for their helpful comments. This research was supported in part by the National Institute of Mental Health Grant P30-MH066270. Conflict of Interest: None declared.


    REFERENCES

    Barnard J, McCulloch R, Meng X-L. Modeling covariance matrices in terms of standard deviations and correlations, with applications to shrinkage. Statistica Sinica (2000) 10:1281–1311.

    Carroll RJ. Variances are not always nuisance parameters. Biometrics (2003) 59:211–220.

    Elliott MR, Ten Have TR, Gallo J, Bogner HR, Katz IR. Using a Bayesian latent growth curve model to identify trajectories of positive affect and negative events following myocardial infarction. Biostatistics (2005) 6:119–143.

    Furlan PM, Kallan MJ, Ten Have T, Lucki I, Katz I. SSRIs do not cause affective blunting in healthy elderly volunteers. American Journal of Geriatric Psychiatry (2004) 12:323–330.

    Green PJ. Penalized likelihood for general semi-parametric regression models. International Statistical Review (1987) 55:245–260.

    Harlow SD, Lin X, Ho MJ. Analysis of menstrual diary data across the reproductive life span: applicability of the bipartite model approach and the importance of within-woman variance. Journal of Clinical Epidemiology (2000) 53:722–733.

    Hastie TJ, Tibshirani RJ. Generalized Additive Models (1990) London: Chapman and Hall.

    Kumar A, Schweizer E, Zhisong J, Miller D, Bilker W, Swan LL, Gottlieb G. Neuroanatomic substrates of late-life minor depression: a quantitative magnetic resonance imaging study. Archives of Neurology (1997) 54:613–617.

    Meng X-L, Rubin DB. Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika (1993) 80:267–278.

    Muthen B, Shedden K. Finite mixture modeling with mixture outcomes using the EM algorithm. Biometrics (1999) 55:463–469.

    Robinson GK. That BLUP is a good thing: the estimation of random effects (with discussion). Statistical Science (1991) 6:15–51.

    Roeder K, Lynch KG, Nagin DS. Modeling uncertainty in latent class membership: a case study in criminology. Journal of the American Statistical Association (1999) 94:766–776.

    Ruppert D, Wand MP, Carroll RJ. Semiparametric Regression (2003) Cambridge, UK: Cambridge University Press.

    Schwartz JE, Stone AA. Strategies for analyzing ecological momentary assessment data. Health Psychology (1998) 17:6–16.

    Schwartz G. Estimating the dimension of a model. Annals of Statistics (1978) 6:461–464.

    Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society (2002) B64:583–639.

    Thompson R, Bogner HR, Coyne JC, Gallo JJ, Eaton WW. Personal characteristics associated with consistency of recall of depressed or anhedonic mood in the 13-year follow-up of the Baltimore Epidemiologic Catchment Area Survey. Acta Psychiatrica Scandinavica (2004) 109:345–354.

    Walls TA, Schafer JL, eds. Models for Intensive Longitudinal Data (2006) New York: Oxford University Press.

    Wahba G. Improper priors, spline smoothing, and the problem of guarding against model errors in regression. Journal of the Royal Statistical Society (1978) B40:364–372.

    Wahba G. A comparison of GCV and GML for choosing the smoothing parameters in the generalized spline smoothing problem. Annals of Statistics (1985) 13:1378–1402.

    Wahba G. Spline Models for Observational Data (1990) Philadelphia, PA: SIAM.

    Wang Y. Smoothing spline models with correlated random errors. Journal of the American Statistical Association (1998) 93:341–348.

    Received May 9, 2006; revised December 13, 2006; accepted for publication January 23, 2007.

