Skip Navigation



Biostatistics Advance Access published online on July 11, 2007

Biostatistics, doi:10.1093/biostatistics/kxm026
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow Supplementary Material
Right arrow All Versions of this Article:
9/2/249    most recent
kxm026v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Right arrow Disclaimer
Google Scholar
Right arrow Articles by Desantis, S. M.
Right arrow Articles by Betensky, R. A.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Desantis, S. M.
Right arrow Articles by Betensky, R. A.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2007. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org.

A penalized latent class model for ordinal data

Stacia M. Desantis*, E. Andrés Houseman and Brent A. Coull

Department of Biostatistics, Harvard University, 655 Huntington Avenue, Boston, MA 02115, USA sdesanti{at}hsph.harvard.edu

Anat Stemmer-Rachamimov

Department of Pathology, CNY-7015, Massachusetts General Hospital, 149, 13th Street, Charlestown, MA 02129, USA

Rebecca A. Betensky

Department of Biostatistics, Harvard University, 655 Huntington Avenue, Boston, MA 02115, USA

* To whom correspondence should be addressed.


    SUMMARY
 TOP
 SUMMARY
 1. INTRODUCTION
 2. LATENT CLASS MODEL...
 3. LATENT CLASS MODEL...
 4. PENALIZED ORDINAL LATENT...
 5. ORTHOGONAL...
 6. APPLICATION
 7. DISCUSSION
 FUNDING
 REFERENCES
 
Latent class models provide a useful framework for clustering observations based on several features. Application of latent class methodology to correlated, high-dimensional ordinal data poses many challenges. Unconstrained analyses may not result in an estimable model. Thus, information contained in ordinal variables may not be fully exploited by researchers. We develop a penalized latent class model to facilitate analysis of high-dimensional ordinal data. By stabilizing maximum likelihood estimation, we are able to fit an ordinal latent class model that would otherwise not be identifiable without application of strict constraints. We illustrate our methodology in a study of schwannoma, a peripheral nerve sheath tumor, that included 3 clinical subtypes and 23 ordinal histological measures.


    1. INTRODUCTION
 TOP
 SUMMARY
 1. INTRODUCTION
 2. LATENT CLASS MODEL...
 3. LATENT CLASS MODEL...
 4. PENALIZED ORDINAL LATENT...
 5. ORTHOGONAL...
 6. APPLICATION
 7. DISCUSSION
 FUNDING
 REFERENCES
 
Schwannomas are benign nerve sheath tumors that occur in one of the following 3 clinical settings: as isolated sporadic tumors, in the setting of neurofibromatosis 2 (NF2), or in a recently described disease called schwannomatosis. Patients with these 3 conditions have different genetic predispositions and prognoses. Distinguishing schwannomatosis from NF2 patients and from patients with sporadic tumors remains a clinical challenge (Polliani and others, 2005Go). Diagnostic criteria have only very recently been established for schwannomatosis (MacCollin and others, 2005Go). In addition, some patients meet the clinical criteria for more than one subset (Baser and others, 2006Go). As a result, it is of interest to define unique histological characteristics of schwannomas arising in the different clinical subsets that could aid in diagnosis. The data in this study consist of 16 ordinal and 7 binary histological characteristics of schwannomas, all of which are grouped into 9 broad histological features (Table 1). Our first goal is to ascertain whether the patients cluster into a small number of classes that have similar histology, and the second is whether these data-driven classes correspond to current clinical diagnosis. The presence of both ordinal and binary data types requires a methodology that can handle mixed data. Furthermore, we wish to incorporate subject matter knowledge, such as histological subsets, into the data analysis.


View this table:
[in this window]
[in a new window]

 
Table 1. Schwannoma data

 
High-dimensional data such as the schwannoma data are becoming quite common in biomedical applications. Preliminary analyses indicated that the 23 variables are highly correlated, making multivariate regression techniques unattractive. One multivariate approach to classification analysis is the use of latent class modeling. Since its introduction decades ago (e.g. Lazarsfeld, 1950Go; McHugh, 1956Go), latent class modeling has been successfully used for classification in the presence of correlated outcomes. It has been applied to the analysis of psychosocial, biomarker, inter-rater, educational testing, quality-of-life, and genetic data (e.g. Bartholomew and Knott, 1999Go; Agresti and Lang, 1993Go; Dayton and Macready, 1988Go; Bandeen-Roche and others, 1997Go; Houseman and others, 2006Go). The latent classes (or clusters) are determined by similar responses to several observed binary, nominal, ordinal, or continuous variables, conditional on class membership. A common assumption is that, conditional on class membership, probabilities of response are homogenous over individuals and variables are independent. Thus, the unobservable latent classes are assumed to account for the observed association in the variables.

Constrained likelihood methods of parameter estimation have been developed when the number of observed variables is a substantial fraction of the number of observations and local identifiability of model parameters is questionable. Use of equality and inequality constraints of class-specific and latent class parameters enables a parsimonious representation of the latent class model. For example, some authors have set conditional probabilities given latent class membership to be equal for certain variables (Lazarsfeld and Henry, 1968Go; Hoijtink, 1998Go; Mooijaart and Van der Heijden, 1992Go), and others have constrained them to increase or decrease across classes (Agresti and Lang, 1993Go; Croon, 1990Go). This approach is appealing when there are natural constraints for the particular application. However, when the natural constraints are not absolute, this approach offers no flexibility.

Houseman and others (2006)Go developed a penalized approach to categorical latent class parameter estimation for binary data. This allowed them to fit an otherwise unidentifiable latent class model with nominal latent classes to moderate-dimensional genetic data. Such a model has not yet been developed for ordinal data, which are quite common in biomedical applications such as the aforementioned schwannoma study. Penalization is preferable to the use of explicit parameter constraints in the absence of subject matter knowledge that yields natural restrictions on the probabilities. Even in the presence of natural constraints, the approach allows for more flexibility and large penalties can mimic the application of constraints.

Histological classification for different tumor types uses both a binary system (present, absent) and quantitative 3-level systems (mild, moderate, marked or few, some, many) depending on the tumor type and the feature. The way to determine the best system is to see which is best at predicting the correct diagnosis by comparing the prediction to the gold standard diagnosis. Many histological classifications evolve with time as researchers compare different scoring systems. The ordinal scoring considered for the schwannoma data performed well at predicting the correct diagnosis, providing a biological rationale for the treatment of latent classes as ordinal. From a statistical point of view, it is natural to consider ordinal latent classes when considering ordinal variables. Agresti and Lang (1993)Go commented that, "When observed categorical scale is ordinal, one can further improve model parsimony and obtain simpler interpretations by fitting latent class models that utilize the ordinality." As the motivating schwannoma data contain a large number of variables relative to subjects, it was not possible to fit an unrestricted, locally identifiable, ordinal latent class model with more than 2 classes using the approach of Agresti and Lang (1993)Go. We encountered problems with the estimation of parameters for variables for which not every ordinal level was represented. Moreover, the penalized latent class model developed by Houseman and others (2006)Go could not be fit to the dichotomized histological scores, due to the loss of information resulting from dichotomizing the ordinal variables. As a result, the schwannoma data require a new technique for latent class modeling that enables stable parameter estimation. In this paper, we develop a penalized estimation method that accommodates both ordinal and categorical variables in conjunction with ordinal latent classes, which are typically assumed when the observed variables are ordinal, and is estimable even in the presence of missing ordinal levels.

Section 2 provides an introduction to latent class models for nominal categorical data. Section 3 presents the latent class model for ordinal data using the linear-by-linear latent class model proposed by Agresti and Lang (1993)Go. Section 4 proposes a penalized latent class model for ordinal data. Section 5 discusses reparameterization, penalty selection, and selection of starting values for the optimization algorithm. Section 6 gives an analysis of the schwannoma data. Section 7 is a concluding discussion.


    2. LATENT CLASS MODEL FOR UNORDERED CATEGORICAL VARIABLES
 TOP
 SUMMARY
 1. INTRODUCTION
 2. LATENT CLASS MODEL...
 3. LATENT CLASS MODEL...
 4. PENALIZED ORDINAL LATENT...
 5. ORTHOGONAL...
 6. APPLICATION
 7. DISCUSSION
 FUNDING
 REFERENCES
 
For each subject i, i = 1,2,...,N, we observe M categorical variables (Yi1,...,YiM). In the latent class literature, these variables have been called observed, indicator, manifest, item, or response variables. This discussion will adhere to the term "variables." We refer to the realization of a variable as a "response." Let {eta}j denote probability of membership of a subject to unobserved latent class j. Let Ki denote the latent class to which subject i belongs, with Ki taking values from 1,...,J. Note that {eta}j = P(Ki = j) and {sum}Formula{eta}j = 1. The variables Yim take values from {1,...,Cm}, where Cm ≥ 2. We denote the probability distribution of Yim given latent class as {pi}mj, with {pi}mj(c) = P(Yim = c|Ki = j). Note that we could assume a regression model for this probability to allow for nonexchangeable subjects (Bandeen-Roche and others, 1997Go), but to simplify the presentation we do not do so. For unordered categorical variables, we can parameterize the {pi}mj(c) as

Formula (2.1)

The ß's are unknown latent class-specific parameters whose collection is denoted as ß = (ß11',...,ßMJ'), where each ßmj' is a vector of length Cm – 1 (one parameter for each of Cm – 1 possible responses). Let

Formula

Based on the standard assumption that within latent class, variables are independent, the joint probability of Yi is expressed as

Formula (2.2)

and the log-likelihood for all n observations is

Formula (2.3)

The model can be fitted by maximizing the log-likelihood in (2.3) with respect to the parameters {eta}j and ßmj. The log-likelihood is easily differentiated to obtain the score equations. Setting the right-hand side of each score equation equal to zero yields the estimating equations for the Formula , which can be expressed as functions of the posterior probabilities of latent class membership, given the observed data. The posterior probabilities are

Formula (2.4)

The probabilities in (2.4) can be used for the classification of subjects into latent classes using, for example, highest posterior probability and are calculated in the expectation step of the expectation–maximization (EM) algorithm (Dempster and others, 1977Go). Considering latent class membership as missing data, the score equations can be solved using a variant of the EM algorithm, which involves iterating between these posterior probabilities and maximum likelihood estimates. In the context of latent class models, this estimation approach is explained in Goodman (1974)Go and Bartholomew and Knott (1999)Go.


    3. LATENT CLASS MODEL FOR ORDINAL RESPONSES
 TOP
 SUMMARY
 1. INTRODUCTION
 2. LATENT CLASS MODEL...
 3. LATENT CLASS MODEL...
 4. PENALIZED ORDINAL LATENT...
 5. ORTHOGONAL...
 6. APPLICATION
 7. DISCUSSION
 FUNDING
 REFERENCES
 
One way to model ordinal variables is via an adjacent categories logit model. This model specifies a common effect of response level across latent classes and is computationally simpler than alternative models as it does not involve cumulative probabilities. The "linear-by-linear latent class model" implies a stochastic ordering of response distributions for ordered classes (Agresti and Lang, 1993Go):

Formula (3.1)

where sj is a score corresponding to class j. In the absence of any other motivation for class scores, in this paper we take sj = j. Conditional on class, {alpha}mc + {delta}m is the log-odds of response level c relative to c + 1 in latent class 1. {delta}m is the log-odds ratio of response level c versus c + 1 for a unit increase in ordinal latent class score sj for variable m. It follows from (3.1) that

Formula (3.2)

This model is more parsimonious than that given in (2.1) because the probabilities are functions of parameters, {delta}m and {alpha}mc, that do not depend on latent class. In the ordinal setting, the probabilities relate to latent class only through the class score sj. Although sj = j throughout this paper, if different subject matter considerations arose, another possibility would be to allow for unequal spacing of latent classes through the score sj. It is possible to allow sj to vary between 0 and 1. An example is to standardize the latent class score, sj = (j – 1)/(J – 1). In this case, the scale of {delta}m is the same for any chosen J, thereby making the penalty parameter roughly comparable across models with different numbers of classes. This model formulation can accommodate a combination of ordinal and binary variables, such as in the schwannoma example. The log-likelihood contribution for the ith subject for a latent class regression model with M ordinal and binary variables is written as

Formula (3.3)

As for the case of unordered categorical variables, the EM algorithm, with a Newton–Raphson maximization step, can be used to solve for {alpha}, {delta}, and {eta}, treating the latent variable indicators, ki, as missing.


    4. PENALIZED ORDINAL LATENT CLASS MODEL
 TOP
 SUMMARY
 1. INTRODUCTION
 2. LATENT CLASS MODEL...
 3. LATENT CLASS MODEL...
 4. PENALIZED ORDINAL LATENT...
 5. ORTHOGONAL...
 6. APPLICATION
 7. DISCUSSION
 FUNDING
 REFERENCES
 
A potential drawback of the ordinal latent class model for high-dimensional data is its large number of parameters. For example, if Cm = C, there are Mx(C – 1) intercepts, M slopes, and J 1 prevalence parameters. The schwannoma data contain 16 ordinal variables with C = 3 (the 2 highest categories were collapsed to give C = 3 since very few values in the highest category were actually observed) and 7 binary variables (C = 2), resulting in 64 parameters for a 3-class model. Not surprisingly, with only 84 subjects we could not obtain a fit using the ordinal latent class model of Agresti and Lang (1993)Go. Estimation of {alpha} for variables in which not every response level was observed was particularly problematic in the unpenalized version of the model. Thus, when M is a substantial fraction of n, we propose to maximize a penalized log-likelihood of the form

Formula (4.1)

where C({alpha},{delta},{Lambda}1,{Lambda}2) is a penalty that depends on {delta} and {alpha}, {Lambda}1 is a diagonal penalty matrix for the {alpha} parameters, and {Lambda}2 is a diagonal penalty matrix for the {delta} parameters. Specifically, we consider a ridge penalty of the form C({alpha},{delta},{Lambda}1,{Lambda}2) = {alpha} '{Lambda}1{alpha} + {delta} '{Lambda}2{delta}. Maximization of (4.1) directly penalizes {alpha} and {delta} from the model in (3.1), with the goal of biasing these estimates toward the null.

Since a 1-unit change in response level on a binary variable is considered to be qualitatively different from a 1-unit change in response level on an ordinal variable, we differentially penalize ordinal and binary variables. Thus, we allow the diagonal elements of {Lambda}1 and {Lambda}2 to differ according to which type of variable that diagonal element penalizes. In the presence of ordinal outcomes, we consider separate penalties for the intercepts and slopes from (3.1). The ridge penalty produces an additional term in the score functions for the unpenalized likelihood represented as (Formula ):

Formula

where Formula , and {lambda}1mjc and {lambda}2mj represent the diagonal smoothing parameters of the penalty matrices for {delta} and {alpha}, respectively. For computational efficiency, we set the penalties to be constant for all j and c. Since we penalize ordinal and binary variables differently, let {lambda}1mo,{lambda}2mo and {lambda}1mb,{lambda}2mb be the respective diagonal penalty parameters of {Lambda}1 and {Lambda}2 that penalize the ordinal and binary variables. The penalties are allowed to differ for ordinal versus binary variables but are the same for all ordinal and for all binary variables, respectively, so we now drop the "m" subscript. The reparameterization in Section 5 will require a modest expansion of this notation. The penalized likelihood is maximized by an adaptation of the EM algorithm. Paralleling the maximum likelihood strategy developed by Bandeen-Roche and others (1997)Go, the estimation technique includes a Newton–Raphson step for maximization:

  1. Fix J,{lambda}Formula,{lambda}Formula,{lambda}Formula,{lambda}Formula and compute posteriors Formula for each subject i.
  2. Apply Newton–Raphson to obtain Formula using a weighted adjacent categories logit model with Formula as weights.
  3. Set Formula equal to Formula .
  4. Iterate Steps 1 through 3 until a convergence criterion is met.
Conditional on the penalties {lambda}Formula,{lambda}Formula,{lambda}Formula,{lambda}Formula and the number of classes, J, an estimate of the variance of the estimator Formula is obtained by a Taylor expansion. Fixing the penalty, the variance is HFormulanVHFormula, where Hc is the Hessian matrix of Lc and V is the asymptotic variance of the score component Formula , which is estimated empirically from the data, as in Houseman and others (2006)Go.


    5. ORTHOGONAL REPARAMETERIZATION, PENALTY SELECTION, AND STARTING VALUES
 TOP
 SUMMARY
 1. INTRODUCTION
 2. LATENT CLASS MODEL...
 3. LATENT CLASS MODEL...
 4. PENALIZED ORDINAL LATENT...
 5. ORTHOGONAL...
 6. APPLICATION
 7. DISCUSSION
 FUNDING
 REFERENCES
 
In the schwannoma study, there is a natural grouping of the histological variables into 9 broad features of tumor cells. Six features contain ordinal and 3 contain binary variables, as seen in Table 1. Because of this structure, we choose an orthonormal reparameterization directly to contrast these broad features and apply the penalty to the reparameterized parameters, allowing important feature contrasts to be penalized less than detail contrasts. In the strictly binary setting, such a reparameterization improved the predictive power of the model (Houseman and others, 2006Go). It is desirable to leave a term unpenalized so that classes can be distinguished in mean response, that is, so that there is a parameter that represents the average {alpha} and {delta} over all m. This is accomplished by allowing the first vector of the orthonormal matrix to be a vector of constants. Hereafter, the term "feature contrasts" refers to vectors of the orthogonal contrast matrix that contrast the 9 "groups" of variables or features and the term "detail contrasts" refers to the supplemental vectors of the MxM orthonormal matrix. To motivate the transformation, first note that (3.2) can be written as

Formula (5.1)

where xm is a unit vector of length M with 1 in the mth place, {alpha}c is a length M vector, and {delta} is the length M vector ({delta}1,...,{delta}M)'. In order to implement penalization, we generalize x to include any set of vectors {xm} such that xFormulaxl = 1 if m = l and 0 otherwise. Let U = (x1,...,xM) represent an orthonormal matrix of dimension MxM such that U ' U = I. The matrix U can be chosen such that its elements represent contrasts between the features of interest. In the presence of mixed variable types as in the schwannoma example, to accommodate M1 ordinal and M2 binary variables, we allow U to have a block diagonal structure, where the first block contains M1 orthogonal contrasts and the second contains M2 orthogonal contrasts. More specifically, the first block of U contains the 6 orthogonal contrasts of interest for ordinal variables (an "intercept" term that allows for a mean response plus 5 feature contrasts), which are supplemented using Gram–Schmidt orthogonalization with 10 detail contrasts, for a total of M1 = 16 vectors. Similarly, the second block of U contains the 3 orthogonal contrasts of interest for binary variables, which are supplemented with 4 detail contrasts. Penalization is then performed on the transformed parameters, U{alpha} and U{delta}. The penalty term is then C({alpha},{delta},{Lambda}1,{Lambda}2) = {alpha} ' U '{Lambda}1U{alpha} + {delta} ' U '{Lambda}2U{delta}, where the feature-based parameterization of the linear model allows us further to decompose {Lambda}1 and {Lambda}2 in order differentially to penalize features and details, and ordinal and binary variables. To illustrate this, let the diagonal penalty matrix Formula , where Formula is an M1xM1 diagonal matrix with ones on the diagonals corresponding to ordinal feature contrasts and zeros elsewhere, Formula is an M2xM2 diagonal matrix with ones on the diagonals corresponding to the ordinal detail contrasts and zeros elsewhere. Similarly, Formula is a diagonal matrix with ones on the diagonals corresponding to binary feature contrasts and zeros elsewhere and Formula is a diagonal matrix with ones on the diagonals corresponding to the binary detail contrasts and zeros elsewhere. We can decompose the penalty matrix for {delta} in the same way; Formula . If aFormula < aFormula and aFormula < aFormula , then the penalty matrix {Lambda}1 constrains {alpha} for the detail contrasts more than the feature contrasts. The same logic holds for aFormula < aFormula, aFormula < aFormula, {Lambda}2, and {delta}.

Because we penalize separately intercepts versus slopes, features versus details, and ordinal versus binary variables, there are a total of 8 penalties, which necessitates a careful search methodology in the interest of computational efficiency. To accomplish this, we first fit the model with only ordinal data, which requires that we search over 4 penalty parameters. Treating as fixed the optimal penalties obtained for the ordinal variables, we then fit a full model including the binary data and search for the remaining penalties. In our experience, we have found that results depend little on the exact value of the penalties and more on their order of magnitude.

The choice of smoothing parameter(s) can be informed by using an analog of the Bayesian Information Criterion (BIC) adapted to the present latent class setting: BIC = tr(HFormulanV)log(n) – 2L({eta},{alpha},{delta}). This criterion penalizes models with more parameters, which are calculated as an effective degrees of freedom, df = tr(HFormulanV), where V is an estimate of the asymptotic variance of the score component. In high dimensions, nV should be estimated empirically from the data. BIC has previously been used in latent class model selection for non-nested models, such as those with varying J (Lin and Dayton, 1997Go), and tends to give consistent results. Houseman and others (2005) showed that BIC was superior to several other information criteria in selecting the number of latent classes. Thus, we use BIC in the schwannoma application for selecting both the number of classes and the smoothing parameters.

Several authors have addressed the sensitivity of the maximum of the log-likelihood, L, in (4.1) to the choice of initial values supplied to the optimization algorithm (Bandeen-Roche and others, 1997Go; Huang and Bandeen-Roche, 2004Go). The model proposed in Section (3.1) is not globally identifiable and in our experience, the EM algorithm may reveal different maxima for different starting values. We have found via simulation that when there is large separation between class-specific probabilities, it is usually sufficient to obtain reasonable estimates for class-specific probabilities from a univariate adjacent categories logit regression of observed variables Yi against an estimate of latent class membership Ki. To estimate Ki, we have had success using various clustering algorithms including K-means and hierarchical clustering. In this paper, we use a hierarchical clustering technique called Divisive Analysis (DIANA), which is a part of the R library "cluster." For each of these algorithms, the number of clusters is set to the number of classes, J, for the model we wish to fit. Posterior weights are input directly into the EM algorithm by slightly perturbing the cluster assignment. In the case where the log-likelihood from (4.1) is flat, that is, where conditional probabilities do not vary markedly over ordered latent classes, several EM iterations using starting values obtained from different clustering procedures should be used to assess convergence to a local maximum, as different starting values often converge to different maxima. The proper solution can be determined by observing to which solution the algorithm converges for various starting values and noting whether the Hessian matrix of the penalized likelihood, Hc, is positive definite as this indicates whether that model is identifiable at this point in the parameter space. For the schwannoma application, we used this suggested method. A small simulation study was conducted to study the behavior of our proposed methodology in two simulation scenarios. Results are presented in the supplementary material (http://www.biostatistics.oxfordjournals.org). In these simulations, the trend in mean square error demonstrates the bias-variance tradeoff, as controlled by the smoothing parameter. We also demonstrate the benefit of the ridge penalty when the effect size differs across observed variables, which is similar to the situation observed in the schwannoma data.


    6. APPLICATION
 TOP
 SUMMARY
 1. INTRODUCTION
 2. LATENT CLASS MODEL...
 3. LATENT CLASS MODEL...
 4. PENALIZED ORDINAL LATENT...
 5. ORTHOGONAL...
 6. APPLICATION
 7. DISCUSSION
 FUNDING
 REFERENCES
 
In this section, we consider data on 84 peripheral schwannomas obtained from 59 patients previously characterized clinically and by molecular analysis as sporadic, NF2-associated, and schwannomatosis-associated tumors (Poliani and others, 2006Go). The tumors were obtained from the Neurofibromatosis clinic at Massachusetts General Hospital. Twenty-six ordinal and binary histological variables were measured in the schwannomas and 23 variables (16 ordinal and 7 binary, see Table 1) showed enough variation to be considered in our analysis.

It is desirable to fit a 3-class model since there are 3 schwannoma diagnoses of interest. The penalized latent class model from Houseman and others (2006)Go with J = 3 could not be fitted to the binary version of the schwannoma data (with responses of 1, 2, or 3 combined into one level). An adequate solution could not be obtained for the several starting values that were supplied. Instead, results implied that either a 1- or 2-class model was more appropriate. We also applied the unpenalized ordinal latent class model to the data and could not obtain a fit. Again, despite several initial values supplied to the algorithm, an adequate 3-class model with positive-definite Hessian matrix could not be obtained. Thus, not only was it essential fully to utilize the ordinality of the variables in order to fit the desired 3-class model but also penalization in the context of ordinal variables was necessary to obtain a fit. Applying a ridge penalty of 0.0001 for features and 0.001 for details to intercept and slope parameters enabled a sensible ordinal latent class model to be fitted to the schwannoma data.

We considered penalized models with J = 2 and J = 3. Ordinal levels 0, 1, and 2 in the following analysis refer to 0, 1, and (2 or 3) of that particular trait. The 7 binary variables corresponded to a yes or no (1 or 0, respectively) for the presence of the binary characteristics represented in Table 1. As shown in Table 1, the histological characteristics of schwannomas are grouped into 9 broader categories of histological features including growth pattern, malformed or abnormal blood vessels, inflammation, trapped axons, as well as an "other" category containing 5 features: nerve edema, nerve inflammation, intraneural growth pattern, and presence of protein pools and cysts. We contrasted these features as described in Section 5. Thus, C = 3 for ordinal and C = 2 for binary variables. In order to stabilize estimation, we penalized both the intercepts and the slopes in the context of the feature-based parameterization from Section 5. Feature contrasts were penalized less than detail contrasts, requiring a grid search over the feature and detail penalty parameters for {alpha} and {delta}: aFormula, aFormula, aFormula, and aFormula, where aFormula < aFormula and aFormula < aFormula. For features, we considered (0.0001, 0.001, 0.005, 0.01, 0.05, 0.1, 0.2, 0.5, 1, 2) and for details, we considered (0.001, 0.005, 0.01, 0.05, 0.1, 0.2, 0.5, 1, 2, 5), where features were constrained to be less than detail penalties for both ordinal and binary variables. Given the trade-off between accuracy and computation time, using a coarse grid of values to search for the smoothing parameters was more efficient than performing an exhaustive search since we have found that the shrinkage depends more on the magnitude of penalty parameters than the exact values. We chose the optimal penalty parameter by minimizing BIC.

In the first stage of model fitting, we obtained optimal penalty parameters by fitting a penalized latent class model to the ordinal data. Conditional on these penalties, in a second stage we performed the same grid search for the penalty parameters for the binary variables, with the ordinal variables included in the model. The first-stage model fitted only to the ordinal data implied that a 3-class model was a better fit and that small penalties for features and details were optimal to stabilize estimation; that is, aFormula = aFormula = 0.0001 and aFormula = aFormula = 0.001. Holding these penalties constant, the penalized model fitted to both ordinal and binary variables yielded a similar result where the 3-class model was superior to the 2-class model (lowest 3-class BIC = 2605.9, df = 58.0 and lowest 2-class BIC = 2848.9, df = 97.0) and where the optimal penalty parameters for the binary variables were aFormula = aFormula = 0.0001 and aFormula = aFormula = 0.001. This result is supported by the case 2 simulations in the Supplementary Material (available at Biostatistics online) as the bias introduced by larger penalties becomes too severe when the effect of observed variables on classes differs.

The conditional probabilities of response levels 2 and 3 for the optimal 3-class model are illustrated in the first panel of Figure 1 and probabilities of response level 0 are illustrated in the second panel. The class prevalences are 0.26, 0.38, and 0.36 for classes labeled in the figures as 1, 2, and 3, respectively. Figure 1 shows that membership in the third class is evidenced by a higher conditional probability of observing a response of 2 or 3 than the other 2 classes, for several of the variables. The variables that lead to this distinction include the 5 variables in the malformed or abnormal blood vessels group: hemosiderin, thrombosis, clustered-small blood vessels, clustered-medium/large blood vessels, and hyalinized blood vessels, as well as the presence of cysts. A similar, but reversed pattern is seen in the second panel of Figure 1 where class 3 is least likely to exhibit none of those features, class 2 is more likely, and class 1 is most likely. The back-transformed {delta}m's (and standard errors as obtained from the formula in Section 4) which demonstrate the strength of association between observed and latent variables are, respectively, – 1.05(0.307), – 1.42(0.351), – 1.16(0.278), – 1.14(0.235), – 1.81(0.348), and – 1.36(0.266). It is notable that the results imply a class distinction that is at least partially driven by the 5 histological variables in the "malformed or abnormal blood vessels" feature group. Class 2 has a very low probability of response 2 or 3 for these variables (~0.07 or less for all variables), while these histological variables are not present at all in subjects belonging to class 1 (~0 probability for all variables).


Figure 1
View larger version (16K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1. Conditional histological probabilities of response for ordinal variables among schwannoma patients—best fit with ridge penalty.

 
Overall, 66 out of 84 patients had posterior probabilities (2.4) greater than 0.80 and 73 out of 84 patients had posterior probabilities greater than 0.70; thus in general, most subjects belonged to a latent class with very high posterior probability. To determine how well the above-mentioned variables predict latent class assignment, it was natural to look at the posterior probabilities for those who had these present and for those who did not. For the 10 subjects who scored greater than 2 on 3 or more of the 6 symptoms that most distinguished latent class membership, their posterior probabilities of membership in class 3 ranged from 0.92 to 1.0, and 8 of these actually had a probability 1.00. Similarly, for the 15 subjects who did not have any of these features present, 9 had a posterior probability of 1.00 of belonging to latent class 1, 2 had probabilities 0.67 and 0.76, and 4 were more likely to be classified into the second latent class. Thus, there would be a low misclassification rate if one were to base classification on either the presence or the absence of these 6 variables.

Figure 2 shows the conditional probability of response for each of the binary variables. While low probabilities of bundled and intraneural axons were indicative of membership in class 1, there was no feature group that clearly distinguished latent classes.


Figure 2
View larger version (9K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 2. Conditional histological probabilities of response for binary variables among schwannoma patients—best fit with ridge penalty.

 
We checked the validity of this result by fitting the model with only the 16 ordinal variables and used these results to analyze the binary variables. A weighted univariate logistic regression was considered with the binary variable as the dependent variable, ordinal latent class as the independent variable, and using as weights the posterior probabilities of latent class membership obtained from the ordinal only model. All resulting logistic regression coefficients were insignificant at the 0.10 type I error rate, suggesting that these variables (peripheral, intratumoral, splayed, and bundled axons; nerve edema, inflammation, and intraneural axons) are not associated with the class structure determined by the 16 ordinal variables. Thus, it is not surprising that when all variables are included, the binary variables are generally unrelated to class structure.

Subjects were assigned to classes based on their highest posterior probabilities, and we cross-tabulated these with the clinical diagnosis of the 3 subsets of schwannoma in Table 2. The table indicates that the latent class methodology was best at distinguishing NF2 tumors from the other types but had difficulty discerning schwannomatosis from sporadic tumors. We note that 13 out of 21, or 62%, of those diagnosed with NF2-associated tumors were assigned to class 1; thus, subjects with NF2 had a high probability of exhibiting the blood vessel feature group (Figure 1). Coincidentally, 62% of those assigned to class 1 had NF2 tumors. Classes 2 and 3 were indistinguishable from each other with regard to their compositions of clinical diagnoses. Only 4 out of 32 class 2 and 4 out of 31 class 3 tumors were NF2. Formally, to quantify how well latent classes correlate with clinical subsets, we performed a weighted adjacent categories regression with imputed latent class as the outcome and diagnosis as the predictor. Included in the model were 3 rows for each tumor, corresponding to diagnosis, each weighted by that tumor's posterior probability of latent class membership. Results indicated that diagnosis is significantly associated with latent class (p = 0.002). While latent classes do not correlate perfectly with clinical diagnosis, the latent class methodology does provide histological profiles of schwannoma patients that offer an additional tool for the diagnosis of a disease that is difficult to diagnose.


View this table:
[in this window]
[in a new window]

 
Table 2. Cross-classification of latent class membership with clinical diagnosis

 

    7. DISCUSSION
 TOP
 SUMMARY
 1. INTRODUCTION
 2. LATENT CLASS MODEL...
 3. LATENT CLASS MODEL...
 4. PENALIZED ORDINAL LATENT...
 5. ORTHOGONAL...
 6. APPLICATION
 7. DISCUSSION
 FUNDING
 REFERENCES
 
We have shown that the introduction of a ridge penalty to the feature-based parameterization of class-specific response probabilities allows us to fit an ordinal latent class model to the high-dimensional schwannoma data, which cannot be fitted by unconstrained methods for ordinal or binary variables. In the context of ordinal latent class models, the ridge penalty results in a continuous variable selection procedure without the need to impose unrealistic or inflexible constraints on coefficients as employed by existing ordinal latent class models. We note that constraining all variables to be equally associated with latent class, which has been used by several researchers as the constraint of choice for high-dimensional data (Hoijtink, 1998Go; Meulders and others, 2002Go; Agresti and Lang, 1993Go), is analogous to applying a large penalty to the transformed {delta}m's with unpenalized intercepts. Agresti and Lang (1993)Go reported results of a constrained ordinal latent class model that was fitted to inter-rater agreement data where M = 7, C = 3, and J = 2. In that analysis, {delta}m's were constrained to be equal across all 7 binary variables (raters). Our penalized likelihood approach applied to their data achieved the same results by setting {lambda}Formula = 5 and leaving {alpha} unpenalized. The advantage of the penalized likelihood method, however, is that we can allow the data, rather than the researcher, to drive the level of smoothing necessary to obtain a fit. This way, strong assumptions regarding observed variables are not required to obtain an adequately fitting, parsimonious model for high-dimensional data.

In Section 3, we introduced a latent class score sj into 3.1. We used sj = j and considered testing its appropriateness by fitting a penalized latent class model with categorical latent class and ordered variables. This corresponds to unequal spacing between latent classes and would be the most relaxed score possible. We fitted the categorical latent class model for the dichotomized ordinal outcomes but could not achieve convergence for this model. A model which used ordinal outcomes and categorical latent classes would have many more parameters. We therefore decided, for both substantive and pragmatic reasons, to use a linear score.

We have addressed some challenges in choosing the smoothing parameter for the {delta}m coefficients obtained from the model described in Section 4, although some still remain. Latent class literature has addressed joint modeling of binary, nominal, ordinal, and continuous variables (e.g. Moustaki and Papageorgiou, 2005). In biomedical applications, qualitatively different ordinal and nominal variables may be measured. For example, for some schwannoma patients, nominal genotypic data may be available, and NF-2 tumors are thought to be associated with genotype. Our penalized method handles all types of categorical data and applies different penalties to binary versus ordinal data. The problem of how to choose appropriate penalties for nominal parameter estimates, as well as how to relate an ordinal latent class to nominal variables, remains for future investigation. Furthermore, it would be interesting to apply different penalties to parameters representing different response levels. For fitting models with J > 3 in which we allow a regression of fixed covariates on the latent class indicator (as in Bandeen-Roche and others, 1997Go), it might also be beneficial to consider a penalty on the resulting {eta}j or prevalence parameter. In our application, a sensitivity analysis showed that the choice of penalty parameter does not drastically affect inference or classification.

Our latent class modeling was a form of unsupervised clustering. Following estimation of the model, we evaluated the association between the derived estimated latent classes and clinical diagnosis. Alternatively, a supervised clustering approach would be of interest when the goal is to use the variables to improve prognosis. We could incorporate the clinical outcome into the latent class model along the lines of Larsen (2004). In this setting, latent class and clinical endpoint conditional on latent class would be jointly modeled. We could accommodate the high dimensionality with a penalty on the parameters resulting from both the latent class part of the model and the survival part of the model. To model the survival outcome, we could consider a Cox model or even a general class of transformation models (Cheng and others 1995).

Inferences from Table 2 hinge on the idea that the "true" diagnosis of these patients is known and that there is no misclassification by physicians. We thus remark that the clinical criteria used for classification into the 3 clinical subsets were very strict. The NF2 cases satisfied the strictest criteria for clinical diagnosis of NF2 (there are currently 4 different criteria sets and the most strict ones were used). Similarly, schwannomatosis and sporadic cases satisfied the most strict criteria for clinical diagnosis. They were identified using thin MRI, which is the most sensitive method available. While there is a remote possibility that a patient was misclassified, the techniques and strict criteria used are the best classification based on the available clinical methods to date.

Overall, we have demonstrated that penalized latent class models are effective for classification based on a large number of outcomes and a small to moderate number of subjects, even though they may not correlate well with clinical diagnosis. They are appealing to subject matter investigators for their clear interpretations. Finally, as they are model based, they allow for formal comparisons and inference.


    FUNDING
 TOP
 SUMMARY
 1. INTRODUCTION
 2. LATENT CLASS MODEL...
 3. LATENT CLASS MODEL...
 4. PENALIZED ORDINAL LATENT...
 5. ORTHOGONAL...
 6. APPLICATION
 7. DISCUSSION
 FUNDING
 REFERENCES
 
The National Institutes of Health (CA075971 [GenBank] ); the Schwannomatosis Award by the Children's Tumor Foundation; the National Institutes of Health/National Institute of Neurological Disorders and Stroke (P01NS24279-18).


    ACKNOWLEDGMENTS
 
Conflict of Interest: None declared.


    REFERENCES
 TOP
 SUMMARY
 1. INTRODUCTION
 2. LATENT CLASS MODEL...
 3. LATENT CLASS MODEL...
 4. PENALIZED ORDINAL LATENT...
 5. ORTHOGONAL...
 6. APPLICATION
 7. DISCUSSION
 FUNDING
 REFERENCES
 

    Agresti A, Lang J. Quasi-symmetric latent class models, with application to rater agreement. Biometrics (1993) 49:131–139.[CrossRef][Web of Science][Medline]

    Bandeen-Roche K, Miglioretti DL, Zeger SL, Rathouz PJ. Latent variable regression for multiple discrete outcomes. Journal of the American Statistical Association (1997) 92:1375–1386.[CrossRef][Web of Science]

    Bartholomew DJ, Knott M. Latent Variable Models and Factor Analysis (1999) London: Arnold.

    Baser ME, Friedman JM, Evans DGR. Increasing the specificity of diagnostic criteria for schwannomatosis. Neurology (2006) 66:730–732.[Abstract/Free Full Text]

    Burnham KP, Anderson DR. Multimodel inference: understanding AIC and BIC in model selection. Sociological Methods Research (2004) 33:261–304.[Abstract]

    Cheng SC, Wei LJ, Ying Z. Analysis of transformation models with censored data. Biometrika (1995) 82:835–845.[Abstract/Free Full Text]

    Croon M. Latent class analysis with ordered latent classes. British Journal of Mathematical and Statistical Psychology (1990) 43:171–192.

    Dayton CM, Macready GB. Concomitant-variable latent-class models. Journal of the American Statistical Association (1988) 83:173–178.[CrossRef][Web of Science]

    Dempster CM, Laird NM, Rubin DB. Maximum likelihood from incomplete data via EM algorithm. Journal of the Royal Statistical Society, Series B (1977) 39:1–38.

    Goodman LA. Exploratory latent structure-analysis using both identifiable and unidentifiable models. Biometrika (1974) 61:215–231.[Abstract/Free Full Text]

    Hoijtink H. Constrained latent class analysis using the Gibbs sampler and posterior predictive p-values: applications to educational testing. Statistica Sinica (1998) 8:691–711.[Web of Science]

    Houseman EA, Coull BA, Betensky RA. Feature-specific constrained latent class analysis for genomic data. Biometrics (2006) 62:1062–1070.[CrossRef][Web of Science][Medline]

    Huang GH, Bandeen-Roche K. Building an identifiable latent class model with covariate effects on underlying and measured variables. Psychometrika (2004) 69:5–32.[CrossRef][Web of Science]

    Larsen K. Joint analysis of time-to-event and multiple binary indicators of latent classes. Biometrics (2004) 60:85–92.[CrossRef][Web of Science][Medline]

    Lazarsfeld PF. Measurement and Prediction (1950) Princeton, NJ: Princeton University Press.

    Lazarsfeld PF, Henry NW. Latent Structure Analysis (1968) New York: Houghton-Mifflin.

    Lin TH, Dayton CM. Model selection information criteria for non-nested latent class models. Journal of Educational and Behavioral Statistics (1997) 22:249–264.[Abstract/Free Full Text]

    MacCollin M, Chiocca EA, Evans DG, Friedman JM, Horvitz R, Jaramillo D, Lev M, Mautner VF, Niimura M, Plotkin SR, et al. Diagnostic criteria for schwannomatosis. Neurology (2005) 64:1838–1845.[Abstract/Free Full Text]

    McHugh TB. Efficient estimation and local identification in latent class analysis. Psychometrika (1956) 21:331–347.[CrossRef][Web of Science]

    Meulders M, Boeck PD, Kuppens P, Mechelen IV. Constrained latent class analysis of three-way three-mode data. Journal of Classification (2002) 19:277–302.[CrossRef][Web of Science]

    Mooijaart A, Van Der Heijden PGM. The EM algorithm for latent class analysis with equality constraints. Psychometrika (1992) 52:261–269.

    Moustaki I, Papageorgiou I. Latent class models for mixed variables with applications in Archaeometry. Computational Statistics and Data Analysis (2005) 48:659–675.[CrossRef]

    Poliani PL, Desantis S, Betensky RA, Hurwitz E, Nutt C, MacCollin M, Stemmer-Rachamimov AO. Clinicopathological correlates of schwannomas: defining the pathological features of sporadic, nf2-associated and schwannomatosis-associated schwannomas. Journal of Neuropathology and Experimental Neurology (submitted) (2006).

    Received July 10, 2006; revised December 18, 2006; revised April 16, 2007; accepted for publication May 9, 2007.


    Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?



    This Article
    Right arrow Abstract Freely available
    Right arrow FREE Full Text (PDF) Freely available
    Right arrow Supplementary Material
    Right arrow All Versions of this Article:
    9/2/249    most recent
    kxm026v1
    Right arrow Alert me when this article is cited
    Right arrow Alert me if a correction is posted
    Services
    Right arrow Email this article to a friend
    Right arrow Similar articles in this journal
    Right arrow Similar articles in PubMed
    Right arrow Alert me to new issues of the journal
    Right arrow Add to My Personal Archive
    Right arrow Download to citation manager
    Right arrowRequest Permissions
    Right arrow Disclaimer
    Google Scholar
    Right arrow Articles by Desantis, S. M.
    Right arrow Articles by Betensky, R. A.
    Right arrow Search for Related Content
    PubMed
    Right arrow PubMed Citation
    Right arrow Articles by Desantis, S. M.
    Right arrow Articles by Betensky, R. A.
    Social Bookmarking
     Add to CiteULike   Add to Connotea   Add to Del.icio.us  
    What's this?