Skip Navigation


Biostatistics Advance Access first published online on June 1, 2006
This version published online on March 5, 2007

Biostatistics, doi:10.1093/biostatistics/kxl006
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow Supplementary Data
Right arrow All Versions of this Article:
8/2/252    most recent
kxl006v2
kxl006v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Right arrow Disclaimer
Google Scholar
Right arrow Articles by Kang, M.
Right arrow Articles by Lagakos, S. W.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Kang, M.
Right arrow Articles by Lagakos, S. W.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2006. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org.

Statistical methods for panel data from a semi-Markov process, with application to HPV

Minhee Kang* and Stephen W. Lagakos

Department of Biostatistics, Harvard University School of Public Health, 655 Huntington Avenue, Boston, MA 02115, USA mkang{at}hsph.harvard.edu

* To whom correspondence should be addressed.


    SUMMARY
 TOP
 SUMMARY
 1. INTRODUCTION
 2. INFERENCE FOR A...
 3. A THREE-STATE SEMI-MARKOV...
 4. ILLUSTRATIONS
 5. DISCUSSION
 REFERENCES
 
Continuous-time, multistate processes can be used to represent a variety of biological processes in the public health sciences; yet the analysis of such processes is complex when they are observed only at a limited number of time points. Inference methods for such panel data have been developed for time homogeneous Markov models, but there has been little research done for other classes of processes. We develop likelihood-based methods for panel data from a semi-Markov process, where transition intensities depend on the duration of time in the current state. The proposed methods account for possible misclassification of states. To illustrate the methods, we investigate a three- and a four-state models in detail and apply the results to model the natural history of oncogenic genital human papillomavirus infections in women.

Keywords: Human papillomavirus; Misclassification; Multistate process; Natural history


    1. INTRODUCTION
 TOP
 SUMMARY
 1. INTRODUCTION
 2. INFERENCE FOR A...
 3. A THREE-STATE SEMI-MARKOV...
 4. ILLUSTRATIONS
 5. DISCUSSION
 REFERENCES
 
Continuous-time, multistate stochastic processes provide a useful framework for many studies of event history data (Commenges, 1999Go, 2002Go; Hougaard, 1999Go; Andersen and Keiding, 2002Go). Most research in continuous-time, discrete-state processes has been probabilistic, and inference about processes using independent realizations from a group of individuals has been based almost entirely on settings where sample paths are continuously observed (cf. Andersen and Borgan, 1985Go).

However, in many instances, observations consist of the states of the individual processes at discrete time points, with no information about the types and times of events between observation times. For example, when state transitions of a process are silent events—such as the onset of an early stage of a disease before symptoms—the sample paths are observed infrequently, often resulting from diagnostic tests given during patient visits to their caregivers. Inference methods for such panel data from multistate processes have been limited. Kalbfleisch and Lawless (1985)Go proposed methods for the analysis of panel data for Markov models with time homogeneous transition intensities; Kay (1986)Go, Andersen (1988)Go, and Gentleman and others (1994)Go have applied these methods to cancer, diabetes, and HIV. Inference based on Markov models in such settings is greatly simplified, because the discrete-time process observed at prespecified time points forms a Markov process.

In many applications, however, the Markov assumption is not appropriate because the transition intensities depend on the elapsed time in the current state. For instance, in modeling the natural history of human papillomavirus (HPV), which is known to cause almost all cervical cancers, the Markov assumption would not account for the strong association between infection duration and progression to cervical abnormality (Stoler, 2000Go). Sexually acquired genital HPV infections in women are often transient and recurring, and are usually resolved by the host prior to the onset of symptoms. However, epidemiology studies show that when an infection by certain HPV types persists for a long period of time, it can eventually lead to clinical conditions such as high-grade cervical intraepithelial neoplasia (CIN 2 and 3) and, ultimately, cervical cancer (Stoler, 2000Go). When the underlying process is not a time homogeneous Markov process, the observed discrete-time process will in general have a complex structure. A further complication that can arise is that the observations of the states of the process are subject to misclassification.

Motivated by the HPV studies, this paper considers inference for continuous-time semi-Markov processes in settings where sample paths of individuals are observed only at a finite number of prespecified times, possibly with misclassification errors in the observed states. We show that evaluation of likelihood functions can be greatly simplified when the transition intensity from at least one of the states of the underlying process is time homogeneous. Section 2.1 introduces the underlying and the observed processes and outlines the general approach to likelihood methods. Section 3 presents the likelihood contributions in detail for a nonprogressive three-state process. Section 4 applies the proposed methods to a recent HPV trial and a simulated study.


    2. INFERENCE FOR A K-STATE SEMI-MARKOV PROCESS
 TOP
 SUMMARY
 1. INTRODUCTION
 2. INFERENCE FOR A...
 3. A THREE-STATE SEMI-MARKOV...
 4. ILLUSTRATIONS
 5. DISCUSSION
 REFERENCES
 

2.1 Underlying process and observed data

Suppose that X(·) = {X(t),t ≥ 0} denotes a continuous-time process with K states, denoted 1,...,K. Let the random variables {sigma}0,{sigma}1,... denote the initial and subsequent consecutive states occupied by the process, and let {tau}n represent the sojourn time between the (n – 1)th and nth states, for n = 1,... . Thus, X(·) is equivalent to


Formula

The process is semi-Markov if the sequence {{sigma}0,{sigma}1,...} of consecutively occupied states forms a simple Markov chain, and the sojourn times {tau}n between consecutively occupied states are independent random variables with distributions that depend only on the adjoining states (cf. Cox and Miller, 1977Go). The probabilistic properties of a semi-Markov process can be characterized by the transition intensities, or cause-specific hazard functions, among states. Suppose that the process enters state i at its nth transition. Then the transition intensity functions out of state i, say {lambda}ij(·), are


Formula

where t denotes the elapsed time from entrance into state i, for i,j = 1,...,K, i!=j, and n = 1,2,... . Such dependence on the duration (t) in the current state ({sigma}n) makes semi-Markov processes distinct from Markov processes.

The probabilistic properties of semi-Markov processes have been studied extensively (Cox and Miller, 1977Go), and inferences for settings where sample paths are continuously observed or right censored can be made by extensions of methods for ordinary failure time data (cf. Lagakos and others, 1978Go). Inferences for unidirectional, or progressive, processes that lead to interval-censored data have also been developed (Sternberg and Satten, 1999Go). However, we consider settings where the independent realizations of the process X(·) corresponding to different subjects are observed only at a finite number of prespecified time points and the process may be bidirectional (nonprogressive). Consider the values of X(·) at the fixed times 0 = v0 < v1 < v2 < ··· < vM, and let X = (X0,X1,...,XM), where Xm = X(vm). Despite the simple probabilistic form of the underlying semi-Markov process X(·), the joint distribution of X0,X1,...,XM is in general complex because the time points v0,v1,...,vM do not correspond to transition times between states of the process. If the process is not progressive, the states visited in the sample path are not determined by the knowledge of the states occupied at visit times.

Inferences about the parameters {lambda}ij(·) are further complicated by misclassification errors. Rather than X, suppose that the observation for a subject is given by Y = (Y0,Y1,...,YM), where Ymisin{1,2,...,K} denotes Xm subject to misclassification error. We assume that the misclassification error probabilities satisfy the conditional independence assumption


Formula (2.1)

That is, conditional upon the true values of X(·) at the visit times, the distribution of Ym depends only on the value of X(·) at vm. We denote these error probabilities by {alpha}ik = P(Ym = k|Xm = i).

2.2 Likelihood function

The likelihood contribution for an individual can be written as


Formula (2.2)

where xm,ymisin{1,...,K}, cxy = prodFormula {alpha}xm,ym, and the summation is over all possible state sequences.

If the transition intensities from at least one of the K states can be assumed to be time homogeneous—that is, {lambda}ij(t) = {lambda}ij for j = 1,...,K if and only if iisinC, where C is a nonempty subset of {1,...,K}—then P(X = x) in the evaluation of (2.2) is nicely simplified. A consequence of this assumption is that for any times 0 ≤ t0 < t1 < ··· (whether or not these correspond to visit times) and any xmisinC for m = 0,1,...,


Formula (2.3)

where {varphi}(tm) denotes the time at which the process last entered state xm prior to tm. The semi-Markov nature of X(·), which ensures that the future of the process for times greater than {varphi}(tm) depends on the history of the process only through {varphi}(tm) and xm = X({varphi}(tm)), combined with the memoryless property of sojourn times in state xm yields this stationarity property of the process at times when the process is in a state belonging to C. To illustrate how (2.3) can greatly simplify the evaluation of the finite-dimensional distributions of X(·), suppose that C = {1}. Then, for example,


Formula

Thus, instead of having to compute the joint probability of the 8-dimensional vector [X(t0),...,X(t7)], the result in (2.3) reduces this to computing the distribution of one 3-dimensional vector and two 2-dimensional vectors. More generally, if we define,


Formula

then for any positive integer L,


Formula (2.4)

By taking L = M and tm = vm, the result in (2.4) simplifies the calculation of the likelihood contribution in (2.2) by allowing P(X = x) to be expressed as a product of one-step probabilities of the form


Formula (2.5)

and of multistep probabilities of the form


Formula (2.6)

where 0 < {Delta}1 < {Delta}2 < ··· correspond to differences between the original visit times v0,...,vM.

Calculations of the conditional probabilities in (2.5) and (2.6) can be difficult because of the unknown transitions in the underlying sample paths. However, they also can be simplified by applying the result in (2.4) to the probabilities corresponding to the potential underlying sample paths that yield the observed states in (2.5) and (2.6).


    3. A THREE-STATE SEMI-MARKOV MODEL
 TOP
 SUMMARY
 1. INTRODUCTION
 2. INFERENCE FOR A...
 3. A THREE-STATE SEMI-MARKOV...
 4. ILLUSTRATIONS
 5. DISCUSSION
 REFERENCES
 
Consider a three-state semi-Markov model (Figure 1) where states 1 and 2 are transient while state 3 is absorbing and can be entered from either of the other states. As an example, suppose states 1, 2, and 3 represent the infection-free, currently infected, and clinical disease statuses, respectively. Then a subject who is initially uninfected can become infected for a period of time, after which the infection resolves to the infection-free status (1–2–1), or the infection leads to clinical disease (1–2–3). Alternatively, the subject could develop clinical disease from the infection-free state (1–3). For someone beginning in state 1, every sample path will consist of k visits (k ≥ 0) to state 2, followed by a visit to state 1 or 3 (if k ≥ 1).


Figure 1
View larger version (4K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1. A three-state semi-Markov process. States 1 and 2 are transient and recurrent and the absorbing state 3 can be entered from either state.

 
Suppose that individuals begin in state 1 and C = {1}, so that {lambda}1j(·) = {lambda}1j for j = 2,3. The remaining transition intensities, {lambda}2j(·) for j = 1,3, are arbitrary. Let fij(·) denote the subdensity function corresponding to {lambda}ij(·); that is,


Formula

for t ≥ 0, where {Lambda}i(t) = {sum}Formula{int}Formula{lambda}ij(u)du, i!=j. The subdistribution function corresponding to fij(·) is denoted Fij(·), where Fij(t) = {int}Formulafij(u)du.

From (2.3)(2.6), calculation of the likelihood contribution of a subject requires consideration of at most the conditional probabilities,


Formula (3.1)

for j = 1,2,3, where {Delta}i > 0 and 0 < {Delta}1 < {Delta}2 < ··· denote the distinct values of vjvi and vi < vj are the visit times. When visit times are equally spaced, say every {Delta} time units, these conditional probabilities simplify to


Formula

for j = 1,2,3 and m = 1,2,... .

Note that for each observed path, there may be infinitely many underlying sample paths. For example, for the observed sequence Formula , the underlying sample path of the process can be of the form


Formula

and so on, where the unboxed states denote the unobservable state changes of the process between visit times. We now consider the probability elements for the one- and multistep conditional probabilities in (3.1) in detail.

3.1 Conditional probabilities

Let P1j(k,t) denote the conditional probability that the process is in state j at time t0 + t after k visits to state 2 in the interval (t0,t0 + t), given that the process is in state 1 at time t0, for j = 1,2,3 and k = 0,1,... . Due to stationarity in (2.3), we may set t0 = 0. Note that P12(k,t) = 0 for k = 0, since there must be at least one visit to state 2. The case of j = 1 for k = 0,1,2 is depicted in Figure 2, where u represents the unknown time of transition from state 1 to 2 and z denotes that from state 2 to 1. In this notation,


Figure 2
View larger version (4K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 2. Depiction of sample paths for P11(k,t), k = 1,2,3. The boxed states represent the states occupied by the process at the visit time, the dots between the states indicate that the process has remained in the same state since the previous transition, and the arrows indicate the unobservable transitions. The asterisk next to 1 indicates the path segment that ends at transition to state 1.

 

Formula (3.2)

The sum will be finite if sojourn times from state 2 have a guarantee time; that is, support bounded away from zero. For example, if {lambda}2j(t) = 0 for 0 ≤ t ≤ G, then the upper limit in the sum is the greatest integer less than or equal to t/G.

For the calculation of P11(k,t), we have P11(0,t) = exp{ – {Lambda}1(t)} and


Formula

For k > 1, the probability of k transitions to state 2 is given by a convolution of probability functions PFormula(1,x) and P11(k – 1,tx), where


Formula

differs from P11(k,x) in that its corresponding underlying process ends at the transition time (marked by state 1* in Figure 2). Thus,


Formula

We can extend the definition of PFormula(1,x) to PFormula(k1,x), and generalize the convolution function to


Formula

where k1 + k2 = k, for k1,k2 ≥ 1.

Next, let P13(k,t) denote the probabilities contributing to the calculation of P[X(t) = 3|X(0) = 1]. Since state 3 can be reached from state 1 or state 2, P13(k,t) can be expressed as the sum of PFormula(k,t), which denotes that state 1 immediately precedes state 3, and PFormula(k,t), which denotes the other possibility that transition to state 3 was made from state 2. Then (see Appendix as supplementary material available at Biostatistics online)


Formula

Having computed P[X(t0 + t) = 1|X(t0) = 1] and P[X(t0 + t) = 3|X(t0) = 1], P[X(t0 + t) = 2|X(t0) = 1] is obtained as the complement of their sum.

The ideas in one-step conditional probability calculations can easily be extended to obtain the multistep conditional probabilities in (2.6). For example, consider P[X(t2) = 1,X(t1) = 2|X(0) = 1]. Define P121(k1,k2,t1,t2) to be the conditional probability that X(t1) = 2 and X(t2) = 1, with k1 visits to state 2 in (0,t1] and k2 visits to state 2 in (t1,t2], given that X(0) = 1. Note that P121(k1,k2,t1,t2) differs from P11(k1 + k2,t) in that state 2 is known to be occupied in the process at time t1 in P121(k1,k2,t1,t2). To illustrate, a sample path corresponding to P121(1,0,t1,t2) is depicted in Figure 3. Details are presented in the Appendix as supplementary material available at Biostatistics online.


Figure 3
View larger version (2K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 3. Depiction of a sample path for P121(1,0,t1,t2). The sample path observed to be in states 1, 2, and 1 at visit times 0, t1, and t2, respectively, with one transition to state 2 in the visit interval (0,t1] is shown.

 
3.2 Estimation and model assessment

The expressions developed above can be combined to obtain an expression for P(X = x) and then the likelihood contribution for an individual from (2.2). The overall likelihood is obtained as the product of the contributions of individual subjects and will be a function of {{lambda}12,{lambda}13,{lambda}21(·),{lambda}23(·)}. If parametric forms are assumed for {lambda}21(·) and {lambda}23(·), the likelihood function will depend on a finite-dimensional parameter vector and standard numerical methods can be used to obtain the maximum likelihood estimator. Under the standard regularity conditions, the maximum likelihood estimators will be consistent and asymptotically normal.

To examine the adequacy of model fit, the visit patterns can be partitioned into several categories, and the observed frequencies of the categories can be compared with the estimated model-based frequencies. An overall {chi}2 statistic would then have an approximate chi-square distribution with the degrees of freedom equal to the number of independent cells minus the number of model parameters.


    4. ILLUSTRATIONS
 TOP
 SUMMARY
 1. INTRODUCTION
 2. INFERENCE FOR A...
 3. A THREE-STATE SEMI-MARKOV...
 4. ILLUSTRATIONS
 5. DISCUSSION
 REFERENCES
 

4.1 HPV application

Data, model assumptions, and methods.

We illustrate the proposed methods with the placebo data from a completed pilot clinical trial of a candidate vaccine for HPV type 16 (Koutsky and others, 2002Go). There were six scheduled visits at baseline (day 0), months 7, 12, 18, 24, and 30, at which times the presence or absence of HPV type 16 (denoted HPV16) and CIN 2/3 (denoted CIN) was assessed. We considered the 699 placebo subjects, initially HPV 16 negative and free of CIN 2/3, who provided the data on the first five (388) or full six visits (311). To facilitate the computations, we treated month 7 visit as if it occurred at month 6, so that all visits would be equally spaced. The setting can be described by the four-state process depicted in Figure 4, where state 3 (state 4) corresponds to diagnosis of CIN following a scheduled visit where HPV16 was not (was) detected. This model can be viewed as an extension of the three-state model in Figure 1 in which state 3 is divided into two states based on the presence of HPV16 when CIN was diagnosed. In practice, these two CIN states would be interpreted as having been caused by HPV16 (state 4) or another type of HPV (state 3). Overall, 10 subjects were diagnosed with CIN during the study period, five of whom were HPV16 positive at the time of diagnosis. The majority of subjects were observed to be in state 1 at all six visit times (260/311) or at all five times (344/388).


Figure 4
View larger version (8K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 4. A four-state semi-Markov process. A process describing the natural history of HPV16 is presented.

 
We considered C = {1} and modeled the transition intensities for state 2 as step functions, incorporating the biological minimum time required for clearance of infection or progression to disease; that is,


Formula

for j = 1,4. Hence, {lambda}2j(t) is dependent on time due to the corresponding guarantee time, G2j. We considered guarantee times of G21 = 5 and G24 = 6 (in months) for the HPV study. We assumed that the diagnostic tests for HPV and CIN have perfect sensitivity but may have imperfect specificity, and reestimated model parameters assuming several assumed specificities.

Results.

Estimates of {lambda} assuming various misclassification errors are presented in Table 1. The maximized likelihood decreases sharply as either specificity declines below one and indicates that the assumption of no misclassification errors provides the best model fit. The very small estimates of {lambda}13 in the presence of imperfect CIN diagnostic test would suggest that the observed prevalence of women in state 3 is almost fully explainable by misclassification, implying that HPV infections of types other than 16 do not lead to CIN 2/3 during the 30-month period of follow-up. However, this would be inconsistent with the literature which suggests that various HPV types are responsible for CIN outcomes. (cf. Liaw and others, 1999Go). Therefore, the model assuming no misclassification errors seems more reasonable for the data.


View this table:
[in this window]
[in a new window]

 
Table 1. Estimates of {lambda} under various specificities. The specificity for HPV detection is denoted by P(PCR – | HPV –), where HPV detection is assessed by a polymerase chain reaction (PCR) assay, and the specificity for CIN detection is denoted by P(Dx – |CIN –) to indicate correct negative diagnosis (Dx –)

 
We found the inverse sample information to be numerically unstable and relied on the bootstrap method using 100 bootstrapped samples. The model assuming known G21 = 5, G24 = 6, and no misclassification errors gives the following estimates of {lambda} and standard errors, based on four unknown parameters in the model: Formula, Formula, Formula, and Formula. Following the guarantee times, the estimated risk of clearance is about 28 times higher than the risk of progression to CIN after 6 months. The conditional probabilities of HPV16 infection (entering state 2) and non-HPV16-related CIN development (entering state 3) are 0.944 and 0.056, given that a woman leaves the uninfected state (state 1). Once a woman is infected with HPV16, the conditional probabilities of clearing the HPV16 infection, P({sigma}n + 1 = 1|{sigma}n = 2), and progressing to CIN, P({sigma}n + 1 = 4|{sigma}n = 2), are 0.969 and 0.031, respectively. The mean times to clearance and progression to HPV, conditional on the next state, are about 16 and 17 months, respectively, consistent with commonly used definitions of "persistent infection" (Ho and others, 1998Go; Koutsky and others, 2002Go; Moscicki and others, 1998Go). The estimated cumulative probability that a woman develops CIN based on the model estimates (Figure 5) shows a steady rise over time, with about one-third of the outcomes attributed to HPV16, also consistent with the epidemiological literature (Liaw and others, 1999Go; Herrero and others, 2000Go). HPV16 prevalence (Figure 5) stabilizes to 7–8% after about 3 years.


Figure 5
View larger version (12K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 5. Model-based estimation of cumulative incidence of CIN 2/3 and prevalence of HPV16. The dotted, solid, and dashed lines represent the prevalence of HPV16, the cumulative probability of overall CIN 2/3, and the cumulative probability of CIN 2/3 caused by HPV16, respectively. The step functions portray the empirical estimates from the 30-month data, and the smooth curves represent the model-based estimates over a period of 10 years. The vertical lines are the pointwise 95% confidence intervals for the predicted probabilities.

 
Model fit (Table 2) was assessed by comparing the observed frequencies with the expected frequencies of the selected visit patterns. The chosen model (Model 1) appears to fit the data adequately (p = 0.65, 8d.f.), while the goodness-of-fit results for the same model but allowing a specificity of 0.99 for the HPV16 assay (Model 2) indicate poorer fit (p = 0.04).


View this table:
[in this window]
[in a new window]

 
Table 2. Model fit. The dots refer to any remaining visit patterns

 
Fitting the standard Markov model (Kalbfleisch and Lawless, 1985Go) yielded constant transition intensities (denoted by {gamma}ij) of Formula(0.00067), Formula(0.00013), Formula(0.0089), and Formula(0.0022). The estimated transition intensities from state 1 were very similar to those from the semi-Markov methods. The resulting cumulative incidence and prevalence curves were also similar to those from the semi-Markov model over the period of observation. The corresponding goodness-of-fit {chi}2 statistic was 6.93(p = 0.54) for the Markov model, somewhat worse than that of Model 1.

4.2 A simulated data example

To further illustrate the value of the methods, data were simulated from a four-state semi-Markov process of Figure 4 with C = {1} and {lambda}2j(t) = a2jb2j(tG2j)b2j – 1, j = 1,4 (Weibull hazard function). For the simulation, {lambda}12 = 0.025, {lambda}13 = 0.002, a21 = 0.2, a24 = 0.001, b21 = 0.5, b24 = 2, G21 = 4, G24 = 5, visits are every 6 months for 30 months and N = 5000, a sample size typical of a moderately sized Phase III vaccine trial. The simulated data produced 231 and 233 observations in states 3 and 4, respectively, by the end of the observation period, with 2252 subjects observed to be in state 1 at each visit, and yielded Formula, Formula, Formula, Formula, Formula, and Formula, assuming that G21 and G24 are known. Figure 6(a) shows that the estimated cumulative CIN incidence and HPV prevalence are very close to the true probabilities. However, when a Markov model is incorrectly assumed, the biases in such estimates increase with time (Figure 6(b)).


Figure 6
View larger version (16K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 6. (a) Semi-Markov and (b) Markov model fits for the simulated data set. The solid symbols represent the true curves and the unfilled symbols the estimates.

 

    5. DISCUSSION
 TOP
 SUMMARY
 1. INTRODUCTION
 2. INFERENCE FOR A...
 3. A THREE-STATE SEMI-MARKOV...
 4. ILLUSTRATIONS
 5. DISCUSSION
 REFERENCES
 
Panel data pose challenges for semi-Markov processes because missing information about an individual's process between the observations can contribute to the probability of being in the current state. Hence, despite the wider applicability of semi-Markov processes compared to the Markov processes, research in this area has been slim. We showed that when the transition intensities from at least one of the states of the process are time homogeneous, the expression for the joint probability in the likelihood function is tractable. In our data applications, we considered a process that begins in a state in C, so that it is not necessary to take account of the duration of time that individuals have been in this state prior to the start of the study. If X(0){notin}C, the proposed methods would still apply if either the subjects had just entered this state or the durations of time in this state prior to entrance into the study were known. Otherwise, the methods would need to be modified to account for the duration of time in the initial state prior to the start of the study. Methods similar to those in the work by Satten and Sternberg (1999)Go might be useful for this purpose.

In the HPV data example, the data came from a proof-of-principle study, thus smaller than a Phase III vaccine trial, and the number of CIN 2/3 events was small (10 events). Several Phase III HPV vaccine trials are currently underway, and are approximately five times the size of the pilot trial, and will be more suitable for the methods developed here, as the simulated data example illustrates. Without data on sexual activities and HPV types other than type 16, we assumed C = {1}, and exponential distributions with guarantee times were chosen for state 2 for model simplicity. Whereas the guarantee times limited the number of potential paths in our example, one can also limit the number of transitions that can occur in a given time interval when enumerating the potential paths. The clinical estimates for guarantee times were not clear, but we found that varying these times (from 4 to 8 months) did not change the likelihood parameter estimates substantially in our data.

The methods in the paper can be extended to allow fixed covariates in the model, for instance, to compare two treatments in a clinical trial. A one-sample model can be fit separately for each distinct covariate vector, and the overall likelihood is obtained by multiplying the likelihoods from the homogeneous groups.


    ACKNOWLEDGMENTS
 
We are grateful to Joseph Heyse and Lisa Chiacchierini for their comments and to Merck Research Laboratories for providing the data used in Section 4. This research was supported by the Howard Hughes Medical Institute as part of the first author's doctoral thesis and by Grants U01 AI38855 (Kang) and AI24643 (Lagakos) from the National Institute of Allergy and Infectious Diseases. Conflicts of Interest: None declared.


    REFERENCES
 TOP
 SUMMARY
 1. INTRODUCTION
 2. INFERENCE FOR A...
 3. A THREE-STATE SEMI-MARKOV...
 4. ILLUSTRATIONS
 5. DISCUSSION
 REFERENCES
 

    Andersen P. (1988) Multistate models in survival analysis: a study of nephropathy and mortality in diabetes. Statistics in Medicine 7:661–70.[Web of Science][Medline]

    Andersen P and Borgan O. (1985) Counting process models for life history data: a review. Scandinavian Journal of Statistics 12:155–8.

    Andersen P and Keiding N. (2002) Multi-state models for event history analysis. Statistical Methods in Medical Research 11:91–115.[Abstract/Free Full Text]

    Commenges D. (1999) Multi-state models in epidemiology. Lifetime Data Analysis 5:315–27.[CrossRef][Web of Science][Medline]

    Commenges D. (2002) Inference for multi-state models from interval-censored data. Statistical Methods in Medical Research 11:167–82.[Abstract/Free Full Text]

    Cox D and Miller H. (1977) The Theory of Stochastic Processes(Chapman and Hall, London).

    Gentleman R, Lawless J, Lindsey J, Yan P. (1994) Multi-state Markov models for analysing incomplete disease history data with illustrations for HIV disease. Statistics in Medicine 13:805–21.[Web of Science][Medline]

    Herrero R, Hildesheim A, Bratti C, Sherman ME, Hutchinson M, Morales J, Balmaceda I, Greenberg M, Alfaro M, Burk R. (2000) Population-based study of human papillomavirus infection and cervical neoplasia in rural Costa Rica. Journal of the National Cancer Institute 92:464–74.[Abstract/Free Full Text]

    Ho GYF, Bierman R, Beardsley L, Chang CJ, Burk RD. (1998) Natural history of cervicovaginal papillomavirus infection in young women. The New England Journal of Medicine 338:423–8.[Abstract/Free Full Text]

    Hougaard P. (1999) Multi-state models: a review. Lifetime Data Analysis 5:239–64.[CrossRef][Web of Science][Medline]

    Kalbfleisch J and Lawless J. (1985) The analysis of panel data under a Markov assumption. Journal of the American Statistical Association 80:863–71.

    Kay R. (1986) A Markov model for analysing cancer markers and disease states in survival studies. Biometrics 42:855–65.[CrossRef][Web of Science][Medline]

    Koutsky L, Ault K, Wheeler C, Brown D, Barr E, Alvarez F, Chiacchierini L, Jansen K. (2002) A controlled trial of a human papillomavirus type 16 vaccine. The New England Journal of Medicine 347:1645–51.[Abstract/Free Full Text]

    Lagakos S, Sommer C, Zelen M. (1978) Semi-Markov models for partially censored data. Biometrika 65:311–8.[Abstract/Free Full Text]

    Liaw KL, Glass AG, Manos MM, Greer CE, Scott DR, Sherman M, Burk RD, Kurman RJ, Wacholder S, Rush BB. (1999) Detection of human papillomavirus DNA in cytologically normal women and subsequent cervical squamous intraepithelial lesions. Journal of the National Cancer Institute 91:954–60.[Abstract/Free Full Text]

    Moscicki AB, Shiboski S, Broering J, Powell K, Clayton L, Jay N, Darragh TM, Brescia R, Kanowitz S, Miller SB. (1998) The natural history of human papillomavirus infection as measured by repeated DNA testing in adolescent and young women. The Journal of Pediatrics 132:277–84.[CrossRef][Web of Science][Medline]

    Satten G and Sternberg M. (1999) Fitting semi-Markov models to interval-censored data with unknown initiation times. Biometrics 55:507–13.[CrossRef][Web of Science][Medline]

    Sternberg M and Satten G. (1999) Discrete-time nonparametric estimation for semi-Markov models of chain-of-events data subject to interval censoring and truncation. Biometrics 55:514–22.[CrossRef][Web of Science][Medline]

    Stoler M. (2000) Human papillomaviruses and cervical neoplasia: a model for carcinogenesis. International Journal of Gynecological Pathology 19:16–28.[Web of Science][Medline]

    Received October 9, 2005; revised May 18, 2006; accepted for publication May 31, 2006.


    Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?



    This Article
    Right arrow Abstract Freely available
    Right arrow FREE Full Text (PDF) Freely available
    Right arrow Supplementary Data
    Right arrow All Versions of this Article:
    8/2/252    most recent
    kxl006v2
    kxl006v1
    Right arrow Alert me when this article is cited
    Right arrow Alert me if a correction is posted
    Services
    Right arrow Email this article to a friend
    Right arrow Similar articles in this journal
    Right arrow Similar articles in PubMed
    Right arrow Alert me to new issues of the journal
    Right arrow Add to My Personal Archive
    Right arrow Download to citation manager
    Right arrowRequest Permissions
    Right arrow Disclaimer
    Google Scholar
    Right arrow Articles by Kang, M.
    Right arrow Articles by Lagakos, S. W.
    Right arrow Search for Related Content
    PubMed
    Right arrow PubMed Citation
    Right arrow Articles by Kang, M.
    Right arrow Articles by Lagakos, S. W.
    Social Bookmarking
     Add to CiteULike   Add to Connotea   Add to Del.icio.us  
    What's this?