Skip Navigation


Biostatistics Advance Access originally published online on February 16, 2006
Biostatistics 2006 7(4):530-550; doi:10.1093/biostatistics/kxj024
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow All Versions of this Article:
7/4/530    most recent
kxj024v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Right arrow Disclaimer
Google Scholar
Right arrow Articles by Pedroza, C.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Pedroza, C.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2006. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org.

A Bayesian forecasting model: predicting U.S. male mortality

Claudia Pedroza

Division of Biostatistics, University of Texas School of Public Health at Houston, 1200 Herman Pressler Drive, E831, Houston, TX 77030, USA claudia.pedroza{at}uth.tmc.edu


    SUMMARY
 TOP
 SUMMARY
 1. INTRODUCTION
 2. HISTORICAL MORTALITY DATA
 3. LEE-CARTER METHOD
 4. BAYESIAN MODEL
 5. EXTENSIONS
 6. APPLICATION
 7. CONCLUSIONS
 APPENDIX
 REFERENCES
 
This article presents a Bayesian approach to forecast mortality rates. This approach formalizes the Lee–Carter method as a statistical model accounting for all sources of variability. Markov chain Monte Carlo methods are used to fit the model and to sample from the posterior predictive distribution. This paper also shows how multiple imputations can be readily incorporated into the model to handle missing data and presents some possible extensions to the model. The methodology is applied to U.S. male mortality data. Mortality rate forecasts are formed for the period 1990–1999 based on data from 1959–1989. These forecasts are compared to the actual observed values. Results from the forecasts show the Bayesian prediction intervals to be appropriately wider than those obtained from the Lee–Carter method, correctly incorporating all known sources of variability. An extension to the model is also presented and the resulting forecast variability appears better suited to the observed data.

Keywords: Bayesian prediction; Lee–Carter method; Missing data; Mortality forecasting; State-space model


    1. INTRODUCTION
 TOP
 SUMMARY
 1. INTRODUCTION
 2. HISTORICAL MORTALITY DATA
 3. LEE-CARTER METHOD
 4. BAYESIAN MODEL
 5. EXTENSIONS
 6. APPLICATION
 7. CONCLUSIONS
 APPENDIX
 REFERENCES
 
Government agencies use mortality forecasts when planning and developing health policy. These agencies rely on mortality predictions in deciding the allocation of funds for government services. In the past, researchers have based many of these projections on life table methods and expert opinion. Recent interest centers on improving mortality forecasts, driven in part by an increasingly elderly population's effect on government policy (such as the U.S. Social Security). Private industries, such as insurance companies, also rely on these forecasts to plan for future programs. Thus, it is in the interest of public health to produce improved mortality forecasts.

Alho and Spencer (1990)Go among others argue that measures of uncertainty of mortality forecasts are also needed. Since a majority of the methods used to forecast were not probability based, calls for measures of uncertainty have gone unheeded. This has made the task of creating prediction intervals an almost impossible endeavor, leading to less informative procedures, e.g. reporting of a point forecast and a high and a low forecast. However, the high and low forecasts are worst and best case scenarios, not intervals expressing probabilistic uncertainty of the point forecast.

Over the past two decades, various methods have been developed to improve mortality forecasts. This paper focuses on an increasingly popular method of forecasting introduced by Lee and Carter (1992)Go to model and forecast U.S. mortality. Widely used, their method has undergone various extensions and modifications (Carter and Lee, 1992Go, Lee and Tuljapurkar, 1994Go, Wilmoth, 1995Go, Carter, 1995Go). In their original paper, Lee and Carter discuss the computation of prediction error for the mortality forecasts. However, this prediction error does not account for the estimation error of the parameters involved in the model, instead only incorporating uncertainty from the forecast of the mortality index. The omission of the estimation error leads to prediction intervals which are too erroneously narrow.

Recently, Brouhns et al. (2002, 2005)Go implemented a bootstrap procedure for a Poisson log-bilinear formulation of the Lee–Carter model. This procedure accounts for all sources of variability in the Poisson model. Czado et al. (2005)Go present a Bayesian formulation of the log-bilinear Poisson which also incorporates the different sources of error in the model when forming forecasts and prediction intervals. A state-space model formulation of the Lee–Carter method is presented in Pedroza (2002)Go, as well as various Bayesian time series models for forecasting mortality rates. Girosi and King (2005)Go present a Bayesian hierarchical modeling approach for predicting mortality rates which seeks to pool information from similar cross-sections (i.e. age groups, countries).

Here, the original Lee–Carter method is restated under a Bayesian framework. With this approach, missing data can be handled and sampling error is automatically incorporated within the model and its mortality forecasts. Extensions to the model can also be easily incorporated.

The outline of this paper is as follows: Section 2 presents a brief description of 20th century trends in mortality for developed countries, with the U.S. as an example. Section 3 describes Lee and Carter's method of forecasting. Section 4 presents the Lee–Carter method as a state-space model under a Bayesian framework and gives details on implementation and prediction. It also presents multiple imputation as a technique for handling missing data. In Section 5, extensions to the state-space model are presented. Section 6 applies the original Lee–Carter methodology and the Bayesian model to mortality data from U.S. males. Forecasts and prediction intervals from both methods are presented. Model diagnostics are presented and an extension to the Bayesian model is derived. Finally, Section 7 provides concluding remarks.


    2. HISTORICAL MORTALITY DATA
 TOP
 SUMMARY
 1. INTRODUCTION
 2. HISTORICAL MORTALITY DATA
 3. LEE-CARTER METHOD
 4. BAYESIAN MODEL
 5. EXTENSIONS
 6. APPLICATION
 7. CONCLUSIONS
 APPENDIX
 REFERENCES
 
Mortality rates for developed countries declined dramatically during the past century. Improvements in standards of living, sanitary conditions, hygiene, and medicine led to rapidly decreasing infant mortality rates in the early part of the century and consequently to first increasing and then decreasing mortality in older age groups in the latter part of the 20th century. Figure 1 shows log-mortality rates for the U.S. obtained from the Human Mortality Database (2002)Go. By mortality rates, we mean the ratio of deaths to mid-year population size for a given interval of age and time (central mortality rates). Figure 1 shows the constant decline of mortality rates over the last four decades for all-cause mortality in the U.S. for males and females. This is the general pattern for developed countries as illustrated by the two graphs of England and Spain male mortality shown in the bottom panels of Figure 1. This figure also illustrates the `arm-shaped' profile of log-mortality across age groups.


Figure 1
View larger version (19K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1 Log-mortality rates for three countries. Notice the different rates of decline for different age groups over time.

 
In the youngest ages, high mortality rates are attributable to perinatal conditions. Mortality decreases during childhood and then increases again to the `accident hump' (traffic accidents and teenage violence) at the ages of 15–20. The rates then steadily increase across older age groups. Over the past five decades, mortality rates have declined for all age groups but at different rates. Age groups 0–10 have declined at the fastest rates, while the oldest ages have declined at a much slower pace. This holds true for both males and females in developed regions (Murray and Lopez, 1996Go).

Overall, females experience lower mortality rates than males. The decade of 1965–1975 marks the period with the widest gap between males and females at the oldest ages. However, in the past decade, the gap between males and females has steadily decreased. This is the general pattern for the developed countries, and in the past two decades, developing countries have exhibited the same pattern (see Murray and Lopez, 1996Go).

The calculation of mortality statistics in developing countries is plagued by missing or unreliable data from past years. Yet, mortality forecasts for these countries are critical since they are used to secure funds from international health-program organizations. Analyses based on data sets with missing data could lead to inaccurate estimation of mortality rates and in turn funds for health services. Thus, it is important that the forecasting method implement a systematic approach to the missing-data issue.


    3. LEE–CARTER METHOD
 TOP
 SUMMARY
 1. INTRODUCTION
 2. HISTORICAL MORTALITY DATA
 3. LEE-CARTER METHOD
 4. BAYESIAN MODEL
 5. EXTENSIONS
 6. APPLICATION
 7. CONCLUSIONS
 APPENDIX
 REFERENCES
 

3.1 Model

Lee and Carter (1992)Go introduced a method for forecasting the age-specific log-mortality rates for the entire U.S. population. Let Formula be the logarithm of the central mortality rate for age group i, where Formula and time Formula, and let Formula. Lee and Carter's method consists of fitting the following model to the log-mortality rates Formula:

Formula (3.1)

Here, Formula and Formula are vectors of unknown parameters, Formula is a vector of error terms, and Formula is an unobserved time series process. The parameters Formula, Formula, and Formula are to be estimated. The vector Formula measures the general pattern of mortality for each age group, and Formula measures the relative rate of change of mortality at each age. The vector Formula is an index of the level of mortality. To ensure identifiability, Lee and Carter impose the following constraints:

Formula (3.2)

3.2 Estimation of parameters

Lee and Carter propose the use of singular value decomposition (SVD) to estimate the parameters in the model. Let Formula be a Formula matrix of the log-mortality rates. The vector Formula is estimated by the average log-mortality over time for each age group. SVD is then applied to the mean corrected log-mortality rates Formula:

Formula

where Formula is a diagonal matrix containing singular values and both Formula and Formula are orthogonal matrices. Formula is set equal to the first column of Formula, and the Formula values are set equal to the product of the first column of Formula and the leading singular value Formula along with the normalizations given in (3.2). This estimation amounts to using the first principal component of the log-mortality matrix, as first presented by Bozik and Bell (1987)Go and Bell and Monsell (1991)Go.

3.3 Modeling and forecasting Formula

After Formula is estimated by SVD, one can make various adjustments to the estimated vector to produce a closer correspondence between estimated and observed death rates (see Lee, 2000Go, for further details). An appropriate univariate autoregressive integrated moving average (ARIMA) time series model forecasts the Formula series. Lee and Carter proposed a random walk model with drift for Formula, and most applications utilize the same approach. This model can be expressed as

Formula (3.3)

where Formula is the drift parameter which models a linear trend and Formula is an error term. The mean forecast value of Formula is calculated as

Formula

for the lth step-ahead forecast. Finally, to obtain mean forecasts of the log-mortality rates, the mean forecasted values of Formula are implemented along with the estimated values of Formula and Formula. The mean forecast of the log-mortality rate for year Formula is

Formula

3.4 Prediction intervals

To calculate prediction intervals, Lee and Carter's method first expresses the squared standard error of the forecasted mortality index Formula for the lth year-ahead forecast from model (3.3) as

Formula

where Formula is the squared standard error of Formula and Formula is the estimate of the variance of the error term Formula. The Lee–Carter 95% prediction intervals for the log-mortality rates are computed as

Formula

As pointed out by Lee and Carter (1992Go, p. 669), this prediction interval ignores errors in estimating the parameters Formula, Formula, and the variance of the error term Formula. It only accounts for the error of forecasting Formula. Their assumption is that the forecast error dominates all other errors. However, they do point out that for short-term forecasts their prediction intervals underestimate the `true' forecast error.

The Lee–Carter method has several advantages. First, it is a parsimonious model which accounts for a large proportion of the variance in the log-mortality rates. Second, the Lee–Carter method uses time series methods to generate stochastic forecasts, and it provides probabilistic prediction intervals. The main criticism of the method is the assumption of a time-invariant age component, i.e. no age–time interaction. From historical data, it appears that this assumption is invalid. For example, the rate of decline in U.S. mortality during the 1990s is different than the one exhibited in the 1970s. Furthermore, infant mortality experienced the largest decrease in the early part of the century, whereas older age groups exhibited the largest decline in the latter part of the 20th century. Data from other developed countries also show different patterns for different age groups (Booth et al., 2002Go). These violations raise important questions about the applicability of the Lee–Carter method for forecasting mortality in these countries.


    4. BAYESIAN MODEL
 TOP
 SUMMARY
 1. INTRODUCTION
 2. HISTORICAL MORTALITY DATA
 3. LEE-CARTER METHOD
 4. BAYESIAN MODEL
 5. EXTENSIONS
 6. APPLICATION
 7. CONCLUSIONS
 APPENDIX
 REFERENCES
 
Lee and Carter's method for forecasting mortality rates consists of (1) first fitting model (3.1) and (2) then forecasting the series Formula using an appropriate time series model. As first presented in Pedroza (2002Go, p. 48), the Lee–Carter method can be reformulated as a state-space model:

Formula (4.1)


Formula (4.2)

where Formula and Formula are assumed to be independent, and a random walk with drift model is assumed for the state vector. The multivariate normal model for the log-mortality rates provides a joint distribution for the p age groups at any given point in time. It assumes that the observations are independent across time, that is, the Formula are independent identically distributed (iid) with common variance Formula. In order to compare results from this model to those obtained from Lee and Carter's original model, the same constraints that Lee and Carter invoke, i.e. Formula and Formula, are used. Alternatively, any proper prior distribution on these parameters will lead to a valid posterior distribution. State-space models are extensively used in time series analysis (see, for example, Harvey, 1990Go; West and Harrison, 1997Go). Expressing the Lee–Carter method as a state-space model formalizes it as a statistical model. It also permits simultaneous estimation of all parameters, providing a systematic error assessment.

The Kalman filter can be used to estimate and forecast state-space models (Mehra, 1979Go, Harvey, 1990Go, Durbin and Koopman, 2001Go). Forecasting and handling of missing data in state-space models is straightforward with the Kalman filter. However, this algorithm does not incorporate uncertainty from unknown parameters and missing observations into either the forecasts or forecasting error. The forecast approach taken here follows the Bayesian forecasting methods of West and Harrison (1997)Go and makes use of simulation methods. The Kalman filter is incorporated into the fitting algorithm. Markov chain Monte Carlo (MCMC) methods are used to draw samples from the joint posterior distribution of the parameters and to form the posterior predictive distribution of the log-mortality rates. In particular, the Gibbs sampler is used to fit the model and to forecast the rates. One advantage of the Bayesian approach is that uncertainty of the parameters Formula, Formula, and Formula is incorporated in the estimation and forecasting of Formula.

Under a Bayesian paradigm, an investigator is able to formally incorporate prior information or knowledge about the problem at hand. This prior information is then combined with the observed data to form the posterior distribution of the parameters on which inference is based. Here, noninformative priors are used which lead to results comparable to those derived from Lee–Carter's original method. Standard flat prior distributions are assumed for Formula, Formula, and Formula along with noninformative priors for the variance parameters Formula and Formula: Formula, Formula, Formula. The initial prior distribution for the starting point Formula is specified as Formula, where Formula and Formula are assumed to be known.

4.1 Implementation

To implement the Bayesian model, the joint posterior distribution of all unknown parameters Formula, Formula, Formula, Formula, Formula, and Formula needs to be obtained. However, obtaining posterior inferences analytically from such a high-dimensional distribution is complicated. Instead, MCMC methods are employed to obtain the posterior distribution of the parameters. More specifically, we use a Gibbs sampler (Geman and Geman, 1984Go, Gelfand and Smith, 1990Go) to draw samples from the joint posterior distribution. This algorithm consists of iteratively sampling from the conditional distribution of each of the parameters given assigned values to all the other parameters and the data. For the state-space model, the Gibbs sampler consists of two main steps: (1) drawing parameters Formula, Formula, Formula, Formula, Formula from their respective conditional distributions given fixed values for all other parameters and observed data and (2) simulating the state vector Formula.

One way to sample from the state vector is by using the single-state Gibbs sampler which consists of sampling from Formula, where Formula is Formula excluding Formula and Formula indexes all other parameters. However, this algorithm can be extremely inefficient. Frühwirth-Schnatter (1994)Go and Carter and Kohn (1994)Go independently developed methods for simulating the state vector based on the identity:

Formula (4.3)

De Jong and Shephard (1995)Go developed a similar method which samples the disturbance vectors instead of the state vectors. This method could also be used for the Lee–Carter model. Here we follow the approach of West and Harrison (1997)Go and use the identity given in (4.3) to sample the vector Formula.

Next we give the steps of the Gibbs sampler. Let Formula be the observed data up to time n.

  1. Run the Kalman filter with updating equations

    Formula

    for Formula, and store quantities Formula and Formula. Next, sample the state vector as follows (dependance on all other parameters is implicit):

    (a) Sample Formula from Formula, then
    (b) for each Formula, sample Formula from Formula

    The conditional distribution for item (b) is

    Formula

    where

    Formula


  2. Draw Formula from

    Formula


  3. Draw Formula and Formula by performing separate regressions for each age group of Formula on Formula. Letting Formula where Formula is an Formula vector of ones and Formula is the vector of Formula values, the conditional distribution of Formula and Formula for age group i is

    Formula


  4. Draw Formula from

    Formula


  5. Draw Formula from

    Formula


Repeat steps 1 through 5 until convergence.

4.2 Prediction

In general, the posterior predictive distribution Formula for future observations can be expressed as

Formula

where Formula indicates all parameters in the model. The last equation holds because of the assumption that Formula and Formula are conditionally independent given the parameters Formula. The posterior predictive distribution can be obtained analytically or through simulation methods. Here, it is obtained by sampling iteratively from Formula and Formula to form the distribution.

Given the draws from the posterior distribution obtained from the Gibbs sampler, it is straightforward to forecast the log-mortality rates, Formula. First, samples from the predictive distribution of Formula are obtained from (4.2) and then samples from the predictive distribution of Formula are generated from (4.1).

These steps can be incorporated into the Gibbs sampler once convergence is attained as follows. Suppose we have Formula draws of all the parameters from the Gibbs sampler and let Formula indicate the lth draw from the Gibbs sampler of the parameter Formula. To forecast, we perform the following two steps:

  1. Draw Formula from

    Formula


  2. Draw Formula from

    Formula


Repeat steps 1 and 2 for Formula, where N is the number of step-ahead forecasts desired. The resulting samples of Formula form the predictive distribution of the mortality rates on which inferences are based.

4.3 Missing data

Mortality data can have missing observations across time and/or age groups. However, it is much more common to have incomplete time series data. Li et al. (2004)Go present a method for applying the original Lee–Carter model to forecast mortality for data sets with few observations over time and complete data across age groups. Their method accounts for extra variability arising from fewer time points when forecasting the Formula parameter. However, their method does not account for the extra variability which will also be present in the Formula and Formula parameters, and it is not meant to handle incomplete data across age groups.

To account for missing observations across time and/or age groups, we can use Rubin's (1987)Go method of multiple imputation. Briefly, multiple imputation replaces each missing value with m different values to create m complete data sets. Each complete data set is analyzed as if it was completely observed. The m results are then combined to produce a final inference. This final inference reflects the uncertainty due to the missing data. To impute the missing values, samples are drawn from the posterior predictive distribution of the missing values. Letting Formula and Formula be the vectors of missing and observed data, respectively, and assuming that the missing-data mechanism is ignorable (Little and Rubin, 1987Go), the posterior predictive distribution is

Formula

This is the conditional predictive distribution of Formula given Formula, averaged over the observed-data posterior of Formula. Since the state-space model assumes iid observations over time and across age groups (the covariance matrix is assumed to be of the form Formula), the predictive distribution of the missing data given the parameters Formula does not depend on the observed data, i.e. Formula. Thus, imputing the missing values is straightforward regardless of whether the missing data is across time, age groups, or both.

Here, we are mainly interested in computing estimates of the parameters in the model and in forming forecasts of the log-mortality rates. To estimate the parameters, we simulate random values of the parameters Formula from the observed-data posterior distribution, Formula. This can be carried out through simulation techniques, in particular the Gibbs sampler. The Gibbs sampler in the presence of missing data consists of two main steps (Schafer, 1997Go): (1) imputation of the missing data given both the observed data and a sample of the parameters and (2) sampling of the parameters given the complete data. The first step imputes Formula with a draw from Formula and the second steps draws the parameters from Formula. As mentioned above, the conditional distribution of Formula given the parameters does not depend on Formula regardless of whether the missing data is across time and/or age groups. Imputing the missing values consists of drawing from

Formula

Thus, in the presence of missing data, the Gibbs sampler has only one extra step. Handling missing data under the Bayesian paradigm can be incorporated seamlessly into the algorithm. The resulting forecasts of log-mortality rates account for the uncertainty due to missing data.


    5. EXTENSIONS
 TOP
 SUMMARY
 1. INTRODUCTION
 2. HISTORICAL MORTALITY DATA
 3. LEE-CARTER METHOD
 4. BAYESIAN MODEL
 5. EXTENSIONS
 6. APPLICATION
 7. CONCLUSIONS
 APPENDIX
 REFERENCES
 
As noted before, the Lee–Carter method assumes no age–time interaction. As a demonstration of the flexibility afforded by the Bayesian approach, we could let each age group follow its own random walk with different rates of decline. This level of flexibility is not required. Alternatively, the age groups can be clustered so that the youngest age groups follow one random walk, the middle age groups follow another, and the oldest age groups follow a third different random walk. By adopting a Bayesian approach to the Lee–Carter method, an array of changes and/or extensions to the original model can be easily implemented. In this section, the notation of the model is generalized and some extensions are briefly discussed.

The state-space model can be generalized as follows:

Formula (5.1)

Here, Z and G are the design and state matrices, respectively, which can include unknown parameters. For the state-space model in Section 4, Formula, Formula, Formula, Formula, and

Formula (5.2)

One way to generalize the Lee–Carter model is to change the form of the state equation. Thus, a simple random walk with drift model can be replaced by different ARIMA models. For example, an autoregressive process of order 1 [AR(1)] process could be used, placing more weight on the latest observations when forming forecasts. The only thing that would differ from the original model would be the matrix G:

Formula (5.3)

Another way to extend the Lee–Carter model is to allow the drift parameter to vary over time. From empirical data, it appears that the rate of decrease of mortality has changed over the last century, suggesting that a model with time-dependent drift would be a plausible model. In this case, Formula, Formula, G is as in (5.2), and Formula Another extension would be to have both an AR(1) process for the Formula and a time-dependent drift. The model would then be as above with the matrix G equal to that of (5.3).

The correlation between age groups could also be modeled by changing the first-level covariance matrix to a nondiagonal form. From historical data, older age groups exhibit more variability due to the small size of that population. This could be readily incorporated into the model by letting different age groups or clusters of the groups have different variances. Modeling of the variance–covariance matrix would complicate the implementation of the model but could potentially allow for a more realistic model of mortality rates. Barnard et al. (2000)Go provide a detail discussion on the modeling of covariance matrices. Below, we give details of the Gibbs sampler for a model that assumes different variances for different age groups but no correlation between the groups. Details of the fitting algorithm for the other extensions discussed are given in the Appendix.

An extension to the Lee–Carter model which assumes different variance for the different age groups can be expressed in state-space form as

Formula (5.4)

where Formula is assumed to be a diagonal matrix with up to p different Formula variance parameters along the diagonal. Thus, if we grouped the age groups into three clusters of young, middle, and old, then we would have three Formula parameters. Assuming that the variance parameters Formula are independent a priori, steps 1–3 of the Gibbs sampler change as follows:

  1. Run the Kalman filter with updating equations

    Formula

    Note that the only change occurs in the equation for Formula. Sampling of the state vector Formula is carried out as described in Section 4.1.

  2. For each cluster of age groups j, draw Formula from

    Formula

    where Formula is the number of age groups in cluster j.

  3. Draw Formula and Formula by performing separate regressions for each age group of Formula on Formula. Letting Formula where 1 is an Formula vector of ones and Formula is the vector of Formula values, the conditional distribution of Formula and Formula for age group i belonging to cluster j is

    Formula


Steps 4 and 5 of the Gibbs sampler are unchanged.

The prediction of future values of Formula is replaced by drawing from the following normal distribution

Formula

If missing data are present across time, the only change in the imputation of missing observations is on the distribution used. Namely, the covariance matrix changes and the missing values are drawn from the multivariate normal distribution

Formula

If missing data are present across age groups, the imputations will consist of sampling from a conditional normal distribution. The methods for multivariate data presented in Schafer (1997)Go can be used to carry out the imputations.


    6. APPLICATION
 TOP
 SUMMARY
 1. INTRODUCTION
 2. HISTORICAL MORTALITY DATA
 3. LEE-CARTER METHOD
 4. BAYESIAN MODEL
 5. EXTENSIONS
 6. APPLICATION
 7. CONCLUSIONS
 APPENDIX
 REFERENCES
 
We analyze a data set for U.S. male all-cause mortality to demonstrate the Bayesian model for forecasting mortality. The data set was obtained from the Human Mortality Database (2002)Go. This data set consists of annual age-specific death rates for years 1959–1999. The age groups are 0, 1–4, 5–9, Formula–109, Formula. Figure 1, top left panel, shows a plot of the log-death rates for 5 years. To illustrate the methodology presented here and to assess the predictive quality of the models, we perform out-of-sample forecast for the last 10 years of the data. That is, we use the data from 1959 to 1989 to fit the model and then forecast for 10 years ahead. We then compare the forecasts to the observed values.

We fitted and forecasted log-mortality rates using both the original Lee–Carter method and the Bayesian model presented in Section 4. For the Bayesian model, we used normal distribution with a mean of 5 and variance of 10 as a prior distribution for the starting point of the state process, Formula, based on results from previous studies. We consider a variance of 10 to be large enough to consider this a ‘vague’ prior so as to lead to comparable results to those derived from Lee–Carter's original method. However, different choices of the prior distribution, in particular different values for the variance, lead to virtually identical results. Here, we summarize the results by presenting the out-of-sample forecasts and prediction intervals from both models.

For the Bayesian model, we ran three parallel chains of the Gibbs sampler with different overdispersed starting values for 2000 iterations. Convergence of the chains was checked using the Gelman–Rubin convergence diagnostic statistic (Gelman and Rubin, 1992Go) implemented in the CODA package for R (Plummer et al., 2005Go). This statistic is computed for each scalar estimate of interest. A statistic less than 1.1 suggests convergence. The Gelman–Rubin statistic was computed for all parameters in the model, and all were less than 1.01, indicating convergence. Trace plots for a representative subset of the parameters are presented in Figure 2. These plots show that the three chains mix well together. A final sample for each parameter was obtained by selecting the last 1000 values of each of the three chains. Results presented here are based on the combined 3000 draws from the posterior distribution.


Figure 2
View larger version (40K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 2 Trace plots of three separate chains for a subset of parameters in the Bayesian model.

 
An appropriate concern would be computational cost. Each of the analyses (implementation of the model and prediction of the log-mortality rates) presented here took 5 min or less to run in R (R Development Core Team, 2005Go) on a personal computer with 3.2 GHz and 1 GB of memory. However, computational efficiency was not optimized.

Estimates of the parameters Formula and Formula obtained from the Lee–Carter method and the Bayesian model are shown in Table 1. These sets of estimates are almost identical for all age groups.


View this table:
[in this window]
[in a new window]

 
Table 1 Parameter estimates for the Lee–Carter (LC) and Bayesian models

 
6.1 Display of the forecasts

For each of the Formula age groups, we present a graph of the observed log-mortality rates (solid line) for years 1990–1999 along with the point estimate of the forecast (dashed line) and 95% prediction intervals (dotted lines) (Figure 3). For the Bayesian model, the point estimate is the mean of the posterior predictive distribution. Figure 3 displays the results for a subset of five representative age groups from both the Lee–Carter method (left panel) and the Bayesian model (middle panel). The two features to focus on in these graphs are the accuracy of the point estimates and the coverage of the observed values by the prediction intervals. Comparing first the point forecasts, it appears that forecasts from the Bayesian and Lee–Carter method are very close. We calculated the mean squared error (MSE) and averaged it over the age groups:

Formula

where Formula indicates either the Bayesian or Lee–Carter forecast for age group i and year t. The average MSE was 0.0098 for the Bayesian forecasts and 0.0104 for the Lee–Carter ones, indicating a very close agreement between the two sets of forecasts. There does not appear to be a clear pattern of over- or underestimation from these two methods.


Figure 3
View larger version (25K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 3 Observed (solid line) and forecasted (dashed line) log-mortality rates with 95% prediction intervals for U.S. males for Lee–Carter's method (left panel), Bayesian model (middle), and extended Bayesian model (right).

 
Comparison of the prediction intervals yields very different results from the two methods. The Lee–Carter intervals for age groups 15–40 are quite narrow. The reason being that for these age groups the Formula parameters are very close to zero (Table 1), indicating that the relative rate of decline of mortality is close to zero. In other words, mortality for these age groups has seen a very small decline relative to other age groups. Given its role in the prediction interval, the magnitude of the Formula parameter makes the intervals on these age groups quite small. Overall, the prediction intervals obtained from the Bayesian model are wider than the Lee–Carter intervals. This is not a surprising result given the fact that this model incorporates all sources of variation in the model when creating forecasts and intervals. The majority of the Bayesian intervals cover the observed rates except for the oldest age groups which exhibit the most variability.

The 1-year-ahead forecast standard errors yield the starkest difference between the Bayesian and Lee–Carter model. For age group 25, the Lee–Carter error of 0.003 is 25 times smaller than the Bayesian estimate of 0.078. The closest error estimates are for the youngest age group with the Lee–Carter estimate being approximately half the value of the Bayesian error of 0.098. Looking at the forecast standard errors for the 10-year-ahead forecasts, we see a closer agreement from the two methods. The biggest discrepancy is still in the age group 25, with a Lee–Carter error of 0.011 and a Bayesian error of 0.086. The Lee–Carter method gives an error of 0.175 for age group 0 compared to the Bayesian error of 0.181 for this group. Thus, for long-term forecasts, it appears that the Lee–Carter forecasting error starts to approach the Bayesian error, but for most age groups, it is still less than 75% of the error obtained from the Bayesian model.

The assumption made by Lee and Carter that the main source of error would come from the forecasting error of the mortality index Formula yields prediction intervals which are too narrow for short-term forecasts, specially for young adults and older age groups. When we account for all sources of estimation error, we calculate prediction intervals which are wider and reflect the true prediction error in the forecasted mortality rates.

6.2 Model diagnostics

From the plots of the forecasted log-mortality rates, it appears that for some of the age groups (e.g. 50 and 90), the prediction intervals are too wide. One assumption of the fitted model is that all age groups have the same variability. However, plots of the observed data from 1959 to 1989 show that some age groups have more variability than others. In particular, the oldest age group appears to be more variable than all the other groups as would be expected. To test the assumption of homogeneity across age groups, we can look at posterior predictive diagnostics (Gelman et al., 1996Go). This approach consists of replicating data from the model and then comparing it to the observed data. Any systematic differences between the replicated and observed data could indicate a misfit in the model. Differences or discrepancies between the model and data can be measured by a test quantity or discrepancy measure. This measure can depend on either just data (i.e. test statistic) or both data and parameters. The posterior predictive check is the comparison between realized values of the measure Formula obtained from the observed data y and simulated parameters Formula and the predictive measures Formula obtained from simulated data and parameters. If the actual data are plausible, then the posterior distribution of Formula will be similar to that of Formula.

As a discrepancy measure here, we calculate the average squared residual for each age group as

Formula

Let Formula be the value of the discrepancy measure for the simulated data Formula and simulated parameters Formula for age group i and Formula the value of the measure for the actual data and simulated parameters. This provides an estimate of the regression variance. The approximate posterior distributions of Formula based on the same 500 draws of the parameters and simulated data are shown in Figure 4 (left panel) for a subset of the age groups.


Figure 4
View larger version (26K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 4 Scatter plot of the joint posterior distribution of the averaged squared residual Formula for age group i evaluated with observed data and simulated parameters vs. simulated data and parameters. Left panel corresponds to data simulated from the model assuming homogeneity of variance across age groups and the right corresponds to data simulated assuming heterogeneity.

 
If the posterior distributions of the discrepancy for the simulated data and the actual data are similar, then we should see large portions of the distribution of the discrepancies lie both above and below the diagonal line Formula. For age groups 0 and 25, this seems to be the case, indicating that the observed variability is similar to the expected variability under the Bayesian model. However, for age groups 50 and 90, almost all the points lie above the diagonal, indicating that the variability under the model is much larger than the observed one for these two age groups. In other words, we overestimate the variability for these age groups. In contrast, the plot for the oldest age group 110 shows a clump of points all well below the diagonal, indicating that the observed variability is much larger than that expected under the model. These results indicate a lack of fit of the model to the observed data. From these graphs, we can conclude that there is evidence of heterogeneous variances for different age groups.

The posterior predictive diagnostics seem to indicate that a model with different variances for different age groups may be more appropriate. To keep the model as simple as possible, age groups may be grouped into clusters which exhibit similar variances. To help construct these clusters, we examined the approximate posterior distributions of the average squared residuals (i.e. approximate estimates of the regression variances) obtained from the actual data. The means of these distributions for the age group 24 are shown in Table 2. To try to capture the largest differences between the age groups, we decided on a model with 10 different variance parameters. We clustered the age groups as follows: 0, 1, 5–10,15–35, 40, 45–65, 70–80, 85–100, 105, 110. We fitted this model and again calculated the discrepancy measures for both actual and simulated data. This analysis took between 5 and 7 min to run in R. The resulting posterior distributions are shown in Figure 4 (right panel). Here we see that the values for the discrepancies are much more evenly distributed above and below the diagonal line, indicating a better fit to the observed data.


View this table:
[in this window]
[in a new window]

 
Table 2 Approximate estimates of the regression variance for homogenous Bayesian model

 
Plots of the forecasts from the extended model are shown in Figure 3 (right panel). Here we see the same pattern as for the simulated in-sample data. Prediction intervals for the younger age groups do not differ much from the ones produced by the original model. For the middle groups (i.e. 50–90) the intervals are narrower, and for the oldest group, the interval is much wider and covers the observed values. In general, the prediction intervals are much tighter for age groups which have historically shown smaller variability. The prediction intervals are improved, although the point estimates of the forecast do not change by much.


    7. CONCLUSIONS
 TOP
 SUMMARY
 1. INTRODUCTION
 2. HISTORICAL MORTALITY DATA
 3. LEE-CARTER METHOD
 4. BAYESIAN MODEL
 5. EXTENSIONS
 6. APPLICATION
 7. CONCLUSIONS
 APPENDIX
 REFERENCES
 
This paper has presented a Bayesian approach to the Lee–Carter method for forecasting mortality rates. The Bayesian formulation of the model incorporates all sources of variation in the model when forming forecasts. The resulting prediction intervals are much wider than those obtained by Lee–Carter, and they reflect more accurately the forecasting error associated with the model. Furthermore, the Bayesian model can easily handle missing data both in the time series and across age groups and incorporate the uncertainty associated with it. This aspect of the model is important when working with data from countries where vital records are incomplete or unreliable.

We compared results from the Bayesian model and the Lee–Carter model for U.S. male data and showed that confidence interval widths are different. For some age groups, the Lee–Carter intervals do not cover the observed values. In contrast, the Bayesian intervals cover the observed values and reflect the true uncertainty of these forecasts. As an extension, we also fitted a model with heterogeneous variance for the age groups. This extension models the observed variability for different age groups and provides more realistic confidence intervals for the estimated log-mortality rates as well as prediction intervals for future observations. Another possible extension to the model would be to allow age groups to have different rates of decline. We could also let each age group have its own time-dependent drift. This model would allow each age group to have different rates of decline and also allow these rates to vary over time. Also note that different prior specifications for the model parameters can be used to incorporate prior information.

We realize that implementation of the Bayesian model is more complex. However, we believe that it is important for forecasts and their estimated errors to reflect the underlying uncertainty of the model and data. Furthermore, a Bayesian analysis allows the investigator to test the validity of the model through the use of posterior predictive checks as demonstrated in Section 6.2. We refer the reader to Gelman et al. (1995Go, 1996Go) for a more in-depth discussion of model checking in a Bayesian setting.

A second disadvantage of the Bayesian approach is in the specification of prior information. If true prior information is known, it should be incorporated into the Bayesian model. However, it is not always easy to quantify prior information in terms of a prior distribution. Careful consideration needs to be given to the prior specifications, and sensitivity analysis should be conducted to check how robust the forecasts are to the choice of the prior distribution.

One aspect of the Bayesian approach taken here is worthy of discussion. We implicitly used a squared-error loss when determining the Bayesian estimator for the forecasts. This loss function penalizes equally underestimation and overestimation. However, suppose we want forecasts for a developing country to decide funding for a public health program. In this case, underestimation would be a far worse loss than overestimation. Here, a linear loss would be more appropriate. The Bayes estimate of the mortality forecasts would then be an appropriate percentile of the posterior predictive distribution. For example, if we deemed underestimation to be three times worse than overestimation, then we would use the 75th percentile as the estimate. For a different country, the estimate could possibly be another percentile, depending on the intended use of the forecasts. Tailoring a model to a specific data set is one of the most important advantages of a Bayesian analysis. A better fitting prediction model can lead to more realistic forecasts and prediction intervals of mortality rates.


    APPENDIX
 TOP
 SUMMARY
 1. INTRODUCTION
 2. HISTORICAL MORTALITY DATA
 3. LEE-CARTER METHOD
 4. BAYESIAN MODEL
 5. EXTENSIONS
 6. APPLICATION
 7. CONCLUSIONS
 APPENDIX
 REFERENCES
 

A.1 The Gibbs sampler for extended models

The general model presented in (5.1) can be fitted by the Gibbs sampler. Assuming noninformative priors for the parameters, the steps of the Gibbs sampler are as follows:

Gibbs sampler for general model.

  1. Run the Kalman filter with the following updating equations

    Formula

    Store quantities Formula and Formula for Formula. Next, sample the state vector as follows:

    (a) Sample Formula from Formula, then
    (b) for each Formula, sample Formula from Formula

    The conditional distribution for item (b) is

    Formula

    where

    Formula


  2. Draw Formula from Formula
  3. Draw Formula and Z by performing separate regressions of Formula on Formula for each age group. Letting Formula, the conditional distribution of Formula and Formula for age group i is

    Formula


  4. Draw Formula from its conditional distribution. Assuming the prior

    Formula

    where d is the dimension of Formula, then the conditional distribution is an inverted-Wishart distribution:

    Formula

    where Formula.

Gibbs sampler for AR(1) model.

In the case with an AR(1) process in the state equation, the model becomes

Formula

The Gibbs sampler has an added step for drawing the parameter Formula. The steps for this model are as follows:

  1. Run the Kalman filter using the equations for the general model with the matrix G as in (5.2).
  2. Draw Formula from Formula
  3. Draw each Formula and Formula from Formula, where Formula.
  4. Draw Formula from

    Formula


  5. Draw Formula from

    Formula



    ACKNOWLEDGMENTS
 
The author thanks Lemuel Moyé, Robert Hardy, and David van Dyk for their insightful and constructive comments. Helpful comments by the associate editor and two referees have greatly improved the article. Conflict of Interest: None declared.


    REFERENCES
 TOP
 SUMMARY
 1. INTRODUCTION
 2. HISTORICAL MORTALITY DATA
 3. LEE-CARTER METHOD
 4. BAYESIAN MODEL
 5. EXTENSIONS
 6. APPLICATION
 7. CONCLUSIONS
 APPENDIX
 REFERENCES
 

    Alho JM and Spencer BD. (1990) Error models for official mortality forecasts. Journal of the American Statistical Association 85:609–616.[CrossRef][Web of Science][Medline]

    Barnard J, McCulloch R, Meng X.-L. (2000) Modeling covariance matrices in terms of standard deviations and correlations, with application to shrinkage. Statistica Sinica 10:1281–1311.[Web of Science]

    Bell W and Monsell B. (1991) Using principal components in time series modeling and forecasting of age-specific mortality rates, ASA Proceedings of the Social Statistics Section. (American Statistical Association, Alexandria, VA)154–159.

    Booth H, Maindonald J, Smith L. (2002) Applying Lee-Carter under conditions of variable mortality decline. Population Studies 56:325–336.[CrossRef][Medline]

    Bozik JE and Bell WR. (1987) Forecasting age specific fertility using principal components, ASA Proceedings of the Social Statistics Section. (American Statistical Association, Alexandria, VA)396–401.

    Brouhns N, Denuit M, Van Keilegom I. (2005) Bootstrapping the Poisson log-bilinear model for mortality forecasting. Scandinavian Actuarial Journal 212–224.

    Brouhns, N., Denuit, M. and Vermunt, J.K., (2002). Measuring the longevity risk in mortality projections. Bulletin of the Swiss Association of Actuaries, 105–130.

    Carter CK and Kohn R. (1994) On Gibbs sampling for state space models. Biometrika 81:541–553.[Abstract/Free Full Text]

    Carter L. (1995) Forecasting U.S. mortality: a comparison of Box-Jenkins ARIMA and structural time series models. The Sociological Quarterly 37:127–144.[CrossRef]

    Carter LR and Lee RD. (1992) Modeling and forecasting US sex differentials in mortality. International Journal of Forecasting 8:393–411.[CrossRef][Web of Science][Medline]

    Czado C, Delwarde A, Denuit M. (2005) Bayesian Poisson log-bilinear mortality projections. Insurance: Mathematics & Economics 36:260–284.[CrossRef][Web of Science]

    de Jong P and Shephard N. (1995) The simulation smoother for time series models. Biometrika 82:339–350.[Abstract/Free Full Text]

    Durbin J and Koopman SJ. (2001) Time Series Analysis by State Space MethodsOxford University Press.

    Frühwirth-Schnatter S. (1994) Data augmentation and dynamic linear models. Journal of Time Series Analysis 15:183–202.

    Gelfand AE and Smith AFM. (1990) Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association 85:398–409.[CrossRef]

    Gelman A, Carlin JB, Stern HS, Rubin DB. (1995) Bayesian Data Analysis(Chapman & Hall., London).

    Gelman A, Meng X-L, Stern H. (1996) Posterior predictive assessment of model fitness via realized discrepancies (discussion 760–807). Statistica Sinica 6:733–760.[Web of Science]

    Gelman A and Rubin DB. (1992) Inference from iterative simulations using multiple sequences (with discussion). Statistical Science 7:457–472.

    Geman S and Geman D. (1984) Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 6:721–741.[CrossRef][Web of Science]

    Girosi, F. and King, G., (2005). Demographic Forecasting. Monograph (unpublished).

    Harvey AC. (1990) Forecasting, Structural Time Series Models, and the Kalman Filter(Cambridge University Press., Cambridge).

    Human Mortality Database. (2002). University of California, Berkeley (USA), and Max Planck Institute for Demographic Research (Germany). Available at http://www.mortality.org or http://www.humanmortality.de (accessed on April 1, 2005).

    Lee R. (2000) The Lee-Carter method for forecasting mortality, with various extensions and applications. North American Actuarial Journal 4:80–93.

    Lee RD and Carter LR. (1992) Modeling and forecasting U.S. mortality (discussion 671–675). Journal of the American Statistical Association 87:659–671.[CrossRef]

    Lee RD and Tuljapurkar S. (1994) Stochastic population forecasts for the United States: beyond high, medium and low. Journal of the American Statistical Association 89:1175–1189.[CrossRef]

    Li N, Lee R, Tuljapurkar S. (2004) Using the Lee-Carter method to forecast mortality for populations with limited data. International Statistical Review 72:19–36.

    Little RJA and Rubin DB. (1987) Statistical Analysis with Missing Data(John Wiley & Sons., New York).

    Mehra RK. (1979) Kalman filters and their applications to forecasting. In Makridakis S and Wheelwright SC (Eds.). Forecasting. Studies in the Management Sciences(Elsevier, New York) Volume 12: pp. 75–94.

    Murray CJ and Lopez AD. (1996) The Global Burden of Disease(Harvard University Press., Cambridge, MA).

    Pedroza C. (2002) Bayesian hierarchical time series modeling of mortality rates, Ph.D. Thesis(Harvard University, Cambridge, MA.).

    Plummer, M., Best, N., Cowles, K. and Vines, K., (2005). Coda: Output Analysis and Diagnostics for MCMC. R package version 0.9–2. Available at http://www-fis.iarc.fr/coda/.

    R Development Core Team C. (2005) R: A Language and Environment for Statistical Computing. (R Foundation for Statistical Computing, Vienna, Austria) ISBN 3-900051-07-0. Available at http://www.R-project.org.

    Rubin DB. (1987) Multiple Imputation for Nonresponse in Surveys(John Wiley & Sons., New York).

    Schafer JL. (1997) Analysis of Incomplete Multivariate Data(Chapman & Hall., London).

    West M and Harrison J. (1997) Bayesian Forecasting and Dynamic Models(Springer., New York).

    Wilmoth JR. (1995) Are mortality projections always more pessimistic when disaggregated by cause of death? Mathematical Population Studies 5:293–319.

    Received June 21, 2005; revised January 25, 2006; accepted for publication February 10, 2006.


    Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?



    This Article
    Right arrow Abstract Freely available
    Right arrow FREE Full Text (PDF) Freely available
    Right arrow All Versions of this Article:
    7/4/530    most recent
    kxj024v1
    Right arrow Alert me when this article is cited
    Right arrow Alert me if a correction is posted
    Services
    Right arrow Email this article to a friend
    Right arrow Similar articles in this journal
    Right arrow Similar articles in PubMed
    Right arrow Alert me to new issues of the journal
    Right arrow Add to My Personal Archive
    Right arrow Download to citation manager
    Right arrowRequest Permissions
    Right arrow Disclaimer
    Google Scholar
    Right arrow Articles by Pedroza, C.
    Right arrow Search for Related Content
    PubMed
    Right arrow PubMed Citation
    Right arrow Articles by Pedroza, C.
    Social Bookmarking
     Add to CiteULike   Add to Connotea   Add to Del.icio.us  
    What's this?