Biostatistics Advance Access published online on July 3, 2008
Biostatistics, doi:10.1093/biostatistics/kxn022
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Creating unbiased cross-sectional covariate-related reference ranges from serial correlated measurements
Centre for Paediatric Epidemiology and Biostatistics, Institute of Child Health, 30 Guilford Street, London WC1N 1EH, England awade{at}ich.ucl.ac.uk
Perinatal Research Unit, Department of Obstetrics, Zurich University Hospital, Frauenklinikstrasse 10, 8091 Zurich, Switzerland
Cross-sectional covariate-related reference ranges are widely used in clinical medicine to put individual observations in the context of population values. Usually, such reference ranges are created from data sets of independent observations. If multiple measurements per individual are available, then ignoring the within-person correlation between repeats will lead to overestimation of centile precision. Furthermore, if abnormal measurements have triggered more frequent assessment, the data set will be biased thus producing biased centiles. Where multiple measures per individual exist, the methods commonly used are either randomly or systematically to select one observation per individual or to model individual trajectories and combine these. The first of these approaches may result in discarding a large proportion of the available data and may itself cause bias and the latter requires the form of the changes within individuals to be characterized. We have developed an approach to the modeling of the median, spread, and skew across individuals using maximum likelihood, which can incorporate correlations between dependent observations. Heavily biased data sets are simulated to illustrate how the methodology can eliminate the biases inherent in the data collection process and produce valid centiles plus estimates of the within-person correlations. The "select one per individual" approach is shown to be liable to bias and to produce less precise centiles. We recommend that the maximum likelihood method incorporating correlations be used with existing data sets. Furthermore, this is a potentially more efficient approach to be considered when planning the future collection of data solely for the purposes of creating cross-sectional covariate-related reference ranges.
Keywords: Age-related reference ranges; Correlated measurements; Dependence; Serial measures; Unbiased; z-scores
* To whom correspondence should be addressed.
Received January 12, 2007; revised November 7, 2007; revised April 25, 2008; revised May 20, 2008; revised June 5, 2008; accepted for publication June 6, 2008.