<?xml version="1.0" encoding="ISO-8859-1"?>

<rdf:RDF
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns="http://purl.org/rss/1.0/"
 xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/"
 xmlns:dc="http://purl.org/dc/elements/1.1/"
 xmlns:syn="http://purl.org/rss/1.0/modules/syndication/"
 xmlns:prism="http://purl.org/rss/1.0/modules/prism/"
 xmlns:admin="http://webns.net/mvcb/"
>

<channel rdf:about="http://biostatistics.oxfordjournals.org">
<title>Biostatistics - recent issues</title>
<link>http://biostatistics.oxfordjournals.org</link>
<description>Biostatistics - RSS feed of recent issues (covers the latest 3 issues, including the current issue)</description>
<prism:eIssn>1468-4357</prism:eIssn>
<prism:publicationName>Biostatistics</prism:publicationName>
<prism:issn>1465-4644</prism:issn>
<items>
 <rdf:Seq>
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/405?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/409?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/424?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/436?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/446?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/451?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/468?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/481?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/501?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/515?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/535?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/550?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/561?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/575?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/588?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/205?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/219?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/228?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/245?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/258?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/275?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/282?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/297?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/310?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/324?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/327?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/335?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/352?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/364?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/374?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/390?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/1?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/3?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/17?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/32?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/46?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/60?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/70?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/80?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/94?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/106?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/121?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/136?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/147?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/155?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/172?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/187?rss=1" />
  <rdf:li rdf:resource="http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/201?rss=1" />
 </rdf:Seq>
</items>
</channel>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/405?rss=1">
<title><![CDATA[Reproducible research and Biostatistics]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/405?rss=1</link>
<description><![CDATA[]]></description>
<dc:creator><![CDATA[Peng, R. D.]]></dc:creator>
<dc:date>2009-06-16</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxp014</dc:identifier>
<dc:title><![CDATA[Reproducible research and Biostatistics]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>3</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>408</prism:endingPage>
<prism:publicationDate>2009-07-01</prism:publicationDate>
<prism:startingPage>405</prism:startingPage>
<prism:section>Editorial</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/409?rss=1">
<title><![CDATA[Air pollution and health in Scotland: a multicity study]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/409?rss=1</link>
<description><![CDATA[
<p>This paper presents an epidemiological study investigating the effects of long-term air pollution exposure on public health in Scotland, focusing on the 4 major urban areas, Aberdeen, Dundee, Edinburgh, and Glasgow. In particular, the associations between respiratory hospital admissions in 2005 and exposure to both PM<SUB>10</SUB> and NO<SUB>2</SUB> between 2002 and 2004 are estimated using a small-area ecological design. The implementation of such studies requires careful consideration of a number of statistical issues, including how to model spatial correlation, identifiability of the model parameters, and the possible effects of ecological bias. The results show that long-term exposures (over 3 years) to PM<SUB>10</SUB> and NO<SUB>2</SUB> are significantly associated with respiratory hospital admissions in Edinburgh and Glasgow, whereas the risks for Aberdeen and Dundee are generally positive but nonsignificant.</p>
]]></description>
<dc:creator><![CDATA[Lee, D., Ferguson, C., Mitchell, R.]]></dc:creator>
<dc:date>2009-06-16</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxp010</dc:identifier>
<dc:title><![CDATA[Air pollution and health in Scotland: a multicity study]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>3</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>423</prism:endingPage>
<prism:publicationDate>2009-07-01</prism:publicationDate>
<prism:startingPage>409</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/424?rss=1">
<title><![CDATA[A simulation-approximation approach to sample size planning for high-dimensional classification studies]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/424?rss=1</link>
<description><![CDATA[
<p>Classification studies with high-dimensional measurements and relatively small sample sizes are increasingly common. Prospective analysis of the role of sample sizes in the performance of such studies is important for study design and interpretation of results, but the complexity of typical pattern discovery methods makes this problem challenging. The approach developed here combines Monte Carlo methods and new approximations for linear discriminant analysis, assuming multivariate normal distributions. Monte Carlo methods are used to sample the distribution of which features are selected for a classifier and the mean and variance of features given that they are selected. Given selected features, the linear discriminant problem involves different distributions of training data and generalization data, for which 2 approximations are compared: one based on Taylor series approximation of the generalization error and the other on approximating the discriminant scores as normally distributed. Combining the Monte Carlo and approximation approaches to different aspects of the problem allows efficient estimation of expected generalization error without full simulations of the entire sampling and analysis process. To evaluate the method and investigate realistic study design questions, full simulations are used to ask how validation error rate depends on the strength and number of informative features, the number of noninformative features, the sample size, and the number of features allowed into the pattern. Both approximation methods perform well for most cases but only the normal discriminant score approximation performs well for cases of very many weakly informative or uninformative dimensions. The simulated cases show that many realistic study designs will typically estimate substantially suboptimal patterns and may have low probability of statistically significant validation results.</p>
]]></description>
<dc:creator><![CDATA[de Valpine, P., Bitter, H.-M., Brown, M. P. S., Heller, J.]]></dc:creator>
<dc:date>2009-06-16</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxp001</dc:identifier>
<dc:title><![CDATA[A simulation-approximation approach to sample size planning for high-dimensional classification studies]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>3</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>435</prism:endingPage>
<prism:publicationDate>2009-07-01</prism:publicationDate>
<prism:startingPage>424</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/436?rss=1">
<title><![CDATA[Efficient parameter estimation in longitudinal data analysis using a hybrid GEE method]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/436?rss=1</link>
<description><![CDATA[
<p>The method of generalized estimating equations (GEEs) provides consistent estimates of the regression parameters in a marginal regression model for longitudinal data, even when the working correlation model is misspecified (<cross-ref type="bib" refid="bib12">Liang and Zeger, 1986</cross-ref>). However, the efficiency of a GEE estimate can be seriously affected by the choice of the working correlation model. This study addresses this problem by proposing a hybrid method that combines multiple GEEs based on different working correlation models, using the empirical likelihood method (<cross-ref type="bib" refid="bib22">Qin and Lawless, 1994</cross-ref>). Analyses show that this hybrid method is more efficient than a GEE using a misspecified working correlation model. Furthermore, if one of the working correlation structures correctly models the within-subject correlations, then this hybrid method provides the most efficient parameter estimates. In simulations, the hybrid method's finite-sample performance is superior to a GEE under any of the commonly used working correlation models and is almost fully efficient in all scenarios studied. The hybrid method is illustrated using data from a longitudinal study of the respiratory infection rates in 275 Indonesian children.</p>
]]></description>
<dc:creator><![CDATA[Leung, D. H. Y., Wang, Y.-G., Zhu, M.]]></dc:creator>
<dc:date>2009-06-16</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxp002</dc:identifier>
<dc:title><![CDATA[Efficient parameter estimation in longitudinal data analysis using a hybrid GEE method]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>3</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>445</prism:endingPage>
<prism:publicationDate>2009-07-01</prism:publicationDate>
<prism:startingPage>436</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/446?rss=1">
<title><![CDATA[A note on oligonucleotide expression values not being normally distributed]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/446?rss=1</link>
<description><![CDATA[
<p>Novel techniques for analyzing microarray data are constantly being developed. Though many of the methods contribute to biological discoveries, inability to properly evaluate the novel techniques limits their ability to advance science. Because the underlying distribution of microarray data is unknown, novel methods are typically tested against the assumed normal distribution. However, microarray data are not, in fact, normally distributed, and assuming so can have misleading consequences. Using an Affymetrix technical replicate spike-in data set, we show that oligonucleotide expression values are not normally distributed for any of the standard methods for calculating expression values. The resulting data tend to have a large proportion of skewed and heavy-tailed genes. Additionally, we show that standard methods can give unexpected and misleading results when the data are not well approximated by the normal distribution. Robust methods are therefore recommended when analyzing microarray data. Additionally, new techniques should be evaluated with skewed and/or heavy-tailed data distributions.</p>
]]></description>
<dc:creator><![CDATA[Hardin, J., Wilson, J.]]></dc:creator>
<dc:date>2009-06-16</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxp003</dc:identifier>
<dc:title><![CDATA[A note on oligonucleotide expression values not being normally distributed]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>3</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>450</prism:endingPage>
<prism:publicationDate>2009-07-01</prism:publicationDate>
<prism:startingPage>446</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/451?rss=1">
<title><![CDATA[Conditional GEE for recurrent event gap times]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/451?rss=1</link>
<description><![CDATA[
<p>This paper deals with the analysis of recurrent event data subject to censored observation. Using a suitable adaptation of generalized estimating equations for longitudinal data, we propose a straightforward methodology for estimating the parameters indexing the conditional means and variances of the process interevent (i.e. gap) times. The proposed methodology permits the use of both time-fixed and time-varying covariates, as well as transformations of the gap times, creating a flexible and useful class of methods for analyzing gap-time data. Censoring is dealt with by imposing a parametric assumption on the censored gap times, and extensive simulation results demonstrate the relative robustness of parameter estimates even when this parametric assumption is incorrect. A suitable large-sample theory is developed. Finally, we use our methods to analyze data from a randomized trial of asthma prevention in young children.</p>
]]></description>
<dc:creator><![CDATA[Clement, D. Y., Strawderman, R. L.]]></dc:creator>
<dc:date>2009-06-16</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxp004</dc:identifier>
<dc:title><![CDATA[Conditional GEE for recurrent event gap times]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>3</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>467</prism:endingPage>
<prism:publicationDate>2009-07-01</prism:publicationDate>
<prism:startingPage>451</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/468?rss=1">
<title><![CDATA[Estimating equation-based causality analysis with application to microarray time series data]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/468?rss=1</link>
<description><![CDATA[
<p>Microarray time-course data can be used to explore interactions among genes and infer gene networks. The crucial step in constructing a gene network is to develop an appropriate causality test. In this regard, the expression profile of each gene can be treated as a time series. A typical existing method establishes Granger causality based on a Wald-type test, which relies on the homoscedastic normality assumption of the data distribution. However, this assumption can be seriously violated in real microarray experiments and thus may lead to inconsistent test results and false scientific conclusions. To overcome this drawback, we propose an estimating equation&ndash;based method which is robust to both heteroscedasticity and nonnormality of the gene expression data. In fact, it only requires the residuals to be uncorrelated. We will use simulation studies and a real-data example to demonstrate the applicability of the proposed method.</p>
]]></description>
<dc:creator><![CDATA[Hu, J., Hu, F.]]></dc:creator>
<dc:date>2009-06-16</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxp005</dc:identifier>
<dc:title><![CDATA[Estimating equation-based causality analysis with application to microarray time series data]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>3</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>480</prism:endingPage>
<prism:publicationDate>2009-07-01</prism:publicationDate>
<prism:startingPage>468</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/481?rss=1">
<title><![CDATA[An insight into high-resolution mass-spectrometry data]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/481?rss=1</link>
<description><![CDATA[
<p>Mass spectrometry is a powerful tool with much promise in global proteomic studies. The discipline of statistics offers robust methodologies to extract and interpret high-dimensional mass-spectrometry data and will be a valuable contributor to the field. Here, we describe the process by which data are produced, characteristics of the data, and the analytical preprocessing steps that are taken in order to interpret the data and use it in downstream statistical analyses. Because of the complexity of data acquisition, statistical methods developed for gene expression microarray data are not directly applicable to proteomic data. Areas in need of statistical research for proteomic data include alignment, experimental design, abundance normalization, and statistical analysis.</p>
]]></description>
<dc:creator><![CDATA[Eckel-Passow, J. E., Oberg, A. L., Therneau, T. M., Bergen, H. R.]]></dc:creator>
<dc:date>2009-06-16</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxp006</dc:identifier>
<dc:title><![CDATA[An insight into high-resolution mass-spectrometry data]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>3</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>500</prism:endingPage>
<prism:publicationDate>2009-07-01</prism:publicationDate>
<prism:startingPage>481</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/501?rss=1">
<title><![CDATA[Frailty modeling of bimodal age-incidence curves of nasopharyngeal carcinoma in low-risk populations]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/501?rss=1</link>
<description><![CDATA[
<p>The incidence of nasopharyngeal carcinoma (NPC) varies widely according to age at diagnosis, geographic location, and ethnic background. On a global scale, NPC incidence is common among specific populations primarily living in southern and eastern Asia and northern Africa, but in most areas, including almost all western countries, it remains a relatively uncommon malignancy. Specific to these low-risk populations is a general observation of possible bimodality in the observed age-incidence curves. We have developed a multiplicative frailty model that allows for the demonstrated points of inflection at ages 15&ndash;24 and 65&ndash;74. The bimodal frailty model has 2 independent compound Poisson-distributed frailties and gives a significant improvement in fit over a unimodal frailty model. Applying the model to population-based cancer registry data worldwide, 2 biologically relevant estimates are derived, namely the proportion of susceptible individuals and the number of genetic and epigenetic events required for the tumor to develop. The results are critically compared and discussed in the context of existing knowledge of the epidemiology and pathogenesis of NPC.</p>
]]></description>
<dc:creator><![CDATA[Haugen, M., Bray, F., Grotmol, T., Tretli, S., Aalen, O. O., Moger, T. A.]]></dc:creator>
<dc:date>2009-06-16</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxp007</dc:identifier>
<dc:title><![CDATA[Frailty modeling of bimodal age-incidence curves of nasopharyngeal carcinoma in low-risk populations]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>3</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>514</prism:endingPage>
<prism:publicationDate>2009-07-01</prism:publicationDate>
<prism:startingPage>501</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/515?rss=1">
<title><![CDATA[A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/515?rss=1</link>
<description><![CDATA[
<p>We present a penalized matrix decomposition (PMD), a new framework for computing a rank-<I>K</I> approximation for a matrix. We approximate the matrix <b>X</b> as <b>X&#x302;</b> = &Sigma;<SUB><I>k</I>=1</SUB><SUP><I>K</I></SUP> <I>d</I><SUB><I>k</I></SUB><b>u</b><SUB><I>k</I></SUB><b>v</b><SUB><I>k</I></SUB><SUP><I>T</I></SUP>, where <I>d</I><SUB><I>k</I></SUB>, <b>u</b><SUB><I>k</I></SUB>, and <b>v</b><SUB><I>k</I></SUB> minimize the squared Frobenius norm of <b>X</b> &minus; <b>X&#x302;</b>, subject to penalties on <b>u</b><SUB><I>k</I></SUB> and <b>v</b><SUB><I>k</I></SUB>. This results in a regularized version of the singular value decomposition. Of particular interest is the use of <I>L</I><SUB>1</SUB>-penalties on <b>u</b><SUB><I>k</I></SUB> and <b>v</b><SUB><I>k</I></SUB>, which yields a decomposition of <b>X</b> using sparse vectors. We show that when the PMD is applied using an <I>L</I><SUB>1</SUB>-penalty on <b>v</b><SUB><I>k</I></SUB> but not on <b>u</b><SUB><I>k</I></SUB>, a method for sparse principal components results. In fact, this yields an efficient algorithm for the "SCoTLASS" proposal (<cross-ref type="bib" refid="bib11">Jolliffe <I>and others</I> 2003</cross-ref>) for obtaining sparse principal components. This method is demonstrated on a publicly available gene expression data set. We also establish connections between the SCoTLASS method for sparse principal component analysis and the method of <cross-ref type="bib" refid="bib32">Zou <I>and others</I> (2006)</cross-ref>. In addition, we show that when the PMD is applied to a cross-products matrix, it results in a method for penalized canonical correlation analysis (CCA). We apply this penalized CCA method to simulated data and to a genomic data set consisting of gene expression and DNA copy number measurements on the same set of samples.</p>
]]></description>
<dc:creator><![CDATA[Witten, D. M., Tibshirani, R., Hastie, T.]]></dc:creator>
<dc:date>2009-06-16</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxp008</dc:identifier>
<dc:title><![CDATA[A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>3</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>534</prism:endingPage>
<prism:publicationDate>2009-07-01</prism:publicationDate>
<prism:startingPage>515</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/535?rss=1">
<title><![CDATA[Development and validation of a dynamic prognostic tool for prostate cancer recurrence using repeated measures of posttreatment PSA: a joint modeling approach]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/535?rss=1</link>
<description><![CDATA[
<p>Prostate-specific antigen (PSA) is a biomarker routinely and repeatedly measured on prostate cancer patients treated by radiation therapy (RT). It was shown recently that its whole pattern over time rather than just its current level was strongly associated with prostate cancer recurrence. To more accurately guide clinical decision making, monitoring of PSA after RT would be aided by dynamic powerful prognostic tools that incorporate the complete posttreatment PSA evolution. In this work, we propose a dynamic prognostic tool derived from a joint latent class model and provide a measure of variability obtained from the parameters' asymptotic distribution. To validate this prognostic tool, we consider predictive accuracy measures and provide an empirical estimate of their variability. We also show how to use them in the longitudinal context to compare the dynamic prognostic tool we developed with a proportional hazard model including either baseline covariates or baseline covariates and the expected level of PSA at the time of prediction in a landmark model. Using data from 3 large cohorts of patients treated after the diagnosis of prostate cancer, we show that the dynamic prognostic tool based on the joint model reduces the error of prediction and offers a powerful tool for individual prediction.</p>
]]></description>
<dc:creator><![CDATA[Proust-Lima, C., Taylor, J. M. G.]]></dc:creator>
<dc:date>2009-06-16</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxp009</dc:identifier>
<dc:title><![CDATA[Development and validation of a dynamic prognostic tool for prostate cancer recurrence using repeated measures of posttreatment PSA: a joint modeling approach]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>3</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>549</prism:endingPage>
<prism:publicationDate>2009-07-01</prism:publicationDate>
<prism:startingPage>535</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/550?rss=1">
<title><![CDATA[Testing the prediction error difference between 2 predictors]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/550?rss=1</link>
<description><![CDATA[
<p>We develop an inference framework for the difference in errors between 2 prediction procedures. The 2 procedures may differ in any aspect and possibly utilize different sets of covariates. We apply training and testing on the same data set, which is accommodated by sample splitting. For each split, both procedures predict the response of the same samples, which results in paired residuals to which a signed-rank test is applied. Multiple splits result in multiple <I>p</I>-values. The median <I>p</I>-value and the mean inverse normal transformed <I>p</I>-value are proposed as summary (test) statistics, for which bounds on the overall type I error rate under a variety of assumptions are proven. A simulation study is performed to check type I error control of the least conservative bound. Moreover, it confirms superior power of our method with respect to a one-split approach. Our inference framework is applied to genomic survival data sets to study 2 issues: compare lasso and ridge regression and decide upon use of both methylation and gene expression markers or the latter only. The framework easily accommodates any prediction paradigm and allows comparing any 2, possibly nonmodel-based, prediction procedures.</p>
]]></description>
<dc:creator><![CDATA[van de Wiel, M. A., Berkhof, J., van Wieringen, W. N.]]></dc:creator>
<dc:date>2009-06-16</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxp011</dc:identifier>
<dc:title><![CDATA[Testing the prediction error difference between 2 predictors]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>3</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>560</prism:endingPage>
<prism:publicationDate>2009-07-01</prism:publicationDate>
<prism:startingPage>550</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/561?rss=1">
<title><![CDATA[Optimal designs for 2-color microarray experiments]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/561?rss=1</link>
<description><![CDATA[
<p>Statisticians can play a crucial role in the design of gene expression studies to ensure the most effective allocation of available resources. This paper considers Pareto optimal designs for gene expression studies involving 2-color microarrays. Pareto optimality enables the recommendation of designs that are particularly efficient for the effects of most interest to biologists. This is relevant in the microarray context where analysis is typically carried out separately for those effects. Our approach will allow for effects of interest that correspond to contrasts rather than solely considering parameters of the linear model. We further develop the approach to cater for additional experimental considerations such as contrasts that are of equal scientific interest. This amounts to partitioning all relevant contrasts into subsets of effects that are of equal importance. Based on the partitions, a penalty is employed in order to recommend designs for complex and varied microarray experiments. Finally, we address the issue of gene-specific dye bias. We illustrate using studies of leukemia and breast cancer.</p>
]]></description>
<dc:creator><![CDATA[Sanchez, P. S., Glonek, G. F. V.]]></dc:creator>
<dc:date>2009-06-16</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxp012</dc:identifier>
<dc:title><![CDATA[Optimal designs for 2-color microarray experiments]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>3</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>574</prism:endingPage>
<prism:publicationDate>2009-07-01</prism:publicationDate>
<prism:startingPage>561</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/575?rss=1">
<title><![CDATA[Joint analysis of prevalence and incidence data using conditional likelihood]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/575?rss=1</link>
<description><![CDATA[
<p>Disease prevalence is the combined result of duration, disease incidence, case fatality, and other mortality. If information is available on all these factors, and on fixed covariates such as genotypes, prevalence information can be utilized in the estimation of the effects of the covariates on disease incidence. Study cohorts that are recruited as cross-sectional samples and subsequently followed up for disease events of interest produce both prevalence and incidence information. In this paper, we make use of both types of information using a likelihood, which is conditioned on survival until the cross section. In a simulation study making use of real cohort data, we compare the proposed conditional likelihood method to a standard analysis where prevalent cases are omitted and the likelihood expression is conditioned on healthy status at the cross section.</p>
]]></description>
<dc:creator><![CDATA[Saarela, O., Kulathinal, S., Karvanen, J.]]></dc:creator>
<dc:date>2009-06-16</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxp013</dc:identifier>
<dc:title><![CDATA[Joint analysis of prevalence and incidence data using conditional likelihood]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>3</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>587</prism:endingPage>
<prism:publicationDate>2009-07-01</prism:publicationDate>
<prism:startingPage>575</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/588?rss=1">
<title><![CDATA[Biostatistics - Referees of Manuscripts Submitted Mid-2007 to Mid-2008]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/3/588?rss=1</link>
<description><![CDATA[]]></description>
<dc:creator><![CDATA[]]></dc:creator>
<dc:date>2009-06-16</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxp017</dc:identifier>
<dc:title><![CDATA[Biostatistics - Referees of Manuscripts Submitted Mid-2007 to Mid-2008]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>3</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>589</prism:endingPage>
<prism:publicationDate>2009-07-01</prism:publicationDate>
<prism:startingPage>588</prism:startingPage>
<prism:section>Referees</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/205?rss=1">
<title><![CDATA[Generalized linear models with unspecified reference distribution]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/205?rss=1</link>
<description><![CDATA[
<p>We propose a new class of semiparametric generalized linear models. As with existing models, these models are specified via a linear predictor and a link function for the mean of response <I>Y</I> as a function of predictors <I>X</I>. Here, however, the "baseline" distribution of <I>Y</I> at a given reference mean &micro;<SUB>0</SUB> is left unspecified and is estimated from the data. The response distribution when the mean differs from &micro;<SUB>0</SUB> is then generated via exponential tilting of the baseline distribution, yielding a response model that is a natural exponential family, with corresponding canonical link and variance functions. The resulting model has a level of flexibility similar to the popular proportional odds model. Maximum likelihood estimation is developed for response distributions with finite support, and the new model is studied and illustrated through simulations and example analyses from aging research.</p>
]]></description>
<dc:creator><![CDATA[Rathouz, P. J., Gao, L.]]></dc:creator>
<dc:date>2009-02-27</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxn030</dc:identifier>
<dc:title><![CDATA[Generalized linear models with unspecified reference distribution]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>2</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>218</prism:endingPage>
<prism:publicationDate>2009-04-01</prism:publicationDate>
<prism:startingPage>205</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/219?rss=1">
<title><![CDATA[Modified test statistics by inter-voxel variance shrinkage with an application to fMRI]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/219?rss=1</link>
<description><![CDATA[
<p>Functional magnetic resonance imaging (fMRI) is a noninvasive technique which is commonly used to quantify changes in blood oxygenation and flow coupled to neuronal activation. One of the primary goals of fMRI studies is to identify localized brain regions where neuronal activation levels vary between groups. Single voxel <I>t</I>-tests have been commonly used to determine whether activation related to the protocol differs across groups. Due to the generally limited number of subjects within each study, accurate estimation of variance at each voxel is difficult. Thus, combining information across voxels is desirable in order to improve efficiency. Here, we construct a hierarchical model and apply an empirical Bayesian framework for the analysis of group fMRI data, employing techniques used in high-throughput genomic studies. The key idea is to shrink residual variances by combining information across voxels and subsequently to construct an improved test statistic. This hierarchical model results in a shrinkage of voxel-wise residual sample variances toward a common value. The shrunken estimator for voxel-specific variance components on the group analyses outperforms the classical residual error estimator in terms of mean-squared error. Moreover, the shrunken test statistic decreases false-positive rates when testing differences in brain contrast maps across a wide range of simulation studies. This methodology was also applied to experimental data regarding a cognitive activation task.</p>
]]></description>
<dc:creator><![CDATA[Su, S.-C., Caffo, B., Garrett-Mayer, E., Bassett, S. S.]]></dc:creator>
<dc:date>2009-02-27</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxn028</dc:identifier>
<dc:title><![CDATA[Modified test statistics by inter-voxel variance shrinkage with an application to fMRI]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>2</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>227</prism:endingPage>
<prism:publicationDate>2009-04-01</prism:publicationDate>
<prism:startingPage>219</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/228?rss=1">
<title><![CDATA[Biomarker evaluation and comparison using the controls as a reference population]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/228?rss=1</link>
<description><![CDATA[
<p>The classification accuracy of a continuous marker is typically evaluated with the receiver operating characteristic (ROC) curve. In this paper, we study an alternative conceptual framework, the "percentile value." In this framework, the controls only provide a reference distribution to standardize the marker. The analysis proceeds by analyzing the standardized marker in cases. The approach is shown to be equivalent to ROC analysis. Advantages are that it provides a framework familiar to a broad spectrum of biostatisticians and it opens up avenues for new statistical techniques in biomarker evaluation. We develop several new procedures based on this framework for comparing biomarkers and biomarker performance in different populations. We develop methods that adjust such comparisons for covariates. The methods are illustrated on data from 2 cancer biomarker studies.</p>
]]></description>
<dc:creator><![CDATA[Huang, Y., Pepe, M. S.]]></dc:creator>
<dc:date>2009-02-27</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxn029</dc:identifier>
<dc:title><![CDATA[Biomarker evaluation and comparison using the controls as a reference population]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>2</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>244</prism:endingPage>
<prism:publicationDate>2009-04-01</prism:publicationDate>
<prism:startingPage>228</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/245?rss=1">
<title><![CDATA[A new serially correlated gamma-frailty process for longitudinal count data]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/245?rss=1</link>
<description><![CDATA[
<p>We describe a new multivariate gamma distribution and discuss its implication in a Poisson-correlated gamma-frailty model. This model is introduced to account for between-subjects correlation occurring in longitudinal count data. For likelihood-based inference involving distributions in which high-dimensional dependencies are present, it may be useful to approximate likelihoods based on the univariate or bivariate marginal distributions. The merit of composite likelihood is to reduce the computational complexity of the full likelihood. A 2-stage composite-likelihood procedure is developed for estimating the model parameters. The suggested method is applied to a meta-analysis study for survival curves.</p>
]]></description>
<dc:creator><![CDATA[Fiocco, M., Putter, H., Van Houwelingen, J.C.]]></dc:creator>
<dc:date>2009-02-27</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxn031</dc:identifier>
<dc:title><![CDATA[A new serially correlated gamma-frailty process for longitudinal count data]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>2</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>257</prism:endingPage>
<prism:publicationDate>2009-04-01</prism:publicationDate>
<prism:startingPage>245</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/258?rss=1">
<title><![CDATA[Measurement error caused by spatial misalignment in environmental epidemiology]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/258?rss=1</link>
<description><![CDATA[
<p>In many environmental epidemiology studies, the locations and/or times of exposure measurements and health assessments do not match. In such settings, health effects analyses often use the predictions from an exposure model as a covariate in a regression model. Such exposure predictions contain some measurement error as the predicted values do not equal the true exposures. We provide a framework for spatial measurement error modeling, showing that smoothing induces a Berkson-type measurement error with nondiagonal error structure. From this viewpoint, we review the existing approaches to estimation in a linear regression health model, including direct use of the spatial predictions and exposure simulation, and explore some modified approaches, including Bayesian models and out-of-sample regression calibration, motivated by measurement error principles. We then extend this work to the generalized linear model framework for health outcomes. Based on analytical considerations and simulation results, we compare the performance of all these approaches under several spatial models for exposure. Our comparisons underscore several important points. First, exposure simulation can perform very poorly under certain realistic scenarios. Second, the relative performance of the different methods depends on the nature of the underlying exposure surface. Third, traditional measurement error concepts can help to explain the relative practical performance of the different methods. We apply the methods to data on the association between levels of particulate matter and birth weight in the greater Boston area.</p>
]]></description>
<dc:creator><![CDATA[Gryparis, A., Paciorek, C. J., Zeka, A., Schwartz, J., Coull, B. A.]]></dc:creator>
<dc:date>2009-02-27</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxn033</dc:identifier>
<dc:title><![CDATA[Measurement error caused by spatial misalignment in environmental epidemiology]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>2</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>274</prism:endingPage>
<prism:publicationDate>2009-04-01</prism:publicationDate>
<prism:startingPage>258</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/275?rss=1">
<title><![CDATA[Exact and efficient inference procedure for meta-analysis and its application to the analysis of independent 2 x 2 tables with all available data but without artificial continuity correction]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/275?rss=1</link>
<description><![CDATA[
<p>Recently, meta-analysis has been widely utilized to combine information across comparative clinical studies for evaluating drug efficacy or safety profile. When dealing with rather rare events, a substantial proportion of studies may not have any events of interest. Conventional methods either exclude such studies or add an arbitrary positive value to each cell of the corresponding 2&times;2 tables in the analysis. In this article, we present a simple, effective procedure to make valid inferences about the parameter of interest with all available data without artificial continuity corrections. We then use the procedure to analyze the data from 48 comparative trials involving rosiglitazone with respect to its possible cardiovascular toxicity.</p>
]]></description>
<dc:creator><![CDATA[Tian, L., Cai, T., Pfeffer, M. A., Piankov, N., Cremieux, P.-Y., Wei, L. J.]]></dc:creator>
<dc:date>2009-02-27</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxn034</dc:identifier>
<dc:title><![CDATA[Exact and efficient inference procedure for meta-analysis and its application to the analysis of independent 2 x 2 tables with all available data but without artificial continuity correction]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>2</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>281</prism:endingPage>
<prism:publicationDate>2009-04-01</prism:publicationDate>
<prism:startingPage>275</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/282?rss=1">
<title><![CDATA[A method for constructing a confidence bound for the actual error rate of a prediction rule in high dimensions]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/282?rss=1</link>
<description><![CDATA[
<p>Constructing a confidence interval for the actual, conditional error rate of a prediction rule from multivariate data is problematic because this error rate is not a population parameter in the traditional sense&mdash;it is a functional of the training set. When the training set changes, so does this "parameter." A valid method for constructing confidence intervals for the actual error rate had been previously developed by McLachlan. However, McLachlan's method cannot be applied in many cancer research settings because it requires the number of samples to be much larger than the number of dimensions (<I>n</I> &gt;&gt; <I>p</I>), and it assumes that no dimension-reducing feature selection step is performed. Here, an alternative to McLachlan's method is presented that can be applied when <I>p</I> &gt;&gt; <I>n</I>, with an additional adjustment in the presence of feature selection. Coverage probabilities of the new method are shown to be nominal or conservative over a wide range of scenarios. The new method is relatively simple to implement and not computationally burdensome.</p>
]]></description>
<dc:creator><![CDATA[Dobbin, K. K.]]></dc:creator>
<dc:date>2009-02-27</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxn035</dc:identifier>
<dc:title><![CDATA[A method for constructing a confidence bound for the actual error rate of a prediction rule in high dimensions]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>2</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>296</prism:endingPage>
<prism:publicationDate>2009-04-01</prism:publicationDate>
<prism:startingPage>282</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/297?rss=1">
<title><![CDATA[Optimal multistage designs--a general framework for efficient genome-wide association studies]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/297?rss=1</link>
<description><![CDATA[
<p>Genome-wide association studies (GWAS) have become increasingly affordable but they are still costly. Therefore, cost saving 2-stage designs were proposed in the literature. The restriction to 2 stages, however, seems artificial and does not exploit the full potential of the underlying methods. We extend the 2-stage approach to the general framework of any number of stages. Based on the theory of group sequential methods, we derive optimal multistage designs. With current genotyping cost structures, our results suggest that up to 4 stages are sufficient in order to get feasible and efficient designs. Furthermore, we consider the problem of choosing the optimal number of stages depending on the costs of the statistical interim analysis at each stage and provide guidelines for planning the number of stages in practice. In particular, we found that in the majority of cases both 3-stage designs and 4-stage designs are more efficient than 2-stage designs. Although prices for marker panels are showing a continuing downward trend, we still recommend implementing and using optimal multistage designs in practice. In addition to the immediate benefit, it will be necessary to acquire know-how regarding the application of multistage designs in order to be able to adapt the general framework of multistage designs to upcoming technologies in the area of GWAS.</p>
]]></description>
<dc:creator><![CDATA[Pahl, R., Schafer, H., Muller, H.-H.]]></dc:creator>
<dc:date>2009-02-27</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxn036</dc:identifier>
<dc:title><![CDATA[Optimal multistage designs--a general framework for efficient genome-wide association studies]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>2</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>309</prism:endingPage>
<prism:publicationDate>2009-04-01</prism:publicationDate>
<prism:startingPage>297</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/310?rss=1">
<title><![CDATA[Statistical monitoring of clinical trials with multivariate response and/or multiple arms: a flexible approach]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/310?rss=1</link>
<description><![CDATA[
<p>Randomized clinical trials with a multivariate response and/or multiple treatment arms are increasingly common, in part because of their efficiency and a greater concern about balancing risks with benefits. In some trials, the specific types and magnitudes of treatment group differences that would warrant early termination cannot easily be specified prior to the onset of the trial and/or could change as the trial progresses. This underscores the need for more flexible monitoring methods than traditional approaches. This paper extends the repeated confidence bands approach for interim monitoring to more general settings where there can be a multivariate response and/or multiple treatment arms and where the metrics for comparing treatment groups can change during the conduct of the trial. We illustrate the approach using the results of a recent AIDS clinical trial and examine its efficiency and robustness via simulation.</p>
]]></description>
<dc:creator><![CDATA[Zhao, L., Hu, X. J., Lagakos, S. W.]]></dc:creator>
<dc:date>2009-02-27</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxn037</dc:identifier>
<dc:title><![CDATA[Statistical monitoring of clinical trials with multivariate response and/or multiple arms: a flexible approach]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>2</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>323</prism:endingPage>
<prism:publicationDate>2009-04-01</prism:publicationDate>
<prism:startingPage>310</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/324?rss=1">
<title><![CDATA[Optimal 2-stage design with given power in association studies]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/324?rss=1</link>
<description><![CDATA[]]></description>
<dc:creator><![CDATA[Wang, J., Liang, H., Zou, G.]]></dc:creator>
<dc:date>2009-02-27</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxn038</dc:identifier>
<dc:title><![CDATA[Optimal 2-stage design with given power in association studies]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>2</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>326</prism:endingPage>
<prism:publicationDate>2009-04-01</prism:publicationDate>
<prism:startingPage>324</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/327?rss=1">
<title><![CDATA[Statistical independence of the colocalized association signals for type 1 diabetes and RPS26 gene expression on chromosome 12q13]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/327?rss=1</link>
<description><![CDATA[
<p>Following the recent success of genome-wide association studies in uncovering disease-associated genetic variants, the next challenge is to understand how these variants affect downstream pathways. The most proximal trait to a disease-associated variant, most commonly a single nucleotide polymorphism (SNP), is differential gene expression due to the <I>cis</I> effect of SNP alleles on transcription, translation, and/or splicing (gene expression quantitative trait loci, eQTL). Several genome-wide SNP&ndash;gene expression association studies have already provided convincing evidence of widespread association of eQTLs. As a consequence, some eQTL associations are found in the same genomic region as a disease variant, either as a coincidence or a causal relationship. Cis-regulation of <I>RPS26</I> gene expression and a type 1 diabetes (T1D) susceptibility locus have been colocalized to the 12q13 genomic region. A recent study has also suggested <I>RPS26</I> as the most likely susceptibility gene for T1D in this genomic region. However, it is still not clear whether this colocalization is the result of chance alone or if <I>RPS26</I> expression is directly correlated with T1D susceptibility, and therefore, potentially causal. Here, we derive and apply a statistical test of this hypothesis. We conclude that <I>RPS26</I> expression is unlikely to be the molecular trait responsible for T1D susceptibility at this locus, at least not in a direct, linear connection.</p>
]]></description>
<dc:creator><![CDATA[Plagnol, V., Smyth, D. J., Todd, J. A., Clayton, D. G.]]></dc:creator>
<dc:date>2009-02-27</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxn039</dc:identifier>
<dc:title><![CDATA[Statistical independence of the colocalized association signals for type 1 diabetes and RPS26 gene expression on chromosome 12q13]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>2</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>334</prism:endingPage>
<prism:publicationDate>2009-04-01</prism:publicationDate>
<prism:startingPage>327</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/335?rss=1">
<title><![CDATA[Bayesian graphical models for regression on multiple data sets with different variables]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/335?rss=1</link>
<description><![CDATA[
<p>Routinely collected administrative data sets, such as national registers, aim to collect information on a limited number of variables for the whole population. In contrast, survey and cohort studies contain more detailed data from a sample of the population. This paper describes Bayesian graphical models for fitting a common regression model to a combination of data sets with different sets of covariates. The methods are applied to a study of low birth weight and air pollution in England and Wales using a combination of register, survey, and small-area aggregate data. We discuss issues such as multiple imputation of confounding variables missing in one data set, survey selection bias, and appropriate propagation of information between model components. From the register data, there appears to be an association between low birth weight and environmental exposure to NO<SUB>2</SUB>, but after adjusting for confounding by ethnicity and maternal smoking by combining the register and survey data under our models, we find there is no significant association. However, NO<SUB>2</SUB> was associated with a small but significant reduction in birth weight, modeled as a continuous variable.</p>
]]></description>
<dc:creator><![CDATA[Jackson, C. H., Best, N. G., Richardson, S.]]></dc:creator>
<dc:date>2009-02-27</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxn041</dc:identifier>
<dc:title><![CDATA[Bayesian graphical models for regression on multiple data sets with different variables]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>2</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>351</prism:endingPage>
<prism:publicationDate>2009-04-01</prism:publicationDate>
<prism:startingPage>335</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/352?rss=1">
<title><![CDATA[Microarray background correction: maximum likelihood estimation for the normal-exponential convolution]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/352?rss=1</link>
<description><![CDATA[
<p>Background correction is an important preprocessing step for microarray data that attempts to adjust the data for the ambient intensity surrounding each feature. The "normexp" method models the observed pixel intensities as the sum of 2 random variables, one normally distributed and the other exponentially distributed, representing background noise and signal, respectively. Using a saddle-point approximation, Ritchie <I>and others</I> (2007) found normexp to be the best background correction method for 2-color microarray data. This article develops the normexp method further by improving the estimation of the parameters. A complete mathematical development is given of the normexp model and the associated saddle-point approximation. Some subtle numerical programming issues are solved which caused the original normexp method to fail occasionally when applied to unusual data sets. A practical and reliable algorithm is developed for exact maximum likelihood estimation (MLE) using high-quality optimization software and using the saddle-point estimates as starting values. "MLE" is shown to outperform heuristic estimators proposed by other authors, both in terms of estimation accuracy and in terms of performance on real data. The saddle-point approximation is an adequate replacement in most practical situations. The performance of normexp for assessing differential expression is improved by adding a small offset to the corrected intensities.</p>
]]></description>
<dc:creator><![CDATA[Silver, J. D., Ritchie, M. E., Smyth, G. K.]]></dc:creator>
<dc:date>2009-02-27</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxn042</dc:identifier>
<dc:title><![CDATA[Microarray background correction: maximum likelihood estimation for the normal-exponential convolution]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>2</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>363</prism:endingPage>
<prism:publicationDate>2009-04-01</prism:publicationDate>
<prism:startingPage>352</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/364?rss=1">
<title><![CDATA[A robust method for finely stratified familial studies with proband-based sampling]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/364?rss=1</link>
<description><![CDATA[
<p>This paper presents a robust method to conduct inference in finely stratified familial studies under proband-based sampling. We assume that the interest is in both the marginal effects of subject-specific covariates on a binary response and the familial aggregation of the response, as quantified by intrafamilial pairwise odds ratios. We adopt an estimating function for proband-based family studies originally developed by Zhao <I>and others</I> (1998) in the context of an unstratified design and treat the stratification effects as fixed nuisance parameters. Our method requires modeling only the first 2 joint moments of the observations and reduces by 2 orders of magnitude the bias induced by fitting the stratum-specific nuisance parameters. An analytical standard error estimator for the proposed estimator is also provided. The proposed approach is applied to a matched case&ndash;control familial study of sleep apnea. A simulation study confirms the usefulness of the approach.</p>
]]></description>
<dc:creator><![CDATA[Wang, M., Hanfelt, J. J.]]></dc:creator>
<dc:date>2009-02-27</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxn043</dc:identifier>
<dc:title><![CDATA[A robust method for finely stratified familial studies with proband-based sampling]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>2</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>373</prism:endingPage>
<prism:publicationDate>2009-04-01</prism:publicationDate>
<prism:startingPage>364</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/374?rss=1">
<title><![CDATA[Bias in 2-part mixed models for longitudinal semicontinuous data]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/374?rss=1</link>
<description><![CDATA[
<p>Semicontinuous data in the form of a mixture of zeros and continuously distributed positive values frequently arise in biomedical research. Two-part mixed models with correlated random effects are an attractive approach to characterize the complex structure of longitudinal semicontinuous data. In practice, however, an independence assumption about random effects in these models may often be made for convenience and computational feasibility. In this article, we show that bias can be induced for regression coefficients when random effects are truly correlated but misspecified as independent in a 2-part mixed model. Paralleling work on bias under nonignorable missingness within a shared parameter model, we derive and investigate the asymptotic bias in selected settings for misspecified 2-part mixed models. The performance of these models in practice is further evaluated using Monte Carlo simulations. Additionally, the potential bias is investigated when artificial zeros, due to left censoring from some detection or measuring limit, are incorporated. To illustrate, we fit different 2-part mixed models to the data from the University of Toronto Psoriatic Arthritis Clinic, the aim being to examine whether there are differential effects of disease activity and damage on physical functioning as measured by the health assessment questionnaire scores over the course of psoriatic arthritis. Some practical issues on variance component estimation revealed through this data analysis are considered.</p>
]]></description>
<dc:creator><![CDATA[Su, L., Tom, B. D. M., Farewell, V. T.]]></dc:creator>
<dc:date>2009-02-27</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxn044</dc:identifier>
<dc:title><![CDATA[Bias in 2-part mixed models for longitudinal semicontinuous data]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>2</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>389</prism:endingPage>
<prism:publicationDate>2009-04-01</prism:publicationDate>
<prism:startingPage>374</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/390?rss=1">
<title><![CDATA[A Bayesian model for evaluating influenza antiviral efficacy in household studies with asymptomatic infections]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/2/390?rss=1</link>
<description><![CDATA[
<p>Antiviral agents are an important component in mitigation/containment strategies for pandemic influenza. However, most research for mitigation/containment strategies relies on the antiviral efficacies evaluated from limited data of clinical trials. Which efficacy measures can be reliably estimated from these studies depends on the trial design, the size of the epidemics, and the statistical methods. We propose a Bayesian framework for modeling the influenza transmission dynamics within households. This Bayesian framework takes into account asymptomatic infections and is able to estimate efficacies with respect to protecting against viral infection, infection with clinical disease, and pathogenicity (the probability of disease given infection). We use the method to reanalyze 2 clinical studies of oseltamivir, an influenza antiviral agent, and compare the results with previous analyses. We found significant prophylactic efficacies in reducing the risk of viral infection and infection with disease but no prophylactic efficacy in reducing pathogenicity. We also found significant therapeutic efficacies in reducing pathogenicity and the risk of infection with disease but no therapeutic efficacy in reducing the risk of viral infection in the contacts.</p>
]]></description>
<dc:creator><![CDATA[Yang, Y., Halloran, M. E., Longini, I. M.]]></dc:creator>
<dc:date>2009-02-27</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxn045</dc:identifier>
<dc:title><![CDATA[A Bayesian model for evaluating influenza antiviral efficacy in household studies with asymptomatic infections]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>2</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>403</prism:endingPage>
<prism:publicationDate>2009-04-01</prism:publicationDate>
<prism:startingPage>390</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/1?rss=1">
<title><![CDATA[Effective communication of standard errors and confidence intervals]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/1?rss=1</link>
<description><![CDATA[]]></description>
<dc:creator><![CDATA[Louis, T. A., Zeger, S. L.]]></dc:creator>
<dc:date>2008-12-12</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxn014</dc:identifier>
<dc:title><![CDATA[Effective communication of standard errors and confidence intervals]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>1</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>2</prism:endingPage>
<prism:publicationDate>2009-01-01</prism:publicationDate>
<prism:startingPage>1</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/3?rss=1">
<title><![CDATA[Case series analysis for censored, perturbed, or curtailed post-event exposures]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/3?rss=1</link>
<description><![CDATA[
<p>A new method is developed for analyzing case series data in situations where occurrence of the event censors, curtails, or otherwise affects post-event exposures. Unbiased estimating equations derived from the self-controlled case series model are adapted to allow for exposures whose occurrence or observation is influenced by the event. The method applies to transient point exposures and rare nonrecurrent events. Asymptotic efficiency is studied in some special cases. A computational scheme based on a pseudo-likelihood is proposed to make the computations feasible in complex models. Simulations, a validation study, and 2 applications are described.</p>
]]></description>
<dc:creator><![CDATA[Farrington, C. P., Whitaker, H. J., Hocine, M. N.]]></dc:creator>
<dc:date>2008-12-12</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxn013</dc:identifier>
<dc:title><![CDATA[Case series analysis for censored, perturbed, or curtailed post-event exposures]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>1</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>16</prism:endingPage>
<prism:publicationDate>2009-01-01</prism:publicationDate>
<prism:startingPage>3</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/17?rss=1">
<title><![CDATA[Adjusting for selection bias in retrospective, case-control studies]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/17?rss=1</link>
<description><![CDATA[
<p>Retrospective case&ndash;control studies are more susceptible to selection bias than other epidemiologic studies as by design they require that both cases and controls are representative of the same population. However, as case and control recruitment processes are often different, it is not always obvious that the necessary exchangeability conditions hold. Selection bias typically arises when the selection criteria are associated with the risk factor under investigation. We develop a method which produces bias-adjusted estimates for the odds ratio. Our method hinges on 2 conditions. The first is that a variable that separates the risk factor from the selection criteria can be identified. This is termed the "bias breaking" variable. The second condition is that data can be found such that a bias-corrected estimate of the distribution of the bias breaking variable can be obtained. We show by means of a set of examples that such bias breaking variables are not uncommon in epidemiologic settings. We demonstrate using simulations that the estimates of the odds ratios produced by our method are consistently closer to the true odds ratio than standard odds ratio estimates using logistic regression. Further, by applying it to a case&ndash;control study, we show that our method can help to determine whether selection bias is present and thus confirm the validity of study conclusions when no evidence of selection bias can be found.</p>
]]></description>
<dc:creator><![CDATA[Geneletti, S., Richardson, S., Best, N.]]></dc:creator>
<dc:date>2008-12-12</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxn010</dc:identifier>
<dc:title><![CDATA[Adjusting for selection bias in retrospective, case-control studies]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>1</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>31</prism:endingPage>
<prism:publicationDate>2009-01-01</prism:publicationDate>
<prism:startingPage>17</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/32?rss=1">
<title><![CDATA[Time-synchronized clustering of gene expression trajectories]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/32?rss=1</link>
<description><![CDATA[
<p>Current clustering methods are routinely applied to gene expression time course data to find genes with similar activation patterns and ultimately to understand the dynamics of biological processes. As the dynamic unfolding of a biological process often involves the activation of genes at different rates, successful clustering in this context requires dealing with varying time and shape patterns simultaneously. This motivates the combination of a novel pairwise warping with a suitable clustering method to discover expression shape clusters. We develop a novel clustering method that combines an initial pairwise curve alignment to adjust for time variation within likely clusters. The cluster-specific time synchronization method shows excellent performance over standard clustering methods in terms of cluster quality measures in simulations and for yeast and human fibroblast data sets. In the yeast example, the discovered clusters have high concordance with the known biological processes.</p>
]]></description>
<dc:creator><![CDATA[Tang, R., Muller, H.-G.]]></dc:creator>
<dc:date>2008-12-12</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxn011</dc:identifier>
<dc:title><![CDATA[Time-synchronized clustering of gene expression trajectories]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>1</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>45</prism:endingPage>
<prism:publicationDate>2009-01-01</prism:publicationDate>
<prism:startingPage>32</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/46?rss=1">
<title><![CDATA[Marginal structural models for partial exposure regimes]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/46?rss=1</link>
<description><![CDATA[
<p>Intensive care unit (ICU) patients are highly susceptible to hospital-acquired infections due to their poor health and many invasive therapeutic treatments. The effect on mortality of acquiring such infections is, however, poorly understood. Our goal is to quantify this using data from the National Surveillance Study of Nosocomial Infections in ICUs (Belgium). This is challenging because of the presence of time-dependent confounders, such as mechanical ventilation, which lie on the causal path from infection to mortality. Standard statistical analyses may be severely misleading in such settings and have shown contradictory results. Inverse probability weighting for marginal structural models may instead be used but is not directly applicable because these models parameterize the effect of acquiring infection on a given day in ICU, versus "never" acquiring infection in ICU, and this is ill-defined when ICU discharge precedes that day. Additional complications arise from the informative censoring of the survival time by hospital discharge and the instability of the inverse weighting estimation procedure. We accommodate this by introducing a new class of marginal structural models for so-called partial exposure regimes. These describe the effect on the hazard of death of acquiring infection on a given day <I>s</I>, versus not acquiring infection "up to that day," had patients stayed in the ICU for at least <I>s</I> days.</p>
]]></description>
<dc:creator><![CDATA[Vansteelandt, S., Mertens, K., Suetens, C., Goetghebeur, E.]]></dc:creator>
<dc:date>2008-12-12</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxn012</dc:identifier>
<dc:title><![CDATA[Marginal structural models for partial exposure regimes]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>1</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>59</prism:endingPage>
<prism:publicationDate>2009-01-01</prism:publicationDate>
<prism:startingPage>46</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/60?rss=1">
<title><![CDATA[Genomic outlier profile analysis: mixture models, null hypotheses, and nonparametric estimation]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/60?rss=1</link>
<description><![CDATA[
<p>In most analyses of large-scale genomic data sets, differential expression analysis is typically assessed by testing for differences in the mean of the distributions between 2 groups. A recent finding by Tomlins <I>and others</I> (2005) is of a different type of pattern of differential expression in which a fraction of samples in one group have overexpression relative to samples in the other group. In this work, we describe a general mixture model framework for the assessment of this type of expression, called outlier profile analysis. We start by considering the single-gene situation and establishing results on identifiability. We propose 2 nonparametric estimation procedures that have natural links to familiar multiple testing procedures. We then develop multivariate extensions of this methodology to handle genome-wide measurements. The proposed methodologies are compared using simulation studies as well as data from a prostate cancer gene expression study.</p>
]]></description>
<dc:creator><![CDATA[Ghosh, D., Chinnaiyan, A. M.]]></dc:creator>
<dc:date>2008-12-12</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxn015</dc:identifier>
<dc:title><![CDATA[Genomic outlier profile analysis: mixture models, null hypotheses, and nonparametric estimation]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>1</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>69</prism:endingPage>
<prism:publicationDate>2009-01-01</prism:publicationDate>
<prism:startingPage>60</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/70?rss=1">
<title><![CDATA[Combining data from 2 nested case-control studies of overlapping cohorts to improve efficiency]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/70?rss=1</link>
<description><![CDATA[
<p>Researchers subject to time and budget constraints may conduct small nested case&ndash;control studies with individually matched controls to help optimize statistical power. In this paper, we show how precision can be improved considerably by combining data from a small nested case&ndash;control study with data from a larger nested case&ndash;control study of a different outcome in the same or overlapping cohort. Our approach is based on the inverse probability weighting concept, in which the log-likelihood contribution of each individual observation is weighted by the inverse of its probability of inclusion in either study. We illustrate our approach using simulated data and an application where we combine data sets from 2 nested case&ndash;control studies to investigate risk factors for anorexia nervosa in a cohort of young women in Sweden.</p>
]]></description>
<dc:creator><![CDATA[Salim, A., Hultman, C., Sparen, P., Reilly, M.]]></dc:creator>
<dc:date>2008-12-12</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxn016</dc:identifier>
<dc:title><![CDATA[Combining data from 2 nested case-control studies of overlapping cohorts to improve efficiency]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>1</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>79</prism:endingPage>
<prism:publicationDate>2009-01-01</prism:publicationDate>
<prism:startingPage>70</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/80?rss=1">
<title><![CDATA[Gene profiling for determining pluripotent genes in a time course microarray experiment]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/80?rss=1</link>
<description><![CDATA[
<p>In microarray experiments, it is often of interest to identify genes which have a prespecified gene expression profile with respect to time. Methods available in the literature are, however, typically not stringent enough in identifying such genes, particularly when the profile requires equivalence of gene expression levels at certain time points. In this paper, the authors introduce a new methodology, called gene profiling, that uses simultaneous differential and equivalent gene expression level testing to rank genes according to a prespecified gene expression profile. Gene profiling treats the vector of true gene expression levels as a linear combination of appropriate vectors, for example, vectors that give the required criteria for the profile. This gene profile model is fitted to the data, and the resulting parameter estimates are summarized in a single test statistic that is then used to rank the genes. The theoretical underpinnings of gene profiling (equivalence testing, intersection&ndash;union tests) are discussed in this paper, and the gene profiling methodology is applied to our motivating stem-cell experiment.</p>
]]></description>
<dc:creator><![CDATA[Tuke, J., Glonek, G. F. V., Solomon, P. J.]]></dc:creator>
<dc:date>2008-12-12</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxn017</dc:identifier>
<dc:title><![CDATA[Gene profiling for determining pluripotent genes in a time course microarray experiment]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>1</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>93</prism:endingPage>
<prism:publicationDate>2009-01-01</prism:publicationDate>
<prism:startingPage>80</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/94?rss=1">
<title><![CDATA[Sample size for positive and negative predictive value in diagnostic research using case-control designs]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/94?rss=1</link>
<description><![CDATA[
<p>Important properties of diagnostic methods are their sensitivity, specificity, and positive and negative predictive values (PPV and NPV). These methods are typically assessed via case&ndash;control samples, which include one cohort of cases known to have the disease and a second control cohort of disease-free subjects. Such studies give direct estimates of sensitivity and specificity but only indirect estimates of PPV and NPV, which also depend on the disease prevalence in the tested population. The motivating example arises in assay testing, where usage is contemplated in populations with known prevalences. Further instances include biomarker development, where subjects are selected from a population with known prevalence and assessment of PPV and NPV is crucial, and the assessment of diagnostic imaging procedures for rare diseases, where case&ndash;control studies may be the only feasible designs. We develop formulas for optimal allocation of the sample between the case and control cohorts and for computing sample size when the goal of the study is to prove that the test procedure exceeds pre-stated bounds for PPV and/or NPV. Surprisingly, the optimal sampling schemes for many purposes are highly unbalanced, even when information is desired on both PPV and NPV.</p>
]]></description>
<dc:creator><![CDATA[Steinberg, D. M., Fine, J., Chappell, R.]]></dc:creator>
<dc:date>2008-12-12</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxn018</dc:identifier>
<dc:title><![CDATA[Sample size for positive and negative predictive value in diagnostic research using case-control designs]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>1</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>105</prism:endingPage>
<prism:publicationDate>2009-01-01</prism:publicationDate>
<prism:startingPage>94</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/106?rss=1">
<title><![CDATA[StepBrothers: inferring partially shared ancestries among recombinant viral sequences]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/106?rss=1</link>
<description><![CDATA[
<p>Phylogeneticists have developed several statistical methods to infer recombination among molecular sequences that are evolutionarily related. Of these methods, Markov change-point models currently provide the most coherent framework. Yet, the Markov assumption is faulty in that the inferred relatedness of homologous sequences across regions divided by recombinant events is not independent, particularly for nonrecombinant sequences as they share the same history. To correct this limitation, we introduce a novel random tips (RT) model. The model springs from the idea that a recombinant sequence inherits its characters from an unknown number of ancestral full-length sequences, of which one only observes the incomplete portions. The RT model decomposes recombinant sequences into their ancestral portions and then augments each portion onto the data set as unique partially observed sequences. This data augmentation generates a random number of sequences related to each other through a single inferable tree with the same random number of tips. While intuitively pleasing, this single tree corrects the independence assumptions plaguing previous methods while permitting the detection of recombination. The single tree also allows for inference of the relative times of recombination events and generalizes to incorporate multiple recombinant sequences. This generalization answers important questions with which previous models struggle. For example, we demonstrate that a group of human immunodeficiency virus type 1 recombinant viruses from Argentina, previously thought to have the same recombinant history, actually consists of 2 groups: one, a clonal expansion of a reference sequence and another that predates the formation of the reference sequence. In another example, we demonstrate that 2 hepatitis B virus recombinant strains share similar splicing locations, suggesting a common descent of the 2 viruses. We implement and run both examples in a software package called StepBrothers, freely available to interested parties.</p>
]]></description>
<dc:creator><![CDATA[Bloomquist, E. W., Dorman, K. S., Suchard, M. A.]]></dc:creator>
<dc:date>2008-12-12</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxn019</dc:identifier>
<dc:title><![CDATA[StepBrothers: inferring partially shared ancestries among recombinant viral sequences]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>1</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>120</prism:endingPage>
<prism:publicationDate>2009-01-01</prism:publicationDate>
<prism:startingPage>106</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/121?rss=1">
<title><![CDATA[Extension of the SAEM algorithm for nonlinear mixed models with 2 levels of random effects]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/121?rss=1</link>
<description><![CDATA[
<p>This article focuses on parameter estimation of multilevel nonlinear mixed-effects models (MNLMEMs). These models are used to analyze data presenting multiple hierarchical levels of grouping (cluster data, clinical trials with several observation periods, ...). The variability of the individual parameters of the regression function is thus decomposed as a between-subject variability and higher levels of variability (e.g. within-subject variability). We propose maximum likelihood estimates of parameters of those MNLMEMs with 2 levels of random effects, using an extension of the stochastic approximation version of expectation&ndash;maximization (SAEM)&ndash;Markov chain Monte Carlo algorithm. The extended SAEM algorithm is split into an explicit direct expectation&ndash;maximization (EM) algorithm and a stochastic EM part. Compared to the original algorithm, additional sufficient statistics have to be approximated by relying on the conditional distribution of the second level of random effects. This estimation method is evaluated on pharmacokinetic crossover simulated trials, mimicking theophylline concentration data. Results obtained on those data sets with either the SAEM algorithm or the first-order conditional estimates (FOCE) algorithm (implemented in the nlme function of R software) are compared: biases and root mean square errors of almost all the SAEM estimates are smaller than the FOCE ones. Finally, we apply the extended SAEM algorithm to analyze the pharmacokinetic interaction of tenofovir on atazanavir, a novel protease inhibitor, from the Agence Nationale de Recherche sur le Sida 107-Puzzle 2 study. A significant decrease of the area under the curve of atazanavir is found in patients receiving both treatments.</p>
]]></description>
<dc:creator><![CDATA[Panhard, X., Samson, A.]]></dc:creator>
<dc:date>2008-12-12</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxn020</dc:identifier>
<dc:title><![CDATA[Extension of the SAEM algorithm for nonlinear mixed models with 2 levels of random effects]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>1</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>135</prism:endingPage>
<prism:publicationDate>2009-01-01</prism:publicationDate>
<prism:startingPage>121</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/136?rss=1">
<title><![CDATA[An approach to estimation in relative survival regression]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/136?rss=1</link>
<description><![CDATA[
<p>The goal of relative survival methodology is to compare the survival experience of a cohort with that of the background population. Most often an additive excess hazard model is employed, which assumes that each person's hazard is a sum of 2 components&mdash;the population hazard obtained from life tables and an excess hazard attributable to the specific condition. Usually covariate effects on the excess hazard are assumed to have a proportional hazards structure with parametrically modelled baseline. In this paper, we introduce a new fitting procedure using the expectation&ndash;maximization algorithm, treating the cause of death as missing data. The method requires no assumptions about the baseline excess hazard thus reducing the risk of bias through misspecification. It accommodates the possibility of knowledge of cause of death for some patients, and as a side effect, the method yields an estimate of the ratio between the excess and the population hazard for each subject. More importantly, it estimates the baseline excess hazard flexibly with no additional degrees of freedom spent. Finally, it is a generalization of the Cox model, meaning that all the wealth of options in existing software for the Cox model can be used in relative survival. The method is applied to a data set on survival after myocardial infarction, where it shows how a particular form of the hazard function could be missed using the existing methods.</p>
]]></description>
<dc:creator><![CDATA[Perme, M. P., Henderson, R., Stare, J.]]></dc:creator>
<dc:date>2008-12-12</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxn021</dc:identifier>
<dc:title><![CDATA[An approach to estimation in relative survival regression]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>1</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>146</prism:endingPage>
<prism:publicationDate>2009-01-01</prism:publicationDate>
<prism:startingPage>136</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/147?rss=1">
<title><![CDATA[Creating unbiased cross-sectional covariate-related reference ranges from serial correlated measurements]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/147?rss=1</link>
<description><![CDATA[
<p>Cross-sectional covariate-related reference ranges are widely used in clinical medicine to put individual observations in the context of population values. Usually, such reference ranges are created from data sets of independent observations. If multiple measurements per individual are available, then ignoring the within-person correlation between repeats will lead to overestimation of centile precision. Furthermore, if abnormal measurements have triggered more frequent assessment, the data set will be biased thus producing biased centiles. Where multiple measures per individual exist, the methods commonly used are either randomly or systematically to select one observation per individual or to model individual trajectories and combine these. The first of these approaches may result in discarding a large proportion of the available data and may itself cause bias and the latter requires the form of the changes within individuals to be characterized. We have developed an approach to the modeling of the median, spread, and skew across individuals using maximum likelihood, which can incorporate correlations between dependent observations. Heavily biased data sets are simulated to illustrate how the methodology can eliminate the biases inherent in the data collection process and produce valid centiles plus estimates of the within-person correlations. The "select one per individual" approach is shown to be liable to bias and to produce less precise centiles. We recommend that the maximum likelihood method incorporating correlations be used with existing data sets. Furthermore, this is a potentially more efficient approach to be considered when planning the future collection of data solely for the purposes of creating cross-sectional covariate-related reference ranges.</p>
]]></description>
<dc:creator><![CDATA[Wade, A., Kurmanavicius, J.]]></dc:creator>
<dc:date>2008-12-12</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxn022</dc:identifier>
<dc:title><![CDATA[Creating unbiased cross-sectional covariate-related reference ranges from serial correlated measurements]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>1</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>154</prism:endingPage>
<prism:publicationDate>2009-01-01</prism:publicationDate>
<prism:startingPage>147</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/155?rss=1">
<title><![CDATA[Bayesian hierarchically weighted finite mixture models for samples of distributions]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/155?rss=1</link>
<description><![CDATA[
<p>Finite mixtures of Gaussian distributions are known to provide an accurate approximation to any unknown density. Motivated by DNA repair studies in which data are collected for samples of cells from different individuals, we propose a class of hierarchically weighted finite mixture models. The modeling framework incorporates a collection of <I>k</I> Gaussian basis distributions, with the individual-specific response densities expressed as mixtures of these bases. To allow heterogeneity among individuals and predictor effects, we model the mixture weights, while treating the basis distributions as unknown but common to all distributions. This results in a flexible hierarchical model for samples of distributions. We consider analysis of variance&ndash;type structures and a parsimonious latent factor representation, which leads to simplified inferences on non-Gaussian covariance structures. Methods for posterior computation are developed, and the model is used to select genetic predictors of baseline DNA damage, susceptibility to induced damage, and rate of repair.</p>
]]></description>
<dc:creator><![CDATA[Rodriguez, A., Dunson, D. B., Taylor, J.]]></dc:creator>
<dc:date>2008-12-12</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxn024</dc:identifier>
<dc:title><![CDATA[Bayesian hierarchically weighted finite mixture models for samples of distributions]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>1</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>171</prism:endingPage>
<prism:publicationDate>2009-01-01</prism:publicationDate>
<prism:startingPage>155</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/172?rss=1">
<title><![CDATA[Estimating the capacity for improvement in risk prediction with a marker]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/172?rss=1</link>
<description><![CDATA[
<p>Consider a set of baseline predictors <I>X</I> to predict a binary outcome <I>D</I> and let <I>Y</I> be a novel marker or predictor. This paper is concerned with evaluating the performance of the augmented risk model <I>P</I>(<I>D</I> = 1|<I>Y</I>,<I>X</I>) compared with the baseline model <I>P</I>(<I>D</I> = 1|<I>X</I>). The diagnostic likelihood ratio, DLR<SUB><I>X</I></SUB>(<I>y</I>), quantifies the change in risk obtained with knowledge of <I>Y</I> = <I>y</I> for a subject with baseline risk factors <I>X</I>. The notion is commonly used in clinical medicine to quantify the increment in risk prediction due to <I>Y</I>. It is contrasted here with the notion of covariate-adjusted effect of <I>Y</I> in the augmented risk model. We also propose methods for making inference about DLR<SUB><I>X</I></SUB>(<I>y</I>). Case&ndash;control study designs are accommodated. The methods provide a mechanism to investigate if the predictive information in <I>Y</I> varies with baseline covariates. In addition, we show that when combined with a baseline risk model and information about the population distribution of <I>Y</I> given <I>X</I>, covariate-specific predictiveness curves can be estimated. These curves are useful to an individual in deciding if ascertainment of <I>Y</I> is likely to be informative or not for him. We illustrate with data from 2 studies: one is a study of the performance of hearing screening tests for infants, and the other concerns the value of serum creatinine in diagnosing renal artery stenosis.</p>
]]></description>
<dc:creator><![CDATA[Gu, W., Pepe, M. S.]]></dc:creator>
<dc:date>2008-12-12</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxn025</dc:identifier>
<dc:title><![CDATA[Estimating the capacity for improvement in risk prediction with a marker]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>1</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>186</prism:endingPage>
<prism:publicationDate>2009-01-01</prism:publicationDate>
<prism:startingPage>172</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/187?rss=1">
<title><![CDATA[Gamma frailty model for linkage analysis with application to interval-censored migraine data]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/187?rss=1</link>
<description><![CDATA[
<p>For many diseases, it seems that the age at onset is genetically influenced. Therefore, age-at-onset data are often collected to map the disease gene(s). The ages are often (right) censored or truncated, and therefore, many standard techniques for linkage analysis cannot be used. In this paper, we present a correlated frailty model for censored survival data of siblings. The model is used for testing heritability for the age at onset and linkage between the loci and the gene(s) that influence(s) the survival time. The model is applied to interval-censored migraine twin data. Heritability (obtained from the frailties rather than actual onset times) was estimated as 0.42; this value was highly significant. The highest lod score, a score of 1.9, was found at the end of chromosome 19.</p>
]]></description>
<dc:creator><![CDATA[Jonker, M. A., Bhulai, S., Boomsma, D. I., Ligthart, R. S. L., Posthuma, D., Van Der Vaart, A. W.]]></dc:creator>
<dc:date>2008-12-12</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxn027</dc:identifier>
<dc:title><![CDATA[Gamma frailty model for linkage analysis with application to interval-censored migraine data]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>1</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>200</prism:endingPage>
<prism:publicationDate>2009-01-01</prism:publicationDate>
<prism:startingPage>187</prism:startingPage>
<prism:section>Articles</prism:section>
</item>

<item rdf:about="http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/201?rss=1">
<title><![CDATA[Letter to the editor]]></title>
<link>http://biostatistics.oxfordjournals.org/cgi/content/short/10/1/201?rss=1</link>
<description><![CDATA[]]></description>
<dc:creator><![CDATA[Chu, H., Guo, H.]]></dc:creator>
<dc:date>2008-12-12</dc:date>
<dc:identifier>info:doi/10.1093/biostatistics/kxn040</dc:identifier>
<dc:title><![CDATA[Letter to the editor]]></dc:title>
<dc:publisher>Biometrika Trust</dc:publisher>
<prism:number>1</prism:number>
<prism:volume>10</prism:volume>
<prism:endingPage>203</prism:endingPage>
<prism:publicationDate>2009-01-01</prism:publicationDate>
<prism:startingPage>201</prism:startingPage>
<prism:section>Letter</prism:section>
</item>

</rdf:RDF>