Toward improved statistical methods for analyzing Cotinine-Biomarker health association data
1 Department of Epidemiology and Public Health at Leonard Miller School of Medicine, University of Miami, Miami, Florida, USA
2 Sylvester Comprehensive Cancer Center at Leonard Miller School of Medicine, University of Miami, Miami, Florida, USA
3 Department of Internal Medicine, Kaiser Permanente, Los Angeles, California, USA
4 European Centre of Environment and Human Health (ECEHH), Peninsula College of Medicine and Dentistry, Truro, Cornwall, UK
Tobacco Induced Diseases 2011, 9:11 doi:10.1186/1617-9625-9-11
Published: 3 October 2011
Serum cotinine, a metabolite of nicotine, is frequently used in research as a biomarker of recent tobacco smoke exposure. Historically, secondhand smoke (SHS) research has relied on suboptimal statistical methods for handling censored serum cotinine values, that is, measurements below the limit of detection (LOD).
We compared commonly used parametric and non-parametric methods for analyzing censored serum cotinine data, using data from the 1999-2004 National Health and Nutrition Examination Surveys (NHANES). To illustrate how the resulting associations differ across analytic methods, we compared parameter estimates for the association between cotinine and the inflammatory marker homocysteine using complete case analysis, single and multiple imputation, "reverse" Kaplan-Meier, and logistic regression models.
Parameter estimates and statistical significance varied according to the statistical method used to handle censored serum cotinine values. Single imputation of censored values with 0, LOD, or LOD/√2 yielded similar estimates and significance; the multiple imputation method yielded smaller estimates than the other methods, and these were not statistically significant. Multiple regression modelling using the "reverse" Kaplan-Meier method yielded statistically significant estimates that were larger than those from parametric methods.
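The single-imputation rules compared above can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function name `substitute_below_lod`, the sample values, and the LOD of 0.05 are all hypothetical, chosen only to show how each substitution rule fills in values below the LOD and how that changes a simple summary statistic.

```python
import math

def substitute_below_lod(values, lod, rule):
    """Replace non-detects (encoded here as None) with a single fixed value.

    rule is one of "zero", "lod", or "lod_sqrt2", matching the three
    substitution choices named in the text.
    """
    fill = {
        "zero": 0.0,
        "lod": lod,
        "lod_sqrt2": lod / math.sqrt(2),
    }[rule]
    return [fill if v is None else v for v in values]

def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical measurements; None marks a value below the LOD.
sample = [None, None, 0.08, 0.20, 1.50]
LOD = 0.05  # hypothetical limit of detection

means = {
    rule: mean(substitute_below_lod(sample, LOD, rule))
    for rule in ("zero", "lod", "lod_sqrt2")
}
```

In this toy sample the three means differ only slightly, because the substituted values are small relative to the detected ones. The sketch says nothing about how the rules behave in regression models or with a larger share of non-detects.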
Analyses of serum cotinine data with values below the LOD require special attention. "Reverse" Kaplan-Meier was the only method inherently able to deal with censored data with multiple LODs, and may be the most accurate since it avoids the data manipulation required by other commonly used statistical methods. Additional research is needed to identify optimal statistical methods for the analysis of SHS biomarkers subject to an LOD.
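The idea behind "reverse" Kaplan-Meier can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the regression procedure used in the study: it only estimates the distribution of a left-censored variable. The function names and data are hypothetical. Values are negated, which is equivalent to the usual flip about a constant, so that left-censored observations become right-censored and the ordinary product-limit estimator applies. Each non-detect is recorded at its own LOD, so multiple LODs need no special handling.

```python
def kaplan_meier(times, observed):
    """Product-limit estimate for right-censored data.

    Returns [(t, S(t))] at each distinct event time, where S(t) = P(T > t).
    Observations censored at t are counted as still at risk at t.
    """
    event_times = sorted({t for t, o in zip(times, observed) if o})
    surv = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        events = sum(1 for u, o in zip(times, observed) if o and u == t)
        surv *= 1.0 - events / at_risk
        curve.append((t, surv))
    return curve

def reverse_km_cdf(values, detected):
    """Estimate P(X < x) for left-censored data.

    values[i] is the measurement if detected[i] is True, otherwise the LOD
    that applied to that sample. Negating turns "X < LOD" into "-X > -LOD",
    i.e. right censoring.
    """
    flipped = [-v for v in values]
    curve = kaplan_meier(flipped, detected)
    return sorted((-t, s) for t, s in curve)

# Hypothetical data with two different LODs (0.015 and 0.05).
values = [0.015, 0.05, 0.05, 0.10, 0.30, 2.00]
detected = [False, False, True, True, True, True]

cdf = reverse_km_cdf(values, detected)
```

Each pair in `cdf` is a detected value `x` together with the estimated probability that a measurement falls strictly below `x`. No substituted value is ever assigned to a non-detect.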