TY - JOUR
T1 - Validated inference of smoking habits from blood with a finite DNA methylation marker set
AU - Maas, Silvana C. E.
AU - Vidaki, Athina
AU - Wilson, Rory
AU - Teumer, Alexander
AU - Liu, Fan
AU - van Meurs, Joyce B. J.
AU - Uitterlinden, Andre G.
AU - Boomsma, Dorret I.
AU - de Geus, Eco J. C.
AU - Willemsen, Gonneke
AU - van Dongen, Jenny
AU - van der Kallen, Carla J. H.
AU - Slagboom, P. Eline
AU - Beekman, Marian
AU - van Heemst, Diana
AU - van den Berg, Leonard H.
AU - Duijts, Liesbeth
AU - Jaddoe, Vincent W. V.
AU - Ladwig, Karl-Heinz
AU - Kunze, Sonja
AU - Peters, Annette
AU - Ikram, M. Arfan
AU - Grabe, Hans J.
AU - Felix, Janine F.
AU - Waldenberger, Melanie
AU - Franco, Oscar H.
AU - Ghanbari, Mohsen
AU - Kayser, Manfred
AU - BIOS Consortium
N1 - Funding Information:
The authors are grateful to the participants of the cohorts used: LifeLines (http://lifelines.nl/lifelines-research/general), the Leiden Longevity Study (http://www.leidenlangleven.nl), the Netherlands Twin Registry (http://www.tweelingenregister.org), the Rotterdam studies (http://www.erasmus-epidemiology.nl/research/ergo.htm), the CODAM study (http://www.carimmaastricht.nl/), the PAN study (http://www.alsonderzoek.nl/), the KORA study (https://www.helmholtz-muenchen.de/en/kora/index.html), SHIP-Trend (http://www.medizin.uni-greifswald.de/cm/fv/ship.html) and Generation R (https://www.generationr.nl/). We also thank Dr. Hannah R. Elliott for kindly sharing the R script, and Michael Verbiest, Mila Jhamai, Sarah Higgins, Marijn Verkerk and Dr. Lisette Stolk for their help in creating the EWAS database for the RS and the Generation R Study.
H.J. Grabe has received funding from Fresenius Medical Care and speaker’s honoraria as well as travel funds from Fresenius Medical Care, Neuraxpharm and Janssen-Cilag. Otherwise, the authors declare no conflict of interest.
This work was performed within the framework of the Biobank-Based Integrative Omics Studies (BIOS) Consortium funded by BBMRI-NL, a research infrastructure financed by the Netherlands Organization for Scientific Research (NWO 184.021.007). This project has received funding from the European Union’s Horizon 2020 research and innovation programme under Grant Agreements No. 633595 (DynaHEALTH) and No. 733206 (LIFECYCLE). SCEM was supported by a Netherlands Institute for Health Sciences scholarship. AV and MK were supported by the Erasmus MC University Medical Center Rotterdam. AV was additionally supported by an EUR fellowship from Erasmus University Rotterdam. LD received funding from the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 696295; 2017), co-funded by the ERA-Net on Biomarkers for Nutrition and Health (ERA HDHL) and ZonMw, the Netherlands (No. 529051014; 2017) (ALPHABET project). VWVJ received funding from the Netherlands Organization for Health Research and Development (VIDI 016.136.361) and a Consolidator Grant from the European Research Council (ERC-2014-CoG-648916). MW has received funding from the European Union Seventh Framework Programme (FP7/2007–2013) under Grant Agreements No. 603288 (SysVasc) and No. 602736 (PAIN-OMICS). The establishment of the RS EWAS data was funded by the Genetic Laboratory of the Department of Internal Medicine, Erasmus MC, and by the Netherlands Organization for Scientific Research (NWO; Project Number 184021007). The Rotterdam Study is funded by Erasmus Medical Center and Erasmus University, Rotterdam; the Netherlands Organization for Health Research and Development (ZonMw); the Research Institute for Diseases in the Elderly (RIDE); the Ministry of Education, Culture and Science; the Ministry of Health, Welfare and Sport; the European Commission (DG XII); and the Municipality of Rotterdam.
The general design of the Generation R Study is made possible by financial support from Erasmus MC, Erasmus University Rotterdam, the Netherlands Organization for Health Research and Development, and the Ministry of Health, Welfare and Sport. The generation and management of the Illumina 450K methylation array data was funded by a grant to VWVJ from the Netherlands Genomics Initiative (NGI)/Netherlands Organisation for Scientific Research (NWO) Netherlands Consortium for Healthy Aging (NCHA; Project No. 050-060-810), by funds from the Genetic Laboratory of the Department of Internal Medicine, Erasmus MC, and by a grant from the National Institute of Child Health and Human Development (R01HD068437). CODAM was supported by grants from the Netherlands Organization for Scientific Research (940-35-034) and the Dutch Diabetes Research Foundation (98.901). Funding for the NTR was obtained from the Netherlands Organization for Scientific Research (NWO) and the Netherlands Organisation for Health Research and Development (ZonMw), grants 904-61-090, 985-10-002, 912-10-020, 904-61-193, 480-04-004, 463-06-001, 451-04-034, 400-05-717, Addiction-31160008, 016-115-035, 481-08-011, 056-32-010, Middelgroot-911-09-032, and NWO-Groot 480-15-001/674. The KORA study was initiated and financed by the Helmholtz Zentrum München – German Research Center for Environmental Health, which is funded by the German Federal Ministry of Education and Research (BMBF) and by the State of Bavaria. SHIP is part of the Community Medicine Research net of the University of Greifswald, Germany, which is funded by the Federal Ministry of Education and Research (Grants No. 01ZZ9603, 01ZZ0103, and 01ZZ0403), the Ministry of Cultural Affairs as well as the Social Ministry of the Federal State of Mecklenburg-West Pomerania, and the network ‘Greifswald Approach to Individualized Medicine (GANI_MED)’ funded by the Federal Ministry of Education and Research (Grant 03IS2061A).
The generation of the DNA methylation data was supported by the DZHK (Grant 81X3400104). The University of Greifswald is a member of the Caché Campus program of InterSystems GmbH. The researchers are independent of the funders. The study sponsors had no role in the study design, data collection, data analysis, interpretation of data, or preparation, review or approval of the manuscript.
Publisher Copyright:
© 2019, The Author(s).
PY - 2019/11
Y1 - 2019/11
AB - Inferring a person's smoking habit and history from blood is relevant for complementing or replacing self-reports in epidemiological and public health research, and for forensic applications. However, a finite DNA methylation marker set and a validated statistical model based on a large dataset are not yet available. Employing 14 epigenome-wide association studies for marker discovery, and using data from six population-based cohorts (N = 3764) for model building, we identified 13 CpGs most suitable for inferring smoking versus non-smoking status from blood, with a cumulative area under the curve (AUC) of 0.901. Internal fivefold cross-validation yielded an average AUC of 0.897 ± 0.137, while external model validation in an independent population-based cohort (N = 1608) achieved an AUC of 0.911. These 13 CpGs also provided accurate inference of current (average AUC in cross-validation 0.925 ± 0.021, AUC in external validation 0.914), former (0.766 ± 0.023, 0.699) and never smoking (0.830 ± 0.019, 0.781) status, allowed inferring pack-years in current smokers (10 pack-years: 0.800 ± 0.068, 0.796; 15 pack-years: 0.767 ± 0.102, 0.752) and allowed inferring smoking cessation time in former smokers (5 years: 0.774 ± 0.024, 0.760; 10 years: 0.766 ± 0.033, 0.764; 15 years: 0.767 ± 0.020, 0.754). Model application to children revealed highly accurate inference of the true non-smoking status (6 years of age: accuracy 0.994, N = 355; 10 years: 0.994, N = 309), suggesting that prenatal and passive smoking exposure have no impact on applying the model in adults. This finite set of DNA methylation markers allows accurate inference of smoking habit, with accuracy comparable to that of plasma cotinine, and of smoking history from blood; we envision it becoming useful in epidemiology and public health research, and in medical and forensic applications.
KW - Epigenetics
KW - DNA methylation
KW - Smoking inference
KW - Epidemiology
KW - Forensics
KW - EPIGENOME-WIDE ASSOCIATION
KW - SELF-REPORTED SMOKING
KW - CIGARETTE-SMOKING
KW - MATERNAL SMOKING
KW - TOBACCO-SMOKING
KW - EXPOSURE
KW - COTININE
KW - BIOMARKER
KW - NEWBORNS
KW - HYPOMETHYLATION
U2 - 10.1007/s10654-019-00555-w
DO - 10.1007/s10654-019-00555-w
M3 - Article
C2 - 31494793
SN - 0393-2990
VL - 34
SP - 1055
EP - 1074
JO - European Journal of Epidemiology
JF - European Journal of Epidemiology
IS - 11
ER -