TY - JOUR
T1 - Phenotype prediction using biologically interpretable neural networks on multi-cohort multi-omics data
AU - van Hilten, Arno
AU - van Rooij, Jeroen
AU - Heijmans, Bastiaan T.
AU - ’t Hoen, Peter A.C.
AU - van Meurs, Joyce
AU - Jansen, Rick
AU - Franke, Lude
AU - Boomsma, Dorret I.
AU - Pool, René
AU - van Dongen, Jenny
AU - Hottenga, Jouke J.
AU - van Greevenbroek, Marleen M.J.
AU - Stehouwer, Coen D.A.
AU - van der Kallen, Carla J.H.
AU - Schalkwijk, Casper G.
AU - Wijmenga, Cisca
AU - Zhernakova, Sasha
AU - Tigchelaar, Ettje F.
AU - Slagboom, P. Eline
AU - Beekman, Marian
AU - Deelen, Joris
AU - van Heemst, Diana
AU - Veldink, Jan H.
AU - van den Berg, Leonard H.
AU - van Duijn, Cornelia M.
AU - Hofman, Bert A.
AU - Isaacs, Aaron
AU - Uitterlinden, André G.
AU - Jhamai, P. Mila
AU - Verbiest, Michael
AU - Suchiman, H. Eka D.
AU - Verkerk, Marijn
AU - van der Breggen, Ruud
AU - Lakenberg, Nico
AU - Mei, Hailiang
AU - van Iterson, Maarten
AU - van Galen, Michiel
AU - Bot, Jan
AU - van ’t Hof, Peter
AU - Deelen, Patrick
AU - Nooren, Irene
AU - Moed, Matthijs
AU - Vermaat, Martijn
AU - Luijk, René
AU - Bonder, Marc Jan
AU - van Dijk, Freerk
AU - Arindrarto, Wibowo
AU - Kielbasa, Szymon M.
AU - Swertz, Morris A.
AU - BIOS Consortium
N1 - Funding Information:
This work was carried out on the Dutch national e-infrastructure with the support of SURF Cooperative. Samples used in this study were contributed by the LL, the LLS, the NTR and the RS. We thank all BIOS consortium members and participants of these cohorts for their contributions to this study. This work was funded by the Dutch Technology Foundation (STW) through the 2005 Simon Stevin Meester grant 2015 to W.J. Niessen. G.V. Roshchupkin was supported by the ZonMw Veni grant (Veni, 1936320). The RS is supported by the Netherlands Organization for Scientific Research (NWO, 91203014, 175.010.2005.011, 91103012). This research was financially supported by BBMRI-NL, a Research Infrastructure financed by the Dutch government (NWO, numbers 184.021.007 and 184.033.111).
Publisher Copyright:
© The Author(s) 2024.
PY - 2024/12/1
Y1 - 2024/12/1
AB - Integrating multi-omics data into predictive models has the potential to enhance accuracy, which is essential for precision medicine. In this study, we developed interpretable predictive models for multi-omics data by employing neural networks informed by prior biological knowledge, referred to as visible networks. These neural networks offer insights into the decision-making process and can unveil novel perspectives on the underlying biological mechanisms associated with traits and complex diseases. We tested the performance, interpretability and generalizability for inferring smoking status, subject age and LDL levels using genome-wide RNA expression and CpG methylation data from the blood of the BIOS consortium (four population cohorts, N_total = 2940). In a cohort-wise cross-validation setting, the consistency of the diagnostic performance and interpretation was assessed. Performance was consistently high for predicting smoking status with an overall mean AUC of 0.95 (95% CI: 0.90–1.00) and interpretation revealed the involvement of well-replicated genes such as AHRR, GPR15 and LRRN3. LDL-level predictions were only generalized in a single cohort with an R² of 0.07 (95% CI: 0.05–0.08). Age was inferred with a mean error of 5.16 (95% CI: 3.97–6.35) years with the genes COL11A2, AFAP1, OTUD7A, PTPRN2, ADARB2 and CD34 consistently predictive. For both regression tasks, we found that using multi-omics networks improved performance, stability and generalizability compared to interpretable single-omics networks. We believe that visible neural networks have great potential for multi-omics analysis; they combine multi-omics data elegantly, are interpretable, and generalize well to data from different cohorts.
U2 - 10.1038/s41540-024-00405-w
DO - 10.1038/s41540-024-00405-w
M3 - Article
SN - 2056-7189
VL - 10
JO - npj Systems Biology and Applications
JF - npj Systems Biology and Applications
IS - 1
M1 - 81
ER -