TY - JOUR
T1 - Machine learning in Alzheimer's disease genetics
AU - Bracher-Smith, Matthew
AU - Melograna, Federico
AU - Ulm, Brittany
AU - Bellenguez, Céline
AU - Grenier-Boley, Benjamin
AU - Duroux, Diane
AU - Nevado, Alejo J
AU - Holmans, Peter
AU - Tijms, Betty M
AU - Hulsman, Marc
AU - de Rojas, Itziar
AU - Campos-Martin, Rafael
AU - van der Lee, Sven
AU - Castillo, Atahualpa
AU - Küçükali, Fahri
AU - Peters, Oliver
AU - Schneider, Anja
AU - Dichgans, Martin
AU - Rujescu, Dan
AU - Scherbaum, Norbert
AU - Deckert, Jürgen
AU - Riedel-Heller, Steffi
AU - Hausner, Lucrezia
AU - Molina-Porcel, Laura
AU - Düzel, Emrah
AU - Grimmer, Timo
AU - Wiltfang, Jens
AU - Heilmann-Heimbach, Stefanie
AU - Moebus, Susanne
AU - Tegos, Thomas
AU - Scarmeas, Nikolaos
AU - Dols-Icardo, Oriol
AU - Moreno, Fermin
AU - Pérez-Tur, Jordi
AU - Bullido, María J
AU - Pastor, Pau
AU - Sánchez-Valle, Raquel
AU - Álvarez, Victoria
AU - Boada, Mercè
AU - García-González, Pablo
AU - Puerta, Raquel
AU - Mir, Pablo
AU - Real, Luis M
AU - Piñol-Ripoll, Gerard
AU - García-Alberca, Jose María
AU - Rodriguez-Rodriguez, Eloy
AU - Soininen, Hilkka
AU - Heikkinen, Sami
AU - de Mendonça, Alexandre
AU - Mehrabian, Shima
AU - EADB
AU - Verhey, Frans
AU - Van Steen, Kristel
AU - van Duijn, Cornelia M
AU - Escott-Price, Valentina
AU - Ramakers, Inez
PY - 2025/07/22
Y1 - 2025/07/22
N2 - Traditional statistical approaches have advanced our understanding of the genetics of complex diseases, yet are limited to linear additive models. Here we applied machine learning (ML) to genome-wide data from 41,686 individuals in the largest European consortium on Alzheimer's disease (AD) to investigate the effectiveness of various ML algorithms in replicating known findings, discovering novel loci, and predicting individuals at risk. We utilised Gradient Boosting Machines (GBMs), biological pathway-informed Neural Networks (NNs), and Model-based Multifactor Dimensionality Reduction (MB-MDR) models. ML approaches successfully captured all genome-wide significant genetic variants identified in the training set and 22% of associations from larger meta-analyses. They highlight 6 novel loci which replicate in an external dataset, including variants which map to ARHGAP25, LY6H, COG7, SOD1 and ZNF597. They further identify novel association in AP4E1, refining the genetic landscape of the known SPPL2A locus. Our results demonstrate that machine learning methods can achieve predictive performance comparable to classical approaches in genetic epidemiology and have the potential to uncover novel loci that remain undetected by traditional GWAS. These insights provide a complementary avenue for advancing the understanding of AD genetics.
AB - Traditional statistical approaches have advanced our understanding of the genetics of complex diseases, yet are limited to linear additive models. Here we applied machine learning (ML) to genome-wide data from 41,686 individuals in the largest European consortium on Alzheimer's disease (AD) to investigate the effectiveness of various ML algorithms in replicating known findings, discovering novel loci, and predicting individuals at risk. We utilised Gradient Boosting Machines (GBMs), biological pathway-informed Neural Networks (NNs), and Model-based Multifactor Dimensionality Reduction (MB-MDR) models. ML approaches successfully captured all genome-wide significant genetic variants identified in the training set and 22% of associations from larger meta-analyses. They highlight 6 novel loci which replicate in an external dataset, including variants which map to ARHGAP25, LY6H, COG7, SOD1 and ZNF597. They further identify novel association in AP4E1, refining the genetic landscape of the known SPPL2A locus. Our results demonstrate that machine learning methods can achieve predictive performance comparable to classical approaches in genetic epidemiology and have the potential to uncover novel loci that remain undetected by traditional GWAS. These insights provide a complementary avenue for advancing the understanding of AD genetics.
KW - Alzheimer Disease/genetics
KW - Humans
KW - Machine Learning
KW - Genome-Wide Association Study
KW - Genetic Predisposition to Disease
KW - Polymorphism, Single Nucleotide
KW - Algorithms
KW - GTPase-Activating Proteins/genetics
KW - Neural Networks, Computer
U2 - 10.1038/s41467-025-61650-z
DO - 10.1038/s41467-025-61650-z
M3 - Article
SN - 2041-1723
VL - 16
SP - 6726
JO - Nature Communications
JF - Nature Communications
IS - 1
M1 - 6726
ER -