TY - JOUR
T1 - Lung Cancer Detection Using Bayesian Networks
T2 - A Retrospective Development and Validation Study on a Danish Population of High-Risk Individuals
AU - Henriksen, Margrethe Bang
AU - Van Daalen, Florian
AU - Wee, Leonard
AU - Hansen, Torben Frostrup
AU - Jensen, Lars Henrik
AU - Brasen, Claus Lohman
AU - Hilberg, Ole
AU - Bermejo, Inigo
PY - 2025/2/1
Y1 - 2025/2/1
N2 - Background: Lung cancer (LC) is the top cause of cancer deaths globally, prompting many countries to adopt LC screening programs. While screening typically relies on age and smoking intensity, more efficient risk models exist. We devised a Bayesian network (BN) for LC detection, testing its resilience with varying degrees of missing data and comparing it to a prior machine learning (ML) model. Methods: We analyzed data from 9940 patients referred for LC assessment in Southern Denmark from 2009 to 2018. Variables included age, sex, smoking, and lab results. Our experiments varied missing data (0%-30%), BN structure (expert-based vs. data-driven), and discretization method (standard vs. data-driven). Results: Across all missing data levels, area under the curve (AUC) remained steady, ranging from 0.737 to 0.757, compared to the ML model's AUC of 0.77. BN structure and discretization method had minimal impact on performance. BNs were well calibrated overall, with a net benefit in decision curve analysis when predicted risk exceeded 5%. Conclusion: BN models showed resilience with up to 30% missing values. Moreover, these BNs exhibited similar performance, calibration, and clinical utility compared to the machine learning model developed using the same dataset. Considering their effectiveness in handling missing data, BNs emerge as a relevant method for the development of future lung cancer detection models.
AB - Background: Lung cancer (LC) is the top cause of cancer deaths globally, prompting many countries to adopt LC screening programs. While screening typically relies on age and smoking intensity, more efficient risk models exist. We devised a Bayesian network (BN) for LC detection, testing its resilience with varying degrees of missing data and comparing it to a prior machine learning (ML) model. Methods: We analyzed data from 9940 patients referred for LC assessment in Southern Denmark from 2009 to 2018. Variables included age, sex, smoking, and lab results. Our experiments varied missing data (0%-30%), BN structure (expert-based vs. data-driven), and discretization method (standard vs. data-driven). Results: Across all missing data levels, area under the curve (AUC) remained steady, ranging from 0.737 to 0.757, compared to the ML model's AUC of 0.77. BN structure and discretization method had minimal impact on performance. BNs were well calibrated overall, with a net benefit in decision curve analysis when predicted risk exceeded 5%. Conclusion: BN models showed resilience with up to 30% missing values. Moreover, these BNs exhibited similar performance, calibration, and clinical utility compared to the machine learning model developed using the same dataset. Considering their effectiveness in handling missing data, BNs emerge as a relevant method for the development of future lung cancer detection models.
KW - SCREENING TRIAL
KW - PREDICTION
KW - CRITERIA
KW - MODELS
U2 - 10.1002/cam4.70458
DO - 10.1002/cam4.70458
M3 - Article
SN - 2045-7634
VL - 14
JO - Cancer Medicine
JF - Cancer Medicine
IS - 3
M1 - e70458
ER -