TY - JOUR
T1 - A distributed feature selection pipeline for survival analysis using radiomics in non-small cell lung cancer patients
AU - Gottardelli, Benedetta
AU - Gouthamchand, Varsha
AU - Masciocchi, Carlotta
AU - Boldrini, Luca
AU - Martino, Antonella
AU - Mazzarella, Ciro
AU - Massaccesi, Mariangela
AU - Monshouwer, René
AU - Findhammer, Jeroen
AU - Wee, Leonard
AU - Dekker, Andre
AU - Gambacorta, Maria Antonietta
AU - Damiani, Andrea
N1 - Funding Information:
This study received partial funding from Italian Ministry for University and Research (MUR) under the Program PON "Research and Innovation" supporting the development of artificial intelligence platform Gemelli Generator at Policlinico Universitario A. Gemelli IRCCS. Varsha Gouthamchand’s work is funded by Dutch Research Council (TRAIN). Leonard Wee’s work is funded by Hanarth Foundation, Dutch Research Council (AMICUS and TRAIN) and Horizon Europe (STRONG AYA).
Publisher Copyright:
© The Author(s) 2024.
PY - 2024/12/1
Y1 - 2024/12/1
N2 - Predictive modelling of cancer outcomes using radiomics faces dimensionality problems and data limitations, as radiomics features often number in the hundreds, and multi-institutional data sharing is ()often unfeasible. Federated learning (FL) and feature selection (FS) techniques combined can help overcome these issues, as one provides the means of training models without exchanging sensitive data, while the other identifies the most informative features, reduces overfitting, and improves model interpretability. Our proposed FS pipeline based on FL principles targets data-driven radiomics FS in a multivariate survival study of non-small cell lung cancer patients. The pipeline was run across datasets from three institutions without patient-level data exchange. It includes two FS techniques, Correlation-based Feature Selection and LASSO regularization, and Cox Proportional-Hazard regression with Overall Survival as endpoint. Trained and validated on 828 patients overall, our pipeline yielded a radiomic signature comprising "intensity-based energy" and "mean discretised intensity". Validation resulted in a mean Harrell C-index of 0.59, showcasing fair efficacy in risk stratification. In conclusion, we suggest a distributed radiomics approach that incorporates preliminary feature selection to systematically decrease the feature set based on data-driven considerations. This aims to address dimensionality challenges beyond those associated with data constraints and interpretability concerns.
AB - Predictive modelling of cancer outcomes using radiomics faces dimensionality problems and data limitations, as radiomics features often number in the hundreds, and multi-institutional data sharing is ()often unfeasible. Federated learning (FL) and feature selection (FS) techniques combined can help overcome these issues, as one provides the means of training models without exchanging sensitive data, while the other identifies the most informative features, reduces overfitting, and improves model interpretability. Our proposed FS pipeline based on FL principles targets data-driven radiomics FS in a multivariate survival study of non-small cell lung cancer patients. The pipeline was run across datasets from three institutions without patient-level data exchange. It includes two FS techniques, Correlation-based Feature Selection and LASSO regularization, and Cox Proportional-Hazard regression with Overall Survival as endpoint. Trained and validated on 828 patients overall, our pipeline yielded a radiomic signature comprising "intensity-based energy" and "mean discretised intensity". Validation resulted in a mean Harrell C-index of 0.59, showcasing fair efficacy in risk stratification. In conclusion, we suggest a distributed radiomics approach that incorporates preliminary feature selection to systematically decrease the feature set based on data-driven considerations. This aims to address dimensionality challenges beyond those associated with data constraints and interpretability concerns.
KW - Distributed learning
KW - Feature selection
KW - NSCLC
KW - Radiomics
U2 - 10.1038/s41598-024-58241-1
DO - 10.1038/s41598-024-58241-1
M3 - Article
SN - 2045-2322
VL - 14
JO - Scientific Reports
JF - Scientific Reports
IS - 1
M1 - 7814
ER -