Multi-centre radiomics for prediction of recurrence following radical radiotherapy for head and neck cancers: Consequences of feature selection, machine learning classifiers and batch-effect harmonization

Amal Joseph Varghese, Varsha Gouthamchand, Balu Krishna Sasidharan, Leonard Wee, Sharief K. Sidhique, Julia Priyadarshini Rao, Andre Dekker, Frank Hoebers, Devadhas Devakumar, Aparna Irodi, Timothy Peace Balasingh, Henry Finlay Godson, T. Joel, Manu Mathew, Rajesh Gunasingam Isiah, Simon Pradeep Pavamani, Hannah Mary T. Thomas*

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Background and purpose: Radiomics models trained on limited single-institution data are often neither reproducible nor generalisable. We developed radiomics models to predict loco-regional recurrence within two years of radiotherapy using private and public datasets and their combinations, to simulate small and multi-institutional studies and to study the responsiveness of the models to feature selection, machine learning algorithms, centre-effect harmonization and increased dataset sizes.

Materials and methods: The study included 562 patients treated for histologically confirmed locally advanced head-and-neck cancer (LA-HNC), drawn from two public and two private datasets; one private dataset was reserved exclusively for validation. Clinical contours of the primary tumours were used without recontouring for PyRadiomics-based feature extraction. ComBat harmonization was applied, and LASSO-Logistic Regression (LR) and Support Vector Machine (SVM) models were built. Predictive performance was assessed using the 95% confidence interval (CI) of 1000 bootstrapped areas under the receiver operating characteristic curve (AUCs). The responsiveness of model performance to the choice of feature selection method, ComBat harmonization, machine learning classifier, and single versus pooled data was evaluated.

Results: LASSO and SelectKBest selected 14 and 16 features, respectively, of which three overlapped. Without ComBat, the LR and SVM models for the three-institution data showed AUCs (CI) of 0.513 (0.481–0.559) and 0.632 (0.586–0.665), respectively. Following ComBat, the AUCs were 0.559 (0.536–0.590) and 0.662 (0.606–0.690), respectively. Compared with single-cohort AUCs (0.562–0.629), SVM models from pooled data performed significantly better at AUC = 0.680.

Conclusions: Multi-institutional retrospective data accentuate the existing variabilities that affect radiomics. Carefully designed prospective, multi-institutional studies and data sharing are necessary for clinically relevant head-and-neck cancer prognostication models.
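The evaluation described in the abstract (an L1-penalised logistic regression and an SVM, each scored by a bootstrapped AUC interval on a held-out set) can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic stand-ins for radiomic features, the helper name `bootstrap_auc_ci`, the hyperparameters, the percentile method for the interval, and the train/validation split are all assumptions made for illustration, and ComBat harmonization is omitted.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Return (AUC, lower, upper) using a percentile bootstrap.

    Hypothetical helper: resamples the validation cases with replacement
    and takes percentiles of the resulting AUCs (an assumed CI method).
    """
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    n = len(y_true)
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, n, size=n)
        if np.unique(y_true[idx]).size < 2:
            continue  # AUC is undefined when a resample has a single class
        aucs.append(roc_auc_score(y_true[idx], scores[idx]))
    lower, upper = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return roc_auc_score(y_true, scores), lower, upper


def main():
    # Synthetic placeholder data; sizes are illustrative only.
    X, y = make_classification(
        n_samples=562, n_features=100, n_informative=10, random_state=0
    )
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0
    )

    models = {
        # L1 penalty performs LASSO-style feature selection inside the LR.
        "LASSO-LR": make_pipeline(
            StandardScaler(),
            LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
        ),
        "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    }

    for name, model in models.items():
        model.fit(X_train, y_train)
        scores = model.decision_function(X_val)
        auc, lower, upper = bootstrap_auc_ci(y_val, scores)
        print(f"{name}: AUC {auc:.3f} (95% CI {lower:.3f}-{upper:.3f})")

    # Number of features retained by the L1 penalty (non-zero coefficients).
    coef = models["LASSO-LR"].named_steps["logisticregression"].coef_
    print("Features kept by LASSO-LR:", int(np.count_nonzero(coef)))


if __name__ == "__main__":
    main()
```

Resampling only the validation cases keeps the fitted model fixed, so the interval reflects uncertainty in the validation estimate rather than in model training; whether the study resampled in this way is not stated in the abstract.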
Original language: English
Article number: 100450
Number of pages: 8
Journal: Physics & Imaging in Radiation Oncology
Volume: 26
Issue number: 1
Publication status: Published - 1 Apr 2023

Keywords

  • Head-and-neck cancer
  • Loco-regional recurrence
  • Machine learning
  • Multi-institutional
  • Prognosis
  • Radiomics
