Multi-centre radiomics for prediction of recurrence following radical radiotherapy for head and neck cancers: Consequences of feature selection, machine learning classifiers and batch-effect harmonization

Amal Joseph Varghese; Varsha Gouthamchand; Balu Krishna Sasidharan; Leonard Wee; Sharief K. Sidhique; Julia Priyadarshini Rao; Andre Dekker; Frank Hoebers; Devadhas Devakumar; Aparna Irodi; Timothy Peace Balasingh; Henry Finlay Godson; T. Joel; Manu Mathew; Rajesh Gunasingam Isiah; Simon Pradeep Pavamani; Hannah Mary T. Thomas

doi:10.1016/j.phro.2023.100450

Corresponding author: Hannah Mary T. Thomas

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Background and purpose: Radiomics models trained with limited single-institution data are often neither reproducible nor generalisable. Using private and public datasets and their combinations, we developed radiomics models that predict loco-regional recurrence within two years of radiotherapy, simulating small and multi-institutional studies, and examined how responsive the models were to feature selection, machine learning algorithms, centre-effect harmonization and increased dataset sizes.

Materials and methods: A total of 562 patients, histologically confirmed and treated for locally advanced head-and-neck cancer (LA-HNC), were drawn from two public and two private datasets; one private dataset was reserved exclusively for validation. The clinical contours of the primary tumours were used without recontouring for PyRadiomics-based feature extraction. ComBat harmonization was applied, and LASSO-Logistic Regression (LR) and Support Vector Machine (SVM) models were built. Predictive performance was summarised by the 95% confidence interval (CI) of 1000 bootstrapped areas under the receiver operating characteristic curve (AUC). The responsiveness of the models' performance to the choice of feature selection method, ComBat harmonization, machine learning classifier, and single versus pooled data was evaluated.

Results: LASSO and SelectKBest selected 14 and 16 features, respectively; three features overlapped. Without ComBat, the LR and SVM models for the three-institution data showed AUCs (CI) of 0.513 (0.481–0.559) and 0.632 (0.586–0.665), respectively. Following ComBat, the AUCs were 0.559 (0.536–0.590) and 0.662 (0.606–0.690), respectively. Compared with single-cohort AUCs (0.562–0.629), SVM models from pooled data performed significantly better at AUC = 0.680.

Conclusions: Multi-institutional retrospective data accentuates the existing variabilities that affect radiomics. Carefully designed prospective, multi-institutional studies and data sharing are necessary for clinically relevant head-and-neck cancer prognostication models.
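The evaluation steps named in the abstract (two feature-selection routes, LR and SVM classifiers, and a bootstrapped 95% CI of the AUC) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the data are synthetic, ComBat harmonization is omitted, and the helper `bootstrap_auc_ci`, the regularisation strength `C=0.1`, the RBF kernel, the 70/30 split and the percentile method for the interval are all assumptions chosen for illustration. Only `k=16` for SelectKBest echoes a number reported in the abstract.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Hypothetical helper: percentile bootstrap CI for the AUC of fixed predictions."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    n = len(y_true)
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        if np.unique(y_true[idx]).size < 2:
            continue  # AUC is undefined when a resample contains one class only
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    lower, upper = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(np.mean(aucs)), float(lower), float(upper)


# Synthetic stand-in for a radiomics feature table (assumption, not study data).
X, y = make_classification(n_samples=400, n_features=100, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Standardise using training statistics only, to avoid leaking test information.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Route A: L1-penalised (LASSO) logistic regression; features with non-zero
# coefficients count as selected.
lasso_lr = LogisticRegression(penalty="l1", solver="liblinear", C=0.1,
                              random_state=0).fit(X_train_s, y_train)
lasso_idx = set(np.flatnonzero(lasso_lr.coef_[0]))

# Route B: univariate SelectKBest with the ANOVA F-score.
kbest = SelectKBest(f_classif, k=16).fit(X_train_s, y_train)
kbest_idx = set(np.flatnonzero(kbest.get_support()))
overlap = lasso_idx & kbest_idx

# SVM trained on the SelectKBest features; decision values serve as scores.
cols = sorted(kbest_idx)
svm = SVC(kernel="rbf").fit(X_train_s[:, cols], y_train)

lr_ci = bootstrap_auc_ci(y_test, lasso_lr.decision_function(X_test_s))
svm_ci = bootstrap_auc_ci(y_test, svm.decision_function(X_test_s[:, cols]))

print(f"LASSO kept {len(lasso_idx)} features, SelectKBest kept {len(kbest_idx)}, "
      f"{len(overlap)} in common")
print(f"LR  AUC {lr_ci[0]:.3f} ({lr_ci[1]:.3f}-{lr_ci[2]:.3f})")
print(f"SVM AUC {svm_ci[0]:.3f} ({svm_ci[1]:.3f}-{svm_ci[2]:.3f})")
```

The sketch resamples test cases while keeping the fitted models fixed, so the interval reflects test-set sampling variability only; refitting the model inside each bootstrap iteration would be a different (and costlier) design.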

Original language	English
Article number	100450
Number of pages	8
Journal	Physics & Imaging in Radiation Oncology
Volume	26
Issue number	1
DOIs	https://doi.org/10.1016/j.phro.2023.100450
Publication status	Published - 1 Apr 2023

Keywords

Head-and-neck cancer
Loco-regional recurrence
Machine learning
Multi-institutional
Prognosis
Radiomics

Access to Document

10.1016/j.phro.2023.100450
Licence: CC BY-NC-ND

Cite this

Varghese, A. J., Gouthamchand, V., Sasidharan, B. K., Wee, L., Sidhique, S. K., Rao, J. P., Dekker, A., Hoebers, F., Devakumar, D., Irodi, A., Balasingh, T. P., Godson, H. F., Joel, T., Mathew, M., Gunasingam Isiah, R., Pavamani, S. P., & Thomas, H. M. T. (2023). Multi-centre radiomics for prediction of recurrence following radical radiotherapy for head and neck cancers: Consequences of feature selection, machine learning classifiers and batch-effect harmonization. Physics & Imaging in Radiation Oncology, 26(1), Article 100450. https://doi.org/10.1016/j.phro.2023.100450

@article{57914615de7a46b3a5ce52b81221a0c1,

title = "Multi-centre radiomics for prediction of recurrence following radical radiotherapy for head and neck cancers: Consequences of feature selection, machine learning classifiers and batch-effect harmonization",

keywords = "Head-and-neck cancer, Loco-regional recurrence, Machine learning, Multi-institutional, Prognosis, Radiomics",

author = "Varghese, {Amal Joseph} and Varsha Gouthamchand and Sasidharan, {Balu Krishna} and Leonard Wee and Sidhique, {Sharief K.} and Rao, {Julia Priyadarshini} and Andre Dekker and Frank Hoebers and Devadhas Devakumar and Aparna Irodi and Balasingh, {Timothy Peace} and Godson, {Henry Finlay} and T. Joel and Manu Mathew and {Gunasingam Isiah}, Rajesh and Pavamani, {Simon Pradeep} and Thomas, {Hannah Mary T.}",

note = "Funding Information: This work was supported by the DBT/Wellcome Trust India Alliance Early Career Fellowship [Grant number: IA/E/18/1/504306] awarded to HMT. Author BS acknowledges the support by the Foundation I-DAIR. Authors LW and FH acknowledge support by the Hanarth Foundation. LW and AD further acknowledge financial support from the Dutch Research Council (NWO) via the BIONIC, TRAIN and AMICUS grants. Publisher Copyright: {\textcopyright} 2023",

year = "2023",

month = apr,

day = "1",

doi = "10.1016/j.phro.2023.100450",

language = "English",

volume = "26",

journal = "Physics \& Imaging in Radiation Oncology",

issn = "2405-6316",

publisher = "Elsevier Ireland Ltd",

number = "1",

}


TY - JOUR

T1 - Multi-centre radiomics for prediction of recurrence following radical radiotherapy for head and neck cancers

T2 - Consequences of feature selection, machine learning classifiers and batch-effect harmonization

AU - Varghese, Amal Joseph

AU - Gouthamchand, Varsha

AU - Sasidharan, Balu Krishna

AU - Wee, Leonard

AU - Sidhique, Sharief K.

AU - Rao, Julia Priyadarshini

AU - Dekker, Andre

AU - Hoebers, Frank

AU - Devakumar, Devadhas

AU - Irodi, Aparna

AU - Balasingh, Timothy Peace

AU - Godson, Henry Finlay

AU - Joel, T.

AU - Mathew, Manu

AU - Gunasingam Isiah, Rajesh

AU - Pavamani, Simon Pradeep

AU - Thomas, Hannah Mary T.

N1 - Funding Information: This work was supported by the DBT/Wellcome Trust India Alliance Early Career Fellowship [Grant number: IA/E/18/1/504306] awarded to HMT. Author BS acknowledges the support by the Foundation I-DAIR. Authors LW and FH acknowledge support by the Hanarth Foundation. LW and AD further acknowledge financial support from the Dutch Research Council (NWO) via the BIONIC, TRAIN and AMICUS grants. Publisher Copyright: © 2023

PY - 2023/4/1

Y1 - 2023/4/1

KW - Head-and-neck cancer

KW - Loco-regional recurrence

KW - Machine learning

KW - Multi-institutional

KW - Prognosis

KW - Radiomics

U2 - 10.1016/j.phro.2023.100450

DO - 10.1016/j.phro.2023.100450

M3 - Article

C2 - 37260438

SN - 2405-6316

VL - 26

JO - Physics & Imaging in Radiation Oncology

JF - Physics & Imaging in Radiation Oncology

IS - 1

M1 - 100450

ER -
