Multi-institutional Prognostic Modeling in Head and Neck Cancer: Evaluating Impact and Generalizability of Deep Learning and Radiomics

Michal Kazmierski; Mattea Welch; Sejin Kim; Chris McIntosh; Katrina Rey-McIntyre; Shao Hui Huang; Tirth Patel; Tony Tadic; Michael Milosevic; Fei-Fei Liu; Adam Ryczkowski; Joanna Kazmierska; Zezhong Ye; Deborah Plana; Hugo J. W. L. Aerts; Benjamin H. Kann; Scott V. Bratman; Andrew J. Hope; Benjamin Haibe-Kains

doi:10.1158/2767-9764.CRC-22-0152

Corresponding author: Benjamin Haibe-Kains

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Artificial intelligence (AI) and machine learning (ML) are becoming critical in developing and deploying personalized medicine and targeted clinical trials. Recent advances in ML have enabled the integration of wider ranges of data including both medical records and imaging (radiomics). However, the development of prognostic models is complex as no modeling strategy is universally superior to others and validation of developed models requires large and diverse datasets to demonstrate that prognostic models developed (regardless of method) from one dataset are applicable to other datasets both internally and externally. Using a retrospective dataset of 2,552 patients from a single institution and a strict evaluation framework that included external validation on three external patient cohorts (873 patients), we crowdsourced the development of ML models to predict overall survival in head and neck cancer (HNC) using electronic medical records (EMR) and pretreatment radiological images. To assess the relative contributions of radiomics in predicting HNC prognosis, we compared 12 different models using imaging and/or EMR data. The model with the highest accuracy used multitask learning on clinical data and tumor volume, achieving high prognostic accuracy for 2-year and lifetime survival prediction, outperforming models relying on clinical data only, engineered radiomics, or complex deep neural network architecture. However, when we attempted to extend the best performing models from this large training dataset to other institutions, we observed significant reductions in the performance of the model in those datasets, highlighting the importance of detailed population-based reporting for AI/ML model utility and stronger validation frameworks.

1. We have developed highly prognostic models for overall survival in HNC using EMRs and pretreatment radiological images based on a large, retrospective dataset of 2,552 patients from our institution.
2. Diverse ML approaches were used by independent investigators. The model with the highest accuracy used multitask learning on clinical data and tumor volume.
3. External validation of the top three performing models on three datasets (873 patients) with significant differences in the distributions of clinical and demographic variables demonstrated significant decreases in model performance.

Significance: ML combined with simple prognostic factors outperformed multiple advanced CT radiomics and deep learning methods. ML models provided diverse solutions for prognosis of patients with HNC but their prognostic value is affected by differences in patient populations and require extensive validation.

Original language	English
Pages (from-to)	1140-1151
Number of pages	12
Journal	Cancer Research Communications
Volume	3
Issue number	6
DOIs	https://doi.org/10.1158/2767-9764.CRC-22-0152
Publication status	Published - 1 Jun 2023

Keywords

REPRODUCIBILITY
INFORMATION
SELECTION
TRIALS
ISSUES
VOLUME

Access to Document

10.1158/2767-9764.CRC-22-0152
Licence: CC BY

Cite this

Kazmierski, M., Welch, M., Kim, S., McIntosh, C., Rey-McIntyre, K., Huang, S. H., Patel, T., Tadic, T., Milosevic, M., Liu, F.-F., Ryczkowski, A., Kazmierska, J., Ye, Z., Plana, D., Aerts, H. J. W. L., Kann, B. H., Bratman, S. V., Hope, A. J., & Haibe-Kains, B. (2023). Multi-institutional Prognostic Modeling in Head and Neck Cancer: Evaluating Impact and Generalizability of Deep Learning and Radiomics. Cancer Research Communications, 3(6), 1140-1151. https://doi.org/10.1158/2767-9764.CRC-22-0152

@article{5107d229082f48dbaa4cfdf14d983500,

title = "Multi-institutional Prognostic Modeling in Head and Neck Cancer: Evaluating Impact and Generalizability of Deep Learning and Radiomics",

abstract = "Artificial intelligence (AI) and machine learning (ML) are becoming critical in developing and deploying personalized medicine and targeted clinical trials. Recent advances in ML have enabled the integration of wider ranges of data including both medical records and imaging (radiomics). However, the development of prognostic models is complex as no modeling strategy is universally superior to others and validation of developed models requires large and diverse datasets to demonstrate that prognostic models developed (regardless of method) from one dataset are applicable to other datasets both internally and externally. Using a retrospective dataset of 2,552 patients from a single institution and a strict evaluation framework that included external validation on three external patient cohorts (873 patients), we crowdsourced the development of ML models to predict overall survival in head and neck cancer (HNC) using electronic medical records (EMR) and pretreatment radiological images. To assess the relative contributions of radiomics in predicting HNC prognosis, we compared 12 different models using imaging and/or EMR data. The model with the highest accuracy used multitask learning on clinical data and tumor volume, achieving high prognostic accuracy for 2-year and lifetime survival prediction, outperforming models relying on clinical data only, engineered radiomics, or complex deep neural network architecture. However, when we attempted to extend the best performing models from this large training dataset to other institutions, we observed significant reductions in the performance of the model in those datasets, highlighting the importance of detailed population-based reporting for AI/ML model utility and stronger validation frameworks. 1. We have developed highly prognostic models for overall survival in HNC using EMRs and pretreatment radiological images based on a large, retrospective dataset of 2,552 patients from our institution. 2. Diverse ML approaches were used by independent investigators. The model with the highest accuracy used multitask learning on clinical data and tumor volume. 3. External validation of the top three performing models on three datasets (873 patients) with significant differences in the distributions of clinical and demographic variables demonstrated significant decreases in model performance. Significance: ML combined with simple prognostic factors outperformed multiple advanced CT radiomics and deep learning methods. ML models provided diverse solutions for prognosis of patients with HNC but their prognostic value is affected by differences in patient populations and require extensive validation.",

keywords = "REPRODUCIBILITY, INFORMATION, SELECTION, TRIALS, ISSUES, VOLUME",

author = "Michal Kazmierski and Mattea Welch and Sejin Kim and Chris McIntosh and Katrina Rey-McIntyre and Huang, {Shao Hui} and Tirth Patel and Tony Tadic and Michael Milosevic and Fei-Fei Liu and Adam Ryczkowski and Joanna Kazmierska and Zezhong Ye and Deborah Plana and Aerts, {Hugo J. W. L.} and Kann, {Benjamin H.} and Bratman, {Scott V.} and Hope, {Andrew J.} and Benjamin Haibe-Kains",

year = "2023",

month = jun,

day = "1",

doi = "10.1158/2767-9764.CRC-22-0152",

language = "English",

volume = "3",

pages = "1140--1151",

journal = "Cancer Research Communications",

issn = "2767-9764",

publisher = "American Association for Cancer Research Inc.",

number = "6",

}

Kazmierski, M, Welch, M, Kim, S, McIntosh, C, Rey-McIntyre, K, Huang, SH, Patel, T, Tadic, T, Milosevic, M, Liu, F-F, Ryczkowski, A, Kazmierska, J, Ye, Z, Plana, D, Aerts, HJWL, Kann, BH, Bratman, SV, Hope, AJ & Haibe-Kains, B 2023, 'Multi-institutional Prognostic Modeling in Head and Neck Cancer: Evaluating Impact and Generalizability of Deep Learning and Radiomics', Cancer Research Communications, vol. 3, no. 6, pp. 1140-1151. https://doi.org/10.1158/2767-9764.CRC-22-0152

TY - JOUR

T1 - Multi-institutional Prognostic Modeling in Head and Neck Cancer

T2 - Evaluating Impact and Generalizability of Deep Learning and Radiomics

AU - Kazmierski, Michal

AU - Welch, Mattea

AU - Kim, Sejin

AU - McIntosh, Chris

AU - Rey-McIntyre, Katrina

AU - Huang, Shao Hui

AU - Patel, Tirth

AU - Tadic, Tony

AU - Milosevic, Michael

AU - Liu, Fei-Fei

AU - Ryczkowski, Adam

AU - Kazmierska, Joanna

AU - Ye, Zezhong

AU - Plana, Deborah

AU - Aerts, Hugo J. W. L.

AU - Kann, Benjamin H.

AU - Bratman, Scott V.

AU - Hope, Andrew J.

AU - Haibe-Kains, Benjamin

PY - 2023/6/1

Y1 - 2023/6/1

N2 - Artificial intelligence (AI) and machine learning (ML) are becoming critical in developing and deploying personalized medicine and targeted clinical trials. Recent advances in ML have enabled the integration of wider ranges of data including both medical records and imaging (radiomics). However, the development of prognostic models is complex as no modeling strategy is universally superior to others and validation of developed models requires large and diverse datasets to demonstrate that prognostic models developed (regardless of method) from one dataset are applicable to other datasets both internally and externally. Using a retrospective dataset of 2,552 patients from a single institution and a strict evaluation framework that included external validation on three external patient cohorts (873 patients), we crowdsourced the development of ML models to predict overall survival in head and neck cancer (HNC) using electronic medical records (EMR) and pretreatment radiological images. To assess the relative contributions of radiomics in predicting HNC prognosis, we compared 12 different models using imaging and/or EMR data. The model with the highest accuracy used multitask learning on clinical data and tumor volume, achieving high prognostic accuracy for 2-year and lifetime survival prediction, outperforming models relying on clinical data only, engineered radiomics, or complex deep neural network architecture. However, when we attempted to extend the best performing models from this large training dataset to other institutions, we observed significant reductions in the performance of the model in those datasets, highlighting the importance of detailed population-based reporting for AI/ML model utility and stronger validation frameworks. 1. We have developed highly prognostic models for overall survival in HNC using EMRs and pretreatment radiological images based on a large, retrospective dataset of 2,552 patients from our institution. 2. Diverse ML approaches were used by independent investigators. The model with the highest accuracy used multitask learning on clinical data and tumor volume. 3. External validation of the top three performing models on three datasets (873 patients) with significant differences in the distributions of clinical and demographic variables demonstrated significant decreases in model performance. Significance: ML combined with simple prognostic factors outperformed multiple advanced CT radiomics and deep learning methods. ML models provided diverse solutions for prognosis of patients with HNC but their prognostic value is affected by differences in patient populations and require extensive validation.

AB - Artificial intelligence (AI) and machine learning (ML) are becoming critical in developing and deploying personalized medicine and targeted clinical trials. Recent advances in ML have enabled the integration of wider ranges of data including both medical records and imaging (radiomics). However, the development of prognostic models is complex as no modeling strategy is universally superior to others and validation of developed models requires large and diverse datasets to demonstrate that prognostic models developed (regardless of method) from one dataset are applicable to other datasets both internally and externally. Using a retrospective dataset of 2,552 patients from a single institution and a strict evaluation framework that included external validation on three external patient cohorts (873 patients), we crowdsourced the development of ML models to predict overall survival in head and neck cancer (HNC) using electronic medical records (EMR) and pretreatment radiological images. To assess the relative contributions of radiomics in predicting HNC prognosis, we compared 12 different models using imaging and/or EMR data. The model with the highest accuracy used multitask learning on clinical data and tumor volume, achieving high prognostic accuracy for 2-year and lifetime survival prediction, outperforming models relying on clinical data only, engineered radiomics, or complex deep neural network architecture. However, when we attempted to extend the best performing models from this large training dataset to other institutions, we observed significant reductions in the performance of the model in those datasets, highlighting the importance of detailed population-based reporting for AI/ML model utility and stronger validation frameworks. 1. We have developed highly prognostic models for overall survival in HNC using EMRs and pretreatment radiological images based on a large, retrospective dataset of 2,552 patients from our institution. 2. Diverse ML approaches were used by independent investigators. The model with the highest accuracy used multitask learning on clinical data and tumor volume. 3. External validation of the top three performing models on three datasets (873 patients) with significant differences in the distributions of clinical and demographic variables demonstrated significant decreases in model performance. Significance: ML combined with simple prognostic factors outperformed multiple advanced CT radiomics and deep learning methods. ML models provided diverse solutions for prognosis of patients with HNC but their prognostic value is affected by differences in patient populations and require extensive validation.

KW - REPRODUCIBILITY

KW - INFORMATION

KW - SELECTION

KW - TRIALS

KW - ISSUES

KW - VOLUME

U2 - 10.1158/2767-9764.CRC-22-0152

DO - 10.1158/2767-9764.CRC-22-0152

M3 - Article

C2 - 37397861

SN - 2767-9764

VL - 3

SP - 1140

EP - 1151

JO - Cancer Research Communications

JF - Cancer Research Communications

IS - 6

ER -