TY - JOUR
T1 - Screening for extranodal extension in HPV-associated oropharyngeal carcinoma
T2 - evaluation of a CT-based deep learning algorithm in patient data from a multicentre, randomised de-escalation trial
AU - Kann, Benjamin H.
AU - Likitlersuang, Jirapat
AU - Bontempi, Dennis
AU - Ye, Zezhong
AU - Aneja, Sanjay
AU - Bakst, Richard
AU - Kelly, Hillary R.
AU - Juliano, Amy F.
AU - Payabvash, Sam
AU - Guenette, Jeffrey P.
AU - Uppaluri, Ravindra
AU - Margalit, Danielle N.
AU - Schoenfeld, Jonathan D.
AU - Tishler, Roy B.
AU - Haddad, Robert
AU - Aerts, Hugo J.W.L.
AU - Garcia, Joaquin J.
AU - Flamand, Yael
AU - Subramaniam, Rathan M.
AU - Burtness, Barbara A.
AU - Ferris, Robert L.
N1 - Funding Information:
This study was supported by the ECOG-ACRIN Cancer Research Group (Peter J O'Dwyer and Mitchell D Schnall, Group Co-Chairs, University of Pennsylvania School of Medicine) and the National Cancer Institute of the US National Institutes of Health under the following award numbers: U10CA180794, U10CA180820, UG1CA233180, UG1CA233184, UG1CA233337, UG1CA233253, and UG1CA232760. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Publisher Copyright:
© 2023 The Author(s). Published by Elsevier Ltd. This is an Open Access article under the CC BY-NC-ND 4.0 license
PY - 2023/6/1
Y1 - 2023/6/1
N2 - Background: Pretreatment identification of pathological extranodal extension (ENE) would guide therapy de-escalation strategies for human papillomavirus (HPV)-associated oropharyngeal carcinoma but is diagnostically challenging. ECOG-ACRIN Cancer Research Group E3311 was a multicentre trial wherein patients with HPV-associated oropharyngeal carcinoma were treated surgically and assigned to a pathological risk-based adjuvant strategy of observation, radiation, or concurrent chemoradiation. Despite protocol exclusion of patients with overt radiographic ENE, more than 30% had pathological ENE and required postoperative chemoradiation. We aimed to evaluate a CT-based deep learning algorithm for prediction of ENE in E3311, a diagnostically challenging cohort wherein algorithm use would be impactful in guiding decision-making. Methods: For this retrospective evaluation of deep learning algorithm performance, we obtained pretreatment CTs and corresponding surgical pathology reports from the multicentre, randomised de-escalation trial E3311. All enrolled patients on E3311 required pretreatment and diagnostic head and neck imaging; patients with radiographically overt ENE were excluded per study protocol. The lymph node with largest short-axis diameter and up to two additional nodes were segmented on each scan and annotated for ENE per pathology reports. Deep learning algorithm performance for ENE prediction was compared with four board-certified head and neck radiologists. The primary endpoint was the area under the curve (AUC) of the receiver operating characteristic. Findings: From 178 collected scans, 313 nodes were annotated: 71 (23%) with ENE in general, 39 (13%) with ENE larger than 1 mm. The deep learning algorithm AUC for ENE classification was 0·86 (95% CI 0·82–0·90), outperforming all readers (p<0·0001 for each). Among radiologists, there was high variability in specificity (43–86%) and sensitivity (45–96%) with poor inter-reader agreement (κ 0·32). Matching the algorithm specificity to that of the reader with highest AUC (R2, false positive rate 22%) yielded improved sensitivity to 75% (+13%). Setting the algorithm false positive rate to 30% yielded 90% sensitivity. The algorithm showed improved performance compared with radiologists for ENE larger than 1 mm (p<0·0001) and in nodes with short-axis diameter 1 cm or larger. Interpretation: The deep learning algorithm outperformed experts in predicting pathological ENE on a challenging cohort of patients with HPV-associated oropharyngeal carcinoma from a randomised clinical trial. Deep learning algorithms should be evaluated prospectively as a treatment selection tool. Funding: ECOG-ACRIN Cancer Research Group and the National Cancer Institute of the US National Institutes of Health.
AB - Background: Pretreatment identification of pathological extranodal extension (ENE) would guide therapy de-escalation strategies for human papillomavirus (HPV)-associated oropharyngeal carcinoma but is diagnostically challenging. ECOG-ACRIN Cancer Research Group E3311 was a multicentre trial wherein patients with HPV-associated oropharyngeal carcinoma were treated surgically and assigned to a pathological risk-based adjuvant strategy of observation, radiation, or concurrent chemoradiation. Despite protocol exclusion of patients with overt radiographic ENE, more than 30% had pathological ENE and required postoperative chemoradiation. We aimed to evaluate a CT-based deep learning algorithm for prediction of ENE in E3311, a diagnostically challenging cohort wherein algorithm use would be impactful in guiding decision-making. Methods: For this retrospective evaluation of deep learning algorithm performance, we obtained pretreatment CTs and corresponding surgical pathology reports from the multicentre, randomised de-escalation trial E3311. All enrolled patients on E3311 required pretreatment and diagnostic head and neck imaging; patients with radiographically overt ENE were excluded per study protocol. The lymph node with largest short-axis diameter and up to two additional nodes were segmented on each scan and annotated for ENE per pathology reports. Deep learning algorithm performance for ENE prediction was compared with four board-certified head and neck radiologists. The primary endpoint was the area under the curve (AUC) of the receiver operating characteristic. Findings: From 178 collected scans, 313 nodes were annotated: 71 (23%) with ENE in general, 39 (13%) with ENE larger than 1 mm. The deep learning algorithm AUC for ENE classification was 0·86 (95% CI 0·82–0·90), outperforming all readers (p<0·0001 for each). Among radiologists, there was high variability in specificity (43–86%) and sensitivity (45–96%) with poor inter-reader agreement (κ 0·32). Matching the algorithm specificity to that of the reader with highest AUC (R2, false positive rate 22%) yielded improved sensitivity to 75% (+13%). Setting the algorithm false positive rate to 30% yielded 90% sensitivity. The algorithm showed improved performance compared with radiologists for ENE larger than 1 mm (p<0·0001) and in nodes with short-axis diameter 1 cm or larger. Interpretation: The deep learning algorithm outperformed experts in predicting pathological ENE on a challenging cohort of patients with HPV-associated oropharyngeal carcinoma from a randomised clinical trial. Deep learning algorithms should be evaluated prospectively as a treatment selection tool. Funding: ECOG-ACRIN Cancer Research Group and the National Cancer Institute of the US National Institutes of Health.
U2 - 10.1016/S2589-7500(23)00046-8
DO - 10.1016/S2589-7500(23)00046-8
M3 - Article
SN - 2589-7500
VL - 5
SP - E360-E369
JO - The Lancet Digital Health
JF - The Lancet Digital Health
IS - 6
ER -