Screening for extranodal extension in HPV-associated oropharyngeal carcinoma: evaluation of a CT-based deep learning algorithm in patient data from a multicentre, randomised de-escalation trial

Benjamin H. Kann; Jirapat Likitlersuang; Dennis Bontempi; Zezhong Ye; Sanjay Aneja; Richard Bakst; Hillary R. Kelly; Amy F. Juliano; Sam Payabvash; Jeffrey P. Guenette; Ravindra Uppaluri; Danielle N. Margalit; Jonathan D. Schoenfeld; Roy B. Tishler; Robert Haddad; Hugo J.W.L. Aerts; Joaquin J. Garcia; Yael Flamand; Rathan M. Subramaniam; Barbara A. Burtness; Robert L. Ferris

doi:10.1016/S2589-7500(23)00046-8

Benjamin H. Kann^*, Jirapat Likitlersuang, Dennis Bontempi, Zezhong Ye, Sanjay Aneja, Richard Bakst, Hillary R. Kelly, Amy F. Juliano, Sam Payabvash, Jeffrey P. Guenette, Ravindra Uppaluri, Danielle N. Margalit, Jonathan D. Schoenfeld, Roy B. Tishler, Robert Haddad, Hugo J.W.L. Aerts, Joaquin J. Garcia, Yael Flamand, Rathan M. Subramaniam, Barbara A. Burtness, Robert L. Ferris

^*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Background: Pretreatment identification of pathological extranodal extension (ENE) would guide therapy de-escalation strategies for human papillomavirus (HPV)-associated oropharyngeal carcinoma but is diagnostically challenging. ECOG-ACRIN Cancer Research Group E3311 was a multicentre trial wherein patients with HPV-associated oropharyngeal carcinoma were treated surgically and assigned to a pathological risk-based adjuvant strategy of observation, radiation, or concurrent chemoradiation. Despite protocol exclusion of patients with overt radiographic ENE, more than 30% had pathological ENE and required postoperative chemoradiation. We aimed to evaluate a CT-based deep learning algorithm for prediction of ENE in E3311, a diagnostically challenging cohort wherein algorithm use would be impactful in guiding decision-making.

Methods: For this retrospective evaluation of deep learning algorithm performance, we obtained pretreatment CTs and corresponding surgical pathology reports from the multicentre, randomised de-escalation trial E3311. All enrolled patients on E3311 required pretreatment and diagnostic head and neck imaging; patients with radiographically overt ENE were excluded per study protocol. The lymph node with largest short-axis diameter and up to two additional nodes were segmented on each scan and annotated for ENE per pathology reports. Deep learning algorithm performance for ENE prediction was compared with that of four board-certified head and neck radiologists. The primary endpoint was the area under the curve (AUC) of the receiver operating characteristic.

Findings: From 178 collected scans, 313 nodes were annotated: 71 (23%) with ENE in general, 39 (13%) with ENE larger than 1 mm. The deep learning algorithm AUC for ENE classification was 0·86 (95% CI 0·82–0·90), outperforming all readers (p<0·0001 for each). Among radiologists, there was high variability in specificity (43–86%) and sensitivity (45–96%) with poor inter-reader agreement (κ 0·32). Matching the algorithm specificity to that of the reader with highest AUC (R2, false positive rate 22%) yielded improved sensitivity to 75% (+13%). Setting the algorithm false positive rate to 30% yielded 90% sensitivity. The algorithm showed improved performance compared with radiologists for ENE larger than 1 mm (p<0·0001) and in nodes with short-axis diameter 1 cm or larger.

Interpretation: The deep learning algorithm outperformed experts in predicting pathological ENE on a challenging cohort of patients with HPV-associated oropharyngeal carcinoma from a randomised clinical trial. Deep learning algorithms should be evaluated prospectively as a treatment selection tool.

Funding: ECOG-ACRIN Cancer Research Group and the National Cancer Institute of the US National Institutes of Health.
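
The Findings report an AUC and the sensitivity obtained when the algorithm's false positive rate is fixed at a chosen level (22% to match reader R2, or 30%). The sketch below illustrates how such operating points can be read off a set of per-node scores. It is not the authors' code: the function names `roc_auc` and `sensitivity_at_fpr` are hypothetical, and the scores are synthetic draws that only borrow the abstract's class balance (71 ENE-positive nodes out of 313), so the printed numbers will not reproduce the published results.

```python
import random


def roc_auc(scores, labels):
    """AUC as the probability that a positive scores above a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_at_fpr(scores, labels, max_fpr):
    """Return (sensitivity, fpr, threshold) for the lowest threshold whose
    false positive rate does not exceed max_fpr; a node is called positive
    when score >= threshold."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best = (0.0, 0.0, float("inf"))  # call nothing positive by default
    # Lowering the threshold can only raise both sensitivity and FPR,
    # so the last threshold that respects the FPR cap is the best one.
    for t in sorted(set(scores), reverse=True):
        fpr = sum(n >= t for n in neg) / len(neg)
        if fpr > max_fpr:
            break
        sens = sum(p >= t for p in pos) / len(pos)
        best = (sens, fpr, t)
    return best


# Synthetic scores (assumption): positives drawn from a higher-mean Gaussian.
random.seed(0)
labels = [1] * 71 + [0] * 242
scores = [random.gauss(1.5 if y == 1 else 0.0, 1.0) for y in labels]

print(f"AUC: {roc_auc(scores, labels):.2f}")
for cap in (0.22, 0.30):
    sens, fpr, thr = sensitivity_at_fpr(scores, labels, cap)
    print(f"FPR cap {cap:.0%}: sensitivity {sens:.0%} at threshold {thr:.2f}")
```

Fixing the false positive rate first and then comparing sensitivities is what makes an algorithm-versus-reader comparison fair, because each radiologist contributes a single operating point rather than a full curve.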

Original language	English
Pages (from-to)	E360-E369
Number of pages	10
Journal	The Lancet Digital Health
Volume	5
Issue number	6
DOIs	https://doi.org/10.1016/S2589-7500(23)00046-8
Publication status	Published - 1 Jun 2023

Access to Document

10.1016/S2589-7500(23)00046-8 (Licence: CC BY-NC-ND)

Cite this

Kann, B. H., Likitlersuang, J., Bontempi, D., Ye, Z., Aneja, S., Bakst, R., Kelly, H. R., Juliano, A. F., Payabvash, S., Guenette, J. P., Uppaluri, R., Margalit, D. N., Schoenfeld, J. D., Tishler, R. B., Haddad, R., Aerts, H. J. W. L., Garcia, J. J., Flamand, Y., Subramaniam, R. M., ... Ferris, R. L. (2023). Screening for extranodal extension in HPV-associated oropharyngeal carcinoma: evaluation of a CT-based deep learning algorithm in patient data from a multicentre, randomised de-escalation trial. The Lancet Digital Health, 5(6), E360-E369. https://doi.org/10.1016/S2589-7500(23)00046-8

@article{2f07dd94c694472c956e5481fe8de1c0,

title = "Screening for extranodal extension in HPV-associated oropharyngeal carcinoma: evaluation of a CT-based deep learning algorithm in patient data from a multicentre, randomised de-escalation trial",

author = "Kann, {Benjamin H.} and Jirapat Likitlersuang and Dennis Bontempi and Zezhong Ye and Sanjay Aneja and Richard Bakst and Kelly, {Hillary R.} and Juliano, {Amy F.} and Sam Payabvash and Guenette, {Jeffrey P.} and Ravindra Uppaluri and Margalit, {Danielle N.} and Schoenfeld, {Jonathan D.} and Tishler, {Roy B.} and Robert Haddad and Aerts, {Hugo J.W.L.} and Garcia, {Joaquin J.} and Yael Flamand and Subramaniam, {Rathan M.} and Burtness, {Barbara A.} and Ferris, {Robert L.}",

note = "Funding Information: This study was supported by the ECOG-ACRIN Cancer Research Group (Peter J O'Dwyer and Mitchell D Schnall, Group Co-Chairs, University of Pennsylvania School of Medicine) and the National Cancer Institute of the US National Institutes of Health under the following award numbers: U10CA180794, U10CA180820, UG1CA233180, UG1CA233184, UG1CA233337, UG1CA233253, and UG1CA232760. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Publisher Copyright: {\textcopyright} 2023 The Author(s). Published by Elsevier Ltd. This is an Open Access article under the CC BY-NC-ND 4.0 license",

year = "2023",

month = jun,

day = "1",

doi = "10.1016/S2589-7500(23)00046-8",

language = "English",

volume = "5",

pages = "E360--E369",

journal = "The Lancet Digital Health",

issn = "2589-7500",

publisher = "Lancet Publishing Group",

number = "6",

}

Kann, BH, Likitlersuang, J, Bontempi, D, Ye, Z, Aneja, S, Bakst, R, Kelly, HR, Juliano, AF, Payabvash, S, Guenette, JP, Uppaluri, R, Margalit, DN, Schoenfeld, JD, Tishler, RB, Haddad, R, Aerts, HJWL, Garcia, JJ, Flamand, Y, Subramaniam, RM, Burtness, BA & Ferris, RL 2023, 'Screening for extranodal extension in HPV-associated oropharyngeal carcinoma: evaluation of a CT-based deep learning algorithm in patient data from a multicentre, randomised de-escalation trial', The Lancet Digital Health, vol. 5, no. 6, pp. E360-E369. https://doi.org/10.1016/S2589-7500(23)00046-8

Screening for extranodal extension in HPV-associated oropharyngeal carcinoma: evaluation of a CT-based deep learning algorithm in patient data from a multicentre, randomised de-escalation trial. / Kann, Benjamin H.; Likitlersuang, Jirapat; Bontempi, Dennis et al.
In: The Lancet Digital Health, Vol. 5, No. 6, 01.06.2023, p. E360-E369.

TY - JOUR

T1 - Screening for extranodal extension in HPV-associated oropharyngeal carcinoma

T2 - evaluation of a CT-based deep learning algorithm in patient data from a multicentre, randomised de-escalation trial

AU - Kann, Benjamin H.

AU - Likitlersuang, Jirapat

AU - Bontempi, Dennis

AU - Ye, Zezhong

AU - Aneja, Sanjay

AU - Bakst, Richard

AU - Kelly, Hillary R.

AU - Juliano, Amy F.

AU - Payabvash, Sam

AU - Guenette, Jeffrey P.

AU - Uppaluri, Ravindra

AU - Margalit, Danielle N.

AU - Schoenfeld, Jonathan D.

AU - Tishler, Roy B.

AU - Haddad, Robert

AU - Aerts, Hugo J.W.L.

AU - Garcia, Joaquin J.

AU - Flamand, Yael

AU - Subramaniam, Rathan M.

AU - Burtness, Barbara A.

AU - Ferris, Robert L.

N1 - Funding Information: This study was supported by the ECOG-ACRIN Cancer Research Group (Peter J O'Dwyer and Mitchell D Schnall, Group Co-Chairs, University of Pennsylvania School of Medicine) and the National Cancer Institute of the US National Institutes of Health under the following award numbers: U10CA180794, U10CA180820, UG1CA233180, UG1CA233184, UG1CA233337, UG1CA233253, and UG1CA232760. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Publisher Copyright: © 2023 The Author(s). Published by Elsevier Ltd. This is an Open Access article under the CC BY-NC-ND 4.0 license

PY - 2023/6/1

Y1 - 2023/6/1

U2 - 10.1016/S2589-7500(23)00046-8

DO - 10.1016/S2589-7500(23)00046-8

M3 - Article

SN - 2589-7500

VL - 5

SP - E360-E369

JO - The Lancet Digital Health

JF - The Lancet Digital Health

IS - 6

ER -