TY - JOUR
T1 - Stepwise Transfer Learning for Expert-Level Pediatric Brain Tumor MRI Segmentation in a Limited Data Scenario
AU - Boyd, Aidan
AU - Ye, Zezhong
AU - Prabhu, Sanjay
AU - Tjong, Michael C
AU - Zha, Yining
AU - Zapaishchykova, Anna
AU - Vajapeyam, Sridhar
AU - Catalano, Paul J
AU - Hayat, Hasaan
AU - Chopra, Rishi
AU - Liu, Kevin X
AU - Nabavizadeh, Ali
AU - Resnick, Adam
AU - Mueller, Sabine
AU - Haas-Kogan, Daphne
AU - Aerts, Hugo J W L
AU - Poussaint, Tina
AU - Kann, Benjamin H
PY - 2024/7
Y1 - 2024/7
N2 - Purpose: To develop, externally test, and evaluate clinical acceptability of a deep learning pediatric brain tumor segmentation model using stepwise transfer learning. Materials and Methods: In this retrospective study, the authors leveraged two T2-weighted MRI datasets (May 2001 through December 2015) from a national brain tumor consortium (n = 184; median age, 7 years [range, 1–23 years]; 94 male patients) and a pediatric cancer center (n = 100; median age, 8 years [range, 1–19 years]; 47 male patients) to develop and evaluate deep learning neural networks for pediatric low-grade glioma segmentation using a stepwise transfer learning approach to maximize performance in a limited data scenario. The best model was externally tested on an independent test set and subjected to randomized blinded evaluation by three clinicians, wherein they assessed clinical acceptability of expert- and artificial intelligence (AI)–generated segmentations via 10-point Likert scales and Turing tests. Results: The best AI model used in-domain stepwise transfer learning (median Dice score coefficient, 0.88 [IQR, 0.72–0.91] vs 0.812 [IQR, 0.56–0.89] for baseline model; P = .049). With external testing, the AI model yielded excellent accuracy using reference standards from three clinical experts (median Dice similarity coefficients: expert 1, 0.83 [IQR, 0.75–0.90]; expert 2, 0.81 [IQR, 0.70–0.89]; expert 3, 0.81 [IQR, 0.68–0.88]; mean accuracy, 0.82). For clinical benchmarking (n = 100 scans), experts rated AI-based segmentations higher on average compared with other experts (median Likert score, 9 [IQR, 7–9] vs 7 [IQR, 7–9]) and rated more AI segmentations as clinically acceptable (80.2% vs 65.4%). Experts correctly predicted the origin of AI segmentations in an average of 26.0% of cases. Conclusion: Stepwise transfer learning enabled expert-level automated pediatric brain tumor autosegmentation and volumetric measurement with a high level of clinical acceptability.
AB - Purpose: To develop, externally test, and evaluate clinical acceptability of a deep learning pediatric brain tumor segmentation model using stepwise transfer learning. Materials and Methods: In this retrospective study, the authors leveraged two T2-weighted MRI datasets (May 2001 through December 2015) from a national brain tumor consortium (n = 184; median age, 7 years [range, 1–23 years]; 94 male patients) and a pediatric cancer center (n = 100; median age, 8 years [range, 1–19 years]; 47 male patients) to develop and evaluate deep learning neural networks for pediatric low-grade glioma segmentation using a stepwise transfer learning approach to maximize performance in a limited data scenario. The best model was externally tested on an independent test set and subjected to randomized blinded evaluation by three clinicians, wherein they assessed clinical acceptability of expert- and artificial intelligence (AI)–generated segmentations via 10-point Likert scales and Turing tests. Results: The best AI model used in-domain stepwise transfer learning (median Dice score coefficient, 0.88 [IQR, 0.72–0.91] vs 0.812 [IQR, 0.56–0.89] for baseline model; P = .049). With external testing, the AI model yielded excellent accuracy using reference standards from three clinical experts (median Dice similarity coefficients: expert 1, 0.83 [IQR, 0.75–0.90]; expert 2, 0.81 [IQR, 0.70–0.89]; expert 3, 0.81 [IQR, 0.68–0.88]; mean accuracy, 0.82). For clinical benchmarking (n = 100 scans), experts rated AI-based segmentations higher on average compared with other experts (median Likert score, 9 [IQR, 7–9] vs 7 [IQR, 7–9]) and rated more AI segmentations as clinically acceptable (80.2% vs 65.4%). Experts correctly predicted the origin of AI segmentations in an average of 26.0% of cases. Conclusion: Stepwise transfer learning enabled expert-level automated pediatric brain tumor autosegmentation and volumetric measurement with a high level of clinical acceptability.
U2 - 10.1148/ryai.230254
DO - 10.1148/ryai.230254
M3 - Article
SN - 2638-6100
VL - 6
JO - Radiology: Artificial Intelligence
JF - Radiology: Artificial Intelligence
IS - 4
M1 - e230254
ER -