TY - JOUR
T1 - Deep-learning system to improve the quality and efficiency of volumetric heart segmentation for breast cancer
AU - Zeleznik, Roman
AU - Weiss, Jakob
AU - Taron, Jana
AU - Guthier, Christian
AU - Bitterman, Danielle S.
AU - Hancox, Cindy
AU - Kann, Benjamin H.
AU - Kim, Daniel W.
AU - Punglia, Rinaa S.
AU - Bredfeldt, Jeremy
AU - Foldyna, Borek
AU - Eslami, Parastou
AU - Lu, Michael T.
AU - Hoffmann, Udo
AU - Mak, Raymond
AU - Aerts, Hugo J. W. L.
N1 - Publisher Copyright: © 2021, The Author(s).
PY - 2021/3/5
Y1 - 2021/3/5
N2 - Although artificial intelligence algorithms are often developed and applied for narrow tasks, their implementation in other medical settings could help to improve patient care. Here we assess whether a deep-learning system for volumetric heart segmentation on computed tomography (CT) scans developed in cardiovascular radiology can optimize treatment planning in radiation oncology. The system was trained using multi-center data (n = 858) with manual heart segmentations provided by cardiovascular radiologists. Validation of the system was performed in an independent real-world dataset of 5677 breast cancer patients treated with radiation therapy at the Dana-Farber/Brigham and Women's Cancer Center between 2008 and 2018. In a subset of 20 patients, the performance of the system was compared to eight radiation oncology experts by assessing segmentation time, agreement between experts, and accuracy with and without deep-learning assistance. To compare the performance to segmentations used in the clinic, concordance and failures (defined as Dice < 0.85) of the system were evaluated in the entire dataset. The system was successfully applied without retraining. With deep-learning assistance, segmentation time significantly decreased (4.0 min [IQR 3.1-5.0] vs. 2.0 min [IQR 1.3-3.5]; p < 0.001), and agreement increased (Dice 0.95 [IQR = 0.02] vs. 0.97 [IQR = 0.02]; p < 0.001). Expert accuracy was similar with and without deep-learning assistance (Dice 0.92 [IQR = 0.02] vs. 0.92 [IQR = 0.02]; p = 0.48), and not significantly different from deep-learning-only segmentations (Dice 0.92 [IQR = 0.02]; p >= 0.1). In comparison to real-world data, the system showed high concordance (Dice 0.89 [IQR = 0.06]) across 5677 patients and a significantly lower failure rate (p < 0.001). These results suggest that deep-learning algorithms can successfully be applied across medical specialties and improve clinical care beyond the original field of interest.
AB - Although artificial intelligence algorithms are often developed and applied for narrow tasks, their implementation in other medical settings could help to improve patient care. Here we assess whether a deep-learning system for volumetric heart segmentation on computed tomography (CT) scans developed in cardiovascular radiology can optimize treatment planning in radiation oncology. The system was trained using multi-center data (n = 858) with manual heart segmentations provided by cardiovascular radiologists. Validation of the system was performed in an independent real-world dataset of 5677 breast cancer patients treated with radiation therapy at the Dana-Farber/Brigham and Women's Cancer Center between 2008 and 2018. In a subset of 20 patients, the performance of the system was compared to eight radiation oncology experts by assessing segmentation time, agreement between experts, and accuracy with and without deep-learning assistance. To compare the performance to segmentations used in the clinic, concordance and failures (defined as Dice < 0.85) of the system were evaluated in the entire dataset. The system was successfully applied without retraining. With deep-learning assistance, segmentation time significantly decreased (4.0 min [IQR 3.1-5.0] vs. 2.0 min [IQR 1.3-3.5]; p < 0.001), and agreement increased (Dice 0.95 [IQR = 0.02] vs. 0.97 [IQR = 0.02]; p < 0.001). Expert accuracy was similar with and without deep-learning assistance (Dice 0.92 [IQR = 0.02] vs. 0.92 [IQR = 0.02]; p = 0.48), and not significantly different from deep-learning-only segmentations (Dice 0.92 [IQR = 0.02]; p >= 0.1). In comparison to real-world data, the system showed high concordance (Dice 0.89 [IQR = 0.06]) across 5677 patients and a significantly lower failure rate (p < 0.001). These results suggest that deep-learning algorithms can successfully be applied across medical specialties and improve clinical care beyond the original field of interest.
KW - ARTIFICIAL-INTELLIGENCE
U2 - 10.1038/s41746-021-00416-5
DO - 10.1038/s41746-021-00416-5
M3 - Article
C2 - 33674717
SN - 2398-6352
VL - 4
JO - npj Digital Medicine
JF - npj Digital Medicine
IS - 1
M1 - 43
ER -