TY - JOUR
T1 - Deep learning with uncertainty estimation for automatic tumor segmentation in PET/CT of head and neck cancers
T2 - Impact of model complexity, image processing and augmentation
AU - Huynh, Bao Ngoc
AU - Groendahl, Aurora Rosvoll
AU - Tomic, Oliver
AU - Liland, Kristian Hovde
AU - Knudtsen, Ingerid Skjei
AU - Hoebers, Frank
AU - van Elmpt, Wouter J C
AU - Dale, Einar
AU - Malinen, Eirik
AU - Futsaether, Cecilia Marie
PY - 2024/9/1
Y1 - 2024/9/1
N2 - Objective. Target volumes for radiotherapy are usually contoured manually, which can be time-consuming and prone to inter- and intra-observer variability. Automatic contouring by convolutional neural networks (CNN) can be fast and consistent but may produce unrealistic contours or miss relevant structures. We evaluate approaches for increasing the quality and assessing the uncertainty of CNN-generated contours of head and neck cancers with PET/CT as input. Approach. Two patient cohorts with head and neck squamous cell carcinoma and baseline
18F-fluorodeoxyglucose positron emission tomography and computed tomography images (FDG-PET/CT) were collected retrospectively from two centers. The union of manual contours of the gross primary tumor and involved nodes was used to train CNN models for generating automatic contours. The impact of image preprocessing, image augmentation, transfer learning and CNN complexity, architecture, and dimension (2D or 3D) on model performance and generalizability across centers was evaluated. A Monte Carlo dropout technique was used to quantify and visualize the uncertainty of the automatic contours. Main results. CNN models provided contours with good overlap with the manually contoured ground truth (median Dice Similarity Coefficient: 0.75-0.77), consistent with reported inter-observer variations and previous auto-contouring studies. Image augmentation and model dimension, rather than model complexity, architecture, or advanced image preprocessing, had the largest impact on model performance and cross-center generalizability. Transfer learning on a limited number of patients from a separate center increased model generalizability without decreasing model performance on the original training cohort. High model uncertainty was associated with false positive and false negative voxels as well as low Dice coefficients. Significance. High quality automatic contours can be obtained using deep learning architectures that are not overly complex. Uncertainty estimation of the predicted contours shows potential for highlighting regions of the contour requiring manual revision or flagging segmentations requiring manual inspection and intervention.
AB - Objective. Target volumes for radiotherapy are usually contoured manually, which can be time-consuming and prone to inter- and intra-observer variability. Automatic contouring by convolutional neural networks (CNN) can be fast and consistent but may produce unrealistic contours or miss relevant structures. We evaluate approaches for increasing the quality and assessing the uncertainty of CNN-generated contours of head and neck cancers with PET/CT as input. Approach. Two patient cohorts with head and neck squamous cell carcinoma and baseline
18F-fluorodeoxyglucose positron emission tomography and computed tomography images (FDG-PET/CT) were collected retrospectively from two centers. The union of manual contours of the gross primary tumor and involved nodes was used to train CNN models for generating automatic contours. The impact of image preprocessing, image augmentation, transfer learning and CNN complexity, architecture, and dimension (2D or 3D) on model performance and generalizability across centers was evaluated. A Monte Carlo dropout technique was used to quantify and visualize the uncertainty of the automatic contours. Main results. CNN models provided contours with good overlap with the manually contoured ground truth (median Dice Similarity Coefficient: 0.75-0.77), consistent with reported inter-observer variations and previous auto-contouring studies. Image augmentation and model dimension, rather than model complexity, architecture, or advanced image preprocessing, had the largest impact on model performance and cross-center generalizability. Transfer learning on a limited number of patients from a separate center increased model generalizability without decreasing model performance on the original training cohort. High model uncertainty was associated with false positive and false negative voxels as well as low Dice coefficients. Significance. High quality automatic contours can be obtained using deep learning architectures that are not overly complex. Uncertainty estimation of the predicted contours shows potential for highlighting regions of the contour requiring manual revision or flagging segmentations requiring manual inspection and intervention.
KW - GTV auto-segmentation
KW - contour uncertainty estimation
KW - deep learning
KW - head and neck cancers
U2 - 10.1088/2057-1976/ad6dcd
DO - 10.1088/2057-1976/ad6dcd
M3 - Article
SN - 2057-1976
VL - 10
JO - Biomedical Physics & Engineering Express
JF - Biomedical Physics & Engineering Express
IS - 5
M1 - 055038
ER -