TY - JOUR
T1 - Appropriate trust in artificial intelligence for the optical diagnosis of colorectal polyps
T2 - The role of human/artificial intelligence interaction
AU - van der Zander, Quirine E W
AU - Roumans, Rachel
AU - Kusters, Carolus H J
AU - Dehghani, Nikoo
AU - Masclee, Ad A M
AU - de With, Peter H N
AU - van der Sommen, Fons
AU - Snijders, Chris C P
AU - Schoon, Erik J
PY - 2024/6/26
Y1 - 2024/6/26
N2 - BACKGROUND AND AIMS: Computer-aided diagnosis (CADx) for the optical diagnosis of colorectal polyps has been thoroughly investigated. However, studies on human-artificial intelligence (AI) interaction are lacking. The aim was to investigate endoscopists' trust in CADx by evaluating whether communicating a calibrated algorithm confidence improved trust. METHODS: Endoscopists optically diagnosed 60 colorectal polyps. Initially, endoscopists diagnosed the polyps without CADx assistance (initial diagnosis). Immediately afterwards, the same polyp was shown again with a CADx prediction: either a prediction alone (benign or pre-malignant) or a prediction accompanied by a calibrated confidence score (0-100). A confidence score of 0 indicated a benign prediction and a score of 100 a (pre-)malignant prediction. For half of the polyps CADx was mandatory; for the other half it was optional. After reviewing the CADx prediction, endoscopists made a final diagnosis. Histopathology was used as the gold standard. Endoscopists' trust in CADx was measured as CADx prediction utilization, i.e., the willingness to follow CADx predictions when the endoscopists initially disagreed with the CADx prediction. RESULTS: Twenty-three endoscopists participated. Presenting CADx predictions increased the endoscopists' diagnostic accuracy (69.3% initial vs 76.6% final diagnosis, p<0.001). The CADx prediction was utilized in 36.5% (n=183/501) of disagreements. Adding a confidence score led to lower CADx prediction utilization, except when the confidence score exceeded 60. A mandatory CADx decreased CADx prediction utilization compared with an optional CADx. Appropriate trust, defined as utilizing correct or disregarding incorrect CADx predictions, was 48.7% (n=244/501). CONCLUSIONS: Appropriate trust was common, and CADx prediction utilization was highest for the optional CADx without confidence scores. These results underline the importance of a better understanding of human-AI interaction.
AB - BACKGROUND AND AIMS: Computer-aided diagnosis (CADx) for the optical diagnosis of colorectal polyps has been thoroughly investigated. However, studies on human-artificial intelligence (AI) interaction are lacking. The aim was to investigate endoscopists' trust in CADx by evaluating whether communicating a calibrated algorithm confidence improved trust. METHODS: Endoscopists optically diagnosed 60 colorectal polyps. Initially, endoscopists diagnosed the polyps without CADx assistance (initial diagnosis). Immediately afterwards, the same polyp was shown again with a CADx prediction: either a prediction alone (benign or pre-malignant) or a prediction accompanied by a calibrated confidence score (0-100). A confidence score of 0 indicated a benign prediction and a score of 100 a (pre-)malignant prediction. For half of the polyps CADx was mandatory; for the other half it was optional. After reviewing the CADx prediction, endoscopists made a final diagnosis. Histopathology was used as the gold standard. Endoscopists' trust in CADx was measured as CADx prediction utilization, i.e., the willingness to follow CADx predictions when the endoscopists initially disagreed with the CADx prediction. RESULTS: Twenty-three endoscopists participated. Presenting CADx predictions increased the endoscopists' diagnostic accuracy (69.3% initial vs 76.6% final diagnosis, p<0.001). The CADx prediction was utilized in 36.5% (n=183/501) of disagreements. Adding a confidence score led to lower CADx prediction utilization, except when the confidence score exceeded 60. A mandatory CADx decreased CADx prediction utilization compared with an optional CADx. Appropriate trust, defined as utilizing correct or disregarding incorrect CADx predictions, was 48.7% (n=244/501). CONCLUSIONS: Appropriate trust was common, and CADx prediction utilization was highest for the optional CADx without confidence scores. These results underline the importance of a better understanding of human-AI interaction.
U2 - 10.1016/j.gie.2024.06.029
DO - 10.1016/j.gie.2024.06.029
M3 - Article
SN - 0016-5107
JO - Gastrointestinal Endoscopy
JF - Gastrointestinal Endoscopy
ER -