Appropriate trust in artificial intelligence for the optical diagnosis of colorectal polyps: The role of human/artificial intelligence interaction

Quirine E W van der Zander*, Rachel Roumans, Carolus H J Kusters, Nikoo Dehghani, Ad A M Masclee, Peter H N de With, Fons van der Sommen, Chris C P Snijders, Erik J Schoon

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

BACKGROUND AND AIMS: Computer-aided diagnosis (CADx) for the optical diagnosis of colorectal polyps has been thoroughly investigated. However, studies on human-artificial intelligence (AI) interaction are lacking. The aim was to investigate endoscopists' trust in CADx by evaluating whether communicating a calibrated algorithm confidence improved trust.

METHODS: Endoscopists optically diagnosed 60 colorectal polyps. They first diagnosed each polyp without CADx assistance (initial diagnosis). Immediately afterwards, the same polyp was shown again with a CADx prediction: either a prediction alone (benign or pre-malignant) or a prediction accompanied by a calibrated confidence score (0-100). A confidence score of 0 indicated a benign prediction, 100 a (pre-)malignant prediction. For half of the polyps CADx was mandatory; for the other half it was optional. After reviewing the CADx prediction, endoscopists made a final diagnosis. Histopathology was used as the gold standard. Endoscopists' trust in CADx was measured as CADx prediction utilization: the willingness to follow CADx predictions when the endoscopists initially disagreed with the CADx prediction.

RESULTS: Twenty-three endoscopists participated. Presenting CADx predictions increased the endoscopists' diagnostic accuracy (69.3% initial vs 76.6% final diagnosis, p<0.001). The CADx prediction was utilized in 36.5% (n=183/501) of disagreements. Adding a confidence score led to lower CADx prediction utilization, except when the confidence score surpassed 60. Mandatory CADx decreased CADx prediction utilization compared with optional CADx. Appropriate trust, defined as utilizing correct or disregarding incorrect CADx predictions, was 48.7% (n=244/501).

CONCLUSIONS: Appropriate trust was common, and CADx prediction utilization was highest for the optional CADx without confidence scores. These results underline the importance of a better understanding of human-AI interaction.
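
The two trust measures defined in the abstract can be illustrated with a minimal Python sketch. It assumes binary labels and per-case records; the `Case` class, its field names, and the `trust_metrics` function are hypothetical and not taken from the study, and the toy cases are invented for illustration only. Only the counts 183/501 and 244/501 come from the abstract.

```python
from dataclasses import dataclass


@dataclass
class Case:
    # Hypothetical record layout; field names are assumptions, not the study's.
    initial: str  # endoscopist's diagnosis before seeing CADx
    cadx: str     # CADx prediction
    final: str    # endoscopist's diagnosis after seeing CADx
    truth: str    # histopathology (gold standard)


def trust_metrics(cases):
    """Return (n_disagreements, utilization, appropriate_trust).

    Both rates are computed only over cases where the initial diagnosis
    disagreed with the CADx prediction, as described in the abstract.
    """
    disagreements = [c for c in cases if c.initial != c.cadx]
    n = len(disagreements)
    if n == 0:
        return 0, None, None
    # Utilization: the final diagnosis follows the CADx prediction.
    utilized = sum(c.final == c.cadx for c in disagreements)
    # Appropriate trust: followed a correct prediction or
    # disregarded an incorrect one.
    appropriate = sum(
        (c.final == c.cadx) == (c.cadx == c.truth) for c in disagreements
    )
    return n, utilized / n, appropriate / n


# Invented toy cases, for illustration only.
toy = [
    Case("benign", "premalignant", "premalignant", "premalignant"),
    Case("benign", "premalignant", "benign", "benign"),
    Case("premalignant", "benign", "benign", "premalignant"),
    Case("benign", "benign", "benign", "benign"),  # no disagreement
]
n, utilization, appropriate = trust_metrics(toy)

# Percentages recomputed from the counts reported in the abstract.
reported_utilization = round(100 * 183 / 501, 1)
reported_appropriate = round(100 * 244 / 501, 1)
```

Restricting both rates to the disagreement cases mirrors the shared denominator of 501 reported for utilization and appropriate trust.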
Original language: English
Journal: Gastrointestinal Endoscopy
Publication status: E-pub ahead of print - 26 Jun 2024
