Deep learning diagnostic performance and visual insights in differentiating benign and malignant thyroid nodules on ultrasound images

Yujiang Liu; Ying Feng; Linxue Qian; Zhixiang Wang; Xiangdong Hu

doi:10.1177/15353702231220664

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

This study aims to construct and evaluate a deep learning model, utilizing ultrasound images, to accurately differentiate benign and malignant thyroid nodules. The objective includes visualizing the model's process for interpretability and comparing its diagnostic precision with a cohort of 80 radiologists. We employed ResNet as the classification backbone for thyroid nodule prediction. The model was trained using 2096 ultrasound images of 655 distinct thyroid nodules. For performance evaluation, an independent test set comprising 100 cases of thyroid nodules was curated. In addition, to demonstrate the superiority of the artificial intelligence (AI) model over radiologists, a Turing test was conducted with 80 radiologists of varying clinical experience. This was meant to assess which group of radiologists' conclusions were in closer alignment with AI predictions. Furthermore, to highlight the interpretability of the AI model, gradient-weighted class activation mapping (Grad-CAM) was employed to visualize the model's areas of focus during its prediction process. In this cohort, AI diagnostics demonstrated a sensitivity of 81.67%, a specificity of 60%, and an overall diagnostic accuracy of 73%. In comparison, the panel of radiologists on average exhibited a diagnostic accuracy of 62.9%. The AI's diagnostic process was significantly faster than that of the radiologists. The generated heat-maps highlighted the model's focus on areas characterized by calcification, solid echo and higher echo intensity, suggesting these areas might be indicative of malignant thyroid nodules. Our study supports the notion that deep learning can be a valuable diagnostic tool with comparable accuracy to experienced senior radiologists in the diagnosis of malignant thyroid nodules. The interpretability of the AI model's process suggests that it could be clinically meaningful. Further studies are necessary to improve diagnostic accuracy and support auxiliary diagnoses in primary care settings.
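
As a rough check on the figures reported in the abstract, the sketch below recomputes sensitivity, specificity, and accuracy from a confusion matrix. The counts used (49 true positives, 11 false negatives, 24 true negatives, 16 false positives) are an assumption: they are inferred from the reported percentages and the 100-case test set, and this record does not state them. The function name `diagnostic_metrics` is likewise hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) for a binary classifier.

    tp/fn count malignant cases called malignant/benign;
    tn/fp count benign cases called benign/malignant.
    """
    sensitivity = tp / (tp + fn)            # share of malignant cases detected
    specificity = tn / (tn + fp)            # share of benign cases cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# ASSUMED counts, inferred from the reported percentages; not given in the record.
tp, fn, tn, fp = 49, 11, 24, 16

sens, spec, acc = diagnostic_metrics(tp, fn, tn, fp)
print(f"sensitivity={sens:.2%} specificity={spec:.2%} accuracy={acc:.2%}")
```

Under these assumed counts the test set would hold 60 malignant and 40 benign nodules, and the three values agree with the reported 81.67%, 60%, and 73%.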

Original language	English
Pages (from-to)	2538-2546
Number of pages	9
Journal	Experimental biology and medicine (Maywood, N.J.)
Volume	248
Issue number	24
DOIs	https://doi.org/10.1177/15353702231220664
Publication status	Published - 26 Jan 2024

Keywords

AI interpretability
Grad-CAM
ResNet
Thyroid nodules
deep learning
diagnostic accuracy
ultrasound images

Cite this

@article{6a7dddf656984046992f3c14f146483a,

title = "Deep learning diagnostic performance and visual insights in differentiating benign and malignant thyroid nodules on ultrasound images",

abstract = "This study aims to construct and evaluate a deep learning model, utilizing ultrasound images, to accurately differentiate benign and malignant thyroid nodules. The objective includes visualizing the model's process for interpretability and comparing its diagnostic precision with a cohort of 80 radiologists. We employed ResNet as the classification backbone for thyroid nodule prediction. The model was trained using 2096 ultrasound images of 655 distinct thyroid nodules. For performance evaluation, an independent test set comprising 100 cases of thyroid nodules was curated. In addition, to demonstrate the superiority of the artificial intelligence (AI) model over radiologists, a Turing test was conducted with 80 radiologists of varying clinical experience. This was meant to assess which group of radiologists' conclusions were in closer alignment with AI predictions. Furthermore, to highlight the interpretability of the AI model, gradient-weighted class activation mapping (Grad-CAM) was employed to visualize the model's areas of focus during its prediction process. In this cohort, AI diagnostics demonstrated a sensitivity of 81.67%, a specificity of 60%, and an overall diagnostic accuracy of 73%. In comparison, the panel of radiologists on average exhibited a diagnostic accuracy of 62.9%. The AI's diagnostic process was significantly faster than that of the radiologists. The generated heat-maps highlighted the model's focus on areas characterized by calcification, solid echo and higher echo intensity, suggesting these areas might be indicative of malignant thyroid nodules. Our study supports the notion that deep learning can be a valuable diagnostic tool with comparable accuracy to experienced senior radiologists in the diagnosis of malignant thyroid nodules. The interpretability of the AI model's process suggests that it could be clinically meaningful. 
Further studies are necessary to improve diagnostic accuracy and support auxiliary diagnoses in primary care settings.",

keywords = "AI interpretability, Grad-CAM, ResNet, Thyroid nodules, deep learning, diagnostic accuracy, ultrasound images",

author = "Yujiang Liu and Ying Feng and Linxue Qian and Zhixiang Wang and Xiangdong Hu",

year = "2024",

month = jan,

day = "26",

doi = "10.1177/15353702231220664",

language = "English",

volume = "248",

pages = "2538--2546",

journal = "Experimental biology and medicine (Maywood, N.J.)",

issn = "1535-3699",

publisher = "Frontiers",

number = "24",

}

TY - JOUR

T1 - Deep learning diagnostic performance and visual insights in differentiating benign and malignant thyroid nodules on ultrasound images

AU - Liu, Yujiang

AU - Feng, Ying

AU - Qian, Linxue

AU - Wang, Zhixiang

AU - Hu, Xiangdong

PY - 2024/1/26

Y1 - 2024/1/26

N2 - This study aims to construct and evaluate a deep learning model, utilizing ultrasound images, to accurately differentiate benign and malignant thyroid nodules. The objective includes visualizing the model's process for interpretability and comparing its diagnostic precision with a cohort of 80 radiologists. We employed ResNet as the classification backbone for thyroid nodule prediction. The model was trained using 2096 ultrasound images of 655 distinct thyroid nodules. For performance evaluation, an independent test set comprising 100 cases of thyroid nodules was curated. In addition, to demonstrate the superiority of the artificial intelligence (AI) model over radiologists, a Turing test was conducted with 80 radiologists of varying clinical experience. This was meant to assess which group of radiologists' conclusions were in closer alignment with AI predictions. Furthermore, to highlight the interpretability of the AI model, gradient-weighted class activation mapping (Grad-CAM) was employed to visualize the model's areas of focus during its prediction process. In this cohort, AI diagnostics demonstrated a sensitivity of 81.67%, a specificity of 60%, and an overall diagnostic accuracy of 73%. In comparison, the panel of radiologists on average exhibited a diagnostic accuracy of 62.9%. The AI's diagnostic process was significantly faster than that of the radiologists. The generated heat-maps highlighted the model's focus on areas characterized by calcification, solid echo and higher echo intensity, suggesting these areas might be indicative of malignant thyroid nodules. Our study supports the notion that deep learning can be a valuable diagnostic tool with comparable accuracy to experienced senior radiologists in the diagnosis of malignant thyroid nodules. The interpretability of the AI model's process suggests that it could be clinically meaningful. Further studies are necessary to improve diagnostic accuracy and support auxiliary diagnoses in primary care settings.

AB - This study aims to construct and evaluate a deep learning model, utilizing ultrasound images, to accurately differentiate benign and malignant thyroid nodules. The objective includes visualizing the model's process for interpretability and comparing its diagnostic precision with a cohort of 80 radiologists. We employed ResNet as the classification backbone for thyroid nodule prediction. The model was trained using 2096 ultrasound images of 655 distinct thyroid nodules. For performance evaluation, an independent test set comprising 100 cases of thyroid nodules was curated. In addition, to demonstrate the superiority of the artificial intelligence (AI) model over radiologists, a Turing test was conducted with 80 radiologists of varying clinical experience. This was meant to assess which group of radiologists' conclusions were in closer alignment with AI predictions. Furthermore, to highlight the interpretability of the AI model, gradient-weighted class activation mapping (Grad-CAM) was employed to visualize the model's areas of focus during its prediction process. In this cohort, AI diagnostics demonstrated a sensitivity of 81.67%, a specificity of 60%, and an overall diagnostic accuracy of 73%. In comparison, the panel of radiologists on average exhibited a diagnostic accuracy of 62.9%. The AI's diagnostic process was significantly faster than that of the radiologists. The generated heat-maps highlighted the model's focus on areas characterized by calcification, solid echo and higher echo intensity, suggesting these areas might be indicative of malignant thyroid nodules. Our study supports the notion that deep learning can be a valuable diagnostic tool with comparable accuracy to experienced senior radiologists in the diagnosis of malignant thyroid nodules. The interpretability of the AI model's process suggests that it could be clinically meaningful. Further studies are necessary to improve diagnostic accuracy and support auxiliary diagnoses in primary care settings.

KW - AI interpretability

KW - Grad-CAM

KW - ResNet

KW - Thyroid nodules

KW - deep learning

KW - diagnostic accuracy

KW - ultrasound images

U2 - 10.1177/15353702231220664

DO - 10.1177/15353702231220664

M3 - Article

SN - 1535-3699

VL - 248

SP - 2538

EP - 2546

JO - Experimental biology and medicine (Maywood, N.J.)

JF - Experimental biology and medicine (Maywood, N.J.)

IS - 24

ER -