Reinvestigating the performance of artificial intelligence classification algorithms on COVID-19 X-Ray and CT images

Rui Cao; Yanan Liu; Xin Wen; Caiqing Liao; Xin Wang; Yuan Gao; Tao Tan

doi:10.1016/j.isci.2024.109712

Rui Cao, Yanan Liu, Xin Wen, Caiqing Liao, Xin Wang, Yuan Gao, Tao Tan^*

^*Corresponding author for this work

GROW - Innovative Cancer Diagnostics & Therapy

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

There are concerns that artificial intelligence (AI) algorithms may create underdiagnosis bias by mislabeling patient individuals with certain attributes (e.g., female and young) as healthy. Addressing this bias is crucial given the urgent need for AI diagnostics facing rapidly spreading infectious diseases like COVID-19. We find the prevalent AI diagnostic models show an underdiagnosis rate among specific patient populations, and the underdiagnosis rate is higher in some intersectional specific patient populations (for example, females aged 20–40 years). Additionally, we find training AI models on heterogeneous datasets (positive and negative samples from different datasets) may lead to poor model generalization. The model's classification performance varies significantly across test sets, with the accuracy of the better performance being over 40% higher than that of the poor performance. In conclusion, we developed an AI bias analysis pipeline to help researchers recognize and address biases that impact medical equality and ethics.
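As a rough illustration of the metric the abstract refers to, the sketch below computes an underdiagnosis rate per intersectional subgroup. It assumes the rate is the fraction of truly diseased patients that a model labels as healthy, which matches the abstract's description but is not confirmed as the authors' exact formula. The function name, the record keys (`sex`, `age_group`, `label`, `pred`), and the toy data are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def underdiagnosis_rates(records):
    """Return {(sex, age_group): rate} computed over truly diseased patients.

    Assumption: underdiagnosis rate = missed positives / all positives
    within each subgroup. Record keys are hypothetical:
    'sex', 'age_group', 'label' (1 = diseased), 'pred' (1 = predicted diseased).
    """
    positives = defaultdict(int)  # truly diseased patients per subgroup
    missed = defaultdict(int)     # of those, how many were predicted healthy
    for r in records:
        if r["label"] != 1:
            continue  # healthy patients cannot be underdiagnosed
        key = (r["sex"], r["age_group"])
        positives[key] += 1
        if r["pred"] == 0:
            missed[key] += 1
    return {key: missed[key] / positives[key] for key in positives}


# Toy records invented purely for illustration (not results from the paper).
records = [
    {"sex": "F", "age_group": "20-40", "label": 1, "pred": 0},
    {"sex": "F", "age_group": "20-40", "label": 1, "pred": 1},
    {"sex": "M", "age_group": "20-40", "label": 1, "pred": 1},
    {"sex": "M", "age_group": "20-40", "label": 1, "pred": 1},
    {"sex": "M", "age_group": "20-40", "label": 0, "pred": 0},
]
rates = underdiagnosis_rates(records)
for subgroup, rate in sorted(rates.items()):
    print(subgroup, rate)
```

Grouping on the (sex, age group) pair rather than on each attribute separately is what makes the comparison intersectional: a gap can appear in the combined subgroup even when neither attribute alone shows one.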

Original language	English
Article number	109712
Journal	iScience
Volume	27
Issue number	5
DOIs	https://doi.org/10.1016/j.isci.2024.109712
Publication status	Published - 17 May 2024

Keywords

Artificial intelligence applications
Health informatics
Microbiology

Access to Document

10.1016/j.isci.2024.109712 (Licence: CC BY-NC)

Cite this

@article{a21b630a899f48fc905b863ff6eace38,

title = "Reinvestigating the performance of artificial intelligence classification algorithms on COVID-19 X-Ray and CT images",

abstract = "There are concerns that artificial intelligence (AI) algorithms may create underdiagnosis bias by mislabeling patient individuals with certain attributes (e.g., female and young) as healthy. Addressing this bias is crucial given the urgent need for AI diagnostics facing rapidly spreading infectious diseases like COVID-19. We find the prevalent AI diagnostic models show an underdiagnosis rate among specific patient populations, and the underdiagnosis rate is higher in some intersectional specific patient populations (for example, females aged 20–40 years). Additionally, we find training AI models on heterogeneous datasets (positive and negative samples from different datasets) may lead to poor model generalization. The model's classification performance varies significantly across test sets, with the accuracy of the better performance being over 40% higher than that of the poor performance. In conclusion, we developed an AI bias analysis pipeline to help researchers recognize and address biases that impact medical equality and ethics.",

keywords = "Artificial intelligence applications, Health informatics, Microbiology",

author = "Rui Cao and Yanan Liu and Xin Wen and Caiqing Liao and Xin Wang and Yuan Gao and Tao Tan",

note = "Funding Information: This work was supported by the National Natural Science Foundation of China (62206196), the Natural Science Foundation of Shanxi (202103021223035), Macao Polytechnic University grant (RP/FCA-05/2022), and Science and Technology Development Fund, Macao (0021/2022/AGJ). Publisher Copyright: {\textcopyright} 2024 The Authors",

year = "2024",

month = may,

day = "17",

doi = "10.1016/j.isci.2024.109712",

language = "English",

volume = "27",

journal = "iScience",

issn = "2589-0042",

publisher = "Elsevier Inc.",

number = "5",

}

TY - JOUR

T1 - Reinvestigating the performance of artificial intelligence classification algorithms on COVID-19 X-Ray and CT images

AU - Cao, Rui

AU - Liu, Yanan

AU - Wen, Xin

AU - Liao, Caiqing

AU - Wang, Xin

AU - Gao, Yuan

AU - Tan, Tao

N1 - Funding Information: This work was supported by the National Natural Science Foundation of China (62206196), the Natural Science Foundation of Shanxi (202103021223035), Macao Polytechnic University grant (RP/FCA-05/2022), and Science and Technology Development Fund, Macao (0021/2022/AGJ). Publisher Copyright: © 2024 The Authors

PY - 2024/5/17

Y1 - 2024/5/17

N2 - There are concerns that artificial intelligence (AI) algorithms may create underdiagnosis bias by mislabeling patient individuals with certain attributes (e.g., female and young) as healthy. Addressing this bias is crucial given the urgent need for AI diagnostics facing rapidly spreading infectious diseases like COVID-19. We find the prevalent AI diagnostic models show an underdiagnosis rate among specific patient populations, and the underdiagnosis rate is higher in some intersectional specific patient populations (for example, females aged 20–40 years). Additionally, we find training AI models on heterogeneous datasets (positive and negative samples from different datasets) may lead to poor model generalization. The model's classification performance varies significantly across test sets, with the accuracy of the better performance being over 40% higher than that of the poor performance. In conclusion, we developed an AI bias analysis pipeline to help researchers recognize and address biases that impact medical equality and ethics.

AB - There are concerns that artificial intelligence (AI) algorithms may create underdiagnosis bias by mislabeling patient individuals with certain attributes (e.g., female and young) as healthy. Addressing this bias is crucial given the urgent need for AI diagnostics facing rapidly spreading infectious diseases like COVID-19. We find the prevalent AI diagnostic models show an underdiagnosis rate among specific patient populations, and the underdiagnosis rate is higher in some intersectional specific patient populations (for example, females aged 20–40 years). Additionally, we find training AI models on heterogeneous datasets (positive and negative samples from different datasets) may lead to poor model generalization. The model's classification performance varies significantly across test sets, with the accuracy of the better performance being over 40% higher than that of the poor performance. In conclusion, we developed an AI bias analysis pipeline to help researchers recognize and address biases that impact medical equality and ethics.

KW - Artificial intelligence applications

KW - Health informatics

KW - Microbiology

U2 - 10.1016/j.isci.2024.109712

DO - 10.1016/j.isci.2024.109712

M3 - Article

SN - 2589-0042

VL - 27

JO - iScience

JF - iScience

IS - 5

M1 - 109712

ER -