TY - JOUR
T1 - Diagnosis of Idiopathic Pulmonary Fibrosis in High-Resolution Computed Tomography Scans Using a Combination of Handcrafted Radiomics and Deep Learning
AU - Refaee, Turkey
AU - Salahuddin, Zohaib
AU - Frix, Anne-Noelle
AU - Yan, Chenggong
AU - Wu, Guangyao
AU - Woodruff, Henry C
AU - Gietema, Hester
AU - Meunier, Paul
AU - Louis, Renaud
AU - Guiot, Julien
AU - Lambin, Philippe
N1 - Copyright © 2022 Refaee, Salahuddin, Frix, Yan, Wu, Woodruff, Gietema, Meunier, Louis, Guiot and Lambin.
PY - 2022/6/23
Y1 - 2022/6/23
N2 - Purpose: To develop handcrafted radiomics (HCR) and deep learning (DL) based automated diagnostic tools that can differentiate between idiopathic pulmonary fibrosis (IPF) and non-IPF interstitial lung diseases (ILDs) in patients using high-resolution computed tomography (HRCT) scans. Material and Methods: In this retrospective study, 474 HRCT scans were included (mean age, 64.10 years ± 9.57 [SD]). Five-fold cross-validation was performed on 365 HRCT scans. Furthermore, an external dataset comprising 109 patients was used as a test set. An HCR model, a DL model, and an ensemble of the HCR and DL models were developed. A virtual in-silico trial was conducted with two radiologists and one pulmonologist on the same external test set for performance comparison. Performance was compared using the DeLong method and the McNemar test. Shapley Additive exPlanations (SHAP) plots and Grad-CAM heatmaps were used for the post-hoc interpretability of the HCR and DL models, respectively. Results: In five-fold cross-validation, the HCR model, the DL model, and the ensemble of the HCR and DL models achieved accuracies of 76.2 ± 6.8, 77.9 ± 4.6, and 85.2 ± 2.7%, respectively. For the diagnosis of IPF and non-IPF ILDs on the external test set, the HCR, DL, and ensemble models achieved accuracies of 76.1, 77.9, and 85.3%, respectively. The ensemble model outperformed the clinicians, who achieved a mean accuracy of 66.3 ± 6.7% (p < 0.05) during the in-silico trial. The area under the receiver operating characteristic curve (AUC) for the ensemble model on the test set was 0.917, which was significantly higher than that of the HCR model (0.817, p = 0.02) and the DL model (0.823, p = 0.005). The agreement between the HCR and DL models was 61.4%, and the accuracy and specificity of the predictions when both models agreed were 93 and 97%, respectively. SHAP analysis identified texture features as the most important features for IPF diagnosis, and Grad-CAM showed that the model focused on the clinically relevant part of the image. Conclusion: Deep learning and HCR models can complement each other and serve as useful clinical aids for the diagnosis of IPF and non-IPF ILDs.
AB - Purpose: To develop handcrafted radiomics (HCR) and deep learning (DL) based automated diagnostic tools that can differentiate between idiopathic pulmonary fibrosis (IPF) and non-IPF interstitial lung diseases (ILDs) in patients using high-resolution computed tomography (HRCT) scans. Material and Methods: In this retrospective study, 474 HRCT scans were included (mean age, 64.10 years ± 9.57 [SD]). Five-fold cross-validation was performed on 365 HRCT scans. Furthermore, an external dataset comprising 109 patients was used as a test set. An HCR model, a DL model, and an ensemble of the HCR and DL models were developed. A virtual in-silico trial was conducted with two radiologists and one pulmonologist on the same external test set for performance comparison. Performance was compared using the DeLong method and the McNemar test. Shapley Additive exPlanations (SHAP) plots and Grad-CAM heatmaps were used for the post-hoc interpretability of the HCR and DL models, respectively. Results: In five-fold cross-validation, the HCR model, the DL model, and the ensemble of the HCR and DL models achieved accuracies of 76.2 ± 6.8, 77.9 ± 4.6, and 85.2 ± 2.7%, respectively. For the diagnosis of IPF and non-IPF ILDs on the external test set, the HCR, DL, and ensemble models achieved accuracies of 76.1, 77.9, and 85.3%, respectively. The ensemble model outperformed the clinicians, who achieved a mean accuracy of 66.3 ± 6.7% (p < 0.05) during the in-silico trial. The area under the receiver operating characteristic curve (AUC) for the ensemble model on the test set was 0.917, which was significantly higher than that of the HCR model (0.817, p = 0.02) and the DL model (0.823, p = 0.005). The agreement between the HCR and DL models was 61.4%, and the accuracy and specificity of the predictions when both models agreed were 93 and 97%, respectively. SHAP analysis identified texture features as the most important features for IPF diagnosis, and Grad-CAM showed that the model focused on the clinically relevant part of the image. Conclusion: Deep learning and HCR models can complement each other and serve as useful clinical aids for the diagnosis of IPF and non-IPF ILDs.
KW - IMAGES
KW - SURVIVAL
KW - artificial intelligence (AI)
KW - computed tomography
KW - idiopathic pulmonary fibrosis
KW - interpretability
KW - interstitial lung disease
KW - radiomics
U2 - 10.3389/fmed.2022.915243
DO - 10.3389/fmed.2022.915243
M3 - Article
C2 - 35814761
SN - 2296-858X
VL - 9
JO - Frontiers in Medicine
JF - Frontiers in Medicine
M1 - 915243
ER -