Diagnosis of Idiopathic Pulmonary Fibrosis in High-Resolution Computed Tomography Scans Using a Combination of Handcrafted Radiomics and Deep Learning

Turkey Refaee*, Zohaib Salahuddin, Anne-Noelle Frix, Chenggong Yan, Guangyao Wu, Henry C Woodruff, Hester Gietema, Paul Meunier, Renaud Louis, Julien Guiot, Philippe Lambin

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Purpose: To develop handcrafted radiomics (HCR) and deep learning (DL) based automated diagnostic tools that can differentiate between idiopathic pulmonary fibrosis (IPF) and non-IPF interstitial lung diseases (ILDs) in patients using high-resolution computed tomography (HRCT) scans.

Material and Methods: In this retrospective study, 474 HRCT scans were included (mean age, 64.10 years ± 9.57 [SD]). Five-fold cross-validation was performed on 365 HRCT scans. Furthermore, an external dataset comprising 109 patients was used as a test set. An HCR model, a DL model, and an ensemble of the HCR and DL models were developed. A virtual in-silico trial was conducted with two radiologists and one pulmonologist on the same external test set for performance comparison. Performance was compared using the DeLong method and the McNemar test. Shapley Additive exPlanations (SHAP) plots and Grad-CAM heatmaps were used for the post-hoc interpretability of the HCR and DL models, respectively.

Results: In five-fold cross-validation, the HCR model, the DL model, and the ensemble of HCR and DL models achieved accuracies of 76.2 ± 6.8%, 77.9 ± 4.6%, and 85.2 ± 2.7%, respectively. For differentiating IPF from non-IPF ILDs on the external test set, the HCR, DL, and ensemble models achieved accuracies of 76.1%, 77.9%, and 85.3%, respectively. The ensemble model outperformed the clinicians, who achieved a mean accuracy of 66.3 ± 6.7% (p < 0.05) during the in-silico trial. The area under the receiver operating characteristic curve (AUC) for the ensemble model on the test set was 0.917, which was significantly higher than that of the HCR model (0.817, p = 0.02) and the DL model (0.823, p = 0.005). The HCR and DL models agreed on 61.4% of predictions, and when both models agreed, the accuracy and specificity were 93% and 97%, respectively. SHAP analysis identified texture features as the most important features for IPF diagnosis, and Grad-CAM showed that the DL model focused on the clinically relevant part of the image.

Conclusion: Deep learning and HCR models can complement each other and serve as useful clinical aids for the diagnosis of IPF and non-IPF ILDs.
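
The agreement figures reported in the Results (the share of cases where the two models give the same prediction, and the accuracy and specificity restricted to those cases) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the label encoding (1 = IPF, 0 = non-IPF ILD), the function names, the toy data, and in particular the averaging rule in `soft_vote`, since the abstract does not state how the HCR and DL outputs were combined.

```python
from typing import Dict, List, Sequence


def agreement_summary(
    y_true: Sequence[int], pred_hcr: Sequence[int], pred_dl: Sequence[int]
) -> Dict[str, float]:
    """Agreement rate between two classifiers, plus accuracy and
    specificity computed only on the cases where they agree.

    Hypothetical encoding: 1 = IPF, 0 = non-IPF ILD.
    """
    if not (len(y_true) == len(pred_hcr) == len(pred_dl)) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    # Keep (truth, shared prediction) for cases where both models agree.
    agreed = [(t, a) for t, a, b in zip(y_true, pred_hcr, pred_dl) if a == b]
    agreement = len(agreed) / len(y_true)

    nan = float("nan")
    correct = sum(1 for t, p in agreed if t == p)
    accuracy = correct / len(agreed) if agreed else nan

    # Specificity: true negatives / all true non-IPF cases (agreed subset).
    negatives = [p for t, p in agreed if t == 0]
    specificity = negatives.count(0) / len(negatives) if negatives else nan

    return {
        "agreement": agreement,
        "accuracy_when_agree": accuracy,
        "specificity_when_agree": specificity,
    }


def soft_vote(
    prob_hcr: Sequence[float], prob_dl: Sequence[float], threshold: float = 0.5
) -> List[int]:
    """Assumed ensembling rule (not taken from the paper): average the two
    predicted IPF probabilities and threshold the mean."""
    return [int((a + b) / 2 >= threshold) for a, b in zip(prob_hcr, prob_dl)]


if __name__ == "__main__":
    # Toy labels and predictions, invented purely for illustration.
    y_true = [1, 0, 1, 0, 1, 0]
    pred_hcr = [1, 0, 0, 0, 1, 1]
    pred_dl = [1, 0, 1, 0, 0, 0]
    print(agreement_summary(y_true, pred_hcr, pred_dl))
    print(soft_vote([0.9, 0.2, 0.4], [0.7, 0.4, 0.7]))
```

Reporting accuracy separately on the agreed subset reflects the idea in the Conclusion that the two model types complement each other: cases where they disagree could be flagged for closer clinical review.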

Original language: English
Article number: 915243
Number of pages: 10
Journal: Frontiers in Medicine
Volume: 9
Publication status: Published - 23 Jun 2022

Keywords

  • IMAGES
  • SURVIVAL
  • artificial intelligence (AI)
  • computed tomography
  • idiopathic pulmonary fibrosis
  • interpretability
  • interstitial lung disease
  • radiomics
