Challenges and caveats of a multi-center retrospective radiomics study: an example of early treatment response assessment for NSCLC patients using FDG-PET/CT radiomics

Janna E. van Timmeren*, Sara Carvalho, Ralph T. H. Leijenaar, Esther G. C. Troost, Wouter van Elmpt, Dirk de Ruysscher, Jean-Pierre Muratet, Fabrice Denis, Tanj A. Schimek-Jasch, Ursula Nestle, Arthur Jochems, Henry C. Woodruff, Cary Oberije, Philippe Lambin

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Background

Prognostic models based on individual patient characteristics can improve treatment decisions and outcome in the future. In many (radiomic) studies, the small size and heterogeneity of datasets are challenges that often limit the performance and potential clinical applicability of these models. The current study is an example of a retrospective multi-centric study with challenges and caveats. To highlight common issues and emphasize potential pitfalls, we aimed for an extensive analysis of these multi-center pre-treatment datasets, with an additional 18F-fluorodeoxyglucose (FDG) positron emission tomography/computed tomography (PET/CT) scan acquired during treatment.

Methods

The dataset consisted of 138 stage II-IV non-small cell lung cancer (NSCLC) patients from four different cohorts acquired from three different institutes. The differences between the cohorts were compared in terms of clinical characteristics and using the so-called 'cohort differences model' approach. Moreover, the potential prognostic performances for overall survival of radiomic features extracted from CT or FDG-PET, or relative or absolute differences between the scans at the two time points, were assessed using the LASSO regression method. Furthermore, the performances of five different classifiers were evaluated for all image sets.
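As an illustration of the "relative or absolute differences between the scans at the two time points" mentioned above, the following is a minimal sketch only. The abstract does not give the formulas used, so it is assumed here that the absolute difference is the during-treatment value minus the pre-treatment value, and the relative difference is that change divided by the pre-treatment value. The function name and the feature names are hypothetical.

```python
from typing import Dict


def delta_features(
    pre: Dict[str, float], during: Dict[str, float]
) -> Dict[str, Dict[str, float]]:
    """Per-feature change between a pre-treatment and a during-treatment scan.

    Assumed definitions (not taken from the study):
      absolute = during - pre
      relative = (during - pre) / pre
    """
    result = {}
    # Only features present at both time points can be compared.
    for name in pre.keys() & during.keys():
        absolute = during[name] - pre[name]
        # The relative change is undefined for a zero baseline.
        relative = absolute / pre[name] if pre[name] != 0 else float("nan")
        result[name] = {"absolute": absolute, "relative": relative}
    return result


# Hypothetical feature values for one patient.
pre_scan = {"suv_max": 10.0, "volume": 40.0}
during_scan = {"suv_max": 7.5, "volume": 30.0}
deltas = delta_features(pre_scan, during_scan)
```

Each of the resulting feature sets (pre-treatment values, absolute differences, relative differences) could then be passed separately to a feature-selection or classification step.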

Results

The individual cohorts differed substantially in terms of patient characteristics. Moreover, the cohort differences model indicated statistically significant differences between the cohorts. Neither LASSO nor any of the tested classifiers resulted in a clinically relevant prognostic model that could be validated on the available datasets.

Conclusion

The results imply that the study might have been influenced by a limited sample size, heterogeneous patient characteristics, and inconsistent imaging parameters. No prognostic performance of FDG-PET or CT based radiomics models can be reported. This study highlights the necessity of extensive evaluations of cohorts and of validation datasets, especially in retrospective multi-centric datasets.

Original language: English
Article number: 0217536
Number of pages: 17
Journal: PLOS ONE
Volume: 14
Issue number: 6
Publication status: Published - 3 Jun 2019

Keywords

  • CELL LUNG-CANCER
  • PROGNOSTIC VALUE
  • FEATURES
  • RECONSTRUCTION
  • SUV
  • DISCRETIZATION
  • VARIABILITY
  • PREDICTION
  • SURVIVAL
  • IMPACT
