Challenges and caveats of a multi-center retrospective radiomics study: an example of early treatment response assessment for NSCLC patients using FDG-PET/CT radiomics

Janna E. van Timmeren; Sara Carvalho; Ralph T. H. Leijenaar; Esther G. C. Troost; Wouter van Elmpt; Dirk de Ruysscher; Jean-Pierre Muratet; Fabrice Denis; Tanj A. Schimek-Jasch; Ursula Nestle; Arthur Jochems; Henry C. Woodruff; Cary Oberije; Philippe Lambin

doi:10.1371/journal.pone.0217536

Corresponding author: Janna E. van Timmeren

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Background

Prognostic models based on individual patient characteristics can improve treatment decisions and outcome in the future. In many (radiomic) studies, the small size and heterogeneity of datasets are challenges that often limit the performance and potential clinical applicability of these models. The current study is an example of a retrospective multi-centric study with challenges and caveats. To highlight common issues and emphasize potential pitfalls, we aimed for an extensive analysis of these multi-center pre-treatment datasets, with an additional 18F-fluorodeoxyglucose (FDG) positron emission tomography/computed tomography (PET/CT) scan acquired during treatment.

Methods

The dataset consisted of 138 stage II-IV non-small cell lung cancer (NSCLC) patients from four different cohorts acquired from three different institutes. The differences between the cohorts were compared in terms of clinical characteristics and using the so-called 'cohort differences model' approach. Moreover, the potential prognostic performances for overall survival of radiomic features extracted from CT or FDG-PET, or relative or absolute differences between the scans at the two time points, were assessed using the LASSO regression method. Furthermore, the performances of five different classifiers were evaluated for all image sets.
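
As a rough illustration of the "relative or absolute differences between the scans at the two time points", the sketch below computes both kinds of difference for a single patient. It is not the authors' implementation. The sign convention (during-treatment minus pre-treatment), the normalisation by the pre-treatment value, and the feature names `suv_max` and `volume` are all assumptions made for this example.

```python
from typing import Dict


def delta_features(
    pre: Dict[str, float], during: Dict[str, float]
) -> Dict[str, Dict[str, float]]:
    """Absolute and relative change per feature between two time points.

    Assumed convention (not taken from the paper):
      absolute = during - pre
      relative = (during - pre) / pre
    Features missing from either scan are skipped.
    """
    result: Dict[str, Dict[str, float]] = {}
    for name in sorted(pre.keys() & during.keys()):
        absolute = during[name] - pre[name]
        # A zero baseline makes the relative change undefined; flag it as NaN.
        relative = absolute / pre[name] if pre[name] != 0 else float("nan")
        result[name] = {"absolute": absolute, "relative": relative}
    return result


# Hypothetical feature values for one patient (illustrative only).
pre_scan = {"suv_max": 10.0, "volume": 50.0}
during_scan = {"suv_max": 8.0, "volume": 40.0}
deltas = delta_features(pre_scan, during_scan)
```

In a setup like this, the resulting values could be stacked per patient into a feature matrix before any model fitting; how the study itself prepared its inputs is not stated in the abstract.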

Results

The individual cohorts substantially differed in terms of patient characteristics. Moreover, the cohort differences model indicated statistically significant differences between the cohorts. Neither LASSO nor any of the tested classifiers resulted in a clinically relevant prognostic model that could be validated on the available datasets.

Conclusion

The results imply that the study might have been influenced by a limited sample size, heterogeneous patient characteristics, and inconsistent imaging parameters. No prognostic performance of FDG-PET- or CT-based radiomics models can be reported. This study highlights the necessity of extensive evaluations of cohorts and of validation datasets, especially in retrospective multi-centric datasets.

Original language	English
Article number	0217536
Number of pages	17
Journal	PLOS ONE
Volume	14
Issue number	6
DOIs	https://doi.org/10.1371/journal.pone.0217536
Publication status	Published - 3 Jun 2019

Keywords

CELL LUNG-CANCER
PROGNOSTIC VALUE
FEATURES
RECONSTRUCTION
SUV
DISCRETIZATION
VARIABILITY
PREDICTION
SURVIVAL
IMPACT

Access to Document

10.1371/journal.pone.0217536 (Licence: CC BY)

Cite this

van Timmeren, J. E., Carvalho, S., Leijenaar, R. T. H., Troost, E. G. C., van Elmpt, W., de Ruysscher, D., Muratet, J.-P., Denis, F., Schimek-Jasch, T. A., Nestle, U., Jochems, A., Woodruff, H. C., Oberije, C., & Lambin, P. (2019). Challenges and caveats of a multi-center retrospective radiomics study: an example of early treatment response assessment for NSCLC patients using FDG-PET/CT radiomics. PLOS ONE, 14(6), Article 0217536. https://doi.org/10.1371/journal.pone.0217536

@article{87f55220b1584c4a8ac841106c22ca27,
  title = "Challenges and caveats of a multi-center retrospective radiomics study: an example of early treatment response assessment for NSCLC patients using FDG-PET/CT radiomics",
  abstract = "Background: Prognostic models based on individual patient characteristics can improve treatment decisions and outcome in the future. In many (radiomic) studies, the small size and heterogeneity of datasets are challenges that often limit the performance and potential clinical applicability of these models. The current study is an example of a retrospective multi-centric study with challenges and caveats. To highlight common issues and emphasize potential pitfalls, we aimed for an extensive analysis of these multi-center pre-treatment datasets, with an additional 18F-fluorodeoxyglucose (FDG) positron emission tomography/computed tomography (PET/CT) scan acquired during treatment. Methods: The dataset consisted of 138 stage II-IV non-small cell lung cancer (NSCLC) patients from four different cohorts acquired from three different institutes. The differences between the cohorts were compared in terms of clinical characteristics and using the so-called 'cohort differences model' approach. Moreover, the potential prognostic performances for overall survival of radiomic features extracted from CT or FDG-PET, or relative or absolute differences between the scans at the two time points, were assessed using the LASSO regression method. Furthermore, the performances of five different classifiers were evaluated for all image sets. Results: The individual cohorts substantially differed in terms of patient characteristics. Moreover, the cohort differences model indicated statistically significant differences between the cohorts. Neither LASSO nor any of the tested classifiers resulted in a clinically relevant prognostic model that could be validated on the available datasets. Conclusion: The results imply that the study might have been influenced by a limited sample size, heterogeneous patient characteristics, and inconsistent imaging parameters. No prognostic performance of FDG-PET- or CT-based radiomics models can be reported. This study highlights the necessity of extensive evaluations of cohorts and of validation datasets, especially in retrospective multi-centric datasets.",
  keywords = "CELL LUNG-CANCER, PROGNOSTIC VALUE, FEATURES, RECONSTRUCTION, SUV, DISCRETIZATION, VARIABILITY, PREDICTION, SURVIVAL, IMPACT",
  author = "{van Timmeren}, {Janna E.} and Sara Carvalho and Leijenaar, {Ralph T. H.} and Troost, {Esther G. C.} and {van Elmpt}, Wouter and {de Ruysscher}, Dirk and Jean-Pierre Muratet and Fabrice Denis and Schimek-Jasch, {Tanj A.} and Ursula Nestle and Arthur Jochems and Woodruff, {Henry C.} and Cary Oberije and Philippe Lambin",
  note = "Funding Information: Authors acknowledge financial support from ERC advanced grant (ERC-ADG-2015, no. 694812 - Hypoximmuno) and the QuIC-ConCePT project, which is partly funded by EFPIA companies and the Innovative Medicine Initiative Joint Undertaking (IMI JU) under Grant Agreement No. 115151. This research is also supported by the Dutch Technology Foundation STW (no. P14-19 Radiomics STRaTegy), which is the applied science division of NWO, and the Technology Programme of the Ministry of Economic Affairs. Authors also acknowledge financial support from SME Phase 2 (RAIL - no. 673780), EUROSTARS (DART, DECIDE), the European Program H2020-2015-17 (ImmunoSABR - no. 733008 and PREDICT - ITN - no. 766276), Interreg V-A Euregio Meuse-Rhine (“Euradiomics”) and Kankeronderzoekfonds Limburg from the Health Foundation Limburg. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Publisher Copyright: {\textcopyright} 2019 van Timmeren et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.",
  year = "2019",
  month = jun,
  day = "3",
  doi = "10.1371/journal.pone.0217536",
  language = "English",
  volume = "14",
  journal = "PLOS ONE",
  issn = "1932-6203",
  publisher = "Public Library of Science",
  number = "6",
}

TY - JOUR

T1 - Challenges and caveats of a multi-center retrospective radiomics study

T2 - an example of early treatment response assessment for NSCLC patients using FDG-PET/CT radiomics

AU - van Timmeren, Janna E.

AU - Carvalho, Sara

AU - Leijenaar, Ralph T. H.

AU - Troost, Esther G. C.

AU - van Elmpt, Wouter

AU - de Ruysscher, Dirk

AU - Muratet, Jean-Pierre

AU - Denis, Fabrice

AU - Schimek-Jasch, Tanj A.

AU - Nestle, Ursula

AU - Jochems, Arthur

AU - Woodruff, Henry C.

AU - Oberije, Cary

AU - Lambin, Philippe

N1 - Funding Information: Authors acknowledge financial support from ERC advanced grant (ERC-ADG-2015, no. 694812 - Hypoximmuno) and the QuIC-ConCePT project, which is partly funded by EFPIA companies and the Innovative Medicine Initiative Joint Undertaking (IMI JU) under Grant Agreement No. 115151. This research is also supported by the Dutch Technology Foundation STW (no. P14-19 Radiomics STRaTegy), which is the applied science division of NWO, and the Technology Programme of the Ministry of Economic Affairs. Authors also acknowledge financial support from SME Phase 2 (RAIL - no. 673780), EUROSTARS (DART, DECIDE), the European Program H2020-2015-17 (ImmunoSABR - no. 733008 and PREDICT - ITN - no. 766276), Interreg V-A Euregio Meuse-Rhine (“Euradiomics”) and Kankeronderzoekfonds Limburg from the Health Foundation Limburg. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Publisher Copyright: © 2019 van Timmeren et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

PY - 2019/6/3

Y1 - 2019/6/3

AB - Background: Prognostic models based on individual patient characteristics can improve treatment decisions and outcome in the future. In many (radiomic) studies, the small size and heterogeneity of datasets are challenges that often limit the performance and potential clinical applicability of these models. The current study is an example of a retrospective multi-centric study with challenges and caveats. To highlight common issues and emphasize potential pitfalls, we aimed for an extensive analysis of these multi-center pre-treatment datasets, with an additional 18F-fluorodeoxyglucose (FDG) positron emission tomography/computed tomography (PET/CT) scan acquired during treatment. Methods: The dataset consisted of 138 stage II-IV non-small cell lung cancer (NSCLC) patients from four different cohorts acquired from three different institutes. The differences between the cohorts were compared in terms of clinical characteristics and using the so-called 'cohort differences model' approach. Moreover, the potential prognostic performances for overall survival of radiomic features extracted from CT or FDG-PET, or relative or absolute differences between the scans at the two time points, were assessed using the LASSO regression method. Furthermore, the performances of five different classifiers were evaluated for all image sets. Results: The individual cohorts substantially differed in terms of patient characteristics. Moreover, the cohort differences model indicated statistically significant differences between the cohorts. Neither LASSO nor any of the tested classifiers resulted in a clinically relevant prognostic model that could be validated on the available datasets. Conclusion: The results imply that the study might have been influenced by a limited sample size, heterogeneous patient characteristics, and inconsistent imaging parameters. No prognostic performance of FDG-PET- or CT-based radiomics models can be reported. This study highlights the necessity of extensive evaluations of cohorts and of validation datasets, especially in retrospective multi-centric datasets.

KW - CELL LUNG-CANCER

KW - PROGNOSTIC VALUE

KW - FEATURES

KW - RECONSTRUCTION

KW - SUV

KW - DISCRETIZATION

KW - VARIABILITY

KW - PREDICTION

KW - SURVIVAL

KW - IMPACT

U2 - 10.1371/journal.pone.0217536

DO - 10.1371/journal.pone.0217536

M3 - Article

C2 - 31158263

SN - 1932-6203

VL - 14

JO - PLOS ONE

JF - PLOS ONE

IS - 6

M1 - 0217536

ER -