Distributed learning on 20 000+ lung cancer patients - The Personal Health Train

Timo M. Deist; Frank J. W. M. Dankers; Priyanka Ojha; M. Scott Marshall; Tomas Janssen; Corinne Faivre-Finn; Carlotta Masciocchi; Vincenzo Valentini; Jiazhou Wang; Jiayan Chen; Zhen Zhang; Emiliano Spezi; Mick Button; Joost Jan Nuyttens; Rene Vernhout; Johan van Soest; Arthur Jochems; Rene Monshouwer; Johan Bussink; Gareth Price; Philippe Lambin; Andre Dekker

doi:10.1016/j.radonc.2019.11.019

Corresponding author: Andre Dekker

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Background and purpose: Access to healthcare data is indispensable for scientific progress and innovation. Sharing healthcare data is time-consuming and notoriously difficult due to privacy and regulatory concerns. The Personal Health Train (PHT) provides a privacy-by-design infrastructure connecting FAIR (Findable, Accessible, Interoperable, Reusable) data sources and allows distributed data analysis and machine learning. Patient data never leaves a healthcare institute.

Materials and methods: Lung cancer patient-specific databases (tumor staging and post-treatment survival information) of oncology departments were translated according to a FAIR data model and stored locally in a graph database. Software was installed locally to enable deployment of distributed machine learning algorithms via a central server. Algorithms (MATLAB, code and documentation publicly available) are patient privacy-preserving as only summary statistics and regression coefficients are exchanged with the central server. A logistic regression model to predict post-treatment two-year survival was trained and evaluated by receiver operating characteristic curves (ROC), root mean square prediction error (RMSE) and calibration plots.

Results: In 4 months, we connected databases with 23 203 patient cases across 8 healthcare institutes in 5 countries (Amsterdam, Cardiff, Maastricht, Manchester, Nijmegen, Rome, Rotterdam, Shanghai) using the PHT. Summary statistics were computed across databases. A distributed logistic regression model predicting post-treatment two-year survival was trained on 14 810 patients treated between 1978 and 2011 and validated on 8 393 patients treated between 2012 and 2015.

Conclusion: The PHT infrastructure demonstrably overcomes patient privacy barriers to healthcare data sharing and enables fast data analyses across multiple institutes from different countries with different regulatory regimens. This infrastructure promotes global evidence-based medicine while prioritizing patient privacy. (C) 2019 The Authors. Published by Elsevier B.V.
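
The exchange pattern described under Materials and methods, in which institutes return only aggregate values and the central server returns regression coefficients, can be illustrated with a small sketch. This is not the authors' MATLAB implementation: it assumes plain gradient descent as the optimizer, uses synthetic data with one made-up feature, and all function names (`local_gradient`, `train`, `make_site`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def local_gradient(rows, beta):
    """Runs inside one institute; returns only aggregates, never patient rows."""
    grad = [0.0] * len(beta)
    for features, outcome in rows:
        x = [1.0] + list(features)  # prepend the intercept term
        p = sigmoid(sum(b * xi for b, xi in zip(beta, x)))
        for j, xi in enumerate(x):
            grad[j] += (p - outcome) * xi
    return grad, len(rows)


def train(sites, n_features, rounds=500, step=0.5):
    """Central server: sends coefficients out, sums the returned gradients."""
    beta = [0.0] * (n_features + 1)
    for _ in range(rounds):
        replies = [local_gradient(rows, beta) for rows in sites]
        total = sum(n for _, n in replies)
        for j in range(len(beta)):
            beta[j] -= step * sum(g[j] for g, _ in replies) / total
    return beta


def make_site(n, seed):
    """Synthetic stand-in for one institute's local database (assumption)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1 if rng.random() < sigmoid(0.5 - 1.0 * x) else 0
        rows.append(((x,), y))
    return rows


sites = [make_site(300, seed) for seed in (1, 2, 3)]
beta_distributed = train(sites, n_features=1)

# Reference fit on pooled data, possible here only because the data is synthetic.
pooled = [row for rows in sites for row in rows]
beta_pooled = train([pooled], n_features=1)
```

Because the log-likelihood gradient is a sum over patients, adding up the per-site gradients gives the same update as a fit on pooled data, so `beta_distributed` and `beta_pooled` agree up to floating-point rounding.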

Original language	English
Pages (from-to)	189-200
Number of pages	12
Journal	Radiotherapy and Oncology
Volume	144
DOIs	https://doi.org/10.1016/j.radonc.2019.11.019
Publication status	Published - Mar 2020

Keywords

Lung cancer
Big data
Distributed learning
Federated learning
Machine learning
Survival analysis
Prediction modeling
FAIR data
CARE

Access to Document

10.1016/j.radonc.2019.11.019 (Licence: CC BY-NC-ND)

Cite this

Deist, T. M., Dankers, F. J. W. M., Ojha, P., Marshall, M. S., Janssen, T., Faivre-Finn, C., Masciocchi, C., Valentini, V., Wang, J., Chen, J., Zhang, Z., Spezi, E., Button, M., Nuyttens, J. J., Vernhout, R., van Soest, J., Jochems, A., Monshouwer, R., Bussink, J., ... Dekker, A. (2020). Distributed learning on 20 000+ lung cancer patients - The Personal Health Train. Radiotherapy and Oncology, 144, 189-200. https://doi.org/10.1016/j.radonc.2019.11.019

@article{9682251d4a7b4f88b554d4cd937e7dee,

title = "Distributed learning on 20 000+ lung cancer patients - The Personal Health Train",

abstract = "Background and purpose: Access to healthcare data is indispensable for scientific progress and innovation. Sharing healthcare data is time-consuming and notoriously difficult due to privacy and regulatory concerns. The Personal Health Train (PHT) provides a privacy-by-design infrastructure connecting FAIR (Findable, Accessible, Interoperable, Reusable) data sources and allows distributed data analysis and machine learning. Patient data never leaves a healthcare institute. Materials and methods: Lung cancer patient-specific databases (tumor staging and post-treatment survival information) of oncology departments were translated according to a FAIR data model and stored locally in a graph database. Software was installed locally to enable deployment of distributed machine learning algorithms via a central server. Algorithms (MATLAB, code and documentation publicly available) are patient privacy-preserving as only summary statistics and regression coefficients are exchanged with the central server. A logistic regression model to predict post-treatment two-year survival was trained and evaluated by receiver operating characteristic curves (ROC), root mean square prediction error (RMSE) and calibration plots. Results: In 4 months, we connected databases with 23 203 patient cases across 8 healthcare institutes in 5 countries (Amsterdam, Cardiff, Maastricht, Manchester, Nijmegen, Rome, Rotterdam, Shanghai) using the PHT. Summary statistics were computed across databases. A distributed logistic regression model predicting post-treatment two-year survival was trained on 14 810 patients treated between 1978 and 2011 and validated on 8 393 patients treated between 2012 and 2015. Conclusion: The PHT infrastructure demonstrably overcomes patient privacy barriers to healthcare data sharing and enables fast data analyses across multiple institutes from different countries with different regulatory regimens. This infrastructure promotes global evidence-based medicine while prioritizing patient privacy. (C) 2019 The Authors. Published by Elsevier B.V.",

keywords = "Lung cancer, Big data, Distributed learning, Federated learning, Machine learning, Survival analysis, Prediction modeling, FAIR data, CARE",

author = "Deist, {Timo M.} and Dankers, {Frank J. W. M.} and Priyanka Ojha and Marshall, {M. Scott} and Tomas Janssen and Corinne Faivre-Finn and Carlotta Masciocchi and Vincenzo Valentini and Jiazhou Wang and Jiayan Chen and Zhen Zhang and Emiliano Spezi and Mick Button and Nuyttens, {Joost Jan} and Rene Vernhout and {van Soest}, Johan and Arthur Jochems and Rene Monshouwer and Johan Bussink and Gareth Price and Philippe Lambin and Andre Dekker",

note = "Funding Information: We would like to thank Wolfgang Wiessler (Varian Medical Systems) for his advice and technical support. Sophie Stovold is acknowledged for her work in developing the Velindre (Cardiff) database. Mieke Basten and Thierry Felkers are acknowledged for their work in developing the Nijmegen database. Els Berenschot-Huijbregts and Andras Zolnay are acknowledged for their work in developing the Rotterdam database. Robbert Hardenberg and Tony van de Velde are acknowledged for their work in developing the Amsterdam database. We would like to thank the following colleagues of the MDTB: - Giovanna Mantini, [6,7] Department Radiation Oncology. - A. Martino, [7] Department Radiation Oncology. - L. Boldrini, [6,7] Department Radiation Oncology. - A. Damiani, [7] Department Radiation Oncology. - S. Margaritora, [6,7] Department of Surgery. - M.T. Cogedo, [7] Department of Surgery. - F. Lococo, [7] Department of Surgery. - A. Farchione [7] Department Radiology. - G. Rindi, [6,7] Department of Pathology. We wish to acknowledge technical and financial support from the following organizations: Varian Medical Systems (VLP, SAGE); Netherlands Organisation for Scientific Research (grant n° 10696 DuCAT, BIONIC, VWData, grant n° P14-19 Radiomics STRaTegy); Province of Limburg (LIME); Dutch Cancer Society (TraIT2HealthRI, PROTRAIT); Health-RI; Netherlands Federation of University Medical Centres (Data4LifeSciences). This research is also supported by ERC advanced grant (ERC-ADG-2015, n° 694812), EUROSTARS (DART, DECIDE), the European Program H2020-2015-17 ImmunoSABR - n° 733008, PREDICT - ITN - n° 766276, TRANSCAN Joint Transnational Call 2016 (JTC2016 “CLEARLY”- n° UM 2017-8295), Interreg V-A Euregio Meuse-Rhine (“Euradiomics”) and Kankeronderzoekfonds Limburg from the Health Foundation Limburg; Cardiff University Data Innovation Research Institute Seedcorn Fund grant n° 23020-AC23024072/16; Velindre NHS Trust Charitable Funds grant n° 2017/12. Gareth Price and Corinne Faivre-Finn acknowledge the support of Cancer Research UK via funding to the Cancer Research Manchester Centre [C147/A18083] and [C147/A25254]. Corinne Faivre-Finn is supported by the NIHR Manchester Biomedical Research Centre. Publisher Copyright: {\textcopyright} 2019 The Authors",

year = "2020",

month = mar,

doi = "10.1016/j.radonc.2019.11.019",

language = "English",

volume = "144",

pages = "189--200",

journal = "Radiotherapy and Oncology",

issn = "0167-8140",

publisher = "Elsevier Ireland Ltd",

}

Deist, TM, Dankers, FJWM, Ojha, P, Marshall, MS, Janssen, T, Faivre-Finn, C, Masciocchi, C, Valentini, V, Wang, J, Chen, J, Zhang, Z, Spezi, E, Button, M, Nuyttens, JJ, Vernhout, R, van Soest, J, Jochems, A, Monshouwer, R, Bussink, J, Price, G, Lambin, P & Dekker, A 2020, 'Distributed learning on 20 000+ lung cancer patients - The Personal Health Train', Radiotherapy and Oncology, vol. 144, pp. 189-200. https://doi.org/10.1016/j.radonc.2019.11.019

TY - JOUR

T1 - Distributed learning on 20 000+ lung cancer patients - The Personal Health Train

AU - Deist, Timo M.

AU - Dankers, Frank J. W. M.

AU - Ojha, Priyanka

AU - Marshall, M. Scott

AU - Janssen, Tomas

AU - Faivre-Finn, Corinne

AU - Masciocchi, Carlotta

AU - Valentini, Vincenzo

AU - Wang, Jiazhou

AU - Chen, Jiayan

AU - Zhang, Zhen

AU - Spezi, Emiliano

AU - Button, Mick

AU - Nuyttens, Joost Jan

AU - Vernhout, Rene

AU - van Soest, Johan

AU - Jochems, Arthur

AU - Monshouwer, Rene

AU - Bussink, Johan

AU - Price, Gareth

AU - Lambin, Philippe

AU - Dekker, Andre

N1 - Funding Information: We would like to thank Wolfgang Wiessler (Varian Medical Systems) for his advice and technical support. Sophie Stovold is acknowledged for her work in developing the Velindre (Cardiff) database. Mieke Basten and Thierry Felkers are acknowledged for their work in developing the Nijmegen database. Els Berenschot-Huijbregts and Andras Zolnay are acknowledged for their work in developing the Rotterdam database. Robbert Hardenberg and Tony van de Velde are acknowledged for their work in developing the Amsterdam database. We would like to thank the following colleagues of the MDTB: - Giovanna Mantini, [6,7] Department Radiation Oncology. - A. Martino, [7] Department Radiation Oncology. - L. Boldrini, [6,7] Department Radiation Oncology. - A. Damiani, [7] Department Radiation Oncology. - S. Margaritora, [6,7] Department of Surgery. - M.T. Cogedo, [7] Department of Surgery. - F. Lococo, [7] Department of Surgery. - A. Farchione [7] Department Radiology. - G. Rindi, [6,7] Department of Pathology. We wish to acknowledge technical and financial support from the following organizations: Varian Medical Systems (VLP, SAGE); Netherlands Organisation for Scientific Research (grant n° 10696 DuCAT, BIONIC, VWData, grant n° P14-19 Radiomics STRaTegy); Province of Limburg (LIME); Dutch Cancer Society (TraIT2HealthRI, PROTRAIT); Health-RI; Netherlands Federation of University Medical Centres (Data4LifeSciences). This research is also supported by ERC advanced grant (ERC-ADG-2015, n° 694812), EUROSTARS (DART, DECIDE), the European Program H2020-2015-17 ImmunoSABR - n° 733008, PREDICT - ITN - n° 766276, TRANSCAN Joint Transnational Call 2016 (JTC2016 “CLEARLY”- n° UM 2017-8295), Interreg V-A Euregio Meuse-Rhine (“Euradiomics”) and Kankeronderzoekfonds Limburg from the Health Foundation Limburg; Cardiff University Data Innovation Research Institute Seedcorn Fund grant n° 23020-AC23024072/16; Velindre NHS Trust Charitable Funds grant n° 2017/12. Gareth Price and Corinne Faivre-Finn acknowledge the support of Cancer Research UK via funding to the Cancer Research Manchester Centre [C147/A18083] and [C147/A25254]. Corinne Faivre-Finn is supported by the NIHR Manchester Biomedical Research Centre. Publisher Copyright: © 2019 The Authors

PY - 2020/3

Y1 - 2020/3

N2 - Background and purpose: Access to healthcare data is indispensable for scientific progress and innovation. Sharing healthcare data is time-consuming and notoriously difficult due to privacy and regulatory concerns. The Personal Health Train (PHT) provides a privacy-by-design infrastructure connecting FAIR (Findable, Accessible, Interoperable, Reusable) data sources and allows distributed data analysis and machine learning. Patient data never leaves a healthcare institute. Materials and methods: Lung cancer patient-specific databases (tumor staging and post-treatment survival information) of oncology departments were translated according to a FAIR data model and stored locally in a graph database. Software was installed locally to enable deployment of distributed machine learning algorithms via a central server. Algorithms (MATLAB, code and documentation publicly available) are patient privacy-preserving as only summary statistics and regression coefficients are exchanged with the central server. A logistic regression model to predict post-treatment two-year survival was trained and evaluated by receiver operating characteristic curves (ROC), root mean square prediction error (RMSE) and calibration plots. Results: In 4 months, we connected databases with 23 203 patient cases across 8 healthcare institutes in 5 countries (Amsterdam, Cardiff, Maastricht, Manchester, Nijmegen, Rome, Rotterdam, Shanghai) using the PHT. Summary statistics were computed across databases. A distributed logistic regression model predicting post-treatment two-year survival was trained on 14 810 patients treated between 1978 and 2011 and validated on 8 393 patients treated between 2012 and 2015. Conclusion: The PHT infrastructure demonstrably overcomes patient privacy barriers to healthcare data sharing and enables fast data analyses across multiple institutes from different countries with different regulatory regimens. This infrastructure promotes global evidence-based medicine while prioritizing patient privacy. (C) 2019 The Authors. Published by Elsevier B.V.

KW - Lung cancer

KW - Big data

KW - Distributed learning

KW - Federated learning

KW - Machine learning

KW - Survival analysis

KW - Prediction modeling

KW - FAIR data

KW - CARE

U2 - 10.1016/j.radonc.2019.11.019

DO - 10.1016/j.radonc.2019.11.019

M3 - Article

C2 - 31911366

SN - 0167-8140

VL - 144

SP - 189

EP - 200

JO - Radiotherapy and Oncology

JF - Radiotherapy and Oncology

ER -