Distributed learning: Developing a predictive model based on data from multiple hospitals without data leaving the hospital – A real life proof of concept

Arthur Jochems*, Timo M. Deist, Johan van Soest, Michael Eble, Paul Bulens, Philippe Coucke, Wim Dries, Philippe Lambin, Andre Dekker

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

PURPOSE
One of the major hurdles in enabling personalized medicine is obtaining sufficient patient data to feed into predictive models. Combining data originating from multiple hospitals is difficult because of ethical, legal, political, and administrative barriers associated with data sharing. To avoid these issues, a distributed learning approach can be used. Distributed learning is defined as learning from data without the data leaving the hospital.

PATIENTS AND METHODS
Clinical data from 287 lung cancer patients, treated with curative intent with chemoradiation (CRT) or radiotherapy (RT) alone, were collected from and stored in 5 different medical institutes (123 patients at MAASTRO (Netherlands, Dutch), 24 at Jessa (Belgium, Dutch), 34 at Liege (Belgium, Dutch and French), 48 at Aachen (Germany, German) and 58 at Eindhoven (Netherlands, Dutch)). A Bayesian network model is adapted for distributed learning (watch the animation: http://youtu.be/nQpqMIuHyOk). The model predicts dyspnea, which is a common side effect after radiotherapy treatment of lung cancer.

RESULTS
We show that it is possible to use the distributed learning approach to train a Bayesian network model on patient data originating from multiple hospitals without these data leaving the individual hospital. The AUC of the model is 0.61 (95% CI, 0.51–0.70) on a 5-fold cross-validation and ranges from 0.59 to 0.71 on external validation sets.

CONCLUSION
Distributed learning can allow the learning of predictive models on data originating from multiple hospitals while avoiding many of the data sharing barriers. Furthermore, the distributed learning approach can be used to extract and employ knowledge from routine patient data from multiple hospitals while remaining compliant with the various national and European privacy laws.
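
The idea of learning without data leaving the hospital can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not specify. It assumes a single conditional probability table in a Bayesian network (dyspnea given treatment) and assumes that each site shares only aggregate counts with a central coordinator. All function names, the toy records, and the smoothing parameter `alpha` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# One record is (parent_state, child_state), here (treatment, dyspnea).
Record = Tuple[str, str]


def local_counts(records: List[Record]) -> Counter:
    """Runs inside a hospital: returns aggregate counts only, never patient rows."""
    return Counter(records)


def aggregate(site_counts: List[Counter]) -> Counter:
    """Runs at the coordinator: sums the count tables received from all sites."""
    total: Counter = Counter()
    for counts in site_counts:
        total.update(counts)
    return total


def conditional_table(
    counts: Counter, child_states: List[str], alpha: float = 1.0
) -> Dict[str, Dict[str, float]]:
    """Estimates P(child | parent) from pooled counts with additive smoothing."""
    parents = sorted({parent for parent, _ in counts})
    table: Dict[str, Dict[str, float]] = {}
    for parent in parents:
        denom = sum(counts[(parent, child)] + alpha for child in child_states)
        table[parent] = {
            child: (counts[(parent, child)] + alpha) / denom
            for child in child_states
        }
    return table


# Hypothetical toy data held at two sites (not taken from the study).
site_a = [("CRT", "yes"), ("CRT", "no"), ("RT", "no")]
site_b = [("CRT", "yes"), ("RT", "no"), ("RT", "yes")]

# Each site computes its counts locally; only the counts are transmitted.
pooled = aggregate([local_counts(site_a), local_counts(site_b)])
cpt = conditional_table(pooled, child_states=["yes", "no"])
```

Because the counts are additive, the table estimated from the summed site counts equals the one that would be estimated from the pooled records, which is why no patient-level data needs to be exchanged in this simplified setting.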
Original language: English
Pages (from-to): 459-467
Number of pages: 9
Journal: Radiotherapy and Oncology
Volume: 121
Issue number: 3
Publication status: Published - Dec 2016

Keywords

  • Bayesian networks
  • Distributed learning
  • Privacy preserving data-mining
  • Dyspnea
  • Machine learning
  • LUNG-CANCER
  • BAYESIAN NETWORK
  • RADIOTHERAPY RESEARCH
  • EXTERNAL VALIDATION
  • CLINICAL-DATA
  • HEALTH-CARE
  • TOXICITY
  • ONCOLOGY
