Distributed learning: Developing a predictive model based on data from multiple hospitals without data leaving the hospital – A real life proof of concept

Arthur Jochems; Timo M. Deist; Johan van Soest; Michael Eble; Paul Bulens; Philippe Coucke; Wim Dries; Philippe Lambin; Andre Dekker

doi:10.1016/j.radonc.2016.10.002

Arthur Jochems^*, Timo M. Deist, Johan van Soest, Michael Eble, Paul Bulens, Philippe Coucke, Wim Dries, Philippe Lambin, Andre Dekker

^*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

PURPOSE
One of the major hurdles in enabling personalized medicine is obtaining sufficient patient data to feed into predictive models. Combining data originating from multiple hospitals is difficult because of ethical, legal, political, and administrative barriers associated with data sharing. To avoid these issues, a distributed learning approach can be used. Distributed learning is defined as learning from data without the data leaving the hospital.

PATIENTS AND METHODS
Clinical data from 287 lung cancer patients treated with curative intent with chemoradiation (CRT) or radiotherapy (RT) alone were collected and stored at 5 different medical institutes: 123 patients at MAASTRO (Netherlands, Dutch), 24 at Jessa (Belgium, Dutch), 34 at Liege (Belgium, Dutch and French), 48 at Aachen (Germany, German) and 58 at Eindhoven (Netherlands, Dutch). A Bayesian network model is adapted for distributed learning (watch the animation: http://youtu.be/nQpqMIuHyOk). The model predicts dyspnea, which is a common side effect after radiotherapy treatment of lung cancer.

RESULTS
We show that it is possible to use the distributed learning approach to train a Bayesian network model on patient data originating from multiple hospitals without these data leaving the individual hospital. The AUC of the model is 0.61 (95% CI, 0.51–0.70) in 5-fold cross-validation and ranges from 0.59 to 0.71 on external validation sets.

CONCLUSION
Distributed learning can allow the learning of predictive models on data originating from multiple hospitals while avoiding many of the data sharing barriers. Furthermore, the distributed learning approach can be used to extract and employ knowledge from routine patient data from multiple hospitals while complying with the various national and European privacy laws.
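The abstract does not spell out how the Bayesian network is fitted across sites. As a minimal sketch of the general idea, and not the authors' implementation, the Python below assumes that each hospital only reports aggregate counts for one conditional probability table and that a coordinator pools them. The function names, the `parent` variable, the smoothing constant `alpha`, and the toy records are all hypothetical and chosen for illustration only.

```python
from collections import Counter


def local_counts(records):
    """Runs inside one hospital: tally (parent_value, outcome) pairs.

    Only these aggregate counts are returned, never the patient rows.
    """
    return Counter((r["parent"], r["dyspnea"]) for r in records)


def merge_counts(site_counts):
    """Runs at the coordinator: add up the per-site tallies."""
    total = Counter()
    for counts in site_counts:
        total.update(counts)  # Counter.update adds counts together
    return total


def estimate_cpt(counts, parent_values, outcomes, alpha=1.0):
    """Turn pooled counts into P(outcome | parent) with add-alpha smoothing."""
    cpt = {}
    for p in parent_values:
        denom = sum(counts[(p, o)] for o in outcomes) + alpha * len(outcomes)
        cpt[p] = {o: (counts[(p, o)] + alpha) / denom for o in outcomes}
    return cpt


# Made-up toy records standing in for two hospitals (not real patient data).
site_a = [
    {"parent": "chemo", "dyspnea": 1},
    {"parent": "chemo", "dyspnea": 0},
    {"parent": "rt_only", "dyspnea": 0},
]
site_b = [
    {"parent": "chemo", "dyspnea": 1},
    {"parent": "rt_only", "dyspnea": 0},
]

pooled = merge_counts([local_counts(site_a), local_counts(site_b)])
cpt = estimate_cpt(pooled, ["chemo", "rt_only"], [0, 1])
print(cpt)
```

Under these assumptions the pooled counts are identical to the counts that would be obtained if all rows were stored in one place, so the estimated table matches a centrally trained one while only summary statistics cross the hospital boundary.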

Original language	English
Pages (from-to)	459-467
Number of pages	9
Journal	Radiotherapy and Oncology
Volume	121
Issue number	3
DOIs	https://doi.org/10.1016/j.radonc.2016.10.002
Publication status	Published - Dec 2016

Keywords

Bayesian networks
Distributed learning
Privacy preserving data-mining
Dyspnea
Machine learning
LUNG-CANCER
BAYESIAN NETWORK
RADIOTHERAPY RESEARCH
EXTERNAL VALIDATION
CLINICAL-DATA
HEALTH-CARE
TOXICITY
ONCOLOGY

Access to Document

10.1016/j.radonc.2016.10.002 (Licence: CC BY-NC-ND)

http://www.mendeley.com/research/distributed-learning-developing-predictive-model-based-data-multiple-hospitals-without-data-leaving

Cite this

@article{dd0b3f622b634fa7a700932d312befcf,

title = "Distributed learning: Developing a predictive model based on data from multiple hospitals without data leaving the hospital – A real life proof of concept",

abstract = "PURPOSE\nOne of the major hurdles in enabling personalized medicine is obtaining sufficient patient data to feed into predictive models. Combining data originating from multiple hospitals is difficult because of ethical, legal, political, and administrative barriers associated with data sharing. In order to avoid these issues, a distributed learning approach can be used. Distributed learning is defined as learning from data without the data leaving the hospital. \n\nPATIENTS AND METHODS\nClinical data from 287 lung cancer patients, treated with curative intent with chemoradiation (CRT) or radiotherapy (RT) alone were collected from and stored in 5 different medical institutes (123 patients at MAASTRO (Netherlands, Dutch), 24 at Jessa (Belgium, Dutch), 34 at Liege (Belgium, Dutch and French), 48 at Aachen (Germany, German) and 58 at Eindhoven (Netherlands, Dutch)). A Bayesian network model is adapted for distributed learning (watch the animation: http://youtu.be/nQpqMIuHyOk). The model predicts dyspnea, which is a common side effect after radiotherapy treatment of lung cancer. \n\nRESULTS\nWe show that it is possible to use the distributed learning approach to train a Bayesian network model on patient data originating from multiple hospitals without these data leaving the individual hospital. The AUC of the model is 0.61 (95%CI, 0.51–0.70) on a 5-fold cross-validation and ranges from 0.59 to 0.71 on external validation sets. \n\nCONCLUSION\nDistributed learning can allow the learning of predictive models on data originating from multiple hospitals while avoiding many of the data sharing barriers. Furthermore, the distributed learning approach can be used to extract and employ knowledge from routine patient data from multiple hospitals while being compliant to the various national and European privacy laws.",

keywords = "Bayesian networks, Distributed learning, Privacy preserving data-mining, Dyspnea, Machine learning, LUNG-CANCER, BAYESIAN NETWORK, RADIOTHERAPY RESEARCH, EXTERNAL VALIDATION, CLINICAL-DATA, HEALTH-CARE, TOXICITY, ONCOLOGY",

author = "Arthur Jochems and Deist, {Timo M.} and {van Soest}, Johan and Michael Eble and Paul Bulens and Philippe Coucke and Wim Dries and Philippe Lambin and Andre Dekker",

year = "2016",

month = dec,

doi = "10.1016/j.radonc.2016.10.002",

language = "English",

volume = "121",

pages = "459--467",

journal = "Radiotherapy and Oncology",

issn = "0167-8140",

publisher = "Elsevier Ireland Ltd",

number = "3",

}

TY - JOUR

T1 - Distributed learning: Developing a predictive model based on data from multiple hospitals without data leaving the hospital – A real life proof of concept

AU - Jochems, Arthur

AU - Deist, Timo M.

AU - van Soest, Johan

AU - Eble, Michael

AU - Bulens, Paul

AU - Coucke, Philippe

AU - Dries, Wim

AU - Lambin, Philippe

AU - Dekker, Andre

PY - 2016/12

Y1 - 2016/12

N2 - PURPOSE\nOne of the major hurdles in enabling personalized medicine is obtaining sufficient patient data to feed into predictive models. Combining data originating from multiple hospitals is difficult because of ethical, legal, political, and administrative barriers associated with data sharing. In order to avoid these issues, a distributed learning approach can be used. Distributed learning is defined as learning from data without the data leaving the hospital. \n\nPATIENTS AND METHODS\nClinical data from 287 lung cancer patients, treated with curative intent with chemoradiation (CRT) or radiotherapy (RT) alone were collected from and stored in 5 different medical institutes (123 patients at MAASTRO (Netherlands, Dutch), 24 at Jessa (Belgium, Dutch), 34 at Liege (Belgium, Dutch and French), 48 at Aachen (Germany, German) and 58 at Eindhoven (Netherlands, Dutch)). A Bayesian network model is adapted for distributed learning (watch the animation: http://youtu.be/nQpqMIuHyOk). The model predicts dyspnea, which is a common side effect after radiotherapy treatment of lung cancer. \n\nRESULTS\nWe show that it is possible to use the distributed learning approach to train a Bayesian network model on patient data originating from multiple hospitals without these data leaving the individual hospital. The AUC of the model is 0.61 (95%CI, 0.51–0.70) on a 5-fold cross-validation and ranges from 0.59 to 0.71 on external validation sets. \n\nCONCLUSION\nDistributed learning can allow the learning of predictive models on data originating from multiple hospitals while avoiding many of the data sharing barriers. Furthermore, the distributed learning approach can be used to extract and employ knowledge from routine patient data from multiple hospitals while being compliant to the various national and European privacy laws.

AB - PURPOSE\nOne of the major hurdles in enabling personalized medicine is obtaining sufficient patient data to feed into predictive models. Combining data originating from multiple hospitals is difficult because of ethical, legal, political, and administrative barriers associated with data sharing. In order to avoid these issues, a distributed learning approach can be used. Distributed learning is defined as learning from data without the data leaving the hospital. \n\nPATIENTS AND METHODS\nClinical data from 287 lung cancer patients, treated with curative intent with chemoradiation (CRT) or radiotherapy (RT) alone were collected from and stored in 5 different medical institutes (123 patients at MAASTRO (Netherlands, Dutch), 24 at Jessa (Belgium, Dutch), 34 at Liege (Belgium, Dutch and French), 48 at Aachen (Germany, German) and 58 at Eindhoven (Netherlands, Dutch)). A Bayesian network model is adapted for distributed learning (watch the animation: http://youtu.be/nQpqMIuHyOk). The model predicts dyspnea, which is a common side effect after radiotherapy treatment of lung cancer. \n\nRESULTS\nWe show that it is possible to use the distributed learning approach to train a Bayesian network model on patient data originating from multiple hospitals without these data leaving the individual hospital. The AUC of the model is 0.61 (95%CI, 0.51–0.70) on a 5-fold cross-validation and ranges from 0.59 to 0.71 on external validation sets. \n\nCONCLUSION\nDistributed learning can allow the learning of predictive models on data originating from multiple hospitals while avoiding many of the data sharing barriers. Furthermore, the distributed learning approach can be used to extract and employ knowledge from routine patient data from multiple hospitals while being compliant to the various national and European privacy laws.

KW - Bayesian networks

KW - Distributed learning

KW - Privacy preserving data-mining

KW - Dyspnea

KW - Machine learning

KW - LUNG-CANCER

KW - BAYESIAN NETWORK

KW - RADIOTHERAPY RESEARCH

KW - EXTERNAL VALIDATION

KW - CLINICAL-DATA

KW - HEALTH-CARE

KW - TOXICITY

KW - ONCOLOGY

U2 - 10.1016/j.radonc.2016.10.002

DO - 10.1016/j.radonc.2016.10.002

M3 - Article

C2 - 28029405

SN - 0167-8140

VL - 121

SP - 459

EP - 467

JO - Radiotherapy and Oncology

JF - Radiotherapy and Oncology

IS - 3

ER -