Infrastructure and distributed learning methodology for privacy-preserving multi-centric rapid learning health care: euroCAT

Timo Deist; Arthur Jochems; Johan van Soest; Georgi Nalbantov; Cary Oberije; Sean Walsh; Michael Eble; Paul Bulens; Philippe Coucke; Wim Dries; Andre Dekker; Philippe Lambin

doi:10.1016/j.ctro.2016.12.004

Timo Deist^*, Arthur Jochems, Johan van Soest, Georgi Nalbantov, Cary Oberije, Sean Walsh, Michael Eble, Paul Bulens, Philippe Coucke, Wim Dries, Andre Dekker, Philippe Lambin

^*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Machine learning applications for personalized medicine are highly dependent on access to sufficient data. For personalized radiation oncology, datasets representing the variation in the entire cancer patient population need to be acquired and used to learn prediction models. Ethical and legal boundaries to ensure data privacy hamper collaboration between research institutes. We hypothesize that data sharing is possible without identifiable patient data leaving the radiation clinics and that building machine learning applications on distributed datasets is feasible.
We developed and implemented an IT infrastructure in five radiation clinics across three countries (Belgium, Germany, and The Netherlands). We present here a proof-of-principle for future ‘big data’ infrastructures and distributed learning studies. Lung cancer patient data was collected in all five locations and stored in local databases. Exemplary support vector machine (SVM) models were learned using the Alternating Direction Method of Multipliers (ADMM) from the distributed databases to predict post-radiotherapy dyspnea grade ≥ 2. The discriminative performance was assessed by the area under the curve (AUC) in a five-fold cross-validation (learning on four sites and validating on the fifth). The performance of the distributed learning algorithm was compared to centralized learning where datasets of all institutes are jointly analyzed.
The euroCAT infrastructure has been successfully implemented in five radiation clinics across three countries. SVM models can be learned on data distributed over all five clinics. Furthermore, the infrastructure provides a general framework to execute learning algorithms on distributed data. The ongoing expansion of the euroCAT network will facilitate machine learning in radiation oncology. The resulting access to larger datasets with sufficient variation will pave the way for generalizable prediction models and personalized medicine.
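
The site-wise validation scheme mentioned in the abstract (learning on four sites, validating on the fifth, scoring by AUC) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `auc`, `leave_one_site_out`, `Row`, and `Scorer` are hypothetical, and for brevity the training rows are pooled in one place, which corresponds to the centralized comparison. In a distributed setting the `fit` step would be replaced by an algorithm such as ADMM that exchanges only model parameters between sites.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[Sequence[float], int]            # (features, binary label); hypothetical layout
Scorer = Callable[[Sequence[float]], float]  # maps features to a risk score


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def leave_one_site_out(
    sites: Dict[str, List[Row]],
    fit: Callable[[List[Row]], Scorer],
) -> Dict[str, float]:
    """Train on all sites but one, validate on the held-out site, repeat per site."""
    results: Dict[str, float] = {}
    for held_out, test_rows in sites.items():
        # Assumption: rows are pooled here only to keep the sketch short.
        train_rows = [row for name, rows in sites.items()
                      if name != held_out for row in rows]
        score = fit(train_rows)
        results[held_out] = auc([score(x) for x, _ in test_rows],
                                [y for _, y in test_rows])
    return results
```

With five sites the loop yields five AUC values, one per held-out clinic. Any learner can be passed as `fit`, provided it returns a function that maps a feature vector to a score.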

Original language	English
Pages (from-to)	24-31
Number of pages	8
Journal	Clinical and Translational Radiation Oncology
Volume	4
DOIs	https://doi.org/10.1016/j.ctro.2016.12.004
Publication status	Published - Jun 2017

Keywords

Distributed learning
Support vector machine
Decision support systems
Predictive models
Dyspnea

Access to Document

10.1016/j.ctro.2016.12.004
Licence: CC BY-NC-ND

Cite this

Deist, T., Jochems, A., van Soest, J., Nalbantov, G., Oberije, C., Walsh, S., Eble, M., Bulens, P., Coucke, P., Dries, W., Dekker, A., & Lambin, P. (2017). Infrastructure and distributed learning methodology for privacy-preserving multi-centric rapid learning health care: euroCAT. Clinical and Translational Radiation Oncology, 4, 24-31. https://doi.org/10.1016/j.ctro.2016.12.004

@article{d64490f83ffd4396afaabff49a5d30fe,

title = "Infrastructure and distributed learning methodology for privacy-preserving multi-centric rapid learning health care: euroCAT",

abstract = "Machine learning applications for personalized medicine are highly dependent on access to sufficient data. For personalized radiation oncology, datasets representing the variation in the entire cancer patient population need to be acquired and used to learn prediction models. Ethical and legal boundaries to ensure data privacy hamper collaboration between research institutes. We hypothesize that data sharing is possible without identifiable patient data leaving the radiation clinics and that building machine learning applications on distributed datasets is feasible. We developed and implemented an IT infrastructure in five radiation clinics across three countries (Belgium, Germany, and The Netherlands). We present here a proof-of-principle for future {\textquoteleft}big data{\textquoteright} infrastructures and distributed learning studies. Lung cancer patient data was collected in all five locations and stored in local databases. Exemplary support vector machine (SVM) models were learned using the Alternating Direction Method of Multipliers (ADMM) from the distributed databases to predict post-radiotherapy dyspnea grade $\geq$ 2. The discriminative performance was assessed by the area under the curve (AUC) in a five-fold cross-validation (learning on four sites and validating on the fifth). The performance of the distributed learning algorithm was compared to centralized learning where datasets of all institutes are jointly analyzed. The euroCAT infrastructure has been successfully implemented in five radiation clinics across three countries. SVM models can be learned on data distributed over all five clinics. Furthermore, the infrastructure provides a general framework to execute learning algorithms on distributed data. The ongoing expansion of the euroCAT network will facilitate machine learning in radiation oncology. The resulting access to larger datasets with sufficient variation will pave the way for generalizable prediction models and personalized medicine.",

keywords = "Distributed learning, Support vector machine, Decision support systems, Predictive models, Dyspnea",

author = "Timo Deist and Arthur Jochems and {van Soest}, Johan and Georgi Nalbantov and Cary Oberije and Sean Walsh and Michael Eble and Paul Bulens and Philippe Coucke and Wim Dries and Andre Dekker and Philippe Lambin",

year = "2017",

month = jun,

doi = "10.1016/j.ctro.2016.12.004",

language = "English",

volume = "4",

pages = "24--31",

journal = "Clinical and Translational Radiation Oncology",

issn = "2405-6308",

publisher = "Elsevier Ireland Ltd",

}

Deist, T, Jochems, A, van Soest, J, Nalbantov, G, Oberije, C, Walsh, S, Eble, M, Bulens, P, Coucke, P, Dries, W, Dekker, A & Lambin, P 2017, 'Infrastructure and distributed learning methodology for privacy-preserving multi-centric rapid learning health care: euroCAT', Clinical and Translational Radiation Oncology, vol. 4, pp. 24-31. https://doi.org/10.1016/j.ctro.2016.12.004

TY - JOUR

T1 - Infrastructure and distributed learning methodology for privacy-preserving multi-centric rapid learning health care

T2 - euroCAT

AU - Deist, Timo

AU - Jochems, Arthur

AU - van Soest, Johan

AU - Nalbantov, Georgi

AU - Oberije, Cary

AU - Walsh, Sean

AU - Eble, Michael

AU - Bulens, Paul

AU - Coucke, Philippe

AU - Dries, Wim

AU - Dekker, Andre

AU - Lambin, Philippe

PY - 2017/6

Y1 - 2017/6

N2 - Machine learning applications for personalized medicine are highly dependent on access to sufficient data. For personalized radiation oncology, datasets representing the variation in the entire cancer patient population need to be acquired and used to learn prediction models. Ethical and legal boundaries to ensure data privacy hamper collaboration between research institutes. We hypothesize that data sharing is possible without identifiable patient data leaving the radiation clinics and that building machine learning applications on distributed datasets is feasible. We developed and implemented an IT infrastructure in five radiation clinics across three countries (Belgium, Germany, and The Netherlands). We present here a proof-of-principle for future ‘big data’ infrastructures and distributed learning studies. Lung cancer patient data was collected in all five locations and stored in local databases. Exemplary support vector machine (SVM) models were learned using the Alternating Direction Method of Multipliers (ADMM) from the distributed databases to predict post-radiotherapy dyspnea grade ≥ 2. The discriminative performance was assessed by the area under the curve (AUC) in a five-fold cross-validation (learning on four sites and validating on the fifth). The performance of the distributed learning algorithm was compared to centralized learning where datasets of all institutes are jointly analyzed. The euroCAT infrastructure has been successfully implemented in five radiation clinics across three countries. SVM models can be learned on data distributed over all five clinics. Furthermore, the infrastructure provides a general framework to execute learning algorithms on distributed data. The ongoing expansion of the euroCAT network will facilitate machine learning in radiation oncology. The resulting access to larger datasets with sufficient variation will pave the way for generalizable prediction models and personalized medicine.

AB - Machine learning applications for personalized medicine are highly dependent on access to sufficient data. For personalized radiation oncology, datasets representing the variation in the entire cancer patient population need to be acquired and used to learn prediction models. Ethical and legal boundaries to ensure data privacy hamper collaboration between research institutes. We hypothesize that data sharing is possible without identifiable patient data leaving the radiation clinics and that building machine learning applications on distributed datasets is feasible. We developed and implemented an IT infrastructure in five radiation clinics across three countries (Belgium, Germany, and The Netherlands). We present here a proof-of-principle for future ‘big data’ infrastructures and distributed learning studies. Lung cancer patient data was collected in all five locations and stored in local databases. Exemplary support vector machine (SVM) models were learned using the Alternating Direction Method of Multipliers (ADMM) from the distributed databases to predict post-radiotherapy dyspnea grade ≥ 2. The discriminative performance was assessed by the area under the curve (AUC) in a five-fold cross-validation (learning on four sites and validating on the fifth). The performance of the distributed learning algorithm was compared to centralized learning where datasets of all institutes are jointly analyzed. The euroCAT infrastructure has been successfully implemented in five radiation clinics across three countries. SVM models can be learned on data distributed over all five clinics. Furthermore, the infrastructure provides a general framework to execute learning algorithms on distributed data. The ongoing expansion of the euroCAT network will facilitate machine learning in radiation oncology. The resulting access to larger datasets with sufficient variation will pave the way for generalizable prediction models and personalized medicine.

KW - Distributed learning

KW - Support vector machine

KW - Decision support systems

KW - Predictive models

KW - Dyspnea

U2 - 10.1016/j.ctro.2016.12.004

DO - 10.1016/j.ctro.2016.12.004

M3 - Article

C2 - 29594204

SN - 2405-6308

VL - 4

SP - 24

EP - 31

JO - Clinical and Translational Radiation Oncology

JF - Clinical and Translational Radiation Oncology

ER -