Infrastructure and distributed learning methodology for privacy-preserving multi-centric rapid learning health care: euroCAT

Timo Deist*, Arthur Jochems, Johan van Soest, Georgi Nalbantov, Cary Oberije, Sean Walsh, Michael Eble, Paul Bulens, Philippe Coucke, Wim Dries, Andre Dekker, Philippe Lambin

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Machine learning applications for personalized medicine are highly dependent on access to sufficient data. For personalized radiation oncology, datasets representing the variation in the entire cancer patient population need to be acquired and used to learn prediction models. Ethical and legal boundaries to ensure data privacy hamper collaboration between research institutes. We hypothesize that data sharing is possible without identifiable patient data leaving the radiation clinics and that building machine learning applications on distributed datasets is feasible.
We developed and implemented an IT infrastructure in five radiation clinics across three countries (Belgium, Germany, and The Netherlands). We present here a proof-of-principle for future ‘big data’ infrastructures and distributed learning studies. Lung cancer patient data was collected in all five locations and stored in local databases. Exemplary support vector machine (SVM) models were learned using the Alternating Direction Method of Multipliers (ADMM) from the distributed databases to predict post-radiotherapy dyspnea grade ≥ 2. The discriminative performance was assessed by the area under the curve (AUC) in a five-fold cross-validation (learning on four sites and validating on the fifth). The performance of the distributed learning algorithm was compared to centralized learning where datasets of all institutes are jointly analyzed.
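The site-wise validation scheme described above (learn on four sites, validate on the fifth, report AUC) can be sketched as follows. This is a minimal illustration under stated assumptions, not the euroCAT implementation: the learner `fit_mean_difference` is a simple stand-in for the ADMM-trained SVM, the site names and the `toy_sites` data are hypothetical, and training rows are pooled in memory purely for brevity, whereas in the distributed setting only model parameters would leave each clinic.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[Sequence[float], int]  # (feature vector, label in {0, 1})


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_mean_difference(rows: List[Row]) -> List[float]:
    """Stand-in learner (NOT the ADMM SVM): weights = mean(pos) - mean(neg)."""
    dim = len(rows[0][0])
    sums = {0: [0.0] * dim, 1: [0.0] * dim}
    counts = {0: 0, 1: 0}
    for x, y in rows:
        counts[y] += 1
        for j, v in enumerate(x):
            sums[y][j] += v
    return [sums[1][j] / counts[1] - sums[0][j] / counts[0] for j in range(dim)]


def leave_one_site_out(
    sites: Dict[str, List[Row]],
    fit: Callable[[List[Row]], List[float]] = fit_mean_difference,
) -> Dict[str, float]:
    """Train on all sites but one, then report the AUC on the held-out site."""
    results = {}
    for held_out, test_rows in sites.items():
        # Pooling is for illustration only; a distributed learner would
        # exchange model parameters between sites instead of patient rows.
        train_rows = [r for name, rows in sites.items() if name != held_out for r in rows]
        w = fit(train_rows)
        scores = [sum(wj * xj for wj, xj in zip(w, x)) for x, _ in test_rows]
        results[held_out] = auc(scores, [y for _, y in test_rows])
    return results


# Hypothetical toy data: five sites, one feature, two cases per class.
toy_sites = {
    f"site_{k}": [([0.1 * k], 0), ([1.0], 0), ([2.0], 1), ([3.0 - 0.1 * k], 1)]
    for k in range(5)
}

if __name__ == "__main__":
    for site, value in leave_one_site_out(toy_sites).items():
        print(site, value)
```

Passing a different `fit` callable would allow the same loop to evaluate any learner that returns a linear weight vector, which is how a centralized and a distributed model could be compared fold by fold.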
The euroCAT infrastructure has been successfully implemented in five radiation clinics across three countries. SVM models can be learned on data distributed over all five clinics. Furthermore, the infrastructure provides a general framework to execute learning algorithms on distributed data. The ongoing expansion of the euroCAT network will facilitate machine learning in radiation oncology. The resulting access to larger datasets with sufficient variation will pave the way for generalizable prediction models and personalized medicine.
Original language: English
Pages (from-to): 24-31
Number of pages: 8
Journal: Clinical and Translational Radiation Oncology
Volume: 4
Publication status: Published - Jun 2017

Keywords

  • Distributed learning
  • Support vector machine
  • Decision support systems
  • Predictive models
  • Dyspnea
