Blockchain for Privacy Preserving and Trustworthy Distributed Machine Learning in Multicentric Medical Imaging (C-DistriM)

Fadila Zerka*, Visara Urovi, Akshayaa Vaidyanathan, Samir Barakat, Ralph T. H. Leijenaar, Sean Walsh, Hanif Gabrani-Juma, Benjamin Miraglio, Henry C. Woodruff, Michel Dumontier, Philippe Lambin

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

7 Citations (Web of Science)

Abstract

The utility of Artificial Intelligence (AI) in healthcare strongly depends on the quality of the data used to build models and on the confidence in the predictions they generate. Access to sufficient amounts of high-quality data to build accurate and reliable models remains problematic owing to substantive legal and ethical constraints on making clinically relevant research data available offsite. New technologies such as distributed learning offer a pathway forward, but unfortunately tend to suffer from a lack of transparency, which undermines trust in what data are used for the analysis. To address such issues, we hypothesized that a novel distributed learning approach combining sequential distributed learning with a blockchain-based platform, namely Chained Distributed Machine learning (C-DistriM), would be feasible and would give results similar to those of a standard centralized approach. C-DistriM enables health centers to dynamically participate in training distributed learning models. We demonstrate C-DistriM using the NSCLC-Radiomics open data to predict two-year lung-cancer survival. A comparison of the performance of this distributed solution, evaluated in six different scenarios, with that of the centralized approach showed no statistically significant difference in AUC between the central and distributed models: all DeLong tests yielded p > 0.05. This methodology removes the need to blindly trust the computation in one specific server on a distributed learning network. This fusion of blockchain and distributed learning serves as a proof-of-concept to increase transparency and trust, and ultimately to accelerate the adoption of AI in multicentric studies. We conclude that our blockchain-based model for sequential training on distributed datasets is a feasible approach that provides performance equivalent to the centralized approach.
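As a rough illustration of the idea described in the abstract (not the authors' implementation), the sketch below passes a model sequentially from one center to the next and records each update in a hash-linked log. Everything in it is an assumption made for illustration: the logistic-regression learner, the SHA-256 hash chain standing in for a real blockchain platform (there is no consensus or networking here), the toy data, and the names `train_at_site`, `append_block`, and `verify_chain`.

```python
import hashlib
import json
import math


def train_at_site(weights, data, lr=0.1, epochs=50):
    """Update the incoming weights with logistic-regression SGD on one site's local data."""
    w = list(weights)
    for _ in range(epochs):
        for features, label in data:
            z = sum(wi * xi for wi, xi in zip(w, features))
            pred = 1.0 / (1.0 + math.exp(-z))
            for i, xi in enumerate(features):
                w[i] -= lr * (pred - label) * xi
    return w


def block_hash(site, weights, prev_hash):
    """Hash the site name, the weights, and the previous block's hash together."""
    payload = json.dumps({"site": site, "weights": weights, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def append_block(chain, site, weights):
    """Append a record linked to the previous one through its hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    stored = list(weights)  # copy so later training cannot alter the record
    chain.append({"site": site, "weights": stored, "prev": prev_hash,
                  "hash": block_hash(site, stored, prev_hash)})


def verify_chain(chain):
    """Recompute every hash; any edited record or broken link returns False."""
    prev_hash = "0" * 64
    for block in chain:
        if block["prev"] != prev_hash:
            return False
        if block_hash(block["site"], block["weights"], prev_hash) != block["hash"]:
            return False
        prev_hash = block["hash"]
    return True


# Hypothetical toy data: each center holds (features, label) pairs that never leave it.
sites = {
    "center_a": [([1.0, 0.2], 0), ([1.0, 0.9], 1)],
    "center_b": [([1.0, 0.1], 0), ([1.0, 0.8], 1)],
}

chain = []
weights = [0.0, 0.0]
for name, data in sites.items():
    weights = train_at_site(weights, data)  # only the weights travel between centers
    append_block(chain, name, weights)
```

In this sketch only model weights move between centers, and any participant can call `verify_chain(chain)` to check that no recorded update was altered after the fact, which is the kind of auditability the abstract attributes to the blockchain layer.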

Original language: English
Pages (from-to): 183939-183951
Number of pages: 13
Journal: IEEE Access
Volume: 8
Publication status: Published - 2020

Keywords

  • Data models
  • Training
  • Machine learning
  • Servers
  • Biomedical imaging
  • Blockchain
  • data privacy
  • decentralized learning
  • distributed learning
  • HEALTH-CARE
  • MODEL
