TY - JOUR
T1 - Blockchain for Privacy Preserving and Trustworthy Distributed Machine Learning in Multicentric Medical Imaging (C-DistriM)
AU - Zerka, Fadila
AU - Urovi, Visara
AU - Vaidyanathan, Akshayaa
AU - Barakat, Samir
AU - Leijenaar, Ralph T. H.
AU - Walsh, Sean
AU - Gabrani-Juma, Hanif
AU - Miraglio, Benjamin
AU - Woodruff, Henry C.
AU - Dumontier, Michel
AU - Lambin, Philippe
PY - 2020
Y1 - 2020
N2 - The utility of Artificial Intelligence (AI) in healthcare strongly depends upon the quality of the data used to build models, and the confidence in the predictions they generate. Access to sufficient amounts of high-quality data to build accurate and reliable models remains problematic owing to substantive legal and ethical constraints in making clinically relevant research data available offsite. New technologies such as distributed learning offer a pathway forward, but unfortunately tend to suffer from a lack of transparency, which undermines trust in what data are used for the analysis. To address such issues, we hypothesized that, a novel distributed learning that combines sequential distributed learning with a blockchain-based platform, namely Chained Distributed Machine learning C-DistriM, would be feasible and would give a similar result as a standard centralized approach. C-DistriM enables health centers to dynamically participate in training distributed learning models. We demonstrate C-DistriM using the NSCLC-Radiomics open data to predict two-year lung-cancer survival. A comparison of the performance of this distributed solution, evaluated in six different scenarios, and the centralized approach, showed no statistically significant difference (AUCs between central and distributed models), all DeLong tests yielded p-val > 0.05. This methodology removes the need to blindly trust the computation in one specific server on a distributed learning network. This fusion of blockchain and distributed learning serves as a proof-of-concept to increase transparency, trust, and ultimately accelerate the adoption of AI in multicentric studies. We conclude that our blockchain-based model for sequential training on distributed datasets is a feasible approach, provides equivalent performance to the centralized approach.
AB - The utility of Artificial Intelligence (AI) in healthcare strongly depends upon the quality of the data used to build models, and the confidence in the predictions they generate. Access to sufficient amounts of high-quality data to build accurate and reliable models remains problematic owing to substantive legal and ethical constraints in making clinically relevant research data available offsite. New technologies such as distributed learning offer a pathway forward, but unfortunately tend to suffer from a lack of transparency, which undermines trust in what data are used for the analysis. To address such issues, we hypothesized that, a novel distributed learning that combines sequential distributed learning with a blockchain-based platform, namely Chained Distributed Machine learning C-DistriM, would be feasible and would give a similar result as a standard centralized approach. C-DistriM enables health centers to dynamically participate in training distributed learning models. We demonstrate C-DistriM using the NSCLC-Radiomics open data to predict two-year lung-cancer survival. A comparison of the performance of this distributed solution, evaluated in six different scenarios, and the centralized approach, showed no statistically significant difference (AUCs between central and distributed models), all DeLong tests yielded p-val > 0.05. This methodology removes the need to blindly trust the computation in one specific server on a distributed learning network. This fusion of blockchain and distributed learning serves as a proof-of-concept to increase transparency, trust, and ultimately accelerate the adoption of AI in multicentric studies. We conclude that our blockchain-based model for sequential training on distributed datasets is a feasible approach, provides equivalent performance to the centralized approach.
KW - Data models
KW - Training
KW - Machine learning
KW - Servers
KW - Biomedical imaging
KW - Blockchain
KW - data privacy
KW - decentralized learning
KW - distributed learning
KW - HEALTH-CARE
KW - MODEL
U2 - 10.1109/ACCESS.2020.3029445
DO - 10.1109/ACCESS.2020.3029445
M3 - Article
VL - 8
SP - 183939
EP - 183951
JO - IEEE Access
JF - IEEE Access
SN - 2169-3536
ER -