Machine Learning based Drug Indication Prediction using Linked Open Data

Research output: Chapter in Book/Report/Conference proceeding; Conference article in proceeding; Academic; peer-review


Abstract

In this study, we obtained drug and disease features by querying linked open data, trained a classifier to predict new drug indications, and evaluated its predictive performance under different validation schemes. We collected the drug and disease data from Bio2RDF, an open-source project that uses Semantic Web technologies to link data from multiple sources. A binary feature matrix was generated from drug targets, substructures and side effects, together with disease ontology terms. We assembled a broader dataset containing 816 drugs and 1393 diseases with their features, along with a gold standard that we generated by combining multiple drug indication data sources. We also applied our method to a different dataset, compiled by other researchers, which confirmed the predictive value of our method independently of the primary data. A crucial flaw in the typical evaluation scheme for drug indication prediction, one that yields unrealistic predictions, is the failure to consider the paired nature of the inputs. We therefore partitioned the data into distinct training and test sets in which not only the pairs but also the drugs/diseases themselves do not overlap. We tested several classifiers under different cross-validation schemes and compared our approach with existing methods. We observed that our model had better predictive performance than the existing models in disjoint cross-validation settings.
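
The disjoint partitioning described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `disjoint_folds`, the toy identifiers, and the choice to make both drugs and diseases disjoint at once are assumptions made for illustration. Pairs that share only a drug or only a disease with the test fold are left out of training, so a classifier never sees a test drug or a test disease during training.

```python
import random


def disjoint_folds(pairs, k=3, seed=0):
    """Yield (train, test) lists of (drug, disease) pairs.

    Hypothetical helper: no drug and no disease that occurs in a test
    fold occurs in the corresponding training fold.
    """
    rng = random.Random(seed)
    drugs = sorted({drug for drug, _ in pairs})
    diseases = sorted({disease for _, disease in pairs})
    rng.shuffle(drugs)
    rng.shuffle(diseases)

    # Assign every drug and every disease to one of k groups.
    drug_group = {drug: i % k for i, drug in enumerate(drugs)}
    disease_group = {disease: i % k for i, disease in enumerate(diseases)}

    for fold in range(k):
        # Test pairs: both the drug and the disease belong to this fold.
        test = [(d, s) for d, s in pairs
                if drug_group[d] == fold and disease_group[s] == fold]
        # Training pairs: neither the drug nor the disease belongs to this fold.
        # Pairs with exactly one member in the fold are discarded.
        train = [(d, s) for d, s in pairs
                 if drug_group[d] != fold and disease_group[s] != fold]
        yield train, test


# Toy data: every combination of 6 made-up drugs and 6 made-up diseases.
pairs = [(f"drug{i}", f"disease{j}") for i in range(6) for j in range(6)]
folds = list(disjoint_folds(pairs, k=3))
```

The cost of this scheme is that some pairs are unused in each fold; a plain pair-wise split keeps all pairs but lets the same drug or disease appear on both sides, which is the flaw the abstract points out.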

Original language: English
Title of host publication: 10th International Conference on Semantic Web Applications and Tools for Health Care and Life Sciences
Publication status: Published - 2017
