Relation extraction from DailyMed structured product labels by optimally combining crowd, experts and machines

K. Shingjergji, R. Celebi*, J. Scholtes, M. Dumontier

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review


Abstract

The effectiveness of machine learning models in providing accurate and consistent results in drug discovery and clinical decision support is strongly dependent on the quality of the data used. However, substantial amounts of the open data that drive drug discovery suffer from a number of issues, including inconsistent representation, inaccurate reporting, and incomplete context. For example, databases of FDA-approved drug indications used in computational drug repositioning studies do not distinguish treatments that simply offer symptomatic relief from those that target the underlying pathology. Moreover, drug indication sources often lack proper provenance and have little overlap. Consequently, new predictions can be of poor quality, as they offer little in the way of new insights. Hence, work remains to be done to establish higher quality databases of drug indications that are suitable for use in drug discovery and repositioning studies. Here, we report on the combination of weak supervision (i.e., programmatic labeling and crowdsourcing) and deep learning methods for relation extraction from DailyMed text to create a higher quality drug-disease relation dataset. The generated drug-disease relation data shows a high overlap with DrugCentral, a manually curated dataset. Using this dataset, we constructed a machine learning model to classify relations between drugs and diseases from text into four categories: treatment, symptomatic relief, contradiction, and effect. The Bi-LSTM model (F1 score of 71.8%) exhibited an improvement of 15.5% over the best performing discrete method. Access to high quality data is crucial to building accurate and reliable drug repurposing prediction models. Our work suggests how the combination of crowds, experts, and machine learning methods can go hand-in-hand to improve datasets and predictive models.
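As a rough illustration of what programmatic labeling can look like, the sketch below applies a few keyword-based labeling functions to a label sentence and combines their votes by simple majority, abstaining on ties so that such cases could be passed on to crowd workers or experts. The cue phrases, function names, and the majority-vote rule are all assumptions made for illustration; they are not the labeling functions or the aggregation method used in the paper.

```python
from collections import Counter

ABSTAIN = None  # a labeling function returns this when it has no opinion

# Hypothetical cue phrases; not taken from the paper.
def lf_treatment(sentence):
    s = sentence.lower()
    return "treatment" if "indicated for the treatment of" in s else ABSTAIN

def lf_symptomatic_relief(sentence):
    s = sentence.lower()
    return "symptomatic relief" if "relief of" in s or "symptomatic" in s else ABSTAIN

def lf_contradiction(sentence):
    s = sentence.lower()
    return "contradiction" if "contraindicated" in s else ABSTAIN

def lf_effect(sentence):
    s = sentence.lower()
    return "effect" if "may cause" in s or "adverse reaction" in s else ABSTAIN

LABELING_FUNCTIONS = [
    lf_treatment,
    lf_symptomatic_relief,
    lf_contradiction,
    lf_effect,
]

def weak_label(sentence):
    """Combine labeling-function votes by majority; abstain on no votes or ties."""
    votes = [lf(sentence) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    # A tie between the top two labels is left unresolved for human review.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]

if __name__ == "__main__":
    examples = [
        "Drug X is indicated for the treatment of hypertension.",
        "Drug X is contraindicated in patients with asthma.",
        "Store at room temperature.",
    ]
    for text in examples:
        print(weak_label(text), "<-", text)
```

Sentences on which the functions abstain or disagree are the natural candidates for crowdsourced or expert annotation, and the resulting weak labels could then serve as training data for a downstream classifier.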
Original language: English
Article number: 103902
Number of pages: 11
Journal: Journal of Biomedical Informatics
Publication status: Published - 1 Oct 2021


  • Drug-disease relation classification
  • Drug indications
  • Drug data quality
  • Drug repositioning
  • Weak supervision
  • Programmatic labeling
  • Crowdsourcing
  • Human-in-the-loop
  • Machine learning

