Identifying disease trajectories with predicate information from a knowledge graph

Wytze J. Vlietstra*, Rein Vos, Marjan van den Akker, Erik M. van Mulligen, Jan A. Kors

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

3 Citations (Web of Science)

Abstract

Background: Knowledge graphs can represent the contents of biomedical literature and databases as subject-predicate-object triples, thereby enabling comprehensive analyses that identify e.g. relationships between diseases. Some diseases are often diagnosed in patients in specific temporal sequences, which are referred to as disease trajectories. Here, we determine whether a sequence of two diseases forms a trajectory by leveraging the predicate information from paths between (disease) proteins in a knowledge graph. Furthermore, we determine the added value of directional information of predicates for this task. To do so, we create four feature sets, based on two methods for representing indirect paths, and both with and without directional information of predicates (i.e., which protein is considered subject and which object). The added value of the directional information of predicates is quantified by comparing the classification performance of the feature sets that include or exclude it.

Results: Our method achieved a maximum area under the ROC curve of 89.8% and 74.5% when evaluated with two different reference sets. Use of directional information of predicates significantly improved performance by 6.5 and 2.0 percentage points respectively.

Conclusions: Our work demonstrates that predicates between proteins can be used to identify disease trajectories. Using the directional information of predicates significantly improved performance over not using this information.
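To illustrate the difference between feature sets with and without directional information, here is a minimal sketch in Python. It is not the authors' implementation: the toy triples, the protein identifiers, the function name `predicate_features`, and the `A->B` / `B->A` labels are all hypothetical, and the sketch only counts predicates on direct links between disease proteins. The two methods for representing indirect paths mentioned in the abstract are not covered.

```python
from collections import Counter

# Hypothetical toy triples (subject, predicate, object); not data from the paper.
TRIPLES = [
    ("P1", "activates", "P2"),
    ("P2", "inhibits", "P1"),
    ("P3", "activates", "P1"),
    ("P2", "binds", "P4"),
]


def predicate_features(triples, proteins_a, proteins_b, directional):
    """Count predicates on direct links between proteins of disease A and disease B.

    With directional=True, the same predicate yields different features
    depending on which disease's protein is the subject of the triple.
    With directional=False, both orientations share a single feature.
    """
    features = Counter()
    for subj, pred, obj in triples:
        if subj in proteins_a and obj in proteins_b:
            # A protein of disease A is the subject of the triple.
            key = f"{pred}|A->B" if directional else pred
        elif subj in proteins_b and obj in proteins_a:
            # A protein of disease B is the subject of the triple.
            key = f"{pred}|B->A" if directional else pred
        else:
            # The triple does not connect the two diseases directly.
            continue
        features[key] += 1
    return features


# Assumed protein sets for two diseases (illustrative only).
disease_a = {"P1"}
disease_b = {"P2", "P3"}

with_direction = predicate_features(TRIPLES, disease_a, disease_b, directional=True)
without_direction = predicate_features(TRIPLES, disease_a, disease_b, directional=False)
```

In this toy example the undirected feature set merges the two `activates` links into one count, while the directed feature set keeps them apart. That distinction is the kind of information whose added value the abstract reports, under the stated assumption that features are simple predicate counts.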

Original language: English
Article number: 9
Number of pages: 11
Journal: Journal of biomedical semantics
Volume: 11
Issue number: 1
Publication status: Published - 20 Aug 2020

Keywords

  • Knowledge graph
  • Disease trajectories
  • Predicates
  • Temporal relationships
  • Directionality of predicates
  • Protein-protein interactions
  • NETWORK ANALYSIS
  • SEMANTIC WEB
  • ASSOCIATIONS
  • SEPSIS
  • GENES
  • RISK
