Distant supervision of relation extraction in sparse data

Bijan Ranjbar-Sahraei, Hossein Rahmani*, Gerhard Weiss, Karl Tuyls

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review


Abstract

To extract structured knowledge from unstructured text sources, we need to understand the semantic relationships between entities. State-of-the-art relation extraction techniques take advantage of the abundance of data on the web. However, in domains with sparse data, such as social networks, where entities and relationship patterns occur only a limited number of times, bootstrapping techniques and pattern detection methods are inefficient and inaccurate. In this paper, we introduce REDS, a Relation Extraction approach based on Distant Supervision. REDS extracts the named entities from text documents and assigns a fingerprint to each potential relationship among them. It then queries a knowledge repository for similar matches to each fingerprint. An assessor uses the query results and the data statistics to measure the validity of the relationships corresponding to the queried fingerprints, and labels each potential relationship with the predicted type. In addition to handling relation extraction in the presence of data sparsity, REDS uses an information retrieval framework that makes it scalable and capable of dealing with noisy data. We implement and test REDS on a non-English historical archive consisting of unstructured notarial acts and structured civil registers; in manual evaluations, REDS achieves a precision of 0.90.
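The abstract names the REDS stages (entity extraction, fingerprinting, repository query, assessment) without specifying how each one works. The Python below is a minimal sketch of that pipeline shape under our own assumptions, not the authors' implementation. Named entity recognition is skipped: entities arrive as plain strings. The fingerprint is assumed to be the set of lower-cased name tokens of the two entities. The knowledge repository is an in-memory inverted index over invented `Record` entries. The assessor accepts the best match only when its IDF-weighted Jaccard similarity reaches a hypothetical threshold of 0.6, with IDF weights standing in for the "data statistics". `Record`, `fingerprint`, `Repository`, `assess`, and `extract_relations` are all hypothetical names.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical structured repository entry (e.g. one civil-register record)."""
    person_a: str
    person_b: str
    relation: str  # relationship type, e.g. "spouse" or "parent"


def fingerprint(entity_a: str, entity_b: str) -> frozenset:
    """Hypothetical fingerprint: the set of lower-cased name tokens of both entities.

    The set ignores entity order, so the direction of a relation is not captured.
    """
    return frozenset(entity_a.lower().split() + entity_b.lower().split())


class Repository:
    """Inverted index over repository records, plus per-token IDF statistics."""

    def __init__(self, records: list) -> None:
        self.records = records
        self.record_fps = [fingerprint(r.person_a, r.person_b) for r in records]
        self.index = defaultdict(set)  # token -> ids of records containing that token
        for i, fp in enumerate(self.record_fps):
            for token in fp:
                self.index[token].add(i)
        n = len(records)
        # Smoothed IDF: rare tokens (uncommon names) count more than frequent ones.
        self.idf = {t: math.log((n + 1) / (len(ids) + 1)) + 1.0
                    for t, ids in self.index.items()}
        self.unseen_idf = math.log(n + 1) + 1.0  # weight of tokens absent from the index

    def weight(self, tokens) -> float:
        return sum(self.idf.get(t, self.unseen_idf) for t in tokens)

    def query(self, fp: frozenset) -> list:
        """Rank records sharing a token with fp by IDF-weighted Jaccard similarity."""
        candidates = set().union(*(self.index.get(t, set()) for t in fp))
        ranked = []
        for i in candidates:
            rec_fp = self.record_fps[i]
            score = self.weight(fp & rec_fp) / self.weight(fp | rec_fp)
            ranked.append((score, self.records[i]))
        ranked.sort(key=lambda pair: pair[0], reverse=True)
        return ranked


def assess(repo: Repository, entity_a: str, entity_b: str,
           threshold: float = 0.6) -> Optional[str]:
    """Label a candidate pair with the best match's relation, or None if evidence is weak."""
    ranked = repo.query(fingerprint(entity_a, entity_b))
    if not ranked or ranked[0][0] < threshold:
        return None
    return ranked[0][1].relation


def extract_relations(repo: Repository, entities: list, threshold: float = 0.6) -> dict:
    """Assess every pair of entities found in one document."""
    return {(a, b): assess(repo, a, b, threshold) for a, b in combinations(entities, 2)}


# Toy, invented data for illustration only.
repo = Repository([
    Record("Jan de Vries", "Maria Bakker", "spouse"),
    Record("Pieter de Vries", "Jan de Vries", "parent"),
    Record("Anna Smit", "Kees Jansen", "spouse"),
])
print(extract_relations(repo, ["Jan de Vries", "Maria Bakker", "Anna Smit"]))
# {('Jan de Vries', 'Maria Bakker'): 'spouse', ('Jan de Vries', 'Anna Smit'): None,
#  ('Maria Bakker', 'Anna Smit'): None}
```

Weighting overlaps by IDF means that a shared common surname such as "de Vries" contributes less evidence than a shared rare name. This is one simple way to let corpus statistics temper noisy matches. Because this token-set fingerprint is order-independent, a practical variant would need direction-aware fingerprints and a threshold tuned on labeled data.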

Original language: English
Pages (from-to): 1145-1166
Number of pages: 22
Journal: Intelligent Data Analysis
Volume: 23
Issue number: 5
Publication status: Published - 2019

Keywords

  • Relation extraction
  • distant supervision
  • identity resolution
  • named entity recognition
