Abstract
To extract structured knowledge from unstructured text sources, we need to understand the semantic relationships between entities. State-of-the-art relation extraction techniques take advantage of the abundance of data on the web. However, in domains with sparse data, such as social networks, which have limited occurrences of entities and relationship patterns, bootstrapping techniques and pattern detection methods are inefficient and inaccurate. In this paper, we introduce REDS, a Relation Extraction approach based on Distant Supervision. REDS extracts the named entities from text documents and assigns a fingerprint to each potential relationship among the named entities. Then, it queries a knowledge repository for similar matches to each fingerprint. An assessor uses the query results and the data statistics to measure the validity of the relationships corresponding to the queried fingerprints, and labels each potential relationship with the predicted type. In addition to handling relation extraction in the presence of data sparsity, REDS uses an information retrieval framework that makes it scalable and capable of dealing with noisy data. We implement and test REDS on a non-English historical archive consisting of unstructured notarial acts and structured civil registers. In a manual evaluation, REDS achieves a precision of 0.90.
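The pipeline outlined in the abstract (candidate entity pairs, a fingerprint per pair, a repository query, and an assessor that assigns a label) can be illustrated with a toy sketch. The abstract does not specify how REDS builds fingerprints, measures similarity, or scores validity, so everything below is an assumption made for illustration: the token-set fingerprint, the Jaccard similarity, the majority-vote assessor, the function names, and the sample records are all hypothetical and are not the authors' implementation.

```python
from collections import Counter
from itertools import combinations


def fingerprint(name_a, name_b):
    """Toy fingerprint (assumption): the set of lowercased name tokens of both entities."""
    return frozenset((name_a + " " + name_b).lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets; a stand-in for a real retrieval score."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def query(repository, fp, min_similarity=0.6):
    """Return the relation types of repository records whose fingerprint is similar enough."""
    return [rel for rec_fp, rel in repository if jaccard(fp, rec_fp) >= min_similarity]


def assess(matches, min_support=1):
    """Toy assessor (assumption): pick the most frequent matched relation type, or None."""
    if not matches:
        return None
    label, count = Counter(matches).most_common(1)[0]
    return label if count >= min_support else None


def extract_relations(entities, repository):
    """Label every pair of entities for which the repository supports a relation."""
    labelled = {}
    for a, b in combinations(entities, 2):
        label = assess(query(repository, fingerprint(a, b)))
        if label is not None:
            labelled[(a, b)] = label
    return labelled


# Hypothetical structured records, standing in for civil register entries.
repository = [
    (fingerprint("Jan Jansen", "Maria Jansen"), "married_to"),
    (fingerprint("Jan Jansen", "Pieter Jansen"), "parent_of"),
]

# Hypothetical named entities, standing in for names found in one notarial act.
entities = ["Jan Jansen", "Maria Jansen", "Willem Bakker"]

relations = extract_relations(entities, repository)
print(relations)
```

In this sketch the structured records play the role of the distant supervision source: no pair is labelled by hand, and a pair receives a label only when sufficiently similar records exist in the repository.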
Original language | English |
---|---|
Pages (from-to) | 1145-1166 |
Number of pages | 22 |
Journal | Intelligent Data Analysis |
Volume | 23 |
Issue number | 5 |
Publication status | Published - 2019 |
Keywords
- Relation extraction
- Distant supervision
- Identity resolution
- Named entity recognition