TY - GEN
T1 - HiDER - Query-Driven Entity Resolution for Historical Data.
AU - Sahraei, Bijan Ranjbar
AU - Efremova, Julia
AU - Rahmani, Hossein
AU - Calders, Toon
AU - Tuyls, Karl
AU - Weiss, Gerhard
N1 - DBLP's bibliographic metadata records provided through http://dblp.org/search/publ/api are distributed under a Creative Commons CC0 1.0 Universal Public Domain Dedication. Although the bibliographic metadata records are provided consistent with CC0 1.0 Dedication, the content described by the metadata records is not. Content may be subject to copyright, rights of privacy, rights of publicity and other restrictions.
PY - 2015
Y1 - 2015
N2 - Entity Resolution (ER) is the task of finding references that refer to the same entity across different data sources. Cleaning a data warehouse and applying ER to it is a computationally demanding task, particularly for large data sets that change dynamically. Therefore, a query-driven approach which analyses a small subset of the entire data set and integrates the results in real time is significantly beneficial. Here, we present an interactive tool, called HiDER, which allows for query-driven ER in large collections of uncertain dynamic historical data. The input data includes civil registers such as birth, marriage and death certificates in the form of structured data, and notarial acts such as estate tax and property transfers in the form of free text. The outputs are family networks and event timelines visualized in an integrated way. HiDER is being used and tested at the BHIC center (Brabant Historical Information Center, https://www.bhic.nl); despite the uncertainties of the BHIC input data, the extracted entities have high certainty and are enriched by extra information.
AB - Entity Resolution (ER) is the task of finding references that refer to the same entity across different data sources. Cleaning a data warehouse and applying ER to it is a computationally demanding task, particularly for large data sets that change dynamically. Therefore, a query-driven approach which analyses a small subset of the entire data set and integrates the results in real time is significantly beneficial. Here, we present an interactive tool, called HiDER, which allows for query-driven ER in large collections of uncertain dynamic historical data. The input data includes civil registers such as birth, marriage and death certificates in the form of structured data, and notarial acts such as estate tax and property transfers in the form of free text. The outputs are family networks and event timelines visualized in an integrated way. HiDER is being used and tested at the BHIC center (Brabant Historical Information Center, https://www.bhic.nl); despite the uncertainties of the BHIC input data, the extracted entities have high certainty and are enriched by extra information.
U2 - 10.1007/978-3-319-23461-8_30
DO - 10.1007/978-3-319-23461-8_30
M3 - Conference article in proceeding
SN - 978-3-319-23460-1
T3 - Lecture Notes in Computer Science
SP - 281
EP - 284
BT - Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2015
PB - Springer Nature Switzerland AG
CY - Cham
T2 - European Conference on Machine Learning and Practice of Knowledge Discovery in Databases
Y2 - 7 September 2015 through 11 September 2015
ER -