Detecting Linked Data quality issues via crowdsourcing: A DBpedia study

Maribel Acosta; Amrapali Zaveri; Elena Simperl; Dimitris Kontokostas; Fabian Flöck; Jens Lehmann

doi:10.3233/SW-160239

Corresponding author: Maribel Acosta

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

In this paper we examine the use of crowdsourcing as a means to detect Linked Data quality problems that are difficult to uncover automatically. We base our approach on the analysis of the most common errors encountered in the DBpedia dataset, and a classification of these errors according to the extent to which they are likely to be amenable to crowdsourcing. We then propose and study different crowdsourcing approaches to identify these Linked Data quality issues, employing DBpedia as our use case: (i) a contest targeting the Linked Data expert community, and (ii) paid microtasks published on Amazon Mechanical Turk. We secondly focus on adapting the Find-Fix-Verify crowdsourcing pattern to exploit the strengths of experts and lay workers. By testing two distinct Find-Verify workflows (lay users only and experts verified by lay users) we reveal how to best combine different crowds' complementary aptitudes in Linked Data quality issue detection. Empirical results show that a combination of the two styles of crowdsourcing is likely to achieve more effective results than each of them used in isolation, and that human computation is a promising and affordable way to enhance the quality of DBpedia.

Original language	English
Pages (from-to)	303-335
Number of pages	33
Journal	Semantic Web
Volume	9
Issue number	3
Early online date	2016
DOIs	https://doi.org/10.3233/SW-160239
Publication status	Published - 12 Apr 2018
Externally published	Yes

Keywords

2016 event_eswc group_aksw sys:relevantFor:infai sys:relevantFor:bis lehmann MOLE

Cite this

@article{70b453e33ac24b9d90efbd462bc4ed54,

title = "Detecting Linked Data quality issues via crowdsourcing: A DBpedia study",

abstract = "In this paper we examine the use of crowdsourcing as a means to detect Linked Data quality problems that are difficult to uncover automatically. We base our approach on the analysis of the most common errors encountered in the DBpedia dataset, and a classification of these errors according to the extent to which they are likely to be amenable to crowdsourcing. We then propose and study different crowdsourcing approaches to identify these Linked Data quality issues, employing DBpedia as our use case: (i) a contest targeting the Linked Data expert community, and (ii) paid microtasks published on Amazon Mechanical Turk. We secondly focus on adapting the Find-Fix-Verify crowdsourcing pattern to exploit the strengths of experts and lay workers. By testing two distinct Find-Verify workflows (lay users only and experts verified by lay users) we reveal how to best combine different crowds' complementary aptitudes in Linked Data quality issue detection. Empirical results show that a combination of the two styles of crowdsourcing is likely to achieve more effective results than each of them used in isolation, and that human computation is a promising and affordable way to enhance the quality of DBpedia.",

keywords = "2016 event_eswc group_aksw sys:relevantFor:infai sys:relevantFor:bis lehmann MOLE",

author = "Maribel Acosta and Amrapali Zaveri and Elena Simperl and Dimitris Kontokostas and Fabian Fl{\"o}ck and Jens Lehmann",

year = "2018",

month = apr,

day = "12",

doi = "10.3233/SW-160239",

language = "English",

volume = "9",

pages = "303--335",

journal = "Semantic Web",

issn = "1570-0844",

publisher = "IOS Press",

number = "3",

}

TY - JOUR

T1 - Detecting Linked Data quality issues via crowdsourcing: A DBpedia study

AU - Acosta, Maribel

AU - Zaveri, Amrapali

AU - Simperl, Elena

AU - Kontokostas, Dimitris

AU - Flöck, Fabian

AU - Lehmann, Jens

PY - 2018/4/12

Y1 - 2018/4/12

N2 - In this paper we examine the use of crowdsourcing as a means to detect Linked Data quality problems that are difficult to uncover automatically. We base our approach on the analysis of the most common errors encountered in the DBpedia dataset, and a classification of these errors according to the extent to which they are likely to be amenable to crowdsourcing. We then propose and study different crowdsourcing approaches to identify these Linked Data quality issues, employing DBpedia as our use case: (i) a contest targeting the Linked Data expert community, and (ii) paid microtasks published on Amazon Mechanical Turk. We secondly focus on adapting the Find-Fix-Verify crowdsourcing pattern to exploit the strengths of experts and lay workers. By testing two distinct Find-Verify workflows (lay users only and experts verified by lay users) we reveal how to best combine different crowds' complementary aptitudes in Linked Data quality issue detection. Empirical results show that a combination of the two styles of crowdsourcing is likely to achieve more effective results than each of them used in isolation, and that human computation is a promising and affordable way to enhance the quality of DBpedia.

KW - 2016 event_eswc group_aksw sys:relevantFor:infai sys:relevantFor:bis lehmann MOLE

U2 - 10.3233/SW-160239

DO - 10.3233/SW-160239

M3 - Article

SN - 1570-0844

VL - 9

SP - 303

EP - 335

JO - Semantic Web

JF - Semantic Web

IS - 3

ER -