Quality assessment for Linked Data: A Survey

Amrapali Zaveri; Anisa Rula; Andrea Maurino; Ricardo Pietrobon; Jens Lehmann; Soeren Auer

doi:10.3233/SW-150175

Amrapali Zaveri^*, Anisa Rula, Andrea Maurino, Ricardo Pietrobon, Jens Lehmann, Soeren Auer

^*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

The development and standardization of Semantic Web technologies has resulted in an unprecedented volume of data being published on the Web as Linked Data (LD). However, we observe widely varying data quality ranging from extensively curated datasets to crowdsourced and extracted data of relatively low quality. In this article, we present the results of a systematic review of approaches for assessing the quality of LD. We gather existing approaches and analyze them qualitatively. In particular, we unify and formalize commonly used terminologies across papers related to data quality and provide a comprehensive list of 18 quality dimensions and 69 metrics. Additionally, we qualitatively analyze the 30 core approaches and 12 tools using a set of attributes. The aim of this article is to provide researchers and data curators a comprehensive understanding of existing work, thereby encouraging further experimentation and development of new approaches focused towards data quality, specifically for LD.

Original language	English
Pages (from-to)	63-93
Journal	Semantic Web
Volume	7
Issue number	1
DOIs	https://doi.org/10.3233/SW-150175
Publication status	Published - 2016
Externally published	Yes

Keywords

Data quality
Linked Data
assessment
survey

Cite this

@article{fe878e096be44de59416bf73417de354,

title = "Quality assessment for Linked Data: A Survey",

abstract = "The development and standardization of Semantic Web technologies has resulted in an unprecedented volume of data being published on the Web as Linked Data (LD). However, we observe widely varying data quality ranging from extensively curated datasets to crowdsourced and extracted data of relatively low quality. In this article, we present the results of a systematic review of approaches for assessing the quality of LD. We gather existing approaches and analyze them qualitatively. In particular, we unify and formalize commonly used terminologies across papers related to data quality and provide a comprehensive list of 18 quality dimensions and 69 metrics. Additionally, we qualitatively analyze the 30 core approaches and 12 tools using a set of attributes. The aim of this article is to provide researchers and data curators a comprehensive understanding of existing work, thereby encouraging further experimentation and development of new approaches focused towards data quality, specifically for LD.",

keywords = "Data quality, Linked Data, assessment, survey",

author = "Amrapali Zaveri and Anisa Rula and Andrea Maurino and Ricardo Pietrobon and Jens Lehmann and Soeren Auer",

year = "2016",

doi = "10.3233/SW-150175",

language = "English",

volume = "7",

pages = "63--93",

journal = "Semantic Web",

issn = "1570-0844",

publisher = "IOS Press",

number = "1",

}

TY - JOUR

T1 - Quality assessment for Linked Data: A Survey

AU - Zaveri, Amrapali

AU - Rula, Anisa

AU - Maurino, Andrea

AU - Pietrobon, Ricardo

AU - Lehmann, Jens

AU - Auer, Soeren

PY - 2016

Y1 - 2016

AB - The development and standardization of Semantic Web technologies has resulted in an unprecedented volume of data being published on the Web as Linked Data (LD). However, we observe widely varying data quality ranging from extensively curated datasets to crowdsourced and extracted data of relatively low quality. In this article, we present the results of a systematic review of approaches for assessing the quality of LD. We gather existing approaches and analyze them qualitatively. In particular, we unify and formalize commonly used terminologies across papers related to data quality and provide a comprehensive list of 18 quality dimensions and 69 metrics. Additionally, we qualitatively analyze the 30 core approaches and 12 tools using a set of attributes. The aim of this article is to provide researchers and data curators a comprehensive understanding of existing work, thereby encouraging further experimentation and development of new approaches focused towards data quality, specifically for LD.

KW - Data quality

KW - Linked Data

KW - assessment

KW - survey

U2 - 10.3233/SW-150175

DO - 10.3233/SW-150175

M3 - Article

SN - 1570-0844

VL - 7

SP - 63

EP - 93

JO - Semantic Web

JF - Semantic Web

IS - 1

ER -