Luzzu—A Methodology and Framework for Linked Data Quality Assessment

Author:

Debattista Jeremy1, Auer Sören1, Lange Christoph1

Affiliation:

1. Enterprise Information Systems, University of Bonn & Fraunhofer IAIS Bonn, Germany

Abstract

The increasing variety of Linked Data on the Web makes it challenging to determine the quality of this data and, subsequently, to make this information explicit to data consumers. Despite the availability of a number of tools and frameworks to assess Linked Data Quality, the output of such tools is not suitable for machine consumption, and thus consumers can hardly compare and rank datasets in order of fitness for use. This article describes a conceptual methodology for assessing Linked Datasets, and Luzzu, a framework for Linked Data Quality Assessment. Luzzu is based on four major components: (1) an extensible interface for defining new quality metrics; (2) an interoperable, ontology-driven back-end for representing quality metadata and quality problems that can be re-used within different semantic frameworks; (3) scalable dataset processors for data dumps, SPARQL endpoints, and big data infrastructures; and (4) a customisable ranking algorithm taking into account user-defined weights. We show that Luzzu scales linearly with the number of triples in a dataset. We also demonstrate the applicability of the Luzzu framework by evaluating and analysing a number of statistical datasets against a variety of metrics. This article contributes towards the definition of a holistic data quality lifecycle, in terms of the co-evolution of linked datasets, with the final aim of improving their quality.

Funder

European Commission under the Seventh Framework Program FP7

Publisher

Association for Computing Machinery (ACM)

Subject

Information Systems and Management, Information Systems

Cited by 61 articles.

1. Spatial Semantics for the Evaluation of Administrative Geospatial Ontologies;ISPRS International Journal of Geo-Information;2024-08-17

2. Use of Context in Data Quality Management: a Systematic Literature Review;Journal of Data and Information Quality;2024-06-17

3. Knowledge Assessment;An Introduction to Knowledge Graphs;2024

4. Automated Transparency Evaluation of Knowledge Graphs for Clinical Risk Governance;2023 31st Irish Conference on Artificial Intelligence and Cognitive Science (AICS);2023-12-07

5. Assessing resolvability, parsability, and consistency of RDF resources: a use case in rare diseases;Journal of Biomedical Semantics;2023-12-05
