Affiliation:
1. Enterprise Information Systems, University of Bonn & Fraunhofer IAIS, Bonn, Germany
Abstract
The increasing variety of Linked Data on the Web makes it challenging to determine the quality of this data and, subsequently, to make this information explicit to data consumers. Despite the availability of a number of tools and frameworks to assess Linked Data quality, the output of such tools is not suitable for machine consumption, and thus consumers can hardly compare and rank datasets in order of fitness for use. This article describes a conceptual methodology for assessing Linked Datasets, and Luzzu, a framework for Linked Data quality assessment. Luzzu is based on four major components: (1) an extensible interface for defining new quality metrics; (2) an interoperable, ontology-driven back-end for representing quality metadata and quality problems that can be re-used within different semantic frameworks; (3) scalable dataset processors for data dumps, SPARQL endpoints, and big data infrastructures; and (4) a customisable ranking algorithm that takes user-defined weights into account. We show that Luzzu scales linearly with the number of triples in a dataset. We also demonstrate the applicability of the Luzzu framework by evaluating and analysing a number of statistical datasets against a variety of metrics. This article contributes towards the definition of a holistic data quality lifecycle, in terms of the co-evolution of linked datasets, with the final aim of improving their quality.
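To make the idea of ranking by user-defined weights concrete, the following is a minimal sketch and not Luzzu's actual algorithm or API. It assumes each dataset already has per-metric quality values in the range [0, 1] and that the consumer supplies a non-negative weight per metric; the function name `rank_datasets`, the metric names, and the dataset names are all hypothetical.

```python
from typing import Dict, List, Tuple


def rank_datasets(
    scores: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Rank datasets by a normalised weighted sum of their metric values.

    scores  maps dataset name -> {metric name -> value in [0, 1]}  (assumed layout)
    weights maps metric name  -> non-negative user-defined weight  (assumed layout)
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")

    ranked = []
    for dataset, metric_values in scores.items():
        # A metric missing for a dataset counts as 0.0 in this sketch.
        weighted = sum(
            weight * metric_values.get(metric, 0.0)
            for metric, weight in weights.items()
        )
        ranked.append((dataset, weighted / total_weight))

    # Highest aggregated score first.
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Hypothetical quality values for two datasets and two metrics.
example_scores = {
    "dataset_a": {"dereferenceability": 1.0, "licensing": 0.0},
    "dataset_b": {"dereferenceability": 0.5, "licensing": 1.0},
}

# The same data ranks differently depending on what the consumer values.
equal_ranking = rank_datasets(
    example_scores, {"dereferenceability": 1.0, "licensing": 1.0}
)
access_focused_ranking = rank_datasets(
    example_scores, {"dereferenceability": 3.0, "licensing": 1.0}
)
```

Dividing by the total weight keeps the aggregated score in [0, 1], so rankings produced with different weight profiles remain comparable on the same scale.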
Funder
European Commission under the Seventh Framework Program FP7
Publisher
Association for Computing Machinery (ACM)
Subject
Information Systems and Management, Information Systems
Cited by
61 articles.