Affiliation:
1. Trinity College Dublin, Dublin, Ireland
Abstract
This paper describes a semi-automated process, framework and tools for harvesting, assessing, improving and maintaining high-quality linked data. The framework, known as DaCura, provides dataset curators, who may not be knowledge engineers, with tools to collect and curate evolving linked data datasets that maintain quality over time. The framework encompasses a novel process, workflow and architecture. A working implementation has been produced and applied first to the publication of an existing social sciences dataset, then to the harvesting and curation of a related dataset from an unstructured data source. The framework's performance is evaluated using data quality measures that have been developed to measure existing published datasets. An analysis of the framework against these dimensions demonstrates that it addresses a broad range of real-world data quality concerns. Experimental results quantify the impact of the DaCura process and tools on data quality through an assessment framework and methodology that combines automated and human data quality controls.
Subject
Computer Networks and Communications, Information Systems
Cited by
16 articles.