Affiliation:
1. University of Mannheim, Germany
Abstract
Linked Data on the Web is created either from structured data sources (such as relational databases), from semi-structured sources (such as Wikipedia), or from unstructured sources (such as text). In the latter two cases, the generated Linked Data is likely to be noisy and incomplete. In this paper, we present two algorithms that exploit statistical distributions of properties and types to enhance the quality of incomplete and noisy Linked Data sets: SDType adds missing type statements, and SDValidate identifies faulty statements. Neither algorithm uses external knowledge; both operate solely on the data itself. We evaluate the algorithms on the DBpedia and NELL knowledge bases, showing that they are both accurate and scalable. Both algorithms were used in building the DBpedia 3.9 release: with SDType, 3.4 million missing type statements were added, and with SDValidate, 13,000 erroneous RDF statements were removed from the knowledge base.
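The abstract only summarizes the two algorithms. As a rough illustration of the shared idea (per-predicate type distributions used as weighted votes), the following is a minimal Python sketch under simplifying assumptions: the toy triples, the function names, the uniform property weights, and the 0.05 threshold are all illustrative and not taken from the paper.

```python
from collections import defaultdict

# Toy RDF-style triples; SDType/SDValidate operate on large graphs
# such as DBpedia or NELL. All data here is made up for illustration.
triples = [
    ("Mannheim", "locatedIn", "Germany"),
    ("Berlin", "locatedIn", "Germany"),
    ("Heidelberg", "locatedIn", "Germany"),
    ("Goethe", "bornIn", "Frankfurt"),
]
types = {  # known, possibly incomplete type statements
    "Mannheim": {"City"},
    "Berlin": {"City"},
    "Germany": {"Country"},
    "Frankfurt": {"City"},
    "Goethe": {"Person"},
    # "Heidelberg" is untyped; SDType should suggest City for it.
}

def type_distribution(triples, types, position):
    """P(type | predicate), estimated over the typed subjects
    (position=0) or objects (position=2) of each predicate."""
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for triple in triples:
        pred, resource = triple[1], triple[position]
        resource_types = types.get(resource, ())
        if resource_types:
            totals[pred] += 1
            for t in resource_types:
                counts[pred][t] += 1
    return {pred: {t: c / totals[pred] for t, c in dist.items()}
            for pred, dist in counts.items()}

def sdtype(resource, triples, subj_dist, obj_dist):
    """SDType idea: every property a resource participates in casts a
    vote for candidate types according to that property's type
    distribution. (Uniform property weights are used here; the paper
    weights properties by how discriminative their distribution is.)"""
    votes, n = defaultdict(float), 0
    for s, p, o in triples:
        if s == resource:
            dist, n = subj_dist.get(p, {}), n + 1
        elif o == resource:
            dist, n = obj_dist.get(p, {}), n + 1
        else:
            continue
        for t, prob in dist.items():
            votes[t] += prob
    return {t: v / n for t, v in votes.items()} if n else {}

def sdvalidate(triples, obj_dist, types, threshold=0.05):
    """SDValidate idea (simplified): a statement is suspicious if its
    object's types are almost never observed among the objects of that
    predicate. Untyped objects are skipped (score defaults to 1.0)."""
    suspicious = []
    for s, p, o in triples:
        score = max((obj_dist.get(p, {}).get(t, 0.0)
                     for t in types.get(o, ())), default=1.0)
        if score < threshold:
            suspicious.append(((s, p, o), score))
    return suspicious

subj_dist = type_distribution(triples, types, 0)
obj_dist = type_distribution(triples, types, 2)
print(sdtype("Heidelberg", triples, subj_dist, obj_dist))
# -> {'City': 1.0}: Heidelberg's use as a subject of locatedIn suggests City.

faulty = ("Goethe", "locatedIn", "Frankfurt")  # a deliberately wrong triple
print(sdvalidate(triples + [faulty], obj_dist, types))
# -> flags the faulty triple: Cities never occur as objects of locatedIn here.
```

The paper refines both steps, among other things by weighting each predicate by how strongly its type distribution deviates from the overall type distribution, and by preselecting candidate statements for SDValidate via relative predicate frequency; those refinements are omitted above for brevity.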
Cited by: 2 articles.