Two-stage Detection of Semantic Redundancies in RDF Data

Published: 2023-03-19
ISSN: 1544-5976
Container-title: Journal of Web Engineering
Short-container-title: JWE

Author:

Chen Yiming, Li Daiyi, Yan Li, Ma Zongmin

Abstract

With the growing richness of RDF (Resource Description Framework) data, integrating diverse data sources may result in duplicated RDF data. Failure to detect these duplicates effectively introduces redundancies into the integrated RDF datasets, which not only unnecessarily increases the size of the datasets but also reduces their quality. Traditionally, a similarity calculation is applied to determine whether a pair of candidates are duplicates. For massive RDF data, a simple similarity calculation may be extremely inefficient. To detect duplicates in massive RDF data, in this paper we propose a detection approach based on RDF data clustering and similarity measurements. We first propose a clustering method based on locality-sensitive hashing (LSH), which can efficiently select candidate pairs in RDF data. A similarity calculation is then performed on the selected candidate pairs to obtain the duplicate candidates. We show through experiments that our approach can quickly extract the duplicate candidates in RDF datasets. Our approach achieved the highest F-score and the best time performance in the OAEI (Ontology Alignment Evaluation Initiative) 2019 competition.
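
The two stages described in the abstract (LSH-based candidate selection, then a similarity calculation on the selected pairs) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the hash family, the entity representation, or the similarity measure. The sketch assumes MinHash with banding as the LSH scheme, represents each RDF entity as a set of hypothetical "predicate=value" tokens, and uses Jaccard similarity with an arbitrary threshold of 0.8. All function names and parameters are illustrative.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes):
    """MinHash signature: for each seed, the minimum hash over the token set."""
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(num_hashes))


def candidate_pairs(entities, bands=8, rows=4):
    """Stage 1 (assumed MinHash LSH): entities whose signatures agree on
    at least one band fall into the same bucket and become candidates."""
    buckets = defaultdict(set)
    for name, tokens in entities.items():
        if not tokens:
            continue  # an empty description has no signature
        sig = minhash_signature(tokens, bands * rows)
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(name)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def detect_duplicates(entities, threshold=0.8):
    """Stage 2: keep only candidate pairs whose similarity meets the threshold."""
    return sorted(
        (x, y)
        for x, y in candidate_pairs(entities)
        if jaccard(entities[x], entities[y]) >= threshold
    )


# Hypothetical entities, each flattened to a set of "predicate=value" tokens.
entities = {
    "ex:a": {"name=Alice", "born=1990", "city=Paris"},
    "ex:b": {"name=Alice", "born=1990", "city=Paris"},
    "ex:c": {"name=Bob", "born=1985", "city=Rome"},
}
print(detect_duplicates(entities))  # [('ex:a', 'ex:b')]
```

The point of the first stage is that the similarity function runs only on pairs that share a bucket, rather than on every possible pair. The choice of `bands` and `rows` trades recall against the number of candidates and would need tuning on real data.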

Publisher

River Publishers

Subject

Computer Networks and Communications, Information Systems, Software

Cited by 2 articles.

1. Data Lake Conceptualized Web Platform for Food Research Data Collection; Journal of Web Engineering; 2024-05-25

2. Data Quality Analysis and Improvement: A Case Study of a Bus Transportation System; Applied Sciences; 2023-10-06