Affiliation:
1. Institute of Computer Science, FORTH-ICS, Greece & Computer Science Department, University of Crete, Greece
Abstract
Although the ultimate objective of Linked Data is linking and integration, it is currently unclear how connected the Linked Open Data (LOD) cloud actually is. In this article, we focus on methods, supported by special indexes and algorithms, for measuring the connectivity among more than two datasets. Such measurements are useful in various tasks, including (a) Dataset Discovery and Selection; (b) Object Coreference, i.e., obtaining complete information about a set of entities, including provenance information; (c) Data Quality Assessment and Improvement, i.e., assessing the connectivity between any set of datasets and monitoring its evolution over time, as well as estimating data veracity; (d) Dataset Visualizations; and various other tasks. Since performing all these measurements naïvely would be prohibitively expensive, we introduce indexes (and their construction algorithms) that speed up such tasks. In brief, we introduce (i) a namespace-based prefix index, (ii) a sameAs catalog for computing the symmetric and transitive closure of the owl:sameAs relationships encountered in the datasets, (iii) a semantics-aware element index (which exploits the aforementioned indexes), and, finally, (iv) two lattice-based incremental algorithms for speeding up the computation of the intersection of URIs of any set of datasets. To enhance scalability, we propose parallel index construction algorithms and parallel lattice-based incremental algorithms, we evaluate the achieved speedup on either a single machine or a cluster of machines, and we provide insights into the factors that affect efficiency. Finally, we report measurements of the connectivity of the (billion-triple-scale) LOD cloud that have never been carried out before.
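The abstract does not detail the index structures, so the following Python sketch only illustrates two of the underlying ideas under our own assumptions. It uses a union-find (disjoint-set) structure as one standard way to obtain the symmetric and transitive closure of owl:sameAs pairs, and then walks the lattice of dataset subsets level by level, computing each subset's intersection incrementally from a smaller subset and no longer extending a subset once its intersection is empty. Datasets are modelled as plain Python sets of URI strings, and the names `UnionFind`, `canonicalize` and `common_entities` are hypothetical; this is not the authors' sameAs catalog, element index, or lattice-based algorithms.

```python
# Hypothetical sketch: sameAs closure via union-find, then incremental
# intersections over the lattice of dataset subsets. Not the paper's code.


class UnionFind:
    """Disjoint-set forest. Each set is one owl:sameAs equivalence class,
    i.e. the symmetric and transitive closure of the input pairs."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Register unseen URIs as singleton classes.
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def canonicalize(datasets, same_as_pairs):
    """Replace every URI by the representative of its sameAs class."""
    uf = UnionFind()
    for a, b in same_as_pairs:
        uf.union(a, b)
    return {name: {uf.find(u) for u in uris} for name, uris in datasets.items()}


def common_entities(datasets, min_size=2):
    """Map each subset of >= min_size datasets with a non-empty intersection
    to the number of entities its members share.

    Subsets are enumerated level by level; each subset is extended only with
    datasets that sort after its last member, so every subset is visited at
    most once and its intersection is a single set operation away from its
    parent's. A subset with an empty intersection is not extended, because
    all of its supersets would be empty as well."""
    names = sorted(datasets)
    position = {n: i for i, n in enumerate(names)}
    results = {}
    frontier = [((n,), datasets[n]) for n in names]
    while frontier:
        next_frontier = []
        for subset, shared in frontier:
            if len(subset) >= min_size:
                results[frozenset(subset)] = len(shared)
            for n in names[position[subset[-1]] + 1:]:
                narrowed = shared & datasets[n]
                if narrowed:
                    next_frontier.append((subset + (n,), narrowed))
        frontier = next_frontier
    return results
```

For example, with D1 = {ex1:Aristotle, ex1:Plato}, D2 = {ex2:Aristotle, ex2:Socrates}, D3 = {ex3:Aristotle, ex3:Plato} and sameAs pairs linking the three Aristotle URIs and the two Plato URIs, `common_entities(canonicalize(...))` reports 2 shared entities for {D1, D3} and 1 for every other subset of two or more datasets. Keeping one intersection per surviving subset trades memory for avoiding recomputation; at LOD scale this is exactly where indexing and parallelism, as proposed in the article, become necessary.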
Funder
the General Secretariat for Research and Technology (GSRT) and the Hellenic Foundation for Research and Innovation
European Union's Horizon 2020 research BlueBRIDGE project
Publisher
Association for Computing Machinery (ACM)
Subject
Information Systems and Management, Information Systems
Cited by
19 articles.