Semantic connection set-based massive RDF data query processing in Spark environment

Published:2019-11-27 Issue:1 Volume:2019
ISSN:1687-1499
Container-title:EURASIP Journal on Wireless Communications and Networking
language:en
Short-container-title:J Wireless Com Network

Author:

Xu Jiuyun, Zhang Chao

Abstract

Resource Description Framework (RDF) is a data representation of the Semantic Web, and its data volume is growing rapidly. Cloud-based systems provide a rich platform for managing RDF data. However, processing RDF queries that contain multiple join operations in a distributed environment raises performance challenges such as network shuffling and memory overhead. To address these challenges, this paper proposes a Spark-based RDF query architecture built on the Semantic Connection Set (SCS). First, the proposed architecture re-partitions class data on top of vertical partitioning, which can reduce memory overhead and speed up data indexing. Second, the paper proposes a method for generating query plans based on the semantic connection set. In addition, statistics-based and broadcast-variable optimization strategies are introduced to reduce shuffling and data communication costs. The experiments are based on the latest version of SPARQLGX, an RDF system on the Spark platform. Two synthetic benchmarks are used to evaluate query performance. The experimental results show that the proposed approach is more efficient in data search than the systems it is compared against.
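The two ideas named in the abstract, vertical partitioning and broadcast-based joins, can be illustrated with a minimal sketch. It uses plain Python rather than Spark, the function names and sample triples are hypothetical, and it is not the paper's SCS implementation: it only shows how grouping triples by predicate narrows the data a query touches, and how a small table can be turned into an in-memory lookup instead of shuffling the larger one.

```python
from collections import defaultdict


def vertical_partition(triples):
    """Group (subject, predicate, object) triples into one table per predicate."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return dict(tables)


def broadcast_join(small, large):
    """Join two (subject, object) tables on subject.

    The smaller table becomes an in-memory lookup, which mimics shipping
    it to every worker so the larger table does not have to be shuffled.
    """
    lookup = defaultdict(list)
    for s, o in small:
        lookup[s].append(o)
    return [
        (s, o_small, o_large)
        for s, o_large in large
        for o_small in lookup.get(s, [])
    ]


# Hypothetical sample data, not taken from the paper.
triples = [
    ("alice", "type", "Student"),
    ("bob", "type", "Professor"),
    ("alice", "takesCourse", "db101"),
    ("alice", "takesCourse", "ml201"),
    ("bob", "teaches", "db101"),
]

tables = vertical_partition(triples)

# Roughly: SELECT ?s ?c WHERE { ?s type Student . ?s takesCourse ?c }
students = [(s, o) for s, o in tables["type"] if o == "Student"]
result = broadcast_join(students, tables["takesCourse"])
```

In a real Spark job the lookup table would be distributed with a broadcast variable and each predicate table would be a separate RDD or file. The sketch keeps everything in one process so the logic is easy to follow.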

Publisher

Springer Science and Business Media LLC

Subject

Computer Networks and Communications,Computer Science Applications,Signal Processing

Link

http://link.springer.com/content/pdf/10.1186/s13638-019-1588-9.pdf

Reference: 26 articles.

1. E. Miller, An introduction to the resource description framework. Bulletin Am. Soc. Inf. Sci. Technol. 25(1), 15–19 (1998).

2. J. Pérez, M. Arenas, C. Gutierrez, Semantics and complexity of SPARQL. ACM Trans. Database Syst. (TODS). 34(3), 16 (2009).

3. T. Neumann, G. Weikum, The RDF-3X engine for scalable management of RDF data. VLDB J. 19(1), 91–113 (2010).

4. C. Weiss, P. Karras, A. Bernstein, Hexastore: sextuple indexing for semantic web data management. Proc. VLDB Endowment. 1(1), 1008–1019 (2008).

5. D. J. Abadi, A. Marcus, S. R. Madden, K. Hollenbach, SW-Store: a vertically partitioned DBMS for Semantic Web data management. VLDB J. 18(2), 385–406 (2009).

Cited by 5 articles.

1. RDF Subgraph Query Based on Common Subgraph in Distributed Environment;Wireless Communications and Mobile Computing;2023-01-13

2. Charting Past, Present, and Future Research in the Semantic Web and Interoperability;Future Internet;2022-05-25

3. DPISCAN: Distributed and parallel architecture with indexing for structural clustering of massive dynamic graphs;International Journal of Data Science and Analytics;2022-01-12

4. A scalable parallel Chinese online encyclopedia knowledge denoising method based on entry tags and Spark cluster;Applied Intelligence;2021-03-20

5. Storage, partitioning, indexing and retrieval in Big RDF frameworks: A survey;Computer Science Review;2020-11