Using Machine Learning and Routing Protocols for Optimizing Distributed SPARQL Queries in Collaboration

Published:2023-10-17 Issue:10 Volume:12 Page:210
ISSN:2073-431X
Container-title:Computers
language:en
Short-container-title:Computers

Author:

Warnke Benjamin¹, Fischer Stefan², Groppe Sven¹

Affiliation:

1. Institute of Information Systems, University of Luebeck, Ratzeburger Allee 160, 23562 Luebeck, Germany

2. Institute of Telematics (ITM), University of Luebeck, Ratzeburger Allee 160, 23562 Luebeck, Germany

Abstract

Due to increasing digitization, the amount of data in the Internet of Things (IoT) is constantly growing. To process queries efficiently, strategies must therefore be found that reduce the transmitted data as much as possible. SPARQL is particularly well-suited to the IoT environment because it can handle various data structures. Because of this flexibility, however, more data have to be joined again during processing. A good join order is therefore crucial, as it significantly impacts the number of intermediate results. However, computing the best join order is an NP-hard problem because the total number of possible join orders increases exponentially with the number of inputs to be combined. In addition, there are different definitions of optimal join orders. Machine learning uses stochastic methods to quickly achieve good results even on complex problems. Other DBMSs also consider reducing network traffic but neglect the network topology, which is crucial in the IoT because devices are not evenly distributed. Therefore, we present new techniques for collaboration between routing, application, and machine learning. Our approach, which pushes the operators as close as possible to the data source, reduces the produced network traffic by 10%. Additionally, the model can reduce the number of intermediate results by a factor of 100 in comparison to other state-of-the-art approaches.
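
To make the join-order argument concrete, the following is a minimal sketch and not the method of the paper. It counts the possible join orders and brute-forces a toy three-input query. The inputs `A`, `B`, `C`, their cardinalities, the selectivities, and the simple size model (product of cardinalities times the selectivities of all applicable join conditions) are assumptions made up for illustration.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

# Hypothetical cardinalities of three triple patterns (illustrative values only).
CARD = {"A": 1000, "B": 10, "C": 100}

# Hypothetical join selectivities; a missing pair means a cross product (factor 1).
SEL = {
    frozenset(("A", "B")): Fraction(1, 100),
    frozenset(("B", "C")): Fraction(1, 10),
}


def left_deep_orders(n: int) -> int:
    """Number of left-deep join orders for n inputs: n!."""
    return factorial(n)


def bushy_orders(n: int) -> int:
    """Number of bushy join trees (ordered children) for n inputs: (2n-2)!/(n-1)!."""
    return factorial(2 * n - 2) // factorial(n - 1)


def intermediate_results(order) -> Fraction:
    """Sum of the estimated result sizes after each join of a left-deep order."""
    joined = [order[0]]
    size = Fraction(CARD[order[0]])
    total = Fraction(0)
    for name in order[1:]:
        size *= CARD[name]
        # Apply every join condition between the new input and the inputs joined so far.
        for other in joined:
            size *= SEL.get(frozenset((other, name)), 1)
        joined.append(name)
        total += size
    return total


if __name__ == "__main__":
    for order in permutations(CARD):
        print(order, int(intermediate_results(order)))
    best = min(permutations(CARD), key=intermediate_results)
    print("best:", best)
    print("left-deep orders for 10 inputs:", left_deep_orders(10))
    print("bushy orders for 10 inputs:", bushy_orders(10))
```

In this toy setting, every order that starts with the cross product of `A` and `C` produces far more intermediate results than the others, although the final result has the same size. Exhaustive enumeration as done here is only feasible for a handful of inputs, which is the motivation the abstract gives for using learned models instead.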

Funder

Deutsche Forschungsgemeinschaft

Publisher

MDPI AG

Subject

Computer Networks and Communications, Human-Computer Interaction

Link

https://www.mdpi.com/2073-431X/12/10/210/pdf
