Abstract
Subgraph searching is a fundamental operation for the analysis and exploration of graphs. Molecular databases are now approaching one hundred million molecules. Since finding all the data graphs in a graph database that contain the query graph using subgraph isomorphism is an NP-complete problem, indexes are built and used to process queries. Further, to assist the user in formulating the query, the visual exploratory subgraph query paradigm proposes a graphical user interface and leverages exploration time to reduce query processing time. However, state-of-the-art approaches do not scale well to dynamic graph databases and suffer from efficiency problems. In addition, the existing Summarisation-based frequent subgraph mining for visual exploratory subgraph searching (SuMExplorer) lacks an implementation and an evaluation study for visual subgraph similarity search and modify operations. In this paper, we present a novel index structure that aids subgraph searching using summarisation-based weighted frequent subgraph mining on data graphs. Because the indexes are structure-preserving, we exploit them to support similarity and modify operations. We conduct extensive performance studies on both real-world and synthetic datasets to compare the overall performance of the extended SuMExplorer against the recent visual exploratory approach FERRARI and traditional subgraph search algorithms (such as gIndex and GRAPES-DD). Our results show that our indexes answer queries up to 3 times faster than FERRARI while reducing the storage footprint by 2 orders of magnitude.
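To illustrate why indexes help with subgraph search in general, the following is a minimal filter-and-verify sketch in Python. It is not the SuMExplorer index or any of the compared systems: the feature choice (pairs of endpoint labels per edge) and all names (`Graph`, `build_index`, `contains`, `search`) are assumptions made for illustration only. The index cheaply discards data graphs that cannot contain the query, and the expensive backtracking test runs only on the remaining candidates.

```python
from collections import defaultdict


class Graph:
    """Undirected labeled graph (hypothetical helper for this sketch)."""

    def __init__(self, labels, edges):
        self.labels = dict(labels)                  # node id -> label
        self.adj = {n: set() for n in self.labels}  # node id -> neighbours
        for u, v in edges:
            self.adj[u].add(v)
            self.adj[v].add(u)

    def edge_features(self):
        # Assumed feature: the sorted pair of endpoint labels of each edge.
        return {
            tuple(sorted((self.labels[u], self.labels[v])))
            for u in self.adj
            for v in self.adj[u]
        }


def build_index(database):
    """Inverted index: feature -> ids of data graphs containing it."""
    index = defaultdict(set)
    for gid, g in database.items():
        for feature in g.edge_features():
            index[feature].add(gid)
    return index


def contains(data, query):
    """Backtracking test for (non-induced) subgraph isomorphism."""
    q_nodes = list(query.labels)

    def extend(mapping):
        if len(mapping) == len(q_nodes):
            return True
        q = q_nodes[len(mapping)]
        used = set(mapping.values())
        for d in data.labels:
            if d in used or data.labels[d] != query.labels[q]:
                continue
            # Every already-mapped neighbour of q must be adjacent to d.
            if all(mapping[n] in data.adj[d]
                   for n in query.adj[q] if n in mapping):
                mapping[q] = d
                if extend(mapping):
                    return True
                del mapping[q]
        return False

    return extend({})


def filter_candidates(database, index, query):
    """Filter step: keep graphs that contain every query feature."""
    candidates = set(database)
    for feature in query.edge_features():
        candidates &= index.get(feature, set())
    return candidates


def search(database, index, query):
    """Verify step: run the exact test only on the filtered candidates."""
    candidates = filter_candidates(database, index, query)
    return sorted(g for g in candidates if contains(database[g], query))


# Toy database: g3 passes the filter but fails verification.
db = {
    "g1": Graph({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)]),
    "g2": Graph({0: "C", 1: "N"}, [(0, 1)]),
    "g3": Graph({0: "C", 1: "C", 2: "C", 3: "O"}, [(0, 1), (2, 3)]),
}
idx = build_index(db)
query = Graph({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
matches = search(db, idx, query)
```

In this toy example the filter removes `g2` without any isomorphism test, while `g3` survives the filter and is rejected only by the exact check. Richer features, such as the frequent subgraphs mined by the approaches named in the abstract, would typically prune more candidates at the cost of a larger index.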
Funder
Deutscher Akademischer Austauschdienst
Johann Wolfgang Goethe-Universität, Frankfurt am Main
Publisher
Springer Science and Business Media LLC