Robust Text-to-Cypher Using Combination of BERT, GraphSAGE, and Transformer (CoBGT) Model

Author:

Tran Quoc-Bao-Huy 1, Waheed Aagha Abdul 1, Chung Sun-Tae 1

Affiliation:

1. Department of Intelligent Systems, Soongsil University, Seoul 06978, Republic of Korea

Abstract

Graph databases have become essential for managing and analyzing complex data relationships, with Neo4j emerging as a leading player in this domain. Neo4j, a high-performance NoSQL graph database, handles connected data efficiently and offers powerful querying capabilities through its Cypher query language. However, because Cypher is complex, making graph databases accessible to nonexpert users requires translating natural language queries into Cypher. In this paper, we therefore propose a text-to-Cypher model that translates natural language queries into Cypher. The proposed model combines several methods to enable nonexpert users to interact with graph databases in English. Our approach comprises three modules: key-value extraction, relation–properties prediction, and Cypher query generation. For key-value extraction and relation–properties prediction, we leverage BERT and GraphSAGE to extract features from natural language. We then use a Transformer model to generate the Cypher query from these features. Additionally, because text-to-Cypher datasets are scarce, we introduce a new dataset of English questions that query information in a graph database, each paired with its ground-truth Cypher query. This dataset supports future model training, validation, and comparison on the text-to-Cypher task. Through experiments and evaluations, we show that our model achieves high accuracy and efficiency compared with well-known seq2seq models such as T5 and GPT2, with an 87.1% exact match score on the dataset.
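The abstract reports results as an exact match score between generated and ground-truth Cypher queries. As a rough illustration of how such a metric can be computed, the following minimal Python sketch compares predicted queries with references after light normalization. The normalization rule (collapsing whitespace and dropping a trailing semicolon), the function names, and the example queries are all assumptions made for illustration; they are not taken from the paper or its dataset.

```python
import re


def normalize_cypher(query: str) -> str:
    """Collapse whitespace runs and drop a trailing semicolon.

    This normalization is an assumption; the paper may compare queries
    differently (for example, as raw strings).
    """
    collapsed = re.sub(r"\s+", " ", query).strip()
    return collapsed.rstrip(";").strip()


def exact_match_score(predictions, references):
    """Return the fraction of predictions equal to their reference query."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    matches = sum(
        normalize_cypher(pred) == normalize_cypher(ref)
        for pred, ref in zip(predictions, references)
    )
    return matches / len(references)


# Hypothetical ground-truth queries (not from the paper's dataset).
references = [
    "MATCH (p:Person {name: 'Alice'})-[:ACTED_IN]->(m:Movie) RETURN m.title",
    "MATCH (m:Movie) WHERE m.released > 2000 RETURN m.title",
    "MATCH (p:Person) RETURN count(p)",
]

# Hypothetical model outputs: the first differs only in spacing and a
# trailing semicolon, the second uses a different comparison operator.
predictions = [
    "MATCH (p:Person {name: 'Alice'})-[:ACTED_IN]->(m:Movie)  RETURN m.title;",
    "MATCH (m:Movie) WHERE m.released >= 2000 RETURN m.title",
    "MATCH (p:Person) RETURN count(p)",
]

score = exact_match_score(predictions, references)
print(f"{score:.3f}")  # two of three queries match -> 0.667
```

Exact match is a strict metric: a query that is semantically equivalent to the reference but written differently (for example, with a different variable name) still counts as a miss under this sketch.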

Publisher

MDPI AG
