Enhancing SPARQL Query Generation for Knowledge Base Question Answering Systems by Learning to Correct Triplets

Published: 2024-02-14, Issue: 4, Volume: 14, Page: 1521
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences

Authors:

Qi Jiexing¹, Su Chang¹, Guo Zhixin¹, Wu Lyuwen¹, Shen Zanwei¹, Fu Luoyi¹, Wang Xinbing¹, Zhou Chenghu¹,²

Affiliation:

1. School of Electronic, Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

2. Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China

Abstract

Generating SPARQL queries from natural language questions is challenging in Knowledge Base Question Answering (KBQA) systems. The current state-of-the-art models rely heavily on fine-tuning pretrained models such as T5. However, these methods still encounter critical issues such as triple-flip errors (e.g., (subject, relation, object) is predicted as (object, relation, subject)). To address this limitation, we introduce TSET (Triplet Structure Enhanced T5), a model with a novel pretraining stage positioned between the initial T5 pretraining and the fine-tuning for the Text-to-SPARQL task. In this intermediary stage, we introduce a new objective called Triplet Structure Correction (TSC) to train the model on a SPARQL corpus derived from Wikidata. This objective aims to deepen the model's understanding of the order of triplets. After this specialized pretraining, the model is fine-tuned for SPARQL query generation, which further improves its query-generation capabilities. We also propose a method named "semantic transformation" to strengthen the model's grasp of SPARQL syntax and semantics without compromising the pretrained weights of T5. Experimental results demonstrate that TSET outperforms existing methods on three well-established KBQA datasets, LC-QuAD 2.0, QALD-9 plus, and QALD-10, establishing new state-of-the-art results (95.0% F1 and 93.1% QM on LC-QuAD 2.0, 75.85% F1 and 61.76% QM on QALD-9 plus, 51.37% F1 and 40.05% QM on QALD-10).
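The abstract does not spell out how TSC training pairs are constructed. A minimal sketch, assuming that a corrupted input is produced by swapping the subject and object of one randomly chosen triple pattern (the triple-flip error described above) and that the model is trained to map the corrupted query back to the original, could look like the following. The helpers flip_triple and make_tsc_pair, the regular expression, and the single-swap corruption are hypothetical illustrations, not the authors' implementation; the actual TSC noising scheme may select and corrupt triplets differently. The pattern only recognizes plain whitespace-separated triples whose terms are variables (?answer) or prefixed names (wd:Q42).

```python
import random
import re

# A term is a SPARQL variable (?answer) or a prefixed name (wd:Q42, wdt:P69).
# Assumption: only these simple forms are recognized; full IRIs in <...>,
# literals, the "a" shorthand and property paths are ignored.
TERM = r"(?:\?\w+|\w+:\w+)"
# One triple pattern: subject, predicate, object separated by whitespace.
TRIPLE = re.compile(rf"({TERM})\s+({TERM})\s+({TERM})")


def flip_triple(query: str, rng: random.Random) -> str:
    """Swap subject and object in one randomly chosen triple pattern (hypothetical helper)."""
    matches = list(TRIPLE.finditer(query))
    if not matches:
        return query  # nothing recognizable to corrupt
    m = rng.choice(matches)
    subj, pred, obj = m.groups()
    # Rebuild the query with only the chosen triple reversed.
    return query[: m.start()] + f"{obj} {pred} {subj}" + query[m.end():]


def make_tsc_pair(query: str, seed: int = 0) -> tuple[str, str]:
    """Return (corrupted input, original target) for a correction-style objective (hypothetical)."""
    rng = random.Random(seed)
    return flip_triple(query, rng), query


if __name__ == "__main__":
    noisy, target = make_tsc_pair("SELECT ?answer WHERE { wd:Q42 wdt:P69 ?answer }")
    print(noisy)   # SELECT ?answer WHERE { ?answer wdt:P69 wd:Q42 }
    print(target)  # SELECT ?answer WHERE { wd:Q42 wdt:P69 ?answer }
```

A subject/object swap keeps every token of the query intact, so under this assumed scheme the model can only recover the target by learning which position in a triple each entity and variable belongs to.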

Funder

NSF China

Publisher

MDPI AG

Link

https://www.mdpi.com/2076-3417/14/4/1521/pdf

References (52 articles)

1. Shadbolt, N., et al. (2006). The semantic web revisited. IEEE Intell. Syst.

2. Hitzler, P. (2021). A review of the semantic web field. Commun. ACM.

3. Boumechaal, H., and Boufaida, Z. (2023). Complex Queries for Querying Linked Data. Future Internet, 15.

4. Zhang, C., Zha, D., Wang, L., Mu, N., Yang, C., Wang, B., and Xu, F. (2023). Graph Convolution Network over Dependency Structure Improve Knowledge Base Question Answering. Electronics, 12.

5. Hu, S., Zhang, H., and Zhang, W. (2023). Domain Knowledge Graph Question Answering Based on Semantic Analysis and Data Augmentation. Appl. Sci., 13.