Domain-Specific Few-Shot Table Prompt Question Answering via Contrastive Exemplar Selection
Published:2024-06-26
Issue:7
Volume:17
Page:278
ISSN:1999-4893
Container-title:Algorithms
language:en
Short-container-title:Algorithms
Author:
Mo Tianjin (1), Xiao Qiao (2), Zhang Hongyi (2), Li Ren (2), Wu Yunsong (3)
Affiliation:
1. Business School, Chongqing College of Electronic Engineering, Chongqing 401331, China
2. School of Information Science and Engineering, Chongqing Jiaotong University, Chongqing 400074, China
3. School of Big Data and Software Engineering, Chongqing University, Chongqing 400044, China
Abstract
As a crucial task in natural language processing, table question answering has attracted significant attention from both academia and industry. It enables intelligent querying and question answering over structured data by translating natural language questions into the corresponding SQL statements. Recently, prompt learning with large language models has produced notable advances in general-domain table question answering. In specific domains, however, tables often have more columns and questions tend to be more complex, so large language models are prone to generating invalid SQL or NoSQL statements. To address this issue, this paper proposes a novel few-shot table prompt question answering approach. Specifically, we design a prompt template construction strategy for structured SQL generation. It uses prompt templates to restructure the input for each test instance and to standardize the model output, which improves the integrity and validity of the generated SQL. Furthermore, this paper introduces a contrastive exemplar selection approach based on the question patterns and formats found in domain-specific contexts. This enables the model to quickly retrieve relevant exemplars and learn the characteristics of a given question. Experimental results on two datasets, from the domains of electric energy and structural inspection, show that the proposed approach outperforms the baseline models across all comparison settings.
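The two ideas in the abstract, a fixed prompt template and retrieval of exemplars that resemble the test question, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the template or the retriever, so the template wording, the names `Exemplar`, `select_exemplars`, and `build_prompt`, and the table and column names are all hypothetical. In particular, plain token-overlap (Jaccard) similarity stands in here for the paper's contrastive exemplar selection, which this sketch does not reproduce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exemplar:
    """A (question, SQL) pair from a hypothetical labelled pool."""
    question: str
    sql: str


def tokenize(text):
    # Lowercased whitespace tokens; a crude stand-in for a learned encoder.
    return set(text.lower().split())


def jaccard(a, b):
    # Token-overlap similarity in [0, 1]; NOT the paper's contrastive scorer.
    ta, tb = tokenize(a), tokenize(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def select_exemplars(question, pool, k=2):
    # Keep the k pool entries whose questions are most similar to the input.
    ranked = sorted(pool, key=lambda e: jaccard(question, e.question), reverse=True)
    return ranked[:k]


def build_prompt(table, columns, question, exemplars):
    # Hypothetical template: schema first, then exemplars, then the test
    # question, ending on "SQL:" so the model completes a single statement.
    lines = [
        f"Table: {table}",
        "Columns: " + ", ".join(columns),
        "Answer with a single SQL statement.",
        "",
    ]
    for e in exemplars:
        lines += [f"Question: {e.question}", f"SQL: {e.sql}", ""]
    lines += [f"Question: {question}", "SQL:"]
    return "\n".join(lines)
```

Ending every prompt on the same `SQL:` cue is one simple way to standardize the output format; a learned retriever would replace `jaccard` without changing the rest of the sketch.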
Funder
Science and Technology Research Program of the Chongqing Municipal Education Commission of China
Natural Science Foundation of Chongqing, China