Exploring Prompts in Few-Shot Cross-Linguistic Topic Classification Scenarios
Published: 2023-09-02
Issue: 17
Volume: 13
Page: 9944
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences
Author:
Zhang Zhipeng 1,2; Liu Shengquan 1,2; Cheng Jianming 1,2
Affiliation:
1. College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
2. Xinjiang Multilingual Information Technology Laboratory, College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
Abstract
In recent years, large-scale pretrained language models have become widely used in natural language processing tasks. On this basis, prompt learning has achieved excellent performance in few-shot classification scenarios. The core idea of prompt learning is to convert a downstream task into a masked language modelling task. However, different prompt templates can greatly affect the results, and finding an appropriate template is difficult and time-consuming. To this end, this study proposes a novel hybrid prompt approach, which combines discrete prompts and continuous prompts, to encourage the model to learn more semantic knowledge from a small number of training samples. By comparing the performance of discrete prompts and continuous prompts, we find that hybrid prompts achieve the best results, reaching an F1 score of 73.82% on the test set. In addition, we analyze the effect of different virtual-token lengths in continuous prompts and hybrid prompts in a few-shot cross-language topic classification scenario. The results demonstrate that there is a threshold for the number of virtual tokens: too many virtual tokens degrade model performance, and their number should not exceed the average length of the texts in the training corpus. Finally, this paper designs a method based on vector similarity to explore the real meanings represented by the virtual tokens. The experimental results show that the prompt automatically learnt through the virtual tokens has a certain correlation with the input text.
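To make the hybrid-prompt idea concrete, the following is a minimal sketch (not the paper's released code) of how trainable "virtual token" embeddings (the continuous part) can be prepended to a hand-written discrete template, with the class read off the masked-LM logits at the [MASK] position. The backbone name, template wording, label words, and virtual-token count are illustrative assumptions, not the paper's exact configuration.

```python
# Hybrid prompt sketch: continuous virtual tokens + a discrete text template.
# Assumptions (not from the paper): model name, template, verbalizer words.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL = "bert-base-multilingual-cased"  # assumed multilingual backbone
tokenizer = AutoTokenizer.from_pretrained(MODEL)
mlm = AutoModelForMaskedLM.from_pretrained(MODEL)

N_VIRTUAL = 4  # kept small; the abstract reports that too many virtual tokens hurt
embed_dim = mlm.get_input_embeddings().embedding_dim
virtual = nn.Parameter(torch.randn(N_VIRTUAL, embed_dim) * 0.02)  # trained in few-shot tuning

template = "Topic: {mask}. {text}"       # discrete part of the hybrid prompt
LABEL_WORDS = ["sports", "politics"]     # illustrative verbalizer tokens

def label_scores(text: str) -> torch.Tensor:
    prompt = template.format(mask=tokenizer.mask_token, text=text)
    enc = tokenizer(prompt, return_tensors="pt")
    tok_embeds = mlm.get_input_embeddings()(enc["input_ids"])       # (1, L, d)
    # Splice the trainable virtual tokens in right after [CLS].
    hybrid = torch.cat([tok_embeds[:, :1], virtual.unsqueeze(0), tok_embeds[:, 1:]], dim=1)
    attn = torch.ones(hybrid.shape[:2], dtype=torch.long)
    logits = mlm(inputs_embeds=hybrid, attention_mask=attn).logits  # (1, L+N, V)
    # The [MASK] position shifted right by N_VIRTUAL after the splice.
    mask_pos = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0, 0] + N_VIRTUAL
    label_ids = tokenizer.convert_tokens_to_ids(LABEL_WORDS)
    return logits[0, mask_pos, label_ids]   # one score per candidate topic
```

In few-shot tuning, only `virtual` (and optionally the backbone) would be updated against a cross-entropy loss over these label-word scores.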
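The abstract's vector-similarity probe can be sketched the same way: interpret each learned virtual token by finding the vocabulary tokens whose input embeddings are most cosine-similar to it. This is a plausible reading of the method, not the paper's exact procedure; `virtual`, `mlm`, and `tokenizer` are assumed to come from the sketch above.

```python
# Vector-similarity probe: nearest vocabulary tokens for each virtual token.
import torch
import torch.nn.functional as F

def nearest_vocab_tokens(virtual: torch.Tensor, mlm, tokenizer, k: int = 5):
    vocab = mlm.get_input_embeddings().weight   # (V, d) vocabulary embeddings
    v = F.normalize(virtual, dim=-1)            # unit-normalise both sides so a
    e = F.normalize(vocab, dim=-1)              # dot product is cosine similarity
    sims = v @ e.T                              # (N_VIRTUAL, V)
    top_ids = sims.topk(k, dim=-1).indices      # k nearest token ids per virtual token
    return [tokenizer.convert_ids_to_tokens(row.tolist()) for row in top_ids]

# e.g. nearest_vocab_tokens(virtual.detach(), mlm, tokenizer) might surface
# topic-related wordpieces if the continuous prompt has picked up task semantics.
```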
Funder
Key Projects of the Xinjiang Education Department Foundation's Scientific Research Program
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science