1. Armen Aghajanyan, Dmytro Okhonko, Mike Lewis, Mandar Joshi, Hu Xu, Gargi Ghosh, and Luke Zettlemoyer. 2021. HTLM: Hyper-text pre-training and prompting of language models. arXiv:2107.06955. Retrieved from https://arxiv.org/abs/2107.06955.
2. Zeyuan Allen-Zhu and Yuanzhi Li. 2020. Towards understanding ensemble knowledge distillation and self-distillation in deep learning. arXiv:2012.09816. Retrieved from https://arxiv.org/abs/2012.09816.
3. Devansh Arpit, Stanislaw Jastrzebski, Nicolas Ballas, David Krueger, Emmanuel Bengio, Maxinder S. Kanwal, Tegan Maharaj, Asja Fischer, Aaron Courville, Yoshua Bengio, et al. 2017. A closer look at memorization in deep networks. In Proceedings of the International Conference on Machine Learning. PMLR, 233–242.
4. Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of the 3rd International Conference on Learning Representations (ICLR’15), Yoshua Bengio and Yann LeCun (Eds.). http://arxiv.org/abs/1409.0473.
5. Eyal Ben-David, Nadav Oved, and Roi Reichart. 2022. PADA: Example-based prompt learning for on-the-fly adaptation to unseen domains. Transactions of the Association for Computational Linguistics 10 (2022).