Self-PT: Adaptive Self-Prompt Tuning for Low-Resource Visual Question Answering

Published: 2023-10-26
Container-title: Proceedings of the 31st ACM International Conference on Multimedia

Author:

Yuan Bowen¹, You Sisi¹, Bao Bing-Kun²

Affiliation:

1. Nanjing University of Posts and Telecommunications, Nanjing, China

2. Nanjing University of Posts and Telecommunications & Peng Cheng Laboratory, Nanjing & Shenzhen, China

Funder

National Key Research and Development Project

National Natural Science Foundation of China

Opening Foundation of Key Laboratory of Computer Vision and System, Ministry of Education, Tianjin University of Technology, China

Graduate Research and Innovation Projects in Jiangsu Province

Publisher

ACM

Link

https://dl.acm.org/doi/pdf/10.1145/3581783.3612222

References (54 articles)

1. VQA: Visual Question Answering

2. Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems, Vol. 33 (2020), 1877–1901.

3. Revisiting Parameter-Efficient Tuning: Are We Really There Yet?

4. Jaemin Cho, Jie Lei, Hao Tan, and Mohit Bansal. 2021. Unifying vision-and-language tasks via text generation. In International Conference on Machine Learning. PMLR, 1931–1942.

5. An Empirical Study of Training End-to-End Vision-and-Language Transformers