Multimodal Dialog Systems with Dual Knowledge-enhanced Generative Pretrained Language Model

Published:2023-11-07 Issue:2 Volume:42 Page:1-25
ISSN:1046-8188
Container-title:ACM Transactions on Information Systems
language:en
Short-container-title:ACM Trans. Inf. Syst.

Author:

Chen Xiaolin¹, Song Xuemeng², Jing Liqiang², Li Shuo², Hu Linmei³, Nie Liqiang⁴

Affiliation:

1. School of Software, Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, China

2. School of Computer Science and Technology, Shandong University, China

3. School of Computer Science and Technology, Beijing Institute of Technology, China

4. School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), China

Abstract

Text response generation for multimodal task-oriented dialog systems, which aims to generate a proper text response given the multimodal context, is an essential yet challenging task. Although existing efforts have achieved compelling success, they still suffer from two pivotal limitations: (1) they overlook the benefit of generative pretraining and (2) they ignore the textual context-related knowledge. To address these limitations, we propose a novel dual knowledge-enhanced generative pretrained language model for multimodal task-oriented dialog systems (DKMD), consisting of three key components: dual knowledge selection, dual knowledge-enhanced context learning, and knowledge-enhanced response generation. Specifically, the dual knowledge selection component selects the related knowledge according to both the textual and visual modalities of the given context. Thereafter, the dual knowledge-enhanced context learning component seamlessly integrates the selected knowledge into multimodal context learning from both global and local perspectives, where the cross-modal semantic relation is also explored. Moreover, the knowledge-enhanced response generation component comprises a revised BART decoder, where an additional dot-product knowledge-decoder attention sub-layer is introduced to explicitly utilize the knowledge and advance text response generation. Extensive experiments on a public dataset verify the superiority of the proposed DKMD over state-of-the-art competitors.
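
The abstract mentions a dot-product knowledge-decoder attention sub-layer but gives no equations. The following is only a minimal sketch of generic dot-product attention in which decoder states attend over knowledge vectors; it is not the authors' implementation. The function name `knowledge_attention`, the scaling by the square root of the dimension, the residual addition, and the absence of learned projection matrices are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def knowledge_attention(decoder_states, knowledge):
    """Hypothetical sketch: each decoder state attends over knowledge vectors.

    decoder_states: list of vectors (one per generated position)
    knowledge:      list of vectors (one per selected knowledge entry)
    Returns the decoder states with the attended knowledge added residually.
    Scaling and the residual connection are assumptions, not from the source.
    """
    dim = len(knowledge[0])
    scale = math.sqrt(dim)
    outputs = []
    for h in decoder_states:
        # Dot-product score between this decoder state and every knowledge vector.
        scores = [sum(a * b for a, b in zip(h, k)) / scale for k in knowledge]
        weights = softmax(scores)
        # Weighted sum of knowledge vectors (the attended knowledge summary).
        context = [
            sum(w * k[i] for w, k in zip(weights, knowledge)) for i in range(dim)
        ]
        # Residual addition of the knowledge summary to the decoder state.
        outputs.append([x + c for x, c in zip(h, context)])
    return outputs
```

In a real transformer decoder such a sub-layer would typically use learned query, key, and value projections and multiple heads; the sketch omits these to show only the attention arithmetic.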

Funder

National Key Research and Development Project of New Generation Artificial Intelligence

National Natural Science Foundation of China

Shandong Provincial Natural Science Foundation

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Science Applications; General Business, Management and Accounting; Information Systems

Link

https://dl.acm.org/doi/pdf/10.1145/3606368

Cited by 3 articles.

1. Study on updating finite element model of steel truss structure based on knowledge-enhanced deep reinforcement learning;Engineering Structures;2024-10

2. Simple but Effective Raw-Data Level Multimodal Fusion for Composed Image Retrieval;Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval;2024-07-10

3. Enabling Multi-modal Conversational Interface for Clinical Imaging;Extended Abstracts of the CHI Conference on Human Factors in Computing Systems;2024-05-11