Multimodal Dialog Systems with Dual Knowledge-enhanced Generative Pretrained Language Model

Author:

Chen Xiaolin 1, Song Xuemeng 2, Jing Liqiang 2, Li Shuo 2, Hu Linmei 3, Nie Liqiang 4

Affiliation:

1. School of Software, Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, China

2. School of Computer Science and Technology, Shandong University, China

3. School of Computer Science and Technology, Beijing Institute of Technology, China

4. School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), China

Abstract

Text response generation for multimodal task-oriented dialog systems, which aims to generate the proper text response given the multimodal context, is an essential yet challenging task. Although existing efforts have achieved compelling success, they still suffer from two pivotal limitations: (1) they overlook the benefit of generative pretraining, and (2) they ignore the textual context-related knowledge. To address these limitations, we propose a novel dual knowledge-enhanced generative pretrained language model for multimodal task-oriented dialog systems (DKMD), consisting of three key components: dual knowledge selection, dual knowledge-enhanced context learning, and knowledge-enhanced response generation. To be specific, the dual knowledge selection component aims to select the related knowledge according to both textual and visual modalities of the given context. Thereafter, the dual knowledge-enhanced context learning component aims to seamlessly integrate the selected knowledge into the multimodal context learning from both global and local perspectives, where the cross-modal semantic relation is also explored. Moreover, the knowledge-enhanced response generation component comprises a revised BART decoder, where an additional dot-product knowledge-decoder attention sub-layer is introduced for explicitly utilizing the knowledge to advance the text response generation. Extensive experiments on a public dataset verify the superiority of the proposed DKMD over state-of-the-art competitors.
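
The knowledge-decoder attention idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy dimensions, the `1/sqrt(d)` scaling, the residual connections, and the order of the two sub-layers are all assumptions made for illustration. The decoder's self-attention, learned projections, layer normalization, and feed-forward sub-layers are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    The sqrt(d) scaling is an assumption; the abstract only says "dot-product".
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ values

def knowledge_enhanced_decoder_step(decoder_states, context_states, knowledge_states):
    """Hypothetical decoder block with an extra knowledge attention sub-layer."""
    # Standard sub-layer: decoder states attend over the encoded context.
    h = decoder_states + dot_product_attention(
        decoder_states, context_states, context_states
    )
    # Additional sub-layer: the result attends over the selected knowledge.
    h = h + dot_product_attention(h, knowledge_states, knowledge_states)
    return h

# Toy inputs: 4 decoder positions, 6 context tokens, 3 knowledge entries, width 8.
rng = np.random.default_rng(0)
decoder_states = rng.normal(size=(4, 8))
context_states = rng.normal(size=(6, 8))
knowledge_states = rng.normal(size=(3, 8))

out = knowledge_enhanced_decoder_step(decoder_states, context_states, knowledge_states)
print(out.shape)
```

The point of the sketch is only that the knowledge representations enter the decoder through their own attention step, separate from the attention over the dialog context, so each generated position can weight knowledge entries directly.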

Funder

National Key Research and Development Project of New Generation Artificial Intelligence

National Natural Science Foundation of China

Shandong Provincial Natural Science Foundation

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Science Applications; General Business, Management and Accounting; Information Systems

Cited by 3 articles.

1. Study on updating finite element model of steel truss structure based on knowledge-enhanced deep reinforcement learning;Engineering Structures;2024-10

2. Simple but Effective Raw-Data Level Multimodal Fusion for Composed Image Retrieval;Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval;2024-07-10

3. Enabling Multi-modal Conversational Interface for Clinical Imaging;Extended Abstracts of the CHI Conference on Human Factors in Computing Systems;2024-05-11
