Dynamic Multimodal Fusion via Meta-Learning Towards Micro-Video Recommendation

Published: 2023-11-08 Issue: 2 Volume: 42 Page: 1-26
ISSN: 1046-8188
Container-title: ACM Transactions on Information Systems
Language: en
Short-container-title: ACM Trans. Inf. Syst.

Author:

Liu Han¹, Wei Yinwei², Liu Fan², Wang Wenjie², Nie Liqiang³, Chua Tat-Seng²

Affiliation:

1. Shandong University, China

2. National University of Singapore, Singapore

3. Harbin Institute of Technology (Shenzhen), China

Abstract

Multimodal information (e.g., visual, acoustic, and textual) has been widely used to enhance representation learning for micro-video recommendation. Multimodal fusion, which integrates this information into a joint representation of each micro-video, plays a vital role in existing micro-video recommendation approaches. However, the static multimodal fusion used in previous studies is insufficient to model the varied relationships among the multimodal information of different micro-videos. In this article, we develop a novel meta-learning-based multimodal fusion framework called Meta Multimodal Fusion (MetaMMF), which dynamically assigns parameters to the multimodal fusion function for each micro-video during its representation learning. Specifically, MetaMMF regards the multimodal fusion of each micro-video as an independent task. Based on the meta information extracted from the multimodal features of the input task, MetaMMF parameterizes a neural network as the item-specific fusion function via a meta learner. We perform extensive experiments on three benchmark datasets, demonstrating significant improvements over several state-of-the-art multimodal recommendation models, such as MMGCN, LATTICE, and InvRL. Furthermore, we lighten our model by adopting canonical polyadic decomposition to improve training efficiency, and validate its effectiveness through experimental results. Code is available at https://github.com/hanliu95/MetaMMF.
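
The idea of item-specific fusion described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the linked repository for that); it only shows the general pattern under simplifying assumptions: a single linear fusion layer, a `tanh` meta extractor, and random weights. All names (`extract_meta`, `fuse_full`, `fuse_cp`, `W_meta`, `A`, `B`, `C`) and all dimensions are hypothetical. The sketch also shows why a canonical polyadic (CP) factorization of the weight-generating tensor can reduce the parameter count: the item-specific weight matrix never has to be materialized.

```python
import numpy as np


def extract_meta(x, W_meta):
    """Compress concatenated multimodal features into meta information.

    Assumption: a single tanh layer stands in for the meta extractor.
    """
    return np.tanh(W_meta @ x)


def fuse_full(x, m, T):
    """Dynamic fusion with a full 3-D tensor T of shape (d_meta, d_out, d_in).

    The meta learner turns the meta information m into an item-specific
    weight matrix W_item, which is then applied to the item's features.
    """
    W_item = np.einsum("k,koi->oi", m, T)  # W_item[o, i] = sum_k m[k] * T[k, o, i]
    return W_item @ x


def fuse_cp(x, m, A, B, C):
    """Same fusion when T is given by rank-R CP factors.

    T[k, o, i] = sum_r A[k, r] * B[o, r] * C[i, r], so
    W_item @ x = B @ ((A.T @ m) * (C.T @ x)) without building W_item.
    """
    return B @ ((A.T @ m) * (C.T @ x))


rng = np.random.default_rng(0)
d_mod, d_meta, d_out, rank = 32, 8, 16, 4  # hypothetical sizes
d_in = 3 * d_mod                           # visual + acoustic + textual

# Random stand-ins for one micro-video's modality features.
visual, acoustic, textual = (rng.normal(size=d_mod) for _ in range(3))
x = np.concatenate([visual, acoustic, textual])

W_meta = rng.normal(size=(d_meta, d_in)) / np.sqrt(d_in)
A = rng.normal(size=(d_meta, rank))
B = rng.normal(size=(d_out, rank))
C = rng.normal(size=(d_in, rank))

m = extract_meta(x, W_meta)

# Build the full tensor from the CP factors only to compare both paths.
T = np.einsum("kr,or,ir->koi", A, B, C)

fused_full = fuse_full(x, m, T)
fused_cp = fuse_cp(x, m, A, B, C)

full_params = d_meta * d_out * d_in         # parameters in the full tensor
cp_params = rank * (d_meta + d_out + d_in)  # parameters in the CP factors

print(np.allclose(fused_full, fused_cp), full_params, cp_params)
```

Because `m` depends on the item's own features, each micro-video effectively gets its own fusion weights, in contrast to a static fusion layer that shares one weight matrix across all items. In this toy setting the factorized form is exact because `T` is constructed from the factors; in a trained model the rank would be a hyperparameter trading accuracy for efficiency.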

Funder

National Natural Science Foundation of China

Program of China Scholarship Council

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Science Applications; General Business, Management and Accounting; Information Systems

Link

https://dl.acm.org/doi/pdf/10.1145/3617827
