Survey on Deep Multi-modal Data Analytics: Collaboration, Rivalry, and Fusion

Published: 2021-03-31 Issue: 1s Volume: 17 Page: 1-25
ISSN: 1551-6857
Container-title: ACM Transactions on Multimedia Computing, Communications, and Applications
Language: en
Short-container-title: ACM Trans. Multimedia Comput. Commun. Appl.

Author:

Wang Yang¹

Affiliation:

1. Hefei University of Technology, China

Abstract

With the development of web technology, multi-modal or multi-view data has emerged as a major stream of big data, where each modality/view encodes an individual property of the data objects. Often, different modalities are complementary to each other. This fact has motivated much research on fusing multi-modal feature spaces to characterize data objects comprehensively. Most existing state-of-the-art methods focus on how to fuse the energy or information from multi-modal spaces to deliver performance superior to that of their single-modal counterparts. Recently, deep neural networks have proven to be a powerful architecture for capturing the nonlinear distribution of high-dimensional multimedia data, and naturally of multi-modal data as well. Substantial empirical studies have demonstrated the advantages of deep multi-modal methods, which can essentially deepen the fusion across multi-modal deep feature spaces. In this article, we provide a substantial overview of the existing state of the art in multi-modal data analytics, from shallow to deep spaces. Throughout this survey, we further indicate that the critical components of this field are collaboration, adversarial competition, and fusion over multi-modal spaces. Finally, we share our viewpoints on some future directions in this field.

Funder

National Natural Science Foundation of China

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications, Hardware and Architecture

Link

https://dl.acm.org/doi/pdf/10.1145/3408317

References: 175 articles.

Cited by 126 articles.

1. Latent Semantic Consensus for Deterministic Geometric Model Fitting;IEEE Transactions on Pattern Analysis and Machine Intelligence;2024-09

2. MTFR: An universal multimodal fusion method through Modality Transfer and Fusion Refinement;Engineering Applications of Artificial Intelligence;2024-09

3. Image-text multimodal classification via cross-attention contextual transformer with modality-collaborative learning;Journal of Electronic Imaging;2024-08-13

4. Federated Learning Using Multi-Modal Sensors with Heterogeneous Privacy Sensitivity Levels;ACM Transactions on Multimedia Computing, Communications, and Applications;2024-08-05

5. Online multi-hypergraph fusion learning for cross-subject emotion recognition;Information Fusion;2024-08