Efficient Large-Scale Multi-Modal Classification

Published:2018-04-27 Issue:1 Volume:32
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title:AAAI

Author:

Kiela Douwe, Grave Edouard, Joulin Armand, Mikolov Tomas

Abstract

While the incipient internet was largely text-based, the modern digital world is becoming increasingly multi-modal. Here, we examine multi-modal classification where one modality is discrete, e.g. text, and the other is continuous, e.g. visual representations transferred from a convolutional neural network. In particular, we focus on scenarios where we have to be able to classify large quantities of data quickly. We investigate various methods for performing multi-modal fusion and analyze their trade-offs in terms of classification accuracy and computational efficiency. Our findings indicate that the inclusion of continuous information improves performance over text-only on a range of multi-modal classification tasks, even with simple fusion methods. In addition, we experiment with discretizing the continuous features in order to speed up and simplify the fusion process even further. Our results show that fusion with discretized features outperforms text-only classification, at a fraction of the computational cost of full multi-modal fusion, with the additional benefit of improved interpretability.
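The abstract's idea of discretizing continuous features so that they can be fused with text as ordinary tokens can be illustrated with a minimal sketch. The discretization scheme here (random-hyperplane hashing into `n_tokens` codes of `n_bits` sign bits each) and the perceptron classifier are assumptions chosen for brevity, not the authors' method, and every name (`discretize_features`, `fuse`, `BagOfTokensClassifier`) is hypothetical. In the toy data both examples share the same text, so only the image-derived tokens can separate the classes.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def discretize_features(features: Sequence[float], n_tokens: int = 4,
                        n_bits: int = 8, seed: int = 0) -> List[str]:
    """Turn a continuous feature vector into discrete pseudo-words.

    Assumed scheme (random-hyperplane hashing), not necessarily the paper's:
    each token packs the signs of n_bits random projections into an integer.
    """
    rng = random.Random(seed)  # same seed -> same hyperplanes on every call
    tokens = []
    for t in range(n_tokens):
        code = 0
        for _ in range(n_bits):
            plane = [rng.gauss(0.0, 1.0) for _ in features]
            dot = sum(p * f for p, f in zip(plane, features))
            code = (code << 1) | (1 if dot >= 0 else 0)  # append one sign bit
        tokens.append(f"__img{t}_{code}")  # prefix keeps image tokens apart from words
    return tokens


def fuse(text: str, features: Sequence[float]) -> List[str]:
    """Token-level fusion: the words of the text plus discretized image tokens."""
    return text.lower().split() + discretize_features(features)


class BagOfTokensClassifier:
    """Hypothetical multiclass perceptron over a bag of tokens."""

    def __init__(self, labels: Sequence[str]) -> None:
        self.w: Dict[str, Dict[str, float]] = {y: defaultdict(float) for y in labels}

    def score(self, tokens: List[str], label: str) -> float:
        weights = self.w[label]
        return sum(weights.get(tok, 0.0) for tok in tokens)

    def predict(self, tokens: List[str]) -> str:
        # Ties go to the first label in insertion order.
        return max(self.w, key=lambda y: self.score(tokens, y))

    def fit(self, data: List[Tuple[List[str], str]], epochs: int = 5) -> None:
        for _ in range(epochs):
            for tokens, gold in data:
                guess = self.predict(tokens)
                if guess != gold:  # perceptron update only on mistakes
                    for tok in tokens:
                        self.w[gold][tok] += 1.0
                        self.w[guess][tok] -= 1.0


if __name__ == "__main__":
    train = [(fuse("a photo", [1.0, 1.0, 1.0, 1.0]), "cat"),
             (fuse("a photo", [-1.0, -1.0, -1.0, -1.0]), "dog")]
    clf = BagOfTokensClassifier(["cat", "dog"])
    clf.fit(train)
    print(clf.predict(train[1][0]))
```

Because the image is reduced to a handful of tokens, the same sparse linear model handles both modalities without a separate continuous pathway, and each learned weight is attached to a nameable token that can be inspected. A text-only version of this toy classifier would see identical inputs for both examples and could not tell them apart.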

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 45 articles.

1. A novel multi-modal fusion method based on uncertainty-guided meta-learning;Pattern Recognition;2025-02

2. Pre-gating and contextual attention gate — A new fusion method for multi-modal data tasks;Neural Networks;2024-11

3. Knowledge Distillation and Training Balance for Heterogeneous Decentralized Multi-Modal Learning Over Wireless Networks;IEEE Transactions on Mobile Computing;2024-10

4. Multi-modal co-learning for silent speech recognition based on ultrasound tongue images;Speech Communication;2024-09

5. Scalable multimodal assessment of the micro-neighborhood using orthogonal visual inputs;Journal of Housing and the Built Environment;2024-08-19