MuMu: Cooperative Multitask Learning-Based Guided Multimodal Fusion

Published:2022-06-28 Issue:1 Volume:36 Page:1043-1051
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title:AAAI

Author:

Islam Md Mofijul,Iqbal Tariq

Abstract

Multimodal sensors (visual, non-visual, and wearable) can provide complementary information to develop robust perception systems for recognizing activities accurately. However, it is challenging to extract robust multimodal representations due to the heterogeneous characteristics of data from multimodal sensors and disparate human activities, especially in the presence of noisy and misaligned sensor data. In this work, we propose a cooperative multitask learning-based guided multimodal fusion approach, MuMu, to extract robust multimodal representations for human activity recognition (HAR). MuMu employs an auxiliary task learning approach to extract features specific to each set of activities with shared characteristics (activity-group). MuMu then utilizes activity-group-specific features to direct our proposed Guided Multimodal Fusion Approach (GM-Fusion) for extracting complementary multimodal representations, designed as the target task. We evaluated MuMu by comparing its performance to state-of-the-art multimodal HAR approaches on three activity datasets. Our extensive experimental results suggest that MuMu outperforms all the evaluated approaches across all three datasets. Additionally, the ablation study suggests that MuMu significantly outperforms the baseline models (p
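
The abstract does not give the internals of GM-Fusion, so the following is only a minimal illustrative sketch of the general idea of guided fusion: a guide vector (standing in for the activity-group-specific features from the auxiliary task) scores each modality's feature vector, and the fused representation is the softmax-weighted sum of the modality features. The function names `softmax` and `guided_fusion`, the scaled dot-product scoring, and the toy feature values are all assumptions made for illustration and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def guided_fusion(guide, modality_features):
    """Fuse per-modality feature vectors, weighted by similarity to a guide vector.

    guide: list of floats (hypothetical stand-in for activity-group features).
    modality_features: dict mapping modality name -> list of floats,
        each with the same length as guide.
    Returns (fused_vector, weights_by_modality).
    """
    names = list(modality_features)
    dim = len(guide)
    scale = math.sqrt(dim)
    # Scaled dot product between the guide and each modality's features
    # (an assumed scoring rule, not the paper's).
    scores = [
        sum(g * x for g, x in zip(guide, modality_features[name])) / scale
        for name in names
    ]
    weights = softmax(scores)
    # Weighted sum of the modality feature vectors, one dimension at a time.
    fused = [
        sum(w * modality_features[name][i] for w, name in zip(weights, names))
        for i in range(dim)
    ]
    return fused, dict(zip(names, weights))


# Toy example: the guide aligns with the "visual" features,
# so the visual modality receives the larger fusion weight.
guide = [1.0, 0.0]
features = {"visual": [2.0, 0.0], "wearable": [0.0, 2.0]}
fused, weights = guided_fusion(guide, features)
```

In this toy setting the guide steers the fusion toward whichever modality agrees with it most. A learned model would produce both the guide and the modality features with trained encoders rather than fixed vectors.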

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 11 articles.

1. Multimodal semantic enhanced representation network for micro-video event detection;Knowledge-Based Systems;2024-10

2. Perceiving a humorous robot as a social partner;Putting AI in the Critical Loop;2024

3. VADER: Vector-Quantized Generative Adversarial Network for Motion Prediction;2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS);2023-10-01

4. MAWKDN: A Multimodal Fusion Wavelet Knowledge Distillation Approach Based on Cross-View Attention for Action Recognition;IEEE Transactions on Circuits and Systems for Video Technology;2023-10

5. MMTSA;Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies;2023-09-27