Comparison Analysis of Multimodal Fusion for Dangerous Action Recognition in Railway Construction Sites

Published:2024-06-12 Issue:12 Volume:13 Page:2294
ISSN:2079-9292
Container-title:Electronics
language:en

Author:

Amel Otmane¹, Siebert Xavier², Mahmoudi Sidi Ahmed¹

Affiliation:

1. ILIA Lab, Faculty of Engineering, University of Mons, 7000 Mons, Belgium

2. Department of Mathematics and Operational Research, University of Mons, 7000 Mons, Belgium

Abstract

The growing demand for advanced tools to ensure safety in railway construction projects highlights the need for systems, such as multimodal learning algorithms, that can smoothly integrate and analyze multiple data modalities. Multimodal learning, inspired by the human brain’s ability to integrate many sensory inputs, has emerged as a promising field in artificial intelligence. Accordingly, research on multimodal fusion approaches, which have the potential to outperform standard unimodal solutions, has grown. However, integrating multiple data sources presents significant challenges that remain to be addressed. This work applies multimodal learning to detect dangerous actions using RGB-D inputs. The key contributions include the evaluation of various fusion strategies and modality encoders and the identification of the most effective methods for capturing complex cross-modal interactions. The MultConcat multimodal fusion method performed best, achieving an accuracy of 89.3%. The results also underscore the critical need for robust modality encoders and advanced fusion techniques to outperform unimodal solutions.

Funder

UMONS

Publisher

MDPI AG

Link

https://www.mdpi.com/2079-9292/13/12/2294/pdf
