Video Abnormal Action Recognition Based on Multimodal Heterogeneous Transfer Learning

Published: 2024-01-19 | Volume: 2024 | Pages: 1-12
ISSN: 1687-5699
Journal: Advances in Multimedia
Language: English

Author:

Huang Hong-Bo¹˒², Zheng Yao-Lin¹, Hu Zhi-Ying¹

Affiliation:

1. Computer School, Beijing Information Science and Technology University, Beijing 100101, China

2. Institute of Computing Intelligence, Beijing Information Science and Technology University, Beijing 100192, China

Abstract

Human abnormal action recognition is crucial for video understanding and intelligent surveillance. However, the scarcity of labeled data for abnormal human actions often hinders the development of high-performance models. Inspired by multimodal learning, this paper proposes a novel method that leverages the text descriptions associated with abnormal human action videos. Our method exploits the correlation between the text domain and the video domain in the semantic feature space and introduces a multimodal heterogeneous transfer learning framework from the text domain to the video domain. The text descriptions of the videos are encoded into features from which knowledge is extracted; this knowledge is then transferred and shared in the feature space to assist in training the abnormal action recognition model. The proposed method reduces the reliance on labeled video data, improves the performance of abnormal human action recognition, and outperforms popular video-based models, particularly when training data are sparse. Moreover, our framework contributes to the advancement of automatic video analysis and abnormal action recognition and offers insights into applying multimodal methods in a broader context.
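The abstract does not spell out the training objective. As a rough illustration of what "knowledge transfer in the feature space" can look like, the sketch below combines a supervised classification loss on the video branch with an alignment term that pulls each video feature toward the feature of its paired text description. The function names, the cosine-distance alignment term, and the weight `lam` are hypothetical choices made for illustration; this is a generic feature-alignment objective, not the authors' formulation.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # degenerate vector: treat as unaligned
    return dot / (norm_a * norm_b)


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one sample, using the log-sum-exp trick for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def transfer_loss(video_feat: Sequence[float], text_feat: Sequence[float],
                  logits: Sequence[float], label: int, lam: float = 0.5) -> float:
    """Hypothetical per-sample objective: classification loss on the video
    plus lam * (1 - cosine similarity) between video and text features."""
    cls = cross_entropy(logits, label)
    align = 1.0 - cosine_similarity(video_feat, text_feat)
    return cls + lam * align


def batch_loss(batch: List[Tuple[Sequence[float], Sequence[float], Sequence[float], int]],
               lam: float = 0.5) -> float:
    """Mean of transfer_loss over (video_feat, text_feat, logits, label) tuples."""
    return sum(transfer_loss(v, t, z, y, lam) for v, t, z, y in batch) / len(batch)


if __name__ == "__main__":
    # Aligned features: only the classification term contributes (log 2 here).
    print(transfer_loss([1.0, 0.0], [2.0, 0.0], [0.0, 0.0], 0))
    # Orthogonal features: the alignment term adds lam * 1.0.
    print(transfer_loss([1.0, 0.0], [0.0, 1.0], [0.0, 0.0], 0))
```

In this sketch the text features act only as a training-time signal, so a video-only model can still be used at inference; `lam` trades off fitting the scarce video labels against agreeing with the text-derived feature space.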

Funder

Beijing Municipal Education Committee Scientific and Technological Planning Project

Publisher

Hindawi Limited

Link

http://downloads.hindawi.com/journals/am/2024/4187991.pdf

References: 45 articles