Weakly-Supervised Video Anomaly Detection with MTDA-Net

Published:2023-11-12 Issue:22 Volume:12 Page:4623
ISSN:2079-9292
Container-title:Electronics
language:en
Short-container-title:Electronics

Author:

Wu Huixin¹, Yang Mengfan¹, Wei Fupeng¹, Shi Ge¹, Jiang Wei¹, Qiao Yaqiong¹, Dong Hangcheng²

Affiliation:

1. School of Information Engineering, North China University of Water Resources and Electric Power, Zhengzhou 450046, China

2. School of Instrumentation Science and Engineering, Harbin Institute of Technology, Harbin 150001, China

Abstract

Weakly supervised anomalous behavior detection is currently an active research area. Compared with semi-supervised anomalous behavior detection, weakly supervised learning eliminates the need to crop videos and avoids the difficulty that semi-supervised learning has in handling long videos. Previous work has used graph convolution or self-attention mechanisms to model temporal relationships. However, these methods tend to model temporal relationships at a single scale and do not consider how different temporal relationships should be aggregated. In this paper, we propose a weakly supervised anomaly detection framework, MTDA-Net, which emphasizes modeling different temporal relationships and enhancing semantic discrimination. To this end, we construct a new plug-and-play module, MTDA, which uses three branches, Multi-headed Attention (MHA), Temporal Shift (TS), and Dilated Aggregation (DA), to extract different kinds of temporal information. Specifically, the MHA branch models the video information globally and projects the features into different semantic spaces to enhance their expressiveness and discrimination. The DA branch extracts temporal information at different scales via dilated convolution and captures the temporal features of local regions in the video. The TS branch fuses the features of adjacent frames at a local scale and enhances the information flow. MTDA-Net learns the temporal relationships between video segments on the different branches and learns powerful video representations based on these relationships. Experimental results on the XD-Violence dataset show that MTDA-Net can significantly improve the detection accuracy of abnormal behaviors.
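
The three branches named in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. It assumes the input is a list of T segment features with C channels each. The attention branch uses identity projections instead of learned ones, the dilated branch uses uniform averaging instead of learned convolution kernels, and the final element-wise mean over the branches is a hypothetical fusion step, since the abstract does not say how the branches are combined. All function names and default parameters are illustrative.

```python
import math


def multi_head_attention(x, num_heads=2):
    """Global branch: scaled dot-product self-attention per head.

    Assumption: identity Q/K/V projections (no learned weights) and
    C divisible by num_heads.
    """
    T, C = len(x), len(x[0])
    d = C // num_heads
    out = [[0.0] * C for _ in range(T)]
    for h in range(num_heads):
        lo, hi = h * d, (h + 1) * d  # channel slice owned by this head
        for t in range(T):
            scores = [
                sum(x[t][c] * x[s][c] for c in range(lo, hi)) / math.sqrt(d)
                for s in range(T)
            ]
            m = max(scores)  # subtract the max for a numerically stable softmax
            w = [math.exp(v - m) for v in scores]
            z = sum(w)
            for c in range(lo, hi):
                out[t][c] = sum(w[s] * x[s][c] for s in range(T)) / z
    return out


def temporal_shift(x, fold_div=4):
    """Local branch: TSM-style shift along the time axis.

    The first C // fold_div channels are taken from the next segment,
    the following C // fold_div from the previous segment, and the rest
    stay in place. Positions with no neighbour are zero-filled.
    """
    T, C = len(x), len(x[0])
    fold = C // fold_div
    out = [[0.0] * C for _ in range(T)]
    for t in range(T):
        for c in range(C):
            if c < fold:
                src = t + 1
            elif c < 2 * fold:
                src = t - 1
            else:
                src = t
            if 0 <= src < T:
                out[t][c] = x[src][c]
    return out


def dilated_aggregation(x, dilations=(1, 2, 4)):
    """Multi-scale branch: average each segment with its neighbours at
    distance d for every dilation d (uniform weights, an assumption)."""
    T, C = len(x), len(x[0])
    out = [[0.0] * C for _ in range(T)]
    for t in range(T):
        idx = [t] + [
            t + s * d for d in dilations for s in (-1, 1) if 0 <= t + s * d < T
        ]
        for c in range(C):
            out[t][c] = sum(x[i][c] for i in idx) / len(idx)
    return out


def mtda_sketch(x):
    """Hypothetical fusion: element-wise mean of the three branch outputs."""
    branches = [multi_head_attention(x), temporal_shift(x), dilated_aggregation(x)]
    T, C = len(x), len(x[0])
    return [[sum(b[t][c] for b in branches) / 3 for c in range(C)] for t in range(T)]
```

Calling `mtda_sketch(features)` on a T×C list of lists returns a list of the same shape. In a trained model each branch would carry learned parameters, so this sketch only shows how the three branches see different temporal contexts: all segments, adjacent segments, and neighbours at several dilation rates.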

Funder

National Natural Science Foundation of China

Key Research Projects of Henan Higher Education Institutions

Open Foundation of Henan Key Laboratory of Cyberspace Situation Awareness

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering,Computer Networks and Communications,Hardware and Architecture,Signal Processing,Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/12/22/4623/pdf
