Integrating Spatial and Temporal Information for Violent Activity Detection from Video Using Deep Spiking Neural Networks

Published:2023-05-06 Issue:9 Volume:23 Page:4532
ISSN:1424-8220
Container-title:Sensors
language:en
Short-container-title:Sensors

Author:

Wang Xiang¹, Yang Jie¹, Kasabov Nikola K.²

Affiliation:

1. Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200400, China

2. Knowledge Engineering and Discovery Research Institute, Auckland University of Technology, Auckland 1020, New Zealand

Abstract

Increasing violence in workplaces such as hospitals seriously challenges public safety. However, it is time- and labor-consuming to visually monitor masses of video data in real time. Therefore, automatic and timely violent activity detection from videos is vital, especially for small monitoring systems. This paper proposes a two-stream deep learning architecture for video violent activity detection named SpikeConvFlowNet. First, RGB frames and their optical flow data are used as inputs for each stream to extract the spatiotemporal features of videos. After that, the spatiotemporal features from the two streams are concatenated and fed to the classifier for the final decision. Each stream utilizes a supervised neural network consisting of multiple convolutional spiking and pooling layers. Convolutional layers are used to extract high-quality spatial features within frames, and spiking neurons can efficiently extract temporal features across frames by remembering historical information. The spiking neuron-based optical flow can strengthen the capability of extracting critical motion information. This method combines their advantages to enhance the performance and efficiency for recognizing violent actions. The experimental results on public datasets demonstrate that, compared with the latest methods, this approach greatly reduces parameters and achieves higher inference efficiency with limited accuracy loss. It is a potential solution for applications in embedded devices that provide low computing power but require fast processing speeds.
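The two ideas in the abstract, spiking neurons that retain history across frames and two streams whose features are concatenated before classification, can be illustrated with a minimal sketch. This is not the SpikeConvFlowNet implementation: the leaky integrate-and-fire update, the `leak` and `threshold` values, the dense `project` step standing in for convolutional layers, and all function names are assumptions chosen for illustration.

```python
def lif_step(v, x, leak=0.9, threshold=1.0):
    """One leaky integrate-and-fire update over a list of neurons.

    Returns (new membrane potentials, spikes as 0/1).
    The leak and threshold defaults are illustrative assumptions.
    """
    new_v, spikes = [], []
    for vi, xi in zip(v, x):
        vi = leak * vi + xi                 # old potential decays, new input is added
        fired = 1 if vi >= threshold else 0
        spikes.append(fired)
        new_v.append(0.0 if fired else vi)  # hard reset after a spike
    return new_v, spikes


def project(frame, weights):
    """Dense projection, a hypothetical stand-in for the convolutional layers."""
    return [sum(f * w for f, w in zip(frame, row)) for row in weights]


def stream_features(frames, weights):
    """Firing rate of each neuron over a clip.

    The membrane potential persists from frame to frame, which is how
    the neuron carries historical information forward.
    """
    v = [0.0] * len(weights)
    counts = [0] * len(weights)
    for frame in frames:
        v, s = lif_step(v, project(frame, weights))
        counts = [c + si for c, si in zip(counts, s)]
    return [c / len(frames) for c in counts]


def two_stream_features(rgb_frames, flow_frames, w_rgb, w_flow):
    """Concatenate RGB-stream and optical-flow-stream features for a classifier."""
    return stream_features(rgb_frames, w_rgb) + stream_features(flow_frames, w_flow)
```

Each frame here is a flat list of values and each weight matrix is a list of rows, one per output neuron. Calling `two_stream_features` on a short clip returns one firing rate per neuron from both streams, which a downstream classifier would consume.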

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry

Link

https://www.mdpi.com/1424-8220/23/9/4532/pdf

References: 49 articles.

1. Workplace violence in healthcare settings: The risk factors, implications and collaborative preventive measures;Lim;Ann. Med. Surg.,2022

2. Workplace violence among healthcare workers during COVID-19 pandemic in a Jordanian governmental hospital: The tip of the iceberg;Ghareeb;Environ. Sci. Pollut. Res.,2021

3. Big data analytics for video surveillance;Subudhi;Multimed. Tools Appl.,2019

4. Chen, L.H., Hsu, H.W., Wang, L.Y., and Su, C.W. (2011, January 17–19). Violence detection in movies. Proceedings of the 2011 Eighth International Conference Computer Graphics, Imaging and Visualization, Singapore.

5. Rodríguez-Moreno, I., Martínez-Otzeta, J.M., Sierra, B., Rodriguez, I., and Jauregi, E. (2019). Video activity recognition: State-of-the-art. Sensors, 19.

Cited by 3 articles.

1. Threshold Active Learning Approach for Physical Violence Detection on Images Obtained from Video (Frame-Level) Using Pre-Trained Deep Learning Neural Network Models;Algorithms;2024-07-18

2. Resstanet: deep residual spatio-temporal attention network for violent action recognition;International Journal of Information Technology;2024-03-25

3. Integrating Spatial and Temporal Contextual Information for Improved Video Visualization;Lecture Notes in Networks and Systems;2024