Motion sensitive network for action recognition in control and decision-making of autonomous systems

Published:2024-03-25 Volume:18
ISSN:1662-453X
Container-title:Frontiers in Neuroscience
Short-container-title:Front. Neurosci.

Author:

Gu Jialiang,Yi Yang,Li Qiang

Abstract

Spatial-temporal modeling is crucial for action recognition in videos within the field of artificial intelligence. However, robustly extracting motion information remains a primary challenge because appearances deform over time and motion frequencies vary between actions. To address these issues, we propose the Motion Sensitive Network (MSN), which incorporates the theories of artificial neural networks and key concepts of autonomous system control and decision-making. Specifically, we employ a Spatial-Temporal Pyramid Motion Extraction (STP-ME) module that adjusts convolution kernel sizes and time intervals synchronously to gather motion information at different temporal scales, aligning with the learning and prediction characteristics of artificial neural networks. Additionally, we introduce a new module called Variable Scale Motion Excitation (DS-ME), which uses a differential model to capture motion information in resonance with the flexibility of autonomous system control. In particular, we employ a multi-scale deformable convolutional network to alter the motion scale of the target object before computing temporal differences across consecutive frames, providing theoretical support for the flexibility of autonomous systems. Temporal modeling is a crucial step in understanding environmental changes and actions within autonomous systems, and MSN, by integrating the advantages of artificial neural networks (ANNs) in this task, provides an effective framework for the future use of artificial neural networks in autonomous systems. We evaluate the proposed method on three challenging action recognition datasets (Kinetics-400, Something-Something V1, and Something-Something V2). The results indicate an accuracy improvement of 1.1% to 2.2% on the test set. Compared with state-of-the-art (SOTA) methods, the proposed approach achieves a maximum performance of 89.90%. In ablation experiments, the performance gain from this module ranges from 2% to 5.3%. MSN demonstrates significant potential in various challenging scenarios, providing an initial exploration into integrating artificial neural networks into the domain of autonomous systems.
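
The idea of computing temporal differences at several time intervals can be illustrated with a minimal sketch. This is not the authors' implementation: it omits the convolution kernels and deformable convolutions the abstract mentions, and the names `temporal_difference` and `motion_energy`, the flattened-frame representation, and the toy clip are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence

# Assumption: each frame is flattened to a list of pixel intensities.
Frame = List[float]


def temporal_difference(frames: Sequence[Frame], interval: int) -> List[Frame]:
    """Return frames[t + interval] - frames[t] for every valid t."""
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    return [
        [later - earlier for earlier, later in zip(frames[t], frames[t + interval])]
        for t in range(len(frames) - interval)
    ]


def motion_energy(frames: Sequence[Frame], intervals: Sequence[int]) -> Dict[int, float]:
    """Mean absolute temporal difference for each time interval (hypothetical measure)."""
    energy: Dict[int, float] = {}
    for interval in intervals:
        diffs = temporal_difference(frames, interval)
        values = [abs(v) for frame in diffs for v in frame]
        energy[interval] = sum(values) / len(values) if values else 0.0
    return energy


# Toy clip: a single pixel that flickers with a period of two frames.
clip = [[0.0], [1.0], [0.0], [1.0]]
result = motion_energy(clip, intervals=[1, 2, 3])
```

On this toy clip, an interval of two frames sees no change at all because it matches the flicker period, while intervals of one and three frames do respond. This is one way to see why a single fixed interval may miss motion at some frequencies.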

Publisher

Frontiers Media SA

Reference: 39 articles.

1. Flamingo: a visual language model for few-shot learning;Alayrac;Adv. Neur. Inf. Proc. Syst,2022

2. A short note about kinetics-600;Carreira;arXiv preprint arXiv:1808.01340,2018

3. Quo vadis, action recognition? A new model and the kinetics dataset;Carreira;Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,2017

4. Grad-cam++: Generalized gradient-based visual explanations for deep convolutional networks;Chattopadhay;2018 IEEE Winter Conference on Applications of Computer Vision (WACV),2018

5. Complementary fusion of multi-features and multi-modalities in sentiment analysis;Chen;arXiv preprint arXiv:1904.08138,2019