Abstract
The two-stream convolutional neural network (CNN) has proven a great success in action recognition in videos. The main idea is to train two CNNs to learn spatial and temporal features separately, and then combine the two streams' scores to obtain the final prediction. In the literature, we observe that most methods use similar CNNs for the two streams. In this paper, we design a two-stream CNN architecture that uses different CNNs for the two streams to learn spatial and temporal features. Temporal Segment Networks (TSN) are applied to capture long-range temporal structure and to distinguish similar sub-actions in videos. Data augmentation techniques are employed to prevent over-fitting. Advanced cross-modal pre-training is discussed and introduced into the proposed architecture to enhance action recognition accuracy. The proposed two-stream model is evaluated on two challenging action recognition datasets: HMDB-51 and UCF-101. The results show that the proposed architecture achieves a significant performance increase and outperforms existing methods.
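The following is a minimal sketch of the general two-stream idea with TSN-style segmental consensus, assuming a PyTorch setting. The backbone choices (ResNet-50 for the RGB stream, ResNet-18 adapted for stacked optical flow), the number of segments, the flow-stack depth, and the fusion weights are illustrative placeholders for the paper's "different CNNs per stream" design, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models


class TwoStreamTSN(nn.Module):
    """Sketch: two different backbones, TSN segmental consensus, late score fusion."""

    def __init__(self, num_classes: int, num_segments: int = 3, flow_stack: int = 10):
        super().__init__()
        self.num_segments = num_segments

        # Spatial stream: standard RGB backbone (placeholder choice).
        self.spatial = models.resnet50(weights=None)
        self.spatial.fc = nn.Linear(self.spatial.fc.in_features, num_classes)

        # Temporal stream: a different backbone; first conv adapted to take
        # a stack of 2 * flow_stack optical-flow channels (x/y components).
        self.temporal = models.resnet18(weights=None)
        self.temporal.conv1 = nn.Conv2d(2 * flow_stack, 64, kernel_size=7,
                                        stride=2, padding=3, bias=False)
        self.temporal.fc = nn.Linear(self.temporal.fc.in_features, num_classes)

    def _consensus(self, backbone, snippets):
        # snippets: (batch, num_segments, C, H, W) -> per-snippet class scores,
        # averaged over segments (TSN's segmental consensus).
        b, k, c, h, w = snippets.shape
        scores = backbone(snippets.view(b * k, c, h, w))
        return scores.view(b, k, -1).mean(dim=1)

    def forward(self, rgb_snippets, flow_snippets,
                spatial_weight: float = 1.0, temporal_weight: float = 1.5):
        spatial_scores = self._consensus(self.spatial, rgb_snippets)
        temporal_scores = self._consensus(self.temporal, flow_snippets)
        # Late fusion: weighted sum of the two streams' class scores
        # (weights here are assumed, not taken from the paper).
        return spatial_weight * spatial_scores + temporal_weight * temporal_scores


if __name__ == "__main__":
    model = TwoStreamTSN(num_classes=101)        # e.g. UCF-101
    rgb = torch.randn(2, 3, 3, 224, 224)         # (batch, segments, 3, H, W)
    flow = torch.randn(2, 3, 20, 224, 224)       # (batch, segments, 2*10, H, W)
    print(model(rgb, flow).shape)                # torch.Size([2, 101])
```

In this sketch each video contributes one snippet per segment, per-snippet scores are averaged within each stream, and the streams are fused only at the score level, mirroring the late-fusion strategy described in the abstract.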
Subject
Information Systems and Management, Computer Science Applications, Information Systems
Cited by
7 articles.