Semi-CNN Architecture for Effective Spatio-Temporal Learning in Action Recognition

Published: 2020-01-12
Volume: 10, Issue: 2, Page: 557
ISSN: 2076-3417
Journal: Applied Sciences
Language: English

Authors:

Leong Mei Chee, Prasad Dilip K., Lee Yong Tsui, Lin Feng

Abstract

This paper introduces a fusion convolutional architecture for efficient learning of spatio-temporal features in video action recognition. Unlike 2D convolutional neural networks (CNNs), 3D CNNs can be applied directly to consecutive frames to extract spatio-temporal features. The aim of this work is to fuse the convolution layers of 2D and 3D CNNs so that temporal information can be encoded with fewer parameters than a 3D CNN requires. We adopt transfer learning from pre-trained 2D CNNs for spatial feature extraction, followed by temporal encoding, before connecting to 3D convolution layers at the top of the architecture. We construct our fusion architecture, semi-CNN, based on three popular models (VGG-16, ResNets and DenseNets) and compare its performance with that of the corresponding 3D models. Our empirical results on the action recognition dataset UCF-101 demonstrate that our fusion of 1D, 2D and 3D convolutions outperforms the 3D model of the same depth while using fewer parameters and reducing overfitting. Our semi-CNN architecture achieved an average boost of 16–30% in top-1 accuracy when evaluated on input videos of 16 frames.
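To make the parameter argument concrete, the sketch below counts convolution weights for two networks of equal depth: an all-3D stack, and a semi-CNN-style stack whose ordering follows the abstract (2D spatial layers at the bottom, a temporal encoding step, then 3D layers at the top). Everything specific here is a hypothetical assumption, not the paper's configuration: the channel widths, the 3×3 / 3×3×3 kernels, and the choice of a single 1D convolution over time (kernel size 3) as the temporal encoding. Only weights and biases are counted. Because a 2D layer shares its kernel across all frames, its count does not grow with the 16-frame input length.

```python
from dataclasses import dataclass

# Number of axes a kernel spans: 1D = time only, 2D = height x width,
# 3D = time x height x width.
KERNEL_AXES = {"1d": 1, "2d": 2, "3d": 3}


@dataclass(frozen=True)
class ConvSpec:
    kind: str   # "1d", "2d" or "3d"
    c_in: int   # input channels
    c_out: int  # output channels
    k: int = 3  # kernel size along each axis the layer convolves over


def conv_params(spec: ConvSpec) -> int:
    """Weights plus biases of one convolution layer."""
    axes = KERNEL_AXES[spec.kind]
    return spec.c_in * spec.c_out * spec.k ** axes + spec.c_out


def build(kinds, widths, k=3):
    """Pair each layer kind with its (c_in, c_out) slot."""
    if len(kinds) != len(widths):
        raise ValueError("kinds and widths must have the same length")
    return [ConvSpec(kind, ci, co, k) for kind, (ci, co) in zip(kinds, widths)]


def total_params(layers) -> int:
    return sum(conv_params(layer) for layer in layers)


# HYPOTHETICAL VGG-style channel widths, not taken from the paper.
WIDTHS = [(3, 64), (64, 128), (128, 256), (256, 256), (256, 256), (256, 512)]

# Baseline: every layer is a 3D convolution.
ALL_3D = build(["3d"] * len(WIDTHS), WIDTHS)

# Semi-CNN-style stack of the same depth (assumed layout): three 2D spatial
# layers, one 1D temporal layer that keeps 256 channels, then two 3D layers.
SEMI = build(["2d", "2d", "2d", "1d", "3d", "3d"], WIDTHS)

if __name__ == "__main__":
    print(f"all-3D: {total_params(ALL_3D):,}  semi: {total_params(SEMI):,}")
    # prints "all-3D: 8,190,464  semi: 5,876,864"
```

Under these assumptions the semi-CNN-style stack needs roughly 72% of the all-3D weights. The saving comes entirely from the bottom layers, where a 3×3 kernel has 9 weights per channel pair against 27 for a 3×3×3 kernel, and from the temporal layer, which needs only 3. The exact ratio for the paper's VGG-16, ResNet and DenseNet variants depends on widths and layer placement not stated in this abstract.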

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/10/2/557/pdf

References (22 articles)

1. Gradient-based learning applied to document recognition

2. UCF101: A Dataset of 101 Human Actions Classes from Videos in the Wild; Soomro; arXiv; 2012

Cited by 40 articles

1. Healthcare System from Multisensor Collaboration and Human Action Recognition; Sensors and Materials; 2024-08-08

2. Enhancing Human Activity Recognition through Integrated Multimodal Analysis: A Focus on RGB Imaging, Skeletal Tracking, and Pose Estimation; Sensors; 2024-07-17

3. Exploring the power of photoplethysmogram matrix for atrial fibrillation detection with integrated explainability; Engineering Applications of Artificial Intelligence; 2024-07

4. RN‐Net: Reservoir Nodes‐Enabled Neuromorphic Vision Sensing Network; Advanced Intelligent Systems; 2024-05-23

5. RehabGPT: a MaaS-based solution to large model building for digital rehabilitation; International Workshop on Advanced Imaging Technology (IWAIT) 2024; 2024-05-02