TEINet: Towards an Efficient Architecture for Video Recognition

Published:2020-04-03 Issue:07 Volume:34 Page:11669-11676
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title:AAAI

Author:

Liu Zhaoyang,Luo Donghao,Wang Yabiao,Wang Limin,Tai Ying,Wang Chengjie,Li Jilin,Huang Feiyue,Lu Tong

Abstract

Efficiency is an important issue in designing video architectures for action recognition. 3D CNNs have made remarkable progress in action recognition from videos. However, compared with their 2D counterparts, 3D convolutions often introduce a large number of parameters and incur a high computational cost. To alleviate this problem, we propose an efficient temporal module, termed the Temporal Enhancement-and-Interaction (TEI) module, which can be plugged into existing 2D CNNs (the resulting network is denoted TEINet). The TEI module presents a different paradigm for learning temporal features by decoupling the modeling of channel correlation and temporal interaction. First, it contains a Motion Enhanced Module (MEM), which enhances motion-related features while suppressing irrelevant information (e.g., background). Then, it introduces a Temporal Interaction Module (TIM), which supplements temporal contextual information in a channel-wise manner. This two-stage modeling scheme not only captures temporal structure flexibly and effectively, but is also efficient for model inference. We conduct extensive experiments to verify the effectiveness of TEINet on several benchmarks (e.g., Something-Something V1&V2, Kinetics, UCF101 and HMDB51). The proposed TEINet achieves good recognition accuracy on these datasets while preserving high efficiency.
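
The two-stage idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes each frame has already been reduced to one pooled value per channel, it replaces the learned attention layers of the MEM with a plain sigmoid of the frame-to-frame difference, and it models the TIM as a per-channel temporal filter of size 3 with zero padding. The function names (`motion_enhance`, `temporal_interact`, `tei_block`) and the kernel layout are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def motion_enhance(frames):
    """Stage 1 (MEM-like): gate each channel by its temporal difference.

    frames: list of T frames, each a list of C pooled channel values.
    Assumption of this sketch: the gate is sigmoid(next - current), and the
    last frame is compared with itself, so its gate is sigmoid(0) = 0.5.
    """
    T = len(frames)
    enhanced = []
    for t in range(T):
        nxt = frames[min(t + 1, T - 1)]
        enhanced.append(
            [x * sigmoid(n - x) for x, n in zip(frames[t], nxt)]
        )
    return enhanced


def temporal_interact(frames, kernels):
    """Stage 2 (TIM-like): channel-wise temporal filtering.

    kernels[c] holds three weights [w_prev, w_cur, w_next] for channel c.
    Each channel is filtered independently along time (no channel mixing);
    frames outside the clip are treated as zeros.
    """
    T, C = len(frames), len(frames[0])
    out = []
    for t in range(T):
        row = []
        for c in range(C):
            acc = 0.0
            for k, w in enumerate(kernels[c]):
                s = t + k - 1  # source frame index for this tap
                if 0 <= s < T:
                    acc += w * frames[s][c]
            row.append(acc)
        out.append(row)
    return out


def tei_block(frames, kernels):
    """Apply the two stages in sequence: enhancement, then interaction."""
    return temporal_interact(motion_enhance(frames), kernels)


if __name__ == "__main__":
    clip = [[1.0, 2.0], [1.0, 4.0], [1.0, 0.0]]  # T=3 frames, C=2 channels
    identity = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
    print(tei_block(clip, identity))
```

In this sketch the static channel (index 0) receives the same gate in every frame, while the changing channel (index 1) is gated differently per frame. Keeping the temporal filter per channel is what makes the second stage cheap compared with a full 3D convolution, which mixes channels and time together.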

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 126 articles.

1. Spatio-temporal adaptive convolution and bidirectional motion difference fusion for video action recognition;Expert Systems with Applications;2024-12

2. Can lies be faked? Comparing low-stakes and high-stakes deception video datasets from a Machine Learning perspective;Expert Systems with Applications;2024-09

3. Efficient spatio-temporal network for action recognition;Journal of Real-Time Image Processing;2024-08-23

4. Cross-modal guides spatio-temporal enrichment network for few-shot action recognition;Applied Intelligence;2024-08-13

5. CANet: Comprehensive Attention Network for video-based action recognition;Knowledge-Based Systems;2024-07