SMART Frame Selection for Action Recognition

Published:2021-05-18 Issue:2 Volume:35 Page:1451-1459
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title:AAAI

Author:

Gowda Shreyank N, Rohrbach Marcus, Sevilla-Lara Laura

Abstract

Video classification is computationally expensive. In this paper, we address the problem of frame selection to reduce the computational cost of video classification. Recent work has successfully leveraged frame selection for long, untrimmed videos, where much of the content is not relevant and easy to discard. In this work, however, we focus on the more standard short, trimmed video classification problem. We argue that good frame selection can not only reduce the computational cost of video classification but also increase the accuracy by getting rid of frames that are hard to classify. In contrast to previous work, we propose a method that, instead of selecting frames by considering one at a time, considers them jointly. This results in a more efficient selection, where "good" frames are more effectively distributed over the video, like snapshots that tell a story. We call the proposed frame selection SMART and we test it in combination with different backbone architectures and on multiple benchmarks (Kinetics [5], Something-something [14], UCF101 [31]). We show that the SMART frame selection consistently improves the accuracy compared to other frame selection strategies while reducing the computational cost by a factor of 4 to 10 times. Additionally, we show that when the primary goal is recognition performance, our selection strategy can improve over recent state-of-the-art models and frame selection strategies on various benchmarks (UCF101, HMDB51 [21], FCVID [17], and ActivityNet [4]).

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 64 articles.

1. MLENet: Multi-Level Extraction Network for video action recognition;Pattern Recognition;2024-10

2. Non-relevant segment recognition via hard example mining under sparsely distributed events;Computers in Biology and Medicine;2024-09

3. StreamTinyNet: video streaming analysis with spatial-temporal TinyML;2024 International Joint Conference on Neural Networks (IJCNN);2024-06-30

4. FE-Adapter: Adapting Image-Based Emotion Classifiers to Videos;2024 IEEE 18th International Conference on Automatic Face and Gesture Recognition (FG);2024-05-27

5. Temporal position information embedding method suitable for action recognition;2024 36th Chinese Control and Decision Conference (CCDC);2024-05-25