Affiliation:
1. Ryerson University and Vector Institute
2. Ryerson University
Abstract
In this article, we study the problem of video-based action recognition. We improve action recognition performance by learning effective temporal and appearance representations. To capture the temporal representation, we introduce two techniques for improving long-term temporal modeling: a Temporal Relational Network and a Temporal Second-Order Pooling-based Network. Moreover, we strengthen the representation using complementary learning techniques, specifically a Global-Local Network and a Fuse-Inception Network. Performance evaluation on three datasets (UCF101, HMDB-51, and Mini-Kinetics-200) demonstrates the superiority of the proposed framework over state-of-the-art techniques based on 2D deep ConvNets.
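The general idea behind second-order pooling of frame features can be illustrated with a minimal sketch. This is not the paper's exact architecture; the function name, feature shapes, and centering step are illustrative assumptions. Given per-frame feature vectors, the pooled descriptor is a covariance-style matrix that captures correlations between feature channels across time, rather than a simple temporal average:

```python
import numpy as np

def temporal_second_order_pooling(features):
    """Aggregate per-frame features of shape (T, D) into a single
    (D, D) second-order descriptor.

    Unlike first-order (average) pooling, the outer-product form
    captures pairwise correlations between feature channels
    accumulated over the T frames.
    NOTE: illustrative sketch, not the paper's exact formulation.
    """
    T, D = features.shape
    # Center features over time so the result is covariance-like
    centered = features - features.mean(axis=0, keepdims=True)
    return centered.T @ centered / T  # symmetric (D, D) matrix

# Example: 16 frames, each with an 8-dimensional feature vector
frames = np.random.rand(16, 8)
descriptor = temporal_second_order_pooling(frames)
print(descriptor.shape)  # (8, 8)
```

The resulting symmetric matrix (or its flattened upper triangle) can then be fed to a classifier as a fixed-size clip-level representation, independent of the number of frames T.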
Publisher
Association for Computing Machinery (ACM)
Subject
Artificial Intelligence, Theoretical Computer Science
Cited by
6 articles.