Affiliation:
1. Beijing National Research Center for Information Science and Technology (BNRist), Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
2. Huawei Technologies Co., Ltd., Shenzhen, China
Abstract
The framework provides a novel pipeline for action recognition. The action recognition task classifies the action label of the scene. High-speed cameras are commonly used to generate high frame-rate videos for capturing sufficient motion information. However, the data volume would be the bottleneck of the system. With the insight that the discrete cosine transform (DCT) of video signals reveals the motion information remarkably, instead of obtaining video data as with traditional cameras, the proposed method directly captures a DCT spectrum of video in a single shot through optical pixel-wise encoding. Considering that video signals are sparsely distributed in the DCT domain, a learning-based frequency selector is designed for pruning the trivial frequency channels of the spectrum. An opto-electronic neural network is designed for action recognition from a single coded spectrum. The optical encoder generates the DCT spectrum, and the rest of the network jointly optimizes the frequency selector and classification model simultaneously. Compared to conventional video-based action recognition methods, the proposed method achieves higher accuracy with less data, less communication bandwidth, and less computational burden. Both simulations and experiments demonstrate that the proposed method has superior action recognition performance. To the best of our knowledge, this is the first work investigating action recognition in the DCT domain.
Funder
National Natural Science Foundation of China
National Key Research and Development Program of China
Subject
Computer Networks and Communications,Atomic and Molecular Physics, and Optics
Cited by
2 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献