Action recognition based on discrete cosine transform by optical pixel-wise encoding

Published:2022-11-01 Issue:11 Volume:7 Page:116101
ISSN:2378-0967
Container-title:APL Photonics
language:en
Short-container-title:APL Photonics

Author:

Liang Yu¹, Huang Honghao¹, Li Jingwei², Dong Xiaowen², Chen Minghua¹, Yang Sigang¹, Chen Hongwei¹

Affiliation:

1. Beijing National Research Center for Information Science and Technology (BNRist), Department of Electronic Engineering, Tsinghua University, Beijing 100084, China

2. Huawei Technologies Co., Ltd., Shenzhen, China

Abstract

The proposed framework provides a novel pipeline for action recognition, the task of classifying the action label of a scene. High-speed cameras are commonly used to generate high-frame-rate videos that capture sufficient motion information, but the resulting data volume becomes the bottleneck of the system. With the insight that the discrete cosine transform (DCT) of a video signal reveals its motion information remarkably well, the proposed method does not acquire video data as traditional cameras do; instead, it directly captures a DCT spectrum of the video in a single shot through optical pixel-wise encoding. Because video signals are sparsely distributed in the DCT domain, a learning-based frequency selector is designed to prune the trivial frequency channels of the spectrum. An opto-electronic neural network is designed for action recognition from a single coded spectrum: the optical encoder generates the DCT spectrum, and the rest of the network jointly optimizes the frequency selector and the classification model. Compared to conventional video-based action recognition methods, the proposed method achieves higher accuracy with less data, less communication bandwidth, and less computational burden. Both simulations and experiments demonstrate that the proposed method has superior action recognition performance. To the best of our knowledge, this is the first work investigating action recognition in the DCT domain.
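
The idea of capturing a temporal DCT spectrum in a single shot can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes an unnormalized DCT-II along the time axis, a hypothetical tiled layout in which each pixel of a `tile × tile` block is assigned a different temporal frequency, and ideal signed mask weights (a physical optical mask cannot be negative, which this sketch ignores). The function names `dct_basis`, `pixelwise_masks`, and `coded_single_shot` are illustrative only.

```python
import numpy as np


def dct_basis(num_frames: int) -> np.ndarray:
    """Unnormalized DCT-II basis: row k is cos(pi / T * (t + 0.5) * k)."""
    t = np.arange(num_frames)
    k = np.arange(num_frames)[:, None]
    return np.cos(np.pi / num_frames * (t + 0.5) * k)


def pixelwise_masks(num_frames: int, height: int, width: int, tile: int):
    """Build per-pixel temporal masks for an assumed tiled layout.

    Pixel (y, x) is assigned temporal frequency (y % tile) * tile + (x % tile).
    Returns the masks with shape (T, H, W) and the frequency map (H, W).
    """
    if tile * tile > num_frames:
        raise ValueError("tile * tile must not exceed the number of frames")
    basis = dct_basis(num_frames)
    yy, xx = np.mgrid[0:height, 0:width]
    freq = (yy % tile) * tile + (xx % tile)
    # basis[freq] has shape (H, W, T); move time to the front.
    masks = basis[freq].transpose(2, 0, 1)
    return masks, freq


def coded_single_shot(video: np.ndarray, masks: np.ndarray) -> np.ndarray:
    """Weight every frame pixel-wise and integrate over the exposure."""
    return (video * masks).sum(axis=0)


# Toy example: 16 frames of 8 x 8 pixels, 4 x 4 tiles (16 frequencies).
rng = np.random.default_rng(0)
video = rng.random((16, 8, 8))
masks, freq = pixelwise_masks(16, 8, 8, tile=4)
shot = coded_single_shot(video, masks)  # one coded image of shape (8, 8)
```

Under these assumptions, each pixel of the coded image equals one temporal DCT coefficient of that pixel's intensity trace, so a static scene contributes only to the pixels assigned frequency 0, while motion spreads energy into the higher-frequency pixels.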

Funder

National Natural Science Foundation of China

National Key Research and Development Program of China

Publisher

AIP Publishing

Subject

Computer Networks and Communications; Atomic and Molecular Physics, and Optics

Link

https://aip.scitation.org/doi/pdf/10.1063/5.0109807

Cited by 2 articles.

1. Deep Learning Approach for Human Action Recognition Using a Time Saliency Map Based on Motion Features Considering Camera Movement and Shot in Video Image Sequences. Information, 2023-11-15.

2. Deep coded exposure: end-to-end co-optimization of flutter shutter and deblurring processing for general motion blur removal. Photonics Research, 2023-09-27.