Author:
Zhu Ziqi,Shao Wuchang,Jiao Dongdong
Abstract
AbstractOnline action detection (OAD)is a challenging task that involves predicting the ongoing action class in real-time streaming videos, which is essential in the field of autonomous driving and video surveillance. In this article, we propose an approach for OAD based on the Receptance Weighted Key Value (RWKV) model with temporal label smooth. The RWKV model captures temporal dependencies and computes efficiently at the same time, which makes it well-suited for real-time applications. Our TLS-RWKV model demonstrates advancements in two aspects. First, we conducted experiments on two widely used datasets, THUMOS’14 and TVSeries. Our proposed approach demonstrates state-of-the-art performance with 71.8% mAP on THUMOS’14 and 89.7% cAP on TVSeries. Second, our proposed approach demonstrates impressive efficiency, running at over 600 FPS and maintaining a competitive mAP of 59.9% on THUMOS’14 with RGB features alone. Notably, this efficiency surpasses the prior state-of-the-art model, TesTra, by more than two times. Even when executed on a CPU, our model maintains a commendable speed, exceeding 200 FPS. This high efficiency makes our model suitable for real-time deployment, even on resource-constrained devices. These results showcase the effectiveness and competitiveness of our proposed approach in OAD.
Funder
National Computer System Engineering Research Institute of China
Publisher
Springer Science and Business Media LLC