Funder
Key Research and Development Program of Sichuan Province
Science and Technology Department of Sichuan Province
National Natural Science Foundation of China