1. SlowFast networks for video recognition;Feichtenhofer,2019
2. TSM: Temporal shift module for efficient video understanding;Lin,2019
3. ViViT: A video vision transformer;Arnab,2021
4. Temporal segment networks: Towards good practices for deep action recognition;Wang,2016
5. Learning spatio-temporal representation with pseudo-3D residual networks;Qiu,2017