1. K. Soomro, A. R. Zamir, M. Shah, UCF101: A Dataset of 101 Human Actions Classes from Videos In the Wild, arXiv:1212.0402(2012).
2. W. Kay, J. Carreira, K. Simonyan, B. Zhang, C. Hillier, S. Vijayanarasimhan, F. Viola, T. Green, T. Back, P. Natsev, others, The Kinetics Human Action Video Dataset, arXiv:1705.06950(2017).
3. Rethinking spatiotemporal feature learning: speed-accuracy trade-offs in video classification;Xie,2018
4. Gesture recognition using spatiotemporal deformable convolutional representation;Shi,2019
5. PA3D: pose-action 3D machine for video recognition;Yan,2019