1. Shou, Z., Wang, D., Chang, S.F.: Temporal action localization in untrimmed videos via multi-stage CNNs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1049–1058. IEEE, Las Vegas (2016)
2. Xu, H., Das, A., Saenko, K.: R-C3D: region convolutional 3D network for temporal activity detection. In: 2017 IEEE International Conference on Computer Vision (ICCV), p. 17453302. IEEE, Venice (2017)
3. Ren, S., He, K., Girshick, R., et al.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99. IEEE, Waikoloa (2015)
4. Puscas, M.M., Sangineto, E., Culibrk, D., et al.: Unsupervised tube extraction using transductive learning and dense trajectories. Comput. Graph. 38(1), 300–309 (2015)
5. Kopuklu, O., Wei, X.Y., Rigoll, G.: You only watch once: a unified CNN architecture for real-time spatiotemporal action localization. arXiv preprint arXiv:1911.06644 (2020)