1. Chen S, Yao T, Jiang Y-G (2019) Deep learning for video captioning: A review. IJCAI 1
2. Monfort M, Pan B, Ramakrishnan K et al (2021) Multi-moments in time: learning and interpreting models for multi-action video understanding. IEEE Trans Pattern Anal Mach Intell 44(12):9434–9445
3. Cai JJ, Tang J, Chen QG, Hu Y, Wang X, Huang SJ (2018) Surveillance applications. In: 2018 International Conference on Communication and Signal Processing (ICCSP). IEEE, pp 563–568
4. Cai JJ, Tang J, Chen QG, Hu Y, Wang X, Huang SJ (2019) Multi-view active learning for video recommendation. IJCAI 2019:2053–2059
5. Aafaq N et al (2019) Spatio-temporal dynamics and semantic attribute enriched visual encoding for video captioning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition