1. Sami Abu-El-Haija et al. 2016. YouTube-8M: A large-scale video classification benchmark. arXiv preprint arXiv:1609.08675.
2. Mathilde Caron, Piotr Bojanowski, Armand Joulin, and Matthijs Douze. 2018. Deep clustering for unsupervised learning of visual features. In Proceedings of the European Conference on Computer Vision (ECCV’18). 132–149.
3. Joao Carreira and Andrew Zisserman. 2017. Quo vadis, action recognition? A new model and the kinetics dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 6299–6308.
4. Wenlong Dong, Zhongchen Ma, Qing Zhu, and Qirong Mao. 2023. Two-stage multi-instance multi-label learning model for video social relationship recognition. In Proceedings of the 4th International Conference on Intelligent Computing and Human-Computer Interaction (ICHCI’23). IEEE, 84–88.
5. Yazan Abu Farha and Juergen Gall. 2019. MS-TCN: Multi-stage temporal convolutional network for action segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3575–3584.