1. Hassan Akbari et al. 2021. VATT: Transformers for multimodal self-supervised learning from raw video, audio and text. Advances in Neural Information Processing Systems (2021).
2. VGGFace2: A Dataset for Recognising Faces across Pose and Age
3. HLVU: A New Challenge to Test Deep Understanding of Movies the Way Humans do
4. ImageNet: A large-scale hierarchical image database
5. Yunhao Du, Yang Song, Bo Yang, and Yanyun Zhao. 2022. Strongsort: Make deepsort great again. arXiv preprint arXiv:2202.13514 (2022).