1. Sami Abu-El-Haija, Nisarg Kothari, Joonseok Lee, Paul Natsev, George Toderici, Balakrishnan Varadarajan, and Sudheendra Vijayanarasimhan. 2016. YouTube-8M: A large-scale video classification benchmark. arXiv preprint arXiv:1609.08675 (2016).
2. ViViT: A Video Vision Transformer
3. Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. DBpedia: A nucleus for a web of open data. In The Semantic Web. Springer, 722–735.
4. A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao. 2020. YOLOv4: Optimal Speed and Accuracy of Object Detection. (2020).
5. Cascade R-CNN: Delving Into High Quality Object Detection