1. Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lučić, and Cordelia Schmid. 2021. ViViT: A Video Vision Transformer. arXiv preprint arXiv:2103.15691 (2021).
2. Yueran Bai, Yingying Wang, Yunhai Tong, Yang Yang, Qiyue Liu, and Junhui Liu. 2020. Boundary Content Graph Neural Network for Temporal Action Proposal Generation. arXiv preprint arXiv:2008.01432 (2020).
3. Shyamal Buch, Victor Escorcia, Chuanqi Shen, Bernard Ghanem, and Juan Carlos Niebles. 2017. SST: Single-Stream Temporal Action Proposals. In CVPR.
4. Fabian Caba Heilbron, Victor Escorcia, Bernard Ghanem, and Juan Carlos Niebles. 2015. ActivityNet: A Large-Scale Video Benchmark for Human Activity Understanding. In CVPR.
5. Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. 2020. End-to-End Object Detection with Transformers. In ECCV.