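1. Bertasius, G., Wang, H., & Torresani, L. (2021). Is space-time attention all you need for video understanding? In Proceedings of the 38th International Conference on Machine Learning (pp. 813–824).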
2. Bochkovskiy, A., Wang, C.-Y., & Liao, H.-Y. M. (2020). YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934.
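3. Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., & Zagoruyko, S. (2020). End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision (pp. 213–229).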
4. Carreira, J., & Zisserman, A. (2017). Quo vadis, action recognition? A new model and the Kinetics dataset. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 6299–6308).
5. Dwibedi, D., Aytar, Y., Tompson, J., Sermanet, P., & Zisserman, A. (2020). Counting out time: Class agnostic video repetition counting in the wild. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 10387–10396).