1. Memory-Attended Recurrent Network for Video Captioning
2. Support-set bottlenecks for video-text representation learning;Patrick;ICLR,2021
3. End-to-End Learning of Visual Representations From Uncurated Instructional Videos
4. CLIP4Clip: An empirical study of CLIP for end to end video clip retrieval;Luo;arXiv preprint,2021
5. UniVL: A unified video and language pre-training model for multimodal understanding and generation;Luo;arXiv preprint,2020