1. Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. 2023. GPT-4 technical report. arXiv preprint arXiv:2303.08774 (2023).
2. Localizing Moments in Video with Natural Language
3. Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
4. Zhongzhi Chen, Guang Liu, Bo-Wen Zhang, Fulong Ye, Qinghong Yang, and Ledell Wu. 2022. AltCLIP: Altering the language encoder in CLIP for extended language capabilities. arXiv preprint arXiv:2211.06679 (2022).
5. VindLU: A Recipe for Effective Video-and-Language Pretraining