1. Improving video-text retrieval by multi-stream corpus alignment and dual softmax loss;Cheng,2021
2. Clip2video: Mastering video-text retrieval via image clip;Fang,2021
3. Visual spatio-temporal relation-enhanced network for cross-modal text-video retrieval;Han,2021
4. Recurrent neural network regularization;Zaremba,2014
5. Coot: Cooperative hierarchical transformer for video-text representation learning;Ging;Adv. Neural Inf. Process. Syst.,2020