1. Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. 2016. Layer normalization. CoRR (2016).
2. Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. 65–72.
3. Meng Cao, Long Chen, Mike Zheng Shou, Can Zhang, and Yuexian Zou. 2021. On pursuit of designing multi-modal transformer for video grounding. In EMNLP. 9810–9823.
4. Jingyuan Chen, Xinpeng Chen, Lin Ma, Zequn Jie, and Tat-Seng Chua. 2018. Temporally grounding natural sentence in video. In EMNLP.
5. Jingwen Chen et al. 2022. Retrieval augmented convolutional encoder-decoder networks for video captioning. ACM Trans. Multimedia Comput. Commun. Appl. (2022).