1. Anderson, P., He, X., Buehler, C., Teney, D., Johnson, M., Gould, S., Zhang, L., 2018. Bottom-up and top-down attention for image captioning and visual question answering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR. pp. 6077–6086.
2. Banerjee, S., Lavie, A., 2005. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments. In: Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. pp. 65–72.
3. Chen, H., Li, J., Hu, X., 2020a. Delving Deeper into the Decoder for Video Captioning. In: ECAI 2020 - 24th European Conference on Artificial Intelligence. vol. 325, pp. 1079–1086.
4. Chen, H., et al., 2020b. A semantics-assisted video captioning model trained with scheduled sampling. Front. Robotics AI.
5. Cho, K., van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y., 2014. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP. pp. 1724–1734.