Authors:
Wang Jie, Zheng Yixiao, Du Ruoyi, Zhang Yiming, Liang Kongming, Ma Zhanyu
Publisher:
Springer Nature Singapore