1. Dong J., et al. Early embedding and late reranking for video captioning. 2016.
2. Vinyals O., Toshev A., Bengio S., Erhan D. Show and tell: a neural image caption generator. 2015.
3. Devlin J., Cheng H., Fang H., Gupta S., Deng L., He X., et al. Language models for image captioning: the quirks and what works. arXiv:1505.01809 2015.
4. Sak H., Senior A., Beaufays F. Long short-term memory recurrent neural network architectures for large scale acoustic modeling. 2014.
5. Mao J., Xu W., Yang Y., Wang J., Huang Z., Yuille A. Deep captioning with multimodal recurrent neural networks (m-RNN). arXiv:1412.6632 2014.