1. Translating videos to natural language using deep recurrent neural networks;Venugopalan,2015
2. From captions to visual concepts and back;Fang,2015
3. Show, attend and tell: neural image caption generation with visual attention;Xu,2015
4. Guiding the long-short term memory model for image caption generation;Jia,2015
5. Deep captioning with multimodal recurrent neural networks (m-RNN);Mao;ICLR,2015