1. Jeff Donahue, Lisa Anne Hendricks, Marcus Rohrbach, Subhashini Venugopalan, Sergio Guadarrama, Kate Saenko, Trevor Darrell: Long-term recurrent convolutional networks for visual recognition and description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, pp. 2625–2634.
2. Ryan Kiros, Ruslan Salakhutdinov, Richard S. Zemel: Unifying visual-semantic embeddings with multimodal neural language models. In NIPS 2014 Deep Learning Workshop. arXiv:1411.2539 (Machine Learning). [Online]. Available: https://arxiv.org/abs/1411.2539.
3. Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, Alan Yuille: Deep captioning with multimodal recurrent neural networks (m-RNN). In ICLR 2015. arXiv:1412.6632 (Computer Vision and Pattern Recognition). [Online]. Available: https://arxiv.org/abs/1412.6632.
4. Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan: Show and tell: A neural image caption generator. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015. [Online]. Available: https://doi.org/10.1109/CVPR.2015.7298935.
5. Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio: Show, attend and tell: Neural image caption generation with visual attention. In Proceedings of the 32nd International Conference on Machine Learning, ICML 2015. [Online]. Available: http://jmlr.org/proceedings/papers/v37/xuc15.html.