1. Deep Speech 2: End-to-end speech recognition in English and Mandarin;Amodei,2016
2. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition;Shi;IEEE Trans. Pattern Anal. Mach. Intell.,2017
3. Show, attend and tell: Neural image caption generation with visual attention;Xu,2015
4. Deep captioning with multimodal recurrent neural networks (M-RNN);Mao,2015
5. Guiding the long-short term memory model for image caption generation;Jia,2015