1. Bottom-up and top-down attention for image captioning and visual question answering;Anderson,2018
2. Ba, J. L., Kiros, J. R., Hinton, G. E., 2016. Layer normalization. 1607.06450.
3. METEOR: an automatic metric for MT evaluation with improved correlation with human judgments;Banerjee,2005
4. Scheduled sampling for sequence prediction with recurrent neural networks;Bengio,2015
5. The fingerprint of human referring expressions and their surface realization with graph transducers;Bohnet,2008