1. Show and tell: A neural image caption generator; Vinyals, 2015
2. Learning to reason: End-to-end module networks for visual question answering; Hu, 2017
3. FiLM: Visual reasoning with a general conditioning layer; Perez, 2018
4. LIUM-CVC submissions for WMT17 multimodal translation task; Caglayan, 2017
5. Imagination improves multimodal translation; Elliott, 2017