1. Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
2. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
3. Rohan Anil, Gabriel Pereyra, Alexandre Passos, Robert Ormandi, George E Dahl, and Geoffrey E Hinton. 2018. Large scale distributed neural network training through online distillation. arXiv preprint arXiv:1804.03235 (2018).
4. VQA: Visual Question Answering
5. John Arevalo, Thamar Solorio, Manuel Montes-y Gómez, and Fabio A González. 2017. Gated multimodal units for information fusion. arXiv preprint arXiv:1702.01992 (2017).