1. A multi-world approach to question answering about real-world scenes based on uncertain input;Malinowski,2014
2. VQA: visual question answering;Antol,2015
3. Visual Genome: connecting language and vision using crowdsourced dense image annotations;Krishna;Int. J. Comput. Vis. (IJCV),2017
4. Visual7W: grounded question answering in images;Zhu,2016
5. Yin and yang: balancing and answering binary visual questions;Zhang,2016