1. Aishwarya, R., Sarath, P., Sneha, U., & Manmadhan, S. (2022). Stacked Attention based Textbook Visual Question Answering with BERT. 2022 IEEE 19th India Council International Conference (INDICON).
2. Akula, A., Changpinyo, S., Gong, B., Sharma, P., Zhu, S.-C., & Soricut, R. (2021). CrossVQA: Scalably generating benchmarks for systematically testing VQA generalization. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing.
3. Alayrac, J.-B., Donahue, J., Luc, P., Miech, A., Barr, I., Hasson, Y., Lenc, K., Mensch, A., Millican, K., & Reynolds, M. (2022). Flamingo: a visual language model for few-shot learning. Advances in Neural Information Processing Systems, 35, 23716-23736.
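4. Anderson, P., He, X., Buehler, C., Teney, D., Johnson, M., Gould, S., & Zhang, L. (2018). Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).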