1. Agarwal, V., Shetty, R., & Fritz, M. (2020). Towards causal VQA: Revealing and reducing spurious correlations by invariant and covariant semantic editing. In IEEE conference on computer vision and pattern recognition (pp. 9690–9698).
2. Agrawal, A., Kajić, I., Bugliarello, E., Davoodi, E., Gergely, A., Blunsom, P., & Nematzadeh, A. (2022). Rethinking evaluation practices in visual question answering: A case study on out-of-distribution generalization. arXiv preprint arXiv:2205.12191.
3. Ahuja, K., Caballero, E., Zhang, D., Bengio, Y., Mitliagkas, I., & Rish, I. (2021). Invariance principle meets information bottleneck for out-of-distribution generalization. In Neural information processing systems (pp. 3438–3450).
4. Alayrac, J. B., Donahue, J., Luc, P., Miech, A., Barr, I., Hasson, Y., Lenc, K., Mensch, A., Millican, K., Reynolds, M., et al. (2022). Flamingo: A visual language model for few-shot learning. In Neural information processing systems (pp. 23716–23736).
5. Anderson, P., He, X., Buehler, C., Teney, D., Johnson, M., Gould, S., & Zhang, L. (2018). Bottom-up and top-down attention for image captioning and visual question answering. In IEEE conference on computer vision and pattern recognition (pp. 6077–6086).