1. Chen, Z., et al.: LaKo: knowledge-driven visual question answering via late knowledge-to-text injection. arXiv (2022)
2. DeLong, L.N., Mir, R.F., Whyte, M., Ji, Z., Fleuriot, J.D.: Neurosymbolic AI for reasoning on graph structures: a survey. arXiv (2023)
3. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, Minneapolis (2019)
4. Ding, Y., Yu, J., Liu, B., Hu, Y., Cui, M., Wu, Q.: MuKEA: multimodal knowledge extraction and accumulation for knowledge-based visual question answering. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5079–5088. IEEE, New Orleans (2022)
5. Gao, F., Ping, Q., Thattai, G., Reganti, A., Wu, Y.N., Natarajan, P.: A thousand words are worth more than a picture: natural language-centric outside-knowledge visual question answering. arXiv (2022)