A survey of neurosymbolic visual reasoning with scene graphs and common sense knowledge
Published: 2024-05-13
Pages: 1-24
ISSN: 2949-8732
Container-title: Neurosymbolic Artificial Intelligence
Short-container-title: NAI
Authors:
M. Jaleed Khan (1), Filip Ilievski (2), John G. Breslin (1,3), Edward Curry (1,3)
Affiliations:
1. SFI Centre for Research Training in Artificial Intelligence, Data Science Institute, University of Galway, Ireland
2. Center on Knowledge Graphs, Information Sciences Institute, University of Southern California, United States
3. Insight SFI Research Centre for Data Analytics, Data Science Institute, University of Galway, Ireland
Abstract
Combining deep learning and common sense knowledge via neurosymbolic integration is essential for semantically rich scene representation and intuitive visual reasoning. This survey paper delves into data- and knowledge-driven scene representation and visual reasoning approaches based on deep learning, common sense knowledge and neurosymbolic integration. It explores how scene graph generation, a process that detects and analyses objects, visual relationships and attributes in scenes, serves as a symbolic scene representation. This representation forms the basis for higher-level visual reasoning tasks such as visual question answering, image captioning, image retrieval, image generation, and multimodal event processing. Infusing common sense knowledge, particularly through the use of heterogeneous knowledge graphs, improves the accuracy, expressiveness and reasoning ability of the representation and allows for intuitive downstream reasoning. Neurosymbolic integration in these approaches ranges from loose to tight coupling of neural and symbolic components. The paper reviews and categorises the state-of-the-art knowledge-based neurosymbolic approaches for scene representation based on the types of deep learning architecture, common sense knowledge source and neurosymbolic integration used. The paper also discusses the visual reasoning tasks, datasets, evaluation metrics, key challenges and future directions, providing a comprehensive review of this research area and motivating further research into knowledge-enhanced and data-driven neurosymbolic scene representation and visual reasoning.
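The abstract describes a scene graph as a symbolic representation built from detected objects, their attributes, and directed visual relationships. As an illustrative sketch only (not code from the paper, and all class and method names here are hypothetical), such a representation can be modelled as a set of subject-predicate-object triples that downstream reasoning tasks can query:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class SGObject:
    """A detected object node with optional attributes (e.g. colour)."""
    name: str
    attributes: tuple = ()

@dataclass(frozen=True)
class Relation:
    """A directed visual relationship: subject --predicate--> object."""
    subject: SGObject
    predicate: str
    obj: SGObject

@dataclass
class SceneGraph:
    """A scene graph as a list of relationship triples."""
    relations: list = field(default_factory=list)

    def add(self, subj: SGObject, pred: str, obj: SGObject) -> None:
        self.relations.append(Relation(subj, pred, obj))

    def query(self, predicate: str = None) -> list:
        """Return (subject, object) name pairs, optionally filtered by predicate."""
        return [(r.subject.name, r.obj.name)
                for r in self.relations
                if predicate is None or r.predicate == predicate]

# Example scene: "a man riding a brown horse on a beach"
man = SGObject("man")
horse = SGObject("horse", ("brown",))
beach = SGObject("beach")
g = SceneGraph()
g.add(man, "riding", horse)
g.add(horse, "on", beach)
print(g.query("riding"))  # [('man', 'horse')]
```

A symbolic structure of this kind is what allows common sense knowledge (e.g. from a knowledge graph) to be attached to nodes and edges, and what higher-level tasks such as visual question answering can traverse.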
Cited by: 1 article.