Generating Qualitative Descriptions of Diagrams with a Transformer-Based Language Model

Published: 2024
Page: 61-75
ISSN: 0302-9743
Container-title: Lecture Notes in Computer Science
Language: en

Authors:

Schorlemmer Marco, Ballout Mohamad, Kühnberger Kai-Uwe

Abstract

To address the task of diagram understanding, we propose to distinguish the perception of the geometric configuration of a diagram from the assignment of meaning to the geometric entities and their topological relationships. As a consequence, diagram parsing does not need to assume any particular a priori interpretation of diagrams and their constituents. Focussing on Euler diagrams, we tackle the first of these subtasks, that of identifying the geometric entities that constitute a diagram (i.e., circles, rectangles, lines, arrows, etc.) and their topological relations, as an image captioning task. We use a Vision Transformer for image recognition combined with the language model GPT-2 in an encoder-decoder model to generate qualitative spatial descriptions of Euler diagrams. Due to the lack of sufficient high-quality data to train the pre-trained language model for this task, we describe how we generated a synthetic dataset of Euler diagrams annotated with qualitative spatial representations based on the Region Connection Calculus (RCC8). Results showed over 95% accuracy of the transformer-based language model in the generation of meaning-carrying RCC8 specifications for given Euler diagrams.
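To make the RCC8 annotation idea concrete, the following is a minimal sketch of how the eight RCC8 base relations (DC, EC, PO, EQ, TPP, NTPP, TPPi, NTPPi) could be computed for a diagram made only of circles. It is an illustration, not the authors' dataset generator: the function names `rcc8` and `describe`, the `(x, y, r)` circle encoding, and the `REL(A, B)` output format are all assumptions introduced here, and this record does not state the caption format used in the paper.

```python
import math
from itertools import combinations


def rcc8(a, b, tol=1e-9):
    """Return the RCC8 relation of disc a to disc b.

    Each disc is a hypothetical (x, y, r) tuple with r > 0.
    """
    (xa, ya, ra), (xb, yb, rb) = a, b
    d = math.hypot(xa - xb, ya - yb)  # distance between centres

    def close(u, v):
        return math.isclose(u, v, abs_tol=tol)

    if close(d, 0.0) and close(ra, rb):
        return "EQ"      # same centre, same radius
    if close(d, ra + rb):
        return "EC"      # boundaries touch from outside
    if d > ra + rb:
        return "DC"      # fully disconnected
    if close(d, abs(ra - rb)):
        # one disc lies inside the other and touches its boundary
        return "TPP" if ra < rb else "TPPi"
    if d < abs(ra - rb):
        # one disc lies strictly inside the other
        return "NTPP" if ra < rb else "NTPPi"
    return "PO"          # interiors overlap, neither contains the other


def describe(circles):
    """List one RCC8 statement per unordered pair of named circles."""
    names = sorted(circles)
    return [
        f"{rcc8(circles[p], circles[q])}({p}, {q})"
        for p, q in combinations(names, 2)
    ]


if __name__ == "__main__":
    diagram = {"A": (0, 0, 3), "B": (0, 0, 1), "C": (5, 0, 1)}
    for statement in describe(diagram):
        print(statement)
```

In a synthetic-data setting of the kind the abstract describes, a list like the one returned by `describe` could serve as the target caption for a rendered image of the same circles. The tolerance `tol` matters because tangency (EC, TPP, TPPi) is an exact equality that floating-point coordinates rarely satisfy.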

Publisher

Springer Nature Switzerland

Link

https://link.springer.com/content/pdf/10.1007/978-3-031-71291-3_5

References (28 articles)

1. Allwein, G., Barwise, J. (eds.): Logical Reasoning with Diagrams. Oxford University Press, Oxford (1996)

2. Ballout, M., Krumnack, U., Heidemann, G., Kühnberger, K.: Investigating pre-trained language models on cross-domain datasets, a step closer to general AI. In: Jayne, C., et al. (eds.) International Neural Network Society Workshop on Deep Learning Innovations and Applications, INNS DLIA@IJCNN 2023, Gold Coast, Australia, 23 June 2023. Procedia Computer Science, vol. 222, pp. 94–103. Elsevier (2023)

3. Lecture Notes in Computer Science (Lecture Notes in Artificial Intelligence);D Bourou,2021

4. Bourou, D., Schorlemmer, M., Plaza, E.: Modelling the sense-making of diagrams using image schemas. In: Proceedings of the Annual Meeting of the Cognitive Science Society (CogSci 2021), pp. 1105–1111 (2021)

5. Bourou, D., Schorlemmer, M., Plaza, E.: Euler vs Hasse diagrams for reasoning about sets: a cognitive approach. In: Giardino, V., Linker, S., Burns, R., Bellucci, F., Boucheix, JM., Viana, P. (eds.) Diagrammatic Representation and Inference. Diagrams 2022. LNCS, vol. 13462, pp. 151–167. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-15146-0_13