Neural models for semantic analysis of handwritten document images

Published: 2024-06-06 Issue: 3 Volume: 27 Page: 245-263
ISSN: 1433-2833
Container-title: International Journal on Document Analysis and Recognition (IJDAR)
Language: en
Short-container-title: IJDAR

Author:

Tüselmann Oliver, Fink Gernot A.

Abstract

Semantic analysis of handwritten document images offers a wide range of practical application scenarios. A sequential combination of handwritten text recognition (HTR) and a task-specific natural language processing system offers an intuitive solution in this domain. However, this HTR-based approach suffers from the problem of error propagation. An HTR-free model, which avoids explicit text recognition and solves the task end-to-end, tackles this problem, but often produces poor results. A possible reason for this is that it does not incorporate largely pre-trained semantic word embeddings, which turn out to be one of the most powerful advantages in the textual domain. In this work, we propose an HTR-based and an HTR-free model and compare them on a variety of segmentation-based handwritten document image benchmarks including semantic word spotting, named entity recognition, and question answering. Furthermore, we propose a cross-modal knowledge distillation approach to integrate semantic knowledge from textually pre-trained word embeddings into HTR-free models. In a series of experiments, we investigate optimization strategies for robust semantic word image representation. We show that the incorporation of semantic knowledge is beneficial for HTR-free approaches in achieving state-of-the-art results on a variety of benchmarks.

Funder

Technische Universität Dortmund

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s10032-024-00477-8.pdf

Reference: 77 articles.

1. Adak, C., Chaudhuri, B.B., Blumenstein, M.: Named entity recognition from unstructured handwritten document images. In: International Workshop on Document Analysis Systems, pp. 375–380 (2016)

2. Adak, C., Chaudhuri, B.B., Lin, C., Blumenstein, M.: Detecting named entities in unstructured Bengali manuscript images. In: International Conference on Document Analysis and Recognition, pp. 196–201 (2019)

3. Akbik, A., Bergmann, T., Blythe, D., Rasul, K., Schweter, S., Vollgraf, R.: FLAIR: An easy-to-use framework for state-of-the-art NLP. In: Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 54–59 (2019)

4. Akbik, A., Blythe, D., Vollgraf, R.: Contextual string embeddings for sequence labeling. In: International Conference on Computational Linguistics, pp. 1638–1649 (2018)

5. Almazán, J., Gordo, A., Fornés, A., Valveny, E.: Word spotting and recognition with embedded attributes. IEEE Trans. Pattern Anal. Mach. Intell. 36(12), 2552–2566 (2014)