Annotation-Free Word Spotting with Bag-of-Features HMMs

Published:2020-11-20 Issue:04 Volume:35 Page:2153001
ISSN:0218-0014
Container-title:International Journal of Pattern Recognition and Artificial Intelligence
language:en
Short-container-title:Int. J. Patt. Recogn. Artif. Intell.

Author:

Rothacker Leonard¹, Wolf Fabian¹, Fink Gernot A.¹

Affiliation:

1. Department of Computer Science, TU Dortmund University, Dortmund 44227, Germany

Abstract

The annotation-free word spotting method proposed in this paper makes document images searchable without requiring any labeled training data. Our method thus supports direct exploration of a document collection without demanding any manual effort from users to prepare a training dataset. It works in the query-by-example scenario, where the user selects an exemplary occurrence of the query word. The entire collection of document images is then searched according to visual similarity to the query. The proposed method requires only minimal assumptions about the visual appearance of text. This is achieved by processing document images as a whole, without requiring a given segmentation of the images at the word or line level; the method is therefore also segmentation-free. Variability in word size is handled by representing the sequential structure of text with a statistical sequence model. To make the computationally costly application of the sequence model feasible in practice, regions are first retrieved according to approximate similarity with an efficient model decoding algorithm. Re-ranking these regions according to the visual similarity obtained with the sequence model leads to highly accurate word spotting results. The method is evaluated on five benchmark datasets. In the segmentation-free query-by-example scenario where no annotated training data is available, the method outperforms all other methods that have been evaluated on any of these five benchmarks.
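
The retrieve-then-re-rank idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`approx_distance`, `dtw_distance`, `spot`) are hypothetical, the cheap stage compares order-free mean feature vectors, and dynamic time warping stands in for the statistical sequence model purely to show how a costly sequence-aware score is applied only to a shortlist.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[float]
Region = List[Frame]  # a candidate region as a left-to-right sequence of feature frames


def l1(a: Frame, b: Frame) -> float:
    """L1 distance between two feature frames."""
    return sum(abs(x - y) for x, y in zip(a, b))


def mean_frame(region: Region) -> List[float]:
    """Average all frames of a region into one order-free summary vector."""
    dim = len(region[0])
    return [sum(f[d] for f in region) / len(region) for d in range(dim)]


def approx_distance(query: Region, region: Region) -> float:
    """Cheap stage (assumption): compare order-free summaries only."""
    return l1(mean_frame(query), mean_frame(region))


def dtw_distance(query: Region, region: Region) -> float:
    """Costly stage (assumption): dynamic time warping as a stand-in
    for scoring a region with a sequence model. It tolerates regions
    that are stretched or compressed relative to the query."""
    inf = float("inf")
    n, m = len(query), len(region)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = l1(query[i - 1], region[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def spot(
    query: Region, regions: List[Region], shortlist_size: int
) -> List[Tuple[int, float]]:
    """Shortlist regions by the cheap distance, then re-rank the
    shortlist by the costly sequence-aware distance."""
    shortlist = sorted(
        range(len(regions)),
        key=lambda i: approx_distance(query, regions[i]),
    )[:shortlist_size]
    reranked = [(i, dtw_distance(query, regions[i])) for i in shortlist]
    return sorted(reranked, key=lambda pair: pair[1])
```

Calling `spot(query, regions, shortlist_size)` returns `(region index, distance)` pairs, best match first. The costly score is computed only `shortlist_size` times, which is the point of the two-stage design: the shortlist size trades retrieval cost against the risk of discarding a true match in the cheap stage.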

Funder

the German Research Foundation

Publisher

World Scientific Pub Co Pte Lt

Subject

Artificial Intelligence, Computer Vision and Pattern Recognition, Software

Link

https://www.worldscientific.com/doi/pdf/10.1142/S0218001421530013

References: 57 articles.

1. A Survey on handwritten documents word spotting

2. Integrating Visual and Textual Cues for Query-by-String Word Spotting

3. A study of Bag-of-Visual-Words representations for handwritten keyword spotting

4. Segmentation-free word spotting with exemplar SVMs

Cited by 4 articles.

1. Self-training for handwritten word recognition and retrieval;International Journal on Document Analysis and Recognition (IJDAR);2024-06-18

2. Z-Transform-Based Profile Matching to Develop a Learning-Free Keyword Spotting Method for Handwritten Document Images;International Journal of Computational Intelligence Systems;2022-11-02

3. New Deep Spatio-Structural Features of Handwritten Text Lines for Document Age Classification;International Journal of Pattern Recognition and Artificial Intelligence;2022-06-06

4. Graph Convolutional Neural Networks for Learning Attribute Representations for Word Spotting;Document Analysis and Recognition – ICDAR 2021;2021