Affiliation:
1. Autonomous University of the State of Mexico, Literary Institute 100, Toluca 50000, Mexico
Abstract
Historical documents hold a great deal of cultural heritage information that has not yet been explored or exploited. Lower-Baseline Localization (LBL) is the first step in retrieving information from manuscript images: it identifies the groups of handwritten text lines that make up a message. An LBL method can be characterized by how it treats the features of an author's writing style: character shape and size, the character body, the shape of ascender and descender strokes, the gaps between characters, words, lines, and columns, and touching and overlapping lines. For example, most supervised LBL methods analyze only the gap between characters during document preprocessing and leave the remaining writing-style features to the learning phase of the classifier. For this reason, supervised LBL methods tend to learn particular styles and collections. This paper presents an unsupervised LBL method that explicitly analyzes all of these writing-style features and processes the document window by window. As a result, the proposed method depends less on the author's writing style and is more reliable on new collections in real scenarios. In the experiments, the proposed method surpasses state-of-the-art methods on the standard READ-BAD historical collection, which comprises 2,036 manuscripts and 132,124 manually annotated baselines from 9 libraries and spans 500 years.
Subject
Artificial Intelligence, General Engineering, Statistics and Probability