Affiliation:
1. Yale University, New Haven, USA
2. CRS4, Pula (CA), Italy
Abstract
We propose three automatic algorithms for analyzing digitized medieval manuscripts (text block computation, text line segmentation, and special component extraction) by taking advantage of previous clustering algorithms and a template-matching technique. These three methods are completely automatic, so no user intervention or input is required to make them work. Moreover, they are all per-page based; that is, unlike some prior methods that need a set of pages from the same manuscript for training purposes, they are able to analyze a single page without requiring any additional pages for input, eliminating the need for training on additional pages with similar layout. We extensively evaluated the algorithms on 1,771 images of pages of six different publicly available historical manuscripts, which differ significantly from each other in terms of layout structure, acquisition resolution, writing style, and so on. The experimental results indicate that they are able to achieve very satisfactory performance, that is, the average precision and recall values obtained by the text block computation method can reach as high as 98% and 99%, respectively.
Funder
Sardinian Regional Authorities under project VIGEC
Mellon Foundation
Digitally Enabled Scholarship with Medieval Manuscripts
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Graphics and Computer-Aided Design, Computer Science Applications, Information Systems, Conservation
Cited by
6 articles.
1. Data and Process Quality Evaluation in a Textual Big Data Archiving System;Journal on Computing and Cultural Heritage;2022-02-28
2. Qidong Yixin Oral Liquid for Viral Myocarditis: A Systematic Review and Meta-Analysis;Evidence-Based Complementary and Alternative Medicine;2020-05-22
3. Machine Learning for Cultural Heritage: A Survey;Pattern Recognition Letters;2020-05
4. Historical Document Processing: A Survey of Techniques, Tools, and Trends;Proceedings of the 12th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management;2020
5. HORAE;Proceedings of the 5th International Workshop on Historical Document Imaging and Processing;2019-09-20