Affiliation:
1. Yale University, USA/CRS4, Italy
2. Yale University, USA
Abstract
Massive digital acquisition and preservation of deteriorating historical and artistic documents is of particular importance due to their value and fragile condition. The study and browsing of such digital libraries are invaluable for scholars in the Cultural Heritage field but require automatic tools for analyzing and indexing these datasets. We present two fully automatic methods that require no human intervention: text height estimation and text line extraction. Our proposed methods have been evaluated on a huge heterogeneous corpus of illuminated medieval manuscripts in different writing styles and with various problematic attributes, such as holes, spots, ink bleed-through, ornamentation, background noise, and overlapping text lines. Our experimental results demonstrate that these two new methods are efficient and reliable, even when applied to very noisy and damaged old handwritten manuscripts.
Funder
Digitally Enabled Scholarship with Medieval Manuscripts
Mellon Foundation
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Graphics and Computer-Aided Design, Computer Science Applications, Information Systems, Conservation
Cited by
20 articles.