End-to-End Transcript Alignment of 17th Century Manuscripts: The Case of Moccia Code

Author:

De Gregorio Giuseppe, Capriolo Giuliana, Marcelli Angelo

Abstract

The growth of digital libraries has yielded a large number of handwritten historical documents in the form of images, often accompanied by a digital transcription of the content. The ability to locate the words of the digital transcription in the images can be important both for the study of the document by humanities scholars and for further automatic processing. We propose a learning-free method for automatically aligning the transcription to the document image. The method receives as input the digital image of the document and the transcription of its content, and aims to link each word of the transcription to the corresponding word image within the page. The method comprises two main original contributions: a line-level segmentation algorithm capable of detecting text lines with curved baselines, and a text-to-image alignment algorithm capable of dealing with under- and over-segmentation errors at the word level. Experiments on pages from a 17th-century Italian manuscript demonstrate that the line segmentation method correctly segments 92% of the text lines and that the method achieves an alignment accuracy greater than 68%. Moreover, the performance achieved on widely used data sets compares favourably with the state of the art.
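The abstract does not specify how the alignment step handles under- and over-segmentation, so the following is only an illustrative sketch of one learning-free way such errors could be tolerated, not the authors' algorithm. It assumes that a text line has already been split into candidate word boxes, that only the box widths are used, and that a word's expected width is proportional to its character count. The function name `align_words_to_boxes` and the parameter `max_merge` are hypothetical. A dynamic program then lets one transcript word absorb several boxes (over-segmentation) or several words share one box (under-segmentation).

```python
from typing import List, Tuple


def align_words_to_boxes(
    words: List[str], widths: List[float], max_merge: int = 3
) -> List[Tuple[List[int], List[int]]]:
    """Align transcript words to segmented box widths on one text line.

    Returns groups (word_indices, box_indices). A group with several boxes
    models over-segmentation; a group with several words models
    under-segmentation. Assumption: expected word width is proportional
    to the number of characters in the word.
    """
    n, m = len(words), len(widths)
    total_chars = sum(len(w) for w in words)
    if total_chars == 0 or m == 0:
        raise ValueError("need at least one non-empty word and one box")

    char_w = sum(widths) / total_chars  # estimated width per character
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    # Allowed steps: one word to k boxes, or k words to one box.
    moves = [(1, k) for k in range(1, max_merge + 1)]
    moves += [(k, 1) for k in range(2, max_merge + 1)]

    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for dw, db in moves:
                i2, j2 = i + dw, j + db
                if i2 > n or j2 > m:
                    continue
                expected = char_w * sum(len(w) for w in words[i:i2])
                observed = sum(widths[j:j2])
                c = cost[i][j] + abs(expected - observed)
                if c < cost[i2][j2]:
                    cost[i2][j2] = c
                    back[i2][j2] = (i, j)

    if cost[n][m] == inf:
        raise ValueError("no alignment exists within max_merge")

    # Walk the back-pointers from the end to recover the groups.
    groups = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        groups.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    groups.reverse()
    return groups
```

For instance, with the words `["il", "codice", "moccia"]` and box widths `[20, 30, 30, 60]`, the second word is matched to the two middle boxes, which is how an over-segmented word would be recovered under these assumptions. A practical system would likely need richer cues than width alone.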

Funder

Department of Information and Electrical Engineering and Applied Mathematics of the University of Salerno

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Graphics and Computer-Aided Design, Computer Vision and Pattern Recognition, Radiology, Nuclear Medicine and Imaging


Cited by 2 articles.

1. A Rule-based Semi-automated OCR Postprocessing Method for Aligning Multi-language Transcripts with Multi-column Text;2023 First International Conference on Advances in Electrical, Electronics and Computational Intelligence (ICAEECI);2023-10-19

2. Segmentation-Free Alignment of Arbitrary Symbol Transcripts to Images;Document Analysis and Recognition – ICDAR 2023 Workshops;2023
