Large-Scale Optical Character Recognition of Ancient Greek

Published:2017-11 Issue:3 Volume:14 Page:341-359
ISSN:1496-9343
Container-title:Mouseion
language:en
Short-container-title:Mouseion

Author:

Robertson Bruce¹, Boschetti Federico²

Affiliation:

1. Classics, Mount Allison University

2. Istituto di Linguistica Computazionale “A. Zampolli,” CNR of Pisa

Abstract

This paper documents our campaign to undertake the large-scale optical character recognition of ancient, or polytonic, Greek. Building upon the Gamera OCR engine and developing a suite of post-processing tools, including automatic spellcheck, we processed 1,200 volumes comprising 329,002,271 Greek words. A sample of 10 pages is studied in detail; they demonstrate the degree to which each step of post-processing improved the results, and with which source documents. These pages attain an average character accuracy of about 96%. These results will provide a basis for further improvements, including the training of other open-source OCR engines.
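The abstract reports an average character accuracy of about 96% but does not say here how that figure is calculated. As a minimal sketch, assuming one common definition (one minus the edit distance between the OCR output and a ground-truth transcription, divided by the ground-truth length), the measure could be computed as follows. The function names are illustrative and are not taken from the paper or from Gamera. Unicode NFC normalization is applied first as an assumption, so that equivalent encodings of the same accented Greek letter are not counted as errors.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_accuracy(ground_truth: str, ocr_output: str) -> float:
    """Assumed definition: 1 - edit_distance / len(ground_truth), floored at 0."""
    gt = unicodedata.normalize("NFC", ground_truth)
    out = unicodedata.normalize("NFC", ocr_output)
    if not gt:
        return 1.0 if not out else 0.0
    return max(0.0, 1.0 - levenshtein(gt, out) / len(gt))


# A dropped circumflex counts as one substitution out of five characters.
print(character_accuracy("μῆνιν", "μηνιν"))
```

Under this definition a missing or wrong diacritic costs as much as a wrong base letter, which is one reason polytonic Greek is a demanding test for OCR.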

Publisher

University of Toronto Press Inc. (UTPress)

Subject

Archeology, Classics

Link

https://utpjournals.press/doi/pdf/10.3138/mous.14.3-3

Reference: 39 articles.

1. Bamman, D., and G. Crane. 2011. “Measuring Historical Word Sense Variation,” in Proceedings of the 11th ACM/IEEE Joint Conference on Digital Libraries (JCDL 2011). New York: ACM. 1–10.

2. Improving OCR Accuracy for Classical Critical Editions

3. Brandt, C., and C. Dalitz. 2011. “Gamera Addon: GreekOCR Toolkit.” http://gamera.informatik.hsnr.de/addons/greekocr4gamera/index.html.

Cited by 6 articles.

1. A Low Resource Multi-lingual Simultaneous Script Identification and Text Recognition Model;SN Computer Science;2024-07-31

2. Optical Character Recognition of 19th Century Classical Commentaries: the Current State of Affairs;The 6th International Workshop on Historical Document Imaging and Processing;2021-09-05

3. Semantic enrichment on large scanned collections through their “satellite texts”: the paradigm of Migne’s Patrologia Graeca;Information Discovery and Delivery;2021-08-16

4. NN-based analytic approach to symbol level recognition for degraded Bengali printed documents;Sādhanā;2020-10-22

5. NLP for the Greek Language: A Brief Survey;11th Hellenic Conference on Artificial Intelligence;2020-09