Abstract
OCR (Optical Character Recognition) and HTR (Handwritten Text Recognition) are now available for the Armenian language. This technology can add value to documents by improving their accessibility, for instance through keyword search, and it poses a new challenge for digital libraries. Our presentation gives an overview of what is possible today by surveying the state of the art and the challenges raised by text recognition for Armenian. We focus on the technology developed by Calfa for handwritten archives, ancient manuscripts and old printed books. We report on three of our ongoing projects: processing catalogs of manuscripts (Mekhitarist, Venice), printed newspapers of the Fundamental Scientific Library of NASRA, and handwritten correspondence (Mekhitarist, Venice). The methodology applied by Calfa achieves an accuracy above 95% for handwritten documents and above 99.5% for printed documents.
Publisher
National Library of Armenia Publications