Experimental evaluation of Arabic OCR systems

Authors

Alghamdi Mansoor, Teahan William

Abstract

Purpose: The aim of this paper is to experimentally evaluate the effectiveness of state-of-the-art printed Arabic text recognition systems to determine open areas for future improvements. In addition, this paper proposes a standard protocol with a set of metrics for measuring the effectiveness of Arabic optical character recognition (OCR) systems to assist researchers in comparing different Arabic OCR approaches.

Design/methodology/approach: This paper describes an experiment to automatically evaluate four well-known Arabic OCR systems using a set of performance metrics. The evaluation experiment is conducted on a publicly available printed Arabic dataset comprising 240 text images with a variety of resolution levels, font types, font styles and font sizes.

Findings: The experimental results show that the field of character recognition for printed Arabic still requires further research to reach an efficient text recognition method for Arabic script.

Originality/value: To the best of the authors' knowledge, this is the first work that provides a comprehensive automated evaluation of Arabic OCR systems with respect to the characteristics of Arabic script and, in addition, proposes an evaluation methodology that can be used as a benchmark by researchers and therefore will contribute significantly to the enhancement of the field of Arabic script recognition.

Publisher

Emerald

Subject

General Medicine

Cited by 13 articles.

1. Sentence splitting in Arabic to Spanish translation;Revista Española de Lingüística Aplicada/Spanish Journal of Applied Linguistics;2023-07-04

2. metaGraphos: a Web-based system for transcribing, proofreading and publishing scanned documents;Collection and Curation;2023-04-10

3. Arabic Optical Character Recognition: A Review;Computer Modeling in Engineering & Sciences;2023

4. Typefaces and Ligatures in Printed Arabic Text: A Deep Learning-Based OCR Perspective;Document Analysis and Recognition – ICDAR 2023 Workshops;2023

5. Meta‐feature based few‐shot Siamese learning for Urdu optical character recognition;Computational Intelligence;2022-05-19
