A Multi-Layer Holistic Approach for Cursive Text Recognition

Author:

Umair Muhammad, Zubair Muhammad, Dawood Farhan, Ashfaq Sarim, Bhatti Muhammad Shahid, Hijji Mohammad, Sohail Abid

Abstract

Urdu is widely spoken and narrated in several South Asian countries and in communities worldwide. Its cursive writing style makes Urdu text relatively hard to recognize compared with other languages. The Urdu script belongs to a non-Latin cursive script family, like Arabic, Hindi and Chinese. Urdu is written in several styles, among which ‘Nastaleeq’ is the most popular and widely used font style. Localization/detection and recognition of Urdu Nastaleeq text remains a challenge because the script follows a modified version of the Arabic script. This research study presents a methodology to recognize and classify Urdu text in the Nastaleeq font, regardless of the text position in the image. The proposed solution comprises a two-step methodology. In the first step, text detection is performed using Connected Component Analysis (CCA) and a Long Short-Term Memory (LSTM) neural network. In the second step, a hybrid Convolutional Neural Network and Recurrent Neural Network (CNN-RNN) architecture is deployed to recognize the detected text. The image containing Urdu text is binarized and segmented to produce single-line text images, which are fed to the hybrid CNN-RNN model; the model recognizes the text and saves it in a text file. The proposed technique outperforms existing ones, achieving an overall accuracy of 97.47%.
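The preprocessing steps named in the abstract (binarization, connected component analysis, and segmentation into single-line regions) can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed threshold, the 8-connectivity, the rule that groups components into lines by vertical overlap, and all function names are assumptions chosen for illustration. The LSTM detection stage and the CNN-RNN recognizer are not shown.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 1 = ink, 0 = background.

    The fixed threshold is an assumption; the abstract does not say
    which binarization method is used.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def connected_components(binary):
    """Find 8-connected ink regions with a breadth-first flood fill.

    Returns inclusive bounding boxes (top, left, bottom, right) in scan order.
    """
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r in range(height):
        for c in range(width):
            if binary[r][c] == 0 or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def group_into_lines(boxes):
    """Merge boxes whose vertical extents overlap into text-line boxes.

    This grouping rule is a hypothetical stand-in for the paper's
    line segmentation step.
    """
    lines = []
    for top, left, bottom, right in sorted(boxes):
        if lines and top <= lines[-1][2]:
            t, l, b, r = lines[-1]
            lines[-1] = (t, min(l, left), max(b, bottom), max(r, right))
        else:
            lines.append((top, left, bottom, right))
    return lines


# Tiny synthetic "page": 0 = dark ink, 255 = white background.
gray = [
    [255,   0, 255, 255,   0, 255],
    [255,   0,   0, 255,   0, 255],
    [255, 255, 255, 255, 255, 255],
    [255, 255,   0,   0, 255, 255],
]
boxes = connected_components(binarize(gray))
lines = group_into_lines(boxes)
print(lines)
```

On this toy input the two components in the top rows are merged into one line box and the component in the last row forms a second one. In a full pipeline, each line box would be cropped and passed to the recognizer.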

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science
