Affiliation:
1. Department of Computer Science and IT, The University of Lahore 54000, Pakistan
2. School of Computer Sciences, NCBA&E 54000, Pakistan
Abstract
There are two main techniques to convert written or printed text into digital format. The first technique is to create an image of written/printed text, but images are large in size so they require huge memory space to store, as well as text in image form cannot be undergo further processes like edit, search, copy, etc. The second technique is to use an Optical Character Recognition (OCR) system. OCR’s can read documents and convert manual text documents into digital text and this digital text can be processed to extract knowledge. A huge amount of Urdu language’s data is available in handwritten or in printed form that needs to be converted into digital format for knowledge acquisition. Highly cursive, complex structure, bi-directionality, and compound in nature, etc. make the Urdu language too complex to obtain accurate OCR results. In this study, supervised learning-based OCR system is proposed for Nastalique Urdu language. The proposed system evaluations under a variety of experimental settings apprehend 98.4% training results and 97.3% test results, which is the highest recognition rate ever achieved by any Urdu language OCR system. The proposed system is simple to implement especially in software front of OCR system also the proposed technique is useful for printed text as well as handwritten text and it will help in developing more accurate Urdu OCR’s software systems in the future.
Publisher
World Scientific Pub Co Pte Lt
Subject
Artificial Intelligence,Computer Vision and Pattern Recognition,Software
Cited by
20 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献