Affiliation:
1. Bahauddin Zakariya University
2. Women University
Abstract
Optical character recognition (OCR) has received significant research attention as a means of digitizing text in images. Urdu OCR is more difficult than OCR for English and similar languages because of the script's complexity: a character can take multiple forms depending on its position in the word. This work presents a segmentation-free (i.e. holistic) approach for offline printed Urdu text detection. Text lines are extracted from an image using horizontal histogram projection, and ligatures within each extracted text line are segmented using connected components labelling. A set of 14 statistical features, together with HOG features, is extracted for each sub-word/ligature and used to train the proposed model. The open-source UPTI dataset [10] is used to train and test the proposed algorithm. An SVM with an RBF kernel function is used to classify ligatures. The proposed algorithm achieves a 97.3% character recognition rate on this dataset.
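The two segmentation steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an already binarized image stored as a list of rows (1 = ink, 0 = background), and the function names `segment_lines` and `connected_components` are hypothetical. It uses 8-connectivity and does not group dots or diacritics with their main ligature body, which a real Urdu pipeline would need to do.

```python
from collections import deque

def segment_lines(img):
    """Split a binary image into text lines via horizontal projection.

    Returns (top, bottom) row ranges with bottom exclusive.
    """
    profile = [sum(row) for row in img]      # ink pixels per row
    lines, start = [], None
    for y, count in enumerate(profile):
        if count > 0 and start is None:
            start = y                        # entering a text line
        elif count == 0 and start is not None:
            lines.append((start, y))         # leaving a text line
            start = None
    if start is not None:                    # line touching the bottom edge
        lines.append((start, len(img)))
    return lines

def connected_components(img):
    """Label 8-connected ink regions with a breadth-first search.

    Returns inclusive bounding boxes (top, left, bottom, right),
    in the order the regions are met by a row-major scan.
    """
    h = len(img)
    w = len(img[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

Each bounding box would then be cropped and passed to feature extraction (the statistical and HOG features) before classification. Because Urdu is read right to left, the boxes within a line would normally be sorted by their right edge in descending order before recognition.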
Publisher
Research Square Platform LLC
References: 24 articles.
1. Naz S, Umar AI, Ahmed R et al (2016) Urdu Nasta’liq text recognition using implicit segmentation based on multi-dimensional long short term memory neural networks. SpringerPlus 5:2010
2. Ud Din I, Siddiqi I, Khalid S et al (2017) Segmentation-free optical character recognition for printed Urdu text. J Image Video Proc 2017:62. https://doi.org/10.1186/s13640-017-0208-z
3. Tofik Ali, Tauseef Ahmad, Mohd. Imran (2016) UOCR: A ligature based approach for an Urdu OCR system. 3rd International Conference on Computing for Sustainable Global Development (INDIACom), March 2016
4. Uddin I, Siddiqi I, Khalid S (2017) A holistic approach for recognition of complete Urdu ligatures using hidden Markov models. 2017 International Conference on Frontiers of Information Technology (FIT), pp 155–160. https://doi.org/10.1109/FIT.2017.00035
5. Pal U, Sarkar A (2003) Recognition of printed Urdu script. In: Proc. 7th International Conference on Document Analysis and Recognition (ICDAR), pp 1183–1187
Cited by
1 article.