Author:
Kazmi M., Yasir F., Habib S., Hayat M. S., Qazi S. A.
Abstract
Urdu Optical Character Recognition (OCR) based on character-level recognition (the analytical approach) is less popular than ligature-level recognition (the holistic approach) because of its added complexity and the overlapping of characters and strokes. This paper presents a holistic Urdu ligature extraction technique. The proposed Photometric Ligature Extraction (PLE) technique is independent of font size and column layout and can handle non-overlapping ligatures as well as all inter- and intra-overlapping ligatures. It uses a customized photometric filter together with X-shearing, padding, and connected component analysis to extract complete ligatures instead of extracting primary and secondary ligatures separately. A total of approximately 267,800 ligatures were extracted from scanned images of printed Urdu Nastaliq text with an accuracy of 99.4%. Thus, the proposed framework outperforms the existing Urdu Nastaliq text extraction and segmentation algorithms. The proposed PLE framework can also be applied to other languages written in the Nastaliq script style, such as Arabic, Persian, Pashto, and Sindhi.
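As a rough illustration of two of the steps named in the abstract, X-shearing with padding followed by connected component analysis, the sketch below works on a toy binary grid. It is not the authors' PLE implementation: the photometric filter is omitted, and the function names `x_shear` and `connected_components`, the integer `shift_per_row` parameter, and the toy image are all assumptions made for this example.

```python
from collections import deque


def x_shear(image, shift_per_row):
    """Shift row r to the right by r * shift_per_row pixels.

    Zero padding is added on both sides so every row keeps the same width.
    shift_per_row is a hypothetical non-negative integer shear step.
    """
    height = len(image)
    max_shift = shift_per_row * (height - 1)
    sheared = []
    for r, row in enumerate(image):
        left = shift_per_row * r
        right = max_shift - left
        sheared.append([0] * left + list(row) + [0] * right)
    return sheared


def connected_components(image):
    """Return (top, left, bottom, right) boxes of 8-connected foreground blobs."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r in range(height):
        for c in range(width):
            if image[r][c] == 0 or seen[r][c]:
                continue
            # Breadth-first flood fill from the first unseen foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        inside = 0 <= ny < height and 0 <= nx < width
                        if inside and image[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Toy image: two slanted strokes that share column 3 but do not touch.
rows = ["0001001",
        "0010010",
        "0100100",
        "1001000"]
toy = [[int(ch) for ch in row] for row in rows]

boxes_before = connected_components(toy)
boxes_after = connected_components(x_shear(toy, 1))
```

In this toy case the two strokes have bounding boxes that overlap in column 3 before shearing, while after shearing each stroke occupies its own column range. This only hints at why shearing can help separate slanted, overlapping components; the actual shear direction and amount used in the paper are not stated in the abstract.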
Publisher
Engineering, Technology & Applied Science Research