Urdu text in natural scene images: a new dataset and preliminary text detection

Published:2021-09-16 Issue: Volume:7 Page:e717
ISSN:2376-5992
Container-title:PeerJ Computer Science
language:en

Author:

Ali Hazrat¹, Iqbal Khalid², Mujtaba Ghulam³, Fayyaz Ahmad³, Bulbul Mohammad Farhad⁴, Karam Fazal Wahab³, Zahir Ali³

Affiliation:

1. Department of Electrical and Computer Engineering, COMSATS University Islamabad, Abbottabad Campus, Abbottabad, Pakistan

2. Department of Computer Science, COMSATS University Islamabad, Attock Campus, Attock, Pakistan

3. Department of Electrical and Computer Engineering, COMSATS University Islamabad, Abbottabad Campus, Abbottabad, Pakistan

4. Department of Mathematics, Jashore University of Science and Technology, Jashore, Bangladesh

Abstract

Text detection in natural scene images for content analysis is an interesting task. The research community has seen some great developments in English/Mandarin text detection. However, Urdu text extraction in natural scene images has not been well addressed. In this work, firstly, a new dataset is introduced for Urdu text in natural scene images. The dataset comprises 500 standalone images acquired from real scenes. Secondly, the channel-enhanced Maximally Stable Extremal Region (MSER) method is applied to extract Urdu text regions as candidates in an image. A two-stage filtering mechanism is applied to eliminate non-candidate regions. In the first stage, text and noise are classified based on their geometric properties. In the second stage, a support vector machine classifier is trained to discard non-text candidate regions. After this, text candidate regions are linked using centroid-based vertical and horizontal distances. Text lines are further analyzed by a different classifier based on HOG features to remove non-text regions. Extensive experimentation is performed on the locally developed dataset to evaluate the performance. The experimental results show good performance on test set images. The dataset will be made available for research use. To the best of our knowledge, this work is the first of its kind for the Urdu language; it provides a dataset for free research use and serves as a performance baseline for the task of Urdu text extraction.
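
Two of the steps named in the abstract, geometric filtering of candidates and centroid-based linking into text lines, can be illustrated with a minimal Python sketch. This is not the authors' implementation: it works on bounding boxes rather than images, it omits the MSER, SVM, and HOG stages, and every name and threshold in it (`Box`, `passes_geometry`, `link_by_centroid`, `min_area`, `max_dx_factor`, and so on) is a hypothetical choice made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of one candidate region (hypothetical type)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

    @property
    def centroid(self) -> Tuple[float, float]:
        return (self.x + self.w / 2.0, self.y + self.h / 2.0)


def passes_geometry(box: Box, min_area: float = 30.0,
                    min_aspect: float = 0.1, max_aspect: float = 10.0) -> bool:
    """Reject regions whose area or aspect ratio is implausible for text.

    The thresholds are assumed values, not taken from the paper.
    """
    if box.w <= 0 or box.h <= 0:
        return False
    aspect = box.w / box.h
    return box.w * box.h >= min_area and min_aspect <= aspect <= max_aspect


def link_by_centroid(boxes: List[Box], max_dx_factor: float = 2.0,
                     max_dy_factor: float = 0.5) -> List[List[Box]]:
    """Group boxes into lines by horizontal and vertical centroid distance.

    Two boxes are linked when both distances fall under limits scaled by
    their mean height (assumed rule). Linked pairs are merged with union-find.
    """
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            (cxi, cyi), (cxj, cyj) = boxes[i].centroid, boxes[j].centroid
            scale = (boxes[i].h + boxes[j].h) / 2.0
            if (abs(cxi - cxj) <= max_dx_factor * scale
                    and abs(cyi - cyj) <= max_dy_factor * scale):
                parent[find(i)] = find(j)

    groups: Dict[int, List[Box]] = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)
    return list(groups.values())
```

In such a sketch, candidates would first be filtered with `passes_geometry` and the survivors passed to `link_by_centroid`; each returned group is one candidate text line that a later classifier could accept or reject.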

Publisher

PeerJ

Subject

General Computer Science

Link

https://peerj.com/articles/cs-717.pdf

Reference 29 articles.

1. Deep learning based isolated Arabic scene character recognition;Ahmed,2017

2. Pioneer dataset and automatic recognition of Urdu handwritten characters using a deep autoencoder and convolutional neural network;Ali;SN Applied Sciences,2020

3. Urdu-text detection and recognition in natural scene images using deep learning;Arafat;IEEE Access,2020

4. Exploring geometric property thresholds for filtering non-text regions in a connected component based text detection application;Brooks,2017

5. Character classification and recognition for Urdu texts in natural scene images;Chandio,2018

Cited by 6 articles.

1. Cyberbullying Detection and Abuser Profile Identification on Social Media for Roman Urdu;IEEE Access;2024

2. Leukocyte subtype classification with multi-model fusion;Medical & Biological Engineering & Computing;2023-04-03

3. Estimating Double Cropping Plantations in the Brazilian Cerrado through PlanetScope Monthly Mosaics;Land;2023-02-28

4. Soil Surface Texture Classification Using RGB Images Acquired Under Uncontrolled Field Conditions;IEEE Access;2023

5. Handwritten Urdu Characters and Digits Recognition Using Transfer Learning and Augmentation With AlexNet;IEEE Access;2022