An End-to-End Scene Text Recognition for Bilingual Text
Published:2024-09-09
Issue:9
Volume:8
Page:117
ISSN:2504-2289
Container-title:Big Data and Cognitive Computing
language:en
Short-container-title:BDCC
Author:
Albalawi Bayan M. (1, 2), Jamal Amani T. (1), Al Khuzayem Lama A. (1), Alsaedi Olaa A. (1)
Affiliation:
1. Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
2. Department of Computer Science, Faculty of Computers and Information Technology, University of Tabuk, Tabuk 71491, Saudi Arabia
Abstract
Text localization and recognition in natural scene images have gained considerable attention recently because of their crucial role in applications such as autonomous driving and intelligent navigation. However, two significant gaps exist in this area: (1) prior research has focused primarily on recognizing English text, whereas Arabic text has been underrepresented, and (2) most prior research has adopted separate approaches for scene text localization and recognition rather than one integrated framework. To address these gaps, we propose a novel bilingual end-to-end approach that localizes and recognizes both Arabic and English text within a single natural scene image. Specifically, our approach uses pre-trained CNN models (ResNet and EfficientNetV2) with kernel representation for text localization and RNN models (LSTM and BiLSTM) with an attention mechanism for text recognition. In addition, the AraElectra Arabic language model was incorporated to enhance Arabic text recognition. Experimental results on the EvArest, ICDAR2017, and ICDAR2019 datasets demonstrated that our model achieves superior performance in recognizing not only horizontally oriented text but also multi-oriented and curved Arabic and English text in natural scene images.
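The abstract names an attention mechanism over recurrent features for recognition but does not say which form of attention is used. As a rough illustration only, the sketch below assumes plain dot-product attention over a sequence of per-position feature vectors (standing in for BiLSTM outputs). The function names `softmax` and `attend` and the toy inputs are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, encoder_states):
    """Dot-product attention (an assumed form, not the paper's confirmed one).

    query: decoder state for the character being predicted (hypothetical).
    encoder_states: one feature vector per horizontal position of a text
        region, standing in for BiLSTM outputs (hypothetical).
    Returns (weights, context): one weight per position, and the weighted
    sum of the encoder states.
    """
    # Similarity between the query and each encoder position.
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector: attention-weighted average of the encoder states.
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


if __name__ == "__main__":
    states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    weights, context = attend([2.0, 0.0], states)
    print(weights)
    print(context)
```

In a recognizer of this kind, the context vector would typically be combined with the decoder state to predict the next character, so positions that match the query more closely contribute more to that prediction.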