EA-ConvNeXt: An Approach to Script Identification in Natural Scenes Based on Edge Flow and Coordinate Attention
Published:2023-06-27
Issue:13
Volume:12
Page:2837
ISSN:2079-9292
Container-title:Electronics
language:en
Short-container-title:Electronics
Author:
Zhang Zhiyun (1), Eli Elham (1), Mamat Hornisa (1), Aysa Alimjan (1, 2), Ubul Kurban (1, 2)
Affiliation:
1. School of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
2. Xinjiang Key Laboratory of Multilingual Information Technology, Xinjiang University, Urumqi 830046, China
Abstract
In multilingual scene text understanding, script identification is an important prerequisite for text image recognition. Because text images in natural scenes have complex backgrounds and severe noise, and because different language families share common symbols or similar layouts, script identification remains an unsolved problem. This paper proposes a new script identification method based on an improved ConvNeXt, namely EA-ConvNeXt. First, a method for generating an edge flow map from the original image is proposed, which increases the number of scripts and reduces background noise. Then, building on the features extracted by the convolutional neural network ConvNeXt, a coordinate attention module is proposed to enhance the description of spatial position information in the vertical direction. The public dataset SIW-13 is expanded with a Uyghur script image dataset to form SIW-14. The improved method achieved identification rates of 97.3%, 93.5%, and 92.4% on the public script identification datasets CVSI-2015, MLe2e, and SIW-13, respectively, and 92.0% on the expanded dataset SIW-14, verifying the superiority of this method.
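The idea of emphasizing position information along the vertical direction can be illustrated with a very small sketch. This is not the authors' coordinate attention module: it assumes a single-channel feature map stored as a list of rows, and it omits the learned 1x1 convolutions and the second pooling direction that a full coordinate attention block would normally include. The function name `vertical_attention` and the sample values are hypothetical.

```python
import math


def vertical_attention(feature_map):
    """Reweight each row of a 2-D feature map by a gate derived from that row.

    Simplified illustration only (assumption): the gate is the sigmoid of the
    row mean. A real coordinate attention block would pass the pooled
    descriptors through learned layers before the sigmoid.
    """
    gated = []
    for row in feature_map:
        row_mean = sum(row) / len(row)             # pool across the width
        gate = 1.0 / (1.0 + math.exp(-row_mean))   # sigmoid gate in (0, 1)
        gated.append([value * gate for value in row])
    return gated


# Hypothetical 2 x 3 feature map: one empty row, one row with activations.
features = [
    [0.0, 0.0, 0.0],
    [2.0, 4.0, 0.0],
]
out = vertical_attention(features)
```

Each row receives its own gate, so rows with stronger average responses keep more of their activation while the output keeps the input's shape.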
Funder
Natural Science Foundation of China
Subject
Electrical and Electronic Engineering,Computer Networks and Communications,Hardware and Architecture,Signal Processing,Control and Systems Engineering
Cited by
4 articles.