A Three-Stage Uyghur Recognition Model Combining the Attention Mechanism and Different Convolutional Recurrent Networks
Published: 2023-08-23
Volume: 13
Issue: 17
Page: 9539
ISSN: 2076-3417
Container-title: Applied Sciences
Short-container-title: Applied Sciences
Language: en
Author:
Li Wentao 1, Zhang Yuduo 1, Huang Yongdong 1, Shen Yue 1, Wang Zhe 1
Affiliation:
1. College of Science, Dalian Minzu University, Dalian 116600, China
Abstract
Uyghur text recognition faces several challenges due to the scarcity of publicly available datasets and the intricate nature of the script, which is characterized by strong ligatures and other distinctive attributes. In this study, we propose a unified three-stage model for Uyghur text recognition. The model is developed on a self-constructed Uyghur text dataset, enabling evaluation of modules used in earlier Uyghur text recognition work as well as exploration of module combinations not previously applied to Uyghur text, including Convolutional Recurrent Neural Networks (CRNNs), Gated Recurrent Convolutional Neural Networks (GRCNNs), ConvNeXt, and attention mechanisms. Through a comprehensive comparison of the accuracy, time, normalized edit distance, and memory requirements of different module combinations on the same training and evaluation data, we identify the text recognition structure best suited to Uyghur text. We then train the model weights with the proposed approach and obtain the best recognition results with the ConvNeXt + bidirectional LSTM + attention mechanism structure, which reaches an accuracy of 90.21%. These findings demonstrate the strong generalization and high precision of Uyghur text recognition based on the proposed model, establishing its potential for practical applications in Uyghur text recognition.
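For illustration, the following is a minimal PyTorch sketch of a three-stage recognizer of the kind the abstract describes: convolutional feature extraction, bidirectional LSTM sequence modeling, and attention-based prediction. It is not the authors' released code; the convolutional backbone is a simplified stand-in for ConvNeXt, and the hidden size, character-set size, input image size, and maximum output length are illustrative assumptions.

```python
# Hedged sketch of a three-stage text recognizer (feature extraction ->
# sequence modeling -> attention prediction). All sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvBackbone(nn.Module):
    """Simplified stand-in for the ConvNeXt feature extractor.

    Maps a grayscale text-line image (B, 1, 32, W) to a feature
    sequence (B, T, C) by collapsing the height dimension.
    """
    def __init__(self, out_channels=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.GELU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.GELU(),
            nn.Conv2d(128, out_channels, 3, stride=(2, 1), padding=1), nn.GELU(),
        )

    def forward(self, x):
        f = self.net(x)            # (B, C, H', W')
        f = f.mean(dim=2)          # collapse height -> (B, C, W')
        return f.permute(0, 2, 1)  # (B, T, C) feature sequence


class AttentionDecoder(nn.Module):
    """Additive-attention decoder that predicts one character per step."""
    def __init__(self, hidden, num_classes):
        super().__init__()
        self.attn = nn.Linear(hidden * 2, hidden)   # project BiLSTM outputs
        self.query = nn.Linear(hidden, hidden, bias=False)
        self.score = nn.Linear(hidden, 1, bias=False)
        self.rnn = nn.GRUCell(hidden * 2 + num_classes, hidden)
        self.out = nn.Linear(hidden, num_classes)
        self.num_classes = num_classes

    def forward(self, enc, max_len=30):
        B = enc.size(0)
        h = enc.new_zeros(B, self.out.in_features)   # decoder state
        y = enc.new_zeros(B, self.num_classes)       # all-zero start token
        keys = self.attn(enc)                        # (B, T, H)
        logits = []
        for _ in range(max_len):
            e = self.score(torch.tanh(keys + self.query(h).unsqueeze(1)))
            a = F.softmax(e, dim=1)                  # (B, T, 1) attention weights
            ctx = (a * enc).sum(dim=1)               # (B, 2H) context vector
            h = self.rnn(torch.cat([ctx, y], dim=1), h)
            step = self.out(h)
            logits.append(step)
            y = F.one_hot(step.argmax(1), self.num_classes).float()
        return torch.stack(logits, dim=1)            # (B, max_len, num_classes)


class ThreeStageRecognizer(nn.Module):
    def __init__(self, num_classes=64, hidden=256):
        super().__init__()
        self.backbone = ConvBackbone(out_channels=hidden)
        self.seq = nn.LSTM(hidden, hidden, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.decoder = AttentionDecoder(hidden, num_classes)

    def forward(self, images, max_len=30):
        feats = self.backbone(images)      # stage 1: visual features
        seq, _ = self.seq(feats)           # stage 2: BiLSTM context (B, T, 2H)
        return self.decoder(seq, max_len)  # stage 3: attention prediction


if __name__ == "__main__":
    model = ThreeStageRecognizer(num_classes=64)
    dummy = torch.randn(2, 1, 32, 128)     # two 32x128 text-line crops
    print(model(dummy).shape)              # torch.Size([2, 30, 64])
```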
Funder
Liaoning Department of Science and Technology, Natural Science Foundation of Liaoning Province
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science
References (36 articles)
1. Graves, A., Fernández, S., and Gomez, F. (2006, June 25–29). Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, USA.
2. Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv.
3. Wang (2022). Convolutional Neural Networks with Gated Recurrent Connections. IEEE Trans. Pattern Anal. Mach. Intell.
4. He, K., Zhang, X., Ren, S., and Sun, J. (2016, June 27–30). Deep Residual Learning for Image Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.
5. Liu, Z., Mao, H., Wu, C.Y., Feichtenhofer, C., Darrell, T., and Xie, S. (2022, June 18–24). A ConvNet for the 2020s. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.
Cited by 1 article.