Affiliation:
1. Guangdong Provincial Key Laboratory of Interdisciplinary Research and Application for Data Science, Beijing Normal University—Hong Kong Baptist University United International College, Zhuhai 519087, China
Abstract
Optical character recognition (OCR) is the process of extracting text and layout information from image files by analyzing and recognizing the text they contain. It also identifies the geometric location and orientation of text and its symmetrical behavior. OCR usually consists of two steps: text detection and text recognition. Scene text recognition is a subfield of OCR that focuses on text in natural scenes, such as streets, billboards, and license plates. Compared with traditional document images, locating and reading text in natural scenes is a challenging task. Image-based sequence recognition is a longstanding research subject in computer vision. Great progress has been made in this field; however, most models still struggle to recognize text in images of complex scenes with high accuracy. To address this issue, this paper proposes a new text recognition approach based on the convolutional recurrent neural network (CRNN). It combines DBNet (real-time scene text detection with differentiable binarization) for text detection and segmentation, a text direction classifier, and the Retinex algorithm for image enhancement. To evaluate the effectiveness of the proposed method, we performed an experimental analysis of the algorithm and carried out simulations on complex scene image data based on data from the existing literature, as well as on several real datasets designed for a variety of nonstationary environments. Experimental results demonstrated that the proposed model outperformed the baseline methods on three benchmark datasets and achieved on-par performance with other approaches on existing datasets. The model addresses the inability of CRNN to identify text in complex and multi-oriented text scenes. Furthermore, it achieves higher accuracy than the original CRNN model across a wider variety of application scenarios.
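As a rough illustration of how the components named in the abstract could fit together, the following Python sketch chains four stages: enhancement, detection, direction classification, and recognition. The stage order, the function name `recognize_scene_text`, and all `toy_*` stand-ins are assumptions made for illustration only; the abstract does not specify the interfaces, and the real stages would be a Retinex filter, DBNet, a direction classifier, and CRNN rather than the string operations used here.

```python
from typing import Callable, List, TypeVar

Img = TypeVar("Img")  # placeholder for whatever image type the stages share


def recognize_scene_text(
    image: Img,
    enhance: Callable[[Img], Img],             # e.g. a Retinex-style filter
    detect: Callable[[Img], List[Img]],        # e.g. DBNet, returning text crops
    classify_direction: Callable[[Img], int],  # estimated rotation in degrees
    deskew: Callable[[Img, int], Img],         # turns a rotated crop upright
    recognize: Callable[[Img], str],           # e.g. CRNN
) -> List[str]:
    """Run the assumed enhance -> detect -> orient -> recognize pipeline."""
    enhanced = enhance(image)
    texts = []
    for crop in detect(enhanced):
        angle = classify_direction(crop)
        if angle != 0:
            crop = deskew(crop, angle)
        texts.append(recognize(crop))
    return texts


# Toy stand-ins (NOT real models): the "image" is a string, regions are
# separated by "|", and a leading "^" marks a region that is upside down.
def toy_detect(img: str) -> List[str]:
    return img.split("|")


def toy_direction(crop: str) -> int:
    return 180 if crop.startswith("^") else 0


def toy_deskew(crop: str, angle: int) -> str:
    # Drop the "^" marker and reverse the characters to undo the flip.
    return crop[:0:-1] if angle == 180 else crop


result = recognize_scene_text(
    "EXIT|^POTS", lambda x: x, toy_detect, toy_direction, toy_deskew, str.strip
)
print(result)  # ['EXIT', 'STOP']
```

Passing each stage as a callable keeps the sketch independent of any particular library, so a stage can be swapped out without touching the pipeline itself.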
Funder
BNU-HKBU United International College
Guangdong Higher Education Key Platform and Research Project
Guangdong Provincial Key Laboratory of Interdisciplinary Research and Application for Data Science
Subject
Physics and Astronomy (miscellaneous), General Mathematics, Chemistry (miscellaneous), Computer Science (miscellaneous)
Cited by
7 articles.