NRSTRNet: A Novel Network for Noise-Robust Scene Text Recognition

Published: 2023-01-23
Journal: International Journal of Computational Intelligence Systems (Int J Comput Intell Syst)
Volume: 16, Issue: 1
ISSN: 1875-6883
Language: English

Authors:

Yue Hongwei, Huang Yufeng, Vong Chi-Man, Jin Yingying, Zeng Zhiqiang, Yu Mingqi, Chen Chuangquan

Abstract

Scene text recognition (STR) has been widely applied in industrial and commercial fields. However, existing methods still struggle with text images that have defects such as low contrast, blur, low resolution, and insufficient illumination. Such defects are common in practice because natural scenes contain diverse text backgrounds and shooting conditions are often limited. To address these challenges, we propose a novel network for noise-robust scene text recognition (NRSTRNet), which suppresses noise in all three critical stages of STR. Specifically, in the text feature extraction stage, NRSTRNet enhances text-related features along the channel and spatial dimensions and ignores some disturbances from non-text regions, reducing noise and redundancy in the input image. In the context encoding stage, fine-grained feature coding is proposed to reduce the influence of earlier noisy temporal features on the current temporal feature, while sharing the contextual feature encoding parameters reduces the impact of partial noise on the overall encoding. In the decoding stage, a self-attention module is added to strengthen the connections between different temporal features, thereby leveraging global information to obtain noise-resistant features. Through these approaches, NRSTRNet enhances local semantic information while also taking global semantic information into account. Experimental results show that NRSTRNet improves the representation of text images, remains more stable under noise, and achieves higher text recognition accuracy. In particular, the model outperforms state-of-the-art (SOTA) STR models on irregular text recognition benchmarks by 2% on average, and it is exceptionally robust on noisy images.
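The abstract describes the feature extraction and decoding stages only at a high level. As a rough illustration, and not the authors' implementation, the sketch below shows two generic building blocks of the kind it names: channel-then-spatial attention gating on a C x H x W feature map, and scaled dot-product self-attention across T temporal features. It assumes parameter-free gates (sigmoid of mean plus max pooling, a simplification in the spirit of CBAM-style attention) and identity query/key/value projections; a trained network would learn these weights. The function names `channel_spatial_attention` and `self_attention` are hypothetical.

```python
import math
from typing import List

# Hypothetical type aliases for this sketch: a feature map is C x H x W,
# a temporal sequence is T x D.
FeatureMap = List[List[List[float]]]
Sequence = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def channel_spatial_attention(fmap: FeatureMap) -> FeatureMap:
    """Reweight a C x H x W map first per channel, then per spatial position.

    Assumption: parameter-free gates (sigmoid of mean + max pooling) stand in
    for the learned layers a real attention module would use.
    """
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    # Channel gate: one weight in (0, 1) per channel, from pooled activations.
    ch_gate = []
    for c in range(C):
        vals = [v for row in fmap[c] for v in row]
        ch_gate.append(sigmoid(sum(vals) / len(vals) + max(vals)))
    refined = [[[fmap[c][i][j] * ch_gate[c] for j in range(W)]
                for i in range(H)] for c in range(C)]
    # Spatial gate: one weight in (0, 1) per (i, j), pooled across channels.
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for i in range(H):
        for j in range(W):
            column = [refined[c][i][j] for c in range(C)]
            gate = sigmoid(sum(column) / C + max(column))
            for c in range(C):
                out[c][i][j] = refined[c][i][j] * gate
    return out


def self_attention(seq: Sequence) -> Sequence:
    """Scaled dot-product self-attention over T temporal features.

    Assumption: identity query/key/value projections (no learned weights),
    so each output is a softmax-weighted average of all input features.
    """
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, seq))
                    for i in range(d)])
    return out


if __name__ == "__main__":
    fmap = [[[float(c + i + j) for j in range(4)] for i in range(3)]
            for c in range(2)]
    gated = channel_spatial_attention(fmap)
    print(len(gated), len(gated[0]), len(gated[0][0]))  # 2 3 4
    seq = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    print(self_attention(seq))
```

In this simplified form both gates only rescale activations by factors in (0, 1), so they can attenuate but never amplify a region, and each self-attention output mixes information from every time step, which is the "global information" the decoding stage relies on.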

Funders

Characteristic Innovation Projects of Colleges and Universities of Guangdong Province

Guangdong Basic and Applied Basic Research Foundation

Publisher

Springer Science and Business Media LLC

Subject

Computational Mathematics, General Computer Science

Link

https://link.springer.com/content/pdf/10.1007/s44196-023-00181-1.pdf
