Lightweight Scene Text Recognition Based on Transformer
Authors:
Luan Xin 1,2,3; Zhang Jinwei 1,2,3; Xu Miaomiao 1,2,3; Silamu Wushouer 1,2,3; Li Yanbing 1,2,3
Affiliations:
1. College of Information Science and Engineering, Xinjiang University, No. 777 Huarui Street, Urumqi 830017, China
2. Xinjiang Laboratory of Multi-Language Information Technology, Xinjiang University, No. 777 Huarui Street, Urumqi 830017, China
3. Xinjiang Multilingual Information Technology Research Center, Xinjiang University, No. 777 Huarui Street, Urumqi 830017, China
Abstract
Scene text recognition (STR) is an active research area in computer vision that aims to recognize text in natural-scene images automatically. Current attention-based encoder–decoder frameworks struggle to align feature regions precisely with the target object when images are complex or of low quality, a phenomenon known as attention drift. In addition, as Transformer-based models have risen in popularity, their growing parameter counts have driven up computational costs. To address these problems, we build on recent Vision Transformer (ViT) research and add a position-enhancement branch that alleviates attention drift, dynamically fusing position information with visual information to improve recognition accuracy. Experimental results show that our model achieves an average recognition accuracy on the test set 3% higher than that of the baseline. At the same time, it retains a small parameter count and fast inference, striking a good balance between accuracy, speed, and computational load.
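The abstract does not specify how the position-enhancement branch and the visual features are combined. Purely as an illustration, the sketch below shows element-wise gated fusion, one common way to fuse two feature streams dynamically: a gate g = σ(W[v; p] + b) is computed from the concatenated visual feature v and position feature p, and the output is g ⊙ v + (1 − g) ⊙ p, so each token and channel decides how much visual versus positional information to keep. The fixed sinusoidal encodings standing in for the position branch, the function names `gated_fusion` and `sinusoidal_positions`, and the gate parameters `w_gate` and `b_gate` are hypothetical assumptions, not the authors' implementation.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function; maps gate logits into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sinusoidal_positions(num_tokens: int, dim: int) -> List[Vector]:
    """Hypothetical stand-in for the position branch: fixed sinusoidal encodings."""
    table = []
    for pos in range(num_tokens):
        row = []
        for j in range(dim):
            angle = pos / (10000 ** (2 * (j // 2) / dim))
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def matvec(w: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product: each row of w dotted with x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def gated_fusion(visual: List[Vector], position: List[Vector],
                 w_gate: Matrix, b_gate: Vector) -> List[Vector]:
    """Fuse per-token visual and position features with a learned sigmoid gate.

    w_gate has shape (dim, 2 * dim) and b_gate has length dim (both hypothetical).
    """
    assert len(visual) == len(position), "one position vector per token"
    fused = []
    for v, p in zip(visual, position):
        logits = matvec(w_gate, v + p)  # v + p concatenates the two lists: [v; p]
        g = [sigmoid(z + b) for z, b in zip(logits, b_gate)]
        # Per-channel convex combination: g -> visual, (1 - g) -> position.
        fused.append([gi * vi + (1.0 - gi) * pi for gi, vi, pi in zip(g, v, p)])
    return fused


if __name__ == "__main__":
    random.seed(0)
    num_tokens, dim = 3, 4
    visual = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_tokens)]
    position = sinusoidal_positions(num_tokens, dim)
    w_gate = [[random.gauss(0.0, 0.1) for _ in range(2 * dim)] for _ in range(dim)]
    b_gate = [0.0] * dim
    fused = gated_fusion(visual, position, w_gate, b_gate)
    print(len(fused), len(fused[0]))  # 3 4
```

With all gate weights and biases at zero the gate is 0.5 everywhere and the output is a plain average of the two streams; once trained, a design like this could lean on positional cues where visual evidence is weak (for example, blurred or occluded characters) and on visual features elsewhere.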
Funder
National Natural Science Foundation of China Joint Fund Project
Subject
Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry