Author:
Zhou Haibin, Shao Lujiao, Jia Boxiang, Zhang Haijun
Abstract
Text, which is regarded as one of the important clues for visual recognition, can provide rich and accurate high-level semantic information. Therefore, the detection and recognition of textual data have become a research hotspot in computer vision and artificial intelligence. However, the difficulty of data collection and the non-uniform distribution of characters still pose challenges for accurate text recognition, especially for recognizing complicated character sets such as Chinese. To address small-sample text recognition, we propose an improved image-based text transfer framework, named $$\mathrm{T}^2$$Net. This framework can replace or modify the text content in an image so as to arbitrarily expand a recognition data set. Considering that the main challenge of text transfer lies in decoupling the complex interrelationship between text and background, a text content mask branch is first added into a background inpainting module so as to more realistically restore background textures. Second, a text recognition model is developed to guide the readability of the text transfer results in the text conversion module. Finally, a text fusion module is used to fuse the independent migrations of background and text. We examined the performance of our proposed framework on a real-world scene text recognition data set. Qualitative and quantitative results demonstrate the effectiveness of our method in comparison with previous works.
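The abstract describes a three-stage pipeline: a background inpainting module with a text-content mask branch, a text conversion module, and a text fusion module. The paper's implementation is not reproduced here; the following is a minimal PyTorch sketch of how such a pipeline could be wired together, with all class names, layer choices, and tensor layouts being illustrative assumptions rather than the authors' design (the recognition-model guidance used during training is omitted).

```python
import torch
from torch import nn

# NOTE: illustrative sketch only. Module names, channel counts, and layers
# are assumptions; they are not taken from the paper.

class BackgroundInpainting(nn.Module):
    """Erases the source text and restores background texture.
    Also predicts a 1-channel text-content mask, as the abstract
    says a mask branch guides the restoration."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 4, 3, padding=1))  # 3 background + 1 mask channels

    def forward(self, x):
        out = self.body(x)
        background, text_mask = out[:, :3], out[:, 3:]
        return background, torch.sigmoid(text_mask)

class TextConversion(nn.Module):
    """Renders the target text in the style of the source image."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(6, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1))

    def forward(self, style_img, target_text_img):
        return self.body(torch.cat([style_img, target_text_img], dim=1))

class TextFusion(nn.Module):
    """Composites the converted text onto the inpainted background."""
    def __init__(self):
        super().__init__()
        self.body = nn.Conv2d(6, 3, 3, padding=1)

    def forward(self, background, text):
        return self.body(torch.cat([background, text], dim=1))

class T2Net(nn.Module):
    """Background and text are migrated independently, then fused."""
    def __init__(self):
        super().__init__()
        self.inpaint = BackgroundInpainting()
        self.convert = TextConversion()
        self.fuse = TextFusion()

    def forward(self, source_img, target_text_img):
        background, mask = self.inpaint(source_img)
        text = self.convert(source_img, target_text_img)
        return self.fuse(background, text), mask

net = T2Net()
source = torch.randn(1, 3, 64, 256)       # a typical text-line crop shape
target_text = torch.randn(1, 3, 64, 256)  # rendered target-text image
out, mask = net(source, target_text)
print(out.shape, mask.shape)
```

In a real system each module would be a full encoder-decoder trained adversarially (plus the recognition loss the abstract mentions); the point of the sketch is only the data flow: background and text paths run independently and are merged by the fusion module.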
Funder
National Natural Science Foundation of China
Natural Science Foundation of Guangdong Province
Shenzhen Science and Technology Program
HITSZ-J&A Joint Laboratory of Digital Design and Intelligent Fabrication
Publisher
Springer Science and Business Media LLC