Author:
Zhou Haibin, Shao Lujiao, Jia Boxiang, Zhang Haijun
Abstract
Text, which is regarded as one of the important clues for visual recognition, can provide rich and accurate high-level semantic information. Therefore, the detection and recognition of textual data have become a research hotspot in computer vision and artificial intelligence. However, the difficulty of data collection and the non-uniform distribution of characters still pose challenges for accurate text recognition, especially for recognizing complicated character sets such as Chinese. To address small-sample text recognition, we propose an improved image-based text transfer framework, named $$\mathrm{T}^2$$Net. This framework can replace or modify the text content in an image so as to arbitrarily expand a recognition data set. Considering that the main challenge of text transfer lies in decoupling the complex interrelationship between text and background, a text content mask branch is first added into a background inpainting module so as to more realistically restore background textures. Second, a text recognition model is developed to guide the readability of the text transfer results in the text conversion module. Finally, a text fusion module is used to fuse the independent migrations of background and text. We examined the performance of our proposed framework on a real-world scene text recognition data set. Qualitative and quantitative results demonstrate the effectiveness of our method in comparison with previous works.
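The abstract describes a three-stage pipeline: a background inpainting module with a text-content mask branch, a text conversion module, and a text fusion module. The paper's implementation is not reproduced here; the following is a minimal PyTorch sketch of how such a pipeline could be wired together, with all class names, layer choices, and tensor layouts being illustrative assumptions rather than the authors' design (the recognition-model guidance used during training is omitted).

```python
import torch
from torch import nn

# NOTE: illustrative sketch only. Module names, channel counts, and layers
# are assumptions; they are not taken from the paper.

class BackgroundInpainting(nn.Module):
    """Erases the source text and restores background texture.
    Also predicts a 1-channel text-content mask, as the abstract
    says a mask branch guides the restoration."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 4, 3, padding=1))  # 3 background + 1 mask channels

    def forward(self, x):
        out = self.body(x)
        background, text_mask = out[:, :3], out[:, 3:]
        return background, torch.sigmoid(text_mask)

class TextConversion(nn.Module):
    """Renders the target text in the style of the source image."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(6, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1))

    def forward(self, style_img, target_text_img):
        return self.body(torch.cat([style_img, target_text_img], dim=1))

class TextFusion(nn.Module):
    """Composites the converted text onto the inpainted background."""
    def __init__(self):
        super().__init__()
        self.body = nn.Conv2d(6, 3, 3, padding=1)

    def forward(self, background, text):
        return self.body(torch.cat([background, text], dim=1))

class T2Net(nn.Module):
    """Background and text are migrated independently, then fused."""
    def __init__(self):
        super().__init__()
        self.inpaint = BackgroundInpainting()
        self.convert = TextConversion()
        self.fuse = TextFusion()

    def forward(self, source_img, target_text_img):
        background, mask = self.inpaint(source_img)
        text = self.convert(source_img, target_text_img)
        return self.fuse(background, text), mask

net = T2Net()
source = torch.randn(1, 3, 64, 256)       # a typical text-line crop shape
target_text = torch.randn(1, 3, 64, 256)  # rendered target-text image
out, mask = net(source, target_text)
print(out.shape, mask.shape)
```

In a real system each module would be a full encoder-decoder trained adversarially (plus the recognition loss the abstract mentions); the point of the sketch is only the data flow: background and text paths run independently and are merged by the fusion module.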
Funder
National Natural Science Foundation of China
Natural Science Foundation of Guangdong Province
Shenzhen Science and Technology Program
HITSZ-J&A Joint Laboratory of Digital Design and Intelligent Fabrication
Publisher
Springer Science and Business Media LLC