Text Image Super-Resolution Guided by Text Structure and Embedding Priors

Published: 2023-07-12
Volume: 19
Issue: 6
Page: 1-18
ISSN: 1551-6857
Container-title: ACM Transactions on Multimedia Computing, Communications, and Applications
Language: en
Short-container-title: ACM Trans. Multimedia Comput. Commun. Appl.

Author:

Huang Cong¹, Peng Xiulian², Liu Dong¹, Lu Yan²

Affiliation:

1. University of Science and Technology of China, China

2. Microsoft Research, China

Abstract

We aim to super-resolve text images from unrecognizable low-resolution inputs. Existing super-resolution methods mainly learn a direct mapping from low-resolution to high-resolution images by exploring low-level features, which usually generate blurry outputs and suffer from severe structure distortion for text parts, especially when the resolution is quite low. Both the visual quality and the readability will suffer. To tackle these issues, we propose a new text super-resolution paradigm by recovering with understanding. Specifically, we extract a text-embedding prior and a text-structure prior from the upsampled image by learning to understand the text. The two priors with rich structure information and text-embedding information are then used as auxiliary information to recover the clear text structure. In addition, we introduce a text-feature loss to guide the training for better text recognizability. Extensive evaluations on both screen and scene text image datasets show that our method largely outperforms the state-of-the-art in both visual quality and recognition accuracy.

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications, Hardware and Architecture

Link

https://dl.acm.org/doi/pdf/10.1145/3595924

Cited by 4 articles.

1. Cross-Modal Face Super-Resolution Based on Quasi-Siamese Domain Transfer Fusion Network; ACM Transactions on Multimedia Computing, Communications, and Applications; 2024-08-28

2. Multi Fine-Grained Fusion Network for Depression Detection; ACM Transactions on Multimedia Computing, Communications, and Applications; 2024-06-29

3. DeMaskGAN: a de-masking generative adversarial network guided by semantic segmentation; The Visual Computer; 2023-11-06

4. Automatic Face Recognition System Using Deep Convolutional Mixer Architecture and AdaBoost Classifier; Applied Sciences; 2023-08-31