End-to-End Training of VAE-GAN Network for Text Detection

Published: 2023-05-10

Author:

Naveen Palanichamy¹

Affiliation:

1. Sri Eshwar College of Engineering

Abstract

Scene text detection is challenging because text varies widely in appearance, background, and orientation. Applications such as OCR, image understanding, and autonomous vehicles require detectors that are more robust, accurate, and efficient. Combining a Generative Adversarial Network (GAN) with a Variational Autoencoder (VAE) has the potential to yield a more robust and powerful text detection network. The proposed network comprises three modules: a VAE module, a GAN module, and a text detection module. The VAE module, built as an encoder-decoder, generates diverse and variable text regions. The GAN module, built as a generator-discriminator pair, refines and enhances these regions to make them more realistic and accurate. The text detection module detects text regions in the input image, assigns a confidence score to each region, and identifies the regions with high confidence scores. The entire network is trained end-to-end to minimize a joint loss function that includes the VAE loss, the GAN loss, and the text detection loss. The VAE loss encourages the generated text regions to be diverse and variable, the GAN loss encourages them to be realistic and accurate, and the text detection loss drives the network to detect text regions in the input image with high accuracy. The proposed network is tested on several datasets, including Total-Text, CTW1500, ICDAR 2015, ICDAR 2017, ReCTS, TD500, COCO-Text, SynthText, Street View Text, and KAIST Scene Text, and achieves promising results.
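The abstract names a joint loss made of a VAE term, a GAN term, and a text detection term, but it does not give their exact form or weighting. The following is a minimal sketch of how such a joint objective could be assembled, not the author's implementation. Everything specific here is an assumption for illustration: the squared-error reconstruction term, the standard Gaussian KL term, the non-saturating generator loss, the binary cross-entropy detection loss, and the weights `w_vae`, `w_gan`, and `w_det`.

```python
import math


def kl_divergence(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def binary_cross_entropy(p, target, eps=1e-7):
    """Cross-entropy between a predicted probability p and a 0/1 target."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def vae_loss(x, x_recon, mu, log_var):
    """Assumed VAE loss: squared reconstruction error plus KL term."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + kl_divergence(mu, log_var)


def generator_gan_loss(d_fake):
    """Assumed non-saturating generator loss: the generator is rewarded
    when the discriminator scores a generated region close to 1."""
    return binary_cross_entropy(d_fake, 1.0)


def detection_loss(scores, labels):
    """Assumed detection loss: mean cross-entropy over region confidences."""
    return sum(binary_cross_entropy(s, y)
               for s, y in zip(scores, labels)) / len(scores)


def joint_loss(l_vae, l_gan, l_det, w_vae=1.0, w_gan=1.0, w_det=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return w_vae * l_vae + w_gan * l_gan + w_det * l_det


# Toy values standing in for the outputs of the three modules.
l_vae = vae_loss(x=[0.2, 0.8], x_recon=[0.25, 0.7],
                 mu=[0.1, -0.1], log_var=[0.0, 0.0])
l_gan = generator_gan_loss(d_fake=0.4)
l_det = detection_loss(scores=[0.9, 0.2], labels=[1.0, 0.0])
total = joint_loss(l_vae, l_gan, l_det)
```

In a real training loop the discriminator would normally be updated with its own loss in an alternating step; this sketch only shows the generator-side objective that a single end-to-end backward pass would minimize.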

Publisher

Research Square Platform LLC

Cited by 1 article.

1. Transforming Scene Text Detection and Recognition: A Multi-Scale End-to-End Approach With Transformer Framework;IEEE Access;2024