Advanced Deep Learning Techniques for High-Quality Synthetic Thermal Image Generation

Published: 2023-10-27 Issue: 21 Volume: 11 Page: 4446
ISSN: 2227-7390
Container-title: Mathematics
Language: en

Author:

Pavez Vicente¹, Hermosilla Gabriel¹, Silva Manuel¹, Farias Gonzalo¹

Affiliation:

1. Escuela de Ingeniería Eléctrica, Pontificia Universidad Católica de Valparaíso, Avenida Brasil 2147, Valparaíso 2362804, Chile

Abstract

In this paper, we introduce a cutting-edge system that leverages state-of-the-art deep learning methodologies to generate high-quality synthetic thermal face images. Our unique approach integrates a thermally fine-tuned Stable Diffusion Model with a Vision Transformer (ViT) classifier, augmented by a Prompt Designer and Prompt Database for precise image generation control. Through rigorous testing across various scenarios, the system demonstrates its capability in producing accurate and superior-quality thermal images. A key contribution of our work is the development of a synthetic thermal face image database, offering practical utility for training thermal detection models. The efficacy of our synthetic images was validated using a facial detection model, achieving results comparable to real thermal face images. Specifically, a detector fine-tuned with real thermal images achieved a 97% accuracy rate when tested with our synthetic images, while a detector trained exclusively on our synthetic data achieved an accuracy of 98%. This research marks a significant advancement in thermal image synthesis, paving the way for its broader application in diverse real-world scenarios.

Funder

FONDECYT

Publisher

MDPI AG

Subject

General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)

Link

https://www.mdpi.com/2227-7390/11/21/4446/pdf

References (57 articles; the first 5 are listed)

1. Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., and Chen, M. (2022). Hierarchical Text-Conditional Image Generation with CLIP Latents. arXiv.

2. Radford, A., Kim, J.W., Xu, T., Brockman, G., McLeavey, C., and Sutskever, I. (2023, January 23–29). Robust Speech Recognition via Large-Scale Weak Supervision. Proceedings of the 40th International Conference on Machine Learning, Honolulu, HI, USA.

3. OpenAI (2023). GPT-4 Technical Report. arXiv.

4. Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and Ommer, B. (2022, January 18–24). High-Resolution Image Synthesis with Latent Diffusion Models. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.

5. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2021). An Image Is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. arXiv.

Cited by 2 articles.

1. A Comprehensive Survey of Hypermedia System for Text-to-Image Conversion Using Generative AI; Advances in Computational Intelligence and Robotics; 2024-06-28

2. Rulers2023: An Annotated Dataset of Synthetic and Real Images for Ruler Detection Using Deep Learning; Electronics; 2023-12-07