Advanced Deep Learning Techniques for High-Quality Synthetic Thermal Image Generation

Author:

Pavez Vicente1,Hermosilla Gabriel1ORCID,Silva Manuel1ORCID,Farias Gonzalo1ORCID

Affiliation:

1. Escuela de Ingeniería Eléctrica, Pontificia Universidad Católica de Valparaíso, Avenida Brasil 2147, Valparaíso 2362804, Chile

Abstract

In this paper, we introduce a cutting-edge system that leverages state-of-the-art deep learning methodologies to generate high-quality synthetic thermal face images. Our unique approach integrates a thermally fine-tuned Stable Diffusion Model with a Vision Transformer (ViT) classifier, augmented by a Prompt Designer and Prompt Database for precise image generation control. Through rigorous testing across various scenarios, the system demonstrates its capability in producing accurate and superior-quality thermal images. A key contribution of our work is the development of a synthetic thermal face image database, offering practical utility for training thermal detection models. The efficacy of our synthetic images was validated using a facial detection model, achieving results comparable to real thermal face images. Specifically, a detector fine-tuned with real thermal images achieved a 97% accuracy rate when tested with our synthetic images, while a detector trained exclusively on our synthetic data achieved an accuracy of 98%. This research marks a significant advancement in thermal image synthesis, paving the way for its broader application in diverse real-world scenarios.

Funder

FONDECYT

Publisher

MDPI AG

Subject

General Mathematics,Engineering (miscellaneous),Computer Science (miscellaneous)

Reference57 articles.

1. Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., and Chen, M. (2022). Hierarchical Text-Conditional Image Generation with CLIP Latents. arXiv.

2. Radford, A., Kim, J.W., Xu, T., Brockman, G., McLeavey, C., and Sutskever, I. (2023, January 23–29). Robust Speech Recognition via Large-Scale Weak Supervision. Proceedings of the 40th International Conference on Machine Learning, Honolulu, HI, USA.

3. OpenAI (2023). GPT-4 Technical Report. arXiv.

4. Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and Ommer, B. (2022, January 18–24). High-Resolution Image Synthesis with Latent Diffusion Models. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.

5. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2021). An Image Is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. arXiv.

Cited by 2 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3