Text2Light

Authors:

Chen Zhaoxi¹, Wang Guangcong¹, Liu Ziwei¹

Affiliation:

1. Nanyang Technological University, Singapore

Abstract

High-quality HDRIs (High Dynamic Range Images), typically HDR panoramas, are one of the most popular ways to create photorealistic lighting and 360-degree reflections of 3D scenes in graphics. Given the difficulty of capturing HDRIs, a versatile and controllable generative model is highly desired, where layman users can intuitively control the generation process. However, existing state-of-the-art methods still struggle to synthesize high-quality panoramas for complex scenes. In this work, we propose a zero-shot text-driven framework, Text2Light, to generate 4K+ resolution HDRIs without paired training data. Given a free-form text as the description of the scene, we synthesize the corresponding HDRI with two dedicated steps: 1) text-driven panorama generation in low dynamic range (LDR) and low resolution (LR), and 2) super-resolution inverse tone mapping to scale up the LDR panorama both in resolution and dynamic range. Specifically, to achieve zero-shot text-driven panorama generation, we first build dual codebooks as the discrete representation for diverse environmental textures. Then, driven by the pre-trained Contrastive Language-Image Pre-training (CLIP) model, a text-conditioned global sampler learns to sample holistic semantics from the global codebook according to the input text. Furthermore, a structure-aware local sampler learns to synthesize LDR panoramas patch-by-patch, guided by holistic semantics. To achieve super-resolution inverse tone mapping, we derive a continuous representation of 360-degree imaging from the LDR panorama as a set of structured latent codes anchored to the sphere. This continuous representation enables a versatile module to upscale the resolution and dynamic range simultaneously. Extensive experiments demonstrate the superior capability of Text2Light in generating high-quality HDR panoramas. In addition, we show the feasibility of our work in realistic rendering and immersive VR.
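The abstract describes latent codes "anchored to the sphere", which presupposes some mapping between panorama pixels and directions on the unit sphere. The sketch below shows one common way to do this for an equirectangular panorama. It is illustrative only and not taken from the paper: the function name `equirect_pixel_to_direction`, the axis convention (y up, longitude zero at the image center), and the pixel-center sampling are all assumptions made here.

```python
import math


def equirect_pixel_to_direction(col, row, width, height):
    """Map the center of pixel (col, row) to a unit direction vector.

    Assumed convention (not from the paper): longitude runs from -pi to pi
    left to right, latitude from +pi/2 (top) to -pi/2 (bottom), y points up.
    """
    u = (col + 0.5) / width    # horizontal position in (0, 1)
    v = (row + 0.5) / height   # vertical position in (0, 1)
    lon = (u - 0.5) * 2.0 * math.pi
    lat = (0.5 - v) * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)
```

Under this convention, every pixel of a panorama maps to a unit-length vector, with pixels in the upper half of the image pointing above the horizon and pixels in the lower half pointing below it.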

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Graphics and Computer-Aided Design


Cited by 9 articles.

1. A Deeper Analysis of Volumetric Relightable Faces;International Journal of Computer Vision;2023-10-31

2. 360-Degree Panorama Generation from Few Unregistered NFoV Images;Proceedings of the 31st ACM International Conference on Multimedia;2023-10-26

3. AI Generated Content in the Metaverse: Risks and Mitigation Strategies;2023 International Symposium on Networks, Computers and Communications (ISNCC);2023-10-23

4. Simonstown: An AI-facilitated Interactive Story of Love, Life, and Pandemic;Proceedings of the 16th International Symposium on Visual Information Communication and Interaction;2023-09-22

5. sPellorama: An Immersive Prototyping Tool using Generative Panorama and Voice-to-Prompts;ACM SIGGRAPH 2023 Posters;2023-07-23
