Multi-Domain Image-to-Image Translation with Cross-Granularity Contrastive Learning

Published:2024-05-16 Issue:7 Volume:20 Page:1-21
ISSN:1551-6857
Container-title:ACM Transactions on Multimedia Computing, Communications, and Applications
language:en
Short-container-title:ACM Trans. Multimedia Comput. Commun. Appl.

Author:

Fu Huiyuan¹, Liu Jin¹, Yu Ting¹, Wang Xin², Ma Huadong¹

Affiliation:

1. Beijing University of Posts and Telecommunications, Beijing, China

2. Stony Brook University, Stony Brook, United States

Abstract

The objective of multi-domain image-to-image translation is to learn the mapping from a source domain to a target domain in multiple image domains while preserving the content representation of the source domain. Despite the importance and recent efforts, most previous studies disregard the large style discrepancy between images and instances in various domains, or fail to capture instance details and boundaries properly, resulting in poor translation results for rich scenes. To address these problems, we present an effective architecture for multi-domain image-to-image translation that only requires one generator. Specifically, we provide detailed procedures for capturing the features of instances throughout the learning process, as well as learning the relationship between the style of the global image and that of a local instance in the image by enforcing the cross-granularity consistency. In order to capture local details within the content space, we employ a dual contrastive learning strategy that operates at both the instance and patch levels. Extensive studies on different multi-domain image-to-image translation datasets reveal that our proposed method outperforms state-of-the-art approaches.
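The abstract names a dual contrastive strategy at the instance and patch levels but does not give the loss itself. As a rough illustration only, the sketch below applies a standard InfoNCE loss at both levels and sums the two terms. The function names (`info_nce`, `level_loss`, `dual_contrastive_loss`), the temperature of 0.07, the weighting term, and the choice of using the other positives in a batch as negatives are all assumptions made for this example, not the authors' formulation.

```python
import math


def _normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(query, positive, negatives, temperature=0.07):
    """Standard InfoNCE loss for one query feature vector.

    The loss is small when the query is closer (in cosine similarity)
    to its positive than to any of the negatives.
    """
    q = _normalize(query)
    logits = [_dot(q, _normalize(positive)) / temperature]
    logits += [_dot(q, _normalize(n)) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]


def level_loss(pairs, temperature=0.07):
    """Average InfoNCE over (query, positive) pairs at one granularity.

    Assumption: the negatives for each query are the positives that
    belong to the other pairs in the same list.
    """
    total = 0.0
    for i, (query, positive) in enumerate(pairs):
        negatives = [p for j, (_, p) in enumerate(pairs) if j != i]
        total += info_nce(query, positive, negatives, temperature)
    return total / len(pairs)


def dual_contrastive_loss(patch_pairs, instance_pairs,
                          instance_weight=1.0, temperature=0.07):
    """Hypothetical combination: patch-level term plus weighted instance-level term."""
    return (level_loss(patch_pairs, temperature)
            + instance_weight * level_loss(instance_pairs, temperature))
```

Here each pair would hold a feature from the translated image and the feature at the corresponding patch or instance location in the source image; how those features are extracted is outside the scope of this sketch.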

Funder

NSFC

National Key R&D Program of China

Beijing Nova Program

Innovation Research Group Project of NSFC

111 Project

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3656048

References: 69 articles.