Weakly Supervised Semantic Segmentation in Aerial Imagery via Cross-Image Semantic Mining
Published: 2023-02-10
Volume: 15
Issue: 4
Page: 986
ISSN: 2072-4292
Container-title: Remote Sensing
Short-container-title: Remote Sensing
Language: en
Author:
Zhou Ruixue (1,2,3), Yuan Zhiqiang (1,2,3), Rong Xuee (1,2,3), Ma Weicong (4), Sun Xian (1,3), Fu Kun (1,3), Zhang Wenkai (1,3)
Affiliation:
1. Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China
2. School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100190, China
3. Key Laboratory of Network Information System Technology (NIST), Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China
4. Xi’an Institute of Applied Optics, Xi’an 710065, China
Abstract
Weakly Supervised Semantic Segmentation (WSSS) with only image-level labels reduces the annotation burden and has developed rapidly in recent years. However, current mainstream methods employ only a single image’s information to localize targets and do not account for relationships across images. In Remote Sensing (RS) images, which are characterized by complex backgrounds and multiple categories, it is challenging to locate targets and differentiate between their categories. In contrast to previous methods, which mostly focused on single-image information, we propose CISM, a novel cross-image semantic mining WSSS framework. CISM explores cross-image semantics in multi-category RS scenes for the first time, with two novel loss functions: the Common Semantic Mining (CSM) loss and the Non-common Semantic Contrastive (NSC) loss. In particular, prototype vectors and the Prototype Interactive Enhancement (PIE) module are employed to capture semantic similarities and differences across images. To overcome category confusion and interference from closely related backgrounds, we integrate the Single-Label Secondary Classification (SLSC) task and the corresponding single-label loss into our framework. Furthermore, a Multi-Category Sample Generation (MCSG) strategy is devised to balance the distribution of samples among categories and increase image diversity. These designs facilitate the generation of more accurate, finer-granularity Class Activation Maps (CAMs) for each target category. Extensive experiments show that our approach outperforms previous methods on RS data: it is the first WSSS framework to explore cross-image semantics in multi-category RS scenes, and it achieves state-of-the-art results on the iSAID dataset using only image-level labels.
Experiments on the PASCAL VOC2012 dataset also demonstrate the effectiveness and competitiveness of the algorithm, pushing the mean Intersection-Over-Union (mIoU) to 67.3% and 68.5% on the validation and test sets, respectively.
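The paper’s CISM framework and its losses are not reproduced here, but the baseline CAM computation that WSSS methods of this kind refine can be sketched as follows. This is a minimal NumPy illustration of the standard CAM formulation (a weighted sum of the last convolutional feature maps by the classifier weights), not the authors’ code; all function and variable names are hypothetical.

```python
import numpy as np

def class_activation_map(features, weights, class_idx):
    """Compute a Class Activation Map (CAM) for one class.

    features: (C, H, W) activations from the last convolutional layer.
    weights:  (num_classes, C) weights of the final linear classifier.
    Returns an (H, W) map normalized to [0, 1].
    """
    c, h, w = features.shape
    # Weighted sum of channel maps using the class's classifier weights.
    cam = weights[class_idx] @ features.reshape(c, h * w)
    cam = cam.reshape(h, w)
    # ReLU, then min-max normalize so maps are comparable across images.
    cam = np.maximum(cam, 0.0)
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

# Toy example: 4 channels, an 8x8 spatial grid, 3 classes.
rng = np.random.default_rng(0)
feats = rng.random((4, 8, 8)).astype(np.float32)
w = rng.random((3, 4)).astype(np.float32)
cam = class_activation_map(feats, w, class_idx=1)
print(cam.shape)  # (8, 8)
```

Thresholding such per-class maps yields the pseudo-masks used to train a segmentation network; the cross-image losses described in the abstract aim to make these maps more accurate before that step.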
Funder
National Science Fund for Distinguished Young Scholars; General Program of the National Natural Science Foundation of China
Subject
General Earth and Planetary Sciences
References: 64 articles.
Cited by: 2 articles.