JPPF: Multi-task Fusion for Consistent Panoptic-Part Segmentation

Published:2024-01-10 Issue:1 Volume:5
ISSN:2661-8907
Container-title:SN Computer Science
language:en
Short-container-title:SN COMPUT. SCI.

Author:

Muralidhara Shishir, Jagadeesh Sravan Kumar, Schuster René, Stricker Didier

Abstract

Part-aware panoptic segmentation is a computer vision problem that aims to provide a semantic understanding of the scene at multiple levels of granularity. More precisely, semantic areas, object instances, and semantic parts are predicted simultaneously. In this paper, we present our joint panoptic part fusion (JPPF), which combines the three individual segmentations effectively to obtain a panoptic-part segmentation. Two aspects are of utmost importance for this: first, a unified model for the three problems is desired that allows for mutually improved and consistent representation learning; second, the combination must be balanced so that it gives equal importance to all individual results during fusion. Our proposed JPPF is parameter-free and dynamically balances its input. The method is evaluated and compared on the Cityscapes Panoptic Parts (CPP) and Pascal Panoptic Parts (PPP) datasets in terms of PartPQ and Part-Whole Quality (PWQ). In extensive experiments, we verify the importance of our fair fusion, highlight its most significant impact for areas that can be further segmented into parts, and demonstrate the generalization capabilities of our design without fine-tuning on 5 additional datasets.

Funder

Bundesministerium für Bildung und Forschung

Deutsches Forschungszentrum für Künstliche Intelligenz GmbH (DFKI)

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s42979-023-02499-1.pdf

References (64 articles)

1. Bulo SR, Porzi L, Kontschieder P. In-place activated batchnorm for memory-optimized training of DNNs. In: Conference on Computer Vision and Pattern Recognition (CVPR). 2018.

2. Chen LC, Papandreou G, Schroff F, et al. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587. 2017.

3. Chen LC, Collins MD, Zhu Y, et al. Searching for efficient multi-scale architectures for dense image prediction. Adv Neural Inf Process Syst (NeurIPS). 2018.

4. Chen LC, Zhu Y, Papandreou G, et al. Encoder–decoder with atrous separable convolution for semantic image segmentation. In: European Conference on Computer Vision (ECCV). 2018.

5. Cheng B, Collins MD, Zhu Y, et al. Panoptic-DeepLab: a simple, strong, and fast baseline for bottom-up panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR). 2020.