Cluster2Former: Semisupervised Clustering Transformers for Video Instance Segmentation

Published: 2024-02-03 Issue: 3 Volume: 24 Page: 997
ISSN: 1424-8220
Container-title: Sensors
language: en
Short-container-title: Sensors

Author:

Fóthi Áron¹, Szlatincsán Adrián¹, Somfai Ellák¹,²

Affiliation:

1. Department of Artificial Intelligence, ELTE Eötvös Loránd University, 1053 Budapest, Hungary

2. HUN-REN Wigner Research Centre for Physics, 1121 Budapest, Hungary

Abstract

A novel approach to video instance segmentation using semisupervised learning is presented. Our Cluster2Former model leverages scribble-based annotations for training, significantly reducing the need for comprehensive pixel-level masks. We augment a video instance segmenter, for example, the Mask2Former architecture, with a similarity-based constraint loss to handle partial annotations efficiently. We demonstrate that despite using lightweight annotations (only 0.5% of the annotated pixels), Cluster2Former achieves competitive performance on standard benchmarks. The approach offers a cost-effective and computationally efficient solution for video instance segmentation, especially in scenarios with limited annotation resources.

Funder

Artificial Intelligence National Laboratory

National Research, Development, and Innovation Fund of Hungary

Ministry of Culture and Innovation of Hungary from the National Research, Development, and Innovation Fund

Governmental Agency for IT Development (KIFÜ) in Hungary

Robert Bosch, Ltd.

Publisher

MDPI AG

Link

https://www.mdpi.com/1424-8220/24/3/997/pdf
