Cluster2Former: Semisupervised Clustering Transformers for Video Instance Segmentation
Author:
Fóthi Áron (1), Szlatincsán Adrián (1), Somfai Ellák (1, 2)
Affiliation:
1. Department of Artificial Intelligence, ELTE Eötvös Loránd University, 1053 Budapest, Hungary
2. HUN-REN Wigner Research Centre for Physics, 1121 Budapest, Hungary
Abstract
We present a novel semisupervised approach to video instance segmentation. Our Cluster2Former model is trained on scribble-based annotations, significantly reducing the need for comprehensive pixel-level masks. We augment a video instance segmenter, for example the Mask2Former architecture, with a similarity-based constraint loss that handles partial annotations efficiently. We demonstrate that, despite using lightweight annotations (only 0.5% of the annotated pixels), Cluster2Former achieves competitive performance on standard benchmarks. The approach offers a cost-effective and computationally efficient solution for video instance segmentation, especially in scenarios with limited annotation resources.
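To make the idea of a similarity-based constraint loss on partial annotations more concrete, the following is a minimal sketch under stated assumptions, not the formulation used in the paper. It assumes that each scribble-annotated pixel has a soft assignment over a set of candidate instance slots, that the similarity of two pixels is the dot product of their assignment vectors, and that pairs with the same scribble label should be similar while pairs with different labels should not. The function name `pairwise_constraint_loss` and all of its parameters are hypothetical.

```python
import numpy as np


def pairwise_constraint_loss(probs, labels, eps=1e-7):
    """Illustrative pairwise constraint loss over annotated pixels only.

    probs:  (N, Q) array; row i is a probability distribution of pixel i
            over Q candidate instance slots (assumed, e.g. softmax output).
    labels: (N,) array of scribble instance ids for the same N pixels.
    """
    probs = np.asarray(probs, dtype=np.float64)
    labels = np.asarray(labels)

    # Similarity of two pixels: probability that both fall in the same slot.
    sim = probs @ probs.T

    # Must-link (True) if two pixels share a scribble id, cannot-link otherwise.
    same = labels[:, None] == labels[None, :]

    # Use each unordered pair once and skip self-pairs.
    iu = np.triu_indices(len(labels), k=1)
    s = np.clip(sim[iu], eps, 1.0 - eps)
    y = same[iu].astype(np.float64)

    # Binary cross-entropy between pairwise similarity and the constraint.
    return float(-np.mean(y * np.log(s) + (1.0 - y) * np.log(1.0 - s)))


# Three annotated pixels: the first two lie on the same scribble.
labels = np.array([0, 0, 1])
consistent = np.array([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]])
inconsistent = np.array([[0.9, 0.1], [0.1, 0.9], [0.9, 0.1]])

loss_consistent = pairwise_constraint_loss(consistent, labels)
loss_inconsistent = pairwise_constraint_loss(inconsistent, labels)
```

In this sketch only annotated pixels enter the loss, so unannotated pixels impose no constraint. Because the loss depends on pairwise agreement rather than on a fixed mapping from labels to slots, it is unchanged if the slot indices are permuted.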
Funder
Artificial Intelligence National Laboratory
National Research, Development, and Innovation Fund of Hungary
Ministry of Culture and Innovation of Hungary from the National Research, Development, and Innovation Fund
Governmental Agency for IT Development (KIFÜ) in Hungary
Robert Bosch, Ltd.