
Global video object segmentation with spatial constraint module

Published: 2023-01-03
Journal: Computational Visual Media (Comp. Visual Media)
Volume: 9, Issue: 2, Pages: 385–400
ISSN: 2096-0433
Language: English

Authors:

Chen Yadang, Wang Duolin, Chen Zhiguo, Yang Zhi-Xin, Wu Enhua

Abstract

We present a lightweight and efficient semi-supervised video object segmentation network based on the space-time memory framework. Our method partly addresses two difficulties of conventional video object segmentation: the computation time per frame is too long, and the segmentation of the current frame makes too little use of information from past frames. The algorithm uses a global context (GC) module to achieve high-performance, real-time segmentation: the GC module integrates information from multiple frames without increasing memory usage and can process each frame in real time. In addition, because the predicted mask of the previous frame helps segment the current frame, we feed it into a spatial constraint module (SCM), which constrains the regions segmented in the current frame. The SCM effectively alleviates mismatches between similar targets while consuming few additional resources. We also add a refinement module to the decoder to improve segmentation along object boundaries. Our model achieves state-of-the-art results on several datasets, scoring 80.1% on YouTube-VOS 2018 and a $\mathcal{J}\&\mathcal{F}$ score of 78.0% on DAVIS 2017, while taking 0.05 s per frame on the DAVIS 2016 validation set.
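The abstract describes two mechanisms only at a high level: a global context memory that summarizes past frames at a fixed size, and a spatial constraint driven by the previous frame's mask. The sketch below is one plausible reading, not the authors' implementation. It assumes, loosely following the global context module of Li et al. (ECCV 2020), that each frame's keys and values are compressed into a key_dim × value_dim matrix that is averaged over frames. It also models the SCM as a hand-crafted prior that down-weights foreground probabilities far from a dilated previous mask; the paper's SCM is a network module whose internals the abstract does not describe. `GlobalContextMemory`, `dilate`, `spatial_constraint`, `radius`, and `outside_weight` are all hypothetical names.

```python
import numpy as np


class GlobalContextMemory:
    """Hypothetical fixed-size memory: a running mean of per-frame context
    matrices (keys^T @ values). Its size depends only on the channel counts,
    not on how many frames have been added."""

    def __init__(self, key_dim: int, value_dim: int):
        self.context = np.zeros((key_dim, value_dim))
        self.num_frames = 0

    def add_frame(self, keys: np.ndarray, values: np.ndarray) -> None:
        # keys: (N, key_dim), values: (N, value_dim), with N = H * W pixels
        frame_context = keys.T @ values / keys.shape[0]
        self.num_frames += 1
        # Incremental mean: C_t = C_{t-1} + (G_t - C_{t-1}) / t
        self.context += (frame_context - self.context) / self.num_frames

    def read(self, query_keys: np.ndarray) -> np.ndarray:
        # query_keys: (N, key_dim) -> retrieved features: (N, value_dim)
        return query_keys @ self.context


def dilate(mask: np.ndarray, radius: int) -> np.ndarray:
    """Grey-level dilation with a (2*radius+1) x (2*radius+1) square window."""
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant")
    out = np.zeros_like(mask)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out = np.maximum(out, padded[dy:dy + h, dx:dx + w])
    return out


def spatial_constraint(prob: np.ndarray, prev_mask: np.ndarray,
                       radius: int = 2, outside_weight: float = 0.1) -> np.ndarray:
    """Hypothetical SCM stand-in: keep probabilities near the previous mask,
    scale down those outside the dilated region by outside_weight."""
    region = dilate(prev_mask.astype(float), radius)  # 1 near the old mask, else 0
    weight = outside_weight + (1.0 - outside_weight) * region
    return prob * weight
```

Because `add_frame` uses an incremental mean, the memory stays key_dim × value_dim however many frames are added, whereas a space-time memory that stores per-frame key and value maps grows with the number of stored frames.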
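The reported $\mathcal{J}\&\mathcal{F}$ score is the mean of region similarity $\mathcal{J}$ (intersection over union of the predicted and ground-truth masks) and contour accuracy $\mathcal{F}$ (an F-measure between predicted and ground-truth boundary pixels). The following simplified single-frame, single-object version illustrates the definition only. It assumes a fixed square tolerance of `tol` pixels for boundary matching, whereas the official DAVIS evaluation code scales the tolerance with the image diagonal and averages over objects and frames. All function names are illustrative.

```python
import numpy as np


def region_similarity(pred: np.ndarray, gt: np.ndarray) -> float:
    """J: intersection over union of two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return np.logical_and(pred, gt).sum() / union


def _dilate_bool(mask: np.ndarray, radius: int) -> np.ndarray:
    """Binary dilation with a (2*radius+1) x (2*radius+1) square window."""
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros_like(mask, dtype=bool)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def boundary(mask: np.ndarray) -> np.ndarray:
    """Mask pixels with at least one 4-neighbour outside the mask."""
    mask = mask.astype(bool)
    p = np.pad(mask, 1, mode="constant", constant_values=False)
    eroded = p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return mask & ~eroded


def contour_accuracy(pred: np.ndarray, gt: np.ndarray, tol: int = 1) -> float:
    """F: F-measure of boundary pixels matched within tol pixels."""
    pb, gb = boundary(pred), boundary(gt)
    if pb.sum() == 0 and gb.sum() == 0:
        return 1.0
    if pb.sum() == 0 or gb.sum() == 0:
        return 0.0
    precision = (pb & _dilate_bool(gb, tol)).sum() / pb.sum()
    recall = (gb & _dilate_bool(pb, tol)).sum() / gb.sum()
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def j_and_f(pred: np.ndarray, gt: np.ndarray, tol: int = 1) -> float:
    """Mean of region similarity and contour accuracy."""
    return (region_similarity(pred, gt) + contour_accuracy(pred, gt, tol)) / 2
```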

Publisher

Springer Science and Business Media LLC

Subject

Artificial Intelligence, Computer Graphics and Computer-Aided Design, Computer Vision and Pattern Recognition

Link

https://link.springer.com/content/pdf/10.1007/s41095-022-0282-8.pdf
