Deep learning for video object segmentation: a review

Published: 2022-04-08 Issue: 1 Volume: 56 Page: 457-531
ISSN: 0269-2821
Container-title: Artificial Intelligence Review
Language: en
Short-container-title: Artif Intell Rev

Author:

Gao Mingqi, Zheng Feng, Yu James J. Q., Shan Caifeng, Ding Guiguang, Han Jungong

Abstract

As one of the fundamental problems in the field of video understanding, video object segmentation aims at segmenting objects of interest throughout the given video sequence. Recently, with the advancements of deep learning techniques, deep neural networks have shown outstanding performance improvements in many computer vision applications, with video object segmentation being one of the most advocated and intensively investigated. In this paper, we present a systematic review of the deep learning-based video segmentation literature, highlighting the pros and cons of each category of approaches. Concretely, we start by introducing the definition, background concepts and basic ideas of algorithms in this field. Subsequently, we summarise the datasets for training and testing a video object segmentation algorithm, as well as common challenges and evaluation metrics. Next, previous works are grouped and reviewed based on how they extract and use spatial and temporal features, and their architectures, contributions and mutual differences are elaborated. Finally, the quantitative and qualitative results of several representative methods on a dataset with many remaining challenges are provided and analysed, followed by further discussions on future research directions. This article is expected to serve as a tutorial and source of reference for learners who intend to quickly grasp the current progress in this research area and for practitioners interested in applying video object segmentation methods to their problems. A public website is built to collect and track the related works in this field: https://github.com/gaomingqi/VOS-Review.

Publisher

Springer Science and Business Media LLC

Subject

Artificial Intelligence, Linguistics and Language, Language and Linguistics

Link

https://link.springer.com/content/pdf/10.1007/s10462-022-10176-7.pdf

References: 171 articles.

1. Badrinarayanan V, Kendall A, Cipolla R (2017) SegNet: a deep convolutional encoder–decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 39(12):2481–2495

2. Ballas N, Yao L, Pal C, Courville AC (2016) Delving deeper into convolutional networks for learning video representations. In: Proceedings of the International Conference on Learning Representations

3. Bao L, Wu B, Liu W (2018) CNN in MRF: video object segmentation via inference in a CNN-based higher-order spatio-temporal MRF. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 5977–5986

4. Bertinetto L, Valmadre J, Henriques JF, Vedaldi A, Torr PH (2016) Fully-convolutional siamese networks for object tracking. In: Proceedings of the European Conference on Computer Vision, Springer, pp 850–865

5. Bhat G, Lawin FJ, Danelljan M, Robinson A, Felsberg M, Van Gool L, Timofte R (2020) Learning what to learn for video object segmentation. In: Proceedings of the European Conference on Computer Vision, Springer, pp 777–794

Cited by 39 articles.

1. Hardness-aware loss for object segmentation;Alexandria Engineering Journal;2024-12

2. Cubixel: a novel paradigm in image processing using three-dimensional pixel representation;Multimedia Tools and Applications;2024-09-09

3. Fast supervoxel segmentation of connectivity median simulation based on Manhattan distance;International Journal of Applied Earth Observation and Geoinformation;2024-09

4. Learning effective feature representation for video object segmentation via memory;Knowledge-Based Systems;2024-09

5. Hardware-accelerated integrated optoelectronic platform towards real-time high-resolution hyperspectral video understanding;Nature Communications;2024-08-15