Mitigating Distractor Challenges in Video Object Segmentation through Shape and Motion Cues

Author:

Peng Jidong 1, Zhao Yibing 2, Zhang Dingwei 2, Chen Yadang 2 (ORCID)

Affiliation:

1. Nanjing Research Institute of Electronic Engineering, Nanjing 210001, China

2. School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China

Abstract

The purpose of semi-supervised video object segmentation (VOS) is to predict object masks in subsequent video frames given the object mask of the initial frame. Mainstream methods currently leverage historical frame information to enhance network performance. However, this approach faces the following issues: (1) It often overlooks important shape information, reducing segmentation accuracy in object-edge regions. (2) It often relies on pixel-level motion estimation to guide matching and suppress distractor objects, which incurs heavy computational costs and struggles with occlusion and fast or blurry motion. For the first problem, this paper introduces an object shape extraction module that exploits both high-level and low-level features to obtain object shape information, which is then used to further refine the predicted masks. For the second problem, this paper introduces a novel object-level motion prediction module that stores representative motion features during the training stage and predicts object motion by retrieving them during the inference stage. We evaluate our method against recent state-of-the-art methods on benchmark datasets, and the results demonstrate its effectiveness.
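The store-then-retrieve idea behind the object-level motion prediction module can be illustrated with a minimal sketch. This is a hypothetical toy, not the paper's implementation: the class name `MotionMemory`, the plain cosine-similarity retrieval, and the 2-D motion vectors are all illustrative assumptions standing in for the learned motion features described in the abstract.

```python
import math


class MotionMemory:
    """Toy sketch of an object-level motion memory: during training it
    stores representative (feature, motion) pairs; at inference it predicts
    an object's motion by retrieving the most similar stored features."""

    def __init__(self):
        self.keys = []     # representative object feature vectors
        self.motions = []  # associated motion vectors, e.g. (dx, dy)

    @staticmethod
    def _cosine(a, b):
        # Cosine similarity between two feature vectors.
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb + 1e-8)

    def store(self, feature, motion):
        # Training stage: keep the pair as a representative memory entry.
        self.keys.append(list(feature))
        self.motions.append(tuple(motion))

    def predict(self, feature, top_k=2):
        # Inference stage: similarity-weighted retrieval over the
        # top-k most similar stored entries.
        scored = [(self._cosine(feature, k), m)
                  for k, m in zip(self.keys, self.motions)]
        scored.sort(key=lambda sm: sm[0], reverse=True)
        top = scored[:top_k]
        total = sum(s for s, _ in top)
        dx = sum(s * m[0] for s, m in top) / total
        dy = sum(s * m[1] for s, m in top) / total
        return (dx, dy)


mem = MotionMemory()
mem.store([1.0, 0.0], (5.0, 0.0))   # e.g. object moving right
mem.store([0.0, 1.0], (0.0, 5.0))   # e.g. object moving down
pred = mem.predict([1.0, 0.1], top_k=1)
```

Retrieving from a fixed bank of object-level entries keeps inference cost independent of frame resolution, which is the abstract's stated advantage over pixel-level motion estimation.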

Funder

National Natural Science Foundation of China

Publisher

MDPI AG

