Modulated Memory Network for Video Object Segmentation

Published:2024-03-15 Issue:6 Volume:12 Page:863
ISSN:2227-7390
Container-title:Mathematics
language:en
Short-container-title:Mathematics

Author:

Lu Hannan¹, Guo Zixian¹, Zuo Wangmeng¹

Affiliation:

1. Faculty of Computing, Harbin Institute of Technology, Harbin 150001, China

Abstract

Matching-based video object segmentation (VOS) methods commonly segment the current frame against a reference set of previously segmented frames, referred to as ‘memory frames’. These methods have two limitations: (i) segmentation errors in memory frames propagate and accumulate when those frames are used as templates for subsequent segmentation; (ii) the non-local matching used in leading solutions often ignores positional information, which can lead to incorrect matches. In this paper, we introduce the Modulated Memory Network (MMN) for VOS. MMN improves matching-based VOS in two ways: (i) an Importance Modulator adjusts memory frames using adaptive weight maps generated from the segmentation confidence associated with each frame, which mitigates error propagation and accumulation; (ii) a Position Modulator encodes spatial and temporal positional information for both the memory frames and the current frame, which improves matching accuracy. Extensive experiments demonstrate the effectiveness of MMN, which achieves promising performance on VOS benchmarks.
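
To make the idea of confidence-based modulation concrete, the following is a minimal NumPy sketch of a non-local memory readout in which uncertain memory locations are down-weighted. It is an illustration under stated assumptions, not the authors' implementation: the abstract describes learned adaptive weight maps, whereas this sketch uses a hand-crafted confidence `|2p - 1|`, and the function names `confidence_weights` and `modulated_readout` are hypothetical. The Position Modulator is not covered here.

```python
import numpy as np


def confidence_weights(prob):
    """Map foreground probabilities in [0, 1] to weights in [0, 1].

    Assumption (not from the paper): confidence is |2p - 1|, so a
    probability of 0.5 (most uncertain) receives weight 0.
    """
    return np.abs(2.0 * prob - 1.0)


def modulated_readout(query_key, memory_key, memory_value, memory_prob, eps=1e-6):
    """Non-local memory readout with confidence-weighted attention.

    query_key:    (Nq, C) features of the current frame
    memory_key:   (Nm, C) features of all memory-frame locations
    memory_value: (Nm, D) values stored for those locations
    memory_prob:  (Nm,)   predicted foreground probability per location
    """
    scale = np.sqrt(query_key.shape[1])
    logits = query_key @ memory_key.T / scale  # (Nq, Nm) similarity scores

    # Adding log-weights to the logits multiplies each softmax numerator
    # by the corresponding confidence weight.
    logits = logits + np.log(confidence_weights(memory_prob) + eps)

    # Softmax over memory locations, shifted for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=1, keepdims=True)

    return attn @ memory_value, attn
```

With two memory locations that have identical keys but probabilities 0.5 and 0.99, nearly all attention goes to the confident location, so an erroneous, low-confidence memory entry contributes little to the readout.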

Funder

National Key Research and Development Program of China

Publisher

MDPI AG

Link

https://www.mdpi.com/2227-7390/12/6/863/pdf
