Affiliation:
1. College of Mathematics and Informatics, South China Agricultural University, Guangzhou, China
2. School of Control Science and Engineering, Beijing University of Technology, Beijing, China
3. School of Control Science and Engineering, Sun Yat‐sen University, Guangzhou, China
Abstract
In recent years, most video segmentation methods have used deep CNNs to process the input images, but they do not fully exploit the rich intermediate predictions in the spatio‐temporal space. Moreover, segmentation challenges such as occlusion, severe deformation and illumination changes have not yet been well solved. To alleviate these problems, this paper focuses on constructing multi‐module network structures that represent multiple semantics, and proposes a video object segmentation network based on a coupled‐stream architecture with a feature memory mechanism. The network first extracts high‐level semantic features, edge features, and long‐term and short‐term stable depth features of the target, and then decodes them into the segmentation mask of the target. In addition, negative skeleton inhibition and frame interpolation are used to suppress interference from similar objects and from motion blur, respectively. The method has low GPU memory usage regardless of the number of objects in the video, and achieves J&F scores of 86.5% and 62.4% on the DAVIS 2016 and DAVIS 2017 validation sets, respectively, without fine‐tuning or online training.
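The J&F measure reported above is the standard DAVIS score: the mean of region similarity J (the intersection-over-union between the predicted and ground-truth masks) and contour accuracy F (the F-measure between the predicted and ground-truth mask boundaries). The sketch below computes both for a single frame with NumPy. It is illustrative only: the function names are hypothetical, and the fixed pixel tolerance `radius` used for boundary matching is a simplifying assumption (the official DAVIS toolkit derives its tolerance from the image size and averages scores over objects and frames).

```python
import numpy as np


def region_similarity(pred, gt):
    """J: intersection-over-union of two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return np.logical_and(pred, gt).sum() / union


def mask_boundary(mask):
    """Mask pixels that have at least one 4-neighbour outside the mask."""
    m = mask.astype(bool)
    h, w = m.shape
    padded = np.zeros((h + 2, w + 2), dtype=bool)
    padded[1:-1, 1:-1] = m
    # A pixel is interior if its up, down, left and right neighbours are all set.
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    return m & ~interior


def dilate(mask, radius):
    """Square dilation by `radius` pixels, built from shifted copies."""
    h, w = mask.shape
    padded = np.zeros((h + 2 * radius, w + 2 * radius), dtype=bool)
    padded[radius:radius + h, radius:radius + w] = mask
    out = np.zeros((h, w), dtype=bool)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def contour_accuracy(pred, gt, radius=1):
    """F: boundary F-measure, matching boundary pixels within `radius` (assumed)."""
    pb, gb = mask_boundary(pred), mask_boundary(gt)
    if pb.sum() == 0 and gb.sum() == 0:
        return 1.0
    if pb.sum() == 0 or gb.sum() == 0:
        return 0.0
    precision = (pb & dilate(gb, radius)).sum() / pb.sum()
    recall = (gb & dilate(pb, radius)).sum() / gb.sum()
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def j_and_f(pred, gt, radius=1):
    """J&F: the mean of region similarity and contour accuracy."""
    return 0.5 * (region_similarity(pred, gt) + contour_accuracy(pred, gt, radius))
```

A benchmark score such as the ones quoted in the abstract would average these per-frame values over every frame and annotated object in the validation set; a dilation-based tolerance is used here only because it is a simple way to avoid penalising one-pixel boundary offsets.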
Funder
National Natural Science Foundation of China
Science and Technology Planning Project of Guangdong Province
Publisher
Institution of Engineering and Technology (IET)