Author:
Yang Fu, Linjie Yang, Ding Liu, Thomas S. Huang, Humphrey Shi
Abstract
Video instance segmentation is a complex task that requires detecting, segmenting, and tracking every object in a given video. Previous approaches rely only on single-frame features for detection, segmentation, and tracking, and thus suffer in the video setting from distinct challenges such as motion blur and drastic appearance changes. To eliminate the ambiguities introduced by using only single-frame features, we propose a novel comprehensive feature aggregation approach (CompFeat) that refines features at both the frame level and the object level with temporal and spatial context information. The aggregation process is carefully designed with a new attention mechanism, which significantly increases the discriminative power of the learned features. We further improve the tracking capability of our model through a siamese design that incorporates both feature similarities and spatial similarities. Experiments conducted on the YouTube-VIS dataset validate the effectiveness of the proposed CompFeat.
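The abstract names attention-based temporal feature aggregation but does not give the module's details. As a rough illustration only, the sketch below shows one generic form of similarity-weighted aggregation across frames; the function names, the cosine-similarity attention, and the single-vector feature shape are all assumptions for this sketch, not the authors' actual design.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def aggregate_frame_features(ref_feat, nbr_feats):
    """Attention-weighted aggregation of neighboring-frame features.

    ref_feat:  (C,) feature vector of the reference frame.
    nbr_feats: (T, C) feature vectors of T neighboring frames.
    Returns a refined (C,) feature: a convex combination of the
    neighbors, weighted by their similarity to the reference.
    (Illustrative only; not the CompFeat module itself.)
    """
    # Cosine similarity between the reference and each neighbor frame.
    sims = nbr_feats @ ref_feat / (
        np.linalg.norm(nbr_feats, axis=1) * np.linalg.norm(ref_feat) + 1e-8)
    weights = softmax(sims)        # attention weights over the T frames
    return weights @ nbr_feats     # weighted sum over frames -> (C,)
```

Frames that look similar to the reference receive higher attention weights, so a motion-blurred or occluded reference can borrow evidence from cleaner neighboring frames.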
Publisher
Association for the Advancement of Artificial Intelligence (AAAI)
Cited by
21 articles.
1. Video Instance Segmentation in an Open-World;International Journal of Computer Vision;2024-07-30
2. A Novel Loss Function Based on Clustering Quality Criteria in Spatio-Temporal Clustering;2024 13th Iranian/3rd International Machine Vision and Image Processing Conference (MVIP);2024-03-06
3. Offline-to-Online Knowledge Distillation for Video Instance Segmentation;2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV);2024-01-03
4. Video Instance Segmentation Using Graph Matching Transformer;2023 IEEE International Conference on Data Mining Workshops (ICDMW);2023-12-04
5. TCOVIS: Temporally Consistent Online Video Instance Segmentation;2023 IEEE/CVF International Conference on Computer Vision (ICCV);2023-10-01