Occluded Video Instance Segmentation: A Benchmark

Published:2022-06-18 Issue:8 Volume:130 Page:2022-2039
ISSN:0920-5691
Container-title:International Journal of Computer Vision
language:en
Short-container-title:Int J Comput Vis

Author:

Qi Jiyang,Gao Yan,Hu Yao,Wang Xinggang,Liu Xiaoyu,Bai Xiang,Belongie Serge,Yuille Alan,Torr Philip H. S.,Bai Song

Abstract

Can our video understanding systems perceive objects when a heavy occlusion exists in a scene? To answer this question, we collect a large-scale dataset called OVIS for occluded video instance segmentation, that is, to simultaneously detect, segment, and track instances in occluded scenes. OVIS consists of 296k high-quality instance masks from 25 semantic categories, where object occlusions usually occur. While our human vision systems can understand those occluded instances by contextual reasoning and association, our experiments suggest that current video understanding systems cannot. On the OVIS dataset, the highest AP achieved by state-of-the-art algorithms is only 16.3, which reveals that we are still at a nascent stage for understanding objects, instances, and videos in a real-world scenario. We also present a simple plug-and-play module that performs temporal feature calibration to complement missing object cues caused by occlusion. Built upon MaskTrack R-CNN and SipMask, we obtain a remarkable AP improvement on the OVIS dataset. The OVIS dataset and project code are available at http://songbai.site/ovis.

Publisher

Springer Science and Business Media LLC

Subject

Artificial Intelligence,Computer Vision and Pattern Recognition,Software

Link

https://link.springer.com/content/pdf/10.1007/s11263-022-01629-1.pdf

Reference: 95 articles.

1. Abu-El-Haija, S., Kothari, N., Lee, J., Natsev, P., Toderici, G., Varadarajan, B., & Vijayanarasimhan, S. (2016). Youtube-8m: A large-scale video classification benchmark. arXiv preprint arXiv:1609.08675.

2. Athar, A., Mahadevan, S., Ošep, A., Leal-Taixé, L., & Leibe, B. (2020). Stem-seg: Spatio-temporal embeddings for instance segmentation in videos. In ECCV.

3. Bertasius, G., & Torresani, L. (2020). Classifying, segmenting, and tracking object instances in video with mask propagation. In CVPR.

4. Bertasius, G., Torresani, L., & Shi, J. (2018). Object detection in video with spatiotemporal sampling networks. In ECCV (pp. 331–346).

5. Bolya, D., Foley, S., Hays, J., & Hoffman, J. (2020). Tide: A general toolbox for identifying object detection errors. In ECCV.

Cited by 26 articles.

1. Cluster2Former: Semisupervised Clustering Transformers for Video Instance Segmentation;Sensors;2024-02-03

2. RGB oralscan video-based orthodontic treatment monitoring;Science China Information Sciences;2023-12-27

3. Ensembling noisy segmentation masks of blurred sperm images;Computers in Biology and Medicine;2023-11

4. The First Visual Object Tracking Segmentation VOTS2023 Challenge Results;2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW);2023-10-02

5. Artificial intelligence in intelligent vehicles: recent advances and future directions;Journal of the Chinese Institute of Engineers;2023-09-30