Abstract
Segmenting primary objects in a video is an important yet challenging problem in intelligent video surveillance, as videos exhibit various levels of foreground/background ambiguity. To reduce this ambiguity, we propose a novel formulation that exploits foreground and background context as well as their complementary constraint. Under this formulation, a unified objective function is defined to encode each cue. For implementation, we design a complementary segmentation network (CSNet) with two separate branches that simultaneously encode foreground and background information along with joint spatial constraints. CSNet is trained end-to-end on massive images with manually annotated salient objects. Applying CSNet to each video frame initializes the spatial foreground and background maps. To enforce temporal consistency effectively and efficiently, we divide each frame into superpixels and construct a neighborhood reversible flow that reflects the most reliable temporal correspondences between superpixels in far-away frames. With this flow, the initialized foregroundness and backgroundness are propagated along the temporal dimension, so that primary video objects gradually pop out and distractors are well suppressed. Extensive experiments on three video datasets show that the proposed approach achieves impressive performance in comparison with 22 state-of-the-art models.
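To make the two-branch idea concrete, below is a minimal sketch of a network with a shared encoder and separate foreground/background branches coupled by a complementary constraint. This is not the authors' CSNet: the abstract only states that two branches encode foreground and background under a joint constraint, so the layer sizes, the specific loss form, and all names here (TwoBranchNet, complementary_loss) are illustrative assumptions.

```python
# Minimal sketch of a two-branch complementary segmentation network.
# Assumptions (not from the paper): architecture depth/widths, the soft
# constraint fg + bg ~= 1, and all identifiers below.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TwoBranchNet(nn.Module):
    """Shared encoder with separate foreground/background decoder branches."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

        def branch():
            # Each branch predicts a single-channel map from shared features.
            return nn.Sequential(
                nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(32, 1, 1),
            )

        self.fg_branch = branch()
        self.bg_branch = branch()

    def forward(self, x):
        feats = self.encoder(x)
        fg = torch.sigmoid(self.fg_branch(feats))
        bg = torch.sigmoid(self.bg_branch(feats))
        # Upsample both maps back to the input resolution.
        size = x.shape[-2:]
        fg = F.interpolate(fg, size=size, mode="bilinear", align_corners=False)
        bg = F.interpolate(bg, size=size, mode="bilinear", align_corners=False)
        return fg, bg


def complementary_loss(fg, bg, gt):
    """Supervise both branches and softly enforce fg + bg ~= 1 per pixel."""
    loss_fg = F.binary_cross_entropy(fg, gt)
    loss_bg = F.binary_cross_entropy(bg, 1.0 - gt)
    loss_joint = ((fg + bg - 1.0) ** 2).mean()
    return loss_fg + loss_bg + loss_joint


if __name__ == "__main__":
    net = TwoBranchNet()
    img = torch.rand(2, 3, 64, 64)                 # batch of RGB frames
    gt = (torch.rand(2, 1, 64, 64) > 0.5).float()  # toy binary masks
    fg, bg = net(img)
    print(complementary_loss(fg, bg, gt).item())
```

The joint term penalizes pixels where the two branches disagree with complementarity (both claiming foreground, or both claiming background), which is one plausible reading of the "complementary constraint" mentioned in the abstract; the paper's actual constraint may differ.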
Funder
National Natural Science Foundation of China
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science