Abstract
Annotations for image segmentation are expensive and time-consuming to produce. Object detection, by contrast, is generally easier both in acquiring labeled training data and in designing training models. In this paper, we combine unsupervised learning with a pretrained object-detection network to perform image segmentation without expensive segmentation labels. Specifically, we design a pretext task based on the sparse decomposition of object instances in videos to obtain the segmentation masks of the objects; the task exploits the sparsity of image instances and the inter-frame structure of videos. To improve the accuracy of identifying the 'right' object, we use a pretrained object-detection network to provide the locations of the object instances, and we propose an Object Location Segmentation (OLSeg) model with three branches and a bounding box prior. The model is trained on videos and can recover the foreground, background and segmentation mask from a single image. The performance gain comes from the sparsity of object instances (the foreground and background in our experiments) and the provided location information (the bounding box prior), which together yield a comprehensive and robust visual representation of the input. Experimental results demonstrate that the proposed model improves performance on various image segmentation benchmarks.
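To make the role of the bounding box prior concrete, the following is a minimal NumPy sketch, not the OLSeg implementation. It assumes that an image is explained as a mask-weighted blend of a foreground layer and a background layer, and that the detector's box simply zeroes the mask outside the box; the abstract states neither detail. The function names `box_prior` and `compose`, the box coordinates, and the random arrays standing in for branch outputs are all hypothetical.

```python
import numpy as np

def box_prior(height, width, box):
    """Binary map that is 1 inside the box (x0, y0, x1, y1) and 0 elsewhere."""
    x0, y0, x1, y1 = box
    prior = np.zeros((height, width), dtype=np.float32)
    prior[y0:y1, x0:x1] = 1.0
    return prior

def compose(foreground, background, mask):
    """Blend two layers with a soft mask: mask * fg + (1 - mask) * bg."""
    m = mask[..., None]  # add a channel axis so the mask broadcasts over RGB
    return m * foreground + (1.0 - m) * background

# Toy arrays standing in for the outputs of three hypothetical branches.
rng = np.random.default_rng(0)
h, w = 8, 8
foreground = rng.random((h, w, 3)).astype(np.float32)
background = rng.random((h, w, 3)).astype(np.float32)
raw_mask = rng.random((h, w)).astype(np.float32)

# Assumed use of the prior: suppress mask values outside the detected box.
prior = box_prior(h, w, box=(2, 2, 6, 6))
mask = raw_mask * prior
image = compose(foreground, background, mask)
```

Under these assumptions, pixels outside the box are explained entirely by the background layer, which is one simple way location information could keep a mask from drifting onto the wrong object.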
Funder
the National Natural Science Foundation of China
the Natural Science Foundation of Shandong Province
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering
Cited by
1 article.