Affiliation:
1. School of Electronic and Information Engineering, Zhongyuan University of Technology, Zhengzhou, China
2. Department of Energy and Environment, Zhongyuan University of Technology, Zhengzhou, China
3. School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, Wollongong, New South Wales, Australia
Abstract
Weakly supervised object detection (WSOD) is becoming increasingly important for computer vision tasks because it alleviates the burden of manual annotation. Most WSOD techniques rely on multiple instance learning (MIL), which tends to localise the discriminative parts of salient objects instead of the whole object. In addition, network training is often supervised only by image‐level annotations, which carry no information about object quantities or locations. This can lead to ambiguous differentiation of object instances in terms of both location and semantics. To address these issues, an end‐to‐end triple critical feature capture network (TCFCNet) for WSOD is proposed. Specifically, a multi‐task branch that performs fully supervised classification and regression tasks is integrated with proposal cluster learning (PCL) in an end‐to‐end network to refine object locations online. A cyclic parametric dropblock module (CPDM) is then designed to help the detector focus on contextual information: it uses cyclic masking to remove as much of the discriminative parts of an object instance as possible, alleviating the part domination problem. Finally, a feature decoupling module (FDM), which contains a feature enhancement module and task‐specific polarisation functions, is proposed to further reduce the ambiguous distinction of object instances by adaptively constructing robust critical features suited to the classification and regression tasks of the multi‐task branch. Comprehensive experiments are carried out on the challenging Pascal VOC 2007 and VOC 2012 datasets. The proposed method achieves 54.6% mAP on Pascal VOC 2007 and 44.3% mAP on VOC 2012, outperforming existing mainstream techniques by a considerable margin.
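The abstract does not specify how the cyclic and parametric parts of CPDM work, so the sketch below shows only the plain DropBlock masking (Ghiasi et al., 2018) that the module is named after: contiguous square regions of a feature map are zeroed, with the same regions for every channel, and the surviving activations are rescaled. It is a minimal NumPy illustration under that assumption, not the authors' CPDM; the function name `dropblock` and its parameters are hypothetical.

```python
import numpy as np


def dropblock(x, block_size=3, drop_prob=0.1, rng=None, training=True):
    """Plain DropBlock on one (C, H, W) feature map (not the paper's CPDM).

    Zeroes contiguous block_size x block_size regions (the same regions for
    every channel) and rescales the survivors so the mean activation of a
    constant input is preserved.
    """
    if not training or drop_prob == 0.0:
        return x
    rng = np.random.default_rng() if rng is None else rng
    _, h, w = x.shape
    vh, vw = h - block_size + 1, w - block_size + 1  # positions where a full block fits
    # Seed rate gamma, chosen so that roughly drop_prob of all units end up dropped.
    gamma = drop_prob / block_size**2 * (h * w) / (vh * vw)
    seeds = rng.random((vh, vw)) < gamma
    mask = np.ones((h, w), dtype=float)
    for i, j in zip(*np.nonzero(seeds)):
        # Each seed erases the block whose top-left corner it marks.
        mask[i:i + block_size, j:j + block_size] = 0.0
    kept = mask.sum()
    if kept == 0:
        return np.zeros_like(x)
    # Broadcast the spatial mask over channels and renormalise.
    return x * mask[None, :, :] * (mask.size / kept)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.random((4, 14, 14))
    out = dropblock(feat, block_size=3, drop_prob=0.2, rng=rng)
    print(out.shape, float((out == 0).mean()))
```

Random seeding, as here, drops regions regardless of how discriminative they are; the abstract indicates that CPDM instead aims to remove the discriminative components of an instance, so its masking schedule presumably differs from this baseline.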
Publisher
Institution of Engineering and Technology (IET)
Subject
Computer Vision and Pattern Recognition, Software