Abstract
AbstractObject detectors are used for searching all objects belonging to a pre-defined set of categories contained in a given picture. However, users are often not interested in finding all objects, but only those that pertain to a small set of categories or concepts. Nowadays, the standard approach to solve this task involves initially employing an object detector to identify all objects within the image, followed by refining the outcomes to retain only the ones of interest. Nevertheless, the object detector does not take advantage of the user’s prior intent that, when used, can potentially improve the detection performance of the model. This work presents a method to condition an existing object detector with the user’s intent, encoded as one or more concepts from the WordNet graph, to find just those objects of interest. The proposed approach takes advantage of existing datasets for object detection without the need for new annotations, and it allows to adapt the already existing object detector models with minor changes. The evaluation, performed on the COCO and the Visual Genome datasets considering several object detector architectures, shows that conditioning the search on concepts is actually beneficial. The code and the pre-trained model weights are released at: https://github.com/drigoni/Concept-Conditioned-Object-Detector.
Funder
Università degli Studi di Padova
Publisher
Springer Science and Business Media LLC