Authors
Shafin Rahman, Salman Khan, Nick Barnes
Abstract
Zero-shot object detection is an emerging research topic that aims to recognize and localize previously ‘unseen’ objects. This setting gives rise to several unique challenges, e.g., a highly imbalanced positive vs. negative instance ratio, proper alignment between visual and semantic concepts, and the ambiguity between background and unseen classes. Here, we propose an end-to-end deep learning framework underpinned by a novel loss function that handles class imbalance and seeks to properly align the visual and semantic cues for improved zero-shot learning. We call our objective the ‘Polarity loss’ because it explicitly maximizes the gap between positive and negative predictions. Such a margin-maximizing formulation is not only important for visual-semantic alignment, but also resolves the ambiguity between background and unseen objects. Further, the semantic representations of objects are noisy, which complicates the alignment between the visual and semantic domains. To this end, we perform metric learning using a ‘Semantic vocabulary’ of related concepts that refines the noisy semantic embeddings and establishes a better synergy between the visual and semantic domains. Our approach is inspired by embodiment theories in cognitive science, which claim that human semantic understanding is grounded in past experiences (seen objects), related linguistic concepts (word vocabulary) and visual perception (seen/unseen object images). Our extensive results on the MS-COCO and Pascal VOC datasets show significant improvements over the state of the art.
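The ‘Polarity loss’ described in the abstract couples a class-imbalance-aware term with an explicit margin between the ground-truth class score and the scores of all negative classes. The sketch below is a minimal, hypothetical PyTorch re-implementation of that idea, not the authors' released code: the function name polarity_style_loss, the sigmoid-shaped penalty, and the hyper-parameters gamma and beta are illustrative assumptions.

```python
import torch

def polarity_style_loss(pred, target, gamma=2.0, beta=5.0):
    """Illustrative margin-maximizing loss in the spirit of the 'Polarity loss'
    (a hypothetical sketch, not the authors' implementation).

    pred   : (N, C) sigmoid class scores for N candidate boxes
    target : (N, C) one-hot labels; an all-zero row denotes a background box
    """
    # Focal-style modulation to cope with the extreme positive/negative imbalance.
    p_t = target * pred + (1 - target) * (1 - pred)
    focal = -((1 - p_t) ** gamma) * torch.log(p_t.clamp(min=1e-6))

    # Gap between the ground-truth class score and every other class score.
    # For background rows the "positive" score is 0, so the gap is simply -pred.
    pos_score = (pred * target).sum(dim=1, keepdim=True)
    gap = pos_score - pred
    gap = torch.where(target.bool(), torch.zeros_like(gap), gap)

    # Monotonic penalty: near 0 when positives are well above negatives,
    # near 1 when they are not, so minimizing the loss widens the margin.
    penalty = torch.sigmoid(-beta * gap)

    return (penalty * focal).mean()


if __name__ == "__main__":
    pred = torch.sigmoid(torch.randn(8, 20))  # 8 candidate boxes, 20 classes
    target = torch.zeros(8, 20)
    target[0, 3] = 1.0                        # one foreground box of class 3
    print(polarity_style_loss(pred, target))
```

In this sketch the penalty multiplies the focal term rather than being added, so poorly separated predictions dominate the gradient; the paper's actual formulation also ties a monotonic penalty to the gap between the ground-truth prediction and each negative prediction, but its exact functional form and treatment of background may differ from the assumptions made here.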
Publisher
Association for the Advancement of Artificial Intelligence (AAAI)
Cited by
54 articles.
1. Single-stage zero-shot object detection network based on CLIP and pseudo-labeling; International Journal of Machine Learning and Cybernetics; 2024-08-20
2. Zero-Shot Object Detection Using YOLO; 2024 IEEE International Conference on Information Technology, Electronics and Intelligent Communication Systems (ICITEICS); 2024-06-28
3. Vision-Language Models in Remote Sensing: Current progress and future trends; IEEE Geoscience and Remote Sensing Magazine; 2024-06
4. Zero-Shot Open-Vocabulary Tracking with Large Pre-Trained Models; 2024 IEEE International Conference on Robotics and Automation (ICRA); 2024-05-13
5. Zero-Shot Object Detection with Partitioned Contrastive Feature Alignment; ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2024-04-14