Affiliation:
1. College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China
Abstract
Few‐Shot Object Detection (FSOD) aims to accurately detect objects of target classes using only a small set of labeled samples. Most current FSOD methods predict class prototype features independently, without considering relationships between classes, and rely only on visual information. To address these limitations, we propose a novel Class‐relational Reasoning Method with Knowledge‐transfer (CRK‐Net), built on a meta‐learning‐based framework. Although data may be scarce, the semantic relationships between classes are invariant. We therefore propose a Joint‐feature Fusion Module (JFM) that transfers the semantic information of different categories from the natural language domain and integrates it with visual information to produce multi‐modality embeddings. Some base classes and novel classes share similar features, which can be borrowed by modeling the relationships between class features. Building on this observation, we propose a Class‐relational Reasoning Module (CRM) that establishes correlations between categories and enhances the prototype representation of each category. After passing through the JFM and CRM, a high‐quality class prototype is produced for subsequent regression and classification. Extensive experiments on PASCAL VOC demonstrate the effectiveness of the proposed method, which provides a new scheme for fusing semantic and visual information. © 2024 Institute of Electrical Engineers of Japan and Wiley Periodicals LLC.
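The two-stage idea described in the abstract (fuse semantic and visual information per class, then refine each class prototype using its relations to other classes) can be illustrated with a minimal sketch. The abstract does not specify how the JFM or CRM are implemented, so everything below is an assumption made for illustration: `fuse` stands in for the JFM as a simple weighted blend of normalized vectors, `relate` stands in for the CRM as a softmax-weighted similarity average with a residual connection, and the toy vectors, the `alpha` parameter, and the requirement that visual and semantic vectors share one dimension are all hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(visual, semantic, alpha=0.5):
    """Hypothetical stand-in for the JFM: blend a visual prototype with a
    semantic (word) embedding of the same class. Assumes both vectors already
    share one dimension; a real model would likely learn a projection."""
    v, s = l2_normalize(visual), l2_normalize(semantic)
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(v, s)]


def relate(prototypes):
    """Hypothetical stand-in for the CRM: add to each prototype a
    similarity-weighted average of all prototypes (residual connection)."""
    refined = []
    for p in prototypes:
        scores = [sum(a * b for a, b in zip(p, q)) for q in prototypes]
        weights = softmax(scores)
        mixed = [
            sum(w * q[i] for w, q in zip(weights, prototypes))
            for i in range(len(p))
        ]
        refined.append([a + b for a, b in zip(p, mixed)])
    return refined


# Toy data (made up): two visually and semantically similar classes and one
# dissimilar class, each with a 3-dimensional visual and semantic vector.
visual = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.0, 0.1, 0.9]]
semantic = [[1.0, 0.2, 0.0], [0.9, 0.3, 0.0], [0.0, 0.0, 1.0]]

fused = [fuse(v, s) for v, s in zip(visual, semantic)]
refined = relate(fused)
```

In this sketch, the first two classes pull each other's prototypes closer because their fused vectors have a high dot product, which mirrors the abstract's point that features of similar base and novel classes can be borrowed. The refined prototypes would then feed whatever classification and regression heads the detector uses.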