Anchor DETR: Query Design for Transformer-Based Detector

Published:2022-06-28 Issue:3 Volume:36 Page:2567-2575
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title:AAAI

Author:

Wang Yingming, Zhang Xiangyu, Yang Tong, Sun Jian

Abstract

In this paper, we propose a novel query design for transformer-based object detection. In previous transformer-based detectors, the object queries are a set of learned embeddings. However, each learned embedding has no explicit physical meaning, and we cannot explain where it will focus. It is also difficult to optimize, because the prediction slot of each object query does not have a specific mode. In other words, each object query does not focus on a specific region. To solve these problems, our query design bases the object queries on anchor points, which are widely used in CNN-based detectors, so that each object query focuses on the objects near its anchor point. Moreover, our query design can predict multiple objects at one position, which addresses the "one region, multiple objects" difficulty. In addition, we design an attention variant that reduces the memory cost while achieving similar or better performance than the standard attention in DETR. Thanks to the query design and the attention variant, the proposed detector, which we call Anchor DETR, achieves better performance and runs faster than DETR with 10x fewer training epochs. For example, it achieves 44.2 AP at 19 FPS on the MSCOCO dataset when using the ResNet50-DC5 feature and training for 50 epochs. Extensive experiments on the MSCOCO benchmark demonstrate the effectiveness of the proposed methods. Code is available at https://github.com/megvii-research/AnchorDETR.
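The query design described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the repository linked above for that). It assumes a uniform grid of anchor points, a plain sinusoidal position encoding, and random vectors standing in for learned pattern embeddings. The names `grid_anchor_points`, `sine_encoding`, and `build_queries`, along with the grid size, pattern count, and embedding width, are hypothetical choices made for illustration. Each query is the sum of one pattern embedding and the encoding of one anchor point, so every anchor position gets several queries and can, in principle, predict several objects.

```python
import numpy as np

def grid_anchor_points(rows, cols):
    """Return a (rows*cols, 2) array of (x, y) anchor points at cell centres in (0, 1)."""
    xs = (np.arange(cols) + 0.5) / cols
    ys = (np.arange(rows) + 0.5) / rows
    gx, gy = np.meshgrid(xs, ys)
    return np.stack([gx.ravel(), gy.ravel()], axis=1)

def sine_encoding(points, dim, temperature=10000.0):
    """Encode each 2-D point as a dim-vector of sines and cosines (assumed encoding)."""
    assert dim % 4 == 0, "dim must be divisible by 4 (x/y times sin/cos)"
    n_freq = dim // 4
    freqs = temperature ** (-np.arange(n_freq) / n_freq)      # (n_freq,)
    angles = 2 * np.pi * points[:, :, None] * freqs           # (N, 2, n_freq)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(points.shape[0], dim)                  # (N, dim)

def build_queries(anchors, patterns):
    """Combine every pattern with every anchor: (n_patterns * n_anchors, dim)."""
    pos = sine_encoding(anchors, patterns.shape[1])           # (n_anchors, dim)
    queries = patterns[:, None, :] + pos[None, :, :]          # broadcast sum
    return queries.reshape(-1, patterns.shape[1])

rng = np.random.default_rng(0)
anchors = grid_anchor_points(rows=10, cols=10)   # 100 anchor points (assumed grid)
patterns = rng.normal(size=(3, 256))             # stand-in for learned pattern embeddings
queries = build_queries(anchors, patterns)
print(queries.shape)
```

In this sketch, queries `0`, `100`, and `200` all share the first anchor point and differ only in their pattern embedding, which is how one position can yield multiple predictions.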

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 126 articles.

1. IFS-DETR: A real-time industrial fire smoke detection algorithm based on an end-to-end structured network;Measurement;2025-02

2. STMixer: A One-Stage Sparse Action Detector;IEEE Transactions on Pattern Analysis and Machine Intelligence;2024-10

3. FeatAug-DETR: Enriching One-to-Many Matching for DETRs With Feature Augmentation;IEEE Transactions on Pattern Analysis and Machine Intelligence;2024-09

4. DITA: DETR with improved queries for end-to-end temporal action detection;Neurocomputing;2024-09

5. Gateinst: instance segmentation with multi-scale gated-enhanced queries in transformer decoder;Multimedia Systems;2024-08-20