Abstract
The Average Precision (AP) that DiffusionDet achieves with its proposals depends on random coverage of the object boxes and on a redundant iterative evaluation strategy. To improve the performance of diffusion models on object detection, we propose a simple and efficient sampling strategy for detection boxes: Only Use Dynamic Head Once. The strategy rests on two components: a Deformable Sigmoid Variance Schedule, which optimizes the noising and sampling processes, and an Adjustable Sampling Strategy, which reduces the randomness of the sampling results. Combining the two methods lets us use fewer timesteps for noising and sampling, so the model obtains better sampling results in fewer iterations; it also alleviates the model's difficulty in learning to sample sparse, discrete ground-truth (GT) box information. Our model improves substantially on DiffusionDet. For example, on the VOC dataset it matches or exceeds the AP of DiffusionDet while using half the number of proposals (random boxes), and with Timesteps=50 it outperforms DiffusionDet by 6.0 AP. On the COCO dataset, the same ablation experiments yield an improvement of 0.4 AP. To a certain extent, this work addresses the low distribution density of GT box information in the proposals, which makes it difficult for the model to learn to sample.