CrowdFormer: An Overlap Patching Vision Transformer for Top-Down Crowd Counting

Published: 2022-07
Container-title: Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence

Author:

Yang Shaopeng¹, Guo Weiyu², Ren Yuheng¹

Affiliation:

1. Watrix Technology Co. LTD.

2. Information School, Central University of Finance and Economics, Beijing, China

Abstract

Crowd counting methods typically predict a density map as an intermediate representation for counting, and they achieve good performance. However, perspective effects cause scale variation in real scenes, so density map-based methods suffer from a severe scene generalization problem: only a limited number of scales are fitted during density map prediction and generation. To address this issue, we propose a novel vision transformer network, CrowdFormer, for more accurate density map estimation, and a density kernels fusion framework for more accurate density map generation. We then incorporate these two innovations into an adaptive learning system that takes both the annotation dot map and the original image as input and jointly learns the density map estimator and generator within an end-to-end framework. The experimental results demonstrate that the proposed model achieves state-of-the-art performance in terms of MAE and MSE (e.g., an MAE of 67.1 and an MSE of 301.6 on the NWPU-Crowd dataset) and confirm the effectiveness of the two proposed designs. The code is available at https://github.com/special-yang/Top_Down-CrowdCounting.
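
To make the terms in the abstract concrete, the following is a minimal sketch of how a dot annotation map can be turned into a density map whose sum equals the crowd count, and how MAE and MSE can be computed over a set of images. It is not the paper's method: it uses a single fixed Gaussian kernel, whereas the abstract describes a learned fusion of density kernels. The function names, `sigma`, and `radius` are hypothetical. The sketch also assumes the convention, common in crowd counting work, that "MSE" denotes the root of the mean squared count error; this record does not state which definition the paper uses.

```python
import math


def gaussian_kernel(sigma, radius):
    """Return a (2*radius+1) x (2*radius+1) Gaussian kernel that sums to 1."""
    size = 2 * radius + 1
    kernel = [
        [math.exp(-((x - radius) ** 2 + (y - radius) ** 2) / (2 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]


def dots_to_density(points, height, width, sigma=1.5, radius=4):
    """Spread each annotated head position (row, col) into a Gaussian blob.

    Assumption: one fixed kernel for every head, chosen for illustration only.
    """
    density = [[0.0] * width for _ in range(height)]
    kernel = gaussian_kernel(sigma, radius)
    for r, c in points:
        # Keep only the kernel cells that fall inside the image.
        cells = []
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                y, x = r + dy, c + dx
                if 0 <= y < height and 0 <= x < width:
                    cells.append((y, x, kernel[dy + radius][dx + radius]))
        if not cells:
            continue  # annotation lies entirely outside the image
        # Renormalize so each person contributes exactly 1, even near borders.
        mass = sum(w for _, _, w in cells)
        for y, x, w in cells:
            density[y][x] += w / mass
    return density


def count_from_density(density):
    """The estimated count is the sum over all density map pixels."""
    return sum(map(sum, density))


def mae_and_mse(pred_counts, true_counts):
    """Return (MAE, MSE) over per-image counts.

    Assumption: MSE here is the root of the mean squared error.
    """
    errors = [p - t for p, t in zip(pred_counts, true_counts)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, mse


# Three annotated heads on a 16 x 16 image, one of them in a corner.
density = dots_to_density([(2, 3), (10, 10), (0, 0)], height=16, width=16)
estimated_count = count_from_density(density)
mae, mse = mae_and_mse([10, 20], [12, 17])
```

Because every blob is renormalized to unit mass, `estimated_count` matches the number of annotated points up to floating-point error. With a fixed `sigma`, heads that appear at very different scales are all represented by the same blob size, which is the limitation the abstract attributes to perspective.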

Publisher

International Joint Conferences on Artificial Intelligence Organization

Cited by 20 articles.

1. Cross-modal collaborative feature representation via Transformer-based multimodal mixers for RGB-T crowd counting;Expert Systems with Applications;2024-12

2. CrowdTrans: Learning top-down visual perception for crowd counting by transformer;Neurocomputing;2024-06

3. CC-DETR: DETR with Hybrid Context and Multi-Scale Coordinate Convolution for Crowd Counting;Mathematics;2024-05-17

4. CLDE-Net: crowd localization and density estimation based on CNN and transformer network;Multimedia Systems;2024-04-08

5. A Weakly-Supervised Crowd Density Estimation Method Based on Two-Stage Linear Feature Calibration;IEEE/CAA Journal of Automatica Sinica;2024-04