Improving MLP-Based Weakly Supervised Crowd-Counting Network via Scale Reasoning and Ranking
Published:2024-01-23
Issue:3
Volume:13
Page:471
ISSN:2079-9292
Container-title:Electronics
language:en
Short-container-title:Electronics
Author:
Gao Ming1, Deng Mingfang1, Zhao Huailin1, Chen Yangjian1, Chen Yongqi1
Affiliation:
1. School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai 201400, China
Abstract
MLP-based weakly supervised crowd-counting approaches have advanced significantly over the past few years. However, owing to limited datasets, current MLP-based methods do not account for region-to-region dependencies within an image. To address this, we propose a weakly supervised method termed SR2. SR2 consists of three parts: a scale-reasoning module, a scale-ranking module, and a regression branch. The scale-reasoning module extracts the region-to-region dependencies in the image, fuses them with multi-scale features, and sends the fused features to the regression branch to obtain estimated counts. The scale-ranking module helps the network better understand the internal information of the image and expands the datasets efficiently, which improves the accuracy of the counts estimated by the regression branch. We conducted extensive experiments on four benchmark datasets. The results show that our approach achieves better or highly competitive counting performance compared with other weakly supervised counting networks and with some popular fully supervised counting networks.
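The abstract does not spell out the objective used by the scale-ranking module. As a rough illustration only, one common way to obtain a ranking signal without extra labels is to note that a crop nested inside a larger crop cannot contain more people than the larger one. The sketch below assumes that formulation; the function names `nested_crops` and `ranking_loss`, the `shrink` factor, and the `margin` parameter are hypothetical and are not taken from the paper.

```python
def nested_crops(width, height, num_crops=3, shrink=0.75):
    """Return centered, nested boxes (x0, y0, x1, y1), largest first.

    Hypothetical helper: each box lies inside the previous one, so its
    true crowd count can be at most that of the previous box.
    """
    boxes = []
    cx, cy = width / 2, height / 2
    w, h = float(width), float(height)
    for _ in range(num_crops):
        boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
        w *= shrink
        h *= shrink
    return boxes


def ranking_loss(counts, margin=0.0):
    """Hinge penalty over adjacent crop pairs (assumed formulation).

    `counts` holds predicted counts ordered from the largest crop to the
    smallest. A pair is penalized only when the smaller crop is predicted
    to contain more people than the crop that encloses it.
    """
    loss = 0.0
    for bigger, smaller in zip(counts, counts[1:]):
        loss += max(0.0, smaller - bigger + margin)
    return loss


if __name__ == "__main__":
    print(nested_crops(100, 80, num_crops=2, shrink=0.5))
    print(ranking_loss([10.0, 7.0, 3.0]))  # consistent ordering, no penalty
    print(ranking_loss([5.0, 8.0, 2.0]))   # second crop exceeds the first
```

Under this assumption, the ordering constraint needs no count annotation at all, which is one plausible reading of how a ranking task could "expand the datasets" in a weakly supervised setting.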