Affiliation:
1. East China Jiaotong University
2. Jiangxi Provincial Communication Investment Group Co
Abstract
Crowd counting aims to estimate the number of people in an image or video accurately and in real time. With the development of deep learning in recent years, the accuracy of crowd counting has steadily improved. However, the task remains challenging in crowded scenes with large variations in individual size. To address this, this paper proposes a new crowd-counting network, the Context-Scaled Fusion Network (CSFNet). Its contributions are: (1) a Multi-Scale Receptive Field Fusion Module (MRFF Module), which employs multiple dilated convolutional layers with different dilation rates and uses a fusion mechanism to obtain multi-scale hybrid information and generate higher-quality feature maps; and (2) a Contextual Space Attention Module (CSA Module), which obtains pixel-level contextual information and combines it with an attention map so that the model learns autonomously to attend to important regions, thereby reducing the counting error. We train and test CSFNet on several publicly available, challenging datasets to evaluate its performance. The experimental results show that CSFNet outperforms many state-of-the-art (SOTA) methods on these datasets, demonstrating its superior counting ability and robustness.
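The idea behind fusing dilated convolutions with different dilation rates can be illustrated with a minimal one-dimensional sketch. It is not the authors' MRFF Module: the abstract does not specify the fusion mechanism, so element-wise averaging is used here purely as an assumption, and the function names and the dilation rates chosen are hypothetical. The sketch shows that a kernel of size k with dilation d spans k + (k - 1)(d - 1) input positions, so parallel branches observe the same input at different scales.

```python
def effective_kernel_size(kernel_size, dilation):
    """Input span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1D dilated cross-correlation (odd kernel size assumed)."""
    k = len(kernel)
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Taps are spaced `dilation` positions apart around the centre i.
            idx = i + (j - k // 2) * dilation
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def fuse_branches(signal, kernel, dilations):
    """Run one branch per dilation rate and fuse by averaging (assumed fusion rule)."""
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [sum(vals) / len(branches) for vals in zip(*branches)]


if __name__ == "__main__":
    impulse = [0, 0, 0, 1, 0, 0, 0]
    for d in (1, 2, 3):  # hypothetical dilation rates
        print(d, effective_kernel_size(3, d), dilated_conv1d(impulse, [1, 1, 1], d))
    print(fuse_branches(impulse, [1, 1, 1], [1, 2]))
```

In a real network the branches would be learned 2D convolutions and the fusion could equally be concatenation followed by a 1x1 convolution; averaging is used above only to keep the sketch dependency-free.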
Publisher
Research Square Platform LLC