Affiliation:
1. The School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048, China
2. The Shaanxi Key Laboratory of Network Computing and Security Technology, Xi’an 710048, China
Abstract
Crowd counting is an important task that serves as a preprocessing step in many applications. Although various convolutional-neural-network-based approaches have reported clear improvements, they focus only on the role of deep feature maps and neglect the importance of shallow features for crowd counting. To address this issue, this work proposes a dilated convolutional-neural-network-based cross-level contextual information extraction network, abbreviated as CL-DCNN. Specifically, a dilated contextual module (DCM) is constructed by introducing cross-level connections between different feature maps. The module effectively integrates contextual information while preserving the local details of crowd scenes. Extensive experiments on five public datasets, i.e., ShanghaiTech part A, ShanghaiTech part B, Mall, UCF_CC_50 and UCF-QNRF, show that the proposed approach outperforms state-of-the-art approaches, achieving MAEs of 52.6, 8.1, 1.55, 181.8, and 96.4, respectively.
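For readers unfamiliar with the metric, MAE (mean absolute error) in crowd counting is usually taken as the average absolute difference between the predicted and ground-truth head counts over the test images. The following is a minimal sketch of that calculation, not the authors' evaluation code; the function name and the sample counts are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute difference between predicted and true counts."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one image is required")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical per-image counts (not taken from the paper).
predicted_counts = [102.0, 48.5, 310.0]
true_counts = [100.0, 50.0, 300.0]

mae = mean_absolute_error(predicted_counts, true_counts)
print(mae)  # (2.0 + 1.5 + 10.0) / 3 = 4.5
```

A lower MAE means the predicted counts are closer to the ground truth on average, which is how the per-dataset figures above should be read.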
Funder
National Natural Science Foundation
Key R&D Project in Shaanxi Province of China