Abstract
In crowded scenes, one of the most important issues is that heavily overlapped objects are hard to distinguish from each other, because most of their pixels are shared and the visible pixels of the occluded objects, which are used to represent their features, are limited. In this paper, a spatial pyramid convolutional shuffle (SPCS) module is proposed to extract refined information from the limited visible pixels of occluded objects and to generate distinguishable representations for heavily overlapped objects. We apply four convolutional kernels with different sizes and dilation rates at each location in the pyramid features and recombine their fused outputs into spatially adjacent positions using a pixel shuffle module. In this way, four distinguishable instance predictions, each corresponding to a different convolutional kernel, can be produced for each location in the pyramid feature. In addition, multiple convolutional operations with different kernel sizes and dilation rates at the same location can generate refined information for the corresponding regions, which helps to extract features of the occluded objects from their limited visible pixels. Extensive experimental results demonstrate that the SPCS module can effectively boost performance in crowded human detection. A YOLO detector with the SPCS module achieves 94.11% AP, 41.75% MR, and 97.75% Recall on CrowdHuman, and 93.04% AP and 98.45% Recall on WiderPerson, outperforming previous state-of-the-art models.
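The spatial recombination step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `shuffle_branches`, the branch ordering, and the toy inputs are assumptions made for illustration, and the four convolutional branches themselves are replaced by constant maps. The sketch only shows how four same-sized outputs could be interleaved so that each original location yields a 2x2 block of adjacent predictions, in the manner of a pixel shuffle with an upscale factor of 2.

```python
import numpy as np


def shuffle_branches(branches):
    """Interleave four (C, H, W) maps into one (C, 2H, 2W) map.

    Hypothetical helper: branch k = 2*i + j lands at offset (i, j)
    inside the 2x2 block that belongs to each source location.
    """
    if len(branches) != 4:
        raise ValueError("expected exactly four branch outputs")
    stacked = np.stack(branches)            # (4, C, H, W)
    _, c, h, w = stacked.shape
    grid = stacked.reshape(2, 2, c, h, w)   # (i, j, C, H, W)
    grid = grid.transpose(2, 3, 0, 4, 1)    # (C, H, i, W, j)
    return grid.reshape(c, 2 * h, 2 * w)    # row = 2h + i, col = 2w + j


# Toy stand-ins for the four branch outputs: branch k is filled with k.
branches = [np.full((1, 2, 2), k) for k in range(4)]
out = shuffle_branches(branches)
print(out[0])
```

In this toy case every 2x2 block of the result holds one value from each branch, which is the sense in which one location in the pyramid feature can carry four separate predictions.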
Publisher
Springer Science and Business Media LLC
Subject
Computational Mathematics, Engineering (miscellaneous), Information Systems, Artificial Intelligence
Cited by
9 articles.