Crowd Density Estimation in Spatial and Temporal Distortion Environment Using Parallel Multi-Size Receptive Fields and Stack Ensemble Meta-Learning

Published: 2022-10-15 Volume: 14 Issue: 10 Page: 2159
ISSN: 2073-8994
Journal: Symmetry
Language: en

Author:

Assefa Addis Abebe, Tian Wenhong, Hundera Negalign Wake, Aftab Muhammad Umar

Abstract

Crowd density estimation is crucial for applications such as autonomous driving, visual surveillance, crowd control, public space planning, and warning visually distracted drivers before an accident. Models that exhibit strong translational, reflective, and scale symmetry yield encouraging results on this task. However, dynamic scenes with perspective distortion and rapidly changing spatial and temporal domains remain challenging. The main reasons are the dynamic nature of a scene and the difficulty of representing objects of varying sizes in feature space and incorporating those features into a prediction model. To address these issues, this paper proposes a framework of parallel multi-size receptive field units that leverages the features of most CNN layers, so that the features of objects of all sizes are represented and contribute to the model’s prediction. The proposed method uses features generated from lower to higher layers. As a result, different object scales can be handled at different depths of the framework, and environments of various densities can be estimated. However, including the features of most layers in the prediction model has several negative effects on the prediction. To counter them, asymmetric non-local attention is proposed to handle noise and background details, and a channel weighting module is proposed to re-weight each channel of a feature map, making it more sensitive to important features while ignoring irrelevant ones. The output predictions of some layers have high bias and low variance, whereas those of other layers have low bias and high variance. Using stack ensemble meta-learning, the individual predictions made with lower-layer and higher-layer features are combined to improve prediction while balancing the bias-variance tradeoff. The method was tested extensively on the UCF_CC_50 and ShanghaiTech datasets. The experimental results indicate that the proposed method is effective for dense distributions and objects of various sizes.
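
The stacking idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the meta-learner, so a linear least-squares combiner is assumed here, and the two base predictors (`low_layer`, `high_layer`), their noise levels, and all function names are hypothetical stand-ins for count predictions made from lower-layer and higher-layer features.

```python
import numpy as np


def fit_stacker(base_preds, targets):
    """Fit a linear meta-learner (weights plus intercept) by least squares.

    base_preds: array of shape (n_samples, n_models), one column per base model.
    targets:    array of shape (n_samples,), the ground-truth counts.
    """
    design = np.column_stack([base_preds, np.ones(len(base_preds))])
    weights, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return weights


def stack_predict(base_preds, weights):
    """Combine base-model predictions using the fitted meta-learner weights."""
    design = np.column_stack([base_preds, np.ones(len(base_preds))])
    return design @ weights


def mse(pred, target):
    """Mean squared error between predictions and targets."""
    return float(np.mean((pred - target) ** 2))


# Synthetic data standing in for per-image crowd counts (assumption).
rng = np.random.default_rng(0)
true_counts = rng.uniform(50, 2000, size=200)

# Hypothetical base predictors: one biased but stable, one unbiased but noisy.
low_layer = 0.8 * true_counts + rng.normal(0, 10, size=200)
high_layer = true_counts + rng.normal(0, 150, size=200)

base = np.column_stack([low_layer, high_layer])
weights = fit_stacker(base, true_counts)
stacked = stack_predict(base, weights)

print("low-layer MSE: ", mse(low_layer, true_counts))
print("high-layer MSE:", mse(high_layer, true_counts))
print("stacked MSE:   ", mse(stacked, true_counts))
```

On the data used to fit the combiner, the stacked error cannot exceed that of either base predictor, because each base predictor is itself one of the linear combinations the least-squares fit searches over. In practice the meta-learner would normally be fit on held-out (out-of-fold) base predictions to avoid overfitting.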

Funder

National Key Research and Development Plan and Award

Publisher

MDPI AG

Subject

Physics and Astronomy (miscellaneous), General Mathematics, Chemistry (miscellaneous), Computer Science (miscellaneous)

Link

https://www.mdpi.com/2073-8994/14/10/2159/pdf

References (56 articles)

1. Crowd monitoring using image processing

2. On crowd density estimation for surveillance;Rahmalan;Proceedings of the 2006 IET Conference on Crime and Security,2006

3. A viewpoint invariant approach for crowd counting;Kong;Proceedings of the 18th International Conference on Pattern Recognition (ICPR’06),2006

4. On the efficiency of texture analysis for crowd monitoring;Marana;Proceedings of the International Symposium on Computer Graphics, Image Processing, and Vision (Cat. No.98EX237),1998

5. Crowd density estimation using texture analysis and learning;Wu;Proceedings of the 2006 IEEE International Conference on Robotics and Biomimetics,2006

Cited by 1 article.

1. People Counting System Using OpenCV Algorithms and Edge Computing for Safety Management;2023 IEEE International Conference on Smart Information Systems and Technologies (SIST);2023-05-04