HADF-Crowd: A Hierarchical Attention-Based Dense Feature Extraction Network for Single-Image Crowd Counting

Published: 2021-05-17 Issue: 10 Volume: 21 Page: 3483
ISSN: 1424-8220
Container-title: Sensors
Language: en
Short-container-title: Sensors

Author:

Ilyas Naveed, Lee Boreom, Kim Kiseon

Abstract

Crowd counting is a challenging task due to large perspective, density, and scale variations. CNN-based crowd counting techniques have achieved significant performance in sparse to dense environments. However, crowd counting in scenes (images) with strong perspective variation remains difficult because the same number of pixels can cover very different density levels. Such large variations among objects within the same spatial area make accurate counting difficult. Further, existing CNN-based crowd counting methods extract rich deep features; however, these features are used only locally and are dissipated while propagating through intermediate layers. This results in high counting errors, especially in dense scenes and scenes with strong perspective variation. Further, class-specific responses along the channel dimension are underestimated. To address these issues, we propose a CNN-based dense feature extraction network for accurate crowd counting. Our proposed model comprises three main modules: (1) a backbone network, (2) dense feature extraction modules (DFEMs), and (3) a channel attention module (CAM). The backbone network is used to obtain general features with strong transfer learning ability. The DFEM is composed of multiple sub-modules called dense stacked convolution modules (DSCMs), which are densely connected with each other. In this way, features extracted from lower and middle-lower layers are propagated to higher layers through dense connections. In addition, task-independent general features obtained by the earlier modules are combined with task-specific features obtained by the later ones to achieve high counting accuracy in scenes with large perspective variation. Further, to exploit the class-specific response between background and foreground, the CAM is incorporated at the end to obtain high-level features along the channel dimension for better counting accuracy. Moreover, we have evaluated the proposed method on three well-known datasets: Shanghaitech (Part-A), Shanghaitech (Part-B), and Venice. The results demonstrate the effectiveness of the proposed technique relative to state-of-the-art techniques on the selected performance metrics.
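
The two structural ideas named in the abstract, dense connections between sub-modules and channel attention at the end, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not give layer sizes or the exact form of the CAM, so the sigmoid gate over globally averaged channels (a squeeze-and-excitation-style gate) is an assumption, and `mean_block` is a hypothetical toy stand-in for a dense stacked convolution module. All function names and parameters below are illustrative.

```python
import math


def mean_block(channels):
    """Toy stand-in for a convolution module (hypothetical): averages all
    input channels element-wise and returns a single new channel."""
    h, w = len(channels[0]), len(channels[0][0])
    n = len(channels)
    return [[[sum(ch[i][j] for ch in channels) / n for j in range(w)]
             for i in range(h)]]


def dense_stack(x, blocks):
    """Dense connectivity: every block receives the channel-wise concatenation
    of the input and the outputs of all earlier blocks."""
    features = [x]
    for block in blocks:
        concat = [ch for f in features for ch in f]
        features.append(block(concat))
    # The final output keeps early (general) and late (specific) features.
    return [ch for f in features for ch in f]


def channel_attention(channels, weights, biases):
    """Assumed attention form: pool each channel to its mean, pass the pooled
    vector through one linear layer plus sigmoid, and rescale each channel."""
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
              for ch in channels]
    gates = []
    for w_row, b in zip(weights, biases):
        z = sum(w * p for w, p in zip(w_row, pooled)) + b
        gates.append(1.0 / (1.0 + math.exp(-z)))
    scaled = [[[g * v for v in row] for row in ch]
              for g, ch in zip(gates, channels)]
    return scaled, gates


# Two input channels of size 2x2, followed by two densely connected blocks.
x = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
dense = dense_stack(x, [mean_block, mean_block])
# Zero weights and biases are placeholders for learned parameters.
scaled, gates = channel_attention(dense, [[0.0] * 4 for _ in range(4)], [0.0] * 4)
print(len(dense), gates)
```

In this sketch the output of `dense_stack` grows by one channel per block, which is how the lower-layer features stay available to later stages. In a trained network the attention weights would be learned, so the gates would differ per channel instead of all sharing one value.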

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Biochemistry, Instrumentation, Atomic and Molecular Physics, and Optics, Analytical Chemistry

Link

https://www.mdpi.com/1424-8220/21/10/3483/pdf

References: 44 articles.

1. World population in 2050: Assessing the projections;Cohen, 1998

2. CASA-Crowd: A Context-Aware Scale Aggregation CNN-Based Crowd Counting Technique

3. Convolutional-Neural Network-Based Image Crowd Counting: Review, Categorization, Analysis, and Performance Evaluation

Cited by 9 articles.

1. Video surveillance using deep transfer learning and deep domain adaptation: Towards better generalization;Engineering Applications of Artificial Intelligence;2023-03

2. Research on steel rail surface defects detection based on improved YOLOv4 network;Frontiers in Neurorobotics;2023-02-09

3. A Deep Learning-Based Crowd Counting Method and System Implementation on Neural Processing Unit Platform;Computers, Materials & Continua;2023

4. Double Encryption Algorithm for Massive Personal Biometric Authentication Images Based on Chaotic Mapping for Future Smart Cities;Journal of Testing and Evaluation;2022-09-23

5. Transfer Learning For Crowed Counting;2022 IEEE 2nd International Maghreb Meeting of the Conference on Sciences and Techniques of Automatic Control and Computer Engineering (MI-STA);2022-05-23