Abstract
In pedestrian detection, the excessive depth of the convolutional network in YOLOv7 accumulates a large amount of background feature information, which makes it difficult for the model to accurately locate and detect pedestrians, particularly small-scale or heavily occluded ones. To address this problem, we propose YOLOv7-PD, a pedestrian detection model designed to improve the accuracy of detecting small-scale and occluded pedestrians. First, we propose DE-ELAN, an improved version of the existing E-ELAN module built on Omni-Dimensional Dynamic Convolution (ODConv); by leveraging four complementary types of attention, it enhances feature extraction and captures rich contextual information. Second, we propose light-REFM, a lightweight receptive-field enhancement module that constructs a pyramid structure and acquires fine-grained multi-scale information through dilated convolutions with different dilation rates. Finally, we propose an improved regression loss that combines the Normalized Wasserstein Distance (NWD) with Complete-IoU (CIoU), enabling more precise localization of small targets. On the CityPersons dataset, YOLOv7-PD outperforms YOLOv7, improving average precision (AP) by 7% and reducing the miss rate by 2.58%. Experiments on three challenging pedestrian detection datasets show that YOLOv7-PD achieves a favorable balance between accuracy and speed.
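For intuition, the following is a minimal sketch of how NWD is commonly combined with CIoU in a regression loss, with the predicted and ground-truth boxes modeled as 2D Gaussians N_p and N_g; the weighting factor β and the normalizing constant C are illustrative assumptions, not values reported here.

```latex
% Sketch of a regression loss combining CIoU and NWD.
% \beta (weighting factor) and C (normalizing constant) are assumptions
% for illustration, not values taken from this paper.
\begin{aligned}
W_2^2(\mathcal{N}_p, \mathcal{N}_g) &=
  \left\lVert
    \left[cx_p,\; cy_p,\; \tfrac{w_p}{2},\; \tfrac{h_p}{2}\right]^{\mathrm T}
  - \left[cx_g,\; cy_g,\; \tfrac{w_g}{2},\; \tfrac{h_g}{2}\right]^{\mathrm T}
  \right\rVert_2^2 \\[4pt]
\mathrm{NWD}(\mathcal{N}_p, \mathcal{N}_g) &=
  \exp\!\left(-\frac{\sqrt{W_2^2(\mathcal{N}_p, \mathcal{N}_g)}}{C}\right) \\[4pt]
\mathcal{L}_{\mathrm{reg}} &=
  \beta\,\mathcal{L}_{\mathrm{CIoU}}
  + (1-\beta)\,\bigl(1 - \mathrm{NWD}(\mathcal{N}_p, \mathcal{N}_g)\bigr)
\end{aligned}
```

Because NWD measures the similarity of Gaussian box representations rather than box overlap, it remains smooth when small predicted and ground-truth boxes barely intersect, which is the motivation for blending it with CIoU.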
Publisher
Kaunas University of Technology (KTU)