Two-Stage Pedestrian Detection Model Using a New Classification Head for Domain Generalization

Published:2023-11-24 Issue:23 Volume:23 Page:9380
ISSN:1424-8220
Container-title:Sensors
language:en
Short-container-title:Sensors

Author:

Schulz Daniel¹², Perez Claudio A.¹²

Affiliation:

1. Department of Electrical Engineering, and Advanced Mining Technology Center, Universidad de Chile, Santiago 8370451, Chile

2. IMPACT, Center of Interventional Medicine for Precision and Advanced Cellular Therapy, Santiago 7620086, Chile

Abstract

Pedestrian detection based on deep learning methods has achieved great success in the past few years, with several possible real-world applications including autonomous driving, robotic navigation, and video surveillance. In this work, we present a new two-stage neural network pedestrian detector with a custom classification head that adds the triplet loss function to the standard bounding box regression and classification losses. The aim is to improve the domain generalization capabilities of existing pedestrian detectors by explicitly maximizing inter-class distance and minimizing intra-class distance. Triplet loss is applied to the features generated by the region proposal network, with the aim of clustering pedestrian samples together in the feature space. We used Faster R-CNN and Cascade R-CNN with the HRNet backbone pre-trained on ImageNet, replacing the standard classification head in Faster R-CNN and one of the three heads in Cascade R-CNN. The best results were obtained using a progressive training pipeline, starting from a dataset that is further away from the target domain and progressively fine-tuning on datasets closer to the target domain. We obtained state-of-the-art results on the CityPersons benchmark, with MR⁻² of 9.9, 11.0, and 36.2 for the reasonable, small, and heavy subsets, respectively, and outstanding performance on the heavy subset, the most difficult one.
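The abstract does not spell out the loss formulation, so the following is only a minimal sketch of how a triplet term might be combined with the two standard head losses. It assumes the common FaceNet-style triplet loss on squared Euclidean distances. The margin value, the `triplet_weight` factor, the plain unweighted sum, and all function names are illustrative assumptions, not details taken from the paper.

```python
from math import fsum


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return fsum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss for one (anchor, positive, negative) triple.

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin` (an assumed value, not from the paper).
    """
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(gap + margin, 0.0)


def total_head_loss(cls_loss, bbox_loss, trip_loss, triplet_weight=1.0):
    """Hypothetical combination: standard head losses plus a weighted triplet term."""
    return cls_loss + bbox_loss + triplet_weight * trip_loss


# Toy 2-D embeddings standing in for region features.
pedestrian_a = [0.0, 0.0]
pedestrian_b = [0.1, 0.0]   # same class, close to pedestrian_a
background = [1.0, 0.0]     # other class, far from pedestrian_a

# Well-separated triple: the margin is already satisfied, so the loss is zero.
easy = triplet_loss(pedestrian_a, pedestrian_b, background)

# Swapped roles: the "positive" is far and the "negative" is close, so the loss is large.
hard = triplet_loss(pedestrian_a, background, pedestrian_b)

combined = total_head_loss(cls_loss=0.5, bbox_loss=0.25, trip_loss=hard)
```

Minimizing this term pulls same-class features together and pushes other-class features apart, which is the inter-class and intra-class distance objective the abstract describes.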

Funder

Agencia Nacional de Investigación y Desarrollo

Department of Electrical Engineering, Universidad de Chile

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry

Link

https://www.mdpi.com/1424-8220/23/23/9380/pdf

Cited by 2 articles.

1. Research on Multi-Modal Pedestrian Detection and Tracking Algorithm Based on Deep Learning;Future Internet;2024-05-31

2. Cross-domain pedestrian detection via feature alignment and image quality assessment;iScience;2024-04