NRPerson: A Non-Registered Multi-Modal Benchmark for Tiny Person Detection and Localization
Published: 2024-04-27
Volume: 13
Issue: 9
Page: 1697
ISSN: 2079-9292
Container-title: Electronics
Language: en
Author:
Yang Yi 1, Han Xumeng 1, Wang Kuiran 1, Yu Xuehui 1, Yu Wenwen 1, Wang Zipeng 1, Li Guorong 1, Han Zhenjun 1, Jiao Jianbin 1
Affiliation:
1. School of Electronic, Electrical, and Communication Engineering, University of Chinese Academy of Sciences, Beijing 101408, China
Abstract
In recent years, the detection and localization of tiny persons have attracted significant attention due to their critical role in surveillance and security scenarios. Traditional multi-modal methods rely predominantly on well-registered image pairs, which require sophisticated sensors and extensive manual registration effort, limiting their practical utility in dynamic, real-world environments. To address this gap, this paper introduces NRPerson, a novel non-registered multi-modal benchmark designed to advance tiny person detection and localization under the complexities of real-world conditions. The NRPerson dataset comprises 8548 RGB-IR image pairs, collected and filtered from 22 video sequences, together with 889,207 high-quality annotations that have been manually verified for accuracy. Using NRPerson, we evaluate several leading detection and localization models in both mono-modal and non-registered multi-modal settings. We further develop a comprehensive set of natural multi-modal baselines for the non-registered track, aiming to improve detection and localization on unregistered multi-modal data through a cohesive, generalized approach. By reducing the reliance on stringent registration requirements, this benchmark is intended to move detection and localization technologies closer to practical deployment.
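To make the non-registered setting concrete, the following minimal Python sketch models one sample as a pair of independently annotated modalities. This is an illustration only: the directory layout ("rgb/", "ir/"), the name-based pairing convention, and all field names are assumptions, since the abstract does not specify NRPerson's actual file format or annotation schema.

# Illustrative sketch only: the abstract does not specify NRPerson's on-disk
# format, so the directory layout, file naming, and field names below are
# assumptions, not the dataset's actual schema.
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Tuple

@dataclass
class ModalityAnnotations:
    """Person boxes for ONE modality, as (x, y, w, h) in that image's own
    pixel frame. "Non-registered" means no shared frame can be assumed."""
    image_path: Path
    boxes: List[Tuple[float, float, float, float]] = field(default_factory=list)

@dataclass
class NonRegisteredPair:
    """One RGB-IR sample: each modality carries its own coordinates and its
    own annotation set; no homography or pixel-level alignment is assumed."""
    rgb: ModalityAnnotations
    ir: ModalityAnnotations

def load_pairs(root: Path) -> List[NonRegisteredPair]:
    """Pair RGB and IR frames by shared file name (hypothetical convention)."""
    pairs: List[NonRegisteredPair] = []
    for rgb_path in sorted((root / "rgb").glob("*.jpg")):
        ir_path = root / "ir" / rgb_path.name
        if ir_path.exists():  # keep only frames present in both modalities
            pairs.append(NonRegisteredPair(
                rgb=ModalityAnnotations(image_path=rgb_path),
                ir=ModalityAnnotations(image_path=ir_path),
            ))
    return pairs

The key design point is that each modality keeps its own annotation set: unlike registered benchmarks, a single shared set of boxes cannot be assumed, which is exactly what the non-registered baselines described in the abstract must handle.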