Inception Convolution and Feature Fusion for Person Search
Author:
Ouyang Huan (1, 2), Zeng Jiexian (1, 3), Leng Lu (1, 2)
Affiliation:
1. School of Software, Nanchang Hangkong University, Nanchang 330063, China
2. Key Laboratory of Jiangxi Province for Image Processing and Pattern Recognition, Nanchang Hangkong University, Nanchang 330063, China
3. Science and Technology College, Nanchang Hangkong University, Gongqingcheng 332020, China
Abstract
With rapid advances in deep learning theory and in the computing power of hardware, computer vision tasks such as object detection and instance segmentation have progressed dramatically in recent years. As a result, highly challenging integrated tasks such as person search can now develop quickly. Most efficient person search frameworks, such as Seq-Net, are built on Faster R-CNN. However, because of the parallel structure of Faster R-CNN, re-ID performance can suffer significantly: the feature maps extracted during pedestrian detection are single-level, low-resolution, and sometimes incomplete. To address these issues, this paper proposes a person search method based on an inception convolution and feature fusion module (IC-FFM), using Seq-Net (Sequential End-to-end Network) as the baseline. First, we replaced the standard convolutions in ResNet-50 with a new inception convolution module (ICM), which lets the convolution operation dynamically and efficiently assign different receptive fields to different channels. Then, to improve the accuracy of information extraction, we designed a feature fusion module (FFM) that combines multi-level information through convolutions at different levels. Finally, we built a double-head module (DHM) that uses convolution for bounding box regression; by combining global and fine-grained information, it considerably improves the accuracy of pedestrian retrieval. Experiments on the CUHK-SYSU and PRW datasets showed that our method achieves higher accuracy than Seq-Net. In addition, our method is simple and can be easily integrated into existing two-stage frameworks.
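As a rough illustration of why an inception-style convolution can treat channels differently, the following pure-Python sketch assumes the general inception-convolution idea that each output channel applies its kernel with its own dilation pair (dh, dw), so different channels cover different receptive fields. The function names, the hand-fixed dilation pairs, and the zero "same" padding are assumptions made for illustration; this is not the authors' ICM, whose exact design is not given here.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def receptive_field(k: int, d: int) -> int:
    """Span (in pixels) covered by a k-tap kernel with dilation d along one axis."""
    return d * (k - 1) + 1


def dilated_conv2d_same(x: Sequence[Matrix], w: Sequence[Matrix], dh: int, dw: int) -> Matrix:
    """Compute ONE output channel: sum over input channels of a k x k kernel
    applied (cross-correlation, as in deep learning libraries) with dilation
    (dh, dw) and zero 'same' padding. k must be odd so output size == input size."""
    k = len(w[0])
    if k % 2 == 0:
        raise ValueError("kernel size must be odd for 'same' padding")
    h, wd = len(x[0]), len(x[0][0])
    ph, pw = dh * (k - 1) // 2, dw * (k - 1) // 2
    out = [[0.0] * wd for _ in range(h)]
    for i in range(h):
        for j in range(wd):
            acc = 0.0
            for c, plane in enumerate(x):
                for a in range(k):
                    r = i - ph + a * dh
                    if not 0 <= r < h:
                        continue  # row falls in the zero padding
                    for b in range(k):
                        s = j - pw + b * dw
                        if 0 <= s < wd:  # skip zero-padded columns
                            acc += w[c][a][b] * plane[r][s]
            out[i][j] = acc
    return out


def inception_style_conv(
    x: Sequence[Matrix],
    weights: Sequence[Sequence[Matrix]],
    dilations: Sequence[Tuple[int, int]],
) -> List[Matrix]:
    """HYPOTHETICAL helper: output channel n uses weights[n] with its own
    dilation pair dilations[n] = (dh, dw), so channels differ in receptive field."""
    if len(weights) != len(dilations):
        raise ValueError("need one (dh, dw) pair per output channel")
    return [dilated_conv2d_same(x, w, dh, dw) for w, (dh, dw) in zip(weights, dilations)]


if __name__ == "__main__":
    # 1-channel 5x5 impulse image: a single 1.0 at the centre.
    img = [[[0.0] * 5 for _ in range(5)]]
    img[0][2][2] = 1.0
    ones = [[[1.0] * 3 for _ in range(3)]]  # 3x3 all-ones kernel, 1 input channel
    # Channel 0: plain 3x3; channel 1: dilated 2x along the height axis only.
    out = inception_style_conv(img, [ones, ones], [(1, 1), (2, 1)])
    print([r for r in range(5) if any(out[1][r])])  # prints [0, 2, 4]
    print(receptive_field(3, 2))  # prints 5
```

With a 3 × 3 kernel, a dilation of 2 along one axis widens that channel's receptive field from 3 to 5 pixels along that axis without adding weights, so mixing dilation pairs across the output channels of a single layer gives that layer several receptive-field sizes at once.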
Funder
National Natural Science Foundation of China; Jiangxi Provincial Key Program Project of Research and Development; Technology Innovation Guidance Program Project
Subject
Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry