Multi-Resolution and Semantic-Aware Bidirectional Adapter for Multi-Scale Object Detection
Published: 2023-11-24
Volume: 13
Issue: 23
Page: 12639
ISSN: 2076-3417
Container-title: Applied Sciences
Short-container-title: Applied Sciences
Language: en
Author:
Li Zekun [1], Pan Jin [1], He Peidong [2,3], Zhang Ziqi [4], Zhao Chunlu [1], Li Bing [4]
Affiliation:
1. National Computer Network Emergency Response Technical Team/Coordination Center of China (CNCERT/CC), Beijing 100029, China
2. Aerospace Information Research Institute, Chinese Academy of Sciences, No. 9 Dengzhuang South Road, Haidian District, Beijing 100094, China
3. Key Laboratory of Computational Optical Imaging Technology, Chinese Academy of Sciences, No. 9 Dengzhuang South Road, Haidian District, Beijing 100094, China
4. State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100094, China
Abstract
Scale variation presents a significant challenge in object detection. To address this, multi-level feature fusion techniques have been proposed, exemplified by methods such as the feature pyramid network (FPN) and its extensions. Nonetheless, the input features provided to these methods and the interaction among features across different levels are limited and inflexible. In order to fully leverage the features of multi-scale objects and amplify feature interaction and representation, we introduce a novel and efficient framework known as a multi-resolution and semantic-aware bidirectional adapter (MSBA). Specifically, MSBA comprises three successive components: multi-resolution cascaded fusion (MCF), a semantic-aware refinement transformer (SRT), and bidirectional fine-grained interaction (BFI). MCF adaptively extracts multi-level features to enable cascaded fusion. Subsequently, SRT enriches the long-range semantic information within high-level features. Following this, BFI facilitates ample fine-grained interaction via bidirectional guidance. Benefiting from the coarse-to-fine process, we can acquire robust multi-scale representations for a variety of objects. Each component can be individually integrated into different backbone architectures. Experimental results substantiate the superiority of our approach and validate the efficacy of each proposed module.
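The abstract describes a three-stage, coarse-to-fine pipeline: MCF cascades multi-level features, SRT refines the most semantic (highest) level, and BFI runs guidance in both directions across levels. As a purely illustrative sketch of that ordering, the toy code below wires the three stages together; every function body is a hypothetical stand-in (simple averaging over equal-length flattened feature vectors), since the abstract gives no implementation details — only the MCF → SRT → BFI flow follows the paper.

```python
# Toy sketch of the MSBA data flow (MCF -> SRT -> BFI) from the abstract.
# Feature levels are modeled as equal-length lists of floats; real feature
# pyramids have differing spatial sizes and learned fusion weights, which
# this illustration deliberately omits.

def mcf(levels):
    """Multi-resolution cascaded fusion (stand-in): cascade each level
    with the coarser level above it, top-down."""
    fused = levels[:]
    for i in range(len(fused) - 2, -1, -1):  # coarse-to-fine
        fused[i] = [(a + b) / 2 for a, b in zip(fused[i], fused[i + 1])]
    return fused

def srt(levels):
    """Semantic-aware refinement (stand-in): enrich only the highest,
    most semantic level with a global-context summary."""
    top = levels[-1]
    ctx = sum(top) / len(top)
    return levels[:-1] + [[0.5 * v + 0.5 * ctx for v in top]]

def bfi(levels):
    """Bidirectional fine-grained interaction (stand-in): one top-down
    guidance pass followed by one bottom-up pass."""
    out = levels[:]
    for i in range(len(out) - 2, -1, -1):    # top-down guidance
        out[i] = [(a + b) / 2 for a, b in zip(out[i], out[i + 1])]
    for i in range(1, len(out)):             # bottom-up guidance
        out[i] = [(a + b) / 2 for a, b in zip(out[i], out[i - 1])]
    return out

def msba(levels):
    """Full coarse-to-fine pipeline in the order the abstract names."""
    return bfi(srt(mcf(levels)))
```

For example, `msba([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])` returns three refined levels of the same shape; the point of the sketch is only the stage ordering and the direction of each pass, not the arithmetic.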
Funder:
National Natural Science Foundation of China; National Key Research and Development Program of China; Beijing Natural Science Foundation
Subject:
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science