Improving Object Detection Quality by Incorporating Global Contexts via Self-Attention

Published: 2021-01-05  Issue: 1  Volume: 10  Page: 90
ISSN: 2079-9292
Container-title: Electronics
Language: en
Short-container-title: Electronics

Author:

Lee Donghyeon, Kim Joonyoung, Jung Kyomin

Abstract

Fully convolutional structures produce feature maps that capture the local contexts of an image simply by stacking many convolutional layers. These structures are known to be effective in modern state-of-the-art object detectors such as Faster R-CNN and SSD, which find objects from local contexts. However, the quality of object detectors can be further improved by incorporating global contexts, particularly when an ambiguous object must be identified from surrounding objects or the background. In this paper, we introduce a self-attention module for object detectors to incorporate global contexts. More specifically, our self-attention module allows the feature extractor to compute feature maps with global contexts through the self-attention mechanism. The module computes relationships among all elements in the feature maps and then blends the feature maps according to the computed relationships. As a result, it can capture long-range relationships among objects or backgrounds, which is difficult for fully convolutional structures. Furthermore, the proposed module is not limited to any specific object detector, and it can be applied to any CNN-based model for any computer vision task. In experiments on the object detection task, our method shows remarkable gains in average precision (AP) over popular models that have fully convolutional structures. In particular, compared to Faster R-CNN with the ResNet-50 backbone, our module applied to the same backbone achieved a gain of +4.0 AP without bells and whistles. In image semantic segmentation and panoptic segmentation tasks, our module improved performance on all metrics used for each task.
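
The abstract describes the module only at a high level: compute relationships among all elements of a feature map, then blend the feature map according to those relationships. The following is a minimal NumPy sketch of that general idea, not the authors' implementation. The function name `self_attention_2d`, the projection matrices `w_q`, `w_k`, `w_v`, the scaled dot-product scoring, and the residual connection are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention_2d(feature_map, w_q, w_k, w_v):
    """Generic self-attention over a (C, H, W) feature map.

    Hypothetical helper for illustration only.
    w_q, w_k: (C, D) query/key projections; w_v: (C, C) value projection.
    Returns the blended feature map (C, H, W) and the attention matrix (H*W, H*W).
    """
    c, h, w = feature_map.shape
    x = feature_map.reshape(c, h * w).T        # (N, C): one row per spatial position
    q, k, v = x @ w_q, x @ w_k, x @ w_v        # project every position
    scores = q @ k.T / np.sqrt(q.shape[1])     # (N, N): pairwise relationships
    attn = softmax(scores, axis=-1)            # each row sums to 1
    blended = attn @ v                         # mix values from all positions
    out = x + blended                          # residual connection (an assumption)
    return out.T.reshape(c, h, w), attn


# Usage on a random 8-channel, 4x5 feature map.
rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 4, 5))
w_q = rng.standard_normal((8, 4))
w_k = rng.standard_normal((8, 4))
w_v = rng.standard_normal((8, 8))
out, attn = self_attention_2d(fmap, w_q, w_k, w_v)
print(out.shape, attn.shape)
```

Because every position attends to every other position, the attention matrix has one row and one column per spatial location, so a single such layer can relate distant regions that a stack of small convolution kernels would connect only after many layers.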

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/10/1/90/pdf

References: 40 articles.

1. ImageNet classification with deep convolutional neural networks

Cited by 6 articles.

1. A Physics informed neural network approach for solving time fractional Black-Scholes partial differential equations; Optimization and Engineering; 2024-08-02

2. Fruit ripeness identification using transformers; Applied Intelligence; 2023-06-29

3. Temporal Context Modeling Network with Local-Global Complementary Architecture for Temporal Proposal Generation; Electronics; 2022-08-26

4. Multiple Neural Network architectures for visual emotion recognition using Song-Speech modality; 2022 IEEE Information Technologies & Smart Industrial Systems (ITSIS); 2022-07-15

5. Deep Neural Network for visual Emotion Recognition based on ResNet50 using Song-Speech characteristics; 2022 5th International Conference on Advanced Systems and Emergent Technologies (IC_ASET); 2022-03-22