MDCT: Multi-Kernel Dilated Convolution and Transformer for One-Stage Object Detection of Remote Sensing Images

Published:2023-01-07 Issue:2 Volume:15 Page:371
ISSN:2072-4292
Container-title:Remote Sensing
language:en
Short-container-title:Remote Sensing

Author:

Chen Juanjuan, Hong Hansheng, Song Bin, Guo Jie, Chen Chen, Xu Junjie

Abstract

Deep learning (DL)-based object detection algorithms have achieved impressive results on natural images and have gradually matured in recent years. Compared with natural images, however, remote sensing images pose severe challenges because of their complex backgrounds and the difficulty of detecting small objects in dense scenes. To address these problems, a novel one-stage object detection model named MDCT is proposed, based on a multi-kernel dilated convolution (MDC) block and a transformer block. First, a new feature enhancement module, the MDC block, is developed in the one-stage object detection model to enhance small objects’ ontology and adjacent spatial features. Second, we integrate a transformer block into the neck network of the one-stage object detection model to prevent the loss of object information in complex backgrounds and dense scenes. Finally, a depthwise separable convolution is introduced into each MDC block to reduce the computational cost. We conduct experiments on three datasets: DIOR, DOTA, and NWPU VHR-10. Compared with YOLOv5, our model improves object detection accuracy by 2.3%, 0.9%, and 2.9% on the DIOR, DOTA, and NWPU VHR-10 datasets, respectively.
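
The abstract names two standard building blocks, dilated convolution and depthwise separable convolution, without giving their arithmetic. The sketch below is only an illustration of those two generic techniques: it computes the spatial extent covered by a dilated kernel and compares the weight count of a standard convolution with that of a depthwise separable one. The kernel sizes, dilation rates, and channel width are hypothetical values chosen for the example; they are not taken from the paper, and the code does not reproduce the MDC block itself.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Extent (in pixels, along one axis) covered by a dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def standard_conv_params(kernel_size: int, c_in: int, c_out: int) -> int:
    """Weight count of a standard 2-D convolution (bias ignored)."""
    return kernel_size * kernel_size * c_in * c_out


def depthwise_separable_params(kernel_size: int, c_in: int, c_out: int) -> int:
    """Weight count of a depthwise k x k convolution (one filter per input
    channel) followed by a 1 x 1 pointwise convolution (bias ignored)."""
    return kernel_size * kernel_size * c_in + c_in * c_out


# Hypothetical settings for illustration only (not the paper's configuration).
channels = 256
for kernel_size, dilation in [(3, 1), (3, 2), (5, 2)]:
    extent = effective_kernel_size(kernel_size, dilation)
    dense = standard_conv_params(kernel_size, channels, channels)
    separable = depthwise_separable_params(kernel_size, channels, channels)
    print(
        f"k={kernel_size} d={dilation}: covers {extent}x{extent}, "
        f"standard={dense} weights, separable={separable} weights"
    )
```

Dilation enlarges the area a kernel covers without adding weights, and the separable factorization cuts the weight count by roughly a factor of the kernel area when the channel count is large, which is consistent with the abstract's stated motivation of reducing computational cost.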

Funder

the National Natural Science Foundation of China

the ISN State Key Laboratory

Publisher

MDPI AG

Subject

General Earth and Planetary Sciences

Link

https://www.mdpi.com/2072-4292/15/2/371/pdf
