Mask-Refined R-CNN: A Network for Refining Object Details in Instance Segmentation

Published:2020-02-13 Issue:4 Volume:20 Page:1010
ISSN:1424-8220
Container-title:Sensors
language:en
Short-container-title:Sensors

Author:

Zhang Yiqing, Chu Jun, Leng Lu, Miao Jun

Abstract

With the rapid development of flexible vision sensors and visual sensor networks, computer vision tasks such as object detection and tracking are entering a new phase. Accordingly, more challenging comprehensive tasks, including instance segmentation, can develop rapidly. Most state-of-the-art network frameworks for instance segmentation are based on Mask R-CNN (mask region-convolutional neural network). However, experimental results confirm that Mask R-CNN does not always successfully predict instance details. The scale-invariant fully convolutional network structure of Mask R-CNN ignores the difference in spatial information between receptive fields of different sizes. A large-scale receptive field focuses more on detailed information, whereas a small-scale receptive field focuses more on semantic information. As a result, the network cannot consider the relationship between the pixels at the object edge, and these pixels are misclassified. To overcome this problem, Mask-Refined R-CNN (MR R-CNN) is proposed, in which the stride of ROIAlign (region of interest align) is adjusted. In addition, the original fully convolutional layer is replaced with a new semantic segmentation layer that realizes feature fusion by constructing a feature pyramid network and summing the forward and backward transmissions of feature maps of the same resolution. The segmentation accuracy is substantially improved by combining the feature layers that focus on global and detailed information. Experimental results on the COCO (Common Objects in Context) and Cityscapes datasets demonstrate that the segmentation accuracy of MR R-CNN is about 2% higher than that of Mask R-CNN using the same backbone. The average precision of large instances reaches 56.6%, which is higher than that of all state-of-the-art methods. In addition, the proposed method has a low time cost and is easy to implement. The experiments on the Cityscapes dataset also show that the proposed method generalizes well.
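
The fusion step described in the abstract (summing forward and backward feature maps of the same resolution) can be illustrated with a small sketch. This is not the authors' implementation: the function names (`downsample`, `upsample`, `fuse_pyramid`) are hypothetical, and average pooling and nearest-neighbour upsampling stand in, as assumptions, for the learned convolution layers a real network would use.

```python
# Hypothetical sketch of pyramid-style feature fusion on a single-channel map.
# Assumption: 2x2 average pooling and nearest-neighbour upsampling replace the
# learned layers of an actual segmentation head.

def downsample(fmap):
    """2x2 average pooling on a 2D list (height and width must be even)."""
    h, w = len(fmap), len(fmap[0])
    return [
        [(fmap[i][j] + fmap[i][j + 1] + fmap[i + 1][j] + fmap[i + 1][j + 1]) / 4.0
         for j in range(0, w, 2)]
        for i in range(0, h, 2)
    ]

def upsample(fmap):
    """Nearest-neighbour 2x upsampling of a 2D list."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each value twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out

def add(a, b):
    """Element-wise sum of two 2D lists of identical shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def fuse_pyramid(roi_feature, levels=3):
    """Sum forward and backward maps of the same resolution.

    The input height and width must be divisible by 2 ** (levels - 1).
    """
    # Forward pass: progressively coarser maps.
    forward = [roi_feature]
    for _ in range(levels - 1):
        forward.append(downsample(forward[-1]))
    # Backward pass: start at the coarsest map, upsample, and sum with the
    # forward map that has the same resolution.
    fused = forward[-1]
    for fwd in reversed(forward[:-1]):
        fused = add(upsample(fused), fwd)
    return fused

roi = [[1.0] * 4 for _ in range(4)]   # toy 4x4 ROI feature
fused = fuse_pyramid(roi, levels=3)
```

The output keeps the input resolution, so a mask predicted from it would combine coarse context from the lower-resolution levels with the fine detail of the original map.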

Funder

National Natural Science Foundation of China

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry

Link

https://www.mdpi.com/1424-8220/20/4/1010/pdf

References: 49 articles.

1. An Efficient Vision-based Object Detection and Tracking using Online Learning;Kim;J. Multimed. Inf. Syst.,2017

2. Efficient Facial Expression Recognition Algorithm Based on Hierarchical Deep Neural Network Structure

3. Deep convolutional neural network designed for age assessment based on orthopantomography data

4. MU R-CNN: A Two-Dimensional Code Instance Segmentation Network Based on Deep Learning

Cited by 160 articles.

1. Automatic localization of image semantic patches for crop disease recognition;Applied Soft Computing;2024-11

2. Fast Person Detection Using YOLOX With AI Accelerator For Train Station Safety;2024 International Electronics Symposium (IES);2024-08-06

3. CSASNet—A Crop Leaf Disease Identification Method Based on Improved ShuffleNetV2;Automatic Control and Computer Sciences;2024-08

4. A Novel Underwater Detection Method for Ambiguous Object Finding via Distraction Mining;IEEE Transactions on Industrial Informatics;2024-07

5. An Efficient MLP-Based Point-Guided Segmentation Network for Ore Images With Ambiguous Boundary;IEEE Transactions on Industrial Informatics;2024-07