GAMNet: Global attention via multi‐scale context for depth estimation algorithm and application

Published:2023-10-05 Issue:1 Volume:18 Page:247-264
ISSN:1751-9659
Container-title:IET Image Processing
language:en
Short-container-title:IET Image Processing

Author:

Yang Huitong¹,², Lei Liang¹,², Sang Haiwei¹

Affiliation:

1. School of Mathematics and Big Data, Guizhou Education University, Guiyang, Guizhou, China

2. School of Physics and Optoelectronic Engineering, Guangdong University of Technology, Guangzhou, China

Abstract

Deep neural networks have significantly improved the accuracy of stereo-based disparity estimation. However, some current methods use global context information inefficiently, which leads to a loss of structural detail in ill-posed areas. To address this, a novel stereo network, GAMNet, is designed; it comprises three core components (GDA, MPF, and DCA) for depth estimation in challenging real-world environments. First, a lightweight attention module is presented that integrates global semantic cues for every feature position across the channel and spatial dimensions. Next, the MPF module is constructed to fuse diverse semantic and contextual information from different levels of the feature pyramid. Finally, the cost volume is aggregated with a stacked encoder-decoder composed of the DCA module and 3D convolutions, which filters the transmission of matching cues and captures rich global context. Extensive experiments on the KITTI 2012, KITTI 2015, SceneFlow, and Middlebury-v3 datasets show that GAMNet surpasses preceding methods and produces contour-preserving disparity predictions. In addition, the first linear evaluation strategy for 3D scene reconstruction on spatial grasping points, for end-to-end stereo networks in an unsupervised mode, is proposed and deployed on the designed robot vision-guided system. In application experiments, the method produces dense, high-precision 3D reconstructions to carry out the grasping task in complex real-world scenes, and achieves robust performance with competitive inference efficiency.
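The abstract moves from disparity estimation to depth and 3D reconstruction without stating how the two are related. For a rectified stereo pair, the standard pinhole relation is Z = f · B / d, where d is the disparity in pixels, f the focal length in pixels, and B the baseline. The sketch below illustrates that textbook geometry only. It is not the authors' pipeline, and the function names and all camera values are hypothetical placeholders chosen for illustration.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Depth Z = f * B / d for a rectified stereo pair.

    Returns None for non-positive disparity, which has no valid depth.
    """
    if disparity_px <= 0:
        return None
    return focal_px * baseline_m / disparity_px


def backproject(u, v, depth_m, focal_px, cx, cy):
    """Back-project pixel (u, v) at depth Z to a camera-frame point (X, Y, Z).

    Assumes an ideal pinhole camera with square pixels and principal
    point (cx, cy); lens distortion is ignored.
    """
    x = (u - cx) * depth_m / focal_px
    y = (v - cy) * depth_m / focal_px
    return (x, y, depth_m)


# Hypothetical camera: f = 700 px, B = 0.5 m, principal point (600, 200).
depth = disparity_to_depth(50.0, focal_px=700.0, baseline_m=0.5)
point = backproject(700, 400, depth, focal_px=700.0, cx=600.0, cy=200.0)
```

Because depth is inversely proportional to disparity, a fixed disparity error produces a larger depth error for distant points than for near ones, which is one reason sharp disparity boundaries matter for reconstruction tasks such as grasping.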

Publisher

Institution of Engineering and Technology (IET)

Subject

Electrical and Electronic Engineering, Computer Vision and Pattern Recognition, Signal Processing, Software
