Online supervised attention-based recurrent depth estimation from monocular video

Published:2020-11-23 Volume:6 Page:e317
ISSN:2376-5992
Container-title:PeerJ Computer Science
language:en
Author:

Maslov Dmitrii¹,Makarov Ilya¹²

Affiliation:

1. School of Data Analysis and Artificial Intelligence, HSE University, Moscow, Russia

2. Samsung-PDMI Joint AI Center, St. Petersburg Department of Steklov Institute of Mathematics, St. Petersburg, Russia

Abstract

Autonomous driving depends heavily on depth information for safe operation. Recently, major advances have been made in both supervised and self-supervised methods for depth reconstruction. However, most current approaches focus on single-frame depth estimation, where the quality limit is hard to beat due to the limitations of supervised learning of deep neural networks in general. One way to improve the quality of existing methods is to utilize temporal information from frame sequences. In this paper, we study intelligent ways of integrating a recurrent block into a common supervised depth estimation pipeline. We propose a novel method that takes advantage of the convolutional gated recurrent unit (convGRU) and convolutional long short-term memory (convLSTM). We compare the use of convGRU and convLSTM blocks and determine the best model for the real-time depth estimation task. We carefully study the training strategy and provide new deep neural network architectures for the task of depth estimation from monocular video, using information from past frames based on an attention mechanism. We demonstrate the efficiency of exploiting temporal information by comparing our best recurrent method with existing image-based and video-based solutions for monocular depth reconstruction.

Funder

Samsung Research, Samsung Electronics

Data Analysis and Artificial Intelligence School, HSE University

Publisher

PeerJ

Subject

General Computer Science

Link

https://peerj.com/articles/cs-317.pdf

References: 61 articles.

1. Neural machine translation by jointly learning to align and translate;Bahdanau,2014

2. Delving deeper into convolutional networks for learning video representations;Ballas,2015

3. Estimating depth from monocular images as classification using deep fully convolutional residual networks;Cao;IEEE Transactions on Circuits and Systems for Video Technology,2018

4. Depth Prediction without the sensors: leveraging structure for unsupervised learning from monocular videos;Casser,2018

5. Attention-based context aggregation network for monocular depth estimation;Chen,2019

Cited by 22 articles.

1. Deep Learning-based Depth Estimation Methods from Monocular Image and Videos: A Comprehensive Survey;ACM Computing Surveys;2024-07-15

2. Inpainting Semantic and Depth Features to Improve Visual Place Recognition in the Wild;IEEE Access;2024

3. Gesture Recognition on Video Data;Communications in Computer and Information Science;2024

4. Reparameterization for Improved Training and Weight Optimization in Single Image Super-Resolution Networks;2023 IEEE 23rd International Symposium on Computational Intelligence and Informatics (CINTI);2023-11-20

5. MonoVAN: Visual Attention for Self-Supervised Monocular Depth Estimation;2023 IEEE International Symposium on Mixed and Augmented Reality (ISMAR);2023-10-16