Online supervised attention-based recurrent depth estimation from monocular video

Author:

Maslov Dmitrii (1), Makarov Ilya (1, 2)

Affiliation:

1. School of Data Analysis and Artificial Intelligence, HSE University, Moscow, Russia

2. Samsung-PDMI Joint AI Center, St. Petersburg Department of Steklov Institute of Mathematics, St. Petersburg, Russia

Abstract

Autonomous driving depends heavily on depth information for safety. Recently, major advances have been made in both supervised and self-supervised methods for depth reconstruction. However, most current approaches focus on single-frame depth estimation, where the quality limit is hard to beat due to the general limitations of supervised learning of deep neural networks. One way to improve the quality of existing methods is to utilize temporal information from frame sequences. In this paper, we study intelligent ways of integrating a recurrent block into a common supervised depth estimation pipeline. We propose a novel method that takes advantage of the convolutional gated recurrent unit (convGRU) and convolutional long short-term memory (convLSTM). We compare the use of convGRU and convLSTM blocks and determine the best model for the real-time depth estimation task. We carefully study the training strategy and provide new deep neural network architectures for depth estimation from monocular video that use information from past frames based on an attention mechanism. We demonstrate the efficiency of exploiting temporal information by comparing our best recurrent method with existing image-based and video-based solutions for monocular depth reconstruction.

Funder

Samsung Research, Samsung Electronics

Data Analysis and Artificial Intelligence School, HSE University

Publisher

PeerJ

Subject

General Computer Science


