ViT VO - A Visual Odometry technique Using CNN-Transformer Hybrid Architecture

Published: 2023, Volume: 54, Page: 01004
ISSN: 2271-2097
Container-title: ITM Web of Conferences
Short-container-title: ITM Web Conf.

Author:

B Jayaraj P., J Ebin, R Karthik, P N Pournami

Abstract

Localization is one of the main tasks involved in the operation of autonomous agents (e.g., vehicles and robots). It allows them to track their paths and to detect and avoid obstacles properly. Visual Odometry (VO) is one of the techniques used for agent localization: it estimates the motion of an agent from the images captured by cameras attached to it. Conventional VO algorithms require specific workarounds for the challenges posed by the working environment and the captured sensor data. Deep Learning approaches, on the other hand, have shown tremendous efficiency and accuracy in tasks that require a high degree of adaptability and scalability. In this work, a novel deep learning model is proposed to perform VO tasks for space robotic applications. The model consists of an optical flow estimation module, which abstracts away scene-specific details from the input video sequence and produces an intermediate representation. The CNN module that follows learns relative poses from the optical flow estimates. The final module is a state-of-the-art Vision Transformer, which learns absolute poses from the relative poses learnt by the CNN module. The model is trained on the KITTI dataset and has obtained a promising accuracy of approximately 2%. It has outperformed the baseline model, MagicVO, on a few sequences of the dataset.
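The abstract says only that a Vision Transformer learns absolute poses from the CNN's relative poses; it does not describe that mapping further. For context, the conventional, non-learned way to obtain absolute poses from relative ones is to chain 4x4 homogeneous SE(3) transforms, T_0,k = T_0,k-1 · T_k-1,k. The sketch below shows only that classical baseline, not the authors' method. It assumes NumPy, poses stored as 4x4 matrices, and a camera frame with z pointing forward and y as the vertical axis (as in KITTI's camera convention); the helper names `se3`, `rot_y` and `accumulate` are hypothetical.

```python
import numpy as np


def se3(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation R and a translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def rot_y(theta):
    """Rotation by theta radians about the y-axis (the vertical axis in this assumed camera frame)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])


def accumulate(relative_poses):
    """Chain relative poses T_{k-1,k} into absolute poses T_{0,k}, starting at the identity."""
    poses = [np.eye(4)]
    for T_rel in relative_poses:
        # Each new absolute pose is the previous absolute pose composed with the relative motion.
        poses.append(poses[-1] @ T_rel)
    return poses


# Hypothetical example: move 1 m forward along z, then turn 90 degrees, four times.
step = se3(rot_y(np.pi / 2), np.array([0.0, 0.0, 1.0]))
poses = accumulate([step] * 4)

# After four 90-degree turns the square trajectory closes back at the origin.
print(np.round(poses[-1][:3, 3], 6))
```

Because every absolute pose is a product of all preceding relative estimates, an error in any single step propagates to every later pose, which is why drift is the central difficulty of chaining relative poses in VO.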

Publisher

EDP Sciences

Subject

General Medicine

Link

https://www.itm-conferences.org/10.1051/itmconf/20235401004/pdf

References (19 articles)

1. Visual Odometry [Tutorial]

2. Howard A., Real-time stereo visual odometry for autonomous ground vehicles, in 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems (2008), pp. 3946–3952

3. Pandey T., Pena D., Byrne J., Moloney D., Sensors 21 (2021)

4. Muller P., Savakis A., Flowdometry: An Optical Flow and Deep Learning Based Approach to Visual Odometry, in 2017 IEEE Winter Conference on Applications of Computer Vision (WACV) (2017), pp. 624–631

5. Dosovitskiy A., Beyer L., Kolesnikov A., Weissenborn D., Zhai X., Unterthiner T., Dehghani M., Minderer M., Heigold G., Gelly S. et al., CoRR abs/2010.11929 (2020)