MarsExplorer: Exploration of Unknown Terrains via Deep Reinforcement Learning and Procedurally Generated Environments

Published:2021-11-11 Issue:22 Volume:10 Page:2751
ISSN:2079-9292
Container-title:Electronics
language:en
Short-container-title:Electronics

Author:

Koutras Dimitrios I., Kapoutsis Athanasios C., Amanatiadis Angelos A., Kosmatopoulos Elias B.

Abstract

This paper is an initial endeavor to bridge the gap between powerful Deep Reinforcement Learning methodologies and the problem of exploration/coverage of unknown terrains. Within this scope, MarsExplorer, an OpenAI Gym-compatible environment tailored to exploration/coverage of unknown areas, is presented. MarsExplorer translates the original robotics problem into a Reinforcement Learning setup that various off-the-shelf algorithms can tackle. Any learned policy can be applied straightforwardly to a robotic platform, without needing an elaborate simulation model of the robot’s dynamics for a separate learning/adaptation phase. One of its core features is the controllable multi-dimensional procedural generation of terrains, which is key to producing policies with strong generalization capabilities. Four different state-of-the-art RL algorithms (A3C, PPO, Rainbow, and SAC) are trained on the MarsExplorer environment, and their results are evaluated against average human-level performance. In the follow-up experimental analysis, the effect of the multi-dimensional difficulty setting on the learning capabilities of the best-performing algorithm (PPO) is analyzed. A milestone result is the generation of an exploration policy that follows the Hilbert curve without providing this information to the environment or directly or indirectly rewarding Hilbert-curve-like trajectories. The experimental analysis concludes by evaluating the PPO-learned policy side by side with frontier-based exploration strategies. A study of the performance curves revealed that the PPO-based policy was capable of performing adaptive-to-the-unknown-terrain sweeping without leaving expensive-to-revisit areas uncovered, underlining the capability of RL-based methodologies to tackle exploration tasks efficiently.
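
To make the setup described in the abstract more concrete, the following is a minimal sketch of a grid-coverage environment with a Gym-style reset/step interface and a freshly generated terrain per episode. It is not the MarsExplorer API: the class name ToyExplorationEnv, its parameters (size, obstacle_prob, sensor_range), the reward values, and the rule that a collision ends the episode are all assumptions made for illustration. It uses only the Python standard library and the classic four-value step return (observation, reward, done, info).

```python
import random


class ToyExplorationEnv:
    """Toy grid-coverage environment (hypothetical; not the MarsExplorer API)."""

    # Action index -> (row delta, column delta): up, down, left, right.
    MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]

    def __init__(self, size=8, obstacle_prob=0.1, sensor_range=1, seed=None):
        self.size = size
        self.obstacle_prob = obstacle_prob  # assumed difficulty knob
        self.sensor_range = sensor_range    # assumed square sensing radius
        self.rng = random.Random(seed)

    def reset(self):
        """Generate a new random terrain and place the agent at (0, 0)."""
        n = self.size
        self.obstacles = [
            [self.rng.random() < self.obstacle_prob for _ in range(n)]
            for _ in range(n)
        ]
        self.obstacles[0][0] = False  # the start cell is always free
        self.explored = [[False] * n for _ in range(n)]
        self.pos = (0, 0)
        self._sense()
        return self._observation()

    def _sense(self):
        """Mark cells within sensor range as explored; return the count of new ones."""
        n, k = self.size, self.sensor_range
        r0, c0 = self.pos
        new_cells = 0
        for r in range(max(0, r0 - k), min(n, r0 + k + 1)):
            for c in range(max(0, c0 - k), min(n, c0 + k + 1)):
                if not self.explored[r][c]:
                    self.explored[r][c] = True
                    new_cells += 1
        return new_cells

    def _observation(self):
        """Grid codes: 0 unexplored, 1 explored free, 2 explored obstacle, 3 agent."""
        n = self.size
        obs = [[0] * n for _ in range(n)]
        for r in range(n):
            for c in range(n):
                if self.explored[r][c]:
                    obs[r][c] = 2 if self.obstacles[r][c] else 1
        obs[self.pos[0]][self.pos[1]] = 3
        return obs

    def step(self, action):
        """Move one cell; reward equals the number of newly explored cells."""
        dr, dc = self.MOVES[action]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        inside = 0 <= r < self.size and 0 <= c < self.size
        if not inside or self.obstacles[r][c]:
            # Assumption: leaving the map or hitting an obstacle ends the episode.
            return self._observation(), -1.0, True, {}
        self.pos = (r, c)
        reward = float(self._sense())
        done = all(all(row) for row in self.explored)
        return self._observation(), reward, done, {}


def run_random_episode(env, max_steps=50, seed=0):
    """Roll out a uniformly random policy and return the total reward."""
    rng = random.Random(seed)
    env.reset()
    total = 0.0
    for _ in range(max_steps):
        _, reward, done, _ = env.step(rng.randrange(4))
        total += reward
        if done:
            break
    return total
```

Because the interface mirrors the reset/step convention, an off-the-shelf RL algorithm could in principle be pointed at such an environment; raising obstacle_prob or lowering sensor_range is one hypothetical way to vary difficulty across generated terrains.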

Funder

European Commission

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering,Computer Networks and Communications,Hardware and Architecture,Signal Processing,Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/10/22/2751/pdf

References: 37 articles.

1. All Aboard to Mars. https://www.nature.com/articles/d41586-020-01861-0

2. Real-time adaptive multi-robot exploration with application to underwater map construction

3. A Multi-Resolution Frontier-Based Planner for Autonomous 3D Exploration

Cited by 5 articles.

1. AutoRL X: Automated Reinforcement Learning on the Web;ACM Transactions on Interactive Intelligent Systems;2024-06-03

2. Autonomous Exploration and Mapping for Mobile Robots via Cumulative Curriculum Reinforcement Learning;2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS);2023-10-01

3. Autonomous Exploration for Mobile Robot in Three Dimensional Multi-layer Space;Intelligent Robotics and Applications;2023

4. Coordinating heterogeneous mobile sensing platforms for effectively monitoring a dispersed gas plume;Integrated Computer-Aided Engineering;2022-09-05

5. Deep Reinforcement Learning for Multi-UAV Exploration Under Energy Constraints;Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering;2022