Hierarchical reinforcement Thompson composition

Published:2024-04-20 Issue:20 Volume:36 Page:12317-12326
ISSN:0941-0643
Container-title:Neural Computing and Applications
language:en
Short-container-title:Neural Comput & Applic

Author:

Tanık Güven Orkun, Ertekin Şeyda

Abstract

Modern real-world control problems call for continuous control domains and robust, sample-efficient and explainable control frameworks. We present a framework for recursively composing control skills to solve compositional and progressively more complex tasks. The framework promotes reuse of skills and, as a result, quick adaptation to new tasks. The decision tree can be observed, providing insight into the agents’ behavior. Furthermore, the skills can be transferred, modified or trained independently, which can simplify reward shaping and considerably increase training speed. This paper is concerned with the efficient composition of control algorithms using reinforcement learning and soft attention. Compositional and temporal abstraction is key to improving learning and planning in reinforcement learning. Our Thompson-sampling-inspired soft-attention model is demonstrated to solve the composition problem efficiently.
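
As a rough illustration of what a Thompson-sampling-inspired soft attention over skills might look like, the sketch below keeps a Beta posterior per skill, draws one sample from each posterior, and turns the samples into attention weights with a softmax. The composed action is the weighted sum of the skills' proposed actions. This is not the paper's implementation: the Beta posteriors, the softmax temperature, the fractional posterior update, and all function names are assumptions made for illustration only.

```python
import numpy as np


def soft_attention_weights(alpha, beta, rng, temperature=1.0):
    """Sample one value per skill from Beta(alpha, beta) and softmax the samples.

    Assumption: each skill's usefulness is modeled as a Beta posterior.
    """
    samples = rng.beta(alpha, beta)      # one posterior draw per skill
    logits = samples / temperature
    logits = logits - logits.max()       # subtract max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


def compose_action(skill_actions, weights):
    """Blend the skills' continuous actions with the attention weights.

    skill_actions has shape (num_skills, action_dim).
    """
    return weights @ skill_actions


def update_posterior(alpha, beta, weights, reward):
    """Credit each skill in proportion to its attention weight.

    Assumption: reward is scaled to [0, 1], so the update is fractional.
    """
    return alpha + weights * reward, beta + weights * (1.0 - reward)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha, beta = np.ones(3), np.ones(3)          # uniform priors over 3 skills
    skill_actions = np.array([[1.0, 0.0],          # hypothetical skill outputs
                              [0.0, 1.0],
                              [1.0, 1.0]])
    w = soft_attention_weights(alpha, beta, rng)
    action = compose_action(skill_actions, w)
    alpha, beta = update_posterior(alpha, beta, w, reward=1.0)
    print(w, action, alpha, beta)
```

Sampling from the posterior, rather than using its mean, is what gives the selection its Thompson-style exploration: skills with uncertain estimates occasionally receive high weight. A lower temperature makes the attention closer to a hard choice of a single skill.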

Funder

Middle East Technical University

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s00521-024-09732-9.pdf

Reference: 27 articles.

1. Amodei D, Olah C, Steinhardt J, Christiano P, Schulman J, Mané D (2016) Concrete problems in AI safety

2. Hangl S, Dunjko V, Briegel HJ, Piater J (2020) Skill learning by autonomous robotic playing using active learning and exploratory behavior composition. Front Robot AI. https://doi.org/10.3389/frobt.2020.00042

3. Cheng Y, Zhao P, Wang F, Block DJ, Hovakimyan N (2022) Improving the robustness of reinforcement learning policies with L1 adaptive control. IEEE Robot Autom Lett 7:6574–6581. https://doi.org/10.1109/LRA.2022.3169309

4. Amini A, Gilitschenski I, Phillips J, Moseyko J, Banerjee R, Karaman S, Rus D (2020) Learning robust control policies for end-to-end autonomous driving from data-driven simulation. IEEE Robot Autom Lett 5:1143–1150. https://doi.org/10.1109/LRA.2020.2966414

5. Sutton RS, Precup D, Singh S (1999) Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artif Intell 112:181–211. https://doi.org/10.1016/S0004-3702(99)00052-1