
Actively learning costly reward functions for reinforcement learning

Published:2024-03-01 Issue:1 Volume:5 Page:015055
ISSN:2632-2153
Container-title:Machine Learning: Science and Technology
Short-container-title:Mach. Learn.: Sci. Technol.

Author:

Eberhard André, Metni Houssam, Fahland Georg, Stroh Alexander, Friederich Pascal

Abstract

Transfer of recent advances in deep reinforcement learning to real-world applications is hindered by high data demands and thus low efficiency and scalability. Through independent improvements of components such as replay buffers or more stable learning algorithms, and through massively distributed systems, training time could be reduced from several days to several hours for standard benchmark tasks. However, while rewards in simulated environments are well-defined and easy to compute, reward evaluation becomes the bottleneck in many real-world environments, e.g. in molecular optimization tasks, where computationally demanding simulations or even experiments are required to evaluate states and to quantify rewards. When ground-truth evaluations become orders of magnitude more expensive than in research scenarios, direct transfer of recent advances would require massive amounts of scale, just for evaluating rewards rather than training the models. We propose to alleviate this problem by replacing costly ground-truth rewards with rewards modeled by neural networks, counteracting non-stationarity of state and reward distributions during training with an active learning component. We demonstrate that using our proposed method, it is possible to train agents in complex real-world environments orders of magnitude faster than would be possible when using ground-truth rewards. By enabling the application of RL methods to new domains, we show that we can find interesting and non-trivial solutions to real-world optimization problems in chemistry, materials science and engineering. We demonstrate speed-up factors of 50–3000 when applying our approach to challenges of molecular design and airfoil optimization.
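The general idea described in the abstract, a learned reward model that falls back to the costly ground-truth evaluation only when it is uncertain, can be illustrated with a small toy sketch. This is not the authors' implementation: the abstract speaks of neural networks, whereas the sketch below swaps in a bootstrap ensemble of polynomial ridge regressors to stay dependency-light. The names `expensive_reward`, `EnsembleRewardModel`, `run`, the uncertainty threshold, and the drifting one-dimensional "state" are all hypothetical choices made for illustration.

```python
import numpy as np


def expensive_reward(x):
    """Hypothetical stand-in for a costly simulation or experiment."""
    return float(np.sin(3.0 * x) - 0.5 * x ** 2)


def features(x):
    """Polynomial features up to degree 5 for scalar or 1-D input."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.stack([x ** k for k in range(6)], axis=1)


class EnsembleRewardModel:
    """Bootstrap ensemble of ridge regressors (assumed surrogate, not a neural net)."""

    def __init__(self, n_members=5, ridge=1e-3, seed=0):
        self.n_members = n_members
        self.ridge = ridge
        self.rng = np.random.default_rng(seed)
        self.weights = []

    def fit(self, xs, ys):
        X, y = features(xs), np.asarray(ys, dtype=float)
        n, d = X.shape
        self.weights = []
        for _ in range(self.n_members):
            idx = self.rng.integers(0, n, size=n)  # bootstrap resample
            Xb, yb = X[idx], y[idx]
            A = Xb.T @ Xb + self.ridge * np.eye(d)
            self.weights.append(np.linalg.solve(A, Xb.T @ yb))

    def predict(self, x):
        """Return (mean prediction, ensemble standard deviation)."""
        preds = np.array([features(x) @ w for w in self.weights])
        return float(preds.mean()), float(preds.std())


def run(n_steps=500, n_init=8, threshold=0.1, seed=0):
    rng = np.random.default_rng(seed)
    xs = list(rng.uniform(-1.0, 1.0, size=n_init))
    ys = [expensive_reward(x) for x in xs]
    model = EnsembleRewardModel(seed=seed)
    model.fit(xs, ys)
    oracle_calls = n_init
    for step in range(n_steps):
        # The visited region widens over time to mimic a non-stationary
        # state distribution as the policy changes.
        scale = 1.0 + step / n_steps
        x = rng.uniform(-scale, scale)
        mean, std = model.predict(x)
        if std > threshold:
            # Uncertain: pay for the ground truth and retrain the model.
            reward = expensive_reward(x)
            oracle_calls += 1
            xs.append(x)
            ys.append(reward)
            model.fit(xs, ys)
        else:
            # Confident: use the cheap modeled reward.
            reward = mean
        # `reward` would be handed to the RL update at this point.
    return oracle_calls, n_init + n_steps


if __name__ == "__main__":
    calls, total = run()
    print(f"ground-truth evaluations: {calls} of {total} rewards")
```

In this sketch the threshold controls the trade-off: a lower value triggers more ground-truth queries and a more accurate surrogate, a higher value saves evaluations at the risk of training on poorly modeled rewards.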

Funder

Federal Ministry of Economics and Energy

Deutsche Forschungsgemeinschaft

Bundesministerium für Bildung und Forschung

Publisher

IOP Publishing

Link

https://iopscience.iop.org/article/10.1088/2632-2153/ad33e0/pdf

References: 94 articles.

1. Apprenticeship learning via inverse reinforcement learning;Abbeel,2004

2. A survey of inverse reinforcement learning;Adams;Artif. Intell. Rev.,2022

3. Learning to optimize molecular geometries using reinforcement learning;Ahuja;J. Chem. Theory Comput.,2021

4. Hindsight experience replay;Andrychowicz,2017

5. Aerodynamic effects of uniform blowing and suction on a NACA4412 airfoil;Atzori;Flow Turbul. Combust.,2020

Cited by 1 article.

1. Digital chemistry: navigating the confluence of computation and experimentation – definition, status quo, and future perspective;Digital Discovery;2024