Intrinsic Rewards for Exploration Without Harm From Observational Noise: A Simulation Study Based on the Free Energy Principle

Author:

Theodore Jerome Tinker¹, Kenji Doya², Jun Tani³

Affiliation:

1. Cognitive Neurorobotics Research Unit, Okinawa Institute of Science and Technology Graduate University, Onna-son 904-0495, Okinawa, Japan theodore.tinker@oist.jp

2. Neural Computation Unit, Okinawa Institute of Science and Technology Graduate University, Onna-son 904-0495, Okinawa, Japan doya@oist.jp

3. Cognitive Neurorobotics Research Unit, Okinawa Institute of Science and Technology Graduate University, Onna-son 904-0495, Okinawa, Japan jun.tani@oist.jp

Abstract

In reinforcement learning (RL), artificial agents are trained to maximize numerical rewards by performing tasks. Exploration is essential in RL because agents must discover information before exploiting it. Two rewards that encourage efficient exploration are the entropy of the action policy and curiosity for information gain. Entropy is well established in the literature and promotes randomized action selection. Curiosity is defined in a broad variety of ways in the literature and promotes the discovery of novel experiences. One example, prediction error curiosity, rewards agents for discovering observations they cannot accurately predict. However, such agents may be distracted by unpredictable observational noise, known as curiosity traps. Based on the free energy principle (FEP), this letter proposes hidden state curiosity, which rewards agents with the KL divergence between the predictive prior and posterior probabilities of latent variables. We trained six types of agents to navigate mazes: baseline agents without rewards for entropy or curiosity, and agents rewarded for entropy and/or either prediction error curiosity or hidden state curiosity. We find that entropy and curiosity result in efficient exploration, especially when both are employed together. Notably, agents with hidden state curiosity demonstrate resilience against curiosity traps, which hinder agents with prediction error curiosity. This suggests that implementing the FEP may enhance the robustness and generalization of RL models, potentially aligning the learning processes of artificial and biological agents.
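The two intrinsic rewards contrasted in the abstract can be sketched concretely. The following is a minimal, hypothetical formulation (the paper's exact scaling and network architecture may differ): prediction error curiosity as the squared error between an observation and its prediction, and hidden state curiosity as the closed-form KL divergence between diagonal-Gaussian posterior and prior distributions over latent variables.

```python
import numpy as np

def prediction_error_curiosity(obs, predicted_obs):
    """Reward equal to the squared prediction error over the observation.
    Hypothetical sketch: sensitive to unpredictable observational noise,
    which is what makes agents vulnerable to curiosity traps."""
    return float(np.sum((obs - predicted_obs) ** 2))

def hidden_state_curiosity(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) between a diagonal-Gaussian posterior q = N(mu_q, sigma_q^2)
    and prior p = N(mu_p, sigma_p^2) over latent variables, summed over
    dimensions. Measures information gained about the hidden state, so
    purely unpredictable noise (which updates the latent belief little)
    yields little reward."""
    return float(np.sum(
        np.log(sigma_p / sigma_q)
        + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
        - 0.5
    ))
```

When the posterior equals the prior, the hidden state curiosity reward is zero, reflecting that the observation carried no information about the latent state; a perfectly predicted observation likewise yields zero prediction error curiosity.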

Publisher

MIT Press

