Learning Intention-Aware Policies in Deep Reinforcement Learning

Author:

Zhao Tingting 1, Wu Shuai 1, Li Guixi 1, Chen Yarui 1, Niu Gang 2, Sugiyama Masashi 3,4

Affiliation:

1. College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457, P.R.C. tingting@tust.edu.cn

2. RIKEN Center for Advanced Intelligence Project, Tokyo 103-0027, Japan gang.niu.ml@gmail.com

3. RIKEN Center for Advanced Intelligence Project, Tokyo 103-0027, Japan

4. Graduate School of Frontier Sciences, University of Tokyo, Tokyo 277-8561, Japan sugi@k.u-tokyo.ac.jp

Abstract

Deep reinforcement learning (DRL) provides an agent with an optimal policy that maximizes the cumulative rewards. The policy defined in DRL mainly depends on the state, historical memory, and policy model parameters. However, humans usually take actions according to their own intentions, such as moving fast or slow, in addition to the elements included in traditional policy models. To make the action-choosing mechanism more similar to that of humans and let the agent select actions that incorporate intentions, we propose an intention-aware policy learning method in this letter. To formalize this process, we first define an intention-aware policy by incorporating the intention information into the policy model, which is learned by maximizing the cumulative rewards together with the mutual information (MI) between the intention and the action. Then we derive an approximation of the MI objective that can be optimized efficiently. Finally, we demonstrate the effectiveness of the intention-aware policy on the classical MuJoCo control task and the multigoal continuous chain-walking task.
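As a rough illustration of the idea summarized above, the sketch below conditions a Gaussian policy on a discrete intention variable and adds a variational lower bound on the MI between intention and action, in the style of InfoGAN-like estimators. This is a minimal sketch under assumed choices: the class names (IntentionPolicy, IntentionPosterior), the dimensions, the discrete intention space, and the trade-off weight beta are illustrative, not the authors' actual formulation or implementation.

```python
# Hedged sketch of an intention-aware policy with a variational MI term.
# All names and hyperparameters here are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_INTENTIONS = 4          # assumed: discrete intentions, e.g. "move fast" / "move slow"
STATE_DIM, ACTION_DIM = 8, 2

class IntentionPolicy(nn.Module):
    """pi(a | s, c): Gaussian policy conditioned on state s and intention c (one-hot)."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(STATE_DIM + NUM_INTENTIONS, 64), nn.Tanh(),
            nn.Linear(64, 64), nn.Tanh(),
        )
        self.mean = nn.Linear(64, ACTION_DIM)
        self.log_std = nn.Parameter(torch.zeros(ACTION_DIM))

    def forward(self, state, intention_onehot):
        h = self.body(torch.cat([state, intention_onehot], dim=-1))
        return torch.distributions.Normal(self.mean(h), self.log_std.exp())

class IntentionPosterior(nn.Module):
    """q(c | a): variational posterior used to lower-bound I(c; a)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ACTION_DIM, 64), nn.ReLU(),
            nn.Linear(64, NUM_INTENTIONS),
        )

    def forward(self, action):
        return self.net(action)  # logits over intentions

def mi_lower_bound(posterior, actions, intentions):
    """E[log q(c | a)] <= I(c; a) - H(c): maximizing this variational bound
    encourages actions that reveal the intention that produced them."""
    logits = posterior(actions)
    return -F.cross_entropy(logits, intentions)

if __name__ == "__main__":
    policy, posterior = IntentionPolicy(), IntentionPosterior()
    states = torch.randn(32, STATE_DIM)
    intentions = torch.randint(0, NUM_INTENTIONS, (32,))
    onehot = F.one_hot(intentions, NUM_INTENTIONS).float()

    dist = policy(states, onehot)
    actions = dist.rsample()

    # Placeholder policy-gradient surrogate; in practice the cumulative-reward
    # term would come from whatever RL algorithm is used (e.g., PPO or SAC).
    fake_advantage = torch.randn(32, 1)
    rl_term = (dist.log_prob(actions.detach()).sum(-1, keepdim=True) * fake_advantage).mean()

    beta = 0.1  # assumed trade-off weight between the reward and MI objectives
    loss = -(rl_term + beta * mi_lower_bound(posterior, actions, intentions))
    loss.backward()
    print("combined loss:", float(loss))
```

The design choice of adding E[log q(c | a)] as an auxiliary term mirrors common variational MI estimators: the posterior network and the policy are trained jointly, so the policy is pushed toward actions from which the latent intention can be decoded, while the reward term preserves task performance.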

Publisher

MIT Press

Subject

Cognitive Neuroscience, Arts and Humanities (miscellaneous)

