Deceptive Path Planning via Count-Based Reinforcement Learning under Specific Time Constraint

Author:

Chen Dejun 1, Zeng Yunxiu 1, Zhang Yi 1, Li Shuilin 1, Xu Kai 1, Yin Quanjun 1

Affiliation:

1. College of Systems Engineering, National University of Defense Technology, Changsha 410073, China

Abstract

Deceptive path planning (DPP) aims to find a path that minimizes the probability of an observer identifying the observed agent's real goal before the agent reaches it. It is important for addressing issues such as public safety, strategic path planning, and logistics route privacy protection. Existing traditional methods often rely on "dissimulation" (hiding the truth) to obscure paths while ignoring time constraints. Building upon the theory of probabilistic goal recognition based on cost difference, we propose DPP_Q, a DPP method based on count-based Q-learning for solving DPP problems in discrete path-planning domains under a specific time constraint. To extend this method to continuous domains, we propose a new probabilistic goal-recognition model, the Approximate Goal Recognition Model (AGRM), and verify its feasibility in discrete path-planning domains. Finally, we propose DPP_PPO, a DPP method based on proximal policy optimization for continuous path-planning domains under a specific time constraint. To the best of our knowledge, DPP methods such as DPP_Q and DPP_PPO have not yet been explored in the field of path planning. Experimental results show that, in discrete domains, DPP_Q improves the average deceptiveness of paths by 12.53% on average compared with traditional methods. In continuous domains, DPP_PPO shows significant advantages over random-walk methods. Both DPP_Q and DPP_PPO demonstrate good applicability in path-planning domains with uncomplicated obstacles.
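To make the count-based Q-learning idea concrete, the sketch below shows a minimal tabular Q-learning loop on a grid with a visit-count exploration bonus and a step budget standing in for the specific time constraint. It is an illustration only, not the paper's method: the grid size, goal positions, constants (ALPHA, GAMMA, BONUS_BETA, T_MAX), and the distance-based deceptiveness_reward proxy are all assumptions; the paper instead derives deceptiveness from a cost-difference probabilistic goal-recognition model.

import numpy as np
from collections import defaultdict

GRID = 10          # side length of a square grid world (assumed)
T_MAX = 30         # step budget standing in for the specific time constraint
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1
BONUS_BETA = 0.5   # weight of the count-based exploration bonus

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up
REAL_GOAL, FAKE_GOAL = (9, 9), (9, 0)         # hypothetical real and decoy goals

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def deceptiveness_reward(state):
    # Crude proxy: favor states that look closer to the fake goal than to the
    # real goal. A faithful implementation would use a probability from a
    # cost-difference goal-recognition model instead.
    return manhattan(state, REAL_GOAL) - manhattan(state, FAKE_GOAL)

def step(state, action):
    nxt = (min(max(state[0] + action[0], 0), GRID - 1),
           min(max(state[1] + action[1], 0), GRID - 1))
    done = nxt == REAL_GOAL
    reward = 100.0 if done else 0.01 * deceptiveness_reward(nxt) - 1.0
    return nxt, reward, done

Q = defaultdict(lambda: np.zeros(len(ACTIONS)))
N = defaultdict(lambda: np.zeros(len(ACTIONS)))  # state-action visit counts

for episode in range(2000):
    s, t = (0, 0), 0
    while t < T_MAX:
        a = (np.random.randint(len(ACTIONS)) if np.random.rand() < EPS
             else int(np.argmax(Q[s])))
        s2, r, done = step(s, ACTIONS[a])
        N[s][a] += 1
        r += BONUS_BETA / np.sqrt(N[s][a])  # count-based exploration bonus
        Q[s][a] += ALPHA * (r + GAMMA * np.max(Q[s2]) * (not done) - Q[s][a])
        s, t = s2, t + 1
        if done:
            break

The design choice illustrated here is that the agent is rewarded both for ambiguity along the way (the deceptiveness term) and for reaching the real goal within the step budget, with the count-based bonus encouraging coverage of rarely visited state-action pairs during training.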

Funder

Natural Science Foundation of China

Publisher

MDPI AG
