Abstract
Due to their dependence on a task-specific reward function, reinforcement learning agents respond poorly to changes in goal or environment. This paper seeks to overcome this limitation of traditional reinforcement learning with a task-agnostic, self-organising autonomous agent framework. The proposed algorithm is a hybrid of TMGWR, for self-adaptive learning of sensorimotor maps, and value iteration, for goal-directed planning. TMGWR has previously been shown to overcome the problems associated with competing sensorimotor techniques such as SOM, GNG, and GWR; these problems include difficulty in choosing a suitable number of neurons for a task, inflexibility, an inability to cope with non-Markovian environments, sensitivity to noise, and inadequate joint representation of sensory observations and actions. However, the binary sensorimotor-link implementation in the original TMGWR leads to catastrophic forgetting when the agent experiences changes in the task, and it is therefore not suitable for self-adaptive learning. This paper presents a new sensorimotor-link update rule that enables the sensorimotor map to adapt to new experiences. The TMGWR-based algorithm is demonstrated to have better sample efficiency than model-free reinforcement learning and better self-adaptivity than both model-free and traditional model-based reinforcement learning algorithms. Moreover, it is shown to incur the lowest overall computational cost of the algorithms compared.
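The abstract pairs a learned map with value iteration for goal-directed planning. As an illustration only, the sketch below runs standard value iteration over a toy five-node chain standing in for a learned topological map; the chain, reward scheme, and goal node are assumptions for the example, not details taken from the paper.

```python
import numpy as np

# Toy stand-in for a learned topological map: a 5-node chain.
# Actions: 0 = move left, 1 = move right (deterministic).
n_states, gamma, goal = 5, 0.9, 4

def step(s, a):
    """Deterministic transition along the chain."""
    return max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)

V = np.zeros(n_states)
for _ in range(1000):  # Bellman backups until convergence
    V_new = np.empty(n_states)
    for s in range(n_states):
        # Reward 1 for arriving at the goal node, 0 otherwise.
        V_new[s] = max(
            (1.0 if step(s, a) == goal else 0.0) + gamma * V[step(s, a)]
            for a in (0, 1)
        )
    if np.max(np.abs(V_new - V)) < 1e-9:
        V = V_new
        break
    V = V_new

# Greedy policy from the converged values: every node moves right,
# i.e. toward the goal.
policy = [
    max((0, 1), key=lambda a: (1.0 if step(s, a) == goal else 0.0)
        + gamma * V[step(s, a)])
    for s in range(n_states)
]
```

In the paper's setting the states and transitions would come from the adapted TMGWR sensorimotor map rather than being hard-coded, but the planning step over that map follows this same Bellman-backup pattern.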
Funder
Tertiary Education Trust Fund
Publisher
Springer Science and Business Media LLC
Cited by: 4 articles.