Abstract
Because Q-learning is model-free, allowing agents to learn directly from interaction without any prior model of the environment, it is widely applied to path planning. Nonetheless, the choice of parameter values has a crucial impact on the results. This paper examines how to determine appropriate values for the learning rate and discount factor, and how these parameters affect the overall results. Agents with different learning-rate or discount-factor values are run in randomly generated mazes, and their results are aggregated and compared. With the discount factor held fixed, the group with a learning rate of 0.9 converges considerably faster than the other groups (0.6, 0.3, 0.1). With the learning rate held fixed, the group with a discount factor of 0.9 finds shorter paths, and finds them faster, than the other groups (0.6, 0.3, 0.1). When both the learning rate and the discount factor are set to 0.9, compared against groups set to 1.0, 0.1, and 0, the 0.9 group is more stable than the 0.1 group and converges within 80 iterations, whereas the 1.0 and 0 groups do not converge.
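For concreteness, a minimal tabular Q-learning sketch is given below. It is not the paper's implementation; it only shows where the learning rate (alpha) and discount factor (gamma) enter the standard update rule Q(s,a) ← Q(s,a) + α·(r + γ·max Q(s',·) − Q(s,a)). The 4x4 maze, the reward values, and the epsilon-greedy exploration schedule are illustrative assumptions.

```python
# Minimal tabular Q-learning sketch (assumed setup, not the paper's code).
import random
from collections import defaultdict

def q_learning(maze, start, goal, alpha=0.9, gamma=0.9,
               epsilon=0.1, episodes=80):
    """Run tabular Q-learning; return the Q-table and per-episode path lengths."""
    actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    rows, cols = len(maze), len(maze[0])
    Q = defaultdict(float)  # Q[(state, action)], initialized to 0

    def step(state, action):
        r, c = state[0] + action[0], state[1] + action[1]
        if 0 <= r < rows and 0 <= c < cols and maze[r][c] == 0:
            nxt = (r, c)
        else:
            nxt = state  # hitting a wall or the border leaves the agent in place
        reward = 1.0 if nxt == goal else -0.01  # assumed reward scheme
        return nxt, reward

    lengths = []
    for _ in range(episodes):
        state, steps = start, 0
        while state != goal and steps < 500:
            if random.random() < epsilon:                     # explore
                action = random.choice(actions)
            else:                                             # exploit
                action = max(actions, key=lambda a: Q[(state, a)])
            nxt, reward = step(state, action)
            best_next = max(Q[(nxt, a)] for a in actions)
            # Core update: alpha scales how far Q moves toward the target;
            # gamma weights the value of future rewards.
            Q[(state, action)] += alpha * (reward + gamma * best_next
                                           - Q[(state, action)])
            state, steps = nxt, steps + 1
        lengths.append(steps)
    return Q, lengths

# Example: a 4x4 maze (0 = free cell, 1 = wall), comparing two alpha settings.
maze = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 0],
        [1, 0, 0, 0]]
for alpha in (0.9, 0.1):
    _, lengths = q_learning(maze, start=(0, 0), goal=(3, 3), alpha=alpha)
    print(f"alpha={alpha}: final path length {lengths[-1]}")
```

With a sketch like this, repeating the run over many randomly generated mazes and averaging the per-episode path lengths reproduces the kind of aggregated convergence comparison the abstract describes.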
Subject
General Physics and Astronomy
Cited by
3 articles.