Author:
Kim Beomjoon,Kaelbling Leslie Pack,Lozano-Pérez Tomás
Abstract
We propose an actor-critic algorithm that uses past planning experience to improve the efficiency of solving robot task-and-motion planning (TAMP) problems. TAMP planners search for goal-achieving sequences of high-level operator instances specified by both discrete and continuous parameters. Our algorithm learns a policy for selecting the continuous parameters during search, using a small training set generated from the search trees of previously solved instances. We also introduce a novel fixed-length vector representation for world states with varying numbers of objects with different shapes, based on a set of key robot configurations. We demonstrate experimentally that our method learns more efficiently from less data than standard reinforcementlearning approaches and that using a learned policy to guide a planner results in the improvement of planning efficiency.
Publisher
Association for the Advancement of Artificial Intelligence (AAAI)
Cited by
9 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献