Affiliation:
1. School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
2. Suzhou Automobile Research Institute, Tsinghua University, Suzhou 215134, China
3. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
4. Faculty of Engineering, McMaster University, Hamilton, ON L8S 0A, Canada
5. Department of Automation, Tsinghua University, Beijing 100084, China
Abstract
Due to the nonlinearity and underactuation of bipedal robots, developing efficient jumping strategies remains challenging. To address this, a multiobjective collaborative deep reinforcement learning algorithm based on the actor‐critic framework is presented. First, two deep deterministic policy gradient (DDPG) networks are established for training the jumping motion; each focuses on a different objective, and the two collaboratively learn the optimal jumping policy. Next, a recovery experience replay mechanism based on dynamic time warping is integrated into the DDPG to improve sample utilization efficiency. Concurrently, a timely adjustment unit is incorporated, which works in tandem with the training frequency to improve the convergence accuracy of the algorithm. Additionally, a Markov decision process is designed to manage the complexity and parameter uncertainty in the dynamic model of the bipedal robot. Finally, the proposed method is validated on the PyBullet platform. The results show that the method outperforms baseline methods, learning faster and producing robust jumps with greater height and distance.
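The abstract's recovery experience replay mechanism relies on dynamic time warping (DTW) to compare trajectories. The paper does not give its implementation details, but the core DTW computation it builds on can be sketched as follows; the function name and the use of scalar joint-angle sequences are illustrative assumptions, not the authors' code.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences
    (e.g., joint-angle trajectories of different lengths).

    Builds an (n+1) x (m+1) cumulative-cost table where cell (i, j)
    holds the cheapest alignment cost of a[:i] against b[:j]."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance between samples
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    return cost[n, m]

# A replay mechanism could, for instance, keep a stored trajectory only if
# it is sufficiently similar to a known successful jump:
reference = [0.0, 0.5, 1.0, 0.5, 0.0]   # hypothetical successful trajectory
candidate = [0.0, 0.4, 0.5, 1.0, 0.4, 0.0]
similar = dtw_distance(reference, candidate) < 0.5  # threshold is illustrative
```

DTW is useful here because two jump attempts rarely align sample-for-sample in time; warping lets the similarity measure tolerate differing trajectory lengths and speeds.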
Funder
National Natural Science Foundation of China
Natural Science Foundation of Jiangsu Province
Cited by
1 article.