Affiliation:
1. Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR)
2. Advanced Remanufacturing and Technology Centre (ARTC), Agency for Science, Technology and Research (A*STAR)
Abstract
Teaching a robot to grasp cluttered and stacked objects is a common, challenging, and important research topic that can potentially benefit many real-life applications. The challenge can be addressed by incorporating a pushing action into the grasping strategy. To make a robot capable of pushing and grasping (PG) stacked objects, deep neural networks (DNNs), typically trained with reinforcement learning, have been widely reported, but the limitations of long training time and low action efficiency remain obvious. In this work, an exploratory guided pushing and grasping approach is proposed, using a self-supervised technique to expedite the training cycle and a memory buffer enhancement strategy to guide training continuously. The network architecture, data collection, and learning strategy were investigated and analysed to study their effects. Comprehensive experiments were conducted in both simulated and real environments to validate the proposed solution, which achieves high action efficiency and outperforms state-of-the-art solutions by a large margin.
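The abstract does not detail how the memory buffer enhancement works; one plausible reading is a replay buffer that retains successful push and grasp transitions and replays them at a higher rate to keep guiding training. The sketch below is a minimal Python illustration under that assumption; the class name, the success_ratio parameter, and the reward-based success test are hypothetical and not taken from the paper.

import random
from collections import deque

class EnhancedReplayBuffer:
    """Illustrative buffer that keeps successful push/grasp transitions in a
    separate pool so they can be over-sampled during training (a sketch of a
    memory-buffer enhancement, not the authors' exact method)."""

    def __init__(self, capacity=10000, success_ratio=0.5):
        self.ordinary = deque(maxlen=capacity)    # all collected transitions
        self.successful = deque(maxlen=capacity)  # transitions that yielded a reward
        self.success_ratio = success_ratio        # fraction of each batch drawn from successes

    def add(self, state, action, reward, next_state, done):
        transition = (state, action, reward, next_state, done)
        self.ordinary.append(transition)
        if reward > 0:  # hypothetical success criterion
            self.successful.append(transition)

    def sample(self, batch_size):
        # Draw a mix of successful and ordinary transitions, then shuffle.
        n_success = min(int(batch_size * self.success_ratio), len(self.successful))
        n_ordinary = min(batch_size - n_success, len(self.ordinary))
        batch = random.sample(self.successful, n_success) + \
                random.sample(self.ordinary, n_ordinary)
        random.shuffle(batch)
        return batch

In such a scheme, the self-supervised PG policy would push new transitions into the buffer after every action and sample mixed batches for each update, so that informative successes continue to shape learning even when exploratory actions dominate recent experience.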
Publisher
Research Square Platform LLC
Reference37 articles.
1. He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian (2016) Deep residual learning for image recognition. 770--778, Proceedings of the IEEE conference on computer vision and pattern recognition
2. Szegedy, Christian and Ioffe, Sergey and Vanhoucke, Vincent and Alemi, Alexander A (2017) Inception-v4, inception-resnet and the impact of residual connections on learning. Thirty-first AAAI conference on artificial intelligence
3. Huang, Gao and Liu, Zhuang and Van Der Maaten, Laurens and Weinberger, Kilian Q (2017) Densely connected convolutional networks. 4700--4708, Proceedings of the IEEE conference on computer vision and pattern recognition
4. Karen Simonyan and Andrew Zisserman (2015) Very Deep Convolutional Networks for Large-Scale Image Recognition. International Conference on Learning Representations
5. Alexey Dosovitskiy and Lucas Beyer and Alexander Kolesnikov and Dirk Weissenborn and Xiaohua Zhai and Thomas Unterthiner and Mostafa Dehghani and Matthias Minderer and Georg Heigold and Sylvain Gelly and Jakob Uszkoreit and Neil Houlsby (2021) An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. May, 9th International Conference on Learning Representations, {ICLR} 2021