FF3D: A Rapid and Accurate 3D Fruit Detector for Robotic Harvesting
Authors:
Liu Tianhao (1), Wang Xing (2), Hu Kewei (3), Zhou Hugh (1), Kang Hanwen (1), Chen Chao (1)
Affiliation:
1. Faculty of Engineering, Monash University, Clayton, VIC 3800, Australia
2. CSIRO’s Data61, Level 5/13 Garden St, Eveleigh, NSW 2015, Australia
3. College of Engineering, South China Agriculture University, Guangzhou 510070, China
Abstract
This study presents the Fast Fruit 3D Detector (FF3D), a novel framework that combines a 3D neural network for fruit detection with an anisotropic Gaussian-based next-best view estimator. The detector is a one-stage, end-to-end 3D object detection network built on a 3D convolutional neural network (3D CNN), and it shows superior accuracy and robustness compared to traditional 2D methods. Its architecture combines point cloud feature extraction and object detection in a single network, achieving accurate real-time fruit localization. The model is trained on a large-scale 3D fruit dataset that includes data collected from an apple orchard. The next-best view estimator, which follows the detection network, further improves accuracy and lowers the collision risk during grasping. Thorough assessments on the test set and in a simulated environment validate the efficacy of FF3D. The experimental results show an average precision (AP) of 76.3%, an average recall (AR) of 92.3%, and an average Euclidean distance error of less than 6.2 mm, highlighting the framework’s potential to overcome challenges in orchard environments.
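The abstract reports an average Euclidean distance error but does not say how it is computed. One plausible reading, sketched below, is the mean straight-line distance between each predicted fruit center and its matched ground-truth center. The function name `mean_center_error`, the one-to-one matching of predictions to ground truth, and the sample coordinates are all assumptions made for illustration; none of them come from the paper.

```python
import math


def mean_center_error(predicted, ground_truth):
    """Mean Euclidean distance between matched 3D centers.

    Assumes predicted[i] has already been matched to ground_truth[i];
    the result is in the same unit as the inputs (here, millimetres).
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("need two non-empty, equally long lists of matched centers")
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(distances) / len(distances)


# Illustrative values only (millimetres); not data from the paper.
predicted = [(3.0, 4.0, 0.0), (10.0, 0.0, 0.0)]
ground_truth = [(0.0, 0.0, 0.0), (10.0, 0.0, 5.0)]
error = mean_center_error(predicted, ground_truth)
```

Under this reading, a reported error below 6.2 mm would mean that the predicted centers lie, on average, within 6.2 mm of the annotated ones.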