Abstract
Achieving precise six-degrees-of-freedom (6D) pose estimation of rigid objects from color images is a critical challenge with wide-ranging applications in robotics and close-contact aircraft operations. This study investigates key techniques for applying the YOLOv5 object detection convolutional neural network (CNN) to 6D pose localization of aircraft using only color imagery. Traditional object detection labeling methods suffer from inaccuracies caused by perspective geometry and are limited to visible key points. This research demonstrates that, with precise labeling, a CNN can predict object features with near-pixel accuracy, effectively learning the distinct appearance of the object under the perspective distortion of a pinhole camera. Additionally, we highlight the crucial role of knowledge about occluded features. Training the CNN with such knowledge slightly reduces pixel precision but enables the prediction of three times more features, including those that are not initially visible, resulting in a better-performing 6D system overall. Notably, we reveal that scale data augmentation can interfere with pixel precision when used during training. These findings are crucial for the entire system, which leverages the Solve Perspective-N-Point (Solve-PnP) algorithm and achieves 6D pose accuracy within 1° and 7 cm at distances ranging from 7.5 to 35 m from the camera. Moreover, this solution operates in real time, achieving sub-10 ms processing times on a desktop PC.
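As a rough illustration of the pinhole geometry the abstract refers to, the sketch below projects 3D feature points, expressed in the camera frame, to pixel coordinates. It is not the authors' labeling code: the function names, the intrinsics (`fx`, `fy`, `cx`, `cy`), and the sample values are all hypothetical and chosen only for illustration. Projection does not depend on visibility, so the same formula yields a pixel label for an occluded feature as for a visible one.

```python
from typing import List, Tuple

Point3 = Tuple[float, float, float]  # (x, y, z) in metres, camera frame, z forward
Pixel = Tuple[float, float]          # (u, v) in pixels


def project_pinhole(points_cam: List[Point3],
                    fx: float, fy: float,
                    cx: float, cy: float) -> List[Pixel]:
    """Project camera-frame 3D points to pixels with an ideal pinhole model.

    Assumes no lens distortion; fx, fy are focal lengths in pixels and
    (cx, cy) is the principal point. All values are hypothetical.
    """
    pixels = []
    for x, y, z in points_cam:
        if z <= 0:
            raise ValueError("point is behind the camera")
        pixels.append((fx * x / z + cx, fy * y / z + cy))
    return pixels


def metres_per_pixel(depth_m: float, focal_px: float) -> float:
    """Approximate lateral size covered by one pixel at a given depth."""
    return depth_m / focal_px


if __name__ == "__main__":
    # Hypothetical intrinsics for a 1280x720 image.
    fx = fy = 1000.0
    cx, cy = 640.0, 360.0
    # Two hypothetical feature points 10 m in front of the camera.
    features = [(0.0, 0.0, 10.0), (1.0, -0.5, 10.0)]
    print(project_pinhole(features, fx, fy, cx, cy))
    # Lateral footprint of one pixel at the far end of the reported range.
    print(metres_per_pixel(35.0, fx))
```

Under these assumed intrinsics, a one-pixel labeling error at 35 m corresponds to a few centimetres of lateral displacement, which suggests why near-pixel feature accuracy matters for a PnP-based pose solve.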
Funder
Air Force Research Laboratory
Publisher
Springer Science and Business Media LLC
Subject
Artificial Intelligence, Software