Cross-Viewpoint Semantic Mapping: Integrating Human and Robot Perspectives for Improved 3D Semantic Reconstruction
Authors:
Kopácsi László 1,2, Baffy Benjámin 2, Baranyi Gábor 2, Skaf Joul 2, Sörös Gábor 3, Szeier Szilvia 2, Lőrincz András 2, Sonntag Daniel 1,4
Affiliations:
1. Department of Interactive Machine Learning, German Research Center for Artificial Intelligence (DFKI), 66123 Saarbrücken, Germany
2. Department of Artificial Intelligence, Eötvös Loránd University, 1053 Budapest, Hungary
3. Nokia Bell Labs, 1083 Budapest, Hungary
4. Department of Applied Artificial Intelligence, University of Oldenburg, 26129 Oldenburg, Germany
Abstract
Allocentric semantic 3D maps are highly useful for a variety of human–machine interaction tasks, since egocentric viewpoints can be derived by the machine for the human partner. Class labels and map interpretations, however, may differ or may be missing for the participants due to their different perspectives, particularly when the viewpoint of a small robot differs significantly from that of a human. To overcome this issue and establish common ground, we extend an existing real-time 3D semantic reconstruction pipeline with semantic matching across human and robot viewpoints. We use deep recognition networks, which usually perform well from higher (i.e., human) viewpoints but are inferior from lower viewpoints such as that of a small robot. We propose several approaches for acquiring semantic labels for images taken from such unusual perspectives. We start with a partial 3D semantic reconstruction built from the human perspective, which we transfer and adapt to the small robot’s perspective using superpixel segmentation and the geometry of the surroundings. The quality of the reconstruction is evaluated in the Habitat simulator and in a real environment using a robot car equipped with an RGBD camera. We show that the proposed approach provides high-quality semantic segmentation from the robot’s perspective, with accuracy comparable to that of the original human-view segmentation. In addition, we exploit the acquired information to improve the recognition performance of the deep network for the lower viewpoints and show that the small robot alone becomes capable of generating high-quality semantic maps for the human partner. The computations run close to real time, so the approach enables interactive applications.
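The label-transfer step summarized in the abstract (projecting a partial allocentric semantic map into the robot's low viewpoint and adapting it with superpixel segmentation) can be illustrated with a short sketch. The following Python code is a minimal, illustrative reconstruction and not the authors' implementation: it projects labeled 3D map points into the robot's camera using an assumed 4x4 world-to-camera pose T_world_to_cam and intrinsics K, computes SLIC superpixels on the robot's RGB frame, and assigns each superpixel the majority label of the points that project into it. All function names, parameters, and the +z-forward camera convention are assumptions; occlusion handling via depth is omitted for brevity.

```python
# Minimal sketch (not the authors' code): transfer labels from an allocentric
# semantic point cloud into the robot's egocentric view via projection and
# superpixel majority voting. Names and conventions are illustrative only.
import numpy as np
from skimage.segmentation import slic  # SLIC superpixel segmentation


def project_points(points_world, labels, T_world_to_cam, K, image_shape):
    """Project labeled 3D points into the camera; return pixel coords and labels."""
    h, w = image_shape
    # Homogeneous world points -> camera frame (assumed +z forward).
    pts_h = np.hstack([points_world, np.ones((len(points_world), 1))])
    pts_cam = (T_world_to_cam @ pts_h.T).T[:, :3]
    in_front = pts_cam[:, 2] > 0.1          # keep points in front of the camera
    pts_cam, lab = pts_cam[in_front], labels[in_front]
    uv = (K @ pts_cam.T).T                  # pinhole projection
    uv = uv[:, :2] / uv[:, 2:3]
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    return u[valid], v[valid], lab[valid]


def transfer_labels(rgb, points_world, labels, T_world_to_cam, K,
                    n_segments=400, unknown=-1):
    """Give each SLIC superpixel the majority label of the points projected into it.

    labels: non-negative integer class ids of the 3D map points.
    """
    h, w, _ = rgb.shape
    u, v, lab = project_points(points_world, labels, T_world_to_cam, K, (h, w))
    segments = slic(rgb, n_segments=n_segments, compactness=10, start_label=0)
    out = np.full((h, w), unknown, dtype=np.int32)
    for seg_id in np.unique(segments):
        mask = segments == seg_id
        hits = lab[mask[v, u]]              # labels of points landing in this superpixel
        if hits.size:
            out[mask] = np.bincount(hits).argmax()   # majority vote
    return out
```

Superpixels follow image edges, so even a sparse, noisy set of projected labels can be densified into a boundary-aligned segmentation from the robot's low viewpoint; in the paper this transferred segmentation is additionally used as supervision to improve the recognition network for low viewpoints.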
Funder
European Union project “Humane AI: Toward AI Systems That Augment and Empower Humans by Understanding Us, our Society and the World Around Us”, funded by the European Commission; European Commission project MASTER
Subject
Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry
Cited by
1 article.
1. Contextually-aware autonomous navigation framework for human guidance. Signal Processing, Sensor/Information Fusion, and Target Recognition XXXIII, 2024-06-07.