Affiliation:
1. University of Michigan - Ann Arbor, Ann Arbor, MI, USA
Abstract
Converting widely available 2D images and videos, captured using an RGB camera, to 3D can help accelerate the training of machine learning systems in spatial reasoning domains ranging from in-home assistive robots to augmented reality to autonomous vehicles. However, automating this task is challenging because it requires not only accurately estimating object location and orientation, but also knowing camera properties that are currently unknown (e.g., focal length). A scalable way to combat this problem is to leverage people's spatial understanding of scenes by crowdsourcing visual annotations of 3D object properties. Unfortunately, getting people to directly estimate 3D properties reliably is difficult due to the limitations of image resolution, human motor accuracy, and people's 3D perception (i.e., humans do not "see" depth like a laser range finder). In this paper, we propose a crowd-machine hybrid approach that jointly uses crowds' approximate measurements of multiple in-scene objects to estimate the 3D state of a single target object. Our approach can generate accurate estimates of the target object by combining heterogeneous knowledge from multiple contributors regarding different objects that share a spatial relationship with the target object. We evaluate our joint object estimation approach with 363 crowd workers and show that our method can reduce errors in the target object's 3D location estimation by over 40%, while requiring only 35% as much human time. Our work introduces a novel way to enable groups of people with different perspectives and knowledge to achieve more accurate collective performance on challenging visual annotation tasks.
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Networks and Communications, Human-Computer Interaction, Social Sciences (miscellaneous)