Authors:
Jose, John Anthony C.; Cruz, Meygen D.; Keh, Jefferson James U.; Rivera, Maverick; Sybingco, Edwin; Dadios, Elmer P.
Abstract
Large annotated datasets are crucial for training deep machine learning models, but they are expensive and time-consuming to create. Numerous public datasets already exist, but a vast amount of unlabeled data, especially video data, can still be annotated and leveraged to further improve the performance and accuracy of machine learning models. It is therefore essential to reduce the time and effort required to annotate a dataset to prevent bottlenecks in the development of this field. In this study, we propose Anno-Mate, a pair of features integrated into the Computer Vision Annotation Tool (CVAT) that facilitates human–machine collaboration and reduces the required human effort. Anno-Mate comprises Auto-Fit, which uses an EfficientDet-D0 backbone to tighten an existing bounding box around an object, and AutoTrack, which uses a channel and spatial reliability tracking (CSRT) tracker to draw a bounding box on the target object as it moves through the video frames. Both features exhibit a good speed–accuracy trade-off. Auto-Fit achieved an overall accuracy of 87% with an average processing time of 0.47 s, whereas AutoTrack achieved an overall accuracy of 74.29% while processing 18.54 frames per second. When combined, the two features reduced the time required to annotate one minute of video by 26.56%.
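The accuracy figures above score agreement between predicted and reference bounding boxes. The abstract does not state the exact criterion used, but bounding-box agreement is conventionally measured with intersection over union (IoU); the sketch below is our illustration of that standard metric, not the paper's own evaluation code.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# A loose manual box versus a tightened box around the same object:
print(iou((10, 10, 110, 110), (20, 20, 100, 100)))  # 0.64
```

An auto-fitted box would typically count as correct when its IoU with the ground-truth box exceeds a chosen threshold (0.5 is a common convention; the paper's threshold is not given in the abstract).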
Publisher
Fuji Technology Press Ltd.
Subject
Artificial Intelligence, Computer Vision and Pattern Recognition, Human-Computer Interaction