Abstract
Human decision making often relies on visual information from different views or perspectives. In machine-learning-based image classification, however, an object’s class is typically inferred from a single image. Especially for challenging classification problems, the visual information conveyed by a single image may be insufficient for an accurate decision. We propose a classification scheme that fuses visual information from images depicting the same object from multiple perspectives. Convolutional neural networks extract and encode visual features from the individual views, and we propose strategies for fusing this information. Specifically, we investigate three strategies: (1) fusing convolutional feature maps at differing network depths; (2) fusing bottleneck latent representations prior to classification; and (3) score fusion. We systematically evaluate these strategies on three datasets from different domains. Our findings emphasize the benefit of integrating information fusion into the network rather than performing it as post-processing of classification scores. Furthermore, we demonstrate through a case study that already trained networks can easily be extended with the best fusion strategy, outperforming the other approaches by a large margin.
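The three fusion strategies named in the abstract can be illustrated with a minimal NumPy sketch. This is a hedged toy example, not the authors' implementation: the view encodings, the element-wise max for feature-map fusion, the toy linear classification head `W`, and all shapes are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    """Row-wise softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Two hypothetical views of the same object, encoded as
# conv feature maps of shape (C, H, W) and bottleneck vectors of shape (D,).
fmap_a, fmap_b = rng.standard_normal((2, 8, 4, 4))
z_a, z_b = rng.standard_normal((2, 16))

# (1) Feature-map fusion: combine maps element-wise (here: max) at some
#     intermediate network depth; the fused map continues through the net.
fused_fmap = np.maximum(fmap_a, fmap_b)          # shape (8, 4, 4)

# (2) Bottleneck fusion: concatenate the latent vectors of all views and
#     feed the result to a single shared classification head.
fused_latent = np.concatenate([z_a, z_b])        # shape (32,)

# (3) Score fusion: classify each view independently, then average the
#     class-probability vectors (pure post-processing of scores).
W = rng.standard_normal((16, 5))                 # toy 5-class linear head
scores = softmax(np.stack([z_a @ W, z_b @ W]))   # per-view probabilities
fused_scores = scores.mean(axis=0)               # shape (5,)
```

Strategies (1) and (2) make the fusion part of the network and thus trainable end-to-end, whereas (3) leaves the per-view networks untouched, which matches the abstract's finding that in-network fusion outperforms score post-processing.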
Publisher
Public Library of Science (PLoS)
Cited by: 85 articles.