Authors:
Ujala Razaq, Muhammad Muneeb Ullah, Muhammad Usman
Abstract
This study addresses visual indoor place recognition, for example automatically recognizing different places in an office setting, such as offices, corridors, and washrooms. Potential applications include robot navigation, augmented reality, and image retrieval. The task is demanding, however, because appearance varies widely in such dynamic environments (e.g., viewpoint, occlusion, illumination, and scale). Recently, Convolutional Neural Networks (CNNs) have emerged as a powerful learning mechanism, able to learn deep, higher-level features when provided with a comparatively large quantity of labeled training data. Here, we exploit the generic nature of CNN features for robust visual place recognition on the challenging COLD dataset. We employ CNNs pre-trained on the related tasks of object and scene classification to extract deep features from the COLD images. We demonstrate that these off-the-shelf features, when combined with a simple linear SVM classifier, outperform their bag-of-features counterpart. Moreover, a simple scheme that combines the local bag-of-features and higher-level deep CNN features produces outstanding results on the COLD dataset.
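The pipeline described in the abstract (two feature types per image, combined and fed to a linear SVM) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the features are combined, so L2-normalising each block and concatenating them is an assumption. The feature matrices are synthetic stand-ins rather than real bag-of-features histograms or CNN activations from COLD, and the names `combine_features` and `make_synthetic_split`, the dimensions, and the class count are all hypothetical.

```python
import numpy as np
from sklearn.preprocessing import normalize
from sklearn.svm import LinearSVC


def combine_features(bof, cnn):
    """L2-normalise each descriptor block per image, then concatenate.

    Assumed combination scheme; the abstract does not specify one.
    """
    return np.hstack([normalize(bof, norm="l2"), normalize(cnn, norm="l2")])


def make_synthetic_split(rng, n_per_class, centers_bof, centers_cnn, noise=0.5):
    """Hypothetical stand-in data: one Gaussian cluster per place class."""
    labels = np.repeat(np.arange(len(centers_bof)), n_per_class)
    bof = centers_bof[labels] + noise * rng.standard_normal(
        (labels.size, centers_bof.shape[1]))
    cnn = centers_cnn[labels] + noise * rng.standard_normal(
        (labels.size, centers_cnn.shape[1]))
    return bof, cnn, labels


rng = np.random.default_rng(0)
n_classes, bof_dim, cnn_dim = 4, 50, 128  # illustrative sizes only
centers_bof = 3.0 * rng.standard_normal((n_classes, bof_dim))
centers_cnn = 3.0 * rng.standard_normal((n_classes, cnn_dim))

bof_tr, cnn_tr, y_train = make_synthetic_split(rng, 40, centers_bof, centers_cnn)
bof_te, cnn_te, y_test = make_synthetic_split(rng, 20, centers_bof, centers_cnn)

X_train = combine_features(bof_tr, cnn_tr)
X_test = combine_features(bof_te, cnn_te)

# A linear SVM on the combined descriptors, as named in the abstract.
clf = LinearSVC(C=1.0, max_iter=10000)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"accuracy on synthetic data: {accuracy:.2f}")
```

Normalising each block before concatenation keeps either feature type from dominating the classifier purely because of its scale or dimensionality. In a real setting, the synthetic matrices would be replaced by features extracted from the images, and the accuracy printed here says nothing about performance on COLD.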
Publisher
Readers Insight Publisher