Visual Semantic-Based Representation Learning Using Deep CNNs for Scene Recognition

Published:2021-06 Issue:2 Volume:17 Page:1-24
ISSN:1551-6857
Container-title:ACM Transactions on Multimedia Computing, Communications, and Applications
language:en
Short-container-title:ACM Trans. Multimedia Comput. Commun. Appl.

Author:

Gupta Shikha¹, Sharma Krishan¹, Dinesh Dileep Aroor¹, Thenkanidiyoor Veena²

Affiliation:

1. Indian Institute of Technology Mandi, Mandi, H.P.

2. National Institute of Technology Goa

Abstract

In this work, we address the task of scene recognition from image data. A scene is a spatially correlated arrangement of various visual semantic contents, also known as concepts, e.g., “chair,” “car,” and “sky.” Representation learning using visual semantic content can be regarded as one of the most natural ideas, as it mimics the human behavior of perceiving visual information. Semantic multinomial (SMN) representation is one such representation that captures semantic information using posterior probabilities of concepts. The core part of obtaining the SMN representation is building concept models, which requires ground-truth (true) concept labels for every concept present in an image. However, manual labeling of concepts is practically infeasible due to the large number of images in the dataset. To address this issue, we propose an approach for generating pseudo-concepts in the absence of true concept labels. We utilize pre-trained deep CNN-based architectures, where activation maps (filter responses) from convolutional layers are considered as initial cues to the pseudo-concepts. The non-significant activation maps are removed using the proposed filter-specific threshold-based approach, which removes non-prominent concepts from the data. Further, we propose a grouping mechanism that groups identical pseudo-concepts using subspace modeling of filter responses to achieve a non-redundant representation. Experimental studies show that the SMN representation generated using pseudo-concepts achieves comparable results for scene recognition tasks on standard datasets such as MIT-67 and SUN-397, even in the absence of true concept labels.
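
The two ideas named in the abstract, filter-specific thresholding of activation maps and an SMN vector built from concept posteriors, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the threshold rule (per-filter mean of peak responses), the softmax used to turn scores into posteriors, the averaging over regions, and all function names are assumptions made for illustration only.

```python
import numpy as np

def filter_thresholds(peak_responses):
    """Assumed rule: one threshold per filter, the mean of that filter's
    peak activation over a set of images.
    peak_responses has shape (num_images, num_filters)."""
    return peak_responses.mean(axis=0)

def significant_filters(activation_maps, thresholds):
    """Return a boolean mask of filters whose peak response in this image
    exceeds the filter's own threshold.
    activation_maps has shape (num_filters, H, W)."""
    num_filters = activation_maps.shape[0]
    peaks = activation_maps.reshape(num_filters, -1).max(axis=1)
    return peaks > thresholds

def semantic_multinomial(concept_scores):
    """Turn per-region concept scores into one SMN vector.
    concept_scores has shape (num_regions, num_concepts).
    Assumption: posteriors come from a softmax over concepts and the
    image-level vector is their average over regions."""
    shifted = concept_scores - concept_scores.max(axis=1, keepdims=True)
    posteriors = np.exp(shifted)
    posteriors /= posteriors.sum(axis=1, keepdims=True)
    return posteriors.mean(axis=0)
```

Under these assumptions, the SMN vector is non-negative and sums to 1, so it can be read as a distribution over (pseudo-)concepts for the image.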

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications, Hardware and Architecture

Link

https://dl.acm.org/doi/pdf/10.1145/3436494

Reference: 65 articles.

1. Learning multi-label scene classification

2. LIBSVM

3. The devil is in the details: an evaluation of recent feature encoding methods

Cited by 11 articles.

1. A bat biomimetic model for scenario recognition using echo Doppler information;Bioinspiration & Biomimetics;2024-02-21

2. Efficient deep-narrow residual networks using dilated pooling for scene recognition;Expert Systems with Applications;2023-12

3. Unlocking the black box of CNNs: Visualising the decision-making process with PRISM;Information Sciences;2023-09

4. TEVL: Trilinear Encoder for Video-language Representation Learning;ACM Transactions on Multimedia Computing, Communications, and Applications;2023-06-07

5. Attention-Augmented Memory Network for Image Multi-Label Classification;ACM Transactions on Multimedia Computing, Communications, and Applications;2023-02-25