Zero-Shot Image Classification with Rectified Embedding Vectors Using a Caption Generator
Published: 2023-06-13
Issue: 12
Volume: 13
Page: 7071
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences
Affiliation:
1. School of Computer Science and Engineering, Kyungpook National University, Daegu 41566, Republic of Korea
Abstract
Although image recognition technology has advanced rapidly with deep learning, conventional recognition models trained by supervised learning with class labels perform poorly when given test inputs from classes unseen during training. For example, a recognizer trained to classify Asian bird species cannot recognize a kiwi, because the class label “kiwi” and its image samples were never seen during training. To overcome this limitation, zero-shot classification has been studied recently, and the joint-embedding-based approach has been suggested as one of the most promising solutions. In this approach, image features and text descriptions belonging to the same class are trained to lie close together in a common joint-embedding space. Once we obtain an embedding function that represents the semantic relationship of image–text pairs in the training data, test images and text descriptions (prototypes) of unseen classes can also be mapped to the joint-embedding space for classification. The main challenge in this approach is mapping inputs of two different modalities into a common space; previous works suffer from a distributional inconsistency between the two feature sets extracted from the heterogeneous inputs in the joint-embedding space. To address this problem, we propose a novel method that employs additional textual information to rectify the visual representation of input images. Since the conceptual information of test classes is generally given as text, we expect additional descriptions from a caption generator to adjust the visual features for better matching with the representations of the test classes. We also propose using the generated textual descriptions to augment the training samples for learning the joint-embedding space. In experiments on two benchmark datasets, the proposed method shows significant performance improvements over existing models: 1.4% on the CUB dataset and 5.5% on the Flower dataset.
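The rectification idea described above can be illustrated with a minimal sketch. The abstract does not specify the encoders, the caption generator, or the fusion rule, so everything below is an assumption: linear projections stand in for the trained image/text encoders, random tensors stand in for backbone and caption features, and a convex combination is used as a hypothetical fusion of the visual embedding with the generated caption's embedding before nearest-prototype classification.

# Hypothetical sketch of caption-rectified zero-shot classification.
# Shapes, the alpha-blend fusion, and the stubbed features are assumptions,
# not the paper's actual architecture.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
D_IMG, D_TXT, D_JOINT, N_CLASSES = 512, 300, 256, 10

# Stand-in projections into the joint-embedding space (trained in practice).
img_proj = torch.nn.Linear(D_IMG, D_JOINT)
txt_proj = torch.nn.Linear(D_TXT, D_JOINT)

def rectified_embedding(img_feat, caption_feat, alpha=0.5):
    """Rectify the visual embedding with the generated caption's embedding.
    The convex combination is an assumed fusion rule for illustration."""
    v = F.normalize(img_proj(img_feat), dim=-1)
    t = F.normalize(txt_proj(caption_feat), dim=-1)
    return F.normalize(alpha * v + (1 - alpha) * t, dim=-1)

# Class prototypes: joint-space embeddings of the unseen classes' text
# descriptions (stubbed with random feature vectors here).
prototype_feats = torch.randn(N_CLASSES, D_TXT)
prototypes = F.normalize(txt_proj(prototype_feats), dim=-1)

# img_feat would come from a CNN backbone; caption_feat from a caption
# generator followed by a text encoder (both stubbed here).
img_feat = torch.randn(1, D_IMG)
caption_feat = torch.randn(1, D_TXT)

query = rectified_embedding(img_feat, caption_feat)
scores = query @ prototypes.T            # cosine similarity in joint space
pred = scores.argmax(dim=-1)
print(f"predicted unseen class index: {pred.item()}")

Classification reduces to a cosine-similarity nearest-prototype search once both the rectified image embedding and the class descriptions live in the same normalized joint space.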
Funder
Human Resources Program in Energy Technology of the Korea Institute of Energy Technology Evaluation and Planning; Institute of Information & Communications Technology Planning & Evaluation
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science