A Long-Tailed Image Classification Method Based on Enhanced Contrastive Visual Language
Author:
Song Ying 1,2, Li Mengxing 1,2, Wang Bo 3
Affiliation:
1. Beijing Key Laboratory of Internet Culture and Digital Dissemination, Beijing Information Science and Technology University, Beijing 100101, China
2. Beijing Advanced Innovation Center for Materials Genome Engineering, Beijing Information Science and Technology University, Beijing 100101, China
3. Software Engineering College, Zhengzhou University of Light Industry, Zhengzhou 450002, China
Abstract
To address two problems with common long-tailed classification methods, namely that they do not exploit the semantic features of the images' original label text and that the gap in classification accuracy between majority and minority classes is large, the long-tailed image classification method based on enhanced contrastive visual language trains head-class and tail-class samples separately, uses text-image pre-training information, and applies an enhanced momentum contrastive loss function together with RandAugment augmentation to improve the learning of tail-class samples. On the ImageNet-LT long-tailed dataset, the method improves overall accuracy, tail-class accuracy, middle-class accuracy, and the F1 value by 3.4%, 7.6%, 3.5%, and 11.2%, respectively, compared with the BALLAD method, and reduces the accuracy gap between the head and tail classes by 1.6%. The results of three comparative experiments indicate that the proposed method improves tail-class performance and reduces the accuracy difference between majority and minority classes.
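The abstract's two central ingredients are a momentum contrastive loss and RandAugment augmentation for the scarce tail-class images. Below is a minimal sketch of how these pieces commonly fit together; the function names (momentum_update, momentum_contrastive_loss, tail_augment), the temperature, and the momentum coefficient are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch only: a MoCo-style momentum contrastive loss plus RandAugment for
# tail-class samples. Hyperparameters (m=0.999, tau=0.07, num_ops, magnitude)
# are assumed defaults, not values taken from the paper.
import torch
import torch.nn.functional as F
from torchvision import transforms

# RandAugment applied to tail-class images to diversify the few samples available.
tail_augment = transforms.Compose([
    transforms.RandAugment(num_ops=2, magnitude=9),
    transforms.ToTensor(),
])

@torch.no_grad()
def momentum_update(encoder_q, encoder_k, m=0.999):
    """Update the key (momentum) encoder as an EMA of the query encoder."""
    for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        p_k.data.mul_(m).add_(p_q.data, alpha=1.0 - m)

def momentum_contrastive_loss(q, k, queue, tau=0.07):
    """InfoNCE-style loss.

    q     -- query features from the online encoder, shape [N, D]
    k     -- positive key features from the momentum encoder, shape [N, D]
    queue -- stored negative keys (already normalized), shape [K, D]
    """
    q = F.normalize(q, dim=1)
    k = F.normalize(k, dim=1)
    l_pos = (q * k).sum(dim=1, keepdim=True)   # [N, 1] positive logits
    l_neg = q @ queue.t()                      # [N, K] negative logits
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)
```

In this kind of setup, tail_augment would be applied when sampling tail-class images, while momentum_update is called once per training step after the optimizer step, keeping the key encoder a slowly moving average of the query encoder.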
Funder
National Natural Science Foundation of China; State Key Laboratory of Computer Architecture
Subject
Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry