Proto-Adapter: Efficient Training-Free CLIP-Adapter for Few-Shot Image Classification

Authors:

Kato Naoki 1, Nota Yoshiki 2, Aoki Yoshimitsu 1

Affiliations:

1. Department of Electrical Engineering, Faculty of Science and Technology, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama 223-8522, Kanagawa, Japan

2. Meidensha Corporation, Tokyo 141-6029, Japan

Abstract

Large vision-language models such as Contrastive Language-Image Pre-training (CLIP), pre-trained on large-scale image–text datasets, have demonstrated robust zero-shot transfer capabilities across various downstream tasks. To further enhance the few-shot recognition performance of CLIP, Tip-Adapter augments the CLIP model with an adapter built on a key-value cache model constructed from the few-shot training set. This approach enables training-free adaptation and yields significant improvements in few-shot recognition, especially with additional fine-tuning. However, the size of such a cache-based adapter grows in proportion to the number of training samples, making it difficult to deploy in practical applications. In this paper, we propose a novel CLIP adaptation method, named Proto-Adapter, which employs a single-layer adapter of constant size regardless of the amount of training data and even outperforms Tip-Adapter. Proto-Adapter constructs the adapter’s weights from prototype representations of each class; by aggregating the features of the training samples into these prototypes, it reduces the size of the adapter without compromising performance. Moreover, performance can be further enhanced by fine-tuning the adapter’s weights with a distance margin penalty, which imposes additional inter-class separation on the output logits. We posit that this training scheme yields a discriminative decision boundary even when the model is trained on a limited amount of data. We demonstrate the effectiveness of the proposed method through extensive few-shot classification experiments on diverse datasets.
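Since the abstract describes the full mechanism in prose, a small sketch may help make it concrete. Below is a minimal PyTorch illustration, assuming L2-normalized CLIP features and at least one support sample per class; it is not the authors’ released implementation, and the function names (build_proto_adapter, adapted_logits, margin_cross_entropy) and hyperparameter values (alpha, beta, margin) are illustrative assumptions rather than values from the paper.

```python
import torch
import torch.nn.functional as F

def build_proto_adapter(support_feats, support_labels, num_classes):
    """Collapse the few-shot features of each class into a single L2-normalized
    prototype, so the adapter weight matrix is (num_classes, dim) regardless of
    the number of shots. Assumes every class has at least one support sample."""
    dim = support_feats.size(1)
    protos = torch.zeros(num_classes, dim, device=support_feats.device)
    for c in range(num_classes):
        protos[c] = support_feats[support_labels == c].mean(dim=0)
    return F.normalize(protos, dim=-1)

def adapted_logits(image_feats, text_weights, protos, alpha=1.0, beta=5.0):
    """Blend CLIP's zero-shot logits with the prototype branch, in the spirit
    of cache-based adapters; alpha and beta are placeholder hyperparameters."""
    image_feats = F.normalize(image_feats, dim=-1)
    clip_logits = 100.0 * image_feats @ text_weights.t()   # zero-shot branch
    affinity = image_feats @ protos.t()                    # cosine similarity to prototypes
    proto_logits = torch.exp(-beta * (1.0 - affinity))     # sharpened similarity
    return clip_logits + alpha * proto_logits

def margin_cross_entropy(logits, labels, margin=0.1):
    """Margin-style fine-tuning loss: subtracting a margin from the
    target-class logit forces extra inter-class separation before softmax."""
    penalized = logits.clone()
    penalized[torch.arange(logits.size(0), device=logits.device), labels] -= margin
    return F.cross_entropy(penalized, labels)
```

Because the prototype matrix has exactly one row per class, the adapter stays the same size whether each class contributes 1 or 16 shots, which is the memory saving over a key-value cache that stores every support feature individually.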

Publisher

MDPI AG
