Author:
Li Xiangxian, Zheng Yuze, Ma Haokai, Qi Zhuang, Meng Xiangxu, Meng Lei
Abstract
The prevalence of long-tailed distributions in real-world data often results in classification models favoring the dominant classes and neglecting the less frequent ones. Current approaches address long-tailed image classification by rebalancing data, optimizing weights, and augmenting information. However, these methods often struggle to balance performance between dominant and minority classes because of inadequate representation learning for the latter. To address these problems, we introduce descriptional words for images as cross-modal privileged information and propose a cross-modal enhanced method for long-tailed image classification, referred to as CMLTNet. CMLTNet improves the learning of intraclass similarity in tail-class representations through cross-modal alignment and captures the difference between head and tail classes in semantic space through cross-modal inference. After fusing this information, CMLTNet achieved overall performance better than that of benchmark long-tailed and cross-modal learning methods on the long-tailed cross-modal datasets NUS-WIDE and VireoFood-172. The effectiveness of the proposed modules was further studied through ablation experiments. A case study of feature distributions showed that the proposed model learns better representations of tail classes, and experiments on model attention suggested that CMLTNet has the potential to help learn rare concepts in tail classes through mapping to the semantic space.
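The paper itself does not include code, but the cross-modal alignment step described above can be illustrated with a minimal sketch: pulling each image embedding toward the embedding of its paired textual description so that same-class (tail-class) samples become more similar in the shared space. The function names and the cosine-distance form of the loss here are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Normalize vectors to unit length so the dot product equals cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def alignment_loss(img_emb, txt_emb):
    """Mean cosine distance (1 - cosine similarity) between paired
    image and text embeddings; minimizing it pulls each image
    representation toward its description's representation."""
    img = l2_normalize(np.asarray(img_emb, dtype=float))
    txt = l2_normalize(np.asarray(txt_emb, dtype=float))
    return float(np.mean(1.0 - np.sum(img * txt, axis=-1)))

# Perfectly aligned pairs give zero loss; orthogonal pairs give loss 1.
paired = np.array([[1.0, 0.0], [0.0, 1.0]])
print(alignment_loss(paired, paired))  # → 0.0
```

In a full model this loss would be combined with the classification objective, so that tail-class images inherit structure from the semantic (text) space rather than relying on their few visual samples alone.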
Publisher
Springer Science and Business Media LLC
References: 50 articles.
Cited by: 1 article.