Webly Supervised Fine-Grained Image Recognition with Graph Representation and Metric Learning
Published:2022-12-11
Issue:24
Volume:11
Page:4127
ISSN:2079-9292
Container-title:Electronics
language:en
Short-container-title:Electronics
Author:
Lin Jianman, Lin Jiantao, Gao Yuefang, Yang Zhijing, Chen Tianshui
Abstract
Webly supervised fine-grained image recognition (FGIR) aims to distinguish subordinate categories using data retrieved from the Internet, which can significantly reduce the dependence of deep learning on manually annotated labels. Most current fine-grained image recognition algorithms follow a large-scale, data-driven deep learning paradigm that relies heavily on such labels, even though a large amount of free, weakly labeled data is available on the Internet. To use fine-grained web data effectively, this paper proposes a Graph Representation and Metric Learning (GRML) framework that learns discriminative holistic–local features through graph representation of fine-grained web images while handling noisy labels at the same time. Specifically, we first design an attention-focused module to locate the most discriminative regions with different spatial aspects and sizes. Next, a structured instance graph is constructed to correlate holistic and local features and model their interaction, while a graph prototype containing both holistic and local information is introduced for each category to learn a category-level graph representation that assists in handling noisy labels. Finally, a graph matching module explores the holistic–local information interaction through intra-graph node information propagation and evaluates the similarity score between each instance graph and its corresponding category-level graph prototype through inter-graph node information propagation. Extensive experiments on three webly supervised FGIR benchmark datasets, Web-Bird, Web-Aircraft and Web-Car, yield classification accuracies of 76.62%, 85.79% and 82.99%, respectively. Compared with Peer-learning, these are improvements of 2.47%, 4.72% and 1.59%, respectively.
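The graph matching step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the residual attention update, the cross-graph attention matching, the cosine score, the toy 2-D features and the function names (`propagate`, `graph_similarity`, `looks_clean`) and threshold are all assumptions made for illustration. Node 0 of each graph stands for the holistic feature and the remaining nodes for local features.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return 0.0 if nu == 0 or nv == 0 else dot(u, v) / (nu * nv)

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def attend(node, sources):
    """Attention-weighted average of `sources`, with weights from dot products."""
    weights = softmax([dot(node, s) for s in sources])
    return [sum(w * s[d] for w, s in zip(weights, sources))
            for d in range(len(node))]

def propagate(nodes):
    """Intra-graph propagation (assumed form): each node adds a message
    aggregated from all nodes of its own graph."""
    return [[a + b for a, b in zip(n, attend(n, nodes))] for n in nodes]

def graph_similarity(instance, prototype):
    """Score an instance graph against a category-level prototype graph."""
    inst = propagate(instance)     # holistic and local nodes exchange information
    proto = propagate(prototype)
    # Inter-graph step (assumed form): match each instance node to the
    # prototype via attention, then compare the node with its match.
    scores = [cosine(n, attend(n, proto)) for n in inst]
    return sum(scores) / len(scores)

def looks_clean(score, threshold=0.5):
    """Hypothetical rule: keep a web sample if it matches its label's prototype."""
    return score >= threshold

# Toy data: node 0 = holistic feature, node 1 = local feature.
prototype = [[1.0, 0.0], [0.8, 0.2]]
clean_instance = [[0.9, 0.1], [0.7, 0.3]]     # resembles the prototype
noisy_instance = [[-1.0, 0.2], [-0.5, -0.5]]  # likely mislabeled

clean_score = graph_similarity(clean_instance, prototype)
noisy_score = graph_similarity(noisy_instance, prototype)
print(looks_clean(clean_score), looks_clean(noisy_score))
```

In this toy setting the instance that resembles its prototype receives a high score and the dissimilar one a negative score, which is how a similarity score of this kind could be used to down-weight or discard noisily labeled web images.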
Funder
National Natural Science Foundation of China; Natural Science Foundation of China; Foundation of Fire Science and Technology Project of Guangdong Province; Guangdong Provincial Key Laboratory of Human Digital Twin
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering
Cited by
1 article.