Graph Attention Transformer Network for Multi-label Image Classification

Published: 2023-02-27  Issue: 4  Volume: 19  Page: 1-16
ISSN: 1551-6857
Container-title: ACM Transactions on Multimedia Computing, Communications, and Applications
Language: en
Short-container-title: ACM Trans. Multimedia Comput. Commun. Appl.

Author:

Yuan Jin¹, Chen Shikai¹, Zhang Yao², Shi Zhongchao³, Geng Xin¹, Fan Jianping³, Rui Yong³

Affiliation:

1. Southeast University, China

2. University of Chinese Academy of Sciences, China

3. Lenovo Research, Beijing, China

Abstract

Multi-label image classification aims to recognize multiple objects or attributes in an image. The key to this problem is effectively characterizing inter-label correlations or dependencies, which has made graph neural networks a prevailing approach. However, current methods often use the co-occurrence probability of labels in the training set as the adjacency matrix that models these correlations; this choice is tightly bound to the dataset and limits the model's generalization ability. This article proposes the Graph Attention Transformer Network, a general framework for multi-label image classification that mines rich and effective label correlations. First, we use the cosine similarity of pre-trained label word embeddings as the initial correlation matrix, which can represent richer semantic information than the co-occurrence matrix. Subsequently, we propose a graph attention transformer layer that transfers this adjacency matrix to adapt it to the current domain. Extensive experiments demonstrate that the proposed method achieves highly competitive performance on three datasets.
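
The contrast the abstract draws between the two kinds of correlation matrix can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the three toy labels, the 3-dimensional "embeddings", and the toy annotations are all hypothetical, and the abstract does not say which pre-trained embeddings are used or whether the similarity values are thresholded or normalized afterwards.

```python
import math


def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity between label embedding vectors."""
    norms = [math.sqrt(sum(x * x for x in vec)) for vec in embeddings]
    n = len(embeddings)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
            sim[i][j] = dot / (norms[i] * norms[j])
    return sim


def cooccurrence_matrix(label_sets, num_labels):
    """Estimate P(label j | label i) as count(i and j) / count(i)."""
    count = [0] * num_labels
    pair = [[0] * num_labels for _ in range(num_labels)]
    for labels in label_sets:
        for i in labels:
            count[i] += 1
            for j in labels:
                if i != j:
                    pair[i][j] += 1
    return [
        [pair[i][j] / count[i] if count[i] else 0.0 for j in range(num_labels)]
        for i in range(num_labels)
    ]


# Hypothetical label set; indices 0, 1, 2 stand for "dog", "cat", "car".
# Made-up vectors standing in for pre-trained word embeddings.
toy_embeddings = [
    [0.9, 0.8, 0.1],  # dog
    [0.8, 0.9, 0.2],  # cat
    [0.1, 0.2, 0.9],  # car
]

# Made-up training annotations: "dog" and "cat" never appear together.
toy_training_labels = [{0, 2}, {0, 2}, {1}]

sim = cosine_similarity_matrix(toy_embeddings)
cooc = cooccurrence_matrix(toy_training_labels, num_labels=3)

print("cosine similarity dog/cat:", round(sim[0][1], 3))
print("co-occurrence P(cat | dog):", cooc[0][1])
```

In this toy setup the co-occurrence estimate for "dog" and "cat" is zero because the two labels never share a training image, while the embedding-based similarity between them stays high. That is the dataset dependence the abstract describes; how the proposed layer then adapts the initial matrix is not detailed in the abstract and is not sketched here.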

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications, Hardware and Architecture

Link

https://dl.acm.org/doi/pdf/10.1145/3578518

Cited by 3 articles.

1. Multi-label image classification model based on multi-scale fusion and adaptive label correlation;Journal of Shanghai Jiaotong University (Science);2024-01-02

2. Multi-label image classification model based on multi-scale semantic attention and graph attention network;Third International Conference on Signal Image Processing and Communication (ICSIPC 2023);2023-10-20

3. Cross-modality semantic guidance for multi-label image classification;Intelligent Data Analysis;2023-09-14