Abstract
Heterogeneous graphs are an essential structure for modeling real-world data through different types of nodes and the relationships between them. They can also capture multimodality, in which nodes carry different types of data such as text, images, and audio. Graph Neural Networks (GNNs) are a prominent graph representation learning method that exploits both the graph structure and its attributes; when applied to multimodal heterogeneous graphs, GNNs learn a single semantic space shared by the different modalities. This shared space enables multimodal fusion through simple operators such as sum, average, or multiplication, generating unified representations that capture the supplementary and complementary relationships between the modalities. In multimodal heterogeneous graphs, labeling tends to be even more costly because multiple modalities must be analyzed, a difficulty compounded by the class imbalance inherent to some applications. To overcome these problems in applications centered on a single class of interest, One-Class Learning (OCL) is used. Given the lack of studies on multimodal early fusion in heterogeneous graphs for OCL tasks, we proposed a method based on an unsupervised GNN for heterogeneous graphs and evaluated different early fusion operators. In this paper, we extend that work by evaluating the behavior of the main GNN convolutions in the method. We found that average, addition, and subtraction were the best early fusion operators. In addition, GNN layers that do not use an attention mechanism performed better. We therefore argue for multimodal heterogeneous graph neural networks that use simple early fusion operators, instead of the commonly used concatenation, together with less complex convolutions.
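For illustration, the following is a minimal Python sketch of the kind of early fusion the abstract describes, applied to modality embeddings that already share a semantic space. It is not the paper's implementation; the PyTorch dependency, the helper name early_fusion, and the tensor shapes are assumptions made for the example.

# Minimal sketch (assumed, not the authors' code) of multimodal early fusion
# over node embeddings that a heterogeneous GNN has projected into a shared
# semantic space, using the simple operators discussed in the abstract.
import torch

def early_fusion(text_emb: torch.Tensor,
                 image_emb: torch.Tensor,
                 op: str = "mean") -> torch.Tensor:
    # Both inputs must have the same shape (num_nodes, embedding_dim),
    # which is what makes element-wise operators applicable at all.
    if op == "sum":
        return text_emb + image_emb
    if op == "mean":
        return (text_emb + image_emb) / 2
    if op == "mul":
        return text_emb * image_emb          # element-wise product
    if op == "sub":
        return text_emb - image_emb
    raise ValueError(f"unknown fusion operator: {op}")

# Example: 5 nodes, 64-dimensional embeddings per modality.
text_emb = torch.randn(5, 64)
image_emb = torch.randn(5, 64)
fused = early_fusion(text_emb, image_emb, op="mean")  # shape stays (5, 64)

Note that, unlike concatenation, these operators keep the fused representation in the same dimensionality as each modality, which is what allows them to be applied without adding parameters to the model.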
Publisher
Sociedade Brasileira de Computação - SBC