XGBoost-Enhanced Graph Neural Networks: A New Architecture for Heterogeneous Tabular Data

Published: 2024-07-03 Issue: 13 Volume: 14 Page: 5826
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences

Author:

Yan Liuxi¹, Xu Yaoqun²

Affiliation:

1. School of Computer and Information Engineering, Harbin University of Commerce, Harbin 150028, China

2. Institute of System Engineering, Harbin University of Commerce, Harbin 150028, China

Abstract

Graph neural networks (GNNs) perform well in text analysis tasks: their structure lets them capture complex patterns and dependencies in text, which makes them well suited to natural language processing. XGBoost (version 1.6.2), meanwhile, outperforms other machine learning methods on heterogeneous tabular data. Traditional GNNs, however, mainly target homogeneous, sparse features, so on tabular data they run into problems such as data structure mismatch, feature selection, and processing difficulties. To address these problems, we propose XGNN, a novel architecture that combines the strengths of XGBoost and GNNs to handle heterogeneous features together with graph structure. We use a graph attention network (GAT) as the GNN component. XGBoost and the GNN are trained end-to-end: each new tree in XGBoost is fitted and adjusted using gradient information from the GNN. Extensive experiments on node prediction and node classification tasks show that the proposed model significantly improves performance on both and performs particularly well on heterogeneous tabular data.
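
The end-to-end idea in the abstract, where each new tree is fitted to gradient information coming back from the graph model, can be illustrated with a toy sketch. This is not the authors' implementation and uses neither XGBoost nor GAT: as labeled assumptions, one-split regression stumps stand in for boosted trees, a fixed mean-aggregation layer with no trainable weights stands in for the GNN, and the names `fit_stump`, `aggregate`, `backprop_aggregate`, and `train` are hypothetical. The data is made up for illustration.

```python
# Toy sketch only. Assumptions (not from the paper): decision stumps replace
# XGBoost trees, and a fixed mean-aggregation layer replaces the GAT.

def fit_stump(xs, targets):
    """Least-squares regression stump on a single scalar feature."""
    mean_all = sum(targets) / len(targets)
    best = (sum((t - mean_all) ** 2 for t in targets), None, mean_all, mean_all)
    for thr in sorted(set(xs))[:-1]:  # skip the max so the right side is non-empty
        left = [t for x, t in zip(xs, targets) if x <= thr]
        right = [t for x, t in zip(xs, targets) if x > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
        if sse < best[0]:
            best = (sse, thr, lm, rm)
    _, thr, lm, rm = best

    def predict(x):
        return lm if thr is None or x <= thr else rm

    return predict


def aggregate(neighbors, values):
    """Graph layer: mean of values over each node's neighbourhood (self included)."""
    return [sum(values[j] for j in nbrs) / len(nbrs) for nbrs in neighbors]


def backprop_aggregate(neighbors, grad_out):
    """Gradient of the loss w.r.t. the layer input (transpose of aggregate)."""
    grad_in = [0.0] * len(neighbors)
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            grad_in[j] += grad_out[i] / len(nbrs)
    return grad_in


def train(xs, ys, neighbors, rounds=30, lr=0.5):
    """Each round fits one stump to the negative gradient from the graph layer."""
    boosted = [0.0] * len(xs)  # current output of the boosted ensemble per node
    stumps, losses = [], []
    for _ in range(rounds):
        preds = aggregate(neighbors, boosted)          # forward through graph layer
        residual = [p - y for p, y in zip(preds, ys)]  # dLoss/dpreds for squared error
        losses.append(0.5 * sum(r * r for r in residual))
        grad = backprop_aggregate(neighbors, residual)  # dLoss/dboosted
        stump = fit_stump(xs, [-g for g in grad])       # new "tree" targets -gradient
        stumps.append(stump)
        boosted = [b + lr * stump(x) for b, x in zip(boosted, xs)]
    return stumps, losses


# Made-up example: six nodes on a path graph, one tabular feature per node.
xs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
ys = [1.0, 1.0, 1.0, 3.0, 3.0, 3.0]
neighbors = [[0, 1], [0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5]]
stumps, losses = train(xs, ys, neighbors)
print(f"loss before: {losses[0]:.4f}, loss after: {losses[-1]:.4f}")
```

The point of the sketch is the direction of information flow: the tree learner never sees the labels directly, only the gradient of the loss with respect to its own output after that output has passed through the graph layer. In a fuller setup the graph model would also have trainable parameters updated in the same loop, which this sketch omits.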

Funder

Natural Science Foundation of Heilongjiang Province

Publisher

MDPI AG

Link

https://www.mdpi.com/2076-3417/14/13/5826/pdf
