XGBoost-Enhanced Graph Neural Networks: A New Architecture for Heterogeneous Tabular Data

Author:

Yan Liuxi (1), Xu Yaoqun (2)

Affiliation:

1. School of Computer and Information Engineering, Harbin University of Commerce, Harbin 150028, China

2. Institute of System Engineering, Harbin University of Commerce, Harbin 150028, China

Abstract

Graph neural networks (GNNs) perform well in text analysis tasks: their structure lets them capture complex patterns and dependencies in text, which suits them to natural language processing. Meanwhile, XGBoost (version 1.6.2) outperforms other machine learning methods on heterogeneous tabular data. Traditional GNNs, however, mainly target homogeneous, sparse features, so on tabular data they run into data structure mismatch and difficulties with feature selection and processing. To address these problems, we propose XGNN, a novel architecture that combines the strengths of XGBoost and GNNs to handle heterogeneous features together with graph structure. We use a graph attention network (GAT) as the GNN component. XGBoost and the GNN are trained end-to-end: each new tree in XGBoost is fitted and adjusted using gradient information from the GNN. Extensive experiments on node prediction and node classification tasks show that the proposed model significantly improves performance on both, and that it performs particularly well on heterogeneous tabular data.
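
The idea of fitting each new tree to gradient information from the graph model can be illustrated with a toy sketch. This is not the authors' XGNN implementation. It assumes a single scalar feature per node, uses least-squares decision stumps in place of XGBoost trees, and uses a fixed mean-aggregation layer in place of GAT. The names aggregate, loss_and_grad, fit_stump and train, and the example data, are all hypothetical.

```python
# Toy sketch (not the paper's code): boosted stumps produce a per-node value h,
# a graph layer averages h over neighbors, and each new stump is fit to the
# negative gradient of the loss with respect to h.

def aggregate(h, neighbors):
    """Mean of h over each node's neighborhood (self-loop included)."""
    return [sum(h[j] for j in nb) / len(nb) for nb in neighbors]

def loss_and_grad(h, y, neighbors):
    """Squared-error loss of the aggregated output and its gradient w.r.t. h."""
    pred = aggregate(h, neighbors)
    loss = 0.5 * sum((p - t) ** 2 for p, t in zip(pred, y))
    grad = [0.0] * len(h)
    for i, nb in enumerate(neighbors):
        for j in nb:
            # Node j contributes h[j] / len(nb) to the prediction of node i.
            grad[j] += (pred[i] - y[i]) / len(nb)
    return loss, grad

def fit_stump(x, target):
    """Least-squares decision stump on one scalar feature."""
    best = None
    for thr in sorted(set(x))[:-1]:  # keep both sides of the split non-empty
        left = [t for v, t in zip(x, target) if v <= thr]
        right = [t for v, t in zip(x, target) if v > thr]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((t - (lv if v <= thr else rv)) ** 2 for v, t in zip(x, target))
        if best is None or err < best[0]:
            best = (err, thr, lv, rv)
    _, thr, lv, rv = best
    return lambda v: lv if v <= thr else rv

def train(x, y, neighbors, rounds=30, lr=0.5):
    """Boosting loop: each round adds one stump fit to -dLoss/dh."""
    h = [0.0] * len(x)
    stumps, losses = [], []
    for _ in range(rounds):
        loss, grad = loss_and_grad(h, y, neighbors)
        losses.append(loss)
        stump = fit_stump(x, [-g for g in grad])
        stumps.append(stump)
        h = [hi + lr * stump(xi) for hi, xi in zip(h, x)]
    losses.append(loss_and_grad(h, y, neighbors)[0])
    return stumps, losses

# Hypothetical data: a 6-node path graph with one tabular feature per node.
neighbors = [[0, 1], [0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5]]
x = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
y = [1.0, 1.0, 1.0, 3.0, 3.0, 3.0]
stumps, losses = train(x, y, neighbors)
print(f"loss before: {losses[0]:.4f}, after: {losses[-1]:.4f}")
```

In this sketch the graph layer has no trainable parameters, so only the tree ensemble is updated. A fuller version would also take gradient steps on the GNN weights between boosting rounds.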

Funder

Natural Science Foundation of Heilongjiang Province

Publisher

MDPI AG

