A BERT-GNN Approach for Metastatic Breast Cancer Prediction Using Histopathology Reports

Author:

Basaad  Abdullah1ORCID,Basurra  Shadi1,Vakaj  Edlira1ORCID,Eldaly  Ahmed Karam2ORCID,Abdelsamea  Mohammed M.2

Affiliation:

1. School of Computing and Digital Technology, Birmingham City University, Birmingham B4 7XG, UK

2. Department of Computer Science, University of Exeter, North Park Road, Exeter EX4 4QF, UK

Abstract

Metastatic breast cancer (MBC) continues to be a leading cause of cancer-related deaths among women. This work introduces an innovative non-invasive breast cancer classification model designed to improve the identification of cancer metastases. While this study marks the initial exploration into predicting MBC, additional investigations are essential to validate the occurrence of MBC. Our approach combines the strengths of large language models (LLMs), specifically the bidirectional encoder representations from transformers (BERT) model, with the powerful capabilities of graph neural networks (GNNs) to predict MBC patients based on their histopathology reports. This paper introduces a BERT-GNN approach for metastatic breast cancer prediction (BG-MBC) that integrates graph information derived from the BERT model. In this model, nodes are constructed from patient medical records, while BERT embeddings are employed to vectorise representations of the words in histopathology reports, thereby capturing semantic information crucial for classification by employing three distinct approaches (namely univariate selection, extra trees classifier for feature importance, and Shapley values to identify the features that have the most significant impact). Identifying the most crucial 30 features out of 676 generated as embeddings during model training, our model further enhances its predictive capabilities. The BG-MBC model achieves outstanding accuracy, with a detection rate of 0.98 and an area under curve (AUC) of 0.98, in identifying MBC patients. This remarkable performance is credited to the model’s utilisation of attention scores generated by the LLM from histopathology reports, effectively capturing pertinent features for classification.

Funder

Antser Ltd., UK

Publisher

MDPI AG

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3