Path-BigBird: An AI-Driven Transformer Approach to Classification of Cancer Pathology Reports

Author:

Chandrashekar Mayanka1ORCID,Lyngaas Isaac2ORCID,Hanson Heidi A.1,Gao Shang1ORCID,Wu Xiao-Cheng34ORCID,Gounley John1

Affiliation:

1. Advanced Computing for Health Sciences Section, Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN

2. Advanced Computing for Life Science & Engineering, National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, TN

3. Louisiana Tumor Registry, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, LA

4. Department of Epidemiology, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, LA

Abstract

PURPOSE Surgical pathology reports are critical for cancer diagnosis and management. To accurately extract information about tumor characteristics from pathology reports in near real time, we explore the impact of using domain-specific transformer models that understand cancer pathology reports. METHODS We built a pathology transformer model, Path-BigBird, by using 2.7 million pathology reports from six SEER cancer registries. We then compare different variations of Path-BigBird with two less computationally intensive methods: Hierarchical Self-Attention Network (HiSAN) classification model and an off-the-shelf clinical transformer model (Clinical BigBird). We use five pathology information extraction tasks for evaluation: site, subsite, laterality, histology, and behavior. Model performance is evaluated by using macro and micro F1 scores. RESULTS We found that Path-BigBird and Clinical BigBird outperformed the HiSAN in all tasks. Clinical BigBird performed better on the site and laterality tasks. Versions of the Path-BigBird model performed best on the two most difficult tasks: subsite (micro F1 score of 72.53, macro F1 score of 35.76) and histology (micro F1 score of 80.96, macro F1 score of 37.94). The largest performance gains over the HiSAN model were for histology, for which a Path-BigBird model increased the micro F1 score by 1.44 points and the macro F1 score by 3.55 points. Overall, the results suggest that a Path-BigBird model with a vocabulary derived from well-curated and deidentified data is the best-performing model. CONCLUSION The Path-BigBird pathology transformer model improves automated information extraction from pathology reports. Although Path-BigBird outperforms Clinical BigBird and HiSAN, these less computationally expensive models still have utility when resources are constrained.

Publisher

American Society of Clinical Oncology (ASCO)

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3