Clinical concept extraction using transformers

Author:

Xi Yang (1,2), Jiang Bian (1,2), William R. Hogan (1), Yonghui Wu (1,2)

Affiliation:

1. Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida, USA

2. Cancer Informatics and eHealth core, University of Florida Health Cancer Center, Gainesville, Florida, USA

Abstract

Objective: The goal of this study is to explore transformer-based models (eg, Bidirectional Encoder Representations from Transformers [BERT]) for clinical concept extraction and to develop an open-source package with pretrained clinical models to facilitate concept extraction and other downstream natural language processing (NLP) tasks in the medical domain.

Methods: We systematically explored 4 widely used transformer-based architectures, including BERT, RoBERTa, ALBERT, and ELECTRA, for extracting various types of clinical concepts using 3 public datasets from the 2010 and 2012 i2b2 challenges and the 2018 n2c2 challenge. We examined general transformer models pretrained on general English corpora as well as clinical transformer models pretrained on a clinical corpus, and compared them with a long short-term memory conditional random fields (LSTM-CRFs) model as a baseline. Furthermore, we integrated the 4 clinical transformer-based models into an open-source package.

Results and Conclusion: The RoBERTa-MIMIC model achieved state-of-the-art performance on 3 public clinical concept extraction datasets, with F1-scores of 0.8994, 0.8053, and 0.8907, respectively. Compared to the baseline LSTM-CRFs model, RoBERTa-MIMIC improved the F1-score by approximately 4% and 6% on the 2010 and 2012 i2b2 datasets, respectively. This study demonstrated the efficiency of transformer-based models for clinical concept extraction. Our methods and systems can be applied to other clinical tasks. The clinical transformer package with 4 pretrained clinical models is publicly available at https://github.com/uf-hobi-informatics-lab/ClinicalTransformerNER. We believe this package will improve current practice on clinical concept extraction and other tasks in the medical domain.
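Clinical concept extraction of the kind described above is typically cast as sequence labeling: a transformer assigns each token a BIO tag (B-&lt;type&gt;, I-&lt;type&gt;, or O), and tagged tokens are then grouped into concept spans. The sketch below illustrates only that final BIO-decoding step; the tag set and example sentence are illustrative and not taken from the paper's datasets.

```python
def decode_bio(tokens, tags):
    """Group parallel (token, BIO-tag) sequences into (concept_type, text) spans."""
    spans, current, ctype = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag starts a new concept, closing any open one.
            if current:
                spans.append((ctype, " ".join(current)))
            current, ctype = [tok], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == ctype:
            # An I- tag of the same type continues the open concept.
            current.append(tok)
        else:
            # "O" or an inconsistent I- tag closes any open concept.
            if current:
                spans.append((ctype, " ".join(current)))
            current, ctype = [], None
    if current:
        spans.append((ctype, " ".join(current)))
    return spans

tokens = ["Patient", "denies", "chest", "pain", "on", "aspirin"]
tags   = ["O", "O", "B-problem", "I-problem", "O", "B-treatment"]
print(decode_bio(tokens, tags))
# [('problem', 'chest pain'), ('treatment', 'aspirin')]
```

Entity-level F1-scores like those reported in the abstract are computed over these decoded spans (a predicted span counts as correct only if both its type and boundaries match the gold span), not over individual token tags.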

Funder

Patient-Centered Outcomes Research Institute Award

National Cancer Institute

National Institute on Aging

University of Florida Informatics Institute Junior SEED Program

Cancer Informatics and eHealth

University of Florida Health Cancer Center

University of Florida Clinical and Translational Science Institute

Publisher

Oxford University Press (OUP)

Subject

Health Informatics


Cited by 90 articles.

1. A GPT-based EHR modeling system for unsupervised novel disease detection. Journal of Biomedical Informatics, 2024-09.

2. Advancing Chinese biomedical text mining with community challenges. Journal of Biomedical Informatics, 2024-09.

3. Natural language processing with transformers: a review. PeerJ Computer Science, 2024-08-07.

4. Transformers and large language models in healthcare: A review. Artificial Intelligence in Medicine, 2024-08.

5. Transformer models in biomedicine. BMC Medical Informatics and Decision Making, 2024-07-29.
