Affiliation:
1. Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida, USA
2. Cancer Informatics and eHealth core, University of Florida Health Cancer Center, Gainesville, Florida, USA
Abstract
Objective
The goal of this study is to explore transformer-based models (eg, Bidirectional Encoder Representations from Transformers [BERT]) for clinical concept extraction and develop an open-source package with pretrained clinical models to facilitate concept extraction and other downstream natural language processing (NLP) tasks in the medical domain.
Methods
We systematically explored 4 widely used transformer-based architectures, including BERT, RoBERTa, ALBERT, and ELECTRA, for extracting various types of clinical concepts using 3 public datasets from the 2010 and 2012 i2b2 challenges and the 2018 n2c2 challenge. We examined general transformer models pretrained using general English corpora as well as clinical transformer models pretrained using a clinical corpus and compared them with a long short-term memory conditional random fields (LSTM-CRFs) model as a baseline. Furthermore, we integrated the 4 clinical transformer-based models into an open-source package.
Results and Conclusion
The RoBERTa-MIMIC model achieved state-of-the-art performance on 3 public clinical concept extraction datasets with F1-scores of 0.8994, 0.8053, and 0.8907, respectively. Compared to the baseline LSTM-CRFs model, RoBERTa-MIMIC remarkably improved the F1-score by approximately 4% and 6% on the 2010 and 2012 i2b2 datasets. This study demonstrated the efficiency of transformer-based models for clinical concept extraction. Our methods and systems can be applied to other clinical tasks. The clinical transformer package with 4 pretrained clinical models is publicly available at https://github.com/uf-hobi-informatics-lab/ClinicalTransformerNER. We believe this package will improve current practice on clinical concept extraction and other tasks in the medical domain.
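For readers unfamiliar with how F1-scores are typically obtained in concept extraction, the following is a minimal sketch and not necessarily the evaluation script used in this study. It assumes BIO-tagged token sequences and strict matching (a predicted concept counts only if both its boundaries and its type match the gold annotation); the abstract states neither. The function names and the example tags are hypothetical and chosen for illustration only.

```python
from typing import List, Set, Tuple

# (start token index, end token index exclusive, concept type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect concept spans from a BIO tag sequence.

    Assumption: an I- tag that does not continue a span of the same
    type is treated leniently as the start of a new span.
    """
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues:
            if label is not None:
                spans.add((start, i, label))
            start, label = None, None
            if tag.startswith("B-") or tag.startswith("I-"):
                start, label = i, tag[2:]
    return spans


def strict_f1(gold: List[str], pred: List[str]) -> float:
    """Entity-level F1 under strict (boundary and type) matching."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_spans)
    recall = true_positives / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Illustrative tags only; not taken from any of the datasets.
gold = ["B-problem", "I-problem", "O", "B-treatment", "O"]
pred = ["B-problem", "I-problem", "O", "B-test", "O"]
print(strict_f1(gold, pred))
```

In this toy example one of the two predicted concepts matches the gold annotation exactly, so precision and recall are both 0.5. A real evaluation would aggregate counts over all sentences in a test set before computing the score.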
Funder
Patient-Centered Outcomes Research Institute Award
National Cancer Institute
National Institute on Aging
University of Florida Informatics Institute Junior SEED Program
Cancer Informatics and eHealth
University of Florida Health Cancer Center
University of Florida Clinical and Translational Science Institute
Publisher
Oxford University Press (OUP)
Reference
72 articles.
Cited by
90 articles.