Affiliation:
1. Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, United States
Abstract
In this paper, we address the related tasks of medication extraction, event classification, and context classification from clinical text. The data for the tasks were obtained from Track 1 of the National Natural Language Processing (NLP) Clinical Challenges (n2c2). For medication extraction, we developed a named entity recognition (NER) model based on BioClinicalBERT and applied a dictionary-based fuzzy matching mechanism to identify medication mentions in clinical notes. For event classification and context classification, we developed a unified model architecture that uses two pre-trained models, BioClinicalBERT and RoBERTa, each predicting the class separately. Additionally, we applied an ensemble mechanism to combine the predictions of BioClinicalBERT and RoBERTa. For event classification, our best model achieved a micro-averaged F1-score of 0.926, 5% higher than the baseline model. The shared task released the data in different stages during the evaluation phase. Our system consistently ranked among the top 10 for Releases 1 and 2.
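As an illustration of how the predictions of two classifiers can be combined, the sketch below averages per-class probabilities (soft voting) and picks the highest-scoring class. The abstract does not state which ensemble mechanism was used, so soft voting is an assumption here, and the function name `soft_vote`, the class labels, and the probability values are hypothetical and chosen only for the example.

```python
from typing import Dict, List


def soft_vote(prob_sets: List[Dict[str, float]]) -> str:
    """Average per-class probabilities across models and return the top class.

    Assumption: every model reports a probability for the same set of labels.
    """
    if not prob_sets:
        raise ValueError("need at least one model's probabilities")
    labels = prob_sets[0].keys()
    averaged = {
        label: sum(probs[label] for probs in prob_sets) / len(prob_sets)
        for label in labels
    }
    # Return the label with the highest averaged probability.
    return max(averaged, key=averaged.get)


# Hypothetical outputs of the two models for a single medication mention.
bioclinicalbert_probs = {"Disposition": 0.55, "NoDisposition": 0.35, "Undetermined": 0.10}
roberta_probs = {"Disposition": 0.30, "NoDisposition": 0.65, "Undetermined": 0.05}

prediction = soft_vote([bioclinicalbert_probs, roberta_probs])
print(prediction)
```

In this made-up case the two models disagree, and the averaged scores favor the class that RoBERTa predicts with higher confidence. Other schemes, such as majority voting or weighted averaging, would fit the same interface.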