Affiliation:
1. Pontificia Universidad Javeriana, Colombia
2. University Grenoble Alpes, France
Abstract
Extracting valuable knowledge from Electronic Health Records (EHR) is a challenging task because they combine structured and unstructured data, including codified fields, images and test results. Narrative text in particular comprises notes that vary in language and level of detail and are full of ad hoc terminology, including acronyms and jargon. This is especially challenging in non-English EHR, where annotated corpora and trained case sets are scarce. This paper proposes an approach to named entity recognition (NER) and concept attribute labeling for EHR that uses the contextual words around the entity of interest to determine its sense. The approach combines three different NER methods with an analysis of the context (neighboring words) using an ensemble classification model. This helps to disambiguate NER results and to label each concept as confirmed, negated, speculative, pending or antecedent. Results show improved recall with a limited impact on precision for the NER process.