Combining data augmentation and domain information with TENER model for Clinical Event Detection

Published: 2021-11; Volume: 21; Issue: S9
ISSN: 1472-6947
Journal: BMC Medical Informatics and Decision Making
Language: English
Journal abbreviation: BMC Med Inform Decis Mak

Authors:

Zhang Zhichang, Liu Dan, Zhang Minyu, Qin Xiaohui

Abstract

Background

In recent years, with the development of artificial intelligence, the use of deep learning for clinical information extraction has become a new trend. Clinical Event Detection (CED), one of its subtasks, has attracted attention from both academia and industry. However, directly applying advances in deep learning to the CED task often yields unsatisfactory results, for two main reasons: (1) the large number of obscure professional terms in electronic medical records leads to poor recognition performance; and (2) the scarcity of datasets for the task leads to poor model robustness. Solving these two problems is therefore essential to improving model performance.

Methods

This paper proposes a Clinical Event Detection model that combines data augmentation and domain information with the TENER model.

Results

We use two evaluation metrics to compare the overall performance of the proposed model with existing models on the 2012 i2b2 challenge dataset. Experimental results show that the proposed model achieves the best F1-score of 80.26%, a type accuracy of 93% and a Span F1-score of 90.33%, outperforming state-of-the-art approaches.

Conclusions

This paper proposes a multi-granularity information fusion encoder-decoder framework, which applies the TENER model to the CED task for the first time. It uses a pre-trained language model (BioBERT) to generate word-level features, addressing the poor recognition performance caused by the large number of obscure professional terms in electronic medical records. In addition, this paper proposes a new data augmentation method for sequence labeling tasks, addressing the poor model robustness caused by the scarcity of task-specific datasets.
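The abstract does not describe how the proposed augmentation method works, so the sketch below is not that method. It illustrates one common, generic way to augment sequence-labeling data, mention replacement: an annotated event span is swapped for another span of the same type drawn from the training set, and the tags are rebuilt so that labels stay aligned with tokens. The BIO tagging scheme, the event type names PROBLEM and TREATMENT (taken from the 2012 i2b2 annotation scheme), the toy sentences, and the helper names `extract_spans`, `build_mention_pool` and `mention_replace` are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict

# Hypothetical toy corpus: (tokens, BIO tags) pairs. Not taken from i2b2 data.
corpus = [
    (["Patient", "denies", "chest", "pain", "."], ["O", "O", "B-PROBLEM", "I-PROBLEM", "O"]),
    (["Started", "on", "aspirin", "."], ["O", "O", "B-TREATMENT", "O"]),
    (["Reports", "dyspnea", "on", "exertion", "."], ["O", "B-PROBLEM", "O", "O", "O"]),
]


def extract_spans(tags):
    """Return (start, end_exclusive, type) for every BIO span.

    An I- tag whose type differs from the open span (or with no open span)
    is treated leniently as the start of a new span.
    """
    spans = []
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing span
        continues = tag.startswith("I-") and tag[2:] == etype
        if not continues:
            if start is not None:
                spans.append((start, i, etype))
                start, etype = None, None
            if tag != "O":
                start, etype = i, tag[2:]
    return spans


def build_mention_pool(data):
    """Collect every annotated mention (as a token list), grouped by event type."""
    pool = defaultdict(list)
    for tokens, tags in data:
        for s, e, t in extract_spans(tags):
            pool[t].append(tokens[s:e])
    return pool


def mention_replace(tokens, tags, pool, p=0.5, rng=random):
    """With probability p, replace each mention by a same-type mention from the pool.

    Tags are regenerated from each mention's length, so a replacement of a
    different length still yields one tag per token.
    """
    new_tokens, new_tags = [], []
    prev = 0
    for s, e, t in extract_spans(tags):
        new_tokens += tokens[prev:s]  # untouched "O" tokens before the mention
        new_tags += tags[prev:s]
        mention = tokens[s:e]
        if pool.get(t) and rng.random() < p:
            mention = rng.choice(pool[t])
        new_tokens += mention
        new_tags += [f"B-{t}"] + [f"I-{t}"] * (len(mention) - 1)
        prev = e
    new_tokens += tokens[prev:]  # trailing "O" tokens
    new_tags += tags[prev:]
    return new_tokens, new_tags


if __name__ == "__main__":
    pool = build_mention_pool(corpus)
    augmented = mention_replace(*corpus[0], pool, p=1.0, rng=random.Random(0))
    print(augmented)
```

Regenerating the tags from the replacement's length, rather than copying the original tags, is the design choice that keeps a two-token mention swapped for a one-token one (or vice versa) correctly labeled. In practice the augmented sentences would be added to the original training set rather than replacing it.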

Funder

Key Science and Technology Foundation of Gansu Province

Publisher

Springer Science and Business Media LLC

Subject

Health Informatics, Health Policy, Computer Science Applications

Link

https://link.springer.com/content/pdf/10.1186/s12911-021-01618-3.pdf

References (31 articles)

1. Sun W, Rumshisky A, Uzuner O. Evaluating temporal relations in clinical text: 2012 i2b2 Challenge. J Am Med Inform Assoc. 2013;20(5):806–13. https://doi.org/10.1136/amiajnl-2013-001628.

2. Cortes C, Vapnik VN. Support-vector networks. Mach Learn. 1995;20(3):273–97.

3. Lafferty J, McCallum A, Pereira FCN. Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the 18th International Conference on Machine Learning. 2001.

4. Roberts K, Rink B, Harabagiu SM. A flexible framework for recognizing events, temporal expressions, and temporal relations in clinical text. J Am Med Inform Assoc. 2013;20(5):867–75. https://doi.org/10.1136/amiajnl-2013-001619.

5. Kovačević A, Dehghan A, Filannino M, Keane JA, Nenadic G. Combining rules and machine learning for extraction of temporal expressions and events from clinical narratives. J Am Med Inform Assoc. 2013;20(5):859–66.

Cited by 2 articles

1. Clinical utility of a deep-learning mortality prediction model for cardiac surgery decision making. The Journal of Thoracic and Cardiovascular Surgery. 2023-12.

2. Data-driven drug discovery for drug repurposing. Folia Pharmacologica Japonica. 2023.