Recognizing clinical entities in hospital discharge summaries using Structural Support Vector Machines with word representation features-Reference-Cited by-同舟云学术

Recognizing clinical entities in hospital discharge summaries using Structural Support Vector Machines with word representation features

Published:2013-04 Issue:S1 Volume:13 Page:
ISSN:1472-6947
Container-title:BMC Medical Informatics and Decision Making
language:en
Short-container-title:BMC Med Inform Decis Mak

Author:

Tang Buzhou,Cao Hongxin,Wu Yonghui,Jiang Min,Xu Hua

Abstract

Abstract Background Named entity recognition (NER) is an important task in clinical natural language processing (NLP) research. Machine learning (ML) based NER methods have shown good performance in recognizing entities in clinical text. Algorithms and features are two important factors that largely affect the performance of ML-based NER systems. Conditional Random Fields (CRFs), a sequential labelling algorithm, and Support Vector Machines (SVMs), which is based on large margin theory, are two typical machine learning algorithms that have been widely applied to clinical NER tasks. For features, syntactic and semantic information of context words has often been used in clinical NER systems. However, Structural Support Vector Machines (SSVMs), an algorithm that combines the advantages of both CRFs and SVMs, and word representation features, which contain word-level back-off information over large unlabelled corpus by unsupervised algorithms, have not been extensively investigated for clinical text processing. Therefore, the primary goal of this study is to evaluate the use of SSVMs and word representation features in clinical NER tasks. Methods In this study, we developed SSVMs-based NER systems to recognize clinical entities in hospital discharge summaries, using the data set from the concept extration task in the 2010 i2b2 NLP challenge. We compared the performance of CRFs and SSVMs-based NER classifiers with the same feature sets. Furthermore, we extracted two different types of word representation features (clustering-based representation features and distributional representation features) and integrated them with the SSVMs-based clinical NER system. We then reported the performance of SSVM-based NER systems with different types of word representation features. Results and discussion Using the same training (N = 27,837) and test (N = 45,009) sets in the challenge, our evaluation showed that the SSVMs-based NER systems achieved better performance than the CRFs-based systems for clinical entity recognition, when same features were used. Both types of word representation features (clustering-based and distributional representations) improved the performance of ML-based NER systems. By combining two different types of word representation features together with SSVMs, our system achieved a highest F-measure of 85.82%, which outperformed the best system reported in the challenge by 0.6%. Our results show that SSVMs is a great potential algorithm for clinical NLP research, and both types of unsupervised word representation features are beneficial to clinical NER tasks.

Publisher

Springer Science and Business Media LLC

Subject

Health Informatics,Health Policy,Computer Science Applications

Link

https://link.springer.com/content/pdf/10.1186/1472-6947-13-S1-S1.pdf

Reference36 articles.

1. Friedman C, Alderson PO, Austin JH, Cimino JJ, Johnson SB: A general natural-language text processor for clinical radiology. J Am Med Inform Assoc. 1994, 1: 161-174. 10.1136/jamia.1994.95236146.

2. Meystre SM, Savova GK, Kipper-Schuler KC, Hurdle JF: Extracting information from textual documents in the electronic health record: a review of recent research. Yearb Med Inform. 2008, 128-144.

3. Haug PJ, Koehler S, Lau LM, Wang P, Rocha R, Huff SM: Experience with a mixed semantic/syntactic parser. Proc Annu Symp Comput Appl Med Care. 1995, 284-288.

4. Haug PJ, Christensen L, Gundersen M, Clemons B, Koehler S, Bauer K: A natural language parsing system for encoding admitting diagnoses. Proc AMIA Annu Fall Symp. 1997, 814-818.

5. Aronson AR, Lang FM: An overview of MetaMap: historical perspective and recent advances. J Am Med Inform Assoc. 2010, 17: 229-236.

Cited by 77 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. Handwritten text recognition and information extraction from ancient manuscripts using deep convolutional and recurrent neural network;Soft Computing;2024-09-13

2. An improved data augmentation approach and its application in medical named entity recognition;BMC Medical Informatics and Decision Making;2024-08-05

3. A taxonomy for advancing systematic error analysis in multi-site electronic health record-based clinical concept extraction;Journal of the American Medical Informatics Association;2024-05-14

4. Research on medical text named entity recognition based on Two-stage approach;Proceedings of the 2024 3rd International Conference on Cyber Security, Artificial Intelligence and Digital Economy;2024-03

5. A news-based climate policy uncertainty index for China;Scientific Data;2023-12-08