Abstract
Abstract
Background
In this study, we focus on building a fine-grained entity annotation corpus with the corresponding annotation guideline of traditional Chinese medicine (TCM) clinical records. Our aim is to provide a basis for the fine-grained corpus construction of TCM clinical records in future.
Methods
We developed a four-step approach that is suitable for the construction of TCM medical records in our corpus. First, we determined the entity types included in this study through sample annotation. Then, we drafted a fine-grained annotation guideline by summarizing the characteristics of the dataset and referring to some existing guidelines. We iteratively updated the guidelines until the inter-annotator agreement (IAA) exceeded a Cohen’s kappa value of 0.9. Comprehensive annotations were performed while keeping the IAA value above 0.9.
Results
We annotated the 10,197 clinical records in five rounds. Four entity categories involving 13 entity types were employed. The final fine-grained annotated entity corpus consists of 1104 entities and 67,799 tokens. The final IAAs are 0.936 on average (for three annotators), indicating that the fine-grained entity recognition corpus is of high quality.
Conclusions
These results will provide a foundation for future research on corpus construction and named entity recognition tasks in the TCM clinical domain.
Publisher
Springer Science and Business Media LLC
Subject
Health Informatics,Health Policy,Computer Science Applications
Reference55 articles.
1. Qiu J. Traditional medicine: a culture in the balance. Nature. 2007;448:126.
2. Ministry of Health. Basic Specification for Eelectronic Medical Records (Trial). China's Health Qual Manage. 2010;17:22–3.
3. Yao L, Chen X, Yang Z, Wang H, Wang Z. On construction of Chinese medicine ontology Concept's description architecture; 2008.
4. Nadkarni P, Ohno-Machado L, Chapman W. Natural language processing: an introduction. J Am Med Inform Assn. 2011;18:544–51.
5. Lei J, Tang B, Lu X, Gao K, Jiang M, Xu H. A comprehensive study of named entity recognition in Chinese clinical text. J Am Med Inform Assoc. 2014;21:808–14.
Cited by
11 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献