RCorp: a resource for chemical disease semantic extraction in Chinese

Published:2019-12 Issue:S5 Volume:19 Page:
ISSN:1472-6947
Container-title:BMC Medical Informatics and Decision Making
language:en
Short-container-title:BMC Med Inform Decis Mak

Author:

Sun Yueping,Hou Li,Qin Lu,Liu Yan,Li Jiao,Qian Qing

Abstract

Background: High-throughput screening is desirable for robustly identifying synergistic combinations of drugs, and machine learning based tools that automatically identify relations in published papers would be of great help. To support chemical disease semantic relation extraction, especially for chronic diseases, a chronic disease specific corpus for combination therapy discovery in Chinese (RCorp) was manually annotated.

Methods: We extracted abstracts from a Chinese medical literature server and followed the annotation framework of the BioCreative CDR corpus, with the guidelines modified to cover combination therapy related relations. An annotation tool was incorporated into the standard annotation process.

Results: The resulting RCorp consists of 339 Chinese biomedical articles with 2367 annotated chemicals, 2113 diseases, 237 symptoms, 164 chemical-induce-disease relations, 163 chemical-induce-symptom relations, and 805 chemical-treat-disease relations. Each annotation includes both the mention text spans and normalized concept identifiers. Inter-annotator agreement, measured by F score, is 0.883 for chemical entities and 0.791 for disease entities. The F score for chemical-treat-disease relations is 0.788 after unifying the entity mentions.

Conclusions: We extracted and manually annotated a chronic disease specific corpus for combination therapy discovery in Chinese. Analysis of the corpus supports its quality for the combination therapy related knowledge discovery task. The annotated corpus should be a useful resource for building entity recognition and relation extraction tools. In the future, an evaluation based on the corpus will be held.
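The abstract reports inter-annotator agreement as an F score but does not state the matching criteria. The following is a minimal sketch of one common way to compute such a score, assuming exact matching on (document, start, end, type) tuples and treating one annotator as the reference. The function name and the sample annotations are hypothetical and are not taken from the paper.

```python
from typing import Hashable, Set, Tuple


def pairwise_f_score(
    reference: Set[Hashable], candidate: Set[Hashable]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of `candidate` against `reference`.

    Assumption: two annotations agree only if they are identical tuples
    (exact span and type match). The paper may use a different criterion.
    """
    if not reference or not candidate:
        return 0.0, 0.0, 0.0
    matched = len(reference & candidate)
    precision = matched / len(candidate)
    recall = matched / len(reference)
    if matched == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations: (document id, start offset, end offset, entity type)
annotator_a = {
    ("doc1", 0, 4, "Chemical"),
    ("doc1", 10, 13, "Disease"),
    ("doc2", 5, 9, "Chemical"),
}
annotator_b = {
    ("doc1", 0, 4, "Chemical"),
    ("doc1", 10, 14, "Disease"),  # span boundary differs, so no exact match
    ("doc2", 5, 9, "Chemical"),
}

precision, recall, f1 = pairwise_f_score(annotator_a, annotator_b)
```

Swapping the two annotators exchanges precision and recall but leaves F1 unchanged, which is why F1 can serve as a symmetric agreement measure between two annotators.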

Publisher

Springer Science and Business Media LLC

Subject

Health Informatics,Health Policy,Computer Science Applications

Link

http://link.springer.com/content/pdf/10.1186/s12911-019-0936-3.pdf

References: 25 articles.

1. Neves M. An analysis on the entity annotations in biological corpora. F1000Res. 2014;3:96.

2. Karjalainen E, Repasky GA. Chapter nine - molecular changes during acute myeloid leukemia (AML) evolution and identification of novel treatment strategies through molecular stratification. Prog Mol Biol Transl Sci. 2016;144:383–436.

3. Patel L, Grossberg GT. Combination therapy for Alzheimer's disease. Drugs Aging. 2011;28(7):539–46.

4. Orloff DG. Fixed combination drugs for cardiovascular disease risk reduction: regulatory approach. Am J Cardiol. 2005;96(9 Suppl 1):28–33.

5. Bailey T. Options for combination therapy in type 2 diabetes: comparison of the ADA/EASD position statement and AACE/ACE algorithm. Am J Med. 2013;126(9 Suppl 1):S10–20.

Cited by 2 articles.

1. Exploring relationship between emotion and probiotics with knowledge graphs;Health Information Science and Systems;2022-09-10

2. Editorial: The second international workshop on health natural language processing (HealthNLP 2019);BMC Medical Informatics and Decision Making;2019-12