Construction of an annotated corpus to support biomedical information extraction

Published:2009-10-23 Issue:1 Volume:10
ISSN:1471-2105
Container-title:BMC Bioinformatics
language:en
Short-container-title:BMC Bioinformatics

Author:

Thompson Paul,Iqbal Syed A,McNaught John,Ananiadou Sophia

Abstract

Background: Information Extraction (IE) is a component of text mining that facilitates knowledge discovery by automatically locating instances of interesting biomedical events from huge document collections. As events are usually centred on verbs and nominalised verbs, understanding the syntactic and semantic behaviour of these words is highly important. Corpora annotated with information concerning this behaviour can constitute a valuable resource in the training of IE components and resources.

Results: We have defined a new scheme for annotating sentence-bound gene regulation events, centred on both verbs and nominalised verbs. For each event instance, all participants (arguments) in the same sentence are identified and assigned a semantic role from a rich set of 13 roles tailored to biomedical research articles, together with a biological concept type linked to the Gene Regulation Ontology. To our knowledge, our scheme is unique within the biomedical field in terms of the range of event arguments identified. Using the scheme, we have created the Gene Regulation Event Corpus (GREC), consisting of 240 MEDLINE abstracts, in which events relating to gene regulation and expression have been annotated by biologists. A novel method of evaluating various different facets of the annotation task showed that average inter-annotator agreement rates fall within the range of 66%–90%.

Conclusion: The GREC is a unique resource within the biomedical field, in that it annotates not only core relationships between entities, but also a range of other important details about these relationships, e.g., location, temporal, manner and environmental conditions. As such, it is specifically designed to support bio-specific tool and resource development. It has already been used to acquire semantic frames for inclusion within the BioLexicon (a lexical, terminological resource to aid biomedical text mining). Initial experiments have also shown that the corpus may viably be used to train IE components, such as semantic role labellers. The corpus and annotation guidelines are freely available for academic purposes.

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/1471-2105-10-349.pdf

Cited by 66 articles.

1. Portuguese Framework Semantic Role Labeling Based On Multiple Attention Mechanisms And Bi-LSTM;J APPL SCI ENG;2025

2. Pipelined biomedical event extraction rivaling joint learning;Methods;2024-06

3. Text Analysis for Information Retrieval Using NLP;Lecture Notes in Electrical Engineering;2024

4. A novel corpus of molecular to higher-order events that facilitates the understanding of the pathogenic mechanisms of idiopathic pulmonary fibrosis;Scientific Reports;2023-04-12

5. A survey on clinical natural language processing in the United Kingdom from 2007 to 2022;npj Digital Medicine;2022-12-21