Overview of the BioCreative III Workshop-Reference-Cited by-同舟云学术

Overview of the BioCreative III Workshop

Published:2011-10-03 Issue:S8 Volume:12 Page:
ISSN:1471-2105
Container-title:BMC Bioinformatics
language:en
Short-container-title:BMC Bioinformatics

Author:

Arighi Cecilia N,Lu Zhiyong,Krallinger Martin,Cohen Kevin B,Wilbur W John,Valencia Alfonso,Hirschman Lynette,Wu Cathy H

Abstract

Abstract Background The overall goal of the BioCreative Workshops is to promote the development of text mining and text processing tools which are useful to the communities of researchers and database curators in the biological sciences. To this end BioCreative I was held in 2004, BioCreative II in 2007, and BioCreative II.5 in 2009. Each of these workshops involved humanly annotated test data for several basic tasks in text mining applied to the biomedical literature. Participants in the workshops were invited to compete in the tasks by constructing software systems to perform the tasks automatically and were given scores based on their performance. The results of these workshops have benefited the community in several ways. They have 1) provided evidence for the most effective methods currently available to solve specific problems; 2) revealed the current state of the art for performance on those problems; 3) and provided gold standard data and results on that data by which future advances can be gauged. This special issue contains overview papers for the three tasks of BioCreative III. Results The BioCreative III Workshop was held in September of 2010 and continued the tradition of a challenge evaluation on several tasks judged basic to effective text mining in biology, including a gene normalization (GN) task and two protein-protein interaction (PPI) tasks. In total the Workshop involved the work of twenty-three teams. Thirteen teams participated in the GN task which required the assignment of EntrezGene IDs to all named genes in full text papers without any species information being provided to a system. Ten teams participated in the PPI article classification task (ACT) requiring a system to classify and rank a PubMed® record as belonging to an article either having or not having “PPI relevant” information. Eight teams participated in the PPI interaction method task (IMT) where systems were given full text documents and were required to extract the experimental methods used to establish PPIs and a text segment supporting each such method. Gold standard data was compiled for each of these tasks and participants competed in developing systems to perform the tasks automatically. BioCreative III also introduced a new interactive task (IAT), run as a demonstration task. The goal was to develop an interactive system to facilitate a user’s annotation of the unique database identifiers for all the genes appearing in an article. This task included ranking genes by importance (based preferably on the amount of described experimental information regarding genes). There was also an optional task to assist the user in finding the most relevant articles about a given gene. For BioCreative III, a user advisory group (UAG) was assembled and played an important role 1) in producing some of the gold standard annotations for the GN task, 2) in critiquing IAT systems, and 3) in providing guidance for a future more rigorous evaluation of IAT systems. Six teams participated in the IAT demonstration task and received feedback on their systems from the UAG group. Besides innovations in the GN and PPI tasks making them more realistic and practical and the introduction of the IAT task, discussions were begun on community data standards to promote interoperability and on user requirements and evaluation metrics to address utility and usability of systems. Conclusions In this paper we give a brief history of the BioCreative Workshops and how they relate to other text mining competitions in biology. This is followed by a synopsis of the three tasks GN, PPI, and IAT in BioCreative III with figures for best participant performance on the GN and PPI tasks. These results are discussed and compared with results from previous BioCreative Workshops and we conclude that the best performing systems for GN, PPI-ACT and PPI-IMT in realistic settings are not sufficient for fully automatic use. This provides evidence for the importance of interactive systems and we present our vision of how best to construct an interactive system for a GN or PPI like task in the remainder of the paper.

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/1471-2105-12-S8-S1.pdf

Reference25 articles.

1. Grishman R, Sundheim B: Message Understanding Conference - 6: A Brief History. 16th International Conference on Computational Linguistics Kopenhagen 1996, 466–471.

2. Krallinger M, Valencia A, Hirschman L: Linking genes to literature: text mining, information extraction, and retrieval applications for biology. Genome Biol 2008, 9(Suppl 2):S8. 10.1186/gb-2008-9-s2-s8

3. Friedman C, Kra P, Rzhetsky A: Two biomedical sublanguages: a description based on the theories of Zellig Harris. J Biomed Inform 2002, 35: 222–235. 10.1016/S1532-0464(03)00012-1

4. Yeh A, Hirschman L, Morgan A: Background and overview for KDD Cup 2002 task 1: information extraction from biomedical articles. SIGKDD Explor Newsl 2002, 4: 87–89. 10.1145/772862.772873

5. Hersh W, Voorhees E: TREC genomics special issue overview. Inf Retr 2009, 12: 1–15. 10.1007/s10791-008-9076-6

Cited by 101 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. DMLS: an automated pipeline to extract the Drosophila modular transcription regulators and targets from massive literature articles;Database;2024

2. NetSyn: genomic context exploration of protein families;2023-02-15

3. Ontology-Aware Biomedical Relation Extraction;2022-03-25

4. Overview of ChEMU 2022 Evaluation Campaign: Information Extraction in Chemical Patents;Lecture Notes in Computer Science;2022

5. CafeteriaSA corpus: scientific abstracts annotated across different food semantic resources;Database;2022-01-01