GNormPlus: An Integrative Approach for Tagging Genes, Gene Families, and Protein Domains

Published: 2015 Volume: 2015 Page: 1-7
ISSN: 2314-6133
Container-title: BioMed Research International
Language: en
Short-container-title: BioMed Research International

Author:

Wei Chih-Hsuan¹, Kao Hung-Yu², Lu Zhiyong¹

Affiliation:

1. National Center for Biotechnology Information (NCBI), 8600 Rockville Pike, Bethesda, MD 20894, USA

2. Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan 701, Taiwan

Abstract

The automatic recognition of gene names and their associated database identifiers from biomedical text has been widely studied in recent years, as these tasks play an important role in many downstream text-mining applications. Despite significant previous research, only a small number of tools are publicly available, and these tools are typically restricted to detecting only mention-level gene names or only document-level gene identifiers. In this work, we report GNormPlus: an end-to-end, open-source system that handles both gene mention and identifier detection. We created a new corpus of 694 PubMed articles to support our development of GNormPlus, containing manual annotations for not only gene names and their identifiers, but also closely related concepts useful for gene name disambiguation, such as gene families and protein domains. GNormPlus integrates several advanced text-mining techniques, including SimConcept for resolving composite gene names. As a result, GNormPlus compares favorably to other state-of-the-art methods when evaluated on two widely used public benchmarking datasets, achieving 86.7% F1-score on the BioCreative II Gene Normalization task dataset and 50.1% F1-score on the BioCreative III Gene Normalization task dataset. The GNormPlus source code and its annotated corpus are freely available, and the results of applying GNormPlus to the entire PubMed are freely accessible through our web-based tool PubTator.

Publisher

Hindawi Limited

Subject

General Immunology and Microbiology; General Biochemistry, Genetics and Molecular Biology; General Medicine

Link

http://downloads.hindawi.com/journals/bmri/2015/918710.pdf

Reference: 46 articles.

1. PubMed and beyond: a survey of web tools for searching biomedical literature

2. Frontiers of biomedical text mining: current progress

3. Seeking a New Biology through Text Mining

4. Mining the Biomedical Literature in the Genomic Era: An Overview

5. Facts from Text—Is Text Mining Ready to Deliver?

Cited by 155 articles.

1. BioTextQuest v2.0: An evolved tool for biomedical literature mining and concept discovery;Computational and Structural Biotechnology Journal;2024-12

2. EnzChemRED, a rich enzyme chemistry relation extraction dataset;Scientific Data;2024-09-09

3. MiPRIME: an integrated and intelligent platform for mining primer and probe sequences of microbial species;Bioinformatics;2024-07-01

4. Ensemble pretrained language models to extract biomedical knowledge from literature;Journal of the American Medical Informatics Association;2024-03-23

5. GPDminer: a tool for extracting named entities and analyzing relations in biological literature;BMC Bioinformatics;2024-03-06