NLM-Chem, a new resource for chemical entity recognition in PubMed full text literature

Published: 2021-03-25
Journal: Scientific Data (Sci Data), Volume 8, Issue 1
ISSN: 2052-4463
Language: English

Authors:

Islamaj Rezarta, Leaman Robert, Kim Sun, Kwon Dongseop, Wei Chih-Hsuan, Comeau Donald C., Peng Yifan, Cissel David, Coss Cathleen, Fisher Carol, Guzman Rob, Kochar Preeti Gokal, Koppel Stella, Trinh Dorothy, Sekiya Keiko, Ward Janice, Whitman Deborah, Schmidt Susan, Lu Zhiyong

Abstract

Automatically identifying chemical and drug names in scientific publications advances information access for this important class of entities in a variety of biomedical disciplines by enabling improved retrieval and linkage to related concepts. While current methods for tagging chemical entities were developed for the article title and abstract, their performance in the full article text is substantially lower. However, the full text frequently contains more detailed chemical information, such as the properties of chemical compounds, their biological effects and interactions with diseases, genes and other chemicals. We therefore present the NLM-Chem corpus, a full-text resource to support the development and evaluation of automated chemical entity taggers. The NLM-Chem corpus consists of 150 full-text articles, doubly annotated by ten expert NLM indexers, with ~5000 unique chemical name annotations, mapped to ~2000 MeSH identifiers. We also describe a substantially improved chemical entity tagger, with automated annotations for all of PubMed and PMC freely accessible through the PubTator web-based interface and API. The NLM-Chem corpus is freely available.

Funder

U.S. Department of Health & Human Services | NIH | U.S. National Library of Medicine

NIH Intramural Research Program, National Library of Medicine

Publisher

Springer Science and Business Media LLC

Subject

Library and Information Sciences; Statistics, Probability and Uncertainty; Computer Science Applications; Education; Information Systems; Statistics and Probability

Link

http://www.nature.com/articles/s41597-021-00875-1.pdf

References (31 articles; first 5 shown)

1. Islamaj Dogan, R., Murray, G. C., Névéol, A. & Lu, Z. Understanding PubMed user search behavior through log analysis. Database (Oxford) 2009, bap018, https://doi.org/10.1093/database/bap018 (2009).

2. Krallinger, M., Rabal, O., Lourenço, A., Oyarzabal, J. & Valencia, A. Information Retrieval and Text Mining Technologies for Chemistry. Chem Rev 117, 7673–7761, https://doi.org/10.1021/acs.chemrev.6b00851 (2017).

3. Krallinger, M. et al. The CHEMDNER corpus of chemicals and drugs and its annotation principles. J Cheminform 7, S2, https://doi.org/10.1186/1758-2946-7-S1-S2 (2015).

4. Hirschman, L. et al. Text mining for the biocuration workflow. Database (Oxford) 2012, bas020, https://doi.org/10.1093/database/bas020 (2012).

5. Krallinger, M. et al. CHEMDNER: The drugs and chemical names extraction challenge. J Cheminform 7, S1, https://doi.org/10.1186/1758-2946-7-S1-S1 (2015).

Cited by 36 articles (first 5 shown)

1. EnzChemRED, a rich enzyme chemistry relation extraction dataset. Scientific Data, 2024-09-09.

2. Chemical entity normalization for successful translational development of Alzheimer’s disease and dementia therapeutics. Journal of Biomedical Semantics, 2024-07-31.

3. Application of machine reading comprehension techniques for named entity recognition in materials science. Journal of Cheminformatics, 2024-07-02.

4. Vocabulary Matters: An Annotation Pipeline and Four Deep Learning Algorithms for Enzyme Named Entity Recognition. Journal of Proteome Research, 2024-05-11.

5. PubTator 3.0: an AI-powered literature resource for unlocking biomedical knowledge. Nucleic Acids Research, 2024-04-04.