Chemical identification and indexing in full-text articles: an overview of the NLM-Chem track at BioCreative VII

Author:

Leaman Robert [1], Islamaj Rezarta [1], Adams Virginia [2], Alliheedi Mohammed A [3], Almeida João Rafael [4,5], Antunes Rui [4], Bevan Robert [6], Chang Yung-Chun [7], Erdengasileng Arslan [8], Hodgskiss Matthew [6], Ida Ryuki [9], Kim Hyunjae [10], Li Keqiao [8], Mercer Robert E [11], Mertová Lukrécia [12], Mobasher Ghadeer [12,13], Shin Hoo-Chang [2], Sung Mujeen [10], Tsujimura Tomoki [9], Yeh Wen-Chao [14], Lu Zhiyong [1]

Affiliation:

1. National Center for Biotechnology Information (NCBI), National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA

2. NVIDIA, 2788 San Tomas Expressway, Santa Clara, CA 95051, USA

3. Department of Computer Science, Al Baha University, 4781 King Fahd Rd, Al Aqiq 65779, Saudi Arabia

4. Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Campus Universitário de Santiago, Aveiro 3810-193, Portugal

5. Department of Information and Communications Technologies, University of A Coruña, Camiño do Lagar de Castro, A Coruña 15008, Spain

6. Informatics Department, Medicines Discovery Catapult, Alderley Park, Block 35, Mereside, Macclesfield SK10 4ZF, UK

7. Graduate Institute of Data Science, Taipei Medical University, No. 172-1, Section 2, Keelung Rd, Da’an District, Taipei City, Taipei 106, Taiwan

8. Department of Statistics, Florida State University, 117 N. Woodward Ave, Tallahassee, FL 32306, USA

9. Computational Intelligence Laboratory, Toyota Technological Institute, 2-12-1 Hisakata, Tempaku-ku, Nagoya, Aichi 468-8511, Japan

10. Department of Computer Science and Engineering, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul 02841, South Korea

11. Department of Computer Science, The University of Western Ontario, Room 355, Middlesex College, Ontario, London N6A 5B7, Canada

12. Scientific Databases and Visualization Group, Heidelberg Institute for Theoretical Studies (HITS gGmbH), Schloss-Wolfsbrunnenweg 35, Heidelberg 69118, Germany

13. Institute of Computer Science, Heidelberg University, Im Neuenheimer Feld 205, Heidelberg 69120, Germany

14. Institute of Information Systems and Applications, National Tsing Hua University, No. 101, Section 2, Kuang-Fu Road, Hsinchu 30013, Taiwan

Abstract

The BioCreative National Library of Medicine (NLM)-Chem track calls for a community effort to fine-tune automated recognition of chemical names in the biomedical literature. Chemicals are one of the most searched biomedical entities in PubMed, and, as highlighted during the coronavirus disease 2019 pandemic, their identification may significantly advance research in multiple biomedical subfields. While previous community challenges focused on identifying chemical names mentioned in titles and abstracts, the full text contains valuable additional detail. We therefore organized the BioCreative NLM-Chem track as a community effort to address automated chemical entity recognition in full-text articles. The track consisted of two tasks: (i) chemical identification and (ii) chemical indexing. The chemical identification task required predicting all chemicals mentioned in recently published full-text articles, both span [i.e. named entity recognition (NER)] and normalization (i.e. entity linking), using Medical Subject Headings (MeSH). The chemical indexing task required identifying which chemicals reflect topics for each article and should therefore appear in the listing of MeSH terms for the document in the MEDLINE article indexing. This manuscript summarizes the BioCreative NLM-Chem track and post-challenge experiments. We received a total of 85 submissions from 17 teams worldwide. The highest performance achieved for the chemical identification task was 0.8672 F-score (0.8759 precision and 0.8587 recall) for strict NER performance and 0.8136 F-score (0.8621 precision and 0.7702 recall) for strict normalization performance. The highest performance achieved for the chemical indexing task was 0.6073 F-score (0.7417 precision and 0.5141 recall).
This community challenge demonstrated that (i) the current substantial achievements in deep learning technologies can be utilized to improve automated prediction accuracy further and (ii) the chemical indexing task is substantially more challenging. We look forward to further developing biomedical text-mining methods to respond to the rapid growth of biomedical literature. The NLM-Chem track dataset and other challenge materials are publicly available at https://ftp.ncbi.nlm.nih.gov/pub/lu/BC7-NLM-Chem-track/.
Database URL: https://ftp.ncbi.nlm.nih.gov/pub/lu/BC7-NLM-Chem-track/
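
The F-scores quoted in the abstract appear consistent with the balanced F-score, i.e. the harmonic mean of precision and recall. The abstract does not state the formula, so the sketch below assumes that definition; the helper name `f_score` and the `reported` dictionary are illustrative only and are not taken from the track's evaluation scripts.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-score) as quoted in the abstract.
reported = {
    "strict NER": (0.8759, 0.8587, 0.8672),
    "strict normalization": (0.8621, 0.7702, 0.8136),
    "chemical indexing": (0.7417, 0.5141, 0.6073),
}

for task, (p, r, f) in reported.items():
    # Compare the recomputed value with the reported one at four decimals.
    print(f"{task}: computed {f_score(p, r):.4f}, reported {f:.4f}")
```

Under this assumption, each recomputed value agrees with the reported F-score to four decimal places. The gap between precision and recall on the indexing task (0.7417 versus 0.5141) is what pulls its F-score well below either of the identification scores.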

Funder

Foundation for Science and Technology

Natural Sciences and Engineering Research Council of Canada

H2020 Marie Sklodowska-Curie Actions

U.S. National Library of Medicine

Albaha University

Publisher

Oxford University Press (OUP)

Subject

General Agricultural and Biological Sciences; General Biochemistry, Genetics and Molecular Biology; Information Systems


Cited by 3 articles.
