Applying citizen science to gene, drug and disease relationship extraction from biomedical abstracts

Published:2019-09-03 Issue:4 Volume:36 Page:1226-1233
ISSN:1367-4803
Container-title:Bioinformatics
language:en

Author:

Tsueng Ginger¹, Nanis Max¹, Fouquier Jennifer T¹, Mayers Michael¹, Good Benjamin M¹, Su Andrew I¹

Affiliation:

1. Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA

Abstract

Motivation: Biomedical literature is growing at a rate that outpaces our ability to harness the knowledge contained therein. To mine valuable inferences from the large volume of literature, many researchers use information extraction algorithms to harvest information in biomedical texts. Information extraction is usually accomplished via a combination of manual expert curation and computational methods. Advances in computational methods usually depend on the time-consuming generation of gold standards by a limited number of expert curators. Citizen science is public participation in scientific research. We previously found that citizen scientists are willing and capable of performing named entity recognition of disease mentions in biomedical abstracts, but did not know if this was true with relationship extraction (RE).

Results: In this article, we introduce the Relationship Extraction Module of the web-based application Mark2Cure (M2C) and demonstrate that citizen scientists can perform RE. We confirm the importance of accurate named entity recognition on user performance of RE and identify design issues that impacted data quality. We find that the data generated by citizen scientists can be used to identify relationship types not currently available in the M2C Relationship Extraction Module. We compare the citizen science-generated data with algorithm-mined data and identify ways in which the two approaches may complement one another. We also discuss opportunities for future improvement of this system, as well as the potential synergies between citizen science, manual biocuration and natural language processing.

Availability and implementation: Mark2Cure platform: https://mark2cure.org; Mark2Cure source code: https://github.com/sulab/mark2cure; and data and analysis code for this article: https://github.com/gtsueng/M2C_rel_nb.

Supplementary information: Supplementary data are available at Bioinformatics online.

Funder

US National Institutes of Health

Scripps Translational Science Institute

NIH-NCATS Clinical and Translational Science Award

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability

Link

http://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btz678/29522887/btz678.pdf

Reference: 59 articles.

1. Radio galaxy zoo: discovery of a poor cluster through a giant wide-angle tail radio galaxy;Banfield;Mon. Not. R. Astron. Soc,2016

Cited by 4 articles.

1. Outbreak.info Research Library: a standardized, searchable platform to discover and explore COVID-19 resources;Nature Methods;2023-02-23

2. Outbreak.info Research Library: A standardized, searchable platform to discover and explore COVID-19 resources;2022-01-21

3. Building a pipeline to solicit expert knowledge from the community to aid gene summary curation;Database;2020-01-01

4. A hybrid approach toward biomedical relation extraction training corpora: combining distant supervision with crowdsourcing;Database;2020