Automatic consistency assurance for literature-based gene ontology annotation-Reference-Cited by-同舟云学术

Automatic consistency assurance for literature-based gene ontology annotation

Published:2021-11-25 Issue:1 Volume:22 Page:
ISSN:1471-2105
Container-title:BMC Bioinformatics
language:en
Short-container-title:BMC Bioinformatics

Author:

Chen Jiyu,Geard Nicholas,Zobel Justin,Verspoor Karin^ORCID

Abstract

Abstract Background Literature-based gene ontology (GO) annotation is a process where expert curators use uniform expressions to describe gene functions reported in research papers, creating computable representations of information about biological systems. Manual assurance of consistency between GO annotations and the associated evidence texts identified by expert curators is reliable but time-consuming, and is infeasible in the context of rapidly growing biological literature. A key challenge is maintaining consistency of existing GO annotations as new studies are published and the GO vocabulary is updated. Results In this work, we introduce a formalisation of biological database annotation inconsistencies, identifying four distinct types of inconsistency. We propose a novel and efficient method using state-of-the-art text mining models to automatically distinguish between consistent GO annotation and the different types of inconsistent GO annotation. We evaluate this method using a synthetic dataset generated by directed manipulation of instances in an existing corpus, BC4GO. We provide detailed error analysis for demonstrating that the method achieves high precision on more confident predictions. Conclusions Two models built using our method for distinct annotation consistency identification tasks achieved high precision and were robust to updates in the GO vocabulary. Our approach demonstrates clear value for human-in-the-loop curation scenarios.

Funder

Australian Research Council

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/s12859-021-04479-9.pdf

Reference40 articles.

1. Gene Ontology Consortium. Gene ontology consortium: going forward. Nucleic Acids Res. 2015;43(D1):1049–56.

2. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al. Gene ontology: tool for the unification of biology. Nat Genet. 2000;25(1):25–9.

3. Zhou N, Jiang Y, Bergquist TR. The CAFA challenge reports improved protein function prediction and new functional. Genome Biol. 2019;20:1.

4. Cozzetto D, Jones D. Computational methods for annotation transfers from sequence. Methods Mol Biol (Clifton, NJ). 2017;1446:55–67.

5. Gene Ontology Consortium. Expansion of the gene ontology knowledgebase and resources. Nucleic Acids Res. 2017;45(D1):331–8.

Cited by 2 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. Auricular acupressure for constipation in adults: a systematic review and meta-analysis;Frontiers in Physiology;2023-10-16

2. Exploring automatic inconsistency detection for literature-based gene ontology annotation;Bioinformatics;2022-06-24