Unification of functional annotation descriptions using text mining
Author:
Queirós Pedro 1, Novikova Polina 1, Wilmes Paul 1, May Patrick 2
Affiliation:
1. Systems Ecology, Esch-sur-Alzette, Luxembourg
2. Bioinformatics Core, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 4362, Esch-sur-Alzette, Luxembourg
Abstract
A common approach to genome annotation uses homology-based tools to predict the functional role of proteins. The quality of functional annotations depends on the reference data used, so choosing appropriate sources is crucial. Unfortunately, no single reference data source can be universally considered the gold standard; using multiple references could therefore increase annotation quality and coverage. However, this comes with challenges, particularly the introduction of redundant and exclusive annotations. Through text mining it is possible to identify highly similar functional descriptions, which strengthens confidence in the final protein functional annotation and yields a redundancy-free output. Here we present UniFunc, a text mining approach that detects similar functional descriptions with high precision. UniFunc was built as a small module and can be used independently or integrated into protein function annotation pipelines. By removing the need to individually analyse and compare annotation results, UniFunc streamlines the complementary use of multiple reference datasets.
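To make the idea of detecting similar functional descriptions concrete, the following is a minimal sketch using token-set (Jaccard) similarity. It is not UniFunc's algorithm or API, which the abstract does not detail. The function names, the stopword list, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import re

# Hypothetical stopword list for illustration only; not taken from UniFunc.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "for", "with", "protein"}


def tokenize(description: str) -> set:
    """Lowercase a description and return its set of informative tokens."""
    tokens = re.findall(r"[a-z0-9]+", description.lower())
    return {token for token in tokens if token not in STOPWORDS}


def jaccard_similarity(desc_a: str, desc_b: str) -> float:
    """Return |A ∩ B| / |A ∪ B| over the token sets of two descriptions."""
    tokens_a, tokens_b = tokenize(desc_a), tokenize(desc_b)
    if not tokens_a or not tokens_b:
        return 0.0  # nothing informative to compare
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def are_similar(desc_a: str, desc_b: str, threshold: float = 0.5) -> bool:
    """Flag two descriptions as redundant when similarity meets the
    (assumed) threshold."""
    return jaccard_similarity(desc_a, desc_b) >= threshold


if __name__ == "__main__":
    # Same function, written differently by two reference sources.
    print(are_similar("Glucose-6-phosphate isomerase",
                      "glucose 6 phosphate isomerase"))
    # Unrelated functions sharing only the word "subunit".
    print(are_similar("DNA polymerase III subunit alpha",
                      "ATP synthase subunit beta"))
```

In an annotation pipeline, a check of this kind could be applied pairwise to the descriptions returned by different reference databases for the same protein, so that near-identical descriptions are collapsed into one. A real tool would likely need more careful handling of identifiers, synonyms, and term weighting than this sketch provides.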
Publisher
Walter de Gruyter GmbH
Subject
Clinical Biochemistry, Molecular Biology, Biochemistry
Cited by
6 articles.