Affiliation:
1. Institute for Biological Interfaces 5 (Institut für Biologische Grenzflächen IBG 5), Karlsruhe Institute of Technology (KIT), 76344 Eggenstein-Leopoldshafen, Germany
2. Wellcome Trust Sanger Institute, Hinxton, Saffron Walden CB10 1RQ, United Kingdom
Abstract
Annotating protein sequences according to their biological functions is one of the key steps in understanding microbial diversity, metabolic potentials, and evolutionary histories. However, even in the best-studied prokaryotic genomes, not all proteins can be characterized by classical in vivo, in vitro, and/or in silico methods, a challenge rapidly growing alongside the advent of next-generation sequencing technologies and their enormous extension of ‘omics’ data in public databases. These so-called hypothetical proteins (HPs) represent a huge knowledge gap and hidden potential for biotechnological applications. Opportunities for leveraging the available ‘Big Data’ have recently proliferated with the use of artificial intelligence (AI). Here, we review the aims and methods of protein annotation and explain the different principles behind machine and deep learning algorithms including recent research examples, in order to assist both biologists wishing to apply AI tools in developing comprehensive genome annotations and computer scientists who want to contribute to this leading edge of biological research.
Funder
Karlsruhe Institute of Technology
Publisher
Oxford University Press (OUP)
Subject
Infectious Diseases, Microbiology
Cited by
9 articles.