Abstract
The majority of microbial genomes have yet to be cultured, and most proteins predicted from microbial genomes or sequenced from the environment cannot be functionally annotated. As a result, current computational approaches to describing microbial systems rely on incomplete reference databases that cannot adequately capture the full functional diversity of the microbial tree of life, limiting our ability to model high-level features of biological sequences. The scientific community needs a means to capture the functionally and evolutionarily relevant features underlying biology, independent of our incomplete reference databases. Such a model can form the basis for transfer learning tasks, enabling downstream applications in environmental microbiology, medicine, and bioengineering. Here we present LookingGlass, a deep learning model capturing a “universal language of life”. LookingGlass encodes contextually aware, functionally and evolutionarily relevant representations of short DNA reads, distinguishing reads of disparate function, homology, and environmental origin. We demonstrate that LookingGlass can be fine-tuned to perform a range of tasks: identifying novel oxidoreductases, predicting enzyme optimal temperature, and recognizing the reading frames of DNA sequence fragments. LookingGlass is the first contextually aware, general-purpose pre-trained “biological language” representation model for short-read DNA sequences. It enables functionally relevant representations of otherwise unknown and unannotated sequences, shedding light on the microbial dark matter that dominates life on Earth.
Availability
The pretrained LookingGlass model and the transfer learning-derived models demonstrated in this paper are available in the LookingGlass release v1.0. The open-source fastBio GitHub repository and Python package provide classes and functions for training and fine-tuning deep learning models with biological data.
Code for reproducing the analyses presented in this paper is available as an open-source GitHub repository.
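Reading-frame recognition is one of the fine-tuning tasks named in the abstract. For reference only, a short DNA read has six candidate reading frames: three codon offsets on the forward strand and three on the reverse complement. The sketch below is plain Python and is not taken from LookingGlass or fastBio; the function names `reverse_complement` and `reading_frames` and the frame labels `+1` to `-3` are hypothetical conventions chosen for illustration. A classifier for this task would plausibly take the read as input and predict which of these six frames is the true coding frame.

```python
# Illustrative sketch only (not LookingGlass/fastBio code): enumerate the six
# possible reading frames of a short DNA read.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def reading_frames(read: str) -> dict[str, list[str]]:
    """Split a read into codons for each of the six frames.

    Keys '+1', '+2', '+3' use forward-strand offsets 0, 1, 2;
    keys '-1', '-2', '-3' use the same offsets on the reverse complement.
    Trailing bases that do not fill a complete codon are dropped.
    """
    read = read.upper()
    frames = {}
    for strand, seq in (("+", read), ("-", reverse_complement(read))):
        for offset in range(3):
            # Step through the sequence three bases at a time from this offset.
            codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
            frames[f"{strand}{offset + 1}"] = codons
    return frames


if __name__ == "__main__":
    for name, codons in reading_frames("ATGGCCATTGTAATG").items():
        print(name, " ".join(codons))
```

For the 15-base example read, frame `+1` yields `ATG GCC ATT GTA ATG` and frame `-1` (on the reverse complement `CATTACAATGGCCAT`) yields `CAT TAC AAT GGC CAT`.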
Publisher
Cold Spring Harbor Laboratory