Authors:
Donvil Linus, Housmans Joëlle A.J., Peeters Eveline, Vranken Wim, Orlando Gabriele
Abstract
The rapid advancement of next-generation sequencing technologies has generated an immense volume of genetic data. However, these data are unevenly distributed: well-studied organisms are disproportionately represented, while others, such as archaea, remain significantly underexplored. The study of archaea is particularly challenging because of the extreme environments they inhabit and the difficulty of culturing them in the laboratory. Despite these challenges, archaea likely represent a crucial evolutionary link between eukaryotic and prokaryotic organisms, and their investigation could shed light on the early stages of life on Earth. Yet a significant portion of archaeal proteins is annotated with limited or inaccurate information.

Among the various classes of archaeal proteins, DNA-binding proteins are of particular importance. Although they make up a large portion of every known proteome, their identification in archaea is complicated by the substantial evolutionary divergence between archaea and other, better-studied organisms.

To address the challenge of identifying DNA-binding proteins in archaea, we developed Xenusia, a neural network-based tool that can screen entire archaeal proteomes for DNA-binding proteins. Xenusia has proven effective across diverse datasets, including metagenomic data, where it identified novel DNA-binding proteins, and its predictions were validated experimentally.

Xenusia is available as a PyPI package, with source code accessible at https://github.com/grogdrinker/xenusia, and as a Google Colab web server application at https://colab.research.google.com/drive/1c4eb4sEz8OsBqHL62XDFrqmwa7CxImww?usp=sharing.
Publisher
Cold Spring Harbor Laboratory