Transcription factors across the Escherichia coli pangenome: a 3D perspective
Author:
Moreno-Hagelsieb Gabriel
Abstract
Motivation
Identification of complete sets of transcription factors (TFs) is a foundational step in the inference of genetic regulatory networks. With high-quality predictions of protein three-dimensional (3D) structures now available, structural comparisons can be used to infer homology beyond what sequence analyses alone allow. This work explores the potential of predicted 3D structures for identifying TFs in the Escherichia coli pangenome.

Results
Comparisons between predicted structures and their experimentally determined counterparts confirmed the high quality of the predictions: most 3D structural alignments had TM-scores well above established structural similarity thresholds, although quality appeared slightly lower for TFs than for other proteins. As expected, structural similarity decreased as sequence similarity decreased, yet most TM-scores remained above the structural similarity threshold. This held regardless of whether the aligned structures were experimental or predicted. Results at the lowest sequence identity levels revealed the potential for 3D structural comparisons to extend homology inference below the “twilight zone” of sequence-based methods. The available predicted 3D structures covered 99.7% of the proteins in the E. coli pangenome, missing only two of the proteins that matched TF domain sequence profiles. Structural analyses increased the number of inferred TFs in the E. coli pangenome by 18% over the number obtained with sequence profiles alone.
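The TM-scores mentioned in the abstract follow a standard definition (Zhang and Skolnick, 2004). For a fixed superposition, each aligned residue pair at distance d_i contributes 1/(1 + (d_i/d0)^2). The contributions are summed and normalized by the length L of the reference structure, with d0 = 1.24·(L - 15)^(1/3) - 1.8 Å. The minimal Python sketch below computes that score for a single given superposition. Tools such as TM-align additionally search for the superposition that maximizes the score, which this sketch does not do. The 0.5 same-fold cutoff, the 0.5 Å floor on d0, and all function names are illustrative assumptions, not values or code taken from this work.

```python
import math
from typing import Sequence

# Assumed cutoff: TM-score > 0.5 is commonly read as "same fold".
# The abstract does not state which threshold was used.
SAME_FOLD_THRESHOLD = 0.5


def d0(length: int) -> float:
    """Length-dependent distance scale in Angstroms (Zhang & Skolnick, 2004)."""
    # Guard against a negative base for short chains, then floor d0 at 0.5 A
    # (assumption: mirrors common practice for very short proteins).
    raw = 1.24 * max(length - 15, 0) ** (1.0 / 3.0) - 1.8
    return max(raw, 0.5)


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score for ONE fixed superposition (no optimization over superpositions).

    distances: Angstrom distances between aligned residue pairs after superposition.
    target_length: residue count of the reference structure used for normalization.
    """
    scale = d0(target_length)
    total = sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances)
    return total / target_length


def same_fold(score: float, threshold: float = SAME_FOLD_THRESHOLD) -> bool:
    """True when the score is strictly above the (assumed) same-fold cutoff."""
    return score > threshold
```

For example, `tm_score([0.0] * 100, 100)` returns 1.0 for a perfect superposition of all 100 reference residues. Aligning only 50 of them perfectly gives 0.5, which sits exactly at the assumed cutoff and therefore does not count as above it. Normalizing by the reference length is what keeps partial alignments from scoring as highly as full-length matches.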
Publisher: Cold Spring Harbor Laboratory