Author:
Zhang Shijie, Zhang Teng, Fu Yuan
Abstract
Traditional evolutionary biology research relies mainly on sequence information to infer evolutionary relationships between genes or proteins. In contrast, protein structural information has long been overlooked, although structures are more conserved than sequences and more closely linked to function. To address this gap, we conducted a proteome-wide structural analysis using experimental and computed protein structures for organisms from the three domains of life: Homo sapiens (eukarya), Escherichia coli (bacteria), and Methanocaldococcus jannaschii (archaea). We reveal the distribution of structural similarity and sequence identity at the genomic level and characterize the twilight zone, where signals obtained from sequence alignment are blurred and evolutionary relationships cannot be inferred unambiguously. We find that structurally similar homologous protein pairs in the twilight zone account for ∼0.004%–0.021% of all possible protein pair combinations, which translates to ∼8%–32% of the protein-coding genes, depending on the species under comparison. In addition, by comparing the structural homologs, we show that human proteins involved in energy supply are more similar to their E. coli homologs, whereas proteins related to the central dogma are more similar to their M. jannaschii homologs. We also identify a bacterial GPCR homolog in the E. coli proteome that displays a distinctive domain architecture. Our results shed light on the characteristics of the twilight zone and the origin of different pathways from a protein structure perspective, highlighting an exciting new frontier in evolutionary biology.
Funder
National Natural Science Foundation of China
Tianjin Applied Basic Research Diversified Investment Foundation
Publisher
Cold Spring Harbor Laboratory
Subject
Genetics (clinical), Genetics