Affiliation:
1. Department of Biological Sciences, University of North Texas, Denton, Texas, USA
2. BioDiscovery Institute, University of North Texas, Denton, Texas, USA
Abstract
A fundamental problem in protein evolutionary biology is determining the degree and nature of evolutionary relatedness among homologous proteins that have diverged to the point where they share less than 30% amino acid identity yet retain similar structures and/or functions. Such proteins are said to lie within the “Twilight Zone” of amino acid identity. Many researchers have leveraged experimentally determined structures to classify proteins in the Twilight Zone, but such endeavors can be highly time‐consuming and prohibitively expensive for large‐scale analyses. Motivated by this problem, here we use molecular weight–hydrophobicity physicochemical dynamic time warping (MWHP DTW) to quantify the similarity of simulated and real‐world homologous protein domains. MWHP DTW is a physicochemical method that requires only the amino acid sequence to quantify the similarity of related proteins, and it is particularly useful within the Twilight Zone because of its resilience to primary sequence substitution saturation. This is a step forward in determining relatedness among Twilight Zone proteins and, most notably, allows random similarity to be discriminated from true homology in the 0%–20% identity range. The method was previously presented expeditiously just after the outbreak of COVID‐19 because it was able to functionally cluster ACE2‐binding betacoronavirus receptor binding domains (RBDs), a task that has been elusive using standard techniques. Here we show that one reason MWHP DTW is effective for comparisons within the Twilight Zone is that it can uncover hidden homology by exploiting physicochemical conservation, a signal that protein sequence alignment algorithms are inherently incapable of exploiting within the Twilight Zone. Further, we present an extended definition of the Twilight Zone that incorporates the dynamic relationship between structural, physicochemical, and sequence‐based metrics.
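The general idea of comparing sequences by physicochemical profile rather than by residue identity can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each residue is encoded as a two‐component vector (free amino acid molecular weight and Kyte‐Doolittle hydropathy, each z‐scored across the 20 standard residues), that the local cost is Euclidean distance, and that the classic unconstrained DTW recurrence is used. The names `encode`, `dtw_distance`, and `mwhp_dtw` are hypothetical, and the scaling and cost choices are assumptions that the abstract does not specify.

```python
import math

# Free amino acid molecular weights in daltons (standard textbook values).
MW = {
    "A": 89.09, "R": 174.20, "N": 132.12, "D": 133.10, "C": 121.16,
    "Q": 146.15, "E": 147.13, "G": 75.07, "H": 155.16, "I": 131.17,
    "L": 131.17, "K": 146.19, "M": 149.21, "F": 165.19, "P": 115.13,
    "S": 105.09, "T": 119.12, "W": 204.23, "Y": 181.19, "V": 117.15,
}

# Kyte-Doolittle hydropathy index (assumed scale; the source does not name one).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def zscore_table(table):
    """Rescale a per-residue table to zero mean and unit variance."""
    values = list(table.values())
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return {residue: (v - mean) / sd for residue, v in table.items()}


MW_Z = zscore_table(MW)
KD_Z = zscore_table(KD)


def encode(seq):
    """Map a protein sequence to a list of (weight, hydropathy) vectors."""
    return [(MW_Z[r], KD_Z[r]) for r in seq.upper()]


def dtw_distance(a, b):
    """Classic DTW cost between two vector series, Euclidean local cost."""
    if not a or not b:
        raise ValueError("both series must be non-empty")
    inf = float("inf")
    prev = [inf] * (len(b) + 1)
    prev[0] = 0.0  # only the empty-vs-empty prefix has zero cost
    for i in range(1, len(a) + 1):
        curr = [inf] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match step.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(b)]


def mwhp_dtw(seq_a, seq_b):
    """Hypothetical wrapper: DTW cost between two encoded sequences."""
    return dtw_distance(encode(seq_a), encode(seq_b))


if __name__ == "__main__":
    # Toy peptides only; these are not data from the study.
    print(mwhp_dtw("ILVIL", "LIVLI"))  # physicochemically conservative swaps
    print(mwhp_dtw("ILVIL", "DKEDK"))  # hydrophobic vs. charged residues
```

In this sketch, two sequences with no positional identity can still score as close when their substitutions are physicochemically conservative, which is the property the abstract attributes to the method in the Twilight Zone. The raw cost grows with sequence length, so a real analysis would likely need some normalization, which is left out here.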