Abstract
The genetic code uses three-nucleotide units (codons) to encode each amino acid in a protein. Insertions or deletions whose length is not a multiple of three shift the reading frame, resulting in substantially different protein sequences. These events are disruptive but can also create variability that is important for evolution. Previous studies suggest that the genetic code and gene sequences have evolved to minimize frameshift effects, so that frameshifted proteins retain physicochemical properties similar to those of their reference proteins. Here, we focused on tandem repeat sequences, which are known frameshift hotspots. Using bioinformatics tools, we compared reference and frameshifted protein sequences within tandem repeats across 50 prokaryotic and eukaryotic proteomes. Our analysis revealed several sequence-structure-function correlations. We showed that, in contrast to the general tendency, frameshifts within these regions, especially those with short repeats, lead to significant changes: increased hydrophobicity and arginine content, and the emergence of new aggregation-prone and transmembrane regions. Overall, frameshifts have stronger effects on tandem repeat regions than on non-repetitive sequences, and can therefore be a primary cause of altered function, altered cellular localization, and the development of various pathologies.
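As an illustration of the mechanism described above, the following minimal sketch translates a short tandem repeat in all three forward reading frames. It is not the pipeline used in the study: the repeat unit `GAG`, the function name `translate`, and the variable names are assumptions chosen for illustration, and the standard genetic code is assumed throughout.

```python
from itertools import product

# Standard genetic code, written compactly: codons are enumerated with
# bases in the order T, C, A, G (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate a DNA string starting at offset `frame` (0, 1, or 2).

    Trailing nucleotides that do not fill a complete codon are ignored.
    Stop codons are rendered as '*'.
    """
    seq = dna.upper()[frame:]
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# Hypothetical tandem repeat: six copies of the glutamate codon GAG.
repeat_dna = "GAG" * 6

reference = translate(repeat_dna, 0)  # in-frame reading
shifted_1 = translate(repeat_dna, 1)  # reading frame shifted by one nucleotide
shifted_2 = translate(repeat_dna, 2)  # reading frame shifted by two nucleotides

print(reference)  # EEEEEE
print(shifted_1)  # RRRRR
print(shifted_2)  # GGGGG
```

In this toy case, shifting the frame by one nucleotide converts a poly-glutamate stretch into poly-arginine, because every codon in a short repeat changes in the same way. This is only a single hand-picked example of how a frameshift inside a repeat can alter composition uniformly; it does not reproduce the proteome-wide analysis.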
Publisher
Cold Spring Harbor Laboratory