Abstract
The C-terminal sequence of a protein is involved in processes such as the efficiency of translation termination and protein degradation. However, the general relationship between features of this C-terminal sequence and protein expression levels remains unknown. Here, we identified C-terminal amino acid biases that are ubiquitous across the bacterial taxonomy (1582 genomes). We showed that positively charged amino acids (lysine, arginine) are enriched at the C-terminus, while hydrophobic amino acids and threonine are depleted. In highly abundant proteins, the C-terminal residue is more conserved. We then studied the impact of C-terminal composition on protein levels in a library of M. pneumoniae mutants covering all possible combinations of the last two codons. We found that charged and polar residues, in particular lysine, led to higher expression, while hydrophobic and aromatic residues led to lower expression, with protein levels differing by up to 4-fold. Our results demonstrate that the identity of the last amino acids has a strong influence on protein expression levels and is under selective pressure in highly expressed proteins.
Publisher
Cold Spring Harbor Laboratory