Author:
Novoa Eva Maria, Jaillon Olivier, Jungreis Irwin, Kellis Manolis
Abstract
Due to the degeneracy of the genetic code, multiple codons are translated into the same amino acid. Despite being ‘synonymous’, these codons are not used equally. Selective pressures are thought to drive the choice among synonymous codons within a genome, while GC content, which is generally attributed to mutational drift, is the major determinant of interspecies codon usage bias. Here we find that, in addition to the bias caused by GC content, inter-species codon usage signatures can also be detected. More specifically, we show that a single amino acid, arginine, is the major contributor to codon usage bias differences across domains of life. We then exploit this finding and show that the identified domain-specific codon bias signatures can be used to classify a given sequence into its corresponding domain with high accuracy. Considering that species belonging to the same domain share similar tRNA decoding strategies, we then asked whether including codon autocorrelation patterns might improve the classification performance of our algorithm. However, we find that autocorrelation patterns are not domain-specific and, surprisingly, are unrelated to tRNA reusage, contrary to common belief. Instead, our results reveal that codon autocorrelation patterns are a consequence of codon optimality throughout a sequence: highly expressed genes display autocorrelated ‘optimal’ codons, whereas lowly expressed genes display autocorrelated ‘non-optimal’ codons.
Publisher
Cold Spring Harbor Laboratory