Abstract
The sequence of nucleotides that make up an RNA determines its structure, which in turn determines its function. The RNA hairpin, also known as a stem-loop, is a ubiquitous and fundamental feature of RNA secondary structure. A common method of randomizing an RNA sequence is dinucleotide shuffling with the Altschul-Erickson algorithm, which preserves the dinucleotide content of the sequence. This algorithm generates randomized sequences by sampling Eulerian paths through the de Bruijn graph representation of the original sequence. We identified a subset of RNA hairpins in the bpRNA-1m meta-database that always form hairpins after repeated application of dinucleotide shuffling. We investigated these “unbreakable hairpins” and found several common properties. First, unbreakable hairpins had, on average, folding energies similar to those of other hairpins of similar length, although they frequently contained ultra-stable hairpin loops. Second, their stems tend to be segregated by nucleotide class, with purines on one strand and pyrimidines on the opposite strand. Furthermore, this sequence feature restricts the number of distinct Eulerian paths through their de Bruijn graph representation, resulting in a small number of distinguishable dinucleotide-shuffled sequences. Beyond this algorithmic means of identification, these distinct sequences may have biological significance, because a significant percentage of them occur in a specific location of 16S ribosomal RNAs. Finally, we present a formula to calculate the number of possible unique dinucleotide-shuffled sequences for an input RNA sequence, which has utility for the general application of the Altschul-Erickson algorithm.
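The counting idea can be illustrated with a minimal sketch. The abstract does not state the authors' formula, so the code below is not their method; it assumes the standard graph-theoretic route instead: treat each dinucleotide as a directed edge between nucleotides, add one closing edge from the last nucleotide back to the first, count Eulerian circuits with the BEST theorem, and divide by the factorial of each dinucleotide's multiplicity so that swapping identical dinucleotides is not counted twice. The function names `determinant` and `count_dinucleotide_shuffles` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from math import factorial, prod


def determinant(matrix):
    """Exact determinant by Gaussian elimination over Fractions."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    det = Fraction(1)  # the determinant of an empty (0x0) matrix is 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return 0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return int(det)


def count_dinucleotide_shuffles(seq):
    """Count distinct sequences sharing seq's dinucleotide counts and first
    nucleotide (assumed BEST-theorem approach, not the paper's formula)."""
    if len(seq) < 2:
        return 1
    edges = Counter(zip(seq, seq[1:]))   # original dinucleotide multiplicities
    closed = edges.copy()
    closed[(seq[-1], seq[0])] += 1       # closing edge: last -> first nucleotide

    nodes = sorted(set(seq))
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    out_deg = [0] * n
    laplacian = [[0] * n for _ in range(n)]  # out-degree matrix minus adjacency
    for (u, v), mult in closed.items():
        out_deg[index[u]] += mult
        laplacian[index[u]][index[u]] += mult
        laplacian[index[u]][index[v]] -= mult

    # Matrix-tree theorem: delete the root's row and column.
    root = index[seq[0]]
    minor = [[laplacian[r][c] for c in range(n) if c != root]
             for r in range(n) if r != root]
    arborescences = determinant(minor)

    # BEST theorem: Eulerian circuits of the closed graph with labelled edges.
    circuits = arborescences * prod(factorial(d - 1) for d in out_deg)
    # Identical dinucleotides are interchangeable, so divide out their orderings.
    return circuits // prod(factorial(mult) for mult in edges.values())


print(count_dinucleotide_shuffles("ACAGA"))  # 2: ACAGA and AGACA
```

Under these assumptions, a sequence such as `AUAUAU` yields a count of 1, meaning that every dinucleotide shuffle returns the input unchanged. This is the kind of restriction on Eulerian paths that the abstract describes for unbreakable hairpins.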
Publisher
Cold Spring Harbor Laboratory