Abstract
Transposable elements (TEs) are mobile, repetitive DNA sequences that make the largest contribution to genome bulk. They thus contribute to the so-called “dark matter of the genome”, the part of the genome in which nothing is immediately recognizable as biologically functional. We developed a new method, based on k-mers, to identify degenerate TE sequences. With this new algorithm, we detect up to 10% of the A. thaliana genome as derived from as yet unidentified TEs, bringing the proportion of the genome known to be derived from TEs up to 50%. A significant proportion of these sequences overlapped conserved non-coding sequences identified in crucifers and rosids, and transcription factor binding sites. They are overrepresented in some gene regulation networks, such as the flowering gene network, suggesting a functional role for these sequences that have been conserved for more than 100 million years, since the spread of flowering plants in the Cretaceous.
Publisher
Cold Spring Harbor Laboratory