Abstract
To evade repression by the host defense, transposable elements (TEs) are occasionally horizontally transferred (HT) to naive species. TE invasions triggered by HT may be much more abundant than previously thought. For example, previous studies in Drosophila melanogaster found 11 TE invasions over the past 200 years. A major limitation of current approaches for detecting recent invasions is the need for a repeat library, which is notoriously difficult to generate. To address this, we developed GenomeDelta, a novel approach for identifying sample-specific sequences, such as recently invading TEs, without prior knowledge of the sequence. It can therefore be used with both model and non-model organisms. As input, GenomeDelta requires a long-read assembly and short-read data, and it identifies sequences in the assembly that are not represented in the short-read data. Beyond identifying recent TE invasions, GenomeDelta can detect sequences with spatially heterogeneous distributions, recent insertions of viral elements, and recent lateral gene transfers. We thoroughly validated GenomeDelta with simulated and real data from extant and historical specimens. Finally, we demonstrate that GenomeDelta can reveal novel biological insights: we discovered the three most recent TE invasions in Drosophila melanogaster and a novel TE with a geographically heterogeneous distribution in Zymoseptoria tritici.
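The core idea, finding assembly sequences that the short reads do not cover, can be illustrated with a minimal sketch. It assumes the short reads have already been mapped to the long-read assembly and that per-base depth is available as a list of integers for one contig. The function name `zero_coverage_regions` and its `min_len` and `max_gap` parameters are hypothetical and are not GenomeDelta's actual interface. The sketch collects maximal runs of zero coverage, merges runs separated by at most `max_gap` covered bases, and keeps only regions of at least `min_len` bases.

```python
from typing import List, Optional, Tuple


def zero_coverage_regions(
    coverage: List[int], min_len: int = 1000, max_gap: int = 0
) -> List[Tuple[int, int]]:
    """Return half-open [start, end) intervals with no short-read coverage.

    Hypothetical illustration: `coverage` holds the per-base short-read
    depth along one assembly contig (0-based positions).
    """
    # Step 1: collect maximal runs of zero-coverage positions.
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, depth in enumerate(coverage):
        if depth == 0:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(coverage)))

    # Step 2: merge runs separated by at most `max_gap` covered positions,
    # so that a few spuriously mapped reads do not split one candidate region.
    merged: List[Tuple[int, int]] = []
    for s, e in runs:
        if merged and s - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))

    # Step 3: keep only regions long enough to be candidate insertions.
    return [(s, e) for s, e in merged if e - s >= min_len]


if __name__ == "__main__":
    depth = [5, 5, 0, 0, 0, 3, 0, 0, 4, 4]
    print(zero_coverage_regions(depth, min_len=4, max_gap=1))  # [(2, 8)]
```

With `max_gap=1`, the two uncovered runs at positions 2 to 4 and 6 to 7 are joined across the single covered base at position 5. In practice the minimum length and gap tolerance would need tuning to read length and sequencing depth.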
Publisher
Cold Spring Harbor Laboratory