Author:
Takeda Atsushi, Nonaka Daisuke, Imazu Yuta, Fukunaga Tsukasa, Hamada Michiaki
Abstract
Motivation: Interspersed repeats occupy a large part of many eukaryotic genomes, and thus their accurate annotation is essential for various genome analyses. Database-free de novo repeat detection approaches are powerful for annotating genomes that lack well-curated repeat databases. However, existing tools do not yet have sufficient repeat detection performance. Results: In this study, we developed REPrise, a de novo interspersed repeat detection software program based on a seed-and-extension method. Although the algorithm of REPrise is similar to that of RepeatScout, which is currently the de facto standard tool, we incorporated three unique techniques into REPrise: inexact seeding, affine gap scoring and loose masking. Analyses of rice and simulation genome datasets showed that REPrise outperformed RepeatScout in terms of sensitivity, especially when the repeat sequences contained many mutations. Furthermore, when applied to the complete human genome dataset T2T-CHM13, REPrise demonstrated the potential to detect novel repeat sequence families. Availability: The source code of REPrise is freely available at https://github.com/hmdlab/REPrise. Repeat annotations predicted for the T2T genome using REPrise are also available at https://waseda.box.com/v/REPrise-data. Contact: fukunaga@aoni.waseda.jp and mhamada@waseda.jp
Publisher
Cold Spring Harbor Laboratory