Abstract
Motivation: Approximately 8% of the human genome is derived from endogenous retroviruses (ERVs). In recent years, an increasing number of human diseases have been found to be associated with ERVs. However, it remains challenging to accurately detect the full spectrum of polymorphic (unfixed) ERVs using next-generation sequencing (NGS) data.

Results: We designed a new tool, ERVcaller, to detect and genotype transposable element (TE) insertions, including ERVs, in the human genome. We evaluated ERVcaller using both simulated and real benchmark whole-genome sequencing (WGS) datasets. Compared with existing tools, ERVcaller consistently achieved the highest sensitivity and precision for detecting simulated ERV and other TE insertions derived from real polymorphic TE sequences. For the WGS data from the 1000 Genomes Project, ERVcaller detected the largest number of TE insertions per sample based on consensus TE loci. On the experimentally verified TE insertions, ERVcaller achieved 94.0% TE detection sensitivity and 96.6% genotyping accuracy. PCR and Sanger sequencing in a small sample set verified 86.7% of examined insertion statuses and 100% of examined genotypes. In conclusion, ERVcaller is capable of detecting and genotyping TE insertions using WGS data with both high sensitivity and precision. This tool can be applied broadly to other species.

Availability: www.uvm.edu/genomics/software/ERVcaller.html

Contact: dawei.li@uvm.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
Publisher
Cold Spring Harbor Laboratory