Abstract
Short tandem repeat (STR) variation is an often overlooked source of variation between genomes. STRs comprise about 3% of the human genome sequence and are highly polymorphic. They are known not only to cause Mendelian disease but also to affect gene expression extensively. Nevertheless, their contribution to common disease is not well understood, although recent software tools designed to genotype STRs from short-read sequencing data are beginning to address this. Here, we compare software that genotypes common STRs and detects rarer STR expansions genome-wide, with the aim of applying these tools to population-scale genome sequences. Using sequencing data from the Genome-In-A-Bottle (GIAB) consortium and the 1000 Genomes Project, we compare performance in terms of sequence length, sequencing depth, computing resources required, genotyping accuracy and number of STRs genotyped. To ensure broad applicability of our findings, we also measure performance on a set of clinical samples with known STR expansions and on a set of STRs commonly used for forensic identification. Using analysis of Mendelian inheritance patterns and comparison with capillary electrophoresis genotypes, we find that both HipSTR and GangSTR perform well in genotyping common STRs, with GangSTR outperforming HipSTR in genotyping call rate and memory usage. For expanded STRs, ExpansionHunter denovo (EHdn) and STRetch outperformed GangSTR, but EHdn used considerably less processor time and memory than STRetch. Because the analysis uses shared genomic sequence data provided by the GIAB consortium, new software approaches can be benchmarked against the same common dataset in future, making performance comparisons straightforward and allowing researchers to choose the software that best fulfils their needs.
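The Mendelian inheritance check mentioned above rests on a simple rule: in a parent-offspring trio, one of the child's two STR alleles must match an allele from the mother and the other an allele from the father. The sketch below illustrates that rule only; it is not the authors' pipeline. It assumes each genotype is a pair of allele lengths in repeat units, that a missing call is recorded as `None`, and that alleles must match exactly. The function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# Assumed representation: a diploid STR genotype as two allele lengths (repeat units).
Genotype = Tuple[int, int]
Trio = Tuple[Optional[Genotype], Optional[Genotype], Optional[Genotype]]  # child, mother, father


def is_mendelian_consistent(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if one child allele can come from the mother and the other from the father."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def mendelian_consistency_rate(trios: Sequence[Trio]) -> float:
    """Fraction of fully called trios whose genotypes obey Mendelian inheritance.

    Trios with any missing call (None) are skipped, so this measures accuracy
    among called sites, not call rate.
    """
    called = [t for t in trios if all(g is not None for g in t)]
    if not called:
        return float("nan")
    consistent = sum(is_mendelian_consistent(c, m, f) for c, m, f in called)
    return consistent / len(called)


if __name__ == "__main__":
    trios = [
        ((10, 12), (10, 11), (12, 12)),  # consistent: 10 from mother, 12 from father
        ((10, 13), (10, 11), (12, 12)),  # inconsistent: 13 is in neither parent
        (None, (10, 10), (10, 10)),      # child not called, so skipped
    ]
    print(mendelian_consistency_rate(trios))  # 0.5
```

Exact allele matching is a simplification. A real comparison might allow small length differences or handle sex chromosomes separately, and it would usually be reported alongside call rate, because a tool can look more accurate simply by calling fewer difficult loci.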
Publisher
Cold Spring Harbor Laboratory
Cited by
4 articles.