
A comparison of software for analysis of rare and common short tandem repeat (STR) variation using human genome sequences from clinical and population-based samples

Published: 2022-05-28

Authors:

Oketch, John W.; Wain, Louise V.; Hollox, Edward J.

Abstract

Short tandem repeat (STR) variation is an often overlooked source of variation between genomes. STRs make up about 3% of the human genome sequence, are highly polymorphic, and are known not only to cause Mendelian disease but also to affect gene expression extensively. Nevertheless, their contribution to common disease is not well understood, although recent software tools designed to genotype STRs from short-read sequencing data are beginning to address this. Here, we compare software that genotypes common STRs and detects rarer STR expansions genome-wide, with the aim of applying it to population-scale genome sequences. Using Genome in a Bottle (GIAB) consortium and 1000 Genomes Project sequencing data, we compare performance in terms of sequence length, sequencing depth, computing resources required, genotyping accuracy, and number of STRs genotyped. To ensure broad applicability of our findings, we also measure performance on a set of clinical samples with known STR expansions and on a set of STRs commonly used for forensic identification. Using analysis of Mendelian inheritance patterns and comparison with capillary electrophoresis genotypes, we find that both HipSTR and GangSTR perform well in genotyping common STRs, with GangSTR outperforming HipSTR in call rate and memory usage. For expanded STRs, ExpansionHunter Denovo (EHdn) and STRetch outperformed GangSTR, but EHdn used considerably less processor time and memory than STRetch. Because the analysis uses shared genomic sequence data provided by the GIAB consortium, future software approaches can be compared against the same dataset, helping researchers choose the software that best fits their needs.
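The abstract assesses genotype quality partly through Mendelian inheritance in parent-offspring trios, agreement with capillary electrophoresis (CE) genotypes, and call rate. The Python sketch below illustrates those three measures under simplifying assumptions: each diploid genotype has already been reduced to a pair of allele lengths in repeat units (CE fragment sizes would first need converting), a no-call is represented as `None`, and only exact allele matches count. The function names (`is_mendelian_consistent`, `trio_summary`, `concordant`) are hypothetical and illustrative. This is not the authors' pipeline and does not parse the output of HipSTR, GangSTR, or any other tool.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: a diploid STR genotype as two allele lengths
# (repeat-unit counts); None means the tool made no call at this locus.
Alleles = Tuple[int, int]
Genotype = Optional[Alleles]


def is_mendelian_consistent(child: Alleles, mother: Alleles, father: Alleles) -> bool:
    """True if one child allele can come from the mother and the other from the father.

    Exact matching only; a real analysis might tolerate small length differences
    or treat some violations as genuine de novo STR mutations.
    """
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def trio_summary(trios: List[Tuple[Genotype, Genotype, Genotype]]) -> Tuple[float, float]:
    """Return (call_rate, consistency_rate) over (child, mother, father) genotypes.

    Call rate is the fraction of trio-loci where all three members were called;
    consistency is computed over those fully called trio-loci only.
    """
    total = len(trios)
    called = [t for t in trios if all(g is not None for g in t)]
    consistent = sum(is_mendelian_consistent(*t) for t in called)
    call_rate = len(called) / total if total else 0.0
    consistency = consistent / len(called) if called else 0.0
    return call_rate, consistency


def concordant(ngs: Alleles, ce: Alleles) -> bool:
    """Unordered exact match between a sequencing-based and a CE genotype."""
    return sorted(ngs) == sorted(ce)


# Usage with made-up example genotypes (child, mother, father):
trios = [
    ((12, 14), (12, 13), (14, 15)),  # consistent
    ((12, 16), (12, 13), (14, 15)),  # inconsistent: 16 is in neither parent
    ((11, 11), (11, 12), (10, 11)),  # consistent
    (None, (12, 13), (14, 15)),      # child not called
]
call_rate, consistency = trio_summary(trios)
# call rate 0.75; 2 of the 3 fully called trios are consistent
```

Requiring all three trio members to be called before checking inheritance keeps the two measures separate: a tool that skips difficult loci scores lower on call rate rather than being penalised for inconsistent genotypes it never produced.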

Publisher

Cold Spring Harbor Laboratory


Cited by 4 articles.

1. Comprehensive Analysis of the Genetic Variation in the LPA Gene from Short-Read Sequencing;BioMed;2024-06-04

2. Comprehensive analysis of the genetic variation in the LPA gene from short-read sequencing;2024-03-22

3. Multi-ancestry tandem repeat association study of hair colour using exome-wide sequencing;2024-02-28

4. Characterization of genome-wide STR variation in 6487 human genomes;Nature Communications;2023-04-12