vamos: VNTR annotation using efficient motif sets

Published: 2022-10-08

Author:

Ren Jingwen, Gu Bida, Chaisson Mark JP

Abstract

Motivation: Roughly 3% of the human genome is composed of variable-number tandem repeats (VNTRs): tandemly repeated arrays of motifs of at least six bases. These loci are highly polymorphic: over 61% of insertion and deletion variants of at least 50 bases found in long-read assemblies are inside VNTRs. Furthermore, long-read assemblies reveal that VNTR loci are multiallelic and can vary in both motif composition and copy number. Current approaches that define and merge variants based on alignment breakpoints do not capture this complexity of variation. A natural alternative is to define the motif composition of VNTR sequences from samples and to detect differences by comparing repeat composition. However, due to the complexity of VNTR sequences, it is difficult to establish a common reference set of motif sequences that can be used to describe variation in large sequencing studies.

Results: Here we present vamos (VNTR Annotation using efficient Motif Sets), a method that, for any VNTR locus, selects a set of representative motifs from all motifs observed at that locus; these motifs may be used to encode VNTR sequences within a bounded edit distance of the original sequence. We use our method to characterize VNTR variation in 32 haplotype-resolved human genomes. In contrast to current studies that merge multiallelic calls, we estimate an average of 3.1-4.0 alleles per locus.

Availability: github.com/chaissonlab/vamos, zenodo.org/record/7158427

Contact: mchaisso@usc.edu
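
As an illustration of the idea described in the abstract (replacing the motifs observed at a locus with a smaller representative set while staying within an edit-distance bound), the following is a minimal Python sketch. It is not the vamos algorithm: the greedy, frequency-ordered selection, the function names `edit_distance` and `select_representatives`, the `max_dist` parameter, and the toy motif counts are all assumptions made for this example.

```python
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def select_representatives(motif_counts: Dict[str, int], max_dist: int) -> Dict[str, str]:
    """Hypothetical greedy heuristic (an assumption, not the published method).

    Motifs are visited from most to least frequent. A motif becomes a
    representative unless an already chosen representative lies within
    max_dist edits, in which case it is mapped to the nearest one.
    Returns a map from every observed motif to its representative.
    """
    representatives: List[str] = []
    assignment: Dict[str, str] = {}
    for motif in sorted(motif_counts, key=lambda m: (-motif_counts[m], m)):
        nearest = min(representatives, key=lambda r: edit_distance(motif, r), default=None)
        if nearest is not None and edit_distance(motif, nearest) <= max_dist:
            assignment[motif] = nearest
        else:
            representatives.append(motif)
            assignment[motif] = motif
    return assignment


# Toy counts for one locus (made-up data for illustration only).
toy_counts = {"ACGTAC": 10, "ACGTAT": 2, "GGGCCC": 5}
mapping = select_representatives(toy_counts, max_dist=1)
```

With these toy counts and `max_dist=1`, the rare motif `ACGTAT` is mapped to the frequent `ACGTAC`, while `GGGCCC` stays its own representative. Because each replaced motif differs from its representative by at most `max_dist` edits, the re-encoded sequence differs from the original by at most `max_dist` edits per replaced motif copy.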

Publisher

Cold Spring Harbor Laboratory

Cited by 4 articles.

1. vamos: variable-number tandem repeats annotation using efficient motif sets; Genome Biology; 2023-07-27

2. Variant calling and benchmarking in an era of complete human genome sequences; Nature Reviews Genetics; 2023-04-14

3. The motif composition of variable number tandem repeats impacts gene expression; Genome Research; 2023-04

4. The motif composition of variable-number tandem repeats impacts gene expression; 2022-03-19