VarSCAT: A computational tool for sequence context annotations of genomic variants
Published: 2023-08-11
Issue: 8
Volume: 19
Page: e1010727
ISSN: 1553-7358
Container-title: PLOS Computational Biology
Language: en
Short-container-title: PLoS Comput Biol
Author:
Wang Ning,
Khan Sofia,
Elo Laura L.
Abstract
The sequence contexts of genomic variants play important roles in understanding the biological significance of variants and potential sequencing-related variant calling issues. However, methods for assessing the diverse sequence contexts of genomic variants, such as tandem repeats and unambiguous annotations, have been limited. Herein, we describe the Variant Sequence Context Annotation Tool (VarSCAT) for annotating the sequence contexts of genomic variants, including breakpoint ambiguities, flanking bases of variants, wildtype/mutated DNA sequences, variant nomenclatures, distances between adjacent variants, tandem repeat regions, and custom annotations with user-customizable options. Our analyses demonstrate that VarSCAT is more versatile and customizable than the currently available methods or strategies for annotating variants in short tandem repeat (STR) regions or insertions and deletions (indels) with breakpoint ambiguity. Variant sequence context annotations of high-confidence human variant sets with VarSCAT revealed that more than 75% of all human individual germline and clinically relevant indels have breakpoint ambiguities. Moreover, we illustrate that more than 80% of human individual germline small variants in STR regions are indels and that the sizes of these indels correlate with STR motif sizes. VarSCAT is available from https://github.com/elolab/VarSCAT.
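To make the notion of breakpoint ambiguity concrete, the following is a minimal illustrative sketch, not VarSCAT's implementation. It assumes a deletion described by a 0-based start position and a length on a plain reference string, and it finds the leftmost and rightmost start positions that produce the same mutated sequence. The function name `deletion_ambiguity_range` and the example sequence are hypothetical.

```python
def deletion_ambiguity_range(ref: str, start: int, length: int) -> tuple[int, int]:
    """Return (leftmost, rightmost) 0-based start positions at which deleting
    `length` bases from `ref` gives the same sequence as deleting at `start`.

    Illustrative assumption: `ref` is an uppercase DNA string and the deletion
    ref[start:start + length] lies fully inside it.
    """
    if length <= 0 or start < 0 or start + length > len(ref):
        raise ValueError("deletion must lie inside the reference")

    # Shifting left by one base is equivalent when the base entering the
    # deleted window on the left equals the base leaving it on the right.
    left = start
    while left > 0 and ref[left - 1] == ref[left + length - 1]:
        left -= 1

    # Shifting right by one base is equivalent when the base leaving the
    # window on the left equals the base entering it on the right.
    right = start
    while right + length < len(ref) and ref[right] == ref[right + length]:
        right += 1

    return left, right


if __name__ == "__main__":
    reference = "GGCACACACTT"  # hypothetical sequence with a CA repeat
    print(deletion_ambiguity_range(reference, start=2, length=2))
```

In this hypothetical sequence, a 2-base deletion inside the CA repeat can be placed at any start position in the returned range without changing the resulting sequence, which is why left-aligned and right-aligned representations of the same indel can differ.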
Funder
Turun Yliopistosäätiö
Turun yliopiston tutkijakoulu
H2020 European Research Council
Horizon 2020
Academy of Finland
Sigrid Juséliuksen Säätiö
Publisher
Public Library of Science (PLoS)
Subject
Computational Theory and Mathematics; Cellular and Molecular Neuroscience; Genetics; Molecular Biology; Ecology; Modeling and Simulation; Ecology, Evolution, Behavior and Systematics
Cited by
1 article.