Abstract
Since its introduction in 2011, the variant call format (VCF) has been widely adopted for processing DNA and RNA variants in practically all population studies, as well as in somatic and germline mutation studies. VCF can represent single nucleotide variants, multi-nucleotide variants, insertions and deletions, and simple structural variants called and anchored against a reference genome.
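To make the variant classes above concrete, here is a minimal Python sketch that roughly classifies a single REF/ALT allele pair from a VCF record. It is a simplified heuristic of our own for illustration, not a rule from the VCF specification or code from any of the projects named below. It assumes indels carry the usual shared leading (anchor) reference base, treats angle-bracket alleles such as `<DEL>` as symbolic structural variants, and does not handle breakend notation, the `*` spanning-deletion allele, or missing alleles.

```python
def classify(ref: str, alt: str) -> str:
    """Roughly classify one REF/ALT allele pair (illustrative heuristic only)."""
    # Symbolic alleles such as <DEL> or <INV> stand for structural variants
    # whose sequence is not spelled out in the record.
    if alt.startswith("<") and alt.endswith(">"):
        return "symbolic"
    # Equal-length alleles: one base is an SNV, several bases an MNV.
    if len(ref) == len(alt):
        return "SNV" if len(ref) == 1 else "MNV"
    # Indels are usually written with a shared leading reference base.
    if len(ref) == 1 and alt.startswith(ref):
        return "insertion"
    if len(alt) == 1 and ref.startswith(alt):
        return "deletion"
    # Anything else (e.g. a length-changing substitution) is left as complex.
    return "complex"
```

For example, `classify("A", "ATT")` reports an insertion of `TT` after the anchor base `A`, while `classify("AT", "G")` falls through to `"complex"`. Real tools typically normalise alleles first, because the same event can be written in several equivalent ways.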
Here we present a spectrum of over 125 useful, complementary free and open source software tools and libraries that we wrote and made available through the vcflib, bio-vcf, cyvcf2, hts-nim and slivar projects. These tools are applied for comparison, filtering, normalisation, smoothing and annotation of VCF, as well as for output of statistics, visualisation, and transformation of variant files. They run every day in critical biomedical pipelines and countless shell scripts. Our tools are part of the wider bioinformatics ecosystem, and we highlight best practices.
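As an illustration of the filtering task mentioned above, the sketch below keeps VCF data lines whose QUAL and INFO `DP` (combined depth) values meet a threshold, and passes header lines through unchanged. It is an independent Python sketch of the idea, not the code or command-line interface of vcflib, bio-vcf, cyvcf2, hts-nim or slivar. The function names `passes_filter` and `filter_vcf` and the default thresholds (QUAL 30, DP 10) are hypothetical choices; records with a missing QUAL or no `DP` entry are simply dropped.

```python
from typing import Iterable, Iterator


def passes_filter(line: str, min_qual: float = 30.0, min_dp: int = 10) -> bool:
    """Return True if a VCF data line has QUAL >= min_qual and INFO DP >= min_dp."""
    fields = line.rstrip("\n").split("\t")
    # Fixed VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO [FORMAT samples...]
    qual, info = fields[5], fields[7]
    if qual == ".":  # missing QUAL value
        return False
    info_map = {}
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        info_map[key] = value  # flag entries such as "DB" map to ""
    dp = info_map.get("DP")
    if not dp:  # no usable depth annotation
        return False
    return float(qual) >= min_qual and int(dp) >= min_dp


def filter_vcf(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines unchanged and only those records that pass the filter."""
    for line in lines:
        if line.startswith("#") or passes_filter(line):
            yield line
```

Streaming line by line keeps memory use constant, which matters for the large, often bgzip-compressed files common in population studies; a production tool would also parse the header to check that `DP` is declared with the expected type.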
We briefly discuss the design of VCF, lessons learnt, and how pangenome graph formats can address more complex variation that cannot easily be represented in VCF.
Publisher
Public Library of Science (PLoS)
Subject
Computational Theory and Mathematics, Cellular and Molecular Neuroscience, Genetics, Molecular Biology, Ecology, Modeling and Simulation, Ecology, Evolution, Behavior and Systematics