Affiliations:
1. Quantitative and Systems Biology Graduate Program, University of California, Merced, California
2. Department of Molecular and Cell Biology, School of Natural Sciences, University of California, Merced, California
3. Health Science Research Institute, University of California, Merced, California
Abstract
Whole‐genome sequencing is widely used to investigate population genomic variation in organisms of interest. Various tools have been developed independently to call variants, including single nucleotide polymorphisms (SNPs) and structural variations (SVs), from short‐read sequencing data aligned to a reference genome. We developed SNP‐SVant, an integrated, flexible, and computationally efficient bioinformatic workflow that predicts high‐confidence SNPs and SVs in organisms that lack benchmarked variant sets, which are traditionally used to distinguish sequencing errors from real variants. In the absence of such benchmark datasets, we apply multiple rounds of statistical recalibration to increase the precision of variant prediction. The workflow is flexible, with user options to trade off accuracy against sensitivity. It predicts SNPs and small insertions and deletions (indels) using the Genome Analysis Toolkit (GATK), predicts SVs using the Genome Rearrangement IDentification Software Suite (GRIDSS), and culminates in variant annotation using custom scripts. A key utility of SNP‐SVant is its scalability. Because variant calling is computationally expensive, SNP‐SVant uses a workflow management system with intermediary checkpoint steps that make efficient use of resources by minimizing redundant computation and skipping steps whose output files are already available. SNP‐SVant also provides metrics to assess the quality of called variants and converts between VCF and aligned FASTA formats to ensure compatibility with downstream tools that calculate selection statistics, which are commonplace in population genomics studies. By accounting for both small and large structural variants, the workflow gives users a wide‐ranging view of genomic alterations in an organism of interest.
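The checkpoint behavior described above, rerunning a step only when its outputs are missing or out of date, can be sketched with a make-style timestamp rule. This is a minimal, hypothetical illustration and not SNP‐SVant's actual workflow manager; the function names `needs_rerun` and `run_step`, and the file names in the demo, are our own assumptions.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, Sequence


def needs_rerun(inputs: Sequence[str], outputs: Sequence[str]) -> bool:
    """Return True if any output is missing or older than the newest input.

    Inputs are assumed to exist; a missing input raises FileNotFoundError.
    """
    if not all(os.path.exists(p) for p in outputs):
        return True  # at least one output was never produced
    if not inputs:
        return False  # outputs exist and nothing upstream can make them stale
    newest_input = max(os.path.getmtime(p) for p in inputs)
    oldest_output = min(os.path.getmtime(p) for p in outputs)
    return oldest_output < newest_input  # stale if an input changed afterwards


def run_step(inputs: Sequence[str], outputs: Sequence[str],
             action: Callable[[], None]) -> bool:
    """Run `action` only when its outputs are stale; return True if it ran."""
    if needs_rerun(inputs, outputs):
        action()
        return True
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        bam = Path(d, "sample.bam")  # hypothetical aligned-reads file
        vcf = Path(d, "sample.vcf")  # hypothetical variant-call output
        bam.write_text("aligned reads")

        def call_variants() -> None:
            # Stand-in for an expensive variant-calling step.
            vcf.write_text("variants")

        print(run_step([str(bam)], [str(vcf)], call_variants))  # True: output missing
```

Comparing the oldest output with the newest input is the conservative choice: a step is skipped only when every output postdates every input.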
Overall, this workflow improves our capacity to assess the functional consequences of different types of genomic alterations and, ultimately, to associate genotypes with phenotypes.
© 2024 The Authors. Current Protocols published by Wiley Periodicals LLC.
Basic Protocol: Predicting single nucleotide polymorphisms and structural variations
Support Protocol 1: Downloading publicly available sequencing data
Support Protocol 2: Visualizing variant loci using the Integrative Genomics Viewer
Support Protocol 3: Converting between VCF and aligned FASTA formats
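The VCF-to-aligned-FASTA conversion mentioned above can be sketched in one direction under simplifying assumptions that are ours rather than the protocol's: a single reference contig held in memory, SNPs only (indels and symbolic SV alleles are skipped so every sequence keeps the reference length and stays aligned), and heterozygous or missing genotypes written as `N`. The functions `vcf_to_aligned_fasta` and `to_fasta` are hypothetical and are not the authors' conversion scripts.

```python
from typing import Dict, List

BASES = {"A", "C", "G", "T"}


def vcf_to_aligned_fasta(vcf_text: str, ref_seq: str, chrom: str) -> Dict[str, str]:
    """Build one reference-length sequence per sample from SNP genotypes."""
    samples: List[str] = []
    seqs: Dict[str, List[str]] = {}
    for line in vcf_text.splitlines():
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            seqs = {s: list(ref_seq) for s in samples}
            continue
        if fields[0] != chrom:
            continue  # only one contig is handled in this sketch
        pos = int(fields[1]) - 1  # VCF POS is 1-based
        ref, alts = fields[3], fields[4].split(",")
        # Keep single-base REF/ALT sites only so all sequences stay aligned.
        if ref not in BASES or not all(a in BASES for a in alts):
            continue
        if ref_seq[pos] != ref:
            raise ValueError(f"REF mismatch at {chrom}:{pos + 1}")
        gt_index = fields[8].split(":").index("GT")
        for sample, data in zip(samples, fields[9:]):
            # Treat phased (|) and unphased (/) genotypes the same way.
            alleles = data.split(":")[gt_index].replace("|", "/").split("/")
            if "." in alleles or len(set(alleles)) > 1:
                seqs[sample][pos] = "N"  # missing or heterozygous call
            else:
                idx = int(alleles[0])
                seqs[sample][pos] = ref if idx == 0 else alts[idx - 1]
    return {s: "".join(bases) for s, bases in seqs.items()}


def to_fasta(seqs: Dict[str, str]) -> str:
    """Serialize sequences as FASTA records, one line per sequence."""
    return "".join(f">{name}\n{seq}\n" for name, seq in seqs.items())
```

Masking heterozygous sites as `N` is a conservative choice for haploid-style alignments; IUPAC ambiguity codes would be an alternative that preserves more information.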