SNP‐SVant: A Computational Workflow to Predict and Annotate Genomic Variants in Organisms Lacking Benchmarked Variants

Author:

Gunasekaran Deepika (1, 2), Ardell David H. (2), Nobile Clarissa J. (2, 3)

Affiliation:

1. Quantitative and Systems Biology Graduate Program, University of California, Merced, California

2. Department of Molecular and Cell Biology, School of Natural Sciences, University of California, Merced, California

3. Health Science Research Institute, University of California, Merced, California

Abstract

Whole‐genome sequencing is widely used to investigate population genomic variation in organisms of interest. Assorted tools have been independently developed to call variants from short‐read sequencing data aligned to a reference genome, including single nucleotide polymorphisms (SNPs) and structural variations (SVs). We developed SNP‐SVant, an integrated, flexible, and computationally efficient bioinformatic workflow that predicts high‐confidence SNPs and SVs in organisms without benchmarked variants, which are traditionally used for distinguishing sequencing errors from real variants. In the absence of these benchmarked datasets, we leverage multiple rounds of statistical recalibration to increase the precision of variant prediction. The SNP‐SVant workflow is flexible, with user options to trade off accuracy for sensitivity. The workflow predicts SNPs and small insertions and deletions using the Genome Analysis ToolKit (GATK) and predicts SVs using the Genome Rearrangement IDentification Software Suite (GRIDSS), and it culminates in variant annotation using custom scripts. A key utility of SNP‐SVant is its scalability. Variant calling is a computationally expensive procedure, and thus, SNP‐SVant uses a workflow management system with intermediary checkpoint steps to ensure efficient use of resources by minimizing redundant computations and omitting steps where dependent files are available. SNP‐SVant also provides metrics to assess the quality of called variants and converts between VCF and aligned FASTA format outputs to ensure compatibility with downstream tools to calculate selection statistics, which are commonplace in population genomics studies. By accounting for both small and large structural variants, users of this workflow can obtain a wide‐ranging view of genomic alterations in an organism of interest.
Overall, this workflow advances our capabilities in assessing the functional consequences of different types of genomic alterations, ultimately improving our ability to associate genotypes with phenotypes. © 2024 The Authors. Current Protocols published by Wiley Periodicals LLC.

Basic Protocol: Predicting single nucleotide polymorphisms and structural variations

Support Protocol 1: Downloading publicly available sequencing data

Support Protocol 2: Visualizing variant loci using Integrated Genome Viewer

Support Protocol 3: Converting between VCF and aligned FASTA formats
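
The abstract mentions converting between VCF and aligned FASTA formats (Support Protocol 3). The following is only a minimal sketch of one direction of that idea, not the SNP‐SVant scripts themselves. It assumes a single contig, considers SNPs only, and uses the first allele of each genotype; indels and symbolic alleles are skipped so that every sequence keeps the reference length. The function names `vcf_snps_to_aligned_fasta` and `to_fasta` are hypothetical.

```python
from typing import Dict, List


def vcf_snps_to_aligned_fasta(reference: str, vcf_text: str) -> Dict[str, str]:
    """Return {sample_name: sequence}, one equal-length sequence per sample.

    Illustrative assumptions: single contig, SNPs only, first allele of
    each genotype is used, missing genotypes become 'N'.
    """
    samples: List[str] = []
    seqs: Dict[str, List[str]] = {}
    for line in vcf_text.splitlines():
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow the FORMAT column
            seqs = {s: list(reference) for s in samples}
            continue
        pos, ref, alt = int(fields[1]), fields[3], fields[4]
        alleles = [ref] + alt.split(",")
        if any(len(a) != 1 or a == "*" for a in alleles):
            continue  # not a simple SNP (indel, symbolic, or spanning deletion)
        gt_index = fields[8].split(":").index("GT")
        for sample, call in zip(samples, fields[9:]):
            # Normalize phased/unphased separators and keep the first allele.
            gt = call.split(":")[gt_index].replace("|", "/").split("/")[0]
            if gt == ".":
                seqs[sample][pos - 1] = "N"  # missing genotype
            else:
                seqs[sample][pos - 1] = alleles[int(gt)]
    return {s: "".join(chars) for s, chars in seqs.items()}


def to_fasta(seqs: Dict[str, str]) -> str:
    """Serialize the per-sample sequences as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in seqs.items())
```

Because indels are skipped and every substitution replaces exactly one base, all output sequences have the same length and can be read as an alignment by downstream tools. A full conversion would also need to handle multiple contigs, heterozygous calls, and indels, which this sketch deliberately leaves out.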

Publisher

Wiley

