A Fast, Reproducible, High-throughput Variant Calling Workflow for Population Genomics

Author:

Mirchandani Cade D (1, 2), Shultz Allison J (3), Thomas Gregg W C (4), Smith Sara J (4, 5), Baylis Mara (1, 2), Arnold Brian (6, 7), Corbett-Detig Russ (1, 2), Enbody Erik (1), Sackton Timothy B (4)

Affiliations:

1. Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA

2. Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA

3. Ornithology Department, Natural History Museum of Los Angeles County, Los Angeles, CA 90007, USA

4. Informatics Group, Harvard University, Cambridge, MA, USA

5. Biology, Mount Royal University, Calgary, AB T3E 6K6, Canada

6. Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ, USA

7. Center for Statistics and Machine Learning, Princeton University, Princeton, NJ, USA

Abstract

The increasing availability of genomic resequencing data sets and high-quality reference genomes across the tree of life presents exciting opportunities for comparative population genomic studies. However, substantial challenges prevent the simple reuse of data across different studies and species. These challenges arise from variability in variant calling pipelines, variability in data quality, and the need for computationally intensive reanalysis. Here, we present snpArcher, a flexible and highly efficient workflow designed for the analysis of genomic resequencing data in nonmodel organisms. snpArcher provides a standardized variant calling pipeline and includes modules for variant quality control, data visualization, variant filtering, and other downstream analyses. Implemented in Snakemake, snpArcher is user-friendly, reproducible, and designed to be compatible with high-performance computing clusters and cloud environments. To demonstrate the flexibility of this pipeline, we applied snpArcher to 26 public resequencing data sets from nonmammalian vertebrates. These variant data sets are hosted publicly to enable future comparative population genomic analyses. With its extensibility and the availability of public data sets, snpArcher will contribute to a broader understanding of genetic variation across species by facilitating the rapid use and reuse of large genomic data sets.

Publisher

Oxford University Press (OUP)

Subject

Genetics; Molecular Biology; Ecology, Evolution, Behavior and Systematics
