Origin Matters: Using a Local Reference Genome Improves Measures in Population Genomics
Author:
Thorburn Doko-Miles J., Sagonas Kostas, Binzer-Panchal Mahesh, Chain Frederic J.J., Feulner Philine G.D., Bornberg-Bauer Erich, Reusch Thorsten BH, Samonte-Padilla Irene E., Milinski Manfred, Lenz Tobias L., Eizaguirre Christophe
Abstract
Genome-level sequencing enables us to ask fundamental questions about the genetic basis of adaptation, population structure, and epigenetic mechanisms, but usually requires a suitable reference genome for mapping population-level re-sequencing data. In some model systems, multiple reference genomes are available, giving researchers the challenging task of determining which reference genome best suits their data. Here we compare the use of two different reference genomes for the three-spined stickleback (Gasterosteus aculeatus), one novel genome derived from a European gynogenetic individual and the published reference genome of a North American individual. Specifically, we investigate the impact of using a local reference versus one generated from a distinct lineage on several common population genomics analyses. Through mapping genome resequencing data of 60 sticklebacks from across Europe and North America, we demonstrate that genetic distance among samples and the reference impacts downstream analyses. Using a local reference genome increased mapping efficiency and genotyping accuracy, effectively retaining more and better data. Despite comparable distributions of the metrics generated across the genome using SNP data (i.e., π, Tajima’s D, and FST), window-based statistics using different references resulted in different outlier genes and enriched gene functions. A marker-based analysis of DNA methylation distributions had a comparably high overlap in outlier genes and functions, yet with distinct differences depending on the reference genome. Overall, our results highlight how using a local reference genome decreases reference bias to increase confidence in downstream analyses of the data. Such results have significant implications in all reference-genome-based population genomic analyses.
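For readers unfamiliar with the window-based SNP statistics named in the abstract, the following is a minimal sketch of how π (mean pairwise differences) and Tajima's D can be computed for a single genomic window. It is not the authors' pipeline: the function names, the 0/1 haplotype encoding, and the toy input are assumptions made purely for illustration, and the formulas are the standard textbook definitions of these statistics.

```python
from math import sqrt


def derived_counts(haplotypes):
    """Count derived alleles ('1') at each site across equal-length 0/1 strings."""
    return [sum(h[i] == "1" for h in haplotypes) for i in range(len(haplotypes[0]))]


def nucleotide_diversity(haplotypes):
    """Mean number of pairwise differences (pi), summed over the window."""
    n = len(haplotypes)
    pairs = n * (n - 1) / 2
    # A site with k derived copies differs in k * (n - k) of the sample pairs.
    return sum(k * (n - k) for k in derived_counts(haplotypes)) / pairs


def tajimas_d(haplotypes):
    """Tajima's D for one window; returns None if no site is segregating."""
    n = len(haplotypes)
    seg = sum(0 < k < n for k in derived_counts(haplotypes))
    if seg == 0:
        return None
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Difference between pi and Watterson's estimator, scaled by its standard deviation.
    return (nucleotide_diversity(haplotypes) - seg / a1) / sqrt(
        e1 * seg + e2 * seg * (seg - 1)
    )


# Hypothetical toy window: four haplotypes, four sites (first site is invariant).
window = ["0000", "0001", "0011", "0111"]
pi = nucleotide_diversity(window)
d = tajimas_d(window)
```

In a genome scan, a calculation of this kind would be repeated per window, and windows in the tails of the resulting distribution would be flagged as outliers. Because the genotypes entering each window depend on how reads were mapped, the choice of reference genome can change which windows end up in those tails.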
Publisher
Cold Spring Harbor Laboratory