pixy: Unbiased estimation of nucleotide diversity and divergence in the presence of missing data

Published: 2020-06-28

Authors:

Katharine L. Korunes, Kieran Samuk

Abstract

Population genetic analyses often use summary statistics to describe patterns of genetic variation and provide insight into evolutionary processes. Among the most fundamental of these summary statistics are π and dXY, which are used to describe genetic diversity within and between populations, respectively. Here, we address a widespread issue in π and dXY calculation: systematic bias generated by missing data of various types. Many popular methods for calculating π and dXY operate on data encoded in the Variant Call Format (VCF), which condenses genetic data by omitting invariant sites. When calculating π and dXY using a VCF, it is often implicitly assumed that missing genotypes (including those at sites not represented in the VCF) are homozygous for the reference allele. Here, we show how this assumption can result in substantial downward bias in estimates of π and dXY that is directly proportional to the amount of missing data. We discuss the pervasive nature and importance of this problem in population genetics, and introduce a user-friendly UNIX command line utility, pixy, that solves this problem via an algorithm that generates unbiased estimates of π and dXY in the face of missing data. We compare pixy to existing methods using both simulated and empirical data, and show that pixy alone produces unbiased estimates of π and dXY regardless of the form or amount of missing data. In sum, our software solves a long-standing problem in applied population genetics and highlights the importance of properly accounting for missing data in population genetic analyses.
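The bias described in the abstract can be illustrated with a toy calculation. The sketch below is not pixy's code or command line interface; it is a minimal illustration under stated assumptions: haploid samples, alleles encoded as 0 (reference) or 1 (alternate), missing calls encoded as `None`, and hypothetical function names. It contrasts an estimator that counts only comparisons between called alleles with one that first fills every missing call with the reference allele.

```python
from itertools import combinations

def fill_reference(sites):
    """Treat every missing call (None) as the reference allele (0)."""
    return [[0 if a is None else a for a in alleles] for alleles in sites]

def pi(sites):
    """Pairwise differences / pairwise comparisons, using called alleles only."""
    diffs = comps = 0
    for alleles in sites:
        called = [a for a in alleles if a is not None]
        diffs += sum(1 for a, b in combinations(called, 2) if a != b)
        comps += len(called) * (len(called) - 1) // 2
    return diffs / comps if comps else float("nan")

def dxy(sites_x, sites_y):
    """Between-population differences / comparisons, using called alleles only."""
    diffs = comps = 0
    for ax, ay in zip(sites_x, sites_y):
        cx = [a for a in ax if a is not None]
        cy = [a for a in ay if a is not None]
        diffs += sum(1 for a in cx for b in cy if a != b)
        comps += len(cx) * len(cy)
    return diffs / comps if comps else float("nan")

# Hypothetical data: four samples, four sites; the last two sites have no calls.
sites = [
    [0, 1, 0, 1],
    [0, 0, 0, 0],
    [None, None, None, None],
    [None, None, None, None],
]
# Split the same samples into two populations (X: samples 0-1, Y: samples 2-3).
sites_x = [s[:2] for s in sites]
sites_y = [s[2:] for s in sites]

pi_aware = pi(sites)                    # uncalled sites add nothing to the denominator
pi_naive = pi(fill_reference(sites))    # uncalled sites counted as invariant
dxy_aware = dxy(sites_x, sites_y)
dxy_naive = dxy(fill_reference(sites_x), fill_reference(sites_y))

print(pi_aware, pi_naive, dxy_aware, dxy_naive)
```

In this toy data set half of the sites are uncalled, and filling them with the reference allele halves both estimates (π drops from one third to one sixth, dXY from 0.25 to 0.125), which matches the abstract's statement that the downward bias scales with the amount of missing data. Partially missing sites are not covered by this example.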

Publisher

Cold Spring Harbor Laboratory

References (31 articles)

1. The variant call format and VCFtools

2. Interpreting differentiation landscapes in the light of long-term linked selection;Evolution Letters,2017

3. Broad Institute. 2019. Picard toolkit. GitHub repository [Internet]. Available from: http://broadinstitute.github.io/picard/

4. Transposable elements map in a conserved pattern of distribution extending from beta-heterochromatin to centromeres in Drosophila melanogaster

5. Reanalysis suggests that genomic islands of speciation are due to reduced diversity, not reduced gene flow

Cited by 7 articles.

1. Repeated genetic adaptation to altitude in two tropical butterflies;Nature Communications;2022-08-09

2. Population genetic analysis reveals the role of natural selection and phylogeography on genome-wide diversity in an extremely compact and reduced microsporidian genome;2022-03-30

3. Repeated genetic adaptation to high altitude in two tropical butterflies;2021-12-01

4. Assembled chromosomes of the blood fluke Schistosoma mansoni provide insight into the evolution of its ZW sex-determination system;2021-08-13

5. Sex‐linked genetic diversity and differentiation in a globally distributed avian species complex;Molecular Ecology;2021-03-29