Authors:
Andrew Melfi, Divakar Viswanath
Abstract
The Kingman coalescent, widely used in genetics, is known to be a good approximation when the sample size is small relative to the population size. In this article, we investigate how large the sample size can get without violating the coalescent approximation. If the haploid population size is $2N$, we prove that for samples of size $N^{1/3-\epsilon}$, $\epsilon > 0$, coalescence under the Wright-Fisher (WF) model converges in probability to the Kingman coalescent in the limit of large $N$. For samples of size $N^{2/5-\epsilon}$ or smaller, the WF coalescent converges to a mixture of the Kingman coalescent and what we call the mod-2 coalescent. For samples of size $N^{1/2}$ or larger, triple collisions in the WF genealogy of the sample become important. The sample size at which the probability of conformance with the Kingman coalescent is 95% is found to be $1.47 \times N^{0.31}$ for $N \in [10^3, 10^5]$, showing the pertinence of the asymptotic theory. The probability of no triple collisions is found to be 95% for sample sizes equal to $0.92 \times N^{0.49}$, which is also in accord with the asymptotic theory.

Varying population sizes are handled using algorithms that calculate the probability of WF coalescence agreeing with the Kingman model or taking place without triple collisions. For a sample of size 100, the probabilities of coalescence according to the Kingman model are 2%, 0%, 1%, and 0% in four demographic models of the human population: constant $N$, constant $N$ except for two bottlenecks, recent exponential growth, and increasing recent exponential growth, respectively. For the same four demographic models and the same sample size, the probabilities of coalescence with no triple collision are 92%, 73%, 88%, and 87%, respectively. Visualizations of the algorithm show that even distant bottlenecks can impede agreement between the coalescent and the WF model.

Finally, we prove that the WF sample frequency spectrum for samples of size $N^{1/3-\epsilon}$ or smaller converges to the classical answer for the coalescent.
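The two notions of agreement above (conformance with the Kingman coalescent, and coalescence without triple collisions) can be made concrete in the simplest case of a constant population size. The following is a minimal illustrative sketch, not the authors' algorithm. It assumes a constant haploid population of size $2N$. It counts a genealogy as Kingman-conforming if every generation has either no merger or exactly one pairwise merger. It counts a genealogy as free of triple collisions if no three lineages ever share a parent in a single generation. The helper names `log_pair_pattern` and `conformance_probabilities` are hypothetical.

```python
import math


def log_pair_pattern(M: int, k: int, j: int) -> float:
    """Log-probability that, in one Wright-Fisher generation with M haploid
    parents, k lineages form exactly j disjoint pairs (each pair sharing a
    parent) while all k - j resulting groups pick distinct parents."""
    # Ways to choose j disjoint unordered pairs from k labelled lineages:
    # k! / (2^j j! (k - 2j)!).
    log_ways = (math.lgamma(k + 1) - j * math.log(2)
                - math.lgamma(j + 1) - math.lgamma(k - 2 * j + 1))
    # The k - j groups choose distinct parents: M(M-1)...(M-k+j+1) / M^k,
    # rewritten as prod_{i<k-j} (1 - i/M) * M^(-j) to avoid overflow.
    log_parents = sum(math.log1p(-i / M) for i in range(k - j)) - j * math.log(M)
    return log_ways + log_parents


def conformance_probabilities(N: int, n: int) -> tuple[float, float]:
    """Return (P_kingman, P_no_triple) for a sample of n lineages in a
    constant haploid population of size M = 2N (hypothetical helper)."""
    M = 2 * N
    if not 2 <= n <= M:
        raise ValueError("need 2 <= n <= 2N")
    kingman = [0.0] * (n + 1)
    no_triple = [0.0] * (n + 1)
    kingman[1] = no_triple[1] = 1.0
    for k in range(2, n + 1):
        # P(at least one coalescence among k lineages in one generation).
        p_any = -math.expm1(sum(math.log1p(-i / M) for i in range(k)))
        # With N constant, merger-free generations change nothing, so we
        # condition on the first generation in which something merges.
        # Kingman conformance: that event must be a single pairwise merger.
        kingman[k] = math.exp(log_pair_pattern(M, k, 1)) / p_any * kingman[k - 1]
        # No triple collision: any number j >= 1 of disjoint pairs is allowed.
        no_triple[k] = sum(
            math.exp(log_pair_pattern(M, k, j)) / p_any * no_triple[k - j]
            for j in range(1, k // 2 + 1)
        )
    return kingman[n], no_triple[n]


if __name__ == "__main__":
    for N in (10**3, 10**4, 10**5):
        pk, pt = conformance_probabilities(N, 100)
        print(f"N={N:>6}  n=100  P(Kingman)={pk:.3f}  P(no triple)={pt:.3f}")
```

Because generations without a merger are irrelevant when $N$ is constant, the recursion runs over the number of lineages rather than over generations. With a time-varying $N$, as in the bottleneck and growth models above, this shortcut no longer applies, and one would have to track the lineage count generation by generation.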
Publisher
Cold Spring Harbor Laboratory