Abstract
If two haplotypes share the same alleles across an extended gene tract, they are likely to be identical by descent, inherited from a recent common ancestor. Identity-by-descent segment lengths are correlated via unobserved tree and recombination processes, which commonly complicates the derivation of theoretical results in population genetics. Under interpretable regularity conditions, we show that the proportion of detectable identity-by-descent segments at a locus is asymptotically normally distributed as the sample size and the scaled population size grow large. We use efficient and exact simulations to study the distributional behavior of the detectable identity-by-descent rate in finite samples. One consequence of non-normality in finite samples is that genome-wide scans based on identity-by-descent rates may be subject to anti-conservative Type 1 error control.
Highlights
We show the asymptotic normality of the identity-by-descent rate, a mean of correlated binary random variables that arises in population genetics studies.
We describe an efficient algorithm capable of simulating long identity-by-descent segments around a locus in large samples.
In large-scale simulation studies, we use this algorithm to characterize the distributional properties of the identity-by-descent rate.
In finite samples, we reject the null hypothesis of normality more often than the nominal significance level, indicating that genome-wide scans based on identity-by-descent rates may be anti-conservative.
Publisher
Cold Spring Harbor Laboratory