Abstract
Scaling is a common practice in population genetic simulations to increase computational efficiency. However, standardized guidelines for best practices are lacking. Few studies have examined the effects of scaling on diversity and whether the results are directly comparable to unscaled and empirical data. We examine the effects of scaling in two model populations, modern humans and Drosophila melanogaster. The reason is twofold: 1) due to the substantial difference in population sizes and generation times, human populations require moderate-to-no scaling, while more dramatic scaling is required for Drosophila; and 2) model populations have empirical data for comparison. We determine whether coalescence, runtime, memory, estimates of diversity, the site frequency spectra, and deleterious variation are affected by scaling. We also explore the effect of varying the simulated segment length and burn-in times. We find that the typical 10N generation burn-in is often not sufficient for full coalescence to occur in human or Drosophila simulations. As expected, memory and runtime increase as the scaling coefficient decreases and the length of the simulated segment increases. We show that simulating larger segments in humans is preferable, as it produces a smaller variance in diversity estimates. Conversely, in Drosophila it is preferable to simulate smaller segments and concatenate them into a full genome to achieve levels of diversity comparable to empirical data. We find that aggressive scaling leads to stronger negative selection and ultimately amplifies the strength of background selection on flanking variation.
Author Summary
Scaling is a common approach to make population genetic simulations more computationally tractable. However, the implications of scaling and best practices for scaling are still unknown. This study highlights the importance of carefully considering scaling practices for forward-in-time population genetics simulations.
We provide insights about the trade-offs between computational efficiency and accuracy of scaled simulations relative to empirical data, in human and Drosophila. We achieved this by varying the species demographic model, the method of coalescence, the simulated genomic element length, and the scaling factor. For each combination of parameters, genetic diversity was quantified and computational efficiency was tracked. Our findings suggest that when simulating populations, such as humans, where moderate scaling is required, one should simulate larger genomic segments for more accurate measures of diversity. Scaling seems to cause an inflation of diversity in human simulations relative to empirical data. On the other hand, in populations where more aggressive scaling is required, such as Drosophila, simulating smaller segments is advantageous. The scaling factor increases substantially in Drosophila studies, and the simulated data show a drastic drop in diversity relative to empirical data, along with an increased effect of purifying selection.
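As a rough illustration of what "scaling" means here, the sketch below applies a commonly used rescaling convention for forward-in-time simulations: population size and the number of generations are divided by a scaling factor Q, while per-generation mutation rate, recombination rate, and selection coefficient are multiplied by Q so that the products N·mu, N·r, and N·s stay constant. This convention is an assumption on our part, since the abstract does not state the exact rule used in the study. The names `SimParams` and `scale_params` and all parameter values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimParams:
    """Hypothetical container for unscaled simulation parameters."""
    pop_size: int                # N, number of diploid individuals
    mutation_rate: float         # mu, per site per generation
    recombination_rate: float    # r, per site per generation
    selection_coeff: float       # s, negative for deleterious mutations
    burn_in_generations: int     # e.g. 10 * N


def scale_params(params: SimParams, q: float) -> SimParams:
    """Rescale parameters by factor q under the assumed convention.

    N and generation counts are divided by q; mu, r and s are multiplied
    by q, so N*mu, N*r and N*s are preserved (up to rounding of N).
    Simple linear scaling of r is an approximation that is only
    reasonable while q * r remains small.
    """
    if q < 1:
        raise ValueError("scaling factor q must be >= 1")
    return SimParams(
        pop_size=max(1, round(params.pop_size / q)),
        mutation_rate=params.mutation_rate * q,
        recombination_rate=params.recombination_rate * q,
        selection_coeff=params.selection_coeff * q,
        burn_in_generations=max(1, round(params.burn_in_generations / q)),
    )


if __name__ == "__main__":
    # Illustrative values only; these are not taken from the study.
    unscaled = SimParams(
        pop_size=10_000,
        mutation_rate=1e-8,
        recombination_rate=1e-8,
        selection_coeff=-0.001,
        burn_in_generations=10 * 10_000,
    )
    scaled = scale_params(unscaled, q=10)
    print(scaled)
```

Under this convention a larger Q shrinks both the population and the number of generations to simulate, which is where the runtime and memory savings come from. It also multiplies the per-generation selection coefficient, so the rescaled values should be checked to confirm they remain in a sensible range before aggressive scaling is used.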
Publisher
Cold Spring Harbor Laboratory