Towards Simulation Optimization: An Examination of the Impact of Scaling on Coalescent and Forward Simulations

Author:

Ferrari Tessa, Feng Siyuan, Zhang Xinjun, Mooney Jazlyn

Abstract

Scaling is a common practice in population genetic simulations to increase computational efficiency. However, there is a dearth of standardized guidelines for best practices. Few studies have examined the effects of scaling on diversity and whether the results are directly comparable to unscaled and empirical data. We examine the effects of scaling in two model populations, modern humans and Drosophila melanogaster. The reason is twofold: 1) due to the substantial difference in population sizes and generation times, human populations require moderate-to-no scaling, while more dramatic scaling is required for Drosophila; and 2) model populations have empirical data for comparison. We determine whether coalescence, runtime, memory, estimates of diversity, the site frequency spectra, and deleterious variation are affected by scaling. We also explore the effect of varying the simulated segment length and burn-in times. We find that the typical 10N-generation burn-in is often not sufficient for full coalescence to occur in human or Drosophila simulations. As expected, memory and runtime increase as the scaling coefficient decreases and as the length of the simulated segment increases. We show that simulating larger segments in humans is preferable, as it produces a smaller variance in diversity estimates. Conversely, in Drosophila it is preferable to simulate smaller segments and concatenate them into a full genome to achieve levels of diversity comparable to empirical data. We find that aggressive scaling leads to stronger negative selection and ultimately amplifies the strength of background selection on flanking variation.

Author Summary

Scaling is a common approach to make population genetic simulations more computationally tractable. However, the implications of scaling and best practices for scaling are still unknown. This study highlights the importance of carefully considering scaling practices for forward-in-time population genetics simulations.
We provide insights into the trade-offs between computational efficiency and the accuracy of scaled simulations relative to empirical data in humans and Drosophila. We achieved this by varying the species' demographic model, the method of coalescence, the simulated genomic element length, and the scaling factor. For each combination of parameters, genetic diversity was quantified and computational efficiency was tracked. Our findings suggest that when simulating populations such as humans, where moderate scaling is required, one should simulate larger genomic segments for more accurate measures of diversity. Scaling seems to cause an inflation of diversity in human simulations relative to empirical data. On the other hand, in populations where more aggressive scaling is required, such as Drosophila, simulating smaller segments is advantageous. The scaling factor increases substantially in Drosophila studies, and the simulated data experience a drastic drop in diversity relative to empirical data, as well as an increased effect of purifying selection.
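Because the abstract turns on how scaling trades computational cost against accuracy, a minimal sketch of the underlying arithmetic may help. It assumes a convention commonly used with forward-in-time simulators, which is not necessarily the authors' exact scheme: divide the population size N and the number of generations by a scaling factor Q, and multiply the per-generation mutation rate mu, recombination rate r, and selection coefficient s by Q, so that the population-scaled quantities 4Nmu, 4Nr, and 2Ns stay approximately constant. The names SimParams, scale, burn_in, and theta, the ValueError guard, and all parameter values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimParams:
    """Hypothetical parameter bundle for a forward-in-time Wright-Fisher simulation."""
    N: int             # diploid effective population size
    mu: float          # mutation rate per site per generation
    r: float           # recombination rate per site per generation
    s: float           # selection coefficient of a new deleterious mutation
    generations: int   # length of the simulated demographic history


def scale(p: SimParams, Q: float) -> SimParams:
    """Rescale by Q: divide N and time by Q, multiply per-generation rates by Q.

    This keeps 4*N*mu, 4*N*r and 2*N*s approximately constant while reducing
    the number of individuals and generations that must be simulated.
    """
    if Q < 1:
        raise ValueError("scaling factor Q must be >= 1")
    if p.s * Q < -1:
        # Guard chosen for this sketch: fitness 1 + s would become negative.
        raise ValueError("Q too large: scaled selection coefficient below -1")
    return SimParams(
        N=round(p.N / Q),
        mu=p.mu * Q,
        r=p.r * Q,
        s=p.s * Q,
        generations=round(p.generations / Q),
    )


def burn_in(N: int, multiple: float = 10) -> int:
    """Burn-in length in generations, defaulting to the customary 10N."""
    return round(multiple * N)


def theta(p: SimParams) -> float:
    """Population-scaled mutation rate 4*N*mu."""
    return 4 * p.N * p.mu


if __name__ == "__main__":
    # Illustrative values only, not taken from the study.
    base = SimParams(N=10_000, mu=1.5e-8, r=1e-8, s=-1e-3, generations=100_000)
    scaled = scale(base, Q=5)
    print(scaled.N, burn_in(base.N), burn_in(scaled.N))
    print(theta(base), theta(scaled))
```

With Q = 5, this sketch shrinks N from 10,000 to 2,000 and a 10N burn-in from 100,000 to 20,000 generations while leaving 4Nmu unchanged. The burn-in multiple is exposed as a parameter because, as reported above, the customary 10N is often not enough for full coalescence.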

Publisher

Cold Spring Harbor Laboratory
