Abstract
Numerous studies over the last decade have demonstrated the utility of machine learning methods when applied to population genetic tasks. More recent studies show the potential of deep learning methods in particular, which allow researchers to approach problems without making prior assumptions about how the data should be summarized or manipulated; instead, these methods learn their own internal representation of the data in an attempt to maximize inferential accuracy. One type of deep neural network, the Generative Adversarial Network (GAN), can even be used to generate new data, and this approach has been used to create individual artificial human genomes free from privacy concerns. In this study, we further explore the application of GANs in population genetics by designing and training a network to learn the statistical distribution of population genetic alignments (i.e., data sets consisting of sequences from an entire population sample) under several diverse evolutionary histories. This is the first GAN capable of performing this task. After testing multiple neural network architectures, we report the results of a fully differentiable Deep-Convolutional Wasserstein GAN with gradient penalty that generates artificial population genetic alignments that successfully mimic key aspects of the training data, including the site frequency spectrum, differentiation between populations, and patterns of linkage disequilibrium. We demonstrate consistent training success across various evolutionary models, including models of panmictic and subdivided populations, populations at equilibrium and populations experiencing changes in size, and populations experiencing either no selection or positive selection of various strengths, all without the need for extensive hyperparameter tuning. Overall, our findings highlight the ability of GANs to learn and mimic population genetic data, and we discuss future areas of population genetics research where this work can be applied.
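One of the summaries named above, the site frequency spectrum, can be illustrated with a minimal sketch. It assumes an alignment is encoded as a binary matrix (rows are sequences, columns are sites, 1 marks the derived allele); this encoding, the function name `site_frequency_spectrum`, and the toy data are illustrative assumptions and are not taken from the study.

```python
from collections import Counter
from typing import List, Sequence


def site_frequency_spectrum(alignment: Sequence[Sequence[int]]) -> List[int]:
    """Return the unfolded site frequency spectrum of a binary alignment.

    alignment[i][j] is 1 if sequence i carries the derived allele at site j,
    else 0 (an assumed encoding). Entry k-1 of the result counts the sites at
    which exactly k of the n sequences carry the derived allele, k = 1..n-1.
    """
    n = len(alignment)
    if n < 2:
        raise ValueError("need at least two sequences")
    n_sites = len(alignment[0])
    if any(len(row) != n_sites for row in alignment):
        raise ValueError("all sequences must have the same length")
    # Derived-allele count per site, i.e. the column sums of the matrix.
    counts = Counter(sum(column) for column in zip(*alignment))
    # Monomorphic sites (count 0 or n) are excluded from the spectrum.
    return [counts.get(k, 0) for k in range(1, n)]


# Hypothetical toy alignment: 4 sequences, 5 sites.
toy = [
    [0, 1, 0, 1, 0],
    [0, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
sfs = site_frequency_spectrum(toy)
```

Comparing such a spectrum between real and generated alignments is one plausible way to check whether a generator reproduces the training data; the exact comparison used in the study is not specified here.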
Publisher
Cold Spring Harbor Laboratory
Cited by
3 articles.