Expanding the stdpopsim species catalog, and lessons learned for realistic genome simulations

Author:

Lauterbur M Elise [1], Cavassim Maria Izabel A [2], Gladstein Ariella L [3], Gower Graham [4], Pope Nathaniel S [5], Tsambos Georgia [6], Adrion Jeffrey [5, 7], Belsare Saurabh [5], Biddanda Arjun [8], Caudill Victoria [5], Cury Jean [9], Echevarria Ignacio [10], Haller Benjamin C [11], Hasan Ahmed R [12, 13], Huang Xin [14, 15], Iasi Leonardo Nicola Martin [16], Noskova Ekaterina [17], Obsteter Jana [18], Pavinato Vitor Antonio Correa [19], Pearson Alice [20, 21], Peede David [22, 23], Perez Manolo F [24], Rodrigues Murillo F [5], Smith Chris CR [5], Spence Jeffrey P [25], Teterina Anastasia [5], Tittes Silas [5], Unneberg Per [26], Vazquez Juan Manuel [27], Waples Ryan K [28], Wohns Anthony Wilder [29], Wong Yan [30], Baumdicker Franz [31], Cartwright Reed A [32], Gorjanc Gregor [33], Gutenkunst Ryan N [34], Kelleher Jerome [30], Kern Andrew D [5], Ragsdale Aaron P [35], Ralph Peter L [5, 36], Schrider Daniel R [37], Gronau Ilan [38]

Affiliation:

1. Department of Ecology and Evolutionary Biology, University of Arizona

2. Department of Ecology and Evolutionary Biology, University of California, Los Angeles

3. Embark Veterinary, Inc

4. Section for Molecular Ecology and Evolution, Globe Institute, University of Copenhagen

5. Institute of Ecology and Evolution, University of Oregon

6. School of Mathematics and Statistics, University of Melbourne

7. Ancestry DNA

8. 54Gene, Inc

9. Universite Paris-Saclay, CNRS, INRIA, Laboratoire Interdisciplinaire des Sciences du Numerique

10. School of Life Sciences, University of Glasgow

11. Department of Computational Biology, Cornell University

12. Department of Cell and Systems Biology, University of Toronto

13. Department of Biology, University of Toronto Mississauga

14. Department of Evolutionary Anthropology, University of Vienna

15. Human Evolution and Archaeological Sciences (HEAS), University of Vienna

16. Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology

17. Computer Technologies Laboratory, ITMO University

18. Agricultural Institute of Slovenia, Department of Animal Science

19. Entomology Department, The Ohio State University

20. Department of Genetics, University of Cambridge

21. Department of Zoology, University of Cambridge

22. Department of Ecology, Evolution, and Organismal Biology, Brown University

23. Center for Computational Molecular Biology, Brown University

24. Department of Genetics and Evolution, Federal University of Sao Carlos

25. Department of Genetics, Stanford University School of Medicine

26. Department of Cell and Molecular Biology, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Uppsala University

27. Department of Integrative Biology, University of California, Berkeley

28. Department of Biostatistics, University of Washington

29. Broad Institute of MIT and Harvard

30. Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford

31. Cluster of Excellence - Controlling Microbes to Fight Infections, Eberhard Karls Universität Tübingen

32. School of Life Sciences and The Biodesign Institute, Arizona State University

33. The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh

34. Department of Molecular and Cellular Biology, University of Arizona

35. Department of Integrative Biology, University of Wisconsin–Madison

36. Department of Mathematics, University of Oregon

37. Department of Genetics, University of North Carolina at Chapel Hill

38. Efi Arazi School of Computer Science, Reichman University

Abstract

Simulation is a key tool in population genetics for both methods development and empirical research, but producing simulations that recapitulate the main features of genomic datasets remains a major obstacle. Today, more realistic simulations are possible thanks to large increases in the quantity and quality of available genetic data, and the sophistication of inference and simulation software. However, implementing these simulations still requires substantial time and specialized knowledge. These challenges are especially pronounced for simulating genomes for species that are not well studied, since it is not always clear what information is required to produce simulations with a level of realism sufficient to confidently answer a given question. The community-developed framework stdpopsim seeks to lower this barrier by facilitating the simulation of complex population genetic models using up-to-date information. The initial version of stdpopsim focused on establishing this framework using six well-characterized model species (Adrion et al., 2020). Here, we report on major improvements made in the new release of stdpopsim (version 0.2), which includes a significant expansion of the species catalog and substantial additions to simulation capabilities. Features added to improve the realism of the simulated genomes include non-crossover recombination and provision of species-specific genomic annotations. Through community-driven efforts, we expanded the number of species in the catalog more than threefold and broadened coverage across the tree of life. During the process of expanding the catalog, we have identified common sticking points and developed best practices for setting up genome-scale simulations. We describe the input data required for generating a realistic simulation, suggest good practices for obtaining the relevant information from the literature, and discuss common pitfalls and major considerations. These improvements to stdpopsim aim to further promote the use of realistic whole-genome population genetic simulations, especially in non-model organisms, making them available, transparent, and accessible to everyone.

Funder

National Science Foundation

National Institute of General Medical Sciences

Dim One Health

Human Frontier Science Program

Brown University

Science for Life Laboratory

Deutsche Forschungsgemeinschaft

University of Edinburgh

Robertson Foundation

Publisher

eLife Sciences Publications, Ltd

Subject

General Immunology and Microbiology; General Biochemistry, Genetics and Molecular Biology; General Medicine; General Neuroscience

Cited by 14 articles.
