Abstract
Motivation: Deep generative models have the potential to overcome difficulties in sharing individual-level genomic data by producing synthetic genomes that preserve the genomic associations specific to a cohort while not violating the privacy of any individual cohort member. However, there is significant room for improvement in the fidelity and usability of existing synthetic genome approaches.
Results: We demonstrate that when combined with plentiful data and with population-specific selection criteria, deep generative models can produce synthetic genomes and cohorts that closely model the original populations. Our methods improve fidelity in the site-frequency spectra and linkage disequilibrium decay and yield synthetic genomes that can be substituted in downstream local ancestry inference analysis, recreating results with 0.91 to 0.94 accuracy.
Availability: The model described in this paper is freely available at github.com/rlaboulaye/clonehort.
Publisher
Cold Spring Harbor Laboratory