Abstract
AbstractAccurate inference of population structure is important in many studies of population genetics. In this paper we present, HaploNet, a novel method for performing dimensionality reduction and clustering in genetic data. The method is based on local clustering of phased haplotypes using neural networks from whole-genome sequencing or genotype data. By utilizing a Gaussian mixture prior in a variational autoencoder framework, we are able to learn a low-dimensional latent space in which we cluster haplotypes along the genome in a highly scalable manner. We demonstrate that we can use haplotype clusters in the latent space to infer global population structure utilizing haplotype information by exploiting the generative properties of our framework. Based on fitted neural networks and its latent haplotype clusters, we can perform principal component analysis and estimate the ancestry proportions based on a maximum likelihood framework. Using sequencing data from closely related human populations, we demonstrate that our approach is better at distinguishing closely related populations than standard admixture and principal component analysis software. We further show that HaploNet is fast and highly scalable by applying it to genotype array data of the UK Biobank.
Publisher
Cold Spring Harbor Laboratory
Cited by
4 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献