Archetypal Analysis for population genetics

Published:2022-08-25 Issue:8 Volume:18 Page:e1010301
ISSN:1553-7358
Container-title:PLOS Computational Biology
language:en
Short-container-title:PLoS Comput Biol

Author:

Gimbernat-Mayol Julia, Dominguez Mantes Albert, Bustamante Carlos D., Mas Montserrat Daniel, Ioannidis Alexander G.

Abstract

The estimation of genetic clusters using genomic data has application from genome-wide association studies (GWAS) to demographic history to polygenic risk scores (PRS) and is expected to play an important role in the analyses of increasingly diverse, large-scale cohorts. However, existing methods are computationally-intensive, prohibitively so in the case of nationwide biobanks. Here we explore Archetypal Analysis as an efficient, unsupervised approach for identifying genetic clusters and for associating individuals with them. Such unsupervised approaches help avoid conflating socially constructed ethnic labels with genetic clusters by eliminating the need for exogenous training labels. We show that Archetypal Analysis yields similar cluster structure to existing unsupervised methods such as ADMIXTURE and provides interpretative advantages. More importantly, we show that since Archetypal Analysis can be used with lower-dimensional representations of genetic data, significant reductions in computational time and memory requirements are possible. When Archetypal Analysis is run in such a fashion, it takes several orders of magnitude less compute time than the current standard, ADMIXTURE. Finally, we demonstrate uses ranging across datasets from humans to canids.
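To make the idea in the abstract concrete, the following is a minimal sketch of archetypal analysis run on a lower-dimensional (principal component) representation of a genotype matrix. It is not the authors' implementation: the function names `project_rows_to_simplex` and `archetypal_analysis`, the projected-gradient solver, the toy random genotypes, and the choices of 5 components and 3 archetypes are all assumptions made for illustration. The sketch approximates each individual as a convex combination of archetypes, where each archetype is itself a convex combination of individuals.

```python
import numpy as np


def project_rows_to_simplex(V):
    """Euclidean projection of each row of V onto the probability simplex."""
    n, k = V.shape
    U = np.sort(V, axis=1)[:, ::-1]            # each row sorted in descending order
    css = np.cumsum(U, axis=1) - 1.0
    idx = np.arange(1, k + 1)
    rho = (U - css / idx > 0).sum(axis=1)      # size of the positive prefix (always >= 1)
    theta = css[np.arange(n), rho - 1] / rho
    return np.maximum(V - theta[:, None], 0.0)


def archetypal_analysis(X, k, n_iter=200, seed=0):
    """Approximate X (n x d) by A @ Z with Z = B @ X.

    Rows of A (n x k) and B (k x n) are kept on the simplex by alternating
    projected-gradient steps on the squared reconstruction error.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    A = project_rows_to_simplex(rng.random((n, k)))
    B = project_rows_to_simplex(rng.random((k, n)))
    xxt_norm = np.linalg.norm(X @ X.T, 2)      # spectral norm, used for the B step size
    for _ in range(n_iter):
        Z = B @ X                              # current archetypes (k x d)
        step_a = 1.0 / (2.0 * np.linalg.norm(Z @ Z.T, 2) + 1e-12)
        A = project_rows_to_simplex(A - step_a * 2.0 * (A @ Z - X) @ Z.T)
        R = A @ Z - X                          # residual after updating A
        step_b = 1.0 / (2.0 * np.linalg.norm(A.T @ A, 2) * xxt_norm + 1e-12)
        B = project_rows_to_simplex(B - step_b * 2.0 * A.T @ R @ X.T)
    return A, B @ X


# Toy data (assumption): 60 individuals x 200 SNPs with genotypes coded 0/1/2.
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(60, 200)).astype(float)

# Lower-dimensional representation: the top 5 principal components via SVD.
centered = genotypes - genotypes.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
pcs = U[:, :5] * S[:5]

# Fit 3 archetypes in PC space instead of on the full genotype matrix.
A, Z = archetypal_analysis(pcs, k=3)
```

In this sketch, each row of `A` holds one individual's non-negative weights over the archetypes, summing to one, which can be read in the same way as cluster membership proportions. Working on `pcs` rather than `genotypes` keeps every matrix product small, which is the source of the compute and memory savings the abstract describes.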

Funder

Chan Zuckerberg Initiative

Royal Academy of Engineering Leaders Scholarship

Publisher

Public Library of Science (PLoS)

Subject

Computational Theory and Mathematics; Cellular and Molecular Neuroscience; Genetics; Molecular Biology; Ecology; Modeling and Simulation; Ecology, Evolution, Behavior and Systematics

Reference: 17 articles.

1. Inference of population structure using multilocus genotype data;JK Pritchard;Genetics,2000

2. Estimation of individual admixture: analytical and study design considerations;H Tang;Genetic Epidemiology,2005

3. Enhancements to the ADMIXTURE algorithm for individual ancestry estimation;DH Alexander;BMC Bioinformatics,2011

4. Principal component analysis of genetic data;D Reich;Nature Genetics,2008

5. UMAP reveals cryptic population structure and phenotype heterogeneity in large genomic cohorts;A Diaz-Papkovich;PLoS Genetics,2019

Cited by 10 articles.

1. Patterns of genetic variation and local adaptation of a native herbivore to a lethal invasive plant;Molecular Ecology;2024-03-21

2. SNVstory: inferring genetic ancestry from genome sequencing data;BMC Bioinformatics;2024-02-20

3. Genomic Databases Homogenization with Machine Learning;2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM);2023-12-05

4. Machine Learning Strategies for Improved Phenotype Prediction in Underrepresented Populations;2023-10-17

5. Mexican Biobank advances population and medical genomics of diverse ancestries;Nature;2023-10-11