Abstract
We detail the development of the ancestry-informative single nucleotide polymorphism (SNP) panel forming part of the VISAGE Basic Tool (BT), which combines 41 appearance-predictive SNPs and 112 ancestry-predictive SNPs (three SNPs shared between sets) in one massively parallel sequencing (MPS) multiplex, whereas blood-based age analysis using methylation markers is run in a parallel MPS analysis pipeline. The selection of SNPs for the BT ancestry panel focused on established forensic markers that already have a proven track record of good sequencing performance in MPS, and the overall SNP multiplex scale closely matched that of existing forensic MPS assays. SNPs were chosen to differentiate individuals from the five main continental population groups of Africa, Europe, East Asia, America, and Oceania, extended to include differentiation of individuals from South Asia. From analysis of 1000 Genomes and HGDP-CEPH samples from these six population groups, the BT ancestry panel was shown to have no classification error using the Bayes likelihood calculators of the Snipper online analysis portal. The differentiation power of the component ancestry SNPs of BT was balanced as far as possible to avoid bias in the estimation of co-ancestry proportions in individuals with admixed backgrounds. The balancing process led to very similar cumulative population-specific divergence values for Africa, Europe, America, and Oceania, with East Asia being slightly below average, and South Asia an outlier from the other groups. Comparisons were made of the African, European, and Native American estimated co-ancestry proportions in the six admixed 1000 Genomes populations, using the BT ancestry panel SNPs and 572,000 Affymetrix Human Origins array SNPs. Very similar co-ancestry proportions were observed down to a minimum value of 10%, below which low-level co-ancestry was not always reliably detected by BT SNPs.
The Snipper analysis portal provides a comprehensive population dataset for the BT ancestry panel SNPs, comprising a 520-sample standardised reference dataset; 3445 additional samples from 1000 Genomes, HGDP-CEPH, Simons Foundation and Estonian Biocentre genome diversity projects; and 167 samples of six populations from in-house genotyping of individuals from Middle East, North and East African regions complementing those of the sampling regimes of the other diversity projects.
Funder
Horizon 2020 Framework Programme
Consellería de Cultura, Educación e Ordenación Universitaria e da Consellería de Economía, Emprego e Industria from Xunta de Galicia, Spain
Subject
Genetics (clinical), Genetics
Cited by
24 articles.