Abstract
Missing data and genotyping errors are common in microsatellite data sets. We used simulated data to quantify the effect of these data aberrations on the accuracy of population structure inference. Data sets with complex, randomly generated population histories were simulated under the coalescent. Models describing the characteristic patterns of missing data and genotyping error in real microsatellite data sets were used to modify the simulated data sets. The accuracy of ordination, tree-based, and model-based methods of inference was evaluated before and after these modifications. The ability to recover correct population clusters decreased as missing data increased. The rate of decrease was similar among analytical procedures, so no single analytical approach was preferable. For every 1% of a data matrix that contained missing genotypes, 2–4% fewer correct clusters were found. For every 1% of a matrix that contained erroneous genotypes, 1–2% fewer correct clusters were found using ordination and tree-based methods. In contrast, model-based procedures that assign individuals to clusters by minimizing the deviation from Hardy-Weinberg equilibrium performed better as genotyping error increased. We attribute this surprising result to the inbreeding-like nature of microsatellite genotyping error, wherein heterozygous genotypes are mischaracterized as homozygous. We also show that genotyping error elevates estimates of the level of genetic admixture. Overall, missing data negatively impact population structure inference more than typical genotyping errors.
Publisher
Cold Spring Harbor Laboratory
Cited by
10 articles.