Breed identification using breed-informative SNPs and machine learning based on whole genome sequence data and SNP chip data

Author:

Zhao Changheng,Wang Dan,Teng Jun,Yang Cheng,Zhang Xinyi,Wei Xianming,Zhang QinORCID

Abstract

Abstract Background Breed identification is useful in a variety of biological contexts. Breed identification usually involves two stages, i.e., detection of breed-informative SNPs and breed assignment. For both stages, there are several methods proposed. However, what is the optimal combination of these methods remain unclear. In this study, using the whole genome sequence data available for 13 cattle breeds from Run 8 of the 1,000 Bull Genomes Project, we compared the combinations of three methods (Delta, FST, and In) for breed-informative SNP detection and five machine learning methods (KNN, SVM, RF, NB, and ANN) for breed assignment with respect to different reference population sizes and difference numbers of most breed-informative SNPs. In addition, we evaluated the accuracy of breed identification using SNP chip data of different densities. Results We found that all combinations performed quite well with identification accuracies over 95% in all scenarios. However, there was no combination which performed the best and robust across all scenarios. We proposed to integrate the three breed-informative detection methods, named DFI, and integrate the three machine learning methods, KNN, SVM, and RF, named KSR. We found that the combination of these two integrated methods outperformed the other combinations with accuracies over 99% in most cases and was very robust in all scenarios. The accuracies from using SNP chip data were only slightly lower than that from using sequence data in most cases. Conclusions The current study showed that the combination of DFI and KSR was the optimal strategy. Using sequence data resulted in higher accuracies than using chip data in most cases. However, the differences were generally small. In view of the cost of genotyping, using chip data is also a good option for breed identification.

Funder

National Key Research and Development Program of China

the Yangzhou University Interdisciplinary Research Foundation for Animal Science Discipline of Targeted Support

the Project of Genetic Improvement for Agricultural Species (Dairy Cattle) of Shandong Province

Publisher

Springer Science and Business Media LLC

Subject

Animal Science and Zoology,Biochemistry,Food Science,Biotechnology

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3