Using machine learning to realize genetic site screening and genomic prediction of productive traits in pigs

Author:

Xiang Tao (1), Li Tao (2, 3, 4), Li Jielin (1), Li Xin (2, 3, 4), Wang Jia (2, 3, 4)

Affiliation:

1. Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, Huazhong Agricultural University, Wuhan, China

2. College of Informatics, Huazhong Agricultural University, Wuhan, China

3. Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, China

4. Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China

Abstract

Genomic prediction, which is based on solving linear mixed‐model (LMM) equations, is the most popular method for predicting breeding values or phenotypic performance for economic traits in livestock. To further improve the performance of genomic prediction, nonlinear methods have been considered as a promising alternative. Machine learning (ML) approaches, which have developed rapidly, have demonstrated an excellent ability to predict phenotypes in animal husbandry. To investigate the feasibility and reliability of genomic prediction with nonlinear models, we compared the performance of genomic prediction for pig productive traits using a linear genomic selection model and nonlinear machine learning models. Then, to reduce the high dimensionality of genome sequence data, different machine learning algorithms, including random forest (RF), support vector machine (SVM), extreme gradient boosting (XGBoost) and convolutional neural network (CNN), were used to perform genomic feature selection as well as genomic prediction on the reduced‐feature genome data. All analyses were performed on two real pig datasets: the published PIC pig dataset and a dataset from a national pig nucleus herd in Chifeng, North China. Overall, the prediction accuracies of phenotypic performance for traits T1, T2, T3 and T5 in the PIC dataset and for average daily gain (ADG) in the Chifeng dataset were higher with the ML methods than with the LMM method, whereas those for trait T4 in the PIC dataset and total number of piglets born (TNB) in the Chifeng dataset were slightly lower with the ML methods. Among the ML algorithms, SVM was the most appropriate for genomic prediction. In the genomic feature selection experiment, XGBoost combined with SVM gave the most stable and most accurate results across the different algorithms.
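The abstract contrasts linear LMM‐based genomic prediction with nonlinear ML models. As a minimal sketch of that contrast, assuming prediction accuracy is measured as the Pearson correlation between predicted and observed phenotypes (a common convention, not stated here), the Python code below fits kernel ridge regression twice on simulated genotypes: once with a linear kernel, which on centred genotypes gives the same predictions as SNP‐BLUP/GBLUP for a suitable shrinkage parameter, and once with a Gaussian (RBF) kernel as a simple nonlinear stand‐in. It is not an SVM and not the authors' pipeline; the simulated data, `lam`, `gamma`, and every function name are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def center(train, test):
    """Centre genotype columns by the training-set mean allele dosage."""
    means = [sum(col) / len(train) for col in zip(*train)]

    def shift(rows):
        return [[g - mu for g, mu in zip(row, means)] for row in rows]

    return shift(train), shift(test)


def linear_kernel(u, v):
    # On centred genotypes, X X' is proportional to a VanRaden-style genomic relationship matrix.
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(u, v, gamma=0.05):
    # Gaussian kernel: a nonlinear similarity between two genotype vectors.
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def kernel_ridge(x_train, y_train, x_test, kernel, lam=1.0):
    """Predict mean + k(x, X) (K + lam*I)^-1 (y - mean) for each test genotype."""
    mu = sum(y_train) / len(y_train)  # simplification: intercept fixed to the training mean
    k = [[kernel(a, b) + (lam if i == j else 0.0) for j, b in enumerate(x_train)]
         for i, a in enumerate(x_train)]
    alpha = solve(k, [y - mu for y in y_train])
    return [mu + sum(al * kernel(x, xt) for al, xt in zip(alpha, x_train)) for x in x_test]


def pearson(a, b):
    """Accuracy as the Pearson correlation of predicted and observed values."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def demo(seed=1, n_train=80, n_test=40, n_snps=30):
    """Simulated data only: five additive causal SNPs plus Gaussian noise."""
    rng = random.Random(seed)
    geno = [[rng.choice((0, 1, 2)) for _ in range(n_snps)] for _ in range(n_train + n_test)]
    pheno = [sum(row[:5]) + rng.gauss(0.0, 0.5) for row in geno]
    x_tr, x_te = center(geno[:n_train], geno[n_train:])
    y_tr, y_te = pheno[:n_train], pheno[n_train:]
    acc_linear = pearson(kernel_ridge(x_tr, y_tr, x_te, linear_kernel), y_te)
    acc_rbf = pearson(kernel_ridge(x_tr, y_tr, x_te, rbf_kernel), y_te)
    return acc_linear, acc_rbf


if __name__ == "__main__":
    print("linear (GBLUP-like) accuracy: %.3f, RBF accuracy: %.3f" % demo())
```

Kernel ridge is used here because a single linear solve serves both kernels, which keeps the linear‐versus‐nonlinear comparison explicit; a fuller version would swap `kernel_ridge` for an epsilon‐insensitive support vector regressor such as scikit‐learn's `SVR`.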
Through feature selection, the number of genomic markers can be reduced to one‐twentieth of the full set, while predictive performance on some traits can even improve relative to using the full genome data. Finally, we developed a new tool that combines the XGBoost and SVM algorithms to perform genomic feature selection and phenotypic prediction.
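The feature‐selection step keeps roughly one marker in twenty before prediction. The sketch below illustrates the idea with gradient boosting on depth‐one trees (stumps) under squared‐error loss, summing each SNP's split gain as an importance score and keeping the top 1/20 of markers. This is a simplified stand‐in for XGBoost's gain importance (no second‐order terms, regularization, or deeper trees) run on simulated data with three causal SNPs; `boosted_importance`, `select_markers`, `rounds`, and `eta` are hypothetical names and settings, not the authors' tool.

```python
import random


def best_stump(geno, resid):
    """Return (gain, snp, threshold, left_value, right_value) for the best single split.

    Animals with dosage <= threshold go left; gain is the reduction in the
    residual sum of squares when each side is predicted by its mean residual.
    """
    n = len(resid)
    total = sum(resid)
    parent = total * total / n
    best = (0.0, None, None, 0.0, 0.0)
    for j in range(len(geno[0])):
        for thr in (0, 1):  # dosages are 0/1/2, so there are two cut points
            left = [r for row, r in zip(geno, resid) if row[j] <= thr]
            n_left = len(left)
            if n_left in (0, n):
                continue  # degenerate split
            s_left = sum(left)
            s_right = total - s_left
            gain = s_left ** 2 / n_left + s_right ** 2 / (n - n_left) - parent
            if gain > best[0]:
                best = (gain, j, thr, s_left / n_left, s_right / (n - n_left))
    return best


def boosted_importance(geno, pheno, rounds=60, eta=0.1):
    """Gradient boosting with stumps (squared-error loss); returns summed gain per SNP."""
    mean = sum(pheno) / len(pheno)
    pred = [mean] * len(pheno)
    importance = [0.0] * len(geno[0])
    for _ in range(rounds):
        resid = [y - p for y, p in zip(pheno, pred)]  # negative gradient of squared loss
        gain, j, thr, left_val, right_val = best_stump(geno, resid)
        if j is None:
            break  # no split reduces the error
        importance[j] += gain
        pred = [p + eta * (left_val if row[j] <= thr else right_val)
                for row, p in zip(geno, pred)]
    return importance


def select_markers(importance, keep_fraction=1 / 20):
    """Indices of the top-ranked markers, keeping about keep_fraction of them."""
    k = max(1, round(len(importance) * keep_fraction))
    ranked = sorted(range(len(importance)), key=lambda j: importance[j], reverse=True)
    return sorted(ranked[:k])


def demo(seed=7, n_animals=150, n_snps=60):
    """Simulated data only: SNPs 0, 1 and 2 have additive effects, the rest are noise."""
    rng = random.Random(seed)
    geno = [[rng.choice((0, 1, 2)) for _ in range(n_snps)] for _ in range(n_animals)]
    pheno = [row[0] + row[1] + row[2] + rng.gauss(0.0, 0.3) for row in geno]
    importance = boosted_importance(geno, pheno)
    return importance, select_markers(importance)


if __name__ == "__main__":
    imp, kept = demo()
    print("kept markers:", kept)
```

The retained column indices would then subset the genotype matrix passed to the downstream predictor (SVM in this study). Importance scores should be computed within the training folds only, so that marker selection does not leak information from the validation animals.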

Funder

National Basic Research Program of China

Fundamental Research Funds for the Central Universities

Publisher

Wiley

Subject

Genetics, Molecular Biology, Biochemistry, Biotechnology
