Using machine learning to realize genetic site screening and genomic prediction of productive traits in pigs

Published:2023-05-13 Issue:6 Volume:37
ISSN:0892-6638
Container-title:The FASEB Journal
language:en
Short-container-title:The FASEB Journal

Author:

Xiang Tao¹, Li Tao²³⁴, Li Jielin¹, Li Xin²³⁴, Wang Jia²³⁴

Affiliation:

1. Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture Huazhong Agricultural University Wuhan China

2. College of Informatics Huazhong Agricultural University Wuhan China

3. Key Laboratory of Smart Farming for Agricultural Animals Huazhong Agricultural University Wuhan China

4. Hubei Key Laboratory of Agricultural Bioinformatics Huazhong Agricultural University Wuhan China

Abstract

Genomic prediction, which is based on solving linear mixed‐model (LMM) equations, is the most popular method for predicting breeding values or phenotypic performance for economic traits in livestock. With the need to further improve the performance of genomic prediction, nonlinear methods have been considered as an alternative and promising approach. The excellent ability to predict phenotypes in animal husbandry has been demonstrated by machine learning (ML) approaches, which have been rapidly developed. To investigate the feasibility and reliability of implementing genomic prediction using nonlinear models, the performances of genomic predictions for pig productive traits using the linear genomic selection model and nonlinear machine learning models were compared. Then, to reduce the high‐dimensional features of genome sequence data, different machine learning algorithms, including the random forest (RF), support vector machine (SVM), extreme gradient boosting (XGBoost) and convolutional neural network (CNN) algorithms, were used to perform genomic feature selection as well as genomic prediction on reduced feature genome data. All of the analyses were processed on two real pig datasets: the published PIC pig dataset and a dataset comprising data from a national pig nucleus herd in Chifeng, North China. Overall, the accuracies of predicted phenotypic performance for traits T1, T2, T3 and T5 in the PIC dataset and average daily gain (ADG) in the Chifeng dataset were higher using the ML methods than the LMM method, while those for trait T4 in the PIC dataset and total number of piglets born (TNB) in the Chifeng dataset were slightly lower using the ML methods than the LMM method. Among all the different ML algorithms, SVM was the most appropriate for genomic prediction. For the genomic feature selection experiment, the most stable and most accurate results across different algorithms were achieved using XGBoost in combination with the SVM algorithm.
Through feature selection, the number of genomic markers can be reduced to one-twentieth of the original number, while the predictive performance on some traits can even be improved compared to using the full genome data. Finally, we developed a new tool that can be used to execute combined XGBoost and SVM algorithms to realize genomic feature selection and phenotypic prediction.
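
The marker-reduction step described in the abstract can be illustrated with a minimal sketch. This is not the authors' tool. It assumes that per-marker importance scores have already been produced by some model (in the paper, XGBoost), and it only shows how keeping roughly 1 in 20 markers and scoring predictions might look. The function names `select_top_markers`, `reduce_genotypes` and `pearson` are hypothetical, and using the Pearson correlation between predicted and observed phenotypes as the accuracy measure is an assumption that the abstract does not state.

```python
from math import sqrt


def select_top_markers(importance, keep_fraction=1 / 20):
    """Return sorted indices of the highest-scoring markers.

    `importance` is a list of per-marker scores (assumed to come from a
    model such as XGBoost); about `keep_fraction` of the markers are kept.
    """
    n_keep = max(1, round(len(importance) * keep_fraction))
    ranked = sorted(range(len(importance)),
                    key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:n_keep])


def reduce_genotypes(genotypes, kept):
    """Keep only the selected marker columns of an animals x markers matrix."""
    return [[row[j] for j in kept] for row in genotypes]


def pearson(x, y):
    """Pearson correlation, used here as an assumed accuracy measure."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Toy input: 40 markers, two of which carry a non-zero importance score.
importance = [0.0] * 40
importance[7], importance[19] = 0.9, 0.5

# Three animals with 0/1/2 genotype codes (made-up values for illustration).
genotypes = [[(a + j) % 3 for j in range(40)] for a in range(3)]

kept = select_top_markers(importance)        # 40 markers * 1/20 = 2 markers
reduced = reduce_genotypes(genotypes, kept)  # 3 animals x 2 markers
```

In a full pipeline, the reduced matrix would then be passed to the prediction model (an SVM in the paper), and `pearson` could compare its predictions with observed phenotypes on held-out animals.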

Funder

National Basic Research Program of China

Fundamental Research Funds for the Central Universities

Publisher

Wiley

Subject

Genetics, Molecular Biology, Biochemistry, Biotechnology

References: 44 articles.

1. Machine learning applications in genetics and genomics

2. Correction: Genomic Selection and Association Mapping in Rice (Oryza sativa): Effect of Trait Genetic Architecture, Training Population Composition, Marker Number and Statistical Model on Accuracy of Rice Genomic Selection in Elite, Tropical Rice Breeding Lines

3. Use of Wrapper Algorithms Coupled with a Random Forests Classifier for Variable Selection in Large-Scale Genomic Association Studies

4. Gene selection and classification of microarray data using random forest

5. Genome-wide association study for backfat thickness in Canchim beef cattle using Random Forest approach

Cited by 4 articles.

1. Integrating deep learning for phenomic and genomic predictive modeling of Eucalyptus trees;Industrial Crops and Products;2024-11

2. Exploring genomic feature selection: A comparative analysis of GWAS and machine learning algorithms in a large‐scale soybean dataset;The Plant Genome;2024-09-10

3. Application of machine learning approach on halal meat authentication principle, challenges, and prospects: A review;Heliyon;2024-06

4. Computational Approaches In Livestock Breeding: A Review;2024 International Conference on Science, Engineering and Business for Driving Sustainable Development Goals (SEB4SDG);2024-04-02