Author:
Liang Mang, Chang Tianpeng, An Bingxing, Duan Xinghai, Du Lili, Wang Xiaoqiao, Miao Jian, Xu Lingyang, Gao Xue, Zhang Lupei, Li Junya, Gao Huijiang
Abstract
Machine learning (ML) is perhaps the most useful tool for interpreting large genomic datasets. However, the performance of any single machine learning method in genomic selection (GS) is currently unsatisfactory. To improve genomic prediction, we constructed a stacking ensemble learning framework (SELF) that integrates three machine learning methods to predict genomic estimated breeding values (GEBVs). We evaluated the prediction ability of SELF on three real datasets with different genetic architectures, comparing its prediction accuracy with that of its base learners, genomic best linear unbiased prediction (GBLUP), and BayesB. For each trait, SELF performed better than its base learners: support vector regression (SVR), kernel ridge regression (KRR), and elastic net (ENET). Across the three datasets, the prediction accuracy of SELF was, on average, 7.70% higher than that of GBLUP. Except for the milk fat percentage (MFP) trait in the German Holstein dairy cattle dataset, SELF was more robust than BayesB for all remaining traits. We therefore believe that SELF has the potential to be used more widely to estimate GEBVs in other animals and plants.
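The stacking idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn's `StackingRegressor`, an ordinary least squares meta-learner, arbitrary untuned hyperparameters, simulated genotypes coded 0/1/2, and prediction accuracy measured as the Pearson correlation between predicted and observed phenotypes. None of these choices is specified in the abstract, and the helper names (`build_stack`, `simulate_genotypes`, `prediction_accuracy`) are hypothetical.

```python
import numpy as np
from sklearn.ensemble import StackingRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR


def build_stack(cv=5):
    """Stack SVR, KRR and ENET; hyperparameters are illustrative, not tuned."""
    base_learners = [
        ("svr", SVR(kernel="rbf", C=1.0, gamma="scale")),
        ("krr", KernelRidge(kernel="rbf", alpha=1.0)),
        ("enet", ElasticNet(alpha=0.1, l1_ratio=0.5, max_iter=10000)),
    ]
    # Assumption: a linear meta-learner combines the out-of-fold predictions
    # that StackingRegressor generates from each base learner.
    return StackingRegressor(
        estimators=base_learners,
        final_estimator=LinearRegression(),
        cv=cv,
    )


def simulate_genotypes(n_animals=300, n_snps=500, n_qtl=20, h2=0.5, seed=0):
    """Simulated SNP matrix (0/1/2) and phenotype; stands in for real data."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 3, size=(n_animals, n_snps)).astype(float)
    effects = np.zeros(n_snps)
    qtl = rng.choice(n_snps, size=n_qtl, replace=False)
    effects[qtl] = rng.normal(size=n_qtl)
    g = X @ effects  # true genetic values
    # Scale the residual so that var(g) / var(y) is roughly h2.
    noise_sd = np.sqrt(g.var() * (1.0 - h2) / h2)
    y = g + rng.normal(scale=noise_sd, size=n_animals)
    return X, y


def prediction_accuracy(y_true, y_pred):
    """Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


X, y = simulate_genotypes()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = build_stack().fit(X_train, y_train)
acc = prediction_accuracy(y_test, model.predict(X_test))
print(f"Stacked prediction accuracy (Pearson r): {acc:.3f}")
```

Because the meta-learner is fitted on cross-validated predictions rather than in-sample fits, it weights each base learner by how well it generalizes, which is the usual motivation for stacking over picking a single method.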
Subject
Genetics (clinical), Genetics, Molecular Medicine
Cited by
47 articles.