Prediction performance of linear models and gradient boosting machine on complex phenotypes in outbred mice

Author:

Perez Bruno C (1), Bink Marco C A M (1), Svenson Karen L (2), Churchill Gary A (2), Calus Mario P L (3)

Affiliation:

1. Hendrix Genetics B.V., Research and Technology Center (RTC), 5830 AC Boxmeer, The Netherlands

2. The Jackson Laboratory, Bar Harbor, ME 04609, USA

3. Wageningen University & Research, Animal Breeding and Genomics, 6700 AH Wageningen, The Netherlands

Abstract

We compared the performance of linear methods (GBLUP, BayesB, and elastic net) with that of a nonparametric tree-based ensemble method (gradient boosting machine) for genomic prediction of complex traits in mice. The dataset contained genotypes for 50,112 SNP markers and phenotypes for 835 animals from 6 generations. The traits analyzed were bone mineral density, body weight at 10, 15, and 20 weeks, fat percentage, circulating cholesterol, glucose, insulin, triglycerides, and urine creatinine. The youngest generation was used as the validation subset, and predictions were based on all older generations. Model performance was evaluated by comparing predictions for animals in the validation subset against their adjusted phenotypes. Linear models outperformed the gradient boosting machine for 7 out of 10 traits. For bone mineral density, cholesterol, and glucose, the gradient boosting machine showed better prediction accuracy and lower relative root mean squared error than the linear models. Interestingly, for these 3 traits there is evidence that a relevant portion of the phenotypic variance is explained by epistatic effects. For some traits, prediction accuracy improved when a subset of top markers selected from a gradient boosting machine model was fitted in the linear and gradient boosting machine models. Our results indicate that the gradient boosting machine is more strongly affected than the linear models by data size and by decreased connectedness between the reference and validation sets. Although the linear models outperformed the gradient boosting machine for the polygenic traits, our results suggest that the gradient boosting machine is a competitive method for predicting complex traits with assumed epistatic effects.
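The validation scheme described in the abstract (hold out the youngest generation, then compare predictions with adjusted phenotypes) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline. The record fields (`generation`, `y_adj`), the toy values, and the predictions are hypothetical. Accuracy is assumed here to be the Pearson correlation between predicted and adjusted phenotypes, and "relative" root mean squared error is assumed to mean RMSE divided by the standard deviation of the validation phenotypes; the abstract does not define either quantity.

```python
from math import sqrt
from statistics import mean, pstdev


def split_by_generation(records):
    """Hold out the youngest (highest-numbered) generation for validation."""
    youngest = max(r["generation"] for r in records)
    reference = [r for r in records if r["generation"] < youngest]
    validation = [r for r in records if r["generation"] == youngest]
    return reference, validation


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def relative_rmse(predicted, observed):
    """RMSE scaled by the SD of the observed values (assumed definition)."""
    rmse = sqrt(mean([(p - o) ** 2 for p, o in zip(predicted, observed)]))
    return rmse / pstdev(observed)


# Toy records: generation labels and adjusted phenotypes are made up.
records = [
    {"id": "a", "generation": 4, "y_adj": 1.0},
    {"id": "b", "generation": 5, "y_adj": 2.0},
    {"id": "c", "generation": 6, "y_adj": 1.0},
    {"id": "d", "generation": 6, "y_adj": 2.0},
    {"id": "e", "generation": 6, "y_adj": 3.0},
]

# Hypothetical predictions for the validation animals, as if produced by
# some model trained on the reference set only.
predictions = {"c": 1.5, "d": 2.0, "e": 2.5}

reference, validation = split_by_generation(records)
observed = [r["y_adj"] for r in validation]
predicted = [predictions[r["id"]] for r in validation]

accuracy = pearson_r(predicted, observed)
rel_rmse = relative_rmse(predicted, observed)
print(f"accuracy={accuracy:.3f}, relative RMSE={rel_rmse:.3f}")
```

Reporting the two metrics together is useful because correlation is insensitive to the scale and bias of the predictions, while a scaled RMSE penalizes both.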

Funder

GENE-SWitCH project that received funding from the European Union’s Horizon 2020 research and innovation programme

National Institutes of Health

Publisher

Oxford University Press (OUP)

Subject

Genetics (clinical), Genetics, Molecular Biology

