Trait genetic architecture and population structure determine model selection for genomic prediction in naturalArabidopsis thalianapopulations

Author:

Gibbs Patrick M.,Paril Jefferson F.ORCID,Fournier-level AlexandreORCID

Abstract

AbstractGenomic prediction applies to a wide range of agronomically relevant traits, with distinct ontologies and genetic architectures. Selecting the most appropriate model for the distribution of genetic effects and their associated allele frequencies in the training population is crucial. Linear regression models are often preferred for genomic prediction. However, linear models may not suit all genetic architectures and training populations. Machine Learning approaches have been proposed to improve genomic prediction owing to their capacity to capture complex biology including epistasis. However, the applicability of different genomic prediction models, including non-linear/non-parametric approaches, have not been rigorously assessed across a wide variety of plant traits in natural outbreeding populations. This study evaluates genomic prediction sensitivity to trait ontology and the impact of population structure on model selection and prediction accuracy. Examining 36 quantitative traits measured for 1000+ natural genotypes of the model plantArabidopsis thaliana, we assessed the performance of penalised regression, random forest, and multilayer perceptron at producing genomic predictions. Regression models were generally the most accurate, except for biochemical traits where random forest performed best. We link this result to the genetic architecture of each trait – notably that biochemical traits have simpler genetic architecture than macroscopic traits. Moreover, complex macroscopic traits, particularly those related to flowering and yield, were strongly correlated to population structure, while molecular traits were better predicted by fewer, independent markers. This study highlights the relevance of machine learning approaches for simple molecular traits and underscores the need to consider ancestral population history when designing training samples.Article summaryMachine learning and linear models were tested for genomic prediction of multiple traits in the model plantArabidopsis thaliana. We associate the performance of genomic prediction models to trait ontology, finding machine learning approaches applicable to biochemical traits, and linear models best for macroscopic traits. We link this result to the genetic architecture of each trait and patterns of selection in the association panel’s ancestral population, thus underscoring the relevance of these two sensitivities to genomic prediction in plant breeding.

Publisher

Cold Spring Harbor Laboratory

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3