Genomic prediction in plants: opportunities for ensemble machine learning based approaches

Published: 2023-01-10
Volume: 11
Page: 802
ISSN: 2046-1402
Journal: F1000Research
Language: English
Abbreviated journal title: F1000Res

Authors:

Farooq Muhammad, van Dijk Aalt D.J., Nijveen Harm, Mansoor Shahid, de Ridder Dick

Abstract

Background: Many studies have demonstrated the utility of machine learning (ML) methods for genomic prediction (GP) of various plant traits, but a clear rationale for choosing ML over conventionally used, often simpler parametric methods is still lacking. The predictive performance of GP models may depend on many factors, including sample size, number of markers, population structure and genetic architecture.

Methods: Here, we investigate which problem and dataset characteristics are associated with good performance of ML methods for GP. We compare the predictive performance of two frequently used ensemble ML methods (Random Forest and Extreme Gradient Boosting) with parametric methods, including genomic best linear unbiased prediction (GBLUP), reproducing kernel Hilbert space regression (RKHS), BayesA and BayesB. To explore problem characteristics, we use simulated and real plant traits at different levels of genetic complexity, determined by the number of quantitative trait loci (QTLs), heritability (h² and h²ₑ), population structure and linkage disequilibrium between causal nucleotides and other SNPs.

Results: Decision-tree-based ensemble ML methods are a better choice for nonlinear phenotypes and are comparable to Bayesian methods for linear phenotypes when large-effect quantitative trait nucleotides (QTNs) are present. Furthermore, we find that ML methods are susceptible to confounding due to population structure but less sensitive to low linkage disequilibrium than linear parametric methods.

Conclusions: Overall, these results provide insights into the role of ML in GP as well as guidelines for practitioners.
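The abstract does not spell out how the simulated traits were generated. As an illustration only, the following is a minimal Python sketch of one common way to simulate a trait with a chosen number of causal loci and a chosen heritability; it is an assumption for exposition, not the authors' pipeline. The helper names `simulate_genotypes` and `simulate_phenotype` are hypothetical, genotypes are drawn independently per SNP (so the sketch has no linkage disequilibrium and no population structure), QTN effects are Gaussian, and the optional nonlinear case is a single illustrative pairwise interaction. Heritability here is simply the sample ratio var(g) / (var(g) + var(e)).

```python
import random
import statistics


def simulate_genotypes(n_ind, n_snp, rng):
    """Hypothetical helper: biallelic SNP genotypes coded 0/1/2.

    Each SNP gets its own minor allele frequency and both alleles of every
    individual are drawn independently (Hardy-Weinberg), so this sketch has
    no linkage disequilibrium and no population structure.
    """
    mafs = [rng.uniform(0.05, 0.5) for _ in range(n_snp)]
    return [[sum(rng.random() < p for _ in range(2)) for p in mafs]
            for _ in range(n_ind)]


def simulate_phenotype(genotypes, n_qtn, h2, rng, epistatic=False):
    """Hypothetical helper: phenotype y = g + e with a target heritability h2.

    g sums Gaussian effects of n_qtn randomly chosen causal SNPs (QTNs);
    with epistatic=True it also adds one pairwise interaction, making the
    genotype-to-phenotype map nonlinear. The noise e is rescaled so that
    var(g) / (var(g) + var(e)) equals h2 exactly in the sample.
    """
    if not 0.0 < h2 <= 1.0:
        raise ValueError("h2 must lie in (0, 1]")
    n_snp = len(genotypes[0])
    qtns = rng.sample(range(n_snp), n_qtn)
    effects = {j: rng.gauss(0.0, 1.0) for j in qtns}

    g = []
    for row in genotypes:
        value = sum(effects[j] * row[j] for j in qtns)
        if epistatic and n_qtn >= 2:
            # Illustrative interaction between the first two QTNs only.
            value += row[qtns[0]] * row[qtns[1]]
        g.append(value)

    var_g = statistics.pvariance(g)
    if var_g == 0.0:
        raise ValueError("all chosen QTNs are monomorphic; resample")
    # From h2 = var_g / (var_g + var_e) it follows var_e = var_g * (1 - h2) / h2.
    target_var_e = var_g * (1.0 - h2) / h2

    # Standardize Gaussian draws, then scale them to the target variance.
    raw = [rng.gauss(0.0, 1.0) for _ in g]
    mu, sd = statistics.fmean(raw), statistics.pstdev(raw)
    e = [(x - mu) / sd * target_var_e ** 0.5 for x in raw]
    y = [gi + ei for gi, ei in zip(g, e)]
    return y, g, e, qtns


if __name__ == "__main__":
    rng = random.Random(42)
    X = simulate_genotypes(n_ind=200, n_snp=500, rng=rng)
    for n_qtn, h2 in [(10, 0.5), (100, 0.2)]:
        y, g, e, _ = simulate_phenotype(X, n_qtn, h2, rng)
        realized = statistics.pvariance(g) / (
            statistics.pvariance(g) + statistics.pvariance(e))
        print(f"QTNs={n_qtn:3d}  target h2={h2}  realized h2={realized:.3f}")
```

Holding h2 fixed while varying the number of QTNs, or switching on the interaction term, yields traits of different genetic complexity; reproducing LD or population structure would require a different genotype simulator than this independent-SNP one.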

Funder

Wageningen University and Research

Publisher

F1000 Research Ltd

Subject

General Pharmacology, Toxicology and Pharmaceutics; General Immunology and Microbiology; General Biochemistry, Genetics and Molecular Biology; General Medicine

Link

https://f1000research.com/articles/11-802/v2/pdf

Cited by 2 articles.

1. Comparison of machine learning methods for genomic prediction of selected Arabidopsis thaliana traits; PLOS ONE; 2024-08-28

2. Trait genetic architecture and population structure determine model selection for genomic prediction in natural Arabidopsis thaliana populations; 2024-07-11