Stacked ensembles on basis of parentage information can predict hybrid performance with an accuracy comparable to marker-based GBLUP

Published: 2023-07-21 Volume: 14
ISSN: 1664-462X
Container-title: Frontiers in Plant Science
Short-container-title: Front. Plant Sci.

Author:

Heilmann Philipp Georg, Frisch Matthias, Abbadi Amine, Kox Tobias, Herzog Eva

Abstract

Testcross factorials in newly established hybrid breeding programs are often highly unbalanced, incomplete, and characterized by a predominance of specific combining ability (SCA) over general combining ability (GCA). This results in a low efficiency of GCA-based selection. Machine learning algorithms might improve the prediction of hybrid performance in such testcross factorials, as they have been successfully applied to find complex underlying patterns in sparse data. Our objective was to compare the prediction accuracy of machine learning algorithms with that of GCA-based prediction and genomic best linear unbiased prediction (GBLUP) in six unbalanced incomplete factorials from hybrid breeding programs of rapeseed, wheat, and corn. We investigated a range of machine learning algorithms with three different types of predictor variables: (a) information on the parentage of hybrids, (b) in addition, the hybrid performance of crosses of the parental lines with other crossing partners, and (c) genotypic marker data. In two highly incomplete and unbalanced factorials from rapeseed, in which the SCA variance contributed considerably to the genetic variance, stacked ensembles of gradient boosting machines based on parentage information outperformed GCA prediction. Compared with GCA prediction, the stacked ensembles increased prediction accuracy from 0.39 to 0.45 and from 0.48 to 0.54. The prediction accuracy of stacked ensembles without marker data was comparable to that of GBLUP, which requires marker data. We conclude that hybrid prediction with stacked ensembles of gradient boosting machines based on parentage information is a promising approach that warrants further investigation with other data sets in which SCA variance is high.
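
To make the GCA baseline named in the abstract concrete, the following is a minimal sketch of GCA-based prediction in an incomplete factorial. It is not the authors' implementation: the abstract does not say how GCA effects were estimated, so the additive model y = mu + GCA(female) + GCA(male), the backfitting loop, the function names `fit_gca` and `predict_gca`, and the toy data are all assumptions chosen for illustration.

```python
from statistics import mean


def fit_gca(records, n_iter=50):
    """Fit y = mu + gca_female + gca_male by simple backfitting.

    records: list of (female, male, observed_performance) tuples.
    Assumption: an additive model fitted iteratively, as an illustration only.
    """
    mu = mean(y for _, _, y in records)
    gca_f = {f: 0.0 for f, _, _ in records}
    gca_m = {m: 0.0 for _, m, _ in records}
    for _ in range(n_iter):
        # Update each female effect from the residuals of her observed hybrids.
        for f in gca_f:
            gca_f[f] = mean(y - mu - gca_m[m] for ff, m, y in records if ff == f)
        # Update each male effect in the same way.
        for m in gca_m:
            gca_m[m] = mean(y - mu - gca_f[f] for f, mm, y in records if mm == m)
    return mu, gca_f, gca_m


def predict_gca(model, female, male):
    """Predict a hybrid from the GCA effects of its parents (no SCA term)."""
    mu, gca_f, gca_m = model
    # Parents without any observed hybrid contribute an effect of zero.
    return mu + gca_f.get(female, 0.0) + gca_m.get(male, 0.0)


# Hypothetical, purely additive toy factorial: the cross B x Y was not tested.
records = [("A", "X", 13.0), ("A", "Y", 9.0), ("B", "X", 11.0)]
model = fit_gca(records)
prediction = predict_gca(model, "B", "Y")
print(round(prediction, 3))
```

Because the toy data contain no SCA deviation, the additive model recovers the untested cross from the three tested ones. When SCA variance is large, the observed value of a cross deviates from this additive expectation, which is the situation in which the abstract reports that GCA-based selection becomes inefficient.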

Funder

Bundesministerium für Bildung und Forschung

Publisher

Frontiers Media SA

Subject

Plant Science

References (78 articles)

1. A comprehensive review of recent advances on deep vision systems; Abbas; Artif. Intell. Rev., 2019

2. Deep learning versus parametric and ensemble methods for genomic prediction of complex phenotypes; Abdollahi-Arpanahi; Genet. Selection Evol., 2020

3. Genome-based prediction of testcross values in maize; Albrecht; Theor. Appl. Genet., 2011

4. Benchmarking parametric and machine learning models for genomic prediction of complex traits; Azodi; G3: Genes Genomes Genet., 2019

5. Yield performance estimation of corn hybrids using machine learning algorithms; Babaie Sarijaloo; Artif. Intell. Agric., 2021

Cited by 2 articles.

1. Improvements in Prediction Performance of Ensemble Approaches for Genomic Prediction in Crop Breeding; 2024-09-08

2. Portability of genomic predictions trained on sparse factorial designs across two maize silage breeding cycles; Theoretical and Applied Genetics; 2024-03