Abstract
Improving prediction accuracy in genomic prediction is a key factor in accelerating genetic gain in crop breeding. The mainstream strategy for improving prediction performance has been to develop an individual prediction model that outperforms others across diverse prediction scenarios. However, this approach is limited when the superiority of individual models is inconsistent, which has been attributed to complex nonlinear interactions among genetic markers. This inconsistency is expected given the No Free Lunch Theorem, which states that the performance of any individual prediction model, averaged across all scenarios, is expected to be equivalent to that of the others. Hence, we investigate the potential of a stacked ensemble as an alternative method. We consider two traits, days to anthesis (DTA) and tiller number (TILN), measured in a Nested Association Mapping study, referred to herein as TeoNAM, in which a public maize (Zea mays) inbred, W22, was crossed to five inbred teosinte lines. The TeoNAM data set and the two traits were chosen as the example based on prior evidence that the traits are under the control of networks of genes and that the nodes of these genetic networks show high levels of segregation diversity. Our analysis of both traits in the TeoNAM demonstrated that the ensemble approach improved prediction performance, measured as the Pearson correlation, across all the proposed scenarios and in more than 95% of cases, compared to the six individual prediction models that contributed to the ensemble: rrBLUP, BayesB, RKHS, RF, SVR and GAT. The observed result indicates that ensemble approaches have the potential to enhance the performance of genomic prediction for crop breeding.
Key message
An ensemble approach can improve genomic prediction performance by combining information from individual models.
Publisher
Cold Spring Harbor Laboratory