A comparison of methods for training population optimization in genomic selection

Published:2023-03 Issue:3 Volume:136 Page:
ISSN:0040-5752
Container-title:Theoretical and Applied Genetics
language:en
Short-container-title:Theor Appl Genet

Author:

Fernández-González Javier, Akdemir Deniz, Isidro y Sánchez Julio

Abstract

Key message: Maximizing CDmean and Avg_GRM_self were the best criteria for training set optimization. A training set size of 50–55% (targeted) or 65–85% (untargeted) of the candidate set is needed to obtain 95% of the maximum accuracy.

With the advent of genomic selection (GS) as a widespread breeding tool, mechanisms to efficiently design an optimal training set for GS models have become more relevant, since they allow maximizing accuracy while minimizing phenotyping costs. The literature describes many training set optimization methods, but a comprehensive comparison among them is lacking. This work aimed to provide an extensive benchmark of optimization methods and optimal training set sizes by testing a wide range of them across seven datasets, six species, and different genetic architectures, population structures, and heritabilities, with several GS models, in order to provide guidelines for their application in breeding programs.

Our results showed that targeted optimization (which uses information from the test set) performed better than untargeted optimization (which does not use test set data), especially when heritability was low. Maximizing the mean coefficient of determination (CDmean) was the best targeted method, although it was computationally intensive. Minimizing the average relationship within the training set was the best strategy for untargeted optimization. Regarding the optimal training set size, maximum accuracy was obtained when the training set was the entire candidate set. Nevertheless, 50–55% of the candidate set was enough to reach 95–100% of the maximum accuracy in the targeted scenario, while 65–85% was needed for untargeted optimization. Our results also suggested that a diverse training set makes GS robust against population structure, while including clustering information was less effective. The choice of GS model did not have a significant influence on prediction accuracy.
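To make the untargeted criterion more concrete, the sketch below selects a training set by minimizing the average genomic relationship within it. It is an illustration only and not the authors' implementation: the VanRaden-style relationship matrix, the inclusion of the diagonal in the average, the simple exchange search, and all function names (`vanraden_grm`, `avg_relationship`, `exchange_min_avg_relationship`) are assumptions introduced here.

```python
import numpy as np


def vanraden_grm(markers):
    """Genomic relationship matrix from an (n, m) array of 0/1/2 genotypes.

    Assumption: a VanRaden-style GRM; the abstract does not specify one.
    """
    p = markers.mean(axis=0) / 2.0          # allele frequency per marker
    z = markers - 2.0 * p                   # centre each marker column
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))


def avg_relationship(grm, idx):
    """Mean relationship among the individuals in idx.

    Assumption: the diagonal (self-relationship) is included in the mean.
    """
    return float(grm[np.ix_(idx, idx)].mean())


def exchange_min_avg_relationship(grm, size, seed=0):
    """Hypothetical exchange search: start from a random training set and
    accept any single swap that lowers the average relationship, until no
    swap improves it. This reaches a local optimum only."""
    rng = np.random.default_rng(seed)
    n = grm.shape[0]
    train = [int(i) for i in rng.choice(n, size=size, replace=False)]
    current = avg_relationship(grm, train)
    improved = True
    while improved:
        improved = False
        for pos in range(size):
            for cand in range(n):
                if cand in train:
                    continue
                trial = train.copy()
                trial[pos] = cand
                score = avg_relationship(grm, trial)
                if score < current - 1e-12:
                    train, current, improved = trial, score, True
    return sorted(train), current


if __name__ == "__main__":
    # Simulated genotypes stand in for a real candidate set.
    rng = np.random.default_rng(1)
    genotypes = rng.integers(0, 3, size=(40, 300)).astype(float)
    grm = vanraden_grm(genotypes)
    chosen, score = exchange_min_avg_relationship(grm, size=12)
    print("selected individuals:", chosen)
    print("average relationship:", round(score, 4))
```

A lower average relationship corresponds to a more diverse training set, which is the property the abstract associates with robustness against population structure. For realistic candidate set sizes, a heuristic such as a genetic algorithm would likely be preferable to this exhaustive exchange loop.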

Funder

Ministerio de Ciencia, Innovación y Universidades

Universidad Politécnica de Madrid

Publisher

Springer Science and Business Media LLC

Subject

Genetics, Agronomy and Crop Science, General Medicine, Biotechnology

Link

https://link.springer.com/content/pdf/10.1007/s00122-023-04265-6.pdf
