Two simple methods to improve the accuracy of the genomic selection methodology

Published:2023-04-26 Issue:1 Volume:24 Page:
ISSN:1471-2164
Container-title:BMC Genomics
language:en
Short-container-title:BMC Genomics

Author:

Montesinos-López Osval A., Kismiantini, Montesinos-López Abelardo

Abstract

Background: Genomic selection (GS) is revolutionizing plant and animal breeding. However, its practical implementation remains challenging because it is affected by many factors that, when not under control, make the methodology ineffective. In addition, because GS is generally formulated as a regression problem, it has low sensitivity for selecting the best candidate individuals, since a top percentage is selected according to a ranking of predicted breeding values.

Results: In this paper we propose two methods to improve the prediction accuracy of this methodology. The first reformulates GS (currently formulated as a regression problem) as a binary classification problem. The second consists only of a postprocessing step that adjusts the threshold used to classify the lines predicted on their original (continuous) scale, so as to guarantee similar sensitivity and specificity. The postprocessing step is applied to the predictions obtained from the conventional regression model. Both methods assume that a threshold has been defined in advance to divide the training data into top lines and non-top lines; this threshold can be set as a quantile (for example 80%, 90%, etc.) or as the average (or maximum) of the performance of the checks. The reformulation method requires labeling as one those lines in the training set that are equal to or larger than the specified threshold, and as zero otherwise. A binary classification model is then trained with the conventional inputs, using the binary response variable in place of the continuous one. The binary classifier should be trained so that sensitivity and specificity are more similar, which guarantees a reasonable probability of correctly classifying the top lines.
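The labeling and postprocessing steps described in the Results can be sketched roughly as follows. This is only an illustrative Python sketch on simulated data, not the authors' implementation: the function names, the grid search over cutoffs, the criterion of minimizing |sensitivity − specificity|, and the simulated "shrunken" predictions are all assumptions made for the example.

```python
import numpy as np

def label_top_lines(y, quantile=0.8):
    """Label lines at or above the quantile threshold as 1, others as 0."""
    threshold = np.quantile(y, quantile)
    return (y >= threshold).astype(int), threshold

def sensitivity_specificity(labels, predicted):
    """Sensitivity and specificity from 0/1 observed labels and 0/1 predictions."""
    tp = np.sum((labels == 1) & (predicted == 1))
    fn = np.sum((labels == 1) & (predicted == 0))
    tn = np.sum((labels == 0) & (predicted == 0))
    fp = np.sum((labels == 0) & (predicted == 1))
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec

def balanced_cutoff(y_obs, y_pred, threshold):
    """Assumed criterion: choose the cutoff on the predictions that makes
    sensitivity and specificity as similar as possible."""
    labels = (y_obs >= threshold).astype(int)
    best_cut, best_gap = None, np.inf
    for cut in np.unique(y_pred):
        sens, spec = sensitivity_specificity(labels, (y_pred >= cut).astype(int))
        gap = abs(sens - spec)
        if gap < best_gap:
            best_cut, best_gap = cut, gap
    return best_cut

# Simulated data (hypothetical): regression predictions are shrunken toward the mean.
rng = np.random.default_rng(0)
y = rng.normal(size=1000)
y_hat = 0.4 * y + 0.3 * rng.normal(size=1000)
y_tune, y_test = y[:500], y[500:]
hat_tune, hat_test = y_hat[:500], y_hat[500:]

# Threshold defined in advance from the tuning (training) phenotypes.
_, threshold = label_top_lines(y_tune, quantile=0.8)
cut = balanced_cutoff(y_tune, hat_tune, threshold)

test_labels = (y_test >= threshold).astype(int)
# Naive rule: apply the phenotype threshold directly to the predictions.
naive = sensitivity_specificity(test_labels, (hat_test >= threshold).astype(int))
# Postprocessed rule: apply the adjusted cutoff instead.
adjusted = sensitivity_specificity(test_labels, (hat_test >= cut).astype(int))
print(f"naive sensitivity={naive[0]:.2f}, adjusted sensitivity={adjusted[0]:.2f}")
```

Because shrunken predictions rarely exceed a threshold defined on the observed scale, the naive rule flags few top lines; lowering the cutoff on the prediction scale trades some specificity for sensitivity.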
Conclusions: We evaluated the proposed methods on seven data sets and found that both outperformed the conventional regression model by a large margin (with the postprocessing method, by 402.9% in terms of sensitivity, 110.04% in terms of F1 score, and 70.96% in terms of Kappa coefficient). Between the two proposed methods, the postprocessing method was better than the reformulation as a binary classification model. The simple postprocessing method improves the accuracy of conventional genomic regression models and, with similar or better performance, avoids the need to reformulate them as binary classification models, significantly improving the selection of the top candidate lines. In general, both proposed methods are simple and can easily be adopted in practical breeding programs, with the guarantee that they will significantly improve the selection of the top candidate lines.
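For readers unfamiliar with the metrics named in the Conclusions, the sketch below shows how F1 score and Cohen's Kappa follow from a 2×2 confusion table, and one common way to express a relative gain in percent. The counts and metric values are hypothetical, and the relative-gain formula is an assumption about how such percentages are typically computed, not a statement of the authors' exact calculation.

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall (sensitivity)."""
    return 2 * tp / (2 * tp + fp + fn)

def cohen_kappa(tp, fp, fn, tn):
    """Cohen's Kappa: agreement beyond what is expected by chance."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement from the marginal totals of predicted and observed classes.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return (observed - expected) / (1 - expected)

def relative_gain(new, old):
    """Percent improvement of `new` over `old` (assumed formula)."""
    return 100 * (new - old) / old

# Hypothetical confusion counts for top (1) vs. non-top (0) lines.
tp, fp, fn, tn = 40, 10, 10, 40
print(f1_score(tp, fp, fn))          # F1 for the hypothetical table
print(cohen_kappa(tp, fp, fn, tn))   # Kappa for the hypothetical table
print(relative_gain(0.75, 0.25))     # gain when a metric rises from 0.25 to 0.75
```

A gain above 100% simply means the metric more than doubled relative to the baseline, which is possible when the baseline sensitivity is very low.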

Publisher

Springer Science and Business Media LLC

Subject

Genetics,Biotechnology

Link

https://link.springer.com/content/pdf/10.1186/s12864-023-09294-5.pdf

References (31 articles)

1. Meuwissen THE, Hayes BJ, Goddard ME. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001;157:1819–29. https://doi.org/10.1093/genetics/157.4.1819.

2. Desta ZA, Ortiz R. Genomic selection: genome-wide prediction in plant improvement. Trends Plant Sci. 2014;19(9):592–601. https://doi.org/10.1016/j.tplants.2014.05.006.

3. Ríos OR. Plant breeding in the omics era. Cham: Springer; 2015.

4. Roorkiwal M, Rathore A, Das RR, Singh MK, Jain A, Srinivasan S, et al. Genome-enabled prediction models for yield related traits in Chickpea. Front Plant Sci. 2016;7:1–13. https://doi.org/10.3389/fpls.2016.01666.

5. Crossa J, Pérez-Rodríguez P, Cuevas J, et al. Genomic selection in plant breeding: methods, models, and perspectives. Trends Plant Sci. 2017;22(11):961–75. https://doi.org/10.1016/j.tplants.2017.08.011.